Big data architecture for medical imaging processing

Medical imaging produces a huge amount of data every day. Although some management systems already exist, the authors note that no study had yet presented a workflow for this kind of data. Their paper aims to present two architectures for such a workflow, built on two different frameworks: Hadoop and Spark. In practice, they discuss these architectures mainly for the classification step of the workflow. The ultimate goal is to support diagnosis and decision making.

Their workflow starts with image acquisition, which relies on classic, existing techniques. The images are then cleaned and useful information is extracted from them before they are clustered. These first steps are done prior to any analysis. The next step is to model the images before classifying them with a support vector machine (a widely used binary classification algorithm). The classification is used to route the data to the most suitable practitioner (e.g., oncologist, neurologist). Prediction and decision making are achieved through convolutional neural networks, followed by a validation step for that algorithm.
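To make the classification step more concrete, here is a minimal from-scratch sketch of a linear support vector machine trained by subgradient descent on the hinge loss. It is not the authors' implementation: the two features, the toy data, the class meanings and the hyperparameters (`epochs`, `lr`, `reg`) are all assumptions chosen for illustration, and a real workflow would normally use a library implementation on features extracted from the images.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b by stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be -1 or +1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified:
                # the hinge term and the regularizer both contribute.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only the regularizer shrinks w.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, already-centered feature vectors (two features per image).
# Assumed meaning of the labels: -1 = route to neurology, +1 = route to oncology.
samples = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0),
           (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print("weights:", w, "bias:", b)
print("predictions:", predictions)
```

Because an SVM of this kind separates only two classes, routing images to more than two specialties would need an extension such as one-vs-rest, which the sketch does not cover.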

Their workflow also covers data management. It uses compression techniques (lossless compression is required for such data) to reduce transfer and computation time, and it should rely on a NoSQL ("Not only SQL") database for sharing and storage, which the authors consider easier to use.
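The requirement that compression be lossless can be illustrated with a short round trip: the decompressed bytes must be identical to the original pixel data. The sketch below uses Python's standard `zlib` module as a stand-in, since the summary does not say which codec the workflow uses; the synthetic 64x64 image and the helper names `compress_image` and `decompress_image` are assumptions made for the example.

```python
import zlib


def compress_image(pixels: bytes) -> bytes:
    """Losslessly compress raw pixel bytes with DEFLATE (zlib)."""
    return zlib.compress(pixels, level=9)


def decompress_image(blob: bytes) -> bytes:
    """Recover the exact original pixel bytes."""
    return zlib.decompress(blob)


# Hypothetical 64x64 8-bit grayscale image: dark background, brighter square.
WIDTH, HEIGHT = 64, 64
raw = bytes(
    200 if 16 <= x < 48 and 16 <= y < 48 else 10
    for y in range(HEIGHT)
    for x in range(WIDTH)
)

packed = compress_image(raw)
restored = decompress_image(packed)

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")
print("bit-exact round trip:", restored == raw)
```

The synthetic image compresses very well because it is highly regular; real scans usually compress far less, so the size reduction shown here should not be read as representative.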

Figure 1: Tasks performed by the workflow

The Hadoop architecture relies on the MapReduce technique, which aims to parallelize the computation. The map phase produces key-value pairs from the data, and the reduce phase merges the values that share the same key in order to aggregate the data. The selling point of Hadoop is its ease of implementation, although the authors judge Spark to be the stronger option overall.
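The map and reduce phases described above can be sketched in plain Python, without Hadoop itself. The example counts images per modality; the records and the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical, and everything runs in a single process, whereas a real cluster would distribute each phase across nodes.

```python
from collections import defaultdict

# Hypothetical image metadata records (not taken from the paper).
records = [
    {"id": "img-001", "modality": "MRI"},
    {"id": "img-002", "modality": "CT"},
    {"id": "img-003", "modality": "MRI"},
    {"id": "img-004", "modality": "X-ray"},
    {"id": "img-005", "modality": "CT"},
    {"id": "img-006", "modality": "MRI"},
]


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    yield record["modality"], 1


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge the values of one key into a single aggregate."""
    return key, sum(values)


mapped = [pair for record in records for pair in map_phase(record)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

Since each record is mapped independently and each key is reduced independently, both phases can be split across machines, which is where the parallelism comes from.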

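The Spark architecture allows near-real-time processing, which Hadoop MapReduce does not. Spark builds on the MapReduce model but adds resilient distributed datasets (RDDs) on top, which can be cached in memory between operations instead of being written to disk after each step.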

Both architectures were successfully built. The computational performance that Spark allows and the built-in libraries it offers make it the more complete of the two architectures.

References: Tchagna Kouanou, A., Tchiotsop, D., Kengne, R., Zephirin, D. T., Adele Armele, N. M., & Tchinda, R. (2018). An optimal big data workflow for biomedical image analysis. Informatics in Medicine Unlocked, 11, 68‑74. https://doi.org/10.1016/j.imu.2018.05.001