Cureus. 2022 Sep; 14(9)
Bioimaging: Evolution, Significance, and Deficit

Harsh S. Lahoti

1 Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Medical Sciences, Wardha, IND

Sangita D Jogdand

2 Pharmacology, Jawaharlal Nehru Medical College, Datta Meghe Institute of Medical Sciences, Wardha, IND

Bioimaging is a relatively new, digital-technology-based medical advancement concerned with the real-time visualization of biological processes. This innovative imaging technology combines anatomical structure with functional data such as electric and magnetic fields, mechanical motion, and metabolism. It is a non-invasive approach that provides a view of the human body at progressively greater depth and detail, making bioimaging a powerful tool for observing the internal workings of the organism and its disorders. Examples of bioimaging in medicine include X-ray and ultrasound images, MRI, 3D and 4D body imaging using computed tomography (CT) scans, and DEXA scans, which are useful for assessing bone density in osteoporosis. Super-resolution microscopy, two-photon fluorescence excitation microscopy, fluorescence redistribution after photobleaching, and fluorescence resonance energy transfer are some of the recent advancements in biological imaging. Bioimaging provides a means of obtaining images of the whole body, anatomical regions, organs, tissues, and biological markers down to the cellular level. It may be used to aid disease management and therapy, as well as to detect, diagnose, and characterize disorders in clinical settings.

Introduction and background

Bioimaging refers to procedures that involve no tools that invade the skin or physically enter the body, allowing scientists to view biological functions in real time. The purpose of bioimaging is to disrupt living processes as little as possible. It is also widely used to obtain data on the three-dimensional structure of the observed object without physical contact [ 1 ]. In a broader sense, bioimaging refers to technologies for viewing biological specimens that have been fixed for observation. In the fundamental and medical sciences, bioimaging can be used to examine normal anatomy and physiology and to gather research data. Due to the multifaceted nature of bioimaging research, interdisciplinary teams with expertise in electrical engineering, mechanical engineering, biomedical engineering, and other fields are required [ 2 ].

Multi-modal imaging (such as combined ultrasound and light imaging) and multi-scale imaging (e.g., molecular to cellular to organ) are frequently needed for complex bioimaging applications. Imaging makes it possible to understand intricate structures and dynamic interacting processes deep within the body. Collectively, imaging techniques make use of the entire energy spectrum. Examples of clinical modalities are ultrasound, CT using X-rays, optical coherence tomography (OCT), and MRI [ 3 ]. Research methods include, among others, electron microscopy, mass spectrometry imaging, fluorescence tomography, biochemical luminescence, various forms of OCT, and optoacoustic imaging. Light microscopy methods include confocal, multi-photon, total internal reflection, and super-resolution fluorescence microscopy [ 4 ].

The evolution of medical imaging has a long history that dates back to the 1890s. It grew in popularity in the 1980s and has been extensively researched in recent years owing to technical improvements [ 5 ]. The same fundamental concept underlies all imaging techniques: the body or region to be diagnosed is traversed by a wave beam that transmits or reflects radiation, and a detector captures this radiation and processes it to create an image. The wave type varies between procedures [ 6 ]: CT employs X-rays, MRI uses radio-frequency waves, and single-photon emission CT (SPECT) uses gamma rays [ 5 ]. The field of biomedical imaging has advanced over the past 100 years, from Roentgen's initial discovery of the X-ray to newer imaging approaches such as MRI, CT, and PET. A brief overview of different bioimaging techniques is discussed below. Some of the technologies under investigation at the moment are magnetic resonance spectroscopy (MRS), functional MRI, diffusion-weighted MRI, and molecular imaging. Examples of molecular imaging methods include PET, SPECT, and optical imaging [ 7 ].

About 400 years ago, multiple-lens microscopes were invented and employed in medical research. After the invention of digital photography, researchers were able to exploit their potential fully by applying computational methods to image analysis. As digital technology advances, bioimaging is anticipated to become cheaper, quicker, and the backbone of medical research. Bioimaging may be broadly divided into four categories: molecular bioimaging, biomedical imaging, bioimaging in drug discovery, and computational bioimaging [ 8 ].

X-ray and ultrasound pictures

X-rays have been the most popular, frequently quickest, and least expensive diagnostic imaging method since Wilhelm Conrad Röntgen discovered them in 1895. The medical X-ray imaging equipment portfolio has expanded into a wide range of specialized devices for many purposes, starting with early radiographic systems that employed X-ray films as detectors. Although X-ray sources emit a broad range of energies and X-ray interactions within the human body vary with energy and material, most X-ray imaging done today is in black and white [ 9 ]. Figure 1 shows a modern X-ray machine of the kind currently used for diagnosis in institutes all over the world.

Figure 1: A modern X-ray machine currently used in practice (image credit: Harsh Lahoti).

X-ray medical imaging has two major categories: structural images that reveal anatomical structures and functional images that measure changes in biological functions such as metabolism, blood flow, local chemical composition, and biochemical processes. X-rays are commonly utilized to image the structure of bone, metal implants, and soft-tissue cavities [ 10 ]. Radiography very often plays a crucial part in evaluating the many bony structures of the body; however, it is beyond the scope of this article to adequately describe the complete spectrum of uses of conventional radiographs. It is also possible to evaluate the lungs, and contrast can aid in examining soft-tissue organs throughout the body, such as the uterus and the gastrointestinal system, as in the case of hysterosalpingography [ 11 ]. Stereotactic breast biopsies, intra-articular steroid injections, catheter angiography, and other procedures can all be performed with the help of radiography. Numerous diseases, including fractures, different forms of pneumonia, cancers, and congenital anatomic anomalies, can be evaluated with radiography. Some X-ray developments for clinical use are mammography, the diagnosis of arthritis, and the diagnosis of lung disease [ 12 ].
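The differential absorption that makes bone and soft tissue distinguishable on a radiograph follows the Beer-Lambert law, I = I0 * exp(-mu * x). A minimal sketch, where the attenuation coefficients are rough illustrative values (around 60 keV), not reference data:

```python
import numpy as np

# Rough illustrative linear attenuation coefficients (per cm), not reference data.
MU_PER_CM = {"bone": 0.57, "soft_tissue": 0.20, "air": 0.0002}

def transmitted_fraction(material: str, thickness_cm: float) -> float:
    """Fraction of incident X-ray intensity passing through the material (Beer-Lambert)."""
    return float(np.exp(-MU_PER_CM[material] * thickness_cm))

# Bone attenuates far more than soft tissue of the same thickness, which is
# why it casts a brighter shadow on the radiograph.
print(f"bone: {transmitted_fraction('bone', 2.0):.3f}")
print(f"soft tissue: {transmitted_fraction('soft_tissue', 2.0):.3f}")
```

The detector records the transmitted fraction, so denser material (lower transmission) appears as stronger contrast.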

As it generates high-resolution images with greater soft-tissue contrast, MRI is often preferred over CT, ultrasound, and X-ray for detection. MRI can also scan arbitrary 2D sections or 3D volumes of the body without requiring the patient to move between scans [ 13 ]. Recently, MRI at field strengths above three Tesla has been used to examine temporomandibular joint (TMJ) pathologies such as fine perforations of the articular disc and fibrinous adhesions. An MRI may also be used as a guidance tool through the body during medical operations. MRI-guided operations can streamline the whole process and benefit doctors and patients by allowing diagnosis, medication, and post-procedure assessment to be completed in a single clinical workflow [ 13 ]. Figure 2 shows an MRI machine of the type currently used across most of the health sector for the diagnosis of disease.

Figure 2: An MRI machine of the type currently used for diagnosis.

Unlike CT, MRI does not employ ionizing radiation, and its diagnostic capabilities are constantly being improved. MRI at field strengths above two Tesla has been developed for greater precision. MRI has replaced CT in various sectors and has multiple uses. With a few exceptions, MRI is generally used for elective examinations, and it is growing more common as new technologies such as diffusion and perfusion imaging become available. The use of MRI in cancer imaging for work-up and follow-up is increasing continuously. Owing to its excellent soft-tissue contrast, MRI is widely used for non-malignant lesions, imaging vertebral anomalies, the meninges of the brain (pia mater, arachnoid, and dura mater), intracranial tumours, and fine soft-tissue detail [ 14 ].

Merits of MRI

An MRI scanner may be used to capture images of different sections of the body (e.g., skull, abdomen, joints, lower limbs) in various imaging planes. MRI scans help the clinician diagnose a wide range of illnesses [ 15 ].

Demerits of MRI

Radiation exposure is not a concern during an MRI examination, since ionizing radiation is not used. However, because MRI uses a very powerful magnet, it should not be performed on patients with cardiac or other pacemakers, intracranial aneurysm clips, cochlear implants, certain artificial limbs, surgically implanted infusion pumps, neurostimulators, bone-growth stimulators, other iron-based metal implants, or certain types of intrauterine devices. MRI is also contraindicated in patients with surgical clips, screws, metallic plates and sutures, wire mesh, or metallic pins in their bodies, and in those with internal metallic objects such as shrapnel or bullets. Patients should inform their clinician if they are, or might be, pregnant. Further demerits of MRI are claustrophobia (fear of enclosed spaces) and the noise created during imaging. MRI is generally not recommended during gestation because of the risk of elevating the temperature of the amniotic fluid [ 16 ].

CT Scan

CT has transformed diagnostic decision-making since its inception in the 1970s. It has improved surgery, cancer detection and therapy, treatment after accidents and significant injury, stroke therapeutics, and cardiac care [ 17 ]. To create cross-sectional pictures, or "slices," of the patient's body, a narrow X-ray beam is focused on the patient and rapidly rotated around the body. The CT scan paved the way for more effective treatment of deadly conditions such as cancer, stroke, heart problems, orthodontic conditions, and accident injuries. During the COVID-19 outbreak, CT was used to assess patients diagnosed with viral pneumonia caused by COVID-19, and it was shown to be very sensitive [ 18 ]. Figure 3 shows a CT machine of the type currently used for the detection of such conditions.

Figure 3: A CT machine of the type currently used in practice.

A variety of tiny, distinct bodily components are measured using CT to determine their radiographic densities. In a typical tomography slice, each of these components is represented by a pixel, which is used to display the elements of a two-dimensional image. So that the radiographic densities of interest appear between dark and light in the displayed picture, the operator assigns a range of grey shades between black and white to a specific range of densities [ 19 ].
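This assignment of grey shades to a density range is the familiar CT window/level operation: Hounsfield units (HU) inside the chosen window are mapped linearly onto the display greys, and everything outside clips to black or white. A minimal sketch; the window preset is a common illustrative choice, not a vendor value:

```python
import numpy as np

def apply_window(hu: np.ndarray, level: float, width: float) -> np.ndarray:
    """Map Hounsfield units in [level - width/2, level + width/2] to grey 0-255."""
    lo, hi = level - width / 2, level + width / 2
    grey = (np.clip(hu, lo, hi) - lo) / (hi - lo) * 255.0
    return grey.astype(np.uint8)

# A few representative densities: air, fat, water, soft tissue, blood, bone.
slice_hu = np.array([-1000, -100, 0, 40, 80, 1000])

# A typical soft-tissue window (level 40, width 400): air clips to black,
# dense bone clips to white, soft tissues spread across the grey range.
print(apply_window(slice_hu, level=40, width=400))
```

Narrowing the width stretches a small density range across all available greys, which is how subtle soft-tissue differences are made visible.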

For most brain investigations, MRI has replaced CT; however, CT is still the preferred imaging method for acute cranial trauma. For the hollow viscera of the abdomen, CT is often preferable to MRI, while the evaluation of solid organs is more debatable: modern CT and MRI are competitive for the liver, spleen, kidneys, and perhaps the pancreas. For the pelvic organs, MRI is better. The choice of test will depend on local expertise, equipment availability, cost, and radiation exposure [ 20 ].

Photon-counting CT is an emerging technology with the potential to change clinical CT dramatically. Photon-counting CT uses new energy-resolving X-ray detectors with mechanisms that differ substantially from conventional energy-integrating sensors. Photon-counting CT detectors count the number of incoming photons and measure photon energy. This technique results in a higher contrast-to-noise ratio, improved spatial resolution, and optimized spectral imaging. Photon-counting CT can reduce radiation exposure, reconstruct images at a higher resolution, correct beam-hardening artefacts, maximize the use of contrast agents, and create opportunities for quantitative imaging relative to current CT technology [ 21 ].
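The count-and-bin read-out that distinguishes photon-counting from energy-integrating detectors can be illustrated with a toy simulation. The spectrum and threshold values below are illustrative assumptions, not detector specifications:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated photon energies (keV) arriving at one detector pixel.
photons_kev = rng.uniform(20, 120, size=1000)

# Energy-integrating detector: a single number, the total deposited energy,
# so spectral information is lost.
integrated_signal = photons_kev.sum()

# Photon-counting detector: individual photons are counted in energy bins
# set by comparator thresholds, preserving spectral information.
thresholds = [20, 50, 90, 120]
counts_per_bin, _ = np.histogram(photons_kev, bins=thresholds)

print(f"integrated signal: {integrated_signal:.0f} keV")
print(f"counts per energy bin: {counts_per_bin}")
```

The per-bin counts are what enable the spectral imaging and material decomposition mentioned above.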

Advanced bioimaging techniques

In the past few years, there have been many advancements in bioimaging techniques that help physicians reach quicker and more efficient diagnoses.


Super-Resolution Imaging

From observed low-resolution pictures, the super-resolution approach reconstructs a much higher-resolution image [ 22 ]. Because super-resolution has been around for almost thirty years, both multi-frame and single-frame super-resolution have essential uses in our day-to-day lives [ 23 ]. Super-resolution helps address this problem by generating high-resolution MRI from otherwise low-quality MRI images. It is also used to distinguish emissions from sources that are closer together than the usual diffraction limit; combined with a higher capture frequency, this improves resolution. The ability to discern between two objects, in this case vessels, beyond the traditional limit is super-resolution. The application will ultimately decide which of the customary limits applies. Most people take the diffraction barrier at half a wavelength as the lower resolution limit; an alternative is the Rayleigh resolution criterion of 1.22 x wavelength x (focal depth/aperture). Even with the latter, typically more permissive, definition, achieving a 150-micron resolution at 15 centimetres depth with a 5-centimetre-aperture, 5-MHz transducer is still an exciting achievement for therapeutic implementation, particularly for larger organs. In both cases, the specific limit should be stated precisely in the publication so that research can be compared more easily [ 24 ].
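The diffraction-limited figure implied by the Rayleigh criterion can be checked directly. A minimal sketch, assuming the standard soft-tissue speed of sound of 1540 m/s:

```python
SPEED_OF_SOUND = 1540.0  # m/s, assumed value for soft tissue

def rayleigh_limit_m(freq_hz: float, depth_m: float, aperture_m: float) -> float:
    """Rayleigh resolution: 1.22 * wavelength * depth / aperture."""
    wavelength = SPEED_OF_SOUND / freq_hz
    return 1.22 * wavelength * depth_m / aperture_m

# 5-MHz transducer, 15 cm depth, 5 cm aperture: limit of roughly 1.1 mm,
# so the 150-micron resolution cited above lies well beyond the
# diffraction limit -- hence "super"-resolution.
limit = rayleigh_limit_m(5e6, 0.15, 0.05)
print(f"Rayleigh limit: {limit * 1e3:.2f} mm")
```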

Advantages of contemporary super-resolution microscopy: Super-resolution microscopy makes it possible to study sub-cellular architecture and dynamics at the nanoscale. Both the sample's surface and its interior, up to 100 μm deep, are readily visible to researchers. Thanks to improved temporal resolution, time-lapse imaging allows researchers to collect precise three-dimensional super-resolution image data. Some super-resolution microscopy techniques combine intrinsic optical sectioning with rapid data capture and dual-colour super-resolution to deliver high-quality images quickly for further analysis [ 25 ].

Disadvantages of contemporary super-resolution microscopy: At higher resolutions, spherical aberration and vibration become even more problematic. Additionally, because of the high excitation intensities or long exposure durations involved, certain living samples are more negatively impacted by super-resolution imaging than others. Another issue with many super-resolution systems is their lack of adaptability: if an experimental procedure changes in the middle of an application, hardware-determined parameters such as spatial resolution, pixel density, or the charge-coupled device are difficult to adjust [ 26 ].

Fluorescence Recovery/Redistribution After Photobleaching (FRAP)

Since it was first introduced into cell biology research, the phenomenon of FRAP has received a great deal of interest. The approach was created in the 1970s, when its biological applicability was limited to the mobility of fluorescently labelled cell-membrane components. In the 1980s, the introduction of confocal scanning microscopy made it possible to study the behaviour of molecules inside cells without specialized equipment. However, FRAP has not gained widespread acceptance to date, owing to the time and effort necessary to extract, label, and inject proteins and other chemicals into cells [ 27 ].

In contrast to FRAP investigations, in fluorescence loss in photobleaching (FLIP) bleaching is carried out repeatedly at the same region of the specimen, which prevents fluorescence recovery; the region of interest itself is never bleached [ 28 ]. Cell biology provides instances of this in the transport of proteins and lipids in the plasma membrane, cytoplasm, and nucleus. The qualities and usefulness of finished products in commercial applications, including medicines, food, textiles, sanitary goods, and cosmetics, are considerably enhanced by the diffusion of solute and solvent molecules [ 29 , 30 ]. Since each of these systems is different, precise local measurements of mass-transport processes are required to understand the characteristics of soft biomaterials. FRAP uses fluorescence microscopy to measure regional molecular mobility on a micrometre scale [ 31 ].
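A FRAP measurement is usually summarized by fitting the recovery curve with a single-exponential model, F(t) = F0 + (Finf - F0)(1 - exp(-t/tau)), from which the mobile fraction follows. A minimal sketch with illustrative parameter values:

```python
import numpy as np

def frap_recovery(t, f0, f_inf, tau):
    """Single-exponential FRAP recovery model."""
    return f0 + (f_inf - f0) * (1.0 - np.exp(-t / tau))

# Illustrative values: pre-bleach, post-bleach, recovery plateau, time constant (s).
f_pre, f0, f_inf, tau = 1.0, 0.2, 0.8, 5.0

t = np.linspace(0.0, 30.0, 7)
print(np.round(frap_recovery(t, f0, f_inf, tau), 3))

# Mobile fraction: share of the bleached fluorescence that recovers.
mobile_fraction = (f_inf - f0) / (f_pre - f0)
print(f"mobile fraction: {mobile_fraction:.2f}")
```

Here recovery plateaus below the pre-bleach level, indicating that 25% of the labelled molecules are immobile on the timescale of the experiment.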

Fluorescence Resonance Energy Transfer (FRET)

In fluorescence resonance energy transfer, irradiative energy is transmitted from an excited molecular fluorophore (the donor) to another fluorophore (the acceptor) via long-range dipole-dipole intermolecular interactions. FRET can be a trustworthy technique for measuring molecular proximity at angstrom distances (10-100 Å), provided the donor-acceptor separation lies within the Förster radius, typically 3-6 nm, defined as the distance at which half of the donor's excitation energy is transferred to the acceptor. Because its efficiency depends on the inverse sixth power of the intermolecular separation, FRET offers a sensitive approach for analyzing a range of biological activities that affect molecular proximity [ 32 ].
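The inverse-sixth-power dependence described above is captured by the standard efficiency formula E = R0^6 / (R0^6 + r^6). A minimal sketch, assuming an illustrative Förster radius of 5 nm:

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """FRET efficiency for donor-acceptor separation r and Forster radius R0."""
    return r0_nm**6 / (r0_nm**6 + r_nm**6)

# E = 0.5 exactly at r = R0, and efficiency collapses steeply once r exceeds
# R0 -- the basis of FRET as a molecular ruler on the nanometre scale.
for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm  E = {fret_efficiency(r, 5.0):.3f}")
```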

In the fluorescence resonance energy transfer process, an excited donor fluorophore can non-radiatively transfer its excitation energy to neighbouring acceptor chromophores through long-distance dipole-dipole interactions [ 33 ]. According to the energy transfer theory, an excited fluorophore acts as an oscillating dipole that may exchange energy with another dipole whose resonance frequency is close to its own. Similar resonance energy transmission occurs in coupled oscillators that oscillate at the same frequency, such as a pair of tuning forks. By contrast, the release and reabsorption of photons required for radiative energy transfer are governed by the specimen's geometrical and optical properties and by the structure of the surrounding medium and the wavefront paths [ 34 ].

Over the last two decades, emerging biological imaging technologies have generated tremendous biological discoveries, many of which directly rely on computational methodologies. The development of image informatics solutions, from capture to storage, analysis to mining, and visualization to distribution, is critical to the future of biological imaging innovation. Continued development of computational methods for bioimaging will not only pave the way for new imaging technologies but also enable biological breakthroughs that would not otherwise be possible. For the thousands of scientists who depend on bioimaging, investing in the creation and upkeep of essential software programs and the connections between them will pay off handsomely [ 35 ].

Despite the benefits of bioimaging, using microscopes has several disadvantages. Aside from the cost of acquiring, storing, and maintaining equipment, sample preparation is often required. For example, in field-emission scanning electron microscopy (FESEM), the sample must first be dried and sputter-coated with gold particles, which is time-consuming and tedious. Furthermore, the high resolution of methods like FESEM can sometimes be a double-edged sword. FESEM, for example, was particularly useful in previous work for examining the structure of, and interphase between, bacteria and cells, but it was difficult to locate small quantities of bacteria when the researchers used less inoculant and examined the cells over 23 hours to determine whether they survived. Complementary methods, such as fluorescence microscopy, are frequently required in such situations to obtain an overview of the material [ 36 ].

Application of artificial intelligence (AI) in radiology

Radiology practice will undergo a significant transition as a result of the rapid development of deep learning and AI technologies and their integration into routine clinical imaging [ 37 ]. AI is increasingly playing a significant role in various health care applications, such as drug development, remote patient monitoring, medical imaging and diagnostics, risk management, wearable technology, virtual assistants, and hospital administration. The application of AI is also anticipated to be advantageous in many fields involving massive data, such as processing data from DNA and RNA sequencing. Radiology, pathology, dermatology, and ophthalmology are just a few medical specialities that rely on imaging data, and they have already started to benefit from AI techniques. Examples of AI clinical applications in radiology are thoracic imaging, abdominal and pelvic imaging, head and neck imaging, dental pathologies, colonoscopy, brain imaging, mammography, and many more [ 38 ]. The boon of AI in radiology is that it can assume a sufficient portion of the image-diagnosis workload, freeing the radiologist to concentrate on the complicated cases that need their specialized attention. AI also helps the teleradiologist and decreases burnout: it is used to diagnose and assess patients over great distances, reducing the time needed to evacuate emergency patients from rural and isolated locations. One demerit of AI is that it lacks human empathy [ 39 ].


Conclusions

Bioimaging gives clinicians a key tool for monitoring patients' responses to therapy, and it promises non-invasive, safe disease detection during treatment. Bioimaging is a very important innovative imaging technology with great significance in today's world. Well-designed strategies for imaging probes are required for accurate imaging to enable effective cancer management both in vitro and in vivo. Bioimaging is of utmost importance in the medical sciences, as it has advanced the diagnosis of various diseases, helped prevent a large number of illnesses and complications, and supports clinicians in early diagnosis to prevent future consequences. In this article, we have discussed various bioimaging techniques and their advancement, along with their merits and demerits. The introduction of bioimaging into the medical sciences has proved to be an asset to the world.

The content published in Cureus is the result of clinical experience and/or research by independent individuals or organizations. Cureus is not responsible for the scientific accuracy or reliability of data or conclusions published herein. All content published within Cureus is intended only for educational, research and reference purposes. Additionally, articles published within Cureus should not be deemed a suitable substitute for the advice of a qualified health care professional. Do not disregard or avoid professional medical advice due to content published within Cureus.

The authors have declared that no competing interests exist.


Electrical Engineering and Systems Science > Image and Video Processing

Title: Biomedical Image Reconstruction: A Survey

Abstract: Biomedical image reconstruction research has been developed for more than five decades, giving rise to various techniques such as central and filtered back projection. With the rise of deep learning technology, the biomedical image reconstruction field has undergone a massive paradigm shift from analytical and iterative methods to deep learning methods. To drive scientific discussion on advanced deep learning techniques for biomedical image reconstruction, a workshop focusing on deep biomedical image reconstruction, MLMIR, was introduced and has been held yearly since 2018. This survey paper aims to provide basic knowledge of biomedical image reconstruction and the current research trend in the field based on the publications in MLMIR. It is intended for machine learning researchers seeking a general understanding of biomedical image reconstruction and the current research trend in deep biomedical image reconstruction.


  • Open access
  • Published: 03 February 2024

Advantages of transformer and its application for medical image segmentation: a survey

  • Qiumei Pu 1 ,
  • Zuoxin Xi 1 , 2 ,
  • Shuai Yin 1 ,
  • Zhe Zhao 3 &
  • Lina Zhao 2  

BioMedical Engineering OnLine volume  23 , Article number:  14 ( 2024 ) Cite this article


Convolution operator-based neural networks have shown great success in medical image segmentation over the past decade. The U-shaped network with a codec structure is one of the most widely used models. The transformer, a technology originating in natural language processing, can capture long-distance dependencies and has been applied in the Vision Transformer to achieve state-of-the-art performance on image classification tasks. Recently, researchers have extended the transformer to medical image segmentation tasks, with promising results.

This review comprises publications selected through a Web of Science search. We focused on papers published since 2018 that applied the transformer architecture to medical image segmentation. We conducted a systematic analysis of these studies and summarized the results.

To better comprehend the benefits of convolutional neural networks and transformers, the construction of the codec and transformer modules is first explained. Second, medical image segmentation models based on the transformer are summarized. The commonly used evaluation metrics for medical image segmentation tasks are then listed. Finally, a large number of medical segmentation datasets are described.
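Among the evaluation metrics referred to above, the Dice similarity coefficient, Dice = 2|A ∩ B| / (|A| + |B|), is the most commonly reported for binary segmentation. A minimal sketch on toy 1-D masks:

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between binary prediction and ground-truth masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Toy masks: 3 overlapping foreground pixels, 4 predicted and 4 true in total,
# so Dice = 2*3 / (4+4) = 0.75.
pred   = np.array([1, 1, 1, 1, 0, 0])
target = np.array([0, 1, 1, 1, 1, 0])
print(f"Dice = {dice(pred, target):.3f}")
```

The small epsilon keeps the score defined when both masks are empty, a common convention in segmentation code.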

Although pure transformer models without any convolution operator exist, the limited sample sizes of medical image segmentation datasets still restrict the growth of the transformer, even though this can be alleviated by pretraining. More often than not, researchers still design models that combine transformer and convolution operators.


Medical image segmentation is a significant research area in computer vision, whose aim is to classify medical images at the pixel level and thereby precisely segment the target object. Segmentation datasets are created from unimodal or multimodal images obtained by professional medical equipment such as magnetic resonance imaging (MRI), computed tomography (CT), and ultrasonography (US). Traditional, non-deep-learning medical image segmentation approaches depend mostly on thresholding [ 1 ], region growing [ 2 ], boundary detection [ 3 ], and other techniques. To produce good segmentation results, image features must be manually extracted before segmentation; the feature extraction methods for different datasets are frequently diverse, and some professional experience is necessary [ 4 , 5 , 6 ]. Deep learning-based segmentation approaches can automatically learn features that represent the image, but they require high-performance computers and long network training times.
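The thresholding approach mentioned above can be sketched in a few lines: pixels above an intensity threshold are labelled foreground. The synthetic "scan" and threshold value here are illustrative assumptions:

```python
import numpy as np

def threshold_segment(image: np.ndarray, thresh: float) -> np.ndarray:
    """Binary segmentation: label pixels brighter than `thresh` as foreground."""
    return (image > thresh).astype(np.uint8)

# Synthetic 4x4 image: a bright 2x2 "lesion" on a dark background.
image = np.array([
    [10, 12, 11, 10],
    [11, 90, 95, 12],
    [10, 92, 97, 11],
    [12, 10, 11, 10],
], dtype=float)

mask = threshold_segment(image, thresh=50.0)
print(mask)
print("foreground pixels:", mask.sum())
```

The limitation motivating the manual feature extraction described above is visible even here: a single global threshold fails as soon as intensity varies across the image or between patients.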

With the continual advancement of computing hardware such as Graphics Processing Units (GPUs) in recent years, training most deep learning models is no longer constrained. At present, convolutional neural network (CNN)-based segmentation models are extensively employed in a variety of medical image segmentation applications [ 7 , 8 ], including tumor segmentation [ 9 ], skin lesion region segmentation [ 10 ], left and right ventricular segmentation [ 11 ], and fundus blood vessel segmentation [ 12 ]. U-Net [ 13 ] is one of the most extensively utilized models. Through skip connections, U-Net integrates the multiscale detail information from the image downsampling process with the global properties of low-resolution images. This encoder–decoder design, which combines information at multiple scales, considerably enhances segmentation performance and is frequently utilized in the field of medical image segmentation. However, CNNs can only employ very small convolution kernels to balance model accuracy and computational complexity, limiting them to a relatively restricted receptive field. They excel at obtaining local characteristics but fall short of capturing long-distance dependencies. As in domains such as autonomous driving, satellite image analysis, and pedestrian recognition, medical image analysis also encounters challenges such as unclear boundaries [ 14 ], low contrast, varying object sizes, and complex patterns. Addressing these challenges often hinges on incorporating a broader contextual perspective, encompassing global background information.

Figure 1: Combining CNN with Transformer improves various medical image segmentation tasks.

Through the self-attention process, the transformer [ 15 ], popular in machine translation and sentiment analysis, can gather global context information. Following the successful application of a pure transformer architecture to computer vision by ViT [ 16 ], an increasing number of transformer-based models have been developed to optimize medical image segmentation approaches (Fig. 1). We analyzed articles published in the last 5 years on Web of Science using two sets of keywords, as shown in Fig. 2. The first set of keywords included 'medical image' and 'segmentation,' while the second set consisted of 'medical image,' 'segmentation,' and 'transformer.' As depicted in Fig. 2a, medical image segmentation has consistently remained a prominent research area, with nearly 5000 publications each year. The introduction of the Vision Transformer (ViT) in 2020 marked the beginning of increased interest in using transformers for medical image segmentation, leading to rapid growth: the number of articles surged by more than 400%, particularly in 2021 and 2022. Fig. 2b also demonstrates the growing proportion of the second group, which is a subset of the first group of literature. These statistical findings underscore the significant potential of transformers in the field of medical image segmentation.
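The self-attention operation that gives the transformer its global receptive field can be sketched in a few lines of numpy: every token (here, an image-patch embedding) attends to every other token. The dimensions and random weights are illustrative assumptions:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence of token embeddings."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n_tokens, n_tokens) similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, d = 6, 8                            # e.g. 6 patch embeddings of width 8
x = rng.standard_normal((n_tokens, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))

out, weights = self_attention(x, wq, wk, wv)
print(out.shape, weights.shape)
```

Because the attention matrix is dense over all token pairs, dependencies between distant image regions are modelled in a single layer, unlike the local windows of small convolution kernels.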

figure 2

Retrieving and statistically analyzing literature using Web of Science. a Statistics of literature quantity for the two sets of keywords. b The proportion of transformer-related literature within medical image segmentation literature

Currently, several review articles have summarized literature related to Transformers in the field of medical image segmentation. However, these reviews are often context-specific, focusing on different medical applications, such as categorization based on disease types [ 17 ], task-oriented summaries [ 18 , 19 ], or aggregations based on specific medical images or diseases [ 20 , 21 , 22 ]. The synthesis and categorization based on network structures are crucial for optimizing deep learning models for diverse tasks, yet research in this domain is currently limited. This paper explores recent advancements in research on medical image segmentation tasks using transformer and encoder–decoder structural models. It provides a comprehensive study and analysis of relevant deep learning network structures, aiming to further uncover the potential of transformer and encoder–decoder structural models in medical image segmentation tasks. The objective is to guide researchers in designing and optimizing network structures for practical applications.

In the " Basic model structure " section, we will delve into the pertinent information regarding the encoder–decoder structure and transformer. " Medical Image Segmentation Method Based on Transformer " section will present a comprehensive summary of transformer segmentation methods, considering four perspectives: Transformer in the encoder, Transformer in the codec, Transformer in the skip connections, and the application of the pure Transformer structure. Each subsection within " Medical Image Segmentation Method Based on Transformer " section sequentially elaborates on the optimization and enhancement details of various models. Detailed evaluation metrics for medical image segmentation are outlined in " Evaluation Indicators " section. " Dataset " section systematically organizes the medical image segmentation datasets suitable for reproducing model results. Finally, " Summary and Outlook " will encapsulate the conclusion and provide insights for future developments.

Basic model structure

Codec structure in medical image segmentation.

In the codec structure, the entire network is made up of an encoder module and a decoder module. The encoder is primarily responsible for extracting features from the input, while the decoder performs further feature optimization and task-specific processing on the encoder’s output. Hinton [ 23 ] initially presented this architecture in Science in 2006, with the primary goal of compression and denoising rather than segmentation. The input image is downsampled and encoded to generate features smaller than the original picture, a process known as compression, and then passed through a decoder that should restore the original image. For each image, we need to save only one feature vector and one decoder. Similarly, this concept may be applied to picture denoising: artificial noise is added to the original image during the training stage, and the codec is trained to restore the original image. This concept was later applied to the picture segmentation problem. Encoders in medical picture segmentation tasks are often based on existing backbone networks such as VGG and ResNet, while the decoder is typically constructed to meet the task requirements, progressively labeling each pixel by upsampling. In 2015, Long et al. introduced a groundbreaking approach called the Fully Convolutional Network (FCN) [ 24 ] for semantic segmentation, as illustrated in Fig.  3 a. The FCN converts the CNN’s final fully connected layer into a convolutional layer and merges features from multiple layers using simple skip connections. Finally, deconvolution restores resolution to achieve end-to-end picture segmentation. However, because coarse features are upsampled and features of various depths are fused only by simple summation, the FCN segmentation results still fall far short of manual segmentation, with many erroneous regions, particularly around the edges. At the same time, the FCN’s single-path topology makes it impossible to preserve meaningful spatial information in upsampled feature maps, and the network lacks spatial consistency.

figure 3

Codecs and transformer architectures. a FCN network structure [ 24 ]. b A transformer block [ 15 ]. c Classical U-Net architecture [ 13 ]

One of the most often used models in medical picture segmentation tasks is the U-Net model, which builds on the principle of FCN to extract multiscale features. As shown in Fig.  3 c, the U-Net network initially executes four downsampling operations on the input picture to extract image feature information, followed by four sets of upsampling. To assist the decoder in recovering the target features, a skip connection with a symmetric structure is inserted between the downsampling and upsampling procedures. On the right, the output of the downsampled convolutional block is concatenated with the input of the deconvolutional block at the same depth. The first difference between U-Net and FCN is that U-Net is highly symmetric, with a decoder very similar to the encoder, whereas FCN’s decoder is quite simple, utilizing only a deconvolution operation with no convolutional structure thereafter. The skip connection is the second distinction: FCN uses summation, whereas U-Net employs concatenation. In MICCAI 2016, Cicek et al. expanded 2D U-Net to 3D U-Net [ 25 ] and utilized it to segment dense volumetric images from sparse annotations. nnU-Net [ 26 ] is an adaptive framework for any dataset based on U-Net, 3D U-Net, and U-Net Cascade. It can automatically adjust all hyperparameters according to the properties of a given dataset without human intervention throughout the process, achieving advanced performance in six well-recognized segmentation challenges. U-Net has quickly become an essential network model in medical picture segmentation due to its great performance and unique topology.
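The two fusion styles contrasted above can be sketched in a few lines of NumPy (a minimal illustration; the function names and the nearest-neighbor upsampling are our own simplifications, not code from the cited papers): FCN sums the upsampled decoder features with the encoder features, while U-Net concatenates them along the channel dimension, preserving the encoder features intact for the decoder to refine.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fcn_fuse(decoder_feat, encoder_feat):
    # FCN-style fusion: element-wise summation (channel counts must match).
    return upsample2x(decoder_feat) + encoder_feat

def unet_fuse(decoder_feat, encoder_feat):
    # U-Net-style fusion: concatenation along the channel dimension,
    # doubling the channel count instead of mixing the features.
    return np.concatenate([upsample2x(decoder_feat), encoder_feat], axis=0)
```

Summation requires matching channel counts and blends the two sources irreversibly; concatenation lets the subsequent convolutions learn how to combine them.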

  • Transformer

Bengio’s team proposed the attention mechanism in 2014, and it has since been widely used in various fields of deep learning, such as computer vision, to enlarge the receptive field on an image, or NLP, to locate key tokens or features. The multihead attention mechanism, position encoding, layer normalization [ 27 ], feedforward neural network, and skip connection are the main components of the transformer encoder. The decoder differs from the encoder in that it includes an additional masked multihead attention module in the input layer, but the rest of the components are the same. The self-attention mechanism is an essential part of the transformer; its unique design allows it to handle variable-length inputs, capture long-distance dependencies, and perform sequence-to-sequence (seq2seq) modeling.

The scaled dot-product attention is computed as \(\mathrm{Attention}(q,k,v)=\mathrm{softmax}\left(\frac{qk^T}{\sqrt{d_k}}\right)v\), where q , k , and v are vectors obtained from the input X by linear mapping, and \(d_k\) is the dimension of the vector. After parallel computing, the multihead attention mechanism extracts features from multiple self-attention modules and concatenates them along the channel dimension. Different groups of self-attention mechanisms can learn different feature representations from subspaces at different locations.

The multihead attention is given by \(\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots ,\mathrm{head}_H)W^O\) with \(\mathrm{head}_i=\mathrm{Attention}(QW_i^Q,KW_i^K,VW_i^V)\), where Q , K , and V are matrices made up of multiple q , k , and v vectors; \(i = 1,2,\dots ,H; d_k =d_v = d_{model}/H\) ; \(W_i^Q\) and \(W_i^K\) are matrices of shape ( \(d_{model}\) , \(d_k\) ), \(W_i^V\) is a matrix of shape ( \(d_{model}\) , \(d_v\) ), and these parameter matrices are used to map the input.
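The scaled dot-product and multihead attention described above can be sketched in NumPy as follows (a simplified, single-batch illustration with our own variable names; a real implementation would batch the heads rather than loop):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d_k)) v.
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ v

def multihead(X, Wq, Wk, Wv, Wo, H):
    # Project X to q, k, v; split the model dimension into H heads;
    # attend per head; concatenate; apply the output projection Wo.
    n, d_model = X.shape
    q, k, v = X @ Wq, X @ Wk, X @ Wv
    d_h = d_model // H
    heads = [attention(q[:, i*d_h:(i+1)*d_h],
                       k[:, i*d_h:(i+1)*d_h],
                       v[:, i*d_h:(i+1)*d_h]) for i in range(H)]
    return np.concatenate(heads, axis=-1) @ Wo
```

Each head operates on a \(d_{model}/H\)-dimensional subspace, so the total cost of H heads matches that of one full-dimensional attention.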

The decoder’s masked multihead attention mechanism accounts for the fact that, during the testing and verification phases, the model can only obtain information before the current position. To avoid the model relying on information after the current position during testing, that information is masked in the training phase, ensuring that only information before the position is used to infer the current result. Because of the unique design of self-attention, it is insensitive to sequence position information, which is important in both natural language processing and computer vision tasks, so position information must still be incorporated into transformers. Transformers frequently use sine and cosine functions to encode position information.
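The sinusoidal position encoding mentioned above can be sketched as follows (a minimal NumPy version of the standard formulation \(PE_{(pos,2i)}=\sin(pos/10000^{2i/d_{model}})\), \(PE_{(pos,2i+1)}=\cos(pos/10000^{2i/d_{model}})\); the function name is ours):

```python
import numpy as np

def sinusoidal_positions(n_pos, d_model):
    # Even columns hold sines, odd columns hold cosines, with wavelengths
    # forming a geometric progression from 2*pi to 10000*2*pi.
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```

The encoding is added to the token embeddings, giving each position a unique, smoothly varying signature without any learned parameters.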

Layer normalization overcomes batch normalization’s difficulty in handling tasks with variable-length input sequences. It shifts the scope of normalization from across samples to within the hidden layer of the same sample, so that normalization is independent of input size. The skip connection is a widely used technique for improving the performance and convergence of deep neural networks: the linear component it propagates through the network layers eases the optimization of the nonlinear transformations. Without it, deep transformers are prone to gradient explosion or vanishing.

Vision transformer

In 2020, Google introduced ViT [ 16 ], a model that leverages the transformer architecture for image classification. ViT innovatively partitions input images into multiple patches, each measuring 16×16 pixels. These patches are then individually transformed into fixed-length vectors and fed into the Transformer framework, as illustrated in Fig.  4 a. Subsequent encoder operations closely mirror the original Transformer architecture, as depicted in Fig.  4 b. While not the pioneer in exploring transformers for computer vision, ViT stands out as a seminal contribution due to its “simple” yet effective model, robust scalability (larger models demonstrating superior performance), and its groundbreaking influence on subsequent research in the field. With sufficiently large pretraining datasets, ViT surpasses CNNs, overcoming the transformer’s lack of inductive bias and showcasing enhanced transfer learning capabilities in downstream tasks.
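The patch-partitioning step described above can be illustrated with a small NumPy sketch (the learned linear mapping to fixed-length vectors is omitted; the function name is ours, not from the ViT paper):

```python
import numpy as np

def image_to_patches(img, p):
    # img: (H, W, C) array; returns (num_patches, p*p*C) flattened patches,
    # ordered row-major over the patch grid, as in ViT.
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by patch size"
    patches = img.reshape(H // p, p, W // p, p, C).swapaxes(1, 2)
    return patches.reshape(-1, p * p * C)
```

For a 224×224×3 image with p = 16 this yields 196 patch vectors of dimension 768, which are then linearly projected and fed to the transformer.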

In March 2021, Microsoft Research Asia proposed a universal backbone network named Swin Transformer [ 28 ]. The Swin Transformer Block is constructed differently from ViT, employing Window Multihead Self-Attention (W-MSA) and Shifted Window Multihead Self-Attention (SW-MSA). When computing W-MSA, an 8×8 feature map is divided into 2×2 windows, each with a size of 4×4. For SW-MSA, the window partition is shifted by half the window size, creating new windows that straddle the borders of the previous non-overlapping ones. This approach introduces connections between adjacent non-overlapping windows, significantly increasing the receptive field. However, it also raises the number of windows from 4 to 9. To maintain the original window count, the authors employ a cyclic shift operation, as illustrated in Fig.  4 c. W-MSA calculates attention within each window, while SW-MSA utilizes global modeling, akin to ViT, to establish long-distance dependencies. As depicted in Fig.  4 d, Swin Transformer’s unique design not only introduces local feature extraction capabilities similar to convolution but also substantially reduces computation. Swin Transformer achieves state-of-the-art performance in machine vision tasks such as image classification, object detection, and semantic segmentation.
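The window partition and cyclic shift can be sketched as follows (a simplified NumPy illustration of the idea, not the official implementation; the attention masking that accompanies the shift in the real model is omitted):

```python
import numpy as np

def window_partition(x, w):
    # x: (H, W, C) feature map -> (num_windows, w, w, C) non-overlapping windows.
    H, W, C = x.shape
    return x.reshape(H // w, w, W // w, w, C).swapaxes(1, 2).reshape(-1, w, w, C)

def cyclic_shift(x, w):
    # Roll the map by half a window so the shifted-window attention straddles
    # the old window borders while the window count stays unchanged.
    return np.roll(x, shift=(-w // 2, -w // 2), axis=(0, 1))
```

Partitioning the cyclically shifted map yields the same number of windows as before, which is what keeps SW-MSA as cheap as W-MSA.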

figure 4

Key components of the ViT and Swin Transformer. a The ViT architecture, showcases the transformation of input feature maps into patches, followed by linear mapping and processing through the Transformer. The result undergoes classification via an MLP. b The details of the ViT encoder, emphasizing the integration of multihead attention modules. c The feature map evolution in Swin Transformer during W-MSA and SW-MSA computation, highlighting the cyclic shift operation for integrating shifted window feature maps. d Swin Transformer Block, outlining its computational process

Medical image segmentation method based on transformer

Prior to the application of the transformer to the field of medical image segmentation, segmentation models such as FCN and U-Net performed well in various downstream image segmentation tasks. Researchers have used various methods to improve the U-Net model to meet the needs of different tasks and data, and a series of variant models based on U-Net have appeared, for example, 3D U-Net [ 25 ], ResUNet [ 29 ], and U-Net++ [ 30 ]. However, since the introduction of ViT, an increasing number of researchers have focused on the attention mechanism, attempting to apply it locally or globally in complex network structures to achieve better results. By incorporating a transformer module during encoder downsampling, TransUNet [ 31 ] outperforms models such as V-Net [ 32 ], DARR [ 33 ], U-Net [ 13 ], AttnUNet [ 34 ], and ViT [ 16 ] in a variety of medical applications, including multiorgan segmentation and heart segmentation. TransUNet, like U-Net, has become a popular network for medical image segmentation. Because of the complexities of medical image segmentation tasks, high-quality manually labeled datasets can only be produced on a small scale. To achieve better performance on medical image datasets, it is necessary to continuously optimize the application of the transformer in the encoder/decoder network. In the following, this paper discusses transformer-based medical image segmentation methods organized by where in the model the transformer is applied.

Transformer encoder structure

TransUNet, depicted in Fig.  5 , stands as the pioneering application of the transformer model in the realm of image segmentation. The authors serialize the feature map obtained through U-Net downsampling and then process the serialized features with a block made up of 12 original transformer layers. The benefits of long-distance dependencies can be obtained using transformers to capture global key features. The experimental results show that TransUNet outperforms the previous best model, AttnUNet, on the Synapse dataset. TransBTS [ 35 ] replaces 2D CNNs with 3D CNNs and uses a structural design similar to TransUNet to achieve 3D multimodal brain tumor segmentation in MRI imaging. Similar to TransBTS, UNETR [ 36 ] employs the same 12 transformer blocks in its encoder. However, UNETR differs in that it utilizes the outputs of the 3rd, 6th, 9th, and 12th transformer blocks as inputs for four downsampling convolutional neural network modules in the encoder. UNETR demonstrates excellent performance in both BTCV [ 37 ] and MSD [ 38 ], two 3D image segmentation tasks. Furthermore, Swin UNETR [ 39 ] goes a step further by replacing the transformer blocks in UNETR with Swin Transformer blocks, achieving superior results on the BraTS2021 dataset compared to nnU-Net, SegResNet, and TransBTS. AFTer-UNet [ 40 ] employs an axial fusion transformer encoder between the CNN encoder and CNN decoder to integrate contextual information across adjacent slices. The axial fusion transformer encoder calculates attention along the axial direction and within individual slices, reducing computational complexity. This approach significantly outperforms models like CoTr and SwinUnet on multiorgan segmentation datasets, including BCV [ 41 ], Thorax-85 [ 42 ], and SegTHOR [ 43 ].

In general, most methods for dealing with 2D image segmentation can also be used to deal with continuous video data, as long as the video data are input as a 2D image frame by frame. The cost of this is that we cannot fully exploit the time continuity of the video data. Zhang et al. [ 44 ] created an additional convolution branch based on TransUNet to extract the features of the previous frame data, and then combined the results of the downsampling of the two parts with the results of the upsampling via the skip connection to achieve a better video data segmentation effect. X-Net [ 45 ] extends U-Net by introducing an additional Transformer-based encoder–decoder branch, facilitating information fusion across branches through skip connections. Zhang et al. proposed a new architecture called TransFuse, which can run convolution-based and pure transformer-based encoders in parallel and then fuse the features from the two branches together to jointly predict segmentation results via the BiFusion module, greatly improving the model’s inference speed [ 46 ]. This work adds a new perspective to the use of transformer-based models by investigating whether a network using only transformers and no convolution can perform better segmentation tasks.

figure 5

TransUNet was the first to apply the transformer structure to medical image segmentation [ 31 ].  a  Schematic of the Transformer layer; b  architecture of the proposed TransUNet

The primary goal of the self-attention mechanism is to model the long-distance dependence between pixels to obtain global context information. Convolution, on the other hand, produces feature maps at various scales that frequently contain rich information. Before the appearance of ViT, researchers discovered numerous effective methods for expanding the convolutional receptive field. Dilated convolutions are the most well-known of these: DeepLabV3 [ 47 ] uses dilated spatial pyramid pooling to great effect, while CE-Net [ 48 ] captures multiscale information using dense dilated convolutions and residual multikernel pooling. As a result, taking both global context information and multiscale information into account is a very effective strategy. Yuanfeng Ji et al. [ 49 ] proposed MCTrans, which comprises a self-attention transformer module and a cross-attention transformer module. The self-attention transformer module performs pixel-level context modeling at multiple scales. To ensure intraclass consistency and interclass discrimination, the cross-attention transformer module learns the semantic relationships among categories, that is, both the differences and the connections between the feature representations of different classes. DC-Net [ 50 ] likewise emphasizes multiscale features. The authors create a Global Context Transformer Encoder (GCTE) and an Adaptive Context Fusion Module (ACFM). GCTE connects the transformer encoder to the back of the CNN downsampling path, serializes the multiscale features obtained by the CNN together with the input image, and then obtains a better feature representation via the transformer encoder. The ACFM is made up of four cascaded feature decoding blocks, each with two 1 \(\times\) 1 convolutions and a 3 \(\times\) 3 deconvolution. The authors convert the adaptive weight \(\omega _i\) into an adaptive spatial weight (APW) and an adaptive channel weight (ACW). Using these two weight components, the ACFM can better fuse context information and improve decoder performance.
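The receptive-field growth that dilated convolutions buy can be checked with a small calculator (this is the standard receptive-field recurrence, not code from the cited papers; layer tuples are our own notation):

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride, dilation) tuples, input to output.
    # The receptive field r grows by (k - 1) * d * j at each layer,
    # where j is the cumulative stride ("jump") of earlier layers.
    r, j = 1, 1
    for k, s, d in layers:
        r += (k - 1) * d * j
        j *= s
    return r
```

Three stacked 3×3 convolutions with dilations 1, 2, and 4 reach a 15-pixel receptive field, versus 7 for the same stack without dilation, which is why DeepLab-style pyramids capture context so cheaply.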

Although transformers have achieved outstanding results in a variety of downstream medical image tasks, it is undeniable that they have more parameters to train than convolutional models. As a result, how to exploit the global context information obtained by the transformer while meeting the model-size and inference-speed requirements of lightweight tasks has become a hot research topic. In early transformer-related research, SA-Net [ 51 ] was proposed to reduce the number of parameters in CNN and transformer using a random ranking algorithm. The sandwich parameter-shared encoder structure [ 52 ] was investigated by Reid M et al. In the field of medical image segmentation, Xie Y et al. proposed the CoTr model [ 53 ], whose encoder combines CNN with the bridge structure DeTrans; DeTrans is built from MS-DMSA layers that attend only to a small set of key sampling locations around the reference location, greatly reducing time and space complexity. TransBridge [ 54 ] employs a bridge structure similar to CoTr but adds a shuffle layer and group convolution to the transformer’s embedding part to reduce the number of parameters and the length of the embedding sequence. The experimental results show that, after a 78.7% parameter reduction, TransBridge outperforms CoTr, ResUNet [ 29 ], DeepLabV3 [ 55 ], and other models on the EchoNet-Dynamic dataset.

Transformer codec structure

TransUNet demonstrated the importance of transformers in encoders, and the symmetries of encoder–decoder architectures make it simple to extend transformers to decoder architectures. U-Transformer [ 56 ] uses the Multihead Cross-attention Module (MHCA) to combine the high-level feature maps with complex abstract information and the high-resolution feature maps obtained through the skip connection in each splicing process of upsampling and skip connection, which is used to suppress the irrelevant regions and noise regions of the high-resolution feature maps. The feature map obtained by convolution is expanded pixel by pixel as a transformer patch in the encoder section, and then a single transformer layer is used to extract global context information. Luo C et al. [ 57 ] improved the use of transformer in encoders based on the TransUNet and U-Transformer. To build the UCATR model, a block of 12 transformer layers is used to replace the single MultiHead self-attention in the U-Transformer. The experimental results show that the UCATR model can recover more refined spatial information than the original TransUNet and U-Transformer. SWTRU [ 58 ] proposes a novel Star-shaped Window self-attention mechanism to be applied in the decoder structure and introduces the Filtering Feature Integration Mechanism (FFIM) to integrate and reduce the dimensionality of the fused multilayer features. These improvements result in a better segmentation effect in CHLISC [ 59 , 60 ], LGG [ 61 , 62 ], and ISIC2018 [ 63 ]. Since in most vision tasks the visual dependencies between regions nearby are usually stronger than those far away, MT-UNet [ 64 ] performs local self-attention on fine-grained local context and global self-attention only on coarse-grained global context. When calculating global attention maps, axial attention [ 65 ] is used to reduce the amount of calculation, and further introduce a learnable Gaussian matrix [ 66 ] to enhance the weight of nearby tokens. 
MT-UNet performs better than models such as ViT and TransUNet on the Synapse and ACDC datasets.

Although transformers have done much useful work in medical image segmentation tasks, training and deploying transformer-based models remains difficult due to the large training-time and memory overhead. To reduce the impact of sequence-length overhead, one common method is to use the feature maps obtained by downsampling as the input sequence rather than the entire input image. High-resolution images, however, are critical for location-sensitive tasks such as medical image segmentation, because the majority of false segmentations occur around the boundary of the region of interest. Second, transformers lack the inductive bias of convolutions, so on small medical image datasets their capacity cannot simply be scaled up without overfitting.

Gao Y et al. [ 67 ] combined the benefits of convolution and the attention mechanism for medical image segmentation, replacing the last convolutional layer of each downsampling block with a transformer module, avoiding large-scale transformer pretraining while capturing long-distance correlation information. At the same time, to extract detailed long-distance information from the high-resolution feature map, two projections map K and V ( K and \(V \in R_{n \times d}\) ) into low-dimensional embeddings ( K and \(V \in R_{k \times d}\) , \(k = hw \ll n\) ), where  h and w are the reduced sizes of the feature map after subsampling, which reduces the overall complexity from \(O(n^2)\) to \(O(n)\). In addition, the authors learn the content–location relationship in medical images using relative position encoding in the self-attention module. Valanarasu J et al. [ 68 ] proposed MedT, a model based on a gated location-sensitive attention mechanism, which allows the model to perform well on smaller datasets during training. Feiniu Yuan et al. [ 69 ] introduced CTC-Net, a synergistic network that combines CNN and transformer for medical image segmentation. Features are extracted by both a CNN encoder and a Swin Transformer encoder and then fused by a Feature Complementary Module (FCM) incorporating channel attention and spatial attention mechanisms.
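The sequence-length projection used to cut the attention cost can be sketched as follows (a simplified single-head illustration of the idea of projecting K and V down the sequence axis, in the spirit of [ 67 ]; the names, shapes, and random projections here are our own assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def projected_attention(q, k, v, E, F):
    # E, F: (k_small, n) projections shrinking the sequence axis of K and V,
    # so the score matrix is (n, k_small) instead of (n, n):
    # cost drops from O(n^2 d) to O(n * k_small * d).
    k_p, v_p = E @ k, F @ v                     # (k_small, d)
    d = q.shape[-1]
    scores = q @ k_p.T / np.sqrt(d)             # (n, k_small)
    return softmax(scores) @ v_p                # (n, d)
```

With \(k\_small \ll n\) the memory for the attention map shrinks by the same factor, which is what makes high-resolution feature maps tractable.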

Transformer in skip connections

The mechanism of skip connections was initially introduced in U-Net, aiming to bridge the semantic gap between the encoder and decoder, and has proven to be effective in recovering fine-grained details of the target objects. Subsequently, UNet++ [ 30 ], AttnUnet [ 34 ], and MultiResUNet [ 70 ] further reinforced this mechanism. However, in UCTransUnet [ 71 ], the authors pointed out that skip connections in U-Net are not always effective in various medical image segmentation tasks. For instance, in the GlaS [ 72 ] dataset, a U-Net model without skip connections outperforms the one with skip connections, and using different numbers of skip connections also yields different results. Therefore, the authors considered adopting a more suitable approach for feature fusion at different depths. They replaced the simple skip connections in U-Net with the CTrans module, consisting of multiscale Channel Cross fusion with Transformer (CCT) and Channel-wise Cross-Attention (CCA). This modification demonstrated competitive results on the GlaS and MoNuSeg [ 73 ] datasets.

Pure transformer structure

Researchers have attempted to use transformer as a complete replacement for convolution operators in codec structures due to its significant advantage in global context feature extraction. Karimi D et al. [ 74 ] pioneered the nonconvolutional deep neural network for 3D medical image segmentation, demonstrating through experiments that a neural network fully composed of transformer modules can achieve segmentation accuracy superior to or comparable to the most advanced CNN model 3D UNet++ [ 30 ].

figure 6

The Swin-Unet structure [ 75 ]

Based on the Swin Transformer, Cao H et al. [ 75 ] created Swin-Unet, a pure transformer model similar to U-Net. The model employs two consecutive Swin Transformer blocks as a bottleneck, assembled in a U-Net-like configuration. The structure of Swin-Unet is shown in Fig.  6 . Comparing Swin-Unet with V-Net [ 32 ], DARR [ 33 ], ResUnet [ 29 ], AttnUnet [ 34 ], and TransUnet [ 31 ] on the Synapse and ACDC datasets, the authors obtained significantly better performance than the other models. Swin-PANet [ 76 ] is a dual-supervision network structure proposed by Zhihao Liao et al. It is made up of two networks: a prior attention network and a hybrid transformer network. The prior attention network applies the sliding-window-based self-attention mechanism to the intermediate supervision network, whereas the hybrid transformer network aggregates the features of the skip connection and the prior attention network and refines the boundary details. Swin-PANet achieves better results on GlaS [ 72 ] and MoNuSeg [ 73 ]. DS-TransUNet [ 77 ] is constructed upon the Swin Transformer framework and enhances feature representation with a dual-scale encoder. More precisely, it feeds medical images partitioned at both large and small patch scales into the encoder, allowing the model to effectively capture coarse-grained and fine-grained feature representations.

These models demonstrate the Swin Transformer’s utility for medical image datasets. Because the Swin Transformer is more lightweight and suitable for medical image segmentation tasks than transformers that require large amounts of data pretraining in NLP, further investigating its application can help overcome the challenge of limiting model progress in medical image datasets.

Evaluation indicators

The objective evaluation of the performance of medical image segmentation algorithms is essential for their practical application in diagnosis. The segmentation results must be assessed both qualitatively and quantitatively. For segmentation tasks with multiple categories, let k be the number of classes in the segmentation result, \(p_{ij}\) be the number of pixels of class i predicted as class j , and \(p_{ii}\) be the number of pixels of class i correctly predicted as class i . When \(k=2\) , we can divide the results of a two-class segmentation task into four categories: true positive (TP) indicates that both the actual and predicted classes are positive. True negative (TN) indicates that both the actual and predicted classes are negative. False positive (FP) means the actual class is negative while the predicted class is positive. False negative (FN) means the actual class is positive while the predicted class is negative. The following are examples of commonly used evaluation metrics.

The F1 score, or F-measure, is a metric used in binary classification analysis, representing the harmonic mean of precision and recall. Precision is the ratio of true positive results to all identified positive results, while recall is the ratio of true positive results to all actual positive instances. By combining precision and recall in a single metric, the F1 score provides a balanced measure of a test’s accuracy. It ranges from 0 to 1, with 1 indicating perfect precision and recall, and 0 if either precision or recall is zero.
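These definitions translate directly into code (a minimal sketch; the zero-denominator conventions are our own choice):

```python
def precision_recall_f1(tp, fp, fn):
    # Precision = TP/(TP+FP); Recall = TP/(TP+FN);
    # F1 is their harmonic mean, 0 if either is 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, 8 true positives with 2 false positives and 2 false negatives give precision, recall, and F1 all equal to 0.8.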

The prediction results are evaluated using pixel accuracy (PA), which stands for the proportion of correctly classified pixels over the total number of pixels of the original samples. The closer the PA value is to one, the more accurate the segmentation. It is calculated as \(PA = \frac{\sum _{i=1}^{k}p_{ii}}{\sum _{i=1}^{k}\sum _{j=1}^{k}p_{ij}}\).

Mean pixel accuracy (MPA) is a step up from PA. It calculates PA for each class separately, then averages PA for all classes.

The Jaccard index, or Jaccard similarity coefficient, serves as a statistical measure to assess the similarity and diversity between sample sets. Introduced by Grove Karl Gilbert in 1884, it is formulated as the ratio of verification [ 78 ]. The Jaccard coefficient quantifies the similarity of finite sample sets by calculating the size of their intersection divided by the size of their union. This metric is also referred to as Intersection over Union (IoU).

The mean intersection over union (mIoU) computes the IoU for each category in the image and then takes their average as the final result. For image segmentation, mIoU is calculated as \(mIoU = \frac{1}{k}\sum _{i=1}^{k}\frac{p_{ii}}{\sum _{j=1}^{k}p_{ij}+\sum _{j=1}^{k}p_{ji}-p_{ii}}\).

The Dice coefficient is a set similarity measure that is commonly used to determine the similarity of two samples. In the segmentation task, we treat the model prediction and the ground-truth mask as two sets X and Y, and the value of the Dice coefficient, \(Dice = \frac{2|X\cap Y|}{|X|+|Y|}\), is used to judge the quality of the model prediction.
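Dice and IoU can both be computed directly from binary masks (a small NumPy sketch; the empty-mask convention of returning 1.0 is our own assumption):

```python
import numpy as np

def dice_iou(pred, gt):
    # pred, gt: boolean masks of equal shape.
    # IoU = |intersection| / |union|; Dice = 2*|intersection| / (|pred| + |gt|).
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 1.0
    denom = pred.sum() + gt.sum()
    dice = 2 * inter / denom if denom else 1.0
    return dice, iou
```

The two metrics are monotonically related (Dice = 2·IoU/(1+IoU)), so they rank segmentations identically, but Dice is more forgiving of small overlap errors.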

The directed average Hausdorff distance from point set X to Y is given by the sum of all minimum distances from all points from point set X to Y divided by the number of points in X . The average Hausdorff distance can be calculated as the mean of the directed average Hausdorff distance from X to Y and directed average Hausdorff distance from Y to X . In the medical image segmentation domain, the point sets X and Y refer to the voxels of the ground truth and the segmentation, respectively. The average Hausdorff distance between the voxel sets of ground truth and segmentation can be calculated in millimeters or voxels.
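For small point sets, the average Hausdorff distance defined above can be sketched by brute force (a NumPy illustration; production toolkits use spatial indexing for large voxel sets):

```python
import numpy as np

def directed_avg_hausdorff(X, Y):
    # Mean over points x in X of the Euclidean distance to the closest point in Y.
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # (|X|, |Y|)
    return d.min(axis=1).mean()

def avg_hausdorff(X, Y):
    # Symmetric average Hausdorff distance: mean of the two directed distances.
    return 0.5 * (directed_avg_hausdorff(X, Y) + directed_avg_hausdorff(Y, X))
```

Here X and Y would hold the coordinates (in millimeters or voxels) of the ground-truth and predicted segmentation voxels, respectively.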

Unlike general image datasets, medical image annotation requires doctors with professional experience to devote significant time to annotation. The majority of the early pathological image data are of a small scale. Deep learning models, particularly transformer-based models, rely heavily on large-scale data to perform well. A novel labeling strategy involves training a deep learning model with a small amount of data and then manually modifying the model’s prediction results to continuously expand and improve the dataset. Some public datasets used in many popular medical image segmentation tasks have been compiled in Table 1 to assist readers in conducting relevant experiments quickly. In the “Resolving power (pixel)” column of Table 1 , “~” indicates that the image resolution in the dataset is not uniform. For example, in the GLAS dataset, the minimum image resolution is 567 × 430 and the maximum resolution is 755 × 522. “*” is only used in 3D image datasets to indicate that the number of channels in the dataset is not fixed, even if the image resolution is the same.

Summary and outlook

Transformers have emerged as a hot topic in deep learning and appear in a variety of downstream tasks in NLP and computer vision. Hybrid models combining convolutional neural networks and transformers perform well in medical image segmentation. However, using transformers to process medical images still presents significant challenges:

1. Medical image datasets are small: labeling medical images requires doctors with professional experience, and the images have high resolution, so annotation is time-consuming and expensive. Existing medical image datasets therefore have small sample sizes, while fully exploiting the transformer's ability to capture long-distance dependencies requires more samples than most of these datasets contain.

2. Transformers lack location information: object location is critical for segmentation results in medical image segmentation tasks. Because the transformer contains no inherent position information, it can only embed position through learning. However, different datasets carry different location information and impose different requirements on it, so the learned position encodings differ as well, which significantly affects the model's generalization.

3. The self-attention mechanism only works between image patches: after the image is serialized, attention weights are computed only between image patches, and relationships between the pixels within a patch are ignored. This loss of critical pixel-level information can affect model accuracy when segmenting, recognizing, or detecting small objects and in tasks with blurred boundaries.
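The second and third challenges can be made concrete with a small NumPy sketch of ViT-style serialization (the image size, patch size, and random initialization below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 8x8 image split into 4x4 patches, then flattened into tokens.
image = rng.standard_normal((8, 8))
p = 4
patches = image.reshape(8 // p, p, 8 // p, p).transpose(0, 2, 1, 3)
tokens = patches.reshape(-1, p * p)  # 4 tokens of 16 pixels each

# Challenge 2: self-attention is permutation-invariant, so position must be
# injected explicitly, here as a learnable (randomly initialized) embedding.
pos_embedding = rng.standard_normal(tokens.shape)
tokens = tokens + pos_embedding

# Challenge 3: attention weights exist only between tokens; the 16 pixels
# inside each patch were already collapsed into a single vector.
scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(attn.shape)  # (4, 4): one weight per patch pair, none per pixel pair
```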

Figure 7: Three network structures for transformer applications in medical image segmentation tasks

Based on the transformer’s current status and challenges in medical image segmentation, the following suggestions and prospects for future research are made:

The transformer's ability to extract global key features can be leveraged by pre-training the model on large datasets with auxiliary tasks, or by learning from existing labeled images to automatically generate high-confidence pseudo labels. Both approaches are effective in addressing the small scale of medical image datasets.

Integrating prior knowledge about location can assist the model in highlighting important features of the target task. The position encoding for the transformer can be thoughtfully designed to incorporate prior knowledge of image position, thereby enhancing the model's ability to generalize.

Optimizing the model structure is crucial. A transformer, with its large receptive field, can extract global key features, while a convolutional neural network is better suited to capturing small local features through successive convolution and pooling, which is essential for segmentation. The fusion strategy between the two therefore needs to be optimized to fully leverage their respective strengths and ensure the model's optimal performance.
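One simple form such a fusion strategy can take is channel-wise concatenation of the two branches' feature maps followed by a learned projection; the sketch below (dimensions and random weights are purely illustrative, not taken from any surveyed model) shows the idea:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-position features from the two branches, flattened to
# N positions x C channels: the CNN branch keeps fine local detail, the
# transformer branch carries global context.
n_pos, c = 64, 32
cnn_feat = rng.standard_normal((n_pos, c))
trans_feat = rng.standard_normal((n_pos, c))

# Concatenate along channels, then project back to C channels with a
# learned linear layer (random weights stand in for trained ones here).
w = rng.standard_normal((2 * c, c)) / np.sqrt(2 * c)
fused = np.concatenate([cnn_feat, trans_feat], axis=1) @ w

print(fused.shape)  # (64, 32): same resolution, blended local/global features
```

In practice the projection is trained end-to-end, and more elaborate fusion schemes (e.g., attention-based gating) appear in the surveyed hybrid models.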

The transformer has become one of the most popular deep learning frameworks in the last two years. Thanks to its ability to capture global context, it can alleviate the problems of scattered target regions and large shape differences in medical image segmentation tasks. As shown in Fig. 7, both CNNs and transformers have their advantages. A transformer can incorporate convolutional neural network structures to extract multiscale local spatial features, allowing the model to better balance global and local information and improve performance. We summarize recent research on hybrid models of convolutional neural networks and transformers in this paper. Based on the performance of the models surveyed, transformers have good development prospects and high research significance in the field of medical image segmentation.

Availability of data and materials

All data can be found on the corresponding page of the cited literature.

Xu A, Wang L, Feng S, Qu Y. Threshold-based level set method of image segmentation. In: 2010 Third International Conference on Intelligent Networks and Intelligent Systems, pp. 703–706 (2010). IEEE

Cigla C, Alatan A.A. Region-based image segmentation via graph cuts. In: 2008 15th IEEE International Conference on Image Processing, pp. 2272–2275 (2008). IEEE

Yu-Qian Z, Wei-Hua G, Zhen-Cheng C, Jing-Tian T, Ling-Yun L. Medical images edge detection based on mathematical morphology. In: 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, pp. 6492–6495 (2006). IEEE

Ma Z, Tavares J.M.R, Jorge R.N. A review on the current segmentation algorithms for medical images. In: International Conference on Imaging Theory and Applications, vol. 1, pp. 135–140 (2009). SciTePress

Ferreira A, Gentil F, Tavares JMR. Segmentation algorithms for ear image data towards biomechanical studies. Comput Methods Biomech Biomed Eng. 2014;17(8):888–904.


Ma Z, Tavares JMR, Jorge RN, Mascarenhas T. A review of algorithms for medical image segmentation and their applications to the female pelvic cavity. Comput Methods Biomech Biomed Eng. 2010;13(2):235–46.

Liu Y, Wang J, Wu C, Liu L, Zhang Z, Yu H. Fovea-unet: Detection and segmentation of lymph node metastases in colorectal cancers with deep learning (2023)

Gu H, Gan W, Zhang C, Feng A, Wang H, Huang Y, Chen H, Shao Y, Duan Y, Xu Z. A 2d–3d hybrid convolutional neural network for lung lobe auto-segmentation on standard slice thickness computed tomography of patients receiving radiotherapy. BioMed Eng OnLine. 2021;20:1–13.

Jin Q, Meng Z, Sun C, Cui H, Su R. Ra-unet: a hybrid deep attention-aware network to extract liver and tumor in ct scans. Front Bioeng Biotechnol. 2020;8: 605132.


Sarker M.M.K, Rashwan H.A, Akram F, Banu S.F, Saleh A, Singh V.K, Chowdhury F.U, Abdulwahab S, Romani S, Radeva P, et al. Slsdeep: Skin lesion segmentation based on dilated residual and pyramid pooling networks. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II 11, pp. 21–29 (2018). Springer

Wang Z, Peng Y, Li D, Guo Y, Zhang B. Mmnet: a multi-scale deep learning network for the left ventricular segmentation of cardiac mri images. Appl Intell. 2022;52(5):5225–40.

Guo C, Szemenyei M, Yi Y, Wang W, Chen B, Fan C. Sa-unet: Spatial attention u-net for retinal vessel segmentation. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 1236–1242 (2021). IEEE

Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pp. 234–241 (2015). Springer

Razzak M.I, Naz S, Zaib A. Deep learning for medical image processing: Overview, challenges and the future. Classification in BioApps: Automation of Decision Making, 323–350 (2018)

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A.N, Kaiser Ł, Polosukhin I: Attention is all you need. Advances in neural information processing systems 30 (2017)

Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

Xiao H, Li L, Liu Q, Zhu X, Zhang Q. Transformers in medical image segmentation: a review. Biomed Signal Process Control. 2023;84: 104791.

Atabansi CC, Nie J, Liu H, Song Q, Yan L, Zhou X. A survey of transformer applications for histopathological image analysis: new developments and future directions. BioMed Eng OnLine. 2023;22(1):96.

Azad R, Kazerouni A, Heidari M, Aghdam EK, Molaei A, Jia Y, Jose A, Roy R, Merhof D. Advances in medical image analysis with vision transformers: a comprehensive review. Med Image Anal. 2024;91: 103000. https://doi.org/10.1016/j.media.2023.103000 .


Nanni L, Fantozzi C, Loreggia A, Lumini A. Ensembles of convolutional neural networks and transformers for polyp segmentation. Sensors. 2023;23(10):4688.


Ghazouani F, Vera P, Ruan S. Efficient brain tumor segmentation using swin transformer and enhanced local self-attention. International Journal of Computer Assisted Radiology and Surgery, 1–9. 2023.

Ali H, Mohsen F, Shah Z. Improving diagnosis and prognosis of lung cancer using vision transformers: a scoping review. BMC Med Imaging. 2023;23(1):129.

Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504–7.


Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440. 2015.

Çiçek Ö, Abdulkadir A, Lienkamp S.S, Brox T, Ronneberger O. 3d u-net: learning dense volumetric segmentation from sparse annotation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19, pp. 424–432. 2016. Springer

Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203–11.


Ba J.L, Kiros J.R, Hinton G.E. Layer normalization. arXiv preprint arXiv:1607.06450 . 2016.

Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B. Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022. 2021.

Xiao X, Lian S, Luo Z, Li S. Weighted res-unet for high-quality retina vessel segmentation. In: 2018 9th International Conference on Information Technology in Medicine and Education (ITME), pp. 327–331. 2018. IEEE

Zhou Z, Rahman Siddiquee M.M, Tajbakhsh N, Liang J. Unet++: A nested u-net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4, pp. 3–11. 2018. Springer

Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, Lu L, Yuille A.L, Zhou Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 . 2021.

Milletari F, Navab N, Ahmadi S.-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. 2016. IEEE

Fu S, Lu Y, Wang Y, Zhou Y, Shen W, Fishman E, Yuille A. Domain adaptive relational reasoning for 3d multi-organ segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I 23, pp. 656–666. 2020. Springer

Oktay O, Schlemper J, Folgoc L.L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla N.Y, Kainz B, et al. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 . 2018.

Wang W, Chen C, Ding M, Yu H, Zha S, Li J. Transbts: Multimodal brain tumor segmentation using transformer. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pp. 109–119. 2021. Springer

Hatamizadeh A, Tang Y, Nath V, Yang D, Myronenko A, Landman B, Roth H.R, Xu D. Unetr: Transformers for 3d medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 574–584. 2022.

Landman B, Xu Z, Igelsias J, Styner M, Langerak T, Klein A. Miccai multi-atlas labeling beyond the cranial vault–workshop and challenge. In: Proc. MICCAI Multi-Atlas Labeling Beyond Cranial Vault-Workshop Challenge, vol. 5, p. 12. 2015.

Antonelli M, Reinke A, Bakas S, Farahani K, Kopp-Schneider A, Landman BA, Litjens G, Menze B, Ronneberger O, Summers RM, et al. The medical segmentation decathlon. Nat Commun. 2022;13(1):4128.

Hatamizadeh A, Nath V, Tang Y, Yang D, Roth H.R, Xu D. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. In: International MICCAI Brainlesion Workshop, pp. 272–284. 2021. Springer

Yan X, Tang H, Sun S, Ma H, Kong D, Xie X. After-unet: Axial fusion transformer unet for medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3971–3981. 2022.

Landman B, Xu Z, Igelsias J.E, Styner M, Langerak T.R, Klein A. 2015 miccai multi-atlas labeling beyond the cranial vault workshop and challenge. In: Proc. MICCAI Multi-Atlas Labeling Beyond Cranial Vault-Workshop Challenge. 2015.

Chen X, Sun S, Bai N, Han K, Liu Q, Yao S, Tang H, Zhang C, Lu Z, Huang Q, et al. A deep learning-based auto-segmentation system for organs-at-risk on whole-body computed tomography images for radiation therapy. Radiother Oncol. 2021;160:175–84.

Lambert Z, Petitjean C, Dubray B, Kuan S. Segthor: Segmentation of thoracic organs at risk in ct images. In: 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1–6. 2020. IEEE

Zhang G, Wong H.-C, Wang C, Zhu J, Lu L, Teng G. A temporary transformer network for guide-wire segmentation. In: 2021 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1–5. 2021. IEEE

Li Y, Wang Z, Yin L, Zhu Z, Qi G, Liu Y. X-net: a dual encoding–decoding method in medical image segmentation. The Visual Computer, 1–11. 2021.

Zhang Y, Liu H, Hu Q. Transfuse: Fusing transformers and cnns for medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pp. 14–24. 2021. Springer

Chen L.-C, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 . 2017.

Gu Z, Cheng J, Fu H, Zhou K, Hao H, Zhao Y, Zhang T, Gao S, Liu J. Ce-net: context encoder network for 2d medical image segmentation. IEEE Trans Med Imaging. 2019;38(10):2281–92.

Ji Y, Zhang R, Wang H, Li Z, Wu L, Zhang S, Luo P. Multi-compound transformer for accurate biomedical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pp. 326–336. 2021. Springer

Xu R, Wang C, Xu S, Meng W, Zhang X. Dc-net: Dual context network for 2d medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pp. 503–513. 2021. Springer

Zhang Q.-L, Yang Y.-B. Sa-net: Shuffle attention for deep convolutional neural networks. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2235–2239. 2021. IEEE

Reid M, Marrese-Taylor E, Matsuo Y. Subformer: Exploring weight sharing for parameter efficiency in generative transformers. arXiv preprint arXiv:2101.00234 . 2021.

Xie Y, Zhang J, Shen C, Xia Y. Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24, pp. 171–180. 2021. Springer

Deng K, Meng Y, Gao D, Bridge J, Shen Y, Lip G, Zhao Y, Zheng Y. Transbridge: A lightweight transformer for left ventricle segmentation in echocardiography. In: Simplifying Medical Ultrasound: Second International Workshop, ASMUS 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 2, pp. 63–72. 2021. Springer

Chen L.-C, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818. 2018.

Petit O, Thome N, Rambour C, Themyr L, Collins T, Soler L. U-net transformer: Self and cross attention for medical image segmentation. In: Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 12, pp. 267–276. 2021. Springer

Luo C, Zhang J, Chen X, Tang Y, Weng X, Xu F. Ucatr: Based on cnn and transformer encoding and cross-attention decoding for lesion segmentation of acute ischemic stroke in non-contrast computed tomography images. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 3565–3568. 2021. IEEE

Zhang J, Liu Y, Wu Q, Wang Y, Liu Y, Xu X, Song B. Swtru: star-shaped window transformer reinforced u-net for medical image segmentation. Comput Biol Med. 2022;150: 105954.

Selvi E, Selver M, Kavur A, Güzeliş C, Dicle O. Segmentation of abdominal organs from mr images using multi-level hierarchical classification. J Faculty Eng Arch Gazi Univ. 2015;30(3).

Bilic P, Christ P, Li HB, Vorontsov E, Ben-Cohen A, Kaissis G, Szeskin A, Jacobs C, Mamani GEH, Chartrand G, et al. The liver tumor segmentation benchmark (lits). Med Image Anal. 2023;84: 102680.

Buda M, Saha A, Mazurowski MA. Association of genomic subtypes of lower-grade gliomas with shape features automatically extracted by a deep learning algorithm. Comput Biol Med. 2019;109:218–25.

Mazurowski MA, Clark K, Czarnek NM, Shamsesfandabadi P, Peters KB, Saha A. Radiogenomics of lower-grade glioma: algorithmically-assessed tumor shape is associated with tumor genomic subtypes and patient outcomes in a multi-institutional study with the cancer genome atlas data. J Neuro Oncol. 2017;133:27–35.


Codella N, Rotemberg V, Tschandl P, Celebi M.E, Dusza S, Gutman D, Helba B, Kalloo A, Liopyris K, Marchetti M, et al. Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the international skin imaging collaboration (isic). arXiv preprint arXiv:1902.03368 . 2019.

Wang H, Xie S, Lin L, Iwamoto Y, Han X.-H, Chen Y.-W, Tong R. Mixed transformer u-net for medical image segmentation. In: ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2390–2394. 2022. IEEE

Ho J, Kalchbrenner N, Weissenborn D, Salimans T. Axial attention in multidimensional transformers. arXiv preprint arXiv:1912.12180 . 2019.

Guo M, Zhang Y, Liu T. Gaussian transformer: a lightweight approach for natural language inference. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 6489–6496. 2019.

Gao Y, Zhou M, Metaxas D.N. Utnet: a hybrid transformer architecture for medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24, pp. 61–71. 2021. Springer

Valanarasu J.M.J, Oza P, Hacihaliloglu I, Patel V.M. Medical transformer: Gated axial-attention for medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pp. 36–46. 2021. Springer

Yuan F, Zhang Z, Fang Z. An effective cnn and transformer complementary network for medical image segmentation. Pattern Recogn. 2023;136: 109228.

Ibtehaz N, Rahman MS. Multiresunet: rethinking the u-net architecture for multimodal biomedical image segmentation. Neural Netw. 2020;121:74–87.

Wang H, Cao P, Wang J, Zaiane O.R. Uctransnet: rethinking the skip connections in u-net from a channel-wise perspective with transformer. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 2441–2449. 2022.

Sirinukunwattana K, Pluim JP, Chen H, Qi X, Heng P-A, Guo YB, Wang LY, Matuszewski BJ, Bruni E, Sanchez U, et al. Gland segmentation in colon histology images: the glas challenge contest. Med Image Anal. 2017;35:489–502.

Kumar N, Verma R, Sharma S, Bhargava S, Vahadane A, Sethi A. A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Trans Med Imaging. 2017;36(7):1550–60.

Karimi D, Vasylechko S.D, Gholipour A. Convolution-free medical image segmentation using transformers. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pp. 78–88. 2021. Springer

Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, Wang M. Swin-unet: Unet-like pure transformer for medical image segmentation. In: European Conference on Computer Vision, pp. 205–218. 2022. Springer

Liao Z, Xu K, Fan N. Swin transformer assisted prior attention network for medical image segmentation. In: Proceedings of the 8th International Conference on Computing and Artificial Intelligence, pp. 491–497. 2022.

Lin A, Chen B, Xu J, Zhang Z, Lu G, Zhang D. Ds-transunet: dual swin transformer u-net for medical image segmentation. IEEE Trans Instrumentation Measure. 2022;71:1–15.


Murphy AH. The finley affair: a signal event in the history of forecast verification. Weather Forecasting. 1996;11(1):3–20.


Hoover A, Kouznetsova V, Goldbaum M. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging. 2000;19(3):203–10.

Staal J, Abràmoff MD, Niemeijer M, Viergever MA, Van Ginneken B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans Med Imaging. 2004;23(4):501–9.

Ruggeri A, Scarpa F, De Luca M, Meltendorf C, Schroeter J. A system for the automatic estimation of morphometric parameters of corneal endothelium in alizarine red-stained images. Br J Ophthalmol. 2010;94(5):643–7.

Fraz MM, Remagnino P, Hoppe A, Uyyanonvara B, Rudnicka AR, Owen CG, Barman SA. Blood vessel segmentation methodologies in retinal images-a survey. Comput Methods Programs Biomed. 2012;108(1):407–33.

Budai A, Bock R, Maier A, Hornegger J, Michelson G, et al. Robust vessel segmentation in fundus images. International Journal of Biomedical Imaging. 2013.

Caicedo JC, Goodman A, Karhohs KW, Cimini BA, Ackerman J, Haghighi M, Heng C, Becker T, Doan M, McQuin C, et al. Nucleus segmentation across imaging experiments: the 2018 data science bowl. Nat Methods. 2019;16(12):1247–53.


Naylor P, Laé M, Reyal F, Walter T. Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE Trans Med Imaging. 2018;38(2):448–59.

Porwal P, Pachade S, Kamble R, Kokare M, Deshmukh G, Sahasrabuddhe V, Meriaudeau F. Indian diabetic retinopathy image dataset (idrid): a database for diabetic retinopathy screening research. Data. 2018;3(3):25.

Li T, Gao Y, Wang K, Guo S, Liu H, Kang H. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Inf Sci. 2019;501:511–22.

Gamper J, Alemi Koohbanani N, Benet K, Khuram A, Rajpoot N. Pannuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification. In: Digital Pathology: 15th European Congress, ECDP 2019, Warwick, UK, April 10–13, 2019, Proceedings 15, pp. 11–19. 2019. Springer

Valanarasu JMJ, Yasarla R, Wang P, Hacihaliloglu I, Patel VM. Learning to segment brain anatomy from 2d ultrasound with less data. IEEE J Selected Topics Signal Process. 2020;14(6):1221–34.

Jha D, Smedsrud P.H, Riegler M.A, Halvorsen P, Lange T, Johansen D, Johansen H.D. Kvasir-seg: A segmented polyp dataset. In: MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, South Korea, January 5–8, 2020, Proceedings, Part II 26, pp. 451–462. 2020. Springer

Zhang Y, Higashita R, Fu H, Xu Y, Zhang Y, Liu H, Zhang J, Liu J. A multi-branch hybrid transformer network for corneal endothelial cell segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pp. 99–108. 2021. Springer

Litjens G, Toth R, Van De Ven W, Hoeks C, Kerkstra S, Van Ginneken B, Vincent G, Guillard G, Birbeck N, Zhang J, et al. Evaluation of prostate segmentation algorithms for mri: the promise12 challenge. Med Image Anal. 2014;18(2):359–73.

Bernard O, Lalande A, Zotti C, Cervenansky F, Yang X, Heng P-A, Cetin I, Lekadir K, Camara O, Ballester MAG, et al. Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans Med Imaging. 2018;37(11):2514–25.

Hatamizadeh A, Terzopoulos D, Myronenko A. End-to-end boundary aware networks for medical image segmentation. In: Machine Learning in Medical Imaging: 10th International Workshop, MLMI 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 13, 2019, Proceedings 10, pp. 187–194. 2019. Springer

Heller N, Sathianathen N, Kalapara A, Walczak E, Moore K, Kaluzniak H, Rosenberg J, Blake P, Rengel Z, Oestreich M, et al. The kits19 challenge data: 300 kidney tumor cases with clinical context, ct semantic segmentations, and surgical outcomes. arXiv preprint arXiv:1904.00445 . 2019.


This work was supported by the National Key Research and Development Program of China (2020YFA0710700 and 2021YFA1200904), the National Natural Science Foundation of China (12375326 and 31971311), and the Innovation Program for IHEP (E35457U2).

Author information

Authors and affiliations

School of Information Engineering, Minzu University of China, Beijing, 100081, China

Qiumei Pu, Zuoxin Xi & Shuai Yin

CAS Key Laboratory for Biomedical Effects of Nanomaterials and Nanosafety Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, 100049, China

Zuoxin Xi & Lina Zhao

The Fourth Medical Center of PLA General Hospital, Beijing, 100039, China



LZ: conceptualization, methodology. QP and ZX: collection, organizing, and review of the literature. QP, ZX and SY: preparing the manuscript. LZ, QP and ZX: manuscript review and modification. LZ, QP, ZX, SY and ZZ: editing and revision. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lina Zhao .

Ethics declarations

Consent for publication

Not applicable.

Competing interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Pu, Q., Xi, Z., Yin, S. et al. Advantages of transformer and its application for medical image segmentation: a survey. BioMed Eng OnLine 23 , 14 (2024). https://doi.org/10.1186/s12938-024-01212-4


Received : 22 September 2023

Accepted : 22 January 2024

Published : 03 February 2024

DOI : https://doi.org/10.1186/s12938-024-01212-4


  • Deep learning
  • Medical image
  • Segmentation

BioMedical Engineering OnLine

ISSN: 1475-925X



  • Review Article
  • Open access
  • Published: 12 April 2022

Machine learning for medical imaging: methodological failures and recommendations for the future

  • Gaël Varoquaux 1 , 2 , 3 &
  • Veronika Cheplygina   ORCID: orcid.org/0000-0003-0176-9324 4  

npj Digital Medicine volume  5 , Article number:  48 ( 2022 ) Cite this article


  • Computer science
  • Medical research
  • Research data

Research in computer analysis of medical images holds great promise for improving patients’ health. However, a number of systematic challenges are slowing the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and data challenges, we show that potential biases can creep in at every step. On a positive note, we also discuss ongoing efforts to counteract these problems. Finally, we provide recommendations on how to further address these problems in the future.


Machine learning, the cornerstone of today’s artificial intelligence (AI) revolution, brings new promises to clinical practice with medical images 1 , 2 , 3 . For example, to diagnose various conditions from medical images, machine learning has been shown to perform on par with medical experts 4 . Software applications are starting to be certified for clinical use 5 , 6 . Machine learning may be the key to realizing the vision of AI in medicine sketched several decades ago 7 .

The stakes are high, and there is a staggering amount of research on machine learning for medical images. But this growth does not inherently lead to clinical progress: the high volume of research may be aligned with academic incentives rather than with the needs of clinicians and patients. For example, there can be an oversupply of papers showing state-of-the-art performance on benchmark data but no practical improvement for the clinical problem. On the topic of machine learning for COVID, Roberts et al. 8 reviewed 62 published studies but found none with potential for clinical use.

In this paper, we explore avenues to improve clinical impact of machine learning in medical imaging. After sketching the situation, documenting uneven progress in Section It’s not all about larger datasets, we study a number of failures frequent in medical imaging papers, at different steps of the “publishing lifecycle”: what data to use (Section Data, an imperfect window on the clinic), what methods to use and how to evaluate them (Section Evaluations that miss the target), and how to publish the results (Section Publishing, distorted incentives). In each section, we first discuss the problems, supported with evidence from previous research as well as our own analyses of recent papers. We then discuss a number of steps to improve the situation, sometimes borrowed from related communities. We hope that these ideas will help shape research practices that are even more effective at addressing real-world medical challenges.

It’s not all about larger datasets

The availability of large labeled datasets has enabled solving difficult machine learning problems, such as natural image recognition in computer vision, where datasets can contain millions of images. As a result, there is widespread hope that similar progress will happen in medical applications: that algorithm research will eventually solve a clinical problem posed as a discrimination task. However, medical datasets are typically smaller, on the order of hundreds or thousands: ref. 9 shares a list of sixteen “large open source medical imaging datasets”, with sizes ranging from 267 to 65,000 subjects. Note that in medical imaging we refer to the number of subjects, but a subject may have multiple images, for example, taken at different points in time. For simplicity, here we assume a diagnosis task with one image/scan per subject.

Few clinical questions come as well-posed discrimination tasks that can be naturally framed as machine-learning tasks. But, even for these, larger datasets have to date not led to the progress hoped for. One example is that of early diagnosis of Alzheimer’s disease (AD), which is a growing health burden due to the aging population. Early diagnosis would open the door to early-stage interventions, most likely to be effective. Substantial efforts have acquired large brain-imaging cohorts of aging individuals at risk of developing AD, on which early biomarkers can be developed using machine learning 10 . As a result, there have been steady increases in the typical sample size of studies applying machine learning to develop computer-aided diagnosis of AD, or its predecessor, mild cognitive impairment. This growth is clearly visible in publications, as shown in Fig. 1 a, a meta-analysis compiling 478 studies from 6 systematic reviews 4 , 11 , 12 , 13 , 14 , 15 .

figure 1

A meta-analysis across 6 review papers, covering more than 500 individual publications. The machine-learning problem is typically formulated as distinguishing various related clinical conditions: Alzheimer’s Disease (AD), Healthy Control (HC), and Mild Cognitive Impairment, which can signal prodromal Alzheimer’s. Distinguishing progressive mild cognitive impairment (pMCI) from stable mild cognitive impairment (sMCI) is the most relevant machine-learning task from the clinical standpoint. a Reported sample size as a function of the publication year of a study. b Reported prediction accuracy as a function of the number of subjects in a study. c Same plot distinguishing studies published in different years.

However, the increase in data size (with the largest datasets containing over a thousand subjects) did not come with better diagnostic accuracy, in particular for the most clinically relevant question, distinguishing pathological versus stable evolution for patients with symptoms of prodromal Alzheimer’s (Fig. 1 b). Rather, studies with larger sample sizes tend to report worse prediction accuracy. This is worrisome, as these larger studies are closer to real-life settings. On the other hand, research efforts across time did lead to improvements even on large, heterogeneous cohorts (Fig. 1 c), as studies published later show improvements for large sample sizes (statistical analysis in Supplementary Information). Current medical-imaging datasets are much smaller than those that brought breakthroughs in computer vision. Although a one-to-one comparison of sizes cannot be made, as computer vision datasets have many classes with high variation (compared to few classes with less variation in medical imaging), reaching better generalization in medical imaging may require assembling significantly larger datasets, while avoiding the biases created by opportunistic data collection, as described below.

Data, an imperfect window on the clinic

Datasets may be biased: they reflect an application only partly.

Available datasets only partially reflect the clinical situation for a particular medical condition, leading to dataset bias 16 . As an example, a dataset collected as part of a population study might have different characteristics than people who are referred to the hospital for treatment (higher incidence of a disease). As the researcher may be unaware of the corresponding dataset bias, it can lead to important shortcomings of the study. Dataset bias occurs when the data used to build the decision model (the training data) has a different distribution than the data on which it should be applied 17 (the test data). To assess clinically-relevant predictions, the test data must match the actual target population, rather than be a random subset of the same data pool as the training data, as is common practice in machine-learning studies. With such a mismatch, algorithms which score high in benchmarks can perform poorly in real-world scenarios 18 . In medical imaging, dataset bias has been demonstrated in chest X-rays 19 , 20 , 21 , retinal imaging 22 , brain imaging 23 , 24 , histopathology 25 , and dermatology 26 . Such biases are revealed by training and testing a model across datasets from different sources, and observing a performance drop across sources.
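This cross-source check is simple to carry out. The following is a minimal sketch on synthetic data (not from any of the cited studies), assuming scikit-learn: a constant acquisition shift between two simulated sources leaves the task unchanged but degrades the accuracy of a model built on one source only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

def make_dataset(n, shift=0.0):
    # Binary labels; class signal on all features, plus a constant
    # acquisition shift mimicking a site/scanner difference.
    y = rng.randint(0, 2, n)
    X = rng.randn(n, 20) + y[:, None] + shift
    return X, y

# Source A: the data pool the model is built on.
X_a, y_a = make_dataset(1000)
# Source B: same task, shifted acquisition (dataset bias).
X_b, y_b = make_dataset(1000, shift=1.5)

X_tr, X_te, y_tr, y_te = train_test_split(X_a, y_a, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

acc_within = accuracy_score(y_te, model.predict(X_te))  # benchmark-style test
acc_across = accuracy_score(y_b, model.predict(X_b))    # new source
print(f"within-source accuracy {acc_within:.2f}, "
      f"cross-source accuracy {acc_across:.2f}")
```

The within-source score is the quantity most benchmarks report; the cross-source score is what matters for a model deployed at a new site.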

There are many potential sources of dataset bias in medical imaging, introduced at different phases of the modeling process 27 . First, a cohort may not appropriately represent the range of possible patients and symptoms, a bias sometimes called spectrum bias 28 . A detrimental consequence is that model performance can be overestimated for different groups, for example between male and female individuals 21 , 26 . Yet medical imaging publications do not always report the demographics of the data.

Imaging devices or procedures may lead to specific measurement biases. A bias particularly harmful to clinically relevant automated diagnosis is when the data capture medical interventions. For instance, on chest X-ray datasets, images for the “pneumothorax” condition sometimes show a chest drain, which is a treatment for this condition, and which would not yet be present before diagnosis 29 . Similar spurious correlations can appear in skin lesion images due to markings placed by dermatologists next to the lesions 30 .

Labeling errors can also introduce biases. Expert human annotators may have systematic biases in the way they assign different labels 31 , and it is seldom possible to compensate with multiple annotators. Using automatic methods to extract labels from patient reports can also lead to systematic errors 32 . For example, a report on a follow-up scan that does not mention previously-known findings can lead to an incorrect “negative” label.

Dataset availability distorts research

The availability of datasets can influence which applications are studied more extensively. A striking example can be seen in two applications of oncology: detecting lung nodules, and detecting breast tumors in radiological images. Lung datasets are widely available on Kaggle or grand-challenge.org , in contrast with (to our knowledge) only one challenge focusing on mammograms. We look at the popularity of these topics, here defined by the fraction of papers focusing on lung or breast imaging, either in literature on general medical oncology, or literature on AI. In medical oncology this fraction is relatively constant across time for both lung and breast imaging, but in the AI literature lung imaging publications show a substantial increase in 2016 (Fig. 2 , methodological details in Supplementary Information). We suspect that the Kaggle lung challenges published around that time contributed to this disproportionate increase. A similar point on dataset trends has been made throughout the history of machine learning in general 33 .

figure 2

We show the percentage of papers on lung cancer (in blue) vs breast cancer (in red), relative to all papers within two fields: medical oncology (solid line) and AI (dotted line). Details on how the papers are selected are given in the Supplementary Information. The percentages are relatively constant, except lung cancer in AI, which shows an increase after 2016.

Let us build awareness of data limitations

Addressing such problems arising from the data requires critical thinking about the choice of datasets, at the project level, i.e. which datasets to select for a study or a challenge, and at a broader level, i.e. which datasets we work on as a community.

At the project level, the choice of the dataset will influence the models trained on the data, and the conclusions we can draw from the results. An important step is using datasets from multiple sources, or creating robust datasets from the start when feasible 9 . However, existing datasets can still be critically evaluated for dataset bias 34 , hidden subgroups of patients 29 , or mislabeled instances 35 . A checklist for such evaluation on computer vision datasets is presented in Zendel et al. 18 . When problems are discovered, relabeling a subset of the data can be a worthwhile investment 36 .

At the community level, we should foster understanding of the datasets’ limitations. Good documentation of datasets should describe their characteristics and data collection 37 . Distributed models should detail their limitations and the choices made to train them 38 .

Meta-analyses which look at evolution of dataset use in different areas are another way to reflect on current research efforts. For example, a survey of crowdsourcing in medical imaging 39 shows a different distribution of applications than surveys focusing on machine learning 1 , 2 . Contrasting more clinically-oriented venues to more technical venues can reveal opportunities for machine learning research.

Evaluations that miss the target

Evaluation error is often larger than algorithmic improvements.

Research on methods often focuses on outperforming other algorithms on benchmark datasets. But too strong a focus on benchmark performance can lead to diminishing returns , where increasingly large efforts achieve smaller and smaller performance gains. Is this also visible in the development of machine learning in medical imaging?

We studied performance improvements in 8 Kaggle medical-imaging challenges, 5 on detection or diagnosis of diseases and 3 on image segmentation (details in Supplementary Information). We use the differences in algorithm performance between the public and private leaderboards (two test sets used in the challenge) to quantify the evaluation noise—the spread of performance differences between the public and private test sets—in Fig. 3 . We compare its distribution to the winner gap—the difference in performance between the best algorithm and the “top 10%” algorithm.
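Both quantities can be computed directly from leaderboard scores. The sketch below runs on a simulated leaderboard (all numbers are illustrative, not taken from any actual Kaggle challenge): each team has a latent skill, and its public and private scores are that skill plus independent test-set noise.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical leaderboard: per-team scores on the public and private
# test sets. Team skills and noise levels are illustrative.
n_teams = 200
true_skill = rng.uniform(0.80, 0.90, n_teams)
public = true_skill + rng.normal(0, 0.01, n_teams)   # public test set
private = true_skill + rng.normal(0, 0.01, n_teams)  # private test set

# Evaluation noise: spread of the public-vs-private score differences.
eval_noise = np.std(public - private)

# Winner gap: best private score minus the top-10% private score.
ranked = np.sort(private)[::-1]
winner_gap = ranked[0] - ranked[int(0.1 * n_teams)]

print(f"evaluation noise {eval_noise:.4f}, winner gap {winner_gap:.4f}")
```

When the winner gap is of the same order as (or smaller than) the evaluation noise, the ranking among the top methods is largely determined by chance.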

figure 3

The blue violin plot shows the evaluation noise —the distribution of differences between public and private leaderboards. A systematic shift between public and private set (positive means that the private leaderboard is better than the public leaderboard) indicates overfitting or dataset bias. The width of this distribution shows how noisy the evaluation is, or how representative the public score is for the private score. The brown bar is the winner gap , the improvement between the top-most model (the winner) and the 10% best model. It is interesting to compare this improvement to the shift and width in the difference between the public and private sets: if the winner gap is smaller, the 10% best models reached diminishing returns and did not lead to an actual improvement on new data.

Overall, 6 of the 8 challenges are in the diminishing returns category. For 5 challenges—lung cancer, schizophrenia, prostate cancer diagnosis and intracranial hemorrhage detection—the evaluation noise is worse than the winner gap. In other words, the gains made by the top 10% of methods are smaller than the expected noise when evaluating a method.

For another challenge, pneumothorax segmentation, the performance on the private set is worse than on the public set, revealing an overfit larger than the winner gap. Only two challenges (COVID-19 abnormality and nerve segmentation) display a winner gap larger than the evaluation noise, meaning that the winning method made substantial improvements compared to the top-10% competitor.

Improper evaluation procedures and leakage

Unbiased evaluation of model performance relies on training and testing the models with independent sets of data 40 . However, incorrect implementations of this procedure can easily leak information, leading to overoptimistic results. For example, some studies classifying ADHD based on brain imaging have engaged in circular analysis 41 , performing feature selection on the full dataset before cross-validation. Another example of leakage arises when repeated measures of an individual are split across the train and test sets: the algorithm then learns to recognize the individual patient rather than markers of a condition 42 .
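The circular-analysis pitfall is easy to reproduce: selecting features on the full dataset before cross-validation yields seemingly good accuracy even on pure noise, whereas refitting the selection inside each training fold does not. A minimal sketch with scikit-learn (synthetic data; the dimensions are illustrative, chosen to mimic few-subjects, many-features imaging studies):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
# Pure noise: no real relation between features and labels.
X = rng.randn(100, 5000)
y = rng.randint(0, 2, 100)

# WRONG: feature selection on the full dataset, then cross-validation.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000),
                        X_sel, y, cv=5).mean()

# RIGHT: selection refit inside each training fold via a pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=20),
                     LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy {leaky:.2f}, honest CV accuracy {honest:.2f}")
```

For the repeated-measures case, scikit-learn's `GroupKFold` with the patient identifier as the group plays the same role as the pipeline here: it keeps all images of one subject on the same side of the split.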

A related issue, yet more difficult to detect, is what we call “overfitting by observer”: even when using cross-validation, overfitting may still occur through the researcher adjusting the method to improve the observed cross-validation performance, which essentially includes the test folds in the validation set of the model. Skocik et al. 43 provide an illustration of this phenomenon by showing how adjusting the model this way can lead to better-than-random cross-validation performance for randomly generated data. This can explain some of the overfitting visible in challenges (Section Evaluation error is often larger than algorithmic improvements), though with challenges a private test set reveals the overfitting, which is often not the case for published studies. Another recommendation for challenges would be to hold out several datasets (rather than a part of the same dataset), as is for example done in the Decathlon challenge 44 .

Metrics that do not reflect what we want

Evaluating models requires choosing a suitable metric. However, our understanding of “suitable” may change over time. For example, an image similarity metric which was widely used to evaluate image registration algorithms, was later shown to be ineffective as scrambled images could lead to high scores 45 .

In medical image segmentation, Maier-Hein et al. 46 review 150 challenges and show that the typical metrics used to rank algorithms are sensitive to different variants of the same metric, casting doubt on the objectivity of any individual ranking.

Important metrics may be missing from evaluation. Next to typical classification metrics (sensitivity, specificity, area under the curve), several authors argue for a calibration metric that compares the predicted and observed probabilities 28 , 47 .
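A small simulation can illustrate why a calibration metric adds information: two sets of predicted probabilities with (nearly) the same ranking, and hence nearly the same AUC, can be very differently calibrated, which a calibration-sensitive score such as the Brier score detects. The data below are simulated, not from any cited study.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.RandomState(0)
# Simulated test set: true risks, and outcomes drawn from those risks.
p_true = rng.uniform(0.05, 0.95, 1000)
y = (rng.uniform(size=1000) < p_true).astype(int)

p_calibrated = p_true  # honest predicted probabilities
# Same ranking, but probabilities pushed toward the extremes.
p_overconfident = np.clip(2 * p_true - 0.5, 0.01, 0.99)

# Ranking metric: nearly identical for both prediction sets...
auc_cal = roc_auc_score(y, p_calibrated)
auc_over = roc_auc_score(y, p_overconfident)
# ...but the Brier score penalizes the miscalibrated one.
brier_cal = brier_score_loss(y, p_calibrated)
brier_over = brier_score_loss(y, p_overconfident)
print(f"AUC {auc_cal:.3f} vs {auc_over:.3f}; "
      f"Brier {brier_cal:.3f} vs {brier_over:.3f}")
```

Reporting only the AUC would rate both prediction sets as equivalent, although the overconfident one would mislead any downstream decision that uses the probabilities at face value.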

Finally, the metrics used may not be synonymous with practical improvement 48 , 49 . For example, typical metrics in computer vision do not reflect important aspects of image recognition, such as robustness to out-of-distribution examples 49 . Similarly, in medical imaging, improvements in traditional metrics may not necessarily translate to different clinical outcomes, e.g. robustness may be more important than an accurate delineation in a segmentation application.

Incorrectly chosen baselines

Developing new algorithms builds upon comparing these to baselines. However, if these baselines are poorly chosen, the reported improvement may be misleading.

Baselines may not properly account for recent progress, as revealed in machine-learning applications to healthcare 50 , but also other applications of machine learning 51 , 52 , 53 .

Conversely, one should not forget simple approaches effective for the problem at hand. For example, Wen et al. 14 show that convolutional neural networks do not outperform support vector machines for Alzheimer’s disease diagnosis from brain imaging.

Finally, minute implementation details of algorithms may be important, and many researchers are not aware of these implementation factors 54 .

Statistical significance not tested, or misunderstood

Experimental results are by nature noisy: results may depend on which specific samples were used to train the models, on the random initializations, or on small differences in hyper-parameters 55 . However, benchmarking predictive models currently lacks well-adopted statistical good practices to separate out noise from generalizable findings.

A first, well-documented, source of brittleness arises from machine-learning experiments with sample sizes that are too small 56 . Indeed, testing predictive modeling requires many samples, more than conventional inferential studies, else the measured prediction accuracy may be a distant estimate of real-life performance. Sample sizes are growing, albeit slowly 57 . On a positive note, a meta-analysis of public vs private leaderboards on Kaggle 58 suggests that overfitting is less of an issue with “large enough” test data (at least several thousands).

Another challenge is that strong validation of a method requires it to be robust to details of the data. Hence validation should go beyond a single dataset, and rather strive for statistical consensus across multiple datasets 59 . Yet, the corresponding statistical procedures require dozens of datasets to establish significance and are seldom used in practice. Rather, medical imaging research often reuses the same datasets across studies, which raises the risk of finding an algorithm that performs well by chance, in an implicit multiple comparison problem 60 .

But overall, medical imaging research seldom analyzes how likely empirical results are to be due to chance: only 6% of segmentation challenges surveyed 61 , and 15% of 410 popular computer science papers published by the ACM 62 , used a statistical test.

However, null-hypothesis tests are often misinterpreted 63 , with two notable challenges: (1) the lack of statistically significant results does not demonstrate the absence of effect, and (2) any trivial effect can be significant given enough data 64 , 65 . For these reasons, Bouthillier et al. 66 recommend replacing traditional null-hypothesis testing with superiority testing , testing that the improvement is above a given threshold.

Let us redefine evaluation

Higher standards for benchmarking.

Good machine-learning benchmarks are difficult. We compile below several recognized best practices for medical machine learning evaluation 28 , 40 , 67 , 68 :

Safeguarding from data leakage by separating out all test data from the start, before any data transformation.

A documented way of selecting model hyper-parameters (including architectural parameters for neural networks, the use of additional (unlabeled) datasets, or transfer learning 2 ), without ever using data from the test set.

Enough data in the test set to bring statistical power, at least several hundred samples, ideally thousands or more 9 , and confidence intervals on the reported performance metric—see Supplementary Information. In general, more research on appropriate sample sizes for machine learning studies would be helpful.

Rich data to represent the diversity of patients and disease heterogeneity, ideally multi-institutional data including all relevant patient demographics and disease state, with explicit inclusion criteria; testing on other cohorts with different recruitment goes the extra mile toward establishing external validity 69 , 70 .

Strong baselines that reflect the state of the art of machine-learning research, but also historical solutions including clinical methodologies not necessarily relying on medical imaging.

A discussion of the variability of the results due to arbitrary choices (random seeds) and data sources, with an eye on statistical significance—see Supplementary Information.

Using different quantitative metrics to capture the different aspects of the clinical problem and relating them to relevant clinical performance metrics. In particular, the potential health benefits from detecting the outcome of interest should be used to choose the right trade-off between false detections and misses 71 .

Adding qualitative accounts, and involving the groups most affected by the application in the metric design 72 .
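The confidence-interval recommendation in the list above can be implemented, for instance, with a percentile bootstrap over test subjects. The sketch below uses simulated predictions (sizes and accuracy are illustrative, not from any cited study):

```python
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)
# Hypothetical test set: 300 subjects, predictions correct ~80% of the time.
y_true = rng.randint(0, 2, 300)
y_pred = np.where(rng.uniform(size=300) < 0.8, y_true, 1 - y_true)

point = accuracy_score(y_true, y_pred)

# Percentile bootstrap: resample test subjects with replacement.
scores = []
for _ in range(2000):
    idx = rng.randint(0, len(y_true), len(y_true))
    scores.append(accuracy_score(y_true[idx], y_pred[idx]))
lo, hi = np.percentile(scores, [2.5, 97.5])
print(f"accuracy {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With a few hundred test subjects, the interval spans several percentage points, which puts many reported "state-of-the-art" margins into perspective.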

More than beating the benchmark

Even with proper validation and statistical significance testing, measuring a tiny improvement on a benchmark is seldom useful. Rather, one view is that, beyond rejecting a null, a method should be accepted based on evidence that it brings a sizable improvement upon the existing solutions. This type of criteria is related to superiority tests sometimes used in clinical trials 73 , 74 , 75 . These tests are easy to implement in predictive modeling benchmarks, as they amount to comparing the observed improvement to variation of the results due to arbitrary choices such as data sampling or random seeds 55 .
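Such a check can be as simple as comparing the mean improvement between two methods to the variability induced by arbitrary choices. The sketch below (scikit-learn, synthetic data, and a deliberately crude criterion; it is an illustration, not the formal procedure of the superiority tests cited above) varies the train/test split and model seed while comparing two standard classifiers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def scores(make_model, n_seeds=10):
    # Score one method across several arbitrary seeds
    # (data split and model initialization).
    out = []
    for seed in range(n_seeds):
        X, y = make_classification(n_samples=600, n_features=20,
                                   n_informative=5, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
        out.append(make_model(seed).fit(X_tr, y_tr).score(X_te, y_te))
    return np.array(out)

s_a = scores(lambda s: LogisticRegression(max_iter=1000))
s_b = scores(lambda s: RandomForestClassifier(random_state=s))

improvement = s_b.mean() - s_a.mean()
seed_noise = np.std(s_b - s_a)
# Crude superiority criterion: the improvement should clearly exceed
# the seed-to-seed variability before being called an improvement.
print(f"improvement {improvement:.3f} vs seed noise {seed_noise:.3f}")
```

An improvement buried inside the seed noise is exactly the kind of "tiny benchmark gain" that should not be reported as progress.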

Organizing blinded challenges, with a hidden test set, mitigates the winner’s curse. But to bring progress, challenges should not only focus on the winner. Instead, more can be learned by comparing the competing methods and analyzing the determinants of success, as well as failure cases.

Evidence-based medicine good practices

A machine-learning algorithm deployed in clinical practice is a health intervention. There is a well-established practice to evaluate the impact of health intervention, building mostly on randomized clinical trials 76 . These require actually modifying patients’ treatments and thus should be run only after thorough evaluation on historical data.

A solid trial evaluates a well-chosen measure of patient health outcome, as opposed to predictive performance of an algorithm. Many indirect mechanisms may affect this outcome, including how the full care process adapts to the computer-aided decision. For instance, a positive consequence of even imperfect predictions may be reallocating human resources to complex cases. But a negative consequence may be over-confidence leading to an increase in diagnostic errors. Cluster randomized trials can account for how modifications at the level of a care unit impact the individual patient: care units, rather than individuals, are randomly allocated to receive the intervention (the machine learning algorithm) 77 . Often, double-blind evaluation is impossible: the care provider is aware of which arm of the study is used, the baseline condition or the system under evaluation. Providers’ expectations can contribute to the success of a treatment, for instance via indirect placebo or nocebo effects 78 , making objective evaluation of the health benefits challenging if these are small.

Publishing, distorted incentives

No incentive for clarity.

The publication process does not create incentives for clarity. Efforts to impress may give rise to unnecessary “mathiness” of papers or suggestive language 79 (such as “human-level performance”).

Important details may be omitted, from ablation experiments showing what part of the method drives improvements 79 , to reporting how algorithms were evaluated in a challenge 46 . This in turn undermines reproducibility: being able to reproduce the exact results or even draw the same conclusions 80 , 81 .

Optimizing for publication

As researchers, our goal should be to solve scientific problems. Yet, the reality of the culture we exist in can distort this objective. Goodhart’s law summarizes well the problem: when a measure becomes a target, it ceases to be a good measure . As our academic incentive system is based on publications, it erodes their scientific content via Goodhart’s law.

Methods publications are selected for their novelty. Yet, comparing 179 classifiers on 121 datasets shows no statistically significant differences between the top methods 82 . In order to sustain novelty, researchers may be introducing unnecessary complexity into the methods, complexity that does not improve prediction but rather contributes to technical debt, making systems harder to maintain and deploy 83 .

Another metric emphasized is obtaining “state-of-the-art” results, which leads to several of the evaluation problems outlined in Section Evaluations that miss the target. The pressure to publish “good” results can aggravate methodological loopholes 84 , for instance gaming the evaluation in machine learning 85 . It is then all too appealing to find after-the-fact theoretical justifications of positive yet fragile empirical findings. This phenomenon, known as HARKing (hypothesizing after the results are known) 86 , has been documented in machine learning 87 and computer science in general 62 .

Finally, the selection of publications creates the so-called “file drawer problem” 88 : positive results, some due to experimental flukes, are more likely to be published than corresponding negative findings. For example, in the 410 most downloaded papers from the ACM, 97% of the papers which used significance testing had a finding with p -value of less than 0.05 62 . It seems highly unlikely that only 3% of the initial working hypotheses—even for impactful work—turned out not to be confirmed.

Let us improve our publication norms

Fortunately, there are various avenues to improve reporting and transparency. For instance, the growing set of open datasets could be leveraged for collaborative work beyond the capacities of a single team 89 . The set of metrics studied could then be broadened, shifting the publication focus away from a single-dimension benchmark. More metrics can indeed help to understand a method’s strengths and weaknesses 41 , 90 , 91 , exploring for instance calibration metrics 28 , 47 , 92 or learning curves 93 . The medical-research literature has several reporting guidelines for prediction studies 67 , 94 , 95 . They underline many points raised in previous sections: reporting on how representative the study sample is, on the separation between train and test data, on the motivation for the choice of outcome and evaluation metrics, and so forth. Unfortunately, algorithmic research in medical imaging seldom refers to these guidelines.

Methods should be studied on more than prediction performance: reproducibility 81 , carbon footprint 96 , or a broad evaluation of costs should be put in perspective with the real-world patient outcomes, from a putative clinical use of the algorithms 97 .

Preregistration or registered reports can bring more robustness and trust: the motivation and experimental setup of a paper are reviewed before empirical results are available, and thus the paper is accepted before the experiments are run 98 . Translating this idea to machine learning faces the challenge that new data is seldom acquired in a machine learning study, yet it would bring sizeable benefits 62 , 99 .

More generally, accelerating the progress in science calls for accepting that some published findings are sometimes wrong 100 . Popularizing different types of publications may help, for example publishing negative results 101 , replication studies 102 , commentaries 103 , and reflections on the field 68 , as in the recent NeurIPS Retrospectives workshops. Such initiatives should ideally be led by more established academics, and be welcoming of newcomers 104 .


Despite great promises, the extensive research in medical applications of machine learning seldom achieves a clinical impact. Studying the academic literature and data-science challenges reveals troubling trends: accuracy on diagnostic tasks progresses more slowly on research cohorts that are closer to real-life settings; methods research is often guided by dataset availability rather than clinical relevance; many method developments bring improvements smaller than the evaluation error. We have surveyed challenges of clinical machine-learning research that can explain these difficulties. The challenges start with the choice of datasets, plague model evaluation, and are amplified by publication incentives. Understanding these mechanisms enables us to suggest specific strategies to improve the various steps of the research cycle, promoting publication best practices 105 . None of these strategies is a silver-bullet solution. Rather, they require changing procedures, norms, and goals. But implementing them will help fulfill the promise of machine learning in healthcare: better health outcomes for patients with less burden on the care system.

Data availability

For reproducibility, all data used in our analyses are available on https://github.com/GaelVaroquaux/ml_med_imaging_failures .

Code availability

For reproducibility, all code for our analyses is available on https://github.com/GaelVaroquaux/ml_med_imaging_failures .

Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42 , 60–88 (2017).


Cheplygina, V., de Bruijne, M. & Pluim, J. P. W. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Med. Image Anal. 54 , 280–296 (2019).

Zhou, S. K. et al. A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE 1–19 (2020).

Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet Digital Health (2019).

Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25 , 44–56 (2019).


Sendak, M. P. et al. A path for translation of machine learning products into healthcare delivery. Eur. Med. J. Innov. 10 , 19–00172 (2020).


Schwartz, W. B., Patil, R. S. & Szolovits, P. Artificial intelligence in medicine (1987).

Roberts, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 3 , 199–217 (2021).


Willemink, M. J. et al. Preparing medical imaging data for machine learning. Radiology 192224 (2020).

Mueller, S. G. et al. Ways toward an early diagnosis in Alzheimer’s disease: the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimer’s Dement. 1 , 55–66 (2005).

Dallora, A. L., Eivazzadeh, S., Mendes, E., Berglund, J. & Anderberg, P. Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review. PLoS ONE 12 , e0179804 (2017).


Arbabshirani, M. R., Plis, S., Sui, J. & Calhoun, V. D. Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls. NeuroImage 145 , 137–165 (2017).

Sakai, K. & Yamada, K. Machine learning studies on major brain diseases: 5-year trends of 2014–2018. Jpn. J. Radiol. 37 , 34–72 (2019).

Wen, J. et al. Convolutional neural networks for classification of Alzheimer’s disease: overview and reproducible evaluation. Medical Image Analysis 101694 (2020).

Ansart, M. et al. Predicting the progression of mild cognitive impairment using machine learning: a systematic, quantitative and critical review. Medical Image Analysis 101848 (2020).

Torralba, A. & Efros, A. A. Unbiased look at dataset bias. In Computer Vision and Pattern Recognition (CVPR) , 1521–1528 (2011).

Dockès, J., Varoquaux, G. & Poline, J.-B. Preventing dataset shift from breaking machine-learning biomarkers. GigaScience 10 , giab055 (2021).

Zendel, O., Murschitz, M., Humenberger, M. & Herzner, W. How good is my test data? introducing safety analysis for computer vision. Int. J. Computer Vis. 125 , 95–109 (2017).

Pooch, E. H., Ballester, P. L. & Barros, R. C. Can we trust deep learning models diagnosis? the impact of domain shift in chest radiograph classification. In MICCAI workshop on Thoracic Image Analysis (Springer, 2019).

Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15 , e1002683 (2018).

Larrazabal, A. J., Nieto, N., Peterson, V., Milone, D. H. & Ferrante, E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences (2020).

Tasdizen, T., Sajjadi, M., Javanmardi, M. & Ramesh, N. Improving the robustness of convolutional networks to appearance variability in biomedical images. In International Symposium on Biomedical Imaging (ISBI), 549–553 (IEEE, 2018).

Wachinger, C., Rieckmann, A., Pölsterl, S. & Initiative, A. D. N. et al. Detect and correct bias in multi-site neuroimaging datasets. Med. Image Anal. 67 , 101879 (2021).

Ashraf, A., Khan, S., Bhagwat, N., Chakravarty, M. & Taati, B. Learning to unlearn: building immunity to dataset bias in medical imaging studies. In NeurIPS workshop on Machine Learning for Health (ML4H) (2018).

Yu, X., Zheng, H., Liu, C., Huang, Y. & Ding, X. Classify epithelium-stroma in histopathological images based on deep transferable network. J. Microsc. 271 , 164–173 (2018).

Abbasi-Sureshjani, S., Raumanns, R., Michels, B. E., Schouten, G. & Cheplygina, V. Risk of training diagnostic algorithms on data with demographic bias. In Interpretable and Annotation-Efficient Learning for Medical Image Computing , 183–192 (Springer, 2020).

Suresh, H. & Guttag, J. V. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002 (2019).

Park, S. H. & Han, K. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 286 , 800–809 (2018).

Oakden-Rayner, L., Dunnmon, J., Carneiro, G. & Ré, C. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In ACM Conference on Health, Inference, and Learning, 151–159 (2020).

Winkler, J. K. et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 155 , 1135–1141 (2019).

Joskowicz, L., Cohen, D., Caplan, N. & Sosna, J. Inter-observer variability of manual contour delineation of structures in CT. Eur. Radiol. 29 , 1391–1399 (2019).

Oakden-Rayner, L. Exploring large-scale public medical image datasets. Academic Radiol. 27 , 106–112 (2020).

Langley, P. The changing science of machine learning. Mach. Learn. 82 , 275–279 (2011).

Rabanser, S., Günnemann, S. & Lipton, Z. C. Failing loudly: an empirical study of methods for detecting dataset shift. In Neural Information Processing Systems (NeurIPS) (2018).

Rädsch, T. et al. What your radiologist might be missing: using machine learning to identify mislabeled instances of X-ray images. In Hawaii International Conference on System Sciences (HICSS) (2020).

Beyer, L., Hénaff, O. J., Kolesnikov, A., Zhai, X. & Oord, A. v. d. Are we done with ImageNet? arXiv preprint arXiv:2006.07159 (2020).

Gebru, T. et al. Datasheets for datasets. In Workshop on Fairness, Accountability, and Transparency in Machine Learning (2018).

Mitchell, M. et al. Model cards for model reporting. In Fairness, Accountability, and Transparency (FAccT) , 220–229 (ACM, 2019).

Ørting, S. N. et al. A survey of crowdsourcing in medical image analysis. Hum. Comput. 7 , 1–26 (2020).

Poldrack, R. A., Huckins, G. & Varoquaux, G. Establishment of best practices for evidence for prediction: a review. JAMA Psychiatry 77 , 534–540 (2020).

Pulini, A. A., Kerr, W. T., Loo, S. K. & Lenartowicz, A. Classification accuracy of neuroimaging biomarkers in attention-deficit/hyperactivity disorder: Effects of sample size and circular analysis. Biol. Psychiatry.: Cogn. Neurosci. Neuroimaging 4 , 108–120 (2019).

Saeb, S., Lonini, L., Jayaraman, A., Mohr, D. C. & Kording, K. P. The need to approximate the use-case in clinical machine learning. Gigascience 6 , gix019 (2017).

Hosseini, M. et al. I tried a bunch of things: The dangers of unexpected overfitting in classification of brain data. Neuroscience & Biobehavioral Reviews (2020).

Simpson, A. L. et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint arXiv:1902.09063 (2019).

Rohlfing, T. Image similarity and tissue overlaps as surrogates for image registration accuracy: widely used but unreliable. IEEE Trans. Med. Imaging 31 , 153–163 (2011).

Maier-Hein, L. et al. Why rankings of biomedical image analysis competitions should be interpreted with care. Nat. Commun. 9 , 5217 (2018).

Article   CAS   PubMed   PubMed Central   Google Scholar  

Van Calster, B., McLernon, D. J., Van Smeden, M., Wynants, L. & Steyerberg, E. W. Calibration: the Achilles heel of predictive analytics. BMC Med. 17 , 1–7 (2019).

Wagstaff, K. L. Machine learning that matters. In International Conference on Machine Learning (ICML), 529–536 (2012).

Shankar, V. et al. Evaluating machine accuracy on imagenet. In International Conference on Machine Learning (ICML) (2020).

Bellamy, D., Celi, L. & Beam, A. L. Evaluating progress on machine learning for longitudinal electronic healthcare data. arXiv preprint arXiv:2010.01149 (2020).

Oliver, A., Odena, A., Raffel, C., Cubuk, E. D. & Goodfellow, I. J. Realistic evaluation of semi-supervised learning algorithms. In Neural Information Processing Systems (NeurIPS) (2018).

Dacrema, M. F., Cremonesi, P. & Jannach, D. Are we really making much progress? a worrying analysis of recent neural recommendation approaches. In ACM Conference on Recommender Systems, 101–109 (2019).

Musgrave, K., Belongie, S. & Lim, S.-N. A metric learning reality check. In European Conference on Computer Vision, 681–699 (Springer, 2020).

Pham, H. V. et al. Problems and opportunities in training deep learning software systems: an analysis of variance. In IEEE/ACM International Conference on Automated Software Engineering, 771–783 (2020).

Bouthillier, X. et al. Accounting for variance in machine learning benchmarks. In Machine Learning and Systems (2021).

Varoquaux, G. Cross-validation failure: small sample sizes lead to large error bars. NeuroImage 180 , 68–77 (2018).

Szucs, D. & Ioannidis, J. P. Sample size evolution in neuroimaging research: an evaluation of highly-cited studies (1990–2012) and of latest practices (2017–2018) in high-impact journals. NeuroImage117164 (2020).

Roelofs, R. et al. A meta-analysis of overfitting in machine learning. In Neural Information Processing Systems (NeurIPS), 9179–9189 (2019).

Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7 , 1–30 (2006).

Thompson, W. H., Wright, J., Bissett, P. G. & Poldrack, R. A. Meta-research: dataset decay and the problem of sequential analyses on open datasets. eLife 9 , e53498 (2020).

Maier-Hein, L. et al. Is the winner really the best? a critical analysis of common research practice in biomedical image analysis competitions. Nature Communications (2018).

Cockburn, A., Dragicevic, P., Besançon, L. & Gutwin, C. Threats of a replication crisis in empirical computer science. Commun. ACM 63 , 70–79 (2020).

Gigerenzer, G. Statistical rituals: the replication delusion and how we got there. Adv. Methods Pract. Psychol. Sci. 1 , 198–218 (2018).

Benavoli, A., Corani, G. & Mangili, F. Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res. 17 , 152–161 (2016).

Berrar, D. Confidence curves: an alternative to null hypothesis significance testing for the comparison of classifiers. Mach. Learn. 106 , 911–949 (2017).

Bouthillier, X., Laurent, C. & Vincent, P. Unreproducible research is reproducible. In International Conference on Machine Learning (ICML), 725–734 (2019).

Norgeot, B. et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat. Med. 26 , 1320–1324 (2020).

Drummond, C. Machine learning as an experimental science (revisited). In AAAI workshop on evaluation methods for machine learning, 1–5 (2006).

Steyerberg, E. W. & Harrell, F. E. Prediction models need appropriate internal, internal–external, and external validation. J. Clin. Epidemiol. 69 , 245–247 (2016).

Woo, C.-W., Chang, L. J., Lindquist, M. A. & Wager, T. D. Building better biomarkers: brain models in translational neuroimaging. Nat. Neurosci. 20 , 365 (2017).

Van Calster, B. et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur. Urol. 74 , 796 (2018).

Thomas, R. & Uminsky, D. The problem with metrics is a fundamental problem for AI. arXiv preprint arXiv:2002.08512 (2020).

for the Evaluation of Medicinal Products, E. A. Points to consider on switching between superiority and non-inferiority. Br. J. Clin. Pharmacol. 52 , 223–228 (2001).

D’Agostino Sr, R. B., Massaro, J. M. & Sullivan, L. M. Non-inferiority trials: design concepts and issues–the encounters of academic consultants in statistics. Stat. Med. 22 , 169–186 (2003).

Christensen, E. Methodology of superiority vs. equivalence trials and non-inferiority trials. J. Hepatol. 46 , 947–954 (2007).

Hendriksen, J. M., Geersing, G.-J., Moons, K. G. & de Groot, J. A. Diagnostic and prognostic prediction models. J. Thrombosis Haemost. 11 , 129–141 (2013).

Campbell, M. K., Elbourne, D. R. & Altman, D. G. Consort statement: extension to cluster randomised trials. BMJ 328 , 702–708 (2004).

Blasini, M., Peiris, N., Wright, T. & Colloca, L. The role of patient–practitioner relationships in placebo and nocebo phenomena. Int. Rev. Neurobiol. 139 , 211–231 (2018).

Lipton, Z. C. & Steinhardt, J. Troubling trends in machine learning scholarship: some ML papers suffer from flaws that could mislead the public and stymie future research. Queue 17 , 45–77 (2019).

Tatman, R., VanderPlas, J. & Dane, S. A practical taxonomy of reproducibility for machine learning research. In ICML workshop on Reproducibility in Machine Learning (2018).

Gundersen, O. E. & Kjensmo, S. State of the art: Reproducibility in artificial intelligence. In AAAI Conference on Artificial Intelligence (2018).

Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D. & Amorim Fernández-Delgado, D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15 , 3133–3181 (2014).

Sculley, D. et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NeurIPS), 2503–2511 (2015).

Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2 , e124 (2005).

Teney, D. et al. On the value of out-of-distribution testing: an example of Goodhart’s Law. In Neural Information Processing Systems (NeurIPS) (2020).

Kerr, N. L. HARKing: hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 2 , 196–217 (1998).

Article   CAS   Google Scholar  

Gencoglu, O. et al. HARK side of deep learning–from grad student descent to automated machine learning. arXiv preprint arXiv:1904.07633 (2019).

Rosenthal, R. The file drawer problem and tolerance for null results. Psychological Bull. 86 , 638 (1979).

Kellmeyer, P. Ethical and legal implications of the methodological crisis in neuroimaging. Camb. Q. Healthc. Ethics 26 , 530–554 (2017).

Japkowicz, N. & Shah, M. Performance evaluation in machine learning. In Machine Learning in Radiation Oncology , 41–56 (Springer, 2015).

Santafe, G., Inza, I. & Lozano, J. A. Dealing with the evaluation of supervised classification algorithms. Artif. Intell. Rev. 44 , 467–508 (2015).

Han, K., Song, K. & Choi, B. W. How to develop, validate, and compare clinical prediction models involving radiological parameters: study design and statistical methods. Korean J. Radiol. 17 , 339–350 (2016).

Richter, A. N. & Khoshgoftaar, T. M. Sample size determination for biomedical big data with limited labels. Netw. Modeling Anal. Health Inform. Bioinforma. 9 , 12 (2020).

Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (tripod): the tripod statement. J. Br. Surg. 102 , 148–158 (2015).

Wolff, R. F. et al. Probast: a tool to assess the risk of bias and applicability of prediction model studies. Ann. Intern. Med. 170 , 51–58 (2019).

Henderson, P. et al. Towards the systematic reporting of the energy and carbon footprints of machine learning. J. Mach. Learn. Res. 21 , 1–43 (2020).

Bowen, A. & Casadevall, A. Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research. Proc. Natl Acad. Sci. 112 , 11335–11340 (2015).

Chambers, C. D., Dienes, Z., McIntosh, R. D., Rotshtein, P. & Willmes, K. Registered reports: realigning incentives in scientific publishing. Cortex 66 , A1–A2 (2015).

Forde, J. Z. & Paganini, M. The scientific method in the science of machine learning. In ICLR workshop on Debugging Machine Learning Models (2019).

Firestein, S.Failure: Why science is so successful (Oxford University Press, 2015).

Borji, A. Negative results in computer vision: a perspective. Image Vis. Comput. 69 , 1–8 (2018).

Voets, M., Møllersen, K. & Bongo, L. A. Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. arXiv preprint arXiv:1803.04337 (2018).

Wilkinson, J. et al. Time to reality check the promises of machine learning-powered precision medicine. The Lancet Digital Health (2020).

Whitaker, K. & Guest, O. #bropenscience is broken science. Psychologist 33 , 34–37 (2020).

Kakarmath, S. et al. Best practices for authors of healthcare-related artificial intelligence manuscripts. NPJ Digital Med. 3 , 134–134 (2020).

Download references


Acknowledgements

We would like to thank Alexandra Elbakyan for help with the literature review. We thank Pierre Dragicevic for providing feedback on early versions of this manuscript, and Pierre Bartet for comments on the preprint. We also thank the reviewers, Jack Wilkinson and Odd Erik Gundersen, for excellent comments which improved our manuscript. GV acknowledges funding from grant ANR-17-CE23-0018, DirtyData.

Author information

Authors and Affiliations

INRIA, Versailles, France

Gaël Varoquaux

McGill University, Montreal, Canada

Mila, Montreal, Canada

IT University of Copenhagen, Copenhagen, Denmark

Veronika Cheplygina



Contributions

Both V.C. and G.V. collected the data; conceived, designed, and performed the analysis; reviewed the literature; and wrote the paper.

Corresponding authors

Correspondence to Gaël Varoquaux or Veronika Cheplygina .

Ethics declarations

Competing interests

The authors declare that there are no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

LaTeX source files

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Varoquaux, G., Cheplygina, V. Machine learning for medical imaging: methodological failures and recommendations for the future. npj Digit. Med. 5 , 48 (2022). https://doi.org/10.1038/s41746-022-00592-y


Received : 21 June 2021

Accepted : 09 March 2022

Published : 12 April 2022

DOI : https://doi.org/10.1038/s41746-022-00592-y


This article is cited by

Deep representation learning of tissue metabolome and computed tomography annotates NSCLC classification and prognosis

  • Marc Boubnovski Martell
  • Kristofer Linton-Reid
  • Eric O. Aboagye

npj Precision Oncology (2024)

Electronic health records and stratified psychiatry: bridge to precision treatment?

  • Adrienne Grzenda
  • Alik S. Widge

Neuropsychopharmacology (2024)

Diagnostic performance of artificial intelligence-assisted PET imaging for Parkinson’s disease: a systematic review and meta-analysis

npj Digital Medicine (2024)

Deep learning-aided decision support for diagnosis of skin disease across skin tones

  • Matthew Groh
  • Rosalind Picard

Nature Medicine (2024)

Retinale optische Kohärenztomographie Biomarker und ihr Zusammenhang mit kognitiven Funktionen

  • Franziska G. Rauscher
  • Rui Bernardes

Die Ophthalmologie (2024)


Original Research Article

Trends and hotspots in research on medical images with deep learning: a bibliometric analysis from 2013 to 2023


  • 1 First School of Clinical Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
  • 2 College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
  • 3 The School of Health, Fujian Medical University, Fuzhou, China

Background: With the rapid development of the internet, improvements in computing power, and continuous advances in algorithms, deep learning has developed rapidly in recent years and has been widely applied in many fields. Previous studies have shown that deep learning performs excellently in image processing, and deep learning-based medical image processing may help solve the difficulties faced by traditional medical image processing. This technology has attracted the attention of many scholars in the fields of computer science and medicine. This study summarizes the knowledge structure of deep learning-based medical image processing research through bibliometric analysis and explores the research hotspots and likely development trends in this field.

Methods: The Web of Science Core Collection database was searched using the terms “deep learning,” “medical image processing,” and their synonyms. CiteSpace was used for visual analysis of authors, institutions, countries, keywords, co-cited references, co-cited authors, and co-cited journals.

Results: The analysis covered 562 highly cited papers retrieved from the database. The annual publication volume shows an upward trend. Pheng-Ann Heng, Hao Chen, and Klaus Hermann Maier-Hein are among the most active authors in this field. The Chinese Academy of Sciences has the highest number of publications, while the institution with the highest centrality is Stanford University. The United States has the highest number of publications, followed by China. The most frequent keyword is “Deep Learning,” and the keyword with the highest centrality is “Algorithm.” The most cited author is Kaiming He, and the author with the highest centrality is Yoshua Bengio.

Conclusion: The application of deep learning in medical image processing is becoming increasingly common, and many authors, institutions, and countries are active in this field. Current research in medical image processing mainly focuses on deep learning, convolutional neural networks, classification, diagnosis, segmentation, images, algorithms, and artificial intelligence. The research focus and trends are gradually shifting toward more complex and systematic directions, and deep learning technology will continue to play an important role.

1. Introduction

The origin of radiology can be seen as the beginning of medical image processing. The discovery of X-rays by Röntgen and its successful application in clinical practice ended the era of disease diagnosis relying solely on the clinical experience of doctors ( Glasser, 1995 ). The production of medical images provides doctors with more data, enabling them to diagnose and treat diseases more accurately. With the continuous improvement of computer performance and image processing technology represented by central processing units (CPUs; Dessy, 1976 ), medical image processing has become more efficient and accurate in medical research and clinical applications. Initially, medical image processing was mainly used in medical imaging diagnosis, such as analyzing and diagnosing X-rays, CT, MRI, and other images. Nowadays, medical image processing has become an important research tool in fields such as radiology, pathology, and biomedical engineering, providing strong support for medical research and clinical diagnosis ( Hosny et al., 2018 ; Hu et al., 2022 ; Lin et al., 2022 ).

Deep learning originated from artificial neural networks, which can be traced back to the 1940s and 1950s, when scientists proposed the perceptron and neuron models to simulate the working principles of the human nervous system ( Rosenblatt, 1958 ; McCulloch and Pitts, 1990 ). However, limited by the weak performance of computers at the time, these models were quickly abandoned. In 2006, Canadian computer scientist Geoffrey Hinton and his team proposed a model called the “deep belief network,” which adopted a deep structure and overcame the shortcomings of traditional neural networks. This is considered the starting point of deep learning ( Hinton et al., 2006 ).

In recent years, with the rapid development of the internet, massive amounts of data have been continuously generated and accumulated, which is highly favorable for deep learning networks that require large amounts of training data ( Misra et al., 2022 ). Additionally, the development of computing devices such as graphics processing units (GPUs) and tensor processing units (TPUs) has made the training of deep learning models faster and more efficient ( Alzubaidi et al., 2021 ; Elnaggar et al., 2022 ). Furthermore, continuous improvement and optimization of deep learning algorithms have steadily raised the performance of deep learning models ( Minaee et al., 2022 ). The application of deep learning is therefore becoming increasingly widespread in many fields, including medical image processing.

Deep learning has many advantages in processing medical images. First, it requires no human intervention: it can automatically learn and extract features ( Yin et al., 2021 ). Second, it can process large amounts of data simultaneously, with efficiency far exceeding traditional manual methods ( Narin et al., 2021 ). Third, its accuracy is high: it can learn complex features and discover subtle changes and patterns that are difficult for humans to perceive ( Han et al., 2022 ). Finally, it is less affected by subjective human factors, so its results are comparatively objective ( Kerr et al., 2022 ).

Bibliometrics is a quantitative method for evaluating the research achievements of researchers, institutions, countries, or subject areas; it can be traced back to the 1950s ( Schoenbach and Garfield, 1956 ). In bibliometric analysis, the citation half-life of an article has two characteristic patterns: classical articles are cited continuously, while some other articles are cited frequently within a certain period and quickly reach a peak. How long classical articles continue to be cited is closely related to the pace of development of basic research, while the frequent citation of certain articles within a specific period reflects the dynamic changes in the corresponding field. Generally speaking, articles that reflect dynamic changes in a field are more common than classical articles. In the Web of Science, papers that rank in the top 1% of citation counts for their field and publication year are indexed as highly cited papers. Visual analysis of highly cited papers identifies popular research areas and trends more effectively than visual analysis of all search results. CiteSpace is a visualization software for bibliometric analysis, developed by Professor Chaomei Chen at Drexel University ( Chen, 2006 ).

Therefore, to gain a deeper understanding of the research hotspots and likely development trends of deep learning-based medical image processing, this study analyzes highly cited papers published between 2013 and 2023 using bibliometric methods, identifies the authors, institutions, and countries with the most research output, and provides an overall review of the knowledge structure of the highly cited papers. The results are expected to be helpful for researchers in this field.

2.1. Search strategy and data source

A search was conducted in the Web of Science Core Collection database using the search terms “deep learning” and “medical imaging,” along with their synonyms and related terms. The complete search string is as follows: (TS = Deep Learning OR “Deep Neural Networks” OR “Deep Machine Learning” OR “Deep Artificial Neural Networks” OR “Deep Models” OR “Hierarchical Learning” OR “Deep architectures” OR “Multi-layer Neural Networks” OR “Large-scale Neural Networks” OR “Deep Belief Networks”) AND (TS = “Medical imaging” OR “Radiology imaging” OR “Diagnostic imaging” OR “Clinical imaging” OR “Biomedical imaging” OR “Radiographic imaging” OR “Tomographic imaging” OR “Imaging modalities” OR “Medical visualization” OR “Medical image analysis”). The search was refined to include only articles published between 2013 and 2023, restricted to highly cited papers, and yielded a total of 562 results. The document type was limited to articles and the language to English.

2.2. Scientometric analysis methods

Due to the Web of Science export limit, records 1–500 and 501–562 were exported separately, with the record content set to full records and cited references. These plain text files served as the source files for the analysis. Next, a new project was established in CiteSpace 6.1.R6, with the project location and data storage location set up, and CiteSpace's input and output functions were used to convert the plain text files into a format that CiteSpace can analyze. The remaining parameters were set as follows: time slicing from 2013 to 2023, with a yearly interval; node types including authors, institutions, countries, keywords, co-cited references, co-cited authors, and co-cited journals; the thresholds for “Top N,” “Top N%,” and “g-index” left at their defaults; network pruning set to pathfinder with pruning of the merged network; and visualization set to static cluster view, showing the merged network to display the overall network.
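Because the records had to be exported in two batches, the plain text files need to be stitched together before analysis. A minimal sketch of such a merge, assuming the standard structure of Web of Science "Plain Text" exports (FN/VR header lines at the top, an EF end-of-file marker at the bottom); the sample records below are made up for illustration:

```python
def merge_wos_exports(chunks):
    """Merge several Web of Science 'Plain Text' exports into one.

    Each export begins with FN/VR header lines and ends with an 'EF'
    marker; the merged output keeps exactly one header and a single
    trailing EF.
    """
    header, records = [], []
    for idx, chunk in enumerate(chunks):
        for line in chunk.splitlines():
            if line.startswith(("FN ", "VR ")):
                if idx == 0:            # keep only the first file's header
                    header.append(line)
            elif line.strip() == "EF":  # drop the per-file end marker
                break
            else:
                records.append(line)
    return "\n".join(header + records + ["EF"]) + "\n"

# Illustrative two-batch export (records 1-500 and 501-562 in the study).
part1 = "FN Clarivate Analytics Web of Science\nVR 1.0\nPT J\nTI First record\nER\n\nEF\n"
part2 = "FN Clarivate Analytics Web of Science\nVR 1.0\nPT J\nTI Second record\nER\n\nEF\n"
merged = merge_wos_exports([part1, part2])
```

The merged text can then be saved as a single file in the CiteSpace project's data directory.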

In the map generated by CiteSpace, there are multiple elements. The nodes available for analysis are represented as circles, with size generally indicating quantity: the larger the circle, the greater the quantity. Each circle is composed of annual rings, with the color of a ring representing the year and its thickness determined by the number of corresponding nodes in that year; the more nodes in a year, the thicker the ring. The “Centrality” option in the CiteSpace menu refers to betweenness centrality ( Chen, 2005 ). CiteSpace uses this metric to discover and measure the importance of nodes, and highlights a node with a purple circle when its centrality is greater than or equal to 0.1; only such nodes are considered important. The calculation follows the formulation introduced by Freeman (1977) , and the formula is as follows:

$$C_B(i) = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$

In this formula, $g_{st}$ represents the number of shortest paths from node $s$ to node $t$, and $n_{st}^{i}$ represents the number of those shortest paths that pass through node $i$. From the information-transmission perspective, the higher the betweenness centrality, the more important the node: removing such nodes has a greater impact on transmission through the network.
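The definition can be made concrete by counting shortest paths directly. A small illustrative sketch, independent of CiteSpace (node names are made up; for an undirected graph the sum runs over unordered pairs):

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """BFS from s; returns (dist, sigma), where sigma[v] is the number of
    shortest s-v paths (the g_sv of the formula)."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:           # first time v is reached
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj, i):
    """Betweenness of node i: the sum over pairs (s, t), s != i != t,
    of n_st^i / g_st."""
    dist_i, sigma_i = bfs_counts(adj, i)
    total = 0.0
    for s, t in combinations([n for n in adj if n != i], 2):
        dist_s, sigma_s = bfs_counts(adj, s)
        if t not in dist_s:
            continue  # s and t are not connected
        # A shortest s-t path passes through i iff d(s,i) + d(i,t) = d(s,t);
        # the number of such paths is g_si * g_it.
        if i in dist_s and dist_s[i] + dist_i.get(t, float("inf")) == dist_s[t]:
            total += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return total

# Path graph a - b - c - d: b lies on all shortest paths a-c and a-d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(betweenness(adj, "b"))  # 2.0
print(betweenness(adj, "a"))  # 0.0
```

Library implementations (e.g., NetworkX's `betweenness_centrality`, which also offers normalization) are preferable for real networks; this sketch only mirrors the formula term by term.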

3.1. Analysis of annual publication volume

The trend in annual publication volume shows that from 2013 to 2023, the number of related studies fluctuated slightly from year to year but rose overall. The period can be divided into three stages: before 2016, the number of papers was relatively small; from 2016 to 2019, the number of papers increased year by year, by roughly 20 papers per year over the previous year; after 2019, the growth rate slowed, but the number of publications each year remained high ( Figure 1 ).


Figure 1 . Annual quantitative distribution of publications.

3.2. Analysis of authors

Among the 562 articles included, there are a total of 364 authors ( Figure 2 ). Pheng-Ann Heng and Hao Chen rank first with seven publications each, Klaus Hermann Maier-Hein ranks second with six publications, and Fabian Isensee, Jing Qin, Qi Dou, and Dinggang Shen are tied for third with five publications each. As Figure 2 shows, there are many small groups of authors but no very large research groups, and many authors have no collaborative relationships with one another.


Figure 2 . The collaborative relationship map of researchers in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes represents the number of papers published by the author. The links between nodes reflect the strength of collaboration.

3.3. Analysis of institutions

In the 562 papers included, there are a total of 311 institutions ( Figure 3 ; Table 1 ). The institution with the highest publication output is the Chinese Academy of Sciences, and the institution with the highest centrality is Stanford University. The map shows close collaborative relationships between institutions, but these relationships are anchored by one or more institutions with high publication output and centrality; there is less collaboration between institutions with low publication output and no centrality. As shown in Table 1 , publication output and centrality are not necessarily related: the institution with the highest publication output does not necessarily have the highest centrality.


Figure 3 . The collaborative relationship map of institutions in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes represents the number of papers published by the institution. The links between nodes reflect the strength of collaboration.


Table 1 . Top 10 institutions by publication volume and centrality.

3.4. Analysis of countries

In the 562 included papers, a total of 62 countries are represented ( Figure 4 ; Table 2 ). The United States has the highest publication output, while Germany has the highest centrality. The map shows that all countries have at least some collaboration with other countries. In general, there are three situations: some countries have both high publication output and high centrality; some have low publication output but high centrality; and some have high publication output but low centrality.


Figure 4 . The collaborative relationship map of countries in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes represents the number of papers published by the country. The links between nodes reflect the strength of collaboration.


Table 2 . Top 10 countries by publication volume and centrality.

3.5. Analysis of keywords

Among the 562 papers included, there are a total of 425 keywords ( Figure 5 ; Table 3 ). The most frequently occurring keyword is “Deep Learning,” and the one with the highest centrality is “algorithm.” Clustering analysis of the keywords resulted in 20 clusters: management, laser radar, biomarker, mild cognitive impairment, COVID-19, image restoration, breast cancer, feature learning, major depressive disorder, pulmonary embolism detection, precursor, bioinformatics, computer vision, annotation, change detection, information, synthetic CT, auto-encoder, brain networks, and ultrasound.


Figure 5 . The clustering map of keywords in the field of medical image processing with deep learning from 2013 to 2023. The smaller the cluster number, the larger its size, and the more keywords it contains.


Table 3 . Top 10 keywords by quantity and centrality.

The evolution of burst keywords in recent years can be summarized as follows ( Figure 6 ): it began in 2015 with a focus on “image.” By 2016, “feature,” “accuracy,” “algorithm,” and “machine learning” took center stage. The year 2017 brought prominence to “diabetic retinopathy,” “classification,” and “computer-aided detection.” Moving into 2020, attention shifted to “COVID-19,” “pneumonia,” “lung,” “coronavirus,” “transfer learning,” and “X-ray.” In 2021, the conversation revolved around “feature extraction,” “framework,” and “image segmentation.”


Figure 6 . Top 17 keywords with the strongest citation bursts in publications of medical image processing with deep learning from 2013 to 2023. The blue line represents the overall timeline, while the red line represents the appearance year, duration, and end year of the burst keywords.

3.6. Analysis of references

In the 562 articles included, there are a total of 584 references ( Figure 7 ; Table 4 ). The most cited reference is “ImageNet Classification with Deep Convolutional Neural Networks” by Alex Krizhevsky. Krizhevsky and his team trained a large convolutional neural network (CNN) to classify a vast dataset of high-resolution images into 1,000 categories, achieving top-1 and top-5 error rates of 37.5% and 17.0%, considerably better than previous methods ( Krizhevsky et al., 2017 ).
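The top-1 and top-5 figures above are error rates: the fraction of images whose true class is absent from the model’s one (or five) highest-scoring predictions. A minimal sketch of the metric (the scores and labels below are made up, and with only four classes we use k = 1 and k = 2 instead of 5):

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    wrong = 0
    for row, label in zip(scores, labels):
        # Class indices ranked by score, highest first.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        wrong += label not in topk
    return wrong / len(labels)

# Three samples, four classes; the scores are illustrative only.
scores = [
    [0.1, 0.6, 0.2, 0.1],   # highest-scoring class: 1
    [0.5, 0.1, 0.3, 0.1],   # highest-scoring class: 0
    [0.2, 0.3, 0.1, 0.4],   # highest-scoring class: 3
]
labels = [1, 2, 0]

print(top_k_error(scores, labels, k=1))  # 2 of 3 top-1 predictions are wrong
print(top_k_error(scores, labels, k=2))  # only 1 of 3 true labels misses the top 2
```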


Figure 7 . The co-cited reference map in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes reflects the number of citations, while the links between nodes reflect the strength of co-citations.


Table 4 . Top 10 references in quantity ranking.

There are a total of three articles with centrality greater than or equal to 0.1, authored by Dan Claudiu Ciresan, Liang-Chieh Chen, and Marios Anthimopoulos. Ciresan and colleagues used deep max-pooling convolutional neural networks to detect mitosis in breast histology images and won the ICPR 2012 mitosis detection competition ( Ciresan et al., 2013 ). Chen and colleagues addressed semantic image segmentation with deep learning and made three main contributions: first, convolution with upsampled filters, known as “atrous convolution”; second, atrous spatial pyramid pooling (ASPP); and third, improved object boundary localization achieved by integrating deep convolutional neural networks with probabilistic graphical models ( Chen et al., 2018 ). Anthimopoulos and colleagues proposed and evaluated a convolutional neural network (CNN) designed for classifying interstitial lung disease (ILD) patterns ( Anthimopoulos et al., 2016 ).
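Atrous (dilated) convolution, the first of those contributions, spaces the kernel taps `dilation` samples apart, enlarging the receptive field without adding parameters. A minimal 1D sketch (the signal and kernel values are illustrative only):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D 'atrous' convolution: kernel taps are spaced `dilation` apart,
    so the receptive field grows with dilation at no extra parameter cost."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    out = []
    for start in range(len(signal) - span):
        out.append(sum(kernel[j] * signal[start + j * dilation]
                       for j in range(len(kernel))))
    return out

x = [1, 2, 3, 4, 5, 6]
k = [1, 0, -1]  # a simple difference kernel

print(dilated_conv1d(x, k, dilation=1))  # adjacent taps -> [-2, -2, -2, -2]
print(dilated_conv1d(x, k, dilation=2))  # taps 2 apart  -> [-4, -4]
```

With dilation 2, the same three-tap kernel covers a window of five samples, which is exactly the mechanism ASPP exploits by running several dilation rates in parallel.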

The eighth- and ninth-ranked articles share the same title but differ in authors, and both come from Nature-family journals. The eighth-ranked article, by Nicole Rusk, was published in the Comments & Opinion section of Nature Methods and provides a concise introduction to deep learning ( Rusk, 2016 ). The ninth-ranked article, authored by Yann LeCun, is a comprehensive review; compared with Rusk’s article, it extensively elaborates on the fundamental principles of deep learning and its applications in domains such as speech recognition, visual object recognition, and object detection, as well as fields like drug discovery and genomics ( LeCun et al., 2015 ).

3.7. Analysis of co-cited authors

In the 562 included articles, there are a total of 634 cited authors ( Figure 8 ). The most cited author is Kaiming He, whose papers have been cited 141 times; the author with the highest centrality is Yoshua Bengio, whose papers have been cited 45 times.


Figure 8 . The map of co-cited author in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes reflects the number of citations, while the links between nodes reflect the strength of co-citations.

The most cited paper authored by Kaiming He in Web of Science is “Deep Residual Learning for Image Recognition.” This paper introduces a residual learning framework to simplify the training of networks that are much deeper than those used previously. These residual networks are not only easier to optimize but also achieve higher accuracy with considerably increased depth ( He et al., 2016 ). On the other hand, the most cited paper authored by Yoshua Bengio in Web of Science is “Representation Learning: A Review and New Perspectives.” This paper reviews recent advances in unsupervised feature learning and deep learning, covering progress in probabilistic models, autoencoders, manifold learning, and deep networks ( Bengio et al., 2013 ).
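The residual learning framework reduces to y = x + F(x): each block learns a correction F to the identity rather than a full mapping, so a block whose residual is zero is exactly the identity, which eases the optimization of very deep stacks. A toy numerical sketch (the stand-in layer and its weights are invented):

```python
def residual_block(x, f):
    """y = x + F(x): the block learns a residual F on top of an identity
    shortcut instead of learning the full input-to-output mapping."""
    fx = f(x)
    return [xi + fi for xi, fi in zip(x, fx)]

# A stand-in "layer": scales and shifts each feature (weights are illustrative).
def tiny_layer(x):
    return [0.1 * xi - 0.05 for xi in x]

x = [1.0, -2.0, 0.5]
print(residual_block(x, tiny_layer))

# With a zero residual, the block reduces exactly to the identity,
# so stacking many such blocks cannot make the mapping worse than identity:
assert residual_block(x, lambda v: [0.0] * len(v)) == x
```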

3.8. Analysis of co-cited journals

In the 562 articles included, a total of 345 journals were cited ( Figure 9 ; Table 5 ). The journal with the most citations is the IEEE Conference on Computer Vision and Pattern Recognition, with 339 articles citing papers from this journal; the journal with the highest centrality is Advances in Neural Information Processing Systems, with 128 articles citing papers from this journal.


Figure 9 . The collaborative relationship map of co-cited journals in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes reflects the number of citations, while the links between nodes reflect the strength of co-citations.


Table 5 . Top 10 journals in citation frequency and centrality ranking.

The dual-map overlay shows that literature in the “mathematics, systems, mathematical” discipline cites literature in “systems, computing, computers,” “molecular, biology, genetics,” and “health, nursing, medicine.” Literature in “molecular, biology, immunology” cites “molecular, biology, genetics” and “health, nursing, medicine,” and literature in “medicine, medical, clinical” likewise cites “molecular, biology, genetics” and “health, nursing, medicine” ( Figure 10 ).


Figure 10 . Dual-map overlap of journals. The map consists of two graphs, with the citing graph on the left and the cited graph on the right. The curves represent citation links, displaying the full citation chain. The longer the vertical axis of the ellipse, the more articles are published in the journal. The longer the horizontal axis of the ellipse, the more authors have contributed to the journal.

4. Discussion

From 2013 to 2023, the analysis of publication volume reveals a clear two-stage pattern before and after 2016, making 2016 a key year for the field of deep learning-based medical image processing. Although deep learning technology began to be applied as early as 2012, it did not initially receive widespread attention in medical image processing, where traditional machine learning methods such as support vector machines (SVM) and random forests ( Lehmann et al., 2007 ) were still dominant. At the same time, deep learning models require powerful computing capability and large amounts of training data ( Ren et al., 2022 ). Before 2016, high-performance computers were very expensive, which hindered large-scale research, and large-scale medical image datasets were relatively scarce, so research in this field was constrained by both computing capability and dataset limitations. In 2016, however, deep learning technology achieved breakthroughs in computer vision, including image classification, object detection, and segmentation, providing more advanced and efficient solutions for medical image processing ( Girshick et al., 2016 ; Madabhushi and Lee, 2016 ). These breakthroughs accelerated research in this field, and publication volume increased year by year.

From the analysis of authors, it can be seen that research on deep learning in medical image processing is relatively scattered, and large-scale cooperative teams have not formed. This may be because deep learning research requires substantial computing resources and data, and therefore a strong background in mathematics and computer science; at the same time, the application of deep learning in medicine is interdisciplinary and also requires participants with medical backgrounds. Individuals with both backgrounds are relatively few, making it difficult to form large-scale research teams. In addition, researchers in this field may focus more on personal research achievements than on collaborating with others. This does not necessarily mean that researchers lack a spirit of cooperation; rather, it reflects the research characteristics and preferences of this field.

The institutional analysis mainly reflects two characteristics. First, broad cooperation between institutions is anchored by institutions with high publication volume and high centrality. In the field of medical image processing, such institutions often have strong collaborative abilities and influence, which attract other institutions to cooperate with them, whereas institutions with low publication volume and no centrality may collaborate less due to a lack of resources or opportunities. Second, publication volume does not entirely determine centrality: smaller institutions may sometimes receive high attention and recognition for their unique research contributions or research directions ( Wuchty et al., 2007 ; Lariviere and Gingras, 2010 ). Institutional centrality is therefore related not only to publication volume but also to the depth, breadth, and degree of innovation of the research. Overall, these are internationally renowned research institutions with broad disciplinary coverage and strong research capabilities; they have high centrality in medical image processing and are important research institutions in this field, and the frequent collaboration and communication among them jointly promote its development. These institutions are distributed globally, across countries and regions such as China, the United States, Germany, and the United Kingdom, showing the international character of the field. Among them, the United States has the largest number of institutions, occupying two of the top three positions, indicating strong strength and influence in medical image processing.
In addition, these institutions include universities, hospitals, and research institutes, demonstrating the interdisciplinary nature of the field of medical image processing. These institutions also often collaborate and communicate with each other, jointly promoting the research progress in this field.

In the country analysis, there are three main situations: some countries have both a large number of publications and high centrality; some have few publications but high centrality; and some have many publications but low centrality. This indicates that deep learning in medical image processing is a global research hotspot, with countries publishing high-quality papers in this field and maintaining close collaborative relationships. Countries with many publications typically have strong research capabilities and play a leading role in the field, and their high centrality also indicates an important role in collaborative relationships. Countries with relatively few publications but high centrality may have unique contributions in specific research directions or technologies ( Lee et al., 2018 ), or close relationships with other countries in this field. Countries with many publications but low centrality may produce research and papers of relatively lower quality, or may have relatively few collaborative relationships with other countries.

According to the keyword analysis, the core concepts in highly cited papers in medical image processing are “deep learning” and “machine learning.” In terms of applications, the keywords emphasize COVID-19 diagnosis, image segmentation, and classification, while highlighting the significance of neural networks and convolutional neural networks. Additionally, the centrality-ranked keywords underscore the relevance of algorithms associated with deep learning and reiterate key themes in medical image processing, such as “cancer” and “MRI.” Overall, these keywords reflect the diverse applications of deep learning in medical image processing and the importance of algorithms.

From the clusters of keywords, these clusters can be grouped into four main domains, reflecting diverse applications of deep learning in medical image processing. The first group focuses on medical image processing and diseases, encompassing biomarkers, the detection, and diagnosis of specific diseases such as breast cancer and COVID-19 ( Chougrad et al., 2018 ; Altan and Karasu, 2020 ). The second group concentrates on image processing and computer vision, including image restoration, annotation, and change detection ( Zhang et al., 2016 ; Kumar et al., 2017 ; Tatsugami et al., 2019 ) to enhance the quality and analysis of medical images. The third group emphasizes data analysis and information processing, encompassing feature learning, bioinformatics, and information extraction ( Min et al., 2017 ; Chen et al., 2021 ; Hang et al., 2022 ), aiding in the extraction of valuable information from medical images. Lastly, the fourth group centers on neuroscience and medical imaging, studying brain networks and ultrasound images ( Kawahara et al., 2017 ; Ragab et al., 2022 ), highlighting the importance of deep learning in understanding and analyzing biomedical images for studying the nervous system and organs.

From the analysis of burst keywords, the evolution of these keywords reflects the changing trends and focal points in the field of deep learning in medical image processing. In 2015, the keyword “image” dominated, signifying an initial emphasis on basic image processing and analysis to acquire fundamental image information. By 2016, terms like “feature,” “accuracy,” “algorithm,” and “machine learning” ( Shin et al., 2016 ; Zhang et al., 2016 ; Jin et al., 2017 ; Lee et al., 2017 ; Zhang et al., 2018 ) were introduced, indicating a growing interest in feature extraction, algorithm optimization, accuracy, and machine learning methods, highlighting the shift toward higher-level analysis and precision in medical image processing. In 2017, terms like “diabetic retinopathy,” “classification,” and “computer-aided detection” ( Zhang et al., 2016 ; Lee et al., 2017 ; Quellec et al., 2017 ; Setio et al., 2017 ) were added, underlining an increased interest in disease-specific diagnoses (e.g., diabetic retinopathy) and computer-assisted detection of medical images. The year 2020 saw the emergence of “COVID-19,” “pneumonia,” “lung,” “coronavirus,” “transfer learning,” and “x-ray” ( Minaee et al., 2020 ) due to the urgent demand for analyzing lung diseases and infectious disease detection, prompted by the COVID-19 pandemic. Additionally, “transfer learning” reflected the trend of utilizing pre-existing deep learning models for medical image data. In 2021, keywords such as “feature extraction,” “framework,” and “image segmentation” ( Dhiman et al., 2021 ; Sinha and Dolz, 2021 ; Chen et al., 2022 ) became prominent, indicating a deeper exploration of feature extraction, analysis frameworks, and image segmentation to enhance the accuracy and efficiency of medical image processing. 
Overall, these changes illustrate the ongoing development in the field of medical image processing, evolving from basic image processing toward more precise feature extraction, disease diagnosis, lesion segmentation, and addressing the needs arising from disease outbreaks. This underscores the widespread application and continual evolution of deep learning in the medical domain.

Based on the analysis of reference citations, it is evident that these 10 highly cited papers cover significant research in the field of deep learning applied to medical image processing. They share a common emphasis on the outstanding performance of deep Convolutional Neural Networks (CNNs) in tasks such as image classification, skin cancer classification, and medical image segmentation. They explore the effectiveness of applying deep residual learning in large-scale image recognition and medical image analysis ( He et al., 2016 ). The introduction of the U-Net, a convolutional network architecture suitable for biomedical image segmentation, is another key aspect ( Ronneberger et al., 2015 ). Additionally, they develop deep learning algorithms for detecting diabetic retinopathy in retinal fundus photographs ( Gulshan et al., 2016 ). They also provide a review of deep learning in medical image analysis, summarizing the trends in related research ( LeCun et al., 2015 ; Rusk, 2016 ). However, these papers also exhibit some differences. Some focus on specific tasks like skin cancer classification and diabetic retinopathy detection, some concentrate on proposing new network structures (such as ResNet, U-Net, etc.) to enhance the performance of medical image processing, while others provide overviews and summaries of the overall application of deep learning in medical image processing. Overall, these papers collectively drive the advancement of deep learning in the field of medical image processing, achieving significant research outcomes through the introduction of new network architectures, effective algorithms, and their application to specific medical image tasks.

From the analysis of co-cited journals, it can be observed that these journals collectively highlight the important features of research in medical image processing. First, they emphasize areas such as computer vision, image processing, and pattern recognition, which are closely related to medical image processing. Moreover, journals and conferences led by IEEE, such as IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Medical Imaging, and the IEEE Winter Conference on Applications of Computer Vision, hold significant influence in computer vision and pattern recognition, reflecting IEEE’s leadership in the domain of medical image processing. These journals span multiple fields, including computer science, medicine, and the natural sciences, underscoring the interdisciplinary nature of medical image processing research. Open-access publishing platforms like arXiv and Scientific Reports underscore the importance of open access and information sharing in the field. Additionally, specialized journals like Medical Image Analysis and Radiology play pivotal roles in research on medical image processing, and the comprehensive journal Nature covers a wide range of scientific disciplines, potentially including research related to medical image processing. In summary, these journals form a comprehensive research network covering various academic disciplines in medical image processing, emphasizing the significance of open access and information sharing. They also highlight the crucial role of deep learning and neural network technologies in medical image processing, as well as the importance of image processing, analysis, and diagnosis.

From the analysis of dual-map overlap of journals, it can be observed that a particularly noteworthy citation relationship is the reference of computer science, biology, and medicine to mathematics. Computer science research has a strong connection to mathematics, as mathematical methods and algorithms are the foundation of computer science, while the development of computers and information technology provides a broader range of applications for mathematical research ( Domingos, 2012 ). Molecular biology and genetics are important branches of biological research, where mathematical methods are widely applied, such as for analyzing gene sequences and molecular structures, and studying interactions between molecules ( Jerber et al., 2021 ). Medicine is a field related to human health, where mathematical methods also have many applications, such as for statistical analysis of clinical trial results, predicting disease risk, and optimizing the allocation of medical resources ( Gong and Tang, 2020 ; Wang et al., 2021 ).

From our perspective, the future development of deep learning in the field of medical image processing can be summarized as follows. First, with the widespread application of deep learning models in medical image processing, the design and development of more efficient and lightweight network architectures will become necessary. This can improve the speed and portability of the model, making it possible for these models to run effectively in resource-limited environments such as mobile devices ( Ghimire et al., 2022 ). Second, traditional deep learning methods usually require a large amount of labeled data for training, while in the field of medical image processing, labeled data is often difficult to obtain. Therefore, weakly supervised learning will become an important research direction to improve the model’s performance using a small amount of labeled data and a large amount of unlabeled data. This includes the application of techniques such as semi-supervised learning, transfer learning, and generative adversarial networks ( Ren et al., 2023 ). Third, medical image processing involves different types of data such as CT scans, MRI, X-rays, and biomarkers. Therefore, multimodal fusion will become an important research direction to organically combine information from different modalities and provide more comprehensive and accurate medical image analysis results. Deep learning methods can be used to learn the correlations between multimodal data and perform feature extraction and fusion across modalities ( Saleh et al., 2023 ). Finally, deep learning models are typically black boxes, and their decision-making process is difficult to explain and understand. In medical image processing, the interpretability and reliability of the decision-making process are crucial. 
Therefore, researchers will focus on developing interpretable deep learning methods to enhance physicians’ and clinical experts’ trust in the model’s results and provide explanations for the decision-making process ( Chaddad et al., 2023 ).

In conclusion, deep learning is becoming increasingly important in the field of medical image processing, with many active authors, institutions, and countries. Among the highly cited papers in this field in the Web of Science Core Collection, Pheng-Ann Heng, Hao Chen, and Dinggang Shen have published a relatively large number of papers. China has the most research institutions in this field, including the Chinese Academy of Sciences, the University of Chinese Academy of Sciences, The Chinese University of Hong Kong, Zhejiang University, and Shanghai Jiao Tong University. The United States ranks second in the number of institutions, including Stanford University, Harvard Medical School, and Massachusetts General Hospital. Germany and the United Kingdom have relatively few institutions in this field. The number of publications from the United States far exceeds that of other countries, with China in second place. The United Kingdom, Germany, Canada, Australia, and India have relatively high paper counts, while the Netherlands and France have relatively low ones, and South Korea’s development and publication output in medical image processing are relatively low. Currently, research in this field mainly focuses on deep learning, convolutional neural networks, classification, diagnosis, segmentation, algorithms, artificial intelligence, and other aspects, and the research focus and trends are gradually moving toward more complex and systematic directions. Deep learning technology will continue to play an important role in this field.

This study has certain limitations. Firstly, we only selected highly cited papers from the Web of Science Core Collection as our analysis material, which means that we may have missed some highly cited papers from other databases and our analysis may not be comprehensive for the entire Web of Science. However, given the limitations of bibliometric software, it is difficult to merge and analyze various databases. Additionally, the reasons why we chose highly cited papers from the Web of Science Core Collection as our analysis material have been explained in the section “Introduction.” Secondly, we may have overlooked some important non-English papers, leading to research bias.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

Author contributions

BC: Writing – original draft. JJ: Writing – review & editing. HL: Writing – review & editing. ZY: Writing – review & editing. HZ: Writing – review & editing. YW: Writing – review & editing. JL: Writing – original draft. SW: Writing – original draft. SC: Writing – original draft.

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work is supported by the National Natural Science Foundation of China (Grant No. 81973924) and Special Financial Subsidies of Fujian Province, China (Grant No. X2021003—Special financial).


We would like to thank Chaomei Chen for developing this visual analysis software.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


CNNs, Convolutional neural networks; CPUs, Central processing units; GPUs, Graphics processing units; TPUs, Tensor processing units; ASPP, Atrous spatial pyramid pooling.

Altan, A., and Karasu, S. (2020). Recognition of Covid-19 disease from X-Ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique. Chaos, Solitons Fractals 140:110071. doi: 10.1016/j.chaos.2020.110071

Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., et al. (2021). Review of deep learning: concepts, cnn architectures, challenges, applications, future directions. J. Big Data 8:53. doi: 10.1186/s40537-021-00444-8

Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., and Mougiakakou, S. (2016). Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans. Med. Imaging 35, 1207–1216. doi: 10.1109/TMI.2016.2535865

Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. doi: 10.1109/TPAMI.2013.50

Chaddad, A., Peng, J. H., Xu, J., and Bouridane, A. (2023). Survey of explainable AI techniques in healthcare. Sensors 23:634. doi: 10.3390/s23020634

Chen, C. (2005). “The centrality of pivotal points in the evolution of scientific networks” in Proceedings of the 10th international conference on Intelligent user interfaces ; San Diego, California, USA: Association for Computing Machinery. p. 98–105.

Chen, C. M. (2006). Citespace II: detecting and visualizing emerging trends and transient patterns in scientific literature. J. Am. Soc. Inf. Sci. Technol. 57, 359–377. doi: 10.1002/asi.20317

Chen, R. J., Lu, M. Y., Wang, J. W., Williamson, D. F. K., Rodig, S. J., Lindeman, N. I., et al. (2022). Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging 41, 757–770. doi: 10.1109/TMI.2020.3021387

Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2018). Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848. doi: 10.1109/TPAMI.2017.2699184

Chen, M., Shi, X. B., Zhang, Y., Wu, D., and Guizani, M. (2021). Deep feature learning for medical image analysis with convolutional autoencoder neural network. IEEE Trans. Big Data 7, 750–758. doi: 10.1109/TBDATA.2017.2717439

Chougrad, H., Zouaki, H., and Alheyane, O. (2018). Deep convolutional neural networks for breast cancer screening. Comput. Methods Prog. Biomed. 157, 19–30. doi: 10.1016/j.cmpb.2018.01.011

Ciresan, D. C., Giusti, A., Gambardella, L. M., and Schmidhuber, J. (2013). Mitosis detection in breast cancer histology images with deep neural networks. Med. Image Comput. Comput. Assist. Intervent. 16, 411–418. doi: 10.1007/978-3-642-40763-5_51

Dessy, R. E. (1976). Microprocessors?—an end user's view. Science (New York, N.Y.) 192, 511–518. doi: 10.1126/science.1257787

Dhiman, G., Kumar, V. V., Kaur, A., and Sharma, A. (2021). DON: deep learning and optimization-based framework for detection of novel coronavirus disease using X-ray images. Interdiscip. Sci. 13, 260–272. doi: 10.1007/s12539-021-00418-7

Domingos, P. (2012). A few useful things to know about machine learning. Commun. ACM 55, 78–87. doi: 10.1145/2347736.2347755

Elnaggar, A., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Y., Jones, L., et al. (2022). Prottrans: toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127. doi: 10.1109/TPAMI.2021.3095381

Freeman, L. C. (1977). A set of measures of centrality based on betweenness. Sociometry 40, 35–41. doi: 10.2307/3033543

Ghimire, D., Kil, D., and Kim, S. H. (2022). A survey on efficient convolutional neural networks and hardware acceleration. Electronics 11:945. doi: 10.3390/electronics11060945

Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2016). Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38, 142–158. doi: 10.1109/TPAMI.2015.2437384

Glasser, O. W. C. (1995). Roentgen and the Discovery of the Roentgen Rays. AJR Am. J. Roentgenol. 165, 1033–1040. doi: 10.2214/ajr.165.5.7572472

Gong, F., and Tang, S. (2020). Internet intervention system for elderly hypertensive patients based on hospital community family edge network and personal medical resources optimization. J. Med. Syst. 44:95. doi: 10.1007/s10916-020-01554-1

Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D., Narayanaswamy, A., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. J. Am. Med. Assoc. 316, 2402–2410. doi: 10.1001/jama.2016.17216

Han, Z., Yu, S., Lin, S.-B., and Zhou, D.-X. (2022). Depth selection for deep relu nets in feature extraction and generalization. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1853–1868. doi: 10.1109/TPAMI.2020.3032422

Hang, R. L., Qian, X. W., and Liu, Q. S. (2022). Cross-modality contrastive learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–12. doi: 10.1109/TGRS.2022.3188529

He, K, Zhang, X, Ren, S, and Sun, J (eds.) (2016). “Deep Residual Learning for Image Recognition” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ; June 27-30, 2016.

Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554. doi: 10.1162/neco.2006.18.7.1527

Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H., and Aerts, H. J. W. L. (2018). Artificial intelligence in radiology. Nat. Rev. Cancer 18, 500–510. doi: 10.1038/s41568-018-0016-5

Hu, K., Zhao, L., Feng, S., Zhang, S., Zhou, Q., Gao, X., et al. (2022). Colorectal polyp region extraction using saliency detection network with neutrosophic enhancement. Comput. Biol. Med. 147:105760. doi: 10.1016/j.compbiomed.2022.105760

Jerber, J., Seaton, D. D., Cuomo, A. S. E., Kumasaka, N., Haldane, J., Steer, J., et al. (2021). Population-scale single-cell RNA-Seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53:304. doi: 10.1038/s41588-021-00801-6

Jin, K. H., McCann, M. T., Froustey, E., and Unser, M. (2017). Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 26, 4509–4522. doi: 10.1109/TIP.2017.2713099

Kawahara, J., Brown, C. J., Miller, S. P., Booth, B. G., Chau, V., Grunau, R. E., et al. (2017). Brainnetcnn: convolutional neural networks for brain networks; toward predicting neurodevelopment. NeuroImage 146, 1038–1049. doi: 10.1016/j.neuroimage.2016.09.046

Kerr, M. V., Bryden, P., and Nguyen, E. T. (2022). Diagnostic imaging and mechanical objectivity in medicine. Acad. Radiol. 29, 409–412. doi: 10.1016/j.acra.2020.12.017

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). Imagenet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. doi: 10.1145/3065386

Kumar, N., Verma, R., Sharma, S., Bhargava, S., Vahadane, A., and Sethi, A. (2017). A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Trans. Med. Imaging 36, 1550–1560. doi: 10.1109/TMI.2017.2677499

Lariviere, V., and Gingras, Y. (2010). The impact factor's matthew effect: a natural experiment in bibliometrics. J. Am. Soc. Inf. Sci. Technol. 61, 424–427. doi: 10.1002/asi.21232

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444. doi: 10.1038/nature14539

Lee, H., Tajmir, S., Lee, J., Zissen, M., Yeshiwas, B. A., Alkasab, T. K., et al. (2017). Fully automated deep learning system for bone age assessment. J. Digit. Imaging 30, 427–441. doi: 10.1007/s10278-017-9955-8

Lee, D., Yoo, J., Tak, S., and Ye, J. C. (2018). Deep residual learning for accelerated MRI using magnitude and phase networks. IEEE Trans. Biomed. Eng. 65, 1985–1995. doi: 10.1109/TBME.2018.2821699

Lehmann, C., Koenig, T., Jelic, V., Prichep, L., John, R. E., Wahlund, L.-O., et al. (2007). Application and comparison of classification algorithms for recognition of alzheimer's disease in electrical brain activity (EEG). J. Neurosci. Methods 161, 342–350. doi: 10.1016/j.jneumeth.2006.10.023

Lin, H., Wang, C., Cui, L., Sun, Y., Xu, C., and Yu, F. (2022). Brain-like initial-boosted hyperchaos and application in biomedical image encryption. IEEE Trans. Industr. Inform. 18, 8839–8850. doi: 10.1109/TII.2022.3155599

Madabhushi, A., and Lee, G. (2016). Image analysis and machine learning in digital pathology: challenges and opportunities. Med. Image Anal. 33, 170–175. doi: 10.1016/j.media.2016.06.037

McCulloch, W. S., and Pitts, W. (1990). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 52, 99–115. doi: 10.1016/S0092-8240(05)80006-0

Min, S., Lee, B., and Yoon, S. (2017). Deep learning in bioinformatics. Brief. Bioinform. 18, 851–869. doi: 10.1093/bib/bbw068

Minaee, S., Boykov, Y. Y., Porikli, F., Plaza, A. J., Kehtarnavaz, N., and Terzopoulos, D. (2022). Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 3523–3542. doi: 10.1109/TPAMI.2021.3059968

Minaee, S., Kafieh, R., Sonka, M., Yazdani, S., and Soufi, G. J. (2020). Deep-covid: predicting covid-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 65:101794. doi: 10.1016/j.media.2020.101794

Misra, N. N., Dixit, Y., Al-Mallahi, A., Bhullar, M. S., Upadhyay, R., and Martynenko, A. (2022). Iot, big data, and artificial intelligence in agriculture and food industry. IEEE Internet Things J. 9, 6305–6324. doi: 10.1109/JIOT.2020.2998584

Narin, A., Kaya, C., and Pamuk, Z. (2021). Automatic detection of coronavirus disease (Covid-19) using X-ray images and deep convolutional neural networks. Pattern. Anal. Applic. 24, 1207–1220. doi: 10.1007/s10044-021-00984-y

Quellec, G., Charriére, K., Boudi, Y., Cochener, B., and Lamard, M. (2017). Deep image mining for diabetic retinopathy screening. Med. Image Anal. 39, 178–193. doi: 10.1016/j.media.2017.04.012

Ragab, M., Albukhari, A., Alyami, J., and Mansour, R. F. (2022). Ensemble Deep-Learning-Enabled Clinical Decision Support System for Breast Cancer Diagnosis and Classification on Ultrasound Images. Biology 11:439. doi: 10.3390/biology11030439

Ren, Z. Y., Wang, S. H., and Zhang, Y. D. (2023). Weakly supervised machine learning. Caai Transact. Intellig. Technol. 8, 549–580. doi: 10.1049/cit2.12216

Ren, P., Xiao, Y., Chang, X., Huang, P.-Y., Li, Z., Gupta, B. B., et al. (2022). A survey of deep active learning. ACM Comput. Surv. 54, 1–40. doi: 10.1145/3472291

Ronneberger, O, Fischer, P, and Brox, T (eds.) (2015). “U-Net: Convolutional Networks for Biomedical Image Segmentation” in International Conference on Medical Image Computing and Computer-Assisted Intervention .

Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408. doi: 10.1037/h0042519

Rusk, N. (2016). Deep learning. Nat. Methods 13:35. doi: 10.1038/nmeth.3707

Saleh, M. A., Ali, A. A., Ahmed, K., and Sarhan, A. M. (2023). A brief analysis of multimodal medical image fusion techniques. Electronics 12:97. doi: 10.3390/electronics12010097

Schoenbach, U. H., and Garfield, E. (1956). Citation indexes for science. Science (New York, N.Y.) 123, 61–62. doi: 10.1126/science.123.3185.61.b

Setio, A. A. A., Traverso, A., de Bel, T., Berens, M. S. N., van den Bogaard, C., Cerello, P., et al. (2017). Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the Luna16 challenge. Med. Image Anal. 42, 1–13. doi: 10.1016/j.media.2017.06.015

Shin, H. C., Roth, H. R., Gao, M. C., Lu, L., Xu, Z. Y., Nogues, I., et al. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35, 1285–1298. doi: 10.1109/TMI.2016.2528162

Sinha, A., and Dolz, J. (2021). Multi-scale self-guided attention for medical image segmentation. IEEE J. Biomed. Health Inform. 25, 121–130. doi: 10.1109/JBHI.2020.2986926

Tatsugami, F., Higaki, T., Nakamura, Y., Yu, Z., Zhou, J., Lu, Y. J., et al. (2019). Deep learning-based image restoration algorithm for coronary CT angiography. Eur. Radiol. 29, 5322–5329. doi: 10.1007/s00330-019-06183-y

Wang, S., Zhang, Y., and Yao, X. (2021). Research on spatial unbalance and influencing factors of ecological well-being performance in China. Int. J. Environ. Res. Public Health 18:9299. doi: 10.3390/ijerph18179299

Wuchty, S., Jones, B. F., and Uzzi, B. (2007). The increasing dominance of teams in production of knowledge. Science 316, 1036–1039. doi: 10.1126/science.1136099

Yin, L., Zhang, C., Wang, Y., Gao, F., Yu, J., and Cheng, L. (2021). Emotional deep learning programming controller for automatic voltage control of power systems. IEEE Access 9, 31880–31891. doi: 10.1109/ACCESS.2021.3060620

Zhang, J., Gajjala, S., Agrawal, P., Tison, G. H., Hallock, L. A., Beussink-Nelson, L., et al. (2018). Fully automated echocardiogram interpretation in clinical practice: feasibility and diagnostic accuracy. Circulation 138, 1623–1635. doi: 10.1161/CIRCULATIONAHA.118.034338

Zhang, P. Z., Gong, M. G., Su, L. Z., Liu, J., and Li, Z. Z. (2016). Change detection based on deep feature representation and mapping transformation for multi-spatial-resolution remote sensing images. ISPRS-J Photogramm Remote Sens 116, 24–41. doi: 10.1016/j.isprsjprs.2016.02.013

Keywords: deep learning, medical images, bibliometric analysis, CiteSpace, trends, hotspots

Citation: Chen B, Jin J, Liu H, Yang Z, Zhu H, Wang Y, Lin J, Wang S and Chen S (2023) Trends and hotspots in research on medical images with deep learning: a bibliometric analysis from 2013 to 2023. Front. Artif. Intell . 6:1289669. doi: 10.3389/frai.2023.1289669

Received: 06 September 2023; Accepted: 27 October 2023; Published: 09 November 2023.

Copyright © 2023 Chen, Jin, Liu, Yang, Zhu, Wang, Lin, Wang and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jianping Lin, [email protected] ; Shizhong Wang, [email protected] ; Shaoqing Chen, [email protected]

† These authors have contributed equally to this work and share first authorship


Biomedical Signal and Image Processing with Artificial Intelligence

  • Chirag Paunwala,
  • Mita Paunwala,
  • Rahul Kher,
  • Falgun Thakkar,
  • Heena Kher,
  • Mohammed Atiquzzaman,
  • Norliza Mohd. Noor

Electronics & Communication Engineering, Sarvajanik College of Engineering and Technology, Surat, India

Electronics & Communication Engineering, C. K. Pithawala College of Engineering and Technology, Surat, India

Electronics & Communication Engineering, G. H. Patel College of Engineering & Technology, Vallabh Vidyanagar, India

A. D. Patel Institute of Technology, New Vallabh Vidyanagar, India

School of Computer Science, University of Oklahoma, Norman, USA

UTM Razak School, Menara Razak, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia

Part of the book series: EAI/Springer Innovations in Communication and Computing (EAISICC)


Table of contents (20 chapters)

Voice Privacy in Biometrics

  • Priyanka Gupta, Shrishti Singh, Gauri P. Prajapati, Hemant A. Patil

Histopathology Whole Slide Image Analysis for Breast Cancer Detection

  • Pushap Deep Singh, Arnav Bhavsar, K. K. Harinarayanan

Lung Classification for COVID-19

  • Norliza Mohd. Noor, Muhammad Samer Sallam

GRU-Based Parameter-Efficient Epileptic Seizure Detection

  • Ojas A. Ramwala, Chirag N. Paunwala, Mita C. Paunwala

An Object Aware Hybrid U-Net for Breast Tumour Annotation

  • Suvidha Tripathi, Satish Kumar Singh

VLSI Implementation of sEMG Based Classification for Muscle Activity Control

  • Amit M. Joshi, Natasha Singh, Sri Teja

Content-Based Image Retrieval Techniques and Their Applications in Medical Science

  • Mayank R. Kapadia, Chirag N. Paunwala

Data Analytics on Medical Images with Deep Learning Approach

  • S. Saravanan, K. Surendheran, K. Krishnakumar

Analysis and Classification Dysarthric Speech

  • Siddhant Gupta, Hemant A. Patil

Skin Cancer Detection and Classification Using DWT-GLCM with Probabilistic Neural Networks

  • J. Pandu, Umadevi Kudtala, B. Prabhakar

Manufacturing of Medical Devices Using Artificial Intelligence-Based Troubleshooters

  • Akbar Doctor

Enhanced Hierarchical Prediction for Lossless Medical Image Compression in the Field of Telemedicine Application

  • Ketki C. Pathak, Jignesh N. Sarvaiya, Anand D. Darji

LBP-Based CAD System Designs for Breast Tumor Characterization

  • Kriti, Jitendra Virmani, Ravinder Agarwal

Detection of Fetal Abnormality Using ANN Techniques

  • Vidhi Rawat, Vibhakar Shrimali, Alok Jain, Abhishek Rawat

Machine Learning and Deep Learning-Based Framework for Detection and Classification of Diabetic Retinopathy

  • V. Purna Chandra Reddy, Kiran Kumar Gurrala

Applications of Artificial Intelligence in Medical Images Analysis

  • Pushpanjali Gupta, Prasan Kumar Sahoo

Intelligent Image Segmentation Methods Using Deep Convolutional Neural Network

  • Mekhla Sarkar, Prasan Kumar Sahoo

Artificial Intelligence Assisted Cardiac Signal Analysis for Heart Disease Prediction

  • Prasan Kumar Sahoo, Sulagna Mohapatra, Hiren Kumar Thakkar

Early Lung Cancer Detection by Using Artificial Intelligence System

  • Fatma Taher

This book focuses on advanced techniques used for feature extraction, analysis, recognition, and classification in the area of biomedical signal and image processing. Contributions cover all aspects of artificial intelligence, machine learning, and deep learning in the field of biomedical signal and image processing using novel and unexplored techniques and methodologies. The book covers recent developments in both medical images and signals analyzed by artificial intelligence techniques. The authors also cover topics related to development based on artificial intelligence, including machine learning, neural networks, and deep learning. This book will provide a platform for researchers who are working in the area of artificial intelligence for biomedical applications.

  • Provides insights into medical signal and image analysis using artificial intelligence;
  • Includes novel and recent trends of decision support system for medical research;
  • Outlines employment of evolutionary algorithms for biomedical data, big data analysis for medical databases, and reliability, opportunities, and challenges in clinical data.
  • Biomedical Image Processing with Artificial Intelligence
  • Biomedical Signal Processing with Artificial Intelligence
  • Artificial Intelligence in Medical Images Analysis
  • Decision Making Biomedical Support Systems
  • Ultrasound Image Analysis using AI

Electronics & Communication Engineering, Sarvajanik College of Engineering and Technology, Surat, India

Chirag Paunwala

Electronics & Communication Engineering, C. K. Pithawala College of Engineering and Technology, Surat, India

Mita Paunwala

Electronics & Communication Engineering, G. H. Patel College of Engineering & Technology, Vallabh Vidyanagar, India

Rahul Kher, Falgun Thakkar

Mohammed Atiquzzaman

Norliza Mohd. Noor

Dr. Chirag N. Paunwala is a Professor in the EC Department and Dean of R&D at Sarvajanik College of Engineering and Technology, Surat. His research interests include image processing, pattern recognition, deep learning, and medical signal processing. He has published more than 60 research publications in reputed conferences, journals, and book chapters. He is the first recipient of the Regional Meritorious Service Award from the IEEE Signal Processing Society, USA, in 2017. He also served as Chairman of the SPS Chapter, Gujarat Section, during which tenure the chapter won the "Best Chapter Award" three consecutive times, and was Chapter Chair Coordinator for IEEE SPS, USA, for 2019. Currently he volunteers as Vice-Chair for the IEEE Gujarat Section. He is a reviewer for many reputed journals published by IEEE, Elsevier, and Springer, and has served as Technical Program Chair for the signal and image processing tracks of conferences such as INDICON, TENSYMP, and TENCON.

Dr. Mita Paunwala received her B.E. (Electronics) from Sarvajanik College of Engineering and Technology, Surat, in 1999, her M.Tech. (Communication Systems) from Sardar Vallabhbhai National Institute of Technology, Surat, in 2008, and her Ph.D. from NIT Surat, India, in 2014. She is an Associate Professor in the Electronics and Communication Engineering Department, CKPCET, Surat, India, with over 20 years of teaching and research experience. Her areas of interest are image, video, and signal processing, pattern recognition, machine learning, deep learning, and healthcare systems. She has published more than 25 research papers in renowned conferences, journals, and books. She served as Vice-Chair of the IEEE Signal Processing Society, Gujarat Chapter, from 2019 to 2021, and has reviewed many papers for renowned journals from IET, Springer, Elsevier, IEEE Access, and others.

Dr. Rahul Kher received his B.E. (Electronics) from Sardar Patel University in 1997, his M.Tech. (Electrical Engineering) from the Indian Institute of Technology, Roorkee, in 2006, and his Ph.D. (Electronics & Communication Engineering) from Sardar Patel University in 2014. He has over 21 years of teaching and research experience. His research interests include biomedical signal and image processing, medical image analysis, and healthcare monitoring systems. He has published four books and more than 70 research papers in international journals and conferences. He is a Senior Member of IEEE and was the founding Secretary of the Signal Processing Society (SPS) Chapter of the IEEE Gujarat Section during 2013-2015. He has served on the reviewer panels/TPCs of many international journals and conferences, including the IEEE Communications Society Magazine, Biomedical Signal Processing and Control (Elsevier), Computer Networks (Elsevier), International Journal of Advanced Intelligence Paradigms (Inderscience), Biomedical Engineering: Applications, Basis and Communications (World Scientific), the 1st Global IoT Summit (GIoTS 2017, Geneva, Switzerland), the 3rd Global IoT Innovation Forum (Barcelona, Spain), the 3rd Annual International Conference on Wireless Communications and Sensor Networks (WCSN 2016), the 2016 IEEE World Forum on Internet of Things (WF-IoT, Virginia, US), and many more. He has visited Japan, the USA, and the UK for various academic purposes.

Dr. Falgun Thakkar obtained his Ph.D. from the National Institute of Technology Allahabad in February 2018. He graduated from Birla Vishvakarma Mahavidyalaya (BVM) in 2004 and completed his Master of Engineering in Communication from GCET, S P University, V V Nagar, in 2010. He has published more than 25 research articles in international and national journals and conferences, has served as a reviewer for many international journals and conferences of repute, and has published a book on compressed-sensing-based ECG signal compression. His areas of interest include antenna design, HF transmission lines, microwave engineering, wavelet-based image and signal processing, medical image security, compressive sensing, and optimization techniques such as PSO and GA. He has guided more than 5 M.E. students in their dissertations as well as more than 10 B.E. student projects. At present he is guiding six Ph.D. students in the domains of microwave antenna design and medical image processing.

Dr. Heena Kher received her B.E. in Instrumentation and Control from Sarvajanik College of Engineering and Technology, Surat, in 2001, her M.E. in Microprocessor Systems and Applications from M. S. University, Baroda, in 2006, and her Ph.D. from Sardar Patel University, Vallabh Vidyanagar, in 2014. She is an Assistant Professor at A. D. Patel Institute of Technology, New Vallabh Vidyanagar, with 18 years of teaching experience. Her areas of interest are digital image processing, machine learning, deep learning, biomedical signal processing, and optimization techniques. She has guided 10 M.E. dissertations and has served as a DPC member for 3 Ph.D. students. She has presented or published 30 research papers in international and national conferences and journals, and is a reviewer for many reputed journals.

Dr. Mohammed Atiquzzaman ’s research interests and publications are in next generation computer networks, wireless and mobile networks, satellite networks, switching and routing, optical communications and multimedia over networks. Many of the current research activities are supported by the National Science Foundation (NSF), National Aeronautics and Space Administration (NASA) and the U.S. Air Force. He served as the Editor-in-Chief of Journal of Network and Computer Applications, Editor-in-Chief of the Vehicular Communications journal and associate editor of IEEE Communications Magazine, Journal of Wireless and Optical Communications, International Journal of Communication Systems, International Journal of Sensor Networks, International Journal of Communication Networks and Distributed Systems, and Journal of Real-Time Image Processing.

Dr. Norliza Mohd Noor is currently an Associate Professor in the UTM Razak School of Engineering and Advanced Technology, Universiti Teknologi Malaysia (UTM), Kuala Lumpur Campus. She received her B.Sc. in Electrical Engineering from Texas Tech University in Lubbock, Texas, and her Master of Electrical Engineering (by research) and Ph.D. (Electrical Engineering) from UTM. Her research area is image processing and image analysis, and her current work concentrates on medical image analysis for lung diseases. She has published many papers in journals and indexed conference proceedings, one academic book, and two book chapters. Currently she is the Head of the Electrophysiology Research Group at UTM Razak School.

Book Title : Biomedical Signal and Image Processing with Artificial Intelligence

Editors : Chirag Paunwala, Mita Paunwala, Rahul Kher, Falgun Thakkar, Heena Kher, Mohammed Atiquzzaman, Norliza Mohd. Noor

Series Title : EAI/Springer Innovations in Communication and Computing

DOI : https://doi.org/10.1007/978-3-031-15816-2

Publisher : Springer Cham

eBook Packages : Engineering , Engineering (R0)

Copyright Information : The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

Hardcover ISBN : 978-3-031-15815-5 Published: 10 January 2023

Softcover ISBN : 978-3-031-15818-6 Published: 11 January 2024

eBook ISBN : 978-3-031-15816-2 Published: 09 January 2023

Series ISSN : 2522-8595

Series E-ISSN : 2522-8609

Edition Number : 1

Number of Pages : XIII, 419

Number of Illustrations : 43 b/w illustrations, 204 illustrations in colour

Topics : Signal, Image and Speech Processing , Computer Imaging, Vision, Pattern Recognition and Graphics , Health Informatics , Biomedical Engineering and Bioengineering


Biomedical image processing

  • PMID: 7023828

Biomedical image processing is a very broad field; it covers biomedical signal gathering, image forming, picture processing, and image display to medical diagnosis based on features extracted from images. This article reviews this topic in both its fundamentals and applications. In its fundamentals, some basic image processing techniques including outlining, deblurring, noise cleaning, filtering, search, classical analysis and texture analysis have been reviewed together with examples. The state-of-the-art image processing systems have been introduced and discussed in two categories: general purpose image processing systems and image analyzers. In order for these systems to be effective for biomedical applications, special biomedical image processing languages have to be developed. The combination of both hardware and software leads to clinical imaging devices. Two different types of clinical imaging devices have been discussed. There are radiological imagings which include radiography, thermography, ultrasound, nuclear medicine and CT. Among these, thermography is the most noninvasive but is limited in application due to the low energy of its source. X-ray CT is excellent for static anatomical images and is moving toward the measurement of dynamic function, whereas nuclear imaging is moving toward organ metabolism and ultrasound is toward tissue physical characteristics. Heart imaging is one of the most interesting and challenging research topics in biomedical image processing; current methods including the invasive-technique cineangiography, and noninvasive ultrasound, nuclear medicine, transmission, and emission CT methodologies have been reviewed. Two current federally funded research projects in heart imaging, the dynamic spatial reconstructor and the dynamic cardiac three-dimensional densitometer, should bring some fruitful results in the near future. 
Microscopic imaging techniques differ from radiological imaging techniques in that interaction between the operator and the imaging device is essential. The white blood cell analyzer has been developed to the point that it has become a daily clinical imaging device. An interactive chromosome karyotyper is being clinically evaluated, and its preliminary indications are very encouraging. Tremendous effort has been devoted to the automation of cancer cytology; it is hoped that some prototypes will be available for clinical trials very soon. Automation of histology is still in its infancy; much work remains to be done in this area. The 1970s were very fruitful in applying imaging techniques to biomedical applications, the computerized tomographic scanner and the white blood cell analyzer being the most successful imaging devices...
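Two of the basic techniques the abstract enumerates, noise cleaning and outlining, can be illustrated with a few lines of array code. The sketch below is not from the reviewed article; it is a minimal NumPy illustration, assuming a simple mean filter for noise cleaning and a thresholded gradient magnitude for outlining, applied to a synthetic test image (the function names and threshold are illustrative choices):

```python
import numpy as np

def mean_filter(img, k=3):
    """Noise cleaning: replace each pixel by the mean of its k x k neighbourhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    # Sum the k*k shifted copies of the image, then normalize.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def outline(img, threshold=0.25):
    """Outlining: mark pixels whose gradient magnitude exceeds a threshold."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy) > threshold

# Synthetic test image: a bright square on a dark background, plus Gaussian noise.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
noisy = img + rng.normal(0, 0.1, img.shape)

smoothed = mean_filter(noisy)   # noise cleaning
edges = outline(smoothed)       # outlining of the square's boundary
```

In practice a library filter (e.g. `scipy.ndimage.uniform_filter`) would replace the explicit loop, but the two-step structure — smooth first, then detect edges — is the same pattern the abstract describes for feature extraction from biomedical images.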

MeSH terms

  • Breast Neoplasms / diagnosis
  • Cineangiography / methods
  • Clinical Laboratory Techniques / methods
  • Diagnosis, Computer-Assisted
  • Echocardiography
  • Heart / diagnostic imaging
  • Image Enhancement / instrumentation
  • Image Enhancement / methods
  • Medical Laboratory Science*
  • Pneumoconiosis / diagnostic imaging
  • Radiography, Thoracic
  • Radionuclide Imaging
  • Thermography
  • Tomography, Emission-Computed
  • Tomography, X-Ray Computed
  • Ultrasonography


Merry Mani, PhD

"We're advancing understanding of the human brain through innovative imaging." — Merry Mani, PhD

Dr. Merry Mani's work focuses on the development of next generation imaging technologies, particularly those that enhance the performance of the non-invasive MRI-based imaging methods, by synergistically combining advances in hardware capabilities and advanced signal processing methods. The overarching goal is to advance the understanding of the human brain. To harness the full potential of MRI technology in studying live human brains, high throughput imaging technologies are essential. Such technology should enable imaging at high spatio-temporal resolutions to study the brain in microscopic detail as well as capture the dynamic changes such as functional activations and metabolic changes.

Dr. Mani's research addresses these needs, especially in patient populations affected by neurodegenerative and neurodevelopmental disorders. These methods are particularly crucial for in-depth exploration of conditions such as Huntington's disease, Alzheimer's disease, autism, and epilepsy. The emphasis is on unraveling the microstructural features of neurons and glial cells and characterizing their modulations in the context of neurological disorders. Dr. Mani's lab employs a multidisciplinary approach, combining biomedical imaging and biophysical modeling methodologies. This integrated approach allows for a comprehensive study of the biological processes involved in neurodegenerative and neurodevelopmental disorders.

Dr. Merry Mani's lab is committed to pushing the boundaries of our understanding of the human brain, through the development of efficient imaging methods. If you are passionate about computational biomedical imaging and analysis, consider joining our team or collaborating on exciting research projects.

  • NARSAD Young Investigator Award, Brain & Behavior Foundation
  • NRSA Post-doctoral Fellowship

Research Interests

  • Biomedical Imaging
  • Signal Processing
  • Image Reconstruction
  • Pulse sequence development
  • Machine Learning
  • Biophysical Modeling
  • Neurological disorders

Featured Grants & Projects

Fast Multi-Dimensional Diffusion MRI with Sparse Sampling and Model-Based Deep Learning Reconstruction (NIH R01EB031169)

Neurodegenerative disorders are a significant public health and economic problem: they affect about 450 million people worldwide and, according to the World Health Organization, are a leading cause of disability and ill health. The main objective of the proposal is the development, validation, and translation of a non-invasive diffusion MRI assay that enables efficient encoding of the diffusion parameter space to characterize the processes that drive the progression of neurodegeneration. We validate the framework in a Huntington's disease cohort, with the prospect of extending these studies to understand the neurodegenerative cascade across the entire class of neurodegenerative diseases, including Parkinson's and Alzheimer's.
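
The sparse-sampling idea underlying such reconstructions can be sketched generically: recover a sparse coefficient vector from far fewer measurements than unknowns. The toy example below uses iterative soft-thresholding (ISTA) on synthetic data; all dimensions and parameters are illustrative assumptions, and it stands in for, rather than reproduces, the grant's model-based deep learning reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse ground-truth vector and a random undersampled measurement
# matrix (a stand-in for a sparse q-space sampling operator).
n, m, k = 100, 40, 4
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x_true

# ISTA: iterative soft-thresholding for min ||Ax - y||^2 + lam * ||x||_1
lam = 0.05
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / (spectral norm squared)
x = np.zeros(n)
for _ in range(1000):
    grad = A.T @ (A @ x - y)             # gradient of the data-fit term
    z = x - step * grad
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(round(float(rel_err), 3))          # small relative recovery error
```

Model-based deep learning reconstructions typically replace the hand-chosen sparsity prior and threshold with learned components, while keeping the same alternation between a data-consistency step and a regularization step.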

Other Awards

  • Fast Diffusion MRI Assay for Studying Neurodegeneration in Alzheimer's Disease
  • Investigating Novel Imaging Biomarkers of Acute Target Engagement of rTMS for Major Depression



