Abstract
The interest in Deep Learning (DL) has grown exponentially in the last ten years, producing a significant increase in both theoretical and applied studies. On the one hand, the versatility of DL technologies and their ability to tackle complex tasks have led to their rapid and widespread diffusion. On the other hand, the dizzying increase in the availability of biomedical data has made classical analyses, carried out by human experts, progressively less practicable. At the same time, the need for efficient and reliable automatic tools to support clinicians, at least in the most demanding tasks, has become increasingly pressing. In this survey, we will give a broad overview of DL models and their applications to biomedical data processing, specifically to medical image analysis, sequence processing (RNA and proteins) and graph modeling of molecular data interactions. First, the key concepts of DL architectures will be introduced, with particular reference to neural networks for structured data, convolutional neural networks, generative adversarial models, and Siamese architectures. Subsequently, their applicability to the analysis of different types of biomedical data will be shown, in areas ranging from diagnostics to the understanding of the characteristics underlying the transcription and translation of our genetic code, up to the discovery of new drugs. Finally, the prospects and future expectations of DL applications to biomedical data will be discussed.
Introduction
In recent years, deep learning (DL) techniques have achieved state-of-the-art performance in several different tasks, from image semantic segmentation [1, 2] to object detection [3, 4], from modelling traffic flows [5] to bioinformatics applications [6, 7, 8, 9]. Even though we understand the world through the interaction with the environment we observe, this empirical realism called experience is somewhat limiting. Despite being subject to the limitations of our senses, such a flat view of the world has been the driving force in many fields and laid the foundations of early artificial intelligence systems. However, recent real-world challenges are changing this perspective, showing that the world around us, and the answers we are looking for, admit a non-Euclidean structure. More specifically, a critical research question is how the data are described. We can define two main ways of representing data: through a symbolic or through a structural approach [10]. In the first case, the data are expressed through feature vectors, while, in the second, we use structured data, like sequences, trees or graphs, taking into account the complex relationships existing between the basic information entities in a real scenario. Nevertheless, moving to structured representations increases the complexity of data processing, and sometimes there is no direct way to extend operations commonly performed in vector spaces. For example, while the concept of similarity between two vectors is well defined, and corresponds to calculating a distance within a metric space, this does not hold for graphs [10].
Indeed, with the accumulation of so-called non-Euclidean data [11], graphs have become extremely common and widely used. Graph structures can in fact represent biological networks at the molecular, protein, or species level; they can describe drug molecules, as well as 3D protein structures or metabolic networks. On the other hand, digital images can be represented as pixel lattices, i.e. in the form of regular graphs. Simpler structured data are also very common in biological applications, such as sequences, for DNA and RNA data, and trees, for reconstructing molecular phylogenies. DL techniques, in turn, are becoming ubiquitous in chemistry, biology and medicine [12], with applications including not only genome annotation and transcriptome analysis, but also predictions of protein binding sites, identification of major cancer transcription factors, predictions of metabolic functions in complex microbial communities, drug discovery and re-purposing, and precision medicine.
Among deep networks commonly used to process biomedical data, Recurrent Neural Networks (RNNs) were specifically tailored to process sequential data. In addition to feedforward connections, they are equipped with delayed connections, which make them able to process a sequence one element at a time (in the context of a protein sequence, for instance, one residue after another), thus following its natural flow. In this way, memory arises and the neural network acquires the ability to store and integrate information from past inputs. Long Short-Term Memory (LSTM) networks are a special type of RNN composed of memory cells, where context-dependent input, output, and forget gates control which information is processed and passed through at each stage [13]. Thus, LSTMs are capable of learning long-term dependencies, storing and transmitting only selected inputs.
Graph Neural Networks (GNNs) [14], instead, are able to process input data encoded as general labeled graphs, and they can be applied directly, for instance, to molecule processing in the context of biomedical data. The state at each node is iteratively evaluated based solely on local information, realizing a sort of data diffusion across the whole graph. Moreover, GNNs are provided with a supervised learning algorithm that, besides the standard input-output data fitting cost, incorporates a criterion aimed at enforcing contractive dynamics, to ensure the convergence of the state computation. Both node-focused and graph-focused problems can be addressed by GNNs, meaning that an output is produced for each node or for a unique node of the graph, respectively. GNNs can also be applied to network medicine, a recent research field, which has led to the definition of a highly interconnected and tight network of diseases that are interdependent and need to be studied from a network perspective [15]. Finally, they can be used as an engine for graph generation, making it possible to design new drug molecules starting from a dataset of known compounds [16].
Besides, segmented images can be represented through region adjacency graphs, with nodes (labeled with vectors which describe visual features, such as texture, perimeter and area in pixels, etc.) describing homogeneous regions and arcs representing the adjacency relationship among regions. Therefore, they can be processed by GNNs, for example to perform object localization or detection (node-focused and graph-focused tasks, respectively). However, the most commonly used DL architectures for image processing are Convolutional Neural Networks (CNNs) [17, 18], due to their ability to integrate the feature selection process within the network training. Moreover, since hierarchical patch-based convolution operations are employed in CNNs, computational costs are reduced and images are abstracted on different feature levels. CNNs are also particularly effective in processing medical images, which, however, are often too few to train a deep network. In such a case, data augmentation, namely synthetic image generation, is the only viable solution, which can be implemented using Generative Adversarial Networks (GANs) [19]. GANs use two neural networks competing one against the other, a generative model
Finally, all the above-described architectures can constitute the basis for the construction of a Siamese network. Siamese networks are composed of two or more identical sub-networks, where identical means that they have the same configuration with the same parameters and weights. Parameter updating is mirrored across sub-networks. They have been employed to implement a sort of similarity learning, since they are able to compare feature vectors describing two or more patterns, obtained with ad hoc networks (i.e., RNNs for sequences, GNNs for graphs, or CNNs for images). Siamese networks are particularly useful, for example, to search medical image databases [21].
The rest of this survey is organized as follows. In Section 2, we present the commonly used types of structured data. In Section 3, we introduce the DL architectures used to address the various biomedical problems described in the following Section 4. Finally, Section 5 draws some conclusions and introduces open challenges for future work.
With the awareness that an exhaustive survey on the proposed topic would require the writing of numerous books, the arguments addressed in this paper are particularly focused on the research activity of the Bio-SAILab of the University of Siena, of which both authors are members.
Structured data
Structured data have a hybrid nature, both symbolic and sub-symbolic, and cannot be represented independently from the links between some basic constitutive entities. Symbolic information (collected in feature vectors) is used to label each base entity (a node). Instead, the sub-symbolic information is carried by edges between nodes, which represent relationships, symmetrical or not, such as inclusion, adjacency, presence of a chemical bond, etc. Edges can also be labeled, in order to characterize the relationship they describe. Both entities and their relationships can be homogeneous throughout the structure or not.
In a way that may seem counter-intuitive, we begin by describing the most complex structured data, that is, by giving the definition of a graph. In fact, the other types of structured data that we will examine are all particular cases of graphs, just as the networks that are normally used to process them (recurrent and recursive networks) can be considered special cases of the GNN model. Images, as a type of complex aggregated data, will be described separately at the end of this section.
Graphs, trees, and sequences
A
Digital images
A digital image is a matrix of pixel values or, in other words, a regular grid of pixels. The size of the pixel is equal to the spatial resolution of the image and depends upon the instrument providing the data. Similarly, the number of elements in the vector describing each pixel is determined by the ability of the equipment to distinguish variations (in colors, texture, etc.). An image can therefore be considered as a regular graph (a lattice), where each pixel represents a node and each edge stands for a vicinity relation. DL techniques for image processing, however, do not take into account this structured interpretation of images, but instead work on pixel matrices, implementing specific layers to realize classical algebraic operations like upsampling or downsampling.
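The lattice interpretation described above can be made concrete with a minimal sketch. It assumes a 4-neighbour vicinity relation (each pixel linked to the pixels directly beside, above and below it); the function name `lattice_edges` and the choice of neighbourhood are illustrative assumptions, not definitions taken from the surveyed works.

```python
def lattice_edges(height, width):
    """Return the undirected edges of a 4-neighbour pixel lattice.

    Each pixel (row, col) is a node; an edge links it to the pixel on its
    right and to the pixel below it, so every adjacency is listed once.
    """
    edges = []
    for row in range(height):
        for col in range(width):
            if col + 1 < width:   # horizontal neighbour
                edges.append(((row, col), (row, col + 1)))
            if row + 1 < height:  # vertical neighbour
                edges.append(((row, col), (row + 1, col)))
    return edges


edges = lattice_edges(2, 3)
print(len(edges))  # 7 edges for a 2 x 3 image
```

In general, a lattice with h rows and w columns has h(w - 1) + (h - 1)w such edges, which shows how regular this graph is compared with a generic molecular or interaction graph.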
Deep learning architectures
In this section, we briefly introduce the neural network models used for the biological and biomedical applications described in Section 4. Particular attention will be devoted to Graph Neural Networks (GNNs), for which the layered [23] and composite [24] versions will also be sketched.
Recurrent neural networks
Network architectures able to process plain data, collected within arrays, are said to be static; they just define a mapping between the sets of input and output values. In fact, once the network has been trained, it computes a function between inputs and outputs, calculated according to the learning set, where the output at time
The simplest dynamic data type is a sequence, which represents one of the most natural ways to model temporal/sequential domains. In speech recognition, for instance, the words naturally flow to constitute a temporal sequence of acoustic features while, in molecular biology, proteins are organized in amino acid strings. The simplest dynamic architectures are recurrent networks, able to model temporal/sequential phenomena. Indeed, recurrent networks are able to capture the temporal structure of the input and to produce a timeline output, thanks to neuron activations that can change even in the presence of the same input pattern. Architectures composed of units having feedback connections, either between neurons belonging to the same layer or to different layers, show such a dynamic behaviour. More formally, a network is said to be recurrent if it contains some neurons whose activations depend directly or indirectly on their own outputs. In other words, following the signal transmission through the network, cyclic paths exist that connect one or more neurons with themselves.
A
(a particular instance of the state update equation for GNNs, see Eq. (2)), that appropriately encodes all the past information injected into its inputs together with the input at time
When training a deep neural network, such as the unfolded network, with a gradient-based learning method like BackPropagation (BP), the partial derivatives are calculated by traversing the network from the final layer to the initial layer; using the chain rule, the derivatives of the deeper layers are obtained through repeated matrix multiplications. If the derivatives are large, then the gradient will increase exponentially during BP, eventually exploding. Conversely, if the derivatives are small, then the gradient will decrease exponentially, possibly vanishing. In the case of exploding gradients, the accumulation of large derivatives results in the model being very unstable and incapable of effective learning, while the accumulation of small gradients results in a model that is incapable of extracting meaningful information from data, since the weights and biases of the initial layers, which tend to learn the core features from the inputs, will not be updated effectively. In either case, long-term dependencies are difficult to learn due to the very deep architecture they correspond to.
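The exponential behaviour described above can be illustrated with a deliberately simplified sketch. It assumes a scalar, linear recurrent unit h_t = w * h_(t-1), so that each unfolded layer contributes exactly one factor w to the chain rule; the function name and the chosen weights are illustrative only, and real networks multiply Jacobian matrices rather than scalars.

```python
def gradient_through_time(weight, steps):
    """Derivative of h_T with respect to h_0 for h_t = weight * h_{t-1}.

    The chain rule multiplies one factor `weight` per unfolded layer.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= weight
    return grad


# A factor below 1 makes the gradient vanish, a factor above 1 makes it explode.
for w in (0.5, 1.0, 1.5):
    print(w, gradient_through_time(w, 50))
```

Over 50 unfolded steps the gradient shrinks towards zero for w = 0.5, stays at 1 for w = 1.0, and grows by several orders of magnitude for w = 1.5.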
the input gate controls the extent to which a new value flows into the cell; the forget gate controls the extent to which a value remains in the cell; the output gate controls the extent to which the value in the cell is used to compute the output of the specific LSTM unit.
In this way, LSTMs learn when it is necessary to retain a state or to forget it. They have many more internal parameters, which must be learned and constantly updated as new data arrive; this is both their strength and their weakness, as they are much more flexible than ordinary RNNs, but also much more expensive to train.
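The role of the three gates can be sketched with a single scalar LSTM unit. This is a minimal illustration under stated assumptions: it uses the common formulation without peephole connections, and the parameter layout (a dictionary of `(w_x, w_h, bias)` triples with hypothetical key names) is chosen only for readability, not taken from [13].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM unit (no peephole connections).

    `params` maps each of "input", "forget", "output", "candidate" to a
    triple (w_x, w_h, bias); these names are illustrative.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much of the new value enters the cell
    f = sigmoid(pre("forget"))       # how much of the old cell value is kept
    o = sigmoid(pre("output"))       # how much of the cell is exposed as output
    g = math.tanh(pre("candidate"))  # candidate value to be written

    c = f * c_prev + i * g           # updated cell state
    h = o * math.tanh(c)             # output of the unit
    return h, c


# With the forget gate saturated open and the input gate saturated closed,
# the cell simply carries its previous state forward.
params = {
    "input": (0.0, 0.0, -50.0),
    "forget": (0.0, 0.0, 50.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.2, c_prev=0.7, params=params)
```

Because the cell state is updated additively and scaled by the forget gate, a gate value close to 1 lets information persist over many steps, which is what allows long-term dependencies to be retained.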
In this section, we will briefly describe the original GNN model, presented in [14].
being
Given the state transition function and an input graph
where
Equation (3) is a system of non-linear equations in the variables
For each node
Since the state calculation proceeds for
Different problems on graphs solvable by GNNs.
The learning process described before still holds. The only difference, in the heterogeneous setting, consists in the fact that the GNN network, and consequently the unfolding network, are composed of
Computational issues have been reported in training dynamical networks to perform tasks in which spatio-temporal contingencies present in the input structures span long intervals. Indeed, as we have observed in the case of RNNs, gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. In other words, there is a trade-off between efficient learning by gradient descent and latching of information for long “periods”. In GNNs, the long-term dependency problem is observed when the output on a node depends on far nodes (i.e. neurons connected by long paths).
To solve this computational issue, both standard and composite GNNs can be cascaded, obtaining a
Finally, independently of the type of GNN – standard, composite or layered – both inductive and transductive learning can naturally be exploited [27]. In the inductive learning framework, a parametric model
Example of a simple CNN architecture made of a series of convolutional and max-pooling layers, leading to the last dense layer.
CNNs are mainly used in computer vision for various types of image analysis tasks, like image segmentation, object detection and recognition, etc. They are made up of a hierarchy of levels, as depicted in Fig. 2:
an input layer, which acts as a buffer for the image pixels (or for any input information); some intermediate layers, which have local connections and shared weights, and are mainly a combination of convolutional and pooling layers; one or two terminal, fully-connected layers.
In more detail: the input layer consists of a set of neurons responsible for passing the data representing the pixels of the input image. In the case of a color image of
Local and shared connections imply that neurons process in the same way different portions of the image, producing a biological-like behavior, since different regions of the field of view contain the same type of information (edges, borders, portions of objects, etc.).
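Local connections and weight sharing can be illustrated with a small pure-Python sketch of one convolutional layer followed by max pooling. It is a toy under stated assumptions: a single-channel image stored as nested lists, no padding, stride 1, and a hand-written edge filter instead of learned weights; the function names are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
vertical_edge = [[-1, 1]]  # responds where intensity rises from left to right
features = conv2d_valid(image, vertical_edge)
pooled = max_pool2d(features)
print(pooled)  # [[1, 0], [1, 0]]
```

The same two weights are applied at every position, so the edge is detected wherever it occurs, while pooling summarizes each 2 x 2 window with its strongest response.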
the generator, which is devoted to the generation of new plausible synthetic examples, based on the input data; the discriminator, which aims at detecting whether the examples received actually belong to the input domain (real data) or come from the output of the generator (fake data).
In other words, GANs are based on a game theory scenario in which the generator must compete against an opponent. The generator directly produces samples. Its opponent, the discriminator, tries to distinguish between samples taken from the training data and synthetically generated ones. In practice, both the generator and the discriminator are convolutional neural networks, where the output of the former is directly connected to the input of the latter. More specifically, the generator produces synthetic samples given random noise (sampled from a latent space), while the discriminator is simply a binary classifier that decides whether the input sample is real or fake. Through error backpropagation, the classification carried out by the discriminator provides the generator with the information necessary to update its weights. Once the generator training process is completed, the discriminator is discarded, since the GAN has learnt its task, i.e. has acquired the capability to generate realistic, synthetic data.
Since GAN training actually corresponds to the training of two distinct models, it proceeds alternately: the generator is kept frozen during the training of the discriminator, so that the latter can learn what the defects of the generator are; conversely, during the generator training, it is the discriminator that is kept constant, otherwise the generator would be trying to hit a moving target and would never converge. It is this back-and-forth procedure that, by separating the training of the two components, makes it possible to obtain a model capable of effectively addressing generative problems that would otherwise be intractable. However, as the generator improves, the performance of the discriminator deteriorates, since it is no longer able to correctly distinguish fake data from real ones and, in the case of a perfect generator, the discriminator would have a 50% accuracy (assuming that half of the examples come from the generator). This progression poses a problem for the convergence of the GAN training: the discriminator feedback becomes less significant over time, given that beyond a certain threshold its decisions become more and more random. In this case, if the generator training is not interrupted, it will receive junk feedback and the quality of the generated samples will get worse.
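The alternating scheme can be sketched on a one-dimensional toy problem. Everything below is an assumption made for illustration: the generator is a linear map G(z) = a*z + b, the discriminator is a logistic classifier D(x) = sigmoid(w*x + c) rather than a convolutional network, the "real" data are drawn from a Gaussian, and the generator minimizes the commonly used loss -log D(G(z)), which the text above does not specify.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_step(gen, disc, real_samples, noise, lr):
    """Update only the discriminator D(x) = sigmoid(w*x + c); the generator is frozen."""
    grad_w = grad_c = 0.0
    for x in real_samples:                  # real data: push D(x) towards 1
        d = sigmoid(disc["w"] * x + disc["c"])
        grad_w += -(1.0 - d) * x
        grad_c += -(1.0 - d)
    for z in noise:                         # fake data: push D(G(z)) towards 0
        x_fake = gen["a"] * z + gen["b"]
        d = sigmoid(disc["w"] * x_fake + disc["c"])
        grad_w += d * x_fake
        grad_c += d
    n = len(real_samples) + len(noise)
    disc["w"] -= lr * grad_w / n
    disc["c"] -= lr * grad_c / n


def generator_step(gen, disc, noise, lr):
    """Update only the generator G(z) = a*z + b; the discriminator is frozen."""
    grad_a = grad_b = 0.0
    for z in noise:
        x_fake = gen["a"] * z + gen["b"]
        d = sigmoid(disc["w"] * x_fake + disc["c"])
        # Gradient of -log D(G(z)) with respect to the fake sample.
        grad_x = -(1.0 - d) * disc["w"]
        grad_a += grad_x * z
        grad_b += grad_x
    gen["a"] -= lr * grad_a / len(noise)
    gen["b"] -= lr * grad_b / len(noise)


def train(steps=200, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    gen = {"a": 1.0, "b": 0.0}
    disc = {"w": 0.1, "c": 0.0}
    for _ in range(steps):
        real = [rng.gauss(3.0, 1.0) for _ in range(batch)]   # toy "real" data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]  # latent samples
        discriminator_step(gen, disc, real, noise, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        generator_step(gen, disc, noise, lr)
    return gen, disc
```

Each step function reads both models but writes to only one of them, which is the code-level counterpart of keeping the other network frozen. The sketch makes no claim about convergence; it only shows the structure of the alternation.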
Although Siamese networks can be used for any type of input [29], they are most commonly applied to image processing tasks, and they were shown to be particularly useful for image retrieval, verification, and few-shot learning [30, 31, 32, 33]. More precisely, let us consider a Siamese network able to compare pairs of input images. In this framework, the architecture of a Siamese network consists of a two-branch convolutional neural network, which is applied to both input images in order to extract their features, and of a distance function, which measures their similarity. In order to evaluate the similarity between two images
The contrastive loss presumes the availability of a supervised similarity label
where
DL techniques in biomedical applications
In this section, we will present an overview of DL techniques used in various contexts of biomedical data analysis. In detail, in Section 4.1, we will show some applications of recurrent neural networks and, particularly, LSTMs to sequence analysis. In Section 4.2, we will introduce GNN applications to biological data, with special reference to molecule analysis and drug discovery. Finally, in Section 4.3, we will draw some examples and briefly sketch applications of CNNs to image processing and biomedical data analysis for diagnostic purposes. A graphical summary of this section is proposed in Fig. 3.
Structured data processing with deep neural networks.
The application of recurrent architectures has proved to be particularly successful in many different biomedical fields, from heart-related ECG analysis [35] to brain signals, such as fMRI signals [36, 37].
In this section, we will provide several examples concerning the application of DL techniques for the analysis of sequences of various types: from ECG to protein and genomic data. Indeed, RNNs have been profitably employed for the analysis of cardiac pathologies, for instance, for the detection of cardiac arrhythmia [35], as well as for heart failure prediction in primary care patients [38]. Another example of the application of RNNs to the ECG time series analysis is reported in [39], where the Echo State Network (ESN) is employed for the automatic identification of patterns related to the Brugada Syndrome, which is a rare cardiac disease, whose diagnosis – through the analysis of the ECG – is particularly difficult. The ESN recurrent architecture can actually offer an efficient clinical tool for the early detection of such pathology and represents a valid support for cardiologists.
If, in the context of heart-related diseases, applications of RNNs have proved to be successful, the same can be said in a different biomedical context, i.e. for protein sequence analysis. In fact, several works are present in the literature on this topic [40, 41]. For example, in [42], a new deep learning framework, called DeepPPISP, is proposed, in which contextual and local features are combined for protein-protein interaction site prediction. Moreover, in [43], a comprehensive comparison of different machine learning models, including LSTMs, is carried out for the prediction of biological signals characterizing the formation of
Other interesting applications of RNNs to biological sequences come from their use for the analysis of protein sequences. For instance, in [45], a new computational environment is provided for modelling proteins and functions of non-coding DNA sequences. Similarly, in [46], a Siamese network, composed of a pair of identical (weight-sharing) LSTMs, was proposed to realize a new similarity score between protein sequences, able to resemble the BLAST score. In particular, this work focuses on the comparison between circulating common cold coronaviruses and SARS-CoV-2, and showed that a preexisting immune memory due to exposure to common cold HCoVs has a significant impact on COVID-19 disease severity, thus suggesting the fundamental role of protein sequence similarities with different circulating coronaviruses in understanding SARS-CoV-2 cross-reactivity [46]. In particular, it was found that the spike protein bears the largest cross-reactivity potential, together with other surface proteins, proportionally to their importance in the immune response and memory. Structural proteins also bear a potential for cross-reactivity, yet this is limited and therefore has limited predictive power, as highlighted using an attention mechanism inside the LSTM. Moreover, the SARS-CoV-2 proteome has also been investigated from the point of view of protein-protein interactions in [47]. The focus of this latter work was finding a mechanism for disrupting the spike trimerization, therefore hampering the formation of the virus’ most powerful weapon for penetrating human cells.
Graph processing via graph neural networks
Graphs emerge naturally in several biological contexts, from protein interaction networks to biomedical images, from metabolic networks to disease interactions. In other words, different important applications can be performed based on a graph representation of data, where the topological information can be exploited under different analysis perspectives, namely for node, edge, or graph-focused classification or regression problems, for link prediction, in a generative framework, etc. (see Fig. 1). Therefore, GNNs have broad potential in the biomedical field, leading to a vast number of applications in very different domains [48].
As a generative framework, GNNs can be used for drug discovery and repurposing. The design of new drugs is, in fact, a time consuming process, made up of several steps, from drug target determination to lead compound discovery and optimization and to pre-clinical assessment [11]. The process turns out to be extremely expensive and therefore the use of automatic methodologies for performing drug development becomes a crucial step. In particular, in recent years, the use of machine learning models has become fundamental to speed up the process and help both in the design and the analysis of new proposed drugs. While several machine learning models have been devised for drug development and repurposing [49, 50, 51, 52], many research directions remain unexplored from the point of view of applying DL architectures to solve the relevant problems, which is particularly true for graph neural networks. This phenomenon is mainly due to the lack of sufficiently large datasets, as well as to the complexity of drug interaction data, which must be considered in this context.
Apart from the limitations due to the scarcity of appropriately labeled data, GNNs can provide a valid tool for generating molecules as well as for predicting drug side effects. In this context, several studies can be found in the literature, such as [53], for de novo drug design, or [54], for molecular property explanation. In [16], a sequential graph generator for molecular structures is proposed, named MG2N2 (Molecular Generative Graph Neural Networks). Each node in the molecular graph corresponds to an atom, and each edge to a chemical bond. The MG2N2 algorithm is based on an iterative process in which, at each iteration, a node is added to the molecular structure. In order to do so, the model is composed of three GNN modules: the first generates new atom nodes, the second connects each new node to the atom it was generated as a neighbour of, and the third module generates the (optional) edges connecting the new node to the rest of the graph. The training procedure of each module is independent of the others, a characteristic which guarantees faster training and easier retraining in case one of the modules should be upgraded, also enhancing the model’s interpretability, as suggested for this kind of model [55]. As hinted in the description of the second module, the molecular graphs are generated through an expansion process, in which the GNNs focus on one atom after the other, generating all the neighbors of that atom before moving to the following one. Atoms are expanded following a first-in-first-out queue, in which the first generated atoms are expanded first. This guarantees that the generated graph will not have disconnected components. Since the three modules are essentially conceived as classifiers, to avoid mode collapse and give the GNNs a stochastic behaviour suitable for the generative purpose, each module is equipped with a Gumbel softmax output layer [56], instead of a regular softmax layer.
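The difference between a regular softmax and a Gumbel softmax output can be sketched in a few lines. This is a generic illustration of the Gumbel softmax idea under stated assumptions, not the MG2N2 implementation: the function name, the temperature value and the example logits are all hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw a relaxed one-hot sample from the categorical defined by `logits`."""
    noisy = []
    for logit in logits:
        u = rng.random()
        # Guard against log(0); random() returns values in [0, 1).
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # standard Gumbel noise
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)                      # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
atom_logits = [2.0, 0.5, -1.0]            # hypothetical scores for three atom types
sample = gumbel_softmax(atom_logits, temperature=0.5, rng=rng)
chosen = sample.index(max(sample))        # class selected for this draw
```

A regular softmax followed by an argmax would always return the highest-scoring class, whereas the added Gumbel noise makes each class win with a probability given by its softmax score, so repeated calls produce varied outputs.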
The main advantage of MG2N2 with respect to other sequential generators based on RNNs, reinforcement learning, or GAN-like mechanisms is the smaller information loss, thanks to the capability of GNNs to natively process graphs, while the other methods need to simplify the data representation using SMILES and other types of vectorial or sequential representations of the graph. Moreover, generating the graph step by step makes it possible to retain a more interpretable, error-aware, and easier-to-train mechanism with respect to SMILES-based VAE generators and even graph-based VAE generators, which generate the whole molecule in one shot by sampling from their latent space [16].
Drug side effects also have a high impact on health system costs and drug discovery processes. Predicting their probability before their occurrence is fundamental to reduce this impact. Indeed, candidate molecules could be screened before undergoing clinical trials, reducing the costs in terms of time, money, and health of the participants. Drug side effects are triggered by complex biological processes involving many different entities, from drug structures to protein-protein interactions. In [22], GNNs are applied to this task. Specifically, heterogeneous data sources were used and integrated in a unique graph, conveying information on drug structures and chemical features, drug-gene and gene-gene interactions, and drug-drug similarities. In this way, the relational context established among drugs and the genes on which they produce effects, together with the interactions existing among genes, is taken into account and exploited to predict drug side effects. The network makes use of this sort of knowledge graph to mine the information relevant for each prediction, creating a model of great usability that can predict drug side effects of newly submitted drugs without retraining and with a small amount of new information. The two models proposed in [16, 22] could even be combined, with MG2N2 generating molecular graphs of possible drug candidates in large quantities, which can be subsequently screened to filter out all the compounds with high probabilities of occurrence of dangerous side effects or that simply produce too many side effects.
Similarly, further works in this scope were developed with the aim of studying drug-target interactions [57], as well as performing drug-drug interaction prediction [58].
Several other applications of GNNs in the context of biomedical data concern the analysis of metabolic and genomic networks [59, 60, 61] and the prediction of disease-disease and protein-protein interactions. For example, in [62], GNNs are employed for the prediction of Protein-Protein Interfaces (PPI). In particular, the study focused on binding site identification, which makes it possible to determine the functionality and the quaternary structure of protein-protein complexes. Interacting peptides were represented as graphs, in which each secondary structure corresponds to a node and edges model the physico-chemical bonds between secondary structures. A correspondence graph can be built, describing their interaction, in which secondary structures that show correspondence are linked together. As proved in [63], finding the maximum clique in the correspondence graph makes it possible to identify the secondary structure elements belonging to the interface site. Although the maximum clique problem is NP-complete, GNNs represent a soft-computing tool able to solve the problem in an affordable time. The GNN can be trained on a relatively small number of examples labeled with the Bron-Kerbosch algorithm [64], learning to replicate the algorithm solution in a fraction of the time employed by the traditional implementation. It can then be exploited to predict new interfaces in a short time and with high accuracy.
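For reference, the exact labeling procedure mentioned above can be sketched as follows. This is the basic Bron-Kerbosch recursion without pivoting, applied to a made-up four-node graph; the node names and the adjacency-dictionary representation are illustrative assumptions and do not come from [62] or [64].

```python
def bron_kerbosch(adjacency):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(set(current))  # nothing can extend `current`: it is maximal
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}  # node has been fully explored
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


def maximum_clique(adjacency):
    """Return one largest clique among the maximal ones."""
    return max(bron_kerbosch(adjacency), key=len)


# Toy correspondence graph: a triangle a-b-c plus a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(maximum_clique(graph))  # the triangle {'a', 'b', 'c'} (set order may vary)
```

The recursion explores a number of branches that can grow exponentially with the graph size, which is why a trained model that approximates its output quickly is attractive for larger correspondence graphs.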
The power of GNNs in community analysis was also exploited to build the proof of concept of a mechanism for creating a community of caregivers of rare disease patients. A smartphone app connecting caregivers with each other would be beneficial to them, as being a caregiver is often a challenge from many points of view, and sharing experiences and feelings could improve their ability to face such challenges [65].
Medical image processing via convolutional networks
Recently, DL models have had a huge impact in computer vision applications: from image semantic segmentation to object detection, most of the computer vision tasks reached state-of-the-art performance with the use of Deep Learning. In this section, we will describe some applications of the models presented in Section 3, showing examples of standard CNNs, GANs and Siamese architectures applied to medical image processing.
One way to support clinicians in the decision-making process is to provide a system for retrieving cases that are similar to the one under examination. In this way, doctors can compare cases and directly assess the similarity with past exams. This comparison is particularly useful to calibrate diagnoses and treatments, moving toward a precision medicine approach. Such a framework can be implemented by a Content-Based Image Retrieval (CBIR) system, capable of retrieving the images most similar to a query image. In [66], a novel supervised Siamese-based architecture was proposed, which is able to handle multi-modal and multi-view MR images and to retrieve similar lesions in the case of prostate images. Similarly, in [67], a Siamese network was devised for prostate cancer classification.
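The retrieval step of a CBIR system can be sketched as a nearest-neighbour search in an embedding space. The snippet below assumes that each image has already been mapped to a fixed-length vector by one branch of a trained Siamese network; the cosine similarity, the function name and the toy vectors are illustrative assumptions, and the system in [66] may rely on a different metric or ranking scheme.

```python
import numpy as np


def retrieve_similar(query_emb: np.ndarray, gallery_embs: np.ndarray, k: int = 3):
    """Return indices and similarities of the k gallery items closest to the query.

    query_emb:    (D,) embedding of the query image (assumed precomputed).
    gallery_embs: (N, D) embeddings of the archived images (assumed precomputed).
    """
    # L2-normalise so that the dot product equals the cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q
    order = np.argsort(-sims)[:k]  # highest similarity first
    return order, sims[order]


# Hypothetical 2-D embeddings standing in for the output of a Siamese branch.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
indices, scores = retrieve_similar(query, gallery, k=2)
print(indices, scores)
```

Because the embeddings of the archive can be computed once and stored, only the query image has to pass through the network at retrieval time.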
CNNs were also exploited for eye-tracking during trail making tests aimed at diagnosing particular pathological conditions, like extrapyramidal syndromes and chronic pain [68].
A different task in which DL models have proved to reach state-of-the-art performance is image generation. Medical image processing, in particular, has benefited greatly from the use of DL architectures, for instance in tomographic image analysis [69] and PET image reconstruction [70]. The need for medical image generation stems directly from the commonly small size of image datasets, which do not contain enough data to train a deep network. Among the most notable CNN-based architectures for image generation we can find autoencoders.
A variational autoencoder (VAE) is a type of deep network that learns to reproduce its input (it is, in effect, a self-supervised model) and also to map data to a latent (hidden) space, in which the information is compressed based on a maximum-conservation principle.
In [71], an autoencoder network was proposed for 3D brain MRI reconstruction. Here, the autoencoder is able to reproduce high-quality 3D images, as well as to retain meaningful information in its innermost latent embedding. In the context of brain MRI analysis, in fact, one of the main challenges is the need to process high-dimensional images. A way to address this significant computational burden is to process brain MRI scans as slices of the 3D images. An example of this approach can be found in [72], where 2D slices of brain NMR scans were used to predict the patients’ biological age.
GANs also proved to be suitable for medical image generation. Urinary tract infections (UTIs) are considered the most common bacterial infections: an estimated 150 million UTIs occur worldwide every year. To automatically analyze Petri plates coming from urine culture, a two-stage computational workflow for image segmentation was presented in [73]. The original dataset was first augmented using a GAN and then segmented with a CNN-based architecture. Finally, a standard MLP can be used to classify the type of infection and its severity [74]. Similarly, in [75], GANs were used to produce high-quality retinal images, together with the corresponding semantic label maps, to estimate the retinal vessel tortuosity. The two-step approach is based on a first phase in which a progressive growing GAN (PGGAN [76]) is trained to produce the semantic label maps. These maps describe the blood vessel structures and are needed to detect possible retinal or circulatory diseases. Subsequently, an image-to-image translation approach was used to obtain the retinal images, so that each image is generated by first sketching the vessel network. In this way, the generation process is simplified and requires less computational effort. Finally, in [77], GANs are employed in a multi-stage fashion, requiring a smaller amount of data, for multi-organ chest X-ray image generation. Analogously to the approach in [73], the generation process is followed by the organ segmentation step and then by the chest X-ray image reconstruction.
In image segmentation, too, CNN-based architectures have proved to be particularly effective for medical applications such as histopathologic image analysis (of kidneys, liver, lung, breast and other biological tissues [78, 79, 80]). In particular, in [81, 82], a DeepLab-based architecture [83] was proposed for the construction of an automatic tool for kidney glomeruli segmentation. Such a tool can provide important support for clinicians in counting the glomeruli present in a renal section and determining how many of them are sclerotised.
Different types of images, both natural and coming from various instruments, can be subject to segmentation in medical image processing. For example, in [84], a CNN-based approach is used to segment aorta CT scans, in order to detect early any alterations in its morphology which may portend an aortic dissection, i.e. the rupture of the innermost layer of the aorta, which allows blood to flow between the layers of the aortic wall, forcing the layers to separate. On the other hand, in [85], a weakly supervised approach was implemented to perform skin lesion segmentation and distinguish possibly harmful melanomas from benign nevi. This research topic was further explored by investigating a multi-modal approach, i.e. fusing anamnestic patient information with skin lesion images. Indeed, in [86], a DL tool for the early diagnosis of skin lesions was developed. The input data consisted of images and demographic features of the patients (including age and gender), together with the position of the lesion. The DL classifier can efficiently discriminate between benign and malignant lesions, supporting clinicians in their diagnosis and possibly avoiding surgery. Finally, in [87], CNNs were used for segmenting oocyte images in order to support medical specialists in improving medically assisted procreation.
However, CNNs are also used for applications not strictly related to images, possibly adjusting the architecture to suit the particular type of data. As an example of a convolutional network application to RNA sequences, in [88], a 1-D CNN was used for the analysis of ribosome profiling data (Ribo-seq). In particular, highly reproducible Ribo-seq profiles were analysed to produce a prediction of the translation speed associated with the sequence. This is possible because the Ribo-seq profile measures the quantity of fragments found inside ribosomes for each part of a sequence. Fragments which are frequently found inside ribosomes have been demonstrated to have slower translation (the ribosome spends more time on them), while fragments with low frequency have high translation speed. E. coli Ribo-seq profiles were collected from various sources, though the consensus among these sources is high only on 40 of the several thousand genes of E. coli. These 40 sequences were used as a reference to train and test a neural network predictor of the translation speed of subsequences. Another example of cross-field application of CNNs is the analysis of molecular graphs from the QM9 dataset: as was recently shown in [89], in some particular cases, graphs of very small size can be translated to images and successfully processed with image-oriented CNNs. Nevertheless, GNNs are still recommended for the vast majority of graph-based tasks, as suggested by the fact that they are universal approximators on graphs [90].
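As an illustration of how a convolutional layer can be applied to sequences such as those discussed above, the following sketch one-hot encodes an RNA string and slides a bank of 1-D filters over it. It is not the architecture of [88]: the alphabet ordering, the function names, the filter shapes and the hand-set weights are all assumptions chosen to keep the example small, whereas in a trained network the filter weights would be learned from data.

```python
import numpy as np

ALPHABET = "ACGU"  # assumed nucleotide ordering for the one-hot channels


def one_hot(seq: str) -> np.ndarray:
    """Encode an RNA string as a (length, 4) binary matrix."""
    index = {c: i for i, c in enumerate(ALPHABET)}
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, c in enumerate(seq):
        x[pos, index[c]] = 1.0
    return x


def conv1d_relu(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Valid 1-D convolution (cross-correlation) followed by a ReLU.

    x:       (L, C) encoded sequence
    kernels: (F, K, C) bank of F filters of width K
    bias:    (F,) one bias per filter
    Returns a (L - K + 1, F) feature map.
    """
    length, _ = x.shape
    n_filters, width, _ = kernels.shape
    out = np.empty((length - width + 1, n_filters))
    for t in range(length - width + 1):
        window = x[t:t + width]  # (K, C) slice of the sequence
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


# Hypothetical hand-set filter that responds to the dinucleotide "AC".
ac_filter = np.zeros((1, 2, 4))
ac_filter[0, 0, ALPHABET.index("A")] = 1.0
ac_filter[0, 1, ALPHABET.index("C")] = 1.0
features = conv1d_relu(one_hot("ACGUAC"), ac_filter, np.array([-1.0]))
print(features.ravel())
```

Each filter acts as a position-independent motif detector; a predictor of a scalar quantity such as translation speed would typically stack further layers on top of the resulting feature map.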
Deep learning is one of the fastest growing fields of research and has had a significant impact on different types of bioinformatics applications. The analysis of biomedical data in fact poses a wide variety of problems that can be effectively addressed by building decision support systems, capable of providing substantial help to human experts, especially in the most routine tasks. We are aware that, in this paper, we have offered only a partial view of the research field, mainly proposing a survey of the works carried out within the Bio-SAILab of the University of Siena, of which the two authors are members, while trying to show how our research is well supported by the general and widespread interest that both DL techniques and biomedical applications receive in the research community. The field presents several challenges and future perspectives. The availability of new data will support the development of new deep learning techniques and the improvement of existing ones. Furthermore, future methods must be trustworthy and explainable [91], a need that is strongly felt in all DL applications, but which is particularly sensitive in the case of biomedical applications, where a wrong choice, or simply one that is not very reliable or understandable, can have a significant impact on the health of individuals.
