1. Abstract: The adversarial risk of a machine learning model has been widely studied. Most previous works assume that the data lies in the whole ambient space. We propose to take a new angle and take the manifold assumption into consideration. Assuming data lies in a manifold, we investigate two new types of adversarial risk, the normal adversarial risk due to perturbation along normal direction, and the in-manifold adversarial risk due to perturbation within the manifold. We prove that the classic adversarial risk can be bounded from both sides using the normal and in-manifold adversarial risks. We also show with a surprisingly pessimistic case that the standard adversarial risk can be nonzero even when both normal and in-manifold risks are zero. We finalize the paper with empirical studies supporting our theoretical results. Our results suggest the possibility of improving the robustness of a classifier by only focusing on the normal adversarial risk.

    Zhang, Wenjia, Yikai Zhang, Xiaoling Hu, Mayank Goswami, Chao Chen, and Dimitris N. Metaxas. 28--30 Mar 2022. “A Manifold View of Adversarial Risk.” In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, edited by Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, 151:11598–614. Proceedings of Machine Learning Research. PMLR.

  2. Abstract: Deep-learning-based clinical decision support using structured electronic health records (EHR) has been an active research area for predicting risks of mortality and diseases. Meanwhile, large amounts of narrative clinical notes provide complementary information, but are often not integrated into predictive models. In this paper, we provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality. To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes and discover the critical structured EHR features with Shapley values. These important words and clinical features are visualized to assist with interpretation of the prediction outcomes. We also investigate the significance of domain adaptive pretraining and task adaptive fine-tuning on the Clinical BERT, which is used to learn the representations of clinical notes. Experiments demonstrated that our model outperforms other methods (AUCPR: 0.538, AUCROC: 0.877, F1:0.490).
    Lyu, Weimin, Xinyu Dong, Rachel Wong, Songzhu Zheng, Kayley Abell-Hart, Fusheng Wang, and Chao Chen. 2022. “A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction.” AMIA ... Annual Symposium Proceedings / AMIA Symposium. AMIA Symposium 2022: 719–28.

  3. Abstract: Opioid overdose (OD) has become a leading cause of accidental death in the United States, and overdose deaths reached a record high during the COVID-19 pandemic. Combating the opioid crisis requires targeting high-need populations by identifying individuals at risk of OD. While deep learning emerges as a powerful method for building predictive models using large scale electronic health records (EHR), it is challenged by the complex intrinsic relationships among EHR data. Further, its utility is limited by the lack of clinically meaningful explainability, which is necessary for making informed clinical or policy decisions using such models. In this paper, we present LIGHTED, an integrated deep learning model combining long short term memory (LSTM) and graph neural networks (GNN) to predict patients' OD risk. The LIGHTED model can incorporate the temporal effects of disease progression and the knowledge learned from interactions among clinical features. We evaluated the model using Cerner's Health Facts database with over 5 million patients. Our experiments demonstrated that the model outperforms traditional machine learning methods and other deep learning models. We also proposed a novel interpretability method by exploiting embeddings provided by GNNs to cluster patients and EHR features respectively, and conducted qualitative feature cluster analysis for clinical interpretations. Our study shows that LIGHTED can take advantage of longitudinal EHR data and the intrinsic graph structure of EHRs among patients to provide effective and interpretable OD risk predictions that may potentially improve clinical decision support.
    Dong, Xinyu, Rachel Wong, Weimin Lyu, Kayley Abell-Hart, Jianyuan Deng, Yinan Liu, Janos G. Hajagos, Richard N. Rosenthal, Chao Chen, and Fusheng Wang. 2023. “An Integrated LSTM-HeteroRGNN Model for Interpretable Opioid Overdose Risk Prediction.” Artificial Intelligence in Medicine 135 (January): 102439.

  4. Abstract: Extracting rich phenotype information, such as cell density and arrangement, from whole slide histology images (WSIs), requires analysis of large field of view, i.e. more contextual information. This can be achieved through analyzing the digital slides at lower resolution. A potential drawback is missing out on details present at a higher resolution. To jointly leverage complementary information from multiple resolutions, we present a novel transformer based Pyramidal Context-Detail Network (CD-Net). CD-Net exploits the WSI pyramidal structure through co-training of proposed Context and Detail Modules, which operate on inputs from multiple resolutions. The residual connections between the modules enable the joint training paradigm while learning self-supervised representation for WSIs. The efficacy of CD-Net is demonstrated in classifying Lung Adenocarcinoma from Squamous cell carcinoma.
    Kapse, Saarthak, Srijan Das, and Prateek Prasanna. 2022. “CD-Net: Histopathology Representation Learning Using Pyramidal Context-Detail Network.” arXiv [cs.CV]. arXiv.

  5. Abstract: We propose a few shot learning approach for the problem of hematopoietic cell classification in digital pathology. In hematopoiesis cell classification, the classes correspond to the different stages of the cellular maturation process. Two consecutive stage categories are considered to have a neighborhood relationship, which implies a visual similarity between the two categories. We propose RelationVAE which incorporates these relationships between hematopoietic cell classes to robustly generate more data for the classes with limited training data. Specifically, we first model these relationships using a graphical model, and propose RelationVAE, a deep generative model which implements the graphical model. RelationVAE is trained to optimize the lower bound of the pairwise data likelihood of the graphical model. In this way, it can identify class level features of a specific class from a small number of input images together with the knowledge transferred from visually similar classes, leading to more robust sample synthesis. The experiments on our collected hematopoietic dataset show the improved results of our proposed RelationVAE over a baseline VAE model and other few shot learning methods. Our code and data are available at
    Nguyen, Vu, Prantik Howlader, Le Hou, Dimitris Samaras, Rajarsi R. Gupta, and Joel Saltz. 2023. “Few Shot Hematopoietic Cell Classification.”

  6. Abstract: We present GazeRadar, a novel radiomics and eye gaze-guided deep learning architecture for disease localization in chest radiographs. GazeRadar combines the representation of radiologists’ visual search patterns with corresponding radiomic signatures into an integrated radiomics-visual attention representation for downstream disease localization and classification tasks. Radiologists generally tend to focus on fine-grained disease features, while radiomics features provide high-level textural information. Our framework first ‘fuses’ radiomics features with visual features inside a teacher block. The visual features are learned through a teacher-focal block, while the radiomics features are learned through a teacher-global block. A novel Radiomics-Visual Attention loss is proposed to transfer knowledge from this joint radiomics-visual attention representation of the teacher network to the student network. We show that GazeRadar outperforms baseline approaches for disease localization and classification tasks on 4 large scale chest radiograph datasets comprising multiple diseases. Code:
    Bhattacharya, Moinak, Shubham Jain, and Prateek Prasanna. 2022a. “GazeRadar: A Gaze and Radiomics-Guided Disease Localization Framework.” In Medical Image Computing and Computer Assisted Intervention – MICCAI 2022, 686–96. Springer Nature Switzerland.

  7. Abstract: Histopathology whole slide images (WSIs) play a very important role in clinical studies and serve as the gold standard for many cancer diagnoses. However, generating automatic tools for processing WSIs is challenging due to their enormous sizes. Currently, to deal with this issue, conventional methods rely on a multiple instance learning (MIL) strategy to process a WSI at patch level. Although effective, such methods are computationally expensive, because tiling a WSI into patches takes time and does not explore the spatial relations between these tiles. To tackle these limitations, we propose a locally supervised learning framework which processes the entire slide by exploring the entire local and global information that it contains. This framework divides a pre-trained network into several modules and optimizes each module locally using an auxiliary model. We also introduce a random feature reconstruction unit (RFR) to preserve distinguishing features during training and improve the performance of our method by 1% to 3%. Extensive experiments on three publicly available WSI datasets: TCGA-NSCLC, TCGA-RCC and LKS, highlight the superiority of our method on different classification tasks. Our method outperforms the state-of-the-art MIL methods by 2% to 5% in accuracy, while being 7 to 10 times faster. Additionally, when dividing it into eight modules, our method requires as little as 20% of the total GPU memory required by end-to-end training. Our code is available at
    Zhang, Jingwei, Xin Zhang, Ke Ma, Rajarsi Gupta, Joel Saltz, Maria Vakalopoulou, and Dimitris Samaras. 2022. “Gigapixel Whole-Slide Images Classification Using Locally Supervised Learning.” In Medical Image Computing and Computer Assisted Intervention – MICCAI 2022, 192–201. Springer Nature Switzerland.

  8. Abstract: Accurate delineation of fine-scale structures is a very important yet challenging problem. Existing methods use topological information as an additional training loss, but are ultimately making pixel-wise predictions. In this paper, we propose the first deep learning-based method to learn topological/structural representations. We use discrete Morse theory and persistent homology to construct a one-parameter family of structures as the topological/structural representation space. Furthermore, we learn a probabilistic model that can perform inference tasks in such a topological/structural representation space. Our method generates true structures rather than pixel-maps, leading to better topological integrity in automatic segmentation tasks. It also facilitates semi-automatic interactive annotation/proofreading via the sampling of structures and structure-aware uncertainty.
    Hu, Xiaoling, Dimitris Samaras, and Chao Chen. 2022. “Learning Probabilistic Topological Representations Using Discrete Morse Theory.” arXiv [eess.IV]. arXiv.

  9. Abstract: Noisy labels can significantly affect the performance of deep neural networks (DNNs). In medical image segmentation tasks, annotations are error-prone due to the high demand in annotation time and in the annotators' expertise. Existing methods mostly tackle label noise in classification tasks. Their independent-noise assumptions do not fit label noise in segmentation task. In this paper, we propose a novel noise model for segmentation problems that encodes spatial correlation and bias, which are prominent in segmentation annotations. Further, to mitigate such label noise, we propose a label correction method to recover true label progressively. We provide theoretical guarantees of the correctness of the proposed method. Experiments show that our approach outperforms current state-of-the-art methods on both synthetic and real-world noisy annotations.
    Yao, Jiachen, Yikai Zhang, Songzhu Zheng, Mayank Goswami, Prateek Prasanna, and Chao Chen. 2023. “Learning to Segment from Noisy Annotations: A Spatial Correction Approach.”

  10. Abstract: Deep learning methods have achieved impressive performance for multi-class medical image segmentation. However, they are limited in their ability to encode topological interactions among different classes (e.g., containment and exclusion). These constraints naturally arise in biomedical images and can be crucial in improving segmentation quality. In this paper, we introduce a novel topological interaction module to encode the topological interactions into a deep neural network. The implementation is completely convolution-based and thus can be very efficient. This empowers us to incorporate the constraints into end-to-end training and enrich the feature representation of neural networks. The efficacy of the proposed method is validated on different types of interactions. We also demonstrate the generalizability of the method on both proprietary and public challenge datasets, in both 2D and 3D settings, as well as across different modalities such as CT and Ultrasound. Code is available at:
    Gupta, Saumya, Xiaoling Hu, James Kaan, Michael Jin, Mutshipay Mpoy, Katherine Chung, Gagandeep Singh, et al. 2022. “Learning Topological Interactions for Multi-Class Medical Image Segmentation.” arXiv [cs.CV]. arXiv.

  11. Abstract: Topological features based on persistent homology capture high-order structural information so as to augment graph neural network methods. However, computing extended persistent homology summaries remains slow for large and dense graphs and can be a serious bottleneck for the learning pipeline. Inspired by recent success in neural algorithmic reasoning, we propose a novel graph neural network to estimate extended persistence diagrams (EPDs) on graphs efficiently. Our model is built on algorithmic insights, and benefits from better supervision and closer alignment with the EPD computation algorithm. We validate our method with convincing empirical results on approximating EPDs and downstream graph representation learning tasks. Our method is also efficient; on large and dense graphs, we accelerate the computation by nearly 100 times.

    Yan, Zuoyu, Tengfei Ma, Liangcai Gao, Zhi Tang, Yusu Wang, and Chao Chen. 2022. “Neural Approximation of Graph Topological Features.” arXiv [cs.LG]. arXiv.

  12. A new study entitled "Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma" was published in the August issue of Cancer Cell from The Cancer Genome Atlas network. The work investigated the molecular landscape of pancreatic adenocarcinoma. BMI faculty Dr. Richard Moffitt served as the analysis coordinator for the project, which involved the work of over 200 scientists.




  13. Abstract: Dense prediction tasks such as segmentation and detection of pathological entities hold crucial clinical value in computational pathology workflows. However, obtaining dense annotations on large cohorts is usually tedious and expensive. Contrastive learning (CL) is thus often employed to leverage large volumes of unlabeled data to pre-train the backbone network. To boost CL for dense prediction, some studies have proposed variations of dense matching objectives in pre-training. However, our analysis shows that employing existing dense matching strategies on histopathology images enforces invariance among incorrect pairs of dense features and, thus, is imprecise. To address this, we propose a precise location-based matching mechanism that utilizes the overlapping information between geometric transformations to precisely match regions in two augmentations. Extensive experiments on two pretraining datasets (TCGA-BRCA, NCT-CRC-HE) and three downstream datasets (GlaS, CRAG, BCSS) highlight the superiority of our method in semantic and instance segmentation tasks. Our method outperforms previous dense matching methods by up to 7.2% in average precision for detection and 5.6% in average precision for instance segmentation tasks. Additionally, by using our matching mechanism in the three popular contrastive learning frameworks, MoCo-v2, VICRegL, and ConCL, the average precision in detection is improved by 0.7% to 5.2%, and the average precision in segmentation is improved by 0.7% to 4.0%, demonstrating generalizability. Our code is available at

    Zhang, Jingwei, Saarthak Kapse, Ke Ma, Prateek Prasanna, Maria Vakalopoulou, Joel Saltz, and Dimitris Samaras. 2022. “Precise Location Matching Improves Dense Contrastive Learning in Digital Pathology.” arXiv [cs.CV]. arXiv.

  14. Abstract: This work presents PathAttFormer, a deep learning model that predicts the visual attention of pathologists viewing whole slide images (WSIs) while evaluating cancer. This model has two main components: (1) a patch-wise attention prediction module using a Swin transformer backbone and (2) a self-attention based attention refinement module to compute pairwise-similarity between patches to predict spatially consistent attention heatmaps. We observed a high level of agreement between model predictions and actual viewing behavior, collected by capturing panning and zooming movements using a digital microscope interface. Visual attention was analyzed in the evaluation of prostate cancer and gastrointestinal neuroendocrine tumors (GI-NETs), which differ greatly in terms of diagnostic paradigms and the demands on attention. Prostate cancer involves examining WSIs stained with Hematoxylin and Eosin (H&E) to identify distinct growth patterns for Gleason grading. In contrast, GI-NETs require a multi-step approach of identifying tumor regions in H&E WSIs and grading by quantifying the number of Ki-67 positive tumor cells highlighted with immunohistochemistry (IHC) in a separate image. We collected attention data from pathologists viewing prostate cancer H&E WSIs from The Cancer Genome Atlas (TCGA) and 21 H&E WSIs of GI-NETs with corresponding Ki-67 IHC WSIs. This is the first work that utilizes the Swin transformer architecture to predict visual attention in histopathology images of GI-NETs, which is generalizable to predicting attention in the evaluation of multiple sequential images in real world diagnostic pathology and IHC applications.
    Chakraborty, Souradeep, Rajarsi Gupta, Ke Ma, Darshana Govind, Pinaki Sarder, Won-Tak Choi, Waqas Mahmud, et al. 2022. “Predicting the Visual Attention of Pathologists Evaluating Whole Slide Images of Cancer.” In Medical Optical Imaging and Virtual Microscopy Image Analysis, 11–21. Springer Nature Switzerland.

  15. Abstract: Although recent works in semi-supervised learning (SemiSL) have accomplished significant success in natural image segmentation, the task of learning discriminative representations from limited annotations has been an open problem in medical images. Contrastive Learning (CL) frameworks use the notion of similarity measure which is useful for classification problems, however, they fail to transfer these quality representations for accurate pixel-level segmentation. To this end, we propose a novel semi-supervised patch-based CL framework for medical image segmentation without using any explicit pretext task. We harness the power of both CL and SemiSL, where the pseudo-labels generated from SemiSL aid CL by providing additional guidance, whereas discriminative class information learned in CL leads to accurate multi-class segmentation. Additionally, we formulate a novel loss that synergistically encourages inter-class separability and intra-class compactness among the learned representations. A new inter-patch semantic disparity mapping using average patch entropy is employed for a guided sampling of positives and negatives in the proposed CL framework. Experimental analysis on three publicly available datasets of multiple modalities reveals the superiority of our proposed method as compared to the state-of-the-art methods. Code is available at:

    Basak, Hritam, and Zhaozheng Yin. 2023. “Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19786–97.

  16. Abstract: In this work, we present RadioTransformer, a novel student-teacher transformer framework, that leverages radiologists’ gaze patterns and models their visuo-cognitive behavior for disease diagnosis on chest radiographs. Domain experts, such as radiologists, rely on visual information for medical image interpretation. On the other hand, deep neural networks have demonstrated significant promise in similar tasks even where visual interpretation is challenging. Eye-gaze tracking has been used to capture the viewing behavior of domain experts, lending insights into the complexity of visual search. However, deep learning frameworks, even those that rely on attention mechanisms, do not leverage this rich domain information for diagnostic purposes. RadioTransformer fills this critical gap by learning from radiologists’ visual search patterns, encoded as ‘human visual attention regions’ in a cascaded global-focal transformer framework. The overall ‘global’ image characteristics and the more detailed ‘local’ features are captured by the proposed global and focal modules, respectively. We experimentally validate the efficacy of RadioTransformer on 8 datasets involving different disease classification tasks where eye-gaze data is not available during the inference phase. Code:
    Bhattacharya, Moinak, Shubham Jain, and Prateek Prasanna. 2022b. “RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention–Guided Disease Classification.” In Computer Vision – ECCV 2022, 679–98. Springer Nature Switzerland.

  17. Abstract: Masked Autoencoder (MAE) has recently been shown to be effective in pre-training Vision Transformers (ViT) for natural image analysis. By reconstructing full images from partially masked inputs, a ViT encoder aggregates contextual information to infer masked image regions. We believe that this context aggregation ability is particularly essential to the medical image domain where each anatomical structure is functionally and mechanically connected to other structures and regions. Because there is no ImageNet-scale medical image dataset for pre-training, we investigate a self pre-training paradigm with MAE for medical image analysis tasks. Our method pre-trains a ViT on the training set of the target data instead of another dataset. Thus, self pre-training can benefit more scenarios where pre-training data is hard to acquire. Our experimental results show that MAE self pre-training markedly improves diverse medical image tasks including chest X-ray disease classification, abdominal CT multi-organ segmentation, and MRI brain tumor segmentation. Code is available at

    Zhou, Lei, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, and Prateek Prasanna. 2022. “Self Pre-Training with Masked Autoencoders for Medical Image Analysis.” arXiv Preprint arXiv:2203.05573.

  18. Abstract: Clinical outcome or severity prediction from medical images has largely focused on learning representations from single-timepoint or snapshot scans. It has been shown that disease progression can be better characterized by temporal imaging. We therefore hypothesized that outcome predictions can be improved by utilizing the disease progression information from sequential images. We present a deep learning approach that leverages temporal progression information to improve clinical outcome predictions from single-timepoint images. In our method, a self-attention based Temporal Convolutional Network (TCN) is used to learn a representation that is most reflective of the disease trajectory. Meanwhile, a Vision Transformer is pretrained in a self-supervised fashion to extract features from single-timepoint images. The key contribution is to design a recalibration module that employs maximum mean discrepancy loss (MMD) to align distributions of the above two contextual representations. We train our system to predict clinical outcomes and severity grades from single-timepoint images. Experiments on chest and osteoarthritis radiography datasets demonstrate that our approach outperforms other state-of-the-art techniques.
    Konwer, Aishik, Xuan Xu, Joseph Bae, Chao Chen, and Prateek Prasanna. 2022. “Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18824–35.

  19. Abstract: Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding -> token completion -> dense decoding (SCD) pipeline. We first empirically show that naively applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both sparse output tokens and pruned multi-layer intermediate ones. The last dense decoding stage is compatible with existing segmentation decoders, e.g., UNETR. Experiments show SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality.

    Zhou, Lei, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, and Prateek Prasanna. 2023. “Token Sparsification for Faster Medical Image Segmentation.” arXiv [cs.CV]. arXiv.

  20. Abstract: In digital pathology, the spatial context of cells is important for cell classification, cancer diagnosis and prognosis. To model such complex cell context, however, is challenging. Cells form different mixtures, lineages, clusters and holes. To model such structural patterns in a learnable fashion, we introduce several mathematical tools from spatial statistics and topological data analysis. We incorporate such structural descriptors into a deep generative model as both conditional inputs and a differentiable loss. This way, we are able to generate high quality multi-class cell layouts for the first time. We show that the topology-rich cell layouts can be used for data augmentation and improve the performance of downstream tasks such as cell classification. 

    Abousamra, Shahira, Rajarsi Gupta, Tahsin Kurc, Dimitris Samaras, Joel Saltz, and Chao Chen. 2023. “Topology-Guided Multi-Class Cell Context Generation for Digital Pathology.” arXiv [eess.IV]. arXiv.

  21. Abstract: We study the attention of pathologists as they examine whole-slide images (WSIs) of prostate cancer tissue using a digital microscope. To the best of our knowledge, our study is the first to report in detail how pathologists navigate WSIs of prostate cancer as they accumulate information for their diagnoses. We collected slide navigation data (i.e., viewport location, magnification level, and time) from 13 pathologists in 2 groups (5 genitourinary (GU) specialists and 8 general pathologists) and generated visual attention heatmaps and scanpaths. Each pathologist examined five WSIs from the TCGA PRAD dataset, which were selected by a GU pathology specialist. We examined and analyzed the distributions of visual attention for each group of pathologists after each WSI was examined. To quantify the relationship between a pathologist’s attention and evidence for cancer in the WSI, we obtained tumor annotations from a genitourinary specialist. We used these annotations to compute the overlap between the distribution of visual attention and annotated tumor region to identify strong correlations. Motivated by this analysis, we trained a deep learning model to predict visual attention on unseen WSIs. We find that the attention heatmaps predicted by our model correlate quite well with the ground truth attention heatmap and tumor annotations on a test set of 17 WSIs by using various spatial and temporal evaluation metrics.
    Chakraborty, Souradeep, Ke Ma, Rajarsi Gupta, Beatrice Knudsen, Gregory J. Zelinsky, Joel H. Saltz, and Dimitris Samaras. 2022. “Visual Attention Analysis Of Pathologists Examining Whole Slide Images Of Prostate Cancer.” In 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), 1–5.