Publications

2025

Out of Distribution Detection in Gastrointestinal Vision by Estimating Nearest Centroid Distance Deficit

Sandesh Pokhrel, Sanjay Bhandari, Sharib Ali, Tryphon Lambrou, Anh Nguyen, Yash Raj Shrestha, Angus Watson, Danail Stoyanov, Prashnna Gyawali, Binod Bhattarai
Medical Image Understanding and Analysis (MIUA) 2025
Peer Reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@misc{pokhrel2024ncddnearestcentroiddistance,
  title={NCDD: Nearest Centroid Distance Deficit for Out-Of-Distribution Detection in Gastrointestinal Vision},
  author={Sandesh Pokhrel and Sanjay Bhandari and Sharib Ali and Tryphon Lambrou and Anh Nguyen and Yash Raj Shrestha and Angus Watson and Danail Stoyanov and Prashnna Gyawali and Binod Bhattarai},
  year={2024},
  eprint={2412.01590},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.01590},
}
                             
Web Summary
The integration of deep learning tools in gastrointestinal vision holds the potential for significant advancements in diagnosis, treatment, and overall patient care.
A major challenge, however, is these tools’ tendency to make overconfident predictions, even when encountering unseen or newly emerging disease patterns, undermining their reliability. We address this critical issue of reliability by framing it as an out-of-distribution (OOD) detection problem, where previously unseen and emerging diseases are identified as OOD examples. However, gastrointestinal images pose a unique challenge due to the overlapping feature representations between In-Distribution (ID) and OOD examples. Existing approaches often overlook this characteristic, as they are primarily developed for natural image datasets, where feature distinctions are more apparent.
Despite the overlap, we hypothesize that the features of an in-distribution example will cluster closer to the centroids of their ground truth class, resulting in a shorter distance to the nearest centroid. In contrast, OOD examples maintain an equal distance from all class centroids. Based on this observation, we propose a novel nearest centroid distance deficit (NCDD) score in the feature space for gastrointestinal OOD detection. Evaluations across multiple deep learning architectures and two publicly available benchmarks, Kvasir2 and Gastrovision, demonstrate the effectiveness of our approach compared to several state-of-the-art methods.
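As a rough illustration of the centroid-distance idea described above, the following NumPy sketch scores a test feature by how much closer it sits to its nearest class centroid than to the remaining centroids. The exact NCDD formulation is given in the paper; the shapes and the deficit definition here are assumptions for illustration only.

import numpy as np

def class_centroids(features, labels, num_classes):
    """Mean feature vector per in-distribution class."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def ncdd_like_score(feature, centroids):
    """Higher score -> more OOD-like (roughly equidistant from all centroids)."""
    dists = np.linalg.norm(centroids - feature, axis=1)
    nearest = dists.min()
    others = (dists.sum() - nearest) / (len(dists) - 1)
    return nearest - others  # ID samples: strongly negative; OOD samples: close to zero

# toy usage with random stand-in features (hypothetical shapes)
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(100, 16))
train_labels = rng.integers(0, 4, size=100)
cents = class_centroids(train_feats, train_labels, num_classes=4)
print(ncdd_like_score(rng.normal(size=16), cents))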
2025

Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models

Bidur Khanal, Sandesh Pokhrel, Sanjay Bhandari, Ramesh Rana, Nikesh Shrestha, Ram Bahadur Gurung, Cristian Linte, Angus Watson, Yash Raj Shrestha, Binod Bhattarai
Medical Image Computing and Computer Assisted Intervention (MICCAI) 2025
Peer Reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@misc{khanal2025hallucinationawaremultimodalbenchmarkgastrointestinal,
title={Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models},
author={Bidur Khanal and Sandesh Pokhrel and Sanjay Bhandari and Ramesh Rana and Nikesh Shrestha and Ram Bahadur Gurung and Cristian Linte and Angus Watson and Yash Raj Shrestha and Binod Bhattarai},
year={2025},
eprint={2505.07001},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.07001},
 }
                            
Web Summary
Vision-Language Models (VLMs) are becoming increasingly popular in the medical domain, bridging the gap between medical images and clinical language. Existing VLMs demonstrate an impressive ability to comprehend medical images and text queries to generate detailed, descriptive diagnostic medical reports. However, hallucination--the tendency to generate descriptions that are inconsistent with the visual content--remains a significant issue in VLMs, with particularly severe implications in the medical field.
To facilitate VLM research on gastrointestinal (GI) image analysis and study hallucination, we curate a multimodal image-text GI dataset: Gut-VLM. This dataset is created using a two-stage pipeline: first, descriptive medical reports of Kvasir-v2 images are generated using ChatGPT, which introduces some hallucinated or incorrect texts. In the second stage, medical experts systematically review these reports, and identify and correct potential inaccuracies to ensure high-quality, clinically reliable annotations.
Unlike traditional datasets that contain only descriptive texts, our dataset also features tags identifying hallucinated sentences and their corresponding corrections. A common approach to reducing hallucination in VLMs is to finetune the model on a small-scale, problem-specific dataset. However, we take a different strategy using our dataset. Instead of finetuning the VLM solely for generating textual reports, we finetune it to detect and correct hallucinations, an approach we call hallucination-aware finetuning. Our results show that this approach is better than simply finetuning for descriptive report generation. Additionally, we conduct an extensive evaluation of state-of-the-art VLMs across several metrics, establishing a benchmark.
https://github.com/bhattarailab/Hallucination-Aware-VLM
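For readers unfamiliar with the hallucination-tagged format, the following hypothetical Python record illustrates the kind of annotation the summary describes: per-sentence hallucination flags plus expert corrections. The field names and file layout are assumptions for illustration; the actual format of Gut-VLM is defined by the released dataset.

# Hypothetical illustration only; the real Gut-VLM schema is in the repository above.
record = {
    "image": "kvasir_v2/polyp_0042.jpg",                 # Kvasir-v2 source image (hypothetical path)
    "chatgpt_report": [
        "The mucosa shows a small sessile polyp.",
        "Active bleeding is visible in the lower field.",  # hallucinated sentence
    ],
    "hallucination_tags": [False, True],                   # expert-reviewed flag per sentence
    "corrections": {1: "No active bleeding is visible."},  # expert correction for sentence index 1
}

# Hallucination-aware finetuning trains the VLM to produce the tags and corrections,
# rather than only a free-form descriptive report.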
2025

FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Karim Lekadir, Alejandro F Frangi, Antonio R Porras, Ben Glocker, Celia Cintas, Curtis P Langlotz, Eva Weicken, Folkert W Asselbergs, Fred Prior, Gary S Collins, Georgios Kaissis, Gianna Tsakou, Irène Buvat, Jayashree Kalpathy-Cramer, John Mongan, Julia A Schnabel, Kaisar Kushibar, Katrine Riklund, et al.
British Medical Journal (BMJ)
Peer Reviewed Journal article
Transforming Global health with AI (TOGAI)
BibTeX
@misc{}
            
Web Summary
Despite major advances in artificial intelligence (AI) research for healthcare, the deployment and adoption of AI technologies remain limited in clinical practice. This paper describes the FUTURE-AI framework, which provides guidance for the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI Consortium was founded in 2021 and comprises 117 interdisciplinary experts from 50 countries representing all continents, including AI scientists, clinical researchers, biomedical ethicists, and social scientists. Over a two year period, the FUTURE-AI guideline was established through consensus based on six guiding principles—fairness, universality, traceability, usability, robustness, and explainability. To operationalise trustworthy AI in healthcare, a set of 30 best practices were defined, addressing technical, clinical, socioethical, and legal dimensions. The recommendations cover the entire lifecycle of healthcare AI, from design, development, and validation to regulation, deployment, and monitoring.
2025

Transforming healthcare through just, equitable and quality driven artificial intelligence solutions in South Asia

Sushmita Adhikari, Iftikhar Ahmed, Deepak Bajracharya, Bishesh Khanal, Chandrasegarar Solomon, Kapila Jayaratne, Khondaker Abdullah Al Mamum, Muhammad Shamim Hayder Talukder, Sunila Shakya, Suresh Manandhar, Zahid Ali Memon, Moinul Haque Chowdhury, Ihtesham ul Islam, Noor Sabah Rakhshani & M. Imran Khan
npj Digital Medicine (nature)
Peer Reviewed Journal article
Transforming Global health with AI (TOGAI)
BibTeX
@misc{}
Web Summary
AI can transform healthcare in LMICs by improving access, reducing costs, and enhancing efficiency. However, challenges such as safety, bias, and resource constraints need to be addressed. Further, collaboration across domains is essential to develop capacity, user-friendly tools, and training. Ethical considerations should be central to AI deployment. By emphasizing gender equity, fairness, and responsible design, LMICs can harness AI’s power to enhance healthcare outcomes and advance equitable care.
2025

Multimodal Federated Learning With Missing Modalities through Feature Imputation Network

Pranav Poudel, Aavash Chhetri, Prashnna Gyawali, Georgios Leontidis, Binod Bhattarai
Medical Image Understanding and Analysis (MIUA) 2025
Peer Reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@misc{poudel2025multimodalfederatedlearningmissing,
 title={Multimodal Federated Learning With Missing Modalities through Feature Imputation Network}, 
 author={Pranav Poudel and Aavash Chhetri and Prashnna Gyawali and Georgios Leontidis and Binod Bhattarai},
 year={2025},
 eprint={2505.20232},
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2505.20232}, 
}
            
Web Summary
Multimodal federated learning holds immense potential for collaboratively training models from multiple sources without sharing raw data, addressing both data scarcity and privacy concerns, two key challenges in healthcare. A major challenge in training multimodal federated models in healthcare is the presence of missing modalities due to multiple reasons, including variations in clinical practice, cost and accessibility constraints, retrospective data collection, privacy concerns, and occasional technical or human errors. Previous methods typically rely on publicly available real datasets or synthetic data to compensate for missing modalities. However, obtaining real datasets for every disease is impractical, and training generative models to synthesize missing modalities is computationally expensive and prone to errors due to the high dimensionality of medical data. In this paper, we propose a novel, lightweight, low-dimensional feature translator to reconstruct bottleneck features of the missing modalities. Our experiments on three different datasets (MIMIC-CXR, NIH Open-I, and CheXpert), in both homogeneous and heterogeneous settings, show consistent improvements over competitive baselines. The code and implementation details are available at:
https://github.com/bhattarailab/FedFeatGen
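The following PyTorch sketch shows what a lightweight, low-dimensional feature translator for missing-modality imputation could look like: a small MLP mapping the available modality's bottleneck features to an estimate of the missing modality's features. The dimensions, layer sizes, and names are illustrative assumptions; the actual architecture is in the repository above.

import torch
import torch.nn as nn

class FeatureImputer(nn.Module):
    """Illustrative feature-level translator: available-modality features -> missing-modality features."""
    def __init__(self, src_dim=512, tgt_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, tgt_dim),
        )

    def forward(self, src_feat):
        return self.net(src_feat)

imputer = FeatureImputer()
img_feat = torch.randn(8, 512)       # bottleneck features of the modality a client does have
txt_feat_hat = imputer(img_feat)     # imputed features standing in for the missing modality
print(txt_feat_hat.shape)            # torch.Size([8, 512])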
2025

NERO: Explainable Out-of-Distribution Detection with Neuron-level Relevance

Anju Chhetri, Jari Korhonen, Prashnna Gyawali, Binod Bhattarai
Medical Image Computing and Computer Assisted Intervention (MICCAI) 2025
Peer Reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@misc{chhetri2025neroexplainableoutofdistributiondetection,
  title={NERO: Explainable Out-of-Distribution Detection with Neuron-level Relevance}, 
  author={Anju Chhetri and Jari Korhonen and Prashnna Gyawali and Binod Bhattarai},
  year={2025},
  eprint={2506.15404},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.15404}, 
}
            
Web Summary
Ensuring reliability is paramount in deep learning, particularly within the domain of medical imaging, where diagnostic decisions often hinge on model outputs. The capacity to separate out-of-distribution (OOD) samples has proven to be a valuable indicator of a model’s reliability in research. In medical imaging, this is especially critical, as identifying OOD inputs can help flag potential anomalies that might otherwise go undetected. While many OOD detection methods rely on feature or logit space representations, recent works suggest these approaches may not fully capture OOD diversity. To address this, we propose a novel OOD scoring mechanism, called NERO, that leverages neuron-level relevance at the feature layer. Specifically, we cluster neuron-level relevance for each in-distribution (ID) class to form representative centroids and introduce a relevance distance metric to quantify a new sample’s deviation from these centroids, enhancing OOD separability. Additionally, we refine performance by incorporating scaled relevance in the bias term and combining feature norms. Our framework also enables explainable OOD detection. We validate its effectiveness across multiple deep learning architectures on the gastrointestinal imaging benchmarks Kvasir and GastroVision, achieving improvements over state-of-the-art OOD detection methods.
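A rough NumPy sketch of the relevance-distance scoring idea follows. It assumes a per-sample neuron-level relevance vector is already available (e.g. from a relevance propagation pass) and omits the bias-term scaling and feature-norm refinements mentioned in the summary.

import numpy as np

def relevance_centroids(relevances, labels, num_classes):
    """Per-class mean of neuron-level relevance vectors from ID training data."""
    return np.stack([relevances[labels == c].mean(axis=0) for c in range(num_classes)])

def relevance_distance_score(relevance, centroids):
    """Distance from a sample's relevance pattern to its closest ID class centroid;
    larger distances suggest the sample is out-of-distribution."""
    return np.linalg.norm(centroids - relevance, axis=1).min()

# toy usage with stand-in relevance vectors (hypothetical shapes)
rels = np.abs(np.random.randn(60, 32))
labels = np.random.randint(0, 3, size=60)
cents = relevance_centroids(rels, labels, num_classes=3)
print(relevance_distance_score(np.abs(np.random.randn(32)), cents))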
2025

Assistive Artificial Intelligence in Epilepsy and Its Impact on Epilepsy Care in Low- and Middle-Income Countries

Nabin Koirala, Shishir Raj Adhikari, Mukesh Adhikari, Taruna Yadav, Abdul Rauf Anwar, Dumitru Ciolac, Bibhusan Shrestha, Ishan Adhikari, Bishesh Khanal, Muthuraman Muthuraman
Brain Sciences (MDPI) 2025
Peer Reviewed Journal article
Transforming Global health with AI (TOGAI)
BibTeX
@article{brainsci15050481,
  title={Assistive Artificial Intelligence in Epilepsy and Its Impact on Epilepsy Care in Low- and Middle-Income Countries},
  author={Koirala, Nabin and Adhikari, Shishir Raj and Adhikari, Mukesh and Yadav, Taruna and Anwar, Abdul Rauf and Ciolac, Dumitru and Shrestha, Bibhusan and Adhikari, Ishan and Khanal, Bishesh and Muthuraman, Muthuraman},
  year={2025},
  journal={Brain Sciences},
  doi={10.3390/brainsci15050481},
  url={https://www.mdpi.com/2076-3425/15/5/481},
}
                            
Web Summary
Epilepsy, one of the most common neurological diseases in the world, affects around 50 million people, with a notably disproportionate prevalence in individuals residing in low- and middle-income countries (LMICs). Alarmingly, over 80% of annual epilepsy-related fatalities occur within LMICs. The burden of the disease assessed using Disability Adjusted Life Years (DALYs) shows that epilepsy accounts for about 13 million DALYs per year, with LMICs bearing most of this burden due to the disproportionately high diagnostic and treatment gaps. Furthermore, LMICs also endure a significant financial burden, with the cost of epilepsy reaching up to 0.5% of the Gross National Product (GNP) in some cases. Difficulties in the appropriate diagnosis and treatment are complicated by the lack of trained medical specialists. Therefore, in these conditions, adopting artificial intelligence (AI)-based solutions may improve epilepsy care in LMICs. In this theoretical and critical review, we focus on epilepsy and its management in LMICs, as well as on the employment of AI technologies to aid epilepsy care in LMICs. We begin with a general introduction of epilepsy and present basic diagnostic and treatment approaches. We then explore the socioeconomic impact, treatment gaps, and efforts made to mitigate these issues. Taking this step further, we examine recent AI-related developments and their potential as assistive tools in clinical application in LMICs, along with proposals for future directions. We conclude by suggesting the need for scalable, low-cost AI solutions that align with the local infrastructure, policy and community engagement to improve epilepsy care in LMICs.
2025

Multimodal Lead-Specific Modeling of ECG for Low-Cost Pulmonary Hypertension Assessment

Mohammod N. I. Suvon, Shuo Zhou, Prasun C. Tripathi, Wenrui Fan, Samer Alabed, Bishesh Khanal, Venet Osmani, Andrew J. Swift, Chen (Cherise) Chen, Haiping Lu
Pre-Print
Transforming Global health with AI (TOGAI)
BibTeX
@misc{suvon2025multimodalleadspecificmodelingecg,
title={Multimodal Lead-Specific Modeling of ECG for Low-Cost Pulmonary Hypertension Assessment},
author={Mohammod N. I. Suvon and Shuo Zhou and Prasun C. Tripathi and Wenrui Fan and Samer Alabed and Bishesh Khanal and Venet Osmani and Andrew J. Swift and Chen Chen and Haiping Lu},
year={2025},
eprint={2503.13470},
archivePrefix={arXiv},
primaryClass={eess.SP},
url={https://arxiv.org/abs/2503.13470},
}
                            
Web Summary
Pulmonary hypertension (PH) is frequently underdiagnosed in low- and middle-income countries (LMICs) primarily due to the scarcity of advanced diagnostic tools. Several studies in PH have applied machine learning to low-cost diagnostic tools like 12-lead ECG (12L-ECG), but they mainly focus on areas with limited resources, overlooking areas with no diagnostic tools, such as rural primary healthcare in LMICs. Recent studies have shown the effectiveness of 6-lead ECG (6L-ECG), as a cheaper and portable alternative in detecting various cardiac conditions, but its clinical value for PH detection is not well proved. Furthermore, existing methods treat 12L-/6L-ECG as a single modality, capturing only shared features while overlooking lead-specific features essential for identifying complex cardiac hemodynamic changes. In this paper, we propose Lead-Specific Electrocardiogram Multimodal Variational Autoencoder (LS-EMVAE), a model pre-trained on large-population 12L-ECG data and fine-tuned on task-specific data (12L-ECG or 6L-ECG). LS-EMVAE treats each lead in 12L-ECG as a separate modality and incorporates a novel hierarchical modality expert composition mechanism based on Mixture of Expert and Product of Expert to enable flexible, adaptive latent feature fusion among lead-specific and shared features. Unlike existing approaches, LS-EMVAE makes better predictions on both 12L-ECG and 6L-ECG at inference time, making it an equitable solution for areas with both limited and no diagnostic tools. We pre-trained LS-EMVAE on 800,000 publicly available 12L-ECG samples and fine-tuned it for two tasks: 1) PH detection and 2) phenotyping pre-/post-capillary PH, on in-house datasets of 892 and 691 subjects across 12L-ECG and 6L-ECG settings. Extensive experiments show that LS-EMVAE outperforms existing baselines in both ECG settings, while 6L-ECG achieves performance comparable to 12L-ECG, unlocking its potential for global PH screening in areas without diagnostic tools.
2025

Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation

Suman Sapkota, Binod Bhattarai
Conference on Parsimony and Learning (CPAL) 2025
Peer Reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@misc{sapkota2024dimensionmixergroupmixing, 
  title={Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation}, 
  author={Suman Sapkota and Binod Bhattarai}, 
  year={2024}, 
  eprint={2311.18735}, 
  archivePrefix={arXiv}, 
  primaryClass={cs.LG}, 
  url={https://arxiv.org/abs/2311.18735}, 
}
            
Web Summary
The recent success of multiple neural architectures like CNNs, Transformers, and MLP-Mixers motivated us to look for similarities and differences between them. We found that these architectures can be interpreted through the lens of a general concept of dimension mixing. Research on coupling flows and the butterfly transform shows that partial and hierarchical signal mixing schemes are sufficient for efficient and expressive function approximation. In this work, we study group-wise sparse, non-linear, multi-layered and learnable mixing schemes of inputs and find that they are complementary to many standard neural architectures. Following our observations and drawing inspiration from the Fast Fourier Transform, we generalize the Butterfly Structure to use a non-linear mixer function, allowing MLPs as mixing functions, which we call Butterfly MLP. We also sparsely mix along the sequence dimension for Transformer-based architectures, which we call Butterfly Attention. Experiments on CIFAR and LRA datasets demonstrate that the proposed Non-Linear Butterfly Mixers are efficient and scale well when the host architectures are used as mixing functions. Additionally, we propose Patch-Only MLP-Mixer for processing spatial 2D signals, demonstrating a different dimension mixing strategy.
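The following PyTorch sketch illustrates group-wise dimension mixing in the spirit of Butterfly MLP: input dimensions are split into small groups, each group is mixed by a shared MLP, and a strided shuffle between layers lets information propagate across groups. The group size, depth, and shuffle used here are illustrative assumptions rather than the paper's exact configuration.

import torch
import torch.nn as nn

class GroupMix(nn.Module):
    """Mix input dimensions within small groups, then shuffle so later layers mix across groups."""
    def __init__(self, dim=16, group=4):
        super().__init__()
        assert dim % group == 0
        self.group = group
        self.mlp = nn.Sequential(nn.Linear(group, group), nn.GELU(), nn.Linear(group, group))

    def forward(self, x):                       # x: (batch, dim)
        b, d = x.shape
        x = x.view(b, d // self.group, self.group)
        x = self.mlp(x)                         # non-linear mixing within each group
        x = x.transpose(1, 2).reshape(b, d)     # strided shuffle of dimensions between layers
        return x

x = torch.randn(2, 16)
mixer = nn.Sequential(GroupMix(), GroupMix())
print(mixer(x).shape)   # torch.Size([2, 16])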
2024

NLPineers@ NLU of Devanagari Script Languages 2025: Hate speech detection using ensembling of BERT-based models

Anmol Guragain, Nadika Poudel, Rajesh Piryani, Bishesh Khanal
CHiPSAL: Challenges in Processing South Asian Languages. COLING 2025
Workshop and Challenge article
Transforming Global health with AI (TOGAI)
BibTeX
@misc{guragain2024nlpineersnludevanagariscript,
      title={NLPineers@ NLU of Devanagari Script Languages 2025: Hate Speech Detection using Ensembling of BERT-based models}, 
      author={Anmol Guragain and Nadika Poudel and Rajesh Piryani and Bishesh Khanal},
      year={2024},
      eprint={2412.08163},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.08163}, 
}
            
Web Summary
This paper explores hate speech detection in Devanagari-scripted languages, focusing on Hindi and Nepali, for Subtask B of the CHiPSAL@COLING 2025 Shared Task. Using a range of transformer-based models such as XLM-RoBERTa, MURIL, and IndicBERT, we examined their effectiveness in navigating the nuanced boundary between hate speech and free expression. Our best performing model, implemented as an ensemble of multilingual BERT models, achieved a Recall of 0.7762 (Rank 3/31 in terms of recall) and an F1 score of 0.6914 (Rank 17/31). To address class imbalance, we used backtranslation for data augmentation, and cosine similarity to preserve label consistency after augmentation. This work emphasizes the need for hate speech detection in Devanagari-scripted languages and presents a foundation for further research. The code can be accessed at:
https://github.com/Anmol2059/NLPineers
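A minimal sketch of the soft-voting ensembling mentioned in the summary: average the per-model softmax probabilities and take the argmax. The logits here are random stand-ins; in practice each would come from a fine-tuned multilingual BERT-family checkpoint (e.g. XLM-RoBERTa, MURIL, IndicBERT).

import torch

def ensemble_predict(list_of_logits):
    """Average per-model softmax probabilities and pick the argmax label."""
    probs = [torch.softmax(l, dim=-1) for l in list_of_logits]
    mean_probs = torch.stack(probs).mean(dim=0)
    return mean_probs.argmax(dim=-1)

# toy usage: 3 models, a batch of 2 sentences, binary hate/non-hate labels
logits = [torch.randn(2, 2) for _ in range(3)]
print(ensemble_predict(logits))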
2024

Exploring transfer learning in medical image segmentation using vision-language models

Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal
Medical Imaging with Deep Learning (MIDL) 2024
Peer Reviewed Conference article
Transforming Global health with AI (TOGAI)
BibTeX
@article{poudel2023exploring, 
  title={Exploring transfer learning in medical image segmentation using vision-language models}, 
  author={Poudel, Kanchan and Dhakal, Manish and Bhandari, Prasiddha and Adhikari, Rabin and Thapaliya, Safal and Khanal, Bishesh}, 
  journal={arXiv preprint arXiv:2308.07706}, 
  year={2023}
}
            
Web Summary
Medical image segmentation allows quantifying target structure size and shape, aiding in disease diagnosis, prognosis, surgery planning, and comprehension. Building upon recent advancements in foundation Vision-Language Models (VLMs) from natural image-text pairs, several studies have proposed adapting them to Vision-Language Segmentation Models (VLSMs) that allow using language text as an additional input to segmentation models. Introducing auxiliary information via text with human-in-the-loop prompting during inference opens up unique opportunities, such as open vocabulary segmentation and potentially more robust segmentation models against out-of-distribution data.
Although transfer learning from natural to medical images has been explored for image-only segmentation models, the joint representation of vision-language in segmentation problems remains underexplored. This study introduces the first systematic study on transferring VLSMs to 2D medical images, using carefully curated 11 datasets encompassing diverse modalities and insightful language prompts and experiments. Our findings demonstrate that although VLSMs show competitive performance compared to image-only models for segmentation after finetuning in limited medical image datasets, not all VLSMs utilize the additional information from language prompts, with image features playing a dominant role. While VLSMs exhibit enhanced performance in handling pooled datasets with diverse modalities and show potential robustness to domain shifts compared to conventional segmentation models, our results suggest that novel approaches are required to enable VLSMs to leverage the various auxiliary information available through language prompts. The code and datasets are available at:
https://github.com/naamiinepal/medvlsm.
2024

TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models

Rabin Adhikari, Safal Thapaliya, Manish Dhakal, Bishesh Khanal
Asian Conference on Computer Vision (ACCV) 2024
Peer Reviewed Conference article
Transforming Global health with AI (TOGAI)
BibTeX
@article{adhikari2024tunevlsegprompttuningbenchmark,
      title={TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models}, 
      author={Rabin Adhikari and Safal Thapaliya and Manish Dhakal and Bishesh Khanal},
      year={2024},
      eprint={2410.05239},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.05239}, 
}
            
Web Summary
Vision-Language Models (VLMs) have shown impressive performance in vision tasks, but adapting them to new domains often requires expensive fine-tuning. Prompt tuning techniques, including textual, visual, and multimodal prompting, offer efficient alternatives by leveraging learnable prompts. However, their application to Vision-Language Segmentation Models (VLSMs) and evaluation under significant domain shifts remain unexplored. This work presents an open-source benchmarking framework, TuneVLSeg, to integrate various unimodal and multimodal prompt tuning techniques into VLSMs, making prompt tuning usable for downstream segmentation datasets with any number of classes. TuneVLSeg includes 6 prompt tuning strategies on various prompt depths used in 2 VLSMs, totaling 8 different combinations. We test various prompt tuning strategies on 8 diverse medical datasets, including 3 radiology datasets (breast tumor, echocardiograph, chest X-ray pathologies) and 5 non-radiology datasets (polyp, ulcer, skin cancer), and two natural domain segmentation datasets. Our study found that textual prompt tuning struggles under significant domain shifts, from natural-domain images to medical data. Furthermore, visual prompt tuning, with fewer hyperparameters than multimodal prompt tuning, often achieves performance competitive to multimodal approaches, making it a valuable first attempt. Our work advances the understanding and applicability of different prompt-tuning techniques for robust domain-specific segmentation. The source code is available at:
https://github.com/naamiinepal/tunevlseg
2024

Deep-learning assisted detection and quantification of (oo) cysts of Giardia and Cryptosporidium on smartphone microscopy images

Suprim Nakarmi, Sanam Pudasaini, Safal Thapaliya, Pratima Upretee, Retina Shrestha, Basant Giri, Bhanu Bhakta Neupane, Bishesh Khanal
Machine Learning for Biomedical Imaging (MELBA) 2024
Published at Machine Learning for Biomedical Imaging, 2024
Peer Reviewed Journal article
Transforming Global health with AI (TOGAI)
BibTeX
@article{nakarmi2023deep, 
  title={Deep-learning assisted detection and quantification of (oo) cysts of Giardia and Cryptosporidium on smartphone microscopy images}, 
  author={Nakarmi, Suprim and Pudasaini, Sanam and Thapaliya, Safal and Upretee, Pratima and Shrestha, Retina and Giri, Basant and Neupane, Bhanu Bhakta and Khanal, Bishesh}, 
  journal={arXiv preprint arXiv:2304.05339}, 
  year={2023} 
}
            
Web Summary
The consumption of microbial-contaminated food and water is responsible for the deaths of millions of people annually. Smartphone-based microscopy systems are portable, low-cost, and more accessible alternatives for the detection of Giardia and Cryptosporidium than traditional brightfield microscopes. However, the images from smartphone microscopes are noisier and require manual cyst identification by trained technicians, usually unavailable in resource-limited settings. Automatic detection of (oo)cysts using deep-learning-based object detection could offer a solution for this limitation. We evaluate the performance of four state-of-the-art object detectors to detect (oo)cysts of Giardia and Cryptosporidium on a custom dataset that includes both smartphone and brightfield microscopic images from vegetable samples. Faster RCNN, RetinaNet, You Only Look Once (YOLOv8s), and Deformable Detection Transformer (Deformable DETR) deep-learning models were employed to explore their efficacy and limitations. Our results show that while the deep-learning models perform better with the brightfield microscopy image dataset than the smartphone microscopy image dataset, the smartphone microscopy predictions are still comparable to the prediction performance of non-experts. Also, we publicly release brightfield and smartphone microscopy datasets with the benchmark results for the detection of Giardia and Cryptosporidium, independently captured on reference (or standard lab setting) and vegetable samples.
2024

T2FNorm: Train-time Feature Normalization for OOD Detection in Image Classification

Sudarshan Regmi, Bibek Panthi, Yifei Ming, Prashnna Gyawali, Danail Stoyanov, Binod Bhattarai
Computer Vision and Pattern Recognition (CVPR) Workshop 2024
Peer-reviewed Workshop article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@inproceedings{regmi2024t2fnorm,
  title={T2FNorm: Train-time Feature Normalization for OOD Detection in Image Classification},
  author={Regmi, Sudarshan and Panthi, Bibek and Dotel, Sakar and Gyawali, Prashnna K and Stoyanov, Danail and Bhattarai, Binod},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={153--162},
  year={2024}
}
                
Web Summary
Neural networks are notorious for being overconfident predictors, posing a significant challenge to their safe deployment in real-world applications. While feature normalization has garnered considerable attention within the deep learning literature, current train-time regularization methods for Out-of-Distribution (OOD) detection are yet to fully exploit this potential. Indeed, the naive incorporation of feature normalization within neural networks does not guarantee substantial improvement in OOD detection performance. In this work, we introduce T2FNorm, a novel approach to transforming features to hyperspherical space during training, while employing non-transformed space for OOD-scoring purposes. This method yields a surprising enhancement in OOD detection capabilities without compromising model accuracy in in-distribution (ID). Our investigation demonstrates that the proposed technique substantially diminishes the norm of the features of all samples, more so in the case of out-of-distribution samples, thereby addressing the prevalent concern of overconfidence in neural networks. The proposed method also significantly improves various post-hoc OOD detection methods.
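A compact PyTorch sketch of the train-time normalization idea: features are projected onto the unit hypersphere (with a temperature) before the linear head during training, while the raw feature norm is used as an OOD indicator at test time. The backbone, temperature, and scoring choice below are placeholders, not the paper's exact setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedClassifier(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes, tau=0.1):
        super().__init__()
        self.backbone, self.fc, self.tau = backbone, nn.Linear(feat_dim, num_classes), tau

    def forward(self, x):
        feat = self.backbone(x)
        # train-time: hyperspherical (L2-normalized, temperature-scaled) features feed the head
        logits = self.fc(F.normalize(feat, dim=-1) / self.tau)
        return logits, feat

def id_score(model, x):
    """Raw (non-transformed) feature norm; smaller norms tend to indicate OOD samples."""
    with torch.no_grad():
        feat = model.backbone(x)
    return feat.norm(dim=-1)

# toy usage with a random-projection stand-in backbone
model = NormalizedClassifier(nn.Linear(32, 64), feat_dim=64, num_classes=10)
x = torch.randn(4, 32)
logits, _ = model(x)
print(logits.shape, id_score(model, x).shape)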
2024

Task-Aware Active Learning for Endoscopic Polyp Segmentation

Shrawan Kumar Thapa, Pranav Poudel, Sudarshan Regmi, Binod Bhattarai, Danail Stoyanov
Data Engineering in Medical Imaging (DEMI) 2024
Peer-reviewed Workshop article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{
}
                
Web Summary
2024

Investigation of Federated Learning Algorithms for Retinal Optical Coherence Tomography Image Classification with Statistical Heterogeneity

Sanskar Amgain, Prashant Shrestha, Sophia Bano, Ignacio del Valle Torres, Michael Cunniffe, Victor Hernandez, Phil Beales, Binod Bhattarai
Information Processing in Computer-Assisted Interventions (IPCAI)
Peer-reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{amgain2024investigationfederatedlearningalgorithms,
      title={Investigation of Federated Learning Algorithms for Retinal Optical Coherence Tomography Image Classification with Statistical Heterogeneity}, 
      author={Sanskar Amgain and Prashant Shrestha and Sophia Bano and Ignacio del Valle Torres and Michael Cunniffe and Victor Hernandez and Phil Beales and Binod Bhattarai},
      year={2024},
      eprint={2402.10035},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2402.10035}, 
}
                
Web Summary
Purpose: We apply federated learning to train an OCT image classifier, simulating a realistic scenario with multiple clients and statistically heterogeneous data distributions where client data entirely lack samples of some categories.
Methods: We investigate the effectiveness of FedAvg and FedProx to train an OCT image classification model in a decentralized fashion, addressing privacy concerns associated with centralizing data. We partitioned a publicly available OCT dataset across multiple clients under IID and Non-IID settings and conducted local training on the subsets for each client. We evaluated two federated learning methods, FedAvg and FedProx for these settings.
Results: Our experiments on the dataset suggest that under IID settings, both methods perform on par with training on a central data pool. However, the performance of both algorithms declines as we increase the statistical heterogeneity across the client data, while FedProx consistently performs better than FedAvg in the increased heterogeneity settings.
Conclusion: Despite the effectiveness of federated learning in the utilization of private data across multiple medical institutions, the large number of clients and heterogeneous distribution of labels deteriorate the performance of both algorithms. Notably, FedProx appears to be more robust to the increased heterogeneity.
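For reference, the FedProx local objective used in this comparison adds a proximal term that keeps each client's weights close to the current global model; FedAvg corresponds to dropping that term. The sketch below is a generic PyTorch rendering with an illustrative mu value.

import torch

def fedprox_loss(task_loss, local_model, global_model, mu=0.01):
    """FedProx local objective: task loss + (mu/2) * ||w_local - w_global||^2."""
    prox = 0.0
    for w, w_g in zip(local_model.parameters(), global_model.parameters()):
        prox = prox + ((w - w_g.detach()) ** 2).sum()
    return task_loss + 0.5 * mu * prox

# usage inside one client's local step (criterion, model, global_model, optimizer assumed defined):
#   loss = fedprox_loss(criterion(model(x), y), model, global_model, mu=0.01)
#   loss.backward(); optimizer.step()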
2024

iHuman: Instant Animatable Digital Humans From Monocular Videos

Pramish Paudel, Anubhav Khanal, Ajad Chhatkuli, Danda Pani Paudel, Jyoti Tandukar
ECCV: European Conference on Computer Vision (2024): 18th European Conference on Computer Vision, Milan, Italy, (Sept 29 - October 4, 2024), Proceedings, Part I, (2024)
Peer-reviewed Conference article
others
BibTeX
@article{paudel2024ihuman,
  title={iHuman: Instant Animatable Digital Humans From Monocular Videos},
  author={Paudel, Pramish and Khanal, Anubhav and Chhatkuli, Ajad and Paudel, Danda Pani and Tandukar, Jyoti},
  journal={arXiv preprint arXiv:2407.11174},
  year={2024}
}
                
Web Summary
Personalized 3D avatars require an animatable representation of digital humans. Doing so instantly from monocular videos offers scalability to a broad class of users and wide-scale applications. In this paper, we present a fast, simple, yet effective method for creating animatable 3D digital humans from monocular videos. Our method utilizes the efficiency of Gaussian splatting to model both 3D geometry and appearance. However, we observed that naively optimizing Gaussian splats results in inaccurate geometry, thereby leading to poor animations. This work achieves and illustrates the need of accurate 3D mesh-type modelling of the human body for animatable digitization through Gaussian splats. This is achieved by developing a novel pipeline that benefits from three key aspects: (a) implicit modelling of surface’s displacements and the color’s spherical harmonics; (b) binding of 3D Gaussians to the respective triangular faces of the body template; (c) a novel technique to render normals followed by their auxiliary supervision. Our exhaustive experiments on three different benchmark datasets demonstrate the state-of-the-art results of our method, in limited time settings. In fact, our method is faster by an order of magnitude (in terms of training time) than its closest competitor. At the same time, we achieve superior rendering and 3D reconstruction performance under the change of poses.
2024

Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise

Bidur Khanal, Tianhong Dai, Binod Bhattarai, Cristian Linte
Peer-reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{khanal2024active,
  title={Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise},
  author={Khanal, Bidur and Dai, Tianhong and Bhattarai, Binod and Linte, Cristian},
  journal={arXiv preprint arXiv:2407.05973},
  year={2024}
}
                
Web Summary
The robustness of supervised deep learning-based medical image classification is significantly undermined by label noise in the training data. Although several methods have been proposed to enhance classification performance in the presence of noisy labels, they face some challenges: 1) a struggle with class-imbalanced datasets, leading to the frequent overlooking of minority classes as noisy samples; 2) a singular focus on maximizing performance using noisy datasets, without incorporating experts-in-the-loop for actively cleaning the noisy labels. To mitigate these challenges, we propose a two-phase approach that combines Learning with Noisy Labels (LNL) and active learning. This approach not only improves the robustness of medical image classification in the presence of noisy labels but also iteratively improves the quality of the dataset by relabeling the important incorrect labels, under a limited annotation budget. Furthermore, we introduce a novel Variance of Gradients approach in the LNL phase, which complements the loss-based sample selection by also sampling under-represented examples. Using two imbalanced noisy medical classification datasets, we demonstrate that our proposed technique is superior to its predecessors at handling class imbalance by not misidentifying clean samples from minority classes as mostly noisy samples.
2024

How does self-supervised pretraining improve robustness against noisy labels across various medical image classification datasets?

Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Cristian Linte
Pre-print
Transforming Global health with AI (TOGAI)
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{khanal2024does,
  title={How does self-supervised pretraining improve robustness against noisy labels across various medical image classification datasets?},
  author={Khanal, Bidur and Bhattarai, Binod and Khanal, Bishesh and Linte, Cristian},
  journal={arXiv preprint arXiv:2401.07990},
  year={2024}
}
                
Web Summary
Noisy labels can significantly impact medical image classification, particularly in deep learning, by corrupting learned features. Self-supervised pretraining, which doesn’t rely on labeled data, can enhance robustness against noisy labels. However, this robustness varies based on factors like the number of classes, dataset complexity, and training size. In medical images, subtle inter-class differences and modality-specific characteristics add complexity. Previous research hasn’t comprehensively explored the interplay between self-supervised learning and robustness against noisy labels in medical image classification, considering all these factors. In this study, we address three key questions: i) How does label noise impact various medical image classification datasets? ii) Which types of medical image datasets are more challenging to learn and more affected by label noise? iii) How do different self-supervised pretraining methods enhance robustness across various medical image datasets? Our results show that DermNet, among five datasets (Fetal plane, DermNet, COVID-DU-Ex, MURA, NCT-CRC-HE-100K), is the most challenging but exhibits greater robustness against noisy labels. Additionally, contrastive learning stands out among the eight self-supervised methods as the most effective approach to enhance robustness against noisy labels.
2024

CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities

Pranav Poudel, Prashant Shrestha, Sanskar Amgain, Yash Raj Shrestha, Prashnna Gyawali, Binod Bhattarai
Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024
Peer-reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{poudel2024car,
  title={CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities},
  author={Poudel, Pranav and Shrestha, Prashant and Amgain, Sanskar and Shrestha, Yash Raj and Gyawali, Prashnna and Bhattarai, Binod},
  journal={arXiv preprint arXiv:2407.08648},
  year={2024}
}
                
Web Summary
Multimodal AI has demonstrated superior performance over unimodal approaches by leveraging diverse data sources for more comprehensive analysis. However, applying this effectiveness in healthcare is challenging due to the limited availability of public datasets. Federated learning presents an exciting solution, allowing the use of extensive databases from hospitals and health centers without centralizing sensitive data, thus maintaining privacy and security. Yet, research in multimodal federated learning, particularly in scenarios with missing modalities—a common issue in healthcare datasets—remains scarce, highlighting a critical area for future exploration. Toward this, we propose a novel method for multimodal federated learning with missing modalities. Our contribution lies in a novel cross-modal data augmentation by retrieval, leveraging the small publicly available dataset to fill the missing modalities in the clients. Our method learns the parameters in a federated manner, ensuring privacy protection and improving performance in multiple challenging multimodal benchmarks in the medical domain, surpassing several competitive baselines.
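A rough sketch of the retrieval idea: when a client record is missing one modality (say the report text), the most similar image in a small public paired dataset is found by cosine similarity of image features, and its paired text is borrowed as a stand-in. Feature extraction and the exact retrieval criterion are assumptions here, not the paper's precise procedure.

import numpy as np

def retrieve_missing_text(query_img_feat, public_img_feats, public_texts):
    """Borrow the text paired with the most similar public image (cosine similarity)."""
    q = query_img_feat / np.linalg.norm(query_img_feat)
    p = public_img_feats / np.linalg.norm(public_img_feats, axis=1, keepdims=True)
    idx = int((p @ q).argmax())          # nearest public image by cosine similarity
    return public_texts[idx]

# toy usage with random stand-in features
feats = np.random.randn(5, 64)
texts = [f"report {i}" for i in range(5)]
print(retrieve_missing_text(np.random.randn(64), feats, texts))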
2024

TTA-OOD: Test-time Augmentation for Improving Out-of-Distribution Detection in Gastrointestinal Vision

Sandesh Pokhrel, Sanjay Bhandari, Eduard Vazquez, Tryphon Lambrou, Prashnna Gyawali, Binod Bhattarai
Data Engineering in Medical Imaging (DEMI) 2024
Peer-reviewed Workshop article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{pokhrel2024tta,
  title={TTA-OOD: Test-time Augmentation for Improving Out-of-Distribution Detection in Gastrointestinal Vision},
  author={Pokhrel, Sandesh and Bhandari, Sanjay and Vazquez, Eduard and Lambrou, Tryphon and Gyawali, Prashnna and Bhattarai, Binod},
  journal={arXiv preprint arXiv:2407.14024},
  year={2024}
}
                
Web Summary
Deep learning has significantly advanced the field of gastrointestinal vision, enhancing disease diagnosis capabilities. One major challenge in automating diagnosis within gastrointestinal settings is the detection of abnormal cases in endoscopic images. Due to the sparsity of data, this process of distinguishing normal from abnormal cases has faced significant challenges, particularly with rare and unseen conditions. To address this issue, we frame abnormality detection as an out-of-distribution (OOD) detection problem. In this setup, a model trained on In-Distribution (ID) data, which represents a healthy GI tract, can accurately identify healthy cases, while abnormalities are detected as OOD, regardless of their class. We introduce a test-time augmentation segment into the OOD detection pipeline, which enhances the distinction between ID and OOD examples, thereby improving the effectiveness of existing OOD methods with the same model. This augmentation shifts the pixel space, which translates into a more distinct semantic representation for OOD examples compared to ID examples. We evaluated our method against existing state-of-the-art OOD scores, showing improvements with test-time augmentation over the baseline approach.
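A minimal sketch of plugging test-time augmentation into an existing OOD scoring pipeline: several augmented views of the input are scored and the scores aggregated. The flips and the mean aggregation below are illustrative choices, not necessarily those used in the paper.

import torch

def tta_ood_score(score_fn, image, augmentations):
    """score_fn maps a batch of images to per-image OOD scores; aggregate over augmented views."""
    views = torch.stack([aug(image) for aug in augmentations] + [image])
    return score_fn(views).mean()

# toy usage: horizontal/vertical flips as augmentations, feature norm as a stand-in score
augs = [lambda im: torch.flip(im, dims=[-1]), lambda im: torch.flip(im, dims=[-2])]
score_fn = lambda batch: batch.flatten(1).norm(dim=1)
print(tta_ood_score(score_fn, torch.randn(3, 64, 64), augs))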
2024

Cross-Task Data Augmentation by Pseudo-label Generation for Region Based Coronary Artery Instance Segmentation

Sandesh Pokhrel, Sanjay Bhandari, Eduard Vazquez, Yash Raj Shrestha, Binod Bhattarai
Data Engineering in Medical Imaging (DEMI) 2024
Peer-reviewed Workshop article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@misc{pokhrel2024crosstaskdataaugmentationpseudolabel,
  title={Cross-Task Data Augmentation by Pseudo-label Generation for Region Based Coronary Artery Instance Segmentation},
  author={Sandesh Pokhrel and Sanjay Bhandari and Eduard Vazquez and Yash Raj Shrestha and Binod Bhattarai},
  year={2024},
  eprint={2310.05990},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2310.05990}
}
                
Web Summary
Coronary Artery Diseases (CADs), although preventable, are one of the leading causes of death and disability. Diagnosis of these diseases is often difficult and resource-intensive. Angiographic imaging segmentation of the arteries has evolved as a tool of assistance that helps clinicians make an accurate diagnosis. However, due to the limited amount of data and the difficulty in curating a dataset, the task of segmentation has proven challenging. In this study, we introduce the use of pseudo-labels to address the issue of limited data in the angiographic dataset to enhance the performance of the baseline YOLO model. Unlike existing data augmentation techniques that improve the model constrained to a fixed dataset, we introduce the use of pseudo-labels generated on a dataset from a separate, related task to diversify and improve model performance. This method increases the baseline F1 score by 9% on the validation dataset and by 3% on the test dataset.
2024

Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification

Bidur Khanal, Prashant Shrestha, Sanskar Amgain, Bishesh Khanal, Binod Bhattarai, Cristian A. Linte
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2024
Peer Reviewed Conference article
Transforming Global health with AI (TOGAI)
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{Khanal2024InvestigatingTR, 
  title={Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification}, 
  author={Bidur Khanal and Prashant Shrestha and Sanskar Amgain and Bishesh Khanal and Binod Bhattarai and Cristian A. Linte}, 
  journal={ArXiv}, 
  year={2024}, 
  volume={abs/2402.16734}, 
  url={https://api.semanticscholar.org/CorpusID:268678393} 
}
          
Web Summary
Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of the model. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. However, in recent years, Vision Transformer (ViT)-based backbones have replaced CNNs, demonstrating improved performance and a greater ability to learn more generalizable features, especially when the dataset is large. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image classification. In this paper, we investigate the architectural robustness of ViT against label noise and compare it to that of CNNs. We use two medical image classification datasets—COVID-DU-Ex, and NCT-CRC-HE-100K—both corrupted by injecting label noise at various rates. Additionally, we show that pretraining is crucial for ensuring ViT’s improved robustness against label noise in supervised training.
2024

ReweightOOD: Loss Reweighting for Distance-based OOD Detection

Sudarshan Regmi, Bibek Panthi, Yifei Ming, Prashnna Gyawali, Danail Stoyanov, Binod Bhattarai
Data Engineering in Medical Imaging (DEMI) 2024
Peer Reviewed Workshop article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@inproceedings{regmi2024reweightood, 
  title={ReweightOOD: Loss Reweighting for Distance-based OOD Detection}, 
  author={Regmi, Sudarshan and Panthi, Bibek and Ming, Yifei and Gyawali, Prashnna K and Stoyanov, Danail and Bhattarai, Binod}, 
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, 
  year={2024}, 
  pages={131--141} 
}
          
Web Summary
Out-of-Distribution (OOD) detection is crucial for ensuring safety and reliability of neural networks in critical applications. Distance-based OOD detection is based on the assumption that OOD samples are mapped far from In-Distribution (ID) clusters in embedding space. A recent approach for obtaining OOD-detection-friendly embedding space has been contrastive optimization of pulling similar pairs and pushing apart dissimilar pairs. It assigns equal significance to all similarity instances with the implicit objective of maximizing the mean proximity between samples with their corresponding hypothetical class centroids. However, the emphasis should be directed towards reducing the Minimum Enclosing Sphere (MES) for each class and achieving higher inter-class dispersion to effectively mitigate the potential for ID-OOD overlap. Optimizing low-signal dissimilar pairs might potentially act against achieving maximal inter-class dispersion while less-optimized similar pairs prevent achieving smaller MES. Based on this, we propose a reweighting scheme, ReweightOOD, that adopts the similarity optimization which prioritizes the optimization of less-optimized contrasting pairs while assigning lower importance to already well-optimized contrasting pairs. Such a reweighting scheme serves to minimize the MES for each class while achieving maximal inter-class dispersion. Experimental results on a challenging CIFAR100 benchmark using ResNet-18 network demonstrate that ReweightOOD outperforms supervised contrastive loss by a whopping 38% in the average FPR metric. In various classification datasets, our method provides a promising solution for enhancing OOD detection capabilities in neural networks.
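The sketch below illustrates the reweighting mechanism in generic form: each contrastive pair is weighted by how far it still is from its ideal similarity, so poorly optimized pairs dominate the loss. The concrete weighting function in ReweightOOD differs; this only shows the principle.

import torch

def reweighted_contrastive_loss(emb, labels):
    """Pairwise contrastive objective where less-optimized pairs receive larger weights."""
    emb = torch.nn.functional.normalize(emb, dim=-1)
    sim = emb @ emb.t()                                   # cosine similarities between samples
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    off_diag = 1 - torch.eye(len(emb))
    # residual "distance from ideal" per pair: positives want sim = 1, negatives sim = -1
    residual = torch.where(same.bool(), 1 - sim, 1 + sim) * off_diag
    weights = residual.detach()                           # larger residual -> larger weight
    return (weights * residual).sum() / off_diag.sum()

print(reweighted_contrastive_loss(torch.randn(6, 8), torch.tensor([0, 0, 1, 1, 2, 2])))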
2024

AI-Assisted Cervical Cancer Screening

Kanchan Poudel, Lisasha Poudel, Prabin Raj Shakya, Atit Poudel, Archana Shrestha, Bishesh Khanal
Pre Print
Transforming Global health with AI (TOGAI)
BibTeX
@article{Poudel2024AIAssistedCC, 
  title={AI-Assisted Cervical Cancer Screening}, 
  author={Poudel, Kanchan and Poudel, Lisasha and Shakya, Prabin Raj and Poudel, Atit and Shrestha, Archana and Khanal, Bishesh}, 
  journal={arXiv preprint arXiv:2403.11936}, 
  year={2024} 
}
          
Web Summary
Visual Inspection with Acetic Acid (VIA) remains the most feasible cervical cancer screening test in resource-constrained settings of low- and middle-income countries (LMICs), which are often performed in screening camps or primary/community health centers by nurses instead of the preferred but unavailable expert Gynecologist. To address the highly subjective nature of the test, various handheld devices integrating cameras or smartphones have been recently explored to capture cervical images during VIA and aid decision-making via telemedicine or AI models. Most studies proposing AI models retrospectively use a relatively small number of already collected images from specific devices, digital cameras, or smartphones; the challenges and protocol for quality image acquisition during VIA in resource-constrained camp settings, challenges in getting gold standard, data imbalance, etc. are often overlooked. We present a novel approach and describe the end-to-end design process to build a robust smartphone-based AI-assisted system that does not require buying a separate integrated device: the proposed protocol for quality image acquisition in resource-constrained settings, dataset collected from 1,430 women during VIA performed by nurses in screening camps, preprocessing pipeline, and training and evaluation of a deep-learning-based classification model aimed to identify (pre)cancerous lesions. Our work shows that the readily available smartphones and a suitable protocol can capture the cervix images with the required details for the VIA test well; the deep-learning-based classification model provides promising results to assist nurses in VIA screening; and provides a direction for large-scale data collection and validation in resource-constrained settings.
2024

VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks

Manish Dhakal, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal
Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024
Peer Reviewed Conference article
Transforming Global health with AI (TOGAI)
BibTeX
@article{dhakal2024vlsm, 
  title={VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks}, 
  author={Dhakal, Manish and Adhikari, Rabin and Thapaliya, Safal and Khanal, Bishesh}, 
  journal={arXiv preprint arXiv:2405.06196}, 
  year={2024}
}
          
Web Summary
Foundation Vision-Language Models (VLMs) trained using large-scale open-domain images and text pairs have recently been adapted to develop Vision-Language Segmentation Models (VLSMs) that allow providing text prompts during inference to guide image segmentation. If robust and powerful VLSMs can be built for medical images, it could aid medical professionals in many clinical tasks where they must spend substantial time delineating the target structure of interest. VLSMs for medical images resort to fine-tuning base VLM or VLSM pretrained on open-domain natural image datasets due to fewer annotated medical image datasets; this fine-tuning is resource-consuming and expensive as it usually requires updating all or a significant fraction of the pretrained parameters. Recently, lightweight blocks called adapters have been proposed in VLMs that keep the pretrained model frozen and only train adapters during fine-tuning, substantially reducing the computing resources required. We introduce a novel adapter, VLSM-Adapter, that can fine-tune pretrained vision-language segmentation models using transformer encoders. Our experiments in widely used CLIP-based segmentation models show that with only 3 million trainable parameters, the VLSM-Adapter outperforms state-of-the-art and is comparable to the upper bound end-to-end finetuning.
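A minimal sketch of the adapter pattern referenced above: a small bottleneck MLP with a residual connection inserted into a frozen encoder, with only the adapter parameters trained. The dimensions below are illustrative, not the VLSM-Adapter configuration.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck MLP with a residual connection; the surrounding encoder stays frozen."""
    def __init__(self, dim=512, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))   # residual keeps the frozen path intact

# toy usage: only the adapter's parameters would be passed to the optimizer
tokens = torch.randn(2, 50, 512)    # (batch, tokens, hidden) from a frozen encoder block
print(Adapter()(tokens).shape)      # torch.Size([2, 50, 512])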
2023

Metric Transform: Exploring beyond Affine Transform for Neural Networks

Suman Sapkota, Binod Bhattarai
TinyPapers ICLR 2023
Peer Reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{sapkota2023metric, 
  title={Metric Transform: Exploring beyond Affine Transform for Neural Networks}, 
  author={Sapkota, Suman and Bhattarai, Binod}, 
  year={2023} 
}
          
Web Summary
Artificial Neural Networks (ANNs) of varying architectures are generally paired with a linear transformation at the core. However, we find dot-product neurons with global influence less interpretable compared to the more local influence of Euclidean distance (as used in RBFs). In this work, we explore the generalization of dot-product neurons to lp-norms, metrics, and beyond. We find that such metric transforms perform similarly to the affine transform when used in MLPs or CNNs. Furthermore, we use distance/similarity-measuring neurons to interpret and explain input data, overfitting, and Residual MLPs.
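A small PyTorch sketch of a metric transform layer: each output unit responds with the negative Euclidean distance between the input and its weight vector, giving RBF-like local influence instead of a dot product. The general lp-norm case from the summary is exposed through the p argument; sizes are illustrative.

import torch
import torch.nn as nn

class MetricTransform(nn.Module):
    """Replace the affine (dot-product) transform with a (negative) lp-distance response."""
    def __init__(self, in_dim, out_dim, p=2.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim))
        self.p = p

    def forward(self, x):                        # x: (batch, in_dim)
        diff = x.unsqueeze(1) - self.weight      # (batch, out_dim, in_dim)
        return -diff.norm(p=self.p, dim=-1)      # (batch, out_dim): closer input -> larger output

layer = MetricTransform(8, 4)
print(layer(torch.randn(3, 8)).shape)            # torch.Size([3, 4])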
2023

An objective validation of polyp and instrument segmentation methods in colonoscopy through Medico 2020 polyp segmentation and MedAI 2021 transparency challenges

Debesh Jha, ... , Shruti Shrestha, ... , Sharib Ali, Michael A Riegler, Pål Halvorsen, Ulas Bagci, Thomas De Lange
arXiv preprint arXiv:2307.16262, 2023
Pre Print
Computational Endoscopy, Surgery & Pathology (CESP)
BibTeX
@article{jha2023objective, 
  title={An objective validation of polyp and instrument segmentation methods in colonoscopy through Medico 2020 polyp segmentation and MedAI 2021 transparency challenges}, 
  author={Jha, Debesh and Sharma, Vanshali and Banik, Debapriya and Bhattacharya, Debayan and Roy, Kaushiki and Hicks, Steven A and Tomar, Nikhil Kumar and Thambawita, Vajira and Krenzer, Adrian and Ji, Ge-Peng and others}, 
  journal={arXiv preprint arXiv:2307.16262}, 
  year={2023}
}
          
Web Summary
Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during the live examination can be challenging due to various factors such as variation of skills and experience among the endoscopists, lack of attentiveness, and fatigue leading to a high polyp miss-rate. Deep learning has emerged as a promising solution to this challenge as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time. In addition to the algorithm's accuracy, transparency and interpretability are crucial to explaining the whys and hows of the algorithm's prediction. Further, most algorithms are developed in private data, closed source, or proprietary software, and methods lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we have organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic. For the transparency task, a multi-disciplinary team, including expert gastroenterologists, accessed each submission and evaluated the team based on open-source practices, failure case analysis, ablation studies, usability and understandability of evaluations to gain a deeper understanding of the models' credibility for clinical deployment. Through the comprehensive analysis of the challenge, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage qualitative evaluation for building more transparent and understandable AI-based colonoscopy systems.
2023

Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography

Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari, Bishesh Khanal
Simplifying Medical Ultrasound. ASMUS, 2023
Peer Reviewed Workshop article
Transforming Global health with AI (TOGAI)
BibTeX
@inproceedings{adhikari2023synthetic, 
  title={Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography}, 
  author={Adhikari, Rabin and Dhakal, Manish and Thapaliya, Safal and Poudel, Kanchan and Bhandari, Prasiddha and Khanal, Bishesh}, 
  booktitle={International Workshop on Advances in Simplifying Medical Ultrasound}, 
  pages={89--99}, 
  year={2023}, 
  organization={Springer}
}
          
Web Summary
Accurate segmentation is essential for echocardiography-based assessment of cardiovascular diseases (CVDs). However, the variability among sonographers and the inherent challenges of ultrasound images hinder precise segmentation. By leveraging the joint representation of image and text modalities, Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially aiding in accurate and explainable segmentation. However, the lack of readily available data in echocardiography hampers the training of VLSMs. In this study, we explore using synthetic datasets from Semantic Diffusion Models (SDMs) to enhance VLSMs for echocardiography segmentation. We evaluate results for two popular VLSMs (CLIPSeg and CRIS) using seven different kinds of language prompts derived from several attributes, automatically extracted from echocardiography images, segmentation masks, and their metadata. Our results show improved metrics and faster convergence when pretraining VLSMs on SDM-generated synthetic images before finetuning on real images.
2023

Exploring transfer learning in medical image segmentation using vision-language models

Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal
arXiv preprint arXiv:2308.07706, 2023
Pre-Prints
Transforming Global health with AI (TOGAI)
BibTeX
@article{poudel2023exploring, 
  title={Exploring transfer learning in medical image segmentation using vision-language models}, 
  author={Poudel, Kanchan and Dhakal, Manish and Bhandari, Prasiddha and Adhikari, Rabin and Thapaliya, Safal
  and Khanal, Bishesh}, 
  journal={arXiv preprint arXiv:2308.07706}, 
  year={2023} 
}
          
Web Summary
Medical image segmentation allows quantifying target structure size and shape, aiding in disease diagnosis, prognosis, surgery planning, and treatment. Building upon recent advancements in foundation Vision-Language Models (VLMs) from natural image-text pairs, several studies have proposed adapting them to Vision-Language Segmentation Models (VLSMs) that allow using language text as an additional input to segmentation models. Introducing auxiliary information via text with human-in-the-loop prompting during inference opens up unique opportunities, such as open vocabulary segmentation and potentially more robust segmentation models against out-of-distribution data. Although transfer learning from natural to medical images has been explored for image-only segmentation models, the joint representation of vision-language in segmentation problems remains underexplored. This study introduces the first systematic study on transferring VLSMs to 2D medical images, using 11 carefully curated datasets encompassing diverse modalities, along with insightful language prompts and experiments. Our findings demonstrate that although VLSMs show competitive performance compared to image-only models for segmentation after finetuning on limited medical image datasets, not all VLSMs utilize the additional information from language prompts, with image features playing a dominant role. While VLSMs exhibit enhanced performance in handling pooled datasets with diverse modalities and show potential robustness to domain shifts compared to conventional segmentation models, our results suggest that novel approaches are required to enable VLSMs to leverage the various auxiliary information available through language prompts.
2023

FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Karim Lekadir, Bishesh Khanal, Martijn Starmans
arXiv preprint arXiv:2309.12325 , 2023
Pre-Prints
Transforming Global health with AI (TOGAI)
BibTeX
@article{lekadir2023future, 
  title={FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence
  in healthcare}, 
  author={Lekadir, Karim and Feragen, Aasa and Fofanah, Abdul Joseph and Frangi, Alejandro F and Buyx, Alena
  and Emelie, Anais and Lara, Andrea and Porras, Antonio R and Chan, An-Wen and Navarro, Arcadi and others},
  journal={arXiv preprint arXiv:2309.12325}, 
  year={2023}
}
          
Web Summary
Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline as the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 inter-disciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare, i.e. Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices were defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation towards clinical practice of medical AI.
2023

Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining

Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Cristian A Linte
MICCAI Workshop on Data Engineering in Medical Imaging. DEMI, 2023
Peer Reviewed Workshop article
Transforming Global health with AI (TOGAI)
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@inproceedings{khanal2023improving, 
  title={Improving Medical Image Classification in Noisy Labels Using only Self-supervised Pretraining},
  author={Khanal, Bidur and Bhattarai, Binod and Khanal, Bishesh and Linte, Cristian A}, 
  booktitle={MICCAI Workshop on Data Engineering in Medical Imaging}, 
  pages={78--90}, 
  year={2023}, 
  organization={Springer} 
}
          
Web Summary
Noisy labels hurt deep learning-based supervised image classification performance as the models may overfit the noise and learn corrupted feature extractors. For natural image classification training with noisy labeled data, model initialization with contrastive self-supervised pretrained weights has been shown to reduce feature corruption and improve classification performance. However, no works have explored: i) how other self-supervised approaches, such as pretext task-based pretraining, impact learning with noisy labels, and ii) the use of any self-supervised pretraining method alone for medical images in noisy-label settings. Medical images often feature smaller datasets and subtle inter-class variations, requiring human expertise to ensure correct classification. Thus, it is not clear whether the methods that improve learning with noisy labels on natural image datasets such as CIFAR would also help with medical images. In this work, we explore contrastive and pretext task-based self-supervised pretraining to initialize the weights of a deep learning classification model for two medical datasets with self-induced noisy labels -- NCT-CRC-HE-100K tissue histological images and COVID-QU-Ex chest X-ray images. Our results show that models initialized with pretrained weights obtained from self-supervised learning can effectively learn better features and improve robustness against noisy labels.
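The recipe the summary describes can be sketched roughly as follows (a hedged illustration, not the paper's code; here a randomly initialised ResNet-18 state dict stands in for an actual self-supervised checkpoint):

import torch
import torch.nn as nn
from torchvision.models import resnet18

# A randomly initialised ResNet-18 state dict stands in here for a saved
# self-supervised (e.g. contrastive or pretext-task) checkpoint.
ssl_state = resnet18(weights=None).state_dict()

model = resnet18(weights=None)
model.load_state_dict(ssl_state, strict=False)        # initialise the backbone
model.fc = nn.Linear(model.fc.in_features, 2)         # fresh task-specific head

# The model is then fine-tuned on the noisy-labelled medical dataset as usual.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)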
2023

CholecTriplet2022: Show me a tool and tell me the triplet–an endoscopic vision challenge for surgical action triplet detection

Chinedu Innocent Nwoye, Pranav Poudel, Binod Bhattarai, Shrawan Kumar Thapa, Nicolas Padoy
Medical Image Analysis, 2023
Peer Reviewed Journal article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
 @article{nwoye2023cholectriplet2022,
title={CholecTriplet2022: Show me a tool and tell me the triplet--an endoscopic vision
challenge for surgical action triplet detection},
author={Nwoye, Chinedu Innocent and Yu, Tong and Sharma, Saurav and Murali, Aditya and Alapatt, Deepak and Vardazaryan, Armine and Yuan, Kun and Hajek, Jonas and Reiter, Wolfgang and Yamlahi, Amine and others},
journal={arXiv preprint arXiv:2302.06294}, 
year={2023} 
}
                        
Web Summary
Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of a triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery.
2023

Emerging Avenue of Artificial Intelligence and Ethical Considerations

Bishesh Khanal
2023
Pre-Prints
Transforming Global health with AI (TOGAI)
BibTeX
@article{khanal2023emerging, 
title={Emerging Avenue of Artificial Intelligence and Ethical Considerations}, 
author={Khanal, Bishesh}, 
year={2023}
}
                    
Web Summary
2023

Neural Network Pruning for Real-time Polyp Segmentation

Suman Sapkota, Pranav Poudel, Sudarshan Regmi, Bibek Panthi, Binod Bhattarai
Medical Image Understanding and Analysis (MIUA) 2023
Peer Reviewed Conference article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{sapkota2023neural,
title={Neural Network Pruning for Real-time Polyp Segmentation},
author={Sapkota, Suman and Poudel, Pranav and Regmi, Sudarshan and Panthi, Bibek and Bhattarai, Binod},
journal={arXiv preprint arXiv:2306.13203},
year={2023}
}
Web Summary
Computer-assisted treatment has emerged as a viable application of medical imaging, owing to the efficacy of deep learning models. Real-time inference speed remains a key requirement for such applications to help medical personnel. Even though there generally exists a trade-off between performance and model size, impressive efforts have been made to retain near-original performance while reducing model size. Neural network pruning has emerged as an exciting area that aims to eliminate redundant parameters to make inference faster. In this study, we show an application of neural network pruning to polyp segmentation. We compute the importance score of convolutional filters and remove the filters with the lowest scores, which, up to a certain pruning ratio, does not degrade performance. For computing the importance score, we use the Taylor First Order (TaylorFO) approximation of the change in network output upon removal of certain filters. Specifically, we employ a gradient-normalized backpropagation for the computation of the importance score. Through experiments on polyp datasets, we validate that our approach can significantly reduce the parameter count and FLOPs while retaining similar performance.
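A rough sketch of a Taylor-first-order style filter importance score is shown below (an illustration of the general idea only; the paper's exact gradient normalisation is not reproduced, and the tensors are placeholders):

import torch

def taylor_fo_importance(activation, grad):
    # First-order Taylor estimate of the loss change if a filter is removed:
    # |activation * gradient| summed over spatial positions, averaged over the
    # batch. Shapes: (batch, channels, H, W) -> (channels,).
    return (activation * grad).sum(dim=(2, 3)).abs().mean(dim=0)

# Usage sketch: capture a conv layer's activations and their gradients.
acts = torch.randn(4, 16, 32, 32, requires_grad=True)
loss = acts.pow(2).mean()                          # stand-in for the segmentation loss
grad, = torch.autograd.grad(loss, acts)
print(taylor_fo_importance(acts.detach(), grad))   # one score per filter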
2023

M-VAAL: Multimodal Variational Adversarial Active Learning for Downstream Medical Image Analysis Tasks

Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Danail Stoyanov, Cristian A Linte
Annual Conference on Medical Image Understanding and Analysis (MIUA)
Pre-Prints
Transforming Global health with AI (TOGAI)
BibTeX
@article{khanal2023m,
title={M-VAAL: Multimodal Variational Adversarial Active Learning for Downstream Medical
Image Analysis Tasks}, 
author={Khanal, Bidur and Bhattarai, Binod and Khanal, Bishesh and Stoyanov, Danail and
Linte, Cristian A},
journal={arXiv preprint arXiv:2306.12376}, 
year={2023}
}
                            
Web Summary
Acquiring properly annotated data is expensive in the medical field as it requires experts, time-consuming protocols, and rigorous validation. Active learning attempts to minimize the need for large annotated samples by actively sampling the most informative examples for annotation. These examples contribute significantly to improving the performance of supervised machine learning models, and thus, active learning can play an essential role in selecting the most appropriate information in deep learning-based diagnosis, clinical assessments, and treatment planning. Although some existing works have proposed methods for sampling the best examples for annotation in medical image analysis, they are not task-agnostic and do not use multimodal auxiliary information in the sampler, which has the potential to increase robustness. Therefore, in this work, we propose a Multimodal Variational Adversarial Active Learning (M-VAAL) method that uses auxiliary information from additional modalities to enhance the active sampling. We applied our method to two datasets: i) brain tumor segmentation and multi-label classification using the BraTS2018 dataset, and ii) chest X-ray image classification using the COVID-QU-Ex dataset. Our results show a promising direction toward data-efficient learning under limited annotations.
2023

A Client-server Deep Federated Learning for Cross-domain Surgical Image Segmentation

Ronast Subedi, Rebati Raman Gaire, Sharib Ali, Anh Nguyen, Danail Stoyanov, Binod Bhattarai
Data Engineering in Medical Imaging. DEMI, 2023
Peer Reviewed Workshop article
B Bhattarai MultiModal Learning Lab (MMLL)
Computational Endoscopy, Surgery & Pathology (CESP)
BibTeX
@article{subedi2023client, 
title={A Client-server Deep Federated Learning for Cross-domain Surgical Image Segmentation},
author={Subedi, Ronast and Gaire, Rebati Raman and Ali, Sharib and Nguyen, Anh and Stoyanov, Danail and Bhattarai, Binod}, 
journal={arXiv preprint arXiv:2306.08720},
year={2023}
}
                         
Web Summary
This paper presents a solution to the cross-domain adaptation problem for 2D surgical image segmentation, explicitly considering the privacy protection of distributed datasets belonging to different centers. Deep learning architectures in medical image analysis necessitate extensive training data for better generalization. However, obtaining sufficient diagnostic and surgical data is still challenging, mainly due to the inherent cost of data curation and the need for experts for data annotation. Moreover, increased privacy and legal compliance concerns can make data sharing across clinical sites or regions difficult. Another ubiquitous challenge medical datasets face is the inevitable domain shift among data collected at different centers. To this end, we propose a client-server deep federated architecture for cross-domain adaptation. A server hosts a set of immutable parameters common to both the source and target domains. Each client holds its respective domain-specific parameters and makes requests to the server both while learning its parameters and during inference. We evaluate our framework on two benchmark datasets, demonstrating applicability in computer-assisted interventions for endoscopic polyp segmentation and diagnostic skin lesion detection and analysis. Our extensive quantitative and qualitative experiments demonstrate the superiority of the proposed method compared to competitive baseline and state-of-the-art methods.
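One way to picture the described split, sketched below with assumed layer sizes (this is an illustrative reading of the summary, not the authors' implementation): the server-side block holds frozen parameters shared by all domains, while each client keeps its own domain-specific encoder and decoder and routes features through the shared block.

import torch
import torch.nn as nn

# Server-side block: parameters shared across domains and kept immutable.
server_shared = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
for p in server_shared.parameters():
    p.requires_grad = False

class ClientModel(nn.Module):
    # Each client keeps its own domain-specific encoder/decoder and sends
    # intermediate features through the shared server block.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 32, 3, padding=1)
        self.decoder = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        h = server_shared(h)                    # the "request" to the server
        return self.decoder(h)

print(ClientModel()(torch.randn(1, 3, 64, 64)).shape)   # (1, 1, 64, 64)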
2023

T2FNorm: Extremely Simple Scaled Train-time Feature Normalization for OOD Detection

Sudarshan Regmi, Bibek Panthi, Sakar Dotel, Prashnna K Gyawali, Danail Stoyanov, Binod Bhattarai
arXiv preprint arXiv:2305.17797, 2023
Pre-Prints
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{regmi2023t2fnorm, 
title={T2FNorm: Extremely Simple Scaled Train-time Feature Normalization for OOD Detection},
author={Regmi, Sudarshan and Panthi, Bibek and Dotel, Sakar and Gyawali, Prashnna K and Stoyanov, Danail and Bhattarai, Binod},
journal={arXiv preprint arXiv:2305.17797}, 
year={2023} 
}
                         
Web Summary
Neural networks are notorious for being overconfident predictors, posing a significant challenge to their safe deployment in real-world applications. While feature normalization has garnered considerable attention within the deep learning literature, current train-time regularization methods for Out-of-Distribution (OOD) detection are yet to fully exploit this potential. Indeed, the naive incorporation of feature normalization within neural networks does not guarantee substantial improvement in OOD detection performance. In this work, we introduce T2FNorm, a novel approach to transforming features to hyperspherical space during training, while employing non-transformed space for OOD-scoring purposes. This method yields a surprising enhancement in OOD detection capabilities without compromising model accuracy on in-distribution (ID) data. Our investigation demonstrates that the proposed technique substantially diminishes the norm of the features of all samples, more so in the case of out-of-distribution samples, thereby addressing the prevalent concern of overconfidence in neural networks. The proposed method also significantly improves various post-hoc OOD detection methods.
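A minimal sketch of the idea as described in the summary (assumed temperature value and feature shapes; one plausible reading of how the unnormalised norm could be used for scoring):

import torch
import torch.nn.functional as F

tau = 0.1   # scaling factor for the hypersphere (an assumed value)

def train_time_features(z):
    # During training, penultimate features are projected onto a scaled
    # hypersphere before the classification head.
    return F.normalize(z, dim=1) / tau

def ood_score(z):
    # At test time the *unnormalised* feature norm is used for scoring;
    # in-distribution samples tend to retain larger norms.
    return z.norm(dim=1)

z = torch.randn(8, 128)                          # penultimate-layer features
print(train_time_features(z).shape, ood_score(z).shape)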
2023

Histogram of Oriented Gradients meet deep learning: A novel multi-task deep network for 2D surgical image semantic segmentation

Binod Bhattarai, Ronast Subedi, Rebati Raman Gaire, Eduard Vazquez, Danail Stoyanov
Medical Image Analysis, 2023
Peer Reviewed Journal article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{bhattarai2022histogram, 
title={Histogram of Oriented Gradients Meet Deep Learning: A Novel Multi-task Deep Network for Medical Image Semantic Segmentation},
author={Bhattarai, Binod and Subedi, Ronast and Gaire, Rebati Raman and Vazquez, Eduard and Stoyanov, Danail},
journal={arXiv preprint arXiv:2204.01712}, 
year={2022}
}
                         
Web Summary
We present our novel deep multi-task learning method for medical image segmentation. Existing multi-task methods demand ground truth annotations for both the primary and auxiliary tasks. In contrast, we propose to generate the pseudo-labels of an auxiliary task in an unsupervised manner. To generate the pseudo-labels, we leverage Histograms of Oriented Gradients (HOGs), one of the most widely used and powerful hand-crafted features for detection. Together with the ground truth semantic segmentation masks for the primary task and pseudo-labels for the auxiliary task, we learn the parameters of the deep network to minimise the loss of both the primary task and the auxiliary task jointly. We applied our method to two powerful and widely used semantic segmentation networks, UNet and U2Net, training them in a multi-task setup. To validate our hypothesis, we performed experiments on two different medical image segmentation datasets. From the extensive quantitative and qualitative results, we observe that our method consistently improves performance compared to the counterpart methods. Moreover, our method won the FetReg EndoVis sub-challenge on semantic segmentation organised in conjunction with MICCAI 2021.
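A small sketch of how HOG descriptors could serve as unsupervised auxiliary targets in a joint loss (loss weights, descriptor parameters and tensor shapes are assumptions for illustration, not the paper's exact setup):

import numpy as np
import torch
import torch.nn.functional as F
from skimage.feature import hog

def hog_pseudo_label(image):
    # Unsupervised auxiliary target: a HOG descriptor of a (H, W) grayscale image.
    feat = hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return torch.tensor(feat, dtype=torch.float32)

def joint_loss(seg_logits, seg_mask, hog_pred, hog_target, weight=0.1):
    # Ground-truth masks supervise the primary head; HOG pseudo-labels
    # supervise the auxiliary head.
    primary = F.cross_entropy(seg_logits, seg_mask)
    auxiliary = F.mse_loss(hog_pred, hog_target)
    return primary + weight * auxiliary

print(hog_pseudo_label(np.random.rand(64, 64)).shape)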
2023

Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Bone Shape Reconstruction

Mahesh Shakya, Bishesh Khanal
NeurIPS, 2023
Peer Reviewed Conference article
Transforming Global health with AI (TOGAI)
BibTeX
@inproceedings{shakya2023benchmarking,
title={Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Bone Shape Reconstruction},
author={Shakya, Mahesh and Khanal, Bishesh},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2023}
}
                        
Web Summary
Various deep learning models have been proposed for 3D bone shape reconstruction from two orthogonal (biplanar) X-ray images. However, it is unclear how these models compare against each other since they are evaluated on different anatomy, cohort and (often privately held) datasets. Moreover, the impact of the commonly optimized image-based segmentation metrics such as dice score on the estimation of clinical parameters relevant in 2D-3D bone shape reconstruction is not well known. To move closer toward clinical translation, we propose a benchmarking framework that evaluates tasks relevant to real-world clinical scenarios, including reconstruction of fractured bones, bones with implants, robustness to population shift, and error in estimating clinical parameters. Our open-source platform provides reference implementations of 8 models (many of whose implementations were not publicly available), APIs to easily collect and preprocess 6 public datasets, and the implementation of automatic clinical parameter and landmark extraction methods. We present an extensive evaluation of 8 2D-3D models on equal footing using 6 public datasets comprising images for four different anatomies. Our results show that attention-based methods that capture global spatial relationships tend to perform better across all anatomies and datasets; performance on clinically relevant subgroups may be overestimated without disaggregated reporting; ribs are substantially more difficult to reconstruct compared to femur, hip and spine; and the dice score improvement does not always bring a corresponding improvement in the automatic estimation of clinically relevant parameters.
2023

Investigating the impact of class-dependent label noise in medical image classification

Bidur Khanal, SM Kamrul Hasan, Bishesh Khanal, Cristian A Linte
Medical Imaging 2023: Image Processing, vol. 12463, pp. 728-733. SPIE, (2023)
Peer Reviewed Conference article
Transforming Global health with AI (TOGAI)
BibTeX
@inproceedings{khanal2023investigating, 
title={Investigating the impact of class-dependent label noise in medical image classification}, 
author={Khanal, Bidur and Hasan, SM Kamrul and Khanal, Bishesh and Linte, Cristian A}, 
booktitle={Medical Imaging 2023: Image Processing}, 
volume={12463}, 
pages={728--733}, 
year={2023}, 
organization={SPIE}
}
                        
Web Summary
Label noise is inevitable in medical image databases developed for deep learning due to the inter-observer variability caused by the different levels of expertise of the experts annotating the images, and, in some cases, the automated methods that generate labels from medical reports. It is known that incorrect annotations or label noise can degrade the actual performance of supervised deep learning models and can bias the model's evaluation. Existing literature shows that noise in one class has minimal impact on the model's performance for another class in natural image classification problems, where different target classes have relatively distinct shapes and share minimal visual cues for knowledge transfer among the classes. However, it is not clear how class-dependent label noise affects the model's performance when operating on medical images, for which different output classes can be difficult to distinguish even for experts, and there is a high possibility of knowledge transfer across classes during the training period. We hypothesize that for medical image classification tasks where the different classes share a very similar shape with differences only in texture, the noisy label for one class might affect the performance across other classes, unlike the case when the target classes have different shapes and are visually distinct. In this paper, we study this hypothesis using two publicly available datasets: a 2D organ classification dataset with target organ classes being visually distinct, and a histopathology image classification dataset where the target classes look very similar visually. Our results show that the label noise in one class has a much higher impact on the model's performance on other classes for the histopathology dataset compared to the organ dataset.
2023

Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings

Sophia Bano, Alessandro Casella, Francisco Vasconcelos, Abdul Qayyum, Abdesslam Benzinou, Moona Mazher, Fabrice Meriaudeau, Chiara Lena, Ilaria Anita Cintorrino, Gaia Romana De Paolis, Jessica Biagioli, Daria Grechishnikova, Jing Jiao, Bizhe Bai, Yanyan Qiao, Binod Bhattarai, Rebati Raman Gaire, Ronast Subedi, Eduard Vazquez, Szymon Płotka, Aneta Lisowska, Arkadiusz Sitek, George Attilakos, Ruwan Wimalasundera, Anna L David, Dario Paladini, Jan Deprest, Elena De Momi, Leonardo S Mattos, Sara Moccia, Danail Stoyanov
Medical Image Analysis, 2023
Peer Reviewed Journal article
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{bano2023placental,
title={Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and
MICCAI FetReg2021 Challenge Findings}, 
author={Bano, Sophia and Casella, Alessandro and Vasconcelos, Francisco and Qayyum, Abdul
and Benzinou, Abdesslam and Mazher, Moona and Meriaudeau, Fabrice and Lena, Chiara and Cintorrino, Ilaria
Anita and De Paolis, Gaia Romana and others}, 
journal={Medical Image Analysis}, 
year={2023} 
}
                            
Web Summary
Fetoscopy laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS). The procedure involves photocoagulation of pathological anastomoses to regulate blood exchange among twins. The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination. These challenges may lead to increased surgery time and incomplete ablation. Computer-assisted intervention (CAI) can provide surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking. Research in this domain has been hampered by the lack of high-quality data to design, develop and test CAI algorithms. Through the Fetoscopic Placental Vessel Segmentation and Registration (FetReg2021) challenge, which was organized as part of the MICCAI2021 Endoscopic Vision challenge, we released the first large-scale multi-centre TTTS dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms. For this challenge, we released a dataset of 2060 images, pixel-annotated for vessels, tool, fetus and background classes, from 18 in-vivo TTTS fetoscopy procedures and 18 short video clips. Seven teams participated in this challenge and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fetoscopic procedures and 6 short clips. The challenge provided an opportunity for creating generalized solutions for fetoscopic scene understanding and mosaicking. In this paper, we present the findings of the FetReg2021 challenge alongside reporting a detailed literature review for CAI in TTTS fetoscopy. Through this challenge, its analysis and the release of multi-centre fetoscopic data, we provide a benchmark for future research in this field.
2023

Input Invex Neural Network

Suman Sapkota, Binod Bhattarai
arXiv preprint arXiv:2106.08748, 2023
Pre-Prints
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{sapkota2021input, 
title={Input Invex Neural Network},
author={Sapkota, Suman and Bhattarai, Binod},
journal={arXiv preprint arXiv:2106.08748}, 
year={2023} 
}
                            
Web Summary
Connected decision boundaries are useful in several tasks like image segmentation, clustering, alpha-shape or defining a region in nD-space. However, the machine learning literature lacks methods for generating connected decision boundaries using neural networks. Thresholding an invex function, a generalization of a convex function, generates such decision boundaries. This paper presents two methods for constructing invex functions using neural networks. The first approach is based on constraining a neural network with Gradient Clipped-Gradient Penalty (GCGP), where we clip and penalise the gradients. In contrast, the second one is based on the relationship of the invex function to the composition of invertible and convex functions. We employ connectedness as a basic interpretation method and create connected region-based classifiers. We show that multiple connected set based classifiers can approximate any classification function. In the experiments section, we use our methods for classification tasks using an ensemble of 1-vs-all models as well as using a single multiclass model on small-scale datasets. The experiments show that connected set-based classifiers do not pose any disadvantage over ordinary neural network classifiers, but rather enhance their interpretability. We also performed an extensive study of the properties of invex functions and connected sets for interpretability and network morphism, with experiments on toy and real-world datasets. Our study suggests that the invex function is fundamental to understanding and applying the locality and connectedness of input space, which is useful for various downstream tasks.
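The second construction can be illustrated with a toy sketch: compose an invertible map with a convex function, then threshold the output to obtain a connected region (the invertible map here is a simple full-rank affine map chosen for the sketch, not the paper's invertible network):

import torch
import torch.nn as nn

class InvexFromComposition(nn.Module):
    # f(x) = convex( invertible(x) ): a convex "bowl" applied to an invertible
    # (here, full-rank affine) map of the input. Thresholding f gives a
    # connected decision region.
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim))
        self.b = nn.Parameter(torch.zeros(dim))
        self.centre = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        z = x @ self.W.T + self.b                    # invertible while W is full rank
        return ((z - self.centre) ** 2).sum(dim=1)   # convex in z

f = InvexFromComposition(2)
print(f(torch.randn(5, 2)) < 1.0)                    # thresholding -> connected region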
2023

Fast fetal head compounding from multi-view 3D ultrasound

Robert Wright, Alberto Gomez, Veronika A Zimmer, Nicolas Toussaint, Bishesh Khanal, Jacqueline Matthew, Emily Skelton, Bernhard Kainz, Daniel Rueckert, Joseph V Hajnal, Julia A Schnabel
Medical Image Analysis, (2023) : 102793, 2023
Peer Reviewed Journal article
Transforming Global health with AI (TOGAI)
BibTeX
@article{wright2023fast, 
title={Fast fetal head compounding from multi-view 3D ultrasound}, 
author={Wright, Robert and Gomez, Alberto and Zimmer, Veronika A and Toussaint,
Nicolas and Khanal, Bishesh and Matthew, Jacqueline and Skelton, Emily and Kainz, Bernhard and Rueckert, Daniel
and Hajnal, Joseph V and others}, 
journal={Medical Image Analysis}, 
pages={102793}, 
year={2023}, 
publisher={Elsevier}
}
                            
Web Summary
The diagnostic value of ultrasound images may be limited by the presence of artefacts, notably acoustic shadows, lack of contrast and localised signal dropout. Some of these artefacts are dependent on probe orientation and scan technique, with each image giving a distinct, partial view of the imaged anatomy. In this work, we propose a novel method to fuse the partially imaged fetal head anatomy, acquired from numerous views, into a single coherent 3D volume of the full anatomy. Firstly, a stream of freehand 3D US images is acquired using a single probe, capturing as many different views of the head as possible. The imaged anatomy at each time-point is then independently aligned to a canonical pose using a recurrent spatial transformer network, making our approach robust to fast fetal and probe motion. Secondly, images are fused by averaging only the most consistent and salient features from all images, producing a more detailed compounding, while minimising artefacts. We evaluated our method quantitatively and qualitatively, using image quality metrics and expert ratings, yielding state-of-the-art performance in terms of image quality and robustness to misalignments. Being online, fast and fully automated, our method shows promise for clinical use and deployment as a real-time tool in the fetal screening clinic, where it may enable unparalleled insight into the shape and structure of the face, skull and brain.
2022

Distributed, Collaborative, and Federated Learning, and Affordable AI and Healthcare for Resource Diverse Global Health

Shadi Albarqouni, Spyridon Bakas, Sophia Bano, M Jorge Cardoso, Bishesh Khanal, Bennett Landman, Xiaoxiao Li, Chen Qin, Islem Rekik, Nicola Rieke, Holger Roth, Debdoot Sheet, Daguang Xu
Third MICCAI Workshop, DeCaF 2022, and Second MICCAI Workshop, FAIR 2022, Held in Conjunction with MICCAI 2022, Singapore, September 18 and 22, 2022
Peer-reviewed conference articles
Transforming Global health with AI (TOGAI)
BibTeX
@book{albarqouni2022distributed, 
title={Distributed, Collaborative, and Federated Learning, and Affordable AI and
Healthcare for Resource Diverse Global Health: Third MICCAI Workshop, DeCaF 2022, and Second MICCAI
Workshop, FAIR 2022, Held in Conjunction with MICCAI 2022, Singapore, September 18 and 22, 2022,
Proceedings}, 
author={Albarqouni, Shadi and Bakas, Spyridon and Bano, Sophia and Cardoso, M
Jorge and Khanal, Bishesh and Landman, Bennett and Li, Xiaoxiao and Qin, Chen and Rekik, Islem and Rieke,
Nicola and others},
volume={13573}, 
year={2022}, 
publisher={Springer Nature} 
}
                            
Web Summary
2022

Noisy Heuristics NAS: A Network Morphism based Neural Architecture Search using Heuristics

Suman Sapkota, Binod Bhattarai
Dynamic Neural Networks, ICML, 2022
Workshop & Challenges
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{sapkota2022noisy, 
title={Noisy Heuristics NAS: A Network Morphism based Neural Architecture
Search using Heuristics}, 
author={Sapkota, Suman and Bhattarai, Binod}, 
journal={arXiv preprint arXiv:2207.04467}, 
year={2022}
}
                                
Web Summary
Network Morphism based Neural Architecture Search (NAS) is one of the most efficient methods; however, knowing where and when to add new neurons or remove dysfunctional ones is generally left to black-box Reinforcement Learning models. In this paper, we present a new Network Morphism based NAS called Noisy Heuristics NAS which uses heuristics learned from manually developing neural network models and inspired by biological neuronal dynamics. Firstly, we add new neurons randomly and prune away some to select only the best fitting neurons. Secondly, we control the number of layers in the network using the relationship of hidden units to the number of input-output connections. Our method can increase or decrease the capacity or non-linearity of models online, controlled by a few user-specified meta-parameters. Our method generalizes both on toy datasets and on real-world datasets such as MNIST, CIFAR-10, and CIFAR-100. The performance is comparable to the hand-engineered architecture ResNet-18 with a similar number of parameters.
2022

NepBERTa: Nepali Language Model Trained in a Large Corpus

Sulav Timilsina, Milan Gautam, Binod Bhattarai
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022
Peer-reviewed conference articles
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@inproceedings{timilsina-etal-2022-nepberta, 
title = "{N}ep{BERT}a: {N}epali Language Model Trained in a Large Corpus", 
author = "Timilsina, Sulav and Gautam, Milan and Bhattarai, Binod", 
booktitle = "Proceedings of the 2nd Conference of the Asia-Pacific
Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on
Natural Language Processing (Volume 2: Short Papers)", 
month = nov, 
year = "2022", 
address = "Online only",
publisher = "Association for Computational Linguistics", 
url = "https://aclanthology.org/2022.aacl-short.34",
pages = "273--284",
}
                        
Web Summary
Nepali is a low-resource language with more than 40 million speakers worldwide. It is written in the Devanagari script and has rich semantics and a complex grammatical structure. To date, multilingual models such as Multilingual BERT, XLM and XLM-RoBERTa haven't been able to achieve promising results in Nepali NLP tasks, and no large-scale monolingual corpus has been publicly available. This study presents NepBERTa, a BERT-based Natural Language Understanding (NLU) model trained on the most extensive monolingual Nepali corpus ever. We collected a dataset of 0.8B words from 36 different popular news sites in Nepal and introduced the model. This dataset is three times larger than the previously available public corpus. We evaluated the performance of NepBERTa on multiple Nepali-specific NLP tasks, including Named-Entity Recognition, Content Classification, POS Tagging, and Sequence Pair Similarity. We also introduce two different datasets for two new downstream tasks and benchmark four diverse NLU tasks altogether. We bring all these four tasks under the first-ever Nepali Language Understanding Evaluation (Nep-gLUE) benchmark. We will make Nep-gLUE along with the pre-trained model and datasets publicly available for research.
2022

Challenges of Deep Learning Methods for COVID-19 Detection Using Public Datasets

Md Kamrul Hasan, Md Ashraful Alam, Lavsen Dahal, Md Toufick E Elahi, Shidhartho Roy, Sifat Redwan Wahid, Robert Marti, Bishesh Khanal
Informatics in Medicine Unlocked 30 (2022): 100945.
Peer Reviewed Journal article
Transforming Global health with AI (TOGAI)
BibTeX
@article{Hasan2020.11.07.20227504, 
author = {Hasan, Md. Kamrul and Alam, Md. Ashraful and Dahal, Lavsen
and Elahi, Md. Toufick E and Roy,
Shidhartho and Wahid, Sifat Redwan and Mart\'i, Robert and Khanal,
Bishesh}, 
title = {Challenges of Deep Learning Methods for COVID-19 Detection
Using Public Datasets}, 
elocation-id = {2020.11.07.20227504}, 
year = {2020}, 
doi = {10.1101/2020.11.07.20227504}, 
publisher = {Cold Spring Harbor Laboratory Press},
}
                        
Web Summary
A large number of studies in the past months have proposed deep learning-based Artificial Intelligence (AI) tools for automated detection of COVID-19 using publicly available datasets of Chest X-rays (CXRs) or CT scans for training and evaluation. Most of these studies report high accuracy when classifying COVID-19 patients from normal or other commonly occurring pneumonia cases. However, these results are often obtained from cross-validation studies without an independent test set coming from a separate dataset, and have biases such as the two classes to be predicted coming from two completely different datasets. In this work, we investigate potential overfitting and biases in such studies by designing different experimental setups within the available public data constraints and highlight the challenges and limitations of developing deep learning models with such datasets. We propose a deep learning architecture for COVID-19 classification that combines two very popular classification networks, ResNet and Xception, and use it to carry out the experiments to investigate challenges and limitations. The results show that the deep learning models can overestimate their performance due to biases in the experimental design and overfitting to the training dataset. We compare the proposed architecture to state-of-the-art methods utilizing an independent test set for evaluation, where some of the identified bias and overfitting issues are reduced. Although our proposed deep learning architecture gives the best performance with our best possible setup, we highlight the challenges in comparing and interpreting various deep learning algorithms' results. While the deep learning-based methods using chest imaging data show promise in being helpful for clinical management and triage of COVID-19 patients, our experiments suggest that a larger, more comprehensive database with less bias is necessary for developing tools applicable in real clinical settings.
2022

COVID-19-related Nepali Tweets Classification in a Low Resource Setting

Rabin Adhikari, Safal Thapaliya, Nirajan Basnet, Samip Poudel, Aman Shakya, Bishesh Khanal
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications (SMM4H), Workshop & Shared Task, COLING (2022), Korea, 2022
Peer-reviewed workshop articles
Transforming Global health with AI (TOGAI)
BibTeX
@inproceedings{adhikari-etal-2022-covid, 
title = "{COVID}-19-related {N}epali Tweets Classification in a
Low Resource Setting", 
author = "Adhikari, Rabin and Thapaliya, Safal and Basnet,
Nirajan and Poudel, Samip and Shakya,
Aman and Khanal, Bishesh",
booktitle = "Proceedings of The Seventh Workshop on Social Media
Mining for Health Applications,
Workshop & Shared Task", 
month = oct, 
year = "2022",
address = "Gyeongju, Republic of Korea", 
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.smm4h-1.52",
pages = "209--215",
}
                            
Web Summary
Billions of people across the globe have been using social media platforms in their local languages to voice their opinions about the various topics related to the COVID-19 pandemic. Several organizations, including the World Health Organization, have developed automated social media analysis tools that classify COVID-19-related tweets into various topics. However, these tools that help combat the pandemic are limited to very few languages, leaving several countries unable to benefit from them. While multi-lingual or low-resource language-specific tools are being developed, there is still a need to expand their coverage, such as for the Nepali language. In this paper, we identify the eight most common COVID-19 discussion topics among the Twitter community using the Nepali language, set up an online platform to automatically gather Nepali tweets containing the COVID-19-related keywords, classify the tweets into the eight topics, and visualize the results over time in a web-based dashboard. We compare the performance of two state-of-the-art multi-lingual language models for Nepali tweet classification, one generic (mBERT) and one specific to the Nepali language family (MuRIL). Our results show that the models' relative performance depends on the data size, with MuRIL doing better for a larger dataset.
2022

FixMatchSeg: Fixing FixMatch for Semi-Supervised Semantic Segmentation

Pratima Upretee, Bishesh Khanal
2022
Pre-prints
Transforming Global health with AI (TOGAI)
BibTeX
@article{upretee2022fixmatchseg, 
title={FixMatchSeg: Fixing FixMatch for Semi-Supervised Semantic Segmentation}, 
author={Upretee, Pratima and Khanal, Bishesh},
journal={arXiv preprint arXiv:2208.00400}, 
year={2022}
}
                        
Web Summary
Supervised deep learning methods for semantic medical image segmentation have become increasingly popular in the past few years. However, in resource-constrained settings, getting a large number of annotated images is very difficult as it mostly requires experts, is expensive, and is time-consuming. Semi-supervised segmentation can be an attractive solution where very few labeled images are used along with a large number of unlabeled ones. While the gap between supervised and semi-supervised methods has been dramatically reduced for classification problems in the past couple of years, a larger gap remains in segmentation methods. In this work, we adapt a state-of-the-art semi-supervised classification method, FixMatch, to the semantic segmentation task, introducing FixMatchSeg. FixMatchSeg is evaluated on four different publicly available datasets of different anatomies and modalities: cardiac ultrasound, chest X-ray, retinal fundus images, and skin images. When there are few labels, we show that FixMatchSeg performs on par with strong supervised baselines.
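For intuition, a FixMatch-style per-pixel consistency loss can be sketched as below (a generic sketch of the recipe rather than the paper's exact formulation, assuming the strong augmentation is photometric so pixel correspondence is preserved):

import torch
import torch.nn.functional as F

def fixmatch_unsup_loss(model, weak_img, strong_img, threshold=0.95):
    # Pseudo-label each pixel from the weakly augmented view, keep only the
    # confident pixels, and use them to supervise the strongly augmented view.
    with torch.no_grad():
        probs = torch.softmax(model(weak_img), dim=1)     # (B, C, H, W)
        conf, pseudo = probs.max(dim=1)                   # (B, H, W)
        mask = (conf >= threshold).float()
    loss = F.cross_entropy(model(strong_img), pseudo, reduction="none")
    return (loss * mask).mean()

seg_net = torch.nn.Conv2d(3, 4, 3, padding=1)             # stand-in 4-class segmenter
x_weak, x_strong = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
print(fixmatch_unsup_loss(seg_net, x_weak, x_strong))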
2022

Task-Aware Active Learning for Endoscopic Polyp Segmentation

Shrawan Kumar Thapa, Pranav Poudel, Sudarshan Regmi, Binod Bhattarai, Danail Stoyanov
arXiv preprint arXiv:2204.03440, 2022
Pre-prints
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{thapa2023task, 
title={Task-Aware Active Learning for Endoscopic Polyp Segmentation}, 
author={Thapa, Shrawan Kumar and Poudel, Pranav and Regmi, Sudarshan and Bhattarai, Binod and Stoyanov, Danail},
year={2023}, 
publisher={TechRxiv} 
}
                        
Web Summary
Semantic segmentation of polyps is one of the most important research problems in endoscopic image analysis. One of the main obstacles to researching such a problem is the lack of annotated data. Endoscopic annotation necessitates the specialist knowledge of expert endoscopists, making it difficult to organize and incurring tremendous costs in time and budget. To address this problem, we investigate an active learning paradigm to reduce the requirement for massive labeled training examples by selecting the most discriminative and diverse unlabeled examples for the task under consideration. To this end, we propose a task-aware active learning pipeline that considers not only the uncertainty that the current task model exhibits for a given unlabelled example but also the diversity in the composition of the acquired pool in the feature space of the model. We compare our method with competitive baselines on two publicly available polyp segmentation benchmark datasets. Both qualitative and quantitative analyses show a significant improvement in performance when sampling in the semantic space of the model rather than the image space, and also demonstrate the complementary nature of using model uncertainty information.
2022

Label Geometry Aware Discriminator for Conditional Generative Adversarial Networks

Suman Sapkota, Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Tae-Kyun Kim
26th International Conference on Pattern Recognition (ICPR). IEEE. August, 2022
Peer-reviewed conference articles
Transforming Global health with AI (TOGAI)
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@inproceedings{sapkota2022label, 
title={Label Geometry Aware Discriminator for Conditional Generative Adversarial Networks},
author={Sapkota, Suman and Khanal, Bidur and Bhattarai, Binod and Khanal, Bishesh and Kim, Tae-Kyun}, 
booktitle={2022 26th International Conference on Pattern Recognition (ICPR)}, 
pages={2914--2920}, 
year={2022}, 
organization={IEEE} 
}
                        
Web Summary
Multi-domain image-to-image translation with conditional Generative Adversarial Networks (GANs) can generate highly photorealistic images with desired target classes, yet these synthetic images have not always been helpful to improve downstream supervised tasks such as image classification. Improving downstream tasks with synthetic examples requires generating images with high fidelity to the unknown conditional distribution of the target class, which many labeled conditional GANs attempt to achieve by adding a softmax cross-entropy loss based auxiliary classifier in the discriminator. As recent studies suggest that the softmax loss in the Euclidean space of deep features does not leverage their intrinsic angular distribution, we propose to replace this loss in the auxiliary classifier with an additive angular margin (AAM) loss that takes advantage of the intrinsic angular distribution, and promotes intra-class compactness and inter-class separation to help the generator synthesize high fidelity images. We validate our method on RaFD and CIFAR-100, two challenging face expression and natural image classification datasets. Our method outperforms state-of-the-art methods in several different evaluation criteria, including the recently proposed GAN-train and GAN-test metrics designed to assess the impact of synthetic data on downstream classification tasks, the usefulness for data augmentation in supervised tasks assessed with prediction accuracy and average confidence scores, and the well-known FID metric.
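For reference, an additive angular margin (ArcFace-style) logit computation of the kind the summary refers to can be sketched as follows (generic sketch with assumed margin and scale values, not the paper's exact hyper-parameters):

import torch
import torch.nn.functional as F

def aam_logits(features, class_weights, labels, margin=0.5, scale=30.0):
    # Add an angular margin to the angle between each feature and its
    # ground-truth class weight, then rescale; the result replaces the
    # plain softmax logits fed to cross-entropy in the auxiliary classifier.
    f = F.normalize(features, dim=1)                       # (B, D)
    w = F.normalize(class_weights, dim=1)                  # (C, D)
    cos = (f @ w.T).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.acos(cos)
    target = F.one_hot(labels, w.size(0)).bool()
    return scale * torch.where(target, torch.cos(theta + margin), cos)

labels = torch.tensor([1, 3, 5, 7])
logits = aam_logits(torch.randn(4, 64), torch.randn(10, 64), labels)
print(F.cross_entropy(logits, labels))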
2022

TGANet: Text-guided attention for improved polyp segmentation

Nikhil Kumar Tomar, Debesh Jha, Ulas Bagci, Sharib Ali
Medical Image Computing and Computer Assisted Intervention–MICCAI (2022): 25th International Conference, Singapore, (September 18–22, 2022), Proceedings, Part III. Cham: Springer Nature Switzerland, , 2022
Peer-reviewed conference articles
Computational Endoscopy, Surgery & Pathology (CESP)
BibTeX
@InProceedings{10.1007/978-3-031-16437-8_15, 
author="Tomar, Nikhil Kumar and Jha, Debesh and Bagci, Ulas and Ali, Sharib", 
editor="Wang, Linwei and Dou, Qi and Fletcher, P. Thomas and Speidel, Stefanie and Li, Shuo",
title="TGANet: Text-Guided Attention for Improved Polyp Segmentation", 
booktitle="Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022", 
year="2022", 
publisher="Springer Nature Switzerland", 
address="Cham", 
pages="151--160", 
isbn="978-3-031-16437-8" 
}
                        
Web Summary
Colonoscopy is a gold-standard procedure but is highly operator-dependent. Automated segmentation of polyps, which are precancerous precursors, can minimize miss rates and enable timely treatment of colon cancer at an early stage. Even though there are deep learning methods developed for this task, variability in polyp size can impact model training, biasing it toward the size attribute of the majority of samples in the training dataset and potentially providing sub-optimal results for differently sized polyps. In this work, we exploit size-related and polyp number-related features in the form of text attention during training. We introduce an auxiliary classification task to weight the text-based embedding, which allows the network to learn additional feature representations that can distinctly adapt to differently sized polyps and to cases with multiple polyps. Our experimental results demonstrate that these added text embeddings improve the overall performance of the model compared to state-of-the-art segmentation methods. We explore four different datasets and provide insights for size-specific improvements. Our proposed text-guided attention network (TGANet) can generalize well to variable-sized polyps in different datasets.
2022

Task-Aware Active Learning for Endoscopic Image Analysis

Shrawan Kumar Thapa, Pranav Poudel, Binod Bhattarai, Danail Stoyanov
Endoscopic Image Analysis, 2022
Pre-prints
B Bhattarai MultiModal Learning Lab (MMLL)
BibTeX
@article{thapa2022task, 
title={Task-Aware Active Learning for Endoscopic Image Analysis}, 
author={Thapa, Shrawan Kumar and Poudel, Pranav and Bhattarai, Binod and Stoyanov,Danail}, 
journal={arXiv preprint arXiv:2204.03440},
year={2022}
}
                        
Web Summary
Semantic segmentation of polyps and depth estimation are two important research problems in endoscopic image analysis. One of the main obstacles to conducting research on these problems is the lack of annotated data. Endoscopic annotation necessitates the specialist knowledge of expert endoscopists, making it difficult to organise, expensive, and time-consuming. To address this problem, we investigate an active learning paradigm to reduce the number of training examples by selecting the most discriminative and diverse unlabelled examples for the task under consideration. Most of the existing active learning pipelines are task-agnostic in nature and are often sub-optimal for the end task. In this paper, we propose a novel task-aware active learning pipeline and apply it to two important tasks in endoscopic image analysis: semantic segmentation and depth estimation. We compared our method with competitive baselines. From the experimental results, we observe a substantial improvement over the compared baselines.
2022

Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge

Sharib Ali, Noha Ghatwary, Debesh Jha, Ece Isik-Polat, Gorkem Polat, Chen Yang, Wuyang Li, Adrian Galdran, Miguel-Ángel González Ballester, Vajira Thambawita, Steven Hicks, Sahadev Poudel, Sang-Woong Lee, Ziyi Jin, Tianyuan Gan, ChengHui Yu, JiangPeng Yan, Doyeob Yeo, Hyunseok Lee, Nikhil Kumar Tomar, Mahmood Haithmi, Amr Ahmed, Michael A. Riegler, Christian Daul, Pål Halvorsen, Jens Rittscher, Osama E. Salem, Dominique Lamarque, Renato Cannizzaro, Stefano Realdon, Thomas de Lange, James E. East
2022
Pre-prints
Computational Endoscopy, Surgery & Pathology (CESP)
BibTeX
@article{ali2022assessing, 
title={Assessing generalisability of deep learning-based polyp detection and segmentation
methods through a computer vision challenge}, 
author={Ali, Sharib and Ghatwary, Noha and Jha, Debesh and Isik-Polat, Ece and Polat, Gorkem and
Yang, Chen and Li, Wuyang and Galdran, Adrian and Ballester, Miguel-{\'A}ngel Gonz{\'a}lez and Thambawita,
Vajira and others}, 
journal={arXiv preprint arXiv:2202.12031},
year={2022} 
}
                        
Web Summary
Polyps are well-known cancer precursors identified by colonoscopy. However, variability in their size, location, and surface largely affects identification, localisation, and characterisation. Moreover, colonoscopic surveillance and removal of polyps (referred to as polypectomy) are highly operator-dependent procedures. There exists a high missed-detection rate and incomplete removal of colonic polyps due to their variable nature, the difficulty of delineating the abnormality, high recurrence rates, and the anatomical topography of the colon. There have been several developments in realising automated methods for both detection and segmentation of these polyps using machine learning. However, the major drawback of most of these methods is their limited ability to generalise to out-of-sample unseen datasets that come from different centres, modalities and acquisition systems. To test this hypothesis rigorously, we curated a multi-centre and multi-population dataset acquired from multiple colonoscopy systems and challenged teams comprising machine learning experts to develop robust automated detection and segmentation methods as part of our crowd-sourcing Endoscopic computer vision challenge (EndoCV) 2021. In this paper, we analyse the detection results of the four top teams (among seven) and the segmentation results of the five top teams (among 16). Our analyses demonstrate that the top-ranking teams concentrated on accuracy (i.e., accuracy > 80% on overall Dice score on different validation sets) over the real-time performance required for clinical applicability. We further dissect the methods and provide an experiment-based hypothesis that reveals the need for improved generalisability to tackle the diversity present in multi-centre datasets.
2021

Machine-Learning-Assisted Analysis of Colorimetric Assays on Paper Analytical Devices

Bidur Khanal, Pravin Pokhrel, Bishesh Khanal, Basant Giri
ACS omega 6.49 (2021): 33837-33845., 2021
Peer-reviewed Journal articles
Transforming Global health with AI (TOGAI)
BibTeX
@article{khanal2021machine, 
title={Machine-Learning-Assisted Analysis of Colorimetric Assays on Paper Analytical Devices},
author={Khanal, Bidur and Pokhrel, Pravin and Khanal, Bishesh and Giri, Basant}, 
journal={ACS omega}, 
volume={6}, 
number={49}, 
pages={33837--33845}, 
year={2021}, 
publisher={ACS Publications} 
}
                         
Web Summary
Paper-based analytical devices (PADs) employing colorimetric detection and smartphone images have gained wider acceptance in a variety of measurement applications. PADs are primarily meant to be used in field settings where assay and imaging conditions greatly vary, resulting in less accurate results. Recently, machine-learning (ML)-assisted models have been used in image analysis. We evaluated a combination of four ML models (logistic regression, support vector machine (SVM), random forest, and artificial neural network (ANN)) and three image color spaces (RGB, HSV, and LAB) for their ability to accurately predict analyte concentrations. We used images of PADs taken at varying lighting conditions, with different cameras and users, for food color and enzyme inhibition assays to create training and test datasets. The prediction accuracy was higher for food color than enzyme inhibition assays in most of the ML model and color space combinations. All models better predicted coarse-level classifications than fine-grained concentration classes. ML models using the sample color along with a reference color increased the models' ability to predict the result, as the reference color may have partially factored out the variation in ambient assay and imaging conditions. The best concentration class prediction accuracy obtained for food color was 0.966 when using the ANN model and LAB color space. The accuracy for the enzyme inhibition assay was 0.908 when using the SVM model and LAB color space. Appropriate model and color space combinations can be useful to analyze large numbers of samples on PADs as a powerful, low-cost, quick field-testing tool.
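A toy sketch of the classification pipeline described above, using scikit-learn and the LAB colour space (the random features and labels below are placeholders for the real sample and reference colours extracted from PAD photos):

import numpy as np
from skimage.color import rgb2lab
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder features: mean RGB of the sample zone and of a reference zone
# per assay image, plus coarse concentration-class labels.
rgb = np.random.rand(200, 6)
labels = np.random.randint(0, 4, size=200)

# Convert both colour patches to the LAB colour space.
lab_sample = rgb2lab(rgb[:, :3].reshape(-1, 1, 3)).reshape(-1, 3)
lab_reference = rgb2lab(rgb[:, 3:].reshape(-1, 1, 3)).reshape(-1, 3)
X = np.concatenate([lab_sample, lab_reference], axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))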
2021

Iterative deep learning for improved segmentation of endoscopic images

Sharib Ali, Nikhil K Tomar
Nordic Machine Intelligence 1.1 (2021): 38-40., 2021
Workshop & Challenges
Computational Endoscopy, Surgery & Pathology (CESP)
BibTeX
@article{ali2021iterative,
title={Iterative deep learning for improved segmentation of endoscopic images}, 
author={Ali, Sharib and Tomar, Nikhil K}, 
journal={Nordic Machine Intelligence}, 
volume={1}, 
number={1}, 
pages={38--40}, 
year={2021} 
}
                        
Web Summary
Iterative segmentation is a unique way to prune the segmentation maps initialized by faster inference techniques or even unsupervised traditional thresholding methods. We used our previous feedback attention-based method for this work and demonstrate that, with an optimal iterative procedure, our method can reach competitive accuracies in endoscopic imaging.
2021

Evaluation and Comparison of Accurate Automated Spinal Curvature Estimation Algorithms with Spinal Anterior-posterior X-Ray Images: The AASCE2019 Challenge

Liansheng Wang, Cong Xie, Yi Lin, Hong-Yu Zhou, Kailin Chen, Dalong Cheng, Florian Dubost, Benjamin Collery, Bidur Khanal, Bishesh Khanal, Rong Tao, Shangliang Xu, Upasana Upadhyay Bharadwaj, Zhusi Zhong, Jie Li, Shuxin Wang, Shuo Li
Medical Image Analysis 72 (2021): 102115., 2021
Peer-reviewed Journal articles
Transforming Global health with AI (TOGAI)
BibTeX
@article{wang2021evaluation, 
title={Evaluation and Comparison of Accurate Automated Spinal Curvature Estimation Algorithms
with Spinal Anterior-posterior X-Ray Images: The AASCE2019 Challenge}, 
author={Wang, Liansheng and Xie, Cong and Lin, Yi and Zhou, Hong-Yu and Chen, Kailin and Cheng,
Dalong and Dubost, Florian and Collery, Benjamin and Khanal, Bidur and Khanal, Bishesh and others}, 
journal={Medical Image Analysis}, 
pages={102115}, 
year={2021}, 
publisher={Elsevier}
}
                        
Web Summary
Scoliosis is a common medical condition, which occurs most often during the growth spurt just before puberty. Untreated Scoliosis may cause long-term sequelae. Therefore, accurate automated quantitative estimation of spinal curvature is an important task for the clinical evaluation and treatment planning of Scoliosis. A couple of attempts have been made for automated Cobb angle estimation on single-view x-rays. It is very challenging to achieve a highly accurate automated estimation of Cobb angles because it is difficult to utilize x-rays efficiently. With the idea of developing methods for accurate automated spinal curvature estimation, the AASCE2019 challenge provides spinal anterior-posterior x-ray images with manual labels for training and testing the participating methods. We review eight top-ranked methods from 12 teams. Experimental results show that, overall, the best performing method achieved a symmetric mean absolute percentage error (SMAPE) of 21.71%. Limitations and possible future directions are also described in the paper. We hope the dataset in AASCE2019 and this paper could provide insights into quantitative measurement of the spine.
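As a point of reference, the SMAPE figure quoted above can be computed per image by comparing predicted and ground-truth Cobb angles; the sketch below shows one common formulation (the challenge's official evaluation script may aggregate slightly differently).

import numpy as np

def smape(pred_angles, gt_angles, eps=1e-8):
    # Symmetric mean absolute percentage error over a set of angle predictions, in percent.
    pred = np.asarray(pred_angles, dtype=float)
    gt = np.asarray(gt_angles, dtype=float)
    return 100.0 * np.sum(np.abs(pred - gt)) / (np.sum(pred + gt) + eps)

# e.g. three predicted vs. ground-truth Cobb angles (degrees) for one x-ray
print(smape([30.0, 12.0, 8.0], [28.0, 15.0, 5.0]))  # approx. 8.16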
2021

Visualising Argumentation Graphs with Graph Embeddings and t-SNE

Lars Malmqvist, Tommy Yuan, Suresh Manandhar
2021
Pre-prints
Other Adj. Faculties
BibTeX
@article{malmqvist2021visualising, 
title={Visualising Argumentation Graphs with Graph Embeddings and t-SNE}, 
author={Malmqvist, Lars and Yuan, Tommy and Manandhar, Suresh},
journal={arXiv preprint arXiv:2107.00528}, 
year={2021} 
}
Web Summary
This paper applies t-SNE, a visualisation technique familiar from deep neural network research, to argumentation graphs, using the output of graph embeddings generated with several different methods. It shows that such a visualisation approach can work for argumentation and can reveal interesting structural properties of argumentation graphs, opening up paths for further research in the area.
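A minimal sketch of the visualisation step described above, assuming the argumentation-graph embeddings have already been computed by some embedding method; the arrays and labels below are placeholders, not the paper's data.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.random.rand(300, 64)     # placeholder graph/node embeddings
labels = np.random.randint(0, 3, 300)    # e.g. a structural property per node

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=8)
plt.title("t-SNE projection of argumentation-graph embeddings")
plt.show()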
2021

COVID-19 control strategies and intervention effects in resource limited settings: a modeling study

Kiran Raj Pandey, Anup Subedee, Bishesh Khanal, Bhagawan Koirala
Plos one 16.6 (2021): e0252570., 2021
Peer-reviewed Journal articles
Transforming Global health with AI (TOGAI)
BibTeX
@article{pandey2021covid, 
title={COVID-19 control strategies and intervention effects in resource limited settings: a
modeling study},
author={Pandey, Kiran Raj and Subedee, Anup and Khanal, Bishesh and Koirala, Bhagawan}, 
journal={Plos one}, 
volume={16}, 
number={6}, 
pages={e0252570}, 
year={2021}, 
publisher={Public Library of Science San Francisco, CA USA} 
}
                       
Web Summary
2021

Penalizing small errors using an Adaptive Logarithmic Loss

Chaitanya Kaul, Nick Pears, Hang Dai, Roderick Murray-Smith, Suresh Manandhar
Pattern Recognition. ICPR International Workshops and Challenges (January 10-15, 2021). Proceedings, Part I. Cham: Springer International Publishing, 2021
Peer-reviewed workshop articles
Other Adj. Faculties
BibTeX
@inproceedings{kaul2021penalizing, 
title={Penalizing small errors using an adaptive logarithmic loss}, 
author={Kaul, Chaitanya and Pears, Nick and Dai, Hang and Murray-Smith, Roderick and Manandhar,
Suresh},
booktitle={International Conference on Pattern Recognition}, 
pages={368--375}, 
year={2021}, 
organization={Springer} 
}
                        
Web Summary
Loss functions are error metrics that quantify the difference between a prediction and its corresponding ground truth. Fundamentally, they define a functional landscape for traversal by gradient descent. Although numerous loss functions have been proposed to date in order to handle various machine learning problems, little attention has been given to enhancing these functions to better traverse the loss landscape. In this paper, we simultaneously and significantly mitigate two prominent problems in medical image segmentation namely: i) class imbalance between foreground and background pixels and ii) poor loss function convergence. To this end, we propose an Adaptive Logarithmic Loss (ALL) function. We compare this loss function with the existing state of the art on the ISIC 2018 dataset, the nuclei segmentation dataset as well as the DRIVE retinal vessel segmentation dataset. We measure the performance of our methodology on benchmark metrics and demonstrate state-of-the-art performance. More generally, we show that our system can be used as a framework for better training of deep neural networks.
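The exact form of the Adaptive Logarithmic Loss is given in the paper; purely as an illustration of the general idea, the PyTorch sketch below passes a soft Dice score through a logarithmic/power transform so that small residual errors still receive a meaningful gradient. This is not the paper's ALL formulation, and the smoothing and gamma values are assumptions.

import torch

def log_dice_loss(logits, target, smooth=1.0, gamma=0.3, eps=1e-7):
    # Soft Dice score per sample, then a log/power penalty on the remaining error.
    prob = torch.sigmoid(logits).flatten(1)
    target = target.flatten(1).float()
    inter = (prob * target).sum(dim=1)
    dice = (2 * inter + smooth) / (prob.sum(dim=1) + target.sum(dim=1) + smooth)
    return torch.pow(-torch.log(dice.clamp(min=eps)), gamma).mean()

# dummy batch: 2 images with binary masks
logits = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.8).float()
print(log_dice_loss(logits, mask))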
2021

FatNet: A feature-attentive network for 3D point cloud processing

Chaitanya Kaul, Nick Pears, Suresh Manandhar
25th International conference on pattern recognition (ICPR) (2020). IEEE (2021)., 2021
Peer-reviewed conference articles
Other Adj. Faculties
BibTeX
@inproceedings{kaul2021fatnet, 
title={FatNet: A feature-attentive network for 3D point cloud processing}, 
author={Kaul, Chaitanya and Pears, Nick and Manandhar, Suresh}, 
booktitle={2020 25th International Conference on Pattern Recognition (ICPR)}, 
pages={7211--7218}, 
year={2021}, 
organization={IEEE} 
}
                         
Web Summary
The application of deep learning to 3D point clouds is challenging due to their lack of order. Inspired by the point embeddings of PointNet and the edge embeddings of DGCNNs, we propose three improvements to the task of point cloud analysis. First, we introduce a novel feature-attentive neural network layer, a FAT layer, that combines both global point-based features and local edge-based features in order to generate better embeddings. Second, we find that applying the same attention mechanism across two different forms of feature map aggregation, max pooling and average pooling, gives better performance than either alone. Third, we observe that residual feature reuse in this setting propagates information more effectively between the layers, and makes the network easier to train. Our architecture achieves state-of-the-art results on the task of point cloud classification, as demonstrated on the ModelNet40 dataset, and extremely competitive performance on the ShapeNet part segmentation challenge.
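As an illustration of the aggregation idea only (not the released FatNet code), the sketch below fuses max-pooled and average-pooled point features with a single learned gate; the layer sizes and gate design are assumptions.

import torch
import torch.nn as nn

class PooledAttentionFusion(nn.Module):
    # Learns a channel-wise gate that mixes max- and average-pooled point features.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, feats):                 # feats: (B, N, C) per-point features
        max_f = feats.max(dim=1).values       # (B, C) global max-pooled descriptor
        avg_f = feats.mean(dim=1)             # (B, C) global average-pooled descriptor
        a = self.gate(torch.cat([max_f, avg_f], dim=-1))
        return a * max_f + (1 - a) * avg_f    # attention-weighted combination

fused = PooledAttentionFusion(128)(torch.randn(4, 1024, 128))
print(fused.shape)  # torch.Size([4, 128])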
2020

Ensemble U-Net model for efficient polyp segmentation

Shruti Shrestha, Bishesh Khanal, Sharib Ali
MediaEval (2020)., 2020
Workshop & Challenges
Transforming Global health with AI (TOGAI)
Computational Endoscopy, Surgery & Pathology (CESP)
BibTeX
@article{shrestha2020ensemble, 
title={Ensemble U-Net model for efficient polyp segmentation}, 
author={Shrestha, Shruti and Khanal, Bishesh and Ali, Sharib}, 
year={2020}
}
                    
Web Summary
This paper presents our approach developed for the Medico automatic polyp segmentation challenge 2020. We used a U-Net model with two different encoder backbones: ResNet-34 and EfficientNetB2. The two models were trained separately with the Tversky loss and then ensembled. We performed CutMix and standard augmentations for data pre-processing. For ensembling, we chose the hyperparameter of the loss function in the range that makes individual models have high recall while relaxing the precision. We evaluated the individual models and the ensemble model on validation data. The ResNet-34 backbone model and the ensemble model were submitted to the challenge website for further evaluation on the test data. Our ensemble model improved performance on metrics compared to the single networks by achieving a Dice Coefficient of 0.8316, Intersection over Union of 0.7550, Precision of 0.8851, and Overall Accuracy of 0.9583.
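The Tversky loss mentioned in the summary has a standard form; the PyTorch sketch below uses recall-favouring weights as an illustration. The alpha/beta values here are assumptions, not the submission's actual hyperparameters.

import torch

def tversky_loss(prob, target, alpha=0.3, beta=0.7, smooth=1.0):
    # alpha weights false positives, beta false negatives; beta > alpha trades
    # precision for recall, matching the ensembling strategy described above.
    prob = prob.flatten(1)
    target = target.flatten(1).float()
    tp = (prob * target).sum(dim=1)
    fp = (prob * (1 - target)).sum(dim=1)
    fn = ((1 - prob) * target).sum(dim=1)
    tversky = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return (1 - tversky).mean()

# usage with predicted probabilities and a binary ground-truth mask
loss = tversky_loss(torch.rand(2, 1, 128, 128), (torch.rand(2, 1, 128, 128) > 0.5).float())
print(loss)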
2020

Towards automated extraction of 2D standard fetal head planes from 3D ultrasound acquisitions: A clinical evaluation and quality assessment comparison

E Skelton, J Matthew, Y Li, Bishesh Khanal, JJ Cerrolaza Martinez, N Toussaint, C Gupta, C Knight, B Kainz, JV Hajnal, M Rutherford
Radiography 27.2 (2021): 519-526., 2020
Peer-reviewed Journal articles
Transforming Global health with AI (TOGAI)
BibTeX
@article{SKELTON2021519, 
title={Towards automated extraction of 2D standard fetal head planes from 3D ultrasound
acquisitions: A clinical evaluation and quality assessment comparison}, 
journal={Radiography}, 
volume={27}, 
number={2}, 
pages={519-526}, 
year={2021}, 
issn={1078-8174}, 
doi={https://doi.org/10.1016/j.radi.2020.11.006}, 
url={https://www.sciencedirect.com/science/article/pii/S1078817420302352}, 
author={E. Skelton and J. Matthew and Y. Li and B. Khanal and J.J. {Cerrolaza Martinez} and N.
Toussaint and C. Gupta and C. Knight and B. Kainz and J.V. Hajnal and M. Rutherford}, 
keywords={Clinical evaluation, Fetal imaging, Quality assessment, Ultrasound} 
}
                    
Web Summary
Introduction: Clinical evaluation of deep learning (DL) tools is essential to complement technical accuracy metrics. This study assessed the image quality of standard fetal head planes automatically extracted from three-dimensional (3D) ultrasound fetal head volumes using a customised DL algorithm.
Methods: Two observers retrospectively reviewed standard fetal head planes against pre-defined image quality criteria. Forty-eight images (29 transventricular, 19 transcerebellar) were selected from 91 transabdominal fetal scans (mean gestational age = 26 completed weeks, range = 20+5 to 32+3 weeks). Each had two-dimensional (2D) manually-acquired (2D-MA), 3D operator-selected (3D-OS) and 3D-DL automatically-acquired (3D-DL) images. The proportion of adequate images from each plane and modality, and the number of inadequate images per plane, was compared for each method. Inter- and intra-observer agreement of overall image quality was calculated.
Results: Sixty-seven percent of 3D-OS and 3D-DL transventricular planes were of adequate quality. Forty-five percent of 3D-OS and 55% of 3D-DL transcerebellar planes were adequate. Seventy-one percent of 3D-OS and 86% of 3D-DL transventricular planes failed with poor visualisation of intra-cranial structures. Eighty-six percent of 3D-OS and 80% of 3D-DL transcerebellar planes failed due to inadequate visualisation of cerebellar hemispheres. Image quality was significantly different between 2D and 3D; however, no significant difference between 3D-modalities was demonstrated (p < 0.005). Inter-observer agreement of transventricular plane adequacy was moderate for both 3D-modalities, and weak for transcerebellar planes.
Conclusion: The 3D-DL algorithm can automatically extract standard fetal head planes from 3D head volumes of comparable quality to operator-selected planes. Image quality in 3D is inferior to corresponding 2D planes, likely due to limitations with 3D technology and acquisition technique.
Implications for practice: Automated image extraction of standard planes from US volumes could facilitate the use of 3D US in clinical practice; however, image quality is dependent on the volume acquisition technique.
2020

Determining the Acceptability of Abstract Arguments with Graph Convolutional Networks

Lars Malmqvist, Tommy Yuan, Peter Nightingale, Suresh Manandhar
SAFA@ COMMA (2020) (pp. 47-56)., 2020
Peer-reviewed workshop articles
Other Adj. Faculties
BibTeX
@inproceedings{malmqvist2020determining, 
title={Determining the Acceptability of Abstract Arguments with Graph Convolutional Networks.},
author={Malmqvist, Lars and Yuan, Tommy and Nightingale, Peter and Manandhar, Suresh}, 
booktitle={SAFA@ COMMA}, 
pages={47--56}, 
year={2020} 
}
                        
Web Summary
This paper presents a new deep learning approach to sceptical and credulous acceptance of arguments, two key problems in Abstract Argumentation that are most commonly solved using exact methods, often by reduction to SAT. We train a Graph Convolutional Neural Network with a randomised training regime, dynamic balancing of training data, and improved residual connections, and achieve up to 97.15% accuracy, improving on past studies by 30 percentage points. The new approach is applied to problems from the ICCMA 2017 competition and achieves a new state of the art for this type of architecture. Additionally, the training regime used for this study has potential wider applicability in deep learning for graph-structured NP-hard problems.
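For orientation, a single graph-convolution propagation step of the kind used here can be written as H' = relu(D^(-1/2) (A + I) D^(-1/2) H W); the NumPy sketch below implements that rule on a toy attack graph. The toy graph, feature sizes, and weights are assumptions, not the paper's setup.

import numpy as np

def gcn_layer(A, H, W):
    # One GCN propagation step with self-loops and symmetric normalisation.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)  # ReLU non-linearity

# toy attack graph with 4 arguments, 8-dimensional node features, 4 output features
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
H = np.random.rand(4, 8)
W = np.random.rand(8, 4)
print(gcn_layer(A, H, W).shape)  # (4, 4)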
2020

Uncertainty Estimation in Deep 2D Echocardiography Segmentation

Lavsen Dahal, Aayush Kafle, Bishesh Khanal
2020
Pre-prints
Transforming Global health with AI (TOGAI)
BibTeX
@misc{dahal2020uncertainty, 
title={Uncertainty Estimation in Deep 2D Echocardiography Segmentation}, 
author={Lavsen Dahal and Aayush Kafle and Bishesh Khanal}, 
year={2020}, 
eprint={2005.09349}, 
archivePrefix={arXiv}, 
primaryClass={cs.CV}
}
                    
Web Summary
2D echocardiography is the most common imaging modality for cardiovascular diseases. The portability and relatively low-cost nature of Ultrasound (US) enable the US devices needed for performing echocardiography to be made widely available. However, acquiring and interpreting cardiac US images is operator dependent, limiting its use to only places where experts are present. Recently, Deep Learning (DL) has been used in 2D echocardiography for automated view classification, and structure and function assessment. Although these recent works show promise in developing computer-guided acquisition and automated interpretation of echocardiograms, most of these methods do not model and estimate uncertainty, which can be important when testing on data coming from a distribution further away from that of the training data. Uncertainty estimates can be beneficial both during the image acquisition phase (by providing real-time feedback to the operator on the acquired image's quality) and during automated measurement and interpretation. The performance of uncertainty models and quantification metrics may depend on the prediction task and the models being compared. Hence, to gain insight into uncertainty modelling for left ventricular segmentation from US images, we compare three ensembling-based uncertainty models quantified using four different metrics (one newly proposed) on state-of-the-art baseline networks using two publicly available echocardiogram datasets. We further demonstrate how uncertainty estimation can be used to automatically reject poor quality images and improve state-of-the-art segmentation results.
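As a rough sketch of the ensembling idea (not the paper's exact models or metrics), the snippet below averages segmentation probabilities from several trained networks and turns the per-pixel predictive entropy into a per-image score that could be thresholded to reject poor-quality frames; the model list and threshold are assumptions.

import torch

def ensemble_uncertainty(models, images, eps=1e-7):
    # Stack member probability maps into (M, B, 1, H, W), then average over members.
    probs = torch.stack([torch.sigmoid(m(images)) for m in models])
    mean_prob = probs.mean(dim=0)
    # Binary predictive entropy per pixel, averaged into one score per image.
    entropy = -(mean_prob * (mean_prob + eps).log()
                + (1 - mean_prob) * (1 - mean_prob + eps).log())
    return mean_prob, entropy.mean(dim=(1, 2, 3))

# mean_prob, score = ensemble_uncertainty(trained_models, batch)   # trained_models is hypothetical
# keep = score < 0.25   # rejection threshold is an assumption, tuned on validation data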
2019

Automatic Cobb Angle Detection using Vertebra Detector and Vertebra Corners Regression

Bidur Khanal, Lavsen Dahal, Prashant Adhikari, Bishesh Khanal
Computational Methods and Clinical Applications for Spine Imaging: 6th International Workshop and Challenge, CSI (2019), Shenzhen, China, (October 17, 2019), Proceedings 6. Springer International Publishing (2020)., 2019
Workshop & Challenges
Transforming Global health with AI (TOGAI)
BibTeX
@misc{khanal2019automaticcobbangledetection,
title={Automatic Cobb Angle Detection using Vertebra Detector and Vertebra Corners Regression}, 
author={Bidur Khanal and Lavsen Dahal and Prashant Adhikari and Bishesh Khanal}, 
year={2019}, 
eprint={1910.14202}, 
archivePrefix={arXiv},
primaryClass={cs.CV}, 
url={https://arxiv.org/abs/1910.14202}, 
}
                        
Web Summary
Correct evaluation and treatment of Scoliosis require accurate estimation of spinal curvature. The current gold standard is to manually estimate Cobb Angles in spinal X-ray images, which is time consuming and has high inter-rater variability. We propose an automatic method with a novel framework that first detects vertebrae as objects, followed by a landmark detector that estimates the 4 landmark corners of each vertebra separately. Cobb Angles are calculated using the slope of each vertebra obtained from the predicted landmarks. For inference on test data, we perform pre- and post-processing steps that include cropping, outlier rejection and smoothing of the predicted landmarks. The results were assessed in the AASCE MICCAI challenge 2019, which showed promise with a SMAPE score of 25.69 on the challenge test set.
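To make the angle-from-landmarks step concrete, the sketch below derives each vertebra's tilt from its four predicted corner points and takes the largest pairwise tilt difference as a Cobb angle. This is an illustrative simplification, not the authors' exact pre/post-processing pipeline, and the corner ordering is an assumption.

import numpy as np

def vertebra_tilts(corners):
    # corners: (N, 4, 2) array of (x, y) landmarks per vertebra, ordered
    # top-left, top-right, bottom-left, bottom-right (assumed ordering).
    top_mid = (corners[:, 0] + corners[:, 1]) / 2.0
    bot_mid = (corners[:, 2] + corners[:, 3]) / 2.0
    d = top_mid - bot_mid
    return np.degrees(np.arctan2(d[:, 0], d[:, 1]))  # tilt of each vertebral axis

def cobb_angle(corners):
    tilts = vertebra_tilts(corners)
    return float(np.max(tilts) - np.min(tilts))  # angle between the most-tilted vertebrae

# corners = landmark_model(xray)   # hypothetical (N, 4, 2) landmark output
# print(cobb_angle(corners))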
2019

Image Reconstruction in a Manifold of Image Patches: Application to Whole-fetus Ultrasound Imaging

Alberto Gomez, Veronika Zimmer, Nicolas Toussaint, Robert Wright, James R. Clough, Bishesh Khanal, Milou Van Poppel, Emily Skelton, Jackie Matthews, and Julia A. Schnabel
Machine Learning for Medical Image Reconstruction: Second International Workshop, MLMIR (2019), Held in Conjunction with MICCAI (2019), Shenzhen, China, (October 17, 2019), Proceedings. Cham: Springer International Publishing, (2019)., 2019
Peer-reviewed workshop articles
Transforming Global health with AI (TOGAI)
BibTeX
@inproceedings{gomez2019image, 
title={Image Reconstruction in a Manifold of Image Patches: Application to Whole-fetus
Ultrasound Imaging}, 
author={Gomez, Alberto and Zimmer, Veronika and Toussaint, Nicolas and Wright, Robert and
Clough, James R. and Khanal, Bishesh and Poppel, Milou Van and Skelton, Emily and Matthews,
Jackie and Schnabel, Julia A.},
booktitle={Machine Learning for Medical Image Reconstruction - MLMIR}, 
year={2019}, 
note={Accepted}
}
                        
Web Summary
2019

Towards whole placenta segmentation at late gestation using multi-view ultrasound images

Veronika Zimmer, Alberto Gomez, Emily Skelton, Nicolas Toussaint, Tong Zhang, Bishesh Khanal, Robert Wright, Yohan Noh, Alison Ho, Jacqueline Matthew, Julia Schnabel
Medical Image Computing and Computer Assisted Intervention–MICCAI (2019): 22nd International Conference, Shenzhen, China, (October 13–17, 2019), Proceedings, Part V 22. Springer International Publishing, (2019)., 2019
Peer-reviewed conference articles
Transforming Global health with AI (TOGAI)
BibTeX
@inproceedings{zimmer2019towards, 
title={Towards whole placenta segmentation at late gestation using multi-view ultrasound images},
author={Zimmer, Veronika and Gomez, Alberto and Skelton, Emily and Toussaint, Nicolas and
Zhang, Tong and Khanal, Bishesh and Wright, Robert and Noh,
Yohan and Ho, Alison and Matthew, Jacqueline and Schnabel, Julia}, 
booktitle={MICCAI}, 
year={2019} 
}
                        
Web Summary
2019

Confident Head Circumference Measurement from Ultrasound with Real-time Feedback for Sonographers

Samuel Budd, Matthew Sinclair, Bishesh Khanal, Jacqueline Matthew, David Llyod, Alberto Gomez, Nicolas Toussaint, Emma Robinson, Bernhard Kainz
Medical Image Computing and Computer Assisted Intervention–MICCAI (2019): 22nd International Conference, Shenzhen, China, (October 13–17, 2019), Proceedings, Part IV 22. Springer International Publishing, (2019)., 2019
Peer-reviewed conference articles
Transforming Global health with AI (TOGAI)
BibTeX
@inproceedings{budd2019confident, 
title={Confident Head Circumference Measurement from Ultrasound with Real-time Feedback for Sonographers},
author={Budd, Samuel and Sinclair, Matthew and Khanal, Bishesh and Matthew, Jacqueline and
Llyod, David and Gomez, Alberto and Toussaint, Nicolas and Robinson, Emma and Kainz, Bernhard}, 
booktitle={MICCAI}, 
year={2019}, 
note={Accepted}
}
                        
Web Summary
Manual estimation of fetal Head Circumference (HC) from Ultrasound (US) is a key biometric for monitoring the healthy development of fetuses. Unfortunately, such measurements are subject to large inter-observer variability, resulting in low early-detection rates of fetal abnormalities. To address this issue, we propose a novel probabilistic Deep Learning approach for real-time automated estimation of fetal HC. This system feeds back statistics on measurement robustness to inform users how confident a deep neural network is in evaluating suitable views acquired during free-hand ultrasound examination. In real-time scenarios, this approach may be exploited to guide operators to scan planes that are as close as possible to the underlying distribution of training images, for the purpose of improving inter-operator consistency. We train on free-hand ultrasound data from over 2000 subjects (2848 training/540 test) and show that our method is able to predict HC measurements within 1.81±1.65 mm deviation from the ground truth, with 50% of the test images fully contained within the predicted confidence margins, and an average of 1.82±1.78 mm deviation from the margin for the remaining cases that are not fully contained.
2019

Complete Fetal Head Compounding from Multi-View 3D Ultrasound

Robert Wright, Nicolas Toussaint, Alberto Gomez, Veronika Zimmer, Jacqueline Matthew, Emily Skelton, Bishesh Khanal, Bernhard Kainz, Daniel Reuckert, Jo Hajnal, Julia Schnabel
Medical Image Computing and Computer Assisted Intervention–MICCAI (2019): 22nd International Conference, Shenzhen, China, (October 13–17, 2019), Proceedings, Part III 22. Springer International Publishing, (2019)., 2019
Peer-reviewed conference articles
Transforming Global health with AI (TOGAI)
BibTeX
@inproceedings{wright2019complete, 
title={Complete Fetal Head Compounding from Multi-View 3D Ultrasound},
author={Wright, Robert and Toussaint, Nicolas and Gomez, Alberto and Zimmer, Veronika and Matthew, Jacqueline and Skelton, Emily and Khanal, Bishesh and Kainz, Bernhard and Reuckert, Daniel and Hajnal, Jo and Schnabel, Julia}, 
booktitle={MICCAI},
year={2019}, 
note={Accepted}
}
                        
Web Summary
2018

Adapted and Oversegmenting Graphs: Application to Geometric Deep Learning

Alberto Gomez, Veronika A. Zimmer, Bishesh Khanal, Nicolas Toussaint, Julia A. Schnabel
2018
Pre-prints
Transforming Global health with AI (TOGAI)
BibTeX
@article{gomez2018oversegmenting, 
title={Oversegmenting Graphs}, 
author={Gomez, Alberto and Zimmer, Veronika A and Khanal, Bishesh and Toussaint, Nicolas and
Schnabel, Julia A}, 
journal={arXiv preprint arXiv:1806.00411},
year={2018}, 
note={under review}
}
                        
Web Summary
We propose a novel iterative method to adapt a graph to d-dimensional image data. The method drives the nodes of the graph towards image features. The adaptation process naturally lends itself to a measure of feature saliency which can then be used to retain meaningful nodes and edges in the graph. From the adapted graph, we also propose the computation of a dual graph, which inherits the saliency measure from the adapted graph, and whose edges run along image features, hence producing an oversegmenting graph. The proposed method is computationally efficient and fully parallelisable. We propose two distance measures to find image saliency along graph edges, and evaluate the performance on synthetic images and on natural images from publicly available databases. In both cases, the most salient nodes of the graph achieve average boundary recall over 90%. We also apply our method to image classification on the MNIST hand-written digit dataset, using a recently proposed Deep Geometric Learning architecture, and achieving state-of-the-art classification accuracy, for a graph-based method, of 97.86%.
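As a loose illustration of the node-adaptation idea only (not the paper's algorithm or its saliency measures), the sketch below nudges graph-node coordinates towards nearby maxima of the image gradient magnitude over a few iterations; the step size, smoothing and iteration count are assumptions.

import numpy as np
from scipy import ndimage

def adapt_nodes(image, nodes, steps=10, lr=0.5):
    # Drive node coordinates uphill on a smoothed gradient-magnitude map of the image.
    nodes = np.asarray(nodes, dtype=float).copy()
    gy, gx = np.gradient(image.astype(float))
    mag = ndimage.gaussian_filter(np.hypot(gx, gy), sigma=1.0)
    my, mx = np.gradient(mag)  # ascent directions on the saliency field
    for _ in range(steps):
        yi = np.clip(nodes[:, 0].round().astype(int), 0, image.shape[0] - 1)
        xi = np.clip(nodes[:, 1].round().astype(int), 0, image.shape[1] - 1)
        nodes += lr * np.stack([my[yi, xi], mx[yi, xi]], axis=1)
    return nodes

# image = a 2-D array; nodes = (K, 2) array of (row, col) positions on a regular grid
# adapted = adapt_nodes(image, nodes)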