New AI meA groundbreaking machine learning technique, Counterfactual

16/04/2025 TranSpread

Machine learning holds great promise in healthcare, with applications ranging from early disease detection to personalized treatments. However, its effectiveness is often hindered by imbalanced data, where rare, critical outcomes such as certain diseases are vastly underrepresented compared to negative cases. As a result, traditional models tend to favor the majority class, neglecting life-threatening conditions. While techniques like Synthetic Minority Oversampling Technique (SMOTE) attempt to balance these datasets by generating synthetic minority samples, they often produce noisy or redundant data, which can lead to misdiagnoses or wasted resources. To address these shortcomings, advanced methods are needed that improve model accuracy and reliability without introducing unwanted noise.

On January 25, 2025, researchers Goncalo Almeida and Fernando Bacao from NOVA Information Management School introduced Counterfactual SMOTE, a new enhancement to the widely used SMOTE technique. Published (DOI: 10.1016/j.dsm.2025.01.006) in Data Science and Management, this new method integrates counterfactual generation to place synthetic samples strategically near decision boundaries within the "safe" minority regions. Validated on 24 highly imbalanced healthcare datasets, Counterfactual SMOTE showed a 10% average improvement in F1-score, significantly outperforming existing methods. This innovation marks a major step forward in addressing the challenges of imbalanced data, offering improved performance for medical diagnostics and beyond.
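For readers unfamiliar with the metric, F1-score is the harmonic mean of precision and recall, which makes it more informative than plain accuracy when positives are rare. The short Python sketch below illustrates this on made-up labels (5 positives among 100 cases); the data and the helper names `accuracy` and `f1_score` are illustrative assumptions and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives found, so F1 is zero
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical imbalanced data: 5 rare positive cases, 95 negatives.
y_true = [1] * 5 + [0] * 95

# A model that always predicts the majority (negative) class.
always_negative = [0] * 100

# A model that finds 4 of the 5 positives at the cost of 2 false alarms.
better = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

print("always negative:", accuracy(y_true, always_negative), f1_score(y_true, always_negative))
print("better model:   ", accuracy(y_true, better), f1_score(y_true, better))
```

In this toy case the majority-only model scores high on accuracy while its F1-score is zero, which is why imbalanced-learning studies typically report F1 rather than accuracy.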

Counterfactual SMOTE improves upon traditional SMOTE by addressing two critical issues: noisy samples and near-duplicates. It generates synthetic data points as counterfactuals of majority-class instances, ensuring that these samples are placed near the decision boundary, where misclassification risks are highest. By utilizing a binary search along the line connecting majority and minority samples, guided by a k-NN classifier, the method ensures that synthetic data remains within "minority-safe" zones, thereby reducing potential noise. Key innovations include boundary-focused sampling, which uses majority-minority pairs rather than interpolating between minority samples. The method has been validated against eight benchmark methods, including Borderline SMOTE and Adaptive Synthetic Sampling (ADASYN), reducing false negatives by 24%–34% while maintaining low false positives. Although the method incurs higher computational costs, the gains in accuracy, particularly in resource-critical fields like healthcare, justify its application. Moreover, its generalizability extends beyond healthcare, making it applicable to other domains like fraud detection and manufacturing defect analysis.
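The boundary search described above can be sketched in a few lines of Python. This is a minimal illustration based only on the description in this release, not the authors' published implementation: the function names, the hand-rolled k-NN vote, the choice of k = 3, the fixed number of search steps, and the toy data are all assumptions. It also assumes that the starting minority sample is itself classified as minority and that the label changes only once along the segment.

```python
import math


def knn_predict(point, X, y, k=3):
    """Majority vote among the k nearest training points (Euclidean distance).
    Label 1 is the minority class, label 0 the majority class."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(point, X[i]))[:k]
    votes = sum(y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def interpolate(mino, maj, t):
    """Point on the segment from `mino` (t = 0) to `maj` (t = 1)."""
    return [m + t * (M - m) for m, M in zip(mino, maj)]


def counterfactual_sample(maj, mino, X, y, k=3, steps=20):
    """Binary-search the segment between a minority and a majority sample
    for the point closest to the majority sample that k-NN still labels
    as minority. Returns the synthetic point and its position t."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if knn_predict(interpolate(mino, maj, mid), X, y, k) == 1:
            lo = mid  # still in the minority-safe zone: move toward majority
        else:
            hi = mid  # crossed into majority territory: back off
    return interpolate(mino, maj, lo), lo


# Toy 2-D data (hypothetical): three minority points, three majority points.
X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [1, 1, 1, 0, 0, 0]

sample, t = counterfactual_sample(maj=X[3], mino=X[1], X=X, y=y)
print("synthetic minority sample:", sample, "at t =", t)
```

The returned point lies on the line between the two originals, close to where the k-NN vote flips, yet is still classified as minority. This is the sense in which the samples are both boundary-focused and "minority-safe".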

Dr. Goncalo Almeida, the study's lead author, emphasized, "Counterfactual SMOTE bridges the gap between data imbalance and actionable AI. By focusing on safe, informative samples, it ensures models don't just 'guess' majority classes but truly learn to identify rare cases. This is a paradigm shift for imbalanced learning, with life-saving implications in medical diagnostics." Dr. Almeida highlighted the method's potential to enhance the precision of AI models in healthcare, ensuring that they prioritize rare conditions without overwhelming the system with false alarms. This breakthrough represents a transformative step in the field of imbalanced data learning.

Counterfactual SMOTE's impact extends well beyond healthcare. In sectors like finance, the method could improve fraud detection by ensuring that rare fraudulent activities are accurately identified, while in telecommunications, it could predict customer churn with higher precision. In healthcare, the method enables accurate detection of rare diseases, balancing the need for precise identification with the prevention of false positives that can overwhelm healthcare systems. Open-sourcing the code further facilitates broader adoption across industries. Future developments may explore expanding the method's capabilities to handle categorical data and multiclass applications, reinforcing Counterfactual SMOTE as a cornerstone solution for tackling data imbalance in a wide range of fields.

###

References

DOI

doi.org/10.1016/j.dsm.2025.01.006

Original Source URL

https://doi.org/10.1016/j.dsm.2025.01.006

Funding information

This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.

About Data Science and Management

Data Science and Management (DSM) is a peer-reviewed open access journal for original research articles, review articles and technical reports related to all aspects of data science and its application in the fields of business, economics, finance, operations, engineering, healthcare, transportation, agriculture, energy, environment, sports, and social management. DSM was launched in 2021 and is published quarterly by Xi'an Jiaotong University.

Paper title: Counterfactual synthetic minority oversampling technique: solving healthcare's imbalanced learning challenge
Attached files
  • Schematization of the application of the Counterfactual synthetic minority oversampling technique (SMOTE) to an imbalanced learning problem.
Regions: North America, United States, Europe, Portugal
Keywords: Applied science, Artificial Intelligence, Health, Medical
