New AI model TabPFN enables faster and more accurate predictions on small tabular data sets

· A team led by Frank Hutter, Professor of Machine Learning at the University of Freiburg, has developed a new method that simplifies and improves predictions on tabular data, especially for small data sets with fewer than 10,000 data points.

· The new AI model TabPFN is trained on synthetically generated data before it is used and thus learns to evaluate possible causal relationships and use them for predictions.

· Hutter: “Many disciplines can benefit from this method and thus also recognise important relationships faster and more reliably than before, even with limited data.”

Filling gaps in data sets or identifying outliers – that’s the domain of the machine learning algorithm TabPFN, developed by a team led by Prof. Dr. Frank Hutter from the University of Freiburg. This artificial intelligence (AI) uses learning methods inspired by large language models. TabPFN learns causal relationships from synthetic data and is therefore more likely to make correct predictions than the standard algorithms that have been used up to now. The results were published in the journal Nature. In addition to the University of Freiburg, the University Medical Center Freiburg, the Charité – Berlin University Medicine, the Freiburg startup PriorLabs and the ELLIS Institute Tübingen were involved.

Data sets, whether they are on the effects of certain medications or particle paths in accelerators at CERN, are rarely complete or error-free. Therefore, an important part of scientific data analysis is to recognise outliers as such or to predict meaningful estimates for missing values. Existing algorithms, such as XGBoost, work well with large data sets, but are often unreliable with smaller data volumes.
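
To make the two tasks concrete, the following minimal sketch shows a classical baseline for a single numeric column: missing entries are filled with the median, and outliers are flagged with a modified z-score based on the median absolute deviation. This is not how TabPFN works; the function name `impute_and_flag`, the threshold of 3.5 and the sample values are assumptions chosen purely for illustration.

```python
import statistics


def impute_and_flag(values, threshold=3.5):
    """Fill missing entries (None) with the median and flag outliers.

    Illustrative baseline only, not TabPFN. Outliers are detected with the
    modified z-score, which uses the median absolute deviation (MAD).
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    mad = statistics.median(abs(v - median) for v in observed)

    # Replace every missing entry with the median of the observed values.
    filled = [median if v is None else v for v in values]

    if mad == 0:
        # All observed values are (nearly) identical: nothing to flag.
        flags = [False] * len(filled)
    else:
        flags = [abs(0.6745 * (v - median) / mad) > threshold for v in filled]
    return filled, flags


# Hypothetical measurement column with one gap and one suspicious value.
column = [4.8, 5.1, None, 5.0, 4.9, 42.0, 5.2]
filled, flags = impute_and_flag(column)
```

Simple rules like this look at one column at a time and ignore relationships between columns, which is one reason they can be unreliable on small, messy tables.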

With the TabPFN model, Hutter and his team solve this problem by training the algorithm on artificially created data sets that are modelled on real scenarios. To do this, the scientists create data tables in which the entries in the individual table columns are causally linked. TabPFN was trained with 100 million such synthetic data sets. This training teaches the model to evaluate various possible causal relationships and use them for its predictions.
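
The idea of a table with causally linked columns can be sketched as follows. This is a simplified illustration under stated assumptions, not the generator used by the researchers: the function name `sample_causal_table`, the 50% chance of an edge, the weight range and the `tanh` nonlinearity are all hypothetical choices. Each column may depend only on columns to its left, so the dependencies form a directed acyclic graph.

```python
import math
import random


def sample_causal_table(n_rows, n_cols, seed=0):
    """Sample one synthetic table whose columns are causally linked.

    Hypothetical illustration: column j may depend only on columns
    0..j-1, so the dependencies form a random directed acyclic graph.
    """
    rng = random.Random(seed)

    # For every column, pick random parent columns and edge weights.
    parents = []
    for j in range(n_cols):
        chosen = [p for p in range(j) if rng.random() < 0.5]
        parents.append({p: rng.uniform(-1.0, 1.0) for p in chosen})

    table = []
    for _ in range(n_rows):
        row = []
        for j in range(n_cols):
            noise = rng.gauss(0.0, 1.0)
            if parents[j]:
                # Child column: nonlinear function of its parents plus noise.
                signal = sum(w * row[p] for p, w in parents[j].items())
                row.append(math.tanh(signal) + 0.1 * noise)
            else:
                # Root column: pure noise.
                row.append(noise)
        table.append(row)
    return table, parents


# One small synthetic data set; a different seed gives a different
# causal structure and therefore a different table.
table, parents = sample_causal_table(n_rows=200, n_cols=5, seed=42)
```

Calling the function with many different seeds yields many tables with different underlying structures, which loosely mirrors the idea of training on a large collection of synthetic data sets.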

The model especially outperforms other algorithms for small tables with fewer than 10,000 rows, many outliers or a large number of missing values. For example, TabPFN requires only 50% of the data to achieve the same accuracy as the previously best model. In addition, TabPFN is more efficient than previous algorithms at handling new types of data. Instead of starting a new learning process for each data set, the model can be adapted to similar data sets. This process is similar to the adaptation of language models with open weights like Llama, developed by Meta. The model also makes it possible to derive the probability density from a data set and to generate new data with similar properties from it.

“The ability to use TabPFN to reliably and quickly calculate predictions from tabular data is beneficial for many disciplines, from biomedicine to economics and physics,” says Hutter. “TabPFN delivers better results faster and, because it requires few resources and data, is ideal for small companies and teams.” The code and instructions for using it are publicly available. In the next step, the researchers will further develop the AI so that it can make the best possible predictions even with larger data sets.

Original publication: N. Hollmann, S. Müller, L. Purucker, A. Krishnakumar, M. Körfer, Shi Bin Hoo, R. T. Schirrmeister, F. Hutter: Accurate Predictions on Small Data with a Tabular Foundation Model. Nature, 2025. URL: https://www.nature.com/articles/s41586-024-08328-6 . DOI: 10.1038/s41586-024-08328-6