Identifying rare microorganisms in microbiome data just got easier. A team of researchers from Portugal and Canada has developed a new tool that uses machine learning to automatically detect the rare biosphere in ecological datasets.
The tool uses ‘machine learning’, a branch of artificial intelligence (AI) that automates the construction of analytical models, to identify rare microorganisms in microbiome datasets quickly, autonomously and without supervision. Named ulrb, it responds to a long-standing challenge in microbial ecology: distinguishing rare microorganisms from the most abundant ones in natural environments.
The new methodology and the ulrb software have now been published in the study ‘Definition of the microbial rare biosphere through unsupervised machine learning’ in the scientific journal Communications Biology. The work is the result of an international collaboration between the Interdisciplinary Centre for Marine and Environmental Research (CIIMAR), the Faculty of Sciences of the University of Porto, the Institute of Bioengineering and Biosciences (iBB) of the Instituto Superior Técnico of the University of Lisbon and, in Canada, the School of Electrical Engineering and Computer Science (EECS) of the University of Ottawa and the Faculty of Computer Science of Dalhousie University.
This is a product of the PhD project of CIIMAR student Francisco Pascoal, under the supervision of CIIMAR researcher Catarina Magalhães and the co-supervision of researchers Rodrigo Costa (iBB) and Paula Branco (EECS). This new software will increase not only the accuracy of ecological analyses of different microbiomes and ecosystems, but also the depth at which these analyses are carried out, ultimately improving our understanding of microbial diversity and its role in ecosystem resilience.
What is the rare biosphere?
Microbial communities normally follow a pattern in which only a few species are highly abundant, while the vast majority of diversity is low in abundance and belongs to the so-called ‘rare biosphere’. In fact, a single litre of seawater can hold thousands of species of prokaryotic microorganisms. However, only 2 to 5% of these species are abundant, while the rest are rare and very difficult to detect and identify due to methodological limitations.
Why is it so important to study rare microorganisms?
Although they are not very abundant, rare species contain the greatest genetic diversity on the planet. They are responsible for providing great resilience to an ecosystem: ‘if the most abundant species are threatened by climate change, other rare species can take over and ensure the functions of the microbiome, keeping the ecosystem stable,’ explains Francisco Pascoal. The rare biosphere therefore plays a very important role in ecosystem responses to major changes in the environment, such as the effects of climate change. Studying rare organisms allows us to understand the resilience of ecosystems to these changes and to study their reaction to environmental alterations.
What is innovative about ulrb?
By employing unsupervised machine learning techniques, ulrb allows researchers to quickly and reliably identify rare microorganisms in a community. A major advantage of this method is its adaptability to different methodological contexts: the algorithm ‘learns’ the patterns present in the data itself, regardless of its origin.
‘The possibility of identifying rare microorganisms arose with the development of high-throughput DNA sequencing technologies, but even with this data it was never clear among peers how to identify rare microorganisms, as they were overshadowed by the abundant ones. Thus, many researchers limited themselves to establishing random levels of abundance, which was an insufficient approach since it was not supported by biological justification. With this new method, we were able to use sequencing data to automatically distinguish which microorganisms are rare, based on the information provided in each sample,’ says Francisco Pascoal, first author of the study.
To automate the process, an algorithm was created that groups together the microorganisms that are most similar to each other in terms of their abundance in a given sample. As the grouping is based on the relative distance between them, it can be automated and applied to datasets of any size, and it produces consistent results that are ecologically and biologically meaningful. ‘Basically, the algorithm “learns” what the abundance groups in a community are and matches them up with an abundance classification, which makes it possible to distinguish microorganisms that are rare from those that are abundant,’ says Francisco.
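The idea of grouping microorganisms by abundance and then naming the groups can be illustrated with a minimal Python sketch. This is not the ulrb implementation (ulrb is an R package, and this article does not specify its clustering algorithm): the sketch swaps in a simple one-dimensional k-means as a stand-in, and the function name `classify_by_abundance`, the taxon names and the counts are all hypothetical.

```python
from statistics import mean


def classify_by_abundance(abundances, labels=("rare", "abundant"), max_iter=100):
    """Cluster the taxa of one sample by abundance and name the clusters.

    abundances: dict mapping a taxon name to its count in the sample.
    labels: cluster names ordered from lowest to highest abundance.
    Illustrative stand-in (1-D k-means), not the algorithm used by ulrb.
    """
    k = len(labels)
    values = sorted(set(abundances.values()))
    if k < 2 or len(values) < k:
        raise ValueError("need at least 2 labels and as many distinct counts as labels")

    # Deterministic start: spread the initial centers over the sorted counts.
    centers = [values[i * (len(values) - 1) // (k - 1)] for i in range(k)]

    def nearest(count):
        # Index of the center closest to this count.
        return min(range(k), key=lambda i: abs(count - centers[i]))

    for _ in range(max_iter):
        # Assignment step: every count joins its nearest center.
        groups = [[] for _ in range(k)]
        for count in abundances.values():
            groups[nearest(count)].append(count)
        # Update step: each center moves to the mean of its group.
        updated = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated

    # Rank the centers so the lowest one maps to the first label.
    order = sorted(range(k), key=lambda i: centers[i])
    label_of = {center_idx: labels[rank] for rank, center_idx in enumerate(order)}
    return {taxon: label_of[nearest(count)] for taxon, count in abundances.items()}


# Hypothetical sample: two dominant taxa and four low-abundance taxa.
sample = {"otu1": 950, "otu2": 870, "otu3": 12, "otu4": 9, "otu5": 4, "otu6": 1}
print(classify_by_abundance(sample))
```

In this toy sample, `otu1` and `otu2` end up in the ‘abundant’ group and the other four taxa in the ‘rare’ group, without any abundance threshold being chosen in advance. Passing three labels, for example `("rare", "intermediate", "abundant")`, would split the sample into three groups instead.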
What are the possible applications?
ulrb can be applied to data derived from common microbial ecology protocols, and could be useful for studying emerging diseases and biological invasions. Since the method can also be applied to non-microbial data, it could help determine which species of animals and/or plants are at risk in certain contexts, supporting environmental monitoring.
If you are a researcher and want to apply this tool to your own data, ulrb is available as an open-source R package on CRAN and GitHub. The team of researchers has also created a website with learning materials to encourage you to use the tool.