Communications Biology
What does it take to learn the rules of RNA base pairing? A lot less than you may think
  • Article
  • Open access
  • Published: 26 March 2026

  • Jayanth S. Pratap¹ (ORCID: orcid.org/0009-0001-4920-2709),
  • Ryan K. Krueger² (ORCID: orcid.org/0000-0001-6856-0248) &
  • Elena Rivas¹ (ORCID: orcid.org/0000-0002-2084-269X)

Communications Biology (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational models
  • Machine learning
  • Programming language
  • RNA
  • Structural biology

Abstract

Amid the fast-developing trend of RNA large language models with millions of parameters, we asked what would be minimally required to rediscover the rules of canonical RNA base pairing that define secondary structure, namely the Watson-Crick-Franklin A:U and G:C pairs and the wobble G:U pair. We conclude that it does not require much at all: it requires neither knowing secondary structures, nor aligning the sequences, nor many parameters. We selected a probabilistic model, a stochastic context-free grammar (SCFG), with a total of just 21 parameters, which can describe arbitrary pairwise interactions, including but not restricted to those of RNA base pairing. Using standard deep learning techniques, we estimate its parameters by implementing the generative process in an automatic differentiation (autodiff) framework and applying stochastic gradient descent (SGD). We define and minimize a loss function that does not use any structural or alignment information. When the model is trained on as few as fifty RNA sequences, the specific rules of RNA base pairing emerge after only a few iterations of SGD. Crucially, the sole inputs are RNA sequences. When the training sequences correspond to structured RNAs, SGD also yields the rules by which base pairs aggregate into helices. In sharp contrast, when trained on shuffled sequences, the system optimizes by avoiding base pairing altogether. When trained on messenger RNAs, it reveals interactions that differ from those of structural RNAs and are specific to each mRNA. We demonstrate that our approach generalizes across diverse RNA families by testing on 1094 sequences from 22 structurally distinct RNA families. Our results show that the emergence of canonical RNA base pairing can be attributed to sequence-level signals that are robust and detectable even without labeled structures or alignments, and with very few parameters.
Autodiff algorithms for probabilistic models, such as, but not restricted to, SCFGs, have significant potential because they allow these models to be incorporated into end-to-end RNA deep learning methods for discerning transcripts of different functionalities.
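The abstract's key idea is a loss computed from sequences alone. This works because an SCFG can score a sequence by summing over every possible secondary structure, which is what the inside algorithm computes. The sketch below illustrates this for the G5 grammar named in the Contributions section.

It makes the following assumptions:

- The grammar uses the standard G5 rules S → aS | aSa'S | ε, with three rule probabilities, four single-nucleotide emissions and sixteen pair emissions.
- The function names (`g5_params`, `inside_g5`, `nll_loss`, `canonical_mass`) are hypothetical.
- The softmax parameterization is also hypothetical, and the exact parameterization behind the paper's 21-parameter count may differ.

This is not the authors' code, which is at the repository listed under Availability. The sketch is dependency-free and only computes the loss. In the approach described above, an autodiff framework would differentiate this quantity with respect to the logits so that SGD can update them.

```python
import math
from itertools import product

BASES = "ACGU"
# The six canonical pairs: Watson-Crick-Franklin A:U, G:C and wobble G:U.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def softmax(logits):
    """Map unconstrained reals to a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def g5_params(t_logits, s_logits, p_logits):
    """Hypothetical parameterization: unconstrained logits -> G5 probabilities.

    t: rule probabilities for (S -> aS, S -> aSa'S, S -> empty)
    s: single-nucleotide emissions, keyed by base
    p: pair emissions, keyed by (left base, right base)
    """
    t = softmax(t_logits)
    s = dict(zip(BASES, softmax(s_logits)))
    p = dict(zip(product(BASES, BASES), softmax(p_logits)))
    return t, s, p


def inside_g5(x, t, s, p):
    """Inside algorithm for G5: returns P(x), summed over all structures.

    inside[i][j] is the probability that S derives the substring x[i:j].
    Probability space is used for brevity; long sequences would underflow.
    """
    n = len(x)
    t_single, t_pair, t_end = t
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = t_end  # S -> empty derives the empty substring
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> a S: x[i] is unpaired, S derives x[i+1:j]
            total = t_single * s[x[i]] * inside[i + 1][j]
            # S -> a S a' S: x[i] pairs with x[k]
            for k in range(i + 1, j):
                total += (t_pair * p[(x[i], x[k])]
                          * inside[i + 1][k] * inside[k + 1][j])
            inside[i][j] = total
    return inside[0][n]


def nll_loss(seqs, t, s, p):
    """Mean negative log-likelihood: uses sequences only, no structures."""
    return -sum(math.log(inside_g5(x, t, s, p)) for x in seqs) / len(seqs)


def canonical_mass(p):
    """Fraction of pair-emission probability on the six canonical pairs."""
    return sum(p[pair] for pair in CANONICAL)


if __name__ == "__main__":
    # Uniform starting point: all logits zero (an assumed initialization).
    t, s, p = g5_params([0.0] * 3, [0.0] * 4, [0.0] * 16)
    print(nll_loss(["GGGAAACCC", "GCAUAGC"], t, s, p))
    print(canonical_mass(p))  # 0.375 (6 of 16 pairs) under the uniform start
```

With all logits at zero the grammar is uniform, so `canonical_mass` starts at 0.375. The abstract's claim is that training on real RNA sequences concentrates pair-emission probability on A:U, G:C and G:U. A practical version would work in log space to avoid underflow, and would pass the loss through an autodiff tool to obtain gradients.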

Acknowledgements

This work was supported by NIH grant R01-GM144423 to E.R. This material is based in part upon work supported by the National Science Foundation under Grant no. UWSC13223 (R.K.K.). We thank Marcell Szikszai for help running the software MXFOLD2 and RiNALMo, and Max Ward for insights into automatic differentiation of RNA folding models. We thank William Gao for providing the fungal mRNA sequences. We thank Sean R. Eddy and William Gao for a critical reading of the manuscript. E.R. acknowledges the hospitality of the Centro de Ciencias de Benasque Pedro Pascual, Benasque, Spain, during the completion of this manuscript. We also thank the reviewers for their insightful comments.

Author information

Authors and Affiliations

  1. Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA

    Jayanth S. Pratap & Elena Rivas

  2. School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA

    Ryan K. Krueger

Contributions

E.R. conceived the research. J.S.P. and R.K.K. implemented the algorithms for the G5 grammar. E.R. implemented the algorithms for the G6 grammar. E.R. performed the experiments and wrote the manuscript. All authors edited the manuscript.

Corresponding author

Correspondence to Elena Rivas.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Kengo Sato and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Michal Kolář and Mengtan Xing. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Availability: rivaslab.org, https://github.com/EddyRivasLab/R-scape/tree/master/python/d-SCFG

Supplementary information

Supplementary material

Reporting summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Pratap, J.S., Krueger, R.K. & Rivas, E. What does it take to learn the rules of RNA base pairing? A lot less than you may think. Commun Biol (2026). https://doi.org/10.1038/s42003-026-09921-3

  • Received: 25 August 2025

  • Accepted: 12 March 2026

  • Published: 26 March 2026

  • DOI: https://doi.org/10.1038/s42003-026-09921-3

Associated content

Collection

Artificial intelligence in RNA biology
