In the human proteome, 30–40% of proteins lack a stable tertiary structure under physiological conditions. These intrinsically disordered proteins (IDPs) take part in signal transduction, molecular recognition, and cell-cycle regulation, and are closely linked to cancer, diabetes, and Alzheimer's disease. AlphaFold, which predicts a single static structure, handles them poorly, and traditional molecular dynamics (MD) simulations of them are extremely expensive.
IDPFold is a conformational-ensemble generation method designed specifically for IDPs. It uses a diffusion model trained in two stages: pre-training on experimental structures from the Protein Data Bank (PDB), followed by fine-tuning on IDRome, a large-scale dataset of molecular dynamics trajectories. Given only an amino acid sequence, the model generates an ensemble of backbone conformations, with no multiple sequence alignment (MSA) or experimental data required.
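To illustrate what "diffusion model" means here, the sketch below shows the generic forward (noising) step of a standard DDPM applied to a toy Cα trace: coordinates are progressively blended with Gaussian noise, and a trained network learns to reverse this process to generate new conformations. This is an illustrative assumption, not IDPFold's actual parameterization; the linear noise schedule, the Cα-only coordinate representation, and the function names `linear_alpha_bars` and `noise_backbone` are all hypothetical, and the learned reverse (generation) step is omitted because it requires a trained model.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule (a common DDPM choice, assumed here)."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_backbone(coords, alpha_bar, rng):
    """Generic forward diffusion q(x_t | x_0): x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, eps ~ N(0, I)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [tuple(a * c + s * rng.gauss(0.0, 1.0) for c in atom) for atom in coords]

# Toy Cα trace with 3.8 Å spacing (hypothetical input, not from the paper).
rng = random.Random(0)
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
schedule = linear_alpha_bars(1000)
x_t = noise_backbone(ca, schedule[500], rng)  # a partially noised backbone
```

As alpha_bar falls toward zero the coordinates approach pure noise; generation runs this process in reverse, starting from noise and denoising step by step into a plausible conformation, and repeating it yields an ensemble.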

The benchmark covers 27 IDP systems. Compared with experimental values, IDPFold achieves a relative error of only −0.06 in radius of gyration (Rg) and an RMSD of just 0.65 ppm for Cα secondary chemical shifts. This accuracy significantly outperforms existing generative deep learning methods and matches or even surpasses traditional all-atom MD simulations.
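The two metrics above can be made concrete with a short sketch. It assumes Cα-only coordinates, an unweighted Rg averaged directly over the ensemble, and secondary shifts computed as observed minus random-coil reference values; the exact conventions used in the paper may differ (for example, comparisons with SAXS data often use the square root of the mean squared Rg instead), and all function names are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Unweighted Rg of one conformation: sqrt of mean squared distance to the centroid."""
    n = len(coords)
    centroid = [sum(atom[k] for atom in coords) / n for k in range(3)]
    msd = sum(sum((atom[k] - centroid[k]) ** 2 for k in range(3)) for atom in coords) / n
    return math.sqrt(msd)

def ensemble_rg(ensemble):
    """Plain average of Rg over all conformations (one simple convention, assumed here)."""
    return sum(radius_of_gyration(conf) for conf in ensemble) / len(ensemble)

def relative_error(predicted, experimental):
    """Signed relative error; a negative value means the ensemble is more compact than experiment."""
    return (predicted - experimental) / experimental

def secondary_shifts(observed, random_coil):
    """Secondary chemical shift per residue: observed shift minus random-coil reference (ppm)."""
    return [o - r for o, r in zip(observed, random_coil)]

def rmsd(predicted, experimental):
    """Root-mean-square deviation between two per-residue series, e.g. Cα secondary shifts."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / len(predicted))
```

For instance, an ensemble with mean Rg of 9.4 Å against an experimental 10.0 Å gives a relative error of −0.06; the random-coil reference shifts themselves must come from a published table and are not included here.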
For drug design, IDPFold's value lies in capturing dynamics. Static structure prediction gives only a "snapshot" of a protein, whereas IDPFold provides a "movie." Interactions between disordered proteins and ligands, and functional switching driven by conformational change, can now be studied systematically. This gives drug screening against IDP-related diseases a new tool.
The model code is fully open source and supports inference on a single sequence or on a batch of sequences in FASTA format.
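To show what single versus batch FASTA input looks like, here is a minimal stdlib parser that turns one or many FASTA records into (header, sequence) pairs. It is a hypothetical helper for preparing inputs, not IDPFold's own loader or command-line interface; consult the project's repository for how it actually reads sequences.

```python
def parse_fasta(text):
    """Parse single- or multi-record FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and legacy FASTA comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))  # close the previous record
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())  # sequences may wrap across lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

batch = parse_fasta(">idp_1\nMKV\nLL\n>idp_2 second sequence\nGGS\n")
```

A single-sequence file simply yields a one-element list, so the same code path can serve both cases.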
This finding was published in Advanced Science (2025).