• 25 Feb, 2025

Vevo Therapeutics Open Sources Tahoe-100M, the World's Largest Single-Cell Dataset, as the Inaugural Contribution to Arc Institute's New Virtual Cell Atlas

Atlas of 300 million single cells now accessible to the scientific community, comprising Vevo's Tahoe-100M, which maps 60,000 drug-patient interactions, and Arc's AI-curated scBaseCamp dataset of 200 million cells

Generated using Vevo's Mosaic platform, Tahoe-100M leveraged Parse Biosciences' GigaLab for single-cell sample preparation and Ultima Genomics for sequencing.

PALO ALTO, Calif. and SOUTH SAN FRANCISCO, Calif., Feb. 25, 2025 -- In a landmark move to advance AI-driven biological research, Arc Institute and Vevo Therapeutics announced today that they have partnered on the first release of the Arc Virtual Cell Atlas—the largest and most biologically diverse public resource for single-cell transcriptomic data across species, tissues, and experimental and perturbation conditions, starting with data from over 300 million unique cells. This data is open source and freely accessible via Arc's website as of February 25, 2025.

The atlas currently includes single-cell gene expression data from two massive datasets:

  • Vevo's Tahoe-100M is the world's largest single-cell dataset, 50x larger than all public drug-perturbed data combined. It includes 100 million cells and maps 60,000 drug-patient interactions, measuring cellular responses across 50 cancer cell lines to 1,200 drug perturbations. Tahoe-100M was generated using Vevo's Mosaic Technology, the first platform to make pan-cancer testing of drugs at single-cell resolution scalable, with support from Parse Biosciences' GigaLab and its single-cell RNA sequencing capabilities.

  • Arc's scBaseCamp is the first single-cell RNA sequencing data repository from public data to be curated and reprocessed at scale using AI agents. This gene expression data from another 200 million cells from 21 different species was sourced from public repositories and has been standardized to ensure interoperability for optimal use by machine learning models.

"What makes the Arc Virtual Cell Atlas particularly powerful is not just its scale, but that now researchers can analyze together both observational natural cell states and cells that have been deliberately perturbed by drugs or chemicals to see how they respond," says Dave Burke (@davey_burke), Arc Institute's Chief Technology Officer. "We're grateful to partner with Vevo on our first release of this resource, leveraging their large-scale Tahoe-100M cell dataset, which is crucial for developing predictive models that can simulate cellular responses to perturbations, potentially reducing years of laboratory work to computational queries that take minutes."

"Something extraordinary happened in the last few years: emergence of AI models that can predict protein structure and function," says Nima Alidoust (@nalidoust), Chief Executive Officer and Co-founder of Vevo Therapeutics. "Our mission at Vevo is to go a huge step further: build AI models of human cells to predict how diseased cells interact with potential drug molecules."

"These models need massive amounts of observational and drug-perturbed single-cell data, leaps beyond what is publicly available today," says Johnny Yu, Chief Scientific Officer at Vevo. "Our Mosaic platform overcomes this fundamental challenge; it can generate single-cell datasets such as Tahoe-100M at a scale that was not possible before."

"We are open sourcing Tahoe-100M to help start a new movement in biological modeling that goes beyond us," says Alidoust. "Releasing it on Arc's Virtual Cell Atlas is the obvious choice as it aims to precisely do that."

The Arc Virtual Cell Atlas is now accessible on this portal: https://arcinstitute.org/tools/virtualcellatlas

Vevo's Tahoe-100M Preprint: https://www.biorxiv.org/content/10.1101/2025.02.20.639398v1 

Arc's scBaseCamp Technical Report: https://arcinstitute.org/manuscripts/scBaseCamp

About the Arc Institute

The Arc Institute (@arcinstitute) is an independent nonprofit research organization located in Palo Alto, California, that aims to accelerate scientific progress and understand the root causes of complex diseases. Arc's model gives scientists complete freedom to pursue curiosity-driven research agendas and fosters deep interdisciplinary collaboration.

About Vevo Therapeutics

Vevo Therapeutics is a biotechnology company using its in vivo drug discovery platform and next-generation AI models to uncover better drugs for more patients. The company's Mosaic platform is the first to make multi-patient drug screening data scalable, with single-cell precision, to better represent patient diversity in drug response. Vevo is using Mosaic to build the world's largest atlas of how drugs interact with patient cells and to train disease-relevant models of human cells for discovering novel targets and drugs undetectable by other technologies.

Located in South San Francisco, CA, Vevo was founded by a team of inventors and thought leaders who have discovered drugs for "undruggable" targets and invented novel methods in genomics, computational biology, and chemistry. Learn more at www.vevo.ai and follow us on LinkedIn and X.

PR Newswire
