GPT-Rosalind gets its first major upgrade: agentic coding, new benchmarks, and global research access

OpenAI published a significant update to GPT-Rosalind on June 3, 2026, roughly six weeks after the model’s initial launch in April. It is the first major capability upgrade to OpenAI’s domain-specific life sciences model, and it has four parts: a new underlying model, three new evaluation benchmarks, two Codex-powered plugins, and global research preview access for eligible organisations for the first time.

Here is what changed and what it means in practice.

A quick recap: what GPT-Rosalind is

GPT-Rosalind, named after the chemist Rosalind Franklin whose X-ray crystallography work was foundational to understanding DNA’s structure, is OpenAI’s first purpose-built domain-specific model series. It is fine-tuned specifically for pharmaceutical and academic life sciences research, targeting the multi-step reasoning demands of genomics, medicinal chemistry, and protein engineering. General-purpose models can answer biology questions reasonably well; GPT-Rosalind is built to reason through multi-step scientific workflows with precision, not just retrieve knowledge.

The original launch in April 2026 was limited-access. This update opens it up further and substantially deepens what the model can do.

What’s new: GPT-5.5 under the hood

The updated GPT-Rosalind is built on GPT-5.5, which brings meaningfully stronger agentic coding and tool-use capabilities. In practical terms, this means the model can now construct an end-to-end plan for a scientific task, write and execute the required code, use specific lab tools, and present its reasoning for human review, rather than simply answering questions or generating text outputs.

This matters because life sciences research rarely involves a single, clean query. It involves chaining together literature review, data analysis, hypothesis generation, experimental design, and result interpretation. A model that can handle that whole chain, with a researcher maintaining oversight, is considerably more useful than one that handles each step in isolation.

Three new benchmarks to track real performance

OpenAI introduced three new benchmarks alongside this update, and they deserve a closer look. Benchmarks in AI are often criticised for measuring what is easy to measure rather than what matters. These three are designed to test realistic research workflows instead.

LifeSciBench takes an end-to-end view of scientifically valuable work, drawing tasks from six workflow areas: evidence handling, analysis, design and optimisation, scientific reasoning, validation and operations, and translation and communication. It is externally expert-judged rather than relying on automated scoring alone.

MedChemBench tests realistic medicinal chemistry workflows, including multimodal chemical structure understanding, structure-activity relationships, prediction of drug potency and toxicity, ADME properties, lead optimisation, and retrosynthesis. GPT-Rosalind scores 27.5% against GPT-5.5’s 25.1%, while using 7.2% fewer tokens to get there.

LabWorkBench tests the model’s ability to help scientists in actual wet-lab contexts, linking experimental perturbations to outcomes across troubleshooting and optimisation tasks. GPT-Rosalind scores 63.2% versus GPT-5.5’s 55.8%, using 5.3% fewer tokens.

On the existing GeneBench (the agentic evaluation for long-horizon genomics and quantitative biology analysis), GPT-Rosalind achieves 21.6% versus GPT-5.5’s 20.4%, while using 31% fewer tokens.
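To compare the three head-to-head results on equal terms, it helps to separate the gain in percentage points from the relative gain over GPT-5.5. The sketch below does that arithmetic using only the figures quoted above; the dictionary keys and function name are illustrative choices, not anything published by OpenAI.

```python
# Scores and token reductions as quoted in this article (all in percent).
BENCHMARKS = {
    "MedChemBench": {"rosalind": 27.5, "gpt55": 25.1, "fewer_tokens": 7.2},
    "LabWorkBench": {"rosalind": 63.2, "gpt55": 55.8, "fewer_tokens": 5.3},
    "GeneBench": {"rosalind": 21.6, "gpt55": 20.4, "fewer_tokens": 31.0},
}


def gains(rosalind: float, gpt55: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    point_gain = rosalind - gpt55
    relative_gain = 100.0 * point_gain / gpt55
    return round(point_gain, 1), round(relative_gain, 1)


# Per-benchmark summary: name -> (point gain, relative gain).
SUMMARY = {
    name: gains(row["rosalind"], row["gpt55"])
    for name, row in BENCHMARKS.items()
}

if __name__ == "__main__":
    for name, (points, relative) in SUMMARY.items():
        fewer = BENCHMARKS[name]["fewer_tokens"]
        print(f"{name}: +{points} points ({relative}% relative), "
              f"{fewer}% fewer tokens")
```

On these figures the largest accuracy gain is on LabWorkBench (7.4 points) and the smallest on GeneBench (1.2 points), which is also where the token reduction is largest.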

What those numbers actually mean for you

The efficiency gains are arguably as important as the accuracy improvements. Using fewer tokens to reach a higher score means faster results and lower compute costs at scale. For organisations running GPT-Rosalind across large research programmes, that efficiency compounds quickly.
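A rough way to see how that compounds is to multiply the quoted token reductions across a workload. The sketch below is illustrative only: the baseline tokens per task and the number of tasks are hypothetical placeholders, and it assumes the benchmark reductions carry over to real workloads, which may not hold.

```python
def tokens_saved(baseline_tokens_per_task: int, tasks: int,
                 fewer_tokens_pct: float) -> int:
    """Tokens saved versus the baseline model across a whole workload."""
    baseline_total = baseline_tokens_per_task * tasks
    return round(baseline_total * fewer_tokens_pct / 100.0)


# HYPOTHETICAL workload: neither number comes from OpenAI or this article.
BASELINE_TOKENS_PER_TASK = 50_000
TASKS_PER_YEAR = 100_000

# Token reductions quoted in the article (percent fewer tokens than GPT-5.5).
REDUCTIONS = {"MedChemBench": 7.2, "LabWorkBench": 5.3, "GeneBench": 31.0}

# Estimated tokens saved if every task resembled the named benchmark.
SAVINGS = {
    name: tokens_saved(BASELINE_TOKENS_PER_TASK, TASKS_PER_YEAR, pct)
    for name, pct in REDUCTIONS.items()
}

if __name__ == "__main__":
    for name, saved in SAVINGS.items():
        print(f"{name}-like workload: about {saved:,} tokens saved per year")
```

Fewer tokens only translate into lower spend if the per-token price of the two models is comparable, so treat the output as a token count, not a cost estimate.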

On the accuracy figures themselves: a 27.5% pass rate on MedChemBench means the model still fails on 72.5% of tested tasks, nearly three-quarters. That is not a criticism so much as an honest framing. GPT-Rosalind at this stage is an acceleration tool for expert researchers, not an autonomous drug designer. The benchmarks are designed to track improvement over time, and these figures represent the current state of the technology, not its ceiling.

Two new Codex-powered plugins

Two new plugins extend GPT-Rosalind’s intelligence into practical execution:

The Life Sciences Research plugin brings sourced evidence retrieval and biological interpretation into the same workspace, so researchers are not constantly switching between tools to cross-reference findings.

The Life Sciences NGS Analysis plugin handles next-generation sequencing workflows more directly. Given a bulk RNA-seq sample sheet, FASTQ bundle, and reference files, it can produce a QC-reviewed counts bundle with MultiQC, Salmon matrices, provenance records, and explicit caveats. The outputs are auditable, which matters for research that needs to be reproducible and documented.
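Before handing inputs to any RNA-seq pipeline, it is common to sanity-check the sample sheet first. The sketch below shows one hypothetical pre-flight check in Python. The column names (sample, fastq_1, fastq_2) and the rules are assumptions for illustration; they are not the plugin's documented schema, and the plugin's own QC is separate from this.

```python
import csv
import io

# ASSUMPTION: hypothetical column names, not the plugin's documented schema.
REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")


def check_sample_sheet(text: str) -> list[str]:
    """Return a list of problems; an empty list means no issues were found."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]

    problems: list[str] = []
    seen_paths: set[str] = set()
    # Data rows are numbered from 1; the header row is not counted.
    for row_no, row in enumerate(reader, start=1):
        name = (row["sample"] or "").strip()
        read1 = (row["fastq_1"] or "").strip()
        if not name:
            problems.append(f"row {row_no}: empty sample name")
        if not read1:
            problems.append(f"row {row_no}: empty fastq_1 path")
        elif read1 in seen_paths:
            # The same read file listed twice is almost certainly a mistake.
            problems.append(f"row {row_no}: duplicate FASTQ path {read1!r}")
        seen_paths.add(read1)
    return problems


if __name__ == "__main__":
    sheet = (
        "sample,fastq_1,fastq_2\n"
        "ctrl_1,ctrl_1_R1.fastq.gz,ctrl_1_R2.fastq.gz\n"
        "treat_1,ctrl_1_R1.fastq.gz,treat_1_R2.fastq.gz\n"
    )
    for problem in check_sample_sheet(sheet):
        print(problem)
```

In this sketch fastq_2 may be left empty to allow single-end samples; a real pipeline may impose different or stricter rules.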

Both plugins are accessible to all users through Codex. Qualified GPT-Rosalind enterprise users can additionally use GPT-Rosalind itself to power them.

OpenAI has also added interactive viewers for biologically native file formats, including sequence, alignment, and protein structure files. Researchers can inspect raw evidence directly as the model reasons through a workflow rather than exporting outputs to separate visualisation tools.

Global research preview access, and Novo Nordisk joins

The most significant access change in this update is the opening of GPT-Rosalind to eligible organisations globally. Access is through a trusted-access deployment structure that requires institutional biosafety oversight, strong governance, and controlled enterprise-grade security. This is not a public API release. Organisations need to demonstrate clear public benefit and scientific legitimacy.

OpenAI is also now offering a managed workspace option for qualified organisations that do not have an existing Enterprise account, which removes one of the practical barriers to access.

Novo Nordisk joins as a new partner in this global expansion. The partnership is focused on scaling medical research, with GPT-Rosalind helping research teams connect evidence across literature, genomics, transcriptomics, sequence data, structure, and experimental results to move from data to clearer research decisions more quickly. Novo Nordisk had signed a broader strategic partnership with OpenAI in April 2026 covering drug discovery, manufacturing, supply chain, and commercial operations. This update formalises the research model piece of that relationship.

Earlier partners in the GPT-Rosalind programme include Amgen, Moderna, the Allen Institute, and Thermo Fisher Scientific.

The broader context

Drug development in the United States takes roughly 10 to 15 years from target discovery to regulatory approval. The research complexity is not the only constraint; the workflows themselves are a bottleneck. A model that can reliably assist with multi-step analytical tasks, while keeping researchers in scientific control, addresses both problems.

GPT-Rosalind is also now connected to the Rosalind Biodefense initiative, announced May 29, 2026, which extends trusted access to vetted US government agencies and allied partners for pandemic preparedness and biodefense applications.

For pharmaceutical researchers, academic life scientists, and biotech teams thinking about where this fits, the short version is this: the June 3 update makes GPT-Rosalind meaningfully more capable, more efficient, and more accessible than it was at launch. The benchmarks give you honest numbers to work with. And the Codex plugins begin to close the gap between a model that understands scientific workflows and one that can actively participate in them.