MCP / Tech · 8 min read · May 22, 2026

RDKit over MCP: where the open-source toolkit stops, and where CovaSyn picks up

RDKit MCP servers give AI agents deterministic cheminformatics. This post covers where their limit sits (no trained ML predictions, no hosting, no compliance, often too many generic tools) and what CovaSyn adds on top, with an honest recommendation for when a pure RDKit MCP server is enough.

Oliver Kraft

CovaSyn

Key takeaways

RDKit over MCP is real and useful. Several community servers (e.g. tandemai, mcp_rdkit) already give AI agents deterministic access to RDKit functions.
We build on RDKit ourselves. The question is not "RDKit or CovaSyn" but "where does a pure RDKit MCP server end?".
The boundary of RDKit MCP: no trained ML predictions (solubility, toxicity), no spectral interpretation or stability kinetics, no hosting / auth / compliance, and often too many, too generic tools for an agent.
CovaSyn delivers RDKit's determinism plus 17 tool suites including ML models, hosted, GDPR-compliant and curated for agent ergonomics.

RDKit over MCP is no longer theory

Anyone giving an AI agent chemistry tools in 2026 quickly runs into RDKit, the de facto standard toolkit in cheminformatics: open source and battle-tested. There are now several ways to wire RDKit to a language model through the Model Context Protocol (MCP): tandemai's rdkit-mcp-server exposes almost every RDKit function to an agent, and mcp_rdkit can be installed with pip install mcp-rdkit. On top of that come several DIY guides and the "RDKit Copilot".

That is a good development, and the right first step. When an agent needs a molecular weight, a logP descriptor or a Murcko scaffold, an RDKit MCP server is exactly the right tool: deterministic, correct, free.
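To make the wiring concrete: MCP messages are JSON-RPC 2.0, and over the stdio transport each message travels as one line of JSON. The sketch below builds the tools/call request an agent's client might send to ask for a molecular weight. The tool name molecular_weight and its smiles argument are assumptions for illustration; the real names depend on the server, which advertises them through tools/list.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session

def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so the payload must not
    # contain raw newlines; json.dumps without `indent` emits a single line.
    return json.dumps(message)

# Hypothetical tool name and argument; aspirin as the example molecule.
line = build_tool_call("molecular_weight", {"smiles": "CC(=O)Oc1ccccc1C(=O)O"})
print(line)
```

Nothing in this message carries a user identity or a credential, which is fine for a local process and is part of why hosting and authentication become separate work once a team shares a server.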

At CovaSyn this is not a competition but a foundation. Our base layer (CovaBasic) uses RDKit under the hood. The honest question is therefore not "RDKit or CovaSyn" but: where does a pure RDKit MCP server end, and what do you have to build yourself from there?

Where RDKit-over-MCP runs into its boundary

RDKit excels at what it was built for: deterministic computation on the molecular graph. Descriptors, substructure search, fingerprints, canonical SMILES, scaffold decomposition. But that is also where the feature set ends, and four gaps surface in a productive R&D workflow.

1. No trained prediction models.

RDKit computes what can be derived directly from structure. It does not predict solubility, toxicity or ADMET; that requires ML models trained on experimental data. These are exactly the questions that matter in drug development: "How soluble is this compound?", "Is it mutagenic?". A pure RDKit MCP server cannot answer them; an agent that tries anyway falls back on the language model's guesswork, which is the very problem tools were supposed to solve.

2. No analytics, formulation or process data.

NMR, MS, IR and UV/Vis interpretation, Arrhenius stability kinetics, solubility versus temperature and solvent, DoE / RSM optimization: all of this sits outside RDKit's core domain. These are separate scientific domains with their own models.

3. No hosting, no authentication, no compliance.

Most RDKit MCP servers run locally (stdio) on the developer's machine. For a team in a regulated environment that means self-hosting, building authentication, managing multi-user access, ensuring availability and settling DPA / GDPR questions. That is not cheminformatics; it is platform and compliance work, and it lands entirely with the user.

4. Agent ergonomics: many tools are not the same as good tools.

A common approach is to expose every RDKit function as its own tool. To a human, "access to everything" sounds great. To an agent it is often the opposite: hundreds of nearly identical tools overload the selection, raise the error rate and worsen tool choice. Good agent design means few, clearly delineated, well-described tools: curated, not auto-generated.
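The contrast can be sketched in a few lines. The first function mimics the "expose everything" approach by turning every callable in a namespace into a tool with a generic description. The second is a single hand-written definition with a name, a description and an input schema, the three fields MCP uses to describe a tool. The stand-in functions and the describe_molecule tool are hypothetical and not taken from any real server.

```python
from typing import Callable

# Stand-ins for a large function library; the names are hypothetical.
def calc_weight_a(smiles: str) -> float: ...
def calc_weight_b(smiles: str) -> float: ...
def calc_weight_c(smiles: str) -> float: ...

def auto_generate_tools(functions: dict[str, Callable]) -> list[dict]:
    """One tool per function, with a description the agent cannot act on."""
    return [
        {
            "name": name,
            "description": f"Calls {name}.",
            "inputSchema": {"type": "object"},
        }
        for name in functions
    ]

# Curated alternative: one tool, a description that says when to use it,
# and a schema that constrains the input.
CURATED_TOOLS = [
    {
        "name": "describe_molecule",
        "description": (
            "Return molecular weight, logP and Murcko scaffold for one "
            "molecule. Use this for any basic property question."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {"smiles": {"type": "string"}},
            "required": ["smiles"],
        },
    }
]

library = {f.__name__: f for f in (calc_weight_a, calc_weight_b, calc_weight_c)}
generated = auto_generate_tools(library)
print(len(generated), "auto-generated tools vs", len(CURATED_TOOLS), "curated")
```

With three functions the difference looks harmless. With hundreds, the agent has to pick among near-duplicates whose descriptions give it nothing to choose by.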

What CovaSyn adds on top

CovaSyn starts where the RDKit MCP server ends and keeps that determinism as its foundation.

Deterministic descriptors: RDKit MCP ✓ · CovaSyn ✓ (RDKit-based)
ML predictions (solubility, tox, ADMET): RDKit MCP no · CovaSyn ✓
Analytics (NMR, MS, IR, UV/Vis): RDKit MCP no · CovaSyn ✓
Stability, formulation, DoE: RDKit MCP no · CovaSyn ✓
Uncertainty / applicability domain: RDKit MCP no · CovaSyn ✓
Hosted, auth, multi-user: RDKit MCP usually local · CovaSyn ✓
GDPR / DPA: RDKit MCP on the user · CovaSyn ✓
Tool design: RDKit MCP often "all functions" · CovaSyn curated for agents

Concretely: 17 tool suites with roughly 199 tools, from structure analysis through solubility prediction with calibrated uncertainty and ICH M7 toxicology to spectral interpretation and stability kinetics. Every prediction comes, where it makes sense, with an uncertainty interval and an applicability-domain flag that signals when a model is extrapolating. Everything is hosted, with API-key authentication, GDPR-compliant and covered by a DPA, so the team does not have to run a platform itself.
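As an illustration of what an uncertainty interval plus an applicability-domain flag can look like to an agent, the sketch below pairs a prediction with both. It uses one common approach, nearest-neighbour Tanimoto similarity to the training set with a fixed threshold. That choice is an assumption made for illustration, not a description of CovaSyn's models; the Prediction fields, the 0.4 threshold and the toy fingerprints (sets of bit indices) are hypothetical as well.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    value: float      # point estimate, e.g. logS
    lower: float      # lower bound of the uncertainty interval
    upper: float      # upper bound of the uncertainty interval
    in_domain: bool   # False means the model is extrapolating

def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def in_applicability_domain(query, training, threshold=0.4) -> bool:
    """True if the query resembles at least one training molecule."""
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best >= threshold

training_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
familiar = {1, 2, 3, 9}   # shares 3 of 5 bits with the first training entry
novel = {20, 21, 22}      # shares nothing with the training set

for name, fp in [("familiar", familiar), ("novel", novel)]:
    flag = in_applicability_domain(fp, training_fps)
    # Value and interval are placeholders; a real model would supply them.
    result = Prediction(value=-3.1, lower=-3.6, upper=-2.6, in_domain=flag)
    print(name, result)
```

The flag gives the agent something to act on: it can report an out-of-domain result with a caveat, or decline to use it, instead of passing the number along as if it were a measurement.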

The point is not that CovaSyn has "more functions". The point is that the layers above RDKit (trained models, domain-specific analytics, hosting, compliance, agent ergonomics) are the actual effort in a regulated R&D workflow. That is the effort we take off the table.

When RDKit MCP is enough, and when it is not

To keep this fair, the honest decision guide:

A pure RDKit MCP server is enough if

you are an individual or small dev team that needs deterministic descriptors, works locally, is happy to self-host, and has no ML prediction or compliance needs. That is a completely legitimate setup, and free.

CovaSyn is worth it if

you need predictions beyond pure descriptors (solubility, tox, ADMET, analytics, stability), if a team needs to work together with traceability, if GDPR / DPA matters, or if you simply do not want to run a platform yourself. The broader market overview across several chemistry MCP approaches sits in a separate post: Chemistry MCP servers compared 2026.

Try it yourself

The free tier lets you attach CovaSyn directly to your agent (Claude, ChatGPT, Cursor or Copilot) and see the difference between a pure descriptor and a trained prediction in your own workflow. It includes 100 credits per week. → See CovaSyn MCP

FAQ

Is there an RDKit MCP server?

Yes, several. Open-source projects like tandemai/rdkit-mcp-server and mcp_rdkit give AI agents deterministic access to RDKit functions through the Model Context Protocol. They mostly run locally and cover descriptors, substructure search and visualization.

What is the difference between RDKit MCP and CovaSyn?

RDKit MCP servers deliver deterministic cheminformatics computation. CovaSyn builds on that and adds trained ML predictions (solubility, toxicity, ADMET), analytics interpretation, stability and formulation tools, calibrated uncertainty, plus a hosted, GDPR-compliant multi-user setup.

Can RDKit predict solubility or toxicity?

Not directly. RDKit computes descriptors derivable from structure. Predictions like solubility or mutagenicity require ML models trained on experimental data; that layer sits outside RDKit.

Do I have to self-host an RDKit MCP server?

Generally, yes. Most run locally over stdio. Anyone who needs team access, authentication, availability and GDPR / DPA has to set it up themselves, or use a hosted platform like CovaSyn.

Does CovaSyn use RDKit?

Yes. CovaSyn's deterministic base layer is RDKit-based. CovaSyn is not a replacement but an extension covering the layers RDKit alone does not.
