Agent Setup · 10 min read · May 22, 2026

AI Chemistry with Gemini, CovaSyn MCP Tool Calls for Drug Discovery and Biologics

Gemini's tool-calling reliability became pharma-usable in 2026. This guide shows how to attach CovaSyn's 130 deterministic chemistry tools to a Gemini agent for drug discovery, biologics, and process development, via the Gemini CLI and the Gemini API on Vertex AI. It includes the ICLR 2026 benchmark numbers and a worked tool call.

Oliver Kraft

CovaSyn

Lead

Gemini is the second-strongest stack for chemistry agent workflows in pharma R&D in 2026, behind Claude only because of ecosystem reach. On the ICLR 2026 MolecularIQ benchmark, an untooled Gemini-3 reaches roughly the same baseline as GPT-5.5 (low 20s percent). Wire CovaSyn's 130 chemical MCP tools into a Gemini agent and accuracy lifts into the high 80s: the same architecture as with Claude, the same lift, a different runtime. This guide is the practical setup for Gemini CLI and the Gemini API on Vertex AI, with concrete drug-discovery and biologics workflows.

What are chemical MCP tools?

Chemical MCP tools are deterministic cheminformatics functions (ADMET, ICH M7 mutagenicity assessment, ICH Q1 stability kinetics, NMR and mass-spec interpretation, druglikeness, retrosynthesis hints) exposed to an LLM agent through the Model Context Protocol. The agent calls them like APIs; the responses are validated, version-pinned, and reproducible. In other words, the parts of chemistry that a language model should not be hallucinating get computed by deterministic functions instead.

The Model Context Protocol is the open standard that Anthropic introduced in 2024. Since 2025, Anthropic, OpenAI, and major open-source projects have jointly maintained it as the baseline for agent-tool integration. Google's Gemini ecosystem ships MCP-compatible tooling for both the Gemini CLI and Vertex-AI-hosted Gemini.
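To make "calls them like APIs" concrete: at the protocol level, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message. The tool name and arguments are borrowed from the worked call later in this guide, and the request id is arbitrary; in practice an MCP client library builds and sends this for you.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # sort_keys gives a stable serialisation, which helps with logging and diffing.
    return json.dumps(message, sort_keys=True)


wire = build_tool_call(
    request_id=1,
    tool_name="covasolve_predict",
    arguments={
        "smiles": "CC(=O)Oc1ccccc1C(=O)O",
        "solvents": ["water", "DMSO"],
        "temperature_k": 310,
    },
)
print(wire)
```

The server answers with a JSON-RPC response carrying the same `id`, which is how a client matches results to requests.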

Why Gemini matters for chemistry agents

There are three reasons Gemini is worth the integration effort, even though Claude has more pharma-enterprise mindshare today.

First, cost. Gemini is consistently among the lowest-cost frontier-class LLMs per million tokens. For high-volume screening campaigns, such as a CDMO triaging 30,000 impurities a month, this matters. The deterministic tool layer takes care of accuracy; the LLM cost becomes the dominant variable.

Second, throughput. Gemini's lighter-weight models run at significantly higher tokens-per-second than Claude's flagship, which matters for batch workflows where you want to process a 1,000-compound library overnight without watching latency.

Third, Google ecosystem integration. If your data sits in BigQuery, your protocols sit in Google Drive, and your reports go into Google Workspace, a Gemini-based agent stays inside one identity boundary. Fewer moving parts, fewer access reviews.

On the ICLR 2026 MolecularIQ benchmark, the lift pattern is structural, not model-specific. Claude Haiku 4.5, Claude Opus 4.7 and GPT-5.5 move from a 21–41 percent baseline to 85–92 percent once CovaSyn MCP is wired in. Gemini 3.5 Flash, the cheapest model in the set, goes from a 14 percent baseline to 76 percent with the same tool layer, roughly a 5.4× lift. Three of four models land above 85; the fourth lands at 76. Full methodology: covasyn.com/en/benchmark.

How CovaSyn connects chemistry to Gemini

CovaSyn exposes 130 deterministic tools as an MCP server. From the Gemini agent's perspective, those tools look like any other tool registered in its function-calling list. The agent picks the right one for the question, calls it, gets a structured response, reasons over it, and replies.
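A rough sketch of why MCP tools "look like any other tool": an MCP server describes each tool with a name, a description, and a JSON Schema for its inputs, and that maps almost one-to-one onto the name/description/parameters shape that function-calling APIs expect. The schema below is hypothetical, inferred from the arguments in the worked call; the real CovaSyn tool listing may differ. SDKs with built-in MCP support perform this mapping for you.

```python
# Hypothetical MCP tool listing entry; the real CovaSyn schema may differ.
mcp_tool = {
    "name": "covasolve_predict",
    "description": "Predict solubility (log S) of a compound in the given solvents.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "smiles": {"type": "string"},
            "solvents": {"type": "array", "items": {"type": "string"}},
            "temperature_k": {"type": "number"},
        },
        "required": ["smiles", "solvents"],
    },
}


def to_function_declaration(tool: dict) -> dict:
    """Map an MCP tool description onto the name/description/parameters
    shape used by function-calling APIs."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "parameters": tool["inputSchema"],
    }


declaration = to_function_declaration(mcp_tool)
print(declaration["name"], sorted(declaration["parameters"]["properties"]))
```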

Coverage by domain:

Drug discovery: ADMET, scaffold and BRICS fragment analysis, ICH M7 triage, druglikeness, similarity, off-target screening
Biologics and modalities: antibody developability, peptide profiling, ADC design, mRNA / oligo therapeutics, immunogenicity, viscosity
Process & formulation: DoE, response surface modeling, solubility, crystallization, formulation optimisation
Analytical: mass spec identification and fragmentation, NMR 1D/2D analysis and prediction, retention-time prediction

A worked tool call

Concrete: a med-chem question about solubility before an in-vivo study, asked to a Gemini agent with CovaSyn attached.

{
  "tool": "covasolve_predict",
  "arguments": {
    "smiles": "CC(=O)Oc1ccccc1C(=O)O",
    "solvents": ["water", "DMSO", "ethanol", "PEG400"],
    "temperature_k": 310
  }
}

Response, reproducible byte-for-byte across calls:

{
  "predictions": [
    { "solvent": "water",   "log_s": -1.72, "ci95": [-1.85, -1.58], "model": "covasolve-v2.4.1" },
    { "solvent": "DMSO",    "log_s":  0.94, "ci95": [ 0.81,  1.07], "model": "covasolve-v2.4.1" },
    { "solvent": "ethanol", "log_s":  0.21, "ci95": [ 0.08,  0.34], "model": "covasolve-v2.4.1" },
    { "solvent": "PEG400",  "log_s":  0.55, "ci95": [ 0.42,  0.68], "model": "covasolve-v2.4.1" }
  ],
  "audit_id": "cs-2026-05-22-9f8e7d6c"
}

Gemini takes the structured payload, ranks the solvents, recommends DMSO with a PEG400 fallback for the in-vivo formulation, and notes the confidence intervals in the answer to the user. Every numeric value is reproducible; if QA asks "where did the 0.94 logS for DMSO come from", the audit log has the version-pinned answer.
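The ranking step is simple enough to sketch outside the agent. The snippet below sorts the predictions from the response above by `log_s` and converts each value into g/L. It assumes `log_s` is the base-10 logarithm of molar solubility in mol/L, which the payload does not state explicitly; the molar mass is that of the SMILES in the call (aspirin, about 180.16 g/mol).

```python
# Predictions copied from the response above ("model" field omitted for brevity).
response = {
    "predictions": [
        {"solvent": "water", "log_s": -1.72, "ci95": [-1.85, -1.58]},
        {"solvent": "DMSO", "log_s": 0.94, "ci95": [0.81, 1.07]},
        {"solvent": "ethanol", "log_s": 0.21, "ci95": [0.08, 0.34]},
        {"solvent": "PEG400", "log_s": 0.55, "ci95": [0.42, 0.68]},
    ],
    "audit_id": "cs-2026-05-22-9f8e7d6c",
}

MOLAR_MASS_G_PER_MOL = 180.16  # aspirin, the compound in the worked call


def rank_solvents(payload: dict) -> list[dict]:
    """Return predictions ordered from highest to lowest predicted log S."""
    return sorted(payload["predictions"], key=lambda p: p["log_s"], reverse=True)


def log_s_to_g_per_l(log_s: float, molar_mass: float) -> float:
    """Convert log S to g/L. ASSUMPTION: log S is log10 of solubility in mol/L."""
    return (10 ** log_s) * molar_mass


ranked = rank_solvents(response)
for p in ranked:
    grams = log_s_to_g_per_l(p["log_s"], MOLAR_MASS_G_PER_MOL)
    print(f'{p["solvent"]:8s} log_s={p["log_s"]:+.2f}  ~{grams:.1f} g/L')
```

Sorting on the point estimate alone ignores the confidence intervals; here they do not overlap between adjacent solvents, so the order is unambiguous.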

Cost: cents per compound, plus the throughput advantage

For an ADMET-plus-ICH-M7 screening on a 1,000-compound library, a rough comparison:

| Path | Throughput | Cost per compound | Cost per 1,000 |
|---|---|---|---|
| Manual cheminformatician (8h day) | ~80 / day | €40–80 | €40,000–80,000 |
| Untooled Gemini (no MCP) | ~30,000 / day | €0.02 | €20, ~22% accurate |
| Gemini + CovaSyn MCP | ~30,000 / day | €0.06 | €60, ~88% accurate, audit-logged |
| Outsourced CRO | ~1,000 / week | €100–500 | €100,000–500,000 |

The throughput-plus-accuracy combination is what makes large screening campaigns viable. The cost-per-compound is small enough to run again with different priors when the next data lands.
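The table's arithmetic is easy to rerun for other campaign sizes. The sketch below uses the per-compound costs and daily throughputs from the table (for the manual path, the lower bound of €40) and scales them to a 1,000-compound library and to the 30,000-per-month case mentioned earlier. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningPath:
    name: str
    compounds_per_day: float
    eur_per_compound: float


# Figures taken from the comparison table; manual path uses the lower cost bound.
MANUAL = ScreeningPath("Manual cheminformatician", 80, 40.0)
UNTOOLED = ScreeningPath("Untooled Gemini (no MCP)", 30_000, 0.02)
TOOLED = ScreeningPath("Gemini + CovaSyn MCP", 30_000, 0.06)


def campaign_cost_eur(path: ScreeningPath, n_compounds: int) -> float:
    """Total cost in euros, rounded to cents."""
    return round(path.eur_per_compound * n_compounds, 2)


def campaign_days(path: ScreeningPath, n_compounds: int) -> float:
    """Working days needed at the path's stated throughput."""
    return n_compounds / path.compounds_per_day


for path in (MANUAL, UNTOOLED, TOOLED):
    for n in (1_000, 30_000):
        print(f"{path.name:26s} n={n:>6,}  "
              f"EUR {campaign_cost_eur(path, n):>12,.2f}  "
              f"{campaign_days(path, n):8.2f} days")
```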

Where this breaks down

Three honest limits.

Tool-use reliability under multi-step plans. Gemini's tool-calling is excellent for single-tool questions and good but not perfect for chains of 4+ tool calls in a row. If your workflow needs 8 sequential tool calls with intermediate reasoning, Claude Opus is currently more reliable.
Out-of-distribution chemistry. If the input compound is far from the training distribution of the underlying QSAR engines, the tool returns low confidence. The agent should report that; a human still decides.
EU regulated workflows. Gemini hosted on Vertex AI runs in Google Cloud regions; for EU GMP Annex 11 contexts you need to confirm your region choice and accept the resulting data residency. CovaSyn's chemistry layer is DACH-hosted independently, with a self-hosted option for IT security cases where any external cloud is off the table.
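For the out-of-distribution limit, one way to make "the agent should report that" enforceable is a small gate between the tool response and the final answer. The sketch below assumes low confidence shows up as a wide `ci95` interval, as in the worked response; the 0.5 log-unit threshold and the function names are hypothetical and would need to be set per assay and per model.

```python
def ci_width(prediction: dict) -> float:
    """Width of the 95% confidence interval in log units."""
    low, high = prediction["ci95"]
    return high - low


def needs_human_review(prediction: dict, max_ci_width: float = 0.5) -> bool:
    """Flag a prediction whose interval is wider than the allowed width.

    ASSUMPTION: low tool confidence is expressed as a wide ci95; the
    default threshold is illustrative, not a CovaSyn recommendation.
    """
    return ci_width(prediction) > max_ci_width


predictions = [
    {"solvent": "water", "log_s": -1.72, "ci95": [-1.85, -1.58]},  # from the worked call
    {"solvent": "water", "log_s": -3.10, "ci95": [-4.20, -2.00]},  # made-up wide interval
]

flagged = [p for p in predictions if needs_human_review(p)]
print(f"{len(flagged)} of {len(predictions)} predictions need human review")
```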

Getting started, Gemini CLI

The fastest path for developers.

```bash
# Install Gemini CLI (if not already)
npm i -g @google/gemini-cli

# Register the CovaSyn MCP server
gemini mcp add covasyn \
  --command npx \
  --args -y,@covasyn/mcp-client \
  --env COVASYN_API_KEY=sk-cova-…

# Verify
gemini mcp list
```

The 130 CovaSyn tools then appear in the Gemini CLI, and the agent can call them from the next gemini chat session. CovaSyn's free tier (100 credits / week) covers the typical evaluation flow.
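If the `gemini mcp add` flags do not match your CLI version, the same registration can be expressed as configuration: Gemini CLI reads MCP servers from an `mcpServers` object in its `settings.json`. The sketch below only prints the entry so it can be merged by hand; it writes no file. Referencing the key as `$COVASYN_API_KEY` assumes the CLI expands environment variables in settings values, which is worth confirming against the current docs.

```python
import json


def covasyn_server_entry(api_key_ref: str = "$COVASYN_API_KEY") -> dict:
    """Build an `mcpServers` entry for the CovaSyn MCP client.

    The server name, command, and args mirror the CLI registration above.
    """
    return {
        "mcpServers": {
            "covasyn": {
                "command": "npx",
                "args": ["-y", "@covasyn/mcp-client"],
                "env": {"COVASYN_API_KEY": api_key_ref},
            }
        }
    }


entry = covasyn_server_entry()
print(json.dumps(entry, indent=2))
```

Keeping the key as an environment reference rather than a literal keeps the secret out of a file that may end up in version control.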

Getting started, Gemini API on Vertex AI

This path is for your own agent stack. The CovaSyn MCP server speaks standard MCP, and the Gemini Python SDK can attach MCP tools to a function-calling session:

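```python
import asyncio

from google import genai
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# On Vertex AI, use genai.Client(vertexai=True, project="…", location="…") instead.
client = genai.Client(api_key="…")
server = StdioServerParameters(
    command="npx", args=["-y", "@covasyn/mcp-client"], env={"COVASYN_API_KEY": "sk-cova-…"}
)

async def main() -> None:
    # Start the CovaSyn MCP client as a subprocess and open an MCP session over stdio.
    async with stdio_client(server) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        # Passing the session as a tool lets the google-genai SDK discover and call the MCP tools.
        response = await client.aio.models.generate_content(
            model="gemini-3-pro",
            contents="ADMET profile for compound X",
            config=genai.types.GenerateContentConfig(tools=[session]),
        )
        print(response.text)

asyncio.run(main())
```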

For Vertex AI deployments inside a regulated workload, route the same CovaSyn MCP client to a self-hosted CovaSyn container running inside your VPC. Same protocol, internal-only network path.

FAQ

Does Gemini CLI support MCP servers?

Yes, since the CLI gained MCP integration in 2025. The exact command surface evolves; confirm against current docs if you hit a flag mismatch.

Is CovaSyn output deterministic when called from Gemini?

Yes. Tool output is deterministic regardless of which LLM is the caller. The version-pinned chemistry engines return the same payload for the same input every time.
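A simple way to check this claim in your own pipeline is to hash a canonical serialisation of each response and compare digests across calls or callers. The sketch below is one such check; the `ignore` parameter is a hypothetical escape hatch in case per-call metadata (for example an `audit_id`) turns out to vary between otherwise identical calls.

```python
import hashlib
import json


def payload_digest(payload: dict, ignore: tuple[str, ...] = ()) -> str:
    """SHA-256 over a canonical JSON form of the payload.

    Top-level keys listed in `ignore` are dropped before hashing.
    """
    filtered = {k: v for k, v in payload.items() if k not in ignore}
    # Sorted keys and fixed separators make the byte string independent of key order.
    canonical = json.dumps(filtered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"predictions": [{"solvent": "DMSO", "log_s": 0.94}], "audit_id": "a"}
second = {"audit_id": "b", "predictions": [{"log_s": 0.94, "solvent": "DMSO"}]}

same_chemistry = payload_digest(first, ignore=("audit_id",)) == payload_digest(
    second, ignore=("audit_id",)
)
print("identical chemistry payload:", same_chemistry)
```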

What if my workflow needs Vertex AI EU data residency?

Pick a Vertex AI region inside the EU when you configure the project. The CovaSyn side is already DACH-hosted. For setups that must stay fully off external cloud, use the self-hosted CovaSyn container.

Can I switch from Claude to Gemini without rebuilding the chemistry layer?

Yes, that is the whole point of MCP. The same CovaSyn MCP server attaches to Claude (Claude guide), ChatGPT (ChatGPT guide), Cursor and open-weight models (Cursor & open-weights guide). The LLM is swappable, the chemistry layer is the constant.

Sources

ICLR 2026 MolecularIQ benchmark: covasyn.com/en/benchmark.
AI for chemistry pillar: covasyn.com/en/ai-for-chemistry.
Model Context Protocol specification: modelcontextprotocol.io.
Comparison of chemistry MCP servers: The 5 leading chemistry MCP servers for pharma R&D.
