Building an MCP Server that Uses the Client’s LLM for Translation
TL;DR: Keep your MCP server lean by delegating language tasks (like translation) to the client via ctx.sample(...). Below is a working pattern based on a Spaceflight News example, plus error-handling tips and how to wire it all together. More archetypes here: https://github.com/EvobyteDigitalBiology/mcp-archetypes.
Why use client-side sampling from an MCP server?
In the Model Context Protocol (MCP), servers provide tools; clients provide the model. Letting the client handle LLM work has nice benefits:
- Slim servers: Your server only fetches/structures data; no model dependency or GPU needed.
- User control: The client can pick the model, temperature, and policies.
- Compliance & privacy: The client owns the sampling context and constraints.
The example: translate news on demand
We’ll fetch today’s spaceflight headlines and summaries from the Spaceflight News API, then—only if requested—ask the client LLM to translate them.
Key pieces to notice
1. Server bootstrap
from fastmcp import FastMCP
mcp = FastMCP("SpaceNewsTranslation")
2. Fetch from Spaceflight News API
(See get_spaceflight_news_date)
- Uses httpx.AsyncClient()
- Endpoint: https://api.spaceflightnewsapi.net/v4/articles/?published_at_gte={YYYY-MM-DD}
- Returns a List[Tuple[str, str]] of (title, summary)
import datetime
from typing import List, Tuple

import httpx

API_BASE = "https://api.spaceflightnewsapi.net/v4/articles/"

async def get_spaceflight_news_date(date: datetime.date) -> List[Tuple[str, str]] | None:
    endpoint = API_BASE + f"?published_at_gte={date.isoformat()}"
    async with httpx.AsyncClient() as client:
        response = await client.get(endpoint, headers={"Accept": "application/json"}, timeout=30.0)
        response.raise_for_status()
    news_summaries = []
    for res in response.json()["results"]:
        news_summaries.append((res["title"], res["summary"]))
    return news_summaries
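To make the extraction step concrete, here is what the loop above does to a response in the shape of the Spaceflight News v4 /articles payload. The payload values below are made up for illustration:

```python
import json

# Illustrative payload in the shape of the v4 /articles response
# (the titles and summaries here are invented for the example).
sample_payload = json.loads("""
{
  "count": 2,
  "results": [
    {"title": "Starship clears review", "summary": "Regulator signs off on next flight."},
    {"title": "New crew launches", "summary": "Four astronauts head to the station."}
  ]
}
""")

# Same extraction as in get_spaceflight_news_date:
news_summaries = [(res["title"], res["summary"]) for res in sample_payload["results"]]
```

Only `title` and `summary` are kept; everything else in the payload is discarded before the data reaches the tool.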
3. Expose an MCP tool
(See @mcp.tool() get_todays_spacenews)
- Gathers today’s items.
- If language != "EN", builds a JSON-centric translation prompt.
- Calls ctx.sample(prompt) to offload translation to the client.
from typing import Dict, List

from fastmcp import Context
from fastmcp.exceptions import FastMCPError

@mcp.tool()
async def get_todays_spacenews(ctx: Context, language: str = 'EN') -> List[Dict[str, str]]:
    date = datetime.date.today()
    news_today = await get_spaceflight_news_date(date)
    if not news_today:
        raise FastMCPError("API Call Failed.")
    news_out = [{'title': t, 'summary': s} for (t, s) in news_today]
    if language != 'EN':
        prompt = f"""
        Translate the title and the summary of each news entry into the language {language} (ISO 639 format).
        Return a JSON list of objects with keys "title" and "summary" only.
        Input:
        {news_out}
        """
        try:
            response = await ctx.sample(prompt)
        except Exception:
            raise FastMCPError("Translation failed, LLM Sampling not available.")
    else:
        response = news_out
    return response
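One practical detail: the sampled reply usually arrives as text, so the JSON list the prompt asks for may come back as a string, sometimes wrapped in a markdown fence. A minimal sketch of turning that text back into the list of dicts the tool returns, using a hypothetical parse_translated helper (not part of the example above):

```python
import json
from typing import Dict, List

def parse_translated(sampled_text: str) -> List[Dict[str, str]]:
    """Hypothetical helper: parse the model's JSON reply into Python objects."""
    text = sampled_text.strip()
    # Models sometimes wrap JSON in a markdown fence; strip it defensively.
    if text.startswith("```"):
        text = text.strip("`")
        # Drop an optional language tag such as "json" on the first line.
        text = text.split("\n", 1)[1] if "\n" in text else text
    items = json.loads(text)
    # Keep only the two expected keys, mirroring the prompt's contract.
    return [{"title": i["title"], "summary": i["summary"]} for i in items]
```

If parsing fails, treating it like any other sampling failure (and raising the same FastMCPError) keeps the tool's error surface simple.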
4. Run the server (stdio transport)
if __name__ == "__main__":
mcp.run(transport='stdio')
How the flow works
- Client calls tool: get_todays_spacenews(language="DE")
- Server fetches English news → news_out
- Server asks the client LLM via ctx.sample(...) to translate news_out to German
- Client returns translated JSON → server returns it to the tool caller
If the client can’t sample (no model, or disabled), the server raises a clear FastMCPError("Translation failed, LLM Sampling not available.").
Error handling patterns you can reuse
- Wrap outbound calls: response.raise_for_status() and a try/except around httpx requests.
- Distinguish data-fetch failures from sampling failures:
  - API error → "API Call Failed."
  - LLM unavailable → "Translation failed, LLM Sampling not available."
- Keep messages end-user friendly but specific enough to debug.
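The fetch-vs-sampling distinction can be made explicit with two exception types mapped to the user-facing messages. A small sketch (FetchError, SamplingError, and classify_failure are hypothetical names, not part of FastMCP):

```python
class FetchError(Exception):
    """Hypothetical: raised when the upstream news API call fails."""

class SamplingError(Exception):
    """Hypothetical: raised when the client cannot fulfil an LLM sampling request."""

def classify_failure(exc: Exception) -> str:
    # Map internal exceptions to the end-user messages used in the tool above.
    if isinstance(exc, FetchError):
        return "API Call Failed."
    if isinstance(exc, SamplingError):
        return "Translation failed, LLM Sampling not available."
    return "Unexpected error."
```

Raising these internally and converting them to FastMCPError at the tool boundary keeps the fetch and sampling paths debuggable without leaking stack traces to the caller.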
When not to sample
- If the target language is English ("EN"), skip sampling and return raw data.
- If the client indicates no model or quota, short-circuit with a clear error.
More MCP archetypes & patterns
You’ll find more complete examples (tools, transports, data fetching, validation) here:
👉 https://github.com/EvobyteDigitalBiology/mcp-archetypes
