Creating Async-Safe MCP Servers

Facility and endstation tools should live in separate MCP server repositories instead of being added directly to EAA. This keeps EAA focused on orchestration while each facility owns its instrument-control dependencies, deployment model, and safety policy.

The recommended server design is a small Python package with separate modules for the command protocol, ZMQ client, blocking worker, FastMCP frontend, and launcher. External MCP servers should not depend on EAA packages. They should expose normal MCP tools whose names, arguments, schemas, and descriptions match the facility's real operations.

EAA compatibility is deliberately narrow:

  • Operational tools should be normal MCP tools with facility-specific names and schemas.
  • Tool argument names should match the facility API and should not be shaped around EAA task-manager internals.
  • The only EAA-specific support tool that a stateful server should expose is get_attribute_payload(name: str).

Process Pattern

Use three layers, with the MCP frontend and the instrument worker each running in their own process:

EAA or another MCP client
  -> MCP over HTTP
  -> FastMCP frontend process
  -> ZMQ request/reply
  -> instrument worker process

The FastMCP process owns MCP and HTTP request handling. The instrument worker process owns the instrument-control library, device state, blocking control logic, and any runtime required by the control stack. ZMQ carries a small JSON-serializable command envelope between the two processes.

This split is intentional. Many instrument libraries are synchronous, stateful, or built around their own event loop or callback runtime. Tool worker functions should therefore be synchronous and blocking by design. The MCP frontend may be async internally, but it should forward each request to the worker without running the control library in the MCP event loop.
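
One way to keep blocking worker calls out of the MCP event loop is to hand each call to a thread. The sketch below is only an illustration: `blocking_worker_call` is a hypothetical stand-in for the ZMQ request/reply round trip, and the `position_um` argument is invented for the example. The heartbeat task exists solely to show that the event loop keeps running while the call is in flight.

```python
import asyncio
import time
from typing import Tuple


def blocking_worker_call(method: str, params: dict) -> dict:
    """Hypothetical stand-in for a blocking ZMQ request/reply round trip."""
    time.sleep(0.05)  # simulates a slow, synchronous instrument operation
    return {"status": "ok", "result": {"method": method, "params": params}}


async def forward_to_worker(method: str, params: dict) -> dict:
    """Run the blocking call in a thread so the event loop stays responsive."""
    return await asyncio.to_thread(blocking_worker_call, method, params)


async def demo() -> Tuple[dict, int]:
    ticks = 0

    async def heartbeat() -> None:
        # This counter only advances while the event loop is not blocked.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    response = await forward_to_worker("move_zone_plate", {"position_um": 1.0})
    beat.cancel()
    return response, ticks
```

Calling `asyncio.run(demo())` returns the worker response together with a non-zero tick count. If `forward_to_worker` called `blocking_worker_call` directly instead of through `asyncio.to_thread`, the heartbeat would stall for the duration of the call.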

Expose three console scripts with a consistent launch convention:

facility-suite
facility-suite-worker
facility-suite-mcp

The top-level launcher starts and supervises the two child processes. It should start the worker first, wait for a health response over ZMQ, then start the MCP server. It should also terminate both child processes on shutdown or when one child exits unexpectedly.
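
The startup and shutdown behavior described above can be sketched with two small helpers. This is a minimal illustration, not a reference launcher: `wait_for_health` and `supervise` are hypothetical names, the health check is passed in as a callable (in practice it would send a health request over ZMQ), and `children` are assumed to be `subprocess.Popen` objects or anything with the same `poll`, `terminate`, and `wait` methods.

```python
import time
from typing import Callable, Sequence


def wait_for_health(check: Callable[[], bool], timeout_s: float,
                    interval_s: float = 0.1) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False


def supervise(children: Sequence, poll_interval_s: float = 0.2) -> None:
    """Block until any child exits, then terminate the remaining children.

    The `finally` clause also runs on KeyboardInterrupt, so a Ctrl-C in the
    launcher shuts both children down as well.
    """
    try:
        while all(child.poll() is None for child in children):
            time.sleep(poll_interval_s)
    finally:
        for child in children:
            if child.poll() is None:  # still running
                child.terminate()
        for child in children:
            child.wait()
```

A launcher built on these helpers would start the worker with `subprocess.Popen`, call `wait_for_health` with the configured startup timeout, start the MCP frontend only if the check succeeds, and then hand both processes to `supervise`.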

Recommended launcher options:

facility-suite \
  --worker-endpoint tcp://127.0.0.1:5555 \
  --worker-startup-timeout-s 10 \
  --request-timeout-ms 30000 \
  --mcp-host 0.0.0.0 \
  --mcp-port 8050 \
  --mcp-path /mcp

Recommended worker options:

facility-suite-worker --bind tcp://127.0.0.1:5555

Recommended MCP frontend options:

facility-suite-mcp \
  --worker tcp://127.0.0.1:5555 \
  --timeout-ms 30000 \
  --host 0.0.0.0 \
  --port 8050 \
  --path /mcp

The exact package and script names should be facility-specific, but the option names and behavior should stay consistent.
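
As an illustration of how the launcher options map onto the two child commands, here is a hedged `argparse` sketch. The option names come from the listings above; using the example values as defaults is an assumption, and the helper names `build_launcher_parser`, `worker_command`, and `mcp_command` are hypothetical.

```python
import argparse
from typing import List


def build_launcher_parser() -> argparse.ArgumentParser:
    """Parser for the top-level launcher; defaults mirror the example values."""
    parser = argparse.ArgumentParser(prog="facility-suite")
    parser.add_argument("--worker-endpoint", default="tcp://127.0.0.1:5555")
    parser.add_argument("--worker-startup-timeout-s", type=float, default=10.0)
    parser.add_argument("--request-timeout-ms", type=int, default=30000)
    parser.add_argument("--mcp-host", default="0.0.0.0")
    parser.add_argument("--mcp-port", type=int, default=8050)
    parser.add_argument("--mcp-path", default="/mcp")
    return parser


def worker_command(args: argparse.Namespace) -> List[str]:
    """Command line for the instrument worker child process."""
    return ["facility-suite-worker", "--bind", args.worker_endpoint]


def mcp_command(args: argparse.Namespace) -> List[str]:
    """Command line for the FastMCP frontend child process."""
    return [
        "facility-suite-mcp",
        "--worker", args.worker_endpoint,
        "--timeout-ms", str(args.request_timeout_ms),
        "--host", args.mcp_host,
        "--port", str(args.mcp_port),
        "--path", args.mcp_path,
    ]
```

Deriving both child command lines from one parsed namespace keeps the worker endpoint and the request timeout consistent between the two processes.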

Tool Naming and Argument Naming

Do not name remote tools or arguments for EAA's internal classes. Name them for the facility operation they actually perform. For example, an endstation might choose names such as collect_fly_scan, move_zone_plate, set_mirror_voltages, or measure_probe_profile. Another facility might choose completely different names for equivalent operations.

The same rule applies to arguments. Use the names that are natural for the instrument API and clear to users of that facility's MCP server. EAA should not force names such as image coordinates, scan sizes, scan steps, parameter vectors, or line-scan fields at the remote MCP boundary.

The server should not duplicate tools or aliases solely to match a particular client's local terminology.

State Sync Contract

The only standardized EAA support tool for stateful logic-driven workflows is:

def get_attribute_payload(name: str) -> object:
    ...

This tool returns the current value of a server-side attribute identified by name. The name values are server-defined attribute identifiers, not EAA attribute names.

The support tool name should be exactly get_attribute_payload or a dotted tool name ending with .get_attribute_payload. Attribute names are facility-defined; examples include detector.last_frame, detector.last_metadata, stage.last_position, or any other stable names that fit the server's state model.
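
The naming rule can be expressed as a one-line check. The function below is a hypothetical helper written for illustration; it is not EAA's actual matching code.

```python
def is_attribute_payload_tool(tool_name: str) -> bool:
    """True for `get_attribute_payload` or a dotted name ending in it."""
    return (
        tool_name == "get_attribute_payload"
        or tool_name.endswith(".get_attribute_payload")
    )
```

Under this check, `beamline.get_attribute_payload` qualifies, while names such as `my_get_attribute_payload` or `get_attribute_payload_v2` do not.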

Payload Format

get_attribute_payload may return any JSON-serializable scalar, list, or dictionary directly. For NumPy arrays or other dense numerical arrays, return a portable array payload:

{
  "encoding": "numpy_base64",
  "dtype": "float32",
  "shape": [256, 256],
  "data": "base64-encoded contiguous array bytes"
}

EAA decodes this payload with BaseTool.decode_array_payload. External MCP servers do not need to depend on EAA to produce it:

import base64
import numpy as np


def encode_array_payload(array: np.ndarray) -> dict:
    contiguous = np.ascontiguousarray(array)
    return {
        "encoding": "numpy_base64",
        "dtype": str(contiguous.dtype),
        "shape": list(contiguous.shape),
        "data": base64.b64encode(contiguous.tobytes()).decode("ascii"),
    }

Use JSON literals for metadata, histories, counters, configuration flags, and small structured records. Use the array payload only for dense arrays.
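
To check that a server produces a well-formed array payload, it can help to round-trip it locally. The decoder below is a hypothetical standalone sketch based only on the payload fields documented above; it is not the implementation of BaseTool.decode_array_payload.

```python
import base64
import json

import numpy as np


def encode_array_payload(array: np.ndarray) -> dict:
    contiguous = np.ascontiguousarray(array)
    return {
        "encoding": "numpy_base64",
        "dtype": str(contiguous.dtype),
        "shape": list(contiguous.shape),
        "data": base64.b64encode(contiguous.tobytes()).decode("ascii"),
    }


def decode_array_payload(payload: dict) -> np.ndarray:
    """Hypothetical decoder for the payload fields shown above."""
    if payload.get("encoding") != "numpy_base64":
        raise ValueError(f"unsupported encoding: {payload.get('encoding')!r}")
    raw = base64.b64decode(payload["data"])
    flat = np.frombuffer(raw, dtype=np.dtype(payload["dtype"]))
    # frombuffer returns a read-only view, so copy to get a writable array.
    return flat.reshape(payload["shape"]).copy()


# Round trip through JSON text, as the payload would travel over MCP.
original = np.arange(12, dtype=np.float32).reshape(3, 4)[:, ::2]  # non-contiguous view
wire = json.dumps(encode_array_payload(original))
restored = decode_array_payload(json.loads(wire))
```

The non-contiguous input is deliberate: `np.ascontiguousarray` in the encoder is what makes the byte string match the declared shape.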

Server-Side State

State that a logic-driven task manager will read later should be owned by the instrument worker or by the MCP frontend. The owner is a facility decision. The only requirement is that the MCP frontend can return the selected server-side attributes through get_attribute_payload.

Typical state includes:

  • the latest measurement array or image;
  • previous or initial arrays needed for registration or comparison;
  • pixel sizes, timestamps, file paths, or scan metadata;
  • acquisition histories or parameter histories;
  • status flags or configuration values needed by a later local decision.

Do not build EAA-specific state names into the server. Keep facility state names stable.

Artifacts and Large Data

Tool results should remain JSON-serializable. Use file paths for artifacts that humans or downstream tools need to inspect, such as display PNGs, logs, or raw data files. The field names for those paths are part of the facility API or the facility workflow; they are not standardized by this server contract.

Dense numerical arrays that EAA must keep in memory should be exposed through get_attribute_payload rather than embedded directly in ordinary tool results. Very large arrays may still be written to disk and returned as paths when the downstream workflow is path-based.

Worker Protocol

Use a small JSON command envelope between the FastMCP frontend and the instrument worker:

{
  "id": "uuid",
  "method": "facility_specific_operation",
  "params": {"facility_argument": 1.0}
}

Successful responses contain:

{
  "id": "uuid",
  "status": "ok",
  "result": {}
}

Failed responses contain:

{
  "id": "uuid",
  "status": "error",
  "error": "message"
}

Keep the worker boundary JSON-serializable. Large numerical arrays should be kept in worker/frontend state and returned through get_attribute_payload or written as artifacts and returned by path.
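
A worker-side dispatcher for this envelope might look like the following sketch. The transport is deliberately left out: `handle_request` takes the raw request bytes and returns the raw response bytes, so it could sit inside any request/reply loop. The function names, the `handlers` registry, and the `voltages` argument are assumptions made for the example.

```python
import json
import uuid
from typing import Any, Callable, Dict


def make_request(method: str, params: Dict[str, Any]) -> bytes:
    """Build a request envelope with a fresh id."""
    envelope = {"id": str(uuid.uuid4()), "method": method, "params": params}
    return json.dumps(envelope).encode("utf-8")


def handle_request(raw: bytes, handlers: Dict[str, Callable[..., Any]]) -> bytes:
    """Decode one request, run the matching handler, and encode the reply.

    Any exception, including an unknown method or a result that is not
    JSON-serializable, becomes an error response instead of crashing the worker.
    """
    request_id = None
    try:
        request = json.loads(raw)
        request_id = request.get("id")
        handler = handlers[request["method"]]
        result = handler(**request.get("params", {}))
        reply = {"id": request_id, "status": "ok", "result": result}
        return json.dumps(reply).encode("utf-8")
    except Exception as exc:
        reply = {
            "id": request_id,
            "status": "error",
            "error": f"{type(exc).__name__}: {exc}",
        }
        return json.dumps(reply).encode("utf-8")


# Example registry with one facility-specific operation.
handlers = {"set_mirror_voltages": lambda voltages: {"applied": len(voltages)}}
request = make_request("set_mirror_voltages", {"voltages": [1.0, 2.0]})
response = json.loads(handle_request(request, handlers))
```

Echoing the request `id` in every reply, including error replies, lets the frontend match responses to requests and log failures against the call that caused them.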

Safety and Concurrency

Instrument-control operations should be serialized unless the facility control stack explicitly supports concurrent calls. The worker process is the right place to enforce serialization, hardware interlocks, argument bounds, timeout policy, and recovery behavior.
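
A minimal sketch of worker-side serialization and bounds checking, assuming a hypothetical stage with a single position in micrometers. The class and method names are illustrative, and the assignment inside the lock stands in for the real hardware call.

```python
import threading


class SerializedStage:
    """Hypothetical stage wrapper that allows one operation at a time."""

    def __init__(self, min_position_um: float, max_position_um: float) -> None:
        self._lock = threading.Lock()
        self._min = min_position_um
        self._max = max_position_um
        self.position_um = 0.0

    def move_stage(self, position_um: float) -> dict:
        # Reject out-of-range (and NaN) targets before touching hardware.
        if not (self._min <= position_um <= self._max):
            raise ValueError(
                f"position_um={position_um} outside [{self._min}, {self._max}]"
            )
        # Only one thread at a time may run the hardware call.
        with self._lock:
            self.position_um = position_um  # placeholder for the real move
            return {"position_um": self.position_um}
```

A single-threaded request/reply loop already handles one request at a time; an explicit lock becomes relevant once the worker runs more than one thread, for example a background monitor alongside the command loop.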

The MCP frontend should treat worker timeouts and worker-side exceptions as ordinary tool failures with clear error messages. It should not leave requests running indefinitely in the MCP event loop.
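
The sketch below shows one way a frontend could turn worker timeouts and error responses into a single exception type that a tool function can raise. `WorkerCallError`, `call_worker`, and the injected `send` callable are hypothetical; `send` stands in for the blocking ZMQ client. Note that `asyncio.wait_for` stops waiting but does not interrupt the thread running `send`, so the transport should also apply its own receive timeout.

```python
import asyncio
import time
from typing import Callable


class WorkerCallError(RuntimeError):
    """Raised when the worker times out or reports a failure."""


async def call_worker(send: Callable[[dict], dict], envelope: dict,
                      timeout_ms: int) -> dict:
    """Send one envelope and return its result, or raise WorkerCallError."""
    try:
        response = await asyncio.wait_for(
            asyncio.to_thread(send, envelope), timeout=timeout_ms / 1000
        )
    except asyncio.TimeoutError:
        raise WorkerCallError(
            f"worker did not answer {envelope['method']!r} within {timeout_ms} ms"
        ) from None
    if response.get("status") != "ok":
        raise WorkerCallError(response.get("error", "worker returned an error"))
    return response["result"]


# Hypothetical transports used to exercise the three outcomes.
def send_ok(envelope: dict) -> dict:
    return {"id": envelope["id"], "status": "ok", "result": {"frames": 1}}


def send_error(envelope: dict) -> dict:
    return {"id": envelope["id"], "status": "error", "error": "interlock open"}


def send_slow(envelope: dict) -> dict:
    time.sleep(0.3)  # longer than the timeout used in the example
    return send_ok(envelope)
```

With these transports, `send_ok` yields the result dictionary, while `send_error` and `send_slow` (called with a short timeout) both surface as `WorkerCallError` with a message that names the problem.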

Minimal Example

This example shows the EAA-specific support surface only. The operational tool names remain facility-specific.

import base64
import numpy as np
from fastmcp import FastMCP

app = FastMCP("Example Facility Server")
state = {
    "detector.last_frame": None,
    "detector.last_metadata": {},
}


def encode_array_payload(array: np.ndarray) -> dict:
    contiguous = np.ascontiguousarray(array)
    return {
        "encoding": "numpy_base64",
        "dtype": str(contiguous.dtype),
        "shape": list(contiguous.shape),
        "data": base64.b64encode(contiguous.tobytes()).decode("ascii"),
    }


@app.tool()
def collect_detector_frame(exposure_ms: float, roi_width: int, roi_height: int) -> dict:
    # Placeholder frame; a real server would acquire it through the instrument worker.
    frame = np.zeros((roi_height, roi_width), dtype=np.float32)
    state["detector.last_frame"] = frame
    state["detector.last_metadata"] = {
        "exposure_ms": exposure_ms,
        "roi_width": roi_width,
        "roi_height": roi_height,
    }
    return {"status": "ok"}


@app.tool()
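def get_attribute_payload(name: str) -> object:
    # Fail with a readable message instead of a bare KeyError.
    if name not in state:
        raise ValueError(f"Unknown attribute {name!r}; available: {sorted(state)}")
    value = state[name]
    # Dense arrays use the portable payload; everything else is returned as-is.
    if isinstance(value, np.ndarray):
        return encode_array_payload(value)
    return value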

Agent Skill

An agent-facing playbook for creating compatible servers is available at developer_tools/create_eaa_compatible_mcp_servers/SKILL.md. Add that directory to an agent skill path when you want an agent to scaffold or review a new facility MCP server against this pattern.