Chapter 04

Message Protocol and Tool Calling

A field-by-field look at the OpenAI-compatible protocol: how Hermes uses it to work across 30 different providers, and how it handles each provider's little quirks.

About 6,400 characters · ~25 min read · Keywords: OpenAI Protocol · Function Calling · Reasoning · Provider Quirks

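The previous chapter looked at the structure of the loop. This chapter looks at the data format that flows through it: the messages array, the tool_calls structure, and the reasoning content field. These field names all look like standards set by OpenAI, but once Hermes supports 30 providers at the same time, each provider turns out to have its own small quirks. This chapter is about turning a "de facto standard" into working code.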

4.1 Why the OpenAI Protocol Became the De Facto Standard

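When OpenAI added the function calling fields in June 2023, nobody expected them to become an industry standard. But that is what happened. Anthropic, Google, DeepSeek, Moonshot, Alibaba, ByteDance, Mistral... all of them shipped OpenAI-compatible endpoints. There are three reasons: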

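Market inertia: early SDKs were all written for OpenAI. For a new provider trying to win users, the cheapest path was to "dress up as OpenAI".
Aggregators such as OpenRouter: services that aggregate 200+ models all use the OpenAI protocol as their unified interface, which reinforces the standard further.
A good enough schema: the messages + tools format is concise, extensible, and streamable. Competitors could not find an obvious flaw in it.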

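The result: Hermes does not need 30 provider-specific clients. It uses only the OpenAI Python SDK (openai==2.24.0, exact-pinned) and swaps base_url for each provider's compatible endpoint. Differences in the details are handled by the ProviderProfile abstraction (Chapter 12).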

4.2 The Four Message Roles

The OpenAI protocol defines four roles, each with well-defined semantics and fields:

system: system instructions

{
    "role": "system",
    "content": "You are Hermes, an AI assistant..."
}

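A conversation usually has exactly one system message, at the very beginning. It defines the agent's identity, style, hints about the available tools, and so on. It is also the key to cache-friendliness; the next chapter covers that in detail.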

user: user input

{
    "role": "user",
    "content": "Find the release date of Python 3.13"
}

A multimodal user message:

{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
    ]
}

When content is an array, each item is a "content block" of one modality. Vision-capable providers behind OpenAI-compatible endpoints all use this format.
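As a minimal sketch, the helper below builds that two-block message from raw image bytes by embedding them as a base64 data URL, exactly as in the example above. The function name image_user_message is hypothetical, not part of Hermes or the OpenAI SDK.

```python
import base64


def image_user_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style multimodal user message: one text block, one image block."""
    # Inline the image as a data URL, so no separate upload step is needed.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }


msg = image_user_message("What's in this image?", b"abc")
print(msg["content"][1]["image_url"]["url"])  # data:image/png;base64,YWJj
```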

assistant: model output

The most complex role. It may carry content, tool_calls, and reasoning.

# Plain-text reply
{"role": "assistant", "content": "Python 3.13 was released on October 7, 2024."}

# With tool_calls
{
    "role": "assistant",
    "content": "Let me look that up",   # content may coexist with tool_calls
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "web_search",
                "arguments": '{"query": "Python 3.13 release"}'
            }
        }
    ]
}

# With reasoning (Anthropic Extended Thinking / DeepSeek R1)
{
    "role": "assistant",
    "content": "Python 3.13 was released on October 7, 2024.",
    "reasoning_content": "The user is asking for a Python release date..."
}

tool: tool result

{
    "role": "tool",
    "tool_call_id": "call_abc123",    # must equal assistant.tool_calls[i].id
    "content": "Search results: Python 3.13 released on Oct 7 2024..."
}

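Key invariant: for every assistant.tool_calls[i], there must be a matching tool message (matched by id). The protocol requires it, and violating it gets you an immediate HTTP 400. This is exactly why Hermes does not kill tool processes when it is interrupted: doing so would break this invariant.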
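The invariant is easy to check mechanically. The sketch below (check_tool_pairing is a hypothetical helper, not Hermes code) walks a message list and reports tool calls that never got a result, plus tool messages whose tool_call_id matches no earlier call:

```python
def check_tool_pairing(messages: list[dict]) -> tuple[list[str], list[str]]:
    """Return (unanswered tool_call ids, orphan tool_call_ids) for a message list.

    Sketch only: it checks pairing by id, not the stricter rule that the tool
    messages must directly follow the assistant message that issued the calls.
    """
    open_ids: list[str] = []      # ids issued by assistant messages, not yet answered
    orphans: list[str] = []       # tool messages that answer no known id
    for msg in messages:
        role = msg.get("role")
        if role == "assistant":
            for tc in msg.get("tool_calls") or []:
                open_ids.append(tc["id"])
        elif role == "tool":
            call_id = msg.get("tool_call_id")
            if call_id in open_ids:
                open_ids.remove(call_id)
            else:
                orphans.append(call_id)
    return open_ids, orphans


history = [
    {"role": "user", "content": "q"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "c1", "type": "function", "function": {"name": "f", "arguments": "{}"}},
        {"id": "c2", "type": "function", "function": {"name": "f", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "c1", "content": "ok"},
]
print(check_tool_pairing(history))  # (['c2'], [])
```

An empty result is necessary but not sufficient: OpenAI-compatible APIs typically also expect the tool messages to come right after the assistant message that issued the calls.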

4.3 How the Messages Evolve Over One Complete Turn

The user asks "Find the Python 3.13 release date". Here is how the message list evolves over the whole turn:

# Start of turn
messages = [
    {"role": "system", "content": "You are Hermes..."},
    {"role": "user", "content": "Find the Python 3.13 release date"},
]

# LLM call 1 → returns tool_calls
response_1 = {
    "role": "assistant", "content": None,
    "tool_calls": [{"id": "c1", "type": "function", "function": {
        "name": "web_search", "arguments": '{"q":"Python 3.13 release"}'}}]
}
messages.append(response_1)

# Run web_search and collect the result
messages.append({
    "role": "tool", "tool_call_id": "c1",
    "content": "Python 3.13 released on October 7, 2024..."
})

# LLM call 2 → returns the final text
response_2 = {
    "role": "assistant",
    "content": "Python 3.13 was released on October 7, 2024."
}
messages.append(response_2)

# End of turn

Notes:

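messages is cumulative, never replaced. It also persists across turns (unless compressed).
In LLM call 1, content can be null (only tool_calls).
Before LLM call 2, every tool call from call 1 must already have its tool message, so the list ends with tool messages. Otherwise the request violates the protocol.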
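These notes translate into one small step of the loop: append the assistant message, then answer every tool call in order before the next LLM call. Below is a minimal sketch; run_tool_calls and the handlers dict are hypothetical, and turning handler errors into tool messages is an assumption made here so that the pairing invariant from Section 4.2 always holds.

```python
import json
from typing import Callable


def run_tool_calls(messages: list[dict], assistant_msg: dict,
                   handlers: dict[str, Callable[..., object]]) -> None:
    """Append the assistant message, then one tool message per tool call, in order."""
    messages.append(assistant_msg)                # history is cumulative
    for tc in assistant_msg.get("tool_calls") or []:
        fn = tc["function"]
        try:
            args = json.loads(fn["arguments"] or "{}")
            result = handlers[fn["name"]](**args)
        except Exception as exc:                  # unknown tool, bad JSON, handler error
            result = f"Error: {exc}"              # still answer, to keep the pairing intact
        messages.append({"role": "tool", "tool_call_id": tc["id"],
                         "content": str(result)})


messages = [{"role": "user", "content": "What is 1 + 2?"}]
assistant = {"role": "assistant", "content": None, "tool_calls": [
    {"id": "c1", "type": "function",
     "function": {"name": "add", "arguments": '{"a": 1, "b": 2}'}}]}
run_tool_calls(messages, assistant, {"add": lambda a, b: a + b})
print(messages[-1])  # {'role': 'tool', 'tool_call_id': 'c1', 'content': '3'}
```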

4.4 The tool_calls Schema in Detail

When the LLM decides to call tools, response.choices[0].message.tool_calls is an array. Each element looks like this:

{
    "id": "call_abc123",         # unique id, generated by the provider
    "type": "function",           # "function" for function tools, the only kind used here
    "function": {
        "name": "web_search",
        "arguments": '{"query": "..."}'  # a JSON string! not an object
    }
}

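A counterintuitive point: arguments is a JSON string, not a JSON object. Why? Because of streaming. An object cannot be streamed as a parsed value, since nobody knows when the closing } will arrive, but a string can be sent one fragment at a time. That is why Hermes has to call json.loads(tc.function.arguments) to parse it.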
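A minimal sketch of what this buys you: each fragment is just a piece of text, and the single parse happens once the call is complete. The prefixes of the stream (for example '{"q') are not valid JSON, so nothing could be parsed mid-stream anyway. assemble_arguments is a hypothetical helper name.

```python
import json


def assemble_arguments(fragments: list[str]) -> dict:
    """Join streamed argument fragments, then parse the JSON exactly once."""
    raw = "".join(fragments)
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Models occasionally emit malformed JSON; report it instead of guessing.
        raise ValueError(f"tool arguments are not valid JSON: {raw!r}") from exc


pieces = ['{"q', '":"Py', 'thon 3.13 release"}']
print(assemble_arguments(pieces))  # {'q': 'Python 3.13 release'}
```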

tool schemas (sent to the LLM)

This is the Agent → LLM direction, passed in the tools parameter of the API call:

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web. Returns top 5 results.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query"
                },
                "max_results": {
                    "type": "integer",
                    "default": 5
                }
            },
            "required": ["query"]
        }
    }
}]

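parameters is standard JSON Schema; the model reads it to decide how to make the call. The description is crucial: it is the model's main guide when choosing which tool to use. Chapter 6 covers how to write good schema descriptions.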
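One detail worth knowing when writing handlers: in JSON Schema, default is only an annotation. Neither the model nor the API fills it in, so if the model omits max_results, the handler receives no value at all. A minimal sketch that checks required and applies defaults itself (prepare_args is a hypothetical helper, not Hermes code):

```python
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web. Returns top 5 results.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "max_results": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
}


def prepare_args(tool: dict, args: dict) -> dict:
    """Reject calls missing required parameters, then fill in schema defaults."""
    params = tool["function"]["parameters"]
    missing = [name for name in params.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"missing required parameter(s): {missing}")
    filled = dict(args)                       # never mutate the model's dict
    for name, prop in params.get("properties", {}).items():
        if name not in filled and "default" in prop:
            filled[name] = prop["default"]    # 'default' is only an annotation
    return filled


print(prepare_args(WEB_SEARCH_TOOL, {"query": "Python 3.13 release"}))
# {'query': 'Python 3.13 release', 'max_results': 5}
```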

4.5 How Reasoning Content Differs Across Providers

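In September 2024 OpenAI released o1, which made reasoning a first-class API citizen for the first time: the model "thinks" at length internally before producing its final answer. Since then, reasoning content has been a protocol field. The problem: every provider names the field differently.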

Provider	Reasoning field	Special behavior
OpenAI (o1/o3)	Not exposed (tokens are consumed inside the API)	Billed but not returned; no echo back needed
Anthropic (Claude 3.7+)	`reasoning_content` + `thinking` blocks	Must be echoed back verbatim, otherwise 400
DeepSeek (R1, V4+)	`reasoning_content`	V4+ has thinking on by default and requires echo back
Qwen Thinking	inline `<think>...</think>` in content	Must be stripped out of content
OpenRouter / Mixed	`reasoning_details` array	OpenRouter wraps everything in one unified format

How Hermes unifies them

In conversation_loop.py, lines 2283–2334, Hermes normalizes every provider's reasoning into a single internal representation:

agent/conversation_loop.py:2283-2334 (excerpt)
import re

# Detect the reasoning field (several formats)
reasoning = (
    msg.get("reasoning_content") or
    msg.get("reasoning") or
    ""
)

# Qwen-style inline think block
content = msg.get("content") or ""
think_match = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
if think_match:
    inline_reasoning = think_match.group(1).strip()
    content = re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL)
    msg["content"] = content          # write the stripped content back
    if not reasoning:
        reasoning = inline_reasoning

# Store reasoning on the assistant message so it is echoed back next turn
if reasoning:
    msg["reasoning_content"] = reasoning

Why some providers require echo back

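Anthropic and DeepSeek designed reasoning as "the model's internal state". If the previous reasoning is not passed back verbatim on the next call, the model effectively "forgets what it was thinking", and reasoning quality drops. So the protocol requires the echo.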

Hermes's reasoning handling has one more trap, though: reasoning-only messages.

# Dangerous message: reasoning only, no content and no tool_calls
{
    "role": "assistant",
    "reasoning_content": "Let me think...",
    "content": None,
    "tool_calls": None
}

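Some providers (e.g. DeepSeek-Reasoner) can produce such "pure thinking" messages mid-stream. Other providers reject them, because they require every assistant message to have content or tool_calls. Hermes detects these messages and drops them instead of adding them to the history.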
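The drop rule amounts to a simple filter over the history. A minimal sketch (drop_reasoning_only is a hypothetical name; Hermes's actual check may look at more fields):

```python
def drop_reasoning_only(messages: list[dict]) -> list[dict]:
    """Return the history without assistant messages that carry only reasoning."""
    def reasoning_only(msg: dict) -> bool:
        return (msg.get("role") == "assistant"
                and not msg.get("content")
                and not msg.get("tool_calls")
                and bool(msg.get("reasoning_content") or msg.get("reasoning")))

    return [msg for msg in messages if not reasoning_only(msg)]


history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "reasoning_content": "Let me think...",
     "content": None, "tool_calls": None},
    {"role": "assistant", "content": "Hello!"},
]
print(len(drop_reasoning_only(history)))  # 2
```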

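Design choice: Hermes chooses to "drop" rather than "pad content". Padding content amounts to forging model output and pollutes the history. Dropping only costs reasoning continuity on the next turn, which is far better than a polluted history.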

4.6 Provider Quirks: Four Real Cases

Even though everyone is "OpenAI-compatible", the behavioral details still differ. Here are four quirks Hermes has actually had to handle.

Quirk 1: the DeepSeek V4+ thinking flag

Look at plugins/model-providers/deepseek/__init__.py:

plugins/model-providers/deepseek/__init__.py
class DeepSeekProfile(ProviderProfile):
    def build_api_kwargs_extras(self, *, model=None, **ctx):
        body = {}
        if _model_supports_thinking(model):
            # DeepSeek V4 has thinking on by default and returns reasoning_content.
            # Leaving extra_body.thinking unset triggers the "thinking content must echo back" error.
            # Pass it explicitly to remove the ambiguity.
            body["extra_body"] = {"thinking": {"type": "enabled"}}
        return body

The key comment (written right in the source):

DeepSeek's V4 family defaults to thinking-mode ON when extra_body.thinking is unset. The API then returns reasoning_content and starts enforcing the contract that subsequent turns echo it back; combined with how Hermes replays history this lands on HTTP 400 reasoning_content must be passed back error.
comment in plugins/model-providers/deepseek/__init__.py

Reading this comment beats any documentation: it is a lesson learned, written down after actually stepping on the rake.

Quirk 2: some providers reject duplicate tool names

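Providers such as DeepSeek, Xiaomi MiMo, and Kimi enforce strict uniqueness on the tools parameter: if two tools in the list share a name (even if they come from different namespaces), the request fails with HTTP 400.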

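Where the duplicates come from: Hermes caches tool definitions, and in a long-lived process callers used to mutate the cached list in place (for example, by appending memory tool schemas). Those additions piled up across agent inits, so the request went out with duplicate names. Hermes's countermeasure: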

# Comment in model_tools.py (issue #17335)
# Cache the freshly-computed list, but hand callers a shallow copy so
# downstream mutations (e.g. run_agent appending memory/LCM tool
# schemas to self.tools) don't poison the cache. Without this, a
# long-lived Gateway process accumulates duplicate tool names across
# agent inits and providers that enforce unique tool names
# (DeepSeek, Xiaomi MiMo, Moonshot Kimi) reject the request with
# HTTP 400. Mirrors the cache-hit path above. (issue #17335)

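The fix is to "return a shallow copy", so that outside mutations cannot poison the list held inside the cache. A seemingly boring detail, but behind it is the fix for a real outage.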
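A minimal sketch of the same pattern; the function and cache names are hypothetical, not Hermes's. Note that list(...) copies only the list: the schema dicts inside are still shared, so a caller that edits a schema in place could still poison the cache. Appending, the failure mode described in the comment, is safe.

```python
_TOOL_CACHE: dict[str, list[dict]] = {}


def _compute_tool_definitions(toolset: str) -> list[dict]:
    # Stand-in for the real (expensive) schema computation.
    return [{"type": "function",
             "function": {"name": "web_search",
                          "parameters": {"type": "object", "properties": {}}}}]


def get_tool_definitions(toolset: str) -> list[dict]:
    cached = _TOOL_CACHE.get(toolset)
    if cached is None:
        cached = _compute_tool_definitions(toolset)
        _TOOL_CACHE[toolset] = cached
    return list(cached)          # shallow copy on both the miss and the hit path


tools = get_tool_definitions("default")
tools.append({"type": "function", "function": {"name": "memory"}})   # caller-side append
print(len(get_tool_definitions("default")))  # 1: the cache was not poisoned
```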

Quirk 3: the Codex Responses API (OpenAI's newer protocol)

In March 2025 OpenAI launched the Responses API (replacing the Assistants API). Its field names differ from classic Chat Completions: input instead of messages, output instead of choices. OpenAI recommends the Responses API for new projects.

Hermes supports both protocols through the api_mode parameter:

if agent.api_mode == "chat_completions":
    response = client.chat.completions.create(...)
elif agent.api_mode == "codex_responses":
    response = client.responses.create(...)
    response = _normalize_codex_response(response)   # convert back to the classic format

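This is completely transparent to the layer above (conversation_loop): it only ever sees the classic Chat Completions format. The provider profile takes care of the protocol conversion.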
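_normalize_codex_response itself is not shown here. As a hedged illustration of the kind of mapping involved, the sketch below folds Responses-style output items, given as plain dicts, into one Chat Completions-style assistant message. It assumes the item shapes OpenAI documents for the Responses API (message items with output_text parts; function_call items carrying call_id, name and arguments) and ignores every other item type. normalize_output_items is a hypothetical name, not the Hermes function.

```python
def normalize_output_items(output_items: list[dict]) -> dict:
    """Fold Responses-style output items into one Chat Completions-style message."""
    text_parts: list[str] = []
    tool_calls: list[dict] = []
    for item in output_items:
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    text_parts.append(part["text"])
        elif item.get("type") == "function_call":
            tool_calls.append({
                "id": item["call_id"],       # the id the tool result must refer to
                "type": "function",
                "function": {"name": item["name"], "arguments": item["arguments"]},
            })
        # Other item types (e.g. reasoning) are skipped in this sketch.
    message = {"role": "assistant", "content": "".join(text_parts) or None}
    if tool_calls:
        message["tool_calls"] = tool_calls
    return message
```

In the opposite direction, a tool result goes back to the Responses API as an input item of type function_call_output that carries the same call_id.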

Quirk 4: Gemini's tool_choice limitation

Gemini's OpenAI-compatible endpoint, as exposed through OpenRouter, does not support tool_choice="required". You must pass tool_choice="auto" or omit the parameter entirely. Hermes overrides this parameter in GeminiProfile.

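Takeaway: every provider quirk is handled through method overrides in ProviderProfile subclasses, so the main loop is not littered with if-else branches. Chapter 12 covers this design in detail.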
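A minimal sketch of the override pattern, using the Gemini quirk. The class names and the adjust_request hook are hypothetical stand-ins (the real DeepSeek profile in Quirk 1 overrides build_api_kwargs_extras); the point is that the main loop calls one hook and never branches on the provider:

```python
class ProfileSketch:
    """Hypothetical base profile: pass request kwargs through unchanged."""

    def adjust_request(self, kwargs: dict) -> dict:
        return kwargs


class GeminiProfileSketch(ProfileSketch):
    def adjust_request(self, kwargs: dict) -> dict:
        kwargs = dict(kwargs)                     # don't mutate the caller's dict
        if kwargs.get("tool_choice") == "required":
            kwargs["tool_choice"] = "auto"        # downgrade the unsupported value
        return kwargs


def build_request(profile: ProfileSketch, **kwargs) -> dict:
    # The main loop: no provider-specific if/else here.
    return profile.adjust_request(kwargs)


print(build_request(GeminiProfileSketch(), model="some-gemini-model", tool_choice="required"))
# {'model': 'some-gemini-model', 'tool_choice': 'auto'}
```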

4.7 The Streaming Protocol

Streaming uses the SSE (Server-Sent Events) protocol. Each chunk is in OpenAI's delta format:

# Chunk 1
data: {"choices":[{"delta":{"role":"assistant"}}]}

# Middle chunks: incremental content
data: {"choices":[{"delta":{"content":"Python"}}]}
data: {"choices":[{"delta":{"content":" 3.13"}}]}
data: {"choices":[{"delta":{"content":" was"}}]}

# Tool calls can stream too (note that arguments is built by string concatenation)
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"c1",
                                              "function":{"name":"web_search"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
                                              "function":{"arguments":"{\"q"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
                                              "function":{"arguments":"\":\"Py"}}]}}]}
# ... and finally
data: {"choices":[{"delta":{},"finish_reason":"tool_calls"}]}
data: [DONE]
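Before any delta handling, the raw stream has to be split into events. A minimal sketch, assuming each event is a single data: line as in the stream above (SSE also allows multi-line data fields, which this ignores); iter_sse_chunks is a hypothetical name, and in practice the OpenAI SDK does this parsing for you:

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON payloads from 'data:' lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue                      # blank separators, ': comments', other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return                        # end of stream
        yield json.loads(payload)


raw = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Python"}}]}',
    "data: [DONE]",
]
for chunk in iter_sse_chunks(raw):
    print(chunk["choices"][0]["delta"])
```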

The client's job:

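Accumulate the content string and update the UI on every delta.
Accumulate each tool_call's name + arguments. A tool_call is identified by its index, and its pieces are stitched together across chunks.
Accumulate reasoning_content (if present).
Once finish_reason arrives, assemble the complete message object from the accumulators.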

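The OpenAI SDK takes care of the SSE parsing: iterating the stream yields one chunk object per event. When you iterate a raw stream, though, it does not assemble a final message for you, so the accumulation above remains your job: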

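response = client.chat.completions.create(model=..., messages=..., stream=True)
for chunk in response:
    if not chunk.choices:            # e.g. a trailing usage-only chunk
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        stream_callback(delta.content)
    # delta.tool_calls carries fragments keyed by index; a raw stream has no
    # final .message, so stitch the fragments together yourself as described above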

To make streaming interruptible, that loop also has to check an interrupt flag on every chunk. This is exactly what Hermes's _interruptible_streaming_api_call does.
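A minimal sketch of that per-chunk check, with a threading.Event standing in for the interrupt flag. consume_interruptibly is a hypothetical name, not Hermes's _interruptible_streaming_api_call. One limitation of checking per chunk: if the connection stalls and no chunk arrives, the flag is not noticed until the next chunk does, so a real implementation would also want a read timeout.

```python
import threading
from typing import Callable, Iterable


def consume_interruptibly(chunks: Iterable[dict], interrupted: threading.Event,
                          on_text: Callable[[str], None]) -> bool:
    """Forward text deltas until the stream ends or `interrupted` is set.

    Returns True if the stream ran to completion, False if it was interrupted.
    """
    for chunk in chunks:
        if interrupted.is_set():                  # checked once per chunk
            close = getattr(chunks, "close", None)
            if callable(close):
                close()                           # release the underlying stream
            return False
        choices = chunk.get("choices") or []
        if choices:
            text = (choices[0].get("delta") or {}).get("content")
            if text:
                on_text(text)
    return True
```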

4.8 Error Handling and Retries

Provider instability is the norm: rate limits, 500s, network blips, provider restarts... Hermes uses the tenacity==9.1.4 library for automatic retries:

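from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# _retryable: a tuple of transient exception classes, defined elsewhere
@retry(
    stop=stop_after_attempt(5),                           # at most 5 attempts in total
    wait=wait_exponential(multiplier=1, min=2, max=60),   # exponential backoff, 2s to 60s
    retry=retry_if_exception_type(_retryable),            # only retry transient errors
)
def api_call_with_retry(*args, **kwargs):
    ...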

What counts as retryable?

Error	HTTP status	Retry?
Rate limit	429	Yes, with exponential backoff
Server error	500/502/503/504	Yes
Timeout	(none)	Yes
Bad request (schema)	400	No, a retry fails the same way
Auth	401/403	No
Model overloaded	529 (Anthropic)	Yes

Fallback Model

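When even retries can't save a call, Hermes supports a fallback model: when one model fails, it switches to another automatically. The Agent constructor accepts a fallback_model parameter. Once the primary model has failed repeatedly, the rest of the loop uses the fallback.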
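A minimal sketch of the switching rule just described; ModelSelector, the failure threshold, and the model strings are illustrative assumptions, not Hermes's code or defaults:

```python
class ModelSelector:
    """Track consecutive failures and switch to the fallback model for good."""

    def __init__(self, primary: str, fallback: str, max_failures: int = 3):
        self.active = primary
        self._fallback = fallback
        self._max_failures = max_failures
        self._failures = 0

    def record_success(self) -> None:
        self._failures = 0                        # only consecutive failures count

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._max_failures:
            self.active = self._fallback          # the rest of the loop uses it


selector = ModelSelector("claude-opus", "gpt-4o", max_failures=2)
selector.record_failure()
selector.record_failure()
print(selector.active)   # gpt-4o
```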

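Practical tip: configuring a fallback is basic hygiene for a production agent. A common combination: Claude Opus as primary (expensive but strong), with Claude Sonnet or GPT-4o as fallback (somewhat cheaper but stable). Only a non-Anthropic fallback such as GPT-4o lets Hermes keep going through a full Anthropic API outage; a Sonnet fallback goes down together with the primary.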

4.9 Protocol Evolution, 2025–2026

The OpenAI-compatible protocol took shape in 2023. Several developments since then are worth noting:

① OpenAI Responses API (successor to Chat Completions)

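In March 2025 OpenAI introduced the Responses API as the successor to Chat Completions. The fields got a new shell: messages → input, choices → output. The Responses API can be used statefully (OpenAI stores the conversation on the server and you chain calls with previous_response_id) or statelessly (you send the full input every time).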

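As of May 2026, the GPT-5 series defaults to the Responses API; Chat Completions is still available but marked "deprecated for new features". Hermes supports both protocols via api_mode = "codex_responses", and the upper-layer conversation_loop is unaware of the difference. See CodexProfile in Chapter 12.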

② Interleaved Thinking (Anthropic, 2025+)

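In mid-2025 Anthropic added interleaved thinking to Claude: reasoning content can be interleaved between tool_calls, so the model no longer has to finish all its thinking before calling tools. For example: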

thinking: "I should look at the README first..."
tool_call: read_file(README.md)
[result goes back to the model]
thinking: "Ah, OK, looks like main.py needs changing"
tool_call: read_file(main.py)
...

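This keeps reasoning "closer to the action", but makes the client harder to implement: your streaming logic has to route each part correctly, showing thinking as "thinking", tool_calls as "tool calls", and the final content as "the answer". Hermes's reasoning content handling (Section 4.5) already handles this interleaved format.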
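A minimal sketch of that routing over Anthropic-style content blocks. It assumes the block types of Anthropic's Messages API (thinking, redacted_thinking, tool_use, text); render_blocks and the display labels are hypothetical:

```python
def render_blocks(blocks: list[dict]) -> list[str]:
    """Label each content block for display, preserving the original order."""
    lines = []
    for block in blocks:
        kind = block.get("type")
        if kind == "thinking":
            lines.append(f"[thinking] {block['thinking']}")
        elif kind == "redacted_thinking":
            lines.append("[thinking] (redacted)")
        elif kind == "tool_use":
            lines.append(f"[tool call] {block['name']}({block['input']})")
        elif kind == "text":
            lines.append(f"[answer] {block['text']}")
    return lines


blocks = [
    {"type": "thinking", "thinking": "I should look at the README first..."},
    {"type": "tool_use", "id": "toolu_1", "name": "read_file", "input": {"path": "README.md"}},
]
for line in render_blocks(blocks):
    print(line)
```

Order matters twice here: for display, and because thinking blocks have to be echoed back unchanged, in their original positions, on the next request.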

③ Tool Result Clearing (a 2025 Anthropic API feature)

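Previously, tool results stayed in the context window indefinitely. In 2025 Anthropic added context editing to the API (configured through the context_management parameter): once the context grows past a threshold, older tool results are cleared automatically while the most recent ones are kept. This is one of Anthropic's three Context Engineering techniques (covered in Chapter 13).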

④ Multimodal tool results

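Since 2025, both OpenAI and Anthropic allow tool results to be more than text: they can return images, PDFs, or audio. A typical scenario: the web_extract tool returns a screenshot instead of page text, so a vision-capable model can "look" at the page directly. Hermes's vision_analyze and browser_snapshot tools take this path.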

⑤ Standardization of computer-use tools

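The Computer Use tool schema that Anthropic released in October 2024 (action: screenshot | click | type | scroll) had, by 2026, been followed by OpenAI Operator and Google Mariner, making it a de facto standard. Hermes's computer_use tool uses the same schema and can switch between Claude and GPT-5 vision.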

4.10 Code Deep Dive: Cross-Provider Reasoning and Argument Type Coercion

4.10.1 copy_reasoning_content_for_api: a five-layer fallback

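Section 4.5 showed that providers use different reasoning fields. Now look at how Hermes handles "poisoned history" when replaying it: one turn in the history was written by MiniMax, and the next turn uses DeepSeek. DeepSeek can't accept MiniMax's reasoning text (the schema may be incompatible), yet it insists that reasoning_content be present: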

agent/agent_runtime_helpers.py:1834-1903 (excerpt)
def copy_reasoning_content_for_api(agent, source_msg, api_msg):
    """Copy provider-facing reasoning fields onto an API replay message."""
    if source_msg.get("role") != "assistant":
        return

    # Layer 1: reasoning_content already present; keep it as-is
    # Edge case: DeepSeek V4 Pro rejects the empty string, so map "" → " " (a space)
    existing = source_msg.get("reasoning_content")
    if isinstance(existing, str):
        if existing == "" and agent._needs_thinking_reasoning_pad():
            api_msg["reasoning_content"] = " "
        else:
            api_msg["reasoning_content"] = existing
        return

    needs_thinking_pad = agent._needs_thinking_reasoning_pad()

    # Layer 2: cross-provider poisoned history (#15748):
    # DeepSeek/Kimi + has tool_calls + has a 'reasoning' field but no reasoning_content
    # → this reasoning came from another provider (MiniMax, etc.)
    # Inject a single space to satisfy the schema without leaking the other provider's chain of thought
    normalized_reasoning = source_msg.get("reasoning")
    if (needs_thinking_pad
        and source_msg.get("tool_calls")
        and isinstance(normalized_reasoning, str)
        and normalized_reasoning):
        api_msg["reasoning_content"] = " "
        return

    # Layer 3: normal session, 'reasoning' → 'reasoning_content'
    if isinstance(normalized_reasoning, str) and normalized_reasoning:
        api_msg["reasoning_content"] = normalized_reasoning
        return

    # Layer 4: DeepSeek/Kimi thinking mode: pad even without explicit reasoning
    if needs_thinking_pad:
        api_msg["reasoning_content"] = " "
        return

    # Layer 5: no usable reasoning (e.g. reasoning_content is None): don't send the field
    api_msg.pop("reasoning_content", None)

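Why a space and not an empty string: in testing, DeepSeek V4 Pro rejected reasoning_content: "" (HTTP 400) but accepted reasoning_content: " " (a single space). Issue #17341 records this quirk. This is what a "quiet detail" looks like: your full integration test suite never hits it, and it only blows up once you actually switch to DeepSeek. Knowledge like this lives in no documentation, only in GitHub issue history.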

4.10.2 coerce_tool_args: string → typed

Open-weight models (DeepSeek, Qwen, GLM) often write {"max_results": 5} as {"max_results": "5"}. Passing that straight to the handler causes type errors. Hermes uses schema-aware coercion:

model_tools.py:545-626 (excerpt)
def coerce_tool_args(tool_name, args):
    """Coerce tool call arguments to match their JSON Schema types.

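    LLMs frequently return numeric strings ("42" instead of 42) and boolean strings ("true").
    Convert strings safely by checking them against the schema; on failure, keep the original value."""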
    if not args or not isinstance(args, dict):
        return args

    schema = registry.get_schema(tool_name)
    properties = (schema.get("parameters") or {}).get("properties")
    if not properties:
        return args

    for key, value in list(args.items()):
        prop_schema = properties.get(key)
        if not prop_schema:
            continue
        expected = prop_schema.get("type")

        # Expected an array but got a scalar: wrap it in a single-element list
        # DeepSeek often sends {"urls": "https://a.com"} when the schema wants ["https://a.com"]
        if expected == "array" and value is not None and not isinstance(value, (list, tuple)):
            if isinstance(value, str):
                coerced = _coerce_value(value, expected, schema=prop_schema)
                if coerced is not value:
                    args[key] = coerced
                    continue
                # Looks like a JSON array string but failed to parse: warn
                if value.strip().startswith("["):
                    logger.warning("%s.%s looks like JSON array string ...", tool_name, key)
                args[key] = [value]   # fallback: wrap it in a single-element list
                continue
            args[key] = [value]
            logger.info("wrapped bare %s in list", type(value).__name__)
            continue

        # Scalar string → expected type
        if not isinstance(value, str):
            continue
        coerced = _coerce_value(value, expected, schema=prop_schema)
        if coerced is not value:
            args[key] = coerced

    return args

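The wisdom of "silent repair": don't surface the error to the user, just fix it. But the repairs that change an argument's shape are logged (logger.info when a bare scalar is wrapped in a list, logger.warning when a string looks like a malformed JSON array), which helps during debugging: "oh, Qwen keeps passing the URL as a string". Logging is the compromise: "fixed, but you should know".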
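The excerpt calls _coerce_value without showing it. Judging only from how it is used there (callers test coerced is not value), its contract is: return a new object on success and the very same object on failure. A hypothetical sketch of such a helper, not Hermes's implementation:

```python
import json
import math


def coerce_value_sketch(value: str, expected, schema=None):
    """Convert a string to the JSON Schema type `expected`.

    Returns the *same* object on failure, so `coerced is not value` signals success.
    `schema` is accepted for signature parity with the excerpt and unused here.
    """
    text = value.strip()
    try:
        if expected == "integer":
            return int(text)                      # "5.0" raises and is left alone
        if expected == "number":
            number = float(text)
            return number if math.isfinite(number) else value   # reject nan/inf
        if expected == "boolean":
            lowered = text.lower()
            return (lowered == "true") if lowered in ("true", "false") else value
        if expected in ("array", "object"):
            parsed = json.loads(text)             # JSONDecodeError is a ValueError
            wanted = list if expected == "array" else dict
            return parsed if isinstance(parsed, wanted) else value
    except ValueError:
        pass
    return value
```

With this contract, the array branch of coerce_tool_args falls back to wrapping the string in a list whenever the helper hands back the original object, for example for "https://a.com".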

4.11 Takeaways

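The OpenAI Chat Completions protocol is the de facto standard. Hermes writes no provider-specific clients; it treats every provider as an "OpenAI-compatible endpoint".
Know the fields and constraints of the four message roles by heart: system / user / assistant (may carry tool_calls/reasoning) / tool (must carry a tool_call_id).
In tool_calls, arguments is a string, not an object. The reason is concatenation during streaming.
Reasoning content formats vary by provider. Hermes normalizes them in a middle layer.
Provider quirks are isolated in Profile subclasses and don't pollute the main loop.
The streaming protocol is SSE deltas. Interruptible streaming means iterating the chunks yourself and checking a flag on each one.
Classify errors before retrying: retry 429/5xx, don't retry 400/401. A fallback model is basic production hygiene.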

End-of-chapter exercises

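Easy: Why is tool_calls[i].function.arguments designed as a string rather than an object? Explain in one short sentence.
Easy: Which of the following structures in a messages list violate the protocol and get rejected outright by OpenAI/Anthropic?
- (a) No system message
- (b) An assistant.tool_calls message that is not followed by a tool message
- (c) A tool message whose tool_call_id has no match earlier in the list
- (d) Two consecutive user messages with no assistant message between them
- (e) A tool message as the very first message
Medium: Write code that simulates streaming chunk accumulation: given a chunk generator, output the complete {content, tool_calls}. Handle tool_calls whose pieces, identified by index, span multiple chunks.
Medium: Read plugins/model-providers/anthropic/__init__.py (in the Hermes repository). List the methods AnthropicProfile overrides and the quirk each one handles.
Hard: DeepSeek R1's reasoning_content must be echoed back in the history. But when you compress the history, should you keep reasoning_content? Why? (Hint: think about who reads these messages on the next API call after compression.)
Hard: Design a "two providers in parallel" strategy: send every user message to both Claude and GPT-4 at once, use whichever returns first, and cancel the other. Discuss: how should the message history be handled (the two models may end up seeing different histories)? How do you align tool_call_ids?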
