Introduction
Project repository: https://github.com/Alibaba-NLP/WebAgent/tree/main/WebSailor
Project overview: WebAgent, open-sourced by Alibaba, is an information-seeking project that bundles several components, including WebSailor, WebDancer, and WebWalker. WebSailor is a capable web agent with complex reasoning and retrieval abilities: in an open web environment it can autonomously navigate between pages, look up information, combine clues from multiple sources, and complete the reasoning needed to answer a question.
Analysis
The agent architecture is just a plain ReAct agent (a minimal sketch follows this list):
- A maximum number of LLM calls is defined; the agent keeps querying the model in a loop until it either exceeds that budget or produces a final answer.
- If the LLM's output requests a tool, the code invokes the corresponding tool function and appends the tool's result to the messages sent back to the model.
- If the token count exceeds the model's maximum context length, the model is asked to give a final answer directly from the information gathered so far.
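A minimal Python sketch of this loop; call_llm, dispatch_tool_call, count_tokens and the two limits are assumed names for illustration and are not the identifiers used in the WebSailor code:

```python
# Minimal sketch of the ReAct loop described above; all names are assumptions.
import re

MAX_LLM_CALLS = 30           # assumed per-task call budget
MAX_CONTEXT_TOKENS = 32768   # assumed model context limit

def run_agent(messages, call_llm, dispatch_tool_call, count_tokens):
    for _ in range(MAX_LLM_CALLS):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})

        # Stop as soon as the model emits a final answer.
        if "<answer>" in reply:
            return reply

        # Otherwise run the requested tool and feed its output back to the model.
        match = re.search(r"<tool_call>\s*(.*?)\s*</tool_call>", reply, re.S)
        if match:
            result = dispatch_tool_call(match.group(1))
            messages.append({
                "role": "user",
                "content": f"<tool_response>\n{result}\n</tool_response>",
            })

        # Context overflow: fall back to forcing a direct answer (see section 4).
        if count_tokens(messages) > MAX_CONTEXT_TOKENS:
            break

    return None  # budget exhausted without a final answer
```

The tool result is wrapped in <tool_response> tags to match the format announced in the user prompt; whether it is appended as a separate user message or concatenated onto the assistant turn is an implementation detail the sketch does not settle.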
1. Calling the model for generation
The prompt for each generation call has the following structure.
System:
```
You are a Web Information Seeking Master. Your task is to thoroughly seek the internet for information and provide accurate answers to questions. No matter how complex the query, you will not give up until you find the corresponding information.
As you proceed, adhere to the following principles:
1. **Persistent Actions for Answers**: You will engage in many interactions, delving deeply into the topic to explore all possible aspects until a satisfactory answer is found.
2. **Repeated Verification**: Before presenting a Final Answer, you will **cross-check** and **validate the information** you've gathered to confirm its accuracy and reliability.
3. **Attention to Detail**: You will carefully analyze each information source to ensure that all data is current, relevant, and from credible origins.
Current date: {datetime.now().strftime("%Y-%m-%d")}
```
- Role definition: the customary bit of prompt mysticism: "You are a Web Information Seeking Master; no matter how hard the question is, you never give up…"
- Extra information: supplementary context such as the current date (see the date-substitution sketch after this list).
- Task instructions: this part is also curious. Normally this is where concrete, detailed requirements are spelled out, but WebSailor keeps it abstract: the prompt tells the LLM what it should be like without saying how to do it, which suggests the model was specifically fine-tuned and aligned against these instructions.
  - You must engage in many interactions and act persistently until a satisfactory final answer is found.
  - Before giving the final answer, you will cross-check and validate the gathered information to confirm its accuracy and reliability. *(Stated, but never explained: no second model performs the cross-check, and no verification procedure is described.)*
  - You carefully inspect every information source to make sure it is current, relevant, and credible. *(Equally abstract, especially the "credible origins" part: the model is never told what counts as credible.)*
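A tiny sketch of how the date gets into the system prompt; the quoted prompt fills it inline with a Python f-string, and the template and function names below are assumptions:

```python
# Hedged sketch: substitute the current date into the system prompt template.
# The real prompt (quoted above) does this inline with an f-string.
from datetime import datetime

SYSTEM_PROMPT_TEMPLATE = (
    "You are a Web Information Seeking Master. Your task is to thoroughly seek "
    "the internet for information and provide accurate answers to questions. "
    # ... remaining principles omitted here; the full text is quoted above ...
    "Current date: {current_date}"
)

def build_system_prompt() -> str:
    return SYSTEM_PROMPT_TEMPLATE.format(
        current_date=datetime.now().strftime("%Y-%m-%d")
    )
```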
User:
```
A conversation between User and Assistant. The user asks a question, and the assistant solves it by calling one or more of the following tools.
<tools>
{
"name": "search",
"description": "Performs batched web searches: supply an array 'query'; the tool retrieves the top 10 results for each query in one call.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "array",
"items": {
"type": "string"
},
"description": "Array of query strings. Include multiple complementary search queries in a single call."
}
},
"required": [
"query"
]
}
},
{
"name": "visit",
"description": "Visit webpage(s) and return the summary of the content.",
"parameters": {
"type": "object",
"properties": {
"url": {
"type": "array",
"items": {"type": "string"},
"description": "The URL(s) of the webpage(s) to visit. Can be a single URL or an array of URLs."
},
"goal": {
"type": "string",
"description": "The specific information goal for visiting webpage(s)."
}
},
"required": [
"url",
"goal"
]
}
}
</tools>
The assistant starts with one or more cycles of (thinking about which tool to use -> performing tool call -> waiting for tool response), and ends with (thinking about the answer -> answer of the question). The thinking processes, tool calls, tool responses, and answer are enclosed within their tags. There could be multiple thinking processes, tool calls, tool call parameters and tool response parameters.
Example response:
<think> thinking process here </think>
<tool_call>
{"name": "tool name here", "arguments": {"parameter name here": parameter value here, "another parameter name here": another parameter value here, ...}}
</tool_call>
<tool_response>
tool_response here
</tool_response>
<think> thinking process here </think>
<tool_call>
{"name": "another tool name here", "arguments": {...}}
</tool_call>
<tool_response>
tool_response here
</tool_response>
(more thinking processes, tool calls and tool responses here)
<think> thinking process here </think>
<answer> answer here </answer>
User:
```
- Task description: a conversation between a user and an assistant, in which the assistant solves the user's question by calling tools.
- Tool descriptions: metadata for the two tools, search and visit (a dispatch sketch follows this list).
- Task requirements: this part describes in detail how the agent works. It repeats the cycle
  thinking about which tool to use -> performing tool call -> waiting for tool response
  and finally ends with
  thinking about the answer -> answer of the question
  to conclude the reasoning and emit the result.
- Task example: a sample trace that is really a format template: it contains no concrete task content and emphasizes the expected response format rather than serving as a reference reasoning process.
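To make the cycle concrete, here is a hedged sketch of how the <tool_call> JSON could be routed to the two declared tools; search and visit are stubs, and none of these identifiers come from the repository:

```python
# Sketch of dispatching a <tool_call> payload to the tools declared in the
# user prompt. The stubs stand in for the real search API and the
# goal-directed page summarization.
import json

def search(query):
    """Stub: would return the top-10 results for each query in the batch."""
    return "\n\n".join(f"[top-10 results for: {q}]" for q in query)

def visit(url, goal):
    """Stub: would fetch each URL and summarize it with respect to `goal`."""
    urls = [url] if isinstance(url, str) else url
    return "\n\n".join(f"[summary of {u} targeted at goal: {goal}]" for u in urls)

TOOLS = {"search": search, "visit": visit}

def dispatch_tool_call(tool_call_json: str) -> str:
    try:
        call = json.loads(tool_call_json)   # {"name": ..., "arguments": {...}}
        return TOOLS[call["name"]](**call["arguments"])
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        return f"Tool call failed: {exc}"   # error text is sent back to the model
```

A payload such as {"name": "search", "arguments": {"query": ["WebSailor DUPO", "BrowseComp"]}} would be routed to search(), and the returned text would then be wrapped in <tool_response> tags before being handed back to the model.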
4. Asking the model for a final answer
If the token count exceeds the model's context limit, the agent replaces the last message with a prompt that tells the LLM to answer directly, as a last attempt to get a result (a sketch of this fallback follows the prompt below).
```
You have now reached the maximum context length you can handle. You should stop making tool calls and, based on all the information above, think again and provide what you consider the most likely answer in the following format:<think>your final thinking</think>\n<answer>your answer</answer>
```
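A sketch of this fallback; count_tokens, call_llm and the context limit are assumed helpers and values, while the prompt string is the one quoted above:

```python
# Sketch of the context-overflow fallback: overwrite the last message with the
# forced-answer prompt quoted above and make one final generation call.
FORCED_ANSWER_PROMPT = (
    "You have now reached the maximum context length you can handle. "
    "You should stop making tool calls and, based on all the information above, "
    "think again and provide what you consider the most likely answer in the "
    "following format:<think>your final thinking</think>\n<answer>your answer</answer>"
)

def force_final_answer(messages, call_llm, count_tokens, max_context_tokens=32768):
    if count_tokens(messages) <= max_context_tokens:
        return None                       # still within budget, keep the normal loop
    messages[-1] = {"role": "user", "content": FORCED_ANSWER_PROMPT}
    return call_llm(messages)             # one last attempt at an answer
```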
Summary
- Nothing unusual in the agent design: WebSailor is a generic ReAct agent, so its SOTA results on hard reasoning benchmarks such as BrowseComp most likely come from how the model was trained.
- Training innovations: according to the repository documentation and the paper, the main innovations in training appear to be:
  - Generating a high-uncertainty task dataset through structured sampling and information obfuscation.
  - An RFT cold start: before RL, an RFT fine-tuning stage lets the model quickly acquire basic tool-calling and reasoning abilities.
  - The DUPO algorithm: Duplicating Sampling Policy Optimization.
- The model was likely fine-tuned on high-quality data tied to these specific prompts: given how abstract the prompts are, training probably reinforced this particular prompt style so that the model can interpret the abstract instructions.