Agent Web Search (Outline)

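Why Does an Agent Need Web Search?

An LLM's knowledge stops at its training cutoff, so it cannot know about anything more recent.
Web Search lets the Agent pull in live data dynamically at inference time.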

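User asks "What are the latest GPT-5 specs?"
    ↓
The LLM itself does not know (the information postdates its cutoff)
    ↓
Call the Web Search tool → get live search results
    ↓
Put the results into the context → the LLM generates an answer

This pattern is called Tool-Augmented Generation; it is the live-web version of RAG.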
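
The flow above can be sketched as a small function. This is only an illustration under stated assumptions, not a real agent: `fake_search` and `fake_llm` are hypothetical stand-ins for a search tool and a model call, and the `needs_search` flag is an assumed simplification of how an LLM decides whether to call a tool.

```python
from typing import Callable

def fake_search(query: str) -> list[str]:
    # Hypothetical stand-in for a real web search tool (Tavily, SearXNG, ddgs).
    return [f"snippet about {query!r} #1", f"snippet about {query!r} #2"]

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call; it only reports the prompt size.
    return f"answer based on {len(prompt.splitlines())} prompt lines"

def answer(question: str, needs_search: bool,
           search: Callable[[str], list[str]] = fake_search,
           llm: Callable[[str], str] = fake_llm) -> str:
    """Tool-augmented generation: search first if needed, then generate."""
    if not needs_search:
        return llm(question)
    snippets = search(question)              # 1. call the search tool
    context = "\n".join(snippets)            # 2. put the results into the context
    prompt = f"Sources:\n{context}\n\nQuestion: {question}"
    return llm(prompt)                       # 3. generate the final answer
```

Passing `search` and `llm` as parameters keeps the sketch independent of any particular backend, so either stand-in can be swapped for a real client.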

Three Mainstream Options

Tool	Positioning	Detailed notes
Tavily	Official commercial Search API designed for LLMs	Tavily
SearXNG	Open-source, self-hosted metasearch engine	SearXNG
ddgs	Unofficial Python metasearch library	DuckDuckGo

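Full Comparison

Criterion	ddgs	Tavily	SearXNG
Official API	❌ Unofficial	✅ Official	✅ Self-hosted
API key required	❌	✅ (`tvly-...`)	❌
Cost	Free	Free tier: 1,000 credits/month	Free (self-hosting costs)
AI-synthesized answer	❌	✅ `include_answer`	❌
Result format	title + href + body	summary/chunks + score	title + url + content + engine
Rate-limit risk	High	Low	Medium (depends on upstream)
LLM optimization	Low	High	Low
License	MIT	Commercial API	AGPL-3.0
Stability	Low (unofficial)	High (SLA)	Medium (upstream can be unstable)
Best suited for	Quick prototypes	Production RAG / Agent	Privacy-first, self-hosted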
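
Because the three tools return differently named fields, a thin normalization layer makes it easier to swap backends. The sketch below is one possible shape, not an established API: `SearchHit`, `FIELD_MAP`, and `normalize` are hypothetical names, and the field names come from the "Result format" row, with the assumption that each Tavily result carries `title`, `url`, and `content`.

```python
from dataclasses import dataclass

@dataclass
class SearchHit:
    title: str
    url: str
    snippet: str
    source: str  # which backend produced the hit

# Field names per backend, taken from the comparison table.
# Assumption: each Tavily result carries "title", "url" and "content".
FIELD_MAP = {
    "ddgs": {"title": "title", "url": "href", "snippet": "body"},
    "tavily": {"title": "title", "url": "url", "snippet": "content"},
    "searxng": {"title": "title", "url": "url", "snippet": "content"},
}

def normalize(raw: dict, source: str) -> SearchHit:
    """Map one raw result dict onto the shared SearchHit schema."""
    fields = FIELD_MAP[source]
    return SearchHit(
        title=raw.get(fields["title"], ""),
        url=raw.get(fields["url"], ""),
        snippet=raw.get(fields["snippet"], ""),
        source=source,
    )
```

For example, `normalize({"title": "T", "href": "https://example.com", "body": "B"}, "ddgs")` yields a `SearchHit` whose `url` and `snippet` are filled from `href` and `body`.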

Core Features of Each Tool

Tavily

/search → summaries / chunks; include_answer can be enabled
search_depth: ultra-fast / fast / basic / advanced (1–2 credits)
topic: general / news / finance
/extract → full page body text (markdown)
/research → multi-round iterative search, returns a full report
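
The /search options listed above can be captured in a small request builder. This is a sketch under assumptions: `build_search_payload` is a hypothetical helper, the allowed values are copied from the list above, and the exact request body the Tavily API accepts should be checked against the official documentation before relying on it.

```python
# Allowed values copied from the feature list above.
SEARCH_DEPTHS = {"ultra-fast", "fast", "basic", "advanced"}
TOPICS = {"general", "news", "finance"}

def build_search_payload(query: str, search_depth: str = "basic",
                         topic: str = "general", include_answer: bool = False,
                         max_results: int = 5) -> dict:
    """Build a JSON body for /search, rejecting values not listed above."""
    if search_depth not in SEARCH_DEPTHS:
        raise ValueError(f"unknown search_depth: {search_depth!r}")
    if topic not in TOPICS:
        raise ValueError(f"unknown topic: {topic!r}")
    return {
        "query": query,
        "search_depth": search_depth,
        "topic": topic,
        "include_answer": include_answer,
        "max_results": max_results,
    }
```

Validating the options locally means a typo fails before the request is sent, rather than as an error response from the API.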

SearXNG

GET /search?q=...&format=json
Can query up to 251 upstream sources in parallel (Google / Bing / DDG / Wikipedia...)
Self-hosted with Docker; only settings.yml needs editing
Returns: title + url + content + engine + score
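
Since each SearXNG result carries `engine` and `score`, the response is easy to post-process. The helpers below are a minimal sketch: `top_results` and `engine_counts` are hypothetical names, and they operate on the `results` list of an already parsed JSON response. On a default install the JSON format may first need to be enabled under `search.formats` in settings.yml; verify this against your version.

```python
from collections import Counter

def top_results(results: list[dict], n: int = 5) -> list[dict]:
    """Return the n highest-scoring results; a missing score counts as 0."""
    return sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)[:n]

def engine_counts(results: list[dict]) -> Counter:
    """Count how many results each upstream engine contributed."""
    return Counter(r.get("engine", "unknown") for r in results)
```

Counting results per engine is a quick way to notice when an upstream source has stopped responding, which is the main stability risk for a metasearch setup.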

ddgs

from ddgs import DDGS
DDGS().text(query=..., timelimit="w", max_results=5)
Methods: text / news / images / videos / books / extract
Returns: title + href + body
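
To feed ddgs results to an LLM, the title + href + body fields can be flattened into numbered sources. This is a sketch, not the library's own API: `ddgs_text_search` and `to_context` are hypothetical wrappers, and the `DDGS().text(...)` call simply mirrors the one shown above.

```python
def ddgs_text_search(query: str, timelimit: str = "w", max_results: int = 5) -> list[dict]:
    """Run a text search; each item is expected to have title, href and body."""
    # Imported lazily so the formatter below works without the package installed.
    from ddgs import DDGS
    return DDGS().text(query=query, timelimit=timelimit, max_results=max_results)

def to_context(results: list[dict]) -> str:
    """Format results as numbered sources for an LLM prompt."""
    blocks = []
    for i, r in enumerate(results, start=1):
        blocks.append(f"[{i}] {r.get('title', '')}\n{r.get('href', '')}\n{r.get('body', '')}")
    return "\n\n".join(blocks)
```

Numbering the sources lets the prompt ask the model to cite `[1]`, `[2]`, and so on in its answer.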

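Decision Flow

Need live web data
    ↓
Do you care about privacy / want to avoid third-party APIs?
    ├─ Yes → SearXNG (self-hosted)
    └─ No
         ↓
    Is this a production product / is stability required?
         ├─ Yes → Tavily (Official API)
         └─ No → ddgs (quick prototype validation)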
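
The same decision flow can be written as a function, which is handy when the choice is driven by configuration. `choose_search_tool` and its two boolean parameters are hypothetical names introduced for this sketch.

```python
def choose_search_tool(privacy_first: bool, production: bool) -> str:
    """Mirror the decision flow: privacy first, then stability."""
    if privacy_first:
        return "SearXNG"   # self-hosted, no third-party API
    if production:
        return "Tavily"    # official API for production use
    return "ddgs"          # quick prototype validation
```

Note that the privacy check comes first, so a privacy-sensitive production system still lands on SearXNG, exactly as in the diagram.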

Common Usage Patterns

RAG with Web Search

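# Tavily version
import os

from tavily import TavilyClient

# Create the client once and reuse it; the key is read from the environment.
client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def retrieve(query: str) -> list[str]:
    """Return the content snippet of each search result."""
    results = client.search(query=query, search_depth="basic", max_results=5)
    return [r["content"] for r in results["results"]]

question = "What is the time complexity of HNSW?"
context = "\n".join(retrieve(question))
prompt = f"Answer the question using the following sources:\n{context}\n\nQuestion: {question}"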

LangChain Agent with Web Search Tool

from langchain_tavily import TavilySearch
from langchain_community.tools import DuckDuckGoSearchRun

# Pick one to attach to the agent
tools = [TavilySearch(max_results=3)]
# or
# tools = [DuckDuckGoSearchRun()]

SearXNG HTTP call

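import requests

def searxng_search(query: str, n: int = 5) -> list[dict]:
    """Query a local SearXNG instance and return the top n results."""
    resp = requests.get(
        "http://localhost:8080/search",
        params={"q": query, "format": "json", "pageno": 1},
        timeout=10,  # avoid hanging if the instance is unreachable
    )
    resp.raise_for_status()  # surface HTTP errors instead of failing on .json()
    return resp.json()["results"][:n]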
