[Anthropic Blog Interpretation] Advanced Tool Use Features: the combination of three technologies—Tool Search Tool, Programmatic Tool Calling, and Tool Use Examples—significantly reduces token consumption, makes tool selection clearer, and makes complex invocations more accurate. Anthropic recently launched advanced tool use on the Claude Developer Platform, enabling AI agents to handle hundreds or even thousands of tools efficiently without being limited by the context window. Imagine an agent that must operate an IDE, Git, Slack, GitHub, Jira, and databases simultaneously: traditionally, the tool definitions alone consume enormous numbers of tokens, leading to context bloat, incorrect tool selection, and invocation delays. The new features improve the agent's usability and scalability through dynamic loading, code orchestration, and example guidance. https://t.co/RiM4CuLtgp

Core challenges and coping strategies:

Building a reliable tool-use system faces three major pain points. First, token consumption is too high; pulling tool definitions from multiple services (such as GitHub and Slack) can instantly consume 50,000+ tokens. Second, tool selection is inaccurate; tools with similar names (such as `notification-send-user` and `notification-send-channel`) are easily confused. Third, calling conventions are ambiguous; a JSON Schema standardizes parameter structure but cannot intuitively convey complex formats, such as date strings or nested objects. Anthropic's strategy is "deferral and intelligence": instead of loading every tool up front, discover and load tools on demand; coordinate multi-step operations with code rather than natural language, reducing inference rounds; and clarify usage through examples. Together, these methods shift tool use from static description to dynamic execution, letting agents run complex workflows within a constrained context budget.

Three key technologies:

1. Tool Search Tool. This is a "meta-tool" that lets the agent search for and load relevant tools at runtime instead of preloading every definition. When a tool is flagged with `defer_loading: true`, only searched-for tools and a few core tools enter the initial context; the agent can dynamically pull tools by name or description. For example, when handling a GitHub task, only `github.createPullRequest` is loaded. Advantages: token savings of up to 85% (from 77K down to 8.7K) and significant accuracy improvements (e.g., Claude Opus 4 improves from 49% to 74%). Implementation is simple: adding a search configuration to the tools array enables batch lazy loading, including for MCP servers. The agent can then navigate a large tool library efficiently, as if through a "smart index"; a request sketch follows below.
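As a concrete illustration, here is a minimal sketch of such a request using the Anthropic Python SDK. Only `defer_loading: true` is quoted from the post itself; the `tool_search_tool_regex_20251119` type string and the model id are my best-effort recollections of the announcement-era docs, the feature may additionally require a beta header, and the GitHub tool schema is invented—verify all of these against the current documentation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    # Meta-tool that lets Claude discover deferred tools at runtime.
    # The type/name strings here are assumptions, not confirmed by the post.
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    # A deferred tool: with defer_loading, only its name/description are
    # indexed for search; the full schema enters context only once found.
    {
        "name": "github.createPullRequest",
        "description": "Open a pull request on a GitHub repository.",
        "input_schema": {
            "type": "object",
            "properties": {
                "repo": {"type": "string"},
                "title": {"type": "string"},
                "head": {"type": "string"},
                "base": {"type": "string"},
            },
            "required": ["repo", "title", "head", "base"],
        },
        "defer_loading": True,  # the flag quoted in the post
    },
    # ...hundreds more deferred tools can follow without bloating context.
]

response = client.messages.create(
    model="claude-opus-4-5",  # placeholder model id
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Open a PR merging bugfix into main."}],
)
print(response.content)
```

The design point is that deferred definitions cost essentially nothing until Claude's search surfaces them, which is where the 77K-to-8.7K token reduction comes from.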
2. Programmatic Tool Calling. Instead of calling tools one at a time through natural-language turns, the agent generates Python code that coordinates multiple tools inside a sandbox environment. Tools must be marked with `allowed_callers: ["code_execution_20250825"]`, and Claude outputs code containing loops, conditionals, and parallel execution (such as `asyncio.gather`). Example: when checking for budget overruns, the generated code can fetch team-member, budget, and expenditure data in parallel and return only the final result (such as the list of overruns) to the agent, keeping intermediate data from polluting the context. Advantages: token count reduced by 37% (from 43,588 to 27,297), lower latency (no repeated inference rounds), and accuracy up from 25.6% to 28.5% on knowledge-retrieval tasks. This is particularly suitable for large tables or chained API calls, such as batch data analysis in Claude for Excel; a sketch of the pattern follows below.
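Here is a minimal sketch of the pattern. The `allowed_callers` value and `asyncio.gather` usage come from the post; the helper tools (`get_team_members`, `get_budgets`, `get_expenses`) and their stubs are hypothetical stand-ins for the bindings the real sandbox harness would inject, and the second half only illustrates the shape of code Claude might generate.

```python
import asyncio

# --- Request side: opt a tool in to programmatic invocation ---
tools = [
    # Sandbox in which Claude runs the orchestration code it writes.
    {"type": "code_execution_20250825", "name": "code_execution"},
    {
        "name": "get_team_members",  # hypothetical tool for illustration
        "description": "List members of a team.",
        "input_schema": {
            "type": "object",
            "properties": {"team_id": {"type": "string"}},
            "required": ["team_id"],
        },
        # Allow sandboxed code (not just a model turn) to call this tool.
        "allowed_callers": ["code_execution_20250825"],
    },
    # ...get_budgets and get_expenses would be defined the same way...
]

# --- Sandbox side: hypothetical stubs standing in for injected tool bindings ---
async def get_team_members(team_id): return ["ana", "bo"]
async def get_budgets(team_id): return {"ana": 1000, "bo": 800}
async def get_expenses(team_id): return {"ana": 1200, "bo": 400}

# The shape of orchestration code Claude might generate:
async def find_overruns(team_id: str) -> list[str]:
    members = await get_team_members(team_id=team_id)  # one tool call
    budgets, expenses = await asyncio.gather(          # two calls in parallel
        get_budgets(team_id=team_id),
        get_expenses(team_id=team_id),
    )
    # Only this small list re-enters the model's context; the raw
    # member/budget/expense payloads stay in the sandbox.
    return [m for m in members if expenses[m] > budgets[m]]

print(asyncio.run(find_overruns("team-42")))  # -> ['ana']
```

Note how the loop and the parallel fetches happen in one code block rather than across several model turns, which is where both the latency and the token savings come from.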
3. Tool Use Examples. These supplement the JSON Schema with sample inputs that demonstrate real call patterns. For example, the `create_ticket` tool can show the expected date format (YYYY-MM-DD), nested objects (such as `reporter`), and optional parameters (such as an emergency escalation). Each tool can carry 2-3 example variants. Advantages: accuracy on complex parameters jumps from 72% to 90%, especially for ID formats and correlated parameters. It is like handing the agent a "user manual" so it quickly absorbs the implicit rules; a sketch follows below.
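A minimal sketch of what such a definition could look like. The field name `input_examples` is my best recollection of the announcement and should be checked against current docs; the ticket fields and example values are invented for illustration.

```python
# Hypothetical create_ticket definition with usage examples attached.
create_ticket = {
    "name": "create_ticket",
    "description": "Create an issue ticket in the tracker.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "due_date": {"type": "string"},  # schema alone can't show YYYY-MM-DD
            "reporter": {
                "type": "object",
                "properties": {"id": {"type": "string"}, "name": {"type": "string"}},
            },
            "priority": {"type": "string"},
            "escalate": {"type": "boolean"},
        },
        "required": ["title"],
    },
    # 2-3 concrete inputs demonstrating the implicit conventions
    # (the field name `input_examples` is an assumption; check current docs):
    "input_examples": [
        {
            # Routine ticket: shows the date format and nested reporter object.
            "title": "Fix login redirect loop",
            "due_date": "2025-12-01",  # YYYY-MM-DD
            "reporter": {"id": "USR-4821", "name": "J. Park"},
        },
        {
            # Emergency escalation: shows the optional parameters in use.
            "title": "Production database unreachable",
            "priority": "urgent",
            "escalate": True,
        },
        {
            # Minimal call: only the required field.
            "title": "Update onboarding docs"
        },
    ],
}
```

The examples carry exactly the conventions a JSON Schema cannot express: which string format a date takes, how the nested object is shaped, and when the optional fields are worth sending.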
Experimental results and outlook:

Internal benchmarks show gains from these features on both the MCP and GIA evaluations: token savings reach 85%, and overall accuracy improves by roughly 10-20% on average. For example, on large tool sets, Claude Opus 4.5's score rises from 79.5% to 88.1%. In practice, these features already let agents integrate seamlessly with scenarios such as Excel and Jira.
![Anthropic advanced tool use announcement graphic](https://pbs.twimg.com/media/G6j9ctMbwAEqfJL.jpg)