
question from my current experiments: how good can we get a coding agent harness by obsessively choosing “slightly better tooling” across every important dimension of the harness?

the biggest area that drives agent perf is model intelligence (see Opus 4.5). but what about every tooling decision we make in the harness? if every tool is X% better, how much more perf do we unlock on Task?
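one way to make the X% question concrete (illustrative numbers only, assuming gains from independent tooling dimensions compound multiplicatively):

```python
# back-of-envelope, hypothetical numbers: if each of N independent
# tooling dimensions makes the agent x% better, and the gains
# compound multiplicatively, the harness-level gain is (1 + x)**N.
dimensions = ["local search", "web search", "context mgmt", "subagents"]
x = 0.05  # assume each tool is 5% better

gain = (1 + x) ** len(dimensions) - 1
print(f"{len(dimensions)} dims at +{x:.0%} each -> +{gain:.0%} overall")
# 4 dims at +5% each -> +22% overall
```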

there are some key primitives that have become defaults across many full-feature coding agents:
- good local search (ex: recent growth of “better” search with warpgrep, mgrep, etc)
- good web search; often this tool is agentic itself, where we call a websearch+agent endpoint that better prepares the data (ex: @p0)
- good context management opinions baked in, such as Anthropic’s Tool Search Tool, Context Editing, and better compaction + filesystem organization instructions to offload and reload context as needed (toy sketch after this list)
- well-tuned default subagents for common tasks like planning or reviewing
- etc
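a toy sketch of the offload-and-reload idea from the context-management bullet above; the budget, scratch path, and pointer string are all invented for illustration, not any specific product’s compaction scheme:

```python
# toy compaction: when the live transcript grows past a budget,
# offload older turns to a scratch file and leave a pointer the
# agent can follow to reload them. all names here are hypothetical.
from pathlib import Path

CONTEXT_BUDGET = 4  # max turns kept in the live context
SCRATCH = Path("scratch/offloaded_context.md")

def compact(turns: list[str]) -> list[str]:
    """Offload all but the most recent turns to the filesystem."""
    if len(turns) <= CONTEXT_BUDGET:
        return turns
    SCRATCH.parent.mkdir(parents=True, exist_ok=True)
    with SCRATCH.open("a") as f:
        f.write("\n".join(turns[:-CONTEXT_BUDGET]) + "\n")
    pointer = f"[older context offloaded to {SCRATCH}; re-read if needed]"
    return [pointer] + turns[-CONTEXT_BUDGET:]
```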

i’m pretty excited about a future where:
1. A great baseline harness is a delivery mechanism for builders to build on (think Claude Agent SDK and other harnesses)

2. Builders inject a set of capabilities that plug into the harness. I’m pretty bullish on Skills as a distribution mechanism right now; we also have tools/MCPs, which can live in Skills (rough sketch of this plug-in shape after this list)

3. Builders optimize the harness prompts to work well with the set of skills/tools exposed in the harness

4. Builders iteratively update the harness from evals
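to make 1-3 concrete, a minimal hypothetical sketch of the plug-in shape: a baseline harness exposing a registry that builder-supplied capabilities slot into. the names (Harness, Capability, register) are invented for illustration and are not the Claude Agent SDK’s actual API:

```python
# hypothetical plug-in shape: a baseline harness (point 1) that
# builders extend with capabilities (point 2), whose docs feed the
# prompt surface builders then optimize (point 3). names invented.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Capability:
    name: str
    description: str           # surfaced to the model as tool docs
    run: Callable[[str], str]  # the actual tool implementation

@dataclass
class Harness:
    system_prompt: str
    capabilities: dict[str, Capability] = field(default_factory=dict)

    def register(self, cap: Capability) -> None:
        self.capabilities[cap.name] = cap

    def tool_docs(self) -> str:
        # the surface that point 3 tunes harness prompts against
        return "\n".join(f"- {c.name}: {c.description}"
                         for c in self.capabilities.values())

harness = Harness(system_prompt="You are a coding agent.")
harness.register(Capability("local_search", "fast repo-wide code search",
                            run=lambda query: f"results for {query}"))
print(harness.tool_docs())
```

point 4 then becomes: run evals against this harness, swap in a slightly better capability, and measure the delta.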

in this world there’s a lot of value for:
- the models that drive harnesses
- full agent products that curate a great model+harness pair
- the tooling/capability layer that plugs into the harness and makes money every time that capability is invoked


building agents and harnesses, prev @awscloud, phd cs @ temple

Viv
Tue Dec 02 20:44:21
We just helped a founder exit for $1,250,000 on @acquiredotcom.

Business was a bootstrapped SaaS.
Multiple: 4.7x profit.
Took about 65 days from listing to close.

Another life changing outcome.


Founder and CEO of @acquiredotcom. https://t.co/wRMIssDmhl has helped 100s of startups get acquired and facilitated $500m+ in closed deals.

Andrew Gazdecki
Tue Dec 02 20:36:21
Wenfeng must pull a Zucc and serve him some 清蒸鱼 (steamed fish) or 糖水 (sweet dessert soup) or mooncakes or whatever the hell Liangs eat in Mililing village
this would be such kino


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» (“That’s war.”) ®1

avatar for Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Tue Dec 02 20:35:03
Just got off the phone with a girl from Brazil.

She saw a problem while watching TV with her mom.

She started vibe coding an app to solve that problem.

In 6 months, she got 10,000+ users.

And made $100K USD in sales.

(that's a lot in Brazil, esp in 6 months)

Now she's launching an iOS app (where the users actually are), and expanding the business model.

The funny thing is, what she built is just a Brazil-version of a very popular app in the US - that didn't work in Brazil.

There is a HUGE opportunity to take successful US/English apps, and localize them to different countries.

Almost every day, I'm talking to founders doing this.

I'm gonna get her on the channel in the next month, lmk what questions you have about this!


https://t.co/zSf5Z2H78P https://t.co/ryMAyS77qn https://t.co/Gm6gdHaLgp On a mission to inspire 1B people to build stuff!

Pat Walls
Tue Dec 02 20:30:41
> without mentioning it once
V3.2 & Speciale are evaluated against Sonnet, 5-High, and 3.0-Pro, but not Opus or 5.1. I don't think they just wanted to claim fleeting primacy; we can see that those don't change the picture much.
It was finished at least a week ago.


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» (“That’s war.”) ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Tue Dec 02 20:27:24
This is solid - I would add that at some point past the paths Aaron mentions, it’s better to cold inbound than to get an intro

An attempted intro from someone perceived as low quality can (unfortunately) impact how your co is perceived 

I’ve invested in cold inbounds (and outbound!)


Partner @a16z and twin to @venturetwins | Investor in @gammaapp, @happyrobot_ai, @krea_ai, @tomaauto, @partiful, Salient, @scribenoteinc & more

Olivia Moore
Tue Dec 02 20:20:37