LogoThread Easy

Your complete partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads


RT @samarthg1911: @neuranne 's tiny experiment is among the best books i have read this year. Just read about learning in public. Turns out…


hypercurious :) founder @ness_labs • neuroscientist @KingsIoPPN • author of Tiny Experiments • personal science, systematic curiosity, experimental thinking ꩜⋆✦

Anne-Laure Le Cunff
Mon Dec 15 16:54:14
open interesting question on models adapting to harnesses + thoughts on something like a “HarnessBench”

1. are smarter models better or worse at transferring to new harnesses? Saw recent results where Opus in the CC harness had a much bigger jump than Sonnet in the CC harness

2. What’s the gap between in-context adaptation to a new harness and finetuning?

HarnessBench:
- this idea exists somewhat with how Terminal Bench reports results but basically we need more evals around harnesses not just models
- we don’t have good metrics on model generalization across harnesses
- HarnessBench basically = an Eval of diverse tasks where we measure the mean performance of a harness across a basket of fixed models.  We also get per model+harness bench scores from this ofc
- I think it’s a valuable + fun question to explore, one that helps us peek into what about a harness helps some models and not others, and what’s generally just “good” to have in a harness

we need to hill climb harnesses + also get some interpretability.  and also it’s very possible that the RL finetuning is everything
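
The HarnessBench aggregation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the harness names, model names, and scores are made up, and the "spread" metric is one hypothetical way to look at per-model sensitivity to the harness, not something proposed in the thread. It also assumes every harness has been scored on the same fixed basket of models.

```python
from statistics import mean

# Hypothetical data: scores[harness][model] = task success rate in [0, 1].
# Every name and number here is invented for illustration.
scores = {
    "harness_a": {"model_x": 0.60, "model_y": 0.40},
    "harness_b": {"model_x": 0.50, "model_y": 0.70},
}


def harness_means(scores):
    """Mean score of each harness across the fixed basket of models."""
    return {h: mean(per_model.values()) for h, per_model in scores.items()}


def model_spread(scores):
    """Best-minus-worst score of each model across harnesses.

    A crude, assumed proxy for how much a model's result depends on the harness.
    Requires every harness to have a score for every model.
    """
    models = {m for per_model in scores.values() for m in per_model}
    return {
        m: max(s[m] for s in scores.values()) - min(s[m] for s in scores.values())
        for m in models
    }


if __name__ == "__main__":
    print(harness_means(scores))  # per-harness headline number
    print(model_spread(scores))   # per-model sensitivity to the harness
```

The nested dictionary also keeps the per model+harness scores available directly, for example `scores["harness_a"]["model_x"]`.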


building agents and harnesses, prev @awscloud, phd cs @ temple

Viv
Mon Dec 15 16:54:11
@a16z @MireloAI @appenz @cjsimongabriel @flwenz We're excited to partner with our friends at @IndexVentures to lead Mirelo's $41M seed round.

More about our investment on the @a16z blog ⬇️

https://t.co/4GdXFJmugy


Partner @a16z AI 🤖 and twin to @omooretweets | Investor in @elevenlabsio, @krea_ai, @bfl_ml, @hedra_labs, @wabi, @WaveFormsAI, @ViggleAI, @MireloAI

Justine Moore
Mon Dec 15 16:54:00
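RT @gkxspace: Google Antigravity isn't just for writing code; it can be used for many kinds of work.

You only need to master its 3 core concepts:

1) Artifacts (verifiable outputs)
It produces things you can understand and review:
- Task lists / plans
- Documents / diagrams / screenshots / screen recordings
-…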


Prompt Engineer, dedicated to learning and disseminating knowledge about AI, software engineering, and engineering management.

宝玉
Mon Dec 15 16:53:23
@a16z @MireloAI @appenz @cjsimongabriel @flwenz On Mirelo's platform, you can upload a video and get synced sound effects (and music) generated by AI.

This is obviously great for AI videos - but it's also SO good at "real" filmed footage.

Take this tennis video - I uploaded a muted version, Mirelo made the sounds 🤯



Justine Moore
Mon Dec 15 16:47:33
Thank you @andrewrsorkin for the time this morning on @SquawkCNBC - very exciting to launch @USTechForce to get America's best and brightest tech talent to help us modernize the federal government. Apply at https://t.co/9prbYAoXCL


Dir., Office of Personnel Management (previously, MP at a16z); Author of Secrets of Sand Hill Road; father of three amazing/crazy/beautiful girls.

Scott Kupor
Mon Dec 15 16:45:30