Thread Easy

Your all-in-one partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first: browse tweet threads

MCP servers can now deliver interactive UIs to AI Agents.

Anthropic, OpenAI, and mcp-ui just released MCP Apps.

Build data visualizations, complex forms, and rich interfaces that work across Claude, ChatGPT and other MCP hosts.

MCP Apps: https://t.co/XyPUBOFyoh
More such AI tools and projects in https://t.co/BvTc8nQQW5: get access to 100+ AI Agent, RAG, LLM, and MCP tutorials with open-source code, all for FREE.

Unwind AI
Tue Nov 25 16:30:06
#DailyPick If you use Better Auth, give Better Auth UI a try (1.2k stars on GitHub); it integrates seamlessly with Better Auth.

https://t.co/hqdD7GamzR

✦ Indie Hacker / AI Maker / Full Stacker ✦ Founder of https://t.co/HDnzUGieag (DR 75) & https://t.co/t6DoP7ODNe & https://t.co/YuOLvgIStF & https://t.co/ZvHVC3guiZ

Justin3go
Tue Nov 25 16:29:55
RT @Andercot: Off-hand thoughts on robotics:
- Possibly the largest market in history
- Robots will transact on blockchains exclusively
- M…

Partner @a16z investing in American Dynamism 🇺🇸. Frontier technologies, markets, & culture.

Oliver Hsu
Tue Nov 25 16:28:53
RT @AskPerplexity: 🚨The White House just launched the Genesis Mission — a Manhattan Project for AI

The Department of Energy will build a n…

Partner @a16z investing in American Dynamism 🇺🇸. Frontier technologies, markets, & culture.

Oliver Hsu
Tue Nov 25 16:28:40
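I'm posting a collection of FLUX.2 model tests under this thread.

Designer-friendly: FLUX.2 supports very precise color-value control.

Because its LLM component is weaker, it definitely falls short of Banana on world knowledge and multimodal reasoning.

In a quick test, consistency was a bit worse than Banana's: I gave it 6 images, and the generated result was missing one piece of furniture.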

Trying text-to-image. Prompt: A square-shaped vacuum-sealed transparent plastic bag tightly packed with vibrant pink peonies, arranged in a compact and visually rich way, petals showing layered natural textures, deep but realistic shades of pink, soft studio lighting, sharp plastic reflections, black background, high contrast and detail, hyperrealistic editorial product photography.

歸藏(guizang.ai)
Tue Nov 25 16:28:17
my tldr: more evals need to be agent-first, not model-first, where agent = model + harness

in practice it’s basically impossible, and usually not useful, to eval a model without its harness. even if you could, what would it really be measuring?

some notes:
1. harnesses today provide tons of value on top of the model. companies like @FactoryAI (Droid) and @AmpCode specialize in creating delightful, performant harnesses optimized for coding across models. you can sell a harness as your product: “HaaS = harness as a service”

2. models today are trained with components of their harness “in the loop”; this includes their tool descriptions and (I think) also behaviors for when/how to do interleaved thinking

3. fixing a harness to make evals across models “fair” is not fair. models are non-fungible in their harness: fixing the harness isn’t standardizing, because we don’t have the interpretability tools to understand how each harness affects each model. we just use evals as a proxy for this. fixing the harness implies we know model perf is constant across harnesses, which it’s not

evals should measure the ability to do a Task. why would you decouple the optimal setting needed to elicit good behavior from the model itself?

like, we could measure “what happens if I give this model the worst possible conditions for this task: does it struggle, or does it do it perfectly?”… but why?! although cool and interesting, it’s not practically useful today. the goal is to design systems that do work well, and a model is a single (though the most important) component of that system

more systems engineering in evals is a good thing, even as models get smarter and need less guidance in their harness. strong believer that the harness will never truly go away; we may simply rename it

building agents and harnesses, prev @awscloud, phd cs @ temple

Viv
Tue Nov 25 16:25:47
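Viv's point 3 argues that fixing one harness for every model can distort a cross-model comparison. A toy sketch of how that happens, assuming a simple table of success rates per (model, harness) pair: if each model does best in a different harness, ranking models inside one shared harness can reverse the order you get when each model is scored in its best harness (the "agent-first" view). The function names, model names, harness names, and scores are all hypothetical and made up for illustration; they are not real benchmark results.

```python
"""Hypothetical illustration: model-first vs agent-first eval rankings.

All model names, harness names, and scores are invented for illustration;
they are not real benchmark results.
"""

from typing import Dict, List

# scores[model][harness] = task success rate for that (model, harness) pair
Scores = Dict[str, Dict[str, float]]


def rank_fixed_harness(scores: Scores, harness: str) -> List[str]:
    """Model-first: rank every model inside one shared, fixed harness."""
    return sorted(scores, key=lambda m: scores[m][harness], reverse=True)


def rank_agent_first(scores: Scores) -> List[str]:
    """Agent-first: rank each model by the score of its best harness."""
    return sorted(scores, key=lambda m: max(scores[m].values()), reverse=True)


def best_harness(scores: Scores, model: str) -> str:
    """Return the harness that elicits the highest score from a model."""
    return max(scores[model], key=scores[model].get)


if __name__ == "__main__":
    # Hypothetical numbers: each model prefers a different harness.
    toy_scores: Scores = {
        "model_a": {"harness_x": 0.70, "harness_y": 0.55},
        "model_b": {"harness_x": 0.60, "harness_y": 0.80},
    }
    print(rank_fixed_harness(toy_scores, "harness_x"))  # ['model_a', 'model_b']
    print(rank_agent_first(toy_scores))  # ['model_b', 'model_a']
    print(best_harness(toy_scores, "model_b"))  # harness_y
```

With the fixed `harness_x`, `model_a` looks stronger; once each model is paired with its best harness, the order flips. That is the sense in which a fixed harness only standardizes the comparison if model performance were the same across harnesses.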