LogoThread Easy

Your all-purpose partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore


RT @heathercmiller: ANNOUNCING 🥁🥁🥁: 
the inaugural ACM Conference on AI and Agentic Systems! 🚀
(ACM CAIS 2026)

Agents are now everywhere,…


CTO at @Databricks and CS prof at @UCBerkeley. Working on data+AI, including @ApacheSpark, @DeltaLakeOSS, @MLflow, @DSPyOSS. https://t.co/nmRYAKFsWr

Matei Zaharia
Tue Dec 23 17:40:04
This AniX paper, a Microsoft–Fudan collaboration, is interesting; here's an AI-assisted breakdown:

What would it feel like to drop a character of your own design into a 3D world and control it like a game character?

A team from Microsoft Research and Fudan University built a system called AniX that does essentially that.

Give it a 3D scene (the kind generated with 3DGS), give it a character, then tell it in natural language to "run forward" or "play the guitar," and it generates the corresponding video.

The four core capabilities:

1. Scene and character consistency
In the generated video, the character's appearance and the scene both stay faithful to what you provided.

2. A rich action repertoire
Not just walking and running, but gestures (waving, saluting) and even object interactions (making a phone call, playing the guitar).

The training data contains only 4 basic locomotion actions, yet the model generalizes to 142 unseen actions. (??!)

3. Continuous interaction
You can issue instructions turn by turn; each generated clip continues from the previous one, staying coherent. It feels like genuinely exploring a world.

4. Controllable camera
This design is clever. Instead of controlling the camera through a complicated mathematical encoding, it renders the desired camera path directly in the 3DGS scene and feeds that rendering in as a condition.

In effect, the model is shown a "reference video" that tells it how the camera should move.
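The turn-by-turn interaction can be pictured as a chunked autoregressive loop: each new clip is conditioned on the tail of the previous one. This is a toy sketch of that control flow only; `generate_clip` is a hypothetical stand-in for the actual video model, which is not public here.

```python
from typing import List

def generate_clip(context: List[str], instruction: str, n_frames: int = 4) -> List[str]:
    """Hypothetical stand-in for the video model: produce n_frames frames
    that start from the last context frame, tagged with the instruction."""
    start = context[-1] if context else "scene_frame_0"
    return [f"{start}|{instruction}|f{i}" for i in range(n_frames)]

def interactive_session(instructions: List[str]) -> List[List[str]]:
    """Chunked autoregressive generation: each clip is conditioned on
    the end of the previous clip, so successive turns stay continuous."""
    context: List[str] = ["scene_frame_0"]
    clips: List[List[str]] = []
    for instr in instructions:
        clip = generate_clip(context, instr)
        clips.append(clip)
        context = clip[-1:]  # carry the last frame forward as context
    return clips
```

The key property is that clip *i* begins where clip *i−1* ended, which is what keeps a multi-turn session coherent.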

The core idea of the whole system is conditional autoregressive video generation.

The training data source is refreshingly down-to-earth: GTA-V gameplay recordings.

They recorded 2,000+ clips, each containing a single action, and then did three things:

① Segment the character out
② Inpaint the background (with an AI inpainting tool)
③ Label the action

Each character is represented by images from four viewpoints (front, back, left, right), so the model can recognize it from different angles.
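The three preprocessing steps can be sketched as a tiny pipeline. Everything here is illustrative: `segment_character` and `inpaint` are hypothetical stand-ins for the matting and AI-inpainting tools the team actually used, operating on toy pixel sets rather than images.

```python
def segment_character(frame: set) -> tuple:
    """Stand-in for a video matting model: split 'pixels' into
    character vs. background by a naming convention (toy data)."""
    char = {p for p in frame if p.startswith("char")}
    return char, frame - char

def inpaint(background: set, full_size: int) -> set:
    """Stand-in for an AI inpainting tool: fill the hole the
    character left behind so the scene is complete again."""
    return background | {f"filled_{i}" for i in range(full_size - len(background))}

def preprocess(frame: set, action: str) -> dict:
    """① segment the character, ② inpaint the background, ③ attach the action label."""
    char, bg = segment_character(frame)
    return {"character": char, "background": inpaint(bg, len(frame)), "action": action}
```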

The model architecture is based on HunyuanCustom (13B parameters), trained with Flow Matching.
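For readers unfamiliar with Flow Matching, here is a minimal numpy sketch of the linear (rectified-flow) variant: interpolate between noise and data, and regress the model's predicted velocity toward the constant path velocity. The paper's exact formulation may differ; the shapes and sampler details here are assumptions.

```python
import numpy as np

def interpolate(x0: np.ndarray, x1: np.ndarray, t: float) -> np.ndarray:
    """Linear path x_t = (1 - t) * x0 + t * x1 from noise x0 to data x1."""
    return (1.0 - t) * x0 + t * x1

def flow_matching_loss(pred_velocity: np.ndarray, x0: np.ndarray, x1: np.ndarray) -> float:
    """Regress the model's velocity prediction at x_t toward the
    ground-truth velocity of the linear path, which is simply x1 - x0."""
    target = x1 - x0
    return float(np.mean((pred_velocity - target) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))   # Gaussian noise latent
x1 = rng.standard_normal((4, 4))   # "video" data latent (toy)
x_t = interpolate(x0, x1, t=0.3)   # point where the model would be queried
```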

How the various conditions are fed to the model:
① Scene and character masks are fused directly into the noise
② Text instructions and multi-view character images are concatenated into the sequence
③ Different positional encodings distinguish these inputs
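The three conditioning routes can be illustrated with toy numpy tensors. This is a shape-level sketch only: the real model fuses conditions through learned projections and structured positional encodings, and all dimensions below are made up.

```python
import numpy as np

T, H, W, D = 2, 4, 4, 8  # frames, height, width, token dim (toy sizes)
rng = np.random.default_rng(0)

noise = rng.standard_normal((T, H, W, D))
scene = rng.standard_normal((T, H, W, D))      # rendered 3DGS camera-path video
mask = rng.random((T, H, W, 1)) > 0.5          # per-frame character mask

# ① fuse the scene render and mask into the noisy latent (naive additive mix here)
fused = noise + scene + mask.astype(float)

# ② flatten video latents and concatenate text tokens and 4 multi-view image tokens
video_tokens = fused.reshape(-1, D)
text_tokens = rng.standard_normal((6, D))      # tokenized instruction (toy length)
view_tokens = rng.standard_normal((4, D))      # one token per character viewpoint
sequence = np.concatenate([video_tokens, text_tokens, view_tokens], axis=0)

# ③ distinct positional/type ids so the model can tell the inputs apart
pos_ids = np.concatenate([
    np.zeros(len(video_tokens)),       # type 0: video latents
    np.ones(len(text_tokens)),         # type 1: text instruction
    2 * np.ones(len(view_tokens)),     # type 2: multi-view character images
])
```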

An interesting finding:

Fine-tuning the pretrained model on simple locomotion data not only didn't hurt its generalization, it actually improved action quality.

It feels a lot like LLM post-training: fine-tuning doesn't relearn knowledge, it adjusts the "speaking style."

They measured visual quality with the WorldScore benchmark.

AniX beats existing video generation models and dedicated world models on nearly every metric.

The action-control success rates are especially telling:

① Basic locomotion actions: 100% success
② 142 novel actions: 80.7% success

For comparison, other models stay below 50% on the basic actions; some manage only 3.3%.

Character consistency is measured with DINOv2 and CLIP scores; AniX reaches 0.698 and 0.721 respectively, clearly above other methods.
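A DINOv2- or CLIP-style consistency score is, in spirit, an average cosine similarity between a reference character embedding and per-frame embeddings. This sketch assumes that recipe and uses random vectors in place of real encoder outputs; the benchmark's exact protocol may differ.

```python
import numpy as np

def consistency_score(ref_emb: np.ndarray, frame_embs: np.ndarray) -> float:
    """Mean cosine similarity between one reference embedding (shape [d])
    and per-frame embeddings (shape [n, d]); higher means more consistent."""
    ref = ref_emb / np.linalg.norm(ref_emb)
    frames = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    return float(np.mean(frames @ ref))
```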

A few key design choices

Multi-view character input genuinely helps.

They compared one, two, and four viewpoints; character-consistency scores rise as viewpoints are added.

Character masks also matter.

With per-frame mask information, the model can better separate the dynamic character from the static scene.

Visual conditioning clearly helps long-horizon generation.

Remove the 3DGS scene condition or the multi-view character condition, and quality degrades noticeably over time.

The original model takes 121 seconds to generate 93 frames at 360p (on one H100).

After DMD2 distillation to a 4-step version, it takes only 21 seconds, with almost no quality loss.

A few thoughts:

The value of game data.
Games like GTA-V supply lots of structured training data: characters, scenes, and actions come ready-made.

This may be an underrated data source.

Mixed training strategy.

They later added 400 real-human videos, using "rendered" and "real" tags to separate game footage from real footage.

This simple labeling is enough for the model to learn to distinguish the two visual styles, which is neat.

Where the generalization comes from.

Training on only 4 basic actions yet performing 142 novel ones suggests the pretrained model already holds rich knowledge of human motion.

Fine-tuning merely activates and aligns that knowledge.

The camera-control idea.

Rendering a reference video directly is more intuitive and more controllable than encoding the camera as an abstract mathematical representation.

This "what you see is what you get" design philosophy is worth borrowing.

The paper doesn't state its limitations explicitly, but some are visible:

The training data is still small; 2,000+ clips isn't much for a task this complex.

The success rate on object-interaction actions (80.7%) is decent but leaves room to grow.

Long-horizon generation has improved, but the charts show quality still degrading over time, likely a general weakness of autoregressive models.

The scene must be in 3DGS format, which remains a hurdle for ordinary users.

It can be produced with tools like Marble, but that adds another dependency.

Overall, AniX is a big step forward for controllable character-animation generation.

No massive dataset or elaborate design needed: with the right method, small data can deliver big results.


原论文地址 https://t.co/0RSMzZPuon

向阳乔木
Tue Dec 23 17:33:25
Building is so damn fun right now that I do that instead of gaming every night.

The age of AI is exciting af and I feel inspired - looots of ideas coming in


Founder 📈 @parqetapp Host of 🎙 @minimalempires Prev. @stripe

Sumit Kumar
Tue Dec 23 17:29:07
This prompt is from @azed_ai. Product photo is @Mascobot’s a16z GPU rig.

Create a 3×3 grid in 3:4 aspect ratio for a high-end commercial marketing campaign using the uploaded product as the central subject.

Each frame must present a distinct visual concept while maintaining perfect product consistency across all nine images.

Grid Concepts (one per cell):

1. Iconic hero still life with bold composition

2. Extreme macro detail highlighting material, surface, or texture

3. Dynamic liquid or particle interaction surrounding the product

4. Minimal sculptural arrangement with abstract forms

5. Floating elements composition suggesting lightness and innovation

6. Sensory close-up emphasizing tactility and realism

7. Color-driven conceptual scene inspired by the product palette

8. Ingredient or component abstraction (non-literal, symbolic)

9. Surreal yet elegant fusion scene combining realism and imagination

Visual Rules:
Product must remain 100% accurate in shape, proportions, label, typography, color, and branding
No distortion, deformation, or redesign of the product
Clean separation between product and background

Lighting & Style:
Soft, controlled studio lighting
Subtle highlights, realistic shadows
High dynamic range, ultra-sharp focus
Editorial luxury advertising aesthetic
Premium sensory marketing look

Overall Feel:
Modern, refined, visually cohesive
High-end commercial campaign
Designed for brand websites, social grids, and digital billboards
Hyperreal, cinematic, polished, and aspirational


Partner @a16z AI 🤖 and twin to @omooretweets | Investor in @elevenlabsio, @krea_ai, @bfl_ml, @hedra_labs, @wabi, @WaveFormsAI, @ViggleAI, @MireloAI

Justine Moore
Tue Dec 23 17:26:37
2026 and 2027 will be the years I'm testing, applying, and adjusting my 3-year plan - hoping to reap the big rewards in 2028. 

Will I become a millionaire then? We shall see.


I build stuff. On my way to making $1M 💰 My projects 👇

Florin Pop 👨🏻‍💻
Tue Dec 23 17:23:40
Is AI delivering real productivity gains? What's the ROI so far? Hot takes abound, but data have been scarce.

@noamseg and I took it upon ourselves to find out what’s actually happening on the ground by running one of the largest independent, in-depth surveys on how AI is affecting productivity for tech workers (1,750 respondents). We surveyed product managers, engineers, designers, founders, and others about how they’re using AI at work.

tl;dr: AI is overdelivering.

1. 55% of respondents say AI has exceeded their expectations, and almost 70% say it’s improved the quality of their work.
2. More than half of respondents said AI is saving them at least half a day per week on their most important tasks. We’ve never seen a tool deliver a productivity boost like this before.
3. Founders are getting the most out of AI. Half (49%) report that AI saves them over 6 hours per week, dramatically higher than for any other role. Close to half (45%) also feel that the quality of their work is “much better” thanks to AI.
4. Designers are seeing the fewest benefits. Only 45% report a positive ROI (compared with 78% of founders), and 31% report that AI has fallen below expectations, triple the rate among founders.
5. Engineers have accepted AI as a coding partner and now want it to handle the more boring (but necessary) work of building products: documentation, code review, and writing tests.
6. n8n is currently dominating the agent landscape, though actual adoption of agentic platforms in 2025 has been slow.
7. A whopping 92.4% of respondents report at least one significant downside to using AI tools. There’s definitely room for improvement.

Here's the full report: https://t.co/2ra234FE8e

Inside:
- What exactly is AI doing for people, function by function?
- Where are the biggest opportunities for AI startups?
- Which AI tools have product-market fit?
- The downsides of AI productivity
- Bonus: The state of agentic AI: promise outpaces practice
- What this all means
- Appendix: Who took this survey


Deeply researched product, growth, and career advice

Lenny Rachitsky
Tue Dec 23 17:23:39