Explore | Thread Easy - Unroll Twitter Threads | Read, Summarize, and Create

but also, that's partially a style/gimmick. This gimmick takes real brains to execute, but it's an arbitrary design choice, not a product of raw capability jump overflowing into "huh, this teortaxes guy ain't that smart". It still sometimes objects inanely. A bit smaller step.

I'm surprised how small Opus feels sometimes. Just like Sonnet, I can unintentionally collapse it into parroting (which it can be rescued from). I suspect it relies on CoT preservation though. Need to retest.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

Thu Dec 11 22:56:53

after playing around with 5.2 more, I can say yeah, it's smart. It has "big model smell" in that it's almost like it keeps a parallel context with its own larger perspective of the problem. Flexible, but unbreakable. It's more than "biases". It thinks around rendered tokens.

but also, that's partially a style/gimmick. This gimmick takes real brains to execute, but it's an arbitrary design choice, not a product of raw capability jump overflowing into "huh, this teortaxes guy ain't that smart". It still sometimes objects inanely. A bit smaller step.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

Thu Dec 11 22:54:53

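Standard: $1.75 input, $14 output. Pro: $21 input, $168 output. Overall, a 40% price increase over GPT 5.1. So strong. So expensive. This year's AI trends: text models got pricier (GPT 5.2), and image models got pricier (banana Pro). Will next year's AI trend be video models getting pricier?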

For talk about silicon-based AI, read the organic Orange.

Orange AI

Thu Dec 11 22:51:42
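
A rough sketch of the arithmetic behind the prices quoted in the post above. It assumes the figures are in USD per one million tokens; the post itself does not state the unit, though another post in this feed gives $168 per million output tokens for Pro. The function names and the example token counts are hypothetical. The "implied previous price" takes the stated 40% increase at face value and divides by 1.40.

```python
# Prices as quoted in the post, ASSUMED to be USD per one million tokens.
PRICES_PER_MILLION = {
    "standard": {"input": 1.75, "output": 14.00},
    "pro": {"input": 21.00, "output": 168.00},
}

TOKENS_PER_MILLION = 1_000_000


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted prices."""
    price = PRICES_PER_MILLION[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / TOKENS_PER_MILLION


def implied_previous_price(current: float, increase: float = 0.40) -> float:
    """Back out the earlier price implied by a fractional increase."""
    return current / (1.0 + increase)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for tier in PRICES_PER_MILLION:
        print(tier, round(request_cost(tier, 10_000, 2_000), 4))
    # Prices implied for the previous model if the increase is exactly 40%.
    print(round(implied_previous_price(1.75), 2))
    print(round(implied_previous_price(14.00), 2))
```

At these rates the Pro tier is 12 times the standard tier for both input and output, so any fixed workload costs 12 times as much on Pro.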

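Sam is ecstatic: GPT 5.2, OpenAI's year-end answer sheet, is officially out. Don't be fooled by the version number; this is OpenAI's big year-end move. The official positioning: the most capable model so far for professional knowledge work. Performance is up substantially, and so is the price, by 40%. With costs broadly trending down, a price increase usually needs something to back it up. So what backs up this model?

A while ago OpenAI designed GDPval, an evaluation inspired by gross domestic product (GDP) as a key economic indicator. It contains 1,320 professional tasks covering 44 occupations selected from the top 9 industries contributing to US GDP. Tasks require submitting real work products, such as sales presentations, accounting spreadsheets, emergency-room shift schedules, manufacturing flow diagrams, or short videos.

When GDPval was first released, Claude Opus 4.1 led by a wide margin with a score of 47.6. But today GPT-5.2 pushed the score straight to above 70%.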

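Coding ability: SWE-Bench Pro is a rigorous evaluation of real-world software engineering. Unlike SWE-bench Verified, which only tests Python, SWE-Bench Pro tests four languages and aims to be more contamination-resistant, more challenging, more diverse, and more industrially relevant. GPT-5.2 Thinking set a new state of the art of 55.6% on SWE-Bench Pro, ahead of Claude Opus 4.5 at 52% and Gemini 3 Pro at 43.3%.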

Orange AI

Thu Dec 11 22:50:05

GPT-5.2 is now live in Windsurf! Available for 0x credits for a limited time (paid and trial users).

The version bump undersells the jump in intelligence:
- Biggest leap for GPT models in agentic coding since GPT-5
- SOTA coding model at its price point
- Default in Windsurf

Download the latest versions of Windsurf and Windsurf Next to try it out: https://t.co/E6JgVot67u

Windsurf

Thu Dec 11 22:49:52

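GPT-5.2 is out! Here is what's new in this release. First, three versions were released this time:
- GPT-5.2 Instant (chat): a fast, efficient assistant for everyday work
- GPT-5.2 Thinking: the first choice for deep work, suited to complex tasks
- GPT-5.2 Pro: the smartest and most trustworthy option

So does GPT-5.2 Pro look like the best one? That's right, and its price is the "best" too: one million output tokens cost $168..... On coding, SWE-bench Verified reached 80.0%, and the hallucination level dropped further. On recall with a 256K context, 4 needles reach close to 100% recall, and 8 needles reach roughly 77%. Hands-on coding tests coming shortly!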

Detailed performance/1

karminski-牙医

Thu Dec 11 22:42:24

Explore

Newest first — browse tweet threads
