Thread Easy

Your all-in-one partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads

Heard too many wrong (IMHO) takes on DeepSeek R1's "RL" vs SFT at #NeurIPS2025 yesterday!🤦♂️ With the degenerate MDP that R1 uses, where it splits the verifier reward for the solution equally among all intermediate+solution tokens, R1's RL really is a filtered/iterative form of SFT! Come chat with us at LAW and ForLM workshops on Sunday.. 👇
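
One way to read the claim above, as a rough sketch under assumptions the tweet does not spell out: a binary verifier reward R(y) in {0, 1}, every token of a sampled response y credited with that same reward, and no baseline, clipping, or KL term. The symbols x (prompt), y (sampled response), and π_θ (policy) are notation introduced here for illustration.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
      \Big[ \sum_{t=1}^{|y|} R(y)\, \nabla_\theta \log \pi_\theta(y_t \mid x, y_{<t}) \Big]
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
      \big[ R(y)\, \nabla_\theta \log \pi_\theta(y \mid x) \big]
```

With R(y) in {0, 1}, the right-hand side is the log-likelihood (SFT) gradient taken only over the sampled responses the verifier accepts, and resampling from the updated policy is what makes it iterative. A baseline or reward normalization, if used, would change the per-response weights but not the uniform per-token credit.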

AI researcher & teacher @SCAI_ASU. Former President of @RealAAAI; Chair of @AAAS Sec T. Here to tweach #AI. YouTube Ch: https://t.co/4beUPOmf6y Bsky: rao2z

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
Fri Dec 05 14:29:51
It's a bit embarrassing for the rest of the world (most of all Americans who fantasize about nuclear a lot) that Russian Rosatom, coasting on Soviet stack (30+ year old), personally founded by Vlad Putin, is still the largest international player in nuclear power

We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Fri Dec 05 14:28:19
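A new systematic review of RAG (retrieval-augmented generation) in enterprise applications is out.

Enterprises still have a long way to go before they can use RAG well; don't assume RAG systems are production-ready.

The gap between lab prototypes and real enterprise deployments is wider than most people think.

This systematic literature review analyzes 77 high-quality studies on RAG + LLM systems for enterprise knowledge management and document automation, covering publications from 2015 through mid-2025.

The findings reveal a concentrated technology stack:

- 63.6% of implementations use GPT models.

- 80.5% rely on standard retrieval frameworks such as FAISS or Elasticsearch.

- 66.2% favor cloud infrastructure for scaling.

But the "lab-to-market" gap remains large.

Although retrieval and classification tasks often use rigorous validation methods such as k-fold cross-validation (93.5%), generative evaluation relies mainly on static hold-out datasets because of computational constraints.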
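
To make the contrast between the two validation styles concrete, here is a minimal sketch in plain Python. The function names `k_fold_indices` and `hold_out_indices` are hypothetical helpers written for this illustration and are not taken from the review.

```python
def k_fold_indices(n_items, k):
    """Yield (train, test) index lists so every item is tested exactly once."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def hold_out_indices(n_items, test_fraction=0.2):
    """Return one fixed (train, test) split: the last items are held out."""
    n_test = max(1, round(n_items * test_fraction))
    split = n_items - n_test
    return list(range(split)), list(range(split, n_items))


folds = list(k_fold_indices(10, 5))
train, test = hold_out_indices(10, 0.2)
```

k-fold trains and scores the system k times, which is one plausible reason it is cheap enough for retrieval and classification but costly when every evaluation run involves generation.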

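Only 13% of the studies deployed a RAG prototype in a real enterprise environment.

Five challenges recur:

- Hallucination control (48.1% of studies).

- Data privacy and security (37.7%).

- Latency and scalability (31.2%).

- Domain adaptation (23.4%).

- Difficulty measuring business impact (15.6%).

Technical metrics are well covered: precision, recall, and accuracy appear in 80.5% of the studies, and ROUGE and BLEU in 44.2%.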
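
For reference, the three most common metrics named above can be sketched as follows. This is an illustrative example only; the document IDs, labels, and function names are made up here and do not come from the review.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical query: four documents retrieved, three actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

These scores describe the retriever or classifier in isolation; they say nothing about whether the generated answer helped the end user, which is the gap the thread goes on to describe.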

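However, human-in-the-loop evaluation appears in only 19.5% of the studies, and real-world case studies that measure end-user outcomes remain rare.

Fine-tuning on in-domain data typically yields a 10-20% gain in factuality over zero-shot approaches.

Hybrid retrieval (combining dense vectors with knowledge graphs) appears in 23.1% of the studies and generally improves explainability and precision.

The study offers a data-driven roadmap for bridging the gap between academic prototypes and production systems.

The technology works in controlled settings, but privacy-preserving retrieval, sub-100 ms latency, and business-centric evaluation frameworks remain open challenges for enterprise deployment.

🔖 Report link: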

Believing is seeing

Yangyi
Fri Dec 05 14:28:12
RT @AndrewYNg: Separate reports by the publicity firm Edelman and Pew Research show that Americans, and more broadly large parts of Europe…

αι hypnotist ☰ 𝓐𝓼𝓹⦂𝓻⦂𝓃𝓰 𝓫𝓪𝓼𝓮 𝓶𝓸𝓭𝓮𝓵 ☲ post-academic ☴ nom de 🪶 ≠ anon

αιamblichus
Fri Dec 05 14:24:16
It's so weird how the media refers to Dearborn, MI like it's some small town in the USA. Dearborn is Detroit. It's like saying Brooklyn instead of NYC. The demographic shift happening is in Detroit but the media calls it Dearborn. Why do you think that is?

GP @a16z — Building American Dynamism 🇺🇸 — Anthropologist — Formerly Founder/CEO @OpenDNS — Lokah Samastah Sukhino Bhavantu

David Ulevitch 🇺🇸
Fri Dec 05 14:19:55