Thread Easy
Explore

Newest first — browse tweet threads


Pangram claims to be a highly accurate AI detector with a false positive rate of 1 in 10,000. Let's take this at face value and see what it means.

The claimed false positive rate (the chance of incorrectly detecting human-written text as AI-generated) seems very impressive. Definitely an improvement over the first generation of AI detectors. So how useful is Pangram? Let's take a concrete application: is it a viable solution to the problem of college students using AI in violation of course policies?

Suppose every instructor started using an AI detector on all student submissions. I'd estimate that students submit 500–1,000 written works over a 4-year education (!): 30+ courses × ~5 assessments per course × multiple independent problems per assessment. If each of these were run through an AI detector with an FPR of 1/10,000, you'd have 5–10% of your student body falsely accused of cheating at some point during their time at university (the quick calculation below makes this explicit).
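
To make the arithmetic explicit: if each screening is independent with false positive rate p, the chance a given student is flagged at least once across n submissions is 1 − (1 − p)^n. A quick check of the 5–10% claim:

```python
# Chance of at least one false positive across n independent screenings,
# each with per-document false positive rate p.
p = 1 / 10_000              # Pangram's claimed FPR

for n in (500, 1_000):      # estimated submissions over a 4-year degree
    flagged = 1 - (1 - p) ** n
    print(f"n={n}: {flagged:.1%}")

# n=500: 4.9%
# n=1000: 9.5%
```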

So now you have three options:

* Continue to treat cheating as the serious violation that it is, and initiate disciplinary proceedings whenever the AI detector flags suspected cheating. I hope it's obvious that this is not really viable. Even if we assume that most innocent students will be exonerated, the anxiety and wasted time are unfathomable.

* Apply a small penalty instead of treating it as a serious violation. This normalizes cheating and is likely to backfire.

* Use AI detection as only one signal and gather additional evidence of an integrity violation (Pangram itself recommends this). But the problem is that every way of doing this that I'm aware of either doesn't work or can only be done once you've already initiated disciplinary proceedings, which brings you back to option 1.

There are many other downsides to the systematic use of AI detection.

* Students who know what they are doing can easily evade AI detection by paraphrasing their text either manually or using automated tools. If Pangram (or any other specific tool) starts to be adopted on a much bigger scale, the evasion tools will be incentivized to get better as well, specifically by training on the outputs of Pangram.

* While simply offloading an exercise to AI of course fails to achieve learning goals, there may be many healthy ways to use AI depending on the course and activity. AI detection will make students uncomfortable with all of these, since they likely increase the risk of false positives.

If instructors treat AI as the problem, I doubt there is any solution. The actual problem is that our testing practices aren't that effective at assessing student mastery and engagement with learning. We should look to alternative assessment practices such as complementing written work with oral exams and assignment sequences where students build on their work throughout the semester. Once I started deploying these, I realized that they bring pedagogical benefits far beyond deterring AI misuse!

This is not to say that AI detectors are useless. Pangram recently published an analysis showing concerning levels of AI use in ICLR reviews. This is a good application of AI detection because it isn't about accusing individuals but about the aggregate, so it doesn't require a very low false positive rate to be useful.


Princeton CS prof and Director @PrincetonCITP. Coauthor of "AI Snake Oil" and "AI as Normal Technology". https://t.co/ZwebetjZ4n Views mine.

Arvind Narayanan
Tue Dec 09 14:59:50
I'm about to rent an apartment-complex room through Ziru, a master bedroom with a private bathroom 😂 The only catch is that my flatmates would be two women, which is a bit awkward, so I'll need to buy a small washing machine for the private bathroom. Urban-village housing is just too run-down; I'm done with urban-village apartments. In an apartment-complex flatshare the noise mostly comes from your flatmates, but renting a whole two- or three-bedroom place by myself feels too extravagant. The real problem is not having a partner; with one, none of this would be an issue.


Programmer. Shipped 2 iOS apps, a few obscure browser extensions, and a few mini programs. Into anime, manga, light novels, and web novels. Hoping to one day become a professional stay-at-home "home security guard."

Plusye
Tue Dec 09 14:59:21
RT @ConradBastable: In the 2010s Tech world we had a series of startups who copied Google’s corporate culture & policies out of a misplaced…


We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization. @deepseek_ai stan #1, 2023–Deep Time «C’est la guerre.» ®1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Tue Dec 09 14:57:52
“The urge to save humanity is almost always a false-face for the urge to rule it.” 
H. L. Mencken

Beware of high priests in search of new dogmas. I’ve always been partial to old dogmas. The people who designed the old ones are already dead.


Co-Founder, American Dynamism. General Partner @a16z. Catholic. Mother. American. 🇺🇸 🚀💪

Katherine Boyle
Tue Dec 09 14:57:41
Today's chapters of 《无限聊斋》 (Infinite Liaozhai) are up; take a look if you're interested:
Chapter 29: If You Don't Want to Die, Kneel and Call Me Master
Chapter 30: Liudao Liushuang: Pojun Tuo!
Chapter 31: Hello, I'm Tang Xiaotian; Xiao as in "spring dawn," Tian as in "honey-sweet"
Chapter 32: The Pro Gamer's Four-Piece Dating Kit: Stroll the Old Street, Watch the Lanterns, Hold Hands, Talk About Dreams
Chapter 33: Hey, You Actually Snore in Your Sleep

I originally wanted to write a supernatural story, but from chapter two onward nobody was reading (the read-through rate crashed to under 20%), so I had no choice but to pivot toward a romance-oriented story (handing the male lead a heroine girlfriend who's easy to win over, and letting the two of them start dating...).

If you all enjoy it, I'll give the pair a vacation and let the romance run longer (a few extra chapters of technique, as a gift to you);
if few people are reading, I'm sending the male lead off to fight the Boss!


The protagonist won't just fight the Boss himself; he'll rally his friends to fight the Boss together... everyone fighting the Boss together...

Y11
Tue Dec 09 14:56:26
Beyond Vibe Coding - A Guide to AI-Assisted Development

A new book from Google engineering lead @addyosmani. Its goal is to correct the popular misconceptions around "Vibe Coding" and lay out a rigorous AI-assisted engineering framework for building production-grade software. I read it on O'Reilly's online platform; a PDF version can probably be found as well.

Core thesis: from "vibe coding" to "AI-assisted engineering"
1. What "Vibe Coding" is, and its limits
Andrej Karpathy once described a vision of the future: "I just look at, talk to, and run the code, mostly relying on copy-paste; as long as the 'vibe' feels right, that's enough." This is "Vibe Coding": a development style that leans on high-level prompts, favors rapid prototyping, and ignores low-level implementation details.

2. The "70% trap"
Addy argues that Vibe Coding can get you through 70% of the work at great speed, but the remaining 30% (production-grade delivery) is nearly impossible without solid engineering fundamentals.
  · Two steps forward, two steps back: fixing one bug creates two new ones, because the developer doesn't understand the logic of the AI-generated code.
  · Hidden costs: poor maintainability, security holes (such as leaked database credentials), and performance bottlenecks.
  · Diminishing returns: for beginners AI is a crutch, but senior engineers must move from "blind acceptance" to "rigorous review."

Conclusion: we must move from dabbling-style coding to AI-assisted engineering, combining AI's creativity with traditional engineering rigor (specs, tests, reviews).

Key methodology: the "engineering discipline" of AI-assisted development
The guide proposes a systematic approach to closing the gap between AI-generated code and production standards.

A. The "plan first, code later" principle
This is the most important paradigm shift. Don't ask the AI to write code directly; enforce spec-driven development instead (a minimal sketch of the two-phase flow follows this list).
  · Mini-PRD / SPEC.md: before writing any code, have the AI produce an architecture plan or a small product requirements document.
  · Plan mode: use the planning features of AI tools (such as Claude Code or Gemini CLI) to confirm the architectural path first, then move on to implementation.
  · Front-loaded correction: 90% of the time, the AI's initial proposal is overly complex; the planning phase lets you simplify it before any code exists.
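
A minimal sketch of that two-phase flow, assuming only a generic call_llm(prompt) helper (a placeholder of mine, not an API from the book): the model must deliver a plan for human approval before any code is requested.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: wire this to whatever chat-model API you use."""
    raise NotImplementedError

def plan_then_code(task: str) -> str:
    # Phase 1: spec only -- explicitly forbid code at this stage.
    plan = call_llm(
        "Write a mini-PRD for the task below: architecture, files to touch, "
        f"and test strategy. No code yet.\n\nTask: {task}"
    )
    print(plan)

    # Human gate: the plan is reviewed (and usually simplified) here.
    if input("Approve this plan? [y/N] ").strip().lower() != "y":
        raise SystemExit("Plan rejected; revise before generating code.")

    # Phase 2: implementation is constrained to the approved plan.
    return call_llm(f"Implement exactly this approved plan:\n\n{plan}")
```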

B. Context engineering
Prompt engineering is outdated; this is the era of context engineering. Treat the AI model as the CPU and the context window as memory, and optimize output by loading information dynamically (see the sketch after this list).
  · Dynamic assembly: don't paste code statically. Based on the task at hand, dynamically pull in the relevant code snippets, API docs, the full error stack, and the database schema.
  · Eliminating "context rot": as the conversation grows, irrelevant information starts to interfere with the AI; periodically summarize and prune stale context.
  · Visual context: pass in design files (Figma) or browser screenshots directly; a picture is worth a thousand words and sharply reduces the trial-and-error of front-end styling.
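
A minimal sketch of dynamic context assembly under a budget (all names are mine, and character counts stand in for tokens): sources are ranked by task relevance, and anything that would overflow the window is dropped rather than left to rot.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    label: str        # e.g. "error stack", "db schema", "api docs"
    text: str
    relevance: float  # task-specific score, however you compute it

def assemble_context(items: list[ContextItem], budget: int) -> str:
    """Pack the most relevant items first; skip whatever would overflow."""
    packed, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + len(item.text) > budget:
            continue  # drop whole items rather than truncating mid-snippet
        packed.append(f"### {item.label}\n{item.text}")
        used += len(item.text)
    return "\n\n".join(packed)
```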

C. Advanced prompting strategies (the three combine naturally, as in the example below)
  · Chain of thought: force the AI to show its reasoning before emitting code ("step one, analyze the bottleneck; step two, propose an index...").
  · Constraint-based prompting: state "negative constraints" explicitly, e.g. "no external libraries" or "must support IE11."
  · Role-playing: assign the AI an identity, e.g. "as a senior security auditor, review this code for SQL injection risks."
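
The three strategies compose into a single prompt. A sketch with wording of my own, not the book's:

```python
code_under_review = open("query_builder.py").read()  # hypothetical target file

prompt = "\n".join([
    # Role-playing: pin the persona down first.
    "You are a senior security auditor.",
    # Constraint-based prompting: explicit negative constraints.
    "Constraints: suggest no external libraries; target Python 3.10+.",
    # Chain of thought: demand reasoning before any code.
    "First walk through the query-construction logic step by step and name "
    "each SQL injection risk. Only then show the corrected code.",
    "",
    "Code to review:",
    code_under_review,
])
```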

Stack evolution: CLI agents and multi-agent orchestration
The book explores the future shape of the development environment in detail: a shift from IDE plugins to terminal agents and multi-agent systems (a minimal orchestration sketch follows this list).
  · CLI coding agents: tools such as Claude Code, Gemini CLI, or Aider live directly in the terminal. They are not just code-completion tools but autonomous actors that can carry out complex tasks (Git operations, running tests, reading and writing files).
  · Multi-agent orchestration:
    · Division of labor: a "planner agent" decomposes the task and hands it to a "coder agent" to implement, a "tester agent" to verify, and finally a "docs agent" to update the README.
    · Pipelining: like CI/CD, but with each stage driven by AI.
    · Sandboxes and rollback: because agents act autonomously, you must set up sandboxed environments and checkpoints so you can roll everything back with one click when the AI runs wild or makes a mistake.
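
A minimal sketch of the planner/coder/tester/docs division of labor around the same kind of placeholder call_llm (illustrative structure of mine, not the book's code):

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for any chat-model API

def plan(task: str) -> list[str]:
    steps = call_llm(f"Decompose into numbered implementation steps:\n{task}")
    return [s for s in steps.splitlines() if s.strip()]

def code(step: str) -> str:
    return call_llm(f"Implement this step; output code only:\n{step}")

def test(diff: str) -> bool:
    verdict = call_llm(f"Review this change. Answer PASS or FAIL:\n{diff}")
    return verdict.strip().startswith("PASS")

def run_pipeline(task: str) -> None:
    for step in plan(task):
        diff = code(step)
        if not test(diff):
            # checkpoint/rollback: stop before a bad change compounds
            raise RuntimeError(f"Rolling back failed step: {step!r}")
        call_llm(f"Update the README to reflect this change:\n{diff}")  # docs agent
```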

Production reality: trust and quality gates
While enjoying AI's efficiency gains, you must put strict quality gates in place (a sketch of the test-first loop follows this list).
  · Review the AI like a junior engineer: never trust AI-written code blindly.
  · Test-driven: have the AI write the test cases first (red), then write code until the tests pass (green), then refactor. This is the best guardrail for keeping AI code logically correct.
  · Security first: AI tends to produce code that "runs" but isn't safe (e.g. hard-coded secrets). Run dedicated security scans.
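
A minimal sketch of the red-green guardrail with pytest as the judge, again over a placeholder call_llm (illustrative, not from the book):

```python
import subprocess
from pathlib import Path

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for any chat-model API

def red_green(feature: str, max_rounds: int = 3) -> bool:
    # Red: tests come first and pin down the intended behavior.
    tests = call_llm(f"Write pytest tests (tests only) for: {feature}")
    Path("test_feature.py").write_text(tests)

    feedback = ""
    for _ in range(max_rounds):
        # Green: request an implementation, then let pytest decide.
        impl = call_llm(f"Make these tests pass:\n{tests}\n{feedback}")
        Path("feature.py").write_text(impl)
        run = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if run.returncode == 0:
            return True  # green -- safe to move on to refactoring
        feedback = f"Previous attempt failed:\n{run.stdout}"
    return False         # escalate to a human instead of looping forever
```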

Summary: a portrait of the future developer
The signal Addy sends through this book/site is clear: the barrier to entry for software development is dropping, but the bar for engineering excellence is not.
Future developers will go through a shift in mindset:
1. From coder to decision-maker: the core skill is no longer reciting syntax but supplying high-quality context, validating AI output, and making architectural decisions.
2. From implementation to intent: focus on precisely describing what you want rather than agonizing over how to write it.
3. From solo work to human-AI pairing: learn to manage a team of AI agents and direct their collaboration on complex systems.

Book website


Shao Meng, middle-aged out-of-work programmer 😂 Focus: Context Engineering, AI Agents. Sharing: AI papers, apps and OSS. ex Microsoft MVP. Business: DM / email shaomeng@outlook.com 📢 WeChat official account / Xiaohongshu: AI 启蒙小伙伴

meng shao
Tue Dec 09 14:56:01