
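LMArena's mechanism is simple: users are shown two anonymous model outputs and choose between them based only on overall impression, fluency, and completeness. In some ways that is closer to real-world use than a conventional benchmark. In this round, Wenxin (文心) scored high in three areas: first place in creative writing, a steady lead in understanding long, complex questions, and instruction following in the top tier. These three dimensions happen to form the core triangle of an agent-ready language model. Creative writing in particular is the hardest dimension to game: it tests whether the language modeling really tracks the rhythm of human thought, and whether the model can write paragraphs that have rhythm without being overwrought. What struck me as different about Wenxin this time is that it can write Chinese that is light and restrained yet not hollow. That ability is fairly rare. More importantly, this is still only the Preview; the official version will debut next week at Baidu World. What we are seeing now is just a controlled-leak version.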


curious guy creating things @ https://t.co/HXWladhJaA - up and coming wife guy


A PM who likes rock music and loves fishing. Website: https://t.co/vnUpLt752o


the equivalent of the bailout in this case is technocapital realizing it doesnt actually need the next generation of founders. its okay to cook gen alpha until they cant even form a coherent sentence, because you wont need them by the time theyre 30 to continue economic growth


traveler btw worlds. bias for makers, I heart art + tech! capitalist. EIC a16zcrypto; Editor in Chief a16z + podcast showrunner 2014-2022; fmr WIRED, Xerox PARC


