Thread Easy
Explore

A simple, lightweight PowerShell script to remove pre-installed apps and disable telemetry.

https://t.co/ovndB7f6Vj

GitHub Projects Community
Tue Nov 25 15:58:03
✨ I built a brand new onboarding that shows you 8 AI video demos during the onboarding.

Free trials were costing me $$$.

imo this is better, you see exactly what you're getting.

Here are the 8 videos, 8 use cases 🧵👇


Create multi-angle actors from a single selfie. They look like you. Sound like you. And they can wear clothes, demo products, present, even interact with physical objects. Like real UGC actors.

Paul Grisel
Tue Nov 25 15:58:01
Who invented deep learning? https://t.co/aJfpTfQSNk 
Timeline: 
1965: first deep learning (Ivakhnenko & Lapa, 8 layers by 1971)
1967-68: end-to-end deep learning by stochastic gradient descent (Amari, 5 layers)
1970: backpropagation (Linnainmaa, 1970) for NNs (Werbos, 1982): rarely >5 layers (1980s)
1991-93: unsupervised pre-training for deep NNs (Schmidhuber, others, 100+ layers)
1991-2015: deep residual learning (Hochreiter, others, 100+ layers)
1996-: deep learning without gradients (100+ layers)

★ 1965: first deep learning (Ivakhnenko & Lapa, 8 layers by 1971) 

Successful learning in deep feedforward network architectures started in 1965 in Ukraine (back then the USSR) when Alexey Ivakhnenko & Valentin Lapa introduced the first general, working learning algorithms for deep multi-layer perceptrons (MLPs) or feedforward NNs (FNNs) with many hidden layers (already containing the now popular multiplicative gates) [DEEP1-2][DL1-2][DLH][WHO5]. 

A 1971 paper [DEEP2] described a deep learning net with 8 layers, trained by their highly cited method, which was still popular in the new millennium [DL2].

Given a training set of input vectors with corresponding target output vectors, layers are incrementally grown and trained by regression analysis. In a fine-tuning phase, superfluous hidden units are pruned through regularization with the help of a separate validation set [DEEP2][DLH]. This simplifies the net and improves its generalization on unseen test data. The numbers of layers and units per layer are learned in a problem-dependent fashion. This is a powerful generalization of the original 2-layer Gauss-Legendre NN (1795-1805) [DLH].
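To make the growing-and-pruning procedure above concrete, here is a minimal numpy sketch of a GMDH-style layer-wise regression scheme. It is a simplified reconstruction, not Ivakhnenko's exact formulation: the quadratic two-input candidate units, the fixed per-layer unit budget, and the validation-based stopping rule are illustrative assumptions.

```python
# Minimal sketch of GMDH-style layer-wise training (numpy only).
# Simplified reconstruction of the idea described above, not Ivakhnenko's exact method:
# each candidate unit is a quadratic polynomial of two inputs, fit by least squares;
# units and layers are kept or pruned according to error on a separate validation set.
import numpy as np
from itertools import combinations

def fit_unit(a, b, y):
    """Least-squares fit of y as a quadratic polynomial in (a, b); returns coefficients."""
    X = np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def eval_unit(coeffs, a, b):
    X = np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])
    return X @ coeffs

def gmdh_train(X_train, y_train, X_val, y_val, max_layers=8, units_per_layer=6):
    """Grow layers one at a time; inputs must have at least 2 features."""
    layers = []                       # each layer: list of (i, j, coeffs)
    Ztr, Zva = X_train, X_val         # current layer inputs
    best_val = np.inf
    for _ in range(max_layers):
        candidates = []
        for i, j in combinations(range(Ztr.shape[1]), 2):
            c = fit_unit(Ztr[:, i], Ztr[:, j], y_train)
            val_err = np.mean((eval_unit(c, Zva[:, i], Zva[:, j]) - y_val) ** 2)
            candidates.append((val_err, i, j, c))
        candidates.sort(key=lambda t: t[0])       # prune: keep only the best units
        kept = candidates[:units_per_layer]
        if kept[0][0] >= best_val:                # stop when validation error stops improving
            break
        best_val = kept[0][0]
        layers.append([(i, j, c) for _, i, j, c in kept])
        Ztr = np.column_stack([eval_unit(c, Ztr[:, i], Ztr[:, j]) for _, i, j, c in kept])
        Zva = np.column_stack([eval_unit(c, Zva[:, i], Zva[:, j]) for _, i, j, c in kept])
    return layers

def gmdh_predict(layers, X):
    Z = X
    for layer in layers:
        Z = np.column_stack([eval_unit(c, Z[:, i], Z[:, j]) for i, j, c in layer])
    return Z[:, 0]                    # best unit of the final layer is the output
```

Each unit is fit directly to the targets by least squares; the validation set both ranks candidate units (pruning) and stops the growth of new layers, mirroring the regularization role described above.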

That is, Ivakhnenko and colleagues had connectionism with adaptive hidden layers two decades before the name "connectionism" became popular in the 1980s. Like later deep NNs, his nets learned to create hierarchical, distributed, internal representations of incoming data. He did not call them deep learning NNs, but that's what they were. 

Ivakhnenko's pioneering work was repeatedly plagiarized by researchers who went on to share a Turing award [DLP][NOB]. For example, the depth of Ivakhnenko's 1971 layer-wise training [DEEP2] was comparable to the depth of Hinton's and Bengio's 2006 layer-wise training published 35 years later [UN4][UN5], without comparison to the original work [NOB], which had been done when compute was millions of times more expensive. Similarly, LeCun et al. [LEC89] published NN pruning techniques without referring to Ivakhnenko's original work on pruning deep NNs. Even in their much later "surveys" of deep learning [DL3][DL3a], the awardees failed to mention the very origins of deep learning [DLP][NOB]. Ivakhnenko & Lapa also demonstrated that it is possible to learn appropriate weights for hidden units using only locally available information, without requiring a biologically implausible backward pass [BP4]. Six decades later, Hinton attributed this achievement to himself [NOB25a].

How do Ivakhnenko's nets compare to even earlier multilayer feedforward nets without deep learning? In 1958, Frank Rosenblatt studied multilayer perceptrons (MLPs) [R58]. His MLPs had a non-learning first layer with randomized weights and an adaptive output layer. This was not yet deep learning, because only the last layer learned [DL1]. MLPs were also discussed in 1961 by Karl Steinbuch [ST61-95] and Roger David Joseph [R61]. See also Oliver Selfridge's multilayer Pandemonium [SE59] (1959). In 1962, Rosenblatt et al. even wrote about "back-propagating errors" in an MLP with a hidden layer [R62], following Joseph's 1961 preliminary ideas about training hidden units [R61], but Joseph & Rosenblatt had no working deep learning algorithm for deep MLPs. What's now called backpropagation is quite different and was first published in 1970 by Seppo Linnainmaa [BP1-4].
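For contrast with Ivakhnenko's adaptive hidden layers, here is a minimal sketch (a modern illustrative reconstruction, not Rosenblatt's original algorithm or code; the tanh non-linearity, least-squares fit, and toy data are assumptions) of a two-layer net whose first layer has fixed random weights, so that only the output layer learns:

```python
# Sketch of a Rosenblatt-style MLP with a fixed random first layer:
# only the output layer learns, so no internal representations are adapted.
# Modern reconstruction for illustration only, not historical code.
import numpy as np

rng = np.random.default_rng(0)

# Toy non-linearly separable problem: XOR-like labels on 2-D points.
X = rng.uniform(-1, 1, size=(200, 2))
y = (np.sign(X[:, 0] * X[:, 1]) > 0).astype(float)

# First layer: fixed random weights, never updated.
W1 = rng.normal(size=(2, 64))
b1 = rng.normal(size=64)
H = np.tanh(X @ W1 + b1)             # hidden features, non-adaptive

# Output layer: the only trainable part, fit by least squares.
Hb = np.column_stack([H, np.ones(len(H))])
w2, *_ = np.linalg.lstsq(Hb, y, rcond=None)
pred = (Hb @ w2 > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```

The hidden features are never adapted to the task, which is exactly why such a net does not count as deep learning in the sense used above.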

Why did deep learning emerge in the USSR in the mid 1960s? Back then, the country was leading many important fields of science and technology, most notably in space: first satellite (1957), first man-made object on a heavenly body (1959), first man in space (1961), first woman in space (1962), first robot landing on a heavenly body (1965), first robot on another planet (1970). The USSR also detonated the world's biggest bomb ever (1961), and was home of many leading mathematicians, with sufficient funding for blue skies math research whose enormous significance would emerge only several decades later. 

The other sections mentioned above (1967, 1970, 1991-93, 1991-2015, 1996) are covered by [WHO5]: Who invented deep learning? Technical Note IDSIA-16-25, IDSIA, Nov 2025. 

SELECTED REFERENCES (many additional references in [WHO5] - see link above):

[BP1] S. Linnainmaa. The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 1970. See chapters 6-7 and FORTRAN code on pages 58-60. See also BIT 16, 146-160, 1976. Link. The first publication on "modern" backpropagation, also known as the reverse mode of automatic differentiation.

[BP4] J. Schmidhuber (2014). Who invented backpropagation? 

[BPA] H. J. Kelley. Gradient Theory of Optimal Flight Paths. ARS Journal, Vol. 30, No. 10, pp. 947-954, 1960. Precursor of modern backpropagation [BP1-4].

[DEEP1] Ivakhnenko, A. G. and Lapa, V. G. (1965). Cybernetic Predicting Devices. CCM Information Corporation. First working Deep Learners with many layers, learning internal representations.

[DEEP1a] Ivakhnenko, Alexey Grigorevich. The group method of data handling; a rival of the method of stochastic approximation. Soviet Automatic Control 13 (1968): 43-55.

[DEEP2] Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man and Cybernetics, (4):364-378.

[DL1] J. Schmidhuber, 2015. Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. Received the first Best Paper Award ever issued by the journal Neural Networks, founded in 1988.

[DL2] J. Schmidhuber, 2015. Deep Learning. Scholarpedia, 10(11):32832.

[DL3] Y. LeCun, Y. Bengio, G. Hinton (2015). Deep Learning. Nature 521, 436-444.  A "survey" of deep learning that does not mention the pioneering works of deep learning [DLP][NOB].

[DL3a] Y. Bengio, Y. LeCun, G. Hinton (2021). Turing Lecture: Deep Learning for AI. Communications of the ACM, July 2021. Another "survey" of deep learning that does not mention the pioneering works of deep learning [DLP][NOB].

[DLH] J. Schmidhuber. Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, IDSIA, Lugano, Switzerland, 2022. Preprint arXiv:2212.11279.

[DLP] J. Schmidhuber. How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23, Swiss AI Lab IDSIA, 2023. 

[GD'] C. Lemarechal. Cauchy and the Gradient Method. Doc Math Extra, pp. 251-254, 2012.

[GD''] J. Hadamard. Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées. Mémoires présentés par divers savants étrangers à l'Académie des Sciences de l'Institut de France, 33, 1908.

[GDa] Y. Z. Tsypkin (1966). Adaptation, training and self-organization in automatic control systems, Avtomatika i Telemekhanika, 27, 23-61. On gradient descent-based on-line learning for non-linear systems.

[GDb] Y. Z. Tsypkin (1971). Adaptation and Learning in Automatic Systems, Academic Press, 1971. On gradient descent-based on-line learning for non-linear systems.

[GD1] S. I. Amari (1967). A theory of adaptive pattern classifiers, IEEE Trans. EC-16, 279-307 (Japanese version published in 1965). Probably the first paper on using stochastic gradient descent [STO51-52] for learning in multilayer neural networks (without specifying the specific gradient descent method now known as the reverse mode of automatic differentiation or backpropagation [BP1]).

[GD2] S. I. Amari (1968). Information Theory—Geometric Theory of Information, Kyoritsu Publ., 1968 (in Japanese, see pages 119-120). Contains computer simulation results for a five-layer network (with 2 modifiable layers) which learns internal representations to classify non-linearly separable pattern classes.

[GD2a] S. Saito (1967). Master's thesis, Graduate School of Engineering, Kyushu University, Japan. Implementation of Amari's 1967 stochastic gradient descent method for multilayer perceptrons [GD1]. (S. Amari, personal communication, 2021.)

[NOB] J. Schmidhuber. A Nobel Prize for Plagiarism. Technical Report IDSIA-24-24 (7 Dec 2024, updated Oct 2025). 

[NOB25a] G. Hinton. Nobel Lecture: Boltzmann machines. Rev. Mod. Phys. 97, 030502, 25 August 2025. One of the many problematic statements in this lecture is this: "Boltzmann machines are no longer used, but they were historically important [...] In the 1980s, they demonstrated that it was possible to learn appropriate weights for hidden neurons using only locally available information without requiring a biologically implausible backward pass." Again, Hinton fails to mention Ivakhnenko who had shown this 2 decades earlier in the 1960s [DEEP1-2]. 

[WHO5] J. Schmidhuber (AI Blog, 2025). Who invented deep learning? Technical Note IDSIA-16-25, IDSIA, Nov 2025. See link above.


Invented principles of meta-learning (1987), GANs (1990), Transformers (1991), very deep learning (1991), etc. Our AI is used many billions of times every day.

Jürgen Schmidhuber
Tue Nov 25 15:55:42
Black Forest Labs has released FLUX.2, and it is still open source!

It supports text-to-image, multi-image reference, and image editing, with greatly improved text generation and prompt following.

Key capabilities:

- Reference up to 10 images at once for the best consistency.
- Richer detail, crisper textures, and more stable lighting.
- Text rendering for complex typography, infographics, memes, and user interfaces.
- Improved at following complex, structured instructions.
- Noticeably better grounding in real-world knowledge, lighting, and spatial logic.
- Image editing at resolutions up to 4MP.

Four model variants are released this time:

FLUX.2 [pro]: state-of-the-art image quality rivaling the best closed models, matching them in prompt adherence and visual realism while generating images faster and at lower cost. Speed and quality, both.

FLUX.2 [flex]: exposes model parameters such as step count and guidance strength, giving developers full control over quality, prompt adherence, and speed. This model excels at rendering text and fine detail.

FLUX.2 [dev]: a 32B open-weights model derived from the FLUX.2 base model. The most powerful open-source image generation and editing model available today, combining text-to-image synthesis and multi-input image editing in a single model.

FLUX.2 [klein] (coming soon): an open-source model under the Apache 2.0 license, distilled from the FLUX.2 base model at the same size. More capable and more developer-friendly than comparable models of the same size trained from scratch.

FLUX.2 - VAE: a new variational autoencoder for latent representations, offering an optimized trade-off among learnability, quality, and compression rate.
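To make the multi-reference editing capability described above concrete, here is a hypothetical Python sketch in the style of Hugging Face diffusers. The repository id, the list-valued image argument, and the sampler parameters are assumptions, not confirmed API; consult the official FLUX.2 [dev] model card for the actual usage.

```python
# Hypothetical sketch of multi-reference image editing with FLUX.2 [dev].
# Repo id, argument names, and parameter values are assumptions, not confirmed API;
# see the official model card for the real interface.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",      # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

# Up to 10 reference images are claimed to be supported; two are used here.
refs = [load_image("actor_selfie.png"), load_image("product_shot.png")]

result = pipe(
    prompt="the person from the first image holding the product from the second, studio lighting",
    image=refs,                          # assumed multi-reference argument
    num_inference_steps=28,              # assumed sampler settings
    guidance_scale=4.0,
).images[0]
result.save("flux2_edit.png")
```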


Multi-image reference and image editing

歸藏(guizang.ai)
Tue Nov 25 15:55:16
I think everybody that knows me knows that I am really not a fan of Hegseth, *but* - having read about the new DoW/DoD procurement architecture, it might work better than the old one: https://t.co/7pZUkbjFzF


Choose disfavour where obedience does not bring honour. I do math. And was once asked by R. Morris Sr. : "For whom?" @halvarflake@mastodon.social

Halvar Flake
Tue Nov 25 15:55:12
RT @bclavie: Do you love data? Is the most exciting release of the last 2 weeks @pleiasfr’s SYNTH?

Then we should talk. We’re looking for…


Artisanal baker of reasoning models @pleiasfr

Alexander Doria
Tue Nov 25 15:50:55