Thread Easy
  • Explorer
  • Compose a thread

Your all-in-one partner for Twitter threads


Explorer

Newest first — browse tweet threads


It’s AI French week: after our small but very thin bread, some sizable cheese. And very cool results on eval (de-)contamination.

Reasoning models have come! Co-founder @pleiasfr

Alexander Doria
Wed Nov 12 22:07:43
perfect summary. so much to do in the space of actual language modeling and yet we’re just building on top of a handful of recipes.

Alexander Doria
Wed Nov 12 22:05:48
This is the man who coined "Baguettotron" (and, very topically, does great work on synth data). Deserves a few more follows.

Where it all started

Alexander Doria
Wed Nov 12 17:57:37
Since the embargo is now over, happy to share the slides of the first ever presentation of Baguettotron at @EPFL and a few additional thoughts beyond the blogpost.

As many noticed, we took inspiration from the Physics of Language Models series, which coined the expression "synthetic playground" and promoted the use of synthetic data to design systematic "controlled experiments", in effect bringing LLM research closer to physics than to the current, very empirical approach to data.

The talk includes some very early controlled evaluations favoring deep-layer architectures. We believe that depth scaling benefits the most from dense reasoning traces, likely allowing for a more optimal combinatorial process across layers at inference time.
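
To make that kind of controlled depth-versus-width comparison concrete, here is a back-of-the-envelope sketch (my own illustration, not from the slides): two decoder configurations matched at a similar parameter budget, one deep and narrow, one shallow and wide. All layer counts, widths and vocabulary sizes below are assumptions, not Baguettotron's actual configuration.

```python
# Rough decoder-only parameter count: 4*d^2 for attention plus 2*mlp_mult*d^2
# for the MLP per layer, plus one tied embedding matrix. Norms and biases ignored.
def approx_params(layers: int, d_model: int, vocab: int, mlp_mult: int = 4) -> int:
    per_layer = 4 * d_model**2 + 2 * mlp_mult * d_model**2
    return layers * per_layer + vocab * d_model

deep_narrow = approx_params(layers=80, d_model=512, vocab=32_000)    # ~268M
shallow_wide = approx_params(layers=20, d_model=1024, vocab=32_000)  # ~284M
print(f"deep-narrow:  {deep_narrow / 1e6:.0f}M parameters")
print(f"shallow-wide: {shallow_wide / 1e6:.0f}M parameters")
```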

A key topic of discussion has been the precocity of reasoning signals with SYNTH, which prompted me to retroactively benchmark the 150 checkpoints of Baguettotron. I must say I was surprised by the overall results, which suggest that the model is non-random on MMLU with only a few billion tokens in (and @mkurman88 is now showing it might happen much earlier than that).
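
As an aside, here is a minimal sketch of how intermediate checkpoints could be scored on MMLU-style multiple choice by comparing answer log-likelihoods. This is not the author's evaluation pipeline; the checkpoint naming and question format are hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def option_logprob(model, tok, prompt, option):
    """Sum of log-probabilities the model assigns to the option tokens given the
    prompt. Tokenizing prompt and prompt+option separately is an approximation."""
    full = tok(prompt + option, return_tensors="pt")
    prompt_len = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(**full).logits                     # [1, seq, vocab]
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # predictions for tokens 1..N-1
    targets = full["input_ids"][0, 1:]
    # Score only the answer tokens, not the prompt tokens.
    answer_lp = logprobs[prompt_len - 1:].gather(1, targets[prompt_len - 1:, None]).sum()
    return answer_lp.item()

def mmlu_accuracy(checkpoint, questions):
    """questions: dicts with 'prompt', 'options' (list of strings), 'answer' (index)."""
    tok = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).eval()
    correct = 0
    for q in questions:
        scores = [option_logprob(model, tok, q["prompt"], o) for o in q["options"]]
        correct += int(scores.index(max(scores)) == q["answer"])
    return correct / len(questions)

# Hypothetical checkpoint names; 25% is chance level on 4-way MMLU.
# for step in range(1_000, 151_000, 1_000):
#     print(step, mmlu_accuracy(f"some-org/baguettotron-step{step}", mmlu_sample))
```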

I really hope to see this dataset powering more intriguing discoveries on LLM training over the next few months, as we now have the ability to run full trainings, not just ablations, with very few tokens and parameters.

Full presentation: https://t.co/3jPDpExmey

Alexander Doria
Wed Nov 12 16:35:55
i'm so bad at linkedin-style marketing. redoing it:
>**pre-trained** for $1.5k 🤯 (at a $2/h H100 rate)
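
For readers decoding the shorthand, here is my reading of the arithmetic; the budget and rate below are just the quoted figures.

```python
# Assumed reading of the tweet: a ~$1,500 pre-training budget at ~$2 per H100-hour.
budget_usd = 1_500
rate_usd_per_h100_hour = 2
gpu_hours = budget_usd / rate_usd_per_h100_hour
print(gpu_hours)  # 750.0 H100-hours, e.g. about 32 GPUs running for roughly a day
```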

Alexander Doria
Wed Nov 12 15:40:32
Now seeing very early linguistic understanding. I’m actually surprised myself: never tested a model below 10b tokens.

Alexander Doria
Wed Nov 12 07:08:33