Thread Easy

Your all-in-one partner for Twitter threads

© 2025 Thread Easy All Rights Reserved.

Explore

Newest first — browse tweet threads

You can't just tell your model "either you learn to write Shakespearean poetry and algebraic topology, or ... OR ... you mess up Q(Z|X) to make it dumb"

Research Scientist @meta (FAIR), Prof. @Unige_en, co-founder @nc_shape. I like reality.

François Fleuret
Thu Nov 13 05:34:46
Black is vanilla 1.5B 28 layers, blue is the free transformer from the arxiv paper, with a +3.5% overhead in compute and memory, red and orange are two variants of v2 with a +1.3% overhead and far simpler code.

François Fleuret
Wed Nov 12 19:02:45
Weirdest graph ever, but this thing is robust. The recovery on HumanEval+ is spectacular.

Anyway, version +1 is already running, we'll see.

François Fleuret
Wed Nov 12 07:01:00
Today I decided to replace the KL penalty with some yolo crazy approach, which worked. Looking at it closely, it is the standard KL penalty with a minor but very important change that ensures a property I tried to obtain months ago without success. Today is a good day.

François Fleuret
Tue Nov 11 21:53:16
I take this back: it is possible that it will eventually get better, and that the loss was going down quickly at first because the KL was not weighted enough. WE HAVE TO BE PATIENT.

François Fleuret
Tue Nov 11 20:21:20