X (Twitter)

Russian Sberbank has scaled its "bruh just copy DeepSeek" paradigm to Gigachat 3 Ultra 702B-A36B. Very little info, except 5.5T synthetic tokens. > Больше подробностей в хабр статье (to do). lmao. Well, that's something. Not many Western labs have done even this much.

cute dumb 10B 1.8AB DeepSeek too huggingface.co/ai-sage/GigaCh… I think they can improve from that, once they smuggle/rent more compute and polish their post-training. RN just finetuning DS would have been better and they've even invested in interpretability for DS-MoEs. Matter of will

Fil de Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex)

Informations sur l'auteur

Contenu du fil