RT @FeitengLi: ModelBest's #VoxCPM 1.5B TTS model is excellent: it supports 44.1kHz audio output with commercial-grade sound quality. The overall architecture also draws on ByteDance's DiTAR and is a standard GPT + FlowMatching/DiT variant, using continuous codec representations to compress to as low as 6…