Learning · May 28, 2026 · 10 min read

Speeding Up Text Generation

Nemotron-Labs Diffusion introduces a new approach to text generation, offering significant runtime performance benefits

Introduction to Nemotron-Labs Diffusion

Nemotron-Labs Diffusion is a new family of language models that generates text differently from traditional large language models (LLMs). Instead of producing one token at a time, it generates multiple tokens in parallel and then refines them over multiple steps. This approach offers significant runtime performance benefits, and because the model can revise tokens it has already generated, it is also better suited to revising existing text.
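To make the "parallel, then refine" idea concrete, here is a minimal toy sketch in Python. It is not the Nemotron-Labs Diffusion algorithm: the denoiser, the fixed confidence scores, and the "commit the most confident positions each step" rule are all assumptions chosen for illustration, and names such as `toy_denoiser` and `diffusion_decode` are hypothetical.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"

# Hypothetical stand-ins for a real model: a fixed target block and
# a fixed confidence score for each position.
TARGET = ["the", "cat", "sat", "on", "the", "mat"]
CONFIDENCE = [0.9, 0.4, 0.7, 0.8, 0.3, 0.6]


def toy_denoiser(block: List[str]) -> List[Tuple[str, float]]:
    """Propose a (token, confidence) pair for every position at once."""
    return [(TARGET[i], CONFIDENCE[i]) for i in range(len(block))]


def diffusion_decode(
    block_len: int,
    denoise: Callable[[List[str]], List[Tuple[str, float]]],
    tokens_per_step: int,
) -> Tuple[List[str], int]:
    """Fill a fully masked block over several parallel refinement steps."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    block = [MASK] * block_len
    steps = 0
    while MASK in block:
        proposals = denoise(block)  # one parallel pass over the block
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        # Assumption: commit the most confident masked positions first.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            block[i] = proposals[i][0]
        steps += 1
    return block, steps


if __name__ == "__main__":
    tokens, steps = diffusion_decode(len(TARGET), toy_denoiser, tokens_per_step=2)
    print(tokens, "in", steps, "steps")
```

In this toy setup, a six-token block is filled in three passes instead of six, which is the intuition behind the runtime benefit. A real model would recompute its proposals at every step and could also revise tokens it had already committed.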

How Nemotron-Labs Diffusion Works

The Nemotron-Labs Diffusion family includes text models at 3B, 8B, and 14B scales, as well as an 8B-scale vision-language model (VLM). These models support three generation modes: autoregressive mode, diffusion mode, and self-speculation mode. Autoregressive mode runs like a standard left-to-right LLM. Diffusion mode generates text block by block, filling in the tokens of each block gradually over multiple steps. Self-speculation mode uses diffusion to draft multiple candidate tokens, then uses autoregressive decoding to verify them.
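The draft-then-verify loop behind self-speculation can be sketched with toy functions. This is only an illustration under stated assumptions, not the models' actual implementation: `toy_draft` stands in for the diffusion drafter, `toy_ar_next` for the autoregressive verifier, and the greedy "accept the longest matching prefix" rule is one common verification scheme that the source does not confirm. All names are hypothetical.

```python
from typing import Callable, List, Tuple


def toy_ar_next(prefix: List[int]) -> int:
    """Hypothetical autoregressive verifier: the next token is last + 1."""
    return prefix[-1] + 1


def toy_draft(prefix: List[int], k: int) -> List[int]:
    """Hypothetical diffusion drafter: proposes k tokens at once.

    It is deliberately wrong (emits -1) wherever the right answer is a
    multiple of 4, so the verifier has something to reject.
    """
    start = prefix[-1] + 1
    return [-1 if (start + i) % 4 == 0 else start + i for i in range(k)]


def self_speculate(
    prompt: List[int],
    n_new: int,
    k: int,
    draft: Callable[[List[int], int], List[int]],
    verify_next: Callable[[List[int]], int],
) -> Tuple[List[int], int]:
    """Generate n_new tokens using draft-then-verify rounds."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n_new:
        rounds += 1
        for tok in draft(seq, k):
            expected = verify_next(seq)
            # Append the verifier's token either way: if the draft matched,
            # this accepts it; if not, this is the correction.
            seq.append(expected)
            if tok != expected:
                break  # discard the rest of this draft
    return seq[: len(prompt) + n_new], rounds


if __name__ == "__main__":
    tokens, rounds = self_speculate([0], n_new=8, k=4,
                                    draft=toy_draft, verify_next=toy_ar_next)
    print(tokens, "in", rounds, "rounds")
```

Because every emitted token comes from the verifier, the output matches plain left-to-right decoding with the same verifier. The saving comes from needing fewer rounds: here eight tokens take two rounds. In a real system the verifier would score all drafted positions in a single forward pass rather than one call per token as in this toy loop.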

Source

Original reporting by Hugging Face.

