Rubin Squirrel 0.5B
A lightweight 500-million-parameter model built for research and education. Ideal for learning the fundamentals of training, fine-tuning, and deployment, it offers fast iteration and serves as an entry point for exploring transformer-based architectures without heavy compute requirements.
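For readers using it as a learning vehicle, a minimal supervised fine-tuning loop might look like the sketch below. The checkpoint name `rubin/squirrel-0.5b` is a placeholder rather than a published identifier, and the one-line dataset stands in for real training data.

```python
# Minimal fine-tuning sketch for a ~0.5B causal language model.
# "rubin/squirrel-0.5b" is a hypothetical checkpoint name, not a published ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rubin/squirrel-0.5b"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["Transformers learn by predicting the next token."]  # toy dataset
for epoch in range(3):
    for text in texts:
        batch = tokenizer(text, return_tensors="pt")
        # Causal LM loss: labels are the input ids, shifted internally by the model
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```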
Rubin Omni 2B
A medium-scale foundation model trained through pretraining, supervised fine-tuning (SFT), and Group Relative Policy Optimization (GRPO). Designed for practical applications, it offers strong generalization while remaining efficient, making it a balanced choice for academic and applied research.
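To illustrate the group-relative idea behind GRPO, the sketch below normalizes each sampled completion's reward against the mean and standard deviation of its own group, in place of a learned value model. This is a generic illustration of the algorithm, not the Omni 2B training code.

```python
# Group-relative advantages at the heart of GRPO: each sampled completion is
# scored against its own group's statistics rather than a learned critic.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards for sampled completions."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: one prompt, four sampled completions with rule-based rewards.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.0]])
print(grpo_advantages(rewards))
```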
Rubin Omega 8B
A large-scale model featuring Multi-head Latent Attention (MLA), enabling improved representation learning and context management. This scaled version is optimized for both reasoning and generation tasks, delivering robust performance on complex workloads while remaining manageable on modern GPU clusters.
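The sketch below illustrates the latent-attention idea in a generic form: keys and values are compressed into a small latent that can be cached cheaply and up-projected per head at attention time. Layer sizes are illustrative and do not reflect the Omega 8B configuration.

```python
# Generic latent-attention sketch: cache a compressed KV latent, expand on use.
# Dimensions are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress to the cached latent
        self.k_up = nn.Linear(d_latent, d_model)      # expand latent into keys
        self.v_up = nn.Linear(d_latent, d_model)      # expand latent into values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        b, t, _ = x.shape
        latent = self.kv_down(x)                       # only this small tensor is cached
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(attn.transpose(1, 2).reshape(b, t, -1))
```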
Rubin Dragon 24B [8x 3B]
Our flagship model, built on a Mixture of Experts (MoE) architecture combined with DeepSeek MLA and DSA, yielding more coherent, efficient, and explainable outputs. It scales to enterprise-grade workloads, supporting advanced research, production deployments, and cutting-edge applications.
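The core mechanism of an MoE layer is token-level top-k routing, sketched generically below: each token is sent to its highest-scoring experts and their outputs are blended by the routing weights. Expert sizes and the feed-forward shape are illustrative and do not correspond to the Dragon 24B configuration.

```python
# Generic top-k MoE routing sketch: 8 experts, 2 active per token.
# Sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)      # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```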