xAI has officially released Grok 4.1. The model is now available to all users, including those on the free tier, on grok.com, the X platform, and the iOS and Android apps, and it is enabled by default in Auto mode.
Elon Musk, founder of xAI, stated that users will “noticeably feel improvements in speed and quality.” Unlike previous updates that focused on computing power or scale, Grok 4.1 emphasizes three intuitive yet highly challenging directions: faster responses, higher factual accuracy, and a more natural and personalized conversational experience.
Performance Improvements: Fewer Hallucinations, Higher Factual Accuracy, Stronger Style Control
Grok 4.1 performs strongly on information-seeking queries. According to official data, its hallucination rate fell from 12.09% to 4.22%, a nearly threefold reduction, and its error rate on the FActScore benchmark fell from 9.89% to 2.97%, a reduction of more than threefold. Given how widespread factual instability remains in current large models, this is a substantial improvement rather than a cosmetic one.
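The "nearly threefold" figure can be checked with a few lines of arithmetic. The sketch below uses only the four percentages quoted above; the function names are illustrative and not part of any xAI tooling.

```python
def fold_reduction(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original rate that was eliminated."""
    return (before - after) / before


# (before %, after %) as quoted in the article
rates = {
    "hallucination rate": (12.09, 4.22),
    "FActScore error rate": (9.89, 2.97),
}

for name, (before, after) in rates.items():
    print(
        f"{name}: {fold_reduction(before, after):.2f}x lower, "
        f"{relative_reduction(before, after):.1%} relative reduction"
    )
```

Both metrics are error rates, so lower is better; the fold figure and the relative reduction are two views of the same change.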
xAI stated that the performance improvement of Grok 4.1 is due to its reinforcement learning infrastructure and new reward model system: Grok 4.1 uses a “cutting-edge reasoning model” as the reward model, enabling the model to self-evaluate and iterate quickly. This means training is no longer overly reliant on large-scale manual annotation, and also makes style, tone, and collaborative capabilities more controllable.
Grok 4.1 achieved a blind preference rate of 64.78% in silent testing
In the most recent round of silent testing (November 1 to 14), Grok 4.1 achieved a blind preference rate of 64.78%: in blind comparisons against the previous production model, its responses were preferred roughly two times out of three.
Grok 4.1’s performance on the LMSYS Arena
Grok 4.1 has made a leap on the international blind-testing platform LMSYS Arena. In the latest evaluation round, Grok 4.1’s Thinking mode (code-named quasarflux) achieved an Elo rating of 1483, ranking first among all publicly available models; its non-reasoning mode reached 1465 Elo, ranking second. (Elo ratings measure the relative strength of models in blind head-to-head battles.) The second result is notable in itself: without employing a chain-of-thought approach, the non-reasoning mode outperforms many other models that run in full reasoning configurations.
By comparison, the previous-generation Grok 4 ranked 33rd overall. Grok 4.1 has not merely moved up the leaderboard; the jump indicates that its foundational conversational quality and overall capabilities have entered the industry’s top tier.
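To give the Elo numbers some intuition, the sketch below applies the classic logistic Elo formula (400-point scale) to the ratings quoted above. This is an assumption for illustration: the Arena fits its scores with its own statistical procedure, so the results are only a rough reading of the published figures. The second function inverts the formula to show what rating gap a given head-to-head preference rate would imply; applying it to the 64.78% silent-test figure is purely illustrative, since that test and the Arena are separate evaluations.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_gap_from_win_rate(win_rate: float) -> float:
    """Rating difference implied by a head-to-head win rate (inverse of the above)."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


# Ratings quoted in the article
thinking, non_reasoning = 1483, 1465

p = expected_score(thinking, non_reasoning)
print(f"Thinking vs non-reasoning: expected win rate {p:.1%}")

# Illustrative only: the silent-test preference rate expressed as an Elo-style gap
gap = elo_gap_from_win_rate(0.6478)
print(f"A 64.78% preference rate corresponds to a gap of about {gap:.0f} points")
```

Under this model an 18-point gap is close to a coin flip, which is why the first and second places are better read as near-equals than as a decisive ordering.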