Wain AI/Tech Blog

Daily updates on AI news and trends worldwide

広告
広告スペース(728x90)

xAI Announces High-Performance AI Model 'Grok 4' - Achieves Industry-Leading Results Across Multiple Benchmarks

xAI's Grok 4, announced on July 9, 2025, demonstrates superior performance over competing models through scaled reinforcement learning and native tool use capabilities.

xAI Announces High-Performance AI Model 'Grok 4' - Achieves Industry-Leading Results Across Multiple Benchmarks

On July 9, 2025, xAI announced its latest AI models, “Grok 4” and the high-performance variant “Grok 4 Heavy”1. The announcement highlights scaled reinforcement learning, native tool use capabilities, and industry-leading achievements across multiple academic benchmarks.

Particularly impressive is Grok 4 Heavy’s achievement of 44.4% on the “Humanity’s Last Exam” benchmark when using tools1. Additionally, it scored 15.9% on the ARC-AGI V2 benchmark, nearly double Claude Opus 4’s score of approximately 8.6%1.

Performance Improvements Through Scaled Reinforcement Learning

xAI leveraged its “Colossus” cluster consisting of 200,000 GPUs to conduct reinforcement learning training at an unprecedented scale1. This enabled successful refinement of Grok 4’s reasoning capabilities at pretraining scale.

Technical Improvements

The training process achieved a 6x improvement in computational efficiency through innovations across the entire stack1. The company also conducted a massive data collection effort, significantly expanding verifiable training data from primarily math and coding domains to many other areas.

Integration of Native Tool Use

Grok 4 has been trained with reinforcement learning to use tools such as code interpreters and web browsing1. This allows the model to augment its thinking with tools in situations that typically challenge large language models.

Advanced Search Capabilities Within X

As a notable feature, Grok 4 can use powerful tools to search deeply within X (formerly Twitter)1. It employs advanced keyword and semantic search tools and can even view media to improve response quality.

Grok 4 Heavy and Parallel Test-Time Compute

xAI has made further progress on parallel test-time compute, which allows consideration of multiple hypotheses simultaneously1. Grok 4 Heavy, leveraging this technology, establishes new standards for performance and reliability.

The system employs multiple agents working on problems simultaneously, then comparing their work “like a study group” to find the best answer2.

Benchmark Results and Industry Positioning

According to Artificial Analysis, Grok 4 achieves an Intelligence Index of 73, surpassing OpenAI o3’s 70 and Google Gemini 2.5 Pro’s 702. Key benchmark results include:

  • Humanity’s Last Exam: Grok 4 Heavy achieves 44.4% (with tools), significantly outperforming Gemini 2.5 Pro’s 26.9%2
  • ARC-AGI V2: Sets a new record with 15.9%, approximately double the next best commercial AI model1
  • USAMO 2025: Tops the leaderboard with 61.9%1
  • AIME 2025: Excellent performance in competitive mathematics1

API and Subscription Plans

Grok 4 is available to developers as an API with a 256k context window and multimodal understanding capabilities1. The API ensures enterprise-grade security and compliance, including SOC 2 Type 2, GDPR, and CCPA certifications.

For subscription plans, a new $300/month “SuperGrok Heavy” plan has been introduced, providing early access to Grok 4 Heavy and priority access to new features2. Regular Grok 4 is available to SuperGrok and Premium+ subscribers, as well as through the xAI API1.

Future Roadmap

xAI has announced plans to continue scaling reinforcement learning to unprecedented levels, building on Grok 4’s advancements1. The company plans to expand from verifiable rewards in controlled domains to tackling complex real-world problems where models can learn and adapt in dynamic environments.

The company has also revealed plans to release an AI coding model in August 2025, a multimodal agent in September, and a video generation model in October2.

For those interested in learning more about xAI’s Grok 4 technology, the official announcement page provides detailed benchmark results and demonstration videos. For developers, the xAI API documentation offers integration methods and usage examples.

Sources

  1. Grok 4 | xAI - xAI official announcement
  2. Elon Musk’s xAI launches Grok 4 alongside a $300 monthly subscription - TechCrunch
  3. xAI launches Grok 4 with new $300/month SuperGrok Heavy subscription - Teslarati