Meta Launches Muse Spark: A New Contender in AI Superintelligence
TL;DR
- This article explores the launch of Meta's Muse Spark, a natively multimodal model with a 'Contemplating mode' for parallel agent orchestration. It covers the model's results on scientific and medical benchmarks, its compute-efficient 'Avocado' pretraining stack, and its safety evaluations. Readers will gain insights into how Meta is restructuring its AI labs to pursue superintelligence and agentic reasoning.
Meta has released Muse Spark, the first model from Meta Superintelligence Labs. It is a natively multimodal reasoning model that supports tool use, visual chain of thought, and multi-agent orchestration. Developed under the leadership of Chief AI Officer Alexandr Wang, the model represents a ground-up overhaul of the company's AI efforts. To support further scaling, Meta has invested across the entire stack, including the Hyperion data center.
Multimodal Performance and Benchmarks
Muse Spark is built to process images, text, and voice from the ground up. It shows strong results in multimodal perception and agentic tasks. On HealthBench Hard, it scored 42.8, outperforming GPT 5.4 and Gemini 3.1 Pro. It also leads on CharXiv Reasoning with a score of 86.4 for understanding figures in scientific papers.

Contemplating Mode and Agent Orchestration
Meta is rolling out "Contemplating mode," which orchestrates multiple agents to reason in parallel. This allows the model to compete with frontier systems like Gemini Deep Think and GPT Pro. By scaling the number of parallel agents, the system solves complex problems without drastically increasing latency. This agentic approach is a key part of the Advanced AI Scaling Framework.
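Meta has not published how Contemplating mode is implemented, so the following is only an illustrative sketch of the general idea of parallel agents: several agents work on the same question concurrently, and their answers are combined by a simple majority vote. The names run_agent, contemplate, and solve are hypothetical, the agent is a toy stand-in rather than a real model call, and majority voting is an assumed aggregation rule, not Meta's method.

```python
import asyncio
from collections import Counter


async def run_agent(agent_id: int, question: str) -> str:
    """Hypothetical stand-in for one reasoning agent (not a real model call)."""
    await asyncio.sleep(0.01)  # simulates the latency of a single agent
    # Toy behavior: every fourth agent returns a wrong answer.
    return "41" if agent_id % 4 == 0 else "42"


async def contemplate(question: str, n_agents: int) -> str:
    """Run n_agents concurrently and return the most common answer."""
    answers = await asyncio.gather(
        *(run_agent(i, question) for i in range(n_agents))
    )
    # Assumed aggregation rule: simple majority vote over the answers.
    return Counter(answers).most_common(1)[0][0]


def solve(question: str, n_agents: int = 8) -> str:
    """Synchronous wrapper around the concurrent orchestration."""
    return asyncio.run(contemplate(question, n_agents))
```

Because the agents run concurrently, the wall-clock time in this sketch stays close to the latency of one agent as n_agents grows, which is the intuition behind adding agents without a large latency cost.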

Pretraining and Reinforcement Learning
The pretraining stack for Muse Spark, internally codenamed Avocado, was rebuilt over nine months. This new recipe allows the model to reach the same capabilities as Llama 4 Maverick using over an order of magnitude less compute. Following pretraining, Reinforcement Learning (RL) is used to amplify model capabilities. The RL training maximizes correctness while applying a penalty on thinking time, which leads to "thought compression" on tasks like AIME.
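The article says only that the RL objective maximizes correctness while penalizing thinking time. One way to picture that trade-off is a reward that subtracts a cost per reasoning token from a correctness score. The sketch below assumes a linear penalty; the function name shaped_reward, the token-based measure of thinking time, and the coefficient are all illustrative assumptions, not Meta's actual objective.

```python
def shaped_reward(
    is_correct: bool,
    thinking_tokens: int,
    penalty_per_token: float = 1e-4,  # assumed coefficient, chosen for illustration
) -> float:
    """Correctness reward minus a linear penalty on reasoning length."""
    correctness = 1.0 if is_correct else 0.0
    # Assumption: thinking time is measured in reasoning tokens.
    return correctness - penalty_per_token * thinking_tokens
```

Under this shape, a correct answer reached in 2,000 tokens scores 0.8 and the same answer reached in 8,000 tokens scores 0.2, so a policy trained on it is pushed toward shorter reasoning that still arrives at the right answer.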

Safety and Evaluation Awareness
Muse Spark underwent extensive safety evaluations based on the Advanced AI Scaling Framework v2. It demonstrates strong refusal behavior in high-risk domains like biological and chemical weapons. In third-party tests, Apollo Research found that Muse Spark showed a high rate of "evaluation awareness," often identifying scenarios as alignment traps. Full safety details are documented in the Safety & Preparedness Report.