Tag

Mixture Of Experts

All articles tagged with #mixture of experts

technology1 month ago•7 min saved

Nvidia Debuts Nemotron 3 Ultra, Fast Open-Weight AI But China Still Leads

Nvidia unveiled Nemotron 3 Ultra at Computex 2026, a 550‑billion‑parameter open-weight model that uses mixture‑of‑experts to run with 55 billion active parameters and achieve over 300 tokens per second—three to six times faster than Chinese rivals. Independent testing puts its Intelligence Index at 48, behind Moonshot AI’s Kimi K2.6 at 54, underscoring that China currently leads the open-weight frontier. Nvidia is publishing the weights and training recipes, shipping Ultra on June 4, and has formed the Nemotron Coalition with eight labs to co-develop open frontier models on DGX Cloud. Ultra also features a 1‑million‑token context window and multi-token prediction to speed generation, reflecting Nvidia’s push to close the gap with Chinese models while expanding access via API.

via Decrypt|

#computex-2026 #mixture-of-experts #nemotron-3-ultra

technology3 months ago•28 min saved

OpenMythos revives Claude Mythos as a lean looped transformer withMoE depth

OpenMythos is an open-source PyTorch reconstruction of Claude Mythos proposing a Recurrent-Depth Transformer (RDT) that uses a fixed set of weights looped up to 16 times, with a Mixture-of-Experts FFN and Multi-Latent Attention to enable deep reasoning with far fewer parameters. The design re-injects input at each loop, employs stability techniques like Linear Time-Invariant constraints and Adaptive Computation Time, and adds depth-wise LoRA adapters to differentiate loop steps. The project argues that 770M parameters can match a 1.3B transformer when trained on identical data, reframing depth as inference-time computation rather than parameter count. It releases four artifacts: a configurable PyTorch implementation, LTI stability primitives, depth-wise LoRA adapters, and a reproducible loop-dynamics baseline.

via MarkTechPost|

#claude-mythos #mixture-of-experts #openmythos

technology2 years ago•3 min saved

"Google Unveils Gemini 2.0: The Ultimate AI Upgrade for Text and Video"

Google unveils Gemini 1.5, the latest iteration of its AI model, built on a Transformer and Mixture-of-Experts architecture. Gemini 1.5 Pro, designed for a wide range of tasks and devices, can handle larger prompts and outperforms its predecessor on testing benchmarks. It features in-context learning and is currently available for trials through AI Studio and Vertex AI, with a waitlist for interested developers.

via Lifehacker|

#ai #gemini-15 #google

technology2 years ago•3 min saved

"Google's Gemini 1.5: The Next-Gen AI Model and Privacy Concerns"

Google is set to launch Gemini 1.5, the successor to its Gemini AI model, with significant improvements including a larger context window of 1 million tokens. This allows the model to handle larger queries and process more information at once. Gemini 1.5 will be available to developers and enterprise users initially, with plans for a consumer rollout in the future. The model is designed to be faster and more efficient, and Google is also testing its safety and ethical boundaries. The company is in a competitive race with OpenAI to build the best AI tool, and CEO Sundar Pichai believes that despite the technical advancements, users will eventually just consume the experiences without paying attention to the underlying technology.

via The Verge|

#ai #context-window #gemini-15

technology2 years ago•3 min saved

Mistral AI: The Rising French Challenger to OpenAI with a $2 Billion Valuation

French AI company Mistral AI has announced Mixtral 8x7B, an AI language model that reportedly matches the performance of OpenAI's GPT-3.5. Mixtral 8x7B is a "mixture of experts" (MoE) model with open weights, allowing it to run locally on devices with fewer restrictions. The model can process a 32K token context window and works in multiple languages. Mistral claims that Mixtral 8x7B outperforms larger models and matches or exceeds GPT-3.5 on certain benchmarks. The rise of open-weights AI models catching up with closed models has surprised many, and the availability of Mixtral 8x7B is seen as a significant development in the AI space.

via Ars Technica|

#ai-language-model #mistral-ai #mixtral-8x7b