Acute stress impairs the brain’s ability to link memories from separate days, reducing inference skills; brain imaging points to disruption in hippocampal memory integration after a mock job interview.
Ben Thompson argues that the AI compute boom is shifting from GPU-dominated training toward memory-centric architectures built for agentic inference. Cerebras’ wafer-scale chips offer extraordinary on-chip memory and bandwidth for fast answer inference but face cost and scalability limits. The long-term potential, in his view, lies in memory hierarchies that support autonomous agentic work, which could reduce Nvidia’s dominance and reconfigure compute across training, inference, and even space data centers.
Google is in talks with Marvell Technology to co-develop two new AI chips aimed at running models more efficiently: a memory processing unit designed to work alongside Google’s Tensor Processing Unit and a new TPU built specifically for inference, underscoring rising demand for accelerators that speed up AI workloads.
CoreWeave and Anthropic announced a multi-year agreement to run Anthropic’s Claude models on CoreWeave’s AI cloud, enabling production-scale workloads with high performance and reliability; the deal expands CoreWeave’s ecosystem to include nine of the top AI model providers and will begin rolling out later this year.
At its GTC conference, Nvidia unveiled a product that pairs its chips with Groq’s acceleration tech to boost AI inference speed and cut costs, a move aimed at defending its dominant hardware position as rivals advance. The announcement follows Nvidia’s $20 billion Groq licensing deal and includes NemoClaw, which helps software companies deploy AI agents. Supply-chain and manufacturing constraints continue to shape growth prospects for its Rubin and Blackwell lines.
NVIDIA and Nebius unveiled a $2 billion partnership to scale Nebius's full-stack AI cloud, enabling more than 5 gigawatts of NVIDIA-powered capacity by 2030 through AI factory design, optimized inference and agentic AI software, and deployment of multi-generation NVIDIA infrastructure across Nebius’s platform.
AI inference costs are emerging as a new component of engineering compensation, with candidates asking about dedicated AI compute budgets and some suggesting token-based pay alongside salary, bonus, and equity. CFOs warn these costs are rising as GPUs and model usage drive productivity, potentially making tokens a practical fourth pillar of tech compensation.
A September 2025 letter of intent for Nvidia to invest up to $100 billion in OpenAI’s AI infrastructure has not materialized; Nvidia’s Jensen Huang says the figure was never a commitment, and Reuters reports that OpenAI has been seeking alternatives, citing concerns about the inference speed of Nvidia chips. OpenAI has since struck deals with Cerebras, Groq, AMD, and Broadcom to diversify compute, while Nvidia says it still plans a large investment, though not at that scale. The news triggered a dip in Nvidia’s stock and raised questions about timing and strategic fit.
OpenAI is seeking alternatives to Nvidia GPUs because of dissatisfaction with inference performance, according to a report citing eight sources. The move follows reports that Nvidia’s plan to invest up to $100 billion in OpenAI has stalled. OpenAI has previously struck deals with AMD and Broadcom to develop custom AI accelerators, signaling a push to diversify hardware sources even as Nvidia remains a major partner.
NVIDIA has announced the BlueField-4 data processor powering a new AI-native storage platform designed to enhance long-term memory and context sharing for large-scale AI inference, boosting performance and efficiency for multi-agent AI systems, with availability expected in late 2026.
Nvidia's CEO Jensen Huang addressed investor concerns about the company's future growth amid new methods for improving AI models such as "test-time scaling," which improves a model's answers by applying more compute during inference. Despite competition from startups developing fast AI inference chips, Huang emphasized Nvidia's strong market position, noting that most workloads currently focus on pretraining but that inference will account for a growing share. He reassured investors of Nvidia's scale and reliability, echoing industry leaders like Microsoft's Satya Nadella on the significance of these developments.
Intel's CEO, Pat Gelsinger, stated that the entire industry is motivated to eliminate NVIDIA's CUDA dominance in the AI market. Intel believes that the future of AI lies in inference rather than in training models and plans to prioritize inference development. Gelsinger describes NVIDIA's success as a temporary "bubble" and expects the industry to adopt new training methods that bring in a broader set of technologies. Intel highlighted its OpenVINO toolkit and aims to transition toward next-generation markets. However, Intel has more work to do before it can challenge CUDA's dominance, and for now NVIDIA remains the leader in the AI segment.
Nvidia has unveiled its new AI chip, the GH200, designed for running artificial intelligence models. The chip pairs a powerful GPU with 141GB of cutting-edge memory and a 72-core Arm central processor. Nvidia aims to address the increasing demand for GPU capacity by offering a chip that lets larger AI models fit on a single system, reducing the need for multiple GPUs. The company expects the new chip to significantly lower the cost of running large language models for inference, making them more accessible for a range of applications. Nvidia's announcement comes as it faces competition from rivals such as AMD, Google, and Amazon in the AI hardware space.