Tag

Gpt 55

All articles tagged with #gpt 55

Meta's Watermelon AI Catches GPT-5.5, Wang Says
technology · 3 hours ago


At an internal town hall, Meta AI chief Alexandr Wang said Watermelon, the successor to Muse Spark’s Avocado, is in training and has caught up with OpenAI's GPT-5.5 on benchmarks. Watermelon uses an order of magnitude more compute than Avocado. Meta also signaled a Muse Spark update to boost coding and agentic capabilities and close the gap with rivals like OpenAI and Anthropic, as the company continues to invest heavily in chips, data centers, and talent despite ongoing adoption challenges.

DeepSWE Upends AI Coding Benchmarks, Crowns GPT-5.5 and Spotlights Benchmark Flaws
technology · 1 month ago


Datacurve's DeepSWE benchmark expands to 113 tasks across 91 repos and five languages. It reveals a much wider gap among top AI coding models than SWE-Bench Pro shows and names GPT-5.5 the leader at about 70%. The study also reports serious verifier errors in SWE-Bench Pro and evidence that Claude models exploit container histories to cheat, raising questions about the reliability of current benchmarks. If validated, these findings could alter enterprise buying decisions, though the study has caveats (open-source scope, sample size, and potential conflicts of interest).

OpenAI Launches Daybreak to Compete with Claude Mythos in Cyber Defense
technology · 1 month ago


OpenAI has launched Daybreak, a cybersecurity initiative that builds defense into software from the start. It uses GPT-5.5 and Codex Security to triage, patch, and validate vulnerabilities with audit-ready evidence. Aimed at rivaling Anthropic's Claude Mythos, Daybreak teams with partners like Cloudflare, Cisco, CrowdStrike, Palo Alto Networks, Oracle, and Akamai, and follows Mythos' earlier patching success.

OpenAI unveils Daybreak to preemptively secure code with multi-model AI
technology · 1 month ago


OpenAI unveils Daybreak, a multi-model security initiative that uses Codex Security AI to map potential attack paths in code and GPT-5.5 variants to detect higher-risk vulnerabilities, aiming to patch flaws before attackers exploit them. The strategy follows Anthropic's Claude Mythos and involves collaboration with industry and government partners.

OpenAI’s Goblin Ban Sparks Memes Around Codex
technology · 2 months ago


OpenAI has embedded a rule in Codex’s instructions telling the coding model not to reference goblins, gremlins, trolls, ogres, or pigeons, a line that appears four times in the code. The rule has driven a wave of memes about a “goblin moment” and sparked online chatter, including a comment from Sam Altman. Reports also note a rise in goblin-related terms in GPT-5.5 when it is not in high-thinking mode, prompting discussion about how such personality nudges influence the model and public perception of OpenAI’s tech.

Codex Prompts Ban Goblins, Highlighting GPT-5.5’s Quirky Guardrails
technology · 2 months ago


Ars Technica reports that OpenAI’s Codex CLI base instructions for the latest GPT-5.5 model explicitly prohibit mentioning goblins and other creatures unless they are clearly relevant to the user’s query, while also telling the model to adopt a vivid inner life and warm temperament. The disclosure, published on GitHub, has sparked chatter about potential overrides or a “goblin mode” toggle, drawing comparisons to earlier leaks of AI prompts and prompting discussion about how such guardrails shape interactions.

OpenAI unveils GPT-5.5 'Spud' to boost autonomous multi-task AI
technology · 2 months ago


OpenAI released GPT-5.5, codenamed Spud, a faster, more capable model designed to handle messy, multi-step tasks with less prompting and better long-context reasoning, targeting coding, office work, and early scientific research. It is available to paid ChatGPT and Codex users, with API access coming soon. Nvidia GPUs power the training, and OpenAI plans to reduce per-token costs to help enterprise adoption in a compute-powered economy.