GPT-5.4 Is Now Better at Your Computer Than You Are
OpenAI shipped GPT-5.4 this week with a 1-million-token context window and the ability to autonomously run multi-step workflows across real software environments. That alone is a big deal.
But here's the part that got my attention: they tested it on OSWorld-V, a benchmark that simulates actual desktop tasks -- the kind of stuff you do in a workday. GPT-5.4 scored 75%. The human baseline is 72.4%.
Think about what that means. Not "AI wrote some code" or "AI summarized a document." AI sat down at a virtual computer and outperformed the average person at getting real work done. That's a different category of capability.
I'm not saying panic. I'm saying pay attention. The gap between "AI can help with tasks" and "AI can do tasks" just got a lot smaller.
Google Went Open Source With Gemma 4
Google dropped Gemma 4 this week under Apache 2.0. These are open models built specifically for reasoning and agentic workflows, and they're free for anyone to use, modify, or build on.
Why does this matter? Because open models change who gets to play. When Google puts serious reasoning capability into the open, small teams and solo developers get access to the same kind of infrastructure that used to be locked behind expensive APIs.
Google's pitch is "best intelligence per parameter," and honestly, if that holds up in the wild, Gemma 4 could become a go-to for anybody building agents who doesn't want to hand every inference dollar to OpenAI or Anthropic.
This is a smart move by Google. They're not winning the closed-model race right now, so they're shifting the battlefield to open. That strategy worked for Linux. It worked for Kubernetes. It might just work here too.
AI Is Hunting Zero-Days and Banks Are Getting Nervous
Anthropic previewed Claude Mythos this week, a model built specifically for cybersecurity. It has already found thousands of previously unknown vulnerabilities across major systems.
Let that sit for a second. A model trained to find security holes found thousands of them that humans hadn't caught. That's both impressive and genuinely unsettling, depending on who gets access to it.
And speaking of unsettling, the Bank of England issued a warning this week to financial executives about AI risk to the banking system. Their concern: a model sophisticated enough to probe financial infrastructure for weaknesses. That's not a theoretical concern anymore.
The cybersecurity angle is where AI gets complicated fast. The same capability that finds vulnerabilities to patch them can find vulnerabilities to exploit them. The difference is who's holding the keys. Right now, that's a question nobody has a clean answer to.
Keep your eyes on how Anthropic handles access controls for Mythos. That's going to tell us a lot about where this is all headed.