What Did OpenAI Do This Week?

Sep 15, 2024


OPENAI O1: PARADIGM SHIFT OR NAR?

The long wait is over. OpenAI has finally dropped its new AI model, once internally dubbed ‘Strawberry.’ Many are hyping it up, tossing around words like ‘paradigm shift.’ But is the buzz justified? What’s really going to change? If you’re in the legal, healthcare, or scientific sectors, you’ll want to subscribe now…

The o1 series is OpenAI's first family of models trained with reinforcement learning (RL) to reason before answering. In other words, the models are designed to spend more time computing an answer before responding to a user query, using an internal "chain of thought" to work through problems step by step. Two models launched: o1-preview and o1-mini. The o1 models outperform previous models on reasoning tasks in science, coding, and math, while o1-mini is the cost-efficient version for developers who want advanced reasoning without the chonky costs of the full model.

Interestingly, the series has been labelled a ‘preview’ from the outset (hello, PR). Per Sam Altman: ‘o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.’

Greg Brockman, OpenAI’s President and Co-founder, explained its technical significance: “This is a new paradigm with vast opportunity… One way to think about this is that our models do System I thinking, while chains of thought unlock System II thinking. People have discovered a while ago that prompting the model to “think step by step” boosts performance.”
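For readers who haven't tried the "think step by step" trick Brockman mentions, here is a minimal Python sketch of what it looked like with earlier models: the same question, with and without the reasoning cue. The function names and the sample question are made up for illustration, nothing here comes from OpenAI, and no API is called. The pitch with o1 is that this kind of reasoning is trained into the model rather than coaxed out by the prompt.

```python
def plain_prompt(question: str) -> str:
    """Ask the question directly, with no reasoning cue."""
    return f"Q: {question}\nA:"


def step_by_step_prompt(question: str) -> str:
    """Add the 'think step by step' cue so the model writes out its reasoning first."""
    return f"Q: {question}\nA: Let's think step by step."


# Hypothetical example question, used only to show the two prompt shapes.
question = "A train travels 120 miles at 60 mph. How long does the trip take?"

print(plain_prompt(question))
print()
print(step_by_step_prompt(question))
```

The only difference between the two prompts is the final cue, which is why the technique was so easy to adopt.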

To highlight the reasoning improvement over GPT-4o, the company tested its models on a diverse set of human exams and ML benchmarks. OpenAI shared evals for this unfinalised o1 model to show the world that this isn’t a one-off improvement:

o1 significantly outperforms GPT-4o on the vast majority of reasoning-heavy tasks. It ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).

OpenAI says the model opens up new safety opportunities that it is actively exploring, in areas including reliability, hallucinations, and robustness to adversarial attackers. It saw an uplift in safety metrics by letting the model reason about policies via chain of thought. Could agentic AI bots use such a model to steer people away from conspiracy theories (or the reverse…)?

ChatGPT Plus and Team users were able to access o1 models in ChatGPT from 12th September. Both o1-preview and o1-mini can be selected manually in the model picker, and at launch, weekly rate limits are 30 messages for o1-preview and 50 for o1-mini. OpenAI stated that they’re working to increase those rates and enable ChatGPT to automatically choose the right model for a given prompt.

Subscribe now to discover the actions you need to take as a result of this new model and explore 20+ essential stories impacting you and your business. Stay ahead—tap into the trends shaping the future ⬇
