YouTube video summary

The Fix For AI's Spending Problem Is Not Good For OpenAI And Anthropic

Business

06 Jun 2026 · 18 min summary · From CNBC

Introduction to Model Routing and Cost Efficiency in AI

Corporate America is discovering a new way to buy AI, which involves using models that are good enough for specific tasks, rather than always using the best and most powerful model, resulting in five to ten times better cost efficiency 10s.
The traditional way of buying AI, where companies pay for the best model regardless of the task, is being replaced by an approach called model routing. Each task is matched to the most suitable model: hard problems go to top models and easy tasks go to cheaper ones 1m30s.
This shift is driven by the increasing costs of AI, with some companies blowing through their annual budgets in just a few months, and the realization that cheaper models can be good enough for everyday work, with a batch of output from a top model costing around $25, compared to under $1 for a cheaper model 2m40s.
The model routing approach is taking off, with one company seeing its volume increase fivefold in just six months, and CEOs like Sam Altman acknowledging that the era of picking one model is over, as companies look for more efficient and cost-effective solutions 3m50s.
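The routing idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two tier names, the difficulty score and its 0.7 threshold, and the per-batch prices, which only loosely follow the figures quoted above (around $25 for a top model, under $1 for a cheaper one). It is not how any particular vendor's routing layer works.

```python
from dataclasses import dataclass

# Hypothetical per-batch prices, loosely based on the figures quoted above.
# $1.00 is used as an upper bound for "under $1".
PRICE_PER_BATCH = {"frontier": 25.00, "cheap": 1.00}


@dataclass
class Task:
    name: str
    difficulty: float        # assumed score in [0, 1]; how it is produced is out of scope
    sensitive: bool = False  # sensitive work stays on the top tier in this sketch


def route(task: Task, threshold: float = 0.7) -> str:
    """Send hard or sensitive tasks to the frontier tier, everything else to the cheap tier."""
    if task.sensitive or task.difficulty >= threshold:
        return "frontier"
    return "cheap"


def total_cost(tasks, always_frontier: bool = False) -> float:
    """Sum per-batch prices, either with routing or with every task on the frontier tier."""
    return sum(
        PRICE_PER_BATCH["frontier" if always_frontier else route(t)] for t in tasks
    )


# Made-up workload: three easy tasks, one hard task, one sensitive task.
tasks = [
    Task("rename variables", 0.10),
    Task("write boilerplate tests", 0.20),
    Task("summarize ticket", 0.15),
    Task("design database migration", 0.90),
    Task("review auth code", 0.40, sensitive=True),
]

routed = total_cost(tasks)
baseline = total_cost(tasks, always_frontier=True)
print(f"routed: ${routed:.2f}, always frontier: ${baseline:.2f}")
```

With this made-up workload, routing costs $53 against $125 for sending everything to the top tier. The saving depends entirely on how much of the workload is easy, which is why the share of routable tasks matters more than the price gap alone.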

Impact of Model Routing on AI Providers and Business Models

However, this shift may not be beneficial for companies like OpenAI and Anthropic, which primarily get paid for the hard or sensitive tasks and have built their business models on endless demand and premium prices, as the routing of easy tasks to cheaper models could reduce their revenue 5m10s.
Companies like Cognition, the maker of the coding agent Devin, are developing routing layers to automatically route tasks across models, and are now offering guarantees, such as an AI productivity guarantee, to ensure that customers get value from their AI investments 8m20s.
The goal is to find a middle ground where the cost of running tasks using AI is balanced with the output received, ensuring that spending $5 on a task is acceptable if it yields $20 of output, but not if it costs $500 to achieve the same output 10s.

Balancing Cost and Output in AI Task Execution

Many companies, including Fortune 500 companies, are hesitant to use open-source or cheaper AI models due to stigma, instead opting for more expensive and secure options, even if it means hosting them locally 1m6s.
The trend of numerous good AI models being released, with dozens of options available, allows users to choose models that fit their specific needs and budgets, and companies are starting to think more about mass automation of tasks 2m6s.
The milestone of having more than 50% of tasks kicked off by agents or events, rather than humans, has been reached, and this shift towards bulk task automation requires the ability to support higher volumes and more efficient agent routing 3m30s.

Agent and Model Routing for Task Automation

Agent routing, which involves assigning the best agents to difficult tasks and cheaper agents to less complex ones, is being implemented, along with a guarantee to prove the value of the output and help customers manage their budgets and ROI 4m40s.
Measuring ROI in the AI world is notoriously difficult, and to prevent customers from gaming the system, the focus is on output and productivity rather than activity, similar to how human software engineers are evaluated 7m20s.
Measuring the effectiveness of AI models should be based on the actual engineering effort and the value it brings, rather than just the number of tokens or lines of code produced, as simply spending billions of tokens does not necessarily mean anything productive is being accomplished 10s.
A more effective approach is to measure the capacity and speed of engineers, as well as the number of issues being solved, and to verify the usefulness of the work done by the AI model, which can be achieved through a conservative system that assigns a zero value to sessions where no useful work is done 1m20s.
This approach has been successfully implemented by companies such as Mercedes-Benz, which saw a significant return on investment (ROI) when using AI to complete tasks that would have otherwise taken months, with some tasks being completed in just eight days 3m40s.
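The conservative accounting described above, where a session counts for nothing unless its work is verified as useful, might look like the following sketch. The `Session` fields, the dollar figures, and the ROI formula are assumptions for illustration; the summary does not say how value is actually estimated or verified.

```python
from dataclasses import dataclass


@dataclass
class Session:
    cost_usd: float        # what the session cost in tokens or compute
    value_usd: float       # hypothetical estimate of the engineering value produced
    verified_useful: bool  # whether the output was checked and found useful


def conservative_roi(sessions) -> float:
    """ROI where unverified sessions contribute cost but zero value."""
    total_cost = sum(s.cost_usd for s in sessions)
    total_value = sum(s.value_usd for s in sessions if s.verified_useful)
    if total_cost == 0:
        return 0.0
    return (total_value - total_cost) / total_cost


# Made-up sessions: the second one produced nothing useful, so its value is ignored.
sessions = [
    Session(cost_usd=5.0, value_usd=20.0, verified_useful=True),
    Session(cost_usd=5.0, value_usd=20.0, verified_useful=False),
    Session(cost_usd=10.0, value_usd=30.0, verified_useful=True),
]

print(f"conservative ROI: {conservative_roi(sessions):.0%}")
```

Here $20 of spend yields $50 of verified value, so the ROI is 150%. Counting the unverified session's claimed $20 would inflate that figure, which is the kind of activity-based gaming the output-focused measure is meant to prevent.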

AI Adoption and Experimentation in Enterprises

The idea that companies need to go through a period of waste to get their employees comfortable with using AI is not universally applicable, and some companies may be at different points in their AI journey, with some still experimenting with AI and others already seeing significant benefits 5m30s.
Model routing, which involves using cheaper models for non-critical workloads, can be an effective way for companies to reduce costs and optimize their AI usage, especially for companies that are still using frontier models, with 95% of enterprises still using these models 8m10s.
The concept of use case saturation, where AI models are used for simple and well-defined tasks, can help companies understand the limitations of their AI models and identify areas where model routing can be applied, such as using cheaper models for tasks like answering simple questions 10m20s.

Model Performance and Cost Optimization

The ability to perform complex tasks such as building a website, fixing bugs, or doing migrations was previously thought to be limited to only the top 1 or 2 models in the industry, but now dozens of models can accomplish these tasks, allowing for optimization on the price performance curve 10s.
As models improve, they can be used for tasks that require the best intelligence, but for boilerplate work, using models that are still good enough for the task can provide five to ten times better cost efficiency 1m20s.
Switching from one model provider to another, such as from Anthropic to a different provider, can be done, but it takes time to learn a new interface and product experience, and the ideal is to have models that can be routed and switched almost invisibly 2m40s.
Having a provider-neutral commitment that gives access to all the different providers offers more optionality, and some companies are starting to think about what they want to bet on and commit to 4m10s.

Security and Trust in AI Model Usage

There are concerns about the security of open-source models like DeepSeek, but there are also great American open-source models available, and securing models is about figuring out how to work with systems that humans use, with processes like code review, QA, and beta deployment 6m20s.
Running models on American soil can free up anxiety, and it is possible to have a mix of price performance with American models, and companies can use models that are good enough for their tasks while still having access to the best intelligence 8m30s.
The concept of AI as an employee requires the same guardrails and security guidelines as any other employee, and hyperscalers are hosting many Chinese models, providing optionality, which is a key motivator, especially when considering margins and budget constraints 10s.

Evolution of AI Model Pricing and Market Dynamics

The mix of AI models is shifting, with big labs releasing more powerful but expensive models, while cheaper models are improving, leading to a price frontier curve with multiple points, allowing people to use and mix different models for various tasks 1m5s.
There will always be a place for frontier models, especially for critical and strategically sensitive work, but the growth of model options may lead to commoditization, potentially hurting companies like OpenAI and Anthropic 2m6s.
The spectrum of difficulty in AI tasks ranges from easy to hard, with easier tasks becoming commoditized and cheaper, but the hardest tasks are worth a lot and will continue to be valuable as the frontier of what counts as the hardest task moves 3m40s.
Many enterprises are using AI models for both hard and easy tasks, which may lead to pricing issues, as they are currently priced as if they are being used for every task, and companies like OpenAI and Anthropic may need to adapt to this changing landscape 5m10s.

Future Growth and Market Potential of AI

Despite potential challenges, there is still a long way to go with AI, and the market has significant room to grow, with less than 1% of potential users currently utilizing AI agents, making it likely that the pie will continue to grow for everyone, including companies like OpenAI and Anthropic 8m20s.
Remaining independent is crucial in this context, as it allows for working with various models and providing neutral evaluations, which is essential for the growth and development of the AI industry 10m40s.

Company Independence and Value Alignment in AI

Cognition is considered one of the labs with a great reputation, and its independence is seen as crucial for being value-aligned with its customers, allowing it to provide unbiased advice on spending and model usage 10s.
The importance of independence is emphasized as it enables the company to act as a neutral transformation partner, helping customers optimize their spending and model usage without being influenced by external interests 1m20s.
There have been offers from other labs to acquire or partner with the company, but they have not been considered interesting due to the company's focus on its independent mission and the potential for growth and innovation 2m30s.

Model Routing as a Strategic Trend

The concept of model routing, which allows for the selection of the most suitable model for a specific task, is seen as a growing trend, with companies like Perplexity having already adopted this approach from the beginning 4m10s.
The company believes that providing productivity guarantees and allowing customers to commit to AI without bearing the risk of its performance is a win-win situation for both parties, and it's an area where the company is focusing its efforts 5m40s.

Model Preferences and Usage Patterns

When it comes to personal use of models for coding tasks, the company's preference is split between GPT 5.5 and Opus 4.8, with GPT being better suited for reasoning tasks and Opus for navigating flows and testing 8m20s.
The ratio of usage between GPT and Opus has shifted over time, with Opus being more dominant in the past, but now the usage is roughly 50/50 between the two models 10m10s.
For everyday tasks, the company uses a variety of models and apps, often comparing their outputs and playing them off against each other to get the best results 11m30s.

Overview of AI Models and Industry Players

The conversation starts with a discussion about AI models. Meta AI, Gemini, Claude, and GPT are named as the primary models used, with the speaker struggling to recall the last one in the moment 10s.
A $10 million guarantee is mentioned, which has generated a significant amount of interest and inbound inquiries from both new and existing customers, indicating a potential sales opportunity 4m42s.

Customer Perspectives and Industry Insights from Cisco

The view from a company doing the routing is presented, followed by an introduction to Jeetu Patel, Cisco's President and Chief Product Officer, who provides insight into how thousands of companies are buying and deploying AI 6m6s.
Jeetu Patel discusses Cisco Live, an annual event in Las Vegas where 20,000 customers gather to learn about the company's roadmap and future plans, including the integration of AI and infrastructure 8m20s.
The main concerns of customers at Cisco Live are discussed, including the constraint on infrastructure, trust in delegating tasks to AI agents, and tokenomics, which refers to the pricing and cost of using AI models 10m30s.
An example is given to illustrate the potential cost of using AI models, with an estimated $200 per week per employee, which can quickly add up to significant amounts, such as $400 million for a company with 40,000 employees or $900 million for a company with 90,000 employees 12m40s.
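The totals in this example can be reproduced with simple arithmetic. The sketch below assumes roughly 50 working weeks per year, which is not stated in the summary but is the value that makes $200 per week match the quoted $400 million and $900 million; with 52 weeks the totals would be about $416 million and $936 million.

```python
def annual_token_spend(employees: int, usd_per_week: float = 200.0,
                       weeks_per_year: int = 50) -> float:
    """Yearly AI spend for a workforce, given a flat weekly cost per employee.

    weeks_per_year=50 is an assumption chosen to match the quoted totals.
    """
    return employees * usd_per_week * weeks_per_year


for headcount in (40_000, 90_000):
    print(f"{headcount:,} employees: ${annual_token_spend(headcount):,.0f} per year")
```

At $200 per week, each employee costs about $10,000 a year in tokens under this assumption, so the total scales linearly with headcount.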

Challenges and Costs of AI Token Usage

The high cost of token usage is acknowledged, with Jeetu Patel confirming that it is a widespread issue, not just within Cisco, but across many companies 16m0s.
The conversation starts with the topic of budgeting for engineering, and it is mentioned that no one budgeted enough for the skyrocketing costs, which is a good thing because it means people are using the technology 10s.
There are three phases to get familiar with a new technology: getting familiar, getting good, and getting efficient, and a certain amount of wastage is necessary to get staff comfortable with it 42s.
The concept of token economics and pricing was not a thing last year, but with the introduction of agents and recursive self-learning loops, the equation has changed, and companies need to focus on tokenomics to monitor and control agent behavior 2m6s.
Cisco has developed a tool to monitor and control agent behavior, and the company has had to adjust its budget to accommodate the high costs of token usage, deprioritizing costs elsewhere and putting them into tokens 4m30s.
The company has had to make tough decisions, including layoffs, but the restructuring has been focused on moving resources to important areas such as silicon and optics, rather than making way for token usage 8m10s.

Infrastructure and Network Impacts of AI Agents

The current networking supercycle, with its increased demand for bandwidth, is also having an impact on the company's decisions and investments, and the introduction of desk side computing is bringing AI closer to users 10m40s.
Agents are more consumptive in network bandwidth than humans, with an agent requiring 450% more network bandwidth to conduct the same task as a human, making it more expensive and necessitating accommodations for increased network capacity 10s.
To address this issue, companies like Cisco are exploring ways to monitor waste and become more efficient in their token usage, including model routing, which can significantly reduce costs, such as from $25 to under $1 for the same output 2m6s.

Model Routing and Cost Reduction Strategies

Model routing involves using an intelligent routing layer to direct tasks to the most cost-effective models, with some companies opting to develop this capability in-house, while others may rely on third-party providers, and this layer can help reduce token costs by up to 95% 2m6s.
Companies are becoming more comfortable with using smaller, more efficient models, and the adoption of intelligent model routing is expected to become more prominent in architectures, potentially impacting the demand for premium models from providers like OpenAI and Anthropic 2m6s.
The decreasing cost per token can lead to increased usage, but it also raises the question of whether the value delivered is commensurate with the cost. A falling price is not necessarily a problem; it may be an opportunity for more widespread adoption as the industry navigates the challenges of token costs and value generation 2m6s.
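The cost-reduction figure quoted for the routing layer can be sanity-checked against the per-batch prices mentioned earlier in the summary. The sketch below assumes $25 for a frontier model and $1 as an upper bound for a cheaper one, and treats the share of work routed to the cheaper model as a free parameter; both the function and the flat pricing are simplifications for illustration.

```python
def blended_savings(share_routed: float, frontier_price: float = 25.0,
                    cheap_price: float = 1.0) -> float:
    """Fraction of spend saved when a share of the work moves to the cheaper model."""
    if not 0.0 <= share_routed <= 1.0:
        raise ValueError("share_routed must be between 0 and 1")
    blended = share_routed * cheap_price + (1.0 - share_routed) * frontier_price
    return 1.0 - blended / frontier_price


for share in (0.25, 0.50, 0.75, 1.00):
    print(f"{share:.0%} routed to cheap model: {blended_savings(share):.0%} saved")
```

Under these assumed prices, a saving near the quoted "up to 95%" is only reached when almost all work goes to the cheaper model (96% when everything is routed). Routing half the work saves 48%, so real-world savings depend on how much of the workload is actually easy.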

AI Value and ROI Challenges

The AI trade has been experiencing skepticism regarding return on investment and value, with companies scrambling to demonstrate the value of their products, and this issue is not new, but rather one that has been present in the industry for some time 10s.
To show growth, AI companies need to demonstrate a persistent demand signal, where people continue to use their products because they provide value, and this can be achieved by showing value rather than offering large sums of money 1m20s.
Companies may crack down on their budgets and look for creative solutions to the increasing prices of AI tokens, which may not be commensurate with the value generated, similar to what Uber did 2m6s.

Shift to Local and On-Premise AI Models

As prices for AI tokens continue to rise, companies, especially Fortune 500 companies, may shift to running local models, which could lead to a new class of computing, with companies like Nvidia potentially being affected as they sell a lot of GPUs in data centers 4m30s.
Running local models, also known as desk side computing, involves processing workloads locally, which can reduce costs, but also requires more network bandwidth and coordination between different models and agents, leading to a complex routing challenge and trust decision 6m40s.
The shift to local models and desk side computing will require a significant infrastructure buildout, and according to the Jevons paradox, this could lead to increased growth and demand for computing resources, rather than a reduction in costs 10m0s.

Bandwidth and Infrastructure Demands of AI

The growth of AI is leading to an increased need for bandwidth in offices, resulting in a 25% growth in the last quarter, which is significantly higher than the historical 3% growth rate, and this trend is expected to continue as agents start working on the desk side 10s.
The cost per token is expected to decrease, while usage and traffic will increase, and token generation will be distributed across various locations, not just data centers, allowing for more efficient use of resources 1m20s.

Security and Trust in AI Agent Operations

Companies like Robinhood are allowing agents to perform tasks such as trading, and Cisco is also enabling agents to take actions, but security and trust are crucial considerations in this trend, and measures are being taken to ensure secure and trustworthy interactions 2m6s.
To address the issue of trust, a human-in-the-loop approach is being implemented, where tasks can be delegated to agents with checkpoints for human review, and the decision to make the process fully autonomous can be made individually by network operations or security personnel 4m30s.
Model routing was a significant topic of discussion at Cisco Live, as customers are concerned about the cost of tokens and the infrastructure required to support AI, and Cisco is working to make it more affordable and efficient for them 6m40s.

Cisco's AI Strategy and Model Development

Cisco has invested heavily in building small, efficient models, and is focusing on making AI more accessible and cost-effective for its customers, while also prioritizing security and trust in its offerings 9m10s.
Cisco is not offering Chinese models to its customers, due to security concerns and the regulated nature of its customers' industries, and is instead focusing on building its own models and solutions that meet the needs of its security-conscious customers 11m30s.
The development of special purpose models for specific tasks, such as security and networking, has been successful, with some 8 billion parameter models performing better than 120 billion parameter models in certain benchmarks 10s.

Specialized AI Models and Performance

These special purpose models are designed to excel in particular areas, such as security or observability, rather than being general-purpose models, and the focus is on being the world's best in these specific areas 1m30s.
The current top models being used are GPT 5.5 and Opus 4.8, with GPT 5.5 having recently caught up with Opus 4.8, and there is a desire to see more competition in the market 4m20s.
The usage of these models is expected to become more efficient, rather than charging more for the models, with the industry likely to focus on making the models more efficient in token generation 8m30s.

Model Usage and Market Competition

Different models are being used for various tasks, such as ChatGPT, GPT 5.5, Opus 4.8, and Gemini, with each having its strengths and weaknesses, and being used for tasks such as current events, reasoning, and cooking questions 10m40s.
The partnerships with companies like OpenAI and Anthropic are valued, with their forward-deployed engineers and great job in deploying models being appreciated, and the competition between them is seen as beneficial 6m10s.
The individual expressing their thoughts mentions that they do not recall the last time they used a search engine, and when they do use it, they often rely on Google News or AI mode, giving credit to Demis and Sundar for their full-stack approach, but noting that they lack a coding product 10s.

Elon Musk's Influence and Potential in AI

The conversation shifts to discussing major models and labs, with a notable absence of mention of xAI or Grok, despite acknowledging Elon's significant role in the AI space, including his plans for orbital data centers and the potential of his Colossus data centers, one of which has been given to Anthropic 2m6s.
Elon's potential to become a formidable hyperscaler is discussed, with the individual noting that while he already has a great model company, he lacks the coding piece, but could potentially change this with his work on Cursor, and his use of Grok is seen as a notable use case 2m6s.
The individual believes that Elon's efforts, including his work with Cursor and Grok, could have a huge upside and meaningfully change the landscape, making him a significant player in the AI space 2m6s.

Conclusion and Acknowledgments

The conversation concludes with an expression of appreciation for the discussion and a look forward to future conversations, with thanks given to the production team, including Sami, Jasmine, Robert, and Evan 9m30s.
