Lumian Gen AI Newsletter Issue #48
Gemini’s AI Lead, Microsoft x TikTok, Meta Shuts AI Celebs
Welcome to the 48th edition of the Lumian Weekly Gen AI Newsletter!
Imagine you've been enjoying an all-you-can-eat buffet for years, piling your plate high with whatever you want. Then one day, the buffet shuts down and you're told it’s now a pay-per-plate restaurant. Welcome to the world of AI data.
For years, AI developers have had it good, scraping the internet for text, images, and videos to train their models. It was a digital free-for-all. But according to a report from the Data Provenance Initiative, that golden age is ending. Websites are slamming the door on data scrapers, and it's causing a full-blown crisis.
Out of 14,000 web domains that typically feed AI datasets, a staggering 25% of high-quality data sources have put up "No Trespassing" signs, thanks to the trusty old Robots Exclusion Protocol (robots.txt). It’s like showing up at your favorite buffet only to find a giant padlock on the door.
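For the curious, here is a minimal sketch of how a well-behaved crawler might read one of those "No Trespassing" signs. It uses Python's standard-library robots.txt parser. The robots.txt contents, the `example.com` URL, and the `can_fetch` helper are made up for illustration, with `GPTBot` standing in as an example AI crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is blocked site-wide,
# every other user agent is still allowed in.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network call
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print("GPTBot allowed:      ", can_fetch("GPTBot", page))
    print("SomeOtherBot allowed:", can_fetch("SomeOtherBot", page))
```

Keep in mind that robots.txt is a request, not a lock: it only works when the crawler chooses to check it and honor the answer.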
High-quality data is the secret sauce that makes AI models smart, accurate, and, well, useful. But as more and more data gets locked behind paywalls, terms of service, and legal threats, AI developers are scrambling like diners at a closed buffet. The machines' appetites have outgrown the internet’s pantry. By 2028, according to Epoch AI, the high-quality textual data on the internet will be all gobbled up. This looming “data wall” threatens to slow AI progress.
That's a big deal. When you hit a wall, you have two options: climb over it or find a way around it. AI developers are trying both. They’re focusing on data quality over quantity, filtering and sequencing data to maximize learning. This curation is increasingly seen as the “main differentiator” between AI models. Quality sources, like academic textbooks, are now the crème de la crème.
But it’s not just about better data; it’s about diverse data. Leading models like OpenAI’s GPT-4 and Google’s Gemini are now feasting on a smorgasbord of text, images, video, and audio. Training on video is particularly tough due to its density, requiring models to digest fewer frames to keep their digital waistlines in check.
Then there’s the ownership issue. Much of the material used to train LLMs is copyrighted and used without consent. Getty Images sued Stability AI for unauthorized use of its image store. The New York Times sued OpenAI and Microsoft for copyright infringement. Others, like News Corp, struck lucrative deals. In the US, AI companies argue this falls under "fair use." But scale changes the equation.
So what’s next? AI developers are turning to synthetic data—AI-generated content used to train other AI. AlphaGo Zero, an AI that mastered the game of Go, learned by playing millions of matches against itself, without pre-existing data. This "reinforcement learning" could apply to other areas, like writing math proofs. By generating multiple first steps and having a “helper” AI judge them, models could improve their outputs.
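The "generate several first steps, let a helper judge them" idea can be sketched in a few lines. This is a toy illustration, not how any particular lab does it: `best_of_n`, `toy_propose`, `toy_judge`, and the hard-coded scores are all hypothetical stand-ins for a real generator model and a real judge model.

```python
import random
from typing import Callable, Dict


def best_of_n(propose: Callable[[], str], judge: Callable[[str], float], n: int) -> str:
    """Sample n candidate steps and keep the one the judge scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [propose() for _ in range(n)]
    return max(candidates, key=judge)


# Toy stand-ins: possible first steps for proving "the sum of two even
# numbers is even", with made-up scores a helper model might assign.
FIRST_STEPS: Dict[str, float] = {
    "Restate the claim in different words": 0.2,
    "Try a few numeric examples": 0.5,
    "Write the two numbers as 2a and 2b": 0.9,
}


def toy_propose(rng: random.Random) -> str:
    """Stand-in for a generator model: pick a candidate step at random."""
    return rng.choice(list(FIRST_STEPS))


def toy_judge(step: str) -> float:
    """Stand-in for a helper model: look up a fixed score."""
    return FIRST_STEPS[step]


if __name__ == "__main__":
    rng = random.Random(0)
    print(best_of_n(lambda: toy_propose(rng), toy_judge, n=8))
```

The whole scheme is only as good as the judge, which is exactly the limit that shows up once you leave games and proofs.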
Yet, this approach has limits. In gaming, it’s clear what a "win" looks like. In fields like healthcare or education, defining a “good” decision is trickier and requires costly expert data. As pre-training data dries up, post-training becomes crucial. Companies like Scale AI and Surge AI are booming, collecting human feedback to refine models. The best labelers can earn up to $100 an hour.
Not surprisingly, the free-for-all days are over, and it’s time for a strategic, respectful approach to data collection. The challenge now is finding new data sources or sustainable alternatives to keep the AI engines running. And my sense is that it isn’t just about feeding the beast but about keeping it well-fed and happy without burning down the digital kitchen. Because when the data well runs dry, everyone feels the pinch.
Happy reading! 📚🤖🎵
In this week’s issue:
News Flash: Gemini’s AI Lead, Microsoft x TikTok, Meta Shuts AI Celebs
Winning with Consumers in AI: Forerunner Ventures Report
AI in Practice: AI drawing tools you can use today
Fundraising: The biggest deals in AI
Nerd Out: Technical and Business Content for Everyone
⏱️ News Flash
The 2-Minute Scoop to Keep You in the Loop
What's the Buzz?
Google has launched its cutting-edge AI model, Gemini 1.5 Pro, surpassing major competitors like GPT-4o.
Breaking It Down
Gemini 1.5 Pro has taken the top spot on the LMSYS Chatbot Arena leaderboard with an Elo score of 1300, excelling in multilingual tasks, mathematics, complex prompts, and coding. Its context window of up to two million tokens lets it process vast amounts of information in a single prompt, a significant improvement over previous models.
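What does a rating like 1300 actually buy you? Under the classic Elo formula, a rating gap translates into an expected head-to-head win rate. The sketch below uses that textbook formula; the leaderboard's own rating method may differ in its details, and the 1200-rated rival is a hypothetical, not a real leaderboard entry.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical matchup: a 1300-rated model vs. a 1200-rated one.
    print(f"Expected win rate: {expected_score(1300, 1200):.1%}")
```

The takeaway: under this formula even a 100-point gap means the lower-rated model still wins a good share of matchups, so leaderboard leads are rarely blowouts.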
Why It Matters
This advancement in AI technology by Google not only intensifies competition but also opens up new possibilities for automation and decision support across various industries, potentially transforming enterprise operations.
What's the Buzz?
Microsoft's cloud AI business is thriving, thanks to substantial payments from TikTok for access to OpenAI’s models, though TikTok is developing its own AI capabilities.
Breaking It Down
TikTok pays Microsoft nearly $20M (yes, million!) monthly for access to OpenAI’s models, accounting for a significant portion of Microsoft's cloud AI revenue, a business projected to hit $1B annually. However, ByteDance's internal development of a large language model could reduce its reliance on Microsoft's services, posing a risk to future revenue growth.
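A quick back-of-the-envelope check shows why one client matters so much here. It takes the two figures above at face value and treats "nearly $20M" as a round $20M, so the result is an upper-end estimate rather than a reported number.

```python
MONTHLY_PAYMENT_USD = 20_000_000            # "nearly $20M" a month, rounded up
PROJECTED_ANNUAL_REVENUE_USD = 1_000_000_000  # the $1B annual projection

annual_payment = MONTHLY_PAYMENT_USD * 12
share = annual_payment / PROJECTED_ANNUAL_REVENUE_USD

if __name__ == "__main__":
    print(f"Annualized TikTok spend: ${annual_payment:,}")
    print(f"Share of the $1B projection: {share:.0%}")
```

That works out to about $240M a year, or roughly a quarter of the projected total riding on a single customer.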
Why It Matters
Microsoft's substantial revenue from TikTok highlights the importance of AI integration in cloud services, but also underscores the vulnerability of relying heavily on a single client. Diversifying its client base and AI offerings is crucial to maintaining market leadership and ensuring steady growth amidst potential shifts in client needs and competitive pressures.
What's the Buzz?
Meta has scrapped its AI-powered celebrity chatbots after they failed to gain traction and were deemed creepy by users.
Breaking It Down
Meta paid millions to celebrities like Tom Brady and Kendall Jenner to create AI personas for platforms like Instagram and Facebook. Despite this investment, the chatbots failed to attract significant followings, leading Meta to pull the plug on the project.
Why It Matters
The failure of Meta's AI celebrity chatbots highlights the challenges of integrating AI into social media in a way that resonates with users and underscores the importance of genuine engagement over high-profile gimmicks in the AI landscape. Meanwhile, Google has hired back the AI leadership of Character.AI, the dominant AI player in this space.
🤑 Winning with Consumers in AI
Forerunner Ventures Report
A recent report from Forerunner Ventures offers a compelling examination of how AI can reshape consumer experiences. Several key ideas stood out, particularly in how they navigate the intersection of value and technological shifts, the balance between access and editing, and AI's role in trust-building.
Value Shifts and Technological Advancements
A significant theme in the report is the evolving consumer priorities towards simplicity and personalization amidst an overwhelming array of choices. These value shifts align seamlessly with advancements in AI & ML, which are poised to meet these demands through enhanced capabilities. Generative AI, in particular, represents a frontier of innovation, providing tailored and efficient solutions that resonate with modern consumer expectations.
From Peak Access to Effective Editing
We are currently experiencing what Forerunner Ventures terms "Peak Access." The digital era’s proliferation of choices and constant information flow, while advantageous, often results in decision paralysis. For example, the sheer volume of options on streaming platforms can lead to more time spent deciding what to watch than actually watching content, contributing to decision fatigue.
AI has the potential to mitigate this by transitioning from merely providing access to curating and editing options based on individual preferences. This shift from access to editing can streamline decision-making processes, offering personalized recommendations that align closely with user tastes. An AI capable of discerning your specific vacation rental preferences and narrowing down hundreds of choices to a select few epitomizes this potential. Such targeted assistance could substantially reduce decision fatigue and enhance overall consumer satisfaction.
Building Trust in AI
For AI to genuinely alleviate decision fatigue and enhance user experiences, it must build and maintain consumer trust. Forerunner Ventures highlights a phased trust-building approach, beginning with a "do it with me" model where AI assists users in decision-making. This collaborative phase is crucial for users to gain confidence in AI's capabilities. As AI demonstrates reliability and effectiveness, the model can gradually transition to "do it for me," where AI autonomously manages more tasks.
Throughout this process, transparency regarding AI's capabilities and limitations is essential. Consumers must have a clear understanding of what AI can and cannot do to set realistic expectations. This mirrors the gradual adoption of autonomous driving technologies, where incremental trust is built through features like lane assist before fully autonomous driving is adopted.
The team at Forerunner Ventures is great at all things consumer! This insightful report is a recommended read for anyone interested in the future of consumer tech and AI's evolving role in it.
🚀 AI in Practice
Cutting-Edge AI Drawing Tools You Can Use Today
Simply Draw - AI feedback meets art - anyone can learn to draw
Drawings Alive - Breathe life into drawings with AI
🤑 Fundraising
The (AI) Intelligent Investor
🤖 Nerd Out
Technical and Business Readings
😜 AI’s Productivity Hack
No Meetings!
If you were forwarded this newsletter, you can access more of our content by subscribing here.
Best,