Apple’s latest research about running large language models on smartphones offers the clearest signal yet that the iPhone maker plans to catch up with its Silicon Valley rivals in generative artificial intelligence.
The paper, entitled “LLM in a Flash,” offers a “solution to a current computational bottleneck,” its researchers write.
Its approach “paves the way for effective inference of LLMs on devices with limited memory,” they said. Inference refers to how large language models, the large data repositories that power apps like ChatGPT, respond to users’ queries. Chatbots and LLMs normally run in vast data centers with much greater computing power than an iPhone.
The paper was published on December 12 but caught wider attention after Hugging Face, a popular site for AI researchers to showcase their work, highlighted it late on Wednesday. It is the second Apple paper on generative AI this month and follows earlier moves to enable image-generating models such as Stable Diffusion to run on its custom chips.
Device manufacturers and chipmakers are hoping that new AI features will help revive the smartphone market, which has had its worst year in a decade, with shipments falling an estimated 5 percent, according to Counterpoint Research.
Despite launching one of the first virtual assistants, Siri, back in 2011, Apple has been largely left out of the wave of excitement about generative AI that has swept through Silicon Valley in the year since OpenAI launched its breakthrough chatbot ChatGPT. Apple has been viewed by many in the AI community as lagging behind its Big Tech rivals, despite hiring Google’s top AI executive, John Giannandrea, in 2018.
While Microsoft and Google have largely focused on delivering chatbots and other generative AI services over the Internet from their vast cloud computing platforms, Apple’s research suggests that it will instead focus on AI that can run directly on an iPhone.
Apple’s rivals, such as Samsung, are gearing up to launch a new kind of “AI smartphone” next year. Counterpoint estimated more than 100 million AI-focused smartphones would be shipped in 2024, with 40 percent of new devices offering such capabilities by 2027.
The head of the world’s largest mobile chipmaker, Qualcomm chief executive Cristiano Amon, forecast that bringing AI to smartphones would create a whole new experience for consumers and reverse declining mobile sales.
“You’re going to see devices launch in early 2024 with a number of generative AI use cases,” he told the Financial Times in a recent interview. “As those things get scaled up, they start to make a meaningful change in the user experience and enable new innovation which has the potential to create a new upgrade cycle in smartphones.”
More sophisticated virtual assistants will be able to anticipate users’ actions such as texting or scheduling a meeting, he said, while devices will also be capable of new kinds of photo editing techniques.
Google this month unveiled a version of its new Gemini LLM that will run “natively” on its Pixel smartphones.
Running the kind of large AI model that powers ChatGPT or Google’s Bard on a personal device brings formidable technical challenges, because smartphones lack the huge computing resources and energy available in a data center. Solving this problem could mean that AI assistants respond more quickly than they do from the cloud and even work offline.
Ensuring that queries are answered on an individual’s own device without sending data to the cloud is also likely to bring privacy benefits, a key differentiator for Apple in recent years.
“Our experiment is designed to optimize inference efficiency on personal devices,” its researchers said. Apple tested its approach on models including Falcon 7B, a smaller version of an open source LLM originally developed by the Technology Innovation Institute in Abu Dhabi.
Optimizing LLMs to run on battery-powered devices has been a growing focus for AI researchers. Academic papers are not a direct indicator of how Apple intends to add new features to its products, but they offer a rare glimpse into its secretive research labs and the company’s latest technical breakthroughs.
“Our work not only provides a solution to a current computational bottleneck but also sets a precedent for future research,” wrote Apple’s researchers in the conclusion to their paper. “We believe as LLMs continue to grow in size and complexity, approaches like this work will be essential for harnessing their full potential in a wide range of devices and applications.”
Apple did not immediately respond to a request for comment.