Meta's Llama AI models now support images, too

Kyle Wiggers

Updated 27 September 2024 at 6:41 pm·6-min read

Benjamin Franklin once wrote that nothing is certain except death and taxes. Let me amend that phrase to reflect the current AI gold rush: Nothing is certain except death, taxes, and new AI models, with the last of those three arriving at an ever-accelerating pace.

Earlier this week, Google released upgraded Gemini models, and, earlier in the month, OpenAI unveiled its o1 model. But on Wednesday, it was Meta's turn to trot out its latest at the company's annual Meta Connect 2024 developer conference in Menlo Park.

Llama's multimodality

Meta's multilingual Llama family of models has reached version 3.2, with the bump from 3.1 signifying that several Llama models are now multimodal. Llama 3.2 11B — a compact model — and 90B, which is a larger, more capable model, can interpret charts and graphs, caption images, and pinpoint objects in pictures given a simple description.

Given a map of a park, for example, Llama 3.2 11B and 90B might be able to answer questions like, "When will the terrain become steeper?" and "What's the distance of this path?" Or, provided a graph showing a company's revenue over the course of a year, the models could quickly spotlight the best-performing months of the bunch.

For developers who wish to use the models strictly for text applications, Meta says that Llama 3.2 11B and 90B were designed to be "drop-in" replacements for 3.1. Both models can be deployed with or without a new safety tool, Llama Guard Vision, that's designed to detect potentially harmful (i.e. biased or toxic) text and images fed to or generated by the models.

In most of the world, the multimodal Llama models can be downloaded from and used across a wide number of cloud platforms, including Hugging Face, Microsoft Azure, Google Cloud, and AWS. Meta's also hosting them on the official Llama site, Llama.com, and using them to power its AI assistant, Meta AI, across WhatsApp, Instagram, and Facebook.

But Llama 3.2 11B and 90B can't be accessed in Europe. As a result, several Meta AI features available elsewhere, like image analysis, are disabled for European users. Meta once again blamed the "unpredictable" nature of the bloc's regulatory environment.

Meta has expressed concerns about — and spurned a voluntary safety pledge related to — the AI Act, the EU law that establishes a legal and regulatory framework for AI. Among other requirements, the AI Act mandates that companies developing AI in the EU commit to charting whether their models are likely to be deployed in "high-risk" situations, like policing. Meta fears that the "open" nature of its models, which give it little insight into how the models are being used, could make it challenging to adhere to the AI Act's rules.

Also at issue for Meta are provisions in the GDPR, the EU's broad privacy law, pertaining to AI training. Meta trains models on the public data of Instagram and Facebook users who haven't opted out — data that in Europe is subject to GDPR guarantees. EU regulators earlier this year requested that Meta halt training on European user data while they assessed the company's GDPR compliance.

Meta relented, while at the same time endorsing an open letter calling for "a modern interpretation" of GDPR that doesn't "reject progress."

Earlier this month, Meta said that it would resume training on U.K. user data after "[incorporating] regulatory feedback" into a revised opt-out process. But the company has yet to share an update on training throughout the bloc.

More compact models

Other new Llama models — models that weren't trained on European user data — are launching in Europe (and globally) Wednesday.

Llama 3.2 1B and 3B, two lightweight, text-only models designed to run on smartphones and other edge devices, can be applied to tasks such as summarizing and rewriting paragraphs (e.g. in an email). Optimized for Arm hardware from Qualcomm and MediaTek, 1B and 3B can also tap tools such as calendar apps with a bit of configuration, Meta says, allowing them to take actions autonomously.

There isn't a follow-up, multimodal or not, to the flagship Llama 3.1 405B model released in July. Given 405B's massive size (it took months to train), it's likely a matter of constrained compute resources. We've asked Meta if there are other factors at play and will update this story if we hear back.

Meta's new Llama Stack, a suite of Llama-focused dev tools, can be used to fine-tune all the Llama 3.2 models: 1B, 3B, 11B, and 90B. Regardless of how they're customized, the models can process up to around 100,000 words at once, Meta says.

A play for mindshare

Meta CEO Mark Zuckerberg often talks about ensuring all people have access to the “benefits and opportunities” of AI. Implicit in this rhetoric, however, is a desire that these tools and models be of Meta’s making.

Spending on models that it can then commoditize forces the competition (e.g. OpenAI, Anthropic) to lower prices, spreads Meta's version of AI broadly, and lets Meta incorporate improvements from the open source community. Meta claims that its Llama models have been downloaded over 350 million times and are in use by large enterprises including Zoom, AT&T, and Goldman Sachs.

For many of these developers and companies, it's immaterial that the Llama models aren't "open" in the strictest sense. Meta's license constrains how certain devs can use them; platforms with over 700 million monthly users must request a special license from Meta that the company will grant at its discretion.

Granted, there aren't many platforms of that size without their own in-house models. But Meta isn't being especially transparent about the process. When I asked the company this month whether it had approved a discretionary Llama license for a platform yet, a spokesperson told me that Meta "didn't have anything to share on the topic."

Make no mistake, Meta’s playing for keeps. It's spending millions lobbying regulators to come around to its preferred flavor of "open" AI, and it's ploughing billions into servers, datacenters, and network infrastructure to train future models.

None of the Llama 3.2 models solve the overriding problems with today’s AI, like its tendency to make things up and regurgitate problematic training data (e.g. copyrighted e-books that might've been used without permission, the subject of a class action lawsuit against Meta). But, as I've written before, they do advance one of Meta's key goals: becoming synonymous with AI, and in particular generative AI.
