• 2024-08-10

Meta releases the open-source AI model Llama 3.1, trained on more than 16,000 Nvidia H100 GPUs

On July 23 local time, Meta unveiled its most powerful open-source AI model to date, Llama 3.1. The model is not only large in scale but also delivers performance comparable to the strongest proprietary models, marking a significant milestone for open-source AI.

The Llama 3.1 model family consists of three versions, with the largest flagship version featuring 405B (405 billion) parameters, making it the largest open-source AI model released in recent years. The other two smaller versions have 70 billion (70B) and 8 billion (8B) parameters, respectively.

Meta claims that the Llama 3.1 405B model has outperformed OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet in multiple benchmark tests.

"Up until now, open-source large language models have mostly lagged behind proprietary models in terms of functionality and performance. Now, we are entering a new era led by open-source."

Meta wrote in its official blog, "To date, the total number of downloads for all versions of Llama has exceeded 300 million, and this is just the beginning."

Model Evaluation

It is reported that the Llama 3.1 series models show improvements across the board and are comparable to top AI models in general knowledge, steerability, mathematics, tool use, and multilingual translation.

Meta has conducted a comprehensive evaluation of Llama 3.1, including tests on over 150 benchmark datasets, covering a variety of languages and task types.

In addition, the model has undergone extensive human evaluation, comparing it with competitive models in real-world application scenarios.

Overall, the Llama 3.1 405B model performs comparably to GPT-4, GPT-4o, and Claude 3.5 Sonnet on tasks such as reasoning and mathematics, and even outperforms them in areas such as long-text handling and multilingual tasks. However, on code benchmarks, the Llama 3.1 405B model did not perform as well as Claude 3.5 Sonnet.

Additionally, in the comparison of smaller models, Llama 3.1's 8B and 70B versions also performed exceptionally well, proving highly competitive against closed- and open-source models of similar scale and beating their rivals in almost all tests.

Model Capabilities and Applications

Meta stated that the Llama 3.1 series models have a context window of 128K tokens, roughly the length of a 50-page book, and support multiple languages including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, greatly enhancing their ability to handle long texts and multilingual content.

Ahmad Al-Dahle, Meta's Vice President of generative AI, said that Llama 3.1 can integrate search engine application programming interfaces (APIs) to retrieve information from the internet in response to complex queries, and can call multiple tools to complete tasks. For instance, it can generate and execute Python code to draw charts.
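The tool-use workflow described above can be illustrated with a brief, hypothetical sketch. It assumes the instruction-tuned 8B weights are available on Hugging Face (the repo id `meta-llama/Llama-3.1-8B-Instruct`, the prompts, and the chat-message handling are assumptions for illustration, not Meta's reference tooling) and that a recent version of the `transformers` library is installed.

```python
# Hypothetical sketch: ask a Llama 3.1 Instruct model to write chart-drawing
# Python code. Model id and prompt wording are illustrative assumptions, not
# Meta's official agent workflow. Requires transformers + accelerate.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed Hugging Face repo id
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You write self-contained matplotlib code."},
    {"role": "user", "content": "Plot y = x**2 for x in 0..10 and save it as chart.png."},
]

# Recent transformers versions accept chat messages directly and return the
# extended conversation; the last message is the model's reply.
result = generator(messages, max_new_tokens=300)
generated_code = result[0]["generated_text"][-1]["content"]
print(generated_code)

# In a real agent loop, the returned snippet would be sandboxed before execution.
```

The point of the sketch is only that the model returns runnable code as text; executing that code (to actually render the chart) is a separate, security-sensitive step.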

However, Llama 3.1 is not yet a multimodal model and only accepts text input. Meta has indicated that it is developing Llama models that can recognize images and video and that can understand (and generate) speech.

As with previous Llama models, Llama 3.1 405B is available for download or for use on cloud platforms such as Amazon Web Services, Microsoft Azure, and Google Cloud. It will also be integrated into Meta's own products to provide services such as chatbots and information queries.

The model is currently in use on Meta.ai and the Meta AI virtual assistant. Starting this week, Llama 3.1 will first be launched on WhatsApp and the Meta.ai website in the United States, followed by Instagram and Facebook in the coming weeks.

Although the state-of-the-art Llama 3.1 405B model can be used for free on Meta.ai, there is a limit on the number of prompts per week (the exact limit is not specified); once it is exceeded, the service switches to the smaller 70B model. This suggests the 405B model is still too costly for Meta to run at full scale.

Model Scale and Training

The training scale of the Llama 3.1 405B model is truly astonishing.

Meta used more than 16,000 Nvidia H100 GPUs to train the model on a dataset of over 15 trillion tokens.

Although Meta has not disclosed specific development costs, an estimate based on the price of the Nvidia chips alone puts the figure in the hundreds of millions of dollars.

To achieve training at this scale, Meta made significant optimizations to its entire training stack.

In terms of model architecture, they opted for a standard decoder-only Transformer rather than a mixture-of-experts model, in order to maximize training stability.
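As a rough illustration of that design choice, the sketch below contrasts the dense feed-forward layer used in a standard decoder-only block with a toy mixture-of-experts (MoE) layer that routes each token to one of several expert MLPs. The class names, dimensions, and top-1 routing are invented for illustration and do not reflect Llama 3.1's actual configuration.

```python
# Illustrative sketch only: dense feed-forward (as in a decoder-only Transformer)
# vs. a toy mixture-of-experts layer. Sizes and names are made up; per the
# article, Llama 3.1 uses the dense variant throughout for training stability.
import torch
import torch.nn as nn


class DenseFFN(nn.Module):
    """Dense feed-forward block: every token passes through the same MLP."""

    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.SiLU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)


class TopOneMoE(nn.Module):
    """Toy MoE: a learned router sends each token to a single expert MLP."""

    def __init__(self, d_model=512, d_hidden=2048, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(DenseFFN(d_model, d_hidden) for _ in range(n_experts))

    def forward(self, x):
        # x: (tokens, d_model). Pick one expert per token (top-1 routing).
        expert_idx = self.router(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out


tokens = torch.randn(8, 512)
print(DenseFFN()(tokens).shape, TopOneMoE()(tokens).shape)  # both (8, 512)
```

The dense path has no routing decisions, which is one reason it is generally easier to keep stable during very large training runs.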

Regarding training data, Meta improved data quality by refining its pre-processing and data-filtering pipelines.

Additionally, they chose an iterative post-training procedure, with "each round using supervised fine-tuning and direct preference optimization, continuously enhancing model performance with high-quality synthetic data."
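For readers unfamiliar with direct preference optimization (DPO), the snippet below sketches its standard loss on a batch of preference pairs. It is a generic, textbook-style illustration of the technique rather than Meta's post-training code; the tensor values and the beta coefficient are assumptions.

```python
# Generic sketch of the direct preference optimization (DPO) loss, not Meta's
# implementation. Inputs are summed log-probabilities of the preferred ("chosen")
# and dispreferred ("rejected") responses under the policy being trained and
# under a frozen reference model.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of policy vs. reference for each response.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # DPO maximizes the margin between chosen and rejected log-ratios.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()


# Toy usage with made-up log-probabilities for a batch of 4 preference pairs.
lp = lambda *vals: torch.tensor(vals)
loss = dpo_loss(lp(-12.0, -9.5, -11.0, -8.0), lp(-14.0, -10.0, -13.5, -9.0),
                lp(-12.5, -9.8, -11.2, -8.4), lp(-13.0, -9.9, -12.8, -8.8))
print(float(loss))
```

In an iterative setup such as the one the article describes, each round would generate fresh (often synthetic) preference pairs, run supervised fine-tuning, and then apply a preference-optimization step like this one.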

Open Source Strategy and Ecosystem

Despite the high development costs, Meta remains committed to open-sourcing the Llama models.

In a public letter, Meta CEO Mark Zuckerberg argued that open-source AI models will eventually surpass proprietary ones and are already improving at a faster rate, ultimately becoming, like the open-source operating system Linux, the foundation that powers most phones, servers, and devices.

He predicts that "the release of Llama 3.1 will be a turning point for the industry, with most developers favoring the use of open-source models in the future."

Image | Zuckerberg's public letter (Source: Meta)

To promote Llama 3.1, Meta is collaborating with more than 20 companies, including Microsoft, Amazon, Google, Nvidia, and Databricks, to help developers deploy their own models.

Meta claims that the operational cost of Llama 3.1 in a production environment is approximately half that of OpenAI's GPT-4o.

In the meantime, Meta has updated the licensing terms for Llama, allowing developers to create third-party AI models based on the outputs of the Llama 3.1 model.

This change addresses a major criticism from the AI community regarding Meta's models and is part of the company's active efforts to gain a voice in the AI field.

Furthermore, to address the model's safety and ethics, Meta has for the first time included potential cybersecurity and biochemical-risk use cases in the "red team" (adversarial) testing of Llama 3.1.

They have also publicly released a complete reference system that includes several example applications and new components, such as the multilingual safety model Llama Guard 3 and the prompt-injection filter Prompt Guard.
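To make the safety-components point concrete, here is a hedged sketch of how a Llama Guard-style classifier is typically invoked through the Hugging Face `transformers` chat template. The repo id `meta-llama/Llama-Guard-3-8B` and the output convention ("safe" or "unsafe" plus a hazard category) are assumptions based on how earlier Llama Guard releases behave; Meta's model card is the authoritative reference.

```python
# Hedged sketch: moderating a user prompt with a Llama Guard-style classifier.
# Repo id and output format are assumptions; see the official model card.
# Requires transformers + accelerate and access to the gated weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [{"role": "user", "content": "How do I make a chart of monthly sales in Python?"}]

# The model's chat template wraps the conversation in its safety-taxonomy prompt.
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=30)
verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # expected to begin with "safe" or "unsafe" plus a category code
```

Components like this sit in front of (or behind) the main model in the reference system, filtering unsafe prompts and responses.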

However, the issue of training data remains controversial. Meta refuses to disclose the specific sources of data, only stating that synthetic data was used to improve the model.

In summary, the release of the Llama 3.1 series marks the first time open-source AI models have matched top proprietary models in performance. This could have a profound impact on the AI industry, driving further innovation and adoption.
