There’s a new AI model family on the block, and it’s one of the few that can be reproduced from scratch.
On Tuesday, Ai2, the nonprofit AI research organization founded by the late Paul Allen, released OLMo 2, the second family of models in its OLMo series. (OLMo’s short for “Open Language Model.”) While there’s no shortage of “open” language models to choose from (see: Meta’s Llama), OLMo 2 meets the Open Source Initiative’s definition of open source AI, meaning the tools and data used to develop it are publicly available.
The Open Source Initiative, the long-running institution that aims to define and “steward” all things open source, finalized its open source AI definition in October. But the first OLMo models, released in February, met the criterion as well.
“OLMo 2 [was] developed start-to-finish with open and accessible training data, open-source training code, reproducible training recipes, transparent evaluations, intermediate checkpoints, and more,” Ai2 wrote in a blog post. “By openly sharing our data, recipes, and findings, we hope to provide the open-source community with the resources needed to discover new and innovative approaches.”
There are two models in the OLMo 2 family: one with 7 billion parameters (OLMo 7B) and one with 13 billion parameters (OLMo 13B). Parameters roughly correspond to a model’s problem-solving abilities, and models with more parameters generally perform better than those with fewer parameters.
Like most language models, OLMo 2 7B and 13B can perform a range of text-based tasks, like answering questions, summarizing documents, and writing code.
To train the models, Ai2 used a data set of 5 trillion tokens. Tokens represent bits of raw data; 1 million tokens is equal to about 750,000 words. The training set included websites “filtered for high quality,” academic papers, Q&A discussion boards, and math workbooks “both synthetic and human generated.”
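To make that token-to-word ratio concrete, here’s a minimal sketch using the Hugging Face `transformers` library; the model identifier is an assumption based on Ai2’s usual Hugging Face naming, so check Ai2’s model page for the exact ID.

```python
# Minimal sketch: comparing token counts to word counts with a
# Hugging Face tokenizer. The model ID below is an assumption;
# verify the exact OLMo 2 identifier on Ai2's Hugging Face page.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")

text = "Open language models are trained on trillions of tokens of text."
tokens = tokenizer.encode(text)
words = text.split()

print(f"{len(words)} words -> {len(tokens)} tokens")
# English prose typically tokenizes to roughly 4 tokens per 3 words,
# which is where the "1 million tokens ~ 750,000 words" figure comes from.
```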
Ai2 claims the result is models that are competitive, performance-wise, with open models like Meta’s Llama 3.1 release.
“Not only do we observe a dramatic improvement in performance across all tasks compared to our earlier OLMo model but, notably, OLMo 2 7B outperforms Llama 3.1 8B,” Ai2 writes. “OLMo 2 [represents] the best fully-open language models to date.”
The OLMo 2 models and all of their components can be downloaded from Ai2’s website. They’re under the Apache 2.0 license, meaning they can be used commercially.
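For readers who want to try the models locally, here’s a minimal sketch of running OLMo 2 7B for text generation with Hugging Face `transformers`; again, the model ID is an assumption to be verified against Ai2’s release page.

```python
# Minimal sketch: generating text with OLMo 2 7B via Hugging Face
# transformers. The model ID is an assumption; since the weights are
# Apache 2.0-licensed, this kind of local use is permitted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # hypothetical ID; check Ai2's page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The Open Source Initiative defines open source AI as"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```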
There’s been some debate recently over the safety of open models, what with Llama models reportedly being used by Chinese researchers to develop defense tools. When I asked Ai2 engineer Dirk Groeneveld in February whether he was concerned about OLMo being abused, he told me that he believes the benefits ultimately outweigh the harms.
“Yes, it’s possible open models may be used inappropriately or for unintended purposes,” he said. “[However, this] approach also promotes technical advancements that lead to more ethical models; is a prerequisite for verification and reproducibility, as these can only be achieved with access to the full stack; and reduces a growing concentration of power, creating more equitable access.”