Alibaba claims no. 1 spot in AI math models with Qwen2-Math



If you haven’t heard of “Qwen2,” that’s understandable, but it should all change starting today with a surprising new release that takes the crown from all others when it comes to a critical subject in software development, engineering, and STEM fields the world over: math.

What’s Qwen2?

With so many new AI models emerging from startups and tech companies, it can be hard even for those paying close attention to the space to keep up.

Qwen2 is an open-source large language model (LLM) rival to OpenAI’s GPTs, Meta’s Llamas, and Anthropic’s Claude family, fielded by Alibaba Cloud, the cloud computing division of the Chinese e-commerce giant Alibaba.

Alibaba Cloud began releasing its own LLMs under the sub-brand name “Tongyi Qianwen,” or Qwen for short, in August 2023, including the open-source models Qwen-7B, Qwen-72B and Qwen-1.8B, with 7 billion, 72 billion and 1.8 billion parameters respectively (referring to the settings and, ultimately, the intelligence of each model), followed by multimodal variants including Qwen-Audio and Qwen-VL (for vision inputs), and finally Qwen2 back in early June 2024 with five variants: 0.5B, 1.5B, 7B, 14B, and 72B. Altogether, Alibaba has released more than 100 AI models of varying sizes and capabilities in the Qwen family over this time.

And customers, particularly in China, have taken note, with more than 90,000 enterprises reported to have adopted Qwen models in their operations within the first year of availability.

While many of these models boasted state-of-the-art or near state-of-the-art performance on their release dates, the LLM and AI model race more broadly moves so fast around the globe that they were quickly eclipsed in performance by other open- and closed-source rivals. Until now.

What is Qwen2-Math?

Today, Alibaba Cloud’s Qwen team peeled the wrapper off Qwen2-Math, a new “series of math-specific large language models” designed for the English language. The most powerful of these outperform all others in the world, including the vaunted OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, and even Google’s Math-Gemini Specialized 1.5 Pro.


Specifically, the 72-billion-parameter Qwen2-Math-72B-Instruct variant clocks in at 84% on the MATH benchmark for LLMs, which provides 12,500 “challenging competition mathematics problems,” and word problems at that, which can be notoriously difficult for LLMs to complete (see the test of which is greater: 9.9 or 9.11).

Here’s an example of a problem included in the MATH dataset:

[Image: an example problem from the MATH dataset]

Candidly, it’s not one I could answer on my own, and certainly not within seconds, but Qwen2-Math apparently can most of the time.
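For developers who want to kick the tires themselves, below is a minimal sketch of how one might prompt the smaller Qwen2-Math-7B-Instruct variant through the Hugging Face transformers library. The model ID and the step-by-step system prompt are assumptions based on how the Qwen family is typically published, not details confirmed in this article.

```python
# Minimal sketch: prompting Qwen2-Math-7B-Instruct with transformers.
# Assumes the checkpoint is published under the usual "Qwen/" namespace
# on Hugging Face; the 72B variant follows the same pattern but needs
# far more GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-Math-7B-Instruct"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single large GPU
    device_map="auto",
)

# A simple word problem of the style found in math benchmarks; the
# system prompt asks for step-by-step reasoning with a boxed final answer.
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "If 3x + 7 = 25, what is the value of 2x - 1?"},
]

# Instruct-tuned Qwen models ship with a chat template, so format the
# conversation the way the model was fine-tuned to expect.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```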

Perhaps unsurprisingly, then, Qwen2-Math-72B-Instruct also excels and outperforms the competition on the grade school math benchmark GSM8K (8,500 questions) at 96.7%, and on collegiate-level math (the College Math benchmark) at 47.8% as well.


Notably, however, Alibaba did not compare Microsoft’s new Orca-Math model, released in February 2024, in its benchmark charts, and that 7-billion-parameter model (a variant of Mistral-7B, itself a variant of Llama) comes up close to the Qwen2-Math-7B-Instruct model: 86.81% for Orca-Math vs. 89.9% for Qwen2-Math-7B-Instruct.

Yet even the smallest version of Qwen2-Math, the 1.5-billion-parameter model, performs admirably and close to the model more than four times its size, scoring 84.2% on GSM8K and 44.2% on College Math.

What are math AI models good for?

While initial usage of LLMs has focused on their utility in chatbots (and, in the case of enterprises, on answering employee or customer questions or drafting documents and parsing information more quickly), math-focused LLMs seek to offer more reliable tools for those looking to regularly solve equations and work with numbers.

Ironically, given that all code is based on mathematical fundamentals, LLMs have so far not been as reliable as earlier eras of AI or machine learning, or even older software, at solving math problems.

The Alibaba researchers behind Qwen2-Math state that they “hope that Qwen2-Math can contribute to the community for solving complex mathematical problems.”

The custom licensing terms for enterprises and individuals seeking to use Qwen2-Math fall short of purely open source, requiring that any commercial usage with more than 100 million monthly active users obtain an additional permission and license from the creators. But that is still an extremely permissive upper limit and would allow many startups, SMBs, and even some large enterprises to use Qwen2-Math commercially (to make them money) essentially for free.
