News
This uCRT bug was already reported through another channel a week ago and has since been fixed (OS #58189958: pow(-1, 2) returns -1). The fix might take a while to be available in the public insider ...
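A minimal regression check for the reported behavior. It is a sketch that assumes CPython on Windows, where `math.pow` with finite arguments is computed by the C runtime's `pow`, so an affected uCRT build would show the bug from Python too. The helper name `pow_neg_one_ok` is hypothetical. The only fact it encodes is that (-1)^n is +1 for even n and -1 for odd n, so pow(-1, 2) must return 1.0.

```python
import math


def pow_neg_one_ok(n: int) -> bool:
    """Hypothetical helper: True if math.pow(-1, n) has the correct sign.

    For integer n, (-1)**n is exactly +1.0 when n is even and -1.0 when
    n is odd, so an exact float comparison is safe here.
    """
    expected = 1.0 if n % 2 == 0 else -1.0
    return math.pow(-1.0, n) == expected


# Exercise the case from the bug report (n == 2) plus a few neighbours.
for n in range(6):
    status = "ok" if pow_neg_one_ok(n) else "MISMATCH"
    print(f"pow(-1, {n}) = {math.pow(-1.0, n)}  {status}")
```

On a runtime with the bug, the `n == 2` line would report `MISMATCH` because `pow` returns -1.0 instead of 1.0.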
We evaluate our Qwen2.5-Math base models on three widely used English math benchmarks: GSM8K, MATH, and MMLU-STEM. In addition, we evaluate three Chinese math benchmarks: CMATH, GaoKao Math Cloze, ...