✪Models are sorted by the region of the developing organization, and colors indicate that region: light orange for European models, light blue for U.S. models, light green for local models, and light purple for Chinese models.
✪Percentage values are color-coded: figures above 50% are marked in green, and figures below 50% are marked in pink.
✪This test cycle includes 18 newly added models (3 small models and 15 large models).
- Language Model Benchmark / Small Models (13B and below)
- Language Model Benchmark / Large Models (above 13B)