✪Models are sorted by the region of the developing organization. Color key: light orange for European models, light blue for U.S. models, light green for local models, light purple for Chinese models, and light beige for Asian models.
✪Explanation of percentage values: figures above 50% are marked in black; figures below 50% are marked in red.
✪This round of test results adds 5 small-scale models and 9 large-scale models.
✪Korean models were newly added this month. Starting in May 2026, the test results will include an additional field giving each model's test date for reference.
- Language Model Benchmark / Small Models (13B and below)
- Language Model Benchmark / Large Models (above 13B)