
AIEC Releases First Benchmark Evaluation Results for Language Models, Showcasing Taiwan’s Progress in Localized and Trustworthy AI Development

The Artificial Intelligence Evaluation Center (AIEC) today (October 3) released the results of its first benchmark evaluation for language models, marking a major milestone in promoting localized AI evaluation and third-party verification in Taiwan. The initiative aims to strengthen the foundation for trustworthy AI development and enhance the reliability of Taiwan’s AI industry.

The evaluation systematically assessed both domestic and international language models, grouped by model size. In addition to traditional benchmarks such as the Chinese and Social Studies subjects of the General Scholastic Ability Test (GSAT), administered by the College Entrance Examination Center (CEEC), AIEC also introduced a unique “Taiwan Values” evaluation metric, reflecting global trends toward AI sovereignty and providing a key reference for developing or fine-tuning locally adapted AI models.

A total of 42 language models were tested in this evaluation. Results showed that Taiwan-developed TAIDE (Gemma-3-TAIDE-12b) ranked among the top performers in the small-model category (under 13B parameters), outperforming its base model, Google Gemma-3-12b-it, and demonstrating Taiwan’s growing AI research and development capabilities.

In the large-model category (13B and above), OpenAI GPT-5 achieved the highest overall performance, while Google Gemini 2.5 Flash performed exceptionally well in the “Taiwan Values” category, indicating a stronger understanding of local cultural and social contexts. Interestingly, several Chinese-developed language models also scored well in this category, likely due to knowledge distillation techniques that utilized outputs from Western foundation models during training.

Overall, the findings suggest that models trained without Traditional Chinese corpora from Taiwan may perform worse on “Taiwan Values” benchmarks, underscoring the importance of localized data development. In response, the Ministry of Digital Affairs (MODA) is advancing the Taiwan Sovereign AI Corpus Initiative, which provides high-quality Traditional Chinese datasets rooted in local language, culture, and social values, ensuring that AI systems align with Taiwan’s unique linguistic and ethical contexts.

Going forward, AIEC will continue to promote the development of domestically built AI evaluation tools for various products, systems, and application domains, while aligning with international testing frameworks and standards. These efforts aim to strengthen Taiwan’s global competitiveness and establish a secure, robust, and trustworthy AI evaluation ecosystem.

AIEC also plans to invite experts and scholars to contribute evaluation questions, which, after review, may be incorporated into future benchmark datasets—further enhancing Taiwan’s AI evaluation capabilities and international presence.

Images:
AIEC Releases First Benchmark Evaluation Results for Language Models