CMMLU

Paper

CMMLU: Measuring massive multitask language understanding in Chinese https://arxiv.org/abs/2306.09212

CMMLU is a comprehensive benchmark designed to assess the knowledge and reasoning abilities of LLMs in the context of Chinese language and culture. It covers 67 subjects, ranging from elementary to advanced professional level.

Homepage: https://github.com/haonan-li/CMMLU

Citation

@misc{li2023cmmlu,
      title={CMMLU: Measuring massive multitask language understanding in Chinese},
      author={Haonan Li and Yixuan Zhang and Fajri Koto and Yifei Yang and Hai Zhao and Yeyun Gong and Nan Duan and Timothy Baldwin},
      year={2023},
      eprint={2306.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Groups and Tasks

Groups

cmmlu: All 67 subjects of the CMMLU dataset, evaluated following the methodology in MMLU's original implementation.
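For reference, MMLU's original evaluation code lays each item out as the question, the lettered options, and a trailing `Answer:` line, preceded by a subject header. The sketch below reproduces that layout under stated assumptions: exactly four options A-D, and hypothetical names (`Item`, `format_item`, `build_prompt`) that do not come from the harness. The harness's own CMMLU configuration uses its own Chinese-language template, which may differ from this English one.

```python
from dataclasses import dataclass

# Hypothetical container for one multiple-choice item (assumes four options, A-D).
@dataclass
class Item:
    question: str
    choices: list[str]  # option texts, in order A, B, C, D
    answer: str         # gold option letter, e.g. "B"

LETTERS = ["A", "B", "C", "D"]

def format_item(item: Item, include_answer: bool) -> str:
    # Question, then one "X. option" line per choice, then "Answer:".
    lines = [item.question.strip()]
    for letter, choice in zip(LETTERS, item.choices):
        lines.append(f"{letter}. {choice}")
    text = "\n".join(lines) + "\nAnswer:"
    if include_answer:
        # Few-shot examples carry their gold letter and a blank-line separator.
        text += f" {item.answer}\n\n"
    return text

def build_prompt(subject: str, shots: list[Item], query: Item) -> str:
    # MMLU-style header; underscores in the subject name become spaces.
    header = (
        "The following are multiple choice questions (with answers) about "
        f"{subject.replace('_', ' ')}.\n\n"
    )
    examples = "".join(format_item(s, include_answer=True) for s in shots)
    return header + examples + format_item(query, include_answer=False)
```

With no few-shot examples, `build_prompt("elementary_mathematics", [], Item("1+1=?", ["1", "2", "3", "4"], "B"))` yields the header followed by the question, options A-D, and a final `Answer:` for the model to complete.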

Tasks

Each CMMLU subject is also available as a separate task, scored with loglikelihood-based multiple choice:

cmmlu_{subject_english}, where {subject_english} is the English name of the subject
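Loglikelihood-based multiple-choice scoring compares the model's log-likelihood of each candidate answer and takes the highest-scoring one as the prediction; accuracy is then the fraction of correct predictions. The minimal sketch below assumes the candidates are the option letters A-D and takes the log-likelihoods as given numbers (in practice they come from the model). The function names are hypothetical, and `task_name` only spells out the naming pattern above.

```python
LETTERS = ["A", "B", "C", "D"]

def predict(choice_logliks: list[float]) -> str:
    # Prediction = option whose continuation has the highest log-likelihood.
    best = max(range(len(choice_logliks)), key=lambda i: choice_logliks[i])
    return LETTERS[best]

def accuracy(all_logliks: list[list[float]], gold: list[str]) -> float:
    # Fraction of items whose argmax option matches the gold letter.
    correct = sum(predict(ll) == g for ll, g in zip(all_logliks, gold))
    return correct / len(gold)

def task_name(subject_english: str) -> str:
    # Hypothetical helper illustrating the cmmlu_{subject_english} pattern.
    return f"cmmlu_{subject_english}"
```

For example, `accuracy([[-2.3, -0.4, -3.1, -2.8], [-1.0, -1.5, -0.2, -2.0]], ["B", "D"])` predicts B and C, so one of two answers is correct and the result is 0.5.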

Checklist

- Is the task an existing benchmark in the literature?
  - Have you referenced the original paper that introduced the task?
  - If yes, does the original paper provide a reference implementation?
    - Yes, the original implementation was contributed by an author of the benchmark.

If other tasks on this dataset are already supported:

- Is the "Main" variant of this task clearly denoted?
- Have you provided a short sentence in a README on what each new variant adds / evaluates?
- Have you noted which, if any, published evaluation setups are matched by this variant?
