Skip to content

v0.4.6

Latest
Compare
Choose a tag to compare
@baberabb baberabb released this 25 Nov 13:38
· 14 commits to main since this release
9d36354

lm-eval v0.4.6 Release Notes

This release brings important changes to chat template handling, expands our task library with new multilingual and multimodal benchmarks, and includes various bug fixes.

Backwards Incompatibilities

Chat Template Delimiter Handling

An important modification has been made to how delimiters are handled when applying chat templates in request construction, particularly affecting multiple-choice tasks. This change ensures better compatibility with chat models by respecting their native formatting conventions.

📝 For detailed documentation, please refer to docs/chat-template-readme.md

New Benchmarks & Tasks

Multilingual Expansion

  • Spanish Bench: Enhanced benchmark with additional tasks by @zxcvuser in #2390
  • Japanese Leaderboard: New comprehensive Japanese language benchmark by @sitfoxfly in #2439

New Task Collections

  • Multimodal Unitext: Added support for multimodal tasks available in unitext by @elronbandel in #2364
  • Metabench: New benchmark contributed by @kozzy97 in #2357

As well as several slight fixes or changes to existing tasks (as noted via the incrementing of versions).

Thanks, the LM Eval Harness team (@baberabb and @lintangsutawika)

What's Changed

New Contributors

Full Changelog: v0.4.5...v0.4.6