New arabicmmlu #2541

Open
wants to merge 3 commits into
base: main
2 changes: 1 addition & 1 deletion lm_eval/tasks/arabicmmlu/_arabicmmlu.yaml
@@ -9,4 +9,4 @@ aggregate_metric_list:
- metric: acc
weight_by_size: True
metadata:
-version: 0
+version: 1
2 changes: 1 addition & 1 deletion lm_eval/tasks/arabicmmlu/_arabicmmlu_humanities.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
- metric: acc
weight_by_size: True
metadata:
-version: 0
+version: 1
2 changes: 1 addition & 1 deletion lm_eval/tasks/arabicmmlu/_arabicmmlu_language.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
- metric: acc
weight_by_size: True
metadata:
-version: 0
+version: 1
2 changes: 1 addition & 1 deletion lm_eval/tasks/arabicmmlu/_arabicmmlu_other.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
- metric: acc
weight_by_size: True
metadata:
-version: 0
+version: 1
2 changes: 1 addition & 1 deletion lm_eval/tasks/arabicmmlu/_arabicmmlu_social_science.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
- metric: acc
weight_by_size: True
metadata:
-version: 0
+version: 1
2 changes: 1 addition & 1 deletion lm_eval/tasks/arabicmmlu/_arabicmmlu_stem.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
- metric: acc
weight_by_size: True
metadata:
-version: 0
+version: 1
4 changes: 2 additions & 2 deletions lm_eval/tasks/arabicmmlu/_default_arabicmmlu_template_yaml
@@ -1,4 +1,4 @@
-dataset_path: yazeed7/ArabicMMLU
+dataset_path: MBZUAI/ArabicMMLU
test_split: test
fewshot_split: dev
fewshot_config:
@@ -12,4 +12,4 @@ metric_list:
aggregation: mean
higher_is_better: true
metadata:
-version: 0.0
+version: 1.0
91 changes: 45 additions & 46 deletions lm_eval/tasks/arabicmmlu/_generate_configs.py
@@ -13,48 +13,46 @@
eval_logger = logging.getLogger("lm-eval")


-SUBJECTS = {
-"Driving Test": "other",
-"High Geography": "social_science",
-"High History": "humanities",
-"Islamic Studies": "humanities",
-"Univ Accounting": "social_science",
-"Primary General Knowledge": "other",
-"Univ Political Science": "social_science",
-"Primary Math": "stem",
-"Middle General Knowledge": "other",
-"High Biology": "stem",
-"Primary Natural Science": "stem",
-"High Economics": "social_science",
-"Middle Natural Science": "stem",
-"Middle Geography": "social_science",
-"Primary Social Science": "social_science",
-"Middle Computer Science": "stem",
-"Middle Islamic Studies": "humanities",
-"Primary Computer Science": "stem",
-"High Physics": "stem",
-"Middle Social Science": "social_science",
-"Middle Civics": "social_science",
-"High Computer Science": "stem",
-"General Knowledge": "other",
-"High Civics": "social_science",
-"Prof Law": "humanities",
-"High Islamic Studies": "humanities",
-"Primary Arabic Language": "language",
-"High Arabic Language": "language",
-"Arabic Language (Grammar)": "language",
-"Primary History": "humanities",
-"Middle History": "humanities",
-"Univ Economics": "social_science",
-"Arabic Language (General)": "language",
-"Univ Computer Science": "stem",
-"Primary Islamic Studies": "humanities",
-"Primary Geography": "social_science",
-"High Philosophy": "humanities",
-"Middle Arabic Language": "language",
-"Middle Economics": "social_science",
-"Univ Management": "other",
-}
+SUBJECTS = {'Islamic Studies': 'humanities',
+'Driving Test': 'other',
+'Natural Science (Middle School)': 'stem',
+'Natural Science (Primary School)': 'stem',
+'History (Primary School)': 'humanities',
+'History (Middle School)': 'humanities',
+'History (High School)': 'humanities',
+'General Knowledge': 'other',
+'General Knowledge (Primary School)': 'other',
+'General Knowledge (Middle School)': 'other',
+'Law (Professional)': 'humanities',
+'Physics (High School)': 'stem',
+'Social Science (Middle School)': 'social_science',
+'Social Science (Primary School)': 'social_science',
+'Management (University)': 'other',
+'Arabic Language (Primary School)': 'language',
+'Arabic Language (Middle School)': 'language',
+'Arabic Language (High School)': 'language',
+'Political Science (University)': 'social_science',
+'Philosophy (High School)': 'humanities',
+'Accounting (University)': 'social_science',
+'Computer Science (University)': 'stem',
+'Computer Science (Middle School)': 'stem',
+'Computer Science (Primary School)': 'stem',
+'Computer Science (High School)': 'stem',
+'Geography (Primary School)': 'social_science',
+'Geography (Middle School)': 'social_science',
+'Geography (High School)': 'social_science',
+'Math (Primary School)': 'stem',
+'Biology (High School)': 'stem',
+'Economics (University)': 'social_science',
+'Economics (Middle School)': 'social_science',
+'Economics (High School)': 'social_science',
+'Arabic Language (General)': 'language',
+'Arabic Language (Grammar)': 'language',
+'Islamic Studies (High School)': 'humanities',
+'Islamic Studies (Middle School)': 'humanities',
+'Islamic Studies (Primary School)': 'humanities',
+'Civics (Middle School)': 'social_science',
+'Civics (High School)': 'social_science'}


def parse_args():
@@ -69,8 +67,9 @@ def parse_args():

# get filename of base_yaml so we can `"include": ` it in our "other" YAMLs.
base_yaml_name = os.path.split(args.base_yaml_path)[-1]
-with open(args.base_yaml_path, encoding="utf-8") as f:
-base_yaml = yaml.full_load(f)
+
+# with open(args.base_yaml_path, encoding="utf-8") as f:
+# base_yaml = yaml.full_load(f)

ALL_CATEGORIES = []
for subject, category in tqdm(SUBJECTS.items()):
@@ -81,8 +80,8 @@ def parse_args():

yaml_dict = {
"include": base_yaml_name,
-"tag": f"arabicmmlu_{category}",
-"task": f"arabicmmlu_{subject.lower().replace(' ', '_')}",
+"tag": f"arabicmmlu_{category}_tasks",
+"task": f"arabicmmlu_{subject.lower().replace(' ', '_').replace('(', '').replace(')', '')}",
"task_alias": subject,
"dataset_name": subject,
# "description": description,
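The task-name change in this hunk can be sketched as a standalone helper, which makes it easy to check how subject names with parentheses map to task names. This is only an illustration: `task_name` and `yaml_fields` are hypothetical names that do not appear in `_generate_configs.py`, and the default template filename is assumed from the generated YAML files in this PR.

```python
def task_name(subject: str, prefix: str = "arabicmmlu") -> str:
    """Lowercase the subject, turn spaces into underscores, drop parentheses."""
    slug = subject.lower().replace(" ", "_").replace("(", "").replace(")", "")
    return f"{prefix}_{slug}"


def yaml_fields(subject: str, category: str,
                base_yaml_name: str = "_default_arabicmmlu_template_yaml") -> dict:
    """Build the per-subject fields in the same shape as the generated YAML files."""
    return {
        "include": base_yaml_name,
        "tag": f"arabicmmlu_{category}_tasks",
        "task": task_name(subject),
        "task_alias": subject,
        "dataset_name": subject,  # kept verbatim; it selects the dataset subset
    }


if __name__ == "__main__":
    for key, value in yaml_fields("Arabic Language (General)", "language").items():
        print(f"{key}: {value}")
```

Only the task identifier is normalized; `dataset_name` and `task_alias` keep the original subject string, parentheses included.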
@@ -0,0 +1,5 @@
"dataset_name": "Accounting (University)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_accounting_university"
"task_alias": "Accounting (University)"
@@ -1,5 +1,5 @@
"dataset_name": "Arabic Language (General)"
-"tag": "arabicmmlu_language_tasks"
"include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_arabic_language_(general)"
+"tag": "arabicmmlu_language_tasks"
+"task": "arabicmmlu_arabic_language_general"
"task_alias": "Arabic Language (General)"
@@ -1,5 +1,5 @@
"dataset_name": "Arabic Language (Grammar)"
-"tag": "arabicmmlu_language_tasks"
"include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_arabic_language_(grammar)"
+"tag": "arabicmmlu_language_tasks"
+"task": "arabicmmlu_arabic_language_grammar"
"task_alias": "Arabic Language (Grammar)"
@@ -0,0 +1,5 @@
"dataset_name": "Arabic Language (High School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_language_tasks"
"task": "arabicmmlu_arabic_language_high_school"
"task_alias": "Arabic Language (High School)"
@@ -0,0 +1,5 @@
"dataset_name": "Arabic Language (Middle School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_language_tasks"
"task": "arabicmmlu_arabic_language_middle_school"
"task_alias": "Arabic Language (Middle School)"
@@ -0,0 +1,5 @@
"dataset_name": "Arabic Language (Primary School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_language_tasks"
"task": "arabicmmlu_arabic_language_primary_school"
"task_alias": "Arabic Language (Primary School)"
@@ -0,0 +1,5 @@
"dataset_name": "Biology (High School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_stem_tasks"
"task": "arabicmmlu_biology_high_school"
"task_alias": "Biology (High School)"
5 changes: 5 additions & 0 deletions lm_eval/tasks/arabicmmlu/arabicmmlu_civics_high_school.yaml
@@ -0,0 +1,5 @@
"dataset_name": "Civics (High School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_civics_high_school"
"task_alias": "Civics (High School)"
@@ -0,0 +1,5 @@
"dataset_name": "Civics (Middle School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_civics_middle_school"
"task_alias": "Civics (Middle School)"
@@ -0,0 +1,5 @@
"dataset_name": "Computer Science (High School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_stem_tasks"
"task": "arabicmmlu_computer_science_high_school"
"task_alias": "Computer Science (High School)"
@@ -0,0 +1,5 @@
"dataset_name": "Computer Science (Middle School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_stem_tasks"
"task": "arabicmmlu_computer_science_middle_school"
"task_alias": "Computer Science (Middle School)"
@@ -0,0 +1,5 @@
"dataset_name": "Computer Science (Primary School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_stem_tasks"
"task": "arabicmmlu_computer_science_primary_school"
"task_alias": "Computer Science (Primary School)"
@@ -0,0 +1,5 @@
"dataset_name": "Computer Science (University)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_stem_tasks"
"task": "arabicmmlu_computer_science_university"
"task_alias": "Computer Science (University)"
2 changes: 1 addition & 1 deletion lm_eval/tasks/arabicmmlu/arabicmmlu_driving_test.yaml
@@ -1,5 +1,5 @@
"dataset_name": "Driving Test"
-"tag": "arabicmmlu_other_tasks"
"include": "_default_arabicmmlu_template_yaml"
+"tag": "arabicmmlu_other_tasks"
"task": "arabicmmlu_driving_test"
"task_alias": "Driving Test"
@@ -0,0 +1,5 @@
"dataset_name": "Economics (High School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_economics_high_school"
"task_alias": "Economics (High School)"
@@ -0,0 +1,5 @@
"dataset_name": "Economics (Middle School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_economics_middle_school"
"task_alias": "Economics (Middle School)"
@@ -0,0 +1,5 @@
"dataset_name": "Economics (University)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_economics_university"
"task_alias": "Economics (University)"
@@ -1,5 +1,5 @@
"dataset_name": "General Knowledge"
-"tag": "arabicmmlu_other_tasks"
"include": "_default_arabicmmlu_template_yaml"
+"tag": "arabicmmlu_other_tasks"
"task": "arabicmmlu_general_knowledge"
"task_alias": "General Knowledge"
@@ -0,0 +1,5 @@
"dataset_name": "General Knowledge (Middle School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_other_tasks"
"task": "arabicmmlu_general_knowledge_middle_school"
"task_alias": "General Knowledge (Middle School)"
@@ -0,0 +1,5 @@
"dataset_name": "General Knowledge (Primary School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_other_tasks"
"task": "arabicmmlu_general_knowledge_primary_school"
"task_alias": "General Knowledge (Primary School)"
@@ -0,0 +1,5 @@
"dataset_name": "Geography (High School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_geography_high_school"
"task_alias": "Geography (High School)"
@@ -0,0 +1,5 @@
"dataset_name": "Geography (Middle School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_geography_middle_school"
"task_alias": "Geography (Middle School)"
@@ -0,0 +1,5 @@
"dataset_name": "Geography (Primary School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_social_science_tasks"
"task": "arabicmmlu_geography_primary_school"
"task_alias": "Geography (Primary School)"

This file was deleted.

5 changes: 0 additions & 5 deletions lm_eval/tasks/arabicmmlu/arabicmmlu_high_biology.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions lm_eval/tasks/arabicmmlu/arabicmmlu_high_civics.yaml

This file was deleted.

This file was deleted.

5 changes: 0 additions & 5 deletions lm_eval/tasks/arabicmmlu/arabicmmlu_high_economics.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions lm_eval/tasks/arabicmmlu/arabicmmlu_high_geography.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions lm_eval/tasks/arabicmmlu/arabicmmlu_high_history.yaml

This file was deleted.

This file was deleted.

5 changes: 0 additions & 5 deletions lm_eval/tasks/arabicmmlu/arabicmmlu_high_philosophy.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions lm_eval/tasks/arabicmmlu/arabicmmlu_high_physics.yaml

This file was deleted.

@@ -0,0 +1,5 @@
"dataset_name": "History (High School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_humanities_tasks"
"task": "arabicmmlu_history_high_school"
"task_alias": "History (High School)"
@@ -0,0 +1,5 @@
"dataset_name": "History (Middle School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_humanities_tasks"
"task": "arabicmmlu_history_middle_school"
"task_alias": "History (Middle School)"
@@ -0,0 +1,5 @@
"dataset_name": "History (Primary School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_humanities_tasks"
"task": "arabicmmlu_history_primary_school"
"task_alias": "History (Primary School)"
2 changes: 1 addition & 1 deletion lm_eval/tasks/arabicmmlu/arabicmmlu_islamic_studies.yaml
@@ -1,5 +1,5 @@
"dataset_name": "Islamic Studies"
-"tag": "arabicmmlu_humanities_tasks"
"include": "_default_arabicmmlu_template_yaml"
+"tag": "arabicmmlu_humanities_tasks"
"task": "arabicmmlu_islamic_studies"
"task_alias": "Islamic Studies"
@@ -0,0 +1,5 @@
"dataset_name": "Islamic Studies (High School)"
"include": "_default_arabicmmlu_template_yaml"
"tag": "arabicmmlu_humanities_tasks"
"task": "arabicmmlu_islamic_studies_high_school"
"task_alias": "Islamic Studies (High School)"