An official implementation for the paper "Ensemble Distillation for Unsupervised Constituency Parsing"

MANGA-UOFA/ED4UCP

ED4UCP

Install

conda create -n ED4UCP python=3.9
conda activate ED4UCP
while read requirement; do pip install $requirement; done < requirements.txt 

Evaluation

Evaluate using the command-line interface

Sentence-level F1 score of a prediction w.r.t. some reference

python evaluate.py --ref path_to_reference_treebank.txt --pred path_to_predicted_treebank.txt

Replicating the paper's tables and figures

python replicate.py [--Table1] [--Table2] [--Table3] [--Table4] [--Table6] [--Figure1] [--Figure2] [--Figure4]

Keep only the flags for the tables and figures you want to re-evaluate, and drop the square brackets.

Import and use in Python

Sentence-level F1 score

from KimEval import sentence_level_f1

f1 = sentence_level_f1(ref=list_of_reference_trees, pred=list_of_predicted_trees, round_it=False)

Use bracket-based representations of trees that include both words and constituent labels. The label values themselves are ignored by this metric, so any placeholder (such as X) can stand in for every label.
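To make the input format concrete, here is a minimal sketch of how an unlabeled span F1 can be computed from two bracketed trees. It is not the KimEval implementation: the helper names `tree_spans` and `span_f1` are hypothetical, and the conventions chosen here (single-word spans dropped, the whole-sentence span kept) are assumptions that may differ from what `sentence_level_f1` does.

```python
def tree_spans(tree):
    """Return the set of (start, end) word spans of a bracketed tree.

    Labels are read and discarded, so "(NP ...)" and "(X ...)" are equivalent.
    Assumption: single-word spans are ignored, the full-sentence span is kept.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans, stack = set(), []
    position = 0  # number of words consumed so far
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(position)
            # Skip the constituent label, if one follows the bracket.
            if i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")"):
                i += 1
        elif tok == ")":
            start = stack.pop()
            if position - start > 1:
                spans.add((start, position))
        else:
            position += 1  # a word
        i += 1
    return spans


def span_f1(ref_tree, pred_tree):
    """Unlabeled span F1 of one predicted tree against one reference tree."""
    ref, pred = tree_spans(ref_tree), tree_spans(pred_tree)
    if not ref and not pred:
        return 1.0
    overlap = len(ref & pred)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# The same reference tree, once with real labels and once with placeholders.
labeled_ref = "(S (NP (DT the) (NN cat)) (VP (VBD sat) (RB down)))"
placeholder_ref = "(X (X (X the) (X cat)) (X (X sat) (X down)))"
prediction = "(X (X the) (X (X cat) (X (X sat) (X down))))"

# Sentence-level F1 over a treebank is then the mean of the per-sentence scores.
scores = [span_f1(r, prediction) for r in (labeled_ref, placeholder_ref)]
mean_f1 = sum(scores) / len(scores)
```

Both reference variants give the same score against the prediction, which is the sense in which the labels do not matter.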

Corpus-level constituency label recall

from KimEval import constituency_label_recall

recall, coverage = constituency_label_recall(ref=list_of_reference_trees, pred=list_of_predicted_trees, round_it=False)

Use bracket-based tree representations. Recall is based on constituent labels in reference trees.
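As an illustration of why the reference labels are needed here, the following sketch computes one plausible reading of label recall: for each label in the reference trees, the fraction of its constituents whose span also appears in the predicted tree. The names `labeled_spans` and `label_recall` are hypothetical, and the sketch does not attempt to reproduce the `coverage` value returned by `constituency_label_recall`, whose definition is not stated in this README.

```python
from collections import Counter


def labeled_spans(tree):
    """Return a list of (label, start, end) for multi-word constituents."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans, stack = [], []
    position = 0  # number of words consumed so far
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            label = ""
            if i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")"):
                label = tokens[i + 1]
                i += 1  # consume the label
            stack.append((label, position))
        elif tok == ")":
            label, start = stack.pop()
            if position - start > 1:  # assumption: skip single-word spans
                spans.append((label, start, position))
        else:
            position += 1
        i += 1
    return spans


def label_recall(ref_trees, pred_trees):
    """Corpus-level recall per reference label (a sketch, not KimEval)."""
    total, hit = Counter(), Counter()
    for ref, pred in zip(ref_trees, pred_trees):
        pred_spans = {(s, e) for _, s, e in labeled_spans(pred)}
        for label, s, e in labeled_spans(ref):
            total[label] += 1
            hit[label] += (s, e) in pred_spans
    return {label: hit[label] / total[label] for label in total}


refs = ["(S (NP (DT the) (NN cat)) (VP (VBD sat) (RB down)))"]
preds = ["(X (X the) (X (X cat) (X (X sat) (X down))))"]
per_label = label_recall(refs, preds)
```

Only the reference side contributes labels; the predicted trees are reduced to bare spans, so placeholder labels in the predictions are fine.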

Ensemble

Ensemble using the command-line interface

python ensemble.py \
  [--Run <RUN_ID>] [--Run <ANOTHER_RUN_ID>] \
  [--combination_of_the Bests] [--combination_of_the Worsts] \
  [--references <PATH1> <PATH2> ...] \
  [--MBR_mode <generative|selective>] \
  [--file_name <FILE_NAME>] \
  [--write_directory <WRITE_DIRECTORY>] \
  [--output_file_name <OUTPUT_FILE_NAME>] \
  [--Run_all]

Arguments:

  • --Run <RUN_ID>: Specify the run ID to process, as listed in the experiments' guide. Repeat this argument to process multiple runs.
  • --combination_of_the <Bests|Worsts>: Create ensembles of the best (or worst) models across runs, according to the ensemble guide. Repeat this argument to cover both.
  • --references <PATH1> <PATH2> ...: Define one or multiple reference file paths. You can specify multiple paths separated by spaces. It will create an ensemble of all of them.
  • --MBR_mode <generative|selective>: Specify the mode as either generative or selective. Details are provided in the paper. Defaults to generative.
  • --file_name <FILE_NAME>: Specify the file name for --Run and --combination_of_the. Defaults to the value of TEST_FILE_NAME defined in the constants.
  • --write_directory <WRITE_DIRECTORY>: Specify the directory where the output should be written. If not specified, it defaults to the directory composed of MOTHER_PATH and the respective MBR_PATH for the chosen MBR_mode (see the constants).
  • --output_file_name <OUTPUT_FILE_NAME>: Define the name of the output file. If not specified, it defaults to the value of --file_name.
  • --Run_all: If set, it will run all available runs specified in the guide, as well as both combinations of the bests and worsts.

Note: Combining several ensemble strategies (--Run, --combination_of_the <Bests|Worsts>, or --references) produces a separate ensemble for each one.

Import and use in Python

from library.ensemble import ensemble

ensemble_treebank = ensemble(
  references=list_of_reference_treebanks,
  MBR_mode='generative', # or 'selective'
  right=False, # If True, it will add the right-branching heuristic as an additional reference treebank
)

Each reference treebank is a list of bracket-based trees.
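The sketch below illustrates the shape of the input and one way to read the 'selective' mode, under the assumption that it picks, for each sentence, the reference tree with the highest total span F1 against all reference trees. This is not the code in `library.ensemble`: `selective_mbr`, `right_branching`, and the other helpers are hypothetical names, and the 'generative' mode, which the paper describes and which builds a new tree rather than choosing an existing one, is not covered.

```python
def parse(tree):
    """Return (words, spans) for a bracketed tree; labels are ignored."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    words, spans, stack = [], set(), []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(len(words))
            if i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")"):
                i += 1  # skip the label
        elif tok == ")":
            start = stack.pop()
            if len(words) - start > 1:  # assumption: skip single-word spans
                spans.add((start, len(words)))
        else:
            words.append(tok)
        i += 1
    return words, spans


def f1(a, b):
    """Span F1 between two span sets."""
    if not a and not b:
        return 1.0
    overlap = len(a & b)
    return 2 * overlap / (len(a) + len(b))


def right_branching(words):
    """Build a right-branching tree with placeholder labels."""
    node = f"(X {words[-1]})"
    for word in reversed(words[:-1]):
        node = f"(X (X {word}) {node})"
    return node


def selective_mbr(references, right=False):
    """For each sentence, keep the candidate closest to all the others."""
    output = []
    for candidates in zip(*references):
        candidates = list(candidates)
        if right:  # add the right-branching heuristic as one more reference
            candidates.append(right_branching(parse(candidates[0])[0]))
        span_sets = [parse(c)[1] for c in candidates]
        totals = [sum(f1(s, other) for other in span_sets) for s in span_sets]
        output.append(candidates[totals.index(max(totals))])
    return output


# Three one-sentence treebanks: right-branching, balanced, left-branching.
treebank_a = ["(X (X the) (X (X cat) (X (X sat) (X down))))"]
treebank_b = ["(X (X (X the) (X cat)) (X (X sat) (X down)))"]
treebank_c = ["(X (X (X (X the) (X cat)) (X sat)) (X down))"]
chosen = selective_mbr([treebank_a, treebank_b, treebank_c])
```

In this toy case the balanced tree shares spans with both of the others, so it is the one selected. All treebanks must list the same sentences in the same order for the per-sentence comparison to make sense.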

Teachers

Each teacher has its own directory inside the teachers directory. Each one contains a Git submodule pinned to the commit of the original codebase that I used, and in some cases a few postprocessing scripts.

