Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducing results #3

Open
carlos-gemmell opened this issue Oct 28, 2020 · 1 comment
Open

Reproducing results #3

carlos-gemmell opened this issue Oct 28, 2020 · 1 comment

Comments

@carlos-gemmell
Copy link

Hi

I am trying to reproduce the numbers stated in the paper for appropriate comparisons to a paper I am writing. But when I run the following command I get a corpus BLEU score of 30.69.

. scripts/conala/test.sh ../external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin 2>&1
load model from [../external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin]
Decoding: 100%|██████| 500/500 [02:39<00:00,  3.13it/s]
{'corpus_bleu': 0.30694588794625494, 'oracle_corpus_bleu': 0.4181369862278688, 'avg_sent_bleu': 0.2376696401071103, 'oracle_avg_sent_bleu': 0.3983062032090926, 'exact_match': 0.028, 'oracle_exact_match': 0.084}

I am guessing the reranker is not used in the generation of the results.

To solve this I accessed the testing function directly to generate hyps and evaluate them with the same BLEU functions.

model_file='external_repos/external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin'
reranker_file = 'external_repos/external-knowledge-codegen/best_pretrained_models/reranker.conala.vocab.src_freq3.code_freq3.mined_100000.intent_count100k_topk1_temp5.bin'
self.parser = StandaloneParser('default_parser',
                              model_file,
                              'conala_example_processor',
                              beam_size=15,
                              cuda=True,
                              reranker_path=reranker_file)

This gives me a similar corpus BLEU score of 30.078 and an average sentence BLEU score with NLTK with smooth_fn3 of 25.295.

What are the necessary commands in sequence to get the score from the paper?

@frankxu2004
Copy link
Member

Sorry for the late reply. The 30.69 result you get is correct as is shown in the paper (without reranking)

To perform reranking, follow this part https://github.com/neulab/external-knowledge-codegen#reranking

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants