Dear authors,
I am trying to reimplement the Dolma-Web pipeline described in your paper.
However, in Step 2, while using the dolma toolkit, I found that the Gopher implementation in this repo differs from the original Gopher rules at http://arxiv.org/abs/2112.11446.
Specifically,
The current code at /python/dolma/taggers.py does not compute 'Duplicate paragraph fraction' or 'Duplicate paragraph character fraction', both of which are listed in Table A1 of the Gopher paper.
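For reference, here is a minimal sketch of how these two metrics could be computed. It is not taken from the dolma code or from the Gopher authors. It assumes that paragraphs are separated by blank lines and that every occurrence of a paragraph after its first counts as a duplicate; the function name `duplicate_paragraph_fractions` is hypothetical.

```python
import re
from collections import Counter
from typing import Tuple


def duplicate_paragraph_fractions(text: str) -> Tuple[float, float]:
    """Return (duplicate paragraph fraction, duplicate paragraph character fraction).

    Hypothetical helper, not part of dolma.
    """
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return 0.0, 0.0

    counts = Counter(paragraphs)
    # Assumption: each repeat beyond the first occurrence is a duplicate.
    duplicate_paragraphs = sum(c - 1 for c in counts.values())
    duplicate_chars = sum(len(p) * (c - 1) for p, c in counts.items())
    total_chars = sum(len(p) for p in paragraphs)

    return duplicate_paragraphs / len(paragraphs), duplicate_chars / total_chars


if __name__ == "__main__":
    sample = "aaaa\n\nbb\n\naaaa\n\ncc"
    print(duplicate_paragraph_fractions(sample))
```

Other conventions are possible, for example counting all copies of a repeated paragraph including the first, so the exact values depend on which definition is intended.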
Is this a bug, or is there no need to compute these metrics? I look forward to your reply.
Best regards,
Xinlin Zhuang