[fixbug]: Fixed the issue in MinhashBuildIndex where get_datafolder was not used to obtain DataFolder for input_folder and output_folder. (#307)

Fixed the issue in MinhashBuildIndex where get_datafolder was not used to obtain DataFolder for input_folder and output_folder.
Youggls authored Nov 27, 2024
1 parent 371c014 commit fe81883
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/datatrove/pipeline/dedup/minhash.py
@@ -574,8 +574,8 @@ def __init__(
         lines_to_buffer: int = 5,
     ):
         super().__init__()
-        self.input_folder = input_folder
-        self.output_folder = output_folder
+        self.input_folder = get_datafolder(input_folder)
+        self.output_folder = get_datafolder(output_folder)
         self.config = config or MinhashConfig()
         self.index_name = index_name
         self.lines_to_buffer = lines_to_buffer
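
The pattern behind this change can be illustrated with a minimal, self-contained sketch. It does not use datatrove: `FolderStub`, `to_folder`, and `IndexBuilderSketch` are hypothetical stand-ins invented for illustration, and the real `get_datafolder` and `DataFolder` may behave differently. The assumption is that storing the raw constructor argument leaves a plain string on the instance when the caller passes a path, so any later call to a folder method would fail, while normalizing in `__init__` avoids that.

```python
from dataclasses import dataclass


@dataclass
class FolderStub:
    """Hypothetical stand-in for a DataFolder-like object (not datatrove's class)."""

    path: str

    def open(self, name: str) -> str:
        # A real folder object would return a file handle; this stub only builds a path.
        return f"{self.path.rstrip('/')}/{name}"


def to_folder(data) -> FolderStub:
    """Hypothetical normalizer: accept a path string or an existing folder object."""
    if isinstance(data, FolderStub):
        return data  # already a folder object, returned unchanged
    if isinstance(data, str):
        return FolderStub(data)  # wrap a plain path string
    raise TypeError(f"unsupported folder spec: {type(data).__name__}")


class IndexBuilderSketch:
    """Hypothetical class mirroring the shape of the constructor in the diff."""

    def __init__(self, input_folder, output_folder):
        # Normalize once here so later code can call folder methods
        # whether the caller passed a string or a folder object.
        self.input_folder = to_folder(input_folder)
        self.output_folder = to_folder(output_folder)


builder = IndexBuilderSketch("s3://bucket/sigs", FolderStub("/tmp/index"))
print(builder.input_folder.open("a.sig"))
```

In this sketch both arguments end up as `FolderStub` instances, so the rest of the class never has to check which form the caller supplied.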