Distractor document chunk size #272

cemremengu · 2024-03-20T07:43:03Z

I am currently looking into RAFT. According to the algorithm, 4 randomly selected chunks are appended as distractor documents to each QA pair. However, the appended chunks in the example datapoint are shorter than 512 tokens.

Are they just truncated, or am I missing something?
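For context, here is a minimal sketch of fixed-size chunking. It assumes whitespace-separated words as a stand-in for model tokens and uses a hypothetical helper `split_into_chunks`; the chunker actually used by raft.py may work differently. With this kind of splitting, every chunk except possibly the last one has exactly `chunk_size` tokens. So under this assumption, a noticeably shorter distractor would have to be a document's final chunk or a shortened copy.

```python
def split_into_chunks(text: str, chunk_size: int = 512) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size tokens.

    Hypothetical helper. Assumption: whitespace-separated words
    approximate model tokens.
    """
    tokens = text.split()
    return [
        " ".join(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]


if __name__ == "__main__":
    # 1200 dummy "tokens" -> chunks of 512, 512 and 176 tokens
    text = " ".join(f"w{n}" for n in range(1200))
    chunks = split_into_chunks(text, chunk_size=512)
    print([len(c.split()) for c in chunks])  # [512, 512, 176]
```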

Rinnolo · 2024-03-25T03:53:06Z

It looks like the example datapoint has been shortened for ease of presentation. The distractors should be full-size chunks (512 tokens), because raft.py appends them as whole chunks, without any truncation, using the following code:

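        # excerpt from raft.py (requires `import random` at module level)
        # add num_distract distractor docs
        docs = [chunk]  # the oracle chunk comes first
        indices = list(range(0, len(chunks)))
        indices.remove(i)  # exclude the oracle's own index
        for j in random.sample(indices, num_distract):
            docs.append(chunks[j])  # whole chunk is appended, no truncation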
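To see why the distractors keep their full length, here is the quoted step wrapped in a small self-contained function. The name `build_docs` and the explicit `rng` parameter are hypothetical additions for illustration; the body mirrors the snippet: the oracle chunk goes first, and `num_distract` other chunks are sampled without replacement and appended unchanged.

```python
import random
from typing import Optional


def build_docs(chunks: list[str], i: int, num_distract: int,
               rng: Optional[random.Random] = None) -> list[str]:
    """Return [oracle, distractor_1, ..., distractor_k] for chunk i.

    Hypothetical wrapper around the raft.py snippet quoted above.
    """
    if rng is None:
        rng = random.Random()
    docs = [chunks[i]]  # the oracle chunk the question was generated from
    # every other chunk index is a distractor candidate
    indices = [j for j in range(len(chunks)) if j != i]
    # sample without replacement and append the chunks unchanged
    for j in rng.sample(indices, num_distract):
        docs.append(chunks[j])
    return docs


if __name__ == "__main__":
    chunks = [f"chunk-{n}" for n in range(10)]
    docs = build_docs(chunks, i=3, num_distract=4, rng=random.Random(0))
    print(docs[0], len(docs))  # chunk-3 5
```

Since each distractor is just `chunks[j]`, its length is whatever the chunker produced for that chunk.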

That said, I don't intend to discredit the paper, but I'm puzzled by it because many details seem to be missing: how are the oracle documents $D^*$ distinguished from the distractor documents $d_i$? Which dataset was used for training? Also, eval.py doesn't seem to use any local model, only the OpenAI model.
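On the first question: in the snippet quoted above, the oracle is simply `docs[0]` at construction time, and nothing in the text itself marks it. If the documents are shuffled afterwards, one way to keep the distinction is to track where position 0 ends up. This is a minimal sketch with a hypothetical `shuffle_with_labels` helper; it is not taken from raft.py.

```python
import random
from typing import Optional


def shuffle_with_labels(docs: list[str],
                        rng: Optional[random.Random] = None) -> tuple[list[str], int]:
    """Shuffle docs whose first element is the oracle.

    Hypothetical helper; assumes docs[0] is the oracle, as in the
    quoted snippet. Returns the shuffled list and the oracle's new index.
    """
    if rng is None:
        rng = random.Random()
    order = list(range(len(docs)))
    rng.shuffle(order)
    shuffled = [docs[k] for k in order]
    oracle_index = order.index(0)  # where original position 0 ended up
    return shuffled, oracle_index


if __name__ == "__main__":
    docs = ["oracle", "d1", "d2", "d3", "d4"]
    shuffled, oracle_index = shuffle_with_labels(docs, random.Random(1))
    print(shuffled[oracle_index])  # oracle
```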

If these are answered somewhere in the paper and I simply missed them, I'd appreciate a pointer to where ;(
