bugfix #6103
base: main
Conversation
I don't understand why. See LLaMA-Factory/src/llamafactory/data/processors/supervised.py, lines 178 to 183 in c8f1998.
it seems that |
besides, when change to the |
There is no remaining sample in
Yes, there are remaining samples. The sample below shows the data at index[13] being lost.
you must print the |
What does this PR do?
Fixes # (neat packing loses data)
In the method preprocess_packed_supervised_dataset, the line
index = length2indexes[length].pop()
loses data and degrades accuracy when more than one sample is keyed by the same length.
This PR therefore adds preprocess_packed_supervised_dataset_fullDataGroup, which uses 'noDegenerateGroups' to keep all data before packing, so that the complete dataset is used for training.
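A small check can help settle whether pop() actually drops samples. The sketch below is not LLaMA-Factory's code: greedy_knapsack and pack_indexes are hypothetical stand-ins, and it assumes the packer receives one length entry per sample (duplicates included). Under that assumption every pop() has a matching index and nothing is left in length2indexes; a non-empty leftover list would point to the packer seeing fewer length entries than there are samples.

```python
from collections import defaultdict


def greedy_knapsack(lengths, capacity):
    """Hypothetical stand-in packer: first-fit decreasing over a list of lengths."""
    knapsacks = []
    for length in sorted(lengths, reverse=True):
        for sack in knapsacks:
            if sum(sack) + length <= capacity:
                sack.append(length)
                break
        else:
            # No existing knapsack has room, so open a new one.
            knapsacks.append([length])
    return knapsacks


def pack_indexes(sample_lengths, capacity):
    """Pack sample indexes by length and report any indexes never popped."""
    length2indexes = defaultdict(list)
    for index, length in enumerate(sample_lengths):
        length2indexes[length].append(index)

    packed = []
    for knapsack in greedy_knapsack(sample_lengths, capacity):
        # One pop() per length entry chosen by the packer.
        packed.append([length2indexes[length].pop() for length in knapsack])

    # Anything still stored here was never assigned to a knapsack.
    leftover = sorted(i for idxs in length2indexes.values() for i in idxs)
    return packed, leftover


# Illustrative lengths only; three samples share length 5, two share length 3.
sample_lengths = [3, 5, 5, 2, 3, 5]
packed, leftover = pack_indexes(sample_lengths, capacity=8)
used = sorted(i for group in packed for i in group)
print("packed:", packed)
print("leftover:", leftover)
```

Running the same check on the real lengths and knapsacks from the dataset in question, and printing what remains in length2indexes after the loop, would show directly whether any index is lost.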