
Support llama3.2 vision #5555

Merged · 5 commits · Nov 23, 2024
Conversation

marko1616 (Contributor) commented Sep 26, 2024

🚀 What does this PR do?

Support Llama-3.2-11B-Vision.

✅ Before submitting

🔗 Linked issues

Fixes #5549
Fixes #5796

⚠️ IMPORTANT

bitsandbytes 8-bit quantization is not functional; 4-bit quantization works.

```python
from PIL import Image
import torch

# Open image paths; pass already-loaded PIL images through unchanged.
images = [Image.open(image) if isinstance(image, str) else image for image in images]
image_features = processor.image_processor(images)
_ = image_features.pop("num_tiles")  # metadata, not a model input
image_features = {k: v if isinstance(v, torch.Tensor) else torch.tensor(v) for k, v in image_features.items()}
```
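The drop-metadata-then-convert pattern in that excerpt can be exercised on its own; a minimal sketch with a hypothetical feature dict (the keys and values below are illustrative, not the exact processor output):

```python
import torch

def to_tensor_features(features, drop_keys=("num_tiles",)):
    # Drop metadata keys, then convert any remaining non-tensor values to torch.Tensor.
    kept = {k: v for k, v in features.items() if k not in drop_keys}
    return {k: v if isinstance(v, torch.Tensor) else torch.tensor(v) for k, v in kept.items()}

features = {"pixel_values": [[0.1, 0.2]], "aspect_ratio_ids": [1], "num_tiles": [4]}
tensors = to_tensor_features(features)
```

After the call, `tensors` contains only model inputs, each as a `torch.Tensor`.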
marko1616 (Contributor, Author) replied Sep 26, 2024:

That is because we can't access the text inside get_mm_inputs. How do you think we should fix this — add a new stage, or pass the text input to get_mm_inputs?

hiyouga (Owner) replied:
yep, we should do some work here

src/llamafactory/data/template.py (outdated, resolved)
src/llamafactory/data/mm_plugin.py (outdated, resolved)
marko1616 changed the title from "Support llama3.2vl." to "Support llama3.2vl(WIP)." on Sep 26, 2024
hiyouga added the "pending" label on Sep 29, 2024
marko1616 marked this pull request as draft on October 7, 2024
hiyouga marked this pull request as ready for review on November 23, 2024
hiyouga changed the title from "Support llama3.2vl(WIP)." to "Support llama3.2 vision" on Nov 23, 2024
hiyouga self-requested a review on November 23, 2024
hiyouga (Owner) left a review:

LGTM

hiyouga (Owner) commented Nov 23, 2024:

Verified on Llama3.2 11B vision instruct. [screenshot]

hiyouga merged commit e68ef89 into hiyouga:main on Nov 23, 2024 (12 checks passed)
marko1616 (Contributor, Author):

Ah, thanks for completing this.

hiyouga added the "solved" label and removed the "pending" label on Nov 23, 2024
marko1616 deleted the feat/llama3.2vl branch on November 23, 2024
hiyouga (Owner) commented Nov 23, 2024:

It requires ~24GB for fine-tuning. [screenshots]
Labels: solved
2 participants