[Feature] Support vision module w8a8 inference #2308
base: main
Conversation
Conflicts:
- lmdeploy/lite/apis/calibrate.py
- lmdeploy/pytorch/configurations/internvl.py
```diff
@@ -342,7 +352,8 @@ def calib_search_scale(parser):
     return parser.add_argument(
         '--search-scale',
-        type=bool,
+        action='store_true',
```
Is `search-scale` time-consuming?
Yes, it is time-consuming. By default, it is False.
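The switch from `type=bool` to `action='store_true'` also matters for correctness: argparse applies `type` to the raw string, and `bool('False')` is `True`, so a `type=bool` option can never be turned off from the command line. A minimal standalone sketch of the difference (not the actual `calibrate.py` parser):

```python
import argparse

parser = argparse.ArgumentParser()

# Buggy pattern: bool('False') == True, so any value passed on the CLI
# (including "False" or "0") is parsed as True.
parser.add_argument('--search-scale-buggy', type=bool, default=False)

# Idiomatic pattern: the flag is False unless it appears on the CLI.
parser.add_argument('--search-scale', action='store_true', default=False)

args = parser.parse_args(['--search-scale-buggy', 'False'])
print(args.search_scale_buggy)  # True -- not what the caller intended
print(args.search_scale)        # False -- flag was not given
```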
"""Add argument calib_image to parser.""" | ||
|
||
return parser.add_argument( | ||
'--calib-image', |
Only one image?
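If the concern is that a single calibration image is too limiting, one way to extend the option is to accept several paths via `nargs='+'`. This is a sketch under assumptions: only the `--calib-image` flag name comes from the diff above; the helper name `add_calib_image_arg`, `nargs`, default, and help text are illustrative, not the PR's actual code.

```python
import argparse

def add_calib_image_arg(parser: argparse.ArgumentParser):
    """Sketch: accept one or more calibration images instead of a single path."""
    return parser.add_argument(
        '--calib-image',
        type=str,
        nargs='+',
        default=None,
        help='Path(s) to image(s) used to calibrate the vision module.')

parser = argparse.ArgumentParser()
add_calib_image_arg(parser)
args = parser.parse_args(['--calib-image', 'a.jpg', 'b.jpg'])
print(args.calib_image)  # ['a.jpg', 'b.jpg']
```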
@AllentDan @irexyc @RunningLeon
Accuracy on MMStar:
- Running w8a8 for the vision module and AWQ for the LLM.
- Running w8a8 for both the LLM and the vision module.
Note: `--tp` is not supported, since the Triton kernel cannot obtain the right CUDA stream when `accelerate` is used to dispatch modules.
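A minimal sketch of the underlying issue (illustrative only; the device placement and function name are assumptions, not LMDeploy's actual kernels): Triton launches work on the current device's stream, but accelerate's dispatch hooks can leave a submodule's tensors on a different GPU than the caller's active device, so the queried stream does not match the tensor's device.

```python
import torch

def show_stream_mismatch():
    """Illustrate how a kernel launch can pick up a stream from the wrong device."""
    if torch.cuda.device_count() < 2:
        print('Needs at least 2 GPUs to demonstrate.')
        return

    # Imagine accelerate dispatched a vision-module weight onto GPU 1 ...
    weight = torch.randn(4, 4, device='cuda:1')

    # ... while the calling thread's active device is still GPU 0.
    torch.cuda.set_device(0)

    # A Triton-style launch queries the *current* stream, which belongs to
    # GPU 0, not to the device that actually holds the tensor.
    stream = torch.cuda.current_stream()
    print('stream device:', stream.device)  # cuda:0
    print('tensor device:', weight.device)  # cuda:1

if __name__ == '__main__':
    show_stream_mismatch()
```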