Model trained with RTX 3090 not working with A100 #89

Open
LeoZHANGboy opened this issue Aug 26, 2024 · 3 comments

Comments

@LeoZHANGboy

Hi authors,

Thanks a lot for your excellent work and code.

I'm currently facing a problem with this project. I built the environment on an RTX 3090 GPU and trained a model on the Waymo segmentation dataset. Everything works perfectly on the RTX 3090, but nothing works when I load the same environment on an A100: inference with the model trained on the RTX 3090 fails, and so does training the same model on the A100.

Do you have any idea how to fix it?

Thank you!

@Gofinge
Member

Gofinge commented Aug 26, 2024

> load the built-up environment with A100

Hi, may I confirm that you directly loaded the environment built on the 3090? If so, note that the A100 and the 3090 are different architectures, so a separate environment should be built from scratch for each. If not, could you provide the error message?
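
To make the "different architectures" point concrete, here is a minimal sketch, not taken from this project. It assumes the compute capabilities NVIDIA publishes for the two cards (8.6 for the RTX 3090, 8.0 for the A100) and the usual CUDA rule that a compiled kernel binary runs only on a device with the same major version and an equal or higher minor version. The helper name `binary_runs_on` is hypothetical, and the sketch ignores embedded PTX, which the driver can JIT-compile for newer devices.

```python
# Compute capabilities as listed in NVIDIA's published GPU tables.
COMPUTE_CAPABILITY = {
    "RTX 3090": (8, 6),
    "A100": (8, 0),
}


def binary_runs_on(built_for, device):
    """Hypothetical helper: can a kernel binary built for `built_for` run on `device`?

    Assumed rule (compiled binaries only, no PTX fallback):
    same major version, and device minor >= build minor.
    """
    return built_for[0] == device[0] and device[1] >= built_for[1]


# Extensions compiled on the 3090 target 8.6, which the A100 (8.0) cannot run.
print(binary_runs_on(COMPUTE_CAPABILITY["RTX 3090"], COMPUTE_CAPABILITY["A100"]))  # False

# Extensions compiled for the A100 target 8.0, which the 3090 (8.6) can run.
print(binary_runs_on(COMPUTE_CAPABILITY["A100"], COMPUTE_CAPABILITY["RTX 3090"]))  # True
```

Under these assumptions, an environment whose CUDA extensions were compiled only for the 3090 would be expected to fail on the A100, which matches the symptom reported above.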

@LeoZHANGboy
Author

Thanks a lot for your reply!

Yes, I directly loaded the environment built on the 3090. So is it recommended to rebuild the environment from scratch on the A100? Or is it possible to rebuild/reinstall only some of the packages in the environment? And what do you mean by "architectures"? Could you point me to some references? I am trying to find an easier way to make it work.

@Gofinge
Member

Gofinge commented Sep 11, 2024

> Thanks a lot for your reply!
>
> Yes, I directly loaded the environment built on the 3090. So is it recommended to rebuild the environment from scratch on the A100? Or is it possible to rebuild/reinstall only some of the packages in the environment? And what do you mean by "architectures"? Could you point me to some references? I am trying to find an easier way to make it work.

Yes. Alternatively, you can build all the CUDA libraries with a longer CUDA arch list that covers every GPU you plan to use, e.g.:

TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
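
As a rough illustration of how such an arch list could be assembled, the sketch below formats a set of compute capabilities into the space-separated string that `TORCH_CUDA_ARCH_LIST` expects. The function names `make_arch_list` and `visible_capabilities` are hypothetical and not part of this project. The second function assumes a PyTorch install with CUDA support and only sees the GPUs on the current machine, so capabilities for other machines have to be added by hand.

```python
def make_arch_list(capabilities):
    """Format (major, minor) tuples as a TORCH_CUDA_ARCH_LIST value.

    Duplicates are dropped and entries are sorted in ascending order.
    """
    unique = sorted(set(capabilities))
    return " ".join(f"{major}.{minor}" for major, minor in unique)


def visible_capabilities():
    """Query the compute capability of each GPU visible to PyTorch.

    Assumes torch is installed with CUDA support; imported lazily so
    make_arch_list stays usable without it.
    """
    import torch

    return [
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    ]


# Example: a list covering both an A100 (8.0) and an RTX 3090 (8.6).
print(make_arch_list([(8, 6), (8, 0), (8, 6)]))  # 8.0 8.6
```

The resulting string can be set as `TORCH_CUDA_ARCH_LIST` before re-running the install command above, so that the compiled extensions include code for each listed capability.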
