About selection of gpu #35

Open
EricStarer opened this issue Jul 25, 2021 · 3 comments

Comments

@EricStarer

How do I select which GPUs to use when training with multiple GPUs? Thanks a lot.

@sixiaozheng
Collaborator

You can do this by setting the environment variable CUDA_VISIBLE_DEVICES to the IDs of the GPUs you want to use and passing the matching number of GPUs as ${GPU_NUM}:
CUDA_VISIBLE_DEVICES=${GPU id list} ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}

For example, to train SETR-PUP on the Cityscapes dataset with 4 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 4
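
Because the GPU ID list and ${GPU_NUM} have to agree, one option is to derive the count from the list instead of typing both by hand. The following is only a minimal sketch, not part of the repository: `count_gpus` and `GPU_IDS` are hypothetical names introduced here, and the GPU IDs 2,3 are an assumed example. It prints the resulting command rather than running it.

```shell
# Hypothetical helper (not part of the repo): count the comma-separated GPU ids.
count_gpus() {
  old_ifs=$IFS
  IFS=','
  # Split the id list on commas into this function's positional parameters.
  set -- $1
  IFS=$old_ifs
  echo "$#"
}

GPU_IDS="2,3"   # assumed example: use only physical GPUs 2 and 3
CONFIG_FILE="configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py"
GPU_NUM=$(count_gpus "$GPU_IDS")

# Print the command that would be run, so it can be checked before launching.
echo "CUDA_VISIBLE_DEVICES=${GPU_IDS} ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}"
```

Note that inside the training processes the selected GPUs are renumbered from 0, so with CUDA_VISIBLE_DEVICES=2,3 they appear as devices 0 and 1.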

@EricStarer
Author

Thanks a lot, but I ran into this error. How can I deal with it?

Traceback (most recent call last):
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/launch.py", line 173, in <module>
    main()
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/launch.py", line 169, in main
    run(args)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/run.py", line 624, in run
    )(*cmd_args)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

@sixiaozheng
Collaborator

Your environment may be installed incorrectly. I recommend checking the package versions, or reinstalling the environment by following the from-scratch setup script in the README.
