
RuntimeError: CUDA out of memory // Requirements on graphics card? #25

Open
HartmannSa opened this issue Nov 26, 2020 · 10 comments

@HartmannSa

Hi,

while executing
python -m cosypose.scripts.run_cosypose_eval --config tless-siso
I receive the following error message:

RuntimeError: CUDA out of memory. Tried to allocate 1.35 GiB (GPU 0; 5.93 GiB total capacity; 1.47 GiB already allocated; 866.50 MiB free; 36.31 MiB cached)

According to my internet research, reducing the batch size is recommended. However, I don't know where to set it, and in my understanding the batch size shouldn't play any role for this command, since I am only running the already pre-trained network?!

Could the cause of the error be that there are certain hardware requirements for reproducing the results?
I am using Ubuntu 18.04.5 LTS and an NVIDIA GeForce GTX 1060 6GB (and the nvidia-driver-450).

Here is a larger part of my terminal output:

1:06:35.398140 - Scene: [6]
1:06:35.398203 - Views: [359]
1:06:35.398260 - Group: [2732]
1:06:35.398285 - Image has 5 gt detections. (not used)
1:06:35.701966 - Pose prediction on 4 detections (n_iterations=1): 0:00:00.063503
1:06:35.954221 - Pose prediction on 4 detections (n_iterations=4): 0:00:00.250793
1:06:35.720832 - --------------------------------------------------------------------------------
100%|███████████████████████████████████████████████████████████| 10080/10080 [1:06:24<00:00, 2.53it/s]
1:06:47.763242 - Done with predictions
100%|█████████████████████████████████████████████████████████████| 10080/10080 [39:28<00:00, 4.26it/s]
1:46:18.765271 - Skipped: pix2pose_detections/coarse/iteration=1 (N=50023)
1:46:18.765351 - Skipped: pix2pose_detections/refiner/iteration=1 (N=50023)
1:46:18.765377 - Skipped: pix2pose_detections/refiner/iteration=2 (N=50023)
1:46:18.765398 - Skipped: pix2pose_detections/refiner/iteration=3 (N=50023)
1:46:18.765419 - Evaluation : pix2pose_detections/refiner/iteration=4 (N=50023)
0%| | 0/10080 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/rosmatch/anaconda3/envs/cosypose/lib/python3.7/runpy.py", line 193, in run_module_as_main
"main", mod_spec)
File "/home/rosmatch/anaconda3/envs/cosypose/lib/python3.7/runpy.py", line 85, in run_code
exec(code, run_globals)
File "/home/rosmatch/cosypose/cosypose/scripts/run_cosypose_eval.py", line 491, in
main()
File "/home/rosmatch/cosypose/cosypose/scripts/run_cosypose_eval.py", line 433, in main
eval_metrics[preds_k], eval_dfs[preds_k] = eval_runner.evaluate(preds)
File "/home/rosmatch/cosypose/cosypose/evaluation/eval_runner/pose_eval.py", line 67, in evaluate
meter.add(obj_predictions, obj_data_gt.to(device))
File "/home/rosmatch/cosypose/cosypose/evaluation/meters/pose_meters.py", line 172, in add
cand_infos['label'].values)
File "/home/rosmatch/cosypose/cosypose/evaluation/meters/pose_meters.py", line 101, in compute_errors_batch
errors.append(self.compute_errors(TXO_pred_, TXO_gt_, labels_))
File "/home/rosmatch/cosypose/cosypose/evaluation/meters/pose_meters.py", line 70, in compute_errors
dists = dists_add_symmetric(TXO_pred, TXO_gt, points)
File "/home/rosmatch/cosypose/cosypose/lib3d/distances.py", line 16, in dists_add_symmetric
dists_norm_squared = (dists ** 2).sum(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 1.35 GiB (GPU 0; 5.93 GiB total capacity; 1.47 GiB already allocated; 866.50 MiB free; 36.31 MiB cached)
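
For context, the allocation that fails here is the large intermediate tensor built inside dists_add_symmetric when all candidate poses are processed at once. Below is a minimal sketch of how such a symmetric ADD-style distance can be chunked over candidates to cap peak GPU memory; the tensor shapes, chunk size, and function name are assumptions for illustration, not the cosypose implementation:

import torch

def dists_add_symmetric_chunked(TXO_pred, TXO_gt, points, chunk_size=256):
    """Chunked ADD-S-style distance, a sketch to limit peak GPU memory.

    TXO_pred, TXO_gt: (N, 4, 4) pose matrices; points: (N, P, 3) model points.
    Processing `chunk_size` candidates at a time avoids materialising the
    full (N, P, P, 3) difference tensor in a single allocation.
    """
    dists = []
    for start in range(0, TXO_pred.shape[0], chunk_size):
        sl = slice(start, start + chunk_size)
        # Transform the model points with predicted and ground-truth poses.
        pts_pred = points[sl] @ TXO_pred[sl, :3, :3].transpose(1, 2) + TXO_pred[sl, None, :3, 3]
        pts_gt = points[sl] @ TXO_gt[sl, :3, :3].transpose(1, 2) + TXO_gt[sl, None, :3, 3]
        # For each predicted point, distance to the closest ground-truth point.
        diff = pts_pred[:, :, None, :] - pts_gt[:, None, :, :]
        dists_norm_squared = (diff ** 2).sum(dim=-1)
        dists.append(dists_norm_squared.min(dim=2).values.sqrt().mean(dim=1))
    return torch.cat(dists)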

@salimkhazem

Hello, I have the same issue, did you fix it?
Thanks a lot for your answer.

@JohannesAma

Same here

@yupei-git

This may be done by changing the batch_size in run_pose_training.py.
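
For reference, this would be a config-level change. A sketch of the kind of edit meant here; the attribute names below are assumptions and may not match run_pose_training.py exactly:

# Sketch only: lower the batch size (and dataloader workers) wherever the
# run_pose_training.py configuration defines them. The attribute names
# below are assumptions, not the exact cosypose code.
cfg.batch_size = 8            # e.g. down from 32; smaller batches reduce peak GPU memory
cfg.n_dataloader_workers = 1  # fewer workers also lowers memory pressure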

@JohannesAma

I solved this with the following changes (see the sketch after this list):
bullet_batch_renderer.py -> workers from 8 to 1
multiview_predictor.py -> batch size (nsym) from 64 to 1
run_bop_inference.py -> workers from 8 to 1
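
A sketch of what these edits look like in place; the constructor and variable names are based on the file names above and may differ between cosypose versions, so check your checkout before applying them:

# bullet_batch_renderer.py: fewer rendering worker processes
# (verify the actual constructor signature in your checkout)
renderer = BulletBatchRenderer(object_set, n_workers=1)  # was n_workers=8

# multiview_predictor.py: smaller symmetry batch when scoring candidate poses
nsym_batch_size = 1  # was 64, as described in the comment above

# run_bop_inference.py: fewer dataloader workers
n_workers = 1  # was 8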

@AlexandraPapadaki

Same here. Is there any other suggestion? Unfortunately, Johannes's solution didn't work for me.
@JohannesAma did it really work for you for the siso tless case?

@JohannesAma

Same here. Is there any other suggestion? Unfortunately, Johannes's solution didn't work for me.
@JohannesAma did it really work for you for the siso tless case?

My NVIDIA card has 8 GB of memory; maybe yours is smaller and you have to reduce the batch size and workers in some more modules that are used in the siso tless case.

@smoothumut

I have the same problem and the suggested solution didn't work. Is there any solution??? Thanks in advance.

@JohannesAma

I'm sorry, I don't know about another solution.
Workers and batch size are the parameters that define the load on the graphics card.
Maybe you have to set them even smaller.
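
A quick way to see whether such reductions are actually helping is to log GPU memory around the step that fails. This is a plain PyTorch helper with no cosypose-specific names:

import torch

def log_gpu_memory(tag=""):
    """Print current and peak GPU memory use; handy while lowering
    workers/batch sizes to check whether the changes take effect."""
    alloc = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f"[{tag}] allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB, peak={peak:.2f} GiB")

# e.g. call log_gpu_memory("before eval") and log_gpu_memory("after eval")
# around the evaluation call you are debugging.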

@nturaymond

The main reason for this problem is that the dataset used in the evaluation is too large while the GPU running the program has less than 8 GB of memory.
The root cause is this line of code: run_cosypose_eval.py, line 443
eval_metrics[preds_k], eval_dfs[preds_k] = eval_runner.evaluate(preds)

Possible solutions:

  1. Go to the "local_data" folder and delete some data. Then run the pre-training; usually the results will not be a problem, and then run the evaluation again.
  2. Drop GPU usage and move all data and models to the CPU (requires constant code debugging).
  3. Modify the model to use AMP (a minimal autocast sketch follows below). However, the workload is large, and it is easy to make the whole program hard to run if you are not careful.

In fact, running the evaluation is not only about verifying whether the results are correct; the model can also be used to evaluate other datasets. The main part to modify for that is LOCAL_DATA_DIR.
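
For option 3, a minimal sketch of what automatic mixed precision could look like around the memory-heavy error computation. The names mirror the traceback above for illustration only; this is not a tested cosypose patch:

import torch

# Option 3 sketch: run the distance/error computation under autocast so
# large intermediate tensors are allocated in float16 where safe.
# `meter`, `TXO_pred`, `TXO_gt`, `labels` mirror the traceback above and
# are placeholders, not the actual cosypose objects.
def compute_errors_amp(meter, TXO_pred, TXO_gt, labels):
    with torch.cuda.amp.autocast():
        return meter.compute_errors(TXO_pred, TXO_gt, labels)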

