mAP NAN while training with custom dataset #1765

YCAyca · 2024-03-27T12:08:33Z

Hello, I am trying to train YOLOX-X on my custom dataset in COCO format. Training worked fine with a small version of my dataset (~3K images) using the default settings in yolox_base.py (only the batch size was set to 4, due to limited GPU memory). When I train on the big version of my dataset (~10K images), however, I get NaN mAP for every class from the first epoch, and I can't find any way to prevent it. I have tried many things:

Decreasing self.basic_lr_per_img by a factor of 10
Decreasing self.basic_lr_per_img by a factor of 100
Decreasing the number of epochs to 100, then to 75 (thinking that the YOLOX cosine warmup learning rate scheduler is set up according to the total number of iterations, and since I now have 3x more iterations per epoch, decreasing max epoch could...)
Using multiple GPUs so that I can set batch size = 16, which gives me about the same number of iterations per epoch as my previous training
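To make the effect of these settings concrete, here is a minimal sketch of the arithmetic. It assumes stock YOLOX behaviour, where the optimizer's learning rate is basic_lr_per_img multiplied by the total batch size and yolox_base.py sets basic_lr_per_img = 0.01 / 64. The helper names effective_lr and iters_per_epoch are made up for this illustration, and the iteration count is only an approximation of what the trainer reports.

```python
import math

# Default from yolox_base.py (assumed unchanged in the custom experiment file).
DEFAULT_BASIC_LR_PER_IMG = 0.01 / 64.0


def effective_lr(batch_size, basic_lr_per_img=DEFAULT_BASIC_LR_PER_IMG):
    """Peak learning rate after warmup, assuming lr = basic_lr_per_img * batch_size."""
    return basic_lr_per_img * batch_size


def iters_per_epoch(num_images, batch_size):
    """Approximate number of iterations per epoch (hypothetical helper)."""
    return math.ceil(num_images / batch_size)


if __name__ == "__main__":
    for label, n_img, bs, scale in [
        ("small set, bs=4", 3000, 4, 1.0),
        ("big set, bs=4", 10000, 4, 1.0),
        ("big set, bs=4, lr/10", 10000, 4, 0.1),
        ("big set, bs=16", 10000, 16, 1.0),
    ]:
        lr = effective_lr(bs, DEFAULT_BASIC_LR_PER_IMG * scale)
        print(f"{label}: lr={lr:.6g}, iters/epoch={iters_per_epoch(n_img, bs)}")
```

Under these assumptions, batch size 4 gives a peak learning rate of 0.000625 and batch size 16 gives 0.0025, so moving to multiple GPUs also raises the learning rate by 4x unless basic_lr_per_img is lowered to compensate.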

NONE of them worked. I don't know what else to try. Does anyone have any idea?
Apart from that, I have checked my COCO format labels in the platform I used to convert them, and they all seem fine. But maybe something goes wrong in the YOLOX dataloader. How can I easily visualize my ground truths after loading a batch during YOLOX training?
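Before looking inside the dataloader, one inexpensive check is to validate the annotation file itself. The sketch below is not part of YOLOX; it is a standalone checker that assumes the standard COCO layout (images, annotations, categories, with bbox given as [x, y, width, height] in pixels). The function name check_coco_annotations is hypothetical.

```python
import math


def check_coco_annotations(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    problems = []

    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        bbox = ann.get("bbox", [])

        # A usable box has exactly four finite numbers.
        if len(bbox) != 4 or any(
            not isinstance(v, (int, float)) or not math.isfinite(v) for v in bbox
        ):
            problems.append(f"annotation {ann_id}: malformed bbox {bbox}")
            continue

        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann_id}: non-positive box size {w}x{h}")

        if ann.get("category_id") not in category_ids:
            problems.append(
                f"annotation {ann_id}: unknown category_id {ann.get('category_id')}"
            )

        img = images.get(ann.get("image_id"))
        if img is None:
            problems.append(f"annotation {ann_id}: unknown image_id {ann.get('image_id')}")
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann_id}: box {bbox} outside image bounds")

    # Images that have no annotation at all contribute no ground truth.
    annotated = {ann.get("image_id") for ann in coco.get("annotations", [])}
    for img_id in images:
        if img_id not in annotated:
            problems.append(f"image {img_id}: no annotations")

    return problems
```

To use it, load the annotation file with json.load and print the returned list. An empty list only means these particular checks passed; it does not show that the boxes look right when drawn on the images.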

YCAyca · 2024-03-27T12:27:08Z

Another weird thing is that all the losses are 0 from the first iteration, except conf_loss, which therefore directly becomes the total loss.

benoitboidin · 2024-07-02T08:39:43Z

How many samples are there in your validation dataset? A common way to get NaN values is not having enough samples of each class to perform an evaluation.

For instance, if your validation dataset contains 0 "car" instances, your "car" mAP will always be NaN. The same can happen if you have too few cars: the validation batch may not contain any (since YOLOX uses a random subset of this dataset for each epoch's evaluation).
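A quick way to check this is to count ground-truth instances per class in the validation annotation file. The following is a minimal sketch, assuming the standard COCO JSON layout; the helper names count_instances_per_class and classes_below are hypothetical and not part of YOLOX or pycocotools.

```python
from collections import Counter


def count_instances_per_class(coco):
    """Map each category name to its number of annotations, including zeros."""
    names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    counts = Counter(ann["category_id"] for ann in coco.get("annotations", []))
    return {name: counts.get(cat_id, 0) for cat_id, name in names.items()}


def classes_below(coco, min_instances=1):
    """Return the sorted names of classes with fewer than min_instances annotations."""
    per_class = count_instances_per_class(coco)
    return sorted(name for name, n in per_class.items() if n < min_instances)
```

Load the validation JSON with json.load and pass the resulting dict to classes_below. Any class it returns with the default threshold has no ground truth in the validation set, so its AP cannot be computed.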
