Evaluation of px or 5% Error on KITTI 2015 (Table 4) #71
Comments
Hi @Miaowei-HNU, these results are on the KITTI test data and were calculated by the KITTI website. 'bg' refers to background and 'fg' refers to foreground.
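For a local sanity check on the training split (the test-set ground truth is not public), the KITTI 2015 outlier criterion can be sketched roughly as below. A pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity. The function name `d1_outlier_rate` and the `region` argument are hypothetical and not part of the STTR code; `region` is assumed to be a boolean foreground or background mask that you derive yourself, for example from the KITTI object maps.

```python
import numpy as np


def d1_outlier_rate(pred, gt, region=None, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels that are outliers under the 3 px / 5% rule.

    pred, gt: disparity maps of the same shape, in pixels.
    region:   optional boolean mask (hypothetical helper input), e.g. the
              foreground mask for 'fg' or its complement for 'bg'.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # KITTI stores 0 where no ground-truth disparity is available.
    valid = gt > 0
    if region is not None:
        valid &= np.asarray(region, dtype=bool)
    if not valid.any():
        return float("nan")

    err = np.abs(pred[valid] - gt[valid])
    # Outlier only if BOTH the absolute and the relative threshold are exceeded.
    outlier = (err > abs_thresh) & (err > rel_thresh * gt[valid])
    return 100.0 * float(outlier.mean())
```

Calling it once with the foreground mask, once with its complement, and once with `region=None` gives the three numbers for 'fg', 'bg' and all pixels. The numbers on the KITTI website are still the reference, since they are computed on the held-out test set.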
Thank you for your reply.
Hi @mli0603, I feel that my fine-tuning result is close to yours, but L1_raw is always very high. Is L1_raw necessary? From the code, the difference between L1_raw and L1 appears to be that disp_pred is taken at a different resolution.
Hi @Miaowei-HNU, L1_raw is the error of the raw disparity from cross-attention, computed at a lower resolution. Ideally it should be low, similar to L1. In KITTI 2015, however, we have found that the occlusion mask is ill-posed (see our follow-up paper in ECCV). The large error you see is therefore mostly in the occluded region; you can visualize the raw disparity to see what is going on. The context adjustment layer learns to smooth out the occlusion errors in the raw disparity map, which leads to a much lower L1 error in the final estimate. What does this mean? KITTI 2015 evaluates our approach unfairly: STTR has to unlearn the "correct" estimate from the transformer and learn the "incorrect" estimate in the context adjustment layer. I hope this helps.
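One way to check whether a high L1_raw comes mainly from the occluded region is to split the mean absolute error by an occlusion mask. The sketch below is only illustrative and is not taken from the STTR code: `l1_by_region` and `occ_mask` are hypothetical names, and it assumes the raw disparity has already been resized to the ground-truth resolution and that a boolean occlusion mask is available.

```python
import numpy as np


def l1_by_region(pred, gt, occ_mask):
    """Mean absolute disparity error over all, non-occluded and occluded pixels.

    pred, gt: disparity maps of the same shape, in pixels.
    occ_mask: boolean map, True where a pixel is considered occluded
              (assumed input; how it is obtained is up to the user).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    occ = np.asarray(occ_mask, dtype=bool)

    valid = gt > 0  # skip pixels without ground truth
    abs_err = np.abs(pred - gt)

    def mean_over(mask):
        # NaN signals that the region contains no valid pixel.
        return float(abs_err[mask].mean()) if mask.any() else float("nan")

    return {
        "all": mean_over(valid),
        "non_occluded": mean_over(valid & ~occ),
        "occluded": mean_over(valid & occ),
    }
```

If the "occluded" value dominates while "non_occluded" stays close to the final L1, the gap between L1_raw and L1 is consistent with the explanation above.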
Hello, I would like to ask whether the results in Table 4 come from the model uploaded to the KITTI website for testing. If not, how do I calculate them? Also, does bg refer to the occluded area and fg to the non-occluded area?