
Evaluation of px or 5% Error on KITTI 2015 (Table 4) #71

Open
Miaowei-HNU opened this issue Jul 29, 2022 · 4 comments

Comments

@Miaowei-HNU

Hello, I would like to ask whether the results in Table 4 come from the model submitted to the KITTI website for testing. If not, how can I calculate them? Also, does 'bg' refer to the occluded area and 'fg' to the non-occluded area?

@mli0603
Owner

mli0603 commented Jul 29, 2022

Hi @Miaowei-HNU, these results are on the KITTI test set and were computed by the KITTI evaluation server.

‘bg’ refers to background. ‘fg’ refers to foreground.
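For reference, KITTI 2015 reports D1, the percentage of outlier pixels, separately for background (D1-bg), foreground (D1-fg) and all pixels (D1-all). A pixel with valid ground truth counts as an outlier when its disparity error exceeds both 3 px and 5% of the true disparity. The sketch below reproduces that criterion locally, e.g. on a validation split. The function names are hypothetical, and it assumes invalid ground truth is stored as 0 and that a boolean foreground mask is available (on the training set this can be derived from KITTI's object maps). It is not the official devkit and may differ in details such as how missing estimates are handled.

```python
import numpy as np


def d1_error(disp_est, disp_gt, region_mask=None, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of outlier pixels using the KITTI 2015 D1 criterion.

    A pixel with valid ground truth (disp_gt > 0) is an outlier when its
    absolute error is larger than `abs_thresh` pixels AND larger than
    `rel_thresh` times the ground-truth disparity.
    """
    valid = disp_gt > 0
    if region_mask is not None:
        # Restrict the evaluation to a region, e.g. background or foreground.
        valid = valid & region_mask.astype(bool)
    err = np.abs(disp_est - disp_gt)
    outlier = valid & (err > abs_thresh) & (err > rel_thresh * np.abs(disp_gt))
    n_valid = valid.sum()
    return 100.0 * outlier.sum() / n_valid if n_valid > 0 else float("nan")


def kitti_d1_breakdown(disp_est, disp_gt, fg_mask):
    """Hypothetical helper: D1 on background, foreground and all valid pixels."""
    fg = fg_mask.astype(bool)
    return {
        "D1-bg": d1_error(disp_est, disp_gt, ~fg),
        "D1-fg": d1_error(disp_est, disp_gt, fg),
        "D1-all": d1_error(disp_est, disp_gt),
    }
```

Because the relative test is written as `err > rel_thresh * |gt|`, no division is needed, so pixels with very small ground-truth disparity cannot cause a divide-by-zero.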

@Miaowei-HNU
Author

Thank you for your reply.

@Miaowei-HNU
Author

Hi @mli0603, my fine-tuning results are close to yours, but L1_raw is always very high. Is L1_raw necessary? From the code, it looks like the only difference between L1_raw and L1 is that disp_pred is at a different resolution.

@mli0603
Copy link
Owner

mli0603 commented Aug 10, 2022

Hi @Miaowei-HNU, L1-raw measures the error of the raw disparity produced by cross-attention at a lower resolution; ideally it should be as low as L1. In KITTI 2015, however, we have found that the occlusion mask is ill-posed (see our follow-up ECCV paper). As a result, the large error you see comes mostly from the occluded region (you can also visualize the raw disparity to see what is going on).

The context adjustment layer learns to smooth out the occlusion errors in the raw disparity map, which leads to a much lower L1 error in the final estimate.
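To check this on your own fine-tuned model, one option is to split the L1 error into occluded and non-occluded pixels, once for the raw (low-resolution) disparity and once for the final disparity. The sketch below is hypothetical: `l1_breakdown`, its `stride` argument, and the assumption that the raw prediction is already in full-resolution pixel units (so the ground truth is only subsampled, not rescaled) are illustrative choices, not STTR's loss code. On the KITTI 2015 training set, one possible occlusion mask is the set of pixels that have ground truth in `disp_occ_0` but not in `disp_noc_0`.

```python
import numpy as np


def masked_l1(pred, gt, mask):
    """Mean absolute disparity error over pixels where `mask` is True (NaN if empty)."""
    mask = mask.astype(bool)
    if not mask.any():
        return float("nan")
    return float(np.abs(pred[mask] - gt[mask]).mean())


def l1_breakdown(pred, gt, occ_mask, stride=1):
    """Hypothetical helper: L1 error of `pred` on all / occluded / non-occluded pixels.

    pred:     predicted disparity; its shape must equal gt[::stride, ::stride].shape
    gt:       full-resolution ground-truth disparity, 0 marks invalid pixels
    occ_mask: full-resolution boolean mask, True where a pixel is occluded
    stride:   subsampling factor of `pred` relative to `gt` (1 = full resolution)

    Assumption: `pred` is already in full-resolution pixel units, so the ground
    truth is only subsampled spatially and its values are not rescaled.
    """
    gt_s = gt[::stride, ::stride]
    occ_s = occ_mask[::stride, ::stride].astype(bool)
    valid = gt_s > 0
    return {
        "all": masked_l1(pred, gt_s, valid),
        "occluded": masked_l1(pred, gt_s, valid & occ_s),
        "non_occluded": masked_l1(pred, gt_s, valid & ~occ_s),
    }
```

For example, comparing `l1_breakdown(raw_disp, gt, occ, stride=s)` with `l1_breakdown(final_disp, gt, occ)` (where `raw_disp`, `final_disp`, `occ` and `s` stand in for your own arrays and downsampling factor) should show whether the gap between L1_raw and L1 is concentrated in the occluded pixels.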

What does this mean? KITTI 2015 evaluates our approach unfairly: STTR has to unlearn the "correct" estimate from the transformer and learn the "incorrect" estimate from the context adjustment layer.

I hope this helps.
