Clear Linux Docker images do not unpack in containerd #1765
Comments
@lucidprogrammer, this is the right place for your bug. #1044 suggests a couple of workarounds to try.
@mythi I tried everything possible, with no luck at all. As you can see, I am testing with the official clearlinux python image.
With Clear Linux as your host OS, it works.
Hrm, interesting that it works on Clear. @lucidprogrammer, what host OS are you running on?
Please review clearlinux/dockerfiles#384. This issue is in AWS Fargate. (I run production workloads of every other type of container there; only Clear Linux fails.)
It seems a similar issue has been reported against containerd: containerd/containerd#3974
Looks different. For me, this fails on Fedora 31 (
@ZhongbaoShi, any findings so far? FWIW, the problem also triggers with
This issue can be reproduced with Ubuntu 18.04 + containerd 1.3.3, but cannot be reproduced on Clear 32400, which uses containerd 1.3.0:
unpacking linux/amd64 sha256:98ea6980c3121ef3fbacc0f39ae2c97fa648cc914a280640134b49acc007c1bc...
This issue can be reproduced on Ubuntu 18.04 + containerd 1.2.10 as well:
$ sudo -E ctr -n k8s.io i pull --snapshotter native docker.io/library/clearlinux:latest
@ZhongbaoShi @hongzhanchen @qzheng527, any updates?
@mythi, Hongzhan is carefully checking the gap between how Clear packs the image and how containerd unpacks it. We suspect there is a mismatch somewhere. He will post more analysis soon.
Are you checking this on both a Clear Linux host and a Fedora/Ubuntu host?
I found a similar issue in containerd, containerd/containerd#1785, which was fixed a long time ago. It mentions a debug patch, containerd/cri#463, for checking the status of the rootfs after unpacking. I am checking whether I can use it to debug the current issue.
I added some debug output to containerd and found that it hits an error when handling the sbin link, although before that it handles the lib64, bin and lib links correctly. It seems the sbin link is broken?
Mar 09 08:49:28 intel-Z97X-UD5H containerd[17914]: 4096/var/lib/containerd/tmpmounts/containerd-mount374528018
Well, the hdr.Typeflag looks suspicious. Is that a broken symlink? Maybe containerd, when processing these entries, doesn't create the /bin symlink to /usr/bin before the /sbin -> /bin symlink. That seems easy enough to test.
@qzheng527 did some tests this afternoon. If you generate the clearlinux base image by following https://docs.01.org/clearlinux/latest/guides/maintenance/container-image-new.html#container-image-new, you will hit the same issue. After using scripts to change all the relative links to "coreutils" into absolute links to "/usr/bin/coreutils" and regenerating the image, the issue goes away.
That's good to know. The last thing that confuses me, then, is that in the image (on Clear Linux I can ctr run it and validate this) /sbin is a symlink to usr/bin, not /bin. /bin and /sbin should be exactly the same (they are created the same way in our tooling and are even the same file according to our manifests), so seeing them differ in the debug logs makes me suspicious.
A quick test with Clear Linux on containerd 1.3.3 shows it working there as well.
@bryteise, actually, it was not easy for me to debug this. Several days ago, I did not know Go and had not reviewed the containerd and ctr code before. The log I added is printed in the handleLChmod function in archive/tar_unix.go, which is called by createTarFile in archive/tar.go. Yes, as you said, sbin should be a symlink. In fact, createTarFile creates a symlink when the entry is one. But in the problematic case, sbin's type is not detected as a symlink. hdr.Typeflag = 50 means the entry is a symbolic link (50 is the ASCII code of '2', tar's symlink type). As you can see in my debug log, lib, bin and lib64 are all 50, but sbin is not.
Here's the fix:
I'm going to submit this upstream for review. It's not totally clear to me why the previous code worked on some setups and not on others, but the fix itself makes sense regardless:
in the tar file,
@mythi Ah, I wasn't thinking about the fact that we hardlink symlinks in swupd. So the current code checks whether the tar header says the path is a symlink, then stats the linkname (which is supposed to be the target of the symlink), checks that there is no error, then checks that the file info mode is a symlink, and only then chmods the symlink's target (which may not even exist, as I don't see any tests for that). I'm a bit confused about why it chmods symlinks in the first place; as long as the linked file has the right mode, I would think symlinks should be skipped. Your code checks whether the symlink's target exists, but no longer tests that the file is a symlink as the tar header says it is. So if the tar header is corrupt, you'll chmod a non-symlink (though at least in the same way the else-if case below your change does), which fixes the case where the file is actually a hardlink, as we run into. This may mask a problem in the tar file, but since the code didn't raise an error before, I can't really say this is worse. You then chmod the symlink as the previous code was doing. So while I would call this a fix, I still think doing it at all is weird.
Seems to work just fine in my testing.
AFAICS, it checks that the target is not a symlink and chmods that.
Ah yes, you are correct there, sorry. Looking at the history of the file, handleLChmod dates from the original implementation of the tar processing based on the OCI spec, so there is not much detail on why that test is reasonable. Looking at the spec, I don't see anything really enlightening about why the test is done that way.
It was indeed hard to find the reasons for
My patch is submitted here:
Let's wait for the comments.
Sounds good to me. Regardless, this isn't a Clear Linux bug (nor can Clear Linux fix the issue for other distros), so I'm closing this issue for now and suggest that those interested follow the upstream PR.
I'm not sure where to file this bug to get attention. Please review clearlinux/dockerfiles#384.