-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Carry #1111: specs-go/config: add Landlock LSM support #1241
base: main
Are you sure you want to change the base?
Conversation
9962bf2
to
c94edbc
Compare
cc @kailun-qin (author of #1111) PTAL |
Before merging this PR, can we have a POC implementation of runc (or crun, youki) to confirm the design? |
opencontainers/runc#3194 POC is here. Still need polish some code and add tests for it. I will carry it |
@@ -340,6 +340,26 @@ For Linux-based systems, the `process` object supports the following process-spe | |||
|
|||
* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`. | |||
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest). | |||
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR should be carefully updated following the latest advances of landlock (rather than simply carried) as some of the spec proposals might be stale now. Pls see https://docs.kernel.org/userspace-api/landlock.html and https://github.com/landlock-lsm/go-landlock for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you got time to continue this PR or I can continue carry(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls go ahead updating this PR. I'll need some time to follow up and dive into the updates of landlock (even for review).
Thanks for working on this! You might want to take a look at this talk about backward and forward compatibility challenges: https://archive.fosdem.org/2023/schedule/event/rust_backward_and_forward_compatibility_for_security_features/ FYI, I plan to review these changes next week. |
FWIW the most common implementation problem so far has been the use of the "refer" right, whose logic unfortunately works slightly differently than for other access rights. I tried to explain this in https://blog.gnoack.org/post/landlock-best-effort/ last year. With Linux 6.7, Landlock also has gained support for restricting the use of bind(2) and connect(2) for TCP. (Landlock ABI version 4) |
Thanks for the important information, I will update this spec in this week ASAP I can |
I wonder what the behavior should be when a new process(e.g. docker exec) jumps into the existing container. |
56a5a1d
to
6730eb2
Compare
I have update the PR, PTAL @kailun-qin @gnoack @l0kod |
// LandlockFSAction used to specify the FS actions that are handled by a ruleset or allowed by a rule. | ||
type LandlockFSAction string | ||
|
||
// Define actions on files and directories that Landlock can restrict a sandboxed process to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation nit: The word "define" is usually unnecessary in docstrings (all of the exported symbols are defining something after all)
Ports []string `json:"ports,omitempty" platform:"linux"` | ||
} | ||
|
||
// LandlockFSAction used to specify the FS actions that are handled by a ruleset or allowed by a rule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"LandlockFSAction specifies the FS actions that are handled by a ruleset or allowed by a rule."
("used to specify" is also past tense, which is a bit confusing to read)
// PathBeneath specifies the file-hierarchy typed rules. | ||
PathBeneath []LandlockRulePathBeneath `json:"pathBeneath,omitempty" platform:"linux"` | ||
// PortBeneath specifies the network-socket typed rules. | ||
PortBeneath []LandlockRulePortBeneath `json:"portBeneath,omitempty" platform:"linux"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two types of rules that exist in Landlock are "path beneath" and "net port", in the Linux headers. I would encourage you to stay consistent with that naming, as it will make it easier for users to map your API to the Landlock documentation.
The word "beneath" is referring to the hierarchical nature of file systems -- when an access right is granted on a directory, it also applies to the files beneath that directory. The word "beneath" is inappropriate when used together with ports, as we have no such hierarchy there.
DisableBestEffort bool `json:"disableBestEffort,omitempty" platform:"linux"` | ||
} | ||
|
||
// LandlockRuleset identifies a set of rules (i.e., actions on objects) that need to be handled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation nit: "The Ruleset identifies a set of rules" is a trivial statement which does not provide much additional value, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I think the naming of the "Landlock" and "LandlockRuleset" types is a bit backwards: In Landlock, the "ruleset" (which is a file descriptor under the hood), is actually the thing where users add their "path beneath" and "net port" rules. (Whereas, in this proposal, the "Ruleset" type does not actually hold the rules...? %-))
The "handled access rights" are just the information needed when creating the initial (empty) ruleset. If the kernel, the struct with the access rights is called the "ruleset attributes", but you might as well call it "handled access rights" or something like that, if you have the flexibility in your code to change that name later on.
"$ref": "#/definitions/LandlockRulePathBeneath" | ||
} | ||
}, | ||
"ArrayOfLandlockRulePortBeneaths": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment on the other file about the word "beneath" when talking about network ports
The `ruleset` currently contains the following types: | ||
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset. | ||
If no rule explicitly allow them, they should then be forbidden. | ||
* **`handledAssessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are avaliable when the ABI version >= 4. the behavior of the NETWORK typed actions is not used when the ABI version is less than 4 will depend on the **`disableBestEffort`**) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure you meant "Access" here?
The `ruleset` currently contains the following types: | ||
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset. | ||
If no rule explicitly allow them, they should then be forbidden. | ||
* **`handledAssessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are avaliable when the ABI version >= 4. the behavior of the NETWORK typed actions is not used when the ABI version is less than 4 will depend on the **`disableBestEffort`**) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(The sentences in parenthesis have broken grammar)
1. bind | ||
2. connect | ||
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict. | ||
* **`disableBestEffort`** (bool, OPTIONAL) the `disableBestEffort` field disables the best-effort security approach for Landlock access rights. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is slightly confusing in my mind that the boolean condition has the "inverted" logic here. I assume that in this config format, it is not possible to call this variable "bestEffortFallback" and to make true
the default value instead?)
High level remark: Looks good so far. I added a bunch of documentation nits. As for Landlock-specific comments, I would recommend to re-think some of the type names, and I left comments in these places. I think the bigger chance for mistakes is in the implementation itself. - Are you planning to add that in the same PR, or will that come separately? |
I think it would be great to implement it in a single PR(Actually, I don't know which is better to split the PR. Depend on the ABI version or else) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opencontainers/runc#3194 POC is here. Still need polish some code and add tests for it.
I will carry it
As there are no tests there, can you share the example config you used to test it?
I'm curious about overlayfs and the behavior in old runtimes.
@@ -340,6 +340,52 @@ For Linux-based systems, the `process` object supports the following process-spe | |||
|
|||
* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`. | |||
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest). | |||
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the runtime doesn't know this field? I guess in runc it will be completely ignored, right? In that case, I think we at least need to expose landlock support via the features command too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the runtime will ignore the field. I think it's not necessary to add feature command like other optional features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Zheaoli why not? This is a security feature, I can see use cases where upper layers want to enforce this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
friendly ping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for replying late. I misunderstand something before. I think it's necessary to export the feature we support for landlock in features command
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset. | ||
If no rule explicitly allow them, they should then be forbidden. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work on overlayfs? https://docs.kernel.org/userspace-api/landlock.html#bind-mounts-and-overlayfs
It is very widely used, can you clarify what works and what doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm not testing it on overlayfs yet. But I think it still work on overlayfs. Because we will always target to restrict the rule on the merged hierarchy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you want to figure out is it working on the following circumstance?
- We got directories A and B, A and B will merged into C
- We give the process write permission for A and B
- The process still have write permission for C or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understandig of the link is, if C is the merged dir by overlayfs, then no, you don't have permissions to write even if you can write to A and B.
But I think if we use C for the rules, it might work. I don't know if we also need permissions in A and B (not 100% clear to me when reading the doc), in which case it will be tricky.
IMHO, it will be nice to understand that, as overlay is a widely used fs for containers. But not sure that will change something on how we support it in OCI, so maybe it is okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/Zheaoli/py-landlock/blob/main/examples/demo.py
I have some overlayfs test case here. PTAL if you got time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Zheaoli I can try to download the python modules and run that locally. But it would be way simpler if you just share what works and what doesn't :)
At the OCI layer we don't know the lowerdir nor the upperdir, we just know the merged dir. It will be great to find if we apply a landlock policy restricting the merged dir, what effect that has on reading files AND also writing files.
Knowing if it works, or its limitations, is key to evaluate a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion. Here's the full test result:
- A and B are lower dirs, and C is a merged dirs.
- Policy to C would no longer affect A and B. The reverse action would be the same(policy on A and B would not affect C).
- The write and read (or any permission) on C is not depend on the permission on A and B. The permission between C and AB is independent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Zheaoli thanks! This LGTM.
I was playing with this, to make sure I'm not missing any case and indeed it seems to work just fine.
To play further, I used the example from: https://github.com/landlock-lsm/go-landlock/blob/efb66220540a9ef86aa0160d15e55f429d5b94d9/examples/go-landlock-configurable/main.go
With this json config:
JSON config:
{
"forbidden_access": [
"read_dir",
"write_file",
"make_reg"
],
"exceptions": [
{
"paths": [
"./rootfs-merged/tmp"
],
"permitted_access": [
"read_dir",
"make_reg",
"write_file"
]
}
],
"best_effort": false
}
And rootfs-merged is an overlayfs that I mounted manually. I tried it works as expected (can't create files on any paths except on ./rootfs-merged/tmp/, can't ls/read files except in that path), also if I chroot into that (chroot rootfs-merged /bin/sh
).
The files are created by overlayfs in the upper dir and all is as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a cursory look, I need to look later at what field are exactly proposed. But this seems in the right direction to me.
Added some comments/issues that I see.
1. bind | ||
2. connect | ||
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict. | ||
* **`disableBestEffort`** (bool, OPTIONAL) the `disableBestEffort` field disables the best-effort security approach for Landlock access rights. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't completely work, sadly, due to some historic reasons. I've mentioned this also here: #1241 (comment)
The thing is, if the container runtime doesn't know about landlock, the whole landlock section will be silently ignored. So, you might think you will have this landlock policy or an error, but the truth is, if the runtime doesn't know about landlock, you can end up without a landlock policy at all.
For this reason, we should expose this in the features section too. This way, users of the OCI runtime can know if the runtime knows about landlock and handle that case properly. Without exposing it there, it is not really usable for this security features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we can remove the disableBestEffort field and export the features we support in features command instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do boths, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can output the feature check like this ?
{
"linux": {
"landlock": {
"enabled": true,
"bestEffort": {
"enabled": true // False
},
"fs":{
"enabled": true,
"actions": [
"exectute",
"write_file",
"read_file",
"read_dir",
"remove_dir",
"remove_file",
"make_char",
"make_dir",
"make_reg",
"make_sock",
"make_fifo",
"make_block",
"make_sym",
"refer",
"truncate"
]
},
"network": {
"enabled": true,
"actions": [
"bind",
"connect"
]
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to expose all of that. I think just the landlock object with the enabled field will be enough.
Then, if the runtime doesn't recognize some of the constants or if the kernel ABI is not the one needed, then make the runtime fail?
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset. | ||
If no rule explicitly allow them, they should then be forbidden. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Zheaoli thanks! This LGTM.
I was playing with this, to make sure I'm not missing any case and indeed it seems to work just fine.
To play further, I used the example from: https://github.com/landlock-lsm/go-landlock/blob/efb66220540a9ef86aa0160d15e55f429d5b94d9/examples/go-landlock-configurable/main.go
With this json config:
JSON config:
{
"forbidden_access": [
"read_dir",
"write_file",
"make_reg"
],
"exceptions": [
{
"paths": [
"./rootfs-merged/tmp"
],
"permitted_access": [
"read_dir",
"make_reg",
"write_file"
]
}
],
"best_effort": false
}
And rootfs-merged is an overlayfs that I mounted manually. I tried it works as expected (can't create files on any paths except on ./rootfs-merged/tmp/, can't ls/read files except in that path), also if I chroot into that (chroot rootfs-merged /bin/sh
).
The files are created by overlayfs in the upper dir and all is as expected.
3. ABI version >= 3: | ||
1. truncate | ||
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict. | ||
* **`portBeneath`** (array of objects, OPTIONAL) is an array of the network-hierarchy typed rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
portBeneath? Is this for all ports below the number? Or is this just a c&p mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's for all ports below the number. Actually I follow style with pathBeneath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, okay, thanks. I'm in an off-site this week, so I'll check this up next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't make sense out of that.
If I want to have a rule for port 21, 22, 53 and 80, it seems complicated to have a portBeneath rule. For filesystems that makes a lot of sense, because is all beneath this (i.e. subdirs and everything below that), this is applied. It's in the nature of fs.
But what does it mean for ports? It seems like a complicated way to express it. Am I missing something?
Oh, it seems @gnoack already commented on this. I fully agree: https://github.com/opencontainers/runtime-spec/pull/1241/files#r1508607132
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, exactly -- this is still confusing and would better be named "net port" in line with the landlock.h
header file.
It does not permit access for all ports below a given number, but only for that specific port number itself, and the word "hierarchy" is also misleading, as there is nothing hierarchical about that (probably copied over from the file system access rights documentation).
@@ -340,6 +340,52 @@ For Linux-based systems, the `process` object supports the following process-spe | |||
|
|||
* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`. | |||
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest). | |||
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
friendly ping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3. ABI version >= 3: | ||
1. truncate | ||
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict. | ||
* **`portBeneath`** (array of objects, OPTIONAL) is an array of the network-hierarchy typed rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't make sense out of that.
If I want to have a rule for port 21, 22, 53 and 80, it seems complicated to have a portBeneath rule. For filesystems that makes a lot of sense, because is all beneath this (i.e. subdirs and everything below that), this is applied. It's in the nature of fs.
But what does it mean for ports? It seems like a complicated way to express it. Am I missing something?
Oh, it seems @gnoack already commented on this. I fully agree: https://github.com/opencontainers/runtime-spec/pull/1241/files#r1508607132
Needs rebase |
Linux kernel 5.13 adds support for Landlock Linux Security Module (LSM). This allows unprivileged processes to create safe security sandboxes that can securely restrict the ambient rights (e.g. global filesystem access) for themselves. opencontainers#1110 Co-authored-by: Zheao Li <[email protected]> Signed-off-by: Kailun Qin <[email protected]>
6730eb2
to
9016907
Compare
Needs another rebase |
Carry #1111