Socket map based L7 context-propagation #1396
Codecov Report
@@ Coverage Diff @@
## main #1396 +/- ##
- Coverage 81.01% 72.41% -8.60%
  Files 146 145 -1
  Lines 14731 14813 +82
- Hits 11934 10727 -1207
- Misses 2213 3374 +1161
- Partials 584 712 +128
@@ -215,4 +215,27 @@ handle_ssl_buf(void *ctx, u64 id, ssl_args_t *args, int bytes_len, u8 direction)
}
}

static __always_inline void *is_ssl_connection(u64 id) {
This code was only moved so we can reuse it.
u8 d_addr[IP_V6_ADDR_LEN];
union {
u8 s_addr[IP_V6_ADDR_LEN];
u32 s_ip[IP_V6_ADDR_LEN_WORDS];
Added a quality of life way to access this data 4 bytes at a time.
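For readers unfamiliar with the pattern, here is a minimal user-space sketch of the union trick. The struct name `conn_addr_sketch_t`, the helper `copy_src_addr`, and the values of `IP_V6_ADDR_LEN` (16) and `IP_V6_ADDR_LEN_WORDS` (4) are assumptions for illustration, not the definitions used in Beyla.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Assumed values: an IPv6 address is 16 bytes, i.e. four 32-bit words. */
#define IP_V6_ADDR_LEN 16
#define IP_V6_ADDR_LEN_WORDS (IP_V6_ADDR_LEN / sizeof(u32))

/* Hypothetical, trimmed-down stand-in for the connection info struct. */
typedef struct conn_addr_sketch {
    u8 d_addr[IP_V6_ADDR_LEN];
    union {
        u8 s_addr[IP_V6_ADDR_LEN];      /* byte view of the source address */
        u32 s_ip[IP_V6_ADDR_LEN_WORDS]; /* same 16 bytes, viewed as 4 words */
    };
} conn_addr_sketch_t;

/* Both views must cover exactly the same storage. */
_Static_assert(sizeof(((conn_addr_sketch_t *)0)->s_addr) ==
                   sizeof(((conn_addr_sketch_t *)0)->s_ip),
               "byte view and word view must have the same size");

/* Copy a source address word by word, the way 32-bit context fields
 * would be copied, instead of going byte by byte. */
static void copy_src_addr(conn_addr_sketch_t *c,
                          const u32 words[IP_V6_ADDR_LEN_WORDS]) {
    assert(c != NULL && words != NULL);
    for (size_t i = 0; i < IP_V6_ADDR_LEN_WORDS; i++) {
        c->s_ip[i] = words[i];
    }
}
```

The word view only changes how the bytes are addressed; anything that reads `s_addr` afterwards sees the same 16 bytes.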
@@ -0,0 +1,72 @@
#ifndef TC_COMMON_H
This code was just extracted from tc_tracer_l7 because we use it now in both places.
@@ -258,7 +258,7 @@ func (p *Tracer) KProbes() map[string]ebpfcommon.FunctionPrograms {
Required: true,
Start: p.bpfObjects.KprobeTcpClose,
},
- "tcp_sendmsg": {
+ "tcp_sendmsg_locked": {
We needed this change because sock_msg attaches to tcp_sendmsg, and once it does, our probes on tcp_sendmsg don't run at all. We still need them to check for various things. Attaching one level lower, to the locked version of sendmsg (tcp_sendmsg_locked), works. We can't read the bvec buffers that sock_msg creates, but our logic still runs.
Ensure the verifier passes for the IPv6 code
Fix clang-tidy warning
Amazing work! This is a game changer! LGTM - feel free to ignore my nits.
const char TP[] = "Traceparent: 00-00000000000000000000000000000000-0000000000000000-01\r\n";
const u32 EXTEND_SIZE = sizeof(TP) - 1;
const char TP_PREFIX[] = "Traceparent: ";
const u32 TP_PREFIX_SIZE = sizeof(TP_PREFIX) - 1;
In theory these should also be static. In practice this may not matter since we only ever have one translation unit for our eBPF programs.
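A minimal sketch of what the `static` variant could look like in plain C. The `u32` typedef and the compile-time checks are additions for illustration; the expected lengths (70 and 13) follow from counting the characters of the two literals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/* 'static' gives each constant internal linkage, so a second translation
 * unit including the same header would not produce duplicate symbols. */
static const char TP[] = "Traceparent: 00-00000000000000000000000000000000-0000000000000000-01\r\n";
static const u32 EXTEND_SIZE = sizeof(TP) - 1; /* -1 drops the trailing NUL */
static const char TP_PREFIX[] = "Traceparent: ";
static const u32 TP_PREFIX_SIZE = sizeof(TP_PREFIX) - 1;

/* Compile-time sanity checks on the on-the-wire lengths. */
_Static_assert(sizeof(TP) - 1 == 70, "full header line is 70 bytes");
_Static_assert(sizeof(TP_PREFIX) - 1 == 13, "prefix is 13 bytes");
```

Using `sizeof` on the array rather than `strlen` keeps the lengths compile-time constants, which matters in eBPF code where bounds have to be known to the verifier.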
// The order of copying the data from bpf_sock_ops matters and must match how
// the struct is laid in vmlinux.h, otherwise the verifier thinks we are modifying
// the context twice.
I should probably have removed this comment; it no longer applies.
// The order of copying the data from bpf_sock_ops matters and must match how
// the struct is laid in vmlinux.h, otherwise the verifier thinks we are modifying
// the context twice.
- // The order of copying the data from bpf_sock_ops matters and must match how
- // the struct is laid in vmlinux.h, otherwise the verifier thinks we are modifying
- // the context twice.
+ // The order of copying the data from bpf_sock_ops matters. A different order causes the compiler to
+ // generate code that modifies the pointer to the context (which is not accepted by the verifier) rather than
+ // generating code that relies on the pointer to the context + an immediate offset, for instance:
+ // r4 = *(u32 *)(r1 + 8) // OK
+ // vs
+ // r2 = 8
+ // r1 += r2
+ // r4 = *(u32 *)(r1 + 0) // ERROR
Amazing 👏🏻
This PR adds L7 context propagation by leveraging a socket map. The code runs as follows:
1. We add a sock_ops program that monitors newly established sockets. These sockets are stored in a sockhash map, keyed by our connection info. Sockets that were already established before the program was attached would otherwise be missed; we handle them in TC, using the fact that sockmaps can also be manipulated directly with bpf_map_update_elem. See patch https://lore.kernel.org/bpf/[email protected]/.
2. We add a sock_msg program that detects outgoing HTTP requests in flight and extends the packet to make room for the missing header. This program cannot write BPF memory (unless we use the undesirable bpf_probe_write_user), so we only extend the packet and record metadata. This is the recommended approach based on the BPF docs.
3. In TC we look up the metadata and write the 'traceparent' value.
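The split between "reserve space and record metadata" and "fill in the value later" can be sketched in user-space C as follows. This is only an illustration of the idea under stated assumptions: `struct tp_meta`, `looks_like_http_request`, and `fill_traceparent` are hypothetical names, the method list is a guess at what a request check might look for, and none of this is the eBPF code in the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same template as in the PR; the zeros are placeholders for the IDs. */
static const char TP[] = "Traceparent: 00-00000000000000000000000000000000-0000000000000000-01\r\n";

enum {
    EXTEND_SIZE = sizeof(TP) - 1,                  /* bytes reserved in stage 1 */
    TRACE_ID_OFF = sizeof("Traceparent: 00-") - 1, /* start of 32 hex digits */
    SPAN_ID_OFF = TRACE_ID_OFF + 32 + 1            /* start of 16 hex digits */
};

/* Hypothetical metadata recorded in stage 1 and consumed in stage 2. */
struct tp_meta {
    uint8_t trace_id[16];
    uint8_t span_id[8];
    bool pending; /* true once space has been reserved but not yet filled */
};

/* Stage 1 (sock_msg analogue): decide whether the buffer starts an
 * HTTP request. Assumed check: a known method followed by a space. */
static bool looks_like_http_request(const char *buf, size_t len) {
    static const char *const methods[] = {"GET ", "POST ", "PUT ", "DELETE ",
                                          "HEAD ", "PATCH ", "OPTIONS "};
    for (size_t i = 0; i < sizeof(methods) / sizeof(methods[0]); i++) {
        size_t n = strlen(methods[i]);
        if (len >= n && memcmp(buf, methods[i], n) == 0) {
            return true;
        }
    }
    return false;
}

static void hex_encode(char *dst, const uint8_t *src, size_t n) {
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < n; i++) {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0f];
    }
}

/* Stage 2 (TC analogue): fill the reserved space, but only when it is
 * exactly EXTEND_SIZE bytes. Returns the number of bytes written. */
static size_t fill_traceparent(char *space, size_t len, struct tp_meta *meta) {
    assert(space != NULL && meta != NULL);
    if (!meta->pending || len != EXTEND_SIZE) {
        return 0;
    }
    memcpy(space, TP, EXTEND_SIZE);
    hex_encode(space + TRACE_ID_OFF, meta->trace_id, sizeof(meta->trace_id));
    hex_encode(space + SPAN_ID_OFF, meta->span_id, sizeof(meta->span_id));
    meta->pending = false;
    return EXTEND_SIZE;
}
```

In this sketch the first stage would set `pending` when `looks_like_http_request` returns true, and the second stage refuses to write unless the reserved region has exactly the expected size, so a mismatch leaves the buffer untouched.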
Writing the memory in TC is very complex, and I'm not certain it needs to be. In all my experiments, the extra memory added in sock_msg ends up as a separate, otherwise empty TCP packet of exactly that size. We cover this case in the "fast path". If we can confirm by reading the kernel code that this will always be the case, we can remove the accounting of written bytes and the custom memcpy I added.
I hit issues with the verifier in the IPv6 handling. It worked in my small prototype, but not once I merged the code into Beyla. We need to fix this.
Note: BEYLA_BPF_TC_CP now enables both L4 and L7 context propagation.
TODO: