Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket map based L7 context-propagation #1396

Merged
merged 38 commits into from
Nov 26, 2024

Conversation

grcevski
Copy link
Contributor

@grcevski grcevski commented Nov 22, 2024

This PR adds L7 context propagation by leveraging a socket map. The code runs as follows:

We add a sock_ops program which monitors for new sockets established. These sockets are stored in a sockhash map, using our connection info as a key. We mitigate finding already established sockets in TC by using the fact that sockmaps can also be manipulated directly with bpf_map_update_elem. See patch https://lore.kernel.org/bpf/[email protected]/.

We add a sock_msg program which detects ongoing outgoing http requests and extends the packet to add the missing header space. This program cannot write BPF memory (unless we use the undesirable bpf_probe_write_mem), so we only extend the packet and record metadata. This is the recommended approach based on the BPF docs.
In TC we look up the metadata and write the 'traceparent' value.

The writing of the memory in TC is very complex and I'm not certain it needs to be. Essentially, in all my experiments the adding of extra memory in sock_msg results in separate unique empty TCP packet of the exact size. We cover this case in the "fast path". If we can confirm by reading the kernel code that this will always be the case we can remove the written accounting and the custom memcpy I added.

I hit issues with the verifier about the handling of IPv6. It worked on my small prototype, but not when I merged the code in Beyla. We need to fix this.

Note: BEYLA_BPF_TC_CP enables now both L4 and L7 context propagation.

TODO:

  • Fix support for IPv6 (with help from @rafaelroquetto)
  • Understand if we can safely rely on separate TCP packet for the newly added space and remove the memory write complexity.
  • Document the code
  • Tests (we now should be able to enable the skipped test with Docker compose)

Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 68.04124% with 31 lines in your changes missing coverage. Please review.

Project coverage is 72.41%. Comparing base (65111df) to head (e5136ff).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
pkg/internal/ebpf/instrumenter.go 71.42% 8 Missing and 4 partials ⚠️
pkg/internal/ebpf/common/common_linux.go 0.00% 8 Missing ⚠️
pkg/internal/ebpf/tracer_linux.go 0.00% 6 Missing and 2 partials ⚠️
pkg/internal/ebpf/tctracer/tctracer.go 89.65% 2 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (65111df) and HEAD (e5136ff). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (65111df) HEAD (e5136ff)
unittests 1 0
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1396      +/-   ##
==========================================
- Coverage   81.01%   72.41%   -8.60%     
==========================================
  Files         146      145       -1     
  Lines       14731    14813      +82     
==========================================
- Hits        11934    10727    -1207     
- Misses       2213     3374    +1161     
- Partials      584      712     +128     
Flag Coverage Δ
integration-test 59.14% <68.04%> (+0.55%) ⬆️
k8s-integration-test 59.90% <67.01%> (+0.03%) ⬆️
oats-test 33.94% <9.27%> (-0.16%) ⬇️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -215,4 +215,27 @@ handle_ssl_buf(void *ctx, u64 id, ssl_args_t *args, int bytes_len, u8 direction)
}
}

static __always_inline void *is_ssl_connection(u64 id) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was only moved so we can reuse it.

u8 d_addr[IP_V6_ADDR_LEN];
union {
u8 s_addr[IP_V6_ADDR_LEN];
u32 s_ip[IP_V6_ADDR_LEN_WORDS];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a quality of life way to access this data 4 bytes at a time.

@@ -0,0 +1,72 @@
#ifndef TC_COMMON_H
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was just extracted from tc_tracer_l7 because we use it now in both places.

@@ -258,7 +258,7 @@ func (p *Tracer) KProbes() map[string]ebpfcommon.FunctionPrograms {
Required: true,
Start: p.bpfObjects.KprobeTcpClose,
},
"tcp_sendmsg": {
"tcp_sendmsg_locked": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We needed this change since sock_msg attaches to tcp_sendmsg and our probes don't run at all, and we need them to check for various things. Attaching lower to the locked version of sendmsg works. We can't read the bvec buffers that sock_msg makes, but our logic still runs.

@grcevski grcevski marked this pull request as ready for review November 25, 2024 20:57
Copy link
Contributor

@rafaelroquetto rafaelroquetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! This is a game changer! LGTM - feel free to ignore my nits.

Comment on lines +9 to +12
const char TP[] = "Traceparent: 00-00000000000000000000000000000000-0000000000000000-01\r\n";
const u32 EXTEND_SIZE = sizeof(TP) - 1;
const char TP_PREFIX[] = "Traceparent: ";
const u32 TP_PREFIX_SIZE = sizeof(TP_PREFIX) - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory these should also be static. In practice this may not matter since we only ever have one translation unit for our eBPF programs.

Comment on lines +105 to +107
// The order of copying the data from bpf_sock_ops matters and must match how
// the struct is laid in vmlinux.h, otherwise the verifier thinks we are modifying
// the context twice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably have removed this comment, it no longer applies

Comment on lines +73 to +75
// The order of copying the data from bpf_sock_ops matters and must match how
// the struct is laid in vmlinux.h, otherwise the verifier thinks we are modifying
// the context twice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The order of copying the data from bpf_sock_ops matters and must match how
// the struct is laid in vmlinux.h, otherwise the verifier thinks we are modifying
// the context twice.
// The order of copying the data from bpf_sock_ops matters. A different order causes the compiler to
// generate code that modifies the pointer to the context (which is not accepted by the verifier) rather than
// generating code that relies on the pointer to the context + an immediate offset, for instance:
// r4 = *(u32*) r1 + 8 // OK
// vs
// r2 = 8
// r1 += r2
// r4 = *(u32*) r1 // ERROR

bpf/tc_sock.h Show resolved Hide resolved
Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing 👏🏻

@grcevski grcevski merged commit d2a8a83 into grafana:main Nov 26, 2024
15 checks passed
@grcevski grcevski deleted the socket_tracker_temp branch November 26, 2024 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants