We are so excited to announce Kmesh v0.5.0. First, thanks to our contributors for their hard work over the last two months. Release v0.5.0 brings many enhancements, including the command line tool `kmeshctl`, more complete E2E test coverage, better visualization of underlying eBPF information, observability enhancements, full restart support, an improved CNI installer, and RBAC in the XDP program. In addition, during this release cycle many critical bugs were fixed, some key code was refactored, and test coverage was expanded, making Kmesh more stable and robust. The highlights are as follows:
Zero downtime during Kmesh restart
Kmesh can now gracefully reload its eBPF maps and programs after a restart, and namespaces or specific pods no longer need to be re-enrolled into Kmesh afterward. As a result, traffic is not interrupted during the restart, which is a big benefit to users. After kmesh-daemon restarts, the bpf map configurations are automatically brought up to date.
In release v0.4.0 and earlier, after Kmesh restarted, all pods managed by Kmesh had to be restarted in order to be managed again, because management was triggered by the CNI plugin. This is now done in kmesh-daemon, so pods no longer need to be restarted to be re-managed.
Observability enhancement
Kmesh now supports L4 access logs, allowing users to clearly see the traffic managed by Kmesh. Note that access logging is not enabled by default. You can enable it by setting the `--enable-accesslog` parameter in `spec.containers.args` of the Kmesh manifest. Support for enabling access logs dynamically with `kmeshctl` is planned. In addition, a Grafana addon adapted for Kmesh has been added to better visualize monitoring metrics across various dimensions, and several key observability issues were fixed, improving accuracy and stability.
Offload authorization execution into the XDP program:
Kmesh has supported L4 RBAC since release v0.3.0, but the previous solution performed RBAC in user space, which had performance and functionality issues. We have now offloaded it into an XDP eBPF program, making the feature truly usable.
Authorization rules are now stored in eBPF maps, so authorization can be performed entirely within the eBPF program. When the result is a denial, the XDP program drops the request packet directly, and the client observes a connection failure.
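The decision the XDP program makes can be modeled in user space with a short Go sketch. It assumes Istio-style semantics (any matching DENY rule rejects; if ALLOW rules exist, at least one must match; with no rules, traffic passes) and a hypothetical flattened rule shape with only a source IP and a destination port. The real program evaluates rules via eBPF map lookups and returns `XDP_DROP` or `XDP_PASS`; here those verdicts are just strings.

```go
package main

import "fmt"

// Verdict models the XDP return codes as strings.
type Verdict string

const (
	Pass Verdict = "XDP_PASS"
	Drop Verdict = "XDP_DROP"
)

type Action int

const (
	Allow Action = iota
	Deny
)

// Policy is a hypothetical, flattened L4 rule as it might be stored in a map.
type Policy struct {
	Action  Action
	SrcIP   string // empty matches any source
	DstPort uint16 // 0 matches any port
}

// Conn describes the incoming connection attempt.
type Conn struct {
	SrcIP   string
	DstPort uint16
}

func (p Policy) matches(c Conn) bool {
	return (p.SrcIP == "" || p.SrcIP == c.SrcIP) &&
		(p.DstPort == 0 || p.DstPort == c.DstPort)
}

// authorize returns Drop if any DENY rule matches, or if ALLOW rules exist
// and none of them matches; otherwise it returns Pass.
func authorize(policies []Policy, c Conn) Verdict {
	hasAllow, allowed := false, false
	for _, p := range policies {
		switch p.Action {
		case Deny:
			if p.matches(c) {
				return Drop
			}
		case Allow:
			hasAllow = true
			if p.matches(c) {
				allowed = true
			}
		}
	}
	if hasAllow && !allowed {
		return Drop
	}
	return Pass
}

func main() {
	policies := []Policy{
		{Action: Deny, SrcIP: "10.0.0.9"},
		{Action: Allow, DstPort: 9080},
	}
	fmt.Println(authorize(policies, Conn{SrcIP: "10.0.0.5", DstPort: 9080}))
	fmt.Println(authorize(policies, Conn{SrcIP: "10.0.0.9", DstPort: 9080}))
	fmt.Println(authorize(policies, Conn{SrcIP: "10.0.0.5", DstPort: 22}))
}
```

Dropping at the XDP hook means a rejected packet never reaches the socket layer, which is why the client simply sees the connection fail.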
Better debuggability
Added the command line tool `kmeshctl`:
Kmesh has its own command line tool! You no longer need to `exec` into the corresponding Kmesh daemon pod to adjust the daemon's log level or dump its configuration. You can use kmeshctl directly:
```shell
# Adjust kmesh-daemon log level (e.g., debug | error | info)
kmeshctl log kmesh-6ct4h --set default:debug
# Dump config
kmeshctl dump kmesh-6ct4h workload
```
More features will be added to kmeshctl in the future, allowing users to better manage and debug Kmesh.
Better visualization of the underlying bpf maps:
Previously, the `/debug/config_dump/ads` and `/debug/config_dump/workload` interfaces output the configuration cached in the Kmesh daemon. For various reasons, the configuration in the daemon's cache and the actual eBPF maps may not be fully consistent, and human-readable eBPF information makes troubleshooting much easier. This information is now available through the `/debug/bpf/*` interfaces.
It will also be integrated into kmeshctl later to make it easier to view, and could be further extended to check whether the underlying eBPF maps are synchronized with the configuration in the Kmesh daemon.
Improve CNI installer:
Because the CNI installer runs inside the Kmesh daemon, if kmesh-daemon crashes unexpectedly or the machine suddenly loses power, the installer has no chance to uninstall the CNI config. If the token in the installed kubeconfig has then expired, no pod can start successfully after kmesh-daemon exits abnormally. We resolve this in two ways:
- Clean up the CNI config at the end of `start_kmesh.sh`.
- Add a separate goroutine in the CNI installer that updates the kubeconfig file whenever the token file is modified, so the kubeconfig does not expire easily.
Support hostNetwork workloads
In Kmesh dual-engine mode, we now support accessing services with hostNetwork pods.
Performance improvement
In dual-engine mode, we significantly optimized bpf map updates during `Workload` and `Service` response handling by using a local cache instead of looping over the bpf map.
Critical Bug Fixes
We have also fixed several major bugs:
- Prevent losing control of traffic during workload resource updates by not deleting the frontend map.
- Traffic from a namespaced waypoint was redirected back to the waypoint, causing an infinite loop. Kmesh now skips managing traffic sent from waypoints.
- Fixed a bug where a waypoint processing non-HTTP TCP traffic would unexpectedly return `HTTP/1.1 400 Bad Request` (#681).
What's Changed
Full Changelog
* kmesh route samples by @lec-bit in https://github.com//pull/531
* Kmesh Observability by @LiZhenCheng9527 in https://github.com//pull/527
* fix unexpected log by @Okabe-Rintarou-0 in https://github.com//pull/535
* Fix TestPodSidecarLabelChangeTriggersAddIptablesAction flake by @hzxuzhonghu in https://github.com//pull/540
* Modifybpf map update to prevent potential bugs by @weli-l in https://github.com//pull/541
* add codecov config by @LiZhenCheng9527 in https://github.com//pull/537
* use latest waypoint image to run e2e by @YaoZengzeng in https://github.com//pull/554
* add document for deploying and developing in kind by @Okabe-Rintarou-0 in https://github.com//pull/559
* add Copyright check by @LiZhenCheng9527 in https://github.com//pull/561
* add security.md for kmesh by @LiZhenCheng9527 in https://github.com//pull/564
* Add bpf log level getter (#560) by @Okabe-Rintarou-0 in https://github.com//pull/562
* add document about using enhanced kernel by @Okabe-Rintarou-0 in https://github.com//pull/565
* update gitignore for enhanced kernel by @Okabe-Rintarou-0 in https://github.com//pull/572
* Add code spell check github workflow by @Okabe-Rintarou-0 in https://github.com//pull/573
* add badge in readme by @LiZhenCheng9527 in https://github.com//pull/576
* Provide a way to allow setting all logger level to debug by @hzxuzhonghu in https://github.com//pull/557
* Fix `make gen` problem by @Okabe-Rintarou-0 in https://github.com//pull/582
* fix make clean by @Okabe-Rintarou-0 in https://github.com//pull/587
* add some waypoint related E2E test cases by @YaoZengzeng in https://github.com//pull/580
* optimize workload update by @nlgwcy in https://github.com//pull/590
* remove arch info in build process by @Okabe-Rintarou-0 in https://github.com//pull/585
* kmesh security: pod manage by @lec-bit in https://github.com//pull/489
* fix bpf map look up failed by @LiZhenCheng9527 in https://github.com//pull/594
* add configuration to collect kmesh metrics using Prometheus by @LiZhenCheng9527 in https://github.com//pull/589
* Bump the k8s-io group with 3 updates by @dependabot in https://github.com//pull/609
* waypoint should not managed by Kmesh by @LiZhenCheng9527 in https://github.com//pull/611
* remove resync period by @hzxuzhonghu in https://github.com//pull/601
* Fix DNS cluster's endpoint ip addr check by @LiZhenCheng9527 in https://github.com//pull/604
* E2E test cases for service and pod ip access by @YaoZengzeng in https://github.com//pull/596
* remove build arch in documents by @Okabe-Rintarou-0 in https://github.com//pull/622
* Bypass only for sidecar by @hzxuzhonghu in https://github.com//pull/607
* Bump github.com/containernetworking/cni from 1.2.2 to 1.2.3 by @dependabot in https://github.com//pull/624
* update metric_key with direction & dst_port by @nlgwcy in https://github.com//pull/627
* E2E test cases for waypoint management by @YaoZengzeng in https://github.com//pull/625
* Support ipv6 in e2e test by @noobwei in https://github.com//pull/621
* Make kmesh cni and manage controller consitent during pod enrollment by @hzxuzhonghu in https://github.com//pull/623
* kmesh support restart by reload old bpf map and prog by @lec-bit in https://github.com//pull/475
* enable select some e2e cases to run or skip some cases by @YaoZengzeng in https://github.com//pull/638
* copy bytes optimize by @hzxuzhonghu in https://github.com//pull/633
* preclude pod with host network to be managed by kmesh by @hzxuzhonghu in https://github.com//pull/634
* remove bypass from bpf prog by @hzxuzhonghu in https://github.com//pull/635
* Enable cleanup in e2e by @noobwei in https://github.com//pull/649
* Fix kmesh daemon graceful exit by @hzxuzhonghu in https://github.com//pull/651
* Fix TestPodSidecarLabelChangeTriggersAddIptablesAction flake by @hzxuzhonghu in https://github.com//pull/636
* Fixed bug in bpf where IPv4 destination address was stored as IPv6 by @LiZhenCheng9527 in https://github.com//pull/648
* add some secure compilation options by @kwb0523 in https://github.com//pull/658
* fix error log by @weli-l in https://github.com//pull/666
* e2e test for Kmesh daemon restart by @YaoZengzeng in https://github.com//pull/661
* Support ns manage donot need restart pod by @weli-l in https://github.com//pull/676
* kmesh restart with config change by @lec-bit in https://github.com//pull/640
* Bump github.com/docker/docker from 26.0.2+incompatible to 26.1.4+incompatible in the go_modules group by @dependabot in https://github.com//pull/668
* Fix namespaced waypoint service itself contains waypoint address, cau… by @hzxuzhonghu in https://github.com//pull/659
* Short circuit when svc endpoint count = 0 by @hzxuzhonghu in https://github.com//pull/675
* Update bpf log by @hzxuzhonghu in https://github.com//pull/673
* fix lint by @YaoZengzeng in https://github.com//pull/682
* fix data in bpfmap was cleared after kmesh restart in kind env by @weli-l in https://github.com//pull/684
* Add service metric and use ringbuf to report metrics by @LiZhenCheng9527 in https://github.com//pull/688
* Sync up status API with workload api by @hzxuzhonghu in https://github.com//pull/672
* enable e2e test case for ns granularity waypoint by @YaoZengzeng in https://github.com//pull/690
* fix metric nil pointer panic by @LiZhenCheng9527 in https://github.com//pull/706
* Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked by @tacslon in https://github.com//pull/679
* Support hostname network by @Okabe-Rintarou-0 in https://github.com//pull/709
* fix waypoint's processing of TCP by @YaoZengzeng in https://github.com//pull/681
* Support multi-arch images build and push automatically by @hzxuzhonghu in https://github.com//pull/715
* Bump github.com/docker/docker from 26.1.4+incompatible to 26.1.5+incompatible in the go_modules group by @dependabot in https://github.com//pull/718
* Update kmesh image by @hzxuzhonghu in https://github.com//pull/719
* fix bug that may occur in Update mode by @lec-bit in https://github.com//pull/696
* fix cgroup mount log by @lec-bit in https://github.com//pull/722
* Rm coverage threshold by @hzxuzhonghu in https://github.com//pull/726
* use dataplane label to manage workloads by @YaoZengzeng in https://github.com//pull/727
* FIX: go test timeout after running benchmark test by @hzxuzhonghu in https://github.com//pull/707
* Bump github.com/miekg/dns from 1.1.61 to 1.1.62 by @dependabot in https://github.com//pull/735
* make bypass idempotent by @weli-l in https://github.com//pull/731
* support remove residual endpoints by @nlgwcy in https://github.com//pull/605
* Update PHONY by @hzxuzhonghu in https://github.com//pull/729
* modify dockerfile by @weli-l in https://github.com//pull/742
* passthrough go test flag by @YaoZengzeng in https://github.com//pull/743
* Bump github.com/prometheus/client_golang from 1.19.1 to 1.20.0 by @dependabot in https://github.com//pull/736
* fix depolying ns and service waypoint in mixed manner and add related e2e test case by @YaoZengzeng in https://github.com//pull/737
* fix panic in ads mode by @nlgwcy in https://github.com//pull/746
* Fix image tag by @hzxuzhonghu in https://github.com//pull/753
* fix ebpf observalility doc by @LiZhenCheng9527 in https://github.com//pull/748
* match the meaning of Kmesh L4 metric with istio by @LiZhenCheng9527 in https://github.com//pull/747
* Fix: XDP prog duplicate when bpf filesystem mounted by @tacslon in https://github.com//pull/760
* add kmesh custom grafana addon by @YaoZengzeng in https://github.com//pull/617
* Bump github.com/prometheus/client_golang from 1.20.0 to 1.20.1 by @dependabot in https://github.com//pull/757
* fix image ci bug:kmesh latest cannot use by @lec-bit in https://github.com//pull/772
* add e2e test case for cross ns access by @YaoZengzeng in https://github.com//pull/759
* Bump cilium/ebpf by @hzxuzhonghu in https://github.com//pull/717
* make logs of metric clean up by @LiZhenCheng9527 in https://github.com//pull/773
* fix bpf prog double load by @lec-bit in https://github.com//pull/763
* E2e metric by @LiZhenCheng9527 in https://github.com//pull/771
* Bump github.com/prometheus/client_golang from 1.20.1 to 1.20.2 by @dependabot in https://github.com//pull/780
* correctly store origin dst addr in map_of_dst_info at ipv4-mapped ipv… by @kwb0523 in https://github.com//pull/765
* Check codecov but not fail github checks by @hzxuzhonghu in https://github.com//pull/776
* Bump google.golang.org/grpc from 1.65.0 to 1.66.0 by @dependabot in https://github.com//pull/787
* Bump github.com/prometheus/common from 0.55.0 to 0.56.0 by @dependabot in https://github.com//pull/788
* add accesslog to enhance obversiblity of kmesh by @LiZhenCheng9527 in https://github.com//pull/732
* istio 1.23 for e2e by @YaoZengzeng in https://github.com//pull/791
* e2e test to include bookinfo as test application by @YaoZengzeng in https://github.com//pull/790
* Improve cni uninstall by @hzxuzhonghu in https://github.com//pull/723
* Fix possible panic in loadKmeshSendmsgObjects by @lec-bit in https://github.com//pull/804
* fix version display by @Okabe-Rintarou-0 in https://github.com//pull/805
* Optimize endpoint index getter and also rename some methods to be mor… by @hzxuzhonghu in https://github.com//pull/783
* fix verifier log omitted by @weli-l in https://github.com//pull/800
* Optimize service frontend map delete: by using existing service addre… by @hzxuzhonghu in https://github.com//pull/807
* make codecov as an indication by @hzxuzhonghu in https://github.com//pull/809
* Batch update metrics through prometheus API by @LiZhenCheng9527 in https://github.com//pull/803
* update kubeconfig when token was updated by @YaoZengzeng in https://github.com//pull/801
* ci: add helm target by @zirain in https://github.com//pull/813
* Refactor workload.Services compare and eliminate `shouldAddEndpoint` by @hzxuzhonghu in https://github.com//pull/808
* Bump github.com/prometheus/client_golang from 1.20.2 to 1.20.3 by @dependabot in https://github.com//pull/819
* Bump github.com/prometheus/common from 0.56.0 to 0.59.1 by @dependabot in https://github.com//pull/818
* do rbac in xdp prog by @supercharge-xsy in https://github.com//pull/712
* typo by @lec-bit in https://github.com//pull/831
* cleanup unuse parameters by @LiZhenCheng9527 in https://github.com//pull/830
* add doc for startup of eBPF program by @weli-l in https://github.com//pull/810
* increase timeout of bookinfo e2e test by @YaoZengzeng in https://github.com//pull/832
* cgo: using CC=clang and CXX=clang++ by @yuanqijing in https://github.com//pull/836
* Bump google.golang.org/grpc from 1.66.0 to 1.66.1 by @dependabot in https://github.com//pull/840
* Add more approvers by @hzxuzhonghu in https://github.com//pull/833
* update golangci lint by @hzxuzhonghu in https://github.com//pull/817
* Filter out unhealthy workloads by @hzxuzhonghu in https://github.com//pull/695
* update confusing get workload logging by @hzxuzhonghu in https://github.com//pull/834
* some cleanups and rename on start type by @hzxuzhonghu in https://github.com//pull/796
* fix crash when waypoint is not network address type by @YaoZengzeng in https://github.com//pull/842
* Add auto build and push kmesh-build image by @hzxuzhonghu in https://github.com//pull/838
* Support passing VERSION to docker when run make docker by @hzxuzhonghu in https://github.com//pull/846
* Update push-builder-image.yml only run on dockerfile or go dependency… by @hzxuzhonghu in https://github.com//pull/848
* fix clean bpf_map twice in test by @LiZhenCheng9527 in https://github.com//pull/856
* support dump workload bpf map by @Okabe-Rintarou-0 in https://github.com//pull/853
* Fix accesslog by @LiZhenCheng9527 in https://github.com//pull/814
* make default logger only print to stderr not file by @hzxuzhonghu in https://github.com//pull/847
* build: fix `make gen` failed on arm64. by @yuanqijing in https://github.com//pull/860
* Bump github.com/prometheus/client_golang from 1.20.3 to 1.20.4 by @dependabot in https://github.com//pull/861
* Bump google.golang.org/grpc from 1.66.1 to 1.66.2 by @dependabot in https://github.com//pull/850
* fix ads deserial bug by @nlgwcy in https://github.com//pull/867
* upgrade protoc and protoc-gen-go by @hzxuzhonghu in https://github.com//pull/854
* Fix dns memleak when dns cluster removed by @hzxuzhonghu in https://github.com//pull/843
* remove spammy cni installer log by @hzxuzhonghu in https://github.com//pull/870
* Add proposal for local rate limit. by @yuanqijing in https://github.com//pull/873
* add a trigger for accesslog by @LiZhenCheng9527 in https://github.com//pull/871
* Enable non-default cluster by @noobwei in https://github.com//pull/866
* implement kmeshctl framework by @YaoZengzeng in https://github.com//pull/865
* add github ci ignore path by @LiZhenCheng9527 in https://github.com//pull/885
* use depguard to prohibitate importing github.com/golang/protobuf #876(after rebase) by @SpongeBob0318 in https://github.com//pull/881
* optimize kmesh-daemon binary by @YaoZengzeng in https://github.com//pull/884
* implement `kmeshctl dump` by @YaoZengzeng in https://github.com//pull/889
* Build 2 series of eBPF objects(kernel ver. <5.13 & >=5.13) and load eBPF dynamically when Kmesh starts up by @tacslon in https://github.com//pull/874
* Prevent potential panic when object is nil or empty by @hzxuzhonghu in https://github.com//pull/877
* make xds request send async by @lec-bit in https://github.com//pull/890
* Support ads bpf map lookup all by @Okabe-Rintarou-0 in https://github.com//pull/827
* [release-0.5] Fix build warning and remove docker pull explicitly by @kmesh-bot in https://github.com//pull/897
* update VERSION to 0.5.0 by @hzxuzhonghu in https://github.com//pull/903

New Contributors
- @noobwei made their first contribution in #621
- @zirain made their first contribution in #813
- @yuanqijing made their first contribution in #836
- @SpongeBob0318 made their first contribution in #881