
Experience of LFX Mentorship - Kmesh TCP Long Connection Metrics

· 3 min read
Yash Patel
Kmesh Member

Introduction

Hello readers, I am Yash, a final-year student from India. I love building cool stuff and solving real-world problems. I’ve been working in the cloud-native space for the past three years, exploring technologies like Kubernetes, Cilium, Istio, and more.

I successfully completed my mentorship with Kmesh during the LFX 2025 Term-1 program, which was an enriching and invaluable experience. Over the past three months, I gained significant knowledge and hands-on experience while contributing to the project. In this blog, I’ve documented my mentorship journey and the work I accomplished as a mentee.

LFX Mentorship Program – Overview

The LFX Mentorship Program, run by the Linux Foundation, is designed to help students and early-career professionals gain hands-on experience in open source development by working on real-world projects under the guidance of experienced mentors.

Participants contribute to high-impact projects hosted by foundations like CNCF, LF AI, LF Edge, and more. The program typically runs in 3 terms throughout the year, each lasting about three months.

More info

My Acceptance

I am a regular open source contributor and love contributing to open source. My interests align strongly with cloud-native technologies, and I was familiar with popular mentorship programs like LFX and GSoC, which are designed to help students get started in the open source world. I had made up my mind to apply for LFX 2025 Term-1 and began exploring projects in early February. The projects under CNCF for LFX are listed in the cncf/mentoring GitHub repository. There I came across the Kmesh project, a newly added CNCF sandbox project participating in LFX for the first time. I found Kmesh particularly exciting because of the problem it addresses: providing a sidecarless service mesh data plane. This approach can greatly benefit the community by improving performance and reducing overhead.

Kmesh offered four projects in Term-1. I selected the long connection metrics project because it allowed me to work with eBPF, and I already had prior experience working with eBPF.

I began exploring the Kmesh project by reading the documentation and contributing to Good First Issues. As I became more involved, the mentors started to take notice. I also submitted a proposal for the long connection metrics project.

In late February, I received an email from LFX notifying me of my selection.

Project Workthrough

The TCP long connection metrics project aims to implement access logs and metrics for TCP long connections, developing a continuous monitoring and reporting mechanism that captures detailed, real-time data throughout the lifetime of long-lived TCP connections.

eBPF hooks are used to collect connection stats such as bytes sent/received, packets lost, retransmissions, and RTT.

[Design diagram]

More information

Mentorship Experience

The Kmesh maintainers were always available to help me with any doubts, whether on Slack or GitHub. There is also a community meeting held every Thursday, where I could ask questions and discuss various topics. Over these three months I learned a lot from them, including how to approach problems effectively and how to consider edge cases during development.

Based on my contributions and active involvement, the Kmesh community recognized my efforts and promoted me to a member of the organization. This acknowledgment was truly encouraging and motivated me to continue contributing to Kmesh and help the project grow.

Kmesh V1.1.0 Officially Released!

· 6 min read

We are delighted to announce the release of Kmesh v1.1.0, a milestone achieved through the collective efforts of our global community over the past three months. Special recognition goes to the contributors from the LFX Mentorship Program, whose dedication has been pivotal in driving this release forward.

Building on the foundation of v1.0.0, this release introduces significant enhancements to Kmesh’s architecture, observability, and ecosystem integration. The official Kmesh website has undergone a comprehensive redesign, offering an intuitive interface and streamlined documentation to empower both users and developers. Under the hood, we’ve refactored the DNS module and added metrics for long connections, providing deeper insights into more traffic patterns.

In Kernel-Native mode, we’ve reduced invasive kernel modifications, and we now use global variables in place of the BPF config map to simplify the underlying complexity. Compatibility with Istio 1.25 has been rigorously validated, ensuring seamless interoperability with the latest Istio version. Notably, the persistent TestKmeshRestart E2E test flakiness, a long-standing issue, has been resolved through long-term investigation and reconstruction of the underlying BPF program, marking a leap forward in runtime reliability.

Main Features

Website overhaul

The Kmesh official website has undergone a complete redesign, offering an intuitive user experience with improved documentation, reorganized content hierarchy and streamlined navigation. In addressing feedback from the previous iteration, we focused on key areas where user experience could be enhanced. The original interface presented some usability challenges that occasionally led to navigation difficulties. Our blog module in particular required attention, as its content organization and visual hierarchy impacted content discoverability and readability. From an engineering perspective, we recognized opportunities to improve the code structure through better component organization and more systematic styling approaches, as the existing implementation had grown complex to maintain over time.

To address these problems, we shifted to React with Docusaurus, a modern documentation framework that's much more developer-friendly. This allowed us to create modular components, eliminating redundant code through reusability. Docusaurus provides built-in navigation systems specifically designed for documentation and blogs, plus version-controlled documentation features. We've implemented multilingual support with both English and Chinese documentation, added advanced search functionality, and completely reorganized the content structure. The result is a dramatically improved experience that makes the Kmesh site more accessible and valuable for all users.

Long connection metrics

Before this release, Kmesh provided access logs at the establishment and termination of a TCP connection, with detailed information about the connection such as bytes sent and received, packets lost, RTT, and retransmits. Kmesh also provided workload- and service-specific metrics, such as bytes sent and received, lost packets, minimum RTT, and total connections opened and closed by a pod. These metrics were only updated after a connection closed.

In this release, we implement access logs and metrics for TCP long connections, developing a continuous monitoring and reporting mechanism that captures detailed, real-time data throughout the lifetime of long-lived TCP connections. Access logs are reported periodically with information such as reporting time, connection establishment time, bytes sent and received, packet loss, RTT, retransmits, and connection state. Metrics such as bytes sent and received, packet loss, and retransmits are also reported periodically for long connections.

DNS refactor

Previously, the DNS process included the CDS refresh process. As a result, DNS was deeply coupled with kernel-native mode and could not be used in dual-engine mode.


In release 1.1 we refactored the DNS module of Kmesh. Instead of a structure containing CDS data, the items looped through the DNS refresh queue are now plain domains, so the DNS module no longer cares about the Kmesh mode and only needs the hostname to be resolved.


BPF config map optimization

Kmesh has eliminated the dedicated kmesh_config_map BPF map, which previously stored global runtime configurations such as BPF logging level and monitoring toggle. These settings are now managed through global variables. Leveraging global variables simplifies BPF configuration management, enhancing runtime efficiency and maintainability.

Optimise Kernel-Native mode to reduce intrusive kernel modifications

The kernel-native mode previously required a large number of intrusive kernel changes to implement HTTP-based traffic control. Some of these modifications may have a significant impact on the kernel, which made kernel-native mode difficult to deploy and use in a real production environment. To resolve this, we reworked the kernel modifications in kernel-native mode together with the associated ko and eBPF programs. Through the optimization in this release, the kernel modifications are limited to four on kernel 5.10 and reduced to only one on kernel 6.6. We aim to eliminate this last one as well, with the goal of eventually running kernel-native mode on a vanilla kernel 6.6 and above.


Adopt Istio 1.25

Kmesh has verified compatibility with Istio 1.25 and has added the corresponding E2E test to CI. The Kmesh community maintains CI verification for three Istio versions, so the E2E test for Istio 1.22 has been removed from CI.

Critical Bug Fix

kmeshctl install waypoint error (#1287)

Root cause analysis:

Remove the extra v before the version number when building the waypoint image.

TestKmeshRestart flaky (#1192)

Root cause analysis:

This issue is actually not related to Kmesh restart; it can also be reproduced in non-restart scenarios.

The root cause is that it is not appropriate to use sk as the key of the map map_of_orig_dst: the socket is reused, so the map value can be incorrectly overwritten. As a result, metadata was not encoded when it should have been in the connection sent to the waypoint, which caused the reset error in this issue.

TestServiceEntrySelectsWorkloadEntry flaky (#1352)

Root cause analysis:

Before this test case, there is a test TestServiceEntryInlinedWorkloadEntry which generates two workload objects, for example Kubernetes/networking.istio.io/ServiceEntry/echo-1-21618/test-se-v4/10.244.1.103 and ServiceEntry/echo-1-21618/test-se-v6/10.244.1.103.

In the current test case, the WorkloadEntry generates the workload object Kubernetes/networking.istio.io/WorkloadEntry/echo-1-21618/test-we.

If the test case runs fast enough, the removal operation of the first two workload objects will be aggregated with the creation operation of the latter object.

Kmesh processes the new object first and then removes the old resources (reference).

The IP addresses of these three objects are the same, so the IP address eventually cannot be found in the Kmesh workload cache, causing auth failures and connection timeouts.

Acknowledgment

Kmesh v1.1.0 includes 118 commits from 14 contributors. We would like to express our sincere gratitude to all contributors:

@hzxuzhonghu @LiZhenCheng9527 @YaoZengzeng @silenceper
@weli-l @sancppp @Kuromesi @yp969803
@lec-bit @ravjot07 @jayesh9747 @harish2773
@Dhiren-Mhatre @Murdock9803

We have always developed Kmesh with an open and neutral attitude, and continue to build a benchmark solution for the Sidecarless service mesh industry, serving thousands of industries and promoting the healthy and orderly development of service mesh. Kmesh is currently in a stage of rapid development, and we sincerely invite people with lofty ideals to join us!

Using Kmesh as the Data Plane for Alibaba Cloud Service Mesh (ASM) Sidecarless Mode

· 7 min read

Overview

Alibaba Cloud Service Mesh (ASM) supports both Sidecar and Sidecarless modes. The Sidecar mode, where a proxy runs alongside each service instance, is currently the most widely adopted and stable solution. However, this architecture introduces latency and resource overhead. To address the latency and resource consumption inherent in the Sidecar mode, various Sidecarless solutions have emerged in recent years, such as Istio Ambient. Istio Ambient deploys a ztunnel on each node to perform layer-4 traffic proxying for the Pods running on that node, and deploys waypoints for layer-7 traffic proxying. While the Sidecarless mode can reduce latency and resource consumption, its stability and functional completeness still require improvement.

Kmesh: Metrics and Accesslog in Detail

· 8 min read
lizhencheng
Kmesh Maintainer
Yash Patel
Kmesh Member

Introduction

Kmesh is a kernel-native, sidecarless service mesh data plane. It sinks traffic governance into the OS kernel with the help of eBPF and the programmable kernel, reducing the resource overhead and network latency of the service mesh.

Traffic data can be obtained directly in the kernel and passed to user space via BPF maps. This data is used to build metrics and access logs.

Kmesh Joins CNCF Cloud Native Landscape

· 4 min read

CNCF Landscape helps users understand specific software and product choices in each cloud-native practice phase. Kmesh joins CNCF Landscape and becomes a part of CNCF's best practice in building a cloud-native service mesh.


Kmesh: Kernel-Level Traffic Management Engine, Bring Ultimate Performance Experience

· 8 min read

Kmesh is a brand new kernel-level traffic management engine, which helps users build high-performance communication infrastructure in cloud-native scenarios through basic software innovation. Users can deploy Kmesh[1] with one click using helm in a service mesh environment, seamlessly connecting to Istiod. By sinking the traffic management down to the OS, Kmesh achieves more than a 50% reduction in forwarding latency compared to the Istio Sidecar solution, providing applications with an ultimate forwarding performance experience.

Kmesh: High-performance service mesh data plane

· 8 min read

What is a Service Mesh

The concept of a service mesh was introduced by Buoyant, the company behind the development of Linkerd, in 2016. William Morgan, the CEO of Linkerd, provided the initial definition of a service mesh:

A service mesh is a dedicated layer for handling service-to-service communication. It’s responsible for the reliable delivery of requests through the complex topology of services that comprise a modern, cloud-native application. In practice, the service mesh is typically implemented as an array of lightweight network proxies that are deployed alongside application code, without the application needing to be aware.

In simple terms, a service mesh is a layer that handles communication between services. It ensures transparent and reliable network communication for modern cloud-native applications through an array of lightweight network proxies.

The essence of a service mesh is to address the challenge of how microservices can communicate effectively. By implementing governance rules such as load balancing, canary routing, and circuit breaking, the service mesh orchestrates traffic flow to maximize the capabilities of the service cluster. It is a product of the evolution of service governance.

Accelerating ServiceMesh Data Plane Based on Sockmap

· 5 min read

Background Introduction

Early microservices architectures faced various challenges such as service discovery, load balancing, and authorization/authentication. Initially, practitioners of microservices implemented their own distributed communication systems to address these challenges. However, this approach resulted in redundant business functionality. To solve this problem, a solution was proposed: extracting the common distributed system communication code into a framework and providing it as a library for programmatic use. However, this seemingly perfect solution had several fatal weaknesses:

  • The framework required invasive modifications to the business code, necessitating developers to learn how to use the framework.
  • The framework could not be used across different programming languages.
  • Managing compatibility issues with complex project frameworks and library versions was challenging, as upgrading the framework often forced businesses to upgrade as well.