Kafka/MySQL large-buffer state is read after map delete #2045

@MrAlias

Summary

The Kafka and MySQL large-buffer handlers delete their per-connection LRU map entry and then continue reading the returned map-value pointer to finish event construction.
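
To make the hazard concrete, here is a minimal userspace C sketch of the delete-then-use pattern. It is not the eBPF code itself. It assumes that a deleted LRU slot can be handed to another connection before the handler finishes reading. The names `hdr_state`, `slot_store`, and the `store_*` helpers are hypothetical stand-ins for the real map and its value type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a per-connection map value. */
struct hdr_state {
    uint8_t bytes[4];
};

/* Hypothetical one-slot store that mimics slot reuse after a delete. */
struct slot_store {
    bool used;
    uint32_t key;
    struct hdr_state value;
};

static struct hdr_state *store_lookup(struct slot_store *s, uint32_t key) {
    return (s->used && s->key == key) ? &s->value : NULL;
}

static void store_delete(struct slot_store *s, uint32_t key) {
    if (s->used && s->key == key)
        s->used = false; /* the slot is released, its bytes are not cleared */
}

static void store_update(struct slot_store *s, uint32_t key, const uint8_t bytes[4]) {
    s->used = true;
    s->key = key;
    memcpy(s->value.bytes, bytes, 4);
}

/* Delete-then-use: returns the first byte read through the stale pointer. */
static uint8_t stale_read_after_reuse(void) {
    struct slot_store store = {0};
    const uint8_t conn_a[4] = {0x01, 0x00, 0x00, 0x01};
    const uint8_t conn_b[4] = {0xAA, 0xBB, 0xCC, 0xDD};

    store_update(&store, 1, conn_a);
    struct hdr_state *p = store_lookup(&store, 1);
    assert(p != NULL);

    store_delete(&store, 1);
    store_update(&store, 2, conn_b); /* assumed: another connection reuses the slot */

    return p->bytes[0]; /* now connection 2's byte, not connection 1's */
}
```

In this model, `stale_read_after_reuse()` returns connection 2's first byte (`0xAA`) even though the pointer was obtained for connection 1, which is the kind of stale read described above.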

Impact

This is a genuine lifetime bug in the product's eBPF code, but the impact that the current code supports is low severity. The stale data is limited to small protocol header fragments that can be copied into EVENT_TCP_LARGE_BUFFER telemetry, so the result is corrupted telemetry or disclosure of a few stale bytes to telemetry consumers.

Environment

  • First identified in: 281748d
  • Reviewed against current main commit: d51a98d

Evidence

Steps to reproduce

  1. Build the BPF programs and the agent from the repository root:

    make generate
    make compile
  2. Add temporary debug prints around the two delete-then-use sites. For MySQL, in bpf/generictracer/protocol_mysql.h, add one bpf_dbg_printk call immediately before and one immediately after bpf_map_delete_elem(&mysql_state, &pid_conn->conn); so that the site reads roughly as follows:

    if (state_data) {
        bpf_dbg_printk("mysql before delete: ptr=%llx bytes=%02x %02x %02x %02x",
                       state_data,
                       ((const unsigned char *)state_data)[0],
                       ((const unsigned char *)state_data)[1],
                       ((const unsigned char *)state_data)[2],
                       ((const unsigned char *)state_data)[3]);
    
        bpf_map_delete_elem(&mysql_state, &pid_conn->conn);
    
        bpf_dbg_printk("mysql after delete: ptr=%llx bytes=%02x %02x %02x %02x",
                       state_data,
                       ((const unsigned char *)state_data)[0],
                       ((const unsigned char *)state_data)[1],
                       ((const unsigned char *)state_data)[2],
                       ((const unsigned char *)state_data)[3]);
    
        __builtin_memcpy(lb->buf, state_data, sizeof(*state_data));
        lb->len = sizeof(*state_data);
    }
  3. For Kafka, in bpf/generictracer/protocol_kafka.h, add one bpf_dbg_printk call immediately before and one immediately after bpf_map_delete_elem(&kafka_state, &state_key); so that the site reads roughly as follows:

    if (state_data && state_data->message_size > 0 && (u32)state_data->message_size == bytes_len) {
        bpf_dbg_printk("kafka before delete: ptr=%llx size=%d",
                       state_data,
                       state_data->message_size);
    
        bpf_map_delete_elem(&kafka_state, &state_key);
    
        bpf_dbg_printk("kafka after delete: ptr=%llx size=%d",
                       state_data,
                       state_data->message_size);
    
        const s32 message_size_be = bpf_htonl(state_data->message_size);
        __builtin_memcpy(lb->buf, &message_size_be, k_kafka_hdr_message_size);
    }
  4. Rebuild and restart the agent so the new bpf_dbg_printk statements are loaded:

    make generate
    make compile

    If your local setup supports it, run the agent with protocol debug enabled so that the large-buffer chunk contents are also printed from userspace.

  5. Start a TCP fragmenting proxy that splits the protocol header into its own short read. The following Python proxy, saved as fragment_proxy.py, is sufficient for both MySQL and Kafka: it forwards the first N bytes of each server read as one write, sleeps briefly, then forwards the remainder:

    import selectors
    import socket
    import sys
    import time
    
    # Usage: fragment_proxy.py LISTEN_PORT TARGET_HOST TARGET_PORT SPLIT_BYTES
    LISTEN_HOST = "127.0.0.1"
    LISTEN_PORT = int(sys.argv[1])
    TARGET_HOST = sys.argv[2]
    TARGET_PORT = int(sys.argv[3])
    SPLIT_BYTES = int(sys.argv[4])
    
    def forward_once(src, dst, split_bytes):
        """Forward one read from src to dst as up to two separate writes.
    
        Returns False once src has closed its side of the connection.
        """
        data = src.recv(65535)
        if not data:
            return False
        head = data[:split_bytes]
        tail = data[split_bytes:]
        if head:
            dst.sendall(head)
            # Pause so the head is delivered as its own short read.
            time.sleep(0.05)
        if tail:
            dst.sendall(tail)
        return True
    
    # Accept a single client connection and connect it to the target.
    lsock = socket.socket()
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind((LISTEN_HOST, LISTEN_PORT))
    lsock.listen(1)
    
    client, _ = lsock.accept()
    upstream = socket.create_connection((TARGET_HOST, TARGET_PORT))
    
    sel = selectors.DefaultSelector()
    sel.register(client, selectors.EVENT_READ, ("client", client, upstream))
    sel.register(upstream, selectors.EVENT_READ, ("server", upstream, client))
    
    while True:
        for key, _ in sel.select():
            direction, src, dst = key.data
            # Only server-to-client data is split; client data passes through whole.
            split = SPLIT_BYTES if direction == "server" else 65535
            if not forward_once(src, dst, split):
                # Either side closing ends the proxy.
                sys.exit(0)
  6. Use the proxy with a split size that matches the saved header state:

    python3 fragment_proxy.py 3307 127.0.0.1 3306 4
    python3 fragment_proxy.py 9093 127.0.0.1 9092 4

    The MySQL command makes the first 4 bytes of a server packet arrive separately from the next bytes. The Kafka command makes the 4-byte response size prefix arrive separately from the rest of the response body.

  7. Drive traffic through the proxy from an instrumented client:

    mysql --host 127.0.0.1 --port 3307 -u <user> -p -e 'select 1'
    kcat -b 127.0.0.1:9093 -t <topic> -C -c 1

    Any equivalent MySQL request or Kafka fetch is fine as long as the response passes through the fragmenting proxy.

  8. Observe the BPF debug output. A successful reproduction will show the same state_data pointer being logged before and after bpf_map_delete_elem(...), with the second log still dereferencing that pointer after deletion. Example output shape:

    mysql before delete: ptr=0xffff... bytes=01 00 00 01
    mysql after delete: ptr=0xffff... bytes=01 00 00 01
    
    kafka before delete: ptr=0xffff... size=128
    kafka after delete: ptr=0xffff... size=128
    
  9. Confirm in the userspace protocol-debug output for EVENT_TCP_LARGE_BUFFER that the first emitted chunk begins with the same 4 bytes that were read from the deleted map value. That is the observable consequence of the bug:

    • MySQL: the first large-buffer chunk starts with the 4-byte saved header fragment.
    • Kafka: the first large-buffer chunk starts with the 4-byte saved message_size field.
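
When comparing the logged bytes with the first large-buffer chunk, it can help to decode the 4-byte fragments. The following C sketch assumes the standard wire layouts: a MySQL packet header with a 3-byte little-endian payload length followed by a 1-byte sequence id, and a Kafka response prefixed by a 4-byte big-endian size. The helper names are hypothetical and are not taken from the repository.

```c
#include <stdint.h>

/* MySQL packet header: bytes 0..2 are the payload length, little-endian. */
static uint32_t mysql_payload_len(const uint8_t h[4]) {
    return (uint32_t)h[0] | ((uint32_t)h[1] << 8) | ((uint32_t)h[2] << 16);
}

/* MySQL packet header: byte 3 is the sequence id. */
static uint8_t mysql_sequence_id(const uint8_t h[4]) {
    return h[3];
}

/* Kafka response prefix: a 4-byte big-endian message size. */
static int32_t kafka_message_size(const uint8_t h[4]) {
    return (int32_t)(((uint32_t)h[0] << 24) | ((uint32_t)h[1] << 16) |
                     ((uint32_t)h[2] << 8) | (uint32_t)h[3]);
}
```

With the example output shape from step 8, the MySQL bytes `01 00 00 01` decode to a payload length of 1 with sequence id 1, and a Kafka size of 128 corresponds to the prefix bytes `00 00 00 80`.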

Suggested Fix Direction

Do not read map-value state after bpf_map_delete_elem. Copy the needed header fields into stack locals before deleting the map entry, then build the large-buffer event from those locals.
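
The copy-before-delete ordering can be sketched in userspace C as follows. This is an illustration under stated assumptions, not a patch: `conn_state`, `demo_map`, and the `demo_*` helpers are hypothetical, and the slot reuse by a second connection is simulated to show that the stack copy is unaffected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-connection state; the real struct layouts differ. */
struct conn_state {
    uint8_t hdr[4];
};

/* Hypothetical one-entry map used only for this illustration. */
struct demo_map {
    bool used;
    uint32_t key;
    struct conn_state value;
};

static struct conn_state *demo_lookup(struct demo_map *m, uint32_t key) {
    return (m->used && m->key == key) ? &m->value : NULL;
}

static void demo_delete(struct demo_map *m, uint32_t key) {
    if (m->used && m->key == key)
        m->used = false;
}

static void demo_update(struct demo_map *m, uint32_t key, const uint8_t hdr[4]) {
    m->used = true;
    m->key = key;
    memcpy(m->value.hdr, hdr, 4);
}

/* Copy-then-delete: returns the first header byte taken from the stack copy. */
static uint8_t copy_then_delete(void) {
    struct demo_map m = {0};
    const uint8_t conn_a[4] = {0x01, 0x00, 0x00, 0x01};
    const uint8_t conn_b[4] = {0xAA, 0xBB, 0xCC, 0xDD};

    demo_update(&m, 1, conn_a);
    const struct conn_state *st = demo_lookup(&m, 1);
    assert(st != NULL);

    struct conn_state local = *st; /* copy to the stack while the entry is live */
    demo_delete(&m, 1);            /* st must not be dereferenced after this */
    demo_update(&m, 2, conn_b);    /* simulated reuse by another connection */

    return local.hdr[0];           /* built from the local copy only */
}
```

Here `copy_then_delete()` still returns connection 1's first byte (`0x01`) after the slot has been reused, because the event data comes from `local` rather than from the map value.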

Acceptance Criteria

  • Kafka and MySQL large-buffer paths never dereference a map-value pointer after deleting its map entry.
  • Any needed header state is copied before deletion and used from stack-local storage afterward.
  • Tests cover the split-header cleanup path for both protocols.

Note

I have reviewed this issue before posting it. It was identified by OpenAI Codex, and the draft was prepared with its assistance, but it may still contain mistakes, missing context, or incorrect conclusions. Please independently validate the behavior, impact, and proposed fix before acting on it.

Metadata

Assignees

No one assigned

    Labels

    area: ebpf (Kernel-side eBPF program logic and protocol parsing)
    bug (Something isn't working)
    ebpf (Issues or PRs that primarily require eBPF program changes)

    Type

    No type

    Projects

    Status

    Todo

    Relationships

    None yet

    Development

    No branches or pull requests
