Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(perf) skip to process nil ring #1503

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Ghostbaby
Copy link

@Ghostbaby Ghostbaby commented Jul 4, 2024

Code

eBPF code
//go:build ignore

#include <common.h>
#include <vmlinux.h>
#include <bpf_helpers.h>
#include <bpf_core_read.h>
#include <bpf_endian.h>

char __license[] SEC("license") = "Dual MIT/GPL";
struct event {
	u32 pid;
	u32 ppid;
	u32 uid;
	char comm[100];
};

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
} events SEC(".maps");

// Force emitting struct event into the ELF.
const struct event *unused __attribute__((unused));

SEC("kprobe/sys_execve")
int kprobe_execve(struct pt_regs *ctx) {
	struct event e;
	u32 pid, ppid, uid;
	struct task_struct *task;

	pid = bpf_get_current_pid_tgid() >> 32;
	uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;

	task = (struct task_struct *)bpf_get_current_task();
	ppid = BPF_CORE_READ(task, real_parent, tgid);

	e.pid  = pid;
	e.ppid = ppid;
	e.uid  = uid;
	bpf_get_current_comm(&e.comm, sizeof(e.comm));

	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e));
	return 0;
}
GO code
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"github.com/cilium/ebpf/btf"
	"github.com/cilium/ebpf/link"
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/perf"
	"github.com/cilium/ebpf/rlimit"
)

type bpfEvent struct {
	Pid  uint32
	Ppid uint32
	Uid  uint32
	Comm [100]byte
}

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go bpf exec.c -- -I../headers_fake
func main() {
	// Remove memory limit for eBPF programs
	if err := rlimit.RemoveMemlock(); err != nil {
		fmt.Fprintf(os.Stderr, "Failed to remove memlock limit: %v\n", err)
		os.Exit(1)
	}

	spec, err := btf.LoadSpec("/data/gb/external.btf")
	if err != nil {
		log.Fatalf("loading BTF: %v", err)
	}
	opts := &ebpf.CollectionOptions{
		Programs: ebpf.ProgramOptions{
			KernelTypes: spec,
		},
	}

	objs := bpfObjects{}
	if err := loadBpfObjects(&objs, opts); err != nil {
		fmt.Fprintf(os.Stderr, "Loading objects failed: %v\n", err)
		os.Exit(1)
	}
	defer objs.KprobeExecve.Close()
	defer objs.Events.Close()

	kp, err := link.Kprobe("sys_execve", objs.KprobeExecve, nil)
	if err != nil {
		log.Fatalf("opening kprobe: %s", err)
	}
	defer kp.Close()

	// Set up a perf reader to read events from the eBPF program
	rd, err := perf.NewReader(objs.Events, os.Getpagesize())
	if err != nil {
		fmt.Fprintf(os.Stderr, "Creating perf reader failed: %v\n", err)
		os.Exit(1)
	}
	defer rd.Close()

	// Set up a channel to receive signals
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt, syscall.SIGTERM)

	fmt.Println("Listening for events..")

	// Loop to read events
	go func() {
		for {
			record, err := rd.Read()
			if err != nil {
				fmt.Fprintf(os.Stderr, "Reading from perf reader failed: %v\n", err)
				os.Exit(1)
			}

			// Parse event data
			var e bpfEvent
			if err := binary.Read(bytes.NewBuffer(record.RawSample), binary.LittleEndian, &e); err != nil {
				fmt.Fprintf(os.Stderr, "Parsing event data failed: %v\n", err)
				os.Exit(1)
			}

			fmt.Printf("PID: %d, PPID: %d, UID: %d, Comm: %s\n", e.Pid, e.Ppid, e.Uid, string(e.Comm[:]))
		}
	}()

	// Wait for a signal to exit
	<-sig
	fmt.Println("Exiting..")
}

header pkg: headers_fake.tar.gz

Reproducing

Build

  1. add eBPF and Go code file to cilium/ebpf/examples/kprobe-exec
  2. add header to cilium/ebpf/examples/headers_fake
  3. cd ./cillium/ebpf/examples ;make -C ..
  4. cd cilium/ebpf/examples/kprobe-exec; CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -gcflags "all=-N -l"
  5. upload kprobe-exec binary to aarch64 vm (in my env os version: KY10 SP1.1 aarch64,kernel version: 4.19.90-23.15.v2101)

tips:
Go code use external BTF file,your test vm need to enable BTF or use pahole gen external BTF.
if your vm has edabled BTF , should remove GO code :

	spec, err := btf.LoadSpec("/data/gb/external.btf")
	if err != nil {
		log.Fatalf("loading BTF: %v", err)
	}
	opts := &ebpf.CollectionOptions{
		Programs: ebpf.ProgramOptions{
			KernelTypes: spec,
		},
	}

Error

  1. NPE error
# ./kprobe-exec 
Listening for events..
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1bd1f8]

goroutine 25 [running]:
github.com/cilium/ebpf/perf.(*Reader).ReadInto(0x40012b95e0, 0x4002358c98)
        /Users/zhuhuijun/go/src/github.com/cilium/ebpf/perf/reader.go:389 +0x6c8
github.com/cilium/ebpf/perf.(*Reader).Read(0x40012b95e0)
        /Users/zhuhuijun/go/src/github.com/cilium/ebpf/perf/reader.go:337 +0x48
main.main.func1()
        /Users/zhuhuijun/go/src/github.com/cilium/ebpf/examples/kprobe-exec/main.go:75 +0x54
created by main.main in goroutine 1
        /Users/zhuhuijun/go/src/github.com/cilium/ebpf/examples/kprobe-exec/main.go:73 +0xa04
  1. Causes the error code location
    ring.loadHead()

@mejedi
Copy link
Contributor

mejedi commented Jul 4, 2024

Hi @Ghostbaby, thank you for submitting your PR.
Could you please provide some details about the issue you were facing that prompted this fix?

@Ghostbaby
Copy link
Author

Hi @mejedi, I added some details to the PR description.

@florianl
Copy link
Contributor

florianl commented Jul 5, 2024

Hi @Ghostbaby

Thanks for providing more context. Your panic indicates, that it is an issue with the perf reader. Can you also reproduce the issue with a minimal example, like in https://github.com/cilium/ebpf/blob/main/examples/tracepoint_in_go/ ?

As I don't know of a KY10 SP1.1 system, I tried to reproduce the issue on a EC2 arm64 instance, but was not able to do so.

If a CPU is offline during the creation of the perf event, we actually add nil to the ring array:

ebpf/perf/reader.go

Lines 219 to 225 in f2b2f6d

event, ring, err := newPerfEventRing(i, perCPUBuffer, opts)
if errors.Is(err, unix.ENODEV) {
// The requested CPU is currently offline, skip it.
rings = append(rings, nil)
eventFds = append(eventFds, nil)
continue
}

So my assumption is, that some CPUs are offline when you run the program. I will look more into this.

@Ghostbaby
Copy link
Author

Ghostbaby commented Jul 5, 2024

Hi @florianl, Thank you for your review.
I have built tracepoint_in_go/ and tested in my VM, got the same error, error details :

# ./tracepoint_in_go 
nCPU: 64
cpu: 16,err: can't create perf event: no such device
cpu: 17,err: can't create perf event: no such device
cpu: 18,err: can't create perf event: no such device
cpu: 19,err: can't create perf event: no such device
cpu: 20,err: can't create perf event: no such device
cpu: 21,err: can't create perf event: no such device
cpu: 22,err: can't create perf event: no such device
cpu: 23,err: can't create perf event: no such device
cpu: 24,err: can't create perf event: no such device
cpu: 25,err: can't create perf event: no such device
cpu: 26,err: can't create perf event: no such device
cpu: 27,err: can't create perf event: no such device
cpu: 28,err: can't create perf event: no such device
cpu: 29,err: can't create perf event: no such device
cpu: 30,err: can't create perf event: no such device
cpu: 31,err: can't create perf event: no such device
cpu: 32,err: can't create perf event: no such device
cpu: 33,err: can't create perf event: no such device
cpu: 34,err: can't create perf event: no such device
cpu: 35,err: can't create perf event: no such device
cpu: 36,err: can't create perf event: no such device
cpu: 37,err: can't create perf event: no such device
cpu: 38,err: can't create perf event: no such device
cpu: 39,err: can't create perf event: no such device
cpu: 40,err: can't create perf event: no such device
cpu: 41,err: can't create perf event: no such device
cpu: 42,err: can't create perf event: no such device
cpu: 43,err: can't create perf event: no such device
cpu: 44,err: can't create perf event: no such device
cpu: 45,err: can't create perf event: no such device
cpu: 46,err: can't create perf event: no such device
cpu: 47,err: can't create perf event: no such device
cpu: 48,err: can't create perf event: no such device
cpu: 49,err: can't create perf event: no such device
cpu: 50,err: can't create perf event: no such device
cpu: 51,err: can't create perf event: no such device
cpu: 52,err: can't create perf event: no such device
cpu: 53,err: can't create perf event: no such device
cpu: 54,err: can't create perf event: no such device
cpu: 55,err: can't create perf event: no such device
cpu: 56,err: can't create perf event: no such device
cpu: 57,err: can't create perf event: no such device
cpu: 58,err: can't create perf event: no such device
cpu: 59,err: can't create perf event: no such device
cpu: 60,err: can't create perf event: no such device
cpu: 61,err: can't create perf event: no such device
cpu: 62,err: can't create perf event: no such device
cpu: 63,err: can't create perf event: no such device
2024/07/05 17:35:43 Waiting for events..
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x19a778]

goroutine 1 [running]:
github.com/cilium/ebpf/perf.(*Reader).ReadInto(0x40001a0000, 0x40001af668)
        /Users/zhuhuijun/go/src/github.com/cilium/ebpf/perf/reader.go:392 +0x6c8
github.com/cilium/ebpf/perf.(*Reader).Read(0x40001a0000)
        /Users/zhuhuijun/go/src/github.com/cilium/ebpf/perf/reader.go:340 +0x48
main.main()
        /Users/zhuhuijun/go/src/github.com/cilium/ebpf/examples/tracepoint_in_go/main.go:108 +0x106c

After I added some debug info, I found that the nCPU number was obtained incorrectly, and the server virtual machine CPU had 16 cores.The specific reason should be caused by the following configuration:

# grep -i "Cpus_allowed" /proc/self/status
Cpus_allowed:   ffffffff,ffffffff
Cpus_allowed_list:      0-63

@mejedi
Copy link
Contributor

mejedi commented Jul 5, 2024

@Ghostbaby thank you for provifing further details. Please share /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online.

@Ghostbaby
Copy link
Author

Ghostbaby commented Jul 5, 2024

Hi @mejedi,Here are the query results.

# cat /sys/devices/system/cpu/possible
0-63

# cat /sys/devices/system/cpu/online
0-15

and, Linux os enabled CPU hot plugin

# grep CONFIG_HOTPLUG_CPU /boot/config-$(uname -r)
CONFIG_HOTPLUG_CPU=y

looks like map MaxEntries init default value in here:

ebpf/map.go

Lines 137 to 150 in f2b2f6d

n, err := PossibleCPU()
if err != nil {
return nil, fmt.Errorf("fixup perf event array: %w", err)
}
if n := uint32(n); spec.MaxEntries == 0 || spec.MaxEntries > n {
// MaxEntries should be zero most of the time, but there is code
// out there which hardcodes large constants. Clamp the number
// of entries to the number of CPUs at most. Allow creating maps with
// less than n items since some kernel selftests relied on this
// behaviour in the past.
spec.MaxEntries = n
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants