Title	: User-Space Probes (Uprobes)
Author	: Jim Keniston

CONTENTS

1. Concepts: Uprobes, Return Probes
2. Architectures Supported
3. Configuring Uprobes
4. API Reference
5. Uprobes Features and Limitations
6. Interoperation with Kprobes
7. Interoperation with Utrace
8. Probe Overhead
9. TODO
10. Uprobes Team
11. Uprobes Example
12. Uretprobes Example

1. Concepts: Uprobes, Return Probes

Uprobes enables you to dynamically break into any routine in a
user application and collect debugging and performance information
non-disruptively. You can trap at any code address, specifying a
kernel handler routine to be invoked when the breakpoint is hit.

There are currently two types of user-space probes: uprobes and
uretprobes (also called return probes). A uprobe can be inserted on
any instruction in the application's virtual address space. A return
probe fires when a specified user function returns. These two probe
types are discussed in more detail later.

A registration function such as register_uprobe() specifies which
process is to be probed, where the probe is to be inserted, and what
handler is to be called when the probe is hit.

Typically, Uprobes-based instrumentation is packaged as a kernel
module. In the simplest case, the module's init function installs
("registers") one or more probes, and the exit function unregisters
them. However, probes can be registered or unregistered in response
to other events as well. For example:
- A probe handler itself can register and/or unregister probes.
- You can establish Utrace callbacks to register and/or unregister
probes when a particular process forks, clones a thread, execs,
enters a system call, receives a signal, exits, etc.
See Documentation/utrace.txt.

1.1 How Does a Uprobe Work?

When a uprobe is registered, Uprobes makes a copy of the probed
instruction, stops the probed application, replaces the first byte(s)
of the probed instruction with a breakpoint instruction (e.g., int3
on i386 and x86_64), and allows the probed application to continue.
(When inserting the breakpoint, Uprobes uses the same copy-on-write
mechanism that ptrace uses, so that the breakpoint affects only that
process, and not any other process running that program. This is
true even if the probed instruction is in a shared library.)

When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
user-mode registers are saved, and a SIGTRAP signal is generated.
Uprobes intercepts the SIGTRAP and finds the associated uprobe.
It then executes the handler associated with the uprobe, passing the
handler the addresses of the uprobe struct and the saved registers.
The handler may block, but keep in mind that the probed thread remains
stopped while your handler runs.

Next, Uprobes single-steps its copy of the probed instruction and
resumes execution of the probed process at the instruction following
the probepoint. (It would be simpler to single-step the actual
instruction in place, but then Uprobes would have to temporarily
remove the breakpoint instruction. This would create problems in a
multithreaded application. For example, it would open a time window
when another thread could sail right past the probepoint.)

Instruction copies to be single-stepped are stored in a per-process
"single-step out of line (SSOL) area," which is a little VM area
created by Uprobes in each probed process's address space.

1.2 The Role of Utrace

When a probe is registered on a previously unprobed process,
Uprobes establishes a tracing "engine" with Utrace (see
Documentation/utrace.txt) for each thread (task) in the process.
Uprobes uses the Utrace "quiesce" mechanism to stop all the threads
prior to insertion or removal of a breakpoint.
Utrace also notifies Uprobes of breakpoint and single-step traps and
of other interesting events in the lifetime of the probed process,
such as fork, clone, exec, and exit.

1.3 How Does a Return Probe Work?

When you call register_uretprobe(), Uprobes establishes a uprobe
at the entry to the function. When the probed function is called
and this probe is hit, Uprobes saves a copy of the return address,
and replaces the return address with the address of a "trampoline"
-- a piece of code that contains a breakpoint instruction.

When the probed function executes its return instruction, control
passes to the trampoline and that breakpoint is hit. Uprobes's
trampoline handler calls the user-specified handler associated with
the uretprobe, then sets the saved instruction pointer to the saved
return address, and that's where execution resumes upon return from
the trap.

The trampoline is stored in the SSOL area.

1.4 Multithreaded Applications

Uprobes supports the probing of multithreaded applications. Uprobes
imposes no limit on the number of threads in a probed application.
All threads in a process use the same text pages, so every probe
in a process affects all threads; of course, each thread hits the
probepoint (and runs the handler) independently. Multiple threads
may run the same handler simultaneously. If you want a particular
thread or set of threads to run a particular handler, your handler
should check current or current->pid to determine which thread has
hit the probepoint.

When a process clones a new thread, that thread automatically shares
all current and future probes established for that process.

Keep in mind that when you register or unregister a probe, the
breakpoint is not inserted or removed until Utrace has stopped all
threads in the process. The register/unregister function returns
after the breakpoint has been inserted/removed (but see the next
section).
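
Returning to the return-probe mechanism of Section 1.3: the following
user-space C sketch is not Uprobes code and does not touch a real stack.
It only models the bookkeeping described there, under the assumption that
one thread's return-address slot and saved instruction pointer can be
represented as plain fields. All names (model_thread, model_entry_hit,
MODEL_TRAMPOLINE, and so on) are hypothetical and chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical model only; none of these names exist in Uprobes. */
#define MODEL_MAX_DEPTH		16
#define MODEL_TRAMPOLINE	0xdead0000UL	/* stand-in for the trampoline address */

struct model_thread {
	unsigned long ret_slot;			/* models the on-stack return address */
	unsigned long ip;			/* models the saved instruction pointer */
	unsigned long saved[MODEL_MAX_DEPTH];	/* real return addresses, innermost last */
	size_t depth;				/* number of outstanding probed calls */
	int handler_calls;			/* times the return handler has run */
};

/* Entry probe hit: save the real return address, substitute the trampoline. */
static int model_entry_hit(struct model_thread *t)
{
	if (t->depth == MODEL_MAX_DEPTH)
		return -1;
	t->saved[t->depth++] = t->ret_slot;
	t->ret_slot = MODEL_TRAMPOLINE;
	return 0;
}

/* The probed function returns: control goes wherever the slot points. */
static void model_function_return(struct model_thread *t)
{
	t->ip = t->ret_slot;
}

/*
 * Trampoline breakpoint hit: run the return handler (modeled as a counter),
 * then point the saved instruction pointer at the real return address.
 */
static int model_trampoline_hit(struct model_thread *t)
{
	if (t->ip != MODEL_TRAMPOLINE || t->depth == 0)
		return -1;
	t->handler_calls++;
	t->ip = t->saved[--t->depth];
	return 0;
}
```

In this model, one call and return is model_entry_hit(),
model_function_return(), then model_trampoline_hit(), after which ip
holds the original return address again. An entry hit with no matching
return leaves an entry on the saved[] array, which gives a feel for why
functions that exit via longjmp() are troublesome for return probes.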

1.5 Registering Probes within Probe Handlers

A uprobe or uretprobe handler can call any of the functions
in the Uprobes API ([un]register_uprobe(), [un]register_uretprobe()).
A handler can even unregister its own probe. However, when invoked
from a handler, the actual [un]register operations do not take
place immediately. Rather, they are queued up and executed after
all handlers for that probepoint have been run. In the handler,
the [un]register call returns -EINPROGRESS. If you set the
registration_callback field in the uprobe object, that callback will
be called when the [un]register operation completes.

2. Architectures Supported

Uprobes and uretprobes are implemented on the following
architectures:

- i386
- x86_64 (AMD-64, EM64T)
- ppc64
- ia64
- s390x

3. Configuring Uprobes

// TODO: The patch actually puts Uprobes configuration under "Instrumentation
// Support" with Kprobes. Need to decide which is the better place.

When configuring the kernel using make menuconfig/xconfig/oldconfig,
ensure that CONFIG_UPROBES is set to "y". Under "Process debugging
support," select "Infrastructure for tracing and debugging user
processes" to enable Utrace, then select "Uprobes".

So that you can load and unload Uprobes-based instrumentation modules,
make sure "Loadable module support" (CONFIG_MODULES) and "Module
unloading" (CONFIG_MODULE_UNLOAD) are set to "y".

4. API Reference

The Uprobes API includes a "register" function and an "unregister"
function for each type of probe. Here are terse, mini-man-page
specifications for these functions and the associated probe handlers
that you'll write. See the latter half of this document for examples.

4.1 register_uprobe

#include <linux/uprobes.h>
int register_uprobe(struct uprobe *u);

Sets a breakpoint at virtual address u->vaddr in the process whose
pid is u->pid. When the breakpoint is hit, Uprobes calls u->handler.

register_uprobe() returns 0 on success, -EINPROGRESS if
register_uprobe() was called from a uprobe or uretprobe handler
(and therefore delayed), or a negative errno otherwise.

Section 4.4, "User's Callback for Delayed Registrations",
explains how to be notified upon completion of a delayed
registration.

User's handler (u->handler):
#include <linux/uprobes.h>
#include <linux/ptrace.h>
void handler(struct uprobe *u, struct pt_regs *regs);

Called with u pointing to the uprobe associated with the breakpoint,
and regs pointing to the struct containing the registers saved when
the breakpoint was hit.

4.2 register_uretprobe

#include <linux/uprobes.h>
int register_uretprobe(struct uretprobe *rp);

Establishes a return probe in the process whose pid is rp->u.pid for
the function whose address is rp->u.vaddr. When that function returns,
Uprobes calls rp->handler.

register_uretprobe() returns 0 on success, -EINPROGRESS if
register_uretprobe() was called from a uprobe or uretprobe handler
(and therefore delayed), or a negative errno otherwise.

Section 4.4, "User's Callback for Delayed Registrations",
explains how to be notified upon completion of a delayed
registration.

User's return-probe handler (rp->handler):
#include <linux/uprobes.h>
#include <linux/ptrace.h>
void uretprobe_handler(struct uretprobe_instance *ri, struct pt_regs *regs);

regs is as described for the user's uprobe handler. ri points to
the uretprobe_instance object associated with the particular function
instance that is currently returning. The following fields in that
object may be of interest:
- ret_addr: the return address
- rp: points to the corresponding uretprobe object

In ptrace.h, the regs_return_value(regs) macro provides a simple
abstraction to extract the return value from the appropriate register
as defined by the architecture's ABI.

4.3 unregister_*probe

#include <linux/uprobes.h>
void unregister_uprobe(struct uprobe *u);
void unregister_uretprobe(struct uretprobe *rp);

Removes the specified probe.
The unregister function can be called at any time after the probe
has been registered, and can be called from a uprobe or uretprobe
handler.

4.4 User's Callback for Delayed Registrations

#include <linux/uprobes.h>
void registration_callback(struct uprobe *u, int reg,
	enum uprobe_type type, int result);

As previously mentioned, the functions described in Section 4 can be
called from within a uprobe or uretprobe handler. When that happens,
the [un]registration operation is delayed until all handlers
associated with that handler's probepoint have been run. Upon
completion of the [un]registration operation, Uprobes checks the
registration_callback member of the associated uprobe:
u->registration_callback for [un]register_uprobe or
rp->u.registration_callback for [un]register_uretprobe. Uprobes calls
that callback function, if any, passing it the following values:

- u = the address of the uprobe object. (For a uretprobe, you can use
container_of(u, struct uretprobe, u) to obtain the address of the
uretprobe object.)

- reg = 1 for register_u[ret]probe() or 0 for unregister_u[ret]probe()

- type = UPTY_UPROBE or UPTY_URETPROBE

- result = the return value that register_u[ret]probe() would have
returned if this weren't a delayed operation. This is always 0
for unregister_u[ret]probe().

NOTE: Uprobes calls the registration_callback ONLY in the case of a
delayed [un]registration.

5. Uprobes Features and Limitations

The user is expected to assign values to the following members
of struct uprobe: pid, vaddr, handler, and (as needed)
registration_callback. Other members are reserved for Uprobes's use.
Uprobes may produce unexpected results if you:
- assign non-zero values to reserved members of struct uprobe;
- change the contents of a uprobe or uretprobe object while it is
registered; or
- attempt to register a uprobe or uretprobe that is already registered.

Uprobes allows any number of probes (uprobes and/or uretprobes)
at a particular address.
For a particular probepoint, handlers are run in the order in which
they were registered.

Any number of kernel modules may probe a particular process
simultaneously, and a particular module may probe any number of
processes simultaneously.

Probes are shared by all threads in a process (including newly
created threads).

If a probed process exits or execs, Uprobes automatically
unregisters all uprobes and uretprobes associated with that process.
Subsequent attempts to unregister these probes will be treated as
no-ops.

On the other hand, if a probed memory area is removed from the
process's virtual memory map (e.g., via dlclose(3) or munmap(2)),
it's currently up to you to unregister the probes first.

There is no way to specify that probes should be inherited across
fork; Uprobes removes all probepoints in the newly created child
process. See Section 7, "Interoperation with Utrace", for more
information on this topic.

On at least some architectures, Uprobes makes no attempt to verify
that the probe address you specify actually marks the start of an
instruction. If you get this wrong, chaos may ensue.

To avoid interfering with interactive debuggers, Uprobes will refuse
to insert a probepoint where a breakpoint instruction already exists,
unless it was Uprobes that put it there. Some architectures may
refuse to insert probes on other types of instructions.

If you install a probe in an inline-able function, Uprobes makes
no attempt to chase down all inline instances of the function and
install probes there. gcc may inline a function without being asked,
so keep this in mind if you're not seeing the probe hits you expect.

A probe handler can modify the environment of the probed function
-- e.g., by modifying data structures, or by modifying the
contents of the pt_regs struct (which are restored to the registers
upon return from the breakpoint). So Uprobes can be used, for example,
to install a bug fix or to inject faults for testing.
Uprobes, of course, has no way to distinguish the deliberately
injected faults from the accidental ones. Don't drink and probe.

Since a return probe is implemented by replacing the return
address with the trampoline's address, stack backtraces and calls
to __builtin_return_address() will typically yield the trampoline's
address instead of the real return address for uretprobed functions.

If the number of times a function is called does not match the
number of times it returns (e.g., if a function exits via longjmp()),
registering a return probe on that function may produce undesirable
results.

When you register the first probe at a probepoint or unregister the
last probe at a probepoint, Uprobes asks Utrace to "quiesce"
the probed process so that Uprobes can insert or remove the breakpoint
instruction. If the process is not already stopped, Utrace stops it.
If the process is running an interruptible system call, this may cause
the system call to finish early or fail with EINTR. (The PTRACE_ATTACH
request of the ptrace system call has this same limitation.)

When Uprobes establishes a probepoint on a previously unprobed page
of text, Linux creates a new copy of the page via its copy-on-write
mechanism. When probepoints are removed, Uprobes makes no attempt
to consolidate identical copies of the same page. This could affect
memory availability if you probe many, many pages in many, many
long-running processes.

6. Interoperation with Kprobes

Uprobes is intended to interoperate usefully with Kprobes (see
Documentation/kprobes.txt). For example, an instrumentation module
can make calls to both the Kprobes API and the Uprobes API.

A uprobe or uretprobe handler can register or unregister kprobes,
jprobes, and kretprobes, as well as uprobes and uretprobes. On the
other hand, a kprobe, jprobe, or kretprobe handler must not sleep, and
therefore cannot register or unregister any of these types of probes.
(Ideas for removing this restriction are welcome.)

Note that the overhead of a u[ret]probe hit is several times that of
a k[ret]probe hit.

7. Interoperation with Utrace

As mentioned in Section 1.2, Uprobes is a client of Utrace. For each
probed thread, Uprobes establishes a Utrace engine, and registers
callbacks for the following types of events: clone/fork, exec, exit,
and "core-dump" signals (which include breakpoint traps). Uprobes
establishes this engine when the process is first probed, or when
Uprobes is notified of the thread's creation, whichever comes first.

An instrumentation module can use both the Utrace and Uprobes APIs (as
well as Kprobes). When you do this, keep the following facts in mind:

- For a particular event, Utrace callbacks are called in the order in
which the engines are established. Utrace does not currently provide
a mechanism for altering this order.

- When Uprobes learns that a probed process has forked, it removes
the breakpoints in the child process.

- When Uprobes learns that a probed process has exec-ed or exited,
it disposes of its data structures for that process (first allowing
any outstanding [un]registration operations to terminate).

- When a probed thread hits a breakpoint or completes single-stepping
of a probed instruction, engines with the UTRACE_EVENT(SIGNAL_CORE)
flag set are notified. The Uprobes signal callback prevents (via
UTRACE_ACTION_HIDE) this event from being reported to engines later
in the list. But if your engine was established before Uprobes's,
you will see this event.

If you want to establish probes in a newly forked child, you can use
the following procedure:

- Register a report_clone callback with Utrace. In this callback,
the CLONE_THREAD flag distinguishes between the creation of a new
thread vs. a new process.

- In your report_clone callback, call utrace_attach() to attach to
the child process, and set the engine's UTRACE_ACTION_QUIESCE flag.
The child process will quiesce at a point where it is ready to
be probed.
- In your report_quiesce callback, register the desired probes.
(Note that you cannot use the same probe object for both parent
and child. If you want to duplicate the probepoints, you must
create a new set of u[ret]probe objects.)

8. Probe Overhead

// TODO: This is out of date.
// TODO: Adjust as other architectures are tested.
On a typical CPU in use in 2007, a uprobe hit takes about 3
microseconds to process. Specifically, a benchmark that hits the same
probepoint repeatedly, firing a simple handler each time, reports
300,000 to 350,000 hits per second, depending on the architecture.
A return-probe hit typically takes 50% longer than a uprobe hit.
When you have a return probe set on a function, adding a uprobe at
the entry to that function adds essentially no overhead.

Here are sample overhead figures (in usec) for different architectures.
u = uprobe; r = return probe; ur = uprobe + return probe

i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
u = 2.9 usec; r = 4.7 usec; ur = 4.7 usec

x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
// TODO

ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
// TODO

9. TODO

a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
programming interface for probe-based instrumentation. SystemTap
already supports kernel probes. It could exploit Uprobes as well.
b. Support for other architectures.

10. Uprobes Team

The following people have made major contributions to Uprobes:
Jim Keniston - jkenisto@us.ibm.com
Ananth Mavinakayanahalli - ananth@in.ibm.com
Prasanna Panchamukhi - prasanna@in.ibm.com
Dave Wilder - dwilder@us.ibm.com

11. Uprobes Example

Here's a sample kernel module showing the use of Uprobes to count the
number of times an instruction at a particular address is executed,
and optionally (unless verbose=0) report each time it's executed.
----- cut here -----
/* uprobe_example.c */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/uprobes.h>

/*
 * Usage: insmod uprobe_example.ko pid=<pid> vaddr=<vaddr> [verbose=0]
 * where <pid> identifies the probed process and <vaddr> is the virtual
 * address of the probed instruction.
 */

static int pid = 0;
module_param(pid, int, 0);
MODULE_PARM_DESC(pid, "pid");

static int verbose = 1;
module_param(verbose, int, 0);
MODULE_PARM_DESC(verbose, "verbose");

static long vaddr = 0;
module_param(vaddr, long, 0);
MODULE_PARM_DESC(vaddr, "vaddr");

static int nhits;
static struct uprobe usp;

/* Called each time a thread of the probed process hits the probepoint. */
static void uprobe_handler(struct uprobe *u, struct pt_regs *regs)
{
	nhits++;
	if (verbose)
		printk(KERN_INFO "Hit #%d on probepoint at %#lx\n",
			nhits, u->vaddr);
}

int __init init_module(void)
{
	int ret;

	usp.pid = pid;
	usp.vaddr = vaddr;
	usp.handler = uprobe_handler;
	printk(KERN_INFO "Registering uprobe on pid %d, vaddr %#lx\n",
		usp.pid, usp.vaddr);
	ret = register_uprobe(&usp);
	if (ret != 0) {
		printk(KERN_ERR "register_uprobe() failed, returned %d\n", ret);
		/* Pass the negative errno back to insmod. */
		return ret;
	}
	return 0;
}

void __exit cleanup_module(void)
{
	printk(KERN_INFO "Unregistering uprobe on pid %d, vaddr %#lx\n",
		usp.pid, usp.vaddr);
	printk(KERN_INFO "Probepoint was hit %d times\n", nhits);
	unregister_uprobe(&usp);
}
MODULE_LICENSE("GPL");
----- cut here -----

You can build the kernel module, uprobe_example.ko, using the following
Makefile:
----- cut here -----
obj-m := uprobe_example.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
clean:
	rm -f *.mod.c *.ko *.o .*.cmd
	rm -rf .tmp_versions
----- cut here -----

For example, if you want to run myprog and monitor its calls to myfunc(),
you can do the following:

$ make			// Build the uprobe_example module.
...
$ nm -p myprog | awk '$3=="myfunc"'
080484a8 T myfunc
$ ./myprog &
$ ps
  PID TTY          TIME CMD
 4367 pts/3    00:00:00 bash
 8156 pts/3    00:00:00 myprog
 8157 pts/3    00:00:00 ps
$ su -
...
# insmod uprobe_example.ko pid=8156 vaddr=0x080484a8

In /var/log/messages and on the console, you will see a message of the
form "kernel: Hit #1 on probepoint at 0x80484a8" each time myfunc()
is called.

To turn off probing, remove the module:

# rmmod uprobe_example

In /var/log/messages and on the console, you will see a message of the
form "Probepoint was hit 5 times".

12. Uretprobes Example

Here's a sample kernel module showing the use of a return probe to
report a function's return values.
----- cut here -----
/* uretprobe_example.c */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/uprobes.h>
#include <linux/ptrace.h>

/*
 * Usage:
 * insmod uretprobe_example.ko pid=<pid> func=<addr> [verbose=0]
 * where <pid> identifies the probed process, and <addr> is the virtual
 * address of the probed function.
 */

static int pid = 0;
module_param(pid, int, 0);
MODULE_PARM_DESC(pid, "pid");

static int verbose = 1;
module_param(verbose, int, 0);
MODULE_PARM_DESC(verbose, "verbose");

static long func = 0;
module_param(func, long, 0);
MODULE_PARM_DESC(func, "func");

static int ncall, nret;
static struct uprobe usp;
static struct uretprobe rp;

/* Runs when the probed function is entered. */
static void uprobe_handler(struct uprobe *u, struct pt_regs *regs)
{
	ncall++;
	if (verbose)
		printk(KERN_INFO "Function at %#lx called\n", u->vaddr);
}

/* Runs when the probed function returns. */
static void uretprobe_handler(struct uretprobe_instance *ri,
	struct pt_regs *regs)
{
	nret++;
	if (verbose)
		printk(KERN_INFO "Function at %#lx returns %#lx\n",
			ri->rp->u.vaddr, regs_return_value(regs));
}

int __init init_module(void)
{
	int ret;

	/* Register the entry probe. */
	usp.pid = pid;
	usp.vaddr = func;
	usp.handler = uprobe_handler;
	printk(KERN_INFO "Registering uprobe on pid %d, vaddr %#lx\n",
		usp.pid, usp.vaddr);
	ret = register_uprobe(&usp);
	if (ret != 0) {
		printk(KERN_ERR "register_uprobe() failed, returned %d\n", ret);
		/* Pass the negative errno back to insmod. */
		return ret;
	}

	/* Register the return probe. */
	rp.u.pid = pid;
	rp.u.vaddr = func;
	rp.handler = uretprobe_handler;
	printk(KERN_INFO "Registering return probe on pid %d, vaddr %#lx\n",
		rp.u.pid, rp.u.vaddr);
	ret = register_uretprobe(&rp);
	if (ret != 0) {
		printk(KERN_ERR "register_uretprobe() failed, returned %d\n",
			ret);
		/* Undo the entry probe before failing the module load. */
		unregister_uprobe(&usp);
		return ret;
	}
	return 0;
}

void __exit cleanup_module(void)
{
	printk(KERN_INFO "Unregistering probes on pid %d, vaddr %#lx\n",
		usp.pid, usp.vaddr);
	printk(KERN_INFO "%d calls, %d returns\n", ncall, nret);
	unregister_uprobe(&usp);
	unregister_uretprobe(&rp);
}
MODULE_LICENSE("GPL");
----- cut here -----

Build the kernel module as shown in the above uprobe example.

$ nm -p myprog | awk '$3=="myfunc"'
080484a8 T myfunc
$ ./myprog &
$ ps
  PID TTY          TIME CMD
 4367 pts/3    00:00:00 bash
 9156 pts/3    00:00:00 myprog
 9157 pts/3    00:00:00 ps
$ su -
...
# insmod uretprobe_example.ko pid=9156 func=0x080484a8

In /var/log/messages and on the console, you will see messages such as
the following:
kernel: Function at 0x80484a8 called
kernel: Function at 0x80484a8 returns 0x3

To turn off probing, remove the module:

# rmmod uretprobe_example

In /var/log/messages and on the console, you will see a message of the
form "73 calls, 73 returns".
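
The examples above probe a program called myprog without showing its
source. As a rough illustration only, here is one way such a target could
look. Everything in it is an assumption: the argument to myfunc(), the
call_myfunc() driver, and the choice to return 3 (picked merely to line up
with the "returns 0x3" message). The noinline attribute and the empty asm
statement are gcc-specific ways to keep the compiler from inlining
myfunc() or optimizing its calls away, which matters because Uprobes does
not chase down inlined instances.

```c
/*
 * Hypothetical probe target; the original myprog is not shown in this
 * document. Built with gcc, "nm -p" on the result should list myfunc
 * as a text (T) symbol whose address can be passed to the modules above.
 */
__attribute__((noinline)) int myfunc(int x)
{
	__asm__ volatile("");	/* side effect so calls are not optimized away */
	return x + 1;		/* myfunc(2) returns 3 */
}

/* Call myfunc() n times; return how many of those calls returned 3. */
int call_myfunc(int n)
{
	int i, matches = 0;

	for (i = 0; i < n; i++)
		if (myfunc(2) == 3)
			matches++;
	return matches;
}
```

A real target would add a main() that calls call_myfunc() repeatedly with
a pause between rounds, so that the process stays alive long enough for
you to find its pid and load the probe module.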