.\" -*- nroff -*-
.TH STAPPROBES 5 @DATE@ "Red Hat"
.SH NAME
stapprobes \- systemtap probe points
.\" macros
.de SAMPLE
.br
.RS
.nf
.nh
..
.de ESAMPLE
.hy
.fi
.RE
..
.SH DESCRIPTION
The following sections enumerate the variety of probe points supported
by the systemtap translator, and additional aliases defined by
standard tapset scripts.
.PP
The general probe point syntax is a dotted-symbol sequence. This
allows a breakdown of the event namespace into parts, somewhat like
the Domain Name System does on the Internet. Each component
identifier may be parametrized by a string or number literal, with a
syntax like a function call. A component may include a "*" character,
to expand to other matching probe points. A probe point may be
followed by a "?" character, to indicate that it is optional, and that
no error should result if it fails to expand. Optionalness passes
down through all levels of alias/wildcard expansion. These are all
syntactically valid probe points:
.SAMPLE
kernel.function("foo").return
syscall(22)
user.inode("/bin/vi").statement(0x2222)
end
kernel.syscall.*
kernel.function("no_such_function") ?
.ESAMPLE
Probes may be broadly classified into "synchronous" and
"asynchronous". A "synchronous" event is deemed to occur when any
processor executes an instruction matched by the specification. This
gives these probes a reference point (instruction address) from which
more contextual data may be available. Other families of probe points
refer to "asynchronous" events such as timers/counters rolling over,
where there is no fixed reference point that is related. Each probe
point specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed. A probe
declaration may also contain several comma-separated specifications,
all of which are probed.
.SS BEGIN/END
The probe points
.IR begin " and " end
are defined by the translator to refer to the time of session startup
and shutdown.
All "begin" probe handlers are run, in some sequence, during the
startup of the session. All global variables will have been
initialized prior to this point. All "end" probes are run, in some
sequence, during the
.I normal
shutdown of a session, such as in the aftermath of an
.I exit ()
function call, or an interruption from the user. In the case of an
error-triggered shutdown, "end" probes are not run. There are no
target variables available in either context.
.SS NEVER
The probe point
.IR never
is specially defined by the translator to mean "never". Its probe
handler is never run, though its statements are analyzed for symbol /
type correctness as usual. This probe point may be useful in
conjunction with optional probes.
.SS TIMERS
Intervals defined by the standard kernel "jiffies" timer may be used
to trigger probe handlers asynchronously. Two probe point variants
are supported by the translator:
.SAMPLE
timer.jiffies(N)
timer.jiffies(N).randomize(M)
.ESAMPLE
The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms). If the "randomize" component is
given, a linearly distributed random value in the range [\-M..+M] is
added to N every time the handler is run. N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N. There are no target variables provided in either
context. It is possible for such probes to be run concurrently on a
multi-processor computer.
.PP
Alternatively, intervals may be specified in units of milliseconds.
There are two probe point variants similar to the jiffies timer:
.SAMPLE
timer.ms(N)
timer.ms(N).randomize(M)
.ESAMPLE
Here, N and M are specified in milliseconds. The probe intervals will
be rounded up to the nearest jiffies interval for the actual timer.
If the "randomize" component is given, then the random value will be
added to the interval before the conversion to jiffies.
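.PP
As a minimal sketch of how the timer probe points above might appear
in a script (the intervals, the handler bodies, and the message text
are illustrative assumptions, not taken from any tapset), the
following would log a message roughly once a second, with up to 100 ms
of jitter either way, and request a normal shutdown after 5000
jiffies:
.SAMPLE
# illustrative only: intervals and message text are arbitrary
probe timer.ms(1000).randomize(100) {
  log("tick")
}
probe timer.jiffies(5000) {
  exit()
}
.ESAMPLE
Note that M (100) is smaller than N (1000), as the restriction above
requires.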
.PP
Profiling timers are also available to provide probes that execute on
all CPUs at the rate of the system tick. This probe takes no
parameters.
.SAMPLE
timer.profile
.ESAMPLE
Full context information of the interrupted process is available,
making this probe suitable for a time-based sampling profiler.
.SS DWARF
This family of probe points uses symbolic debugging information for
the target kernel/module/program, as may be found in unstripped
executables, or the separate
.I debuginfo
packages. They allow placement of probes logically into the execution
path of the target program, by specifying a set of points in the
source or object code. When a matching statement executes on any
processor, the probe handler is run in that context.
.PP
Points in a kernel are identified by module, source file, line
number, function name, C label name, or some combination of these.
.PP
Here is a list of probe point families currently supported. The
.B .function
variant places a probe near the beginning of the named function, so
that parameters are available as context variables. The
.B .return
variant places a probe at the moment of return from the named
function, so the return value is available as the "$return" context
variable. The
.B .inline
variant is similar to
.B .function
but probes inline functions. Inline functions do not have an
identifiable return point, so
.B .return
is not supported on
.B .inline
probes. The
.B .statement
variant places a probe at the exact spot, exposing those local
variables that are visible there.
.SAMPLE
kernel.function(PATTERN)
.br
kernel.function(PATTERN).return
.br
kernel.inline(PATTERN)
.br
module(MPATTERN).function(PATTERN)
.br
module(MPATTERN).function(PATTERN).return
.br
module(MPATTERN).inline(PATTERN)
.br
kernel.statement(PATTERN)
.br
module(MPATTERN).statement(PATTERN)
.ESAMPLE
In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest. It may include "*",
"[]", and "?"
wildcards. PATTERN stands for a string literal that aims to identify
a point in the program. It is made up of three parts. The first part
is the name of a function, as would appear in the
.I nm
program's output. This part may use the "*" and "?" wildcarding
operators to match multiple names. The second part is optional, and
begins with the "@" character. It is followed by a source file name
wildcard pattern, such as
.IR mm/slab* .
Finally, the third part is optional if the file name part was given,
and identifies the line number in the source file, preceded by a ":".
As an alternative, PATTERN may be a numeric constant, indicating a
(module-relative or kernel-absolute) address.
.PP
Some of the source-level variables, such as function parameters,
locals, and globals visible in the compilation unit, may be visible to
probe handlers. Handlers may refer to these variables by prefixing
their name with "$" within the scripts. In addition, a special syntax
allows limited traversal of structures, pointers, and arrays.
.TP
$var
refers to an in-scope variable "var". If it's an integer-like type,
it will be cast to a 64-bit int for systemtap script use. String-like
pointers (char *) may be copied to systemtap string values using the
.IR kernel_string " or " user_string
functions.
.TP
$var\->field
traversal to a structure's field. The indirection operator may be
repeated to follow more levels of pointers.
.TP
$var[N]
indexes into an array. The index is given with a literal number.
.SS MARKERS
This family of probe points hooks up to static probing markers
inserted into the kernel or modules. These markers are special macro
calls inserted by kernel developers to make probing faster and more
reliable than with DWARF-based probes. Further, DWARF debugging
information is
.I not
required to probe markers.
Marker probe points begin with
.BR kernel " or " module("name") ,
just like DWARF probes. This identifies the source of the symbol table
used for finding markers.
The next part names the marker itself:
.BR mark("name") .
The marker name string, which may contain the usual wildcard
characters, is matched against the names given to the marker macros
when the kernel or module was compiled.
The handler associated with a marker-based probe may read the
optional parameters specified at the macro call site. These are named
.BR $arg1 " through " $argNN ,
where NN is the number of parameters supplied by the macro. Number
and string parameters are passed in a type-safe manner.
.SS PERFORMANCE MONITORING HARDWARE
The perfmon family of probe points is used to access the performance
monitoring hardware available in modern processors. This family of
probe points needs perfmon2 support in the kernel to access the
performance monitoring hardware.
.PP
Performance monitor hardware points begin with
.BR perfmon .
The next part names the event being counted:
.BR counter("event") .
The event names are processor implementation specific, with the
exception of the generic
.BR cycles " and " instructions
events, which are available on all processors. This sets up a counter
on the processor to count the number of events occurring on the
processor. For more details on the performance monitoring events
available on a specific processor, use the perfmon2 command:
.SAMPLE
pfmon \-l
.ESAMPLE
.TP
$counter
is a handle used in the body of the probe for operations involving
the counter associated with the probe.
.TP
read_counter
is a function that is passed the handle for the perfmon probe and
returns the current count for the event.
.SS IO SCHEDULER
This family of probe points is used to probe the IO scheduler
activities.
It contains the following probe points:
.P
.TP
.B ioscheduler.elv_next_request
Fires when a request is retrieved from the request queue

.B Arguments:

.I elevator_name
  The name of the elevator

.I disk_major
  The major number of the disk

.I disk_minor
  The minor number of the disk
.P
.TP
.B ioscheduler.elv_add_request
Fires when a request is added to the request queue

.B Arguments:

.I elevator_name
  The name of the elevator

.I disk_major
  The major number of the disk

.I disk_minor
  The minor number of the disk
.P
.TP
.B ioscheduler.elv_completed_request
Fires when a request is completed

.B Arguments:

.I elevator_name
  The name of the elevator

.I disk_major
  The major number of the disk

.I disk_minor
  The minor number of the disk
.SS SCSI
This family of probe points is used to probe the SCSI activities. It
contains the following probe points:
.P
.TP
.B scsi.ioentry
Fires when SCSI mid layer prepares a SCSI request

.B Arguments:

.I disk_major
  The major number of the disk

.I disk_minor
  The minor number of the disk

.I device_state
  The current state of the device. The possible values could be:

    SDEV_CREATED = 1, /* device created but not added to sysfs
                       * Only internal commands allowed (for inq) */
    SDEV_RUNNING = 2, /* device properly configured
                       * All commands allowed */
    SDEV_CANCEL = 3,  /* beginning to delete device
                       * Only error handler commands allowed */
    SDEV_DEL = 4,     /* device deleted
                       * no commands allowed */
    SDEV_QUIESCE = 5, /* Device quiescent. No block commands
                       * will be accepted, only specials (which
                       * originate in the mid-layer) */
    SDEV_OFFLINE = 6, /* Device offlined (by error handling or
                       * user request) */
    SDEV_BLOCK = 7,   /* Device blocked by scsi lld. No scsi
                       * commands from user or midlayer should be issued
                       * to the scsi lld.
                       */
.P
.TP
.B scsi.iodispatching
Fires when the SCSI mid layer dispatches a SCSI command to the low
level driver

.B Arguments:

.I host_no
  The host number

.I channel
  The channel number

.I lun
  The lun number

.I dev_id
  The scsi device id

.I device_state
  The current state of the device.

.I data_direction
  The data_direction specifies whether this command is from/to the device. The possible values could be:

    DMA_BIDIRECTIONAL = 0,
    DMA_TO_DEVICE = 1,
    DMA_FROM_DEVICE = 2,
    DMA_NONE = 3,

.I request_buffer
  The request buffer address

.I req_bufflen
  The request buffer length
.P
.TP
.B scsi.iodone
Fires when a SCSI command is done by low level driver and enqueued
into the done queue.

.B Arguments:

.I host_no
  The host number

.I channel
  The channel number

.I lun
  The lun number

.I dev_id
  The scsi device id

.I device_state
  The current state of the device

.I data_direction
  The data_direction specifies whether this command is from/to the device.
.P
.TP
.B scsi.iocompleted
Fires when SCSI mid layer runs the completion processing for block
device I/O requests

.B Arguments:

.I host_no
  The host number

.I channel
  The channel number

.I lun
  The lun number

.I dev_id
  The scsi device id

.I device_state
  The current state of the device

.I data_direction
  The data_direction specifies whether this command is from/to the device.

.I goodbytes
  The bytes completed.
.SS NETWORK DEVICE
This family of probe points is used to probe the activities of
network devices. It contains the following probe points:
.P
.TP
.B netdev.receive
Fires when data arrives on network device

.B Arguments:

.I dev_name
  The name of the device.
e.g. eth0, ath1

.I length
  The length of the receiving buffer

.I protocol
  The possible values of protocol could be:

    0800    IP
    8100    802.1Q VLAN
    0001    802.3
    0002    AX.25
    0004    802.2
    8035    RARP
    0005    SNAP
    0805    X.25
    0806    ARP
    8137    IPX
    0009    Localtalk
    86DD    IPv6

.I truesize
  The size of the received data
.P
.TP
.B netdev.transmit
Fires when the network device wants to transmit a buffer

.B Arguments:

.I dev_name
  The name of the device. e.g. eth0, ath1

.I length
  The length of the transmit buffer

.I protocol
  The protocol of this packet.

.I truesize
  The size of the data to be transmitted.
.SS PAGE FAULT
This family of probe points is used to probe page fault events. It
contains the following probe points:
.P
.TP
.B vm.pagefault
Fires when there is a pagefault

.B Arguments:

.I address
  The address that caused this page fault.

.I write_access
  1 means this is a write access and 0 means this is a read access
.SS PROCESS
This family of probe points is used to probe the process activities.
It contains the following probe points:
.P
.TP
.B process.create
Fires whenever a new process is successfully created, either as a
result of one of the fork syscall variants, or a new kernel thread.

.B Arguments:

.I task
  a handle to the newly created process

.I new_pid
  pid of the newly created process
.P
.TP
.B process.start
Fires immediately before a new process begins execution.

.B Arguments:

.I N/A
.P
.TP
.B process.exec
Fires whenever a process attempts to exec to a new program

.B Arguments:

.I filename
  the path to the new executable
.P
.TP
.B process.exec_complete
Fires at the completion of an exec call

.B Arguments:

.I errno
  the error number resulting from the exec

.I success
  a boolean indicating whether the exec was successful
.P
.TP
.B process.exit
Fires when a process terminates. This will always be followed by a
process.release, though the latter may be delayed if the process
waits in a zombie state.

.B Arguments:

.I code
  the exit code of the process
.P
.TP
.B process.release
Fires when a process is released from the kernel. This always follows
a process.exit, though it may be delayed somewhat if the process waits
in a zombie state.

.B Arguments:

.I task
  a task handle to the process being released

.I pid
  pid of the process being released
.SS TCP
This family of probe points is used to probe TCP layer activities.
It contains the following probe points:
.P
.TP
.B tcp.sendmsg
Fires whenever sending a tcp message

.B Arguments:

.I sock
  network socket

.I size
  number of bytes to send
.P
.TP
.B tcp.sendmsg.return
Fires whenever sending message is done

.B Arguments:

.I size
  number of bytes sent
.P
.TP
.B tcp.recvmsg
Fires whenever a message is received

.B Arguments:

.I sock
  network socket

.I size
  number of bytes to be received
.P
.TP
.B tcp.recvmsg.return
Fires whenever message receiving is done

.B Arguments:

.I size
  number of bytes received
.P
.TP
.B tcp.disconnect
Fires whenever tcp is disconnected

.B Arguments:

.I sock
  network socket

.I flags
  TCP flags (e.g. FIN, etc)
.P
.TP
.B tcp.disconnect.return
Fires when returning from tcp.disconnect

.B Arguments:

.I ret
  error code (0: no error)
.SS UDP
This family of probe points is used to probe UDP layer activities.
It contains the following probe points:
.P
.TP
.B udp.sendmsg
Fires whenever sending a udp message

.B Arguments:

.I sock
  network socket

.I size
  number of bytes to send
.P
.TP
.B udp.sendmsg.return
Fires whenever sending message is done

.B Arguments:

.I size
  number of bytes sent
.P
.TP
.B udp.recvmsg
Fires whenever a message is received

.B Arguments:

.I sock
  network socket

.I size
  number of bytes to be received
.P
.TP
.B udp.recvmsg.return
Fires whenever message receiving is done

.B Arguments:

.I size
  number of bytes received
.P
.TP
.B udp.disconnect
Fires whenever udp is disconnected

.B Arguments:

.I sock
  network socket

.I flags
  flags (e.g.
FIN, etc)
.P
.TP
.B udp.disconnect.return
Fires when returning from udp.disconnect

.B Arguments:

.I ret
  error code (0: no error)
.SS SIGNAL
This family of probe points is used to probe signal activities.
It contains the following probe points:
.P
.TP
.B signal.send
Fires when a signal is sent to a process

.B Arguments:

.I sig
  signal number

.I sig_name
  a string representation of the signal

.I sig_pid
  pid of the signal recipient process

.I pid_name
  name of the signal recipient process

.I si_code
  indicates the signal type

.I task
  a task handle to the signal recipient

.I sinfo
  the address of siginfo struct

.I shared
  indicates whether this signal is shared by the thread group

.I send2queue
  indicates whether this signal is sent to an existing sigqueue

.I name
  name of the function used to send out this signal
.P
.TP
.B signal.send.return
Fires on return from sending a signal

.B Arguments:

.I retstr
  the return value

Return values for "__group_send_sig_info" and "specific_send_sig_info"
.RS
.RS
- return 0 if the signal is successfully sent to a process, which
means one of the following:
.br
<1> the signal is ignored by the receiving process
.br
<2> this is a non-RT signal and we already have one queued
.br
<3> the signal is successfully added into the sigqueue of the
receiving process
.br
- return \-EAGAIN if the sigqueue overflows and the signal was RT and
sent by the user using something other than kill()
.RE
Return values for "send_group_sigqueue"
.RS
- return 0 if the signal is either successfully added into the
sigqueue of the receiving process, or a SI_TIMER entry is already
queued so the overrun count is just incremented
.br
- return 1 if this signal is ignored by the receiving process
.RE
Return values for "send_sigqueue"
.RS
- return 0 if the signal is either successfully added into the
sigqueue of the receiving process, or a SI_TIMER entry is already
queued so the overrun count is just incremented
.br
- return 1 if this signal is ignored by the receiving process
.br
- return \-1 if the task is marked exiting, so posix_timer_event can
redirect it to the group leader
.RE

.I shared
  indicates whether this signal is shared by the thread group

.I send2queue
  indicates whether this signal is sent to an existing sigqueue

.I name
  name of the function used to send out this signal
.RE
.RE
.P
.TP
.B signal.checkperm
Fires when permissions are checked for sending the signal

.B Arguments:

.I sig
  the number of the signal

.I sig_name
  a string representation of the signal

.I sig_pid
  pid of the signal recipient process

.I pid_name
  name of the signal recipient process

.I si_code
  indicates the signal type

.I task
  a task handle to the signal recipient

.I sinfo
  the address of siginfo struct

.I name
  name of the probe point, is set to "signal.checkperm"
.P
.TP
.B signal.checkperm.return
Fires on return from the permissions check for sending a signal

.B Arguments:

.I retstr
  the return value

.I name
  name of the probe point, is set to "signal.checkperm"
.P
.TP
.B signal.wakeup
Fires when a process is woken up for new active signals

.B Arguments:

.I sig_pid
  pid of the process to be woken up

.I pid_name
  name of the process to be woken up

.I resume
  indicates whether to wake up a task in STOPPED or TRACED state

.I state_mask
  a string representation of the mask of task states that can be woken. Possible values are (TASK_INTERRUPTIBLE|TASK_STOPPED|TASK_TRACED) and TASK_INTERRUPTIBLE.
.P
.TP
.B signal.check_ignored
Fires when checking whether the signal is ignored or not

.B Arguments:

.I sig_pid
  pid of the signal recipient process

.I pid_name
  name of the signal recipient process

.I sig
  the signal to be checked

.I sig_name
  name of the signal
.P
.TP
.B signal.check_ignored.return
Fires on return from signal.check_ignored

.B Arguments:

.I retstr
  return value. 0 indicates the current signal isn't ignored.
.P
.TP
.B signal.force_segv
Forces SIGSEGV when there are some issues while handling signals for
the process

.B Arguments:

.I sig_pid
  pid of the signal recipient process

.I pid_name
  name of the signal recipient process

.I sig
  the signal being handled

.I sig_name
  name of this signal
.P
.TP
.B signal.force_segv.return
Fires on return from signal.force_segv

.B Arguments:

.I retstr
  return value. Always returns 0
.P
.TP
.B signal.send_sig_queue
Fires when a signal is queued to a process

.B Arguments:

.I sig
  the signal to be queued

.I sig_name
  name of this signal

.I sig_pid
  pid of the process to which the signal is queued

.I pid_name
  name of the process to which the signal is queued

.I sigqueue_addr
  address of the signal queue
.P
.TP
.B signal.send_sig_queue.return
Fires on return from signal.send_sig_queue

.B Arguments:

.I retstr
  return value
.P
.TP
.B signal.pending
Fires when examining the set of signals that are pending for delivery
to the calling thread

.B Arguments:

.I sigset_add
  address of user space sigset_t

.I sigset_size
  sigset size
.P
.TP
.B signal.pending.return
Fires on return from signal.pending

.B Arguments:

.I retstr
  return value
.P
.TP
.B signal.handle
Fires when invoking the signal handler

.B Arguments:

.I sig
  signal number

.I sig_name
  signal name

.I sinfo
  address of siginfo struct

.I sig_code
  the si_code of siginfo

.I ka_addr
  Address of the k_sigaction struct associated with the signal

.I oldset_addr
  Address of a bit mask array of blocked signals

.I sig_mode
  indicates whether the signal is a User Mode or Kernel mode Signal
.P
.TP
.B signal.handle.return
Fires on return from signal.handle

.B Arguments:

.I retstr
  return value of handle_signal()
.P
.TP
.B signal.do_action
Fired by the calling thread to examine and change a signal action

.B Arguments:

.I sig
  signal number

.I sigact_addr
  address of the new sigaction struct associated with the signal

.I oldsigact_addr
  address of a previous sigaction struct associated with the signal

.I sa_handler
  the new
handler of the signal

.I sa_mask
  the new mask of the signal
.P
.TP
.B signal.do_action.return
Fires on return from signal.do_action

.B Arguments:

.I retstr
  return value of do_sigaction()
.P
.TP
.B signal.procmask
Fired by the calling thread to examine and change blocked signals

.B Arguments:

.I how
  indicates how to change the blocked signals. Possible values are:

    SIG_BLOCK=0 for blocking signals

    SIG_UNBLOCK=1 for unblocking signals

    SIG_SETMASK=2 for setting the signal mask

.I sigset_addr
  address of sigset_t to be set

.I oldsigset_addr
  address of the old sigset_t

.I sigset
  the actual sigset to be set
.P
.TP
.B signal.procmask.return
Fires on return from signal.procmask

.B Arguments:

.I retstr
  return value of sigprocmask()
.P
.TP
.B signal.flush
Fires when flushing all pending signals for a task

.B Arguments:

.I task
  the task handler of the process

.I sig_pid
  pid of the task

.I pid_name
  name of the task
.SH EXAMPLES
.PP
Here are some example probe points, defining the associated events.
.TP
begin, end, end
refers to the startup and normal shutdown of the session. In this
case, the handler would run once during startup and twice during
shutdown.
.TP
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/\- 200 jiffies.
.TP
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in the name.
.TP
kernel.function("*@kernel/sched.c:240")
refers to any functions within the "kernel/sched.c" file that span
line 240.
.TP
kernel.mark("getuid")
refers to an STAP_MARK(getuid, ...) macro call in the kernel.
.TP
module("usb*").function("*sync*").return
refers to the moment of return from all functions with "sync" in the
name in any of the USB drivers.
.TP
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled instructions
include the given address in the kernel.
.TP
kernel.syscall.*.return
refers to the group of probe aliases with any name in the third
position.
.SH SEE ALSO
.IR stap (1),
.IR lket (5)