diff options
Diffstat (limited to 'stapprobes.3stap.in')
-rw-r--r-- | stapprobes.3stap.in | 676 |
1 files changed, 676 insertions, 0 deletions
diff --git a/stapprobes.3stap.in b/stapprobes.3stap.in new file mode 100644 index 00000000..f175e6e0 --- /dev/null +++ b/stapprobes.3stap.in @@ -0,0 +1,676 @@ +.\" -*- nroff -*- +.TH STAPPROBES 3stap @DATE@ "Red Hat" +.SH NAME +stapprobes \- systemtap probe points + +.\" macros +.de SAMPLE +.br +.RS +.nf +.nh +.. +.de ESAMPLE +.hy +.fi +.RE +.. + +.SH DESCRIPTION +The following sections enumerate the variety of probe points supported +by the systemtap translator, and additional aliases defined by +standard tapset scripts. +.PP +The general probe point syntax is a dotted-symbol sequence. This +allows a breakdown of the event namespace into parts, somewhat like +the Domain Name System does on the Internet. Each component +identifier may be parametrized by a string or number literal, with a +syntax like a function call. A component may include a "*" character, +to expand to a set of matching probe points. Probe aliases likewise +expand to other probe points. Each and every resulting probe point is +normally resolved to some low-level system instrumentation facility +(e.g., a kprobe address, marker, or a timer configuration), otherwise +the elaboration phase will fail. +.PP +However, a probe point may be followed by a "?" character, to indicate +that it is optional, and that no error should result if it fails to +resolve. Optionalness passes down through all levels of +alias/wildcard expansion. Alternately, a probe point may be followed +by a "!" character, to indicate that it is both optional and +sufficient. (Think vaguely of the prolog cut operator.) If it does +resolve, then no further probe points in the same comma-separated list +will be resolved. Therefore, the "!" sufficiency mark only makes +sense in a list of probe point alternatives. +.PP +Additionally, a probe point may be followed by a "if (expr)" statement, in +order to enable/disable the probe point on-the-fly. With the "if" statement, +if the "expr" is false when the probe point is hit, the whole probe body +including alias's body is skipped. The condition is stacked up through +all levels of alias/wildcard expansion. So the final condition becomes +the logical-and of conditions of all expanded alias/wildcard. + +These are all syntactically valid probe points: + +.SAMPLE +kernel.function("foo").return +syscall(22) +user.inode("/bin/vi").statement(0x2222) +end +syscall.* +kernel.function("no_such_function") ? +module("awol").function("no_such_function") ! +signal.*? if (switch) +.ESAMPLE + +Probes may be broadly classified into "synchronous" and +"asynchronous". A "synchronous" event is deemed to occur when any +processor executes an instruction matched by the specification. This +gives these probes a reference point (instruction address) from which +more contextual data may be available. Other families of probe points +refer to "asynchronous" events such as timers/counters rolling over, +where there is no fixed reference point that is related. Each probe +point specification may match multiple locations (for example, using +wildcards or aliases), and all them are then probed. A probe +declaration may also contain several comma-separated specifications, +all of which are probed. + +.SS BEGIN/END/ERROR + +The probe points +.IR begin " and " end +are defined by the translator to refer to the time of session startup +and shutdown. All "begin" probe handlers are run, in some sequence, +during the startup of the session. All global variables will have +been initialized prior to this point. All "end" probes are run, in +some sequence, during the +.I normal +shutdown of a session, such as in the aftermath of an +.I exit () +function call, or an interruption from the user. In the case of an +error-triggered shutdown, "end" probes are not run. There are no +target variables available in either context. +.PP +If the order of execution among "begin" or "end" probes is significant, +then an optional sequence number may be provided: + +.SAMPLE +begin(N) +end(N) +.ESAMPLE + +The number N may be positive or negative. The probe handlers are run in +increasing order, and the order between handlers with the same sequence +number is unspecified. When "begin" or "end" are given without a +sequence, they are effectively sequence zero. + +The +.IR error +probe point is similar to the +.IR end +probe, except that each such probe handler run when the session ends +after errors have occurred. In such cases, "end" probes are skipped, +but each "error" prober is still attempted. This kind of probe can be +used to clean up or emit a "final gasp". It may also be numerically +parametrized to set a sequence. + +.SS NEVER +The probe point +.IR never +is specially defined by the translator to mean "never". Its probe +handler is never run, though its statements are analyzed for symbol / +type correctness as usual. This probe point may be useful in +conjunction with optional probes. + +.SS SYSCALL + +The +.IR syscall.* +aliases define several hundred probes, too many to +summarize here. They are: + +.SAMPLE +syscall.NAME +.br +syscall.NAME.return +.ESAMPLE + +Generally, two probes are defined for each normal system call as listed in the +.IR syscalls(2) +manual page, one for entry and one for return. Those system calls that never +return do not have a corresponding +.IR .return +probe. +.PP +Each probe alias defines a variety of variables. Looking at the tapset source +code is the most reliable way. Generally, each variable listed in the standard +manual page is made available as a script-level variable, so +.IR syscall.open +exposes +.IR filename ", " flags ", and " mode . +In addition, a standard suite of variables is available at most aliases: +.TP +.IR argstr +A pretty-printed form of the entire argument list, without parentheses. +.TP +.IR name +The name of the system call. +.TP +.IR retstr +For return probes, a pretty-printed form of the system-call result. +.PP +Not all probe aliases obey all of these general guidelines. Please report +any bothersome ones you encounter as a bug. + + +.SS TIMERS + +Intervals defined by the standard kernel "jiffies" timer may be used +to trigger probe handlers asynchronously. Two probe point variants +are supported by the translator: + +.SAMPLE +timer.jiffies(N) +timer.jiffies(N).randomize(M) +.ESAMPLE + +The probe handler is run every N jiffies (a kernel-defined unit of +time, typically between 1 and 60 ms). If the "randomize" component is +given, a linearly distributed random value in the range [\-M..+M] is +added to N every time the handler is run. N is restricted to a +reasonable range (1 to around a million), and M is restricted to be +smaller than N. There are no target variables provided in either +context. It is possible for such probes to be run concurrently on +a multi-processor computer. +.PP +Alternatively, intervals may be specified in units of time. +There are two probe point variants similar to the jiffies timer: + +.SAMPLE +timer.ms(N) +timer.ms(N).randomize(M) +.ESAMPLE + +Here, N and M are specified in milliseconds, but the full options for units +are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec), +nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for +hertz timers. + +The actual resolution of the timers depends on the target kernel. For +kernels prior to 2.6.17, timers are limited to jiffies resolution, so +intervals are rounded up to the nearest jiffies interval. After 2.6.17, +the implementation uses hrtimers for tighter precision, though the actual +resolution will be arch-dependent. In either case, if the "randomize" +component is given, then the random value will be added to the interval +before any rounding occurs. +.PP +Profiling timers are also available to provide probes that execute on all +CPUs at the rate of the system tick (CONFIG_HZ). +This probe takes no parameters. + +.SAMPLE +timer.profile +.ESAMPLE + +Full context information of the interrupted process is available, making +this probe suitable for a time-based sampling profiler. + +.SS DWARF + +This family of probe points uses symbolic debugging information for +the target kernel/module/program, as may be found in unstripped +executables, or the separate +.I debuginfo +packages. They allow placement of probes logically into the execution +path of the target program, by specifying a set of points in the +source or object code. When a matching statement executes on any +processor, the probe handler is run in that context. +.PP +Points in a kernel, which are identified by +module, source file, line number, function name, or some +combination of these. +.PP +Here is a list of probe point families currently supported. The +.B .function +variant places a probe near the beginning of the named function, so that +parameters are available as context variables. The +.B .return +variant places a probe at the moment +.B after +the return from the named function, so the return value is available +as the "$return" context variable. The +.B .inline +modifier for +.B .function +filters the results to include only instances of inlined functions. +The +.B .call +modifier selects the opposite subset. Inline functions do not have an +identifiable return point, so +.B .return +is not supported on +.B .inline +probes. The +.B .statement +variant places a probe at the exact spot, exposing those local variables +that are visible there. + +.SAMPLE +kernel.function(PATTERN) +.br +kernel.function(PATTERN).call +.br +kernel.function(PATTERN).return +.br +kernel.function(PATTERN).inline +.br +kernel.function(PATTERN).label(LPATTERN) +.br +module(MPATTERN).function(PATTERN) +.br +module(MPATTERN).function(PATTERN).call +.br +module(MPATTERN).function(PATTERN).return +.br +module(MPATTERN).function(PATTERN).inline +.br +.br +kernel.statement(PATTERN) +.br +kernel.statement(ADDRESS).absolute +.br +module(MPATTERN).statement(PATTERN) +.ESAMPLE + +In the above list, MPATTERN stands for a string literal that aims to +identify the loaded kernel module of interest and LPATTERN stands for +a source program label. Both MPATTERN and LPATTERN may include the "*" +"[]", and "?" wildcards. +PATTERN stands for a string literal that +aims to identify a point in the program. It is made up of three +parts: +.IP \(bu 4 +The first part is the name of a function, as would appear in the +.I nm +program's output. This part may use the "*" and "?" wildcarding +operators to match multiple names. +.IP \(bu 4 +The second part is optional and begins with the "@" character. +It is followed by the path to the source file containing the function, +which may include a wildcard pattern, such as mm/slab*. +If it does not match as is, an implicit "*/" is optionally added +.I before +the pattern, so that a script need only name the last few components +of a possibly long source directory path. +.IP \(bu 4 +Finally, the third part is optional if the file name part was given, +and identifies the line number in the source file preceded by a ":" +or a "+". The line number is assumed to be an +absolute line number if preceded by a ":", or relative to the entry of +the function if preceded by a "+". +All the lines in the function can be matched with ":*". +A range of lines x through y can be matched with ":x-y". +.PP +As an alternative, PATTERN may be a numeric constant, indicating an +address. Such an address may be found from symbol tables of the +appropriate kernel / module object file. It is verified against +known statement code boundaries, and will be relocated for use at +run time. +.PP +In guru mode only, absolute kernel-space addresses may be specified with +the ".absolute" suffix. Such an address is considered already relocated, +as if it came from +.BR /proc/kallsyms , +so it cannot be checked against statement/instruction boundaries. +.PP +Some of the source-level context variables, such as function parameters, +locals, globals visible in the compilation unit, may be visible to +probe handlers. They may refer to these variables by prefixing their +name with "$" within the scripts. In addition, a special syntax +allows limited traversal of structures, pointers, and arrays. +.TP +$var +refers to an in-scope variable "var". If it's an integer-like type, +it will be cast to a 64-bit int for systemtap script use. String-like +pointers (char *) may be copied to systemtap string values using the +.IR kernel_string " or " user_string +functions. +.TP +$var\->field +traversal to a structure's field. The indirection operator +may be repeated to follow more levels of pointers. +.TP +$return +is available in return probes only for functions that are declared +with a return value. +.TP +.TP +$var[N] +indexes into an array. The index is given with a +literal number. +.TP +$$vars +expands to a character string that is equivalent to +sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x", parm1, ..., parmN, +var1, ..., varN) +.TP +$$locals +expands to a subset of $$vars for only local variables. +.TP +$$parms +expands to a subset of $$vars for only function parameters. +.TP +$$return +is available in return probes only. It expands to a string that +is equivalent to sprintf("return=%x", $return) +if the probed function has a return value, or else an empty string. +.PP +For ".return" probes, context variables other than the "$return" +value itself are only available for the function call parameters. +The expressions evaluate to the +.IR entry-time +values of those variables, since that is when a snapshot is taken. +Other local variables are not generally accessible, since by the time +a ".return" probe hits, the probed function will have already returned. + + +.SS USER-SPACE +Early prototype support for user-space probing is available in the +form of a non-symbolic probe point: +.SAMPLE +process(PID).statement(ADDRESS).absolute +.ESAMPLE +is analogous to +.IR +kernel.statement(ADDRESS).absolute +in that both use raw (unverified) virtual addresses and provide +no $variables. The target PID parameter must identify a running +process, and ADDRESS should identify a valid instruction address. +All threads of that process will be probed. +.PP +Additional user-space probing is available in the following forms: +.SAMPLE +process(PID).begin +process("PATH").begin +process.begin +process(PID).thread.begin +process("PATH").thread.begin +process.thread.begin +process(PID).end +process("PATH").end +process.end +process(PID).thread.end +process("PATH").thread.end +process.thread.end +process(PID).syscall +process("PATH").syscall +process.syscall +process(PID).syscall.return +process("PATH").syscall.return +process.syscall.return +process(PID).insn +process("PATH").insn +process(PID).insn.block +process("PATH").insn.block +process("PATH").mark("LABEL") +.ESAMPLE +.PP +A +.B .begin +probe gets called when new process described by PID or PATH gets created. +A +.B .thread.begin +probe gets called when a new thread described by PID or PATH gets created. +A +.B .end +probe gets called when process described by PID or PATH dies. +A +.B .thread.end +probe gets called when a thread described by PID or PATH dies. +A +.B .syscall +probe gets called when a thread described by PID or PATH makes a +system call. The system call number is available in the +.BR $syscall +context variable, and the first 6 arguments of the system call +are available in the +.BR $argN +(ex. $arg1, $arg2, ...) context variable. +A +.B .syscall.return +probe gets called when a thread described by PID or PATH returns from a +system call. The system call number is available in the +.BR $syscall +context variable, and the return value of the system call is available +in the +.BR $return +context variable. +A +.B .insn +probe gets called for every single-stepped instruction of the process described by PID or PATH. +A +.B .insn.block +probe gets called for every block-stepped instruction of the process described by PID or PATH. +A +.B .mark +probe gets called via a static probe which is defined in the +application by +STAP_PROBE1(handle,LABEL,arg1), which is defined in sdt.h. The handle is an application handle, +LABEL corresponds to the .mark argument, and arg1 is the argument. +STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2 is used +for probes with 2 arguments, and so on. +The arguments of the probe are available in the context variables +$arg1, $arg2, ... An alternative to using the STAP_PROBE macros is to +use the dtrace script to create custom macros. +.PP +Note that +.I PATH +names refer to executables that are searched the same way shells do: relative +to the working directory if they contain a "/" character, otherwise in +.BR $PATH . +If a process probe is specified without a PID or PATH, all user +threads are probed. + +.SS PROCFS + +These probe points allow procfs "files" in +/proc/systemtap/MODNAME to be created, read and written +.RI ( MODNAME +is the name of the systemtap module). The +.I proc +filesystem is a pseudo-filesystem which is used an an interface to +kernel data structures. There are four probe point variants supported +by the translator: + +.SAMPLE +procfs("PATH").read +procfs("PATH").write +procfs.read +procfs.write +.ESAMPLE + +.I PATH +is the file name (relative to /proc/systemtap/MODNAME) to be created. +If no +.I PATH +is specified (as in the last two variants above), +.I PATH +defaults to "command". +.PP +When a user reads /proc/systemtap/MODNAME/PATH, the corresponding +procfs +.I read +probe is triggered. The string data to be read should be assigned to +a variable named +.IR $value , +like this: + +.SAMPLE +procfs("PATH").read { $value = "100\\n" } +.ESAMPLE +.PP +When a user writes into /proc/systemtap/MODNAME/PATH, the +corresponding procfs +.I write +probe is triggered. The data the user wrote is available in the +string variable named +.IR $value , +like this: + +.SAMPLE +procfs("PATH").write { printf("user wrote: %s", $value) } +.ESAMPLE + +.SS MARKERS + +This family of probe points hooks up to static probing markers +inserted into the kernel or modules. These markers are special macro +calls inserted by kernel developers to make probing faster and more +reliable than with DWARF-based probes. Further, DWARF debugging +information is +.I not +required to probe markers. + +Marker probe points begin with +.BR kernel . +The next part names the marker itself: +.BR mark("name") . +The marker name string, which may contain the usual wildcard characters, +is matched against the names given to the marker macros when the kernel +and/or module was compiled. Optionally, you can specify +.BR format("format") . +Specifying the marker format string allows differentation between two +markers with the same name but different marker format strings. + +The handler associated with a marker-based probe may read the +optional parameters specified at the macro call site. These are +named +.BR $arg1 " through " $argNN , +where NN is the number of parameters supplied by the macro. Number +and string parameters are passed in a type-safe manner. + +The marker format string associated with a marker is available in +.BR $format . +And also the marker name string is avalable in +.BR $name . + +.SS TRACEPOINTS + +This family of probe points hooks up to static probing tracepoints +inserted into the kernel or modules. As with markers, these +tracepoints are special macro calls inserted by kernel developers to +make probing faster and more reliable than with DWARF-based probes, +and DWARF debugging information is not required to probe tracepoints. +Tracepoints have an extra advantage of more strongly-typed parameters +than markers. + +Tracepoint probes begin with +.BR kernel . +The next part names the tracepoint itself: +.BR trace("name") . +The tracepoint name string, which may contain the usual wildcard +characters, is matched against the names defined by the kernel +developers in the tracepoint header files. + +The handler associated with a tracepoint-based probe may read the +optional parameters specified at the macro call site. These are +named according to the declaration by the tracepoint author. For +example, the tracepoint probe +.BR kernel.trace("sched_switch") +provides the parameters +.BR $rq ", " $prev ", and " $next . +If the parameter is a complex type, as in a struct pointer, then a +script can access fields with the same syntax as DWARF $target +variables. Also, tracepoint parameters cannot be modified, but in +guru-mode a script may modify fields of parameters. + +The name of the tracepoint is available in +.BR $$name , +and a string of name=value pairs for all parameters of the tracepoint +is available in +.BR $$vars " or " $$parms . + +.SS PERFORMANCE MONITORING HARDWARE + +The perfmon family of probe points is used to access the performance +monitoring hardware available in modern processors. This family of +probes points needs the perfmon2 support in the kernel to access the +performance monitoring hardware. +.PP +Performance monitor hardware points begin with a +.BR perfmon ". " +The next part of the names the event being counted +.BR counter("event") . +The event names are processor implementation specific with the +execption of the generic +.BR cycles " and " instructions +events, which are available on all processors. This sets up a counter +on the processor to count the number of events occuring on the +processor. For more details on the performance monitoring events +available on a specific processor use the command perfmon2 command: + +.SAMPLE +pfmon \-l +.ESAMPLE +.TP +$counter +is a handle used in the body of the probe for operations +involving the counter associated with the probe. +.TP +read_counter +is a function that is passed the handle for the perfmon probe and returns +the current count for the event. + +.SH EXAMPLES +.PP +Here are some example probe points, defining the associated events. +.TP +begin, end, end +refers to the startup and normal shutdown of the session. In this +case, the handler would run once during startup and twice during +shutdown. +.TP +timer.jiffies(1000).randomize(200) +refers to a periodic interrupt, every 1000 +/\- 200 jiffies. +.TP +kernel.function("*init*"), kernel.function("*exit*") +refers to all kernel functions with "init" or "exit" in the name. +.TP +kernel.function("*@kernel/sched.c:240") +refers to any functions within the "kernel/sched.c" file that span +line 240. +.TP +kernel.mark("getuid") +refers to an STAP_MARK(getuid, ...) macro call in the kernel. +.TP +module("usb*").function("*sync*").return +refers to the moment of return from all functions with "sync" in the +name in any of the USB drivers. +.TP +kernel.statement(0xc0044852) +refers to the first byte of the statement whose compiled instructions +include the given address in the kernel. +.TP +kernel.statement("*@kernel/sched.c:2917") +refers to the statement of line 2917 within "kernel/sched.c". +.TP +kernel.statement("bio_init@fs/bio.c+3") +refers to the statement at line bio_init+3 within "fs/bio.c". +.TP +syscall.*.return +refers to the group of probe aliases with any name in the third position + +.SH SEE ALSO +.IR stap (1), +.IR stapprobes.iosched (3stap), +.IR stapprobes.netdev (3stap), +.IR stapprobes.nfs (3stap), +.IR stapprobes.nfsd (3stap), +.IR stapprobes.pagefault (3stap), +.IR stapprobes.process (3stap), +.IR stapprobes.rpc (3stap), +.IR stapprobes.scsi (3stap), +.IR stapprobes.signal (3stap), +.IR stapprobes.socket (3stap), +.IR stapprobes.tcp (3stap), +.IR stapprobes.udp (3stap), +.IR proc (3stap) |