Diffstat (limited to 'stap.1.in')
-rw-r--r-- | stap.1.in | 615 |
1 files changed, 615 insertions, 0 deletions
diff --git a/stap.1.in b/stap.1.in new file mode 100644 index 00000000..dc421cdd --- /dev/null +++ b/stap.1.in @@ -0,0 +1,615 @@ +.TH STAP 1 @DATE@ "Red Hat" +.SH NAME +stap \- systemtap script translator/driver +.SH SYNOPSIS + +.br +.B stap +[ +.I OPTIONS +] +.I FILENAME +.br +.B stap +[ +.I OPTIONS +] +.B \- +.br +.B stap +[ +.I OPTIONS +] +.BI \-e " SCRIPT" + +.SH DESCRIPTION + +The +.IR stap +program is the front-end to the Systemtap tool. It accepts probing +instructions (written in a simple scripting language), translates +those instructions into C code, compiles this C code, and loads the +resulting kernel module into a running Linux kernel to perform the +requested system trace/probe functions. You can supply the script in +a named file, from standard input, or from the command line. +.PP +The language, which is described in a later section, is strictly typed, +declaration free, procedural, and inspired by +.IR dtrace +and +.IR awk . +It allows source code points or events in the kernel to be associated +with handlers, which are subroutines that are executed synchronously. It is +somewhat similar conceptually to "breakpoint command lists" in the +.IR gdb +debugger. +.PP +This manual corresponds to version @VERSION@. + +.SH OPTIONS +The systemtap translator supports the following options. Any other option +prints a list of supported options. +.\" undocumented for now: +.\" -t test mode +.\" -r RELEASE +.TP +.B \-v +Verbose mode. Produces more informative output. +.TP +.B \-k +Keep the temporary directory after all processing. This may be useful +in order to examine the generated C code, or to reuse the compiled +kernel object. +.TP +.B \-g +Guru mode. Enables parsing of unsafe expert-level constructs like +embedded C. +.TP +.BI \-p " NUM" +Stop after pass NUM. The passes are numbered 1-5: parse, elaborate, +translate, compile, run. See the +.B PROCESSING +section for details. +.TP +.BI \-I " DIR" +Add the given directory to the tapset search directory. 
See the +description of pass 2 for details. +.TP +.BI \-R " DIR" +Look for the systemtap runtime sources in the given directory. +.TP +.BI \-m " MODULE" +Use the given name for the generated kernel object module, instead +of a unique randomized name. +.TP +.BI \-o " FILE" +Send standard output to the named file. + +.SH SCRIPT LANGUAGE + +The systemtap script language resembles +.IR awk . +There are two main outermost constructs: probes and functions. Within +these, statements and expressions use C-like operator syntax and +precedence. + +.SS GENERAL SYNTAX +Whitespace is ignored. Three forms of comments are supported: +.RS +.br +.BR # " ... shell style, to the end of line" +.br +.BR // " ... C++ style, to the end of line" +.br +.BR /* " ... C style ... " */ +.RE +Literals are either strings enclosed in double-quotes (soon supporting +the usual C escape codes with backslashes), or integers (in decimal, +hexadecimal, or octal, using the same notation as in C). All strings +are limited in length to some reasonable value (a few hundred bytes). +Integers are 64-bit signed quantities, although the parser also accepts +(and wraps around) values above positive 2**63. + +.SS VARIABLES +Identifiers for variables and functions are an alphanumeric sequence, +and may include "_" and "$" characters. They may not start with a +plain digit, as in C. Each variable is by default local to the probe +or function statement block within which it is mentioned, and therefore +its scope and lifetime are limited to a particular probe or function +invocation. +.\" XXX add statistics type here once it's supported +.PP +Scalar variables are implicitly typed as either string or integer. +Associative arrays also have a string or integer value, and a +tuple of strings and/or integers serving as a key. +The translator performs +.I type inference +on all identifiers, including array indexes and function parameters. +Inconsistent type-related use of identifiers signals an error. 
+.PP +Variables may be declared global, so that they are shared amongst all +probes and live as long as the entire systemtap session. There is one +namespace for all global variables, regardless of which script file +they are found within. A global declaration may be written at the +outermost level anywhere, not within a block of code. The following +declaration marks "var1" and "var2" as global. The translator will +infer for each its value type, and if it is used as an array, its key +types. +.RS +.BR global " var1" , " var2" +.RE +.\" XXX add statistics type here once it's supported + +.SS STATEMENTS +Statements enable procedural control flow. They may occur within +functions and probe handlers. The total number of statements executed +in response to any single probe event is limited to some number +defined by a macro in the translated C code, and is in the +neighbourhood of 1000. +.TP +EXP +Execute the string- or integer-valued expression and throw away +the value. +.TP +.BR { " STMT1 STMT2 ... " } +Execute each statement in sequence in this block. Note that +separators or terminators are generally not necessary between statements. +.TP +.BR ; +Null statement, do nothing. It is useful as an optional separator between +statements to improve syntax-error detection and to handle certain +grammar ambiguities. +.TP +.BR if " (EXP) STMT1 [ " else " STMT2 ]" +Compare integer-valued EXP to zero. Execute the first (non-zero) +or second STMT (zero). +.TP +.BR while " (EXP) STMT" +While integer-valued EXP evaluates to non-zero, execute STMT. +.TP +.BR for " (EXP1; EXP2; EXP3) STMT" +Execute EXP1 as initialization. While EXP2 is non-zero, execute +STMT, then the iteration expression EXP3. +.TP +.BR foreach " (VAR " in " ARRAY) STMT" +Loop over each element of the named global array, assigning the current +key to VAR. The array may not be modified within the statement. +.TP +.BR foreach " ([VAR1, VAR2, ...] 
" in " ARRAY) STMT" +Same as above, used when the array is indexed with a tuple of keys. +.TP +.BR break ", " continue +Exit or iterate the innermost nesting loop +.RB ( while " or " for " or " foreach ) +statement. +.TP +.BR return " EXP" +Return EXP value from enclosing function. A return value is mandatory, +since void functions are not supported. +.TP +.BR next +Return now from enclosing probe handler. + +.SS EXPRESSIONS +Systemtap supports a number of operators that have the same general syntax, +semantics, and precedence as in C and awk. Arithmetic is performed as per +C rules. Division by zero is detected and results in an error. +.TP +binary numeric operators +.B * / % + - >> << & ^ | && || +.TP +binary string operators +.B . +(string concatenation) +.TP +numeric assignment operators +.B = *= /= %= += -= >>= <<= &= ^= |= +.TP +string assignment operators +.B = .= +.TP +unary numeric operators +.B - ! ~ ++ -- +.TP +binary numeric or string comparison operators +.B < > <= >= == != +.TP +ternary operator +.RB cond " ? " exp1 " : " exp2 +.TP +grouping operator +.BR ( " exp " ) +.TP +function call +.RB "fn " ( "[ arg1, arg2, ... ]" ) + +.SS PROBES +The main construct in the scripting language identifies probes. +Probes associate abstract events with a statement block ("probe +handler") that is to be executed when those events occur. The +general syntax is as follows: +.RS +.br +.nh +.nf +.BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " } +.hy +.fi +.RE +.PP +Events are specified in a special syntax called "probe points". One +family refers to specific points in a kernel, which are identified by +module, source file, line number, function name, C label name, or some +combination of these. This kind of "synchronous" event is deemed to +occur when any processor executes an instruction matched by the +specification. 
Other families of probe points refer to "asynchronous" +events such as timers/counters rolling over, where there is no fixed +execution point that is related. Each probe point specification may +match multiple physical locations, all of which are then probed. A +probe declaration may also contain several comma-separated +specifications, all of which are probed. +.PP +Here is a list of probe point families currently supported. The +.B .function +variant places a probe near the beginning of the named function, so that +parameters are available as context variables. The +.B .return +variant places a probe at the moment of return from the named function, so +the return value is available as the "$retvalue" context variable. +The +.B .statement +variant places a probe at the exact spot, exposing those local variables +that are visible there. +.RS +.nf +.br +kernel.function(PATTERN) +.br +kernel.function(PATTERN).return +.br +module(MPATTERN).function(PATTERN) +.br +module(MPATTERN).function(PATTERN).return +.br +kernel.statement(PATTERN) +.br +module(MPATTERN).statement(PATTERN) +.fi +.RE +.PP +In the above list, MPATTERN stands for a string literal that aims to +identify the loaded kernel module of interest. It may include "*" and +"?" wildcards. PATTERN stands for a string literal that aims to +identify a point in the program. It is made up of three parts. The +first part is the name of a function, as would appear in the +.I nm +program's output. This part may use the "*" and "?" wildcarding +operators to match multiple names. The second part is optional, and +begins with the "@" character. It is followed by a source file name +wildcard pattern, such as +.IR mm/slab* . +Finally, the third part is optional if the file name part was given, +and identifies the line number in the source file, preceded by a ":". +As an alternative, PATTERN may be a numeric constant, indicating a +(module-relative or kernel-absolute) address. 
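As an illustration only (this is not code from the translator), the three-part PATTERN string described above could be split apart roughly as follows. The names ProbePattern, parse_pattern, and function_matches are hypothetical, and Python's fnmatch accepts more wildcard forms (such as "[abc]") than the "*" and "?" that this manual mentions.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class ProbePattern(NamedTuple):
    function: str            # function-name glob, e.g. "*init*"
    file: Optional[str]      # source-file glob after "@", if any
    line: Optional[int]      # line number after ":", if any


def parse_pattern(pattern: str) -> ProbePattern:
    """Split 'FUNCTION[@FILE[:LINE]]' into its three parts."""
    function, at, rest = pattern.partition("@")
    if not at:
        # No "@": only the function part was given.
        return ProbePattern(function, None, None)
    file, colon, line = rest.partition(":")
    return ProbePattern(function, file, int(line) if colon else None)


def function_matches(parsed: ProbePattern, name: str) -> bool:
    """Check a symbol name against the function part of the pattern."""
    return fnmatchcase(name, parsed.function)
```

With this sketch, parse_pattern("*@kernel/sched.c:240") yields the function glob "*", the file glob "kernel/sched.c", and the line number 240. Numeric-address patterns are not strings and are outside the scope of this sketch.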
+.PP +Here are some example probe points: +.TP +kernel.function("*init*"), kernel.function("*exit*") +refers to all kernel functions with "init" or "exit" in the name. +.TP +kernel.function("*@kernel/sched.c:240") +refers to any functions within the "kernel/sched.c" file that span +line 240. +.TP +module("usb*").function("*sync*").return +refers to the moment of return from all functions with "sync" in the +name in any of the USB drivers. +.TP +kernel.statement(0xc0044852) +refers to the first byte of the statement whose compiled instructions +include the given address in the kernel. + +.PP +When any matching event occurs, the probe handler is run within that +context. For events that are defined by execution of specific parts +of code, this context may include variables defined in the source code +at that spot. These "target variables" are presented to the script as +variables whose names are prefixed with "$". They may be read/written +only if the kernel's compiler preserved them despite optimization. +This is the same constraint that a debugger user faces when working +with optimized code. Asynchronous probes have very little context. +.PP +In addition, "probe aliases" may be defined. Probe aliases look +similar to probe definitions, but instead of activating a probe at the +given point, an alias definition introduces a new probe point name that +aliases an existing one. An alias definition is identified by the "=" +assignment operator. In addition, +the probe handler defined with an alias is implicitly added as a +prologue to any probe that refers to the alias. For example: +.RS +.nf +.nh +probe syscall("read") = kernel.function("sys_read") { + fildes = $fd +} +.hy +.fi +.RE +defines a new probe point +.nh +.IR syscall("read") , +.hy +which expands to +.nh +.IR kernel.function("sys_read") , +.hy +with the given assignment as a prologue. Another probe definition +may use the alias like this: +.RS +.nf +probe syscall("read") { + printk ("reading fd=" . 
string (fildes)) +} +.fi +.RE + +.SS FUNCTIONS +Systemtap scripts may define subroutines to factor out common work. +Functions take any number of scalar (integer or string) arguments, and +must return a single scalar (integer or string). An example function +declaration looks like this: +.RS +.nf +function thisfn (arg1, arg2) { + return arg1 + arg2 +} +.fi +.RE +Note the usual absence of type declarations, which are instead +inferred by the translator. Because a return value type is required, +each function must contain at least one +.I return +statement. Functions may call others or themselves recursively, up to +a fixed nesting limit. This limit is defined by a macro in the +translated C code and is in the neighbourhood of 30. + +.SS EMBEDDED C +When in guru mode, the translator accepts embedded code in the +script. Such code is enclosed between +.IR %{ +and +.IR %} +markers, and is transcribed verbatim, without analysis, in some +sequence, into the generated C code. At the outermost level, this may +be useful to add +.IR #include +instructions, and any auxiliary definitions for use by other embedded +code. +.PP +The other place where embedded code is permitted is as a function body. +In this case, the script language body is replaced entirely by a piece +of C code enclosed again between +.IR %{ " and " %} +markers. +This C code may do anything reasonable and safe. There are a number +of undocumented but complex safety constraints on concurrency, +resource consumption, and runtime limits, so this is an advanced +technique. +.PP +The memory locations set aside for input and output values +are made available to it using a macro +.IR THIS . 
+Here are some examples: +.RS +.br +.nf +function add_one (val) %{ + THIS->__retvalue = THIS->val + 1; +%} +function add_one_str (val) %{ + strlcpy (THIS->__retvalue, THIS->val, MAXSTRINGLEN); + strlcat (THIS->__retvalue, "one", MAXSTRINGLEN); +%} +.fi +.RE +The function argument and return value types have to be inferred by +the translator from the call sites in order for this to work. The +user should examine C code generated for ordinary script-language +functions in order to write compatible embedded-C ones. + +.SS BUILT-INS +A set of builtin functions and probe aliases is provided by the +scripts installed under the +.nh +.IR /usr/share/systemtap/tapset +.hy +directory. + +.SH PROCESSING +The translator begins pass 1 by parsing the given input script, +and all scripts (files named +.IR *.stp ) +found in a tapset directory. The directories listed +with +.BR -I +are processed in sequence. For each directory, a number of subdirectories +are also searched. These subdirectories are derived from the selected +kernel version (the +.BR -r +option), +in order to allow more kernel-version-specific scripts to override less +specific ones. For example, for a kernel version +.IR 2.6.12-23.FC3 +the following patterns would be searched, in sequence: +.IR 2.6.12-23.FC3/*.stp , +.IR 2.6.12/*.stp , +.IR 2.6/*.stp , +and finally +.IR *.stp . +Stopping the translator after pass 1 causes it to print the parse trees. + +.PP +In pass 2, the translator analyzes the input script to resolve symbols +and types. References to variables, functions, and probe aliases that +are unresolved internally are satisfied by searching through the +parsed tapset scripts. If any tapset script is selected because it +defines an unresolved symbol, then the entirety of that script is +added to the translator's resolution queue. This process iterates +until all symbols are resolved and a subset of tapset scripts is +selected. 
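The iterative tapset selection just described behaves like a worklist algorithm. The following is a minimal sketch of that idea, not the translator's actual implementation; the function select_tapsets and its data layout (each tapset script mapped to a pair of "defines" and "references" symbol sets) are assumptions made for illustration.

```python
from collections import deque


def select_tapsets(user_refs, user_defs, tapsets):
    """Return the tapset scripts needed to resolve the user's symbols.

    tapsets maps a script name to (defined_symbols, referenced_symbols).
    """
    selected = []                      # scripts pulled in, in selection order
    defined = set(user_defs)           # symbols resolved so far
    pending = deque(sorted(set(user_refs) - defined))
    while pending:
        symbol = pending.popleft()
        if symbol in defined:
            continue
        # Find the first tapset script that defines the symbol.
        provider = next(
            (name for name, (defs, _) in tapsets.items() if symbol in defs),
            None,
        )
        if provider is None:
            raise LookupError(f"unresolved symbol: {symbol}")
        defs, refs = tapsets[provider]
        selected.append(provider)
        # The whole script is added, so all of its definitions become
        # available and all of its own references must be resolved too.
        defined |= defs
        pending.extend(sorted(refs - defined))
    return selected
```

For example, if a hypothetical a.stp defines "foo" but refers to "bar", and b.stp defines "bar", then a user script referring only to "foo" selects both a.stp and b.stp, while an unrelated c.stp is left out.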
+.PP +Next, all probe point descriptions are validated +against the wide variety supported by the translator. Probe points that +refer to code locations ("synchronous probe points") require the +appropriate kernel debugging information to be installed. In the +associated probe handlers, target-side variables (whose names begin +with "$") are found and have their run-time locations decoded. +.PP +Finally, all variable, function, parameter, array, and index types are +inferred from context (literals and operators). Stopping the +translator after pass 2 causes it to list all the probes, functions, +and variables, along with all inferred types. Any inconsistent or +unresolved types cause an error. + +.PP +In pass 3, the translator writes C code that represents the actions +of all selected script files, and creates a +.IR Makefile +to build that into a kernel object. These files are placed into a +temporary directory. Stopping the translator at this point causes +it to print the contents of the C file. + +.PP +In pass 4, the translator invokes the Linux kernel build system to +create the actual kernel object file. This involves running +.IR make +in the temporary directory, and requires a kernel module build +system (headers, config and Makefiles) to be installed in the usual +spot +.IR /lib/modules/VERSION/build . +Stopping the translator after pass 4 is the last chance before +running the kernel object. This may be useful if you want to +archive the file. + +.PP +In pass 5, the translator invokes the systemtap auxiliary program +.I stpd +for the given kernel object. This program arranges to load +the module, then communicates with it, copying trace data from the +kernel into temporary files, until the user sends an interrupt signal. +Any run-time error encountered by the probe handlers, such as running +out of memory, division by zero, or exceeding nesting or runtime limits, +results in an error condition that prevents further probes from +running. 
Finally, stpd unloads the module, and cleans up. + +.SH EXAMPLES +To trace entry and exit from a function, use a pair of probes: +.RS +.br +.nf +probe kernel.function("foo") { log ("enter") } +probe kernel.function("foo").return { log ("exit") } +.fi +.RE + +To list the probeable functions in the kernel, use +.RS +.br +.nf +stap -p2 -e 'probe kernel.function("*") {}' +.fi +.RE + +.SH SAFETY AND SECURITY +Systemtap is an administrative tool. It exposes kernel internal data +structures and potentially private user information. It acquires root +privileges to actually run the kernel objects it builds using the +.IR sudo +command applied to the +.IR stpd +program. The latter is a part of the Systemtap package, dedicated to +module loading and unloading (but only in the white zone), and +kernel-to-user data transfer. Since +.IR stpd +does not perform any additional security checks on the kernel objects +it is given, it would be unwise for a system administrator to give +even targeted +.IR sudo +privileges to untrusted users. +.PP +The translator asserts certain safety constraints. It aims to ensure +that no handler routine can run for very long, allocate memory, +perform unsafe operations, or unintentionally interfere with the +kernel. Use of guru mode constructs such as embedded C can violate +these constraints, leading to a kernel crash or data corruption. + +.SH FILES +.\" consider autoconf-substituting these directories +.TP +/tmp/stapXXXXXX +Temporary directory for systemtap files, including translated C code +and kernel object. +.TP +/usr/share/systemtap/tapset +The automatic tapset search directory, unless overridden by +the +.I SYSTEMTAP_TAPSET +environment variable. +.TP +/usr/share/systemtap/runtime +The runtime sources, unless overridden by the +.I SYSTEMTAP_RUNTIME +environment variable. +.TP +/lib/modules/VERSION/build +The location of kernel module building infrastructure. 
+.TP +/usr/lib/debug/lib/modules/VERSION +The location of kernel debugging information when packaged into the +.IR kernel-debuginfo +RPM. +.TP +/usr/libexec/systemtap/stpd +The auxiliary program supervising module loading, interaction, and +unloading. + +.SH SEE ALSO +.IR dtrace (1), +.IR dprobes (1), +.IR awk (1), +.IR sudo (8), +.IR elfutils (3), +.IR gdb (1) + +.SH BUGS +There are numerous missing features and possibly numerous bugs. Report +them using the Bugzilla link on the project web page: +.nh +.BR http://sources.redhat.com/systemtap/ . +.hy + +.SH AUTHORS +The +.IR stap +translator was written by Frank Ch. Eigler and Graydon Hoare. The +kernel-side runtime library and the user-level +.IR stpd +daemon were written by Martin Hunt and Tom Zanussi. Contact them +using the public mailing list: +.nh +.BR <systemtap@sources.redhat.com> . +.hy + +.SH ACKNOWLEDGEMENTS +The script language design was inspired by Sun's +.IR dtrace . +The primary probing mechanism uses IBM's +.IR kprobes +and +.IR relayfs +packages, which were improved and ported by IBM and Intel staff. +The elfutils library from Ulrich Drepper and Roland McGrath is used +to process DWARF debugging information. Many project members contributed +to the overall design and priorities of the system, including Will Cohen, +Jim Keniston, Vara Prasad, and Brad Chen. +