.\" -*- nroff -*-
.TH STAP 1 @DATE@ "Red Hat"
.SH NAME
stap \- systemtap script translator/driver
.SH SYNOPSIS
.br
.B stap
[
.I OPTIONS
]
.I FILENAME
.br
.B stap
[
.I OPTIONS
]
.B \-
.br
.B stap
[
.I OPTIONS
]
.BI \-e " SCRIPT"
.SH DESCRIPTION
The
.IR stap
program is the front-end to the Systemtap tool. It accepts probing
instructions (written in a simple scripting language), translates
those instructions into C code, compiles this C code, and loads the
resulting kernel module into a running Linux kernel to perform the
requested system trace/probe functions. You can supply the script in
a named file, from standard input, or from the command line.
.PP
The language, which is described in a later section, is strictly typed,
declaration free, procedural, and inspired by
.IR dtrace
and
.IR awk .
It allows source code points or events in the kernel to be associated
with handlers, which are subroutines that are executed synchronously.
It is somewhat similar conceptually to "breakpoint command lists" in the
.IR gdb
debugger.
.PP
This manual corresponds to version @VERSION@.
.SH OPTIONS
The systemtap translator supports the following options. Any other option
prints a list of supported options.
.\" undocumented for now:
.\" -t test mode
.\" -r RELEASE
.TP
.B \-v
Verbose mode. Produces more informative output.
.TP
.B \-h
Show help message.
.TP
.B \-V
Show version message.
.TP
.B \-k
Keep the temporary directory after all processing. This may be useful
in order to examine the generated C code, or to reuse the compiled
kernel object.
.TP
.B \-g
Guru mode. Enables parsing of unsafe expert-level constructs like
embedded C.
.TP
.BI \-p " NUM"
Stop after pass NUM. The passes are numbered 1-5: parse, elaborate,
translate, compile, run. See the
.B PROCESSING
section for details.
.TP
.BI \-I " DIR"
Add the given directory to the list of tapset search directories. See
the description of pass 1 for details.
.TP
.BI \-R " DIR"
Look for the systemtap runtime sources in the given directory.
.TP
.BI \-m " MODULE"
Use the given name for the generated kernel object module, instead
of a unique randomized name.
.TP
.BI \-o " FILE"
Send standard output to the named file.
.SH SCRIPT LANGUAGE
The systemtap script language resembles
.IR awk .
There are two main outermost constructs: probes and functions. Within
these, statements and expressions use C-like operator syntax and
precedence.
.SS GENERAL SYNTAX
Whitespace is ignored. Three forms of comments are supported:
.RS
.br
.BR # " ... shell style, to the end of line"
.br
.BR // " ... C++ style, to the end of line"
.br
.BR /* " ... C style ... " */
.RE
Literals are either strings enclosed in double-quotes (soon supporting
the usual C escape codes with backslashes), or integers (in decimal,
hexadecimal, or octal, using the same notation as in C). All strings
are limited in length to some reasonable value (a few hundred bytes).
Integers are 64-bit signed quantities, although the parser also accepts
(and wraps around) values above positive 2**63.
.SS VARIABLES
Identifiers for variables and functions are an alphanumeric sequence,
and may include "_" and "$" characters. They may not start with a
plain digit, as in C. Each variable is by default local to the probe
or function statement block within which it is mentioned, and therefore
its scope and lifetime are limited to a particular probe or function
invocation.
.\" XXX add statistics type here once it's supported
.PP
Scalar variables are implicitly typed as either string or integer.
Associative arrays also have a string or integer value, and a
tuple of strings and/or integers serving as a key. The translator
performs
.I type inference
on all identifiers, including array indexes and function parameters.
Inconsistent type-related use of identifiers signals an error.
.PP
Variables may be declared global, so that they are shared amongst all
probes and live as long as the entire systemtap session.
There is one namespace for all global variables, regardless of which
script file they are found within. A global declaration may be
written at the outermost level anywhere, not within a block of code.
The following declaration marks "var1" and "var2" as global.
The translator will infer for each its value type, and if it is used
as an array, its key types.
.RS
.BR global " var1" , " var2"
.RE
.\" XXX add statistics type here once it's supported
.SS STATEMENTS
Statements enable procedural control flow. They may occur within
functions and probe handlers. The total number of statements executed
in response to any single probe event is limited to some number
defined by a macro in the translated C code, and is in the
neighbourhood of 1000.
.TP
EXP
Execute the string- or integer-valued expression and throw away
the value.
.TP
.BR { " STMT1 STMT2 ... " }
Execute each statement in sequence in this block. Note that
separators or terminators are generally not necessary between statements.
.TP
.BR ;
Null statement, do nothing. It is useful as an optional separator between
statements to improve syntax-error detection and to handle certain
grammar ambiguities.
.TP
.BR if " (EXP) STMT1 [ " else " STMT2 ]"
Compare integer-valued EXP to zero. Execute the first (non-zero)
or second STMT (zero).
.TP
.BR while " (EXP) STMT"
While integer-valued EXP evaluates to non-zero, execute STMT.
.TP
.BR for " (EXP1; EXP2; EXP3) STMT"
Execute EXP1 as initialization. While EXP2 is non-zero, execute STMT,
then the iteration expression EXP3.
.TP
.BR foreach " (VAR " in " ARRAY) STMT"
Loop over each element of the named global array, assigning the current
key to VAR. The array may not be modified within the statement.
.TP
.BR foreach " ([VAR1, VAR2, ...] " in " ARRAY) STMT"
Same as above, used when the array is indexed with a tuple of keys.
.TP
.BR break ", " continue
Exit or iterate the innermost nesting loop
.RB ( while " or " for " or " foreach )
statement.
.TP
.BR return " EXP"
Return EXP value from enclosing function. If the function's value is
not taken anywhere, then a return statement is not needed, and the
function will have a special "unknown" type with no return value.
.TP
.BR next
Return now from enclosing probe handler.
.SS EXPRESSIONS
Systemtap supports a number of operators that have the same general syntax,
semantics, and precedence as in C and awk. Arithmetic is performed as per
C rules. Division by zero is detected and results in an error.
.TP
binary numeric operators
.B * / % + - >> << & ^ | && ||
.TP
binary string operators
.B .
(string concatenation)
.TP
numeric assignment operators
.B = *= /= %= += -= >>= <<= &= ^= |=
.TP
string assignment operators
.B = .=
.TP
unary numeric operators
.B - ! ~ ++ --
.TP
binary numeric or string comparison operators
.B < > <= >= == !=
.TP
ternary operator
.RB cond " ? " exp1 " : " exp2
.TP
grouping operator
.BR ( " exp " )
.TP
function call
.RB "fn " ( "[ arg1, arg2, ... ]" )
.SS PROBES
The main construct in the scripting language identifies probes.
Probes associate abstract events with a statement block ("probe
handler") that is to be executed when those events occur. The
general syntax is as follows:
.RS
.br
.nh
.nf
.BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
.hy
.fi
.RE
.PP
Events are specified in a special syntax called "probe points". One
family refers to specific points in a kernel, which are identified by
module, source file, line number, function name, C label name, or some
combination of these. This kind of "synchronous" event is deemed to
occur when any processor executes an instruction matched by the
specification. Other families of probe points refer to "asynchronous"
events such as timers/counters rolling over, where there is no fixed
execution point that is related. Each probe point specification may
match multiple physical locations, all of which are then probed.
A probe declaration may also contain several comma-separated
specifications, all of which are probed.
.PP
Here is a list of probe point families currently supported. The
.B .function
variant places a probe near the beginning of the named function, so that
parameters are available as context variables. The
.B .return
variant places a probe at the moment of return from the named function, so
the return value is available as the "$retvalue" context variable. The
.B .statement
variant places a probe at the exact spot, exposing those local variables
that are visible there.
.RS
.nf
.br
kernel.function(PATTERN)
.br
kernel.function(PATTERN).return
.br
module(MPATTERN).function(PATTERN)
.br
module(MPATTERN).function(PATTERN).return
.br
kernel.statement(PATTERN)
.br
module(MPATTERN).statement(PATTERN)
.br
timer.jiffies(NUM)
.br
timer.jiffies(NUM).randomize(RAND)
.fi
.RE
.PP
In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest. It may include "*" and
"?" wildcards. PATTERN stands for a string literal that aims to
identify a point in the program. It is made up of three parts. The
first part is the name of a function, as would appear in the
.I nm
program's output. This part may use the "*" and "?" wildcarding
operators to match multiple names. The second part is optional, and
begins with the "@" character. It is followed by a source file name
wildcard pattern, such as
.IR mm/slab* .
Finally, the third part is optional if the file name part was given,
and identifies the line number in the source file, preceded by a ":".
As an alternative, PATTERN may be a numeric constant, indicating a
(module-relative or kernel-absolute) address.
.PP
The timer-based asynchronous probe points run the given handler every
NUM jiffies. If the
.B .randomize
suffix is given, a random value in the range [-RAND..RAND] is added to
NUM every time the handler is run.
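.PP
As a minimal sketch of how a synchronous and an asynchronous probe point
might be combined, the following script counts calls to one kernel
function and reports the running total from a timer probe that fires
every 800 to 1200 jiffies. The choice of
.IR sys_read ,
the variable name, and the message text are illustrative assumptions;
.IR log
and
.IR string
are used as they appear in other examples in this manual.
.RS
.nf
# assumed example: count entries into sys_read
global hits
probe kernel.function("sys_read") {
  hits += 1
}
# report the running total every 1000 +/- 200 jiffies
probe timer.jiffies(1000).randomize(200) {
  log ("sys_read calls so far: " . string (hits))
}
.fi
.RE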
.PP
Here are some example probe points:
.TP
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in the name.
.TP
kernel.function("*@kernel/sched.c:240")
refers to any functions within the "kernel/sched.c" file that span
line 240.
.TP
module("usb*").function("*sync*").return
refers to the moment of return from all functions with "sync" in the
name in any of the USB drivers.
.TP
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled instructions
include the given address in the kernel.
.TP
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/- 200 jiffies.
.PP
When any matching event occurs, the probe handler is run within that
context. For events that are defined by execution of specific parts
of code, this context may include variables defined in the source code
at that spot. These "target variables" are presented to the script as
variables whose names are prefixed with "$". They may be read/written
only if the kernel's compiler preserved them despite optimization.
This is the same constraint that a debugger user faces when working
with optimized code. Asynchronous probes have very little context.
.PP
In addition, "probe aliases" may be defined. A probe alias looks
similar to a probe definition, but instead of activating a probe at the
given point, it defines a new probe point name as an alias for an
existing one. This is identified by the "=" assignment operator. In
addition, the probe handler defined with an alias is implicitly added
as a prologue to any probe that refers to the alias. For example:
.RS
.nf
.nh
probe syscall("read") = kernel.function("sys_read") {
  fildes = $fd
}
.hy
.fi
.RE
defines a new probe point
.nh
.IR syscall("read") ,
.hy
which expands to
.nh
.IR kernel.function("sys_read") ,
.hy
with the given assignment as a prologue. Another probe definition
may use the alias like this:
.RS
.nf
probe syscall("read") {
  printk ("reading fd=" .
          string (fildes))
}
.fi
.RE
.SS FUNCTIONS
Systemtap scripts may define subroutines to factor out common work.
Functions take any number of scalar (integer or string) arguments, and
must return a single scalar (integer or string). An example function
declaration looks like this:
.RS
.nf
function thisfn (arg1, arg2) {
  return arg1 + arg2
}
.fi
.RE
Note the usual absence of type declarations, which are instead
inferred by the translator. Functions may call others or themselves
recursively, up to a fixed nesting limit. This limit is defined by
a macro in the translated C code and is in the neighbourhood of 30.
.SS EMBEDDED C
When in guru mode, the translator accepts embedded code in the
script. Such code is enclosed between
.IR %{
and
.IR %}
markers, and is transcribed verbatim, without analysis, in some
sequence, into the generated C code. At the outermost level, this may
be useful to add
.IR #include
instructions, and any auxiliary definitions for use by other embedded
code.
.PP
The other place where embedded code is permitted is as a function body.
In this case, the script language body is replaced entirely by a piece
of C code enclosed again between
.IR %{ " and " %}
markers.
This C code may do anything reasonable and safe. There are a number
of undocumented but complex safety constraints on concurrency,
resource consumption, and runtime limits, so this is an advanced
technique.
.PP
The memory locations set aside for input and output values
are made available to it using a macro
.IR THIS .
Here are some examples:
.RS
.br
.nf
function add_one (val) %{
  /* integer result: the input plus one */
  THIS->__retvalue = THIS->val + 1;
%}
function add_one_str (val) %{
  /* bounded copy and append; both always NUL-terminate */
  strlcpy (THIS->__retvalue, THIS->val, MAXSTRINGLEN);
  strlcat (THIS->__retvalue, "one", MAXSTRINGLEN);
%}
.fi
.RE
The function argument and return value types have to be inferred by
the translator from the call sites in order for this to work.
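.PP
For instance, assuming embedded-C functions named add_one and
add_one_str as in the examples above, call sites such as the following
would give the translator enough context: the integer literal and the
"*" operator imply an integer argument and result, while the string
literal and the "." operator imply string ones. The probe point and
the values are illustrative only, and the script would have to be run
in guru mode.
.RS
.nf
probe kernel.function("sys_read") {
  n = add_one (41) * 2                 # integer argument, integer result
  s = add_one_str ("number ") . "!"    # string argument, string result
  log (s)
}
.fi
.RE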
The user should examine C code generated for ordinary script-language
functions in order to write compatible embedded-C ones.
.SS BUILT-INS
A set of builtin functions and probe aliases is provided
by the scripts installed under the
.nh
.IR /usr/share/systemtap/tapset
.hy
directory.
.SH PROCESSING
The translator begins pass 1 by parsing the given input script,
and all scripts (files named
.IR *.stp )
found in a tapset directory. The directories listed with
.BR -I
are processed in sequence. For each directory, a number of
subdirectories are also searched. These subdirectories are derived
from the selected kernel version (the
.BR -r
option), in order to allow more kernel-version-specific scripts to
override less specific ones. For example, for a kernel version
.IR 2.6.12-23.FC3
the following patterns would be searched, in sequence:
.IR 2.6.12-23.FC3/*.stp ,
.IR 2.6.12/*.stp ,
.IR 2.6/*.stp ,
and finally
.IR *.stp .
Stopping the translator after pass 1 causes it to print the parse trees.
.PP
In pass 2, the translator analyzes the input script to resolve symbols
and types. References to variables, functions, and probe aliases that
are unresolved internally are satisfied by searching through the
parsed tapset scripts. If any tapset script is selected because it
defines an unresolved symbol, then the entirety of that script is
added to the translator's resolution queue. This process iterates
until all symbols are resolved and a subset of tapset scripts is
selected.
.PP
Next, all probe point descriptions are validated
against the wide variety supported by the translator. Probe points that
refer to code locations ("synchronous probe points") require the
appropriate kernel debugging information to be installed. In the
associated probe handlers, target-side variables (whose names begin
with "$") are found and have their run-time locations decoded.
.PP
Finally, all variable, function, parameter, array, and index types are
inferred from context (literals and operators).
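.PP
As an illustrative sketch (the variable names and the probe point are
hypothetical), the literals and operators in the following fragment
are enough to fix every type involved: "count" becomes an array with a
string key and an integer value, and "label" becomes a string. A
later use such as "label + 1" would be an inconsistent use and would
be reported as an error.
.RS
.nf
global count, label
probe kernel.function("sys_read") {
  count["read"] += 1         # key: string literal; value: integer (+=)
  label = "seen " . "read"   # string concatenation, so label is a string
}
.fi
.RE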
Stopping the translator after pass 2 causes it to list all the probes,
functions, and variables, along with all inferred types. Any
inconsistent or unresolved types cause an error.
.PP
In pass 3, the translator writes C code that represents the actions
of all selected script files, and creates a
.IR Makefile
to build that into a kernel object. These files are placed into a
temporary directory. Stopping the translator at this point causes
it to print the contents of the C file.
.PP
In pass 4, the translator invokes the Linux kernel build system to
create the actual kernel object file. This involves running
.IR make
in the temporary directory, and requires a kernel module build
system (headers, config and Makefiles) to be installed in the usual
spot
.IR /lib/modules/VERSION/build .
Stopping the translator after pass 4 is the last chance before running
the kernel object. This may be useful if you want to archive the
file.
.PP
In pass 5, the translator invokes the systemtap auxiliary program
.I stpd
for the given kernel object. This program arranges to load the module,
then communicates with it, copying trace data from the kernel into
temporary files, until the user sends an interrupt signal. Any
run-time error encountered by the probe handlers, such as running out
of memory, division by zero, or exceeding nesting or runtime limits,
results in an error condition that prevents further probes from
running. Finally, stpd unloads the module and cleans up.
.SH EXAMPLES
To trace entry and exit from a function, use a pair of probes:
.RS
.br
.nf
probe kernel.function("foo") { log ("enter") }
probe kernel.function("foo").return { log ("exit") }
.fi
.RE
To list the probeable functions in the kernel, use
.RS
.br
.nf
stap -p2 -e 'probe kernel.function("*") {}'
.fi
.RE
.SH SAFETY AND SECURITY
Systemtap is an administrative tool. It exposes kernel internal data
structures and potentially private user information.
It acquires root privileges to actually run the kernel objects it
builds using the
.IR sudo
command applied to the
.IR stpd
program. The latter is a part of the Systemtap package, dedicated to
module loading and unloading (but only in the white zone), and
kernel-to-user data transfer. Since
.IR stpd
does not perform any additional security checks on the kernel objects
it is given, it would be unwise for a system administrator to give
even targeted
.IR sudo
privileges to untrusted users.
.PP
The translator asserts certain safety constraints. It aims to ensure
that no handler routine can run for very long, allocate memory,
perform unsafe operations, or unintentionally interfere with the
kernel. Use of guru mode constructs such as embedded C can violate
these constraints, leading to a kernel crash or data corruption.
.SH FILES
.\" consider autoconf-substituting these directories
.TP
/tmp/stapXXXXXX
Temporary directory for systemtap files, including translated C code
and kernel object.
.TP
/usr/share/systemtap/tapset
The automatic tapset search directory, unless overridden by the
.I SYSTEMTAP_TAPSET
environment variable.
.TP
/usr/share/systemtap/runtime
The runtime sources, unless overridden by the
.I SYSTEMTAP_RUNTIME
environment variable.
.TP
/lib/modules/VERSION/build
The location of kernel module building infrastructure.
.TP
/usr/lib/debug/lib/modules/VERSION
The location of kernel debugging information when packaged into the
.IR kernel-debuginfo
RPM.
.TP
/usr/libexec/systemtap/stpd
The auxiliary program supervising module loading, interaction, and
unloading.
.SH SEE ALSO
.IR dtrace (1),
.IR dprobes (1),
.IR awk (1),
.IR sudo (8),
.IR elfutils (3),
.IR gdb (1)
.SH BUGS
There are numerous missing features and possibly numerous bugs. Use
the Bugzilla link off of the project web page:
.nh
.BR http://sources.redhat.com/systemtap/ .
.hy
.SH AUTHORS
The
.IR stap
translator was written by Frank Ch. Eigler and Graydon Hoare.
The kernel-side runtime library and the user-level
.IR stpd
daemon were written by Martin Hunt and Tom Zanussi. Contact them using
the public mailing list:
.nh
.BR .
.hy
.SH ACKNOWLEDGEMENTS
The script language design was inspired by Sun's
.IR dtrace .
The primary probing mechanism uses IBM's
.IR kprobes
and
.IR relayfs
packages, which were improved and ported by IBM and Intel staff.
The elfutils library from Ulrich Drepper and Roland McGrath is used
to process DWARF debugging information. Many project members
contributed to the overall design and priorities of the system,
including Will Cohen, Jim Keniston, Vara Prasad, and Brad Chen.