.\" -*- nroff -*-
.TH STAP 1 @DATE@ "Red Hat"
.SH NAME
stap \- systemtap script translator/driver
.SH SYNOPSIS

.br
.B stap
[
.I OPTIONS
]
.I FILENAME
.br
.B stap
[
.I OPTIONS
]
.B \-
.br
.B stap
[
.I OPTIONS
]
.BI \-e " SCRIPT"

.SH DESCRIPTION

The
.IR stap
program is the front-end to the Systemtap tool.  It accepts probing
instructions (written in a simple scripting language), translates
those instructions into C code, compiles this C code, and loads the
resulting kernel module into a running Linux kernel to perform the
requested system trace/probe functions.  You can supply the script in
a named file, from standard input, or from the command line.
.PP
The language, which is described in a later section, is strictly typed,
declaration free, procedural, and inspired by
.IR dtrace 
and
.IR awk .
It allows source code points or events in the kernel to be associated
with handlers, which are subroutines that are executed synchronously.  It is
somewhat similar conceptually to "breakpoint command lists" in the
.IR gdb
debugger.
.PP
This manual corresponds to version @VERSION@.

.SH OPTIONS
The systemtap translator supports the following options.  Any other option
prints a list of supported options.
.\" undocumented for now:
.\" -t test mode
.\" -r RELEASE
.TP
.B \-v
Verbose mode.  Produces more informative output.
.TP
.B \-h
Show help message.
.TP
.B \-V
Show version message.
.TP
.B \-k
Keep the temporary directory after all processing.  This may be useful
in order to examine the generated C code, or to reuse the compiled
kernel object.
.TP
.B \-g
Guru mode.  Enables parsing of unsafe expert-level constructs like
embedded C.
.TP
.BI \-p " NUM"
Stop after pass NUM.  The passes are numbered 1-5: parse, elaborate,
translate, compile, run.  See the
.B PROCESSING
section for details.
.TP
.BI \-I " DIR"
Add the given directory to the tapset search path.  See the
description of pass 1 for details.
.TP
.BI \-R " DIR"
Look for the systemtap runtime sources in the given directory.
.TP
.BI \-m " MODULE"
Use the given name for the generated kernel object module, instead
of a unique randomized name.
.TP
.BI \-o " FILE"
Send standard output to named file.

.SH SCRIPT LANGUAGE

The systemtap script language resembles 
.IR awk .
There are two main outermost constructs: probes and functions.  Within
these, statements and expressions use C-like operator syntax and
precedence.

.SS GENERAL SYNTAX
Whitespace is ignored.  Three forms of comments are supported:
.RS
.br
.BR # " ... shell style, to the end of line"
.br
.BR // " ... C++ style, to the end of line"
.br
.BR /* " ... C style ... " */
.RE
Literals are either strings enclosed in double-quotes (soon supporting
the usual C escape codes with backslashes), or integers (in decimal,
hexadecimal, or octal, using the same notation as in C).  All strings
are limited in length to some reasonable value (a few hundred bytes).
Integers are 64-bit signed quantities, although the parser also accepts
(and wraps around) values above positive 2**63.  
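.PP
As a minimal sketch of these lexical rules, the following probe mixes the
three comment forms with string and integer literals.  The
.IR log
and
.IR string
functions are the ones used in examples elsewhere in this page; the function
name "foo" is a placeholder, not a real kernel symbol.
.RS
.nf
# shell-style comment
// C++-style comment
/* C-style comment */
probe kernel.function("foo") {
  dec = 255     # decimal literal
  hex = 0xff    # hexadecimal literal, same value
  oct = 0377    # octal literal, same value
  if (dec == hex && hex == oct)
    log ("all three literals are equal: " . string (dec))
}
.fi
.RE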

.SS VARIABLES
Identifiers for variables and functions are an alphanumeric sequence,
and may include "_" and "$" characters.  They may not start with a
plain digit, as in C.  Each variable is by default local to the probe
or function statement block within which it is mentioned, and therefore
its scope and lifetime is limited to a particular probe or function
invocation.
.\" XXX add statistics type here once it's supported
.PP
Scalar variables are implicitly typed as either string or integer.
Associative arrays also have a string or integer value, and
a tuple of strings and/or integers serving as a key.
The translator performs
.I type inference
on all identifiers, including array indexes and function parameters.
Inconsistent type-related use of identifiers signals an error.
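.PP
For instance, in the following sketch the translator would be expected to
infer that "count" is an integer and "name" is a string from the literals
assigned to them.  The commented-out line shows the kind of inconsistent use
that is rejected.  The probed function "foo" is a placeholder.
.RS
.nf
probe kernel.function("foo") {
  count = 1            # integer, inferred from the literal
  name = "foo"         # string, inferred from the literal
  count = count + 1    # consistent: numeric operator on an integer
  # name = name + 1    # inconsistent: numeric "+" applied to a string
  log (name . ": " . string (count))
}
.fi
.RE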
.PP
Variables may be declared global, so that they are shared amongst all
probes and live as long as the entire systemtap session.  There is one
namespace for all global variables, regardless of which script file
they are found within.  A global declaration may be written at the
outermost level anywhere, not within a block of code.  The following
declaration marks "var1" and "var2" as global.  The translator will
infer for each its value type, and if it is used as an array, its key
types.
.RS
.BR global " var1" , " var2"
.RE
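.PP
A short sketch of how such globals might be used follows, with "var1" as a
scalar integer and "var2" as an array keyed by a (string, integer) tuple.
The probed function names and the key values are illustrative only.
.RS
.nf
global var1, var2
probe kernel.function("foo") {
  var1 = var1 + 1          # scalar integer, shared by all probes
  var2["foo", 0] = var1    # array: (string, integer) key, integer value
}
probe kernel.function("bar") {
  log ("foo calls so far: " . string (var2["foo", 0]))
}
.fi
.RE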
.\" XXX add statistics type here once it's supported

.SS STATEMENTS
Statements enable procedural control flow.  They may occur within
functions and probe handlers.  The total number of statements executed
in response to any single probe event is limited to some number
defined by a macro in the translated C code, and is in the
neighbourhood of 1000.
.TP
EXP
Execute the string- or integer-valued expression and throw away
the value.
.TP
.BR { " STMT1 STMT2 ... " }
Execute each statement in sequence in this block.  Note that 
separators or terminators are generally not necessary between statements.
.TP
.BR ;
Null statement, do nothing.  It is useful as an optional separator between
statements to improve syntax-error detection and to handle certain
grammar ambiguities.
.TP
.BR if " (EXP) STMT1 [ " else " STMT2 ]"
Compare integer-valued EXP to zero.  Execute the first (non-zero)
or second STMT (zero).
.TP
.BR while " (EXP) STMT"
While integer-valued EXP evaluates to non-zero, execute STMT.
.TP
.BR for " (EXP1; EXP2; EXP3) STMT"
Execute EXP1 as initialization.  While EXP2 is non-zero, execute
STMT, then the iteration expression EXP3.
.TP
.BR foreach " (VAR " in " ARRAY) STMT"
Loop over each element of the named global array, assigning the current
key to VAR.  The array may not be modified within the statement.
.TP
.BR foreach " ([VAR1, VAR2, ...] " in " ARRAY) STMT"
Same as above, used when the array is indexed with a tuple of keys.
.TP
.BR break ", " continue
Exit or iterate the innermost nesting loop
.RB ( while " or " for " or " foreach )
statement.
.TP
.BR return " EXP"
Return EXP value from enclosing function.  If the function's value is
not taken anywhere, then a return statement is not needed, and the
function will have a special "unknown" type with no return value.
.TP
.BR next
Return now from enclosing probe handler.
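.PP
The following sketch combines several of these statements.  The global array
name "hits", the function "sum_to", the threshold of 10, and the probed
function "foo" are all placeholders chosen for illustration.
.RS
.nf
global hits
function sum_to (n) {
  total = 0
  for (i = 1; i <= n; i++)
    total += i
  return total
}
probe kernel.function("foo") {
  hits["foo"]++
}
probe timer.jiffies(1000) {
  busy = 0
  foreach (name in hits) {
    if (hits[name] < 10) continue    # skip rarely hit entries
    busy = busy + hits[name]
  }
  if (busy == 0) next                # leave the handler early
  log ("busy total: " . string (busy))
  log ("sum_to(4): " . string (sum_to (4)))
}
.fi
.RE
Loops are kept short here because of the per-probe statement limit
described at the start of this section.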

.SS EXPRESSIONS
Systemtap supports a number of operators that have the same general syntax,
semantics, and precedence as in C and awk.  Arithmetic is performed as per
C rules.  Division by zero is detected and results in an error.
.TP
binary numeric operators
.B * / % + - >> << & ^ | && ||
.TP
binary string operators
.B .
(string concatenation)
.TP
numeric assignment operators
.B = *= /= %= += -= >>= <<= &= ^= |=
.TP
string assignment operators
.B = .=
.TP
unary numeric operators
.B - ! ~ ++ -- 
.TP
binary numeric or string comparison operators
.B < > <= >= == !=
.TP
ternary operator
.RB cond " ? " exp1 " : " exp2
.TP
grouping operator
.BR ( " exp " )
.TP
function call
.RB "fn " ( "[ arg1, arg2, ... ]" )
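.PP
A brief sketch of several operators working together is shown below.  The
variable names and the probed function "foo" are illustrative.
.RS
.nf
probe kernel.function("foo") {
  a = 7
  b = 2
  q = a / b                              # integer division, per C rules
  r = a % b                              # remainder
  label = (r != 0) ? "odd" : "even"      # ternary operator
  msg = "a is " . label                  # string concatenation
  msg .= ", a/b=" . string (q)           # string append-assignment
  if (msg != "") log (msg)               # string comparison
}
.fi
.RE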

.SS PROBES
The main construct in the scripting language identifies probes.
Probes associate abstract events with a statement block ("probe
handler") that is to be executed when those events occur.  The
general syntax is as follows:
.RS
.br
.nh
.nf
.BR probe " PROBEPOINT [" , " PROBEPOINT] " { " [STMT ...] " }
.hy
.fi
.RE 
.PP
Events are specified in a special syntax called "probe points".  One
family refers to specific points in a kernel, which are identified by
module, source file, line number, function name, C label name, or some
combination of these.  This kind of "synchronous" event is deemed to
occur when any processor executes an instruction matched by the
specification.  Other families of probe points refer to "asynchronous"
events such as timers/counters rolling over, where there is no fixed
execution point that is related.  Each probe point specification may
match multiple physical locations, all of which are then probed.  A
probe declaration may also contain several comma-separated
specifications, all of which are probed.
.PP
Here is a list of probe point families currently supported.  The
.B .function
variant places a probe near the beginning of the named function, so that
parameters are available as context variables.  The
.B .return
variant places a probe at the moment of return from the named function, so
the return value is available as the "$retvalue" context variable.
The
.B .statement
variant places a probe at the exact spot, exposing those local variables
that are visible there.
.RS
.nf
.br
kernel.function(PATTERN)
.br
kernel.function(PATTERN).return
.br
module(MPATTERN).function(PATTERN)
.br
module(MPATTERN).function(PATTERN).return
.br
kernel.statement(PATTERN)
.br
module(MPATTERN).statement(PATTERN)
.br
timer.jiffies(NUM)
.br
timer.jiffies(NUM).randomize(RAND)
.fi
.RE
.PP
In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest.  It may include "*" and
"?" wildcards.  PATTERN stands for a string literal that aims to
identify a point in the program.  It is made up of three parts.  The
first part is the name of a function, as would appear in the
.I nm
program's output.  This part may use the "*" and "?" wildcarding
operators to match multiple names.  The second part is optional, and
begins with the "@" character.  It is followed by a source file name
wildcard pattern, such as
.IR mm/slab* .
Finally, the third part is optional if the file name part was given,
and identifies the line number in the source file, preceded by a ":".
As an alternative, PATTERN may be a numeric constant, indicating an
(module-relative or kernel-absolute) address.
.PP
The timer-based asynchronous probe points run the given handler every
NUM jiffies.  If given, a random value in the range [-RAND..RAND] is
added to NUM every time the handler is run.
.PP
Here are some example probe points:
.TP
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in the name.
.TP
kernel.function("*@kernel/sched.c:240")
refers to any functions within the "kernel/sched.c" file that span
line 240.
.TP
module("usb*").function("*sync*").return
refers to the moment of return from all functions with "sync" in the
name in any of the USB drivers.
.TP
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled instructions
include the given address in the kernel.
.TP
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/- 200 jiffies.

.PP
When any matching event occurs, the probe handler is run within that
context.  For events that are defined by execution of specific parts
of code, this context may include variables defined in the source code
at that spot.  These "target variables" are presented to the script as
variables whose names are prefixed with "$".  They may be read/written
only if the kernel's compiler preserved them despite optimization.
This is the same constraint that a debugger user faces when working
with optimized code.  Asynchronous probes have very little context.
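.PP
As an illustration, the sketch below reads a target variable in a handler.
It assumes that the probed function has a parameter named "fd", as the
sys_read alias example in this page does, and that kernel debugging
information for it is installed.
.RS
.nf
probe kernel.function("sys_read") {
  if ($fd == 0)
    log ("read from fd 0")
}
.fi
.RE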
.PP
In addition, "probe aliases" may be defined.  A probe alias looks
similar to a probe definition, but instead of activating a probe at the
given point, it defines a new probe point name to alias an existing
one.  This is identified by the "=" assignment operator.  In addition,
the probe handler defined with an alias is implicitly added as a
prologue to any probe that refers to the alias.  For example:
.RS
.nf
.nh
probe syscall("read") = kernel.function("sys_read") {
  fildes = $fd
}
.hy
.fi
.RE
defines a new probe point
.nh
.IR syscall("read") ,
.hy
which expands to
.nh
.IR kernel.function("sys_read") ,
.hy
with the given assignment as a prologue.  Another probe definition
may use the alias like this:
.RS
.nf
probe syscall("read") {
  printk ("reading fd=" . string (fildes))
}
.fi
.RE

.SS FUNCTIONS
Systemtap scripts may define subroutines to factor out common work.
Functions take any number of scalar (integer or string) arguments, and
may return a single scalar (integer or string).  An example function
declaration looks like this:
.RS
.nf
function thisfn (arg1, arg2) {
   return arg1 + arg2
}
.fi
.RE
Note the usual absence of type declarations, which are instead
inferred by the translator.  Functions may call others or themselves
recursively, up to a fixed nesting limit.  This limit is defined by
a macro in the translated C code and is in the neighbourhood of 30.
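.PP
A recursive function might look like the following sketch.  The function
name "fact" and the probed function "foo" are placeholders; the argument is
kept small so that the recursion depth stays well under the nesting limit.
.RS
.nf
function fact (n) {
  return (n <= 1) ? 1 : n * fact (n \- 1)
}
probe kernel.function("foo") {
  log ("10! = " . string (fact (10)))
}
.fi
.RE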

.SS EMBEDDED C
When in guru mode, the translator accepts embedded code in the
script.  Such code is enclosed between
.IR %{
and
.IR %}
markers, and is transcribed verbatim, without analysis, in some
sequence, into the generated C code.  At the outermost level, this may
be useful to add
.IR #include
instructions, and any auxiliary definitions for use by other embedded
code.  
.PP
The other place where embedded code is permitted is as a function body.
In this case, the script language body is replaced entirely by a piece
of C code enclosed again between
.IR %{ " and " %}
markers.
This C code may do anything reasonable and safe.  There are a number
of undocumented but complex safety constraints on concurrency,
resource consumption, and runtime limits, so this is an advanced
technique.
.PP
The memory locations set aside for input and output values
are made available to it using a macro
.IR THIS .
Here are some examples:
.RS
.br
.nf
function add_one (val) %{
  THIS->__retvalue = THIS->val + 1;
%}
function add_one_str (val) %{
  strlcpy (THIS->__retvalue, THIS->val, MAXSTRINGLEN);
  strlcat (THIS->__retvalue, "one", MAXSTRINGLEN);
%}
.fi
.RE
The function argument and return value types have to be inferred by
the translator from the call sites in order for this to work.  The
user should examine C code generated for ordinary script-language
functions in order to write compatible embedded-C ones.
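.PP
The two placements of embedded code can be combined, as in the following
sketch, which must be run in guru mode.  The function name "current_pid" and
the probed function "foo" are hypothetical.  The sketch assumes that the
kernel header
.IR linux/sched.h
provides the "current" task pointer, and that the call site lets the
translator infer an integer return type.
.RS
.nf
%{
#include <linux/sched.h>
%}
function current_pid () %{
  THIS->__retvalue = current->pid;
%}
probe kernel.function("foo") {
  log ("pid=" . string (current_pid ()))
}
.fi
.RE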

.SS BUILT-INS
A set of builtin functions and probe aliases is provided by the
scripts installed under the
.nh
.IR /usr/share/systemtap/tapset
.hy
directory.

.SH PROCESSING
The translator begins pass 1 by parsing the given input script,
and all scripts (files named
.IR *.stp )
found in a tapset directory.  The directories listed
with
.BR \-I
are processed in sequence.  For each directory, a number of subdirectories
are also searched.  These subdirectories are derived from the selected
kernel version (the
.BR \-r
option),
in order to allow more kernel-version-specific scripts to override less
specific ones.  For example, for a kernel version
.IR 2.6.12-23.FC3
the following patterns would be searched, in sequence:
.IR 2.6.12-23.FC3/*.stp ,
.IR 2.6.12/*.stp ,
.IR 2.6/*.stp ,
and finally
.IR *.stp .
Stopping the translator after pass 1 causes it to print the parse trees.

.PP
In pass 2, the translator analyzes the input script to resolve symbols
and types.  References to variables, functions, and probe aliases that
are unresolved internally are satisfied by searching through the
parsed tapset scripts.  If any tapset script is selected because it
defines an unresolved symbol, then the entirety of that script is
added to the translator's resolution queue.  This process iterates
until all symbols are resolved and a subset of tapset scripts is
selected.
.PP
Next, all probe point descriptions are validated 
against the wide variety supported by the translator.  Probe points that
refer to code locations ("synchronous probe points") require the
appropriate kernel debugging information to be installed.  In the
associated probe handlers, target-side variables (whose names begin
with "$") are found and have their run-time locations decoded.
.PP
Finally, all variable, function, parameter, array, and index types are
inferred from context (literals and operators).  Stopping the
translator after pass 2 causes it to list all the probes, functions,
and variables, along with all inferred types.  Any inconsistent or
unresolved types cause an error.

.PP
In pass 3, the translator writes C code that represents the actions
of all selected script files, and creates a
.IR Makefile
to build that into a kernel object.  These files are placed into a
temporary directory.  Stopping the translator at this point causes
it to print the contents of the C file.

.PP
In pass 4, the translator invokes the Linux kernel build system to
create the actual kernel object file.  This involves running
.IR make
in the temporary directory, and requires a kernel module build
system (headers, config and Makefiles) to be installed in the usual
spot
.IR /lib/modules/VERSION/build .
Stopping the translator after pass 4 is the last chance before
running the kernel object.  This may be useful if you want to
archive the file.

.PP
In pass 5, the translator invokes the systemtap auxiliary program
.I stpd
for the given kernel object.  This program arranges to load
the module then communicates with it, copying trace data from the
kernel into temporary files, until the user sends an interrupt signal.
Any run-time error encountered by the probe handlers, such as running
out of memory, division by zero, exceeding nesting or runtime limits,
results in an error condition that prevents further probes from
running.  Finally, stpd unloads the module, and cleans up.

.SH EXAMPLES
To trace entry and exit from a function, use a pair of probes:
.RS
.br
.nf
probe kernel.function("foo") { log ("enter") }
probe kernel.function("foo").return { log ("exit") }
.fi
.RE

To list the probeable functions in the kernel, use
.RS
.br
.nf
stap -p2 -e 'probe kernel.function("*") {}'
.fi
.RE

.SH SAFETY AND SECURITY
Systemtap is an administrative tool.  It exposes kernel internal data
structures and potentially private user information.  It acquires root
privileges to actually run the kernel objects it builds using the
.IR sudo
command applied to the
.IR stpd
program.  The latter is a part of the Systemtap package, dedicated to
module loading and unloading (but only in the white zone), and
kernel-to-user data transfer.  Since 
.IR stpd
does not perform any additional security checks on the kernel objects
it is given, it would be unwise for a system administrator to give
even targeted
.IR sudo
privileges to untrusted users.
.PP
The translator asserts certain safety constraints.  It aims to ensure
that no handler routine can run for very long, allocate memory,
perform unsafe operations, or unintentionally interfere with the
kernel.  Use of guru mode constructs such as embedded C can violate
these constraints, leading to a kernel crash or data corruption.

.SH FILES
.\" consider autoconf-substituting these directories
.TP
/tmp/stapXXXXXX
Temporary directory for systemtap files, including translated C code
and kernel object.
.TP
/usr/share/systemtap/tapset 
The automatic tapset search directory, unless overridden by
the
.I SYSTEMTAP_TAPSET
environment variable.
.TP
/usr/share/systemtap/runtime
The runtime sources, unless overridden by the
.I SYSTEMTAP_RUNTIME
environment variable.
.TP
/lib/modules/VERSION/build
The location of kernel module building infrastructure.
.TP
/usr/lib/debug/lib/modules/VERSION
The location of kernel debugging information when packaged into the
.IR kernel-debuginfo
RPM.
.TP
/usr/libexec/systemtap/stpd
The auxiliary program supervising module loading, interaction, and
unloading.

.SH SEE ALSO
.IR dtrace (1),
.IR dprobes (1),
.IR awk (1),
.IR sudo (8),
.IR elfutils (3),
.IR gdb (1)

.SH BUGS
There are numerous missing features and possibly numerous bugs.  Report
them using the Bugzilla link on the project web page:
.nh
.BR http://sources.redhat.com/systemtap/ .
.hy

.SH AUTHORS
The
.IR stap
translator was written by Frank Ch. Eigler and Graydon Hoare.  The
kernel-side runtime library and the user-level
.IR stpd
daemon were written by Martin Hunt and Tom Zanussi.  Contact them
using the public mailing list:
.nh
.BR <systemtap@sources.redhat.com> .
.hy

.SH ACKNOWLEDGEMENTS
The script language design was inspired by Sun's 
.IR dtrace .
The primary probing mechanism uses IBM's
.IR kprobes
and
.IR relayfs
packages, which were improved and ported by IBM and Intel staff.
The elfutils library from Ulrich Drepper and Roland McGrath is used
to process DWARF debugging information.  Many project members contributed
to the overall design and priorities of the system, including Will Cohen,
Jim Keniston, Vara Prasad, and Brad Chen.