* What's new

- Miscellaneous new tapset functions:
  - sid() returns the session ID of the current process.
  - stringat() indexes a single character from a string.

- Using %M in print formats for hex dumps can now print entire buffers,
  instead of just small numbers.

- Dwarfless syscalls: The nd_syscalls tapset is now available to probe
  system calls without requiring kernel debugging information. All of
  the same probe points in the normal syscalls tapset are available with
  an "nd_" prefix, e.g. syscall.open becomes nd_syscall.open. Most
  syscall arguments are also available by name in nd_syscalls.

- Module signing: If the appropriate nss libraries are available on your
  system, stap will sign each compiled module using a self-generated
  certificate. This is the first step toward extending authority to
  load certain modules to unprivileged users. For now, if the system
  administrator adds a certificate to a database of trusted signers
  (stap-authorize-signing-cert), modules signed using that certificate
  will be verified by staprun against tampering. Otherwise, you should
  notice no difference in the operation of stap or staprun.

* What's new in version 0.9.7

- @cast can now determine its type information using an explicit header
  specification. For example:

    @cast(tv, "timeval", "<sys/time.h>")->tv_sec
    @cast(task, "task_struct", "kernel")->tgid

- The overlapping process.* tapsets are now separated. Those probe
  points documented in stapprobes(3stap) remain the same. Those that
  were formerly in stapprobes.process(3stap) have been renamed to
  kprocess, to reflect their kernel perspective on processes.

- The --skip-badvars option now also suppresses run-time error messages
  that would otherwise result from erroneous memory accesses. Such
  accesses can originate from $context expressions fueled by erroneous
  debug data, or by kernel_{long,string,...}() tapset calls.

- New probes kprobe.function(FUNCTION) and kprobe.function(FUNCTION).return
  for dwarfless probing.
  These postpone function address resolution to run time and use the
  kprobe symbol-resolution mechanism. Probing of absolute statements can
  be done using the kprobe.statement(ADDRESS).absolute construct.

- EXPERIMENTAL support for user process unwinding. A new collection of
  tapset functions has been added to handle user-space backtraces from
  probe points that support them (currently process and timer probes;
  for timer probes, first test whether the probe hit in user space with
  the already existing user_mode() function). The new tapset functions
  are:

    uaddr - User space address of current running task.
    usymname - Return the symbol of an address in the current task.
    usymdata - Return the symbol and module offset of an address.
    print_ustack - Print out stack for the current task from string.
    print_ubacktrace - Print stack back trace for current task.
    ubacktrace - Hex backtrace of current task stack.

  Please read http://sourceware.org/ml/systemtap/2009-q2/msg00364.html
  on the current restrictions and possible changes in the future, and
  give feedback if you want to influence future developments.

* What's new in version 0.9.5

- New probes process().insn and process().insn.block allow inspection of
  the process after each instruction or block of instructions executed.
  So to count the total number of instructions a process executes during
  a run, do something like:

    $ stap -e 'global steps; probe process("/bin/ls").insn {steps++}
      probe end {printf("Total instructions: %d\n", steps);}' \
      -c /bin/ls

  This feature can slow down execution of a process somewhat.

- Systemtap probes and function man pages extracted from the tapsets are
  now available under 3stap. To show the page for probe vm.pagefault or
  the stap function pexecname, do:

    $ man 3stap vm.pagefault
    $ man 3stap pexecname

- Kernel tracepoints are now supported for probing predefined kernel
  events without any debuginfo. Tracepoints incur less overhead than
  kprobes, and context parameters are available with full type
  information.
  Any kernel 2.6.28 and later should have defined tracepoints. Try the
  following to see what's available:

    $ stap -L 'kernel.trace("*")'

- Typecasting with @cast now supports module search paths, which is
  useful in case there are multiple places where the type definition may
  be found. For example:

    @cast(sdev, "scsi_device", "kernel:scsi_mod")->sdev_state

- On-file flight recorder is supported. It allows stap to record large
  trace logs on disk and to run in the background. Passing the -F option
  together with the -o option runs stap in background mode. In this
  mode, staprun is detached from the console, and stap itself shows
  staprun's pid and exits.

  The maximum size and the maximum number of log files can also be set
  by passing the -S option. This option has one or two arguments
  separated by a comma. The first argument is the maximum size of a log
  file in MB. If the size of a log file exceeds it, stap switches to the
  next log file automatically. The second is how many files are kept on
  disk. If the number of log files exceeds it, the oldest log file is
  removed automatically. The second argument can be omitted.

  For example, this records output to log files, each smaller than
  1024MB, keeps the last 3 logs, and runs in the background:

    % stap -F -o /tmp/staplog -S 1024,3 script.stp

- In guru mode (-g), the kernel probing blacklist is disabled, leaving
  only a subset (the kernel's own internal kprobe blacklist) to attempt
  to filter out areas unsafe to probe. The differences may be enough to
  probe more interrupt handlers.

- Variables unavailable in the current context may be skipped by setting
  a session-level flag with the new command line option --skip-badvars.
  This replaces any dwarf $variable expressions that could not be
  resolved with literal numeric zeros, along with a warning message.
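The -S rotation policy above can be illustrated with a small Python model. This is a hypothetical sketch, not stap's code: it identifies log files by a sequence index rather than by stap's real file names, and it assumes a new file is started just before a write would push the current one past the size limit (the text only says stap switches once the size is exceeded).

```python
# Hypothetical model of the "-S SIZE_MB[,COUNT]" rotation policy above.
# Not stap's implementation: files are identified by a sequence index, and
# switching happens before a write would exceed the size limit (assumption).


def rotate(write_sizes_mb, max_size_mb, max_files=None):
    """Return the retained log files as (index, size_mb) tuples.

    max_files=None models omitting the second -S argument (no count limit).
    """
    files = [[0, 0]]  # each entry: [file index, MB written so far]
    for size in write_sizes_mb:
        current = files[-1]
        # Start the next log file if this write would push a non-empty
        # file past the size limit.
        if current[1] > 0 and current[1] + size > max_size_mb:
            files.append([current[0] + 1, 0])
            # Keep at most max_files files; the oldest one is removed.
            if max_files is not None and len(files) > max_files:
                files.pop(0)
        files[-1][1] += size
    return [tuple(f) for f in files]


if __name__ == "__main__":
    print(rotate([400] * 7, 1024, 3))  # [(1, 800), (2, 800), (3, 400)]
```

With seven 400MB writes, a 1024MB limit and a count of 3, file 0 is dropped as soon as a fourth file is started, mirroring "-S 1024,3".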
- Both kernel markers and kernel tracepoints support argument listing
  through stap -L 'kernel.mark("*")' or stap -L 'kernel.trace("*")'.

- Users can use -DINTERRUPTIBLE=0 to prevent interrupt reentrancy in
  their script, at the cost of a bit more overhead to toggle the
  interrupt mask.

- Added reentrancy debugging. If stap is run with the arguments
  "-t -DDEBUG_REENTRANCY", additional warnings will be printed for every
  reentrancy event, including the probe points of the resident and
  interloper probes.

- configure now defaults to --disable-pie. Use --enable-pie to turn it
  back on.

- Improved sdt.h compatibility and test suite for static dtrace
  compatible user space markers.

- Some architectures now use syscall wrappers (HAVE_SYSCALL_WRAPPERS).
  The syscall tapset has been enhanced to take care of the syscall
  wrappers in this release.

- Security fix for CVE-2009-0784: stapusr module-path checking race.

* What's new in version 0.9

- Typecasting is now supported using the @cast operator. A script can
  define a pointer type for a "long" value, and then access type members
  using the same syntax as with $target variables. For example, this
  will retrieve the parent pid from a kernel task_struct:

    @cast(pointer, "task_struct", "kernel")->parent->pid

- process().mark() probes can now trace static user-space markers placed
  in programs with the STAP_PROBE macro, using the new sys/sdt.h include
  file. This also provides dtrace-compatible markers through
  DTRACE_PROBE and an associated python 'dtrace' script that can be used
  in builds based on dtrace that need dtrace -h or -G functionality.

- For those who really want to run stap from the build tree, there is
  now the 'run-stap' script in the top-level build directory that sets
  up the SYSTEMTAP_TAPSET, SYSTEMTAP_RUNTIME, SYSTEMTAP_STAPRUN, and
  SYSTEMTAP_STAPIO environment variables (installing systemtap, in a
  local prefix, is still recommended for common use).
- Systemtap now comes with a new Beginners Guide that walks the user
  through their first steps setting up stap, explains how it all works,
  introduces some useful scripts and describes some common pitfalls. It
  isn't created by default since it needs a Publican setup, but full
  build instructions can be found in the wiki:

    http://sourceware.org/systemtap/wiki/PublicanQuikHowto

  An online version can be found at:

    http://sourceware.org/systemtap/SystemTap_Beginners_Guide.pdf

- Standard tapsets included with Systemtap were modified to include
  extractable documentation information based on the kernel-doc
  infrastructure. When configured with --enable-docs, HTML and PDF
  versions of the Tapset Reference Manual are produced, explaining the
  probes defined in each tapset.

- The systemtap client and compile server are now available. These
  allow you to compile a systemtap module on a host other than the one
  on which it will be run, provided the client and server are
  compatible. Other than using a server for passes 1 through 4, the
  client behaves like the 'stap' front end itself. This means, among
  other things, that the client will automatically load the resulting
  module on the local host unless -p[1234] was specified. See
  stap-server(8) for more details. The client/server now use SSL for
  network connection security and for signing.

  The systemtap client and server are prototypes only. Interfaces,
  options and usage may change at any time.

- function("func").label("label") probes are now supported to allow
  matching the label of a function.

- A systemtap initscript is available. This initscript allows you to
  run systemtap scripts as system services (in flight recorder mode) and
  control those scripts individually. See README.initscript for details.

- The stap "-r DIR" option may be used to identify a hand-made kernel
  build directory. The tool determines the appropriate release string
  automatically from the directory.
- Serious problems associated with user-space probing in shared
  libraries were corrected, making it now possible to experiment with
  probing shared libraries. Assuming dwarf debugging information is
  installed, use this twist on the normal syntax:

    probe process("/lib64/libc-2.8.so").function("....") { ... }

  This would probe all threads that call into that library. Running
  "stap -c CMD" or "stap -x PID" naturally restricts this to the target
  command and its descendants only. $$vars etc. may be used.

- For scripts that sometimes terminate with excessive "skipped" probes,
  rerunning the script with "-t" (timing) will print more details about
  the reasons probes were skipped.

- Symbol tables and unwind (backtracing) data support were formerly
  compiled in for all probed modules as identified by the script
  (kernel; module("name"); process("file")) plus those listed by the
  stap "-d BINARY" option. Now, this data is included only if the
  systemtap script uses tapset functions like probefunc() or backtrace()
  that require such information. This shrinks the probe modules
  considerably for the rest.

- Per-pass verbosity control is available with the new "--vp {N}+"
  option. "stap --vp 040" adds 4 units of -v verbosity only to pass 2.
  This is useful for diagnosing errors from one pass without excessive
  verbosity from others.

- Most probe handlers now run with interrupts enabled, for improved
  system responsiveness and less probing overhead. This may result in
  more skipped probes, for example if a reentrant probe handler is
  attempted from within an interrupt handler. It may also make the
  systemtap overload detection facility more likely to be triggered, as
  interrupt handlers' run time would be included in the self-assessed
  overhead of running probe handlers.

* What's new in version 0.8

- Cache limiting is now available. If the compiled module cache size is
  over a limit specified in the $SYSTEMTAP_DIR/cache/cache_mb_limit
  file, some old cache entries will be unlinked. See man stap(1) for
  more.
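As a rough illustration of the cache limiting above, the following Python sketch picks entries to unlink, oldest first, until the cache fits under the limit. The oldest-first rule, the (name, size_mb, last_used) entry shape and the function name are all assumptions made for illustration; the text only says that "some old cache entries" are unlinked, so stap's actual selection may well differ.

```python
# Hypothetical oldest-first eviction for the cache limit described above.
# The real selection heuristic in stap may differ; entries are illustrative.


def entries_to_unlink(entries, limit_mb):
    """entries: list of (name, size_mb, last_used) tuples.

    Returns the names to unlink, oldest first, until the remaining total
    size is at or below limit_mb.
    """
    total = sum(size for _, size, _ in entries)
    victims = []
    for name, size, _ in sorted(entries, key=lambda e: e[2]):
        if total <= limit_mb:
            break
        victims.append(name)
        total -= size
    return victims


if __name__ == "__main__":
    cache = [("a", 40, 1), ("b", 30, 3), ("c", 50, 2)]
    print(entries_to_unlink(cache, 70))  # ['a', 'c']
```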
- Error and warning messages are now followed by source context
  displaying the erroneous line/s and a handy '^' in the following line
  pointing to the appropriate column.

- A bug reporting tool "stap-report" is now available which will
  quickly retrieve much of the information requested here:

    http://sourceware.org/systemtap/wiki/HowToReportBugs

- The translator can resolve members of anonymous structs / unions:
  given

    struct { int foo; struct { int bar; }; } *p;

  this now works:

    $p->bar

- The stap "-F" flag activates "flight recorder" mode, which consists of
  translating the given script as usual, but implicitly launching it
  into the background with staprun's existing "-L" (launch) option. A
  user can later reattach to the module with "staprun -A MODULENAME".

- Additional context variables are available on user-space syscall
  probes.
  - $argN ($arg1, $arg2, ... $arg6) in process(PATH_OR_PID).syscall
    gives you the argument of the system call.
  - $return in process(PATH_OR_PID).syscall.return gives you the return
    value of the system call.

- Target process mode (stap -c CMD or -x PID) now implicitly restricts
  all "process.*" probes to the given child process. (It does not
  affect kernel.* or other probe types.) The CMD string is normally run
  directly, rather than via a /bin/sh -c subshell, since then
  utrace/uprobe probes receive a fairly "clean" event stream. If
  metacharacters like redirection operators were present in CMD, then
  "sh -c CMD" is still used, and utrace/uprobe probes will receive
  events from the shell.

    % stap -e 'probe process.syscall, process.end {
               printf("%s %d %s\n", execname(), pid(), pp())}' \
           -c ls
    ls 2323 process.syscall
    ls 2323 process.syscall
    ls 2323 process.end

- Probe listing mode is improved: "-L" lists available script-level
  variables:

    % stap -L 'syscall.*open*'
    syscall.mq_open name:string name_uaddr:long filename:string mode:long u_attr_uaddr:long oflag:long argstr:string
    syscall.open name:string filename:string flags:long mode:long argstr:string
    syscall.openat name:string filename:string flags:long mode:long argstr:string

- All user-space-related probes support $PATH-resolved executable
  names, so

    probe process("ls").syscall {}
    probe process("./a.out").syscall {}

  work now, instead of just

    probe process("/bin/ls").syscall {}
    probe process("/my/directory/a.out").syscall {}

- Prototype symbolic user-space probing support:

    # stap -e 'probe process("ls").function("*").call {
               log (probefunc()." ".$$parms) }' \
           -c 'ls -l'

  This requires:
  - debugging information for the named program
  - a version of utrace in the kernel that is compatible with the
    "uprobes" kernel module prototype. This includes RHEL5 and older
    Fedora, but not yet current lkml-track utrace; a "pass 4a"-time
    build failure means your system cannot use this yet.

- Global variables which are written to but never read are now
  automatically displayed when the session does a shutdown. For
  example:

    global running_tasks
    probe timer.profile {running_tasks[pid(),tid()] = execname()}
    probe timer.ms(8000) {exit()}

- A formatted string representation of the variables, parameters, or
  local variables at a probe point is now supported via the special
  $$vars, $$parms, and $$locals context variables, which expand to a
  string containing a list "var1=0xdead var2=0xbeef var3=?". (Here,
  var3 exists but is for some reason unavailable.) In return probes
  only, $$return expands to an empty string for a void function, or
  "return=0xf00".
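To make the documented shape of $$vars and $$return concrete, here is a hypothetical Python model that builds the same kind of string from integer values. It only reproduces the format quoted above (NAME=0xHEX, NAME=? for an unavailable variable, an empty $$return for a void function); it assumes integer values and is not how the translator generates the expansion.

```python
# Hypothetical model of the documented $$vars / $$return string shapes.
# Assumes integer values; the real expansion happens in the stap translator.


def format_vars(variables):
    """variables: list of (name, value); value None means unavailable."""
    return " ".join(
        f"{name}=?" if value is None else f"{name}={value:#x}"
        for name, value in variables
    )


def format_return(value):
    """$$return model: empty for a void function (value None)."""
    return "" if value is None else f"return={value:#x}"


if __name__ == "__main__":
    print(format_vars([("var1", 0xDEAD), ("var2", 0xBEEF), ("var3", None)]))
    # var1=0xdead var2=0xbeef var3=?
    print(repr(format_return(None)), format_return(0xF00))
    # '' return=0xf00
```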

* What's new in version 0.7

- .statement("func@file:*") and .statement("func@file:M-N") probes are
  now supported to allow matching a range of lines in a function. This
  allows tracing the execution of a function.

- Scripts relying on probe point wildcards like "syscall.*" that expand
  to distinct kprobes are processed significantly faster than before.

- The vector of script command line arguments is available in a
  tapset-provided global array argv[]. It is indexed 1 ... argc,
  another global. This can substitute for the preprocessor directives
  @NNN that fail at parse time if there are not enough arguments.

    printf("argv: %s %s %s", argv[1], argv[2], argv[3])

- .statement("func@file+line") probes are now supported to allow a
  match relative to the entry of the function incremented by line
  number. This allows using the same systemtap script if the rest of the
  file.c source only changes slightly.

- A probe listing mode is available.

    % stap -l 'vm.*'
    vm.brk
    vm.mmap
    vm.munmap
    vm.oom_kill
    vm.pagefault
    vm.write_shared

- More user-space probe types are added:

    probe process(PID).begin { }
    probe process("PATH").begin { }
    probe process(PID).thread.begin { }
    probe process("PATH").thread.begin { }
    probe process(PID).end { }
    probe process("PATH").end { }
    probe process(PID).thread.end { }
    probe process("PATH").thread.end { }
    probe process(PID).syscall { }
    probe process("PATH").syscall { }
    probe process(PID).syscall.return { }
    probe process("PATH").syscall.return { }

- Globals now accept ; terminators

    global odds, evens;
    global little[10], big[5];

* What's new in version 0.6

- Copies of the systemtap tutorial and language reference guide are now
  included.

- There is a new format specifier, %m, for the printf family of
  functions. It functions like %s, except that it does not stop when a
  nul ('\0') byte is encountered. The number of bytes output is
  determined by the precision specifier. The default precision is 1.
  For example:

    printf ("%m", "My String")  // prints one character: M
    printf ("%.5m", myString)   // prints 5 bytes beginning at the start
                                // of myString

- The %b format specifier for the printf family of functions has been
  enhanced as follows:

  1) When the width and precision are both unspecified, the default is
     %8.8b.
  2) When only one of the width or precision is specified, the other
     defaults to the same value. For example, %4b == %.4b == %4.4b
  3) Nul ('\0') bytes are used for field width padding. For example,

    printf ("%b", 0x1111deadbeef2222)  // prints all eight bytes
    printf ("%4.2b", 0xdeadbeef)       // prints \0\0\xbe\xef

- Dynamic width and precision are now supported for all printf family
  format specifiers. For example:

    four = 4
    two = 2
    printf ("%*.*b", four, two, 0xdeadbeef)  // prints \0\0\xbe\xef
    printf ("%*d", four, two)                // prints "   2"

- Preprocessor conditional expressions can now include wildcard style
  matches on kernel versions.

    %( kernel_vr != "*xen" %? foo %: bar %)

- Prototype support for user-space probing is showing some progress. No
  symbolic notations are supported yet (so no probing by function names,
  file names, process names, and no access to $context variables), but
  at least it's something:

    probe process(PID).statement(ADDRESS).absolute { }

  This will set a uprobe on the given process-id and given virtual
  address. The probe handler runs in kernel-space as usual, and can
  generally use existing tapset functions.

- The crash utility can retrieve systemtap's relay buffer from a kernel
  dump image by using staplog, which is a crash extension module. To use
  this feature, type the following commands at crash(8)'s command line:

    crash> extend staplog.so
    crash> help systemtaplog

  The second command shows a more detailed help message.

- You can share a relay buffer among several scripts and merge outputs
  from several scripts by using the "-DRELAY_HOST" and "-DRELAY_GUEST"
  options.
  For example:

    # run a host script
    % stap -ve 'probe begin{}' -o merged.out -DRELAY_HOST &
    # wait until the host has started.
    % stap -ve 'probe begin{print("hello ");exit()}' -DRELAY_GUEST
    % stap -ve 'probe begin{print("world\n");exit()}' -DRELAY_GUEST

  Then, you'll see "hello world" in merged.out.

- You can add a conditional statement for each probe point or alias,
  which is evaluated when the probe point is hit. If the condition is
  false, the whole probe body (including aliases) is skipped. For
  example:

    global switch = 0;
    probe syscall.* if (switch) { ... }
    probe procfs.write {switch = strtol($value,10)} /* enable/disable ctrl */

- Systemtap will warn you if your script contains unused variables or
  functions. This is helpful in case of misspelled variables. If it doth
  protest too much, turn it off with "stap -w ...".

- You can add error-handling probes to a script, which are run if a
  script was stopped due to errors. In such a case, "end" probes are not
  run, but "error" ones are.

    probe error { println ("oops, errors encountered; here's a report anyway")
                  foreach (coin in mint) { println (coin) } }

- In a related twist, one may list probe points in order of preference,
  and mark any of them as "sufficient" beyond just "optional". Probe
  point sequence expansion stops if a sufficient-marked probe point has
  a hit. This is useful for probes on functions that may be in a module
  (CONFIG_FOO=m) or may have been compiled into the kernel
  (CONFIG_FOO=y), but we don't know which. Instead of

    probe module("sd").function("sd_init_command") ? ,
          kernel.function("sd_init_command") ? { ... }

  which might match neither, now one can write this:

    probe module("sd").function("sd_init_command") ! ,  /* <-- note excl. mark */
          kernel.function("sd_init_command") { ... }

- New security model. To install a systemtap kernel module, a user must
  be one of the following: the root user; a member of the 'stapdev'
  group; or a member of the 'stapusr' group.
  Members of the stapusr group can only use modules located in the
  /lib/modules/VERSION/systemtap directory (where VERSION is the output
  of "uname -r").

- .statement("...@file:line") probes now apply heuristics to allow an
  approximate match for the line number. This works similarly to gdb,
  where a breakpoint placed on an empty source line is automatically
  moved to the next statement. A silly bug that made many $target
  variables inaccessible to .statement() probes was also fixed.

- LKET has been retired. Please let us know on the systemtap mailing
  list if you have been a user of the tapset/tools, so we can help you
  find another way.

- New families of printing functions println() and printd() have been
  added. println() is like print() but adds a newline at the end;
  printd() is like a sequence of print()s, with a specified field
  delimiter.

* What's new since version 0.5.14?

- The way in which command line arguments for scripts are substituted
  has changed. Previously, $1 etc. would interpret the corresponding
  command line argument as a numeric literal, and @1 as a string
  literal. Now, the command line arguments are pasted uninterpreted
  wherever $1 etc. appears at the beginning of a token. @1 is similar,
  but is quoted as a string. This change does not modify old scripts,
  but has the effect of permitting substitution of arbitrary token
  sequences.

    # This worked before, and still does:
    % stap -e 'probe timer.s($1) {}' 5
    # Now this also works:
    % stap -e 'probe syscall.$1 {log(@1)}' open
    # This won't crash, just signal a recursion error:
    % stap -e '$1' '$1'
    # As before, $1... is recognized only at the beginning of a token
    % stap -e 'probe begin {foo$1=5}'

* What's new since version 0.5.13?

- The way in which systemtap resolves function/inline probes has changed:
    .function(...) - now refers to all functions, inlined or not
    .inline(...) - is deprecated, use instead:
    .function(...).inline - filters function() to only inlined instances
    .function(...).call - filters function() to only non-inlined instances
    .function(...).return - as before, but now pairs best with .function().call
    .statement() is unchanged.

* What's new since version 0.5.12?

- When running in -p4 (compile-only) mode, the compiled .ko file name is
  printed on standard output.

- An array element with a null value such as zero or an empty string is
  now preserved, and will show up in a "foreach" loop or "in" test. To
  delete such an element, the script needs to use an explicit
  "delete array[idx]" statement rather than something like
  "array[idx]=0".

- The new "-P" option controls whether prologue searching heuristics
  will be activated for function probes. This was needed to get correct
  debugging information (dwarf location list) data for $target
  variables. Modern compilers (gcc 4.1+) tend not to need this
  heuristic, so it is no longer default. A new configure flag
  (--enable-prologues) restores it as a default setting, and is
  appropriate for older compilers (gcc 3.*).

- Each systemtap module prints a one-line message to the kernel
  informational log when it starts. This line identifies the translator
  version, base address of the probe module, a broken-down memory
  consumption estimate, and the total number of probes. This is meant as
  a debugging / auditing aid.

- Begin/end probes are run with interrupts enabled (but with preemption
  disabled). This will allow begin/end probes to be longer, to support
  generating longer reports.

- The numeric forms of kernel.statement() and kernel.function() probe
  points are now interpreted as relocatable values, treated as relative
  to the _stext symbol in that kernel binary. Since some modern kernel
  images are relocated to a different virtual address at startup, such
  addresses may shift up or down when actually inserted into a running
  kernel.

    kernel.statement(0xdeadbeef): validated, interpreted relative to _stext,
                                  may map to 0xceadbeef at run time.

  In order to specify unrelocated addresses, use the new ".absolute"
  probe point suffix for such numeric addresses. These are only allowed
  in guru mode, and provide access to no $target variables. They don't
  use debugging information at all, actually.

    kernel.statement(0xfeedface).absolute: raw, unvalidated, guru mode only

* What's new since version 0.5.10?

- Offline processing of debugging information, enabling general
  cross-compilation of probe scripts to remote hosts, without requiring
  identical module/memory layout. This slows down
  compilation/translation somewhat.

- Kernel symbol table data is loaded by staprun at startup time rather
  than compiled into the module.

- Support the "limit" keyword for foreach iterations:

    foreach ([x,y] in ary limit 5) { ... }

  This implicitly exits after the fifth iteration. It also enables more
  efficient key/value sorting.

- Support the "maxactive" keyword for return probes:

    probe kernel.function("sdfsdf").return.maxactive(848) { ... }

  This allows up to 848 concurrently outstanding entries to the sdfsdf
  function before one returns. The default maxactive number is smaller,
  and can result in missed return probes.

- Support accessing of saved function arguments from within return
  probes. These values are saved by a synthesized function-entry probe.

- Add substantial version/architecture checking in compiled probes to
  assert correct installation of debugging information and correct
  execution on a compatible kernel.

- Add probe-time checking for sufficient free stack space when probe
  handlers are invoked, as a safety improvement.

- Add an optional numeric parameter for begin/end probe specifications,
  to order their execution.

    probe begin(10) { } /* comes after */ probe begin(-10) {}

- Add an optional array size declaration, which is handy for very small
  or very large ones.

    global little[5], big[20000]

- Include some example scripts along with the documentation.

- Change the start-time allocation of probe memory to avoid causing OOM
  situations, and to abort cleanly if free kernel memory is short.

- Automatically use the kernel DWARF unwinder, if present, for stack
  tracebacks.

- Many minor bug fixes, performance, tapset, and error message
  improvements.
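The begin/end ordering rule above can be sketched in Python as a stable sort on the numeric parameter, lowest first. Treating an omitted parameter as 0 and keeping script order among equal values are assumptions of this sketch, not statements from the text above.

```python
# Hypothetical model of begin/end ordering by the optional numeric parameter.
# Assumptions: an omitted parameter counts as 0, and equal values keep script
# order (Python's sorted() is stable).


def execution_order(probes):
    """probes: list of (label, priority) pairs in script order."""
    return [label for label, _ in sorted(probes, key=lambda p: p[1])]


if __name__ == "__main__":
    script = [("begin(10)", 10), ("begin", 0), ("begin(-10)", -10)]
    print(execution_order(script))  # ['begin(-10)', 'begin', 'begin(10)']
```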