.\" -*- nroff -*-
.TH STAPPROBES 5 @DATE@ "Red Hat"
.SH NAME
stapprobes \- systemtap probe points

.\" macros
.de SAMPLE
.br
.RS
.nf
.nh
..
.de ESAMPLE
.hy
.fi
.RE
..

.SH DESCRIPTION
The following sections enumerate the variety of probe points supported
by the systemtap translator, and additional aliases defined by
standard tapset scripts.
.PP
The general probe point syntax is a dotted-symbol sequence.  This
allows a breakdown of the event namespace into parts, somewhat like
the Domain Name System does on the Internet.  Each component
identifier may be parametrized by a string or number literal, with a
syntax like a function call.  A component may include a "*"
character, to expand to other matching probe points.  A probe point
may be followed by a "?" character, to indicate that it is optional,
and that no error should result if it fails to expand.  Optionality
passes down through all levels of alias/wildcard expansion.

These are all syntactically valid probe points:
.SAMPLE
kernel.function("foo").return
syscall(22)
user.inode("/bin/vi").statement(0x2222)
end
kernel.syscall.*
kernel.function("no_such_function") ?
.ESAMPLE

Probes may be broadly classified into "synchronous" and
"asynchronous".  A "synchronous" event is deemed to occur when any
processor executes an instruction matched by the specification.  This
gives these probes a reference point (instruction address) from which
more contextual data may be available.  Other families of probe points
refer to "asynchronous" events such as timers/counters rolling over,
where there is no fixed reference point that is related.  Each probe
point specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed.  A probe
declaration may also contain several comma-separated specifications,
all of which are probed.
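.PP
As a minimal sketch of the syntax described above, one probe declaration
can list several comma-separated probe points and mark one of them as
optional.  The function names are placeholders taken from the sample
list, and
.IR probefunc ()
is assumed to be available from the standard tapsets.
.SAMPLE
# one handler shared by every location matched below
probe kernel.function("foo"),
      kernel.function("no_such_function") ?
{
  printf("hit %s\en", probefunc())
}
.ESAMPLE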

.SS BEGIN/END

The probe points
.IR begin " and " end
are defined by the translator to refer to the time of session startup
and shutdown.  All "begin" probe handlers are run, in some sequence,
during the startup of the session.  All global variables will have
been initialized prior to this point.  All "end" probes are run, in
some sequence, during the
.I normal
shutdown of a session, such as in the aftermath of an
.I exit ()
function call, or an interruption from the user.  In the case of an
error-triggered shutdown, "end" probes are not run.  There are no
target variables available in either context.
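.PP
The following sketch shows one way the two probe points might frame a
session.  It relies on the
.I timer.ms
probe point described under TIMERS to request a normal shutdown; the
variable name and the five-tick limit are arbitrary.
.SAMPLE
global ticks

probe begin { log("session starting") }

# ask for a normal shutdown after five one-second ticks
probe timer.ms(1000) { if (++ticks >= 5) exit() }

# runs because exit() leads to a normal shutdown
probe end { printf("saw %d ticks\en", ticks) }
.ESAMPLE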

.SS NEVER
The probe point
.IR never
is specially defined by the translator to mean "never".  Its probe
handler is never run, though its statements are analyzed for symbol /
type correctness as usual.  This probe point may be useful in
conjunction with optional probes.
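.PP
One possible use, sketched here on the assumption that a probe
declaration must resolve to at least one probe point, is to list
.I never
beside an optional probe point so that the declaration stays valid when
the optional point does not expand.  The function name is hypothetical.
.SAMPLE
# "maybe_missing" is a hypothetical function name
probe kernel.function("maybe_missing") ?, never
{
  log("maybe_missing was called")
}
.ESAMPLE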

.SS TIMERS

Intervals defined by the standard kernel "jiffies" timer may be used
to trigger probe handlers asynchronously.  Two probe point variants
are supported by the translator:
.SAMPLE
timer.jiffies(N)
timer.jiffies(N).randomize(M)
.ESAMPLE
The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms).  If the "randomize" component is
given, a linearly distributed random value in the range [\-M..+M] is
added to N every time the handler is run.  N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N.  There are no target variables provided in either
context.  It is possible for such probes to be run concurrently on
a multi-processor computer.
.PP
Alternatively, intervals may be specified in units of milliseconds.
There are two probe point variants similar to the jiffies timer:
.SAMPLE
timer.ms(N)
timer.ms(N).randomize(M)
.ESAMPLE
Here, N and M are specified in milliseconds.  The probe intervals will be
rounded up to the nearest jiffies interval for the actual timer.  If the
"randomize" component is given, then the random value will be added to the
interval before the conversion to jiffies.
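.PP
For illustration, a randomized millisecond timer might be written as
follows.  The interval values are arbitrary, and
.IR cpu ()
is assumed to be provided by the standard tapsets.
.SAMPLE
# roughly every 2 seconds, give or take up to half a second
probe timer.ms(2000).randomize(500)
{
  printf("tick on cpu %d\en", cpu())
}
.ESAMPLE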
.PP
Profiling timers are also available to provide probes that execute on all
CPUs at the rate of the system tick.  This probe takes no parameters.
.SAMPLE
timer.profile
.ESAMPLE
Full context information of the interrupted process is available, making
this probe suitable for a time-based sampling profiler.
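.PP
A rough sketch of such a sampling profiler is shown below.  It counts
samples per process name and stops after ten seconds; the array name
and the duration are arbitrary, and
.IR execname ()
is assumed to be provided by the standard tapsets.
.SAMPLE
global samples

# one sample per system tick on each CPU
probe timer.profile { samples[execname()]++ }

# stop after ten seconds
probe timer.ms(10000) { exit() }

probe end
{
  foreach (name in samples)
    printf("%s %d\en", name, samples[name])
}
.ESAMPLE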

.SS DWARF

This family of probe points uses symbolic debugging information for
the target kernel/module/program, as may be found in unstripped
executables, or the separate
.I debuginfo
packages.  They allow placement of probes logically into the execution
path of the target program, by specifying a set of points in the
source or object code.  When a matching statement executes on any
processor, the probe handler is run in that context.
.PP
Points in a kernel are identified by
module, source file, line number, function name, C label name, or some
combination of these.
.PP
Here is a list of probe point families currently supported.  The
.B .function
variant places a probe near the beginning of the named function, so that
parameters are available as context variables.  The
.B .return
variant places a probe at the moment of return from the named function, so
the return value is available as the "$return" context variable. 
The
.B .inline
variant is similar to 
.B .function
but probes inline functions. Inline functions do not have an identifiable
return point, so 
.B .return
is not supported on 
.B .inline
probes. The
.B .statement
variant places a probe at the exact spot, exposing those local variables
that are visible there.
.SAMPLE
kernel.function(PATTERN)
.br
kernel.function(PATTERN).return
.br
kernel.inline(PATTERN)
.br
module(MPATTERN).function(PATTERN)
.br
module(MPATTERN).function(PATTERN).return
.br
module(MPATTERN).inline(PATTERN)
.br
kernel.statement(PATTERN)
.br
module(MPATTERN).statement(PATTERN)
.ESAMPLE
In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest.  It may include "*", "[]",
and "?" wildcards.  PATTERN stands for a string literal that
aims to identify a point in the program.  It is made up of three
parts.  The first part is the name of a function, as would appear in
the
.I nm
program's output.  This part may use the "*" and "?" wildcarding
operators to match multiple names.  The second part is optional, and
begins with the "@" character.  It is followed by a source file name
wildcard pattern, such as
.IR mm/slab* .
Finally, the third part is optional if the file name part was given,
and identifies the line number in the source file, preceded by a ":".
As an alternative, PATTERN may be a numeric constant, indicating an
(module-relative or kernel-absolute) address.
.PP
Some of the source-level variables, such as function parameters,
locals, and globals visible in the compilation unit, may be visible to
probe handlers.  They may refer to these variables by prefixing their
name with "$" within the scripts.  In addition, a special syntax
allows limited traversal of structures, pointers, and arrays.
.TP
$var
refers to an in-scope variable "var".  If it's an integer-like type,
it will be cast to a 64-bit int for systemtap script use.  String-like
pointers (char *) may be copied to systemtap string values using the
.IR kernel_string " or " user_string
functions.
.TP
$var\->field
traversal to a structure's field.  The indirection operator
may be repeated to follow more levels of pointers.
.TP
$var[N]
indexes into an array.  The index is given with a
literal number.
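.PP
The sketch below shows how these forms might be combined in a handler.
It assumes that the probed kernel has a function named
.I vfs_read
with parameters named
.IR file " and " count ,
and that debugging information for it is installed; other kernels may
need different names.
.SAMPLE
# entry probe: parameters are readable as $name
probe kernel.function("vfs_read")
{
  printf("%s: read of %d bytes, f_flags=%d\en",
         execname(), $count, $file\->f_flags)
}

# return probe: the return value is readable as $return
probe kernel.function("vfs_read").return
{
  printf("vfs_read returned %d\en", $return)
}
.ESAMPLE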

.SS MARKERS

This family of probe points hooks up to static probing markers
inserted into the kernel or modules.  These markers are special macro
calls inserted by kernel developers to make probing faster and more
reliable than with DWARF-based probes.  Further, DWARF debugging
information is 
.I not
required to probe markers.

Marker probe points begin with 
.BR kernel " or " module("name") ,
just like DWARF probes.  This identifies the source of symbol table
used for finding markers.  The next part names the marker itself:
.BR mark("name") .
The marker name string, which may contain the usual wildcard characters,
is matched against the names given to the marker macros when the kernel
or module was compiled.

The handler associated with a marker-based probe may read the
optional parameters specified at the macro call site.  These are
named
.BR $arg1 " through " $argNN ,
where NN is the number of parameters supplied by the macro.  Number
and string parameters are passed in a type-safe manner.
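.PP
As a small sketch, a handler for a marker named "getuid" might read its
first parameter as shown here.  Whether such a marker exists, and
whether its first parameter is numeric, depends on the kernel being
probed; both are assumptions.
.SAMPLE
probe kernel.mark("getuid")
{
  # assumes the marker passes at least one numeric parameter
  printf("getuid marker: %d\en", $arg1)
}
.ESAMPLE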

.SS PERFORMANCE MONITORING HARDWARE

The perfmon family of probe points is used to access the performance
monitoring hardware available in modern processors.  This family of
probe points needs perfmon2 support in the kernel to access the
performance monitoring hardware.
.PP
Performance monitoring hardware probe points begin with
.BR perfmon .
The next part names the event being counted:
.BR counter("event") .
The event names are processor implementation specific, with the
exception of the generic
.BR cycles " and " instructions
events, which are available on all processors.  This sets up a counter
on the processor to count the number of events occurring on the
processor.  For more details on the performance monitoring events
available on a specific processor, use the perfmon2 command:
.SAMPLE
pfmon -l
.ESAMPLE
.TP
$counter
is a handle used in the body of the probe for operations
involving the counter associated with the probe.
.TP
read_counter
is a function that is passed the handle for the perfmon probe and returns
the current count for the event.


.SS IO SCHEDULER

This family of probe points is used to probe the IO scheduler activities.
It contains the following probe points:

.P
.TP 
.B ioscheduler.elv_next_request
Fires when a request is retrieved from the request queue

.B Arguments:

.I elevator_name
  The name of the elevator

.I disk_major
  The major number of the disk

.I disk_minor
  The minor number of the disk

.P
.TP 
.B ioscheduler.elv_add_request
Fires when a request is added to the request queue

.B Arguments:

.I elevator_name
  The name of the elevator

.I disk_major
  The major number of the disk

.I disk_minor
  The minor number of the disk

.P
.TP 
.B ioscheduler.elv_completed_request
Fires when a request is completed

.B Arguments:

.I elevator_name
  The name of the elevator

.I disk_major
  The major number of the disk

.I disk_minor
  The minor number of the disk
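.PP
As an illustrative sketch, the arguments listed above could be used to
count added requests per elevator and disk.  The array and loop
variable names are arbitrary.
.SAMPLE
global added

probe ioscheduler.elv_add_request
{
  added[elevator_name, disk_major, disk_minor]++
}

# print the totals at normal shutdown
probe end
{
  foreach ([elv, major, minor] in added)
    printf("%s %d:%d %d requests\en",
           elv, major, minor, added[elv, major, minor])
}
.ESAMPLE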

.SS SCSI

This family of probe points is used to probe the SCSI activities.
It contains the following probe points:

.P
.TP 
.B scsi.ioentry
Fires when the SCSI mid layer prepares a SCSI request

.B Arguments:

.I disk_major
  The major number of the disk

.I disk_minor
  The minor number of the disk

.I device_state
  The current state of the device. The possible values could be:

        SDEV_CREATED = 1,    /* device created but not added to sysfs
                              * Only internal commands allowed (for inq) */
        SDEV_RUNNING = 2,    /* device properly configured
                              * All commands allowed */
        SDEV_CANCEL = 3,     /* beginning to delete device
                              * Only error handler commands allowed */
        SDEV_DEL = 4,        /* device deleted
                              * no commands allowed */
        SDEV_QUIESCE = 5,    /* Device quiescent.  No block commands
                              * will be accepted, only specials (which
                              * originate in the mid-layer) */
        SDEV_OFFLINE = 6,    /* Device offlined (by error handling or
                              * user request) */
        SDEV_BLOCK = 7,      /* Device blocked by scsi lld.  No scsi
                              * commands from user or midlayer should be issued
                              * to the scsi lld. */

.P
.TP 
.B scsi.iodispatching
Fires when the SCSI mid layer dispatches a SCSI command to the low level driver

.B Arguments:

.I host_no
  The host number

.I channel
  The channel number

.I lun
  The lun number

.I dev_id
  The scsi device id

.I device_state
  The current state of the device. 

.I data_direction
  The data_direction specifies whether this command is from/to the device.
  The possible values could be:

        DMA_BIDIRECTIONAL = 0,
        DMA_TO_DEVICE = 1,
        DMA_FROM_DEVICE = 2,
        DMA_NONE = 3,

.I request_buffer
  The request buffer address

.I req_bufflen
  The request buffer length

.P
.TP 
.B scsi.iodone
Fires when a SCSI command is done by the low level driver and enqueued into the done queue.

.B Arguments:

.I host_no
  The host number

.I channel
  The channel number

.I lun
  The lun number

.I dev_id
  The scsi device id

.I device_state
  The current state of the device

.I data_direction
  The data_direction specifies whether this command is from/to the device.

.P
.TP 
.B scsi.iocompleted
Fires when the SCSI mid layer runs the completion processing for
block device I/O requests

.B Arguments:

.I host_no
  The host number

.I channel
  The channel number

.I lun
  The lun number

.I dev_id
  The scsi device id

.I device_state
  The current state of the device

.I data_direction
  The data_direction specifies whether this command is from/to the device.

.I goodbytes
  The bytes completed.
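.PP
A minimal sketch of a handler that reports completed SCSI requests,
using only the arguments listed above, might look like this.
.SAMPLE
probe scsi.iocompleted
{
  # data_direction is printed as its numeric value
  printf("host %d channel %d lun %d id %d: %d bytes, direction %d\en",
         host_no, channel, lun, dev_id, goodbytes, data_direction)
}
.ESAMPLE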


.SS NETWORK DEVICE

This family of probe points is used to probe the activities of network
devices.
It contains the following probe points:

.P
.TP 
.B netdev.receive
Fires when data arrives on a network device

.B Arguments:

.I dev_name
  The name of the device, e.g. eth0, ath1

.I length
  The length of the receiving buffer

.I protocol
  The possible values of protocol could be:
     0800    IP
     8100    802.1Q VLAN
     0001    802.3
     0002    AX.25
     0004    802.2
     8035    RARP
     0005    SNAP
     0805    X.25
     0806    ARP
     8137    IPX
     0009    Localtalk
     86DD    IPv6

.I truesize
  The size of the received data

.P
.TP 
.B netdev.transmit
Fires when the network device wants to transmit a buffer

.B Arguments:

.I dev_name
  The name of the device, e.g. eth0, ath1

.I length
  The length of the transmit buffer

.I protocol
  The protocol of this packet.

.I truesize
  The size of the data to be transmitted.
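.PP
For illustration, the sketch below sums the
.I length
argument of received buffers per device and prints the totals at
shutdown.  The array name is arbitrary.
.SAMPLE
global rx_bytes

probe netdev.receive { rx_bytes[dev_name] += length }

probe end
{
  foreach (dev in rx_bytes)
    printf("%s received %d bytes\en", dev, rx_bytes[dev])
}
.ESAMPLE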

.SS PAGE FAULT

This family of probe points is used to probe page fault events.
It contains the following probe points:

.P
.TP 
.B vm.pagefault
Fires when there is a pagefault

.B Arguments:

.I address
  The address that caused this page fault.

.I write_access
  1 means this is a write access and 0 means this is a read access
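.PP
As a rough sketch, page faults could be counted per process name and
access type as follows.  The array name is arbitrary, and
.IR execname ()
is assumed to be provided by the standard tapsets.
.SAMPLE
global faults

probe vm.pagefault
{
  faults[execname(), write_access]++
}

probe end
{
  foreach ([name, wr] in faults)
    printf("%s %s %d\en", name, wr ? "write" : "read",
           faults[name, wr])
}
.ESAMPLE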

.SS PROCESS

This family of probe points is used to probe the process activities.
It contains the following probe points:

.P
.TP 
.B process.create

Fires whenever a new process is successfully created, either as a 
result of one of the fork syscall variants, or a new kernel thread.

.B Arguments:

.I task
  a handle to the newly created process

.I new_pid
  pid of the newly created process

.P
.TP 
.B process.start

Fires immediately before a new process begins execution.

.B Arguments:

.I N/A

.P
.TP
.B process.exec

Fires whenever a process attempts to exec to a new program

.B Arguments:

.I filename
  the path to the new executable

.P
.TP
.B process.exec_complete

Fires at the completion of an exec call

.B Arguments:

.I errno
  the error number resulting from the exec

.I success
  a boolean indicating whether the exec was successful

.P
.TP
.B process.exit

Fires when a process terminates. This will always be followed by a
process.release, though the latter may be delayed if the process 
waits in a zombie state.

.B Arguments:

.I code
  the exit code of the process

.P
.TP
.B process.release

Fires when a process is released from the kernel. This always 
follows a process.exit, though it may be delayed somewhat if the 
process waits in a zombie state.

.B Arguments:

.I task
  a task handle to the process being released

.I pid
  pid of the process being released
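.PP
The following sketch traces exec attempts and process exits using the
arguments listed above.
.IR execname ()
and
.IR pid ()
are assumed to be provided by the standard tapsets.
.SAMPLE
probe process.exec
{
  printf("%s (pid %d) exec %s\en", execname(), pid(), filename)
}

probe process.exec_complete
{
  if (!success)
    printf("pid %d: exec failed, errno %d\en", pid(), errno)
}

probe process.exit
{
  printf("pid %d exited with code %d\en", pid(), code)
}
.ESAMPLE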

.SS TCP

This family of probe points is used to probe TCP layer activities.
It contains the following probe points:

.P
.TP
.B tcp.sendmsg

Fires whenever a TCP message is sent

.B Arguments:

.I sock
  network socket

.I size
  number of bytes to send

.P
.TP
.B tcp.sendmsg.return

Fires whenever sending a message is done

.B Arguments:

.I size
  number of bytes sent

.P
.TP
.B tcp.recvmsg

Fires whenever a message is received

.B Arguments:

.I sock
  network socket

.I size
  number of bytes to be received

.P
.TP
.B tcp.recvmsg.return

Fires whenever message receiving is done

.B Arguments:

.I size
  number of bytes received

.P
.TP
.B tcp.disconnect

Fires whenever a TCP connection is disconnected

.B Arguments:

.I sock
  network socket

.I flags
  TCP flags (e.g. FIN, etc)

.P
.TP
.B tcp.disconnect.return

Fires when returning from tcp.disconnect

.B Arguments:

.I ret
  error code (0: no error)
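.PP
As an illustrative sketch, the bytes sent over TCP could be totalled
per process name.  The check for a positive
.I size
assumes that a failed send reports a negative value, which is not
stated above.
.SAMPLE
global sent

probe tcp.sendmsg.return
{
  # assumption: size is negative when the send failed
  if (size > 0)
    sent[execname()] += size
}

probe end
{
  foreach (name in sent)
    printf("%s sent %d bytes\en", name, sent[name])
}
.ESAMPLE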


.SS UDP

This family of probe points is used to probe UDP layer activities.
It contains the following probe points:

.P
.TP
.B udp.sendmsg

Fires whenever a UDP message is sent

.B Arguments:

.I sock
  network socket

.I size
  number of bytes to send

.P
.TP
.B udp.sendmsg.return

Fires whenever sending a message is done

.B Arguments:

.I size
  number of bytes sent

.P
.TP
.B udp.recvmsg

Fires whenever a message is received

.B Arguments:

.I sock
  network socket

.I size
  number of bytes to be received

.P
.TP
.B udp.recvmsg.return

Fires whenever message receiving is done

.B Arguments:

.I size
  number of bytes received

.P
.TP
.B udp.disconnect

Fires whenever a UDP socket is disconnected

.B Arguments:

.I sock
  network socket

.I flags
  flags (e.g. FIN, etc)

.P
.TP
.B udp.disconnect.return

Fires when returning from udp.disconnect

.B Arguments:

.I ret
  error code (0: no error)
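.PP
A sketch using a statistics aggregate to summarize received UDP
messages is shown below.  The variable name is arbitrary, and the check
for a positive
.I size
assumes that a failed receive reports a negative value.
.SAMPLE
global rx

probe udp.recvmsg.return
{
  # assumption: size is negative when the receive failed
  if (size > 0)
    rx <<< size
}

probe end
{
  # skip the summary when nothing was recorded
  if (@count(rx))
    printf("%d messages, %d bytes, average %d\en",
           @count(rx), @sum(rx), @avg(rx))
}
.ESAMPLE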

.SS SIGNAL

This family of probe points is used to probe signal activities.
It contains the following probe points:

.P
.TP
.B signal.send

Fires when a signal is sent to a process

.B Arguments:

.I sig
  signal number

.I sig_name
  a string representation of the signal

.I sig_pid
  pid of the signal recipient process

.I pid_name
  name of the signal recipient process

.I si_code
  indicates the signal type

.I task
  a task handle to the signal recipient

.I sinfo
  the address of siginfo struct

.I shared
  indicates whether this signal is shared by the thread group

.I send2queue
  indicates whether this signal is sent to an existing sigqueue

.I name
  name of the function used to send out this signal

.P
.TP
.B signal.send.return

Fires when returning from sending a signal

.B Arguments:

.I retstr
  the return value

  Return values for "__group_send_sig_info" and "specific_send_sig_info"

.RS
.RS
- return 0 if the signal is successfully sent to a process,
which means the following:

<1> the signal is ignored by receiving process

<2> this is a non-RT signal and we already have one queued

<3> the signal is successfully added into the sigqueue of receiving process

- return -EAGAIN if the sigqueue overflowed, the signal was RT, and it
was sent by a user using something other than kill()
.RE

  Return values for "send_group_sigqueue"

.RS
- return 0 if the signal is either successfully added into the
sigqueue of receiving process or a SI_TIMER entry is already
queued so just increment the overrun count

- return 1 if this signal is ignored by receiving process
.RE

  Return values for "send_sigqueue"

.RS
- return 0 if the signal is either successfully added into the
sigqueue of receiving process or a SI_TIMER entry is already
queued so just increment the overrun count

- return 1 if this signal is ignored by receiving process

- return -1 if the task is marked exiting, so posix_timer_event
can redirect it to the group leader
.RE

.I shared
  indicates whether this signal is shared by the thread group

.I send2queue
  indicates whether this signal is sent to an existing sigqueue

.I name
  name of the function used to send out this signal


.RE
.RE
.P
.TP
.B signal.checkperm

Fires when permissions are checked for sending the signal

.B Arguments:

.I sig
  the number of the signal

.I sig_name
  a string representation of the signal

.I sig_pid
  pid of the signal recipient process

.I pid_name
  name of the signal recipient process

.I si_code
  indicates the signal type

.I task
  a task handle to the signal recipient

.I sinfo
  the address of siginfo struct

.I name
  name of the probe point, is set to "signal.checkperm"

.P
.TP
.B signal.checkperm.return

Fires when returning from the permissions check for sending a signal

.B Arguments:

.I retstr
  the return value

.I name
  name of the probe point, is set to "signal.checkperm"

.P
.TP
.B signal.wakeup

Fires when the process is woken up for new active signals

.B Arguments:

.I sig_pid
  pid of the process to be woken up

.I pid_name
  name of the process to be woken up

.I resume
  indicates whether to wake up a task in STOPPED or TRACED state

.I state_mask
  a string representation of the mask of task states
that can be woken.  Possible values are
(TASK_INTERRUPTIBLE|TASK_STOPPED|TASK_TRACED) and
TASK_INTERRUPTIBLE.

.P
.TP
.B signal.check_ignored

Fires when checking whether the signal is ignored or not

.B Arguments:

.I sig_pid
  pid of the signal recipient process

.I pid_name
  name of the signal recipient process

.I sig
  the signal to be checked

.I sig_name
  name of the signal

.P
.TP
.B signal.check_ignored.return

Fires when returning from signal.check_ignored

.B Arguments:

.I retstr
  return value.  0 indicates that the current signal isn't ignored.

.P
.TP
.B signal.force_segv

Fires when SIGSEGV is forced because of a problem while handling
signals for the process

.B Arguments:

.I sig_pid
  pid of the signal recipient process

.I pid_name
  name of the signal recipient process

.I sig
  the signal being handled

.I sig_name
  name of this signal

.P
.TP
.B signal.force_segv.return

Fires when returning from signal.force_segv

.B Arguments:

.I retstr
  return value.  Always returns 0

.P
.TP
.B signal.send_sig_queue

Fires when a signal is queued to a process

.B Arguments:

.I sig
  the signal to be queued

.I sig_name
  name of this signal

.I sig_pid
  pid of the process to which the signal is queued

.I pid_name
  name of the process to which the signal is queued

.I sigqueue_addr
  address of the signal queue

.P
.TP
.B signal.send_sig_queue.return

Fires when returning from signal.send_sig_queue

.B Arguments:

.I retstr
  return value

.P
.TP
.B signal.pending

Fires when examining the set of signals that are
pending for delivery to the calling thread

.B Arguments:

.I sigset_add
  address of user space sigset_t

.I sigset_size
  sigset size

.P
.TP
.B signal.pending.return

Fires when returning from signal.pending

.B Arguments:

.I retstr
  return value

.P
.TP
.B signal.handle

Fires when invoking the signal handler

.B Arguments:

.I sig
  signal number

.I sig_name
  signal name

.I sinfo
  address of siginfo struct

.I sig_code
  the si_code of siginfo

.I ka_addr
  Address of the k_sigaction struct associated with the signal

.I oldset_addr
  Address of a bit mask array of blocked signals

.I sig_mode
  indicates whether the signal is a User Mode or Kernel mode Signal

.P
.TP
.B signal.handle.return

Fires when returning from signal.handle

.B Arguments:

.I retstr
  return value of handle_signal()

.P
.TP
.B signal.do_action

Fires when a thread examines or changes a signal action
 
.B Arguments:

.I sig
  signal number

.I sigact_addr
  address of the new sigaction struct associated with the signal

.I oldsigact_addr
  address of a previous sigaction struct associated with the signal

.I sa_handler
  the new handler of the signal

.I sa_mask
  the new mask of the signal

.P
.TP
.B signal.do_action.return

Fires when returning from signal.do_action

.B Arguments:

.I retstr
  return value of do_sigaction()

.P
.TP
.B signal.procmask

Fires when a thread examines or changes its blocked signals

.B Arguments:

.I how
  indicates how to change the blocked signals. 
  Possible values are:
    SIG_BLOCK=0 for blocking signals
    SIG_UNBLOCK=1 for unblocking signals
    SIG_SETMASK=2 for setting the signal mask

.I sigset_addr
  address of sigset_t to be set

.I oldsigset_addr
  address of the old sigset_t

.I sigset
  the actual sigset to be set

.P
.TP
.B signal.procmask.return

Fires when returning from signal.procmask

.B Arguments:

.I retstr
  return value of sigprocmask()

.P
.TP
.B signal.flush

Fires when all pending signals for a task are flushed

.B Arguments:

.I task
  the task handle of the process

.I sig_pid
  pid of the task

.I pid_name
  name of the task
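.PP
As a minimal sketch, signal deliveries could be traced with the
arguments listed for
.BR signal.send .
.IR execname ()
and
.IR pid ()
are assumed to be provided by the standard tapsets and describe the
context in which the probe fires, which may not always be the logical
sender.
.SAMPLE
probe signal.send
{
  printf("%s (pid %d) sends %s (%d) to %s (pid %d)\en",
         execname(), pid(), sig_name, sig, pid_name, sig_pid)
}
.ESAMPLE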

.SH EXAMPLES
.PP
Here are some example probe points, defining the associated events.
.TP
begin, end, end
refers to the startup and normal shutdown of the session.  In this
case, the handler would run once during startup and twice during
shutdown.
.TP
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/\- 200 jiffies.
.TP
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in the name.
.TP
kernel.function("*@kernel/sched.c:240")
refers to any functions within the "kernel/sched.c" file that span
line 240.
.TP
kernel.mark("getuid")
refers to an STAP_MARK(getuid, ...) macro call in the kernel.
.TP
module("usb*").function("*sync*").return
refers to the moment of return from all functions with "sync" in the
name in any of the USB drivers.
.TP
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled instructions
include the given address in the kernel.
.TP
kernel.syscall.*.return
refers to the group of probe aliases with any name in the third position.

.SH SEE ALSO
.IR stap (1),
.IR lket (5)