relayfs - a high-speed data relay filesystem ============================================ relayfs is a filesystem designed to provide an efficient mechanism for tools and facilities to relay large and potentially sustained streams of data from kernel space to user space. The main abstraction of relayfs is the 'channel'. A channel consists of a set of per-cpu kernel buffers each represented by a file in the relayfs filesystem. Kernel clients write into a channel using efficient write functions which automatically log to the current cpu's channel buffer. User space applications mmap() the per-cpu files and retrieve the data as it becomes available. The format of the data logged into the channel buffers is completely up to the relayfs client; relayfs does however provide hooks which allow clients to impose some stucture on the buffer data. Nor does relayfs implement any form of data filtering - this also is left to the client. The purpose is to keep relayfs as simple as possible. This document provides an overview of the relayfs API. The details of the function parameters are documented along with the functions in the filesystem code - please see that for details. The relayfs user space API ========================== relayfs implements basic file operations for user space access to relayfs channel buffer data. Here are the file operations that are available and some comments regarding their behavior: open() enables user to open an _existing_ buffer. mmap() results in channel buffer being mapped into the caller's memory space. poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are notified when sub-buffer boundaries are crossed. close() decrements the channel buffer's refcount. When the refcount reaches 0 i.e. when no process or kernel client has the buffer open, the channel buffer is freed. In order for a user application to make use of relayfs files, the relayfs filesystem must be mounted. For example, mount -t relayfs relayfs /mnt/relay NOTE: relayfs doesn't need to be mounted for kernel clients to create or use channels - it only needs to be mounted when user space applications need access to the buffer data. The relayfs kernel API ====================== Here's a summary of the API relayfs provides to in-kernel clients: channel management functions: relay_open(base_filename, parent, subbuf_size, n_subbufs, overwrite, callbacks) relay_close(chan) relay_flush(chan) relay_reset(chan) relayfs_create_dir(name, parent) relayfs_remove_dir(dentry) relay_commit(buf, reserved, count) relay_subbufs_consumed(chan, cpu, subbufs_consumed) write functions: relay_write(chan, data, length) __relay_write(chan, data, length) relay_reserve(chan, length) callbacks: subbuf_start(buf, subbuf, prev_subbuf_idx, prev_subbuf) deliver(buf, subbuf_idx, subbuf) buf_mapped(buf, filp) buf_unmapped(buf, filp) buf_full(buf, subbuf_idx) A relayfs channel is made of up one or more per-cpu channel buffers, each implemented as a circular buffer subdivided into one or more sub-buffers. relay_open() is used to create a channel, along with its per-cpu channel buffers. Each channel buffer will have an associated file created for it in the relayfs filesystem, which can be opened and mmapped from user space if desired. The files are named basename0...basenameN-1 where N is the number of online cpus, and by default will be created in the root of the filesystem. If you want a directory structure to contain your relayfs files, you can create it with relayfs_create_dir() and pass the parent directory to relay_open(). Clients are responsible for cleaning up any directory structure they create when the channel is closed - use relayfs_remove_dir() for that. The total size of each per-cpu buffer is calculated by multiplying the number of sub-buffers by the sub-buffer size passed into relay_open(). The idea behind sub-buffers is that they're basically an extension of double-buffering to N buffers, and they also allow applications to easily implement random-access-on-buffer-boundary schemes, which can be important for some high-volume applications. The number and size of sub-buffers is completely dependent on the application and even for the same application, different conditions will warrant different values for these parameters at different times. Typically, the right values to use are best decided after some experimentation; in general, though, it's safe to assume that having only 1 sub-buffer is a bad idea - you're guaranteed to either overwrite data or lose events depending on the channel mode being used. relayfs channels can be opened in either of two modes - 'overwrite' or 'no-overwrite'. In overwrite mode, writes continuously cycle around the buffer and will never fail, but will unconditionally overwrite old data regardless of whether it's actually been consumed. In no-overwrite mode, writes will fail i.e. data will be lost, if the number of unconsumed sub-buffers equals the total number of sub-buffers in the channel. In this mode, the client is reponsible for notifying relayfs when sub-buffers have been consumed via relay_subbufs_consumed(). A full buffer will become 'unfull' and logging will continue once the client calls relay_subbufs_consumed() again. When a buffer becomes full, the buf_full() callback is invoked to notify the client. In both modes, the subbuf_start() callback will notify the client whenever a sub-buffer boundary is crossed. This can be used to write header information into the new sub-buffer or fill in header information reserved in the previous sub-buffer. One piece of information that's useful to save in a reserved header slot is the number of bytes of 'padding' for a sub-buffer, which is the amount of unused space at the end of a sub-buffer. The padding count for each sub-buffer is contained in an array in the rchan_buf struct passed into the subbuf_start() callback: rchan_buf->padding[prev_subbuf_idx] can be used to to get the padding for the just-finished sub-buffer. subbuf_start() is also called for the first sub-buffer in each channel buffer when the channel is created. The mode is specified to relay_open() using the overwrite parameter. kernel clients write data into the current cpu's channel buffer using relay_write() or __relay_write(). relay_write() is the main logging function - it uses local_irqsave() to protect the buffer and should be used if you might be logging from interrupt context. If you know you'll never be logging from interrupt context, you can use __relay_write(), which only disables preemption. These functions don't return a value, so you can't determine whether or not they failed - the assumption is that you wouldn't want to check a return value in the fast logging path anyway, and that they'll always succeed unless the buffer is full and in no-overwrite mode, in which case you'll be notified via the buf_full() callback. relay_reserve() is used to reserve a slot in a channel buffer which can be written to later. This would typically be used in applications that need to write directly into a channel buffer without having to stage data in a temporary buffer beforehand. Because the actual write may not happen immediately after the slot is reserved, applications using relay_reserve() can call relay_commit() to notify relayfs when the slot has actually been written. When all the reserved slots have been committed, the deliver() callback is invoked to notify the client that a guaranteed full sub-buffer has been produced. Because the write is under control of the client and is separated from the reserve, relay_reserve() doesn't protect the buffer at all - it's up to the client to provide the appropriate synchronization when using relay_reserve(). The client calls relay_close() when it's finished using the channel. The channel and its associated buffers are destroyed when there are no longer any references to any of the channel buffers. relay_flush() forces a sub-buffer switch on all the channel buffers, and can be used to finalize and process the last sub-buffers before the channel is closed. Some applications may want to keep a channel around and re-use it rather than open and close a new channel for each use. relay_reset() can be used for this purpose - it resets a channel to its initial state without reallocating channel buffer memory or destroying existing mappings. It should however only be called when it's safe to do so i.e. when the channel isn't currently being written to. Finally, there are a couple of utility callbacks that can be used for different purposes. buf_mapped() is called whenever a channel buffer is mmapped from user space and buf_unmapped() is called when it's unmapped. The client can use this notification to trigger actions within the kernel application, such as enabling/disabling logging to the channel. Credits ======= The ideas and specs for relayfs came about as a result of discussions on tracing involving the following: Michel Dagenais Richard Moore Bob Wisniewski Karim Yaghmour Tom Zanussi Also thanks to Hubertus Franke for a lot of useful suggestions and bug reports.