From 9663f2e6a6cf3f82b06d8fb699b11b80f92553ba Mon Sep 17 00:00:00 2001 From: Keith Packard Date: Thu, 30 Oct 2008 19:38:18 -0700 Subject: resources: add io-mapping functions to dynamically map large device apertures Impact: add new generic io_map_*() APIs Graphics devices have large PCI apertures which would consume a significant fraction of a 32-bit address space if mapped during driver initialization. Using ioremap at runtime is impractical as it is too slow. This new set of interfaces uses atomic mappings on 32-bit processors and a large static mapping on 64-bit processors to provide reasonable 32-bit performance and optimal 64-bit performance. The current implementation sits atop the io_map_atomic fixmap-based mechanism for 32-bit processors. This includes some editorial suggestions from Randy Dunlap for Documentation/io-mapping.txt Signed-off-by: Keith Packard Signed-off-by: Eric Anholt Signed-off-by: Ingo Molnar --- Documentation/io-mapping.txt | 76 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 76 insertions(+) create mode 100644 Documentation/io-mapping.txt (limited to 'Documentation') diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt new file mode 100644 index 00000000000..cd2f726becc --- /dev/null +++ b/Documentation/io-mapping.txt @@ -0,0 +1,76 @@ +The io_mapping functions in linux/io-mapping.h provide an abstraction for +efficiently mapping small regions of an I/O device to the CPU. The initial +usage is to support the large graphics aperture on 32-bit processors where +ioremap_wc cannot be used to statically map the entire aperture to the CPU +as it would consume too much of the kernel address space. + +A mapping object is created during driver initialization using + + struct io_mapping *io_mapping_create_wc(unsigned long base, + unsigned long size) + + 'base' is the bus address of the region to be made + mappable, while 'size' indicates how large a mapping region to + enable. Both are in bytes. + + This _wc variant provides a mapping which may only be used + with the io_mapping_map_atomic_wc or io_mapping_map_wc. + +With this mapping object, individual pages can be mapped either atomically +or not, depending on the necessary scheduling environment. Of course, atomic +maps are more efficient: + + void *io_mapping_map_atomic_wc(struct io_mapping *mapping, + unsigned long offset) + + 'offset' is the offset within the defined mapping region. + Accessing addresses beyond the region specified in the + creation function yields undefined results. Using an offset + which is not page aligned yields an undefined result. The + return value points to a single page in CPU address space. + + This _wc variant returns a write-combining map to the + page and may only be used with mappings created by + io_mapping_create_wc + + Note that the task may not sleep while holding this page + mapped. + + void io_mapping_unmap_atomic(void *vaddr) + + 'vaddr' must be the the value returned by the last + io_mapping_map_atomic_wc call. This unmaps the specified + page and allows the task to sleep once again. + +If you need to sleep while holding the lock, you can use the non-atomic +variant, although they may be significantly slower. + + void *io_mapping_map_wc(struct io_mapping *mapping, + unsigned long offset) + + This works like io_mapping_map_atomic_wc except it allows + the task to sleep while holding the page mapped. + + void io_mapping_unmap(void *vaddr) + + This works like io_mapping_unmap_atomic, except it is used + for pages mapped with io_mapping_map_wc. + +At driver close time, the io_mapping object must be freed: + + void io_mapping_free(struct io_mapping *mapping) + +Current Implementation: + +The initial implementation of these functions uses existing mapping +mechanisms and so provides only an abstraction layer and no new +functionality. + +On 64-bit processors, io_mapping_create_wc calls ioremap_wc for the whole +range, creating a permanent kernel-visible mapping to the resource. The +map_atomic and map functions add the requested offset to the base of the +virtual address returned by ioremap_wc. + +On 32-bit processors, io_mapping_map_atomic_wc uses io_map_atomic_prot_pfn, +which uses the fixmaps to get us a mapping to a page using an atomic fashion. +For io_mapping_map_wc, ioremap_wc() is used to get a mapping of the region. -- cgit From 8d5c6603c408d91ecf543f244f10ccb8b500ad95 Mon Sep 17 00:00:00 2001 From: Keith Packard Date: Mon, 3 Nov 2008 18:21:44 +0100 Subject: io mapping: improve documentation Impact: add documentation Signed-off-by: Keith Packard Signed-off-by: Ingo Molnar --- Documentation/io-mapping.txt | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt index cd2f726becc..473e43b2d58 100644 --- a/Documentation/io-mapping.txt +++ b/Documentation/io-mapping.txt @@ -71,6 +71,12 @@ range, creating a permanent kernel-visible mapping to the resource. The map_atomic and map functions add the requested offset to the base of the virtual address returned by ioremap_wc. -On 32-bit processors, io_mapping_map_atomic_wc uses io_map_atomic_prot_pfn, -which uses the fixmaps to get us a mapping to a page using an atomic fashion. -For io_mapping_map_wc, ioremap_wc() is used to get a mapping of the region. +On 32-bit processors with HIGHMEM defined, io_mapping_map_atomic_wc uses +kmap_atomic_pfn to map the specified page in an atomic fashion; +kmap_atomic_pfn isn't really supposed to be used with device pages, but it +provides an efficient mapping for this usage. + +On 32-bit processors without HIGHMEM defined, io_mapping_map_atomic_wc and +io_mapping_map_wc both use ioremap_wc, a terribly inefficient function which +performs an IPI to inform all processors about the new mapping. This results +in a significant performance penalty. -- cgit From 9b803c0fc37dc49d06d99d63741e78274801c644 Mon Sep 17 00:00:00 2001 From: Steven Rostedt Date: Mon, 3 Nov 2008 15:15:08 -0500 Subject: ftrace: update txt document Impact: Documentation update only A lot of changes have gone into ftrace. This patch updates the ftrace.txt document. Signed-off-by: Steven Rostedt Acked-by: Randy Dunlap Signed-off-by: Ingo Molnar --- Documentation/ftrace.txt | 169 ++++++++++++++++++++--------------------------- 1 file changed, 72 insertions(+), 97 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt index ea5a827395d..0cb989cd512 100644 --- a/Documentation/ftrace.txt +++ b/Documentation/ftrace.txt @@ -50,26 +50,26 @@ of ftrace. Here is a list of some of the key files: Note: all time values are in microseconds. - current_tracer : This is used to set or display the current tracer + current_tracer: This is used to set or display the current tracer that is configured. - available_tracers : This holds the different types of tracers that + available_tracers: This holds the different types of tracers that have been compiled into the kernel. The tracers listed here can be configured by echoing their name into current_tracer. - tracing_enabled : This sets or displays whether the current_tracer + tracing_enabled: This sets or displays whether the current_tracer is activated and tracing or not. Echo 0 into this file to disable the tracer or 1 to enable it. - trace : This file holds the output of the trace in a human readable + trace: This file holds the output of the trace in a human readable format (described below). - latency_trace : This file shows the same trace but the information + latency_trace: This file shows the same trace but the information is organized more to display possible latencies in the system (described below). - trace_pipe : The output is the same as the "trace" file but this + trace_pipe: The output is the same as the "trace" file but this file is meant to be streamed with live tracing. Reads from this file will block until new data is retrieved. Unlike the "trace" and "latency_trace" @@ -82,11 +82,11 @@ of ftrace. Here is a list of some of the key files: tracer is not adding more data, they will display the same information every time they are read. - iter_ctrl : This file lets the user control the amount of data + iter_ctrl: This file lets the user control the amount of data that is displayed in one of the above output files. - trace_max_latency : Some of the tracers record the max latency. + trace_max_latency: Some of the tracers record the max latency. For example, the time interrupts are disabled. This time is saved in this file. The max trace will also be stored, and displayed by either @@ -94,29 +94,26 @@ of ftrace. Here is a list of some of the key files: only be recorded if the latency is greater than the value in this file. (in microseconds) - trace_entries : This sets or displays the number of trace - entries each CPU buffer can hold. The tracer buffers - are the same size for each CPU. The displayed number - is the size of the CPU buffer and not total size. The + trace_entries: This sets or displays the number of bytes each CPU + buffer can hold. The tracer buffers are the same size + for each CPU. The displayed number is the size of the + CPU buffer and not total size of all buffers. The trace buffers are allocated in pages (blocks of memory that the kernel uses for allocation, usually 4 KB in size). - Since each entry is smaller than a page, if the last - allocated page has room for more entries than were - requested, the rest of the page is used to allocate - entries. + If the last page allocated has room for more bytes + than requested, the rest of the page will be used, + making the actual allocation bigger than requested. + (Note, the size may not be a multiple of the page size due + to buffer managment overhead.) This can only be updated when the current_tracer - is set to "none". + is set to "nop". - NOTE: It is planned on changing the allocated buffers - from being the number of possible CPUS to - the number of online CPUS. - - tracing_cpumask : This is a mask that lets the user only trace + tracing_cpumask: This is a mask that lets the user only trace on specified CPUS. The format is a hex string representing the CPUS. - set_ftrace_filter : When dynamic ftrace is configured in (see the + set_ftrace_filter: When dynamic ftrace is configured in (see the section below "dynamic ftrace"), the code is dynamically modified (code text rewrite) to disable calling of the function profiler (mcount). This lets tracing be configured @@ -130,14 +127,11 @@ of ftrace. Here is a list of some of the key files: be traced. If a function exists in both set_ftrace_filter and set_ftrace_notrace, the function will _not_ be traced. - available_filter_functions : When a function is encountered the first - time by the dynamic tracer, it is recorded and - later the call is converted into a nop. This file - lists the functions that have been recorded - by the dynamic tracer and these functions can - be used to set the ftrace filter by the above - "set_ftrace_filter" file. (See the section "dynamic ftrace" - below for more details). + available_filter_functions: This lists the functions that ftrace + has processed and can trace. These are the function + names that you can pass to "set_ftrace_filter" or + "set_ftrace_notrace". (See the section "dynamic ftrace" + below for more details.) The Tracers @@ -145,7 +139,7 @@ The Tracers Here is the list of current tracers that may be configured. - ftrace - function tracer that uses mcount to trace all functions. + function - function tracer that uses mcount to trace all functions. sched_switch - traces the context switches between tasks. @@ -166,8 +160,8 @@ Here is the list of current tracers that may be configured. the highest priority task to get scheduled after it has been woken up. - none - This is not a tracer. To remove all tracers from tracing - simply echo "none" into current_tracer. + nop - This is not a tracer. To remove all tracers from tracing + simply echo "nop" into current_tracer. Examples of using the tracer @@ -182,7 +176,7 @@ Output format: Here is an example of the output format of the file "trace" -------- -# tracer: ftrace +# tracer: function # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | @@ -192,7 +186,7 @@ Here is an example of the output format of the file "trace" -------- A header is printed with the tracer name that is represented by the trace. -In this case the tracer is "ftrace". Then a header showing the format. Task +In this case the tracer is "function". Then a header showing the format. Task name "bash", the task PID "4251", the CPU that it was running on "01", the timestamp in . format, the function name that was traced "path_put" and the parent function that called this function @@ -1003,22 +997,20 @@ is the stack for the hard interrupt. This hides the fact that NEED_RESCHED has been set. We do not see the 'N' until we switch back to the task's assigned stack. -ftrace ------- +function +-------- -ftrace is not only the name of the tracing infrastructure, but it -is also a name of one of the tracers. The tracer is the function -tracer. Enabling the function tracer can be done from the -debug file system. Make sure the ftrace_enabled is set otherwise -this tracer is a nop. +This tracer is the function tracer. Enabling the function tracer +can be done from the debug file system. Make sure the ftrace_enabled is +set; otherwise this tracer is a nop. # sysctl kernel.ftrace_enabled=1 - # echo ftrace > /debug/tracing/current_tracer + # echo function > /debug/tracing/current_tracer # echo 1 > /debug/tracing/tracing_enabled # usleep 1 # echo 0 > /debug/tracing/tracing_enabled # cat /debug/tracing/trace -# tracer: ftrace +# tracer: function # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | @@ -1040,10 +1032,10 @@ this tracer is a nop. [...] -Note: ftrace uses ring buffers to store the above entries. The newest data -may overwrite the oldest data. Sometimes using echo to stop the trace -is not sufficient because the tracing could have overwritten the data -that you wanted to record. For this reason, it is sometimes better to +Note: function tracer uses ring buffers to store the above entries. +The newest data may overwrite the oldest data. Sometimes using echo to +stop the trace is not sufficient because the tracing could have overwritten +the data that you wanted to record. For this reason, it is sometimes better to disable tracing directly from a program. This allows you to stop the tracing at the point that you hit the part that you are interested in. To disable the tracing directly from a C program, something like following @@ -1077,18 +1069,31 @@ every kernel function, produced by the -pg switch in gcc), starts of pointing to a simple return. (Enabling FTRACE will include the -pg switch in the compiling of the kernel.) -When dynamic ftrace is initialized, it calls kstop_machine to make -the machine act like a uniprocessor so that it can freely modify code -without worrying about other processors executing that same code. At -initialization, the mcount calls are changed to call a "record_ip" -function. After this, the first time a kernel function is called, -it has the calling address saved in a hash table. - -Later on the ftraced kernel thread is awoken and will again call -kstop_machine if new functions have been recorded. The ftraced thread -will change all calls to mcount to "nop". Just calling mcount -and having mcount return has shown a 10% overhead. By converting -it to a nop, there is no measurable overhead to the system. +At compile time every C file object is run through the +recordmcount.pl script (located in the scripts directory). This +script will process the C object using objdump to find all the +locations in the .text section that call mcount. (Note, only +the .text section is processed, since processing other sections +like .init.text may cause races due to those sections being freed). + +A new section called "__mcount_loc" is created that holds references +to all the mcount call sites in the .text section. This section is +compiled back into the original object. The final linker will add +all these references into a single table. + +On boot up, before SMP is initialized, the dynamic ftrace code +scans this table and updates all the locations into nops. It also +records the locations, which are added to the available_filter_functions +list. Modules are processed as they are loaded and before they are +executed. When a module is unloaded, it also removes its functions from +the ftrace function list. This is automatic in the module unload +code, and the module author does not need to worry about it. + +When tracing is enabled, kstop_machine is called to prevent races +with the CPUS executing code being modified (which can cause the +CPU to do undesireable things), and the nops are patched back +to calls. But this time, they do not call mcount (which is just +a function stub). They now call into the ftrace infrastructure. One special side-effect to the recording of the functions being traced is that we can now selectively choose which functions we @@ -1251,36 +1256,6 @@ Produces: We can see that there's no more lock or preempt tracing. -ftraced -------- - -As mentioned above, when dynamic ftrace is configured in, a kernel -thread wakes up once a second and checks to see if there are mcount -calls that need to be converted into nops. If there are not any, then -it simply goes back to sleep. But if there are some, it will call -kstop_machine to convert the calls to nops. - -There may be a case in which you do not want this added latency. -Perhaps you are doing some audio recording and this activity might -cause skips in the playback. There is an interface to disable -and enable the "ftraced" kernel thread. - - # echo 0 > /debug/tracing/ftraced_enabled - -This will disable the calling of kstop_machine to update the -mcount calls to nops. Remember that there is a large overhead -to calling mcount. Without this kernel thread, that overhead will -exist. - -If there are recorded calls to mcount, any write to the ftraced_enabled -file will cause the kstop_machine to run. This means that a -user can manually perform the updates when they want to by simply -echoing a '0' into the ftraced_enabled file. - -The updates are also done at the beginning of enabling a tracer -that uses ftrace function recording. - - trace_pipe ---------- @@ -1289,14 +1264,14 @@ on the tracing is different. Every read from trace_pipe is consumed. This means that subsequent reads will be different. The trace is live. - # echo ftrace > /debug/tracing/current_tracer + # echo function > /debug/tracing/current_tracer # cat /debug/tracing/trace_pipe > /tmp/trace.out & [1] 4153 # echo 1 > /debug/tracing/tracing_enabled # usleep 1 # echo 0 > /debug/tracing/tracing_enabled # cat /debug/tracing/trace -# tracer: ftrace +# tracer: function # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | @@ -1317,7 +1292,7 @@ is live. Note, reading the trace_pipe file will block until more input is added. By changing the tracer, trace_pipe will issue an EOF. We needed -to set the ftrace tracer _before_ cating the trace_pipe file. +to set the function tracer _before_ we "cat" the trace_pipe file. trace entries @@ -1334,10 +1309,10 @@ number of entries. 65620 Note, to modify this, you must have tracing completely disabled. To do that, -echo "none" into the current_tracer. If the current_tracer is not set -to "none", an EINVAL error will be returned. +echo "nop" into the current_tracer. If the current_tracer is not set +to "nop", an EINVAL error will be returned. - # echo none > /debug/tracing/current_tracer + # echo nop > /debug/tracing/current_tracer # echo 100000 > /debug/tracing/trace_entries # cat /debug/tracing/trace_entries 100045 -- cgit From 42ec632e7b0185d3776aa5d23380cbdc963151f2 Mon Sep 17 00:00:00 2001 From: Steven Rostedt Date: Mon, 3 Nov 2008 15:18:56 -0500 Subject: ftrace: ftrace.txt version update Impact: Documentation update only Update the version that the ftrace document was written for. Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar --- Documentation/ftrace.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt index 0cb989cd512..9cc4d685dde 100644 --- a/Documentation/ftrace.txt +++ b/Documentation/ftrace.txt @@ -8,7 +8,7 @@ Copyright 2008 Red Hat Inc. Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, John Kacur, and David Teigland. -Written for: 2.6.27-rc1 +Written for: 2.6.28-rc2 Introduction ------------ -- cgit