author | Clark Williams <williams@redhat.com> | 2010-03-27 11:53:13 -0500 |
committer | Clark Williams <williams@redhat.com> | 2010-03-27 11:53:13 -0500 |
commit | 298b8e5efe211ae0f3f7b11dfa65ad393da1a35e (patch) | |
tree | 24e2b1d397e70e5963905898c1ba43b46c5f8a21 /doc | |
parent | 2c74e2107301ea45a7c43d17f135a8686c580b1c (diff) | |
download | rteval-298b8e5efe211ae0f3f7b11dfa65ad393da1a35e.tar.gz rteval-298b8e5efe211ae0f3f7b11dfa65ad393da1a35e.tar.xz rteval-298b8e5efe211ae0f3f7b11dfa65ad393da1a35e.zip |
updated rteval.txt whitepaper
Diffstat (limited to 'doc')
-rw-r--r-- | doc/rteval.txt | 113 |
1 file changed, 110 insertions, 3 deletions
diff --git a/doc/rteval.txt b/doc/rteval.txt
index 8ad5762..40b4da9 100644
--- a/doc/rteval.txt
+++ b/doc/rteval.txt
@@ -1,6 +1,10 @@
 Evaluating Realtime Linux system performance with rteval
+Clark Williams <williams@redhat.com>
 --------------------------------------------------------
 
+Abstract
+--------
+
 One of the problems of developing and fielding any software product
 that runs on a wide range of hardware platforms, is determining how
 well the product performs on those platforms. This is especially true
@@ -17,13 +21,21 @@ of a network packet). To give a realtime application the best chance
 of meeting its deadline(s), a realtime OS must minimize the time
 between event occurance and the servicing of that event (latency).
 
-The 'rteval' program is an attempt to put together a synthetic
+This paper describes 'rteval', a Python 2.x program developed at
+Red Hat to help quantify realtime performance on the MRG Realtime
+kernel. Rteval is an attempt to put together a synthetic
 benchmark which mimics a well behaved realtime application, running on
-a heavily loaded realtime Linux system. Rteval uses the 'cyclictest'
+a heavily loaded realtime Linux system. It uses the 'cyclictest'
 program in the role of the realtime app and uses two loads, a parallel
 build of a Linux kernel and the scheduler benchmark 'hackbench' to
 boost the system load.
 
+Rteval runs for a specified length of time (typically 12 hours). When
+an rteval run is completed, a statistical analysis of the results is
+done and an XML file is generated containing system state, raw result
+data, and the statistical analysis results; optionally the XML is
+sent by XML-RPC to a database for reporting.
+
 The Load Applications
 ---------------------
 
@@ -113,7 +125,7 @@ you write 374,400,000 bytes per hour to disk. A 12 hour run on a four
 core system would generate about 44 gigabytes of data. This was
 deemed excessive...
 
-So the decision was made to recored the latency values in histogram
+So the decision was made to record the latency values in histogram
 format, one histogram for each measurement thread. This has the
 advantage of using only a fixed amount of memory to record samples,
 but has the disadvantage of losing temporal ordering information,
@@ -132,3 +144,98 @@ variability of service times. This variability is sometimes called
 'jitter' in realtime paralance, due to the plot the data would make
 on an oscilloscope.
 
+In addition to calculating mean, variance and standard deviation,
+rteval's statistics code calculates and stores the usual suspects for
+statistics, such as min, max, mode, median, and range, both for each
+cpu core and aggregated for all cores.
+
+Another challenge was in identifying the underlying hardware platform
+so that runs on the same system could be grouped properly. The Desktop
+Management Interface (DMI) tables maintained by the BIOS were a good
+starting point, since they record information about the cpu, memory,
+and peripheral devices in the system. Added to that information is
+some state about the system while the test was running: kernel
+version, active clocksource, number of NUMA nodes available, kernel
+threads and their priorities, kernel modules loaded, and the state of
+network interfaces.
+
+Problems
+--------
+
+Using rteval has helped Red Hat locate areas that cause performance
+problems with realtime Linux kernels. Some of these problem areas are:
+
+1. BIOS/SMI issues
+
+Many systems use System Management Interrupts (SMIs) to perform system
+critical operations without help from the running operating system by
+trapping back to BIOS code. Unfortunately, this causes 'gaps in time'
+for the kernel, since nothing can run while an SMI is being handled in
+the BIOS. Most times SMI impact is negligible since it's mostly
+thermal management, reading a thermocouple and turning fans on/off.
+Sometimes, though, the operation takes a long time (e.g. when an EDAC
+needs to correct a memory error) and that can cause deadlines to
+be missed by many hundreds of microseconds. Red Hat has been working
+with hardware vendors to identify these hotspots and reduce their
+impact.
+
+2. Kernel scalability issues
+
+In the past few years, the number of cores per socket on a motherboard
+has gone up from 2 to 8, resulting in some scalability problems in the
+kernel. One area that has received a lot of attention is the load
+balancer. This is logic in the kernel that attempts to make sure that
+each core in the system has tasks to run and that no one core is
+overloaded with tasks. During a load balancer pass, a core with a long
+run queue (indicating there are many tasks ready on that core) will
+have some of those tasks migrated to other cores, which requires that
+both the current and destination cores' run queue locks be held
+(meaning nothing can run on those cores).
+
+In a stock Linux kernel, long load balancer passes result in better
+utilization of cpus and an overall throughput gain. Unfortunately, long
+load balancer passes can result in missed deadlines because a task on
+the run queue for a core cannot run while the load balancer is
+running. To compensate for this, on realtime Linux the load balancer
+has a lower target number of migrations and looks for contention on
+the run queue locks (meaning that a task is trying to be scheduled on
+one of the cores on which the balancer is operating). Research in this
+area is ongoing.
+
+<what other areas?>
+
+3. NUMA
+
+In conjunction with the increase in the number of cpu cores per die
+has come the desire to reduce the amount of interconnect traces
+between cpus and memory nodes (as you pack the cores tighter, you have
+less room to run connections to memory and other cores). One way to do
+this is to route a core's address/data/signal lines to some sort of
+switch module, such as AMD's HyperTransport mechanism. With a series
+of switches in place many cores can access memory and other cpu
+resources through the switch network without programs running on them
+knowing they're going through a switch. This results in a Non-Uniform
+Memory Access (NUMA) architecture, which means that some memory
+accesses will take longer than others due to traversing the switch
+network. NUMA is great for scaling up throughput oriented servers, but
+tends to hurt determinism if the programs are not aware of the memory
+topology.
+
+A --numa option was added to the cyclictest program to use the libnuma
+library to bind threads to local memory nodes and allocate memory on
+the closest memory node, so as to minimize the time required to access
+memory.
+
+Further Development
+-------------------
+
+Once we started getting rteval run information it was natural that we
+would want to store it in a database for further analysis (especially
+watching for performance regressions). David Sommerseth created a set
+of tables for a PostgreSQL database and then added an option to rteval
+to ship the results back to a database server using XML-RPC. This
+option is currently used internally at Red Hat to ship rteval run data
+back to our internal DB server. There are no plans to open this data
+up to the public, but the XML-RPC code is there if someone else wants
+to use the facility. (No, there are no backdoors in the code that
+ships run data back to Red Hat; it's Python code, look and see!)
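A few illustrative sketches follow, tied to techniques the updated
whitepaper describes. The patch has cyclictest playing the role of the
realtime application, with latencies recorded as one histogram per
measurement thread. The sketch below (not rteval's actual measurement
code) launches cyclictest in quiet histogram mode and parses its
per-thread histogram output; the flags chosen and the simple column
format assumed are based on common cyclictest usage, so treat both as
assumptions rather than a description of rteval's command line.

    # Illustrative sketch only -- not rteval's measurement module.
    # Assumes cyclictest "-h" histogram output: comment lines start
    # with '#', data lines are "<latency_us> <count_cpu0> <count_cpu1> ...".
    import subprocess

    def run_cyclictest(duration_secs, max_latency_us=2000):
        cmd = ["cyclictest", "-q", "-m", "-n", "-p95", "-t",
               "-h", str(max_latency_us), "-D", str(duration_secs)]
        out = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
        histograms = {}          # cpu index -> {latency_us: sample count}
        for line in out.splitlines():
            line = line.decode() if isinstance(line, bytes) else line
            if not line or line.startswith("#"):
                continue
            cols = line.split()
            latency = int(cols[0])
            for cpu, count in enumerate(int(c) for c in cols[1:]):
                if count:
                    histograms.setdefault(cpu, {})[latency] = count
        # Temporal ordering is lost, as the paper notes; only counts remain.
        return histograms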
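The statistics discussion in the patch lists mean, variance, standard
deviation, min, max, mode, median and range, computed for each cpu core
and aggregated across cores. The following is a minimal sketch of how
those values can be derived directly from a {latency: count} histogram
without expanding it back into individual samples; it illustrates the
technique, not rteval's statistics code.

    # Illustrative sketch only -- not rteval's statistics module.
    import math

    def histogram_stats(hist):
        """Summary statistics from a {latency_us: count} histogram."""
        total = sum(hist.values())
        if total == 0:
            return None
        values = sorted(hist)
        mean = sum(v * c for v, c in hist.items()) / float(total)
        variance = sum(c * (v - mean) ** 2
                       for v, c in hist.items()) / float(total)
        # Median: walk the cumulative count until half the samples are seen.
        half, running, median = total / 2.0, 0, values[0]
        for v in values:
            running += hist[v]
            if running >= half:
                median = v
                break
        mode = max(hist, key=hist.get)   # most frequently seen latency
        return {
            'samples': total,
            'min': values[0],
            'max': values[-1],
            'range': values[-1] - values[0],
            'mean': mean,
            'median': median,
            'mode': mode,
            'variance': variance,
            'stddev': math.sqrt(variance),
        }

    # Aggregating across cores is just summing the per-core histograms
    # bucket by bucket before computing the same statistics.
    def aggregate(per_core_hists):
        combined = {}
        for hist in per_core_hists:
            for v, c in hist.items():
                combined[v] = combined.get(v, 0) + c
        return histogram_stats(combined)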
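The patch also describes grouping runs by hardware platform using the
DMI tables and recording run-time state such as kernel version, active
clocksource, NUMA node count and loaded modules. Here is a sketch of
gathering a slice of that state; the helper names are made up, and the
/proc and /sys paths are the standard Linux interfaces, though not
every platform exposes all of them.

    # Sketch of collecting some of the system state the paper mentions.
    import os
    import platform

    def read_first_line(path):
        try:
            with open(path) as f:
                return f.readline().strip()
        except (IOError, OSError):
            return None

    def system_state():
        node_dir = '/sys/devices/system/node'
        return {
            'kernel': platform.release(),
            'clocksource': read_first_line(
                '/sys/devices/system/clocksource/clocksource0/current_clocksource'),
            'dmi_vendor': read_first_line('/sys/class/dmi/id/sys_vendor'),
            'dmi_product': read_first_line('/sys/class/dmi/id/product_name'),
            'numa_nodes': (len([d for d in os.listdir(node_dir)
                                if d.startswith('node')])
                           if os.path.isdir(node_dir) else 1),
            'modules': [line.split()[0] for line in open('/proc/modules')],
        }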
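Finally, the Further Development section describes shipping the XML
report to a PostgreSQL-backed server over XML-RPC. A hypothetical
client-side sketch is shown below; the server URL and the
'database.SubmitReport' method name are placeholders, since rteval's
actual submission interface is defined by its own server-side code and
is not part of this patch.

    # Hypothetical sketch of pushing an rteval XML report over XML-RPC.
    import xmlrpclib   # Python 2.x; the module is xmlrpc.client on Python 3

    def submit_report(xml_path, url="http://rteval-db.example.com/RPC2"):
        with open(xml_path, "rb") as f:
            report = xmlrpclib.Binary(f.read())
        server = xmlrpclib.ServerProxy(url)
        # Placeholder method name, not rteval's real API.
        return server.database.SubmitReport(report)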