author    Clark Williams <williams@redhat.com>    2010-03-27 11:53:13 -0500
committer Clark Williams <williams@redhat.com>    2010-03-27 11:53:13 -0500
commit    298b8e5efe211ae0f3f7b11dfa65ad393da1a35e (patch)
tree      24e2b1d397e70e5963905898c1ba43b46c5f8a21 /doc
parent    2c74e2107301ea45a7c43d17f135a8686c580b1c (diff)
updated rteval.txt whitepaper
Diffstat (limited to 'doc')
-rw-r--r--  doc/rteval.txt  113
1 file changed, 110 insertions(+), 3 deletions(-)
diff --git a/doc/rteval.txt b/doc/rteval.txt
index 8ad5762..40b4da9 100644
--- a/doc/rteval.txt
+++ b/doc/rteval.txt
@@ -1,6 +1,10 @@
Evaluating Realtime Linux system performance with rteval
+Clark Williams <williams@redhat.com>
--------------------------------------------------------
+Abstract
+--------
+
One of the problems of developing and fielding any software product
that runs on a wide range of hardware platforms is determining how
well the product performs on those platforms. This is especially true
@@ -17,13 +21,21 @@ of a network packet). To give a realtime application the best chance
of meeting its deadline(s), a realtime OS must minimize the time
between event occurrence and the servicing of that event (latency).
-The 'rteval' program is an attempt to put together a synthetic
+This paper describes 'rteval', a Python 2.x program
+developed at Red Hat to help quantify realtime performance on the MRG
+Realtime kernel. Rteval is an attempt to put together a synthetic
benchmark which mimics a well behaved realtime application, running on
-a heavily loaded realtime Linux system. Rteval uses the 'cyclictest'
+a heavily loaded realtime Linux system. It uses the 'cyclictest'
program in the role of the realtime app and uses two loads, a parallel
build of a Linux kernel and the scheduler benchmark 'hackbench' to
boost the system load.
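+
+As a rough, hypothetical sketch of how such a run could be driven from
+Python (rteval builds its real command lines inside its own load and
+measurement modules; the flags and values below are illustrative only):
+
+  # Illustrative only: launch a pinned cyclictest measurement thread
+  # per core plus a hackbench scheduler load.
+  import subprocess
+
+  ncores = 4  # rteval detects this; hard-coded here for brevity
+
+  # quiet, mlocked, SCHED_FIFO prio 95, one pinned thread per core,
+  # histogram of latencies up to 2000 microseconds
+  measure = subprocess.Popen(
+      ["cyclictest", "-q", "-m", "-p95", "-a", "-t", str(ncores),
+       "-h", "2000"], stdout=subprocess.PIPE)
+
+  # hackbench as the scheduler load; a parallel kernel build
+  # (make -jN) would be launched the same way
+  load = subprocess.Popen(["hackbench", "-g", str(ncores * 2)])
+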
+Rteval runs for a specified length of time (typically 12 hours). When
+an rteval run completes, a statistical analysis of the results is
+performed and an XML file is generated containing system state, raw
+result data and the statistical analysis results. Optionally, the XML
+is sent via XML-RPC to a database for reporting.
+
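+Purely to illustrate that flow (the element names, server URL and
+XML-RPC method below are hypothetical, not rteval's actual schema or
+interface):
+
+  # Hypothetical report/submit sketch; names and URL are made up.
+  import xmlrpclib                         # xmlrpc.client on Python 3
+  from xml.etree import ElementTree as ET
+
+  report = ET.Element("rteval")
+  ET.SubElement(report, "clocksource").text = "tsc"
+  stats = ET.SubElement(report, "statistics", {"cpu": "all"})
+  ET.SubElement(stats, "mean").text = "8.2"
+  ET.SubElement(stats, "stddev").text = "1.7"
+  xmldata = ET.tostring(report)
+
+  send_to_database = False                 # the optional last step
+  if send_to_database:
+      server = xmlrpclib.ServerProxy("http://rtserver.example.com/RPC2")
+      server.SubmitReport(xmldata)         # hypothetical method name
+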
The Load Applications
---------------------
@@ -113,7 +125,7 @@ you write 374,400,000 bytes per hour to disk. A 12 hour run on a four
core system would generate about 44 gigabytes of data. This was deemed
excessive...
-So the decision was made to recored the latency values in histogram
+So the decision was made to record the latency values in histogram
format, one histogram for each measurement thread. This has the
advantage of using only a fixed amount of memory to record samples,
but has the disadvantage of losing temporal ordering information,
@@ -132,3 +144,98 @@ variability of service times. This variability is sometimes called
'jitter' in realtime parlance, due to the plot the data would make on
an oscilloscope.
+In addition to calculating mean, variance and standard deviation,
+rteval's statistics code calculates and stores the other usual
+descriptive statistics: min, max, mode, median, and range, both for
+each cpu core and aggregated across all cores.
+
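+As a minimal sketch of the idea (not rteval's statistics module),
+assuming each measurement thread's histogram is a mapping of latency
+bucket in microseconds to sample count:
+
+  # Summary statistics from one per-thread latency histogram,
+  # stored as {latency_usec: sample_count}.
+  import math
+
+  def histogram_stats(hist):
+      total = sum(hist.values())
+      mean = sum(lat * cnt for lat, cnt in hist.items()) / float(total)
+      var = sum(cnt * (lat - mean) ** 2
+                for lat, cnt in hist.items()) / float(total)
+      # median: walk buckets until half the samples are accounted for
+      running, median = 0, None
+      for lat in sorted(hist):
+          running += hist[lat]
+          if median is None and running * 2 >= total:
+              median = lat
+      return {"min": min(hist), "max": max(hist),
+              "range": max(hist) - min(hist),
+              "mode": max(hist, key=hist.get),
+              "mean": mean, "median": median,
+              "variance": var, "stddev": math.sqrt(var)}
+
+  print(histogram_stats({5: 120, 6: 300, 7: 80, 42: 1}))
+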
+Another challenge was in identifying the underlying hardware platform
+so that runs on the same system could be grouped properly. The Desktop
+Management Interface (DMI) tables maintained by the BIOS were a good
+starting point, since they record information about the cpu, memory,
+and peripheral devices in the system. Added to that information is
+some state about the system while the test was running: kernel
+version, active clocksource, number of NUMA nodes available, kernel
+threads and their priorities, kernel modules loaded, and the state of
+network interfaces.
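+
+Most of that state comes from standard kernel interfaces; a small and
+deliberately incomplete sketch of gathering a few of the pieces
+(rteval's own sysinfo code collects far more, including the full DMI
+tables):
+
+  # Partial sketch of run-time system state collection.
+  import subprocess
+
+  def read_first_line(path):
+      with open(path) as f:
+          return f.readline().strip()
+
+  state = {
+      "kernel": subprocess.check_output(["uname", "-r"]).strip(),
+      "clocksource": read_first_line(
+          "/sys/devices/system/clocksource/clocksource0/current_clocksource"),
+      "modules": [line.split()[0] for line in open("/proc/modules")],
+  }
+  # A few DMI fields are exposed under sysfs; dmidecode or
+  # python-dmidecode provide the full tables.
+  try:
+      state["board_vendor"] = read_first_line("/sys/class/dmi/id/board_vendor")
+  except IOError:
+      pass
+  print(state)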
+
+Problems
+--------
+
+Using rteval has helped Red Hat locate areas that cause performance
+problems with realtime Linux kernels. Some of these problem areas are:
+
+1. BIOS/SMI issues
+
+Many systems use System Management Interrupts (SMIs) to perform
+system-critical operations without help from the running operating
+system, by trapping back to BIOS code. Unfortunately, this causes
+'gaps in time' for the kernel, since nothing can run while an SMI is
+being handled in the BIOS. Most of the time the SMI impact is
+negligible, since it is mostly thermal management: reading a
+thermocouple and turning fans on or off. Sometimes, though, the
+operation takes a long time (e.g. when EDAC logic needs to correct a
+memory error), and that can cause deadlines to be missed by many
+hundreds of microseconds. Red Hat has been working
+with hardware vendors to identify these hotspots and reduce their
+impact.
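+
+One way to at least see whether SMIs fired during a run: recent Intel
+cpus expose an SMI counter in MSR 0x34, readable through the msr
+driver. A hedged sketch (needs root and the msr module loaded; the
+MSR is not present on every cpu):
+
+  # Read the SMI count MSR before and after a run to detect SMIs.
+  import struct
+
+  MSR_SMI_COUNT = 0x34
+
+  def read_smi_count(cpu=0):
+      with open("/dev/cpu/%d/msr" % cpu, "rb") as f:
+          f.seek(MSR_SMI_COUNT)
+          return struct.unpack("<Q", f.read(8))[0]
+
+  before = read_smi_count()
+  # ... run the measurement ...
+  after = read_smi_count()
+  print("SMIs during run: %d" % (after - before))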
+
+2. Kernel scalability issues
+
+In the past few years, the number of cores per socket on a motherboard
+has gone up from 2 to 8, resulting in some scalability problems in the
+kernel. One area that has received a lot of attention is the load
+balancer. This is logic in the kernel that attempts to make sure that
+each core in the system has tasks to run and that no one core is
+overloaded with tasks. During a load balancer pass, a core with a long
+run queue (indicating there are many tasks ready on that core) will
+have some of those tasks migrated to other cores, which requires that
+both the current and destination cores' run queue locks be held
+(meaning nothing can run on those cores).
+
+In a stock Linux kernel, long load balancer passes result in better
+utilization of the cpus and an overall throughput gain. Unfortunately,
+long load balancer passes can also result in missed deadlines, because
+a task on the run queue of a core cannot run while the load balancer
+is running. To compensate for this on realtime Linux, the load
+balancer uses a lower target number of migrations and checks for
+contention on the run queue locks (meaning that a task is trying to be
+scheduled on one of the cores on which the balancer is operating).
+Research in this area is ongoing.
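+
+The balancer's effect on a particular task can at least be observed
+from user space: on kernels with scheduler statistics enabled,
+/proc/<pid>/sched exposes a per-task migration counter. A sketch (the
+field name may differ between kernel versions):
+
+  # Count how often a task has been migrated between cores.
+  import os
+
+  def migrations(pid):
+      with open("/proc/%d/sched" % pid) as f:
+          for line in f:
+              if line.startswith("se.nr_migrations"):
+                  return int(line.split(":")[1])
+      return None
+
+  print("migrations so far: %s" % migrations(os.getpid()))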
+
+<what other areas?>
+
+3. NUMA
+
+In conjunction with the increase in the number of cpu cores per die
+has been the desire to reduce the number of interconnect traces
+between cpus and memory nodes (as you pack the cores tighter, you have
+less room to run connections to memory and other cores). One way to do
+this is to route a core's address/data/signal lines to some sort of
+switch module, such as AMD's HyperTransport mechanism. With a series
+of switches in place, many cores can access memory and other cpu
+resources through the switch network without programs running on them
+knowing they're going through a switch. This results in a Non-Uniform
+Memory Access (NUMA) architecture, which means that some memory
+accesses will take longer than others due to traversing the switch
+network. NUMA is great for scaling up throughput-oriented servers, but
+tends to hurt determinism if the programs are not aware of the memory
+topology.
+
+A --numa option was added to the cyclictest program to use the libnuma
+library to bind threads to local memory nodes and allocate memory on
+the closest memory node, so as to minimize the time required to access
+memory.
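+
+The same kind of node binding can be applied by hand to any load or
+measurement process; a minimal sketch using ctypes and libnuma's
+public API (numactl(8) offers the equivalent from the shell with
+--cpunodebind and --membind):
+
+  # Bind the current process to one NUMA node via libnuma.
+  import ctypes
+
+  libnuma = ctypes.CDLL("libnuma.so.1")
+  if libnuma.numa_available() < 0:
+      raise RuntimeError("no NUMA support on this system")
+
+  node = 0
+  libnuma.numa_run_on_node(node)      # run only on the cpus of node 0
+  libnuma.numa_set_preferred(node)    # prefer allocations from node 0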
+
+Further Development
+-------------------
+
+Once we started getting rteval run information, it was natural that we
+would want to store it in a database for further analysis (especially
+watching for performance regressions). David Sommerseth created a set
+of tables for a PostgreSQL database and then added an option to rteval
+to ship the results back to a database server using XML-RPC. This
+option is currently used internally at Red Hat to ship rteval run data
+back to our internal DB server. There are no plans to open this data
+up to the public, but the XML-RPC code is there if someone else wants
+to use the facility. (No, there are no backdoors in the code that
+ships run data back to Red Hat; it's Python code, look and see!)
+