\input texinfo @c abrt-retrace-server.texi - Retrace Server Documentation
@c
@c .texi extension is recommended in GNU Automake manual
@setfilename abrt-retrace-server.info
@include version.texi
@settitle Retrace server for ABRT @value{VERSION} Manual

@dircategory Retrace server
@direntry
* Retrace server: (retrace-server). Remote coredump analysis via HTTP.
@end direntry

@titlepage
@title Retrace server
@subtitle for ABRT version @value{VERSION}, @value{UPDATED}
@author Karel Klic (@email{kklic@@redhat.com})
@page
@vskip 0pt plus 1filll
@end titlepage

@contents

@ifnottex
@node Top
@top Retrace server

This manual is for the retrace server for ABRT version @value{VERSION}, @value{UPDATED}. The retrace server provides a coredump analysis and backtrace generation service over a network using the HTTP protocol.
@end ifnottex

@menu
* Overview::
* HTTP interface::
* Retrace worker::
* Task cleanup::
* Package repository::
* Traffic and load estimation::
* Security::
* Future work::
@end menu

@node Overview
@chapter Overview

Analyzing a program crash from a coredump is a difficult task. The GNU Debugger (GDB), which is commonly used to analyze coredumps on free operating systems, expects that the system analyzing the coredump is identical to the system where the program crashed. Software updates often break this assumption even on the system where the crash occurred, making the coredump analyzable only with significant effort. Furthermore, older versions of software packages are often removed from software repositories, including the packages with debugging symbols, so a package with debugging symbols is often no longer available when a user needs to install it for coredump analysis. Packages with debugging symbols are large, requiring a lot of free space and causing problems when downloading them over an unreliable internet connection.

Retrace server solves these problems for Fedora 14+ and RHEL 6+ operating systems, and allows developers to analyze coredumps without having access to the machine where the crash occurred.

Retrace server is usually run as a service on a local network, or on the Internet. A user sends a coredump together with some additional information to a retrace server. The server reads the coredump and, depending on its contents, installs the software dependencies necessary to create a software environment which is, from GDB's point of view, identical to the environment where the crash happened. Then the server runs GDB to generate a backtrace from the coredump and provides it back to the user. Core dumps generated on i386 and x86_64 architectures are supported within a single x86_64 retrace server instance.

The retrace server consists of the following major parts:
@enumerate
@item an HTTP interface, consisting of a set of scripts handling communication with clients
@item a retrace worker, which processes the coredump, prepares the environment, and runs the debugger to generate a backtrace
@item a cleanup script, handling stalled retracing tasks and removing old data
@item a package repository, providing the application binaries, libraries, and debuginfo necessary for generating backtraces from coredumps
@end enumerate

@node HTTP interface
@chapter HTTP interface

@menu
* Creating a new task::
* Task status::
* Requesting a backtrace::
* Requesting a log::
* Limiting traffic::
@end menu

The client-server communication proceeds as follows:
@enumerate
@item Client uploads a coredump to a retrace server.
Retrace server creates a task for processing the coredump, and sends the task ID and task password in response to the client.
@item Client asks the server for the task status using the task ID and password. The server responds with the status information (task finished successfully, task failed, task is still running).
@item Client asks the server for the backtrace from a successfully finished task using the task ID and password. The server sends the backtrace in response.
@item Client asks the server for a log from the finished task using the task ID and password, and the server sends the log in response.
@end enumerate

The HTTP interface application is a set of scripts written in Python, using the @uref{http://www.python.org/dev/peps/pep-0333/, Python Web Server Gateway Interface} (WSGI) to interact with a web server. The only supported and tested configuration is the Apache HTTPD Server with @uref{http://code.google.com/p/modwsgi/, mod_wsgi}.

Only secure (HTTPS) communication is allowed when communicating with a public instance of the retrace server, because coredumps and backtraces are private data. Users may decide to publish their backtraces in a bug tracker after reviewing them, but the retrace server doesn't do that. The server is supposed to use HTTP persistent connections to avoid frequent SSL renegotiations.

@node Creating a new task
@section Creating a new task

A client might create a new task by sending an HTTP request to the @indicateurl{https://server/create} URL, and providing an archive as the request content. The archive contains crash data files. The crash data files are a subset of some local @file{/var/spool/abrt/ccpp-time-pid} directory contents, so the client only needs to pack and upload them.

The server supports uncompressed tar archives, and tar archives compressed with gzip and xz. Uncompressed archives are the most efficient way for local network delivery, and gzip can be used there as well because of its good compression speed. The xz compression file format is well suited for a public server setup (slow network), as it provides a good compression ratio, which is important for compressing large coredumps, and it provides reasonable compression/decompression speed and memory consumption. See @ref{Traffic and load estimation} for the measurements. The @uref{http://tukaani.org/xz/, XZ Utils} implementation with compression level 2 is used to compress the data.

The HTTP request for a new task must use the POST method. It must contain proper @var{Content-Length} and @var{Content-Type} fields. If the method is not POST, the server returns the @code{405 Method Not Allowed} HTTP error code. If the @var{Content-Length} field is missing, the server returns the @code{411 Length Required} HTTP error code. If a @var{Content-Type} other than @samp{application/x-tar}, @samp{application/x-gzip}, @samp{application/x-xz} is used, the server returns the @code{415 Unsupported Media Type} HTTP error code. If the @var{Content-Length} value is greater than the limit set by the @var{MaxPackedSize} option in the server configuration file (50 MB by default), or the real HTTP request size gets larger than the limit + 10 KB for headers, then the server returns the @code{413 Request Entity Too Large} HTTP error code, and provides an explanation, including the limit, in the response body. The limit is changeable in the server configuration file.
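For illustration, a minimal client-side sketch of such a request, written in Python 3 with only the standard library, is shown below. The server name @code{retrace.example.com} and the archive path are placeholders, and real clients (for example the ABRT tools) may construct the request differently.

@verbatim
#!/usr/bin/python3
# Hedged sketch: upload an xz-compressed crash archive to a retrace server.
# The host name and archive path are placeholders, not part of the manual.
import http.client

archive_path = "crash-data.tar.xz"   # packed subset of /var/spool/abrt/ccpp-time-pid

with open(archive_path, "rb") as archive:
    data = archive.read()

connection = http.client.HTTPSConnection("retrace.example.com")
connection.request("POST", "/create", body=data,
                   headers={"Content-Type": "application/x-xz",
                            "Content-Length": str(len(data))})
response = connection.getresponse()

if response.status == 201:
    print("Task id:      ", response.getheader("X-Task-Id"))
    print("Task password:", response.getheader("X-Task-Password"))
else:
    # 405, 411, 413, 415, 503, or 507 as described in this chapter
    print("Upload failed:", response.status, response.read().decode())
connection.close()
@end verbatim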
If unpacking the archive would result in the free disk space in the @file{/var/spool/abrt-retrace} directory dropping under a certain limit, the server returns the @code{507 Insufficient Storage} HTTP error code. The limit is specified by the @var{MinStorageLeft} option in the server configuration file, and it is set to 1024 MB by default.

If the data from the received archive would take more than 1024 MB of disk space when uncompressed, the server returns the @code{413 Request Entity Too Large} HTTP error code, and provides an explanation, including the limit, in the response body. The size limit is changeable by the @var{MaxUnpackedSize} option in the server configuration file. It can be set quite high because coredumps, which take most of the disk space, are stored on the server only temporarily, until the backtrace is generated. When the backtrace is generated, the coredump is deleted by @command{abrt-retrace-worker}, so most of the disk space is released. The uncompressed data size for xz archives is obtained by calling @code{xz --list file.tar.xz}. The @option{--list} option has been implemented only recently, so updating @command{xz} on your server might be necessary. Likewise, the uncompressed data size for gzip archives is obtained by calling @code{gzip --list file.tar.gz}.

If an upload from a client succeeds, the server creates a new directory @file{/var/spool/abrt-retrace/@var{id}} and extracts the received archive into it. Then it checks that the directory contains all the required files and checks their sizes, and sends an HTTP response. After that it spawns a subprocess with @command{abrt-retrace-worker} on that directory.

The following files from the local crash directory are required to be present in the archive: @file{coredump}, @file{executable}, @file{package}. If one or more of these files are not present in the archive, or some other file is present in the archive, the server returns the @code{403 Forbidden} HTTP error code.

If the file check succeeds, the server HTTP response has the @code{201 Created} HTTP code. The response includes the following HTTP header fields:
@itemize
@item @var{X-Task-Id} containing a new server-unique numerical task id
@item @var{X-Task-Password} containing a newly generated password, required to access the result
@end itemize

The @var{X-Task-Password} is a random alphanumeric (@samp{[a-zA-Z0-9]}) sequence 22 characters long. The password is stored in the @file{/var/spool/abrt-retrace/@var{id}/password} file, and passwords sent by a client in subsequent requests are verified by comparing them with this file. The task id is intentionally not used as a password, because it is desirable to keep the id readable and memorable for humans. Password-like ids would become a drawback when a user authentication mechanism is added and server-generated passwords are no longer necessary.

@node Task status
@section Task status

A client might request a task status by sending an HTTP GET request to the @indicateurl{https://someserver/@var{id}} URL, where @var{id} is the numerical task id returned in the @var{X-Task-Id} field by @indicateurl{https://someserver/create}. If the @var{id} is not in a valid format, or the task @var{id} does not exist, the server returns the @code{404 Not Found} HTTP error code.

The client request must contain the @var{X-Task-Password} field, and its content must match the password stored in the @file{/var/spool/abrt-retrace/@var{id}/password} file. If the password is not valid, the server returns the @code{403 Forbidden} HTTP error code.
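The same id and password checks are also used by the backtrace and log requests described below. A minimal sketch of how such a check could be implemented on the server side follows; the helper function is hypothetical, and the real WSGI scripts may structure this differently.

@verbatim
#!/usr/bin/python3
# Hedged sketch of the task id/password check described above; the real
# HTTP interface scripts may implement it differently.
import os

SPOOL_DIR = "/var/spool/abrt-retrace"

def check_task_access(task_id, password):
    """Return the HTTP status line the server should answer with."""
    # The task id must be a number and the task directory must exist.
    if not task_id.isdigit() or not os.path.isdir(os.path.join(SPOOL_DIR, task_id)):
        return "404 Not Found"
    # The password from the X-Task-Password header must match the stored one.
    with open(os.path.join(SPOOL_DIR, task_id, "password")) as password_file:
        if password_file.read().strip() != password:
            return "403 Forbidden"
    return "200 OK"
@end verbatim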
If the checks pass, the server returns the @code{200 OK} HTTP code, and includes a field @var{X-Task-Status} containing one of the following values: @samp{FINISHED_SUCCESS}, @samp{FINISHED_FAILURE}, @samp{PENDING}.

The field contains @samp{FINISHED_SUCCESS} if the file @file{/var/spool/abrt-retrace/@var{id}/backtrace} exists. The client might then get the backtrace from the @indicateurl{https://someserver/@var{id}/backtrace} URL. The log might be obtained from the @indicateurl{https://someserver/@var{id}/log} URL, and it might contain warnings about some missing debuginfos etc.

The field contains @samp{FINISHED_FAILURE} if the file @file{/var/spool/abrt-retrace/@var{id}/backtrace} does not exist and the file @file{/var/spool/abrt-retrace/@var{id}/retrace-log} exists. The @file{retrace-log} file, containing error messages, can be downloaded by the client from the @indicateurl{https://someserver/@var{id}/log} URL.

The field contains @samp{PENDING} if neither file exists. The client should ask again after 10 seconds or later.

@node Requesting a backtrace
@section Requesting a backtrace

A client might request a backtrace by sending an HTTP GET request to the @indicateurl{https://someserver/@var{id}/backtrace} URL, where @var{id} is the numerical task id returned in the @var{X-Task-Id} field by @indicateurl{https://someserver/create}. If the @var{id} is not in a valid format, or the task @var{id} does not exist, the server returns the @code{404 Not Found} HTTP error code.

The client request must contain the @var{X-Task-Password} field, and its content must match the password stored in the @file{/var/spool/abrt-retrace/@var{id}/password} file. If the password is not valid, the server returns the @code{403 Forbidden} HTTP error code.

If the file @file{/var/spool/abrt-retrace/@var{id}/backtrace} does not exist, the server returns the @code{404 Not Found} HTTP error code. Otherwise it returns the file contents, and the @var{Content-Type} header is set to @samp{text/plain}.

@node Requesting a log
@section Requesting a log

A client might request a task log by sending an HTTP GET request to the @indicateurl{https://someserver/@var{id}/log} URL, where @var{id} is the numerical task id returned in the @var{X-Task-Id} field by @indicateurl{https://someserver/create}. If the @var{id} is not in a valid format, or the task @var{id} does not exist, the server returns the @code{404 Not Found} HTTP error code.

The client request must contain the @var{X-Task-Password} field, and its content must match the password stored in the @file{/var/spool/abrt-retrace/@var{id}/password} file. If the password is not valid, the server returns the @code{403 Forbidden} HTTP error code.

If the file @file{/var/spool/abrt-retrace/@var{id}/retrace-log} does not exist, the server returns the @code{404 Not Found} HTTP error code. Otherwise it returns the file contents, and the @var{Content-Type} header is set to @samp{text/plain}.

@node Limiting traffic
@section Limiting traffic

The maximum number of simultaneously running tasks is limited to 5 by the server. The limit is changeable by the @var{MaxParallelTasks} option in the server configuration file. If a new request comes when the server is fully occupied, the server returns the @code{503 Service Unavailable} HTTP error code. The archive extraction, chroot preparation, and GDB analysis are mostly limited by the hard drive size and speed.
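To put the whole chapter together, here is a hedged Python 3 sketch of a client that polls the task status and then downloads the backtrace or the log; the server name, task id, and password are placeholders standing in for values from an earlier @code{/create} response.

@verbatim
#!/usr/bin/python3
# Hedged sketch: poll a retrace task and fetch the backtrace or the log.
# Server name, task id, and password below are placeholders.
import http.client
import time

SERVER = "retrace.example.com"
TASK_ID = "123"                          # X-Task-Id from the /create response
PASSWORD = "0123456789abcdefABCDEF"      # X-Task-Password from the /create response

def get(path):
    connection = http.client.HTTPSConnection(SERVER)
    connection.request("GET", path, headers={"X-Task-Password": PASSWORD})
    response = connection.getresponse()
    body = response.read()
    connection.close()
    return response, body

while True:
    response, _ = get("/" + TASK_ID)
    status = response.getheader("X-Task-Status")
    if status != "PENDING":
        break
    time.sleep(10)                       # ask again after 10 seconds or later

if status == "FINISHED_SUCCESS":
    _, backtrace = get("/" + TASK_ID + "/backtrace")
    print(backtrace.decode())
else:
    _, log = get("/" + TASK_ID + "/log")
    print(log.decode())
@end verbatim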
@node Retrace worker
@chapter Retrace worker

The retrace worker is a program (usually residing in @command{/usr/bin/abrt-retrace-worker}) which:
@enumerate
@item takes a task id as a parameter and uses it to locate the task directory containing a coredump
@item determines from the coredump which packages need to be installed
@item installs the packages in a newly created chroot environment together with @command{gdb}
@item copies the coredump to the chroot environment
@item runs @command{gdb} from inside the environment to generate a backtrace from the coredump
@item copies the resulting backtrace from the environment to the task directory
@end enumerate

The tasks reside in @file{/var/spool/abrt-retrace/@var{taskid}} directories.

To determine which packages need to be installed, @command{abrt-retrace-worker} runs the @command{coredump2packages} tool. The tool reads build-ids from the coredump, and tries to find the best set of packages (epoch, name, version, release) matching the build-ids. Local yum repositories are used as the source of packages.

GDB's requirements are strict, and this is the reason why proper backtraces cannot be directly and reliably generated on systems whose software has been updated:
@itemize
@item The exact binary which crashed needs to be available to GDB.
@item All libraries which are linked to the binary need to be available in the exact same versions from the time of the crash.
@item The binary plugins loaded by the binary or libraries via @code{dlopen} need to be present in the proper versions.
@item The files containing the debugging symbols for the binary and libraries (build-ids are used to find the pairs) need to be available to GDB.
@end itemize

The chroot environments are created and managed by @command{mock}, and they reside in @file{/var/lib/mock/@var{taskid}}. The retrace worker generates a mock configuration file and then invokes @command{mock} to create the chroot, and to run programs from inside the chroot. The chroot environment is populated by installing packages using @command{yum}. Package installation cannot be avoided, as GDB expects to operate on an installed system, and on crashes from that system. GDB uses plugins written in Python, which are shipped with packages (for example, see @command{rpm -ql libstdc++}).

Coredumps might be affected by @command{prelink}, which is used on Fedora to speed up dynamic linking by caching its results directly in binaries. The system installed by @command{mock} for the purpose of retracing doesn't use @command{prelink}, so the binaries differ between the system of origin and the mock environment. It has been tested that this is not an issue, but if some issue @uref{http://sourceware.org/ml/gdb/2009-05/msg00175.html, occurs} (GDB fails to work with a binary even if it's the right one), a bug should be filed on @code{prelink}, as its operation should not affect the area GDB operates on.

No special care is taken to avoid the possibility that GDB will not run with the set of packages (fixed versions) as provided by the coredump. It is expected that any combination of packages a user might use in a released system satisfies the needs of some version of GDB. Yum selects the newest possible version which has its requirements satisfied.
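As a rough illustration of the steps listed at the beginning of this chapter, the sketch below drives @command{mock} and @command{gdb} from Python. The configuration directory, the package list, the executable path, and the paths inside the chroot are assumptions for illustration only; the real @command{abrt-retrace-worker} is organized differently and derives these values from the task data.

@verbatim
#!/usr/bin/python3
# Hedged sketch of the retrace worker flow; paths, the mock configuration
# name, and the package list are placeholders, not the real worker code.
import subprocess

task_dir = "/var/spool/abrt-retrace/123"        # example task directory
packages = ["gdb", "example-package", "example-package-debuginfo"]  # from coredump2packages

def mock(*args):
    # Use the mock configuration generated for this task (assumed location).
    subprocess.check_call(["mock", "--configdir", task_dir, "-r", "retrace"] + list(args))

mock("--init")                                   # create the chroot
mock("--install", *packages)                     # install the binaries, debuginfo, and gdb
mock("--copyin", task_dir + "/coredump", "/tmp/coredump")    # copy the coredump in

# Run gdb inside the chroot against the crashed binary (placeholder path).
mock("--shell",
     "gdb --batch -ex 'thread apply all backtrace full' "
     "/usr/bin/example-binary /tmp/coredump > /tmp/backtrace")

mock("--copyout", "/tmp/backtrace", task_dir + "/backtrace")  # copy the result back
@end verbatim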
@node Task cleanup
@chapter Task cleanup

It is necessary to watch and limit the resource usage of tasks for a retrace server to remain operational. This is performed by the @command{abrt-retrace-cleanup} tool. The server administrator is expected to set up @command{cron} to run the tool every hour.

Tasks that were created more than 120 hours (5 days) ago are deleted. The limit can be changed by the @var{DeleteTaskAfter} option in the server configuration file. Coredumps are deleted when the retrace process is finished, and only backtraces, logs, and configuration remain available for every task until the cleanup. The @command{abrt-retrace-cleanup} tool checks the creation times and deletes the expired directories in @file{/var/spool/abrt-retrace/}. Tasks running for more than 1 hour are terminated and removed from the system. Tasks for which @command{abrt-retrace-worker} crashed for some reason without marking the task as finished are also removed.

@node Package repository
@chapter Package repository

Retrace server is able to support every Fedora release with all packages that ever made it to the updates and updates-testing repositories. In order to provide all those packages, a local repository needs to be maintained for every supported operating system. A repository with Fedora packages must be maintained locally on the server to provide good performance and to provide data from older packages already removed from the official repositories.

Retrace server contains a tool, @command{abrt-retrace-reposync}, which scans Fedora servers for new packages and downloads them so they are immediately available. Older versions of packages are regularly deleted from the updates and updates-testing repositories. Retrace server supports older versions of packages, as this is one of the major pain points that the retrace server is supposed to solve. @command{abrt-retrace-reposync} downloads packages from the Fedora repositories and does not delete older versions of the packages.

The retrace server administrator is supposed to call this script using @command{cron} approximately every 6 hours. The script uses @command{rsync} to get the packages and @command{createrepo} to generate repository metadata. The packages are downloaded to a local repository in @file{/var/cache/abrt-retrace/}. The location can be changed via the @var{RepoDir} option in the server configuration file.

@node Traffic and load estimation
@chapter Traffic and load estimation

2500 bugs are reported from ABRT every month. Approximately 7.3% of these are Python exceptions, which don't need a retrace server. That means that roughly 2315 bugs need a retrace server, which is 77 bugs per day, or 3.3 bugs every hour on average. Occasional spikes might be much higher (imagine a user who decided to report all 8 of his crashes from last month). We should probably not try to predict whether the monthly bug count goes up or down. New, untested versions of software are added to Fedora, but on the other hand most software matures and becomes less crashy. So let's assume that the bug count stays approximately the same.
Test crashes (see why we use @code{xz -2} to compress coredumps):
@itemize
@item firefox with 7 tabs (random pages opened), coredump size 172 MB
@itemize
@item xz compression
@itemize
@item compression level 6 (default): compression took 32.5 sec, compressed size 5.4 MB, decompression took 2.7 sec
@item compression level 3: compression took 23.4 sec, compressed size 5.6 MB, decompression took 1.6 sec
@item compression level 2: compression took 6.8 sec, compressed size 6.1 MB, decompression took 3.7 sec
@item compression level 1: compression took 5.1 sec, compressed size 6.4 MB, decompression took 2.4 sec
@end itemize
@item gzip compression
@itemize
@item compression level 9 (highest): compression took 7.6 sec, compressed size 7.9 MB, decompression took 1.5 sec
@item compression level 6 (default): compression took 2.6 sec, compressed size 8 MB, decompression took 2.3 sec
@item compression level 3: compression took 1.7 sec, compressed size 8.9 MB, decompression took 1.7 sec
@end itemize
@end itemize
@item thunderbird with thousands of emails opened, coredump size 218 MB
@itemize
@item xz compression
@itemize
@item compression level 6 (default): compression took 60 sec, compressed size 12 MB, decompression took 3.6 sec
@item compression level 3: compression took 42 sec, compressed size 13 MB, decompression took 3.0 sec
@item compression level 2: compression took 10 sec, compressed size 14 MB, decompression took 3.0 sec
@item compression level 1: compression took 8.3 sec, compressed size 15 MB, decompression took 3.2 sec
@end itemize
@item gzip compression
@itemize
@item compression level 9 (highest): compression took 14.9 sec, compressed size 18 MB, decompression took 2.4 sec
@item compression level 6 (default): compression took 4.4 sec, compressed size 18 MB, decompression took 2.2 sec
@item compression level 3: compression took 2.7 sec, compressed size 20 MB, decompression took 3 sec
@end itemize
@end itemize
@item evince with 2 PDFs (1 and 42 pages) opened, coredump size 73 MB
@itemize
@item xz compression
@itemize
@item compression level 2: compression took 2.9 sec, compressed size 3.6 MB, decompression took 0.7 sec
@item compression level 1: compression took 2.5 sec, compressed size 3.9 MB, decompression took 0.7 sec
@end itemize
@end itemize
@item OpenOffice.org Impress with a 25-page presentation, coredump size 116 MB
@itemize
@item xz compression
@itemize
@item compression level 2: compression took 7.1 sec, compressed size 12 MB, decompression took 2.3 sec
@end itemize
@end itemize
@end itemize

So let's imagine there are some users who want to report their crashes at approximately the same time. Here is what the retrace server must handle:
@enumerate
@item 2 OpenOffice crashes
@item 2 evince crashes
@item 2 thunderbird crashes
@item 2 firefox crashes
@end enumerate

We will use the xz archiver with compression level 2 on ABRT's side to compress the coredumps. So the users spend 53.6 seconds in total packaging the coredumps. The packaged coredumps total 71.4 MB, and the retrace server must receive that data. The server unpacks the coredumps (perhaps at the same time), so they need 1158 MB of disk space on the server. The decompression will take 19.4 seconds.

Several hundred megabytes will be needed to install all the required packages and debuginfos for every chroot (8 chroots of 1 GB each = 8 GB, but this seems like an extreme, maximal case). Some space will be saved by using a debuginfofs.

Note that most applications are not as heavyweight as OpenOffice and Firefox.
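A short back-of-the-envelope check of the totals above, using the @code{xz -2} figures from the measurements; this only re-adds the numbers already listed and introduces no new data.

@verbatim
#!/usr/bin/python3
# Re-add the xz -2 measurements above for the scenario of 2 crashes each:
# (coredump MB, compression s, compressed MB, decompression s)
measurements = {
    "firefox":     (172, 6.8, 6.1, 3.7),
    "thunderbird": (218, 10.0, 14.0, 3.0),
    "evince":      (73, 2.9, 3.6, 0.7),
    "openoffice":  (116, 7.1, 12.0, 2.3),
}
count = 2  # two crashes of each application

def total(index):
    return round(count * sum(m[index] for m in measurements.values()), 1)

print("compression time:", total(1), "s")    # 53.6 s
print("uploaded data:   ", total(2), "MB")   # 71.4 MB
print("unpacked data:   ", total(0), "MB")   # 1158 MB
print("decompression:   ", total(3), "s")    # 19.4 s
@end verbatim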
@node Security
@chapter Security

The retrace server communicates with two other entities: it accepts coredumps from users, and it downloads debuginfos and packages from distribution repositories.

@menu
* Clients::
* Packages and debuginfo::
@end menu

General protection from GDB flaws and malicious data is provided by the chroot. GDB accesses the debuginfos, packages, and the coredump from within the chroot, and is unable to access the retrace server's environment. We should consider setting a disk quota for every chroot directory, and limiting GDB's access to resources using cgroups. An SELinux policy exists for both the retrace server's HTTP interface and the retrace worker.

@node Clients
@section Clients

The clients, which use the retrace server and send coredumps to it, must fully trust the retrace server administrator. The server administrator must not try to get sensitive data from client coredumps. That seems to be a major bottleneck of the retrace server idea. However, users of an operating system already trust the OS provider in various important matters. So when the retrace server is operated by the operating system provider, that might be acceptable to users.

We cannot avoid sending clients' coredumps to the retrace server if we want to generate quality backtraces containing the values of variables. Minidumps lower the quality of the resulting backtraces, while not improving user security.

Can the retrace server trust clients? We must know what a malicious client can achieve by crafting a nonstandard coredump that will be processed by the server's GDB. We should ask GDB experts about this.

Another question is whether we can allow users to provide some packages and debuginfo together with a coredump. That might be useful for users who run the operating system with only some minor modifications and still want to use the retrace server. So they send a coredump together with a few nonstandard packages. The retrace server uses the nonstandard packages together with the OS packages to generate the backtrace. Is it safe? We must know what a malicious client can achieve by crafting a special binary and debuginfo that will be processed by the server's GDB.

As for an attacker trying to steal users' backtraces from the retrace server, the passwords protecting the backtraces in the @var{X-Task-Password} header are random alphanumeric (@samp{[a-zA-Z0-9]}) sequences 22 characters long. A sequence of 22 alphanumeric characters corresponds to more than a 128-bit password, because @samp{[a-zA-Z0-9]} is 62 characters, and @math{2^{128}} < @math{62^{22}}. The source of randomness is @file{/dev/urandom}.
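A minimal sketch of how such a password can be generated, together with a check of the entropy claim above; @code{random.SystemRandom} draws from the kernel's randomness source (@file{/dev/urandom} on Linux), though the real server code may do this differently.

@verbatim
#!/usr/bin/python3
# Hedged sketch: generate a 22-character alphanumeric task password and
# verify that 22 such characters give more than 128 bits of entropy.
import random
import string

ALPHABET = string.ascii_letters + string.digits   # [a-zA-Z0-9], 62 characters
rng = random.SystemRandom()                       # backed by /dev/urandom on Linux

password = "".join(rng.choice(ALPHABET) for _ in range(22))
print(password)

assert len(ALPHABET) == 62
assert 62 ** 22 > 2 ** 128                        # the claim from the text above
@end verbatim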
@node Packages and debuginfo
@section Packages and debuginfo

We can safely download packages and debuginfo from the distribution, as the packages are signed by the distribution, and the package origin can be verified. When the debuginfo server is done, the retrace server can safely use it, as the data will also be signed.

@node Future work
@chapter Future work

@section Coredump stripping

Jan Kratochvil: With my test of an OpenOffice.org presentation, the kernel core file has 181 MB, and @code{xz -2} of it has 65 MB. According to @code{set target debug 1}, GDB reads only 131406 bytes of it (incl. the NOTE segment).

@section Supporting other architectures

Three approaches:
@itemize
@item Use GDB builds with various target architectures: gdb-i386, gdb-ppc64, gdb-s390.
@item Run @uref{http://wiki.qemu.org/download/qemu-doc.html#QEMU-User-space-emulator, QEMU user space emulation} on the server.
@item Run @code{abrt-retrace-worker} on a machine with the right architecture. Introduce worker machines and tasks, similarly to Koji.
@end itemize

@section Use gdbserver instead of uploading the whole coredump

GDB's gdbserver cannot process coredumps, but Jan Kratochvil's can:
@verbatim
git://git.fedorahosted.org/git/elfutils.git
branch: jankratochvil/gdbserver
src/gdbserver.c
* Currently threading is not supported.
* Currently only x86_64 is supported (the NOTE registers layout).
@end verbatim

@section User management for the HTTP interface

Multiple authentication sources (x509 for RHEL).

@section Make all files except the coredump optional on the input

Make the @file{architecture}, @file{release}, and @file{packages} files, which currently must be included in the archive when creating a task, optional. Allow uploading a coredump without involving tar: just @file{coredump}, @file{coredump.gz}, or @file{coredump.xz}.

@section Handle non-standard packages (provided by user)

This would make the retrace server very vulnerable to attacks, so it can never be enabled in a public instance.

@section Support vmcores

See @uref{https://fedorahosted.org/cas/, Core analysis system}, its features etc.

@section Do not refuse new tasks on a fully loaded server

Consider using @uref{http://git.fedorahosted.org/git/?p=kobo.git, kobo} for task management and worker handling (master/slave architecture).

@section Support synchronous operation

The client sends a coredump and keeps receiving the server response message. The server's HTTP response body is generated and sent gradually as the task is performed. The client can choose to stop receiving the response body after getting all the headers, and ask the server for the status and backtrace asynchronously.

The server forwards the output of @command{abrt-retrace-worker} (its stdout and stderr) to the response body. In addition, a line with the task status is added to the body in the form @code{X-Task-Status: PENDING} every 5 seconds. When the worker process ends, either a @samp{FINISHED_SUCCESS} or a @samp{FINISHED_FAILURE} status line is sent. If it's @samp{FINISHED_SUCCESS}, the backtrace is attached after this line. Then the response body is closed.

@section Provide task estimation time

The response to the @code{/create} action should contain a header, @var{X-Task-Est-Time}, containing the number of seconds the server estimates it will take to generate the backtrace.

The algorithm for the @var{X-Task-Est-Time} estimation should take the previous analyses of coredumps with the same corresponding package name into account. The server should store a simple history in an SQLite database to know how long it takes to generate a backtrace for a certain package.
It could be as simple as this:
@itemize
@item initialization step one: @code{CREATE TABLE package_time (id INTEGER PRIMARY KEY AUTOINCREMENT, package, release, time)}; we need the @var{id} for the database cleanup - to know the insertion order of rows, so the @code{AUTOINCREMENT} is important here; the @var{package} is the package name without the version and release numbers, the @var{release} column stores the operating system, and the @var{time} is the number of seconds it took to generate the backtrace
@item initialization step two: @code{CREATE INDEX package_release ON package_time (package, release)}; we compute the time only for a single package on a single supported OS release per query, so it makes sense to create an index to speed it up
@item when a task is finished: @code{INSERT INTO package_time (package, release, time) VALUES ('??', '??', '??')}
@item to get the average time: @code{SELECT AVG(time) FROM package_time WHERE package == '??' AND release == '??'}; the arithmetic mean seems to be sufficient here
@end itemize

So the server knows that crashes from an OpenOffice.org package take 5 minutes to process on average, and it can return the value 300 (seconds) in the field. The client does not waste time asking about that task every 20 seconds, but the first status request comes after 300 seconds. And even when the package changes (rebases etc.), the database provides good estimations after some time anyway (the next section describes how the data are pruned).

@section Keep the database with statistics small

The database containing packages and processing times should also be regularly pruned to remain small and provide data quickly. The cleanup script should delete some rows for packages with too many entries:
@enumerate
@item get a list of packages from the database: @code{SELECT DISTINCT package, release FROM package_time}
@item for every package, get the row count: @code{SELECT COUNT(*) FROM package_time WHERE package == '??' AND release == '??'}
@item for every package with a row count larger than 100, some rows must be removed so that only the newest 100 rows remain in the database:
@itemize
@item to get the lowest row id that should remain, execute @code{SELECT id FROM package_time WHERE package == '??' AND release == '??' ORDER BY id LIMIT 1 OFFSET ??}, where the @code{OFFSET} is the total number of rows for that single package minus 100
@item then all the older rows can be deleted by executing @code{DELETE FROM package_time WHERE package == '??' AND release == '??' AND id < ??}
@end itemize
@end enumerate

@section Support Fedora Rawhide

When @command{abrt-retrace-reposync} is used to sync with the Rawhide repository, unneeded packages (those for which a newer version exists) must be removed after residing for one week alongside the newer package in the same repository.

@bye