\input texinfo
@c abrt-retrace-server.texi - Retrace Server Documentation
@c
@c .texi extension is recommended in GNU Automake manual
@setfilename abrt-retrace-server.info
@include version.texi

@settitle Retrace server for ABRT @value{VERSION} Manual

@dircategory Retrace server
@direntry
* Retrace server: (retrace-server).  Remote coredump analysis via HTTP.
@end direntry

@titlepage
@title Retrace server
@subtitle for ABRT version @value{VERSION}, @value{UPDATED}
@author Karel Klic (@email{kklic@@redhat.com})
@page
@vskip 0pt plus 1filll
@end titlepage

@contents

@ifnottex
@node Top
@top Retrace server

This manual is for retrace server for ABRT version @value{VERSION},
@value{UPDATED}.  The retrace server provides a coredump analysis and
backtrace generation service over a network using the HTTP protocol.
@end ifnottex

@menu
* Overview::
* HTTP interface::
* Retrace worker::
* Task cleanup::
* Package repository::
* Traffic and load estimation::
* Security::
* Future work::
@end menu

@node Overview
@chapter Overview

Analyzing a program crash from a coredump is a difficult task. The GNU
Debugger (GDB), which is commonly used to analyze coredumps on free
operating systems, expects that the system analyzing the coredump is
identical to the system where the program crashed. Software updates
often break this assumption even on the system where the crash occurred,
making the coredump analyzable only with significant
effort. Furthermore, older versions of software packages are often
removed from software repositories, including the packages with
debugging symbols, so a package with debugging symbols is often no
longer available when a user needs to install it for coredump
analysis. Packages with debugging symbols are also large, requiring a
lot of free space and causing problems when downloaded over an
unreliable internet connection.

Retrace server solves these problems for Fedora 14+ and RHEL 6+
operating systems, and allows developers to analyze coredumps without
having access to the machine where the crash occurred.

Retrace server is usually run as a service on a local network, or on
the Internet. A user sends a coredump together with some additional
information to a retrace server. The server reads the coredump and,
depending on its contents, installs the software dependencies necessary
to create a software environment which is, from the GDB point of view,
identical to the environment where the crash happened. Then the server
runs GDB to generate a backtrace from the coredump and provides it back
to the user.

Core dumps generated on i386 and x86_64 architectures are supported
within a single x86_64 retrace server instance.

The retrace server consists of the following major parts:
@enumerate
@item
an HTTP interface, consisting of a set of scripts handling communication
with clients
@item
a retrace worker, doing the coredump processing, environment
preparation, and running the debugger to generate a backtrace
@item
a cleanup script, handling stalled retracing tasks and removing old data
@item
a package repository, providing the application binaries, libraries, and
debuginfo necessary for generating backtraces from coredumps
@end enumerate

@node HTTP interface
@chapter HTTP interface

@menu
* Creating a new task::
* Task status::
* Requesting a backtrace::
* Requesting a log::
* Limiting traffic::
@end menu

The client-server communication proceeds as follows:
@enumerate
@item
Client uploads a coredump to a retrace server. Retrace server creates a
task for processing the coredump, and sends the task ID and task
password in response to the client.
@item
Client asks server for the task status using the task ID and password.
Server responds with the status information (task finished successfully,
task failed, task is still running).
@item
Client asks server for the backtrace from a successfully finished task
using the task ID and password. Server sends the backtrace in response.
@item
Client asks server for a log from the finished task using the task ID
and password, and server sends the log in response.
@end enumerate
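
From the client's point of view, the exchange can be sketched as
follows (a minimal Python 3 illustration only, assuming a prepared
@file{crash.tar.xz} archive and a hypothetical server URL; the real
ABRT client is implemented elsewhere):

@verbatim
import time
import urllib.request

SERVER = "https://someserver"   # hypothetical retrace server instance

# 1. Upload the packed crash data; remember the task credentials.
with open("crash.tar.xz", "rb") as f:
    request = urllib.request.Request(
        SERVER + "/create", data=f.read(),
        headers={"Content-Type": "application/x-xz"})
response = urllib.request.urlopen(request)
task_id = response.headers["X-Task-Id"]
auth = {"X-Task-Password": response.headers["X-Task-Password"]}

# 2. Poll the task status until the worker finishes.
while True:
    request = urllib.request.Request("%s/%s" % (SERVER, task_id),
                                     headers=auth)
    status = urllib.request.urlopen(request).headers["X-Task-Status"]
    if status != "PENDING":
        break
    time.sleep(10)  # the server recommends polling every 10 s or more

# 3. and 4. Fetch the backtrace on success, the log otherwise.
suffix = "/backtrace" if status == "FINISHED_SUCCESS" else "/log"
request = urllib.request.Request(SERVER + "/" + task_id + suffix,
                                 headers=auth)
print(urllib.request.urlopen(request).read().decode())
@end verbatim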

The HTTP interface application is a set of scripts written in Python,
using the @uref{http://www.python.org/dev/peps/pep-0333/, Python Web
Server Gateway Interface} (WSGI) to interact with a web server. The only
supported and tested configuration is the Apache HTTPD Server with
@uref{http://code.google.com/p/modwsgi/, mod_wsgi}.

Only secure (HTTPS) communication is allowed with a public instance of
the retrace server, because coredumps and backtraces are private
data. Users may decide to publish their backtraces in a bug tracker
after reviewing them, but the retrace server does not do that. The
server is supposed to use HTTP persistent connections to avoid frequent
SSL renegotiations.

@node Creating a new task
@section Creating a new task

A client might create a new task by sending an HTTP request to the
@indicateurl{https://server/create} URL, and providing an archive as the
request content. The archive contains crash data files. The crash data
files are a subset of the contents of some local
@file{/var/spool/abrt/ccpp-time-pid} directory, so the client only
needs to pack and upload them.

The server supports uncompressed tar archives, and tar archives
compressed with gzip and xz. Uncompressed archives are the most
efficient way for local network delivery, and gzip can be used there as
well because of its good compression speed.

The xz compression file format is well suited for public server setup
(slow network), as it provides good compression ratio, which is
important for compressing large coredumps, and it provides reasonable
compress/decompress speed and memory consumption. See @ref{Traffic and
load estimation} for the measurements. The @uref{http://tukaani.org/xz/,
XZ Utils} implementation with the compression level 2 is used to
compress the data.

The HTTP request for a new task must use the POST method, and it must
contain proper @var{Content-Length} and @var{Content-Type} fields. If
the method is not POST, the server returns the @code{405 Method Not
Allowed} HTTP error code. If the @var{Content-Length} field is missing,
the server returns the @code{411 Length Required} HTTP error code. If a
@var{Content-Type} other than @samp{application/x-tar},
@samp{application/x-gzip}, or @samp{application/x-xz} is used, the server
returns the @code{415 Unsupported Media Type} HTTP error code. If the
@var{Content-Length} value is greater than the limit set by the
@var{MaxPackedSize} option in the server configuration file (50 MB by
default), or the real HTTP request size gets larger than the limit + 10
KB for headers, then the server returns the @code{413 Request Entity Too
Large} HTTP error code and provides an explanation, including the
limit, in the response body.
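
The checks above map to a few lines of WSGI code. A minimal sketch
(the function name and structure are illustrative, not the actual
server code):

@verbatim
ACCEPTED_TYPES = ("application/x-tar", "application/x-gzip",
                  "application/x-xz")
MAX_PACKED_SIZE = 50 * 1024 * 1024    # MaxPackedSize, 50 MB by default

def check_create_request(environ):
    """Return an HTTP error status, or None when the request is valid."""
    if environ["REQUEST_METHOD"] != "POST":
        return "405 Method Not Allowed"
    length = environ.get("CONTENT_LENGTH")
    if not length:
        return "411 Length Required"
    if environ.get("CONTENT_TYPE") not in ACCEPTED_TYPES:
        return "415 Unsupported Media Type"
    if int(length) > MAX_PACKED_SIZE:
        # the raw request may exceed the limit by at most 10 KB of
        # headers before 413 is returned
        return "413 Request Entity Too Large"
    return None
@end verbatim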

If unpacking the archive would result in the free disk space of the
@file{/var/spool/abrt-retrace} directory dropping below a certain limit, the
server returns the @code{507 Insufficient Storage} HTTP error code. The
limit is specified by the @var{MinStorageLeft} option in the server
configuration file, and it is set to 1024 MB by default.

If the data from the received archive would take more than 1024 MB of
disk space when uncompressed, the server returns the @code{413 Request
Entity Too Large} HTTP error code, and provides an explanation,
including the limit, in the response body. The size limit is changeable
by the @var{MaxUnpackedSize} option in the server configuration file. It
can be set fairly high, because coredumps, which take up most of the
disk space, are stored on the server only temporarily, until the
backtrace is generated. Once the backtrace has been generated, the
coredump is deleted by @command{abrt-retrace-worker}, so most of the
disk space is released.

The uncompressed data size for xz archives is obtained by calling
@command{xz --list file.tar.xz}. The @option{--list} option has been
implemented only recently, so updating @command{xz} on your server might
be necessary. Likewise, the uncompressed data size for gzip archives is
obtained by calling @command{gzip --list file.tar.gz}.
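
A sketch of how this size check might be implemented (the
@option{--robot} flag, which makes the @command{xz} output
machine-readable, is an assumption added here for easier parsing):

@verbatim
import subprocess

def unpacked_size(path):
    """Return the uncompressed size in bytes of an xz or gzip archive."""
    if path.endswith(".xz"):
        out = subprocess.check_output(["xz", "--robot", "--list", path])
        for line in out.decode().splitlines():
            fields = line.split("\t")
            if fields[0] == "file":
                return int(fields[4])  # field 5: uncompressed size
    elif path.endswith(".gz"):
        # `gzip --list` prints a header line and one data line:
        #   compressed  uncompressed  ratio  uncompressed_name
        # (note: gzip stores the uncompressed size modulo 2^32)
        out = subprocess.check_output(["gzip", "--list", path])
        return int(out.decode().splitlines()[1].split()[1])
    raise ValueError("unsupported archive type: " + path)
@end verbatim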

If an upload from a client succeeds, the server creates a new directory
@file{/var/spool/abrt-retrace/@var{id}} and extracts the
received archive into it. Then it checks that the directory contains all
the required files, checks their sizes, and then sends an HTTP
response. After that, it spawns a subprocess with
@command{abrt-retrace-worker} on that directory.

The following files from the local crash directory are required to be
present in the archive: @file{coredump}, @file{executable},
@file{package}. If one or more of these files are missing from the
archive, or some other file is present in the archive, the server
returns the @code{403 Forbidden} HTTP error code.

If the file check succeeds, the server HTTP response has the @code{201
Created} HTTP code. The response includes the following HTTP header
fields:
@itemize
@item
@var{X-Task-Id} containing a new server-unique numerical
task id
@item
@var{X-Task-Password} containing a newly generated
password, required to access the result
@end itemize

The @var{X-Task-Password} is a random alphanumeric (@samp{[a-zA-Z0-9]})
sequence 22 characters long. The password is stored in the
@file{/var/spool/abrt-retrace/@var{id}/password} file, and passwords
sent by a client in subsequent requests are verified by comparing with
this file.
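
A sketch of how such a password can be generated
(@code{random.SystemRandom} reads from @file{/dev/urandom} on Linux,
matching the source of randomness described in the Security chapter):

@verbatim
import random
import string

ALPHABET = string.ascii_letters + string.digits  # [a-zA-Z0-9], 62 symbols

def generate_task_password(length=22):
    # 62**22 > 2**128, so 22 characters give more than 128 bits
    rng = random.SystemRandom()                  # backed by /dev/urandom
    return "".join(rng.choice(ALPHABET) for _ in range(length))
@end verbatim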

The task id is intentionally not used as a password, because it is
desirable to keep the id readable and memorable for
humans. Password-like ids would become a liability once a user
authentication mechanism is added and server-generated passwords are no
longer necessary.

@node Task status
@section Task status

A client might request a task status by sending an HTTP GET request to
the @indicateurl{https://someserver/@var{id}} URL, where @var{id} is the
numerical task id returned in the @var{X-Task-Id} field by
@indicateurl{https://someserver/create}. If the @var{id} is not in the
valid format, or the task @var{id} does not exist, the server returns
the @code{404 Not Found} HTTP error code.

The client request must contain the @var{X-Task-Password} field, and its
content must match the password stored in the
@file{/var/spool/abrt-retrace/@var{id}/password} file. If the password is
not valid, the server returns the @code{403 Forbidden} HTTP error code.

If the checks pass, the server returns the @code{200 OK} HTTP code, and
includes a field @var{X-Task-Status} containing one of the following
values: @samp{FINISHED_SUCCESS}, @samp{FINISHED_FAILURE},
@samp{PENDING}.

The field contains @samp{FINISHED_SUCCESS} if the file
@file{/var/spool/abrt-retrace/@var{id}/backtrace} exists. The client might
get the backtrace on the @indicateurl{https://someserver/@var{id}/backtrace}
URL. The log might be obtained from the
@indicateurl{https://someserver/@var{id}/log} URL, and it might contain
warnings about missing debuginfo etc.

The field contains @samp{FINISHED_FAILURE} if the file
@file{/var/spool/abrt-retrace/@var{id}/backtrace} does not exist and the file
@file{/var/spool/abrt-retrace/@var{id}/retrace-log} exists. The retrace-log
file containing error messages can be downloaded by the client from the
@indicateurl{https://someserver/@var{id}/log} URL.

The field contains @samp{PENDING} if neither file exists. The client
should ask again after 10 seconds or later.
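
The status logic reduces to two file-existence tests, for example:

@verbatim
import os

SPOOL = "/var/spool/abrt-retrace"

def task_status(task_id):
    task_dir = os.path.join(SPOOL, str(task_id))
    if os.path.isfile(os.path.join(task_dir, "backtrace")):
        return "FINISHED_SUCCESS"
    if os.path.isfile(os.path.join(task_dir, "retrace-log")):
        return "FINISHED_FAILURE"
    return "PENDING"
@end verbatim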

@node Requesting a backtrace
@section Requesting a backtrace

A client might request a backtrace by sending an HTTP GET request to the
@indicateurl{https://someserver/@var{id}/backtrace} URL, where @var{id}
is the numerical task id returned in the @var{X-Task-Id} field by
@indicateurl{https://someserver/create}. If the @var{id} is not in the
valid format, or the task @var{id} does not exist, the server returns
the @code{404 Not Found} HTTP error code.

The client request must contain the @var{X-Task-Password} field, and its
content must match the password stored in the
@file{/var/spool/abrt-retrace/@var{id}/password} file. If the password
is not valid, the server returns the @code{403 Forbidden} HTTP error
code.

If the file @file{/var/spool/abrt-retrace/@var{id}/backtrace} does not
exist, the server returns the @code{404 Not Found} HTTP error code.
Otherwise it returns the file contents, and the @var{Content-Type}
header is set to @samp{text/plain}.

@node Requesting a log
@section Requesting a log

A client might request a task log by sending an HTTP GET request to the
@indicateurl{https://someserver/@var{id}/log} URL, where @var{id} is the
numerical task id returned in the @var{X-Task-Id} field by
@indicateurl{https://someserver/create}. If the @var{id} is not in the
valid format, or the task @var{id} does not exist, the server returns
the @code{404 Not Found} HTTP error code.

The client request must contain the @var{X-Task-Password} field, and its
content must match the password stored in the
@file{/var/spool/abrt-retrace/@var{id}/password} file. If the password
is not valid, the server returns the @code{403 Forbidden} HTTP error
code.

If the file @file{/var/spool/abrt-retrace/@var{id}/retrace-log} does not
exist, the server returns the @code{404 Not Found} HTTP error code.
Otherwise it returns the file contents, and the @var{Content-Type}
header is set to @samp{text/plain}.

@node Limiting traffic
@section Limiting traffic

The maximum number of simultaneously running tasks is limited to 5 by
the server. The limit is changeable by the @var{MaxParallelTasks} option
in the server configuration file. If a new request comes when the server
is fully occupied, the server returns the @code{503 Service Unavailable}
HTTP error code.

The archive extraction, chroot preparation, and GDB analysis are
mostly limited by hard drive size and speed.

@node Retrace worker
@chapter Retrace worker

Retrace worker is a program (usually residing in
@command{/usr/bin/abrt-retrace-worker}), which:
@enumerate
@item
takes a task id as a parameter and locates the corresponding task
directory, which contains a coredump
@item
determines which packages need to be installed from the coredump
@item
installs the packages in a newly created chroot environment together
with @command{gdb}
@item
copies the coredump to the chroot environment
@item
runs @command{gdb} from inside the environment to generate a backtrace
from the coredump
@item
copies the resulting backtrace from the environment to the directory
@end enumerate

The tasks reside in @file{/var/spool/abrt-retrace/@var{taskid}}
directories.

To determine which packages need to be installed,
@command{abrt-retrace-worker} runs the @command{coredump2packages} tool.
The tool reads build-ids from the coredump, and tries to find the best
set of packages (epoch, name, version, release) matching the
build-ids. Local yum repositories are used as the source of
packages. GDB's requirements are strict, and this is the reason why
proper backtraces cannot be directly and reliably generated on systems
whose software has been updated:
@itemize
@item
The exact binary which crashed needs to be available to GDB.
@item
All libraries which are linked to the binary need to be available in the
same exact versions from the time of the crash.
@item
Binary plugins loaded by the binary or its libraries via @code{dlopen}
need to be present in the proper versions.
@item
The files containing the debugging symbols for the binary and libraries
(build-ids are used to find the pairs) need to be available to GDB.
@end itemize
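
To illustrate the build-id extraction step, here is a minimal sketch
assuming @command{eu-unstrip} from elfutils is available (the actual
@command{coredump2packages} implementation may differ):

@verbatim
import subprocess

def coredump_build_ids(coredump, executable):
    """Collect the build-ids of all modules mapped in the coredump."""
    out = subprocess.check_output(
        ["eu-unstrip", "-n", "--core=" + coredump, "-e", executable])
    build_ids = set()
    for line in out.decode().splitlines():
        # each line: ADDR+SIZE BUILDID[@ADDR] FILE DEBUGFILE MODULE
        build_id = line.split()[1].split("@")[0]
        if build_id != "-":               # "-" means the id is unknown
            build_ids.add(build_id)
    return build_ids
@end verbatim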

The chroot environments are created and managed by @command{mock}, and
they reside in @file{/var/lib/mock/@var{taskid}}. The retrace worker
generates a mock configuration file and then invokes @command{mock} to
create the chroot, and to run programs from inside the chroot.
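
A sketch of the kind of configuration the worker might generate (mock
configuration files are plain Python assigning to the
@code{config_opts} dictionary; all values below are illustrative):

@verbatim
# /etc/mock/<taskid>.cfg -- generated by the retrace worker
config_opts['root'] = 'taskid'         # chroot under /var/lib/mock/taskid
config_opts['target_arch'] = 'x86_64'
config_opts['chroot_setup_cmd'] = 'install gdb shadow-utils'
config_opts['yum.conf'] = """
[retrace-local]
name=Local retrace package repository
baseurl=file:///var/cache/abrt-retrace/
"""
@end verbatim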

The chroot environment is populated by installing packages using
@command{yum}. Package installation cannot be avoided, as GDB expects to
operate on an installed system, and on crashes from that system. GDB
uses plugins written in Python that are shipped with packages (see, for
example, @command{rpm -ql libstdc++}).

Coredumps might be affected by @command{prelink}, which is used on
Fedora to speed up dynamic linking by caching its results directly in
binaries. The system installed by @command{mock} for the purpose of
retracing doesn't use @command{prelink}, so the binaries differ between
the system of origin and the mock environment. Testing has shown that
this is not an issue, but should some issue
@uref{http://sourceware.org/ml/gdb/2009-05/msg00175.html, occur}
(GDB failing to work with a binary even though it is the right one), a
bug should be filed against @code{prelink}, as its operation should not
affect the area GDB operates on.

No special care is taken to ensure that GDB can run with the exact set
of package versions dictated by the coredump. It is expected that any
combination of packages a user might run on a released system satisfies
the requirements of some version of GDB. Yum selects the newest version
of GDB whose requirements are satisfied.

@node Task cleanup
@chapter Task cleanup

It is necessary to watch and limit the resource usage of tasks for a
retrace server to remain operational. This is performed by the
@command{abrt-retrace-cleanup} tool. The server administrator is
supposed to set up @command{cron} to run the tool every hour.

Tasks that were created more than 120 hours (5 days) ago are
deleted. The limit can be changed by the @var{DeleteTaskAfter} option in
the server configuration file. Coredumps are deleted when the retrace
process is finished, and only backtraces, logs, and configuration remain
available for every task until the cleanup. The
@command{abrt-retrace-cleanup} tool checks the creation time and deletes
the expired directories in @file{/var/spool/abrt-retrace/}.

Tasks running for more than 1 hour are terminated and removed from the
system. Tasks for which the @command{abrt-retrace-worker} crashed for
some reason without marking the task as finished are also removed.
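
A sketch of the age-based part of the cleanup (the directory
modification time stands in for the creation time here; the actual
tool may track creation differently):

@verbatim
import os
import shutil
import time

SPOOL = "/var/spool/abrt-retrace"
DELETE_TASK_AFTER = 120    # hours, as in the configuration file

def cleanup_old_tasks():
    now = time.time()
    for task in os.listdir(SPOOL):
        task_dir = os.path.join(SPOOL, task)
        age_hours = (now - os.path.getmtime(task_dir)) / 3600
        if age_hours > DELETE_TASK_AFTER:
            shutil.rmtree(task_dir)
@end verbatim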

@node Package repository
@chapter Package repository

Retrace server is able to support every Fedora release with all packages
that ever made it to the updates and updates-testing repositories. In
order to provide all those packages, a local repository needs to be
maintained for every supported operating system.

A repository with Fedora packages must be maintained locally on the
server to provide good performance and to provide data from older
packages already removed from the official repositories. Retrace server
contains a tool, @command{abrt-retrace-reposync}, which scans Fedora
servers for new packages and downloads them so they are immediately
available.

Older versions of packages are regularly deleted from the updates and
updates-testing repositories. Retrace server keeps supporting older
versions of packages, as this is one of the major pain points the
retrace server is supposed to solve.

The @command{abrt-retrace-reposync} tool downloads packages from the
Fedora repositories, and it does not delete older versions of the
packages. The retrace server administrator is supposed to run this
script from cron approximately every 6 hours. The script uses
@command{rsync} to fetch the packages and @command{createrepo} to
generate repository metadata.

The packages are downloaded to a local repository in
@file{/var/cache/abrt-retrace/}. The location can be changed via the
@var{RepoDir} option in the server configuration file.

@node Traffic and load estimation
@chapter Traffic and load estimation

2500 bugs are reported from ABRT every month. Approximately 7.3%
of those are Python exceptions, which do not need a retrace
server. That means that 2315 bugs need a retrace server. That is 77
bugs per day, or 3.3 bugs every hour on average. Occasional spikes
might be much higher (imagine a user who decides to report all 8 of
his crashes from the last month).

We should probably not try to predict whether the monthly bug count will
go up or down. New, untested versions of software are added to Fedora,
but on the other hand most software matures and becomes less crashy.  So
let's assume that the bug count stays approximately the same.

Test crashes (see why we use @command{xz -2} to compress coredumps):
@itemize
@item
firefox with 7 tabs (random pages opened), coredump size 172 MB
@itemize
@item
xz compression
@itemize
@item
compression level 6 (default): compression took 32.5 sec, compressed
size 5.4 MB, decompression took 2.7 sec
@item
compression level 3: compression took 23.4 sec, compressed size 5.6 MB,
decompression took 1.6 sec
@item
compression level 2: compression took 6.8 sec, compressed size 6.1 MB,
decompression took 3.7 sec
@item
compression level 1: compression took 5.1 sec, compressed size 6.4 MB,
decompression took 2.4 sec
@end itemize
@item
gzip compression
@itemize
@item
compression level 9 (highest): compression took 7.6 sec, compressed size
7.9 MB, decompression took 1.5 sec
@item
compression level 6 (default): compression took 2.6 sec, compressed size
8 MB, decompression took 2.3 sec
@item
compression level 3: compression took 1.7 sec, compressed size 8.9 MB,
decompression took 1.7 sec
@end itemize
@end itemize
@item
thunderbird with thousands of emails opened, coredump size 218 MB
@itemize
@item
xz compression
@itemize
@item
compression level 6 (default): compression took 60 sec, compressed size
12 MB, decompression took 3.6 sec
@item
compression level 3: compression took 42 sec, compressed size 13 MB,
decompression took 3.0 sec
@item
compression level 2: compression took 10 sec, compressed size 14 MB,
decompression took 3.0 sec
@item
compression level 1: compression took 8.3 sec, compressed size 15 MB,
decompression took 3.2 sec
@end itemize
@item
gzip compression
@itemize
@item
compression level 9 (highest): compression took 14.9 sec, compressed
size 18 MB, decompression took 2.4 sec
@item
compression level 6 (default): compression took 4.4 sec, compressed size
18 MB, decompression took 2.2 sec
@item
compression level 3: compression took 2.7 sec, compressed size 20 MB,
decompression took 3 sec
@end itemize
@end itemize
@item
evince with 2 pdfs (1 and 42 pages) opened, coredump size 73 MB
@itemize
@item
xz compression
@itemize
@item
compression level 2: compression took 2.9 sec, compressed size 3.6 MB,
decompression took 0.7 sec
@item
compression level 1: compression took 2.5 sec, compressed size 3.9 MB,
decompression took 0.7 sec
@end itemize
@end itemize
@item
OpenOffice.org Impress with 25 pages presentation, coredump size 116 MB
@itemize
@item
xz compression
@itemize
@item
compression level 2: compression took 7.1 sec, compressed size 12 MB,
decompression took 2.3 sec
@end itemize
@end itemize
@end itemize

So let's imagine there are some users who want to report their
crashes at approximately the same time. Here is what the retrace
server must handle:
@enumerate
@item
2 OpenOffice crashes
@item
2 evince crashes
@item
2 thunderbird crashes
@item
2 firefox crashes
@end enumerate

We will use the xz archiver with compression level 2 on the ABRT
side to compress the coredumps. The users thus spend 53.6 seconds in
total packaging the coredumps.

The packaged coredumps amount to 71.4 MB, and the retrace server must
receive that data.

The server unpacks the coredumps (perhaps at the same time), so they
need 1158 MB of disk space on the server. The decompression will take
19.4 seconds.
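
The totals follow directly from the @command{xz -2} measurements
listed above, two crashes of each application:

@verbatim
# (coredump MB, packed MB, packing s, unpacking s) at xz -2:
crashes = {
    "openoffice":  (116, 12.0,  7.1, 2.3),
    "evince":      ( 73,  3.6,  2.9, 0.7),
    "thunderbird": (218, 14.0, 10.0, 3.0),
    "firefox":     (172,  6.1,  6.8, 3.7),
}
print(sum(2 * c[2] for c in crashes.values()))  # 53.6 s spent packing
print(sum(2 * c[1] for c in crashes.values()))  # 71.4 MB uploaded
print(sum(2 * c[0] for c in crashes.values()))  # 1158 MB unpacked
print(sum(2 * c[3] for c in crashes.values()))  # 19.4 s spent unpacking
@end verbatim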

Several hundred megabytes will be needed to install all the required
packages and debuginfo for every chroot (8 chroots of 1 GB each = 8 GB,
but this seems to be an extreme, maximal case). Some space will be
saved by using a debuginfofs.

Note that most applications are not as heavyweight as OpenOffice and
Firefox.

@node Security
@chapter Security

The retrace server communicates with two other entities: it accepts
coredumps from users, and it downloads debuginfo and packages from
distribution repositories.

@menu
* Clients::
* Packages and debuginfo::
@end menu

General security from GDB flaws and malicious data is provided by the
chroot. GDB accesses the debuginfo, packages, and the coredump from
within the chroot as a non-root user, unable to access the retrace
server's environment.

@c We should consider setting a disk quota to every chroot directory,
@c and limit the GDB access to resources using cgroups.

SELinux policy exists for both the retrace server's HTTP interface, and
for the retrace worker.

@node Clients
@section Clients

It is expected that the clients using the retrace server and sending
coredumps to it trust the retrace server administrator. The server
administrator must not try to extract sensitive data from client
coredumps. This is a major limitation of the retrace server. However,
users of an operating system already trust the operating system provider
in various important matters, so when the retrace server is operated by
the OS provider, this might be acceptable to users.

Sending clients' coredumps to the retrace server cannot be avoided if we
want to generate good backtraces containing the values of
variables. Minidumps lower the quality of the resulting backtraces,
while not improving user security.

A malicious client can craft a nonstandard coredump, which will be
processed by the server's GDB. GDB handles malformed coredumps well.

Users can never be allowed to provide custom packages/debuginfo together
with a coredump. Packages need to be installed to the environment, and
installing untrusted programs is insecure.

As for an attacker trying to steal users' backtraces from the retrace
server, the passwords protecting the backtraces in the
@var{X-Task-Password} header are random alphanumeric
(@samp{[a-zA-Z0-9]}) sequences 22 characters long. A 22-character
alphanumeric sequence corresponds to a more-than-128-bit password,
because @samp{[a-zA-Z0-9]} provides 62 characters, and @math{2^{128}} <
@math{62^{22}}. The source of randomness is @file{/dev/urandom}.

@node Packages and debuginfo
@section Packages and debuginfo

Packages and debuginfo are safely downloaded from the distribution
repositories, as the packages are signed by the distribution, and the
package origin is verified.

When the debuginfo filesystem server is finished, the retrace server
can safely use it, as the data will also be signed.

@node Future work
@chapter Future work

@section Coredump stripping
Jan Kratochvil: in my test with an OpenOffice.org presentation, the
kernel core file has 181 MB; @command{xz -2} of it has 65 MB. According
to @code{set target debug 1}, GDB reads only 131406 bytes of it
(including the NOTE segment).

@section Supporting other architectures
Three approaches:
@itemize
@item
Use GDB builds with various target architectures: gdb-i386, gdb-ppc64,
gdb-s390.
@item
Run
@uref{http://wiki.qemu.org/download/qemu-doc.html#QEMU-User-space-emulator,
QEMU user space emulation} on the server.
@item
Run @command{abrt-retrace-worker} on a machine with the right
architecture. Introduce worker machines and tasks, similar to Koji.
@end itemize

@section Use gdbserver instead of uploading whole coredump
GDB's gdbserver cannot process coredumps, but Jan Kratochvil's can:
@verbatim
git://git.fedorahosted.org/git/elfutils.git
branch: jankratochvil/gdbserver
  src/gdbserver.c
   * Currently threading is not supported.
   * Currently only x86_64 is supported (the NOTE registers layout).
@end verbatim

@section User management for the HTTP interface
Multiple authentication sources (x509 for RHEL).

@section Make all files except coredump optional on the input
Make the @file{architecture}, @file{release}, and @file{packages}
files, which must currently be included in the archive when creating a
task, optional. Allow uploading a coredump without involving tar: just
@file{coredump}, @file{coredump.gz}, or @file{coredump.xz}.

@section Handle non-standard packages (provided by user)
This would make the retrace server very vulnerable to attacks; it can
never be enabled on a public instance.

@section Support vmcores
See @uref{https://fedorahosted.org/cas/, Core analysis system}, its
features etc.

@section Do not refuse new tasks on a fully loaded server
Consider using @uref{http://git.fedorahosted.org/git/?p=kobo.git, kobo}
for task management and worker handling (master/slave architecture).

@section Support synchronous operation
The client sends a coredump and keeps receiving the server response.
The response HTTP body is generated and sent gradually as the task is
performed. The client can choose to stop receiving the response body
after getting all the headers, and ask the server for the status and
backtrace asynchronously.

The server forwards the output of @command{abrt-retrace-worker} (its
stdout and stderr) to the response body. In addition, a line with the
task status in the form @code{X-Task-Status: PENDING} is added to the
body every 5 seconds. When the worker process ends, either a
@samp{FINISHED_SUCCESS} or a @samp{FINISHED_FAILURE} status line is
sent. If it is @samp{FINISHED_SUCCESS}, the backtrace is attached after
this line. Then the response body is closed.
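
A sketch of the gradual response as a WSGI iterator (@code{worker} is
assumed to be a @code{subprocess.Popen} running
@command{abrt-retrace-worker}; forwarding the worker's stdout and
stderr is omitted for brevity):

@verbatim
import time

def synchronous_create(start_response, worker, task_id):
    """WSGI iterator yielding the response body gradually."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    while worker.poll() is None:         # worker process still running
        yield b"X-Task-Status: PENDING\n"
        time.sleep(5)
    status = task_status(task_id)        # file-existence check from the
    yield ("X-Task-Status: %s\n" % status).encode()  # Task status section
    if status == "FINISHED_SUCCESS":
        path = "/var/spool/abrt-retrace/%s/backtrace" % task_id
        with open(path, "rb") as backtrace:
            yield backtrace.read()
@end verbatim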

@section Provide task estimation time
The response to the @code{/create} action should contain a header
@var{X-Task-Est-Time} containing the number of seconds the server
estimates it will take to generate the backtrace.

The algorithm for the @var{X-Task-Est-Time} estimation should take
into account previous analyses of coredumps with the same
corresponding package name. The server should store a simple history
in an SQLite database to know how long it takes to generate a
backtrace for a certain package. It could be as simple as this:
@itemize
@item
  initialization step one: @code{CREATE TABLE package_time (id INTEGER
  PRIMARY KEY AUTOINCREMENT, package, release, time)}; we need the
  @var{id} for the database cleanup - to know the insertion order of
  rows, so the @code{AUTOINCREMENT} is important here; the @var{package}
  is the package name without the version and release numbers, the
  @var{release} column stores the operating system, and the @var{time}
  is the number of seconds it took to generate the backtrace
@item
  initialization step two: @code{CREATE INDEX package_release ON
  package_time (package, release)}; we compute the time only for single
  package on single supported OS release per query, so it makes sense to
  create an index to speed it up
@item
  when a task is finished: @code{INSERT INTO package_time (package,
  release, time) VALUES ('??', '??', '??')}
@item
  to get the average time: @code{SELECT AVG(time) FROM package_time
  WHERE package == '??' AND release == '??'}; the arithmetic mean seems
  to be sufficient here
@end itemize
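
Putting the queries together, a minimal sketch using the
@code{sqlite3} module from the Python standard library (the database
path is illustrative):

@verbatim
import sqlite3

conn = sqlite3.connect("/var/spool/abrt-retrace/stats.db")
conn.execute("""CREATE TABLE IF NOT EXISTS package_time
                (id INTEGER PRIMARY KEY AUTOINCREMENT,
                 package, release, time)""")
conn.execute("""CREATE INDEX IF NOT EXISTS package_release
                ON package_time (package, release)""")

def record_time(package, release, seconds):
    conn.execute("INSERT INTO package_time (package, release, time) "
                 "VALUES (?, ?, ?)", (package, release, seconds))
    conn.commit()

def estimate_time(package, release):
    row = conn.execute("SELECT AVG(time) FROM package_time "
                       "WHERE package == ? AND release == ?",
                       (package, release)).fetchone()
    return row[0]   # None when there is no history for the package yet
@end verbatim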

So the server knows that crashes from an OpenOffice.org package
take 5 minutes to process on average, and it can return the value 300
(seconds) in the field. The client then does not waste time asking
about the task every 20 seconds; the first status request comes after
300 seconds. And even when the package changes (rebases etc.), the
database provides good estimates after some time anyway
(the @ref{Task cleanup} chapter describes how the
data are pruned).

@section Keep the database with statistics small
The database containing packages and processing times should also be
regularly pruned to remain small and provide data quickly. The cleanup
script should delete some rows for packages with too many entries:
@enumerate
@item
get a list of packages from the database: @code{SELECT DISTINCT package,
release FROM package_time}
@item
for every package, get the row count: @code{SELECT COUNT(*) FROM
package_time WHERE package == '??' AND release == '??'}
@item
for every package with a row count larger than 100, some rows must be
removed so that only the newest 100 rows remain in the database:
@itemize
@item
to get the id of the oldest row that should remain, execute @code{SELECT id
FROM package_time WHERE package == '??' AND release == '??' ORDER BY id
LIMIT 1 OFFSET ??}, where the @code{OFFSET} is the total number of rows
for that single package minus 100
@item
then all the older rows can be deleted by executing @code{DELETE FROM
package_time WHERE package == '??' AND release == '??' AND id < ??}
@end itemize
@end enumerate
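
Continuing the SQLite sketch from the previous section, the pruning
pass might look like this (keeping exactly the newest 100 rows per
package/release pair):

@verbatim
KEEP = 100    # newest rows retained per (package, release) pair

def prune_package_times(conn):
    pairs = conn.execute(
        "SELECT DISTINCT package, release FROM package_time").fetchall()
    for package, release in pairs:
        count = conn.execute(
            "SELECT COUNT(*) FROM package_time "
            "WHERE package == ? AND release == ?",
            (package, release)).fetchone()[0]
        if count <= KEEP:
            continue
        # the row at OFFSET count-KEEP (ascending) is the oldest row
        # that is kept; everything strictly older is deleted
        cutoff = conn.execute(
            "SELECT id FROM package_time "
            "WHERE package == ? AND release == ? "
            "ORDER BY id LIMIT 1 OFFSET ?",
            (package, release, count - KEEP)).fetchone()[0]
        conn.execute("DELETE FROM package_time "
                     "WHERE package == ? AND release == ? AND id < ?",
                     (package, release, cutoff))
    conn.commit()
@end verbatim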

@section Support Fedora Rawhide
When @command{abrt-retrace-reposync} is used to sync with the Rawhide
repository, unneeded packages (those for which a newer version exists)
must be removed after residing in the repository alongside the newer
package for one week.

@bye