doc/design


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155

	Design goals

We want to catch kernel oopses, binary program crashes (coredumps)
and interpreted languages crashes (Python exceptions, maybe more
in the future).

We want to support the following use cases:

* Home/office user with minimal administration

In this scenario, user expects that abrt will work "out of the box"
with minimal configuration. It will be sufficient if crashes
just show a GUI notification, and user can invoke a GUI tool
to process the crash and report it to bugzilla etc.

The configuration (like bugzilla address, username, password)
needs to be done via GUI dialogs from the same GUI tool.

* Standalone server

The server is installed by an admin. It may lack GUI.
Admin is willing to do somewhat more complex configuration.
Crashes should be recorded, and either processed at once
or reported to the admin by email etc. Admin may log in
and manually request crash(es) to be processed and reported,
using GUI or CLI tools.

* Mission critical servers, server farms etc.

Admins are expected to be competent and willing to set up complex
configurations. They might want to avoid any complex crash processing
on the servers - for example, it does not make much sense and/or
can be considered insecure to download debuginfo packages
to such servers. Admins may want to send "raw" crash dumps
to a dedicated server(s) for processing (backtrace, etc).


	Design

Abrt design should be flexible enough to accomodate all
of the above usage scenarios.

Since currently we do not know how to dump oops on demand,
we can only poll for it. There is a small daemon, abrt-dump-oops,
which polls syslog file and saves oopses when it sees them.
The oops dump is written into /var/spool/abrt/DIR.
[TODO? abrt-dump-oops spawns "abrt-handle-crashdump -d /var/spool/abrt/DIR"
which processes it according to configuration in /etc/abrt/*.conf]

In order to catch binary crashes, we install a handler for it
in /proc/sys/kernel/core_pattern (by setting it to
"|/usr/libexec/abrt-hook-ccpp /var/spool/abrt ....").
When process dumps core, the dump is written into /var/spool/abrt/DIR.
[TODO? after this, abrt-hook-ccpp spawns "abrt-handle-crashdump
-d /var/spool/abrt/DIR"]
Then abrt-hook-ccpp terminates.

When python program crashes, it invokes internal python subroutine
which connects to abrtd via /var/run/abrt/abrt.socket and sends
crash data to abrtd. abrtd creates dump dir /var/spool/abrt/DIR.

abrtd daemon watches /var/spool/abrt for new directories.
When new directory is noticed, abrtd runs "post-create" event
on it, and emits a dbus signal. This dbus signal is used
to inform online users about the new crash: if abrt-applet
is running in X session, it sees the signal and starts blinking.

[The above scheme is somewhat suboptimal. It's stupid that abrtd
uses inotify to watch for crashes. Instead the programs which create crashes
can trigger their initial ("post-create") processing themselves]

Crashes conceptually go through "events" in their lives.
Apart from "post-create" event decribed above, they may have
"analyze" event, "report[_FOO]" events,
and arbitrarily-named other events.
abrt-handle-crashdump tool can be used to "run" an event on a directory,
or to query list of possible events for a directory.
/etc/abrt/abrt_event.conf file describes what should be done on each event.

When user (admin) wants to see the list of dumped crashes and
process them, he runs abrt-gui or abrt-cli. These programs
perform a dbus call to "com.redhat.abrt" on a system dbus.
If there is no program with this name on it, dbus autostart
will invoke abrtd, which registers "com.redhat.abrt"
and processes the call(s).

The key dbus calls served by abrtd are:

- GetCrashInfos(): returns a vector_map_crash_data_t (vector_map_vector_string_t)
     of crashes for given uid
     v[N]["executable"/"uid"/"kernel"/"backtrace"][N] = "contents"
- CreateReport(/var/spool/abrt/DIR): starts creating a report.
     Returns job id (uint64).
     Then abrtd run "analyze" event on the DIR.
     After it completes, when report creation thread has finished,
     JobDone(client_dbus_ID,/var/spool/abrt/DIR) dbus signal is emitted.
  [Problem: how to do privilegged plugin specific actions?]
    Solution: if plugin needs an access to some root only accessible dir then
    abrt should be run by root anyway
    - debuginfo gets installed using pk-debuginfo-install, which cares about
    privileges itself, so no problem here
- GetJobResult(/var/spool/abrt/DIR): returns map_crash_data_t (map_vector_string_t)
- Report(map_crash_data_t (map_vector_string_t)):
     "Please report this crash": 
     abrtd run "report[_FOO]" event(s) on the DIR.
     Returns report_status_t (map_vector_string_t) - the status of each event
- DeleteDebugDump(/var/spool/abrt/DIR): delete /var/spool/abrt/DIR. Returns bool

[Note: we are in the process of reducing/eliminating
dbus communication between abrt-gui/abrt-cli and abrtd.
It seems we will be able to reduce dbus messages to "crash occurred"
signal and "DeleteDebugDump(DIR)" call]


	Development plan

Since current code does not match the planned design, we need to gradually
change the code to "morph" it into the desired shape.

Done:

* Make abrtd dbus startable.
* Add -t TIMEOUT_SEC option to abrtd.
* Make abrt-gui start abrtd on demand, so that abrt-gui can be started
  even if abrtd does not run at the moment.
* make kerneloops plugin into separate daemon (convert it to a hook
  and get rid of "cron plugins" which are wrong idea since the begining)
* make C/C++ hook to be started by init script
* add "include FILE" feature to abrt_event.conf

Planned steps:

* make kerneloops plugin into separate service?
* hooks will start the daemon on-demand using dbus
  - this is something I'm not sure if it's good idea, but dbus is becoming
    to be "un-installable" on Fedora, it's probably ok
* simplify abrt.conf:
  - move all plugin related info to plugins/<plugin>.conf
    - enabled, action association, etc ...
  - make abrtd to parse plugins/*.conf and set the config options
    that it understand
  - this will fix the case when this is in abrt.conf

    [Cron]
    KerneloopsScanner = 120

    because this should be in plugins/kerneloops.conf
    and thus shouldn't exist if kerneloops-addon is
    not installed
* ???
* ???
* ???
* ???
* ???
* Take over the world