	Design goals

We want to catch kernel oopses, binary program crashes (coredumps)
and crashes in interpreted languages (Python exceptions, maybe more
in the future).

We want to support the following use cases:

* Home/office user with minimal administration

In this scenario, the user expects abrt to work "out of the box"
with minimal configuration. It is sufficient if a crash
just shows a GUI notification, and the user can invoke a GUI tool
to process the crash and report it to bugzilla etc.

The configuration (like bugzilla address, username, password)
needs to be done via GUI dialogs from the same GUI tool.

* Standalone server

The server is installed by an admin. It may lack a GUI.
The admin is willing to do somewhat more complex configuration.
Crashes should be recorded, and either processed at once
or reported to the admin by email etc. The admin may log in
and manually request crash(es) to be processed and reported,
using GUI or CLI tools.

* Mission critical servers, server farms etc.

Admins are expected to be competent and willing to set up complex
configurations. They might want to avoid any complex crash processing
on the servers - for example, it does not make much sense and/or
can be considered insecure to download debuginfo packages
to such servers. Admins may want to send "raw" crash dumps
to dedicated server(s) for processing (backtrace, etc).


	Design

Abrt design should be flexible enough to accommodate all
of the above usage scenarios.

The description below is not what abrt does now.
It is a (currently incomplete) set of design notes on how we want
it to achieve the design goals.

Since we currently do not know how to dump an oops on demand,
we can only poll for it. There is a small daemon which polls
the kernel message buffer and dumps oopses when it sees them.
The dump is written into /var/spool/abrt/DIR.
After this, the daemon spawns "abrt-process -d /var/spool/abrt/DIR"
which processes it according to configuration in /etc/abrt/*.conf.
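
The polling step above can be sketched as follows (Python is used for
brevity only). The marker strings and the name find_oopses are
assumptions made for illustration; the heuristics of the real scanner
are not specified in these notes.

```python
# Hypothetical oops markers; a real scanner would need a more careful heuristic.
OOPS_MARKERS = ("Oops:", "kernel BUG at", "BUG: unable to handle")


def find_oopses(log_lines, context=3):
    """Scan kernel message buffer lines and return a list of oopses.

    Each oops is the line containing a marker plus up to `context`
    lines that follow it.
    """
    oopses = []
    i = 0
    while i < len(log_lines):
        if any(marker in log_lines[i] for marker in OOPS_MARKERS):
            oopses.append(log_lines[i:i + 1 + context])
            i += 1 + context  # skip the lines already captured
        else:
            i += 1
    return oopses
```

A polling daemon would call this on each new batch of kernel messages
and write every returned oops into its own /var/spool/abrt/DIR.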

In order to catch binary crashes, we install a handler for them
in /proc/sys/kernel/core_pattern (by setting it to
"|/usr/libexec/abrt-hook-ccpp /var/spool/abrt %p %s %u").
When a process dumps core, the dump is written into /var/spool/abrt/DIR.
After this, abrt-hook-ccpp spawns "abrt-process -d /var/spool/abrt/DIR"
and terminates.
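
With the core_pattern above, the kernel runs the hook with the spool
directory followed by the expanded %p (pid), %s (signal number) and
%u (uid). A minimal sketch of how a hook could interpret them is given
below. The names CoreEvent, parse_hook_args and dump_dir_for, as well
as the DIR naming scheme, are assumptions and not part of the design.

```python
from collections import namedtuple

CoreEvent = namedtuple("CoreEvent", "spool_dir pid signal uid")


def parse_hook_args(argv):
    """Parse hook arguments: SPOOL_DIR, then %p (pid), %s (signal), %u (uid)."""
    if len(argv) != 5:
        raise ValueError("usage: abrt-hook-ccpp SPOOL_DIR PID SIGNAL UID")
    return CoreEvent(argv[1], int(argv[2]), int(argv[3]), int(argv[4]))


def dump_dir_for(event):
    # Hypothetical naming scheme for DIR; the design does not define one.
    return "%s/ccpp-%d-%d" % (event.spool_dir, event.pid, event.uid)
```

The core image itself arrives on the hook's standard input, because
the pattern starts with "|".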

When a python program crashes, it invokes an internal python subroutine
which dumps crash info into ~/abrt/spool/DIR.
[this is a tentative plan, currently we dump in /var/spool/abrt/DIR]
After this, it spawns "abrt-process -d ~/abrt/spool/DIR"
and terminates.
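
A rough sketch of such a subroutine, assuming one file per dump
element and the element names used in the GetCrashInfos() description
("executable", "uid", "backtrace"). The function names and the
"pyhook-PID" directory name are hypothetical, and spawning
abrt-process is left out.

```python
import os
import sys
import traceback


def write_crash_dump(dump_dir, exc_type, exc_value, tb):
    """Write one file per dump element into dump_dir (created if missing)."""
    os.makedirs(dump_dir, exist_ok=True)
    elements = {
        "executable": os.path.abspath(sys.argv[0]) if sys.argv and sys.argv[0]
                      else sys.executable,
        "uid": str(os.getuid()),
        "backtrace": "".join(
            traceback.format_exception(exc_type, exc_value, tb)),
    }
    for name, contents in elements.items():
        with open(os.path.join(dump_dir, name), "w") as f:
            f.write(contents)
    return dump_dir


def install_hook(spool_dir):
    """Install an excepthook that dumps uncaught exceptions into spool_dir."""
    def hook(exc_type, exc_value, tb):
        # Hypothetical DIR name; the design does not define one.
        dump_dir = os.path.join(spool_dir, "pyhook-%d" % os.getpid())
        write_crash_dump(dump_dir, exc_type, exc_value, tb)
        # Still print the usual traceback to stderr.
        sys.__excepthook__(exc_type, exc_value, tb)
    sys.excepthook = hook
```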

[Problem: dumping to /var/spool/abrt/DIR needs world-writable
/var/spool/abrt and allows user to go way over his
disk quota. Dumping to ~/abrt/spool/DIR makes it difficult
to present a list of all crashes which happened on the machine -
for example, root-owned processes cannot even access user data
in ~user/* if /home is on NFS4...
]

When user (admin) wants to see the list of dumped crashes and
process them, he runs abrt-gui or abrt-cli. These programs
perform a dbus call to "com.redhat.abrt" on a system dbus.
If there is no program with this name on it, dbus autostart
will invoke "abrt-process", which registers "com.redhat.abrt"
and processes the call(s).

abrt-process will terminate after a timeout (a few minutes)
if no new dbus calls arrive.
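
The idle timeout can be sketched as a small timer that is reset on
every incoming call. The class IdleTimer and its method names are
illustrative assumptions; the clock is injectable only to make the
logic easy to test.

```python
import time


class IdleTimer:
    """Tracks how long it has been since the last dbus call arrived."""

    def __init__(self, timeout_sec, clock=time.monotonic):
        self.timeout_sec = timeout_sec
        self.clock = clock
        self.last_call = clock()

    def touch(self):
        """Record that a dbus call has just arrived."""
        self.last_call = self.clock()

    def expired(self):
        """True once timeout_sec has passed without any call."""
        return self.clock() - self.last_call >= self.timeout_sec
```

The main loop would call touch() for each served call and exit once
expired() returns true.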

The key dbus calls served by abrt-process are:

- GetCrashInfos(): returns a vector_map_crash_data_t (vector_map_vector_string_t)
     of crashes for given uid
     v[N]["executable"/"uid"/"kernel"/"backtrace"][N] = "contents"
[see above the problem with producing this list]
- CreateReport(UUID): starts creating a report for /var/spool/abrt/DIR with this UUID.
     Returns job id (uint64).
     After it returns, when report creation thread has finished,
     JobDone(client_dbus_ID,UUID) dbus signal is emitted.
  [Problem: how to do privileged plugin-specific actions?]
    Solution: if a plugin needs access to a directory accessible only
    to root, then abrt should be run by root anyway
    - debuginfo gets installed using pk-debuginfo-install, which takes
    care of privileges itself, so no problem here
- GetJobResult(UUID): returns map_crash_data_t (map_vector_string_t)
- Report(map_crash_data_t (map_vector_string_t)):
     "Please report this crash": calls Report() of all registered reporter plugins
     Returns report_status_t (map_vector_string_t) - the status of each call
- DeleteDebugDump(UUID): delete corresponding /var/spool/abrt/DIR. Returns bool
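
To make the data shapes in the calls above concrete, here is an
in-memory stand-in, not the dbus interface itself. A Python list of
dicts plays the role of vector_map_vector_string_t. The "UUID" key and
the function names are assumptions made for this sketch.

```python
# Stand-in for vector_map_vector_string_t: a list of crashes, each mapping
# an element name to a list of strings. The "UUID" key is an assumption.
def get_crash_infos(all_crashes, uid):
    """Return only the crashes that belong to the given uid."""
    return [c for c in all_crashes if c["uid"][0] == str(uid)]


def delete_debug_dump(all_crashes, uuid):
    """Remove the crash with this UUID; return True if one was removed."""
    for i, crash in enumerate(all_crashes):
        if crash["UUID"][0] == uuid:
            del all_crashes[i]
            return True
    return False


crashes = [
    {"UUID": ["a1"], "uid": ["500"], "executable": ["/usr/bin/foo"],
     "backtrace": ["#0 main ()"]},
    {"UUID": ["b2"], "uid": ["0"], "executable": ["/usr/sbin/bar"],
     "backtrace": ["#0 run ()"]},
]
```

Here crashes[0]["executable"][0] corresponds to
v[N]["executable"][N] in the notation above.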


	Development plan

Since current code does not match the planned design, we need to gradually
change the code to "morph" it into the desired shape.

Done:

* Make abrtd dbus startable.
* Add -t TIMEOUT_SEC option to abrtd. {done}
* Make abrt-gui start abrtd on demand, so that abrt-gui can be started
  even if abrtd does not run at the moment. (doesn't work in some cases!)

Planned steps:

* make kerneloops plugin into a separate daemon (convert it to a hook
  and get rid of "cron plugins", which were a wrong idea from the beginning)
  - and make it a service (write an initscript)
* make C/C++ hook to be started by init script
  - the init script would run ccpp-hook --init, which should just set
    the core_pattern; this is now done by the C analyzer plugin
* hooks will start the daemon on demand using dbus
  - I'm not sure this is a good idea, but since dbus is becoming
    effectively impossible to uninstall on Fedora, it's probably ok
* simplify abrt.conf:
  - move all plugin related info to plugins/<plugin>.conf
    - enabled, action association, etc ...
  - make abrtd parse plugins/*.conf and set the config options
    that it understands
  - this will fix the case when this is in abrt.conf

    [Cron]
    KerneloopsScanner = 120

    because this should be in plugins/kerneloops.conf
    and thus shouldn't exist if kerneloops-addon is
    not installed
* ???
* ???
* ???
* ???
* ???
* Take over the world
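
For the abrt.conf simplification step above, a sketch of how per-plugin
files could be collected. It assumes the plugin files are INI-style
with sections, as the [Cron] example suggests; the function name and
the returned structure are hypothetical.

```python
import configparser
import glob
import os


def load_plugin_settings(conf_dir):
    """Read every conf_dir/plugins/<plugin>.conf.

    Returns {plugin: {section: {key: value}}}. A plugin that is not
    installed has no file, so its settings simply do not appear.
    """
    settings = {}
    pattern = os.path.join(conf_dir, "plugins", "*.conf")
    for path in sorted(glob.glob(pattern)):
        parser = configparser.ConfigParser()
        parser.optionxform = str  # keep key case, e.g. KerneloopsScanner
        parser.read(path)
        plugin = os.path.splitext(os.path.basename(path))[0]
        settings[plugin] = {s: dict(parser[s]) for s in parser.sections()}
    return settings
```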