**
**  rteval-parserd - the rteval XML report parser
**

The purpose of the daemon is to offload the heavy-duty work of parsing and
processing the rteval XML reports from the web server.  The XML-RPC server
will receive the reports, put the files in a queue directory on the
file system and register the submission in the database.  This will notify
rteval-parserd that a new report has been received, and it will start
processing that file independently of the web/XML-RPC server.


** Installing the software

If you install this application from source, please read the
README.xmlrpc file for more information about the building and
installation process.

Also read the README.xmlrpc file for instructions on setting up and
preparing the database which the rteval-parserd program will use.  That
file also contains information about upgrading the database.

If you are installing this application from a binary package, like RPM
files on Fedora/RHEL based boxes, you should have rteval-parserd
in your $PATH, and the file needed for the parsing (xmlparser.xsl)
should be installed in a directory where rteval-parserd looks for it by
default.


** Configure rteval-parserd

This daemon uses the same configuration file as the rest of the rteval program
suite, /etc/rteval.conf.  It will parse the section named 'xmlrpc_parser'.

The default values are:

  - xsltpath: /usr/share/rteval
    Defines where it can find the xmlparser.xsl XSLT template

  - db_server: localhost
    Which database server to connect to

  - db_port: 5432
    Which port to use for the database connection

  - database: rteval
    Which database to make use of.

  - db_username: rtevparser
    Which user name to use for the connection

  - db_password: rtevaldb_parser
    Which password to use for the authentication

  - reportdir: /var/lib/rteval/report
    Where to save the parsed reports

  - threads: 4
    Number of worker threads.  This defines how many reports are
    processed in parallel.  The recommended number is the number
    of available CPU cores, as a higher thread count often hurts
    performance.

  - max_report_size: 2097152
    Maximum file size of reports which the parser will process.  The
    default value is 2MB.  The value must be given in bytes.  Remember
    that this value is per thread, and that XML and XSLT processing can
    be quite memory hungry.  If this value is set too high or you have too
    many worker threads, your system might become unresponsive for a while
    and the parser can be killed by the kernel (OOM).
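
As an illustration of how the defaults above combine with a configuration
file, here is a minimal Python sketch.  It assumes that /etc/rteval.conf uses
INI-style syntax with a [xmlrpc_parser] section header; the helper names
load_parser_config() and max_raw_bytes() are hypothetical and not part of
rteval-parserd.

```python
import configparser

# Default values as listed in this README.
DEFAULTS = {
    "xsltpath": "/usr/share/rteval",
    "db_server": "localhost",
    "db_port": "5432",
    "database": "rteval",
    "db_username": "rtevparser",
    "db_password": "rtevaldb_parser",
    "reportdir": "/var/lib/rteval/report",
    "threads": "4",
    "max_report_size": "2097152",
}

# Example file content (assumed INI syntax, made-up host name).
SAMPLE = """
[xmlrpc_parser]
threads = 2
db_server = db.example.org
"""

def load_parser_config(text):
    """Return the defaults overlaid with the [xmlrpc_parser] section."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    settings = dict(DEFAULTS)
    if cfg.has_section("xmlrpc_parser"):
        settings.update(cfg["xmlrpc_parser"])
    return settings

def max_raw_bytes(settings):
    """Raw report bytes that may be in flight at once (threads * size).

    This is only a lower bound; XML and XSLT processing needs more memory
    than the size of the report file itself.
    """
    return int(settings["threads"]) * int(settings["max_report_size"])

settings = load_parser_config(SAMPLE)
print(settings["threads"], settings["db_port"], max_raw_bytes(settings))
```

With the default values, four threads and a 2MB limit mean that up to 8MB of
raw report data can be under processing at the same time.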


** rteval-parserd arguments

  -d | --daemon                    Run as a daemon
  -l | --log        <log dest>     Where to put log data
  -L | --log-level  <verbosity>    What to log
  -f | --config     <config file>  Which configuration file to use
  -t | --threads    <num. threads> How many worker threads to start (def: 4)
  -h | --help                      This help screen

- Configuration file
By default the program will look for /etc/rteval.conf.  This can be
overridden by using --config <config file>.

- Logging
When the program is started as a daemon, it will log to syslog by default.
The default log level is 'info'.  When not started as a daemon, all logging
will go to stderr by default.

The --log argument takes either a 'destination' or a file name.  Unknown
destinations are treated as file names.  Valid 'destinations' are:

    stderr:             - Log to stderr
    stdout:             - Log to stdout
    syslog:[facility]   - Log to syslog
    <file name>         - Log to given file

For syslog the default facility is 'daemon', but can be overridden by using
one of the following facility values:
    daemon, user and local0 to local7
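
The rules for the --log argument can be sketched in Python as below.  This is
not the daemon's own code; parse_log_dest() is a hypothetical helper, and
falling back to 'daemon' for an unrecognised syslog facility is an assumption
which this README does not confirm.

```python
# Facility names accepted after 'syslog:' according to this README.
SYSLOG_FACILITIES = {"daemon", "user"} | {f"local{n}" for n in range(8)}

def parse_log_dest(arg):
    """Split a --log argument into a (kind, detail) tuple."""
    if arg in ("stderr:", "stdout:"):
        return (arg[:-1], None)
    if arg.startswith("syslog:"):
        facility = arg[len("syslog:"):] or "daemon"
        if facility not in SYSLOG_FACILITIES:
            facility = "daemon"  # assumption: unknown facility -> default
        return ("syslog", facility)
    # Anything else is treated as a file name.
    return ("file", arg)

print(parse_log_dest("syslog:local3"))
print(parse_log_dest("/var/log/rteval-parserd.log"))
```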

Log verbosity is set by --log-level.  The valid values are:

    emerg, emergency    - Only log errors which cause the program to stop
    alert               - Incidents which need immediate attention
    crit, critical      - Unexpected incidents which are not urgent
    err, error          - Parsing errors.  Issues with input data
    warn, warning       - Incidents which may influence performance
    notice              - Less important warnings
    info                - General run information
    debug               - Detailed run information, incl. thread operation
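
The level names above follow the usual syslog severities, where a lower
number means a more severe message.  A small Python sketch of how a verbosity
threshold could be applied is shown below.  The numeric values are the
standard syslog(3) severities; the should_log() helper is hypothetical, and
it is an assumption that the daemon filters messages in exactly this way.

```python
# Level names from this README mapped to syslog(3) severity numbers.
LOG_LEVELS = {
    "emerg": 0, "emergency": 0,
    "alert": 1,
    "crit": 2, "critical": 2,
    "err": 3, "error": 3,
    "warn": 4, "warning": 4,
    "notice": 5,
    "info": 6,
    "debug": 7,
}

def should_log(message_level, threshold="info"):
    """True if a message is at least as severe as the chosen threshold."""
    return LOG_LEVELS[message_level] <= LOG_LEVELS[threshold]

print(should_log("warning"))          # checked against the default 'info'
print(should_log("debug"))
print(should_log("debug", "debug"))
```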

- Threads
By default, the daemon will use five threads.  One is the main thread, which
processes the submission queue and notifies the worker threads.  The four
other threads are worker threads, which will process the received reports.

Each of the worker threads will have its own connection to the database.  This
connection stays open for as long as the daemon is running.  It is therefore
important that you do not have more worker threads than available database
connections.


** POSIX Message Queue

The daemon makes use of POSIX MQ for distributing work to the worker threads.
Each thread lives independently and polls the queue regularly for more work.
As a POSIX MQ delivers each message to only one receiver, no other locking
facility is needed.

On Linux, the default maximum number of messages in a queue is 10.
If you receive a lot of reports and the threads do not process the queue
quickly enough, it will fill up pretty quickly.  If the queue is full,
the main thread which populates the message queue will politely go to sleep
for one minute before attempting to send new messages.  To avoid this, consider
increasing the queue size by modifying /proc/sys/fs/mqueue/msg_max.
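
The work distribution described above can be illustrated with a small Python
sketch.  It uses the standard library's bounded queue.Queue as a stand-in for
the POSIX MQ, so it shows the pattern only and not the daemon's C code.  The
function names, the None shutdown marker and the short sleep interval (the
daemon sleeps for one minute) are assumptions made for the example.

```python
import queue
import threading
import time

def worker(jobs, done, lock):
    """Poll the shared queue until a None shutdown marker arrives."""
    while True:
        try:
            job = jobs.get(timeout=0.05)
        except queue.Empty:
            continue  # nothing queued yet, poll again
        if job is None:
            return
        with lock:
            done.append(job)  # stands in for parsing one report

def dispatch(report_ids, num_workers=4, max_msgs=10, full_sleep=0.01):
    """Feed jobs into a bounded queue and back off whenever it is full."""
    jobs = queue.Queue(maxsize=max_msgs)
    done, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker, args=(jobs, done, lock))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    # One shutdown marker per worker, queued after all real jobs.
    for item in list(report_ids) + [None] * num_workers:
        while True:
            try:
                jobs.put_nowait(item)
                break
            except queue.Full:
                time.sleep(full_sleep)  # queue is full, retry later
    for w in workers:
        w.join()
    return sorted(done)

print(len(dispatch(range(25))))
```

Each job is taken off the queue by only one worker, which is why the workers
need no locking around the queue itself.  The lock in the sketch only protects
the shared result list.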

When the daemon initialises itself, it will read this file to make sure it
uses the queue to the maximum, but not beyond that.
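
Reading the limit can be sketched as follows in Python.  The helper names and
the fallback value of 10 (the Linux default mentioned above) are assumptions
for the example; the daemon itself does this in C.

```python
def read_msg_max(path="/proc/sys/fs/mqueue/msg_max", fallback=10):
    """Return the system limit on messages per queue, or the fallback."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return fallback

def clamp_queue_size(requested, limit):
    """Use as much of the queue as wanted, but never exceed the limit."""
    return min(requested, limit)

limit = read_msg_max()
print(clamp_queue_size(64, limit))
```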


** PostgreSQL features

The daemon depends on the PostgreSQL database.  It is written with an
abstraction layer so it should, in theory, be possible to easily adapt it to
a different database implementation.

In the current implementation, it makes use of PostgreSQL's LISTEN, NOTIFY and
UNLISTEN features.  A trigger is enabled on the submission queue table, which
sends a NOTIFY whenever a record is inserted into the table.  The rteval-parserd
daemon listens for these notifications, and will immediately poll the table
upon such a notification.

Whenever a notification is received, the daemon will always parse all
unprocessed reports.  It will only listen for new notifications when there
are no unprocessed reports left.
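
The control flow described above can be sketched in Python with the database
access passed in as plain functions.  All names here are hypothetical; in the
daemon the corresponding work is done through the API in pgsql.[ch], and
wait_for_notify stands in for blocking on a PostgreSQL notification.

```python
def drain_then_listen(fetch_pending, process, wait_for_notify):
    """Process all pending reports, and only then wait for a notification.

    wait_for_notify() is expected to block until a notification arrives and
    to return False when the loop should stop (an assumption for this sketch).
    """
    while True:
        pending = fetch_pending()
        if pending:
            for record in pending:
                process(record)
            continue  # re-check the table before listening
        if not wait_for_notify():
            return

# Tiny simulation: two batches of reports with idle periods in between.
batches = [[1, 2], [], [3], []]
handled, waits = [], []

def fetch():
    return batches.pop(0) if batches else []

def wait():
    waits.append(len(handled))  # reports handled before each wait
    return bool(batches)

drain_then_listen(fetch, handled.append, wait)
print(handled, waits)
```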

The PostgreSQL specific code is confined to pgsql.[ch], which provides an
abstract API layer for the rest of the parser daemon.


** Submission queue status codes

In the rteval database's submissionqueue table there is a status field.  The
daemon will only consider records with status == 0 for processing.  It does
not consider any other fields.  For a better understanding of the different
status codes, look into the file statuses.h.
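
The selection rule can be tried out with the Python sketch below.  It uses an
in-memory SQLite database as a stand-in for PostgreSQL.  Only the table name
and the status field come from this README; the id and filename columns and
the non-zero status values are made up for the example.

```python
import sqlite3

def pending_submissions(conn):
    """Return the records the daemon would pick up (status == 0)."""
    cur = conn.execute(
        "SELECT id, filename FROM submissionqueue "
        "WHERE status = 0 ORDER BY id")
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submissionqueue ("
    "id INTEGER PRIMARY KEY, filename TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO submissionqueue (filename, status) VALUES (?, ?)",
    [("a.xml", 0), ("b.xml", 1), ("c.xml", 0), ("d.xml", 2)])

print(pending_submissions(conn))
```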