server/parser/README.parser



**
**  rteval-parserd - the rteval XML report parser
**

The purpose of the daemon is to offload the web server from the heavy-duty
work of parsing and processing the rteval XML reports.  The XML-RPC server
receives the reports, puts the files in a queue directory on the file system
and registers the submission in the database.  This notifies rteval-parserd
that a new report has been received, and it will start processing that file
independently of the web/XML-RPC server.


** Installing the software

  !! Please also install the rteval-xmlrpc package and read the !!
  !! README.xmlrpc file for setting up and preparing the        !!
  !! database which the rteval-parserd program will be using.   !!
  !! That file also contains information regarding upgrading    !!
  !! the database.                                              !!

When installing this application from a binary package, like RPM
files on Fedora/RHEL based boxes, you should have the rteval-parserd
in your $PATH.  Otherwise, when installing from sources, the configure
script defines the default paths.


** Configure rteval-parserd

When starting the rteval-parserd via the init.d script (or via the 'service'
command on RHEL/Fedora distributions) it will use the values configured in
/etc/sysconfig/rteval-parserd.

The available parameters are:

    - NUM_THREADS
      When this is not defined, the default behaviour is to use the number
      of available CPU cores.  The init.d script will detect this
      automatically.

    - LOG
      This defines how logging will be done.  See the rteval-parserd
      arguments description further down in the document for more
      information.

    - LOGLEVEL
      Defines how verbose the logging will be.  See the rteval-parserd
      arguments description further down in the document for more
      information.

    - CONFIGFILE
      The default configuration file rteval-parserd will try to read is
      /etc/rteval.conf.  See the next paragraph for more information about
      this file.  This parameter lets you override the default config file.

    - PIDFILE
      Defines where the init.d script will put the PID file for the
      rteval-parserd process.  The default is /var/run/rteval-parserd.pid

This daemon uses the same configuration file as the rest of the rteval program
suite, /etc/rteval.conf.  It will parse the section named 'xmlrpc_parser'.

The default values are:

  - xsltpath: /usr/share/rteval
    Defines where it can find the xmlparser.xsl XSLT template

  - db_server: localhost
    Which database server to connect to

  - db_port: 5432
    Which port to use for the database connection

  - database: rteval
    Which database to make use of.

  - db_username: rtevparser
    Which user name to use for the connection

  - db_password: rtevaldb_parser
    Which password to use for the authentication

  - reportdir: /var/lib/rteval/report
    Where to save the parsed reports

  - threads: 4
    Number of worker threads.  This defines how many reports will be
    processed in parallel.  The recommended number is the number of
    available CPU cores, as a higher thread count often hurts
    performance.  The default value is 4 when rteval-parserd is started
    directly.  When started via the init.d script, the default is one
    thread per CPU core.

  - max_report_size: 2097152
    Maximum file size of reports which the parser will process.  The
    default value is 2MB.  The value must be given in bytes.  Remember
    that this value is per thread, and that XML and XSLT processing can
    be quite memory hungry.  If this value is set too high or you have too
    many worker threads, your system might become unresponsive for a while
    and the parser might be killed by the kernel (OOM).

  - measurement_tables: cyclic_statistics, cyclic_histogram, cyclic_rawdata
    Declares which measurement results will be parsed and stored in the
    database.  These names refer to table definitions in the
    xmlparser.xsl XSLT template.  The definitions in this template tell
    rteval-parserd which data to extract from the rteval summary.xml report
    and where and how to store it in the database.
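
As an illustration of how these defaults combine with a configuration file,
here is a minimal Python sketch.  It is not the daemon's own code (the daemon
is written in C).  It assumes the file can be read by Python's configparser
module, and EXAMPLE_CONF is a made-up file content used only for the
demonstration.

```python
import configparser

# Defaults as listed in this README for the 'xmlrpc_parser' section.
DEFAULTS = {
    "xsltpath": "/usr/share/rteval",
    "db_server": "localhost",
    "db_port": "5432",
    "database": "rteval",
    "db_username": "rtevparser",
    "db_password": "rtevaldb_parser",
    "reportdir": "/var/lib/rteval/report",
    "threads": "4",
    "max_report_size": "2097152",
    "measurement_tables": "cyclic_statistics, cyclic_histogram, cyclic_rawdata",
}

# Hypothetical file content: only two values are overridden.
EXAMPLE_CONF = """
[xmlrpc_parser]
db_server: db.example.org
threads: 8
"""


def load_parser_settings(text):
    """Return the xmlrpc_parser settings, falling back to DEFAULTS."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    settings = dict(DEFAULTS)
    if cfg.has_section("xmlrpc_parser"):
        settings.update(cfg["xmlrpc_parser"])
    return settings


def max_input_bytes(settings):
    """Upper bound on report data being read at once: threads * max_report_size."""
    return int(settings["threads"]) * int(settings["max_report_size"])


settings = load_parser_settings(EXAMPLE_CONF)
tables = [t.strip() for t in settings["measurement_tables"].split(",")]
print(settings["db_server"], settings["threads"], tables)
print("raw input upper bound in bytes:", max_input_bytes(settings))
```

The max_input_bytes() helper only counts the raw report files.  XML and XSLT
processing will need considerably more memory than that, so treat the number
as a lower bound when sizing threads and max_report_size.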


** rteval-parserd arguments

  -d | --daemon                    Run as a daemon
  -l | --log        <log dest>     Where to put log data
  -L | --log-level  <verbosity>    What to log
  -f | --config     <config file>  Which configuration file to use
  -t | --threads    <num. threads> How many worker threads to start (def: 4)
  -h | --help                      This help screen

- Configuration file
By default the program will look for /etc/rteval.conf.  This can be
overridden by using --config <config file>.

- Logging
When the program is started as a daemon, it will log to syslog by default.
The default log level is 'info'.  When not started as a daemon, all logging
will go to stderr by default.

The --log argument takes either a 'destination' or a file name.  Unknown
destinations are treated as file names.  Valid 'destinations' are:

    stderr:             - Log to stderr
    stdout:             - Log to stdout
    syslog:[facility]   - Log to syslog
    <file name>         - Log to given file

For syslog the default facility is 'daemon', but can be overridden by using
one of the following facility values:
    daemon, user and local0 to local7
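
The destination rules above can be sketched in Python as follows.  This is
only an illustration of the rules as described here, not the daemon's parser.
The function name parse_log_dest is made up, and rejecting an unknown syslog
facility is an assumption, since this document does not say how the daemon
handles that case.

```python
# Facilities listed in this README: daemon, user and local0 to local7.
SYSLOG_FACILITIES = {"daemon", "user"} | {f"local{n}" for n in range(8)}


def parse_log_dest(arg):
    """Split a --log value into (kind, detail)."""
    if arg in ("stderr:", "stdout:"):
        return (arg[:-1], None)
    if arg.startswith("syslog:"):
        # An empty facility selects the default, 'daemon'.
        facility = arg[len("syslog:"):] or "daemon"
        if facility not in SYSLOG_FACILITIES:
            # Assumption: the real behaviour for unknown facilities is not
            # documented here; this sketch simply refuses them.
            raise ValueError(f"unknown syslog facility: {facility}")
        return ("syslog", facility)
    # Anything else is treated as a file name.
    return ("file", arg)


print(parse_log_dest("syslog:"))
print(parse_log_dest("syslog:local3"))
print(parse_log_dest("/var/log/rteval-parserd.log"))
```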

Log verbosity is set by the --log-level.  The valid values here are:

    emerg, emergency    - Only log errors which cause the program to stop
    alert               - Incidents which need immediate attention
    crit, critical      - Unexpected incidents which are not urgent
    err, error          - Parsing errors.  Issues with input data
    warn, warning       - Incidents which may influence performance
    notice              - Less important warnings
    info                - General run information
    debug               - Detailed run information, incl. thread operation
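
The level names above follow the order of the standard syslog priorities
(0 = emerg ... 7 = debug).  The Python sketch below shows how such names and
their aliases could map to priorities.  It assumes that a configured level
also includes every more severe level, which is the usual syslog convention
but is not stated explicitly in this document.

```python
# Level names from this README, mapped to the standard syslog priorities.
LOG_LEVELS = {
    "emerg": 0, "emergency": 0,
    "alert": 1,
    "crit": 2, "critical": 2,
    "err": 3, "error": 3,
    "warn": 4, "warning": 4,
    "notice": 5,
    "info": 6,
    "debug": 7,
}


def should_log(configured, message_level):
    """True if a message at message_level passes the configured --log-level.

    Assumption: lower numbers are more severe, and a message is logged when
    it is at least as severe as the configured level.
    """
    return LOG_LEVELS[message_level] <= LOG_LEVELS[configured]


print(should_log("info", "err"))
print(should_log("info", "debug"))
```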

- Threads
By default, the daemon will use five threads.  One is the main thread, which
processes the submission queue and notifies the worker threads.  The four
other threads are worker threads, which process the received reports.

Each of the worker threads will have its own connection to the database.  This
connection stays open for as long as the daemon is running.  It is therefore
important that you do not have more worker threads than available database
connections.


** POSIX Message Queue

The daemon makes use of POSIX MQ for distributing work to the worker threads.
Each thread lives independently and polls the queue regularly for more work.
As POSIX MQ delivers each message to only one receiver, no other locking
facility is needed.

On Linux, the default maximum number of messages in a queue is 10.
If you receive a lot of reports and the threads do not process the queue
quickly enough, it will fill up quickly.  If the queue is full,
the main thread which populates the message queue will go to sleep
for one minute before attempting to send new messages.  To avoid this, consider
increasing the queue size by modifying /proc/sys/fs/mqueue/msg_max.

When the daemon initialises itself, it will read this file to make sure it
uses the queue to the maximum, but not beyond that.
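
The queueing behaviour described in this section can be sketched in Python.
This is a simplified model and not the daemon's implementation: Python's
queue.Queue stands in for the POSIX MQ, the worker blocks on the queue rather
than polling it, and the one minute sleep is replaced by a short delay so the
example finishes quickly.  The names read_msg_max and distribute are made up,
and falling back to 10 when the file cannot be read is an assumption.

```python
import queue
import threading
import time

MSG_MAX_PATH = "/proc/sys/fs/mqueue/msg_max"


def read_msg_max(path=MSG_MAX_PATH, fallback=10):
    """Return the kernel's per-queue message limit, or fallback if unreadable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        # Assumption: fall back to 10, the Linux default mentioned above.
        return fallback


def distribute(report_ids, num_workers=4, max_msgs=10, full_delay=0.01):
    """Feed report_ids to workers through a bounded queue; return what they handled."""
    work = queue.Queue(maxsize=max_msgs)  # stand-in for the POSIX MQ
    handled = []
    handled_lock = threading.Lock()  # guards the demo's result list only

    def worker():
        while True:
            item = work.get()  # each item is handed to exactly one worker
            if item is None:  # sentinel: no more work
                return
            with handled_lock:
                handled.append(item)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()

    for rid in report_ids:
        while True:
            try:
                work.put_nowait(rid)
                break
            except queue.Full:
                # The daemon sleeps for one minute here; the demo uses a short delay.
                time.sleep(full_delay)

    for _ in workers:
        work.put(None)
    for t in workers:
        t.join()
    return sorted(handled)


limit = read_msg_max()
print("queue limit:", limit)
print(len(distribute(range(50), max_msgs=limit)), "reports handled")
```

Every report is handled exactly once without any lock around the queue
itself, which is the property the daemon relies on with POSIX MQ.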


** PostgreSQL features

The daemon depends on the PostgreSQL database.  It is written with an
abstraction layer, so it should, in theory, be possible to adapt it to a
different database implementation.

In the current implementation, it makes use of PostgreSQL's LISTEN, NOTIFY and
UNLISTEN features.  A trigger is enabled on the submission queue table, which
sends a NOTIFY whenever a record is inserted into the table.  The
rteval-parserd daemon listens for these notifications, and will immediately
poll the table upon such a notification.

Whenever a notification is received, it will always parse all unprocessed
reports.  In addition, it only listens for notifications when there are no
unprocessed reports left.
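
The polling rule above can be modelled with a small Python sketch.  It uses an
in-memory stand-in for the submission queue table and a threading.Event in
place of PostgreSQL's NOTIFY, so it runs without a database.  All class and
function names are made up for the illustration.  STATUS_HANDLED is a
placeholder value and is not taken from statuses.h.  The loop returns after an
idle timeout only so the demonstration terminates.

```python
import threading

STATUS_NEW = 0      # only records with status == 0 are picked up
STATUS_HANDLED = 1  # hypothetical placeholder, not taken from statuses.h


class FakeSubmissionQueue:
    """In-memory stand-in for the submission queue table plus NOTIFY."""

    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()
        self._notify = threading.Event()

    def insert(self, filename):
        with self._lock:
            self._rows.append({"file": filename, "status": STATUS_NEW})
        self._notify.set()  # plays the role of the insert trigger's NOTIFY

    def fetch_unprocessed(self):
        with self._lock:
            return [r for r in self._rows if r["status"] == STATUS_NEW]

    def mark(self, row, status):
        with self._lock:
            row["status"] = status

    def wait_for_notify(self, timeout):
        got = self._notify.wait(timeout)
        self._notify.clear()
        return got


def main_loop(sq, handle, idle_timeout=0.1):
    """Drain all unprocessed rows; only wait for a notification when idle."""
    while True:
        pending = sq.fetch_unprocessed()
        if pending:
            for row in pending:
                handle(row["file"])  # the daemon would queue this for a worker
                sq.mark(row, STATUS_HANDLED)
            continue  # re-check the table before listening again
        if not sq.wait_for_notify(idle_timeout):
            return  # demo only: stop when nothing arrives in time


sq = FakeSubmissionQueue()
sq.insert("report-1.xml")
sq.insert("report-2.xml")
seen = []
main_loop(sq, seen.append)
print(seen)
```

Because the table is always re-read after a notification, a notification that
arrives while reports are being handled is never lost: the new record is
found on the next pass through the loop.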

The core PostgreSQL implementation is only done in pgsql.[ch], which provides an
abstract API layer for the rest of the parser daemon.


** Submission queue status codes

In the rteval database's submissionqueue table there is a status field.  The
daemon will only consider records with status == 0 for processing.  It does
not consider any other fields.  For a better understanding of the different
status codes, look into the file statuses.h.