From 6024306168584d98c34544d9d90d7a9fefa9afb5 Mon Sep 17 00:00:00 2001
From: Rainer Gerhards
Date: Fri, 18 Jan 2008 10:27:05 +0000
Subject: description of disk-assisted queue added

---
 doc/dev_queue.html | 226 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 doc/dev_queue.html

diff --git a/doc/dev_queue.html b/doc/dev_queue.html
new file mode 100644
index 00000000..c61ef942
--- /dev/null
+++ b/doc/dev_queue.html
@@ -0,0 +1,226 @@

The rsyslog queue object

This page reflects the status as of 2008-01-17. The documentation is still incomplete. The target audience is developers and users who would like to get an in-depth understanding of queues as used in rsyslog.

Some definitions

A queue is DA-enabled if it is configured to use disk-assisted mode when needed. A queue is in DA mode (or DA run mode) when it actually runs disk-assisted.

Implementation Details

Disk-Assisted Mode

Memory-type queues may utilize disk-assisted (DA) mode. DA mode is enabled whenever a queue file name prefix is provided. This is called DA-enabled mode. If DA-enabled, the queue operates as a regular memory queue until a high water mark is reached. If that happens, the queue activates disk assistance (called "runs disk assisted" or "runs DA" - you can find that often in source file comments). To do so, it creates a helper queue instance (the DA queue). At that point, there are two queues running - the primary queue's consumer changes to a shuffle-to-DA-queue consumer and the original primary consumer is assigned to the DA queue. Existing and new messages are spooled to the disk queue, from where the DA worker takes them and passes them for execution to the actual consumer. In essence, the primary queue has now become a memory buffer for the DA queue. The primary queue will be drained until a low water mark is reached. At that point, processing is held. New messages enqueued to the primary queue will not be processed but kept in memory. Processing resumes when either the high water mark is reached again or the DA queue indicates it is empty. If the DA queue is empty, it is shut down and processing of the primary queue continues as a regular in-memory queue (aka "DA mode is shut down"). The whole cycle repeats once the high water mark is hit again.
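
The cycle described above can be sketched as a small state machine. This is only a minimal model under stated assumptions, not rsyslog's actual code: the type `prim_queue_t`, its fields and the three functions are hypothetical names, message counts stand in for real messages, and "low water mark reached" is assumed to mean that the in-memory size is at or below the mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a DA-enabled primary queue (not rsyslog's real struct). */
typedef enum { MODE_REGULAR, MODE_DA } run_mode_t;

typedef struct {
    size_t size;       /* messages held in the primary (memory) queue */
    size_t da_size;    /* messages held in the DA (disk) queue */
    size_t high_wtr;   /* high water mark */
    size_t low_wtr;    /* low water mark */
    bool da_enabled;   /* a queue file name prefix was configured */
    run_mode_t mode;
} prim_queue_t;

/* Enqueue one message; start running DA once the high water mark is reached. */
static void on_enqueue(prim_queue_t *q)
{
    assert(q != NULL);
    q->size++;
    if (q->da_enabled && q->mode == MODE_REGULAR && q->size >= q->high_wtr)
        q->mode = MODE_DA;
}

/* Shuffle one message to the DA queue; processing is held at the low water mark. */
static bool da_shuffle_one(prim_queue_t *q)
{
    assert(q != NULL);
    if (q->mode != MODE_DA || q->size <= q->low_wtr)
        return false;
    q->size--;
    q->da_size++;
    return true;
}

/* The actual consumer processed one message from the DA queue.
 * When the DA queue becomes empty, DA mode is shut down. */
static void da_consume_one(prim_queue_t *q)
{
    assert(q != NULL);
    if (q->mode != MODE_DA || q->da_size == 0)
        return;
    if (--q->da_size == 0)
        q->mode = MODE_REGULAR;
}
```

With a high water mark of 3 and a low water mark of 1, the third enqueued message switches the model to DA mode, two messages are shuffled to disk, the third shuffle attempt is held, and consuming both disk messages returns the queue to regular mode.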

There is one special case: if the primary queue is shut down and could not finish processing all messages within the configured timeout periods, the DA queue is instantiated to take up the remaining messages. These will be preserved and processed during the next run. During that period, the DA queue runs in "enqueue-only" mode and does not execute any consumer. Draining the primary queue is typically very fast. If that behaviour is not desired, it can be turned off via parameters. In that case, any remaining in-memory messages are lost.

Due to the fact that when running DA two queues work closely together and worker threads (including the DA worker) may shut down at any time (due to timeout), processing synchronization and startup and shutdown are somewhat complex. I'll outline the exact conditions and steps down here. I also do this so that I know clearly what to develop to, so please be patient if the information is a bit too in-depth ;)

DA Run Mode Initialization

Three cases:

1. any time during queueEnqObj() when the high water mark is hit
2. at queue startup if there is an on-disk queue present (presence of QI file indicates presence of queue data)
3. at queue shutdown if remaining in-memory data needs to be persisted to disk

In case 1, the worker pool is running. When switching to DA mode, all regular workers are sent termination commands. The DA worker is initiated. Regular workers may run in parallel to the DA worker until they terminate. Regular workers shall terminate as soon as their current consumer has completed. They shall not execute the DA consumer.

In case 2, the worker pool is not yet running and is NOT started. The DA worker is initiated.

In case 3, the worker pool is already shut down. The DA worker is initiated. The DA queue runs in enqueue-only mode.
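
The three cases differ only in what happens to the regular worker pool and in how the DA queue runs. A minimal sketch of that decision table follows; the enum, the struct and `plan_da_init()` are hypothetical names chosen for illustration and do not come from the rsyslog sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three DA initialization cases listed above. */
typedef enum {
    DA_INIT_HIGH_WATER = 1,    /* case 1: high water mark hit in queueEnqObj() */
    DA_INIT_STARTUP_DISK = 2,  /* case 2: on-disk queue found at queue startup */
    DA_INIT_SHUTDOWN = 3       /* case 3: in-memory data persisted at shutdown */
} da_init_case_t;

typedef struct {
    bool terminate_regular_workers; /* send termination commands to the pool */
    bool start_da_worker;           /* initiate the DA worker */
    bool da_enqueue_only;           /* DA queue executes no consumer */
} da_init_plan_t;

/* Map an initialization case to the actions the text describes for it. */
static da_init_plan_t plan_da_init(da_init_case_t c)
{
    /* The DA worker is initiated in every case. */
    da_init_plan_t p = { false, true, false };

    assert(c >= DA_INIT_HIGH_WATER && c <= DA_INIT_SHUTDOWN);
    switch (c) {
    case DA_INIT_HIGH_WATER:
        p.terminate_regular_workers = true; /* pool is running, wind it down */
        break;
    case DA_INIT_STARTUP_DISK:
        break;                              /* pool not running, not started */
    case DA_INIT_SHUTDOWN:
        p.da_enqueue_only = true;           /* pool is already shut down */
        break;
    }
    return p;
}
```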

In all cases, the DA worker starts up and checks if DA mode is already fully initialized. If not, it initializes it, which most importantly means constructing the DA queue.

Then, regular worker processing is carried out. That is, the queue worker will wait on an empty queue and terminate after a timeout. However, if any message is received, the DA consumer is executed. That consumer checks the low water mark. If the low water mark is reached, it stops processing until either the high water mark is reached again or the DA queue indicates it is empty (there is a pthread_cond_t for this synchronization).
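
The hold at the low water mark can be sketched with a standard POSIX condition variable. This is an illustrative sketch only: `da_sync_t`, `must_hold()`, `wait_at_low_water()` and `signal_da_queue_empty()` are hypothetical names, and the real rsyslog code is organized differently. The point it shows is that the wake-up condition is evaluated under the mutex before any wait, so a signal sent earlier is not lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical synchronization state shared by the primary and DA queue. */
typedef struct {
    pthread_mutex_t mut;
    pthread_cond_t resume;   /* signalled on high water or DA queue empty */
    size_t size;             /* current primary queue size */
    size_t high_wtr;
    size_t low_wtr;
    bool da_queue_empty;
} da_sync_t;

/* True while the DA consumer has to pause at the low water mark. */
static bool must_hold(const da_sync_t *s)
{
    return s->size <= s->low_wtr && s->size < s->high_wtr && !s->da_queue_empty;
}

/* Block until processing may resume. Returns true if the DA queue is empty,
 * in which case the caller should check the DA shutdown criteria. */
static bool wait_at_low_water(da_sync_t *s)
{
    bool da_empty;

    assert(s != NULL);
    pthread_mutex_lock(&s->mut);
    while (must_hold(s))                 /* checked before the first wait */
        pthread_cond_wait(&s->resume, &s->mut);
    da_empty = s->da_queue_empty;
    pthread_mutex_unlock(&s->mut);
    return da_empty;
}

/* Called on behalf of the DA queue once it has run empty. */
static void signal_da_queue_empty(da_sync_t *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->mut);
    s->da_queue_empty = true;
    pthread_cond_signal(&s->resume);
    pthread_mutex_unlock(&s->mut);
}
```

Because the predicate is re-checked in a loop, the sketch also tolerates spurious wake-ups, which POSIX permits for `pthread_cond_wait()`.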

In theory, a case-2 startup could lead to the worker becoming inactive and terminating while waiting on the primary queue to fill. In practice, this is highly unlikely (but only for the main message queue) because rsyslog issues a startup message. HOWEVER, we cannot rely on that, as it would introduce a race. If the primary rsyslog thread (the one that issues the message) is scheduled very late and there is a low inactivity timeout for queue workers, the queue worker may terminate before the startup message is issued. And if the on-disk queue holds only a few messages, it may become empty before the DA worker is re-initiated. So it is possible that the DA run mode termination criteria are met while no DA worker is running on the primary queue.

In cases 1 and 3, the DA worker can never become inactive without hitting the DA shutdown criteria. In case 1, it either shuffles messages from the primary to the DA queue or it waits because it has hit the low water mark.

In case 3, it always shuffles messages between the queues (because that is the sole purpose of that run). In order for this to happen, the high water mark is set to 1 when DA run mode is initialized. This ensures that the regular logic can be applied to drain the primary queue. To prevent a hold due to reaching the low water mark, that mark must be changed to 0 before the DA worker starts.

DA Run Mode Shutdown

In essence, DA run mode is terminated when the DA queue is empty and the primary queue's size is below the high water mark. It is also terminated when the primary queue is shut down. The decision to switch back to regular (non-DA) run mode is typically made by the DA worker. If it switches, the DA queue is destructed and the regular worker pool is restarted. In some cases, the queue shutdown process may initiate the "switch" (in this case more or less a clean shutdown of the DA queue).
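
Expressed as a predicate, the termination criteria look roughly like the sketch below. The function name and its parameters are hypothetical, and the sketch covers only the two conditions named in the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: should DA run mode be terminated now? */
static bool da_mode_should_terminate(size_t da_queue_size, size_t prim_size,
                                     size_t high_wtr, bool prim_shut_down)
{
    assert(high_wtr > 0);  /* a high water mark of 0 would make no sense here */

    if (prim_shut_down)    /* primary queue shutdown always ends DA run mode */
        return true;
    /* DA queue empty and primary queue below the high water mark */
    return da_queue_size == 0 && prim_size < high_wtr;
}
```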

One might think that it would be more natural for the DA queue to detect being idle and shut itself down. However, there are some issues associated with that. Most importantly, all queue worker threads need to be shut down during queue destruction. Only after that has happened can the final destruction steps take place (else we would have a myriad of races). However, it is the DA queue's worker thread that detects that the queue is empty (empty queue detection always happens at the consumer side and must do so). That would lead to the DA queue worker thread initiating DA queue destruction, which in turn would lead to that very same thread being cancelled (because workers must shut down before the queue can be destructed). Obviously, this is not the place where it can be done. As such, the process that enqueues messages must destruct the queue - and that is the primary queue's DA worker thread.

There are some subtleties due to thread synchronization and the fact that no DA consumer may be running (in a case-2 startup). So it is not trivial to reliably change the queue back from DA run mode to regular run mode. The priority is a clean switch. We accept the fact that there may be situations where we cleanly shut down DA run mode, just to re-enable it with the very next message being enqueued. While unlikely, this will happen from time to time and is considered perfectly legal. We can't predict the future, and trying to avoid this would introduce too much complexity (and would most probably even lead to worse performance under regular conditions).

The primary queue's DA worker thread may wait at two different places:

1. after reaching the low water mark and waiting for either high water or DA queue empty
2. at the regular pthread_cond_wait() on an empty primary queue

Case 2 is very unlikely, but may happen (see info above on a case 2 startup).

The DA worker may also not wait at all, because it is actively executing and shuffling messages between the queues. In that case, however, the program code passes both of the two wait cases but simply does not wait.

Finally, the DA worker may be inactive (again, a case-2 startup). In that case no work(er) at all is executed. Most importantly, without the DA worker being active, nobody will ever detect the need to change back to regular mode. If we have this situation, the very next message enqueued will cause the switch, because then the DA run mode shutdown criteria are met. However, it may take close to an eternity for this message to arrive. During that time, disk and memory resources for the DA queue remain allocated. This also leaves processing in a sub-optimal state, and it may take longer than necessary to switch back to regular queue mode when a message burst happens. In extreme cases, this could even lead to a shutdown of DA run mode which takes so long that the high water mark is passed and DA run mode is immediately re-initialized - while with an immediate switch, the in-memory queue might have been able to process the message burst without DA support.

So in short, it is desirable to switch to regular run mode as soon as possible. To do this, we need an active DA worker. The easy solution is to initiate DA worker startup from the DA queue's worker once it detects an empty condition. To do so, the DA queue's worker must call into a "DA worker startup initiation" routine inside the main queue. As a reminder, the DA worker will most probably not receive the "DA queue empty" signal in that case, because in most cases it will have been sent long before the DA worker even waits for it. So it is vital that DA run mode termination checks be done in the DA worker before it goes into any wait condition.

Please note that the "DA worker startup initiation" routine may be called concurrently from multiple initiators. To prevent a race, it must be guarded by the queue mutex and return without any action (and no error code!) if the DA worker is already initiated.
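
A routine with these properties could look like the following sketch. `demo_queue_t`, its fields and `initiate_da_worker()` are hypothetical; thread creation is replaced by a counter so that the idempotent behaviour is easy to observe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical queue state; only what the startup routine needs. */
typedef struct {
    pthread_mutex_t mut;        /* the queue mutex */
    bool da_worker_initiated;   /* set once a DA worker has been initiated */
    int starts;                 /* how often a worker was really started */
} demo_queue_t;

/* "DA worker startup initiation": safe to call from multiple initiators.
 * Returns 0 in all cases; an already initiated worker is not an error. */
static int initiate_da_worker(demo_queue_t *q)
{
    assert(q != NULL);
    pthread_mutex_lock(&q->mut);
    if (!q->da_worker_initiated) {
        q->da_worker_initiated = true;
        q->starts++;  /* a real implementation would start the thread here */
    }
    pthread_mutex_unlock(&q->mut);
    return 0;
}
```

Checking and setting the flag inside the same critical section is what prevents two concurrent callers from both starting a worker.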

And now let's consider the case of primary queue destruction. During destruction, our primary focus is on losing as few messages as possible. If the queue is not DA-enabled, there is nothing but the configured timeouts to handle that situation. However, with a DA-enabled queue there are more options.

If the queue is DA-enabled, it may be configured to persist messages to disk before it is terminated. In that case, loss of messages never occurs (at the price of a potentially lengthy shutdown). Even if that setting is not applied, the queue should drain as many messages as possible to the disk. For that reason, it makes no sense to wait on a low water mark. Also, if the queue is already in DA run mode, it does not make any sense to switch back to regular run mode during termination and then try to process some messages via the regular consumer. It is much more appropriate to try to completely drain the queue during the remaining timeout period. For the same reason, it is preferred that no new consumers be activated (via the DA queue's worker), as they only cost valuable CPU cycles and, more importantly, might be long(er)-running and might need to be cancelled.

To prevent all of that, queue parameters are changed for DA-enabled queues: the high water mark is set to 1 and the low water mark to 0 on the primary queue. The DA queue is commanded to run in enqueue-only mode. If the primary queue is configured to persist messages to disk before it is terminated, its SHUTDOWN timeout is changed to eternal. These parameters will cause the queue to drain as much as possible to disk (and they may cause a case 3 DA run mode initiation). Please note that once the primary queue has been drained, the DA queue's worker will automatically switch back to regular (non-DA) run mode. It must be ensured that no worker cancellation occurs during that switchback. Please note that the queue may not switch back to regular run mode if it is not configured to persist messages to disk before it is terminated.

In order to apply the new parameters, worker threads must be awakened. Remember we may not be in DA run mode at this stage. In that case, the regular workers must be awakened, which will then switch to DA run mode. It is also possible that no worker is active; in that case one must be initiated. If in DA run mode and the DA worker is inactive, the "DA worker startup initiation" must be called to activate it. That routine ensures only one DA worker is started even with multiple concurrent callers - this may be the case here. The DA queue's worker may have requested DA worker startup in order to terminate on empty queue (which will probably not be honored as we have changed the low water mark).
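
The parameter changes applied at destruction time can be summarized in a few lines. The sketch below is an illustration under assumptions: `shutdown_cfg_t`, `prepare_shutdown()` and the `TIMEOUT_ETERNAL` sentinel are hypothetical and only mirror the settings named in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TIMEOUT_ETERNAL (-1L)  /* hypothetical sentinel for "wait forever" */

/* Hypothetical subset of the primary queue's settings. */
typedef struct {
    size_t high_wtr;
    size_t low_wtr;
    long shutdown_timeout_ms;
    bool da_enabled;
    bool persist_on_shutdown;  /* persist messages to disk before termination */
    bool da_enqueue_only;      /* DA queue must not execute any consumer */
} shutdown_cfg_t;

/* Adjust the parameters so the primary queue drains to disk during shutdown. */
static void prepare_shutdown(shutdown_cfg_t *c)
{
    assert(c != NULL);
    if (!c->da_enabled)
        return;                /* only the configured timeouts apply */

    c->high_wtr = 1;           /* any queued message triggers DA run mode */
    c->low_wtr = 0;            /* never hold at the low water mark */
    c->da_enqueue_only = true;
    if (c->persist_on_shutdown)
        c->shutdown_timeout_ms = TIMEOUT_ETERNAL;
}
```

After the parameters are changed, the workers still have to be awakened so that they pick up the new values, as described above.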

After all this is done, the queue destructor requests termination of the queue's worker threads. It will use the normal timeouts and potentially cancel worker threads that run too long. The shutdown process must ensure that all workers reach running state before they are commanded to terminate. Otherwise it may run into a race condition that could lead to a false shutdown with workers running asynchronously. As a few workers may have just been started to initialize (to apply new parameter settings), the probability for this race condition is extremely high, especially on single-CPU systems.

After all workers have been shut down (or cancelled), the queue may still be in DA run mode. If so, this must be terminated, which now can simply be done by destructing the DA queue object.

Finally, the queue is mostly shut down and ready to be actually destructed. As a last try, the queuePersists() entry point is called. It is used to persist a non-DA-enabled queue in whatever way is possible for that queue. There may be no implementation for the specific queue type. Please note that this is not just a theoretical construct. This is an extremely important code path when the DA queue itself is destructed. Remember that it is a queue object in its own right. The DA queue is obviously not DA-enabled, so it calls into queuePersists() during its destruction - this is what enables us to persist the disk queue!
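
One way to picture this last step is a per-queue-type function pointer that may be absent. The sketch below is an assumption-laden illustration: the struct layout, `queue_persist()` and `disk_persist()` are hypothetical and do not reproduce the actual queuePersists() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct demo_queue demo_queue_t;

/* Hypothetical queue object with an optional type-specific persist driver. */
struct demo_queue {
    bool da_enabled;
    int (*persist)(demo_queue_t *self); /* NULL: type cannot persist itself */
    int persisted;                      /* set by the driver, for illustration */
};

/* Example driver, as a disk-type queue (such as the DA queue) might have. */
static int disk_persist(demo_queue_t *q)
{
    q->persisted = 1;
    return 0;
}

/* Last try during destruction: persist a non-DA-enabled queue if its type
 * offers a way to do so; otherwise do nothing and report no error. */
static int queue_persist(demo_queue_t *q)
{
    assert(q != NULL);
    if (q->da_enabled || q->persist == NULL)
        return 0;
    return q->persist(q);
}
```

In this model the DA queue is a plain disk-type queue with `da_enabled` set to false, so its destruction reaches `disk_persist()`, while a pure memory queue with no driver simply returns.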

After that point, leftover queue resources (mutexes, dynamic memory, ...) are freed and the queue object is actually destructed.

Copyright

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be viewed at http://www.gnu.org/copyleft/fdl.html.
