Commit message [Author, Age, Files, Lines]
...
| * eventscript: store from_user and script_list inside state structure [Rusty Russell, 2009-12-08, 1 file, -3/+5]
| |     This means all the state about running the scripts is in that structure, which helps in the next patch.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 020fd21e0905e7f11400f6537988645987f2bb32)
| * eventscript: use direct script state pointer for current monitor [Rusty Russell, 2009-12-08, 2 files, -26/+27]
| |     We put a "scripts" member in ctdb_event_script_state, rather than using a special struct for monitor events. This will fit better as we further unify the different events, and holds the reports from the child process running each monitor script.
| |     Rather than making the monitor state a child of current_monitor_status_ctx, we just point current_monitor directly at it. This means we need to reset that pointer in the destructor for ctdb_event_script_state.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)
| * eventscript: make current_monitor_status_ctx serve as monitor_event_script_ctx [Rusty Russell, 2009-12-08, 2 files, -38/+15]
| |     We have monitor_event_script_ctx and other_event_script_ctx, and current_monitor_status_ctx in struct ctdb_context. This seems more complex than it needs to be.
| |     We use a single "event_script_ctx" as parent for all event script state structures. Then we explicitly reparent monitor events under current_monitor_status_ctx: this is freed every script invocation to kill off any running scripts anyway.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)
| * eventscript: split ctdb_run_event_script into multiple parts [Rusty Russell, 2009-12-07, 1 file, -62/+88]
| |     Simple refactoring in preparation for switching to one-child-per-script. We also call the functions run by the child process "child_".
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit bfee777faff75e9bed4aedc1558957483616a6d3)
| * eventscript: hoist work out of child process, into parent [Rusty Russell, 2009-12-07, 1 file, -22/+24]
| |     This is the start of a move towards finer-grained reporting, with one child per script. Simple code motion to do sanity check and get the list of scripts before fork().
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 816b9177f51ae5b21b92ff4a404f548fe9723c96)
| * eventscript: don't make ourselves healthy if we're under ban_count [Rusty Russell, 2009-12-07, 1 file, -1/+2]
| |     If we've timed out, but we've not timed out more than ctdb->tunable.script_ban_count, we pretend we haven't. There's a logic bug in the way this is done: if we were unhealthy before, this would set us to "healthy" again (status == 0).
| |     I don't think this would happen in real life, but it's a little surprising.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit e6488c0e05bab5c4c2c0a6370930b0b27e5ed56e)
| * eventscript: handle banning within the callbacks [Rusty Russell, 2009-12-07, 4 files, -41/+40]
| |     Currently the timeout handler in eventscript.c does the banning if a timeout happens. However, because monitor events are different, it has to special case them.
| |     As we call the callback anyway in this case, we should make that handle -ETIME as it sees fit: for everyone but the monitor event, we simply ban ourselves. The more complicated monitor event banning logic is now in ctdb_monitor.c where it belongs.
| |     Note: I wrapped the other bans in "if (status == -ETIME)", though they should probably ban themselves on any error. This change should be a noop.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5)
| * eventscript: expose ctdb_ban_self() [Rusty Russell, 2009-12-07, 3 files, -16/+17]
| |     eventscript.c uses this now, but our next patch makes others use it.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)
| * eventscript: handle v. unlikely timeout race [Rusty Russell, 2009-12-07, 1 file, -0/+1]
| |     If we time out just as the child exits, we currently will report an uninitialized cb_status field. Set it to -ETIME as expected.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 024386931bda9757079f206238ae09bae4de6ea2)
| * eventscript: replace other -1 returns with -errno [Rusty Russell, 2009-12-07, 1 file, -22/+20]
| |     This completes our "problem with script" reporting; we never set cb_status to -1 on error. Real errnos are used where the failure is a system call (eg. read, setpgid), otherwise -EIO is used if we couldn't communicate with the parent.
| |     The latter case is a bit useless, since the parent probably won't see the error anyway, but it's neater.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 1269458547795c90d544371332ba1de68df29548)
| * eventscript: simplify ctdb_run_event_script loop [Rusty Russell, 2009-12-07, 1 file, -13/+3]
| |     If we break, we avoid cut & paste code inside the loop. Need to initialize ret to 0 for the "no scripts" case.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit ec36ced9446da7e3bf866466d265ee8e18f606c1)
| * eventscript: handle and report generic stat/execution errors [Rusty Russell, 2009-12-07, 2 files, -19/+35]
| |     Rather than ignoring deleted event scripts (or pretending that they were "OK"), and discarding other stat errors, we save the errno and turn it into a negative status.
| |     This gives us a bit more information if we can't execute a script (eg. too many symlinks or other weird errors).
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 5d894e1ae5228df6bbe4fc305ccba19803fa3798)
| * eventscript: use -ENOEXEC for disabled status value [Rusty Russell, 2009-12-07, 6 files, -66/+8]
| |     This unifies code paths and simplifies things: we just hand -ENOEXEC to ctdb_ctrl_event_script_stop().
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
| * eventscript: enhance script delete race check [Rusty Russell, 2009-12-07, 1 file, -3/+7]
| |     We currently assume 127 == script removed. The script can also return 127; best to re-check the execution status in this case (and for 126, which will happen if the script is non-executable).
| |     If the script is no longer executable/not present, we ignore it.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 0a53d6b5ac81daf0efa32f35e7758ede2a5bdb63)
| * eventscript: check_executable() to centralize stat/perm checks [Rusty Russell, 2009-12-07, 1 file, -11/+32]
| |     This is used later in the "script vanished" check.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 8ddb97040842375daf378cbb5816d0c2b031fa65)
| * talloc: save errno over talloc_free [Rusty Russell, 2009-12-07, 1 file, -1/+6]
| |     As we start to use errno more, it's a huge pain if talloc_free() can blatt it (esp. destructors).
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 76a0ca77feba14e1e1162c195ffbdf516e62aa4d)
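The idea in the commit above, keeping errno intact across a free that may run destructors, can be sketched in plain C. This does not use talloc; release_preserving_errno(), the destructor_fn type, and noisy_destructor() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical destructor callback type (talloc has its own mechanism). */
typedef void (*destructor_fn)(void *ptr);

/* Example destructor that overwrites errno, as a real one might by
 * calling close() or similar on an already-closed resource. */
void noisy_destructor(void *ptr)
{
    (void)ptr;
    errno = EBADF;
}

/* Run the destructor and free the memory, then restore the caller's errno
 * so an error reported just before the free is not lost. */
void release_preserving_errno(void *ptr, destructor_fn destructor)
{
    int saved_errno = errno;

    assert(ptr != NULL);
    if (destructor != NULL) {
        destructor(ptr);
    }
    free(ptr);
    errno = saved_errno;
}
```

With this pattern, an error path can free its state first and still return -errno afterwards.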
| * eventscript: use -ETIME for timeout status value [Rusty Russell, 2009-12-07, 3 files, -11/+18]
| |     This starts the move toward more expressive encoding of return values: positive values mean the script ran, negative means we had a problem with the script (and the value is the errno).
| |     This does timeout, but changes the ctdb tool to recognize it.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
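The status encoding described above (non-negative means the script ran, negative is an errno) might be decoded along these lines. This is a sketch, not the ctdb tool's code: describe_script_status() and its message strings are hypothetical, and the -ENOEXEC case is taken from the "disabled status value" commit elsewhere in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render a script status into `buf` and return it. */
const char *describe_script_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (status >= 0) {
        /* The script ran; the value is its exit code. */
        snprintf(buf, len, "script ran, exit code %d", status);
    } else if (status == -ETIME) {
        snprintf(buf, len, "script timed out");
    } else if (status == -ENOEXEC) {
        snprintf(buf, len, "script disabled");
    } else {
        /* Any other negative value is a negated errno. */
        snprintf(buf, len, "problem with script: %s", strerror(-status));
    }
    return buf;
}
```

Keeping the errno in the status means a consumer can distinguish "ran and failed" from "could not run" without a separate flag.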
| * eventscript: marshall onto last_status immediately [Rusty Russell, 2009-12-07, 2 files, -35/+26]
| |     This simplifies the code a little: last_status is now ready to go (it's only used by the scriptstatus command at the moment).
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)
| * eventscript: reduce code duplication for ending a script, and fix bug [Rusty Russell, 2009-12-02, 1 file, -6/+1]
| |     Commit 50c2caed57c0 removed a gratuitous talloc_steal from the code in ctdb_control_event_script_finished(), but not ctdb_event_script_timeout(). Easiest to call ctdb_control_event_script_finished() at the bottom of the timeout routine.
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 17fa252d0d6981fbae8083a818f26d5ce9c5102e)
* | Bond devices can have any name the user configures, so [Ronnie Sahlberg, 2009-12-09, 1 file, -8/+13]
| |     when checking link status for an interface, first check if this interface is in fact a bond device (by the presence of a /proc/net/bonding/IFACE file) and use that file for checking status. Otherwise assume ib* is an infiniband interface which we don't know how to check, or otherwise it is an ethernet interface and ethtool should hopefully work.
| |     (This used to be ctdb commit 8cc6c5de3d7abb0b72eaa6e769e70963b02d84cb)
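The decision order described above can be sketched as a small classifier. The actual change lives in a shell eventscript; this C version only illustrates the order of the checks. The names sketch_iface_kind and sketch_classify_iface() are hypothetical, and the bonding directory is passed as a parameter (normally /proc/net/bonding) purely so the sketch can be exercised elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum sketch_iface_kind {
    SKETCH_IFACE_BOND,        /* status is read from the bonding file */
    SKETCH_IFACE_INFINIBAND,  /* ib*: no link check is attempted */
    SKETCH_IFACE_ETHERNET     /* fall back to ethtool */
};

/* Classify an interface: a bond device may have any name, so the bonding
 * file is checked first; only then is the name prefix considered. */
enum sketch_iface_kind sketch_classify_iface(const char *bonding_dir,
                                             const char *iface)
{
    char path[4096];
    int n;

    assert(bonding_dir != NULL && iface != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", bonding_dir, iface);
    if (n > 0 && (size_t)n < sizeof(path) && access(path, F_OK) == 0) {
        return SKETCH_IFACE_BOND;
    }
    if (strncmp(iface, "ib", 2) == 0) {
        return SKETCH_IFACE_INFINIBAND;
    }
    return SKETCH_IFACE_ETHERNET;
}
```

For example, sketch_classify_iface("/proc/net/bonding", "eth0") yields SKETCH_IFACE_ETHERNET on a host where eth0 is not a bond.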
* | make sure to also check that interfaces used for NATGW are ok [Ronnie Sahlberg, 2009-12-09, 1 file, -0/+1]
| |     and have a link. If not, the node should become unhealthy.
| |     (This used to be ctdb commit 03b5bbaae1b53830a4cd20d3079ab8f45ffce923)
* | events/50.samba: only use wbinfo --ping-dc if available [Stefan Metzmacher, 2009-12-08, 1 file, -1/+6]
|/
|     metze
|     (This used to be ctdb commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b)
* version 1.0.108 [Ronnie Sahlberg, 2009-12-07, 1 file, -1/+10]
|     (This used to be ctdb commit fff280878e670e93a818c0071f3172056214e8c4)
* Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state. [Ronnie Sahlberg, 2009-12-07, 1 file, -1/+1]
|     (This used to be ctdb commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6)
* packaging: package tests/bin/ctdb_transaction under /usr/share/doc/tests/bin [Michael Adam, 2009-12-04, 1 file, -0/+5]
|     For testing/diagnostic purposes.
|     Michael
|     (This used to be ctdb commit b796d736946856abfbe53de95dfcd73072ee8ccd)
* client: improve two error messages in ctdb_transaction_commit(). [Michael Adam, 2009-12-04, 1 file, -2/+10]
|     Michael
|     (This used to be ctdb commit d971b2ca84c0451dc7e5acbf4a5ade06270a2044)
* server:trans2_commit: move the check for active recovery down. [Michael Adam, 2009-12-04, 1 file, -5/+5]
|     This needs to be done after the control-dispatcher: In the TRANS2_COMMIT control, the client->db_id needs to be set before bailing out, since otherwise the next TRANS2_COMMIT_RETRY will fail...
|     Michael
|     (This used to be ctdb commit 59faf3f923a5989b5ee94ef02a12827412775bae)
* client: increase the number of commit retries 10-->100 [Michael Adam, 2009-12-04, 1 file, -1/+1]
|     To cope with timeouts when recoveries and transactions collide. Maybe 100 is too high.
|     Michael
|     (This used to be ctdb commit c23d804165e84bdf95ba960c953c736d361011d7)
* client: untangle checks and produce more detailed error messages [Michael Adam, 2009-12-04, 1 file, -1/+13]
|     in ctdb_transaction_fetch_start
|     Michael
|     (This used to be ctdb commit 428914377851a98b3fc893798783fbfebffc1c0d)
* client: increase the rsn of the __transaction_lock__ when storing [Michael Adam, 2009-12-04, 1 file, -0/+2]
|     So that it is correctly handled by recoveries. Also explicitly set the dmaster field to the current node's pnn.
|     Michael
|     (This used to be ctdb commit 03a5bb727b9db1ba952632f08ceb5355f0df842d)
* recovery: add special pull-logic for persistent databases [Michael Adam, 2009-12-04, 1 file, -4/+54]
|     The mechanism that decides which records of a persistent db are pulled into the recdb during recovery is now as follows:
|     * Usually a record with a higher rsn than that already stored is taken. (Just as for normal tdbs.)
|     * If a transaction is running on some node, then that node's copies of all records are taken and are not overwritten later by other nodes' copies.
|     In order to keep track of whether a record's copy was obtained from a node with a transaction running, the recovery mechanism misuses the ctdb tdb header field 'lacount' in the recdb. It is cleared later when pushing out the recdb database to the other nodes.
|     This way, an incomplete transaction is not spoiled when a recovery interrupts and the replay should usually succeed (possibly after a few retries).
|     Michael
|     (This used to be ctdb commit 8aef46d2aab3efb322dda51eaa202653cefd5222)
* make ctdb_ctrl_transaction_active public. [Michael Adam, 2009-12-04, 2 files, -3/+8]
|     Michael
|     (This used to be ctdb commit e5496a83ef4a01604195b27c4b97f50d4979510e)
* recovery: for persistent db's don't set the dmaster to the recmaster node number [Michael Adam, 2009-12-04, 1 file, -1/+3]
|     It is important to keep track of the dmaster (i.e. the node that last committed a transaction containing changes to this node).
|     Michael
|     (This used to be ctdb commit fe68972eb9cf3aa1f16ba1aacf57ade5d66e647c)
* recovery: pass the persistent flag to recover_database() [Michael Adam, 2009-12-04, 1 file, -6/+16]
|     and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries.
|     Michael
|     (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07)
* tests:ctdb_transaction: print an extra counters when a commit fails [Michael Adam, 2009-12-04, 1 file, -0/+1]
|     Michael
|     (This used to be ctdb commit 4113385865f53a57b18ea752a7dad8a08bed588e)
* client: in catdb, print the keyname first, and separate records by a blank line [Michael Adam, 2009-12-04, 1 file, -3/+5]
|     Michael
|     (This used to be ctdb commit b9882710e12f28c96a0af298e419160f00578241)
* packaging: remove the lib/popt from the tarball in debian mode [Michael Adam, 2009-12-04, 1 file, -0/+1]
|     Debian CTDB packaging fails when this is included.
|     Michael
|     (This used to be ctdb commit 574702f8d701fe3e493b31948420b2981eb36f93)
* packaging: rework maketarball.sh to accept an arbitrary githash to pack [Michael Adam, 2009-12-04, 1 file, -28/+36]
|     The githash can be specified through the environment variable "GITHASH", which can contain, e.g., a commit hash or a tag name.
|     The call syntax is now:
|     [GITHASH=xyz] [USE_GITHASH=yes/no] [DEBIAN_MODE=yes/no] maketarball.sh
|     Michael
|     (This used to be ctdb commit 41aa9bdfa2934f564bdc14374362437dfad0045f)
* ctdb: add command "ctdb wipedb" to wipe the contents of an attached tdb [Michael Adam, 2009-12-04, 1 file, -0/+163]
|     Michael
|     (This used to be ctdb commit 5a7c1e7f15693522bbf1c39a53be2304ece9a134)
* tests: turn printfs into DEBUG statements in the ctdb_transaction test [Michael Adam, 2009-12-04, 1 file, -18/+18]
|     Michael
|     (This used to be ctdb commit 0e130d79ab71cf3aa65c40af91866823246a0283)
* Merge branch 'status-test-2' [Martin Schwenke, 2009-12-04, 17 files, -81/+102]
|\
| |     (This used to be ctdb commit 5fc297a6bd49d9366703eef3edb9bdf0fe8505cc)
| * Eventscripts: Fix syntax error in 00.ctdb. [Martin Schwenke, 2009-12-01, 1 file, -0/+1]
| |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| |     (This used to be ctdb commit 9ea261f791ab919eb1ce5b37073b4f1d30694bb8)
| * Eventscripts: Remove executable bit accidentally set on some scripts. [Martin Schwenke, 2009-12-01, 3 files, -0/+0]
| |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| |     (This used to be ctdb commit 4c6e68ae942c05224c5f8b683fbc2dc1adced8ee)
| * Eventscript argument cleanups and introduction of ctdb_standard_event_handler. [Martin Schwenke, 2009-12-01, 17 files, -81/+101]
| |     The functions file no longer causes a side-effect by doing a shift. It also doesn't set a convenience variable for $1.
| |     All eventscripts now explicitly use "$1" in their case statement, as does the initscript.
| |     The absence of a shift means that the takeip/releaseip events now explicitly reference $2-$4 rather than $1-$3.
| |     New function ctdb_standard_event_handler handles the status and setstatus events, and exits for either of those events. It is called via a default case in each eventscript, replacing an explicit status case where applicable.
| |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| |     (This used to be ctdb commit 3d55408cbbb3bb71670b80f3dad5639ea0be5b5b)
* | Don't store debug level DEBUG_DEBUG in the in-memory ringbuffer. [Ronnie Sahlberg, 2009-12-04, 2 files, -2/+2]
| |     It is unlikely we will need something this verbose for normal troubleshooting. This allows us to keep a significantly longer time interval of log messages in the 500k slots available in the ringbuffer.
| |     (This used to be ctdb commit cc99c05c0c6484ad574039a454e6133852cb41fa)
* | Use statically allocated ringbuffer to store the last 500k log entries [Ronnie Sahlberg, 2009-12-04, 1 file, -11/+9]
| |     in memory instead of dynamically allocated ones so that we reduce the pressure on malloc/free.
| |     (This used to be ctdb commit c5cbb95512f034abeec515579983bf7ac55eadd9)
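The two ringbuffer commits above (static allocation, and skipping the most verbose level) can be sketched together. This is a minimal illustration, not the ctdb logging code: every name prefixed with sketch_ or SKETCH_ is hypothetical, the level values are assumptions, and the ring holds 4 slots here rather than 500k.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_RING_SIZE 4   /* tiny for illustration; the commit uses 500k slots */
#define SKETCH_MSG_LEN   64

/* Hypothetical log levels; higher means more verbose. */
enum { SKETCH_LOG_ERR = 0, SKETCH_LOG_NOTICE = 2, SKETCH_LOG_DEBUG = 4 };

struct sketch_log_entry {
    int level;
    char message[SKETCH_MSG_LEN];
};

/* Statically allocated storage: logging never calls malloc/free. */
static struct sketch_log_entry sketch_ring[SKETCH_RING_SIZE];
static unsigned int sketch_next;   /* slot the next entry is written to */
static unsigned int sketch_count;  /* valid entries, at most SKETCH_RING_SIZE */

void sketch_ring_add(int level, const char *message)
{
    struct sketch_log_entry *e;

    assert(message != NULL);
    if (level >= SKETCH_LOG_DEBUG) {
        return;  /* too verbose to be worth a slot */
    }
    e = &sketch_ring[sketch_next];
    e->level = level;
    snprintf(e->message, sizeof(e->message), "%s", message);
    sketch_next = (sketch_next + 1) % SKETCH_RING_SIZE;
    if (sketch_count < SKETCH_RING_SIZE) {
        sketch_count++;
    }
}

/* Return the i-th oldest retained entry (0 = oldest), or NULL if out of range. */
const struct sketch_log_entry *sketch_ring_get(unsigned int i)
{
    unsigned int start;

    if (i >= sketch_count) {
        return NULL;
    }
    start = (sketch_next + SKETCH_RING_SIZE - sketch_count) % SKETCH_RING_SIZE;
    return &sketch_ring[(start + i) % SKETCH_RING_SIZE];
}
```

Once the ring is full, each new entry overwrites the oldest one, so dropping the most verbose messages directly lengthens the time span the ring covers.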
* | Document the procedure to remove/change the NATGW configuration at [Ronnie Sahlberg, 2009-12-04, 3 files, -10/+73]
| |     runtime without restarting the ctdb service
| |     (This used to be ctdb commit 0a0526e03ef995b6b6634f5b75c7a17cb7b5df8f)
* | lower the loglevel for the message that a client has attached to a persistent database [Ronnie Sahlberg, 2009-12-02, 1 file, -1/+1]
| |     (This used to be ctdb commit 2027cf3881ba890648c543bacbfd5b06464efc10)
* | lower the loglevel for the message that a client has attached through a domain socket [Ronnie Sahlberg, 2009-12-02, 1 file, -1/+1]
| |     (This used to be ctdb commit de9e5236b20d70eac5ed29991703d6d25a103963)
* | Add a proper function to process a process-exist control in the daemon. [Ronnie Sahlberg, 2009-12-02, 3 files, -1/+42]
| |     This control is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.
| |     If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.
| |     bz58185
| |     (This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)