Commit message | Author | Age | Files | Lines
* libctdb: implement ctdb_disconnect and ctdb_detachdbRusty Russell2010-06-184-12/+89
| | | | | | | | | | | These are important for testing, since we can easily tell if we leak memory if there are outstanding allocations after calling these. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 18a212aa40d0ff9ff59775c6fcf9dc973e991460)
* libctdb: fix io_elem resource leak on realloc failure.Rusty Russell2010-06-181-2/+4
| | | | | | | | | | | | Found by nfsim. I knew about this, but as we stop when it happens anyway I didn't fix it. But it bugs nfsim, so fix it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 936b02443d36306407d6a26e8037cf31e3190b32)
* libctdb: fix writerecord() to actually write the record.Rusty Russell2010-06-211-0/+2
| | | | | | | Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 680ee6afaa89f21115a1bf33a8b9e7e92084a1a1)
* libctdb: ctdb_service() never returns < 0Rusty Russell2010-06-181-1/+1
| | | | | | | | | Found by ctdb-test. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0e8210f19edf2ae14154afb85d9b96951881f31f)
* libctdb: check ctdb_request_free & ctdb_cancel used appropriately.Rusty Russell2010-06-181-0/+15
| | | | | | | | | | | | | Since I made this mistake myself, we should check for it. We could have one function that does both, but from a user's point of view they are very different and it's quite possibly a bug if they think the request is finished/unfinished when it's not. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 70f6ed2634fb10749cdad3deffa96a1aa439c235)
* libctdb: synchronous should be using ctdb_cancel to kill unfinished requests.Rusty Russell2010-06-181-2/+6
| | | | | | | | | Found by ctdb-test. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit cd6b2f46075bfb64561496960af7fc2e95500e52)
* libctdb: fix uninitialized field usage on ctdb_attach failure pathRusty Russell2010-06-181-1/+1
| | | | | | | | | Found by ctdb-test. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 54c1036090d930c19231038ca861297153c1d0cf)
* libctdb: removed unused lock field from struct ctdb_dbRusty Russell2010-06-181-3/+0
| | | | | | | Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 256653a223c48ed932ce85f89fc2c2dda14f8c27)
* config/interface_modify.sh: do the echo before running the scriptStefan Metzmacher2010-07-151-1/+1
| | | | | | | metze Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit bb1d2bd31073304fc203868517144f61d12b7fc2)
* config/interface_modify.sh: before calling a script check if it exists and is executableStefan Metzmacher2010-07-151-0/+3
| | | | | | | | | | | | | | | | For non-bash shells $_s_script might end with '/*'. We do the workaround this way, because it makes sense to check that a script is executable, before trying to execute it. metze [ This actually applies to any shell -- Rusty Russell ] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e665cfde03fc9ec2264e99512ed5470872a2fd04)
* config: wrap iptables in flock to avoid concurrency.Rusty Russell2010-07-151-0/+6
| | | | | | | | | | | | | | | | | | | When doing a releaseip event, we do them in parallel for all the separate IPs. This creates a problem for iptables, which isn't reentrant, giving the strange message: iptables encountered unknown error "18446744073709551615" while initializing table "filter" The worst possible symptom of this is that releaseip won't remove the rule which prevents us listening to clients during releaseip, and the node will be healthy but non-responsive. The simple workaround is to flock-wrap iptables. Better would be to rework the code so we didn't need to use iptables in these paths. CQ:S1018353 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 72d6914ee913272312d7b68f1be5ad05ad06587d)
* ctdb: fix crash on "ctdb scriptstatus --events=releaseip"Rusty Russell2010-07-121-0/+4
| | | | | | | | | | Martin accidentally typed this instead of "ctdb scriptstatus releaseip" and it crashes. CQ:S1018859 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 70877b2e7f8fd0d46899bbeca2c6caad6e6e6820)
* version: generate RPM version from gitRusty Russell2010-07-022-10/+30
| | | | | | | | | | | | | | | | | This unifies our RPM version handling, based on tags. 1) Tags are of form ctdb-<version>. 2) The first <version> starts with .1. 3) Devel versions end with .0.<patchnum>.<checksum>.devel to reliably identify them. This means that devel versions will correctly supersede releases and earlier devels, but new releases will correctly supersede older devel RPMs. Making a new release is as simple as creating a new git tag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 44009e02a661d4a1e14246f650974fc4ed7a07c9)
* Report client for queue errors.Rusty Russell2010-07-017-22/+52
| | | | | | | | | | We've been seeing "Invalid packet of length 0" errors, but we don't know what is sending them. Add a name for each queue, and print nread. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
* tdb: improve loggingRusty Russell2010-07-011-2/+3
| | | | | | | | When tdb throws an error, we didn't report the name of the tdb; we should. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit cfea357c9b2142c8cd8cac1ee712d40b188793e1)
* ctdb_freeze: extend db priority hack to cover serverid.tdb deadlock.Rusty Russell2010-07-011-2/+6
| | | | | | | | | | | | | | | | We discovered that recent smbd locks the serverid tdb while holding a lock on another tdb (locking.tdb): 7: POSIX ADVISORY WRITE smbd-2224318 locking.tdb.0 10600 10600 22: -> POSIX ADVISORY READ smbd-2224318 serverid.tdb.0 26580 26580 The result is a deadlock against the ctdb_freeze code called for recovery. We extend the "notify" workaround to this case, too. BZ:65158 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit dfdaa446cf256854ff6d267dceeb86fbee8bb188)
* speed startup: with --sloppy-start, cut initial election timeout to 1/2 second.Rusty Russell2010-06-221-0/+5
| | | | | | | | | | | Seconds between ctdbd first log message and node healthy: BEFORE: 4.03 AFTER: 2.02 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8f17731dea4287d4f9b21dc58c1bdf26c8a0e628)
* speed startup: add --sloppy-start.Rusty Russell2010-06-223-1/+4
| | | | | | | | | | | | | | | | | | The extra recovery interval wait was introduced in 821333afb458 but no explanation was provided in that message. Nonetheless, if starting the entire cluster for the first time, it should be safe to skip this. We use the commandline arg --sloppy-start which should discourage people from using it outside testing. Seconds between ctdbd first log message and node healthy: BEFORE: 16.10 AFTER: 4.03 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 509e2e89ae233a0e91998d95267bf62f296a73cd)
* speed startup: run startup immediately after recovery finished.Rusty Russell2010-06-221-1/+1
| | | | | | | | | | | Seconds between ctdbd first log message and node healthy: BEFORE: 17.08 AFTER: 16.10 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 372201d418f041d69646793105f6898ab12a7d91)
* speed startup: don't wait a full recovery interval if we've already waitedRusty Russell2010-06-221-3/+12
| | | | | | | | | | | | | | We currently sleep for one second, whether or not we've already slept. Change this to sleep for the remainder of the second, if at all. Seconds between ctdbd first log message and node healthy: BEFORE: 18.09 AFTER: 17.08 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9)
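The "sleep for the remainder of the second" idea above can be sketched as follows. This is only an illustration, not the ctdb code: the helper name `remaining_wait` and the use of plain seconds as `double` are assumptions made for the example.

```c
/* Hypothetical helper (not a ctdb function): given the configured
 * interval and the time already spent waiting, both in seconds,
 * return how much longer to sleep. The result is never negative. */
double remaining_wait(double interval, double elapsed)
{
    if (elapsed >= interval)
        return 0.0;             /* already waited long enough: no sleep */
    return interval - elapsed;  /* sleep only for what is left */
}
```

With a one-second interval, a caller that has already waited 0.25 seconds would sleep 0.75 seconds more, and one that has waited 1.5 seconds would not sleep at all.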
* speed startup: immediately run first monitor event after startup.Rusty Russell2010-06-221-1/+1
| | | | | | | | | | | | | | | Once we've done a startup, we need to run a monitor event successfully to be marked as healthy. Rather than wait the usual 5 seconds, run it immediately (which will then reset next_interval to 5 seconds). Seconds between ctdbd first log message and node healthy: BEFORE: 23.58 AFTER: 18.09 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c8651494febcb1c9e558b2002e2a72c2bf547c06)
* speed startup: alter recovery loopRusty Russell2010-06-221-100/+103
| | | | | | | | | | | | | | | | | | We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2)
* Wrap the IDR early, but not too early.Ronnie Sahlberg2010-06-101-1/+1
| | | | | | | We don't want it to wrap almost immediately, since then basically all "ctdb ..." commands would log the "Reqid wrap" warning. (This used to be ctdb commit f26b59d8b96a70baa80ab1bad406ee6a21330b68)
* Merge commit 'rusty/idtree'Ronnie sahlberg2010-06-104-4/+14
|\ | | | | | | (This used to be ctdb commit 069db55ea6fa6b8dd278b880c1a325e259f3e172)
| * Delay reusing ids to make protocol more robustRusty Russell2010-06-103-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ronnie and I tracked down a bug which seems to be caused by a node running so slowly that we timed out the request and reused the request id before it responded. The result was that we unlocked the wrong record, leading to the following: ctdbd: tdb_unlock: count is 0 ctdbd: tdb_chainunlock failed smbd[1630912]: [2010/06/08 15:32:28.251716, 0] lib/util_sock.c:1491(get_peer_addr_internal) ctdbd: Could not find idr:43 ctdbd: server/ctdb_call.c:492 reqid 43 not found This exact problem is now detected, but in general we want to delay id reuse as long as possible to make our system more robust. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)
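One way to picture "delay id reuse as long as possible" is an allocator that searches from a moving cursor instead of always returning the lowest free id. The sketch below is an assumption-laden toy, not the idtree implementation: `struct id_pool`, `id_alloc`, `id_free` and the tiny `MAX_IDS` are all invented for illustration.

```c
#include <stdbool.h>

#define MAX_IDS 8   /* deliberately tiny, for illustration only */

/* Hypothetical id pool: a bitmap of used ids plus a search cursor. */
struct id_pool {
    bool in_use[MAX_IDS];
    int next;           /* where the next search starts */
};

/* Return a free id, or -1 if every id is taken. Because the search
 * starts at the cursor rather than at 0, an id that was just freed is
 * not handed out again until the cursor has wrapped around to it. */
int id_alloc(struct id_pool *p)
{
    for (int i = 0; i < MAX_IDS; i++) {
        int id = (p->next + i) % MAX_IDS;
        if (!p->in_use[id]) {
            p->in_use[id] = true;
            p->next = (id + 1) % MAX_IDS;
            return id;
        }
    }
    return -1;
}

/* Mark an id as free again; the cursor is left untouched. */
void id_free(struct id_pool *p, int id)
{
    p->in_use[id] = false;
}
```

A late reply carrying a stale id is then far less likely to collide with a live request, because that id stays unused until the rest of the id space has been cycled through.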
| * idtree: fix handling of large ids (eg INT_MAX)Rusty Russell2010-06-101-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Since idtree assigns sequentially, it rarely reaches high numbers. But such numbers can be forced with idr_get_new_above(), and that reveals two bugs: 1) Crash in sub_remove() caused by pa array being too short. 2) Shift by more than 32 in _idr_find(), which is undefined, causing the "outside the current tree" optimization to misfire and return NULL. Signed-off-by: Rusty Russell <rusty@rustorp.com.au> (This used to be ctdb commit 32c04e11ebbcf8239e47016302c6ce802a8b0a6f)
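The second bug is an instance of a general C pitfall: shifting an `int` by its width or more is undefined behaviour, so a range check written as `id >= (1 << bits)` can misfire once the tree is tall enough. A minimal sketch of a guarded check follows; the function name `id_outside_tree` and its parameters are hypothetical and do not reproduce the actual `_idr_find()` code.

```c
#include <stdbool.h>
#include <limits.h>

/* Hypothetical check: can a non-negative id fall outside a tree that
 * covers `bits` bits of id space? */
bool id_outside_tree(int id, unsigned bits)
{
    /* 1 << bits is only defined while the result fits in an int, so
     * handle wide trees without shifting: they cover every
     * non-negative int, and nothing is "outside". */
    if (bits >= sizeof(int) * CHAR_BIT - 1)
        return false;
    return id >= (1 << bits);
}
```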
* | fix a debug messageRonnie Sahlberg2010-06-091-1/+1
| | | | | | | | (This used to be ctdb commit 856bd6de6218d9b70baed0e6443be4253ea31afe)
* | idr can timeout and wrap/be reused quite quickly.Ronnie Sahlberg2010-06-091-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a noremote node hangs for an extended period, it is possible that we might have a DMASTER request in flight for record A to that node. Eventually we will reuse the idr, and may reuse it for a DMASTER request to a different node for a different record B. If, while the request for B is in flight, the first node un-hangs and responds, we would receive a dmaster reply for the wrong record. This would cause a record to become perpetually locked, since inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key) but once the migration completed we would chainunlock idr->state->call->key. Add code to verify that when we receive a dmaster reply packet, its key does in fact match the key held in the state variable we have for the idr in flight. (This used to be ctdb commit 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37)
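The verification described above amounts to comparing the key in the reply with the key remembered for the in-flight request before acting on it. A minimal sketch, assuming a key is a pointer plus length in the style of TDB's data blobs; `struct key` and `keys_match` are illustrative names, not the daemon's own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a record key: bytes plus length. */
struct key {
    const unsigned char *dptr;
    size_t dsize;
};

/* True only if both keys have the same length and the same bytes.
 * A reply whose key does not match the request state should be
 * rejected rather than used to unlock a record. */
bool keys_match(struct key reply, struct key expected)
{
    if (reply.dsize != expected.dsize)
        return false;
    return reply.dsize == 0 ||
           memcmp(reply.dptr, expected.dptr, reply.dsize) == 0;
}
```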
* | We can not be holding a chainlock at this stage, so the tdb_chainunlock() call is bogusRonnie Sahlberg2010-06-091-1/+0
| | | | | | | | | | | | | | | | (a child process might be holding the lock, but not the main daemon) (This used to be ctdb commit 9b4a83e49c5df80df8498b7384c5f53f390c1d9d)
* | add extra logging for failed ctdb_ltdb_unlock() for a few more placesRonnie Sahlberg2010-06-091-4/+19
| | | | | | | | | | | | it is called from (This used to be ctdb commit 5c0fea90c6474a51992a9c4aeb6af7dfeb213ee0)
* | add additional logging when tdb_chainunlock() failsRonnie Sahlberg2010-06-093-8/+39
| | | | | | | | | | | | so we can see where it was called from when it fails (This used to be ctdb commit 0c091b3db6bdefd371787d87bc749593ea8e3c76)
* | print the db name when a chainunlock fails tooRonnie Sahlberg2010-06-091-1/+1
| | | | | | | | (This used to be ctdb commit 7932156d7f25870e6937faca08bf75d3cdbad2e5)
* | when tdb_chainunlock() fails, print the tdb error that occurredRonnie Sahlberg2010-06-091-1/+1
|/ | | | (This used to be ctdb commit dcdd2010905b9007fbf7ab71f576cfbd48acce8a)
* Some "ctdb ..." commands can be run without having the main daemon running.Ronnie Sahlberg2010-06-091-10/+11
| | | | | | | | | | | | In that case, when the main daemon is not running the ctdb context will be initialized to NULL, since we can not connect. Move the calls to read the ctdb socketname and connecting via libctdb to only happen when we are executing a "ctdb ..." command that requires that we talk to the actual daemon. Otherwise we will get an ugly SEGV for the "ctdb ..." commandline tool when trying to run a command that is supposed to work also when the daemon is down. (This used to be ctdb commit 18168da84a6aa8d69465e43402444c7ec979604a)
* libctdb: connect TDB logging to our loggingRusty Russell2010-06-083-9/+68
| | | | | | | | | | A simple connector function, made a bit more complex because TDB adds a '\n' and we don't. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit ae5b89dca00ca080c70868430fa54ba07bd6f5f4)
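The newline mismatch mentioned above is the sort of thing a connector has to normalise before passing a message on. A small sketch of that one step, with a hypothetical helper name; it does not show the real libctdb or TDB logging hooks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove a single trailing '\n' from msg in
 * place, if there is one, and return the resulting length. */
size_t strip_trailing_newline(char *msg)
{
    size_t len = strlen(msg);

    if (len > 0 && msg[len - 1] == '\n')
        msg[--len] = '\0';
    return len;
}
```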
* libctdb: always check header hasn't changed on local tdbRusty Russell2010-06-082-25/+57
| | | | | | | | | | | | | | The code on which this is based could alter the header: a normal client can't. If we use this differently later we can change this. For the moment it's a nice extra check. We optimize out the record write altogether when the record hasn't changed, rather than just suppressing the seqnum update. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 2638dbae7bf1a35ed37802e35e179e435a5d622a)
* libctdb: more bool conversion, and accompany lock by ctdb_db in APIRusty Russell2010-06-084-18/+34
| | | | | | | | | | | | | | I missed some int->bool conversions previously, particularly the return of ctdb_writerecord(). By always handing functions ctdb_connection or ctdb_db, we keep it consistent with the rest of the API and can do extra lock consistency checks. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 3f939956ddd693cba6ea5c655288f4f5ca95f768)
* libctdb: clarify logging levelsRusty Russell2010-06-083-17/+30
| | | | | | | | | | | | | Now we have more messages, it seems to make sense to document their usage and make them consistent. In particular, LOG_CRIT for internal libctdb problems, LOG_ALERT for API misuse. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit a6fed3f577c7ec51df38ed15ecb9db6ea2ae7c8f)
* libctdb: use magic to detect free/invalid locksRusty Russell2010-06-081-26/+32
| | | | | | | | | | | | | | Rather than using a binary flag, we use a magic value for locking. We also split out the "don't have the lock yet" from the "do have the lock" paths for clarity and extra checking. This should detect a superset of the previous case, even if they free (and reuse) the lock memory. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit dc081d40051b9204bb38e4de7dfe8d78656593d0)
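The magic-value technique can be sketched roughly as below. The structure, function names and constants are all made up for the example and are not the values libctdb uses; the point is only that a distinctive pattern is much less likely than a 0/1 flag to appear by accident in freed or reused memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary illustrative constants, not libctdb's actual values. */
#define DEMO_LOCK_HELD     0x4c4f434bu
#define DEMO_LOCK_RELEASED 0x46524545u

/* Hypothetical lock handle carrying a magic field. */
struct demo_lock {
    uint32_t magic;
};

void demo_lock_mark_held(struct demo_lock *l)
{
    l->magic = DEMO_LOCK_HELD;
}

void demo_lock_mark_released(struct demo_lock *l)
{
    l->magic = DEMO_LOCK_RELEASED;
}

/* Only the exact "held" pattern counts. A released lock, or memory
 * that has since been overwritten with something else, fails. */
bool demo_lock_is_held(const struct demo_lock *l)
{
    return l->magic == DEMO_LOCK_HELD;
}
```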
* Additional log messages when tdb databases can no longer be chainlocked or chainunlockedRonnie Sahlberg2010-06-082-1/+3
| | | | | | | | BZ64688 (This used to be ctdb commit b977901a49a9fed45cc8a2fe880eb749f58278f6)
* In ctdb_writerecord()Ronnie Sahlberg2010-06-051-0/+7
| | | | | | | | | | | Verify that the lock is still held and refuse the write otherwise. We have to guarantee that we don't write to an unlocked record. If we write to a record after it has been released, the record may have already migrated off the node, in which case we get a DMASTER split brain for this record. (These application bugs are incredibly hard to track down) (This used to be ctdb commit f62c7e44dc303f274bbc1dd59fad2167e72a2af0)
* Split ctdb_release_lock() into a function to release the lock and another function to free the data structures.Ronnie Sahlberg2010-06-053-3/+26
| | | | | | | | This allows us to keep the data structure valid after the lock has been released by the application, so we can trap and warn when the application accesses the lock after it has been released, i.e. application bugs. (This used to be ctdb commit 463a266205f145cd9c4c36b9c59d3747eeef0e2e)
* update "ctdb pnn" to use the new return value for _recv() whereRonnie Sahlberg2010-06-051-2/+2
| | | | | | bool false means failure and true means success. (This used to be ctdb commit 8fec60cb92d26886d853c918b8bc7931fec46469)
* Must initialize ctdb->locks or else bad things happenRonnie Sahlberg2010-06-051-0/+1
| | | | (This used to be ctdb commit 9ec0b9bb148327a40e439d9c643c9d2ff93ce598)
* Update the ctdb tool to use the new signature for ctdb_connect()Ronnie Sahlberg2010-06-051-1/+2
| | | | (This used to be ctdb commit ced3bc40f841d353bc86a6ee9dd1868473223f52)
* libctdb: documentationRusty Russell2010-06-042-90/+385
| | | | | | | | | | | | | | Full documentation for all the functions. This looks longer than it is, because it sorts them into async and sync parts, and also renames some formal parameters. Added TODO to libctdb directory to track our plans. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 108e9c2450876a9f8821aa7efd5be971eee5afd3)
* libctdb: use values from ctdb_protocol.h, don't re-declareRusty Russell2010-06-041-12/+1
| | | | | | | | | | We're best off including ctdb_protocol.h to get these, even if we document the important ones in ctdb.h. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit cdc19dc73032470d57f38bf825d8113b3a0c8cd1)
* libctdb: use bool in APIRusty Russell2010-06-046-64/+60
| | | | | | | | | | Return bool instead of -1/0; that's what the young kids are doing these days! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e285b5d5a9d4fbc4f75dbb237d2fcdbd84f2d605)
* libctdb: track lock for each ctdb_db, complain if they hold too long.Rusty Russell2010-06-042-33/+50
| | | | | | | | | | | In particular, this stops them grabbing two (with wrappers so we can enhance this logic once we support threads), and warns them if they re-enter ctdb_service() holding a lock (you are not supposed to block!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c620cfbad3b5f0d6330ef47f572d4ade08e169e8)
* patch libctdb-use-logging.patchRusty Russell2010-06-045-22/+138
| | | | (This used to be ctdb commit fecb8a19e97f6e453066461b234acdb0946bbadd)