Commit message | Author | Age | Files | Lines
...
| * ctdb: hold transaction locks during freeze, mark during recover.Ronnie Sahlberg2011-01-181-0/+9
| | | | | | | | | | | | | | | | | | | | | | Make the ctdb parent "mark" the transaction lock once the child process has frozen/locked the entire database. This stops the ctdb daemon from taking a blocking fcntl() lock on the tdb during the read traverse during recovery. CQ 1021388 (This used to be ctdb commit 52ee2b3ce822344d0f55ac040fe25f6ec5c0d7c2)
| * tdb: expose transaction lock infrastructure for ctdbRusty Russell2011-01-182-0/+24
| | | | | | | | | | | | | | | | | | tdb_traverse_read() grabs the transaction lock. This can cause ctdbd (which uses it) to block when it should not; expose mark and normal variants of this lock, so ctdbd's child (the recovery daemon) can acquire it and the ctdbd parent can mark it as held. (This used to be ctdb commit d09fa845bd848d04507853809acf42e0471b44bf)
| * change Christian's previous patch to only perform the check/loggingRonnie Sahlberg2011-01-174-8/+12
| | | | | | | | | | | | | | | | if we are the main ctdb daemon. Other daemons/child processes are not guaranteed to get events on a regular basis, so those should not be checked. (This used to be ctdb commit ac2afe9c25753b837d5f6396020e0f3c65ef3628)
| * improve timing issue detectionsChristian Ambach2011-01-176-15/+61
| | | | | | | | | | | | | | | | | | | | | | the original "Time jumped" messages are too coarse to interpret exactly what was going wrong inside of CTDB. This patch removes the original logs and adds two other logs that differentiate between the time it took to work on an event and the time it took to get the next event. (This used to be ctdb commit fd8d54292f10b35bc4960d64cfa6843ce9aba225)
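The distinction this commit draws, between time spent working on an event and time spent waiting for the next one, can be sketched as follows. This is a minimal illustration only, not the CTDB implementation: the class name `EventLoopTimer`, its methods, the 3 second threshold and the message wording are all assumptions made for the example.

```python
import time


class EventLoopTimer:
    """Records separate warnings for slow event handling and long waits.

    Hypothetical helper: names, threshold and messages are illustrative.
    """

    def __init__(self, threshold=3.0, clock=time.monotonic):
        self.threshold = threshold      # seconds before a delay is reported
        self.clock = clock              # injectable so the logic can be tested
        self.warnings = []
        self._wait_started = clock()    # when we began waiting for an event
        self._handling_started = None   # when the current event arrived

    def event_received(self):
        """Call when the loop wakes up with an event to process."""
        now = self.clock()
        waited = now - self._wait_started
        if waited > self.threshold:
            self.warnings.append(f"no event for {waited:.1f}s")
        self._handling_started = now

    def event_handled(self):
        """Call when processing of the current event has finished."""
        if self._handling_started is None:
            return  # nothing was being handled
        now = self.clock()
        worked = now - self._handling_started
        if worked > self.threshold:
            self.warnings.append(f"handling event took {worked:.1f}s")
        self._handling_started = None
        self._wait_started = now
```

Keeping the two measurements apart shows whether a stall came from a slow handler or from the loop not being woken, which a single "Time jumped" style message cannot distinguish.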
| * LIBCTDB: add support for traverseRonnie Sahlberg2011-01-145-18/+267
| | | | | | | | (This used to be ctdb commit 9463e04038ba36792583f83bd95c1af322dc283a)
| * db_exists() takes 3 arguments, not two.Ronnie Sahlberg2011-01-141-4/+6
| | | | | | | | (This used to be ctdb commit 2c02fc2d45cd7364d7bee0d6a89f1386131ef002)
| * We can not always rely on the recovery daemon pinging us in a timely mannerRonnie Sahlberg2011-01-141-0/+30
| | | | | | | | | | | | | | | | so we need a "ticker" in the main ctdbd daemon too to ensure we get at least one event to process every second. This will improve the accuracy of "Time jumped" messages and remove false positives when the recovery daemon is "slow". (This used to be ctdb commit 70154e5e19e219de086b2995d41e8f6e069ee20d)
| * ADDIP failureRonnie Sahlberg2011-01-131-12/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Found during automatic regression testing. We do not allow the takeip/releaseip events to be executed during a recovery. All of "ctdb addip, ctdb delip, ctdb moveip" use and force these events to trigger to perform the ip assignments required. If these commands collide with a recovery, these commands could fail since we do not allow takeip/releaseip events to trigger during the recovery. While it is easy to just try running the command again, this is suboptimal for script use. Change these commands to retry these operations a few times until either successful or until we give up. This makes the commands much easier to use in scripts. (This used to be ctdb commit 6954c9df67501183995f408cca358c8fdfb176ab)
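The retry behaviour described in this commit can be sketched generically as below. This is not the code from the ctdb tool: the function name `retry`, the attempt count and the delay are assumptions for illustration, and the message itself only says "a few times".

```python
import time
from typing import Callable


def retry(operation: Callable[[], bool],
          attempts: int = 5,
          delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `operation` until it returns True or `attempts` runs are used up.

    Hypothetical helper: the attempt count and delay are illustrative values.
    """
    for attempt in range(1, attempts + 1):
        if operation():
            return True
        if attempt < attempts:
            # Give a colliding recovery time to finish before trying again.
            sleep(delay)
    return False
```

A caller would wrap the operation that can collide with a recovery and treat a final `False` as the point where the command gives up and reports failure.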
| * IPALLOCATION : If the node is held pinned down in "init" stateRonnie Sahlberg2011-01-131-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | by external services failing to start, or blocking CTDBD from finishing the startup phase, we can encounter a situation where we have not yet fully initialized, but a remote recovery master tries to release a certain ip clusterwide. In this situation the node that is pinned down in init/startup phase would fail to perform the release of the ip address since we are not yet fully operational and do not yet host any valid interfaces. In this situation, we just need to remain unhealthy; there is no need to also ban the node. Remove the autobanning for this condition and just let the node remain in unhealthy mode. Banning is overkill in this situation when the system is broken and just draws attention to ctdbd instead of the root cause. (This used to be ctdb commit d8af74e4c4961deb94c18dde8ba7fc07e944729c)
| * Eventscripts: lower the fail/restart limits for nfsd.Martin Schwenke2011-01-111-2/+2
| | | | | | | | | | | | | | | | | | We were potentially leaving a node unable to serve requests for too long. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5be8610ffa33db49e33949560d0ef2fa5f3c0c73)
| * Eventscripts: use "startstop_nfs restart" to reconfigure NFS.Martin Schwenke2011-01-111-0/+1
| | | | | | | | | | | | | | | | | | This was defaulting to just "service nfs restart", which doesn't have the workarounds we need. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0f462e9e9fe12b595f3c7452123db8e69548abd6)
| * Eventscripts: only autostart during a monitor event.Martin Schwenke2011-01-111-0/+3
| | | | | | | | | | | | | | | | | | Otherwise we might short-circuit events that are run only once and actually need to do something. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c4f9e8a43540bc049b2771e0a2d76d37b9d17331)
| * Eventscripts: print a message when reconfiguring a service.Martin Schwenke2011-01-111-0/+1
| | | | | | | | | | | | | | | | | | Otherwise there can be strange error messages from services stopping/starting, without any context. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8bcf7ab164429ddc0ae530133e114f186a8146dd)
| * Eventscripts: work around NFS restart failure under load.Martin Schwenke2011-01-111-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | "service nfs restart" can fail. To stop nfsd it sends a SIGINT and nfsd might take a while to process it if the system is loaded. Starting nfsd may then fail because resources are still in use. This does some /proc magic to tell nfsd to do no more processing. It then runs service stop, kills nfsd with SIGKILL, and then runs service start. This is much less likely to fail. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a9bf4f82852975b0b627f61ceb2d23401f630805)
| * TYPORonnie Sahlberg2011-01-111-1/+1
| | | | | | | | (This used to be ctdb commit 38dc1ac2e87416a22c9356596286b773d601e71c)
| * STATD is 100027 not 1000247Ronnie Sahlberg2011-01-111-1/+1
| | | | | | | | (This used to be ctdb commit f4cf15a2b06ffefde0cba803603b48040ad0fa05)
| * LIBCTDB uninitialized inqueue elementRonnie Sahlberg2011-01-111-0/+1
| | | | | | | | | | | | | | | | From Michael Anderson, initialize the inqueue element of the ctdb structure to NULL, else it might be used uninitialized and cause a segv. (This used to be ctdb commit 775d02180b825ae32d6536eaf2059884d5fed9f4)
| * recoverd: avoid triggering a full recovery if just some ip allocationRonnie Sahlberg2011-01-111-9/+6
| | | | | | | | | | | | | | | | has failed. We don't need to rebuild the databases in this situation, we just need to try again to sort out the ip address allocations. (This used to be ctdb commit 044c398ffea23d36ee033c8ddf07d11028197346)
| * Add ctdb_fork() which will fork a child process and drop the real-timeRonnie Sahlberg2011-01-1111-17/+22
| | | | | | | | | | | | | | | | | | scheduler for the child. Use ctdb_fork() from callers where we don't want the child to be running at real-time privilege. (This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
| * Revert scheduling back to use real-time processesRonnie Sahlberg2011-01-115-13/+57
| | | | | | | | | | | | | | | | | | Revert this patch: commit 482c302d46e2162d0cf552f8456bc49573ae729d We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads. (This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
| * 60.nfs Check if we have rpc.statd and if not, skip checking for statdRonnie Sahlberg2011-01-061-18/+23
| | | | | | | | | | | | | | availability at all (since we can't restart it, there is no point checking if it is alive) (This used to be ctdb commit 6075e85ba6c0f58fd1ab2ce3b09dd3d6ff491365)
| * 41.HTTPDRonnie Sahlberg2010-12-221-4/+17
| | | | | | | | | | | | | | | | | | Httpd can be very slow to start on some platforms, wait 5 monitor intervals before we try to restart it if it has not bound to port 80 yet. After 10 failed intervals, flag the node as unhealthy. (This used to be ctdb commit 6ec1993aa5f2778b8227ce5f6eca0d19e4ae9788)
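The two thresholds in this message (restart after 5 monitor intervals, unhealthy after 10) can be sketched as a small decision function. This is an illustration only, not the 41.httpd eventscript: the function name and return values are hypothetical, and it assumes a restart is attempted on every failed interval from the fifth onward until the unhealthy threshold is reached.

```python
def monitor_action(failed_intervals: int,
                   restart_after: int = 5,
                   unhealthy_after: int = 10) -> str:
    """Map a count of consecutive failed monitor intervals to an action.

    Hypothetical helper; thresholds default to the values in the message.
    """
    if failed_intervals >= unhealthy_after:
        return "unhealthy"   # give up and flag the node
    if failed_intervals >= restart_after:
        return "restart"     # service had long enough to bind its port
    return "wait"            # slow starters get a grace period
```

The same shape would fit other services with different limits, for example a restart threshold of 10 and an unhealthy threshold of 15.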
| * 60.nfsRonnie Sahlberg2010-12-221-6/+23
| | | | | | | | | | | | | | Try to restart LOCKD after 10 failures and flag the node as unhealthy after 15 failures (This used to be ctdb commit 5a67889c9166835aef3443051812d14af07dfca5)
| * Dont run net serverid wipe in the backgroundRonnie Sahlberg2010-12-221-1/+1
| | | | | | | | (This used to be ctdb commit 76c515f9f05f4fb5683b5ff65cf136c168fd882f)
| * 50.sambaRonnie Sahlberg2010-12-141-2/+2
| | | | | | | | | | | | | | | | Net serverid wipe can take a bit of time sometimes so background it. Only perform auto start/stop of the managed service on the monitor event (This used to be ctdb commit deba5cbbf7703a1a24ce88a06c73fca056e05521)
| * ctdb addip:Ronnie Sahlberg2010-12-131-138/+143
| | | | | | | | | | | | | | | | | | | | | | After finishing "ctdb addip" wait for an implicit "iptakeover" to complete the assignment to a node. This makes it more wasteful and time-consuming when adding multiple ips at once, or the same ip to multiple nodes, but makes it easier to script the use of this command. (This used to be ctdb commit d86cbf3d7d426c558d110d67dc985634c754a522)
| * LVSRonnie Sahlberg2010-12-131-1/+1
| | | | | | | | | | | | update lvs configuration on ipreallocated events too (This used to be ctdb commit a4e98073d955676fdcbb91affae1de1a733d0bc2)
| * When assigning the single-public-ip during startup,Ronnie Sahlberg2010-12-131-0/+9
| | | | | | | | | | | | | | | | | | flag the interface as initially being "link ok" so that we can add it and startup. The eventscript can later drop the flag if required (This used to be ctdb commit 720849b756c825fb8b285f09972a8c39f1888a99)
| * Revert "server: when we migrate off a record with data, set the ↵Ronnie Sahlberg2010-12-131-4/+0
| | | | | | | | | | | | | | | | MIGRATED_WITH_DATA flag" This reverts commit 17e231abf5ade83d7fa624b5cf54ae876e2795aa. (This used to be ctdb commit 23f81ba39ee7cd8a7360f4602b3eb264eb221552)
| * Revert "Add a new header flag for "migrated with data" and set this to 1"Ronnie Sahlberg2010-12-131-22/+4
| | | | | | | | | | | | This reverts commit a8cc35191df1cd4b866897df71d317ce5f198cb5. (This used to be ctdb commit 7c37435fb517a621c45b21a21b4eb15f8bbd3c83)
| * libctdbRonnie Sahlberg2010-12-101-1/+1
| | | | | | | | | | | | fix a compile problem after renaming a structure field (This used to be ctdb commit f44c02f45dbc13e3cc2e89ee1c96bd0d57042fcc)
| * LibCTDBRonnie Sahlberg2010-12-105-3/+51
| | | | | | | | | | | | | | Add an input queue where we keep received pdus we have not yet processed This allows us to perform SYNC calls from an ASYNC callback (This used to be ctdb commit c111e98d3ad7bd3d09f4081e9bb1443d3722672f)
| * only run "serverid wipe" if we are actually running samba.Ronnie Sahlberg2010-12-101-2/+2
| | | | | | | | | | | | we dont need to run this on systems where we do run winbind but not samba (This used to be ctdb commit fcb9e8d1e1c78439ea42adb8b05ad84fbca7f724)
| * idtree: fix overflow for v. large ids on allocation and removalRusty Russell2010-12-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | (Imported from SAMBA commit 09a6538969ac). Chris Cowan tracked down a SEGV in sub_alloc: idp->level can actually be equal to 7 (MAX_LEVEL) there, as it can be in sub_remove. (We unfairly blamed a shift of a signed var for this crash in commit 2db1987f5a3a). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 73764104356d3738d9d20a9d06ce51535f74f475)
| * Add a new header flag for "migrated with data" and set this to 1Ronnie Sahlberg2010-12-071-4/+22
| | | | | | | | | | | | | | | | | | | | | | when we migrate a non-empty record onto the node or a non-empty record off the node. When we migrate a record back to the lmaster and yield the dmaster role, inspect this flag; if it is still not set, we can delete the record from the local database as soon as we have migrated it back to the lmaster. (This used to be ctdb commit a8cc35191df1cd4b866897df71d317ce5f198cb5)
| * add new command line functionsRonnie Sahlberg2010-12-071-0/+103
| | | | | | | | | | | | | | | | | | ctdb readkey <dbid> <key> ctdb writekey <dbid> <key> <value> these are mainly intended for debugging of databases and dmaster migration issues (This used to be ctdb commit 70c2e7dd04727371590fb94579ffd20318fbeb58)
| * add a new ctdb_ltdb function to delete a record in a normal databaseRonnie Sahlberg2010-12-072-0/+20
| | | | | | | | (This used to be ctdb commit fe9070ec9be69e6a6fcbf9899e7ced24541c9c3a)
| * server: when we migrate off a record with data, set the MIGRATED_WITH_DATA flagMichael Adam2010-12-071-0/+4
| | | | | | | | (This used to be ctdb commit 17e231abf5ade83d7fa624b5cf54ae876e2795aa)
| * Add 60.ganesha to what gets installed by make install as well as by the RPMRonnie Sahlberg2010-12-062-0/+2
| | | | | | | | (This used to be ctdb commit 8a6da384f3fa08b1c5eba79d6febc7af7b3d9229)
| * add a missing part of the import of the previous ganesha patchRonnie Sahlberg2010-12-061-0/+1
| | | | | | | | (This used to be ctdb commit 171b8855bb2feae7f7dd6a079571f3113dedd6f4)
| * make changes to ctdb event scripts to support NFS-Ganesha.Chandra Seetharaman2010-12-062-0/+160
| | | | | | | | | | | | | | | | | | make changes to ctdb event scripts to support NFS-Ganesha. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 7298588ed54492f106954c893dd86b0a36783470)
| * during ip allocation, there are failure modes where a node might hold a ip ↵Ronnie Sahlberg2010-12-032-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | address but thinks it is still unassigned (-1). add code to the recovery daemon to detect this case and trigger a reallocation so that the ip gets covered and change the takeip code to allow for this condition, taking on an ip address that is already hosted. cq s1021073 (This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7)
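The check described here, finding an address that a node holds while the allocation still lists it as unassigned (-1), can be sketched as below. This is a simplified illustration and not the recovery daemon code: the function name and the data shapes (a dict from ip to node number and a set of locally held ips) are assumptions.

```python
from typing import Dict, List, Set

UNASSIGNED = -1  # node number used in the message for "no node hosts this ip"


def ips_held_but_unassigned(assigned_node: Dict[str, int],
                            held_locally: Set[str]) -> List[str]:
    """Return ips present on this node that the allocation marks unassigned.

    Hypothetical helper: a non-empty result is the condition that would
    trigger a reallocation in this sketch.
    """
    return sorted(
        ip for ip in held_locally
        if assigned_node.get(ip, UNASSIGNED) == UNASSIGNED
    )
```

In this sketch an ip missing from the map is treated the same as one explicitly set to -1.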
| * dont try starting samba through the "init" eventRonnie Sahlberg2010-12-031-0/+2
| | | | | | | | (This used to be ctdb commit e314a449606418a4c4eac6eb319bfcdf1c398cd3)
| * When we are no longer the natgw master, dont put the natgw ip on loopback.Ronnie Sahlberg2010-11-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We put the ip on loopback just to make sure we would still interoperate with non-standard configurations on unix-KDC, that are configured to verify the optional HostAddresses field. This is not required for AD, since AD does not use this field, and is replaced in unix land with other/better mechanisms than this "dodgy" check. This makes it "easier" for applications that have bound to the natgw address to detect a socket problem and try to reconnect/recover if the ip address is completely missing from the system. At the same time, use the winbind specific hook that exists to explicitly tell winbindd: this address is gone, so if you have bound to it, this is a good time to close and rebind your socket. cq 1020333 (This used to be ctdb commit 0da94869d2912b2a412ba3fbd2137d88ce4e4389)
| * update autostart/stop to work for sambaRonnie Sahlberg2010-11-222-6/+8
| | | | | | | | (This used to be ctdb commit 37ab57e2adaecc3f7996ea20af45a5df0cd8be76)
| * add an explicit _is_managed_service to iscsi eventscriptRonnie Sahlberg2010-11-181-0/+2
| | | | | | | | (This used to be ctdb commit 44f683a1ba15944d3306a0effd572de3280ff975)
| * Dont pollute the logs with a "file not found" messageRonnie Sahlberg2010-11-181-1/+1
| | | | | | | | | | | | CQ S1020745 (This used to be ctdb commit ea8bb7b26bb879a895c267d49672433182390d0d)
| * 60.nfs eventscript should do nothing if NFS isn't managed by CTDB.Martin Schwenke2010-11-181-0/+2
| | | | | | | | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 582e5cd077501e8d4131a9c7981781471308edfd)
| * Eventscript functions - catch failures in ctdb_service_start().Martin Schwenke2010-11-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | ctdb_service_start() currently succeeds if ctdb_counter_init() succeeds. This changes it to fail when a service start fails. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ddb73962d72d933bf0edc28be0dbb45bea7e5ef4)
| * 50.samba eventscript should stop/start services when they become (un)managed.Martin Schwenke2010-11-182-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes (or corresponding changes are made to $CTDB_MANAGED_VERSIONS), the associated service should be started or stopped as necessary. This adds calls to ctdb_start_stop_service() to manage starting/stopping samba and winbind. An associated cleanup is made to the initial checks that one of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them with calls to is_ctdb_managed_service(). To handle the winbind cases ctdb_start_stop_service() and is_ctdb_managed_service() are updated to take an optional service name parameter. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d98f175e8420d921a123ae9c0ce00945350b1537)