Commit message | Author | Age | Files | Lines
...
| * eventscripts: Add new option $CTDB_MONITOR_NFS_THREAD_COUNT | Martin Schwenke | 2013-06-13 | 2 | -0/+38
| | Consider the following example:
| |
| | 1. There are 256 nfsd threads configured.
| | 2. 200 threads are "stuck" in system calls, perhaps waiting for the underlying filesystem when an attempt is made to restart NFS.
| | 3. 56 threads exit when NFS is stopped.
| | 4. 56 new threads are started when NFS is started.
| | 5. 200 "stuck" threads exit leaving only 56 threads running.
| |
| | Setting this option to "yes" makes the 60.nfs monitor event look for this situation and try to correct it.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2)
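The arithmetic in the example above can be checked with a small model. This is only an illustrative sketch: `threads_after_restart` and `needs_correction` are hypothetical names, and the comparison used by `needs_correction` is an assumption about what "look for this situation" means, not the 60.nfs implementation.

```python
def threads_after_restart(configured, stuck):
    """Model steps 1-5 of the example and return the final thread count."""
    exited_on_stop = configured - stuck   # step 3: only responsive threads exit
    started = exited_on_stop              # step 4: as many new threads start as exited
    running = stuck + started             # briefly back at the configured count
    running -= stuck                      # step 5: the stuck threads finally exit
    return running


def needs_correction(configured, running):
    """Assumed check: fewer threads are running than were configured."""
    return running < configured


running = threads_after_restart(configured=256, stuck=200)
print(running, needs_correction(256, running))
```

With the numbers from the commit message, the model ends with 56 running threads, which the assumed check flags as needing correction.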
| * recoverd: Log node that causes takeover run to fail | Martin Schwenke | 2013-06-13 | 1 | -7/+11
| | Extend takeover_fail_callback() to just log (and not do any ban processing) when the callback data is NULL. Always call ctdb_takeover_run() with the callback so that useful errors are always logged.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit c429394afbabaee09f9216dc743419adddf523ea)
| * doc: Add release notes for 2.2 | Martin Schwenke | 2013-05-30 | 1 | -0/+65
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit ac0892d3a57adb0587a37de0f94fa686bed8970f)
| * build: Fix extra whitespaces | Amitay Isaacs | 2013-05-29 | 1 | -7/+7
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 78cff9d54f241fb6a2943e50346f9c2ad9decc78)
| * tevent: Sync to tevent 0.9.18 from upstream | Amitay Isaacs | 2013-05-29 | 28 | -726/+3782
| | (This used to be ctdb commit 82d61f77c01df0fbb42743593937b175ce22a445)
| * replace: Sync to latest replace from upstream | Amitay Isaacs | 2013-05-29 | 41 | -4745/+1599
| | The latest commits affecting lib/replace remove autoconf build from Samba tree. So using following commit as a sync point.
| |
| |   commit 9ddfd7d8784e6f546628f48990b69ee2850be52d
| |   Author: Andrew Bartlett <abartlet@samba.org>
| |   Date: Wed May 22 17:23:30 2013 +1000
| |
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 506b27c944b4031e8a325816bd12abddd442a0bb)
| * tdb: Sync to tdb 1.2.11 from upstream | Amitay Isaacs | 2013-05-29 | 65 | -47/+6474
| | (This used to be ctdb commit bb3a32ec055432afc7225c9fd7504fb187694bda)
| * talloc: Sync to talloc 2.0.8 from upstream | Amitay Isaacs | 2013-05-29 | 38 | -309/+3034
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 3bffca8c17e441364525df115ee2ac16b5969e24)
| * ctdbd: Log node state transitions at higher debug level | Amitay Isaacs | 2013-05-29 | 1 | -2/+2
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit db31dc48bd3135e9242af08bb79b67a17a2b1668)
| * git: Ignore generated ctdb.spec file | Amitay Isaacs | 2013-05-29 | 1 | -0/+1
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit ca7ba26362eabfbcc329c66919d9c4da79c3b799)
| * git: Ignore ctdb_version.h file | Amitay Isaacs | 2013-05-29 | 1 | -1/+1
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 641f539ffc7dd9542e669a3ec20c004f8bbcbf1e)
| * build: Use REPLACE_OBJ and CTDB_EXTERNAL_OBJ to simplify build rules | Amitay Isaacs | 2013-05-29 | 1 | -9/+13
| | This fixes the build on AIX where libreplace is required to build ctdb_lock_helper, ctdb_fetch_lock_once, ctdb_fetch_readonly_once.
| |
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit fa757b49374e44c2380d4457e9b0eb3582981fac)
| * build: Support for building on AIX xlc compiler | Amitay Isaacs | 2013-05-29 | 1 | -2/+6
| | xlc does not support -fPIC, -Wno-format-zero-length
| |
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 2cf95741fdab2ee5f724950a0b1ef257d6aeade7)
| * tests: Do not use err() to support AIX | Amitay Isaacs | 2013-05-29 | 1 | -4/+6
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 1734562a7b3512853b9e0232880c42d50c1c2e4c)
| * tests: Include system/time.h to support building on AIX | Amitay Isaacs | 2013-05-29 | 3 | -7/+2
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 0320bb4f8ca8171812ec7f41556aed847c74bfb4)
| * libctdb: Do not include sys/time.h to support build on AIX | Amitay Isaacs | 2013-05-29 | 5 | -0/+6
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 2c19fa78ce0b25c3615b23664df32233bdbdea42)
| * util: Do not stop build if backtracing is not supported | Amitay Isaacs | 2013-05-29 | 1 | -2/+1
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit b091f09ea01482823bd850d1d4e2329e0a19c959)
| * eventscripts: Fix statd-callout update handling | Martin Schwenke | 2013-05-28 | 3 | -31/+20
| | 60.nfs and 60.ganesha touch $statd_update_trigger every time they're run. This stops the statd-callout updates from ever being called.
| |
| | Make this logic self-contained and move it to new function nfs_statd_update() in the functions file. Call this in 60.nfs and 60.ganesha with the appropriate update period as the only argument.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>
| |
| | (This used to be ctdb commit 1b5968f6be084590667f4f15ff3bef13ed9a2973)
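To see why touching the trigger on every run suppresses the update, here is a minimal timing model. It is an assumption-laden sketch, not the shell code from the functions file: `count_updates` is a hypothetical helper, and it assumes the update fires only once the trigger is at least `period` seconds old.

```python
def count_updates(run_times, period, touch_every_run):
    """Count how often the statd update fires over a series of monitor runs.

    run_times: timestamps (seconds) at which the monitor event runs.
    period: minimum trigger age before an update is due (assumed rule).
    touch_every_run: True models the old behaviour of always touching the trigger.
    """
    last_touch = 0
    updates = 0
    for now in run_times:
        if now - last_touch >= period:
            updates += 1
            last_touch = now      # the trigger is refreshed when an update runs
        if touch_every_run:
            last_touch = now      # old behaviour: trigger age is reset on every run
    return updates


runs = range(10, 3601, 10)        # a monitor run every 10 s for one hour
print(count_updates(runs, 600, touch_every_run=True))
print(count_updates(runs, 600, touch_every_run=False))
```

In this model the trigger never ages past the period when it is touched on every run, so no update is ever due; once only the update itself refreshes the trigger, updates fire once per period.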
| * tests/integration: Improve debug output for unhealthy cluster after restart | Martin Schwenke | 2013-05-28 | 1 | -4/+7
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 25a6fd784cde96f3d20a79f70b5589b5c4aca675)
| * tests/scripts: Delete unused $rows and $ww variables from run_tests | Martin Schwenke | 2013-05-28 | 1 | -3/+0
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 80b3cf2c652c6098390cdd0dbb3edc648f7df487)
| * packaging: Create separate package for pcp pmda | Martin Schwenke | 2013-05-28 | 1 | -0/+27
| | To build ctdb-pcp-pmda package, run packaging/RPM/makerpms.sh script with "--with pmda" option.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 85e11b9b13b3add88c1b8957be51793cc1db4f2d)
| * build: Separate autoconf macros for pmda | Martin Schwenke | 2013-05-28 | 2 | -19/+33
| | The pmda stuff is no longer built by default even if the headers are available. To build, run "configure --enable-pmda".
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 194f7a0dec26d693a5f3e6734b1c82f61f8e4d19)
| * build: Fix install paths for pcp pmda | Martin Schwenke | 2013-05-28 | 1 | -5/+5
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 11af486754bb04899e3dc544157bf70530e66cd1)
| * packaging: makerpms.sh can take multiple arguments for rpmbuild | Martin Schwenke | 2013-05-27 | 1 | -1/+1
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit f2ef3510407fbad29908195c58e4160d5a81e8a4)
| * eventscripts: Stop NAT gateway's delete_all() from polluting the log | Martin Schwenke | 2013-05-27 | 1 | -1/+1
| | Every time a node that wasn't the NAT gateway master gets reconfigured something like this appears in the log:
| |
| |   ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1
| |
| | Since this usually fails it is better to mute the error than to have it pollute the log.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 0ca7a98ffef50cbd06849cfbf65fb4a3d668b7bd)
| * recoverd: Backward compatibility for nodes without IPREALLOCATED control | Martin Schwenke | 2013-05-27 | 1 | -4/+103
| | Consider the case of upgrading a cluster node by node, where some nodes are still running older versions of CTDB without the IPREALLOCATED control. If a "new" node takes over as recovery master and a failover occurs, then it will attempt to send IPREALLOCATED controls to all nodes. The "old" nodes will fail in a fairly nondescript way (result == -1).
| |
| | To try to handle this situation, fall back to the EVENTSCRIPT control to handle "ipreallocated". Only do this on the failed nodes. However, do not do this on nodes that timed out (they've probably implemented the control and we should call the regular fail_callback to get those nodes banned) or for stopped nodes (since they can't actually run the "ipreallocated" event via the EVENTSCRIPT control).
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit b2654853ce9b7c18c5874b080bc94d3118078a5d)
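The per-node decision described above can be sketched as a small classifier. This is not the recoverd code: `Action` and `choose_action` are hypothetical names, detecting a timeout through a `-ETIME` result is an assumption, and so is the order in which the timeout and stopped checks are applied.

```python
import errno
from enum import Enum

# ETIME may be missing from errno on some platforms; 62 is the Linux value.
ETIME = getattr(errno, "ETIME", 62)


class Action(Enum):
    NOTHING = "control succeeded"
    FAIL_CALLBACK = "call the regular fail_callback"
    SKIP = "stopped node cannot run the event"
    EVENTSCRIPT_FALLBACK = "retry ipreallocated via the EVENTSCRIPT control"


def choose_action(result, stopped):
    """Decide what to do for one node after sending IPREALLOCATED."""
    if result == 0:
        return Action.NOTHING
    if result == -ETIME:
        # The node timed out, so it probably implements the control.
        return Action.FAIL_CALLBACK
    if stopped:
        # Stopped nodes cannot run "ipreallocated" via EVENTSCRIPT.
        return Action.SKIP
    # Nondescript failure (e.g. result == -1): likely an older node.
    return Action.EVENTSCRIPT_FALLBACK


for result, stopped in [(0, False), (-1, False), (-ETIME, False), (-1, True)]:
    print(result, stopped, choose_action(result, stopped).name)
```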
| * scripts: Provide mktemp function for platforms without mktemp command | Martin Schwenke | 2013-05-27 | 2 | -0/+47
| | This is needed for AIX and possibly others.
| |
| | Also provide a cheaper mktemp function, which is needed in the run_tests script.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)
| * tests: Fix integration tests to use real private IPs | Martin Schwenke | 2013-05-27 | 1 | -2/+2
| | 192.0.2.x was a typo.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit c9e36f596c63c9af7f80d7cb8d7a5c6dcca4860a)
| * pmda: handle new ctdb_statistics format | David Disseldorp | 2013-05-26 | 1 | -7/+7
| | The ctdb_statistics structure was recently changed. Update the PMDA to dereference the new structure member names.
| |
| | Signed-off-by: David Disseldorp <ddiss@samba.org>
| | Reviewed-by: Michael Adam <obnox@samba.org>
| |
| | (This used to be ctdb commit e5a5ab53173d9aa4190ddf68c4ae316d4473eb56)
| * tests/takeover: New test with 900 IPs | Martin Schwenke | 2013-05-24 | 1 | -0/+1813
| | (This used to be ctdb commit 75a620c516e384f042b5d675183b3a1b48fd6115)
| * tests/takeover: Takeover tests can use up to 1024 and checks limits | Martin Schwenke | 2013-05-24 | 1 | -1/+13
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit cfd1371d3a1f78a0ed86485d83bd4d311727c3d4)
| * tests/takeover: LCP2 tests for weird, unbalanced corner-cases | Martin Schwenke | 2013-05-24 | 3 | -0/+201
| | 2 tests to show a bad result and a 3rd test for the fix.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit ef35c8889d90220929e48e66eb62da9ea2025ede)
| * tests/takeover: Allow takeover runs with differing IP allocations per node | Martin Schwenke | 2013-05-24 | 2 | -12/+48
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 954ae6f84cb06a8dcbc12456d4752280072be5bf)
| * vacuum: Reduce the priority of non-critical error | Amitay Isaacs | 2013-05-24 | 1 | -1/+1
| | Since the complete database is not locked when the receive_records control is received, it's possible that we may not be able to obtain lock on a chain. We will try again to store this record.
| |
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| | Reviewed-by: Michael Adam <obnox@samba.org>
| |
| | (This used to be ctdb commit 32723c9efdad1c6ca4aa53f308ccd9bef1aadfff)
| * ctdbd: fix comment explaining redirection of CTDB_REQ_CALL | Michael Adam | 2013-05-24 | 1 | -2/+5
| | Signed-off-by: Michael Adam <obnox@samba.org>
| |
| | (This used to be ctdb commit b697625b184227dad1be31a41b7a3fd9bd312e29)
| * ctdbd: remove a nonempty blank line | Michael Adam | 2013-05-24 | 1 | -1/+0
| | Signed-off-by: Michael Adam <obnox@samba.org>
| |
| | (This used to be ctdb commit d9e24782a90d9ce29c0e6584b75d2b186142174d)
| * ctdbd: update comment describing ctdb_call_send_redirect() | Michael Adam | 2013-05-24 | 1 | -12/+1
| | Signed-off-by: Michael Adam <obnox@samba.org>
| |
| | (This used to be ctdb commit 9a21d417c51fb9cad8f2e87e00ca54d379aef860)
| * tests/takeover: New tests to check runstate handling | Martin Schwenke | 2013-05-24 | 3 | -0/+108
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit c57430998a3bdedc8a904eb3a9cdfde1421aff50)
| * recoverd: Nodes can only takeover IPs if they are in runstate RUNNING | Martin Schwenke | 2013-05-24 | 2 | -3/+144
| | Currently the order of the first IP allocation, including the first "ipreallocated" event, and the "startup" event is undefined. Both of these events can (re)start services.
| |
| | This stops IPs being hosted before the "startup" event has completed.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)
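The rule in this commit amounts to a filter on the candidate nodes. The sketch below models only the runstate condition; `eligible_nodes` is a hypothetical helper, the lowercase state strings are illustrative, and any other eligibility checks a real takeover run applies are deliberately left out.

```python
def eligible_nodes(runstates):
    """Return the node numbers that may host IPs, given {node: runstate}.

    Only the runstate condition from the commit message is modelled here.
    """
    return [pnn for pnn, state in sorted(runstates.items()) if state == "running"]


cluster = {0: "running", 1: "startup", 2: "first_recovery", 3: "running"}
print(eligible_nodes(cluster))
```

A node that is still in "startup" or "first_recovery" is excluded, so it cannot be given addresses before its "startup" event has completed.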
| * recoverd: Handle errors carefully when fetching tunables | Martin Schwenke | 2013-05-24 | 1 | -5/+38
| | If a tunable is not implemented on a remote node then this should not be fatal. In this case the takeover run can continue using benign defaults for the tunables.
| |
| | However, timeouts and any unexpected errors should be fatal. These should abort the takeover run because they can lead to unexpected IP movements.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit c0c27762ea728ed86405b29c642ba9e43200f4ae)
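The policy above can be illustrated with a short sketch that separates "unknown tunable" from every other failure. It assumes the `-EINVAL` result code for an unknown tunable mentioned elsewhere in this log; `TakeoverAborted` and `resolve_tunable` are hypothetical names, not functions from recoverd.

```python
import errno


class TakeoverAborted(Exception):
    """Raised when a tunable fetch fails in a way that must stop the takeover run."""


def resolve_tunable(results, default):
    """Map {node: (status, value)} to {node: value}, applying the policy above.

    status 0       -> use the fetched value
    status -EINVAL -> tunable unknown on that node, use the benign default
    anything else  -> timeout or unexpected error, abort the takeover run
    """
    values = {}
    for pnn, (status, value) in results.items():
        if status == 0:
            values[pnn] = value
        elif status == -errno.EINVAL:
            values[pnn] = default
        else:
            raise TakeoverAborted(f"node {pnn}: tunable fetch failed with {status}")
    return values


print(resolve_tunable({0: (0, 1), 1: (-errno.EINVAL, None)}, default=0))
```

Raising an exception for everything except the two known-safe cases keeps the default path narrow, which matches the stated concern about unexpected IP movements.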
| * recoverd: Set explicit default value when getting tunable from nodes | Martin Schwenke | 2013-05-24 | 1 | -4/+11
| | Both of the current defaults are implicitly 0. It is better to make the defaults obvious.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 1190bb0d9c14dc5889c2df56f6c8986db23d81a1)
| * client: async_callback() sets result to -ETIME if a control times out | Martin Schwenke | 2013-05-24 | 1 | -0/+5
| | Otherwise there is no way of treating a timeout differently to a general failure.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 40e34773b8063196457746ffe7a048eb87d96d61)
| * ctdbd: Update the get_tunable code to return -EINVAL for unknown tunable | Martin Schwenke | 2013-05-24 | 3 | -3/+3
| | Otherwise callers can't tell the difference between some other failure (e.g. memory allocation failure) and an unknown tunable.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)
| * recoverd: Whitespace improvements | Martin Schwenke | 2013-05-24 | 1 | -10/+10
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 473cfcb019f0cb4a094bf10397f7414f7923ee57)
| * recoverd: Use talloc_array_length() for simpler code | Martin Schwenke | 2013-05-24 | 1 | -8/+8
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit f6792f478197774d2f3b2258c969b67c83e017ab)
| * ctdbd: When the "setup" event fails log an error and exit, don't abort | Martin Schwenke | 2013-05-24 | 1 | -2/+2
| | The "setup" event can fail when one of the eventscripts fails to run its "setup" event. If this occurs then the eventscript should log an error. The stack trace and core file generated when we abort provides no useful information.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit c50eca6fbf49a6c7bf50905334704f8d2d3237d7)
| * eventscripts: 11.natgw should not call ctdb tool in "init" event | Martin Schwenke | 2013-05-24 | 1 | -4/+13
| | The current code calls "ctdb setnatgwstate ..." on every event. However, calling the ctdb tool in the "init" event is not permitted. Instead, update the capability when it is needed and at regular intervals via the "monitor" event.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 39a43feae7c7de07ddaf2d6cb962f923d47d0c19)
| * ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY | Martin Schwenke | 2013-05-24 | 8 | -5/+19
| | This adds more serialisation to the startup, ensuring that the "startup" event runs after everything to do with the first recovery (including the "recovered" event).
| |
| | Given that it now takes longer to get to the "startup" state, the initscript needs to wait until ctdbd gets to "first_recovery".
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
| * tools/ctdb: "ctdb runstate" now accepts optional expected run state arguments | Martin Schwenke | 2013-05-24 | 4 | -4/+32
| | If one or more run states are specified then "ctdb runstate" succeeds only if ctdbd is in one of those run states.
| |
| | At the moment, if the "setup" event fails then the initscript succeeds but ctdbd exits almost immediately. This behaviour isn't very friendly.
| |
| | The initscript now waits until ctdbd is in "startup" or "running" run state via the use of "ctdb runstate startup running", meaning that ctdbd has successfully passed the "setup" event.
| |
| | The "setup" event code in 00.ctdb now waits until ctdbd is in the "setup" run state before proceeding via the use of "ctdb runstate setup".
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit 4a2effcc455be67ff4a779a59ca81ba584312cd6)
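The success rule and the initscript's wait loop can be sketched as follows. This models the behaviour described in the commit message rather than the ctdb tool or the initscript: `runstate_ok` and `wait_for_runstate` are hypothetical helpers, and the state source is a plain callable standing in for a call to "ctdb runstate".

```python
import time


def runstate_ok(current, expected=()):
    """Succeed if no states were requested, or if current is one of them."""
    return not expected or current in expected


def wait_for_runstate(get_state, expected, attempts, delay=0.0):
    """Poll get_state() until it reports one of the expected states.

    Returns True on success and False if all attempts are used up.
    """
    for _ in range(attempts):
        if runstate_ok(get_state(), expected):
            return True
        time.sleep(delay)
    return False


# Simulated daemon that moves through the run states named in this log.
states = iter(["setup", "first_recovery", "startup", "running"])
print(wait_for_runstate(lambda: next(states), ("startup", "running"), attempts=5))
```

Bounding the loop with `attempts` lets the caller report a failure when the daemon never reaches an expected state, which is the unfriendly case the commit message sets out to fix.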
| * tools/ctdb: New command runstate to print current runstate | Martin Schwenke | 2013-05-24 | 2 | -0/+38
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |
| | (This used to be ctdb commit bf20c3ab090f75f59097b36186347cedb1c445d4)