Commit message | Author | Age | Files | Lines
...
* recoverd: New function unassign_unsuitable_ips() | Martin Schwenke | 2013-01-08 | 1 | -25/+34
  Move the code into a new function so it can be called from a number of places.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 8adb255e62dbe60d1e983047acd7b9c941231d11)
* recoverd: Move failback retry loop into basic_failback() and lcp2_failback() | Martin Schwenke | 2013-01-08 | 2 | -31/+28
  The retry loop is currently in ctdb_takeover_run_core(). Pushing it into each function will make it possible to put each algorithm into a separate top-level function. This will make the code much clearer and more maintainable.
  Also keep associated test code compatible.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit f6ce18d011dd9043b04256690d826deb2640cd89)
* recoverd: Trying to failback more IPs no longer allocates unassigned IPs | Martin Schwenke | 2013-01-08 | 2 | -20/+2
  Neither basic_failback() nor lcp2_failback() unassigns IPs any more, so there's no point looping back that far.
  Also fix a unit test that now fails because looping back to handle unassigned IPs is no longer logged.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit c09aeaecad7d3232b1c07bab826b96818756f5e0)
* recoverd: basic_failback() can call find_takeover_node() directly | Martin Schwenke | 2013-01-08 | 1 | -4/+2
  Instead of unassigning, looping back and depending on basic_allocate_unassigned.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 4dc08e37dec464c8785a2ddae15c7c69d3c81ac3)
* recoverd: Don't do failback at all when deterministic IPs are in use | Martin Schwenke | 2013-01-08 | 1 | -13/+5
  This seems to be the right thing to do instead of calling into the failback code and continually skipping the release of an IP.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 4c87e7cb3fa2cf2e034fa8454364e0a7fe0c8f81)
* recoverd: Move the test for both 'DeterministicIPs' and 'NoIPFailback' set | Martin Schwenke | 2013-01-08 | 1 | -3/+8
  If this is done earlier then some other logic can be improved. Also, this should be a warning since no error condition is set.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit e06476e07197b7327b8bdac9c0b2e7281798ffec)
* recoverd: Fix a memory leak in IP allocation | Martin Schwenke | 2013-01-08 | 1 | -0/+2
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit bcd5f587aff3ba536cb0b5ef00d2d802352bae25)
* tests/takeover: Add some LCP2 tests for the case when no nodes are healthy | Martin Schwenke | 2013-01-08 | 6 | -0/+189
  3 tests should assign IPs to all nodes. 3 tests set NoIPTakeoverOnDisabled=1 and should drop all IPs.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit edda58a45915494027785608126b5da7c98fee85)
* tests/takeover: Initial tests for deterministic IPs | Martin Schwenke | 2013-01-08 | 3 | -0/+90
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 5c820b2398a42af0e94bc524854a1ad144a63f7b)
* tests/takeover: Do output filtering for deterministic IPs algorithm too | Martin Schwenke | 2013-01-08 | 1 | -1/+2
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 98bd58a98d34ecca89c9042417d7527a18a5ecf9)
* tests/takeover: Support testing of NoIPTakeoverOnDisabled | Martin Schwenke | 2013-01-08 | 1 | -0/+5
  Via $CTDB_SET_NoIPTakeoverOnDisabled.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit d357d52dbd533444a4af6151d04ba119a1533068)
* tests/takeover: IP allocation now selected via $CTDB_IP_ALGORITHM | Martin Schwenke | 2013-01-08 | 2 | -16/+18
  Default to LCP2, like ctdbd. Also support "det" for deterministic IPs.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 20631f5f29859920844dd8f410e24917aabd3dfd)
* tests/takeover: Support valgrinding the takeover code | Martin Schwenke | 2013-01-08 | 1 | -1/+1
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 06ad6b8a19f830472b0ed65cb52e7c3ea74ed1dc)
* tests: new simple integration test for delip interface garbage collection | Martin Schwenke | 2013-01-07 | 1 | -0/+62
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 1a5410e8349cdb96fdc51aa5ecd4f5734f6798a5)
* tests: new function ip2ipmask() for integration testing | Martin Schwenke | 2013-01-07 | 1 | -0/+7
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 8164d9b29bf9080ccc76b1305fb6c07f1ed61d55)
* ctdbd: Clean up orphaned interfaces when an IP is deleted | Martin Schwenke | 2013-01-07 | 1 | -4/+72
  Add a new function ctdb_remove_orphaned_ifaces() and call it in ctdb_control_del_public_address().
  ctdb_remove_orphaned_ifaces() uses a naive implementation that does things in a very obvious way. There are many ways to improve the performance - some are mentioned in a comment in the code. However, I doubt that this will be a bottleneck even with a large number of public IPs. Running the eventscript is likely to outweigh the cost of this cleanup.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a)
* tests/complex: Add NFS test when CTDB is killed on one of the nodes | Amitay Isaacs | 2013-01-07 | 1 | -0/+88
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit b849fb4923d6a34141fe19006a974de81508ceda)
* Eventscripts: Change the default reconfigure action to do nothing | Martin Schwenke | 2013-01-07 | 5 | -14/+12
  A default action of restarting the service doesn't obey the principle of least surprise. It caused the NFS service to be implicitly reintroduced.
  This allows no-op functions to be removed from some eventscripts and service restart functions to be added to others.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0)
* Eventscripts: Do not restart NFS on reconfigure | Martin Schwenke | 2013-01-07 | 7 | -14/+0
  It looks like this restart was accidentally reintroduced in commit fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure became unset so the default action of restarting the service would occur. From there cleanups have explicitly reintroduced it and carried it through the code.
  Also update the unit tests affected by this change.
  The restart was originally removed in commit bc481c3f1a44c50648488c4f8a7f15ec395d446f.
  The default reconfigure action of restarting a service is clearly suboptimal and will be addressed in a separate patch.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 2629de72e1f37b5e46772c2ef8d8d0012fc4ed37)
* ctdbd: Initialise the node flags in just one place | Martin Schwenke | 2013-01-07 | 4 | -23/+33
  Currently flags are initialised in 2 places. One of them is in ctdb_tcp_listen_automatic(), which just seems wrong.
  This makes the code easier to follow by just doing it in ctdb_start_daemon().
  This means that the flags are now initialised later than previously. However, it is still done before the transport is started and before clients can connect.
  In future it might make sense to do a similar thing with setting the PNN. However, the current optimisation is reasonably obvious...
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 2bbee8ac23ad5b7adf7122d8c91d5f0d54582507)
* ctdbd: Remove debug option --node-ip, use --listen instead | Martin Schwenke | 2013-01-07 | 3 | -54/+21
  This effectively reverts d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 496387a585b2c5778c808cf02b8e1435abde4c3e)
* tests: Local daemons should use --listen instead of --node-ip | Martin Schwenke | 2013-01-07 | 1 | -1/+1
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 3221fce9ee2f6fdd3bb17a5e1629ad52a32f90d6)
* Initscript: when checking status, print output of "ctdb ping" if it fails | Martin Schwenke | 2013-01-07 | 1 | -1/+4
  At the moment the caller has no idea why it thinks CTDB isn't running and we can't debug failures...
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 776590bf84d221092298346a28d7fc0552a67c9d)
* ctdb:recover: fix a comment typo | Michael Adam | 2013-01-05 | 1 | -1/+1
  Signed-off-by: Michael Adam <obnox@samba.org>
  (This used to be ctdb commit 5067392d2e06795559f25828b65c129608b65c0b)
* events/50.samba: fix testparm background update | Michael Adam | 2013-01-05 | 1 | -1/+1
  Creating the smb.conf cache with "-v" results in a cache file that fails to load with "testparm -s ..." later on due to "copy = " not being processable. (Copying the empty service name fails.)
  Signed-off-by: Michael Adam <obnox@samba.org>
  (This used to be ctdb commit 81788cfabe960497b050c5ee4e4e487ee061012a)
* daemon: Add a tunable to enable automatic database priority setting | Amitay Isaacs | 2013-01-05 | 4 | -5/+24
  Samba versions 3.6.x and older do not set the database priority. This can cause deadlock between Samba and CTDB since the locking order of databases will be different. A hack was added for automatic promotion of priority for specific databases to avoid deadlock.
  This code should not be invoked with Samba version 4.x, which correctly specifies the priority for each database.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Michael Adam <obnox@samba.org>
  (This used to be ctdb commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7)
* daemon: Check if log_latency_ms is set before using it | Amitay Isaacs | 2012-11-30 | 1 | -1/+1
  This fixes a bug where the wrong variable is checked.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit f81e9add466b1d9b2796c09c6ba63b77296ea149)
* Git should ignore generated include/version.h file | Martin Schwenke | 2012-11-27 | 1 | -0/+1
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 905cd1293aa97dc7839a59b4f68eca02981f0891)
* vacuum: Avoid some tallocs in ctdb recovery | Volker Lendecke | 2012-11-26 | 1 | -6/+8
  In a heavily loaded and volatile database a lot of SCHEDULE_FOR_DELETION requests can come in between fast vacuuming runs. This can lead to significant ctdb cpu load due to the cost of doing talloc_free. This reduces the number of objects a bit by coalescing the two objects of delete_record_data into one. It will also avoid having to allocate another talloc header for a SCHEDULE_FOR_DELETION key. Not the full fix for this problem, but it might contribute a bit.
  (This used to be ctdb commit 9a02f61547ddf74629aca21639d8fb61c1df7cbb)
* doc: Update ping_pong documentation to add -c option | Amitay Isaacs | 2012-11-21 | 1 | -0/+17
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit d05faf294e58e22ae3fbc76162258f1ae8178129)
* utils:ping_pong: add a -c switch to check the lock before reading/writing | Michael Adam | 2012-11-20 | 1 | -2/+40
  This is to verify that the fcntl F_GETLK call reports F_UNLCK if called from a process already holding a lock. This is for example used by samba's strict locking code in combination with "posix locking = true".
  Signed-off-by: Michael Adam <obnox@samba.org>
  (This used to be ctdb commit 4f42d17b74ce891691eee1cead498959cc8e4837)
* recovery: data corruption of persistent DBs after recoveries: don't delete empty records | Michael Adam | 2012-11-20 | 1 | -2/+31
  The record-by-record mode of recovery deletes empty records. For persistent databases, this can lead to data corruption by deleting records that should be there:
  - Assume the cluster has been running for a while.
  - A record R in a persistent database has been created and deleted a couple of times, the last operation being deletion, leaving an empty record with a high RSN, say 10.
  - Now a node N is turned off.
  - This leaves the local database copy of D on N with the empty copy of R and RSN 10. On all other nodes, the recovery has deleted the copy of record R.
  - Now the record is created again while node N is turned off. This creates R with RSN = 1 on all nodes except for N.
  - Now node N is turned on again. The following recovery will choose the older empty copy of R due to RSN 10 > RSN 1.
  ==> Hence the record is gone after the recovery.
  On databases like Samba's registry, this can damage the higher-level data structures built from the various tdb-level records.
  This patch fixes that problem by not deleting empty records in recoveries for persistent databases.
  Signed-off-by: Michael Adam <obnox@samba.org>
  (This used to be ctdb commit 6860c79aea416f56cfd7a6af790bbdf495dbc54e)
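The scenario described in this commit message can be replayed with a toy model. This is only a sketch under stated assumptions: `recover()`, `write()` and `scenario()` are hypothetical helpers, not CTDB functions, and the rule that a write bumps the RSN of an existing record by one is a simplification made for illustration.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, bytes]          # (rsn, data); empty data models a deleted record
Database = Dict[str, Record]


def recover(copies: List[Database], keep_empty: bool) -> Database:
    """Toy recovery: keep the highest-RSN copy of each key across all nodes."""
    merged: Database = {}
    for db in copies:
        for key, (rsn, data) in db.items():
            if key not in merged or rsn > merged[key][0]:
                merged[key] = (rsn, data)
    if not keep_empty:
        # Old behaviour described above: empty records are dropped.
        merged = {k: v for k, v in merged.items() if v[1] != b""}
    return merged


def write(db: Database, key: str, data: bytes) -> None:
    """Assumed write rule: RSN is 1 for a new record, else previous RSN + 1."""
    rsn = db.get(key, (0, b""))[0] + 1
    db[key] = (rsn, data)


def scenario(keep_empty: bool) -> Database:
    node_n: Database = {"R": (10, b"")}      # node N holds the empty copy, RSN 10
    others: Database = {"R": (10, b"")}      # so do the remaining nodes
    others = recover([others], keep_empty)   # N is off; the others recover
    write(others, "R", b"value")             # R is created again while N is off
    return recover([others, node_n], keep_empty)  # N rejoins


print(scenario(keep_empty=False))  # {}
print(scenario(keep_empty=True))   # {'R': (11, b'value')}
```

With deletion of empty records the recreated record loses against the stale empty copy; when empty records are kept, the recreated record carries the higher RSN in this model and survives.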
* recoverd: fix a comment typo | Michael Adam | 2012-11-20 | 1 | -1/+1
  Signed-off-by: Michael Adam <obnox@samba.org>
  (This used to be ctdb commit 909269a4a3690e1245117ca1af935401455785e6)
* vacuum: fix a comment typo | Michael Adam | 2012-11-19 | 1 | -1/+1
  Pair-Programmed-With: Volker Lendecke <vl@samba.org>
  Signed-off-by: Michael Adam <obnox@samba.org>
  (This used to be ctdb commit bab744e3c49efef2e05dc09e8ea9bd3e3fa58716)
* Eventscripts: 10.interface should list configured interfaces | Martin Schwenke | 2012-11-19 | 1 | -3/+3
  The current code lists available interfaces. If IPs are configured in some other way than the public addresses file (e.g. ctdb addip) and their interfaces default to being marked down then, since down interfaces are not available, these interfaces can never be marked up.
  The configured interfaces should be listed instead.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit d8f010355b715e49709836e057a5d0f110919897)
* ctdbd: Make the link status of new interfaces more flexible | Martin Schwenke | 2012-11-19 | 1 | -1/+14
  Neither up nor down is a good default value for the link status of a new interface. Up means that IPs can be assigned to interfaces before the true state is known and they can move away quickly if the interface is actually down. Down means that IPs can't be assigned to an interface for a variable amount of time - until a monitor cycle occurs - and this can result in imbalanced IPs.
  This is a neat compromise. Before the startup event completes, IPs can't be assigned to interfaces because all interfaces begin in a down state. As soon as the startup event completes, IPs can be allocated to any interface that has been marked up by the eventscript. Later, during normal operation, newly added IPs can be assigned to new interfaces immediately. The IPs will still move away if an interface is noticed to be down in the next monitor cycle, but that is the exception rather than the rule.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 9275a69a414482f1053ae14528d5972575b9214e)
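The compromise described in this commit message can be sketched roughly as follows. The `Daemon` class, its `startup_done` flag and `add_interface()` are hypothetical names for illustration; the actual condition ctdbd tests may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Daemon:
    """Toy model: remembers whether the startup event has completed."""
    startup_done: bool = False
    link_up: Dict[str, bool] = field(default_factory=dict)

    def add_interface(self, name: str) -> bool:
        """Register a new interface and return its initial link status.

        Assumed rule: down before startup completes, up afterwards.
        An already known interface keeps whatever status it has.
        """
        if name not in self.link_up:
            self.link_up[name] = self.startup_done
        return self.link_up[name]


daemon = Daemon()
print(daemon.add_interface("eth0"))   # False: startup has not completed yet
daemon.startup_done = True
print(daemon.add_interface("eth1"))   # True: added during normal operation
print(daemon.add_interface("eth0"))   # False: existing status is untouched
```

In this model a later monitor cycle would still be responsible for correcting the status of an interface that is really down.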
* locking: Do not use RECLOCK for tracking DB locks and latencies | Amitay Isaacs | 2012-11-14 | 2 | -6/+12
  RECLOCK is for the recovery lock in CTDB. Do not override the meaning for tracking locks on databases. Database lock latency has nothing to do with recovery lock latency.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 54e24a151d2163954e5a2a1c0f41a2b5c19ae44b)
* tools/ctdb: Do not use function return value as pnn | Amitay Isaacs | 2012-11-14 | 1 | -3/+5
  This fixes the wrong code where the same variable 'ret' is used to track both the pnn and the return value of a function call.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 718233c445cd6627ab3962b6565c2655f1f8efd0)
* recoverd: Track the nodes that fail takeover run and set culprit count | Amitay Isaacs | 2012-11-14 | 3 | -11/+43
  If any of the nodes fails the takeover run from the main loop (either due to timeout or failure to complete within the takeover_timeout interval), the recovery master will give up trying the takeover run with the following message:
  "Unable to setup public takeover addresses. Try again later"
  As a side effect, monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from the main loop, monitoring gets disabled via the startrecovery event. Since ctdb_takeover_run() fails, it never runs the recovered event and monitoring does not get re-enabled.
  In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run().
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245)
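The callback arrangement described in this commit message might look roughly like the following sketch. All names here (`takeover_run`, `run_event`, `note_culprit`, `culprits`) are hypothetical stand-ins; the real code is C inside the recovery daemon and its bookkeeping is not shown in the log.

```python
from collections import Counter
from typing import Callable, Iterable


def takeover_run(nodes: Iterable[int],
                 run_event: Callable[[int], bool],
                 fail_callback: Callable[[int], None]) -> bool:
    """Run a (hypothetical) takeover event on every node.

    Each node that fails is reported through fail_callback so the
    caller can keep a per-node culprit count.
    """
    ok = True
    for pnn in nodes:
        if not run_event(pnn):
            fail_callback(pnn)
            ok = False
    return ok


culprits: Counter = Counter()


def note_culprit(pnn: int) -> None:
    """Callback: count one more failure against this node."""
    culprits[pnn] += 1


# In this toy setup node 2 fails the event on every attempt.
for _ in range(3):
    takeover_run([0, 1, 2], lambda pnn: pnn != 2, note_culprit)

print(culprits)  # Counter({2: 3})
```

Counting failures per node lets the caller single out a misbehaving node instead of only reporting that the run as a whole failed.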
* Eventscripts: 10.interface startup event should only process interfaces once | Martin Schwenke | 2012-11-14 | 1 | -7/+4
  Provided that monitor_interfaces() sets the state of each interface, there's no need to mark all interfaces as up before running monitor_interfaces() in the startup event. monitor_interfaces() will set the true status of each interface anyway.
  The duplication is unnecessary and may cause extra action in the recovery daemon because the state of some interfaces is changed an extra time.
  Instead, add a comment at the top of the loop in monitor_interfaces() to warn against early loop exits.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit f243a916ee71013f7402b9c396c2ead88eb3aab0)
* build: Fix the build with old system-installed tevent | Volker Lendecke | 2012-11-08 | 1 | -1/+4
  We depend on the tracing callback mechanism in ctdb.
  (This used to be ctdb commit 5f58c811127a89f162b6a41ddcd6e944801740a5)
* ctdbd: Fix compilation warning in locking code | Martin Schwenke | 2012-10-31 | 1 | -2/+2
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit cd64035d71ddff6aebe6c15a49e09527283425d2)
* web: Update instructions for building from tarball | Amitay Isaacs | 2012-10-31 | 1 | -1/+9
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit ceac026713a7ee30ea865ed4a9422900ed76fdf6)
* tests: Do not check release suffix in ctdb version test | Amitay Isaacs | 2012-10-31 | 1 | -1/+2
  The release suffix added by RPM is there to track packaging changes. The core CTDB version does not include the release suffix.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit aad1584da8a8425bc6f5163c95810e9d2390dc91)
* packaging: Use maketarball.sh script to create tarball for RPM | Amitay Isaacs | 2012-10-30 | 1 | -18/+5
  This removes the duplicate code for building the tarball and reuses the existing script.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 16a91c2a4d03b46743611e2fe844bb2cef95e46a)
* packaging: Use optional argument as targetdir when creating tarball | Amitay Isaacs | 2012-10-30 | 1 | -14/+14
  In addition, do not modify CTDB version string with extra suffix.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 3d4838db51dd8199b9c29aebb6e7bfbd2a27b8bb)
* tool/ctdb: Always support ctdb version command, don't make it optional | Amitay Isaacs | 2012-10-30 | 2 | -9/+4
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit f8af7d8de76e68e5c4bde15f832a31ce9107e8c7)
* build: Add rules to create include/version.h when building from git tree | Amitay Isaacs | 2012-10-30 | 1 | -1/+7
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 8df7ea6b20417833792932487a082b3c71bb6837)
* packaging: Create include/version.h to define CTDB_VERSION_STRING | Amitay Isaacs | 2012-10-30 | 1 | -0/+14
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit b151f9b62299ec5b887c62cef780547a39c0ba9d)
* Add a \n to an error message | Volker Lendecke | 2012-10-25 | 1 | -1/+1
  (This used to be ctdb commit 9be3b23adbfc844b71bf1d4ddf0fbc3b269f15fa)