Commit message | Author | Age | Files | Lines
...
| * eventscripts: NFS RPC checks no longer support "knfsd" | Martin Schwenke | 2013-05-07 | 1 | -1/+1
| |   No longer used, support removed from test infrastructure.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 0eb351ff4c7ee096de7c5e0a59561067091fa32e)
| * eventscripts: 60.nfs uses nfs_check_rpc_services() to check NFS RPC services | Martin Schwenke | 2013-05-07 | 14 | -118/+40
| |   * New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs
| |   * Installation and packaging additions to handle nfs-rpc-checks.d/
| |   * Unit test updates, including deleting 1 test that sanity checked test infrastructure
| |   * Test infrastructure changes to use nfs-rpc-checks.d/
| |
| |   Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in 60.nfs. To get the equivalent behaviour, edit 20.nfsd.check and remove/comment all lines.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 7e792d6768d9ca420ce3713cb122e63afd594b15)
| * eventscripts: NFS RPC checks allows "nfsd" in addition to "knfsd" | Martin Schwenke | 2013-05-06 | 1 | -1/+1
| |   Want nfs_check_rpc_services() to support filenames without the 'k'.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit d9775fcbd6e30eef8382bea68e2f9bad2309f2c1)
| * eventscripts: New function nfs_check_rpc_services() | Martin Schwenke | 2013-05-06 | 1 | -0/+29
| |   This is intended to replace nfs_check_rpc_service(), which builds configuration into eventscripts. nfs_check_rpc_services() uses a directory of configuration checks that can be edited by an administrator.
| |
| |   The files have one limit check and a set of actions per line. The program name is extracted from the file name.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 9bc8fbee6550ed2814fb35c70d57fab21ef1b8fd)
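Editorial sketch, not code from the commit: the message says the program name is extracted from the check file's name. Assuming file names follow the `NN.<program>.check` pattern seen in `20.nfsd.check`, the extraction might look like the Python below. The function name `program_from_check_file` and the exact pattern are assumptions for illustration; the real logic lives in a shell function.

```python
import os
import re


def program_from_check_file(path):
    """Return the program name embedded in a check file's name.

    Assumes the hypothetical layout NN.<program>.check, for example
    20.nfsd.check; the real eventscript may use a different rule.
    """
    base = os.path.basename(path)
    match = re.fullmatch(r"\d+\.(.+)\.check", base)
    if match is None:
        raise ValueError("unexpected check file name: " + base)
    return match.group(1)
```

Under that assumption, `program_from_check_file("nfs-rpc-checks.d/20.nfsd.check")` yields `nfsd`.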
| * eventscripts: nfs_check_rpc_action() should be _nfs_check_rpc_action() | Martin Schwenke | 2013-05-06 | 1 | -2/+2
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 5a717fd495ba5a2bfd481d69f38b68fa4576716f)
| * eventscripts: Factor out common code from nfs_check_rpc_service() | Martin Schwenke | 2013-05-06 | 1 | -6/+17
| |   This creates new function _nfs_check_rpc_common().
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit cc3bb42e48bbdabd19187c231846b98589b4f4f3)
| * eventscripts: Remove ganesha support from nfs_check_rpc_service() | Martin Schwenke | 2013-05-06 | 1 | -6/+0
| |   This is unused so doesn't need to be maintained. An attempt to use it now will explicitly fail rather than implicitly fail via bitrot.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 887733dd7be53158bfe07b30ef31b611d0f8122f)
| * Revert "Eventscript functions: add optional version to nfs_check_rpc_service()" | Martin Schwenke | 2013-05-06 | 1 | -10/+4
| |   This reverts commit 92f74fd589467b46c758e116e97417edfe8773d7.
| |
| |   This change is unused and is just complicating the function.
| |
| |   Conflicts:
| |       config/functions
| |
| |   (This used to be ctdb commit 77302dbfd85754e02559eccb2dd6c090db0b6b9f)
| * eventscripts: Move rpc.statd existence check into nfs_check_rpc_service () | Martin Schwenke | 2013-05-06 | 2 | -10/+10
| |   The code in 60.nfs is going to be genericised, so make all the checks look the same.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 15b0f78cbf8d6ba481b7eba9e4fe3f4270214c72)
| * eventscripts: Factor NFS RPC check action code into nfs_check_rpc_action() | Martin Schwenke | 2013-05-06 | 1 | -46/+58
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 4b4e7d8f0e8dcbab987e374d06ffaa21c06da0d3)
| * eventscripts: Remove unused function ctdb_check_counter_limit() | Martin Schwenke | 2013-05-06 | 1 | -15/+0
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit a8ef00608e48a551a334aded206146807aeb4c5a)
| * eventscripts: Use ctdb_check_counter() instead of ctdb_check_counter_limit() | Martin Schwenke | 2013-05-06 | 2 | -6/+7
| |   ctdb_check_counter_limit() can soon be removed...
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit bb2cdff77e8ec79e7d319159b9c9848ecfaaa0f1)
| * eventscripts: Might as well try to stat the reclock file first | Martin Schwenke | 2013-05-06 | 1 | -9/+9
| |   It is in the background but it still might cause the counter to be reset before it is checked.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit ef2cf75e95ff382c65524a4d77eb00ab8411d2fc)
| * eventscripts: Make the early exit in 01.reclock earlier | Martin Schwenke | 2013-05-06 | 1 | -6/+3
| |   That way we don't even check the counter...
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 136abd4604dc68f7c696704bac708bae53cf1940)
| * eventscripts: Minor cleanups for killtcp/tickle functions | Martin Schwenke | 2013-05-06 | 1 | -16/+17
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 25ef4f655f1efc833deb5e244f9fff461e92f439)
| * eventscripts: Tweak the timeout check in kill_tcp_connections() | Martin Schwenke | 2013-05-06 | 1 | -7/+13
| |   This has 2 advantages:
| |
| |   1. It uses get_tcp_connections_for_ip() to check for leftover connections, instead of custom code.
| |   2. It checks for the timeout condition before sleeping. The current code sleeps and then checks, so wastes a second.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 60a08eb96e1d97aab31e9bd4af01683c650541c2)
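Editorial sketch, not code from the commit: the second advantage, testing the timeout before sleeping, can be illustrated with a small polling loop. The Python below is only an illustration; `wait_until_gone`, `count_remaining` and `max_checks` are hypothetical names, and the real kill_tcp_connections() is a shell function.

```python
import time


def wait_until_gone(count_remaining, max_checks, sleep=time.sleep):
    """Poll until count_remaining() returns 0 or max_checks checks fail.

    Returns (success, number_of_sleeps). The limit is tested before
    sleeping, so no sleep happens after the final failed check.
    """
    sleeps = 0
    checks = 0
    while True:
        if count_remaining() == 0:
            return True, sleeps
        checks += 1
        if checks >= max_checks:
            # Give up here, before wasting another second asleep.
            return False, sleeps
        sleep(1)
        sleeps += 1
```

With this ordering, three failed checks cost two sleeps rather than three; a sleep-then-check loop would pay for one extra sleep before noticing the timeout.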
| * eventscripts: In killtcp/tickle functions, $_failed should be boolean | Martin Schwenke | 2013-05-06 | 1 | -13/+12
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 319c1b68d5aa78f82a68febcad233a7c78afc887)
| * eventscripts: Remove unused $_killcount from tickle_tcp_connections() | Martin Schwenke | 2013-05-06 | 1 | -2/+0
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 8514ca56830b30e7f0eb5018632640daaf8ff65d)
| * eventscripts: Refactor connection listing in killtcp and tickle functions | Martin Schwenke | 2013-05-06 | 1 | -51/+58
| |   Uses new function get_tcp_connections_for_ip(). This avoids using a temporary file and running netstat twice.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit a621622903c7ef17764b15293d6ea8df5a53c7e1)
| * eventscripts: Reimplement kill_tcp_connections_local_only() | Martin Schwenke | 2013-05-06 | 1 | -35/+10
| |   ... using kill_tcp_connections()
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 10e4db8f796d1e3259733180494db3b4bbad291a)
| * eventscripts: Change handling of one-way kills in kill_tcp_connections() | Martin Schwenke | 2013-05-06 | 1 | -5/+6
| |   This change is a no-op. However, in a subsequent commit we'll merge kill_tcp_connections_local_only() with this function.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 23c0f5f48e3e5a0c1a3254c582299f7893cf0d33)
| * eventscripts: Remove unnecessary variables from killtcp/tickle functions | Martin Schwenke | 2013-05-06 | 1 | -22/+11
| |   Setting these variables spawns lots of unnecessary processes, which would surely slow down these functions on a busy system.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 3eae161472e6352f7f656851c73dc056f95113eb)
| * eventscripts: Clean up ctdb_check_command() | Martin Schwenke | 2013-05-06 | 3 | -11/+10
| |   * Command is now multiple arguments, preserving quoting
| |   * $service_name no longer printed, no longer an argument
| |   * Debug output from failed command
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 9e25fb261447a196de05937052779b36e75e7215)
| * eventscripts; Cleanup up ctdb_check_directories() | Martin Schwenke | 2013-05-06 | 1 | -6/+7
| |   The documentation comments are wrong, so fix them. Also remove the optional $service_name argument.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit d9e6cb945c5edac9ca6405c9228bf647fab814f5)
| * eventscripts: Assert that $service_name is set in a few key places | Martin Schwenke | 2013-05-06 | 1 | -0/+14
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 3d0a7d83ddc824961d876fc9afba829c90aef3e7)
| * eventscripts: counters default to $script_name if $service_name not set | Martin Schwenke | 2013-05-06 | 1 | -1/+1
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit fff88940f71058e4eefd65f50a6701389c005c17)
| * eventscripts: Simplify handling of $service name in "managed" functions | Martin Schwenke | 2013-05-06 | 7 | -21/+31
| |   Complicated argument handling was introduced to deal with multiple services per eventscript. This was a failure and we split 50.samba.
| |
| |   This simplifies several functions to use global $service_name unconditionally instead of having an optional argument.
| |
| |   $service_name is no longer automatically set in the functions file. This means it needs to be explicitly set in 13.per_ip_routing because this script uses ctdb_service_check_reconfigure().
| |
| |   Eventscript unit test infrastructure needs to set $service_name during fake service setup, and policy routing tests need to be updated accordingly.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 27aab8783898a50da8c4bc887b512d8f0c0d842c)
| * eventscripts: Simplify handling of $service name in start/stop functions | Martin Schwenke | 2013-05-06 | 1 | -9/+7
| |   Complicated argument handling was introduced to deal with multiple services per eventscript. This was a failure and we split 50.samba.
| |
| |   This simplifies several functions to use global $service_name unconditionally instead of having an optional argument.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit b5802c4735e1c719a5cf9ce69489d5947bd5e8c5)
| * eventscripts: Simplify handling of $service name in service_management | Martin Schwenke | 2013-05-06 | 1 | -20/+16
| |   Complicated argument handling was introduced to deal with multiple services per eventscript. This was a failure and we split 50.samba.
| |
| |   This simplifies several functions to use global $service_name unconditionally instead of having an optional argument.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit e24baac0d2952e86d5ff31235901f06e2f2b2449)
| * eventscripts: Simplify handling of $service name in reconfigure functions | Martin Schwenke | 2013-05-06 | 1 | -18/+15
| |   Complicated argument handling was introduced to deal with multiple services per eventscript. This was a failure and we split 50.samba.
| |
| |   This simplifies several functions to use global $service_name unconditionally instead of having an optional argument.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit c2ea72ff565222f9edab408638bd45dbba6e8ff7)
| * eventscripts: Remove unused function ctdb_check_counter_equal() | Martin Schwenke | 2013-05-06 | 1 | -12/+0
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit fd536a26b310b5bf9628da62cca0b425f4a54030)
| * scripts: Fix script_log() regression | Martin Schwenke | 2013-05-06 | 1 | -1/+1
| |   5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always pass a message to logger, so script_log() can no longer log stdin.
| |
| |   Put all the tag fu in the actual tag so the message argument is empty if no message was passed.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 9dee4c84273633b9ad82e94dabbf0e6f86edbcef)
| * initscript: Look for tdbtool/tdbdump using which, not in fixed locations | Martin Schwenke | 2013-05-06 | 1 | -4/+4
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit c74cc0442eb90d859eae270b59456d28605817c4)
| * ctdbd: Log CTDB startup before creating the PID file | Martin Schwenke | 2013-05-06 | 1 | -1/+1
| |   Otherwise the messages are in a stupid order... :-)
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   Reported-by: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit cd87ba85fc6c375758c7d3dfa8dbd4d8a02074b0)
| * ctdbd: Remove the "stopped" event | Martin Schwenke | 2013-05-06 | 6 | -61/+9
| |   It isn't used, superseded by "ipreallocated".
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
| * eventscripts: Remove use of "stopped" event | Martin Schwenke | 2013-05-06 | 2 | -2/+2
| |   Use "ipreallocated" instead.
| |
| |   The "stopped" event pre-dates the "ipreallocated" event. The only way of stopping a node is via the ctdb tool, which explicitly causes a takeover run to occur after the node is stopped. The takeover run will generate an "ipreallocated" event.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 978d4a0d6d8c9877b23f72e3a7b78c1245d16908)
| * recoverd: ctdb_takeover_run() uses CTDB_CONTROL_IPREALLOCATED | Martin Schwenke | 2013-05-06 | 1 | -4/+2
| |   This means "ipreallocated" is now run on stopped nodes.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 83b61f7414b1f7a3424497ac987ca0724fba9eaa)
| * ctdbd: New control CTDB_CONTROL_IPREALLOCATED | Martin Schwenke | 2013-05-06 | 5 | -0/+65
| |   This is an alternative to using ctdb_run_eventscripts() that can be used when in recovery.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)
| * ctdbd: Avoid freeing non-monitor event callback when monitoring is disabled | Martin Schwenke | 2013-05-06 | 1 | -31/+30
| |   When running a non-monitor event, a check is made for any active monitor events. If there is an active monitor event, then the active monitor event is cancelled. This is done by freeing state->callback, which is allocated from monitor_context.
| |
| |   When CTDB is stopped or shut down, monitoring is disabled by freeing monitor_context, which frees callback, and then the stopped or shutdown event is run. This creates a new callback structure which is allocated at the exact same memory location as the monitor callback which was freed. So in the check for active monitor events, it frees the new callback for the non-monitor event. Since the callback function flags successful completion of that event, it is never marked complete and CTDB is stuck in a loop waiting for completion.
| |
| |   Move the monitor cancellation to the top of the function so that this can't happen.
| |
| |   The following log snippet highlights the problem.
| |
| |   2013/04/30 16:54:10.673807 [21505]: Received SHUTDOWN command. Stopping CTDB daemon.
| |   2013/04/30 16:54:10.673814 [21505]: Shutting down recovery daemon
| |   2013/04/30 16:54:10.673852 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0
| |   2013/04/30 16:54:10.673858 [21505]: Monitoring has been stopped
| |   2013/04/30 16:54:10.673899 [21505]: server/eventscript.c:594 Sending SIGTERM to child pid:23847
| |   2013/04/30 16:54:10.673913 [21505]: server/eventscript.c:629 searching for callback 0x1c6d5c0
| |   2013/04/30 16:54:10.673932 [21505]: server/eventscript.c:641 running callback
| |   2013/04/30 16:54:10.673939 [21505]: server/eventscript.c:866 in event_script_callback
| |   2013/04/30 16:54:10.673946 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit 05f785b51cfd8b22b3ae35bf034127fbc07005be)
| * recoverd: Interface reference count changes should not cause takeover runs | Martin Schwenke | 2013-05-02 | 1 | -23/+47
| |   At the moment a naive compare of all the interface data is done. So, if any IPs move, the reference counts for the relevant interfaces change, the interfaces appear to have changed, and another takeover run is initiated by each node that took/released IPs.
| |
| |   This change stops the spurious takeover runs by changing the interface comparison to ignore the reference counts.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 0b7257642f62ebd83c05b6e2922f0dc2737f175c)
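Editorial sketch, not code from the commit: the idea of comparing interface data while ignoring reference counts can be illustrated as follows. The dictionary keys `name`, `link_up` and `references` are hypothetical stand-ins; the real comparison is done in C inside the recovery daemon.

```python
def interfaces_changed(old, new):
    """Return True if two interface lists differ in anything but refcounts."""

    def key_fields(ifaces):
        # Keep only name and link state; the hypothetical "references"
        # count is deliberately left out of the comparison.
        return [(i["name"], i["link_up"]) for i in ifaces]

    return key_fields(old) != key_fields(new)
```

In this sketch a list whose only difference is a changed `references` value compares as unchanged, so it would not trigger another takeover run, while a change in link state still would.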
| * recover: use CTDB_REC_RO_FLAGS where appropriate | Michael Adam | 2013-04-24 | 1 | -13/+5
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit b5a8791268e938d7e017056e0e2bd2cbec1fa690)
| * ctdb_daemon: use CTDB_REC_RO_FLAGS where appropriate | Michael Adam | 2013-04-24 | 1 | -1/+1
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit c7eab97c7a939710b73aae2d75b404b235a998f5)
| * ctdb_call: use CTDB_REC_RO_FLAGS where appropriate | Michael Adam | 2013-04-24 | 1 | -1/+1
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit f99eb2f56d8ca27110a45ae0e1c4bff40ac7a60e)
| * vacuum: use CTDB_REC_RO_FLAGS in the vacuuming code | Michael Adam | 2013-04-24 | 1 | -10/+2
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit a62775334aa20d1d850d2df705eb70303b04ac5c)
| * ltdb_server: use CTDB_REC_RO_FLAGS where appropriate | Michael Adam | 2013-04-24 | 1 | -2/+2
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit 61f17e53576197def46bc61fdf0cdb5282333a3e)
| * include: define CTDB_REC_RO_FLAGS - all read-only related record flags | Michael Adam | 2013-04-24 | 1 | -0/+4
| |   This is used for some checks
| |
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit c7924ce6404bb18641b00d5fbd2fe9da9aaf7959)
| * vacuum: Update (C) | Michael Adam | 2013-04-24 | 1 | -1/+1
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit 61264debba58355b9716ac1637fdedef5ed249c8)
| * vacuum: extend the header comment for ctdb_process_delete_list() | Michael Adam | 2013-04-24 | 1 | -2/+20
| |   Describe the (new) process more precisely. And mention that it is the last step of the vacuuming process that is performed on the lmaster.
| |
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit 06de786c786f1cab4c6721adf47c2cb1e8a72adb)
| * vacuum: turn the vacuuming on lmaster into a three-phase process. | Michael Adam | 2013-04-24 | 1 | -25/+278
| |   More precisely, before locally deleting an empty record that has been migrated with data and that we are dmaster and lmaster for, we now perform the deletion on the other nodes in two steps instead of a single step.
| |
| |   - First send out the list of records to be deleted to all other nodes with the new RECEIVE_RECORDS control to store the lmaster's current empty copy.
| |   - Then send those records that could be stored on all nodes to all nodes again with the TRY_DELETE_RECORDS control as before for deletion.
| |   - Finally delete those records locally that were successfully deleted remotely in the previous step.
| |
| |   This fixes an old race where a recovery that hits the vacuum process square between the eyes can create gaps in the record's history and hence let the records resurrect.
| |
| |   In the case of the locking.tdb, that could mean that a file that was already closed was recorded as being open and locked again, so samba clients were locked out of that file until samba was restarted.
| |
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit eee23d44b6427be8ab49bbfcee3abb62f37dfcc7)
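Editorial sketch, not code from the commit: the three phases described in this message can be simulated with toy in-memory nodes. Everything below (the `Node` class, its `refuse_store` and `refuse_delete` knobs, and the `vacuum` function) is hypothetical and only mirrors the order of the steps; the real implementation is C code using the RECEIVE_RECORDS and TRY_DELETE_RECORDS controls.

```python
class Node:
    """Toy stand-in for a remote node; not the ctdb data structures."""

    def __init__(self, refuse_store=(), refuse_delete=()):
        self.db = {}
        self.refuse_store = set(refuse_store)
        self.refuse_delete = set(refuse_delete)

    def receive_records(self, keys):
        """Phase 1: store the lmaster's empty copy; return the keys stored."""
        stored = {k for k in keys if k not in self.refuse_store}
        for k in stored:
            self.db[k] = b""
        return stored

    def try_delete_records(self, keys):
        """Phase 2: delete the records; return the keys actually deleted."""
        deleted = {k for k in keys if k not in self.refuse_delete}
        for k in deleted:
            self.db.pop(k, None)
        return deleted


def vacuum(local_db, candidates, remotes):
    """Run the three phases and return the keys deleted everywhere."""
    to_delete = set(candidates)
    # Phase 1: every remote node is asked to store the empty copy.
    for stored in [n.receive_records(set(candidates)) for n in remotes]:
        to_delete &= stored
    # Phase 2: only records stored on all nodes are sent for deletion.
    phase2 = set(to_delete)
    for deleted in [n.try_delete_records(phase2) for n in remotes]:
        to_delete &= deleted
    # Phase 3: delete locally only what was deleted on all remote nodes.
    for key in to_delete:
        local_db.pop(key, None)
    return to_delete
```

In this toy model a record that any node fails to store or delete simply stays in the local database and is left for a later vacuuming run.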
| * vacuum: introduce the RECEIVE_RECORDS control | Michael Adam | 2013-04-24 | 4 | -0/+209
| |   This is in preparation for turning the vacuuming on the lmaster into a two-phase process:
| |
| |   - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy.
| |   - Only those records that could be stored on all other nodes are processed further. They are sent to all other nodes with the TRY_DELETE_RECORDS control as before for deletion.
| |
| |   Signed-off-by: Michael Adam <obnox@samba.org>
| |   Reviewed-By: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999)