| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
This replaces previous script-local variable ctdb_test_scripts_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 107b465172205cb304549fcffaf36b9416696c15)
|
We haven't seen problems related to time jumps for a long time. Turn
this off by default.
To switch it back on set $CTDB_TEST_TIME_LOGGING to any non-null
value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2aa9bbf3a52dde0707eb06acd91e57c8da5c717f)
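As a rough illustration of the switch described above, the following sketch only emits a timestamp when $CTDB_TEST_TIME_LOGGING is non-null. The helper name maybe_log_time is hypothetical and the real logging code is not shown in the commit message.

```sh
# Hypothetical helper: print a timestamp only when time logging is enabled.
# $CTDB_TEST_TIME_LOGGING comes from the commit message; the function name
# and the output format are assumptions made for this sketch.
maybe_log_time ()
{
    if [ -n "${CTDB_TEST_TIME_LOGGING:-}" ] ; then
        date '+%s'
    fi
}
```

With the variable unset or empty the function prints nothing, so logging is off by default.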
|
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d5b2ad651495f32091bd33d30871638de0de633a)
|
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit b86b947797c51e3576c6b34f547434c3f0aa36f3)
|
"ctdb listnodes" changed so that it never tries to contact the daemon
but reads the local nodes file instead. This fails if the nodes file
is in a non-default place but $CTDB_NODES isn't set.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a7ad2fb75f06791508dd928d2a0c305fc7f7b814)
|
There looks to be a minor race where IPs haven't yet been reallocated
but the cluster is healthy. This should fix it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2d6a800a789ca59fdab92422f98a4e05ba55f34c)
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0e14213dfa841080c07fa6fce23b192493adb926)
|
Default to "any"... but allow specification because sometimes it
matters...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c12c97598afcd07ce4876b26e0b734bc825e54c1)
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 91e74cb01a11012e41ef9633c98f13ddbb2e5908)
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d3dc9410501767c07d9b0106bb73c979d869c127)
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 21cdc7ed6942238faeb42983c862d4abc3f54ffb)
|
We're seeing the cluster become healthy after a restart and then
revert to being unhealthy. It looks like there's a race and the
cluster shouldn't have been healthy, given that we seem to see that
the monitor cycle hasn't yet been run.
This collects some state debug info from all nodes after the cluster
becomes healthy. This is printed if the cluster is then unexpectedly
unhealthy a short time later.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c2efb5897e4258df649149f9904d7ac47322e1b4)
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit fed3c2b80b8add8d1cf33abdd5dd8d8001af44d4)
|
This depends on the format of onnode output and also depends on
simple/00_ctdb_onnode.sh having been run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 93b53b186df55942bf4d9e90cae329f47889af72)
|
If filenames should be printed in descriptions in the summary then the
descriptions should include the filename. A better option is to
include something more human-readable that makes the test just as
easily identifiable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0efdbd61bdc2343e5459959b300bccc9986b1d78)
|
Putting PASSED/FAILED on the left makes it easier to scan the results
and simplifies the code. Also put stars around the word "*FAILED*"
to make it more obvious.
Also add a -q option to throw away test output and only display the
summary (if -s is also specified).
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c44b632b010b7d57007f3c8f294271c7e0217e0d)
|
This causes summary lines (when used with -s) to be pretty printed and
include the test description. This is the 4th line of the test output
- that is, immediately after the header.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0e5cc2a58b0d38e10a2ef9e81dc887c20f3fbdcb)
|
It is too hard to do anything else...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 08b636b500855e38e708e6963d8e63ded97c25ec)
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4cdf3b9adc7edfd80a2901ef8457ae67aab0829a)
|
This should help with log cross-checking.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c0a916c40c623c0aa8245526283a064dbeea4b57)
|
If there's a chance that "ctdb status -Y" can return 0 but print
garbage then this function might return a false positive.
So, we do 2 things:
* Redirect stderr to /dev/null rather than looking at it. This
minimises the chance that we will see garbage.
* Since we need at least 1 good line to decide the cluster is healthy,
we sanity check each line to ensure it starts with :[0-9].
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d4189c7c3fceaa833f9f0446a2b06af6fed714ec)
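The two checks above could look roughly like the sketch below. It reads "ctdb status -Y"-style lines on stdin and assumes each node line has the shape ":<node>:<IPv4 address>:<flag>:<flag>:...:" with 0/1 flags. The function name and that line shape are assumptions for illustration, not the code from this commit.

```sh
# Sketch only: decide cluster health from status lines on stdin.
# Assumed line shape: ":<node>:<ipv4>:<flag>:<flag>:...:" with 0/1 flags.
status_lines_are_healthy ()
{
    local line flags count=0
    while read -r line ; do
        # Sanity check: skip anything that does not start with ":<digit>"
        case "$line" in
            :[0-9]*) ;;
            *) continue ;;
        esac
        count=$((count + 1))
        # Drop the leading ":<node>:<ip>:" so only the flag bits remain
        flags="${line#:*:*:}"
        case "$flags" in
            *1*) return 1 ;;    # any set bit means unhealthy
        esac
    done
    # At least 1 good line is needed before declaring the cluster healthy
    [ $count -gt 0 ]
}
```

A caller would feed it something like "$CTDB status -Y 2>/dev/null | status_lines_are_healthy", so that garbage on stderr never reaches the parser.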
|
Also ensure that $CTDB is set by defaulting it to "ctdb".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 8222fef1e61836b9bfd406205f9ffb9396aa7480)
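One common shell idiom for this kind of default is shown below. Whether the commit used exactly this form is not stated, so treat it as an assumption.

```sh
# Assumption: an unset or empty $CTDB should fall back to plain "ctdb".
# The ":" builtin does nothing; the expansion performs the assignment.
: "${CTDB:=ctdb}"
echo "using CTDB=${CTDB}"
```

A value exported by the caller (for example a wrapper script path) is left untouched.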
|
This currently does "onnode any ... wait_until ...". If ctdbd is
being shut down on a node then that node might be chosen anyway, if it
is asked early enough. Then we'll loop on that node but our ctdb
client command may always fail, causing a timeout rather than the
expected behaviour.
This puts the loop on the outside of the "onnode any" so that if the
"wrong" node is chosen initially then on the next iteration the choice
can be remade.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a88ee78686bd5aa2b789f5959e0562315a13525d)
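The restructuring can be sketched as follows. onnode_any is a hypothetical stub standing in for "onnode any <command>"; it pretends that the first node chosen is shutting down. The function names are not taken from the test suite.

```sh
# Stub for "onnode any <command>" (hypothetical, for illustration only).
# The first call fails, as if a node that is shutting down was chosen.
attempts=0
onnode_any ()
{
    attempts=$((attempts + 1))
    [ $attempts -ge 2 ] && "$@"
}

# The retry loop sits outside the node selection, so every iteration
# gets a fresh choice of node.
wait_until_on_any_node ()
{
    local timeout="$1" ; shift
    local t=$timeout
    while [ $t -gt 0 ] ; do
        if onnode_any "$@" ; then
            return 0
        fi
        t=$((t - 1))
        sleep 1
    done
    return 1
}
```

With the loop inside the remote command instead, a bad initial choice would be retried on the same node until the timeout expired.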
|
These tests currently wait for the old IPs to fail back to the test
node. This isn't guaranteed with DeterministicIPs disabled.
This changes those tests to wait until the test node gets at least 1
IP assigned.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e9b3f5b1b51d541a911a27eb4348b368f28d185e)
|
metze
(This used to be ctdb commit c24fbea156dfdc9154e94eace725526e44cbcdac)
|
(This used to be ctdb commit 82e1c5231c389bea935328a08ecf9b0b3a3979ef)
|
metze
(This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f319bd54369a2bc7d32c3bda7fc22f2ef1a51c3a)
|
Commit 25e82a8a667a54c6921ef076c63fdd738dd75d19 changed wait_until()
to protect the command it runs from "set -e" by running it in a
subshell. This breaks uses where the command is expected to set
global variables. For example, wait_until_get_src_socket lost the
value of $out from its call to get_src_socket().
The fix is to not be lazy and to avoid the sub-shell!
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 39642e745254d93d74dde907787503854fe6ca4a)
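The underlying shell behaviour is easy to reproduce. In this small demonstration set_out is a hypothetical stand-in for get_src_socket(); it only assigns the global $out.

```sh
# Hypothetical stand-in for get_src_socket(): sets the global $out.
set_out ()
{
    out="socket-info"
}

out=""
( set_out )             # sub-shell: the assignment is lost on exit
after_subshell="$out"

out=""
set_out || true         # current shell: "|| true" still guards "set -e"
after_direct="$out"

echo "sub-shell: '${after_subshell}', direct: '${after_direct}'"
# prints: sub-shell: '', direct: 'socket-info'
```

Capturing the status with "|| rc=$?" in the current shell gives the same protection from "set -e" without discarding variable assignments.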
|
The timeout for waiting for state changes isn't very predictable. It
is "about" MonitorInterval seconds... but can be longer given the
duration of eventscript runs and other things. So, we change the
timeout to MonitorInterval + EventScriptTimeout, hoping it never takes
that long.
Move the eventscript installation/removal from the old fake-tests into
a function in the functions file. Implement supporting functions to
create/remove/check-for various files that it handles. Also add a
function that uses all of this that waits for the next monitor event
(but only if all other monitor events pass).
The final check in the skip share check tests uses the above and waits
for a monitor event, and then checks that the node is still healthy.
Also enhance the wait_until function to handle a command starting with
'!' (as a separate word) to make it easy to wait for a file not to
exist.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 25e82a8a667a54c6921ef076c63fdd738dd75d19)
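A minimal sketch of a wait_until() that accepts a leading '!' word is given below. It is an illustration of the idea rather than the function from the functions file, and it runs the command in the current shell, which is an assumption of this sketch.

```sh
# Sketch: wait up to <timeout> seconds for a command to succeed, or to
# fail when the command is preceded by a separate "!" word.
wait_until ()
{
    local timeout="$1" ; shift

    local negate=false
    if [ "$1" = "!" ] ; then
        negate=true
        shift
    fi

    local t=$timeout
    while [ $t -gt 0 ] ; do
        local rc=0
        "$@" || rc=$?
        if { ! $negate && [ $rc -eq 0 ] ; } || \
           { $negate && [ $rc -ne 0 ] ; } ; then
            echo "OK"
            return 0
        fi
        t=$((t - 1))
        sleep 1
    done

    echo "*TIMEOUT*"
    return 1
}
```

For example, "wait_until 10 ! test -e /some/flag/file" (the path is hypothetical) waits until the file no longer exists.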
|
This function has been broken since it was updated to work with the
"stopped" state (probably commit
67c5bfb5f02c9d45a32d976021ede4fb2174dfe9). Although ${var#:*:0}
removes the shortest matching prefix of $var, '*' can match substrings
that include ':' if '0' isn't where you expect. So we were making
unexpected matches and incorrectly returning true for some cases.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 11137bc2d492a62a26ec9f9f62ff362e81643f66)
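The pitfall can be shown with a simplified, hypothetical line of the form ":<node>:<flag1>:<flag2>:". The real status line has more fields, and the exact check the function made is not shown in the message, so this is only a sketch of the failure mode and of one position-based alternative.

```sh
# Hypothetical simplified line: ":<node>:<flag1>:<flag2>:"
line=":3:1:0:"

# Intended question: "is <flag1> 0?".  The prefix strip succeeds anyway:
# '*' matches "3:1" (spanning a ':'), so the ":0" matched is <flag2>.
stripped="${line#:*:0}"
echo "stripped='${stripped}'"    # prints: stripped=':'

# Safer: split on ':' and test the field by position.
flag1_is_zero ()
{
    local IFS=:
    set -- $1       # $1 is empty (leading ':'), $2 is node, $3 is flag1
    [ "$3" = "0" ]
}
```

Here flag1_is_zero ":3:1:0:" returns false, whereas a test based on the stripped value differing from the original would have reported a match.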
|
This facilitates tracing of tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1f906bd3476e7cebf217e35b5477d6a7bb615a0c)
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a083a1976d621c76121f1fa2c2f484cfa47267bd)
|
Many tests currently do this sort of thing:
  onnode 0 $CTDB_TEST_WRAPPER wait_until_node_has_status 1 disconnected
In fact, they all use exactly the same "onnode 0 $CTDB_TEST_WRAPPER"
idiom. This is both repetitious and dangerous, since node 0 might be
shut down during a test. Instead, we push "onnode any
$CTDB_TEST_WRAPPER" (which selects a connected node) into
wait_until_node_has_status() and just call that function directly in
tests, like this:
  wait_until_node_has_status 1 disconnected
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a2aaef03d4d6bbd4b42f50f732254935d4d3469c)
|
Make it possible to start on only 1 node - for tests that need to
restart a particular node.
_ctdb_hack_options() attempts to see what options are being passed to
a daemon that is being run via the initscript. It then sets a
corresponding environment variable that the initscript knows about.
Currently only the --start-as-stopped option is supported. This is
extremely ugly but it seems like the only way... :-(
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 407b3117dfc1072117abf681ec98b9e252d8744c)
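The option scan might look something like the sketch below. The function name hack_options and the variable name CTDB_START_AS_STOPPED are assumptions made for illustration; the message does not name the variable that the initscript reads.

```sh
# Sketch: look through the daemon options and translate the ones the
# initscript can honour into environment variables.
# CTDB_START_AS_STOPPED is an assumed name, not confirmed by the message.
hack_options ()
{
    local i
    for i in "$@" ; do
        case "$i" in
            --start-as-stopped)
                export CTDB_START_AS_STOPPED="yes"
                ;;
        esac
    done
}
```

Any option without a matching case is silently ignored, which mirrors the "currently only ... is supported" limitation.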
|
The debug code should run "ctdb status" on a cluster node, not on the
test client.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 448cd8db1305c1e6dfab323f92eac4a576596e4e)
|
The parsing of "ctdb status -Y" output to determine various node
states was implemented very strictly. Therefore, the parsing broke
due to the addition of the new "stopped" state to the output of "ctdb
status -Y". This relaxes the parsing so that it should work for
versions prior to the introduction of the "stopped" state, as well as
future versions that add new states to the end of the list of bits in
output of "ctdb status -Y".
Similarly the check for cluster unhealthy (in _cluster_is_healthy())
now just checks for a single 1 in any bit in the "ctdb status -Y"
output, rather than checking for a particular number of 0s.
New tests
tests/simple/{41_ctdb_stop.sh,42_ctdb_continue.sh,43_stop_recmaster_yield.sh}
do rudimentary testing of the stop and continue functions.
Remove tests tests/simple/41_ctdb_ban.sh and
tests/simple/42_ctdb_unban.sh. They were both unreliable.
tests/simple/21_ctdb_disablemonitor.sh now schedules a restart, since
one will be required.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 67c5bfb5f02c9d45a32d976021ede4fb2174dfe9)
|
(This used to be ctdb commit d187eb8507f35a650ff3ffc50fa49110eebca0bd)
|
The debug code should run "ctdb status" on a cluster node, not on the
test client.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 34e6f8a04b12f8879eb42d417f9741502ccccf0f)
|
* 2 new tests for NFS failover.
* Factor repeated code from tests into new functions
select_test_node_and_ips(), gratarp_sniff_start() and
gratarp_sniff_wait_show(). Use these new functions in existing and
new tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit de0b58e18fcc0f90075fca74077ab62ae8dab5da)
|
cluster_is_healthy() is now run locally in tests and internally causes
_cluster_is_healthy() to be run on node 0. When it detects that the
cluster is unhealthy and $ctdb_test_restart_scheduled is not true,
debug information is printed. This replaces the previous use of
$CTDB_TEST_CLEANING_UP.
To avoid spurious debug on expected restarts, added scheduled
restarts to several tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ee7caae3a55a64fb50cd28fa2fd4663c5dd83b4f)
|
This works around potential race conditions in the init script where
the restart operation is not necessarily reliable. It just wraps the
actual restart in a loop and tries for a successful restart up to 5
times.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1cac8a0ad429f29d1508158c7f7c42a2f1a22945)
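A retry wrapper of this kind can be sketched as follows. do_restart is a hypothetical stand-in for the actual restart; it fails twice and then succeeds so that the loop has something to retry.

```sh
# Hypothetical stand-in for the real restart: fails twice, then succeeds.
tries=0
do_restart ()
{
    tries=$((tries + 1))
    [ $tries -ge 3 ]
}

# Try for a successful restart up to 5 times.
restart_with_retries ()
{
    local i=0
    while [ $i -lt 5 ] ; do
        i=$((i + 1))
        if do_restart ; then
            echo "restart succeeded on attempt $i"
            return 0
        fi
        echo "restart attempt $i failed, retrying"
    done
    return 1
}
```

The wrapper returns non-zero only if all 5 attempts fail, so callers can still treat a persistent failure as an error.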
|
If wait_until() does not time out, print the time taken for the command
to succeed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bdb856ee22816ae1f6b8d15856555f488054f489)
|
* Removed a race from tcpdump_start(). It seems impossible to tell
when tcpdump is actually ready to capture packets. So this function
now generates some dummy ping packets and waits until it sees them
in the output file.
* tcpdump_start() sets $tcpdump_filter. This is the default filter
for tcpdump_wait() and tcpdump_show(), but other filters may be
passed to those functions.
* New functions tcptickle_sniff_start() and
tcptickle_sniff_wait_show() handle capturing TCP tickle packets.
These are used by complex/31_nfs_tickle.sh and
complex/32_cifs_tickle.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 8e2a89935a969340bfead8ed040d74703947cb81)
|
There are still very rare cases where IPs haven't been reallocated
before the beginning of the next test, so this adds a sleep and an
extra call to "ctdb recover" to restart_ctdb().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c2bdb77d91761c003e2f0e6918a27c54150f6030)
|
Sometimes "stty size" reports 0, for example when running in a shell
under Emacs. In this case, we just change it to 80.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e309cb3f95efcf6cff7d7c19713d7b161a138383)
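The fallback amounts to something like the following sketch. The helper name sane_width is hypothetical; it takes the column count as an argument so that it can be exercised without a terminal.

```sh
# Sketch: return a usable terminal width, given the column count that
# "stty size" reported.  0 or a missing value falls back to 80.
sane_width ()
{
    local cols="${1:-0}"
    if [ "$cols" -eq 0 ] ; then
        cols=80
    fi
    echo "$cols"
}

# Possible usage (stty prints "<rows> <columns>"):
#   width=$(sane_width "$(stty size 2>/dev/null | cut -d' ' -f2)")
```

Passing the value in also covers the case where stty fails outright and prints nothing.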
|
* ctdb_restart_when_done() now schedules a restart by setting an
explicit variable that is respected in ctdb_test_exit(), rather than
adding a restart to $ctdb_test_exit_hook. This means that restarts
are all done in one place.
* ctdb_test_exit() turns off "set -e" to make sure that all cleanup
happens.
* ctdb_test_exit() now prints a clear message indicating where the
test ends and the cleanup begins. This message also includes the
return code of the test.
* Add debug in cluster_is_healthy to try to capture information about
unexpected unhealthiness when a test starts.
* Simplify simple/07_ctdb_process_exists.sh so that the exit code is
generated more obviously.
* Remove redundant calls to ctdb_test_exit at the end of tests, since
they're done automatically via a trap. Also remove any preceding
warnings of restarts or final hints about test success/failure.
* Allow multi-digit debug levels in simple/12_ctdb_getdebug.sh and
simple/13_ctdb_setdebug.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b6fa044a1364cbb3008085041453ee4885f7ced1)
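The first three points can be sketched together as below. $ctdb_test_restart_scheduled and ctdb_restart_when_done are names used in this log; test_cleanup, the restart_ctdb stub body and the exact wording of the message are assumptions made for illustration.

```sh
ctdb_test_restart_scheduled=false

# Schedule a restart by setting a flag instead of restarting immediately.
ctdb_restart_when_done ()
{
    ctdb_test_restart_scheduled=true
}

# Hypothetical stand-in for the real restart function.
restart_ctdb ()
{
    echo "restarting..."
}

# Cleanup body.  A real test would run it from an exit trap, e.g.
#   trap 'test_cleanup $?' 0
test_cleanup ()
{
    local status="$1"
    set +e      # make sure every cleanup step runs
    echo "*** TEST COMPLETE (RC=${status}), CLEANING UP..."
    if $ctdb_test_restart_scheduled ; then
        restart_ctdb
    fi
    return "$status"
}
```

Keeping the restart behind a single flag means it happens in exactly one place, however many times a test asks for it.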
|
Glitches during restarts of the CTDB cluster have been causing some
tests to fail. This is because restarts are initiated in the body of
many tests. This adds a simple function ctdb_restart_when_done, which
schedules a restart using an existing hook in the test exit code.
This function is now used in tests that need to restart CTDB.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d440e83bb4f0c19c085915d0f0e87cc0dabbc569)
|
New tests/complex/ subdirectory contains 2 new tests to ensure that
NFS and CIFS connections are tracked by CTDB and that tickle resets
are sent when a node is disabled.
Changes to ctdb_test_functions.bash to support these tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 31cc46eb157ca1301312f14879e4fb4da7d81088)
|
simple/12_ctdb_getdebug.sh now recognises output with multi-digit node
numbers.
Sharing the ctdb directory via NFS and testing on a real cluster by
setting CTDB_TEST_REAL_CLUSTER didn't work by default. The fix is to
hack scripts/test_wrap so that it tries to find a valid bin directory
next to the directory that contains it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ea2ca769e1d1068fbbad843750b19acfd87360e0)
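The lookup described above could be approximated like this. The function name is hypothetical, and prepending the directory to $PATH is an assumption about what the wrapper does with the directory once found.

```sh
# Sketch: given the path of a script, look for a "bin" directory next to
# the directory that contains the script and, if present, prepend it to
# $PATH (the PATH update is an assumption made for this sketch).
add_sibling_bin_to_path ()
{
    local script_dir parent
    script_dir=$(cd "$(dirname "$1")" && pwd) || return 1
    parent=$(dirname "$script_dir")
    if [ -d "${parent}/bin" ] ; then
        PATH="${parent}/bin:${PATH}"
    fi
}
```

Called as add_sibling_bin_to_path "$0" from scripts/test_wrap, this would pick up a bin directory beside scripts/ when the tree is shared over NFS.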
|