| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Routines in system_common and system_<os> are supposed to be ctdb
functions with OS specific implementations.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recent changes have caused these commands to attempt to get
capabilities from all nodes before doing further filtering. This
means that capabilities are unnecessarily fetched from nodes that are
unlikely to be the master. If such a node does not answer the control
then many nodes can fail to calculate the master node. In the case of
natgwlist this will cause "monitor" events to fail resulting in
unhealthy nodes.
Restore the behaviour where capabilities are only fetched for a node
that will be the master if it has the desired flags.
Although this masks a problem where a connected node is not replying,
it can help to avoid an outage in some cases.
Add supporting tests and infrastructure. Infrastructure just lets a
timeout be faked - just for ctdb_ctrl_getcapabilities_stub() so far.
First test checks that this infrastructure works if the first node
times out in natgwlist. Second test checks the case worked around by
the above fix - that is, no failure when a node with PNN beyond the
NATGW master can time out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu May 29 05:59:37 CEST 2014 on sn-devel-104
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit ba69742ccd822562ca2135d2466e09bf1216644b missed the point of
filtering disconnected nodes while limiting the nodemap to those in
the NAT gateway group. It was really to avoid trying to fetch
capabilities from disconnected nodes. This should be explicitly done
in filter_nodemap_by_capabilities(), otherwise "ctdb natgwlist" simply
fails when there is a disconnected node.
Note that the alternate solution where filter_nodemap_by_flags() is
called before filter_nodemap_by_capabilities() would not be not
correct. Filtering on flags first can produce a "healthier" set of
nodes where none of them have the NAT gateway capability.
Also extend stub for ctdb_ctrl_getcapabilities() to fail when trying
to get capabilities from a disconnected node and add a corresponding
test to confirm that "ctdb natgwlist" is no longer broken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Mar 28 07:56:18 CET 2014 on sn-devel-104
|
|
|
|
|
|
|
|
| |
This will test that ctdb_fetch_lock correctly revokes readonly
delegations.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tests for xpnn need to implement a stub for ctdb_sys_have_ip(). The
cheapest way of doing this is to read a fake nodemap using the
existing code and check if the IP of the "current" node is the one
being asked about. However, the fake state initialisation isn't
currently available to without_daemon commands because it is meant to
represent daemon state. However, it can be made available by moving
the relevant code into a new stub for tevent_context_init(). The stub
still needs to initialise a tevent context - this can be done by
calling a lower level function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
|
|
|
|
|
| |
... and add a test to make sure it works.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Tue Nov 19 19:06:51 CET 2013 on sn-devel-104
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit ed7d999214ee009e480c26410a04fa105028cb8e.
This is not necessary since ctdb_transaction_start() now will return NULL
only when there is a failure and not when another transaction is currently
active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 46615c8e0e63291605d76a6d35f1a93180718c36)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows ctdb_load_nodes_file() to move to ctdb_server.c and
ctdb_set_nlist() to become static.
Setting ctdb->nodes_file needs to be done early, before the nodes file
is loaded. It is now set from CTDB_BASE instead ETCDIR, so setting
CTDB_BASE also needs to be done earlier.
Unhack ctdbd_test.c - it no longer needs to define
ctdb_load_nodes_file().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 20e705e63bd3b20837cc3ac92fdcf2a9650ccfc8)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed7d999214ee009e480c26410a04fa105028cb8e)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit af4b6b8b3222d2a3c425fcc6833db579d0cd7ffa)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 929045335212e825deb645cc6c7f97b8a40fdbb3)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Main changes are:
libctdb_test.c -> ctdb_test_stubs.c
ctdb_tool_libctdb.c -> ctdb_functest.c
ctdb_tool_stubby.c is gone, replaced with existing ctdb_test.c.
Functions starting with "libctdb_test_" now start with
"ctdb_test_stubs_".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 6182bd0c19f215a997efe5272e633b1b1bd0c882)
|
|
|
|
|
|
|
|
| |
Instead, override controls using preprocessor magic.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 10aac42f30cc0d56dca42ece17d04ccbc321056d)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1585a8e275b0143e5e46311b3d5e9785119f735f)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ae0d8f432ef98a72c85a6cd42c503b718bef0e4e)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 873b9cadbcc363a9e5f450b0a1feb1cf2ce1e6c9)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation has a few flaws:
* A takeover run is called unconditionally when the timer goes even if
the recovery master role has moved. This means a node other than
the recovery master can incorrectly do a takeover run.
* The rebalancing target nodes are cleared in the setup for a takeover
run, regardless of whether the takeover run succeeds.
* The timer to force a rebalance isn't cleared if another takeover run
occurs before the deadline. Any forced rebalancing will happen in
the first takeover run and when the timer expires some time later
then an unnecessary takeover run will occur.
* If the recovery master role moves then the rebalancing data will
stay on the original node and affect the next takeover run to occur
if the recovery master role should come back to the original node.
Instead, store an array of rebalance target nodes in the recovery
master context. This is passed as an extra argument to
ctdb_takeover_run() each time it is called and is cleared when a
takeover run succeeds. The timer hangs off the array of rebalance
target nodes, which is cleared if the node isn't the recovery master.
This means that it is possible to lose rebalance data if the recovery
master role moves. However, that's a difficult problem to solve. The
best way of approaching it is probably to try to stop the recovery
master role from jumping around unnecesarily when inactive nodes join
the cluster.
The long term solution is to avoid this nonsense completely. The IP
allocation algorithm needs to cache state between runs so that it
knows which nodes have just become healthy. This also needs recovery
master stability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f6b066a23610fb0092298861c21a9b354b91e2f1)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 05bfdbbd0d4abdfbcf28e3930086723508b35952)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 9ffcd6a91287d86bae7b0c73aa129c81126e08e7)
|
|
|
|
|
|
|
|
|
| |
This fixes the segmentation error if any of the test code fails to
connect to CTDB daemon.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d48eecd748830598f4f080952f2bf05d6f92738c)
|
|
|
|
| |
(This used to be ctdb commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1734562a7b3512853b9e0232880c42d50c1c2e4c)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 0320bb4f8ca8171812ec7f41556aed847c74bfb4)
|
|
|
|
|
|
| |
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cfd1371d3a1f78a0ed86485d83bd4d311727c3d4)
|
|
|
|
|
|
| |
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 954ae6f84cb06a8dcbc12456d4752280072be5bf)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined. Both of
these events can (re)start services.
This stops IPs being hosted before the "startup" event has completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modifying the node flags with IP-allocation-only flags is not
necessary. It causes breakage if the flags are not cleared after use.
ctdb_takeover_run() no longer needs the general node flags - it only
needs the IP flags.
Instead of modifying the node flags in nodemap, construct a custom IP
flags list and have takeover_run_core() use that instead of node
flags. As well as being safer, this makes the IP allocation code more
self contained and a little bit clearer.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446)
|
|
|
|
|
|
|
|
| |
This has been replaced by set_ipflags() and associated functionality.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d0a3822573db296e73cc897835f783c8abc084b3)
|
|
|
|
|
|
|
|
|
| |
This is a no-op and is in a separate commit to make the previous
commit less cumbersome.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 107e656bbe24f9d21fbaf886a3e9417da4effe5a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This really needs to be per-node. The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).
* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.
* Enhance set_ipflags_internal() and set_ipflags() to setup
NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
and/or whether nodes are disabled/inactive.
* Replace can_node_servce_ip() with functions can_node_host_ip() and
can_node_takeover_ip(). These functions are the only ones that need
to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They
can make the decision without looking at any other flags due to
previous setup.
* Remove explicit flag checking in IP allocation functions (including
unassign_unsuitable_ips()) and just call can_node_host_ip() and
can_node_takeover_ip() as appropriate.
* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
|
|
|
|
|
|
|
|
|
| |
Implemented for CTDB_SET_NoIPTakeover.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a1addd89fd9c0390912604097acd028cc24d3483)
|
|
|
|
|
|
|
|
| |
Curiously test_ctdb_sys_check_iface_exists fails on Linux
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit 109f428aa34f8f4cc0329880d2f4a5593a6cc6f3)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The retry loop is currently in ctdb_takeover_run_core(). Pushing it
into each function will make it possible to put each algorithm into a
separate top-level function. This will make the code much clearer and
more maintainable.
Also keep associated test code compatible.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f6ce18d011dd9043b04256690d826deb2640cd89)
|
|
|
|
|
|
|
|
| |
Via $CTDB_SET_NoIPTakeoverOnDisabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d357d52dbd533444a4af6151d04ba119a1533068)
|
|
|
|
|
|
|
|
|
| |
Default to LCP2, like ctdbd. Also support "det" for deterministic
IPs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 20631f5f29859920844dd8f410e24917aabd3dfd)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2126795153dacb255e441abcb36ee05107b6282a)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit caff197edf6f928494028ac6c993901954aaa36f)
|
|
|
|
|
|
| |
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 81af67c6959fdbe0566e3f1a00e2be58dd268dc6)
|
|
|
|
|
|
|
|
|
| |
Some code (e.g. NAT gateway code) modifies the returned result so was
modifying the original.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a3f15d2828325bbfba5bc5c0a30429e2ce572a44)
|
|
|
|
|
|
|
|
|
|
| |
If the record does not exist in persistent DB, RSN for that record is
considered 0. To write a record, RSN for that record should be set to 1,
otherwise the RSN check would fail.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ac89da4eea98fa686408c5671a6c44c0fd1d7a58)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were two issues with this test:
1. Since the messages are sent from one node to the next, if a node
does not register for messages before CTDB on that nodes receives
the message, it will never be seen by ctdb_fetch and it would
block on receive and would not send any messages to next node.
The crude solution is to sleep just before the messages are sent,
so that ctdb_fetch on all nodes have registered for the messages.
2. If ctdb_fetch stops sending messages after timelimit expiry, the
next node will keep waiting to receive messages in event_loop_once().
The default timeout is 30 seconds for event_loop_once(). Adding a
timed event will always set the timeout value to the time remaining
for the timed event to expire.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit bc55e09fdac9f743d6428bfe0be77840ad0fd1ba)
|
|
|
|
|
|
|
|
|
|
|
| |
(our child died and kernel wrapped the pid-space and reused the pid for a different process
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to track also which child processes have terminated.
Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a
(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 74539cc619b17ea187e34b5866869212896b0c1a)
|
|
|
|
|
|
| |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)
|
|
|
|
| |
(This used to be ctdb commit f06634951331232cddf0b48eac3552b92aca5b93)
|
|
|
|
|
|
|
| |
It is very misleading in ctdb_persistent.c, since it is used for non-persistent
dbs...
(This used to be ctdb commit a956fa3a27106d0154a3fb46987d61c0a6b7c768)
|
|
|
|
| |
(This used to be ctdb commit f4d395165816f74839ed48860e3210e05bc16d3d)
|