Commit message | Author | Age | Files | Lines
...
* recoverd: Track failure of "recovered" event, banning culpritsMartin Schwenke2012-10-111-29/+42
| | | | | | | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9550c497e6d6ef5ee44826c4bd9ed5ad65174263)
* recoverd: When starting a takeover run disable IP verificationMartin Schwenke2012-10-112-0/+20
  Disable for TakeoverTimeout seconds. Otherwise the recovery daemon can get overzealous and start trying to add/delete addresses that it thinks are missing but where the eventscript just hasn't finished. This didn't use to matter so much, but it is more important now that concurrent takeip/releaseip/updateip events generate errors: we want to avoid spamming the log.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 56fcee3c7730cb12fa666072d5400949af6e5f7c)
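The timed suppression described in this entry can be pictured with a small model. This is only a sketch in Python, not ctdb's C code: the class name `IPVerification`, its methods, and the 9-second value are hypothetical and chosen for illustration.

```python
class IPVerification:
    """Toy model: IP verification is skipped until a deadline passes."""

    def __init__(self):
        self.disabled_until = 0.0  # timestamp before which checks are skipped

    def disable_for(self, seconds, now):
        # Called when a takeover run starts.
        self.disabled_until = now + seconds

    def enabled(self, now):
        # The recovery daemon would only verify addresses when this is True.
        return now >= self.disabled_until


takeover_timeout = 9  # illustrative value, standing in for TakeoverTimeout
verifier = IPVerification()
verifier.disable_for(takeover_timeout, now=100.0)
during_takeover = verifier.enabled(now=104.0)   # eventscripts may still run
after_timeout = verifier.enabled(now=109.0)     # deadline reached
```

Passing `now` explicitly keeps the model deterministic; a real daemon would read the clock itself.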
* ctdbd: Stop takeovers and releases from colliding in mid-airMartin Schwenke2012-10-112-7/+74
  There's a race here where release and takeover events for an IP can run at the same time. For example, a "ctdb deleteip" and a takeover initiated by the recovery daemon. The timeline is as follows:

  1. The release code registers a callback to update the VNN. The callback is executed *after* the eventscripts run the releaseip event.

  2. The release code calls the eventscripts for the releaseip event, removing IP from its interface.

     The takeover code "updates" the VNN saying that IP is on some iface... even if/though the address is already there.

  3. The release callback runs, removing the iface associated with IP in the VNN.

     The takeover code calls the eventscripts for the takeip event, adding IP to an interface.

  As a result, CTDB doesn't think it should be hosting IP but IP is on an interface. The recovery daemon fixes this later... but it shouldn't happen.

  This patch can cause some additional noise in the logs:

    Release of IP 10.0.2.133/24 on interface eth2 node:2
    recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it.
    Release of IP 10.0.2.133/24 rejected update for this IP already in flight
    recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed
    recoverd:Failed to release local ip address

  In this case the node has started releasing an IP when the recovery daemon notices the address is still hosted and initiates another release. This noise is harmless but annoying.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)
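The "update for this IP already in flight" rejection seen in the log excerpt suggests a per-IP guard. A minimal sketch of such a guard follows, written in Python rather than ctdb's C; the class `PublicIP`, the flag `update_in_flight`, and both method names are hypothetical and are not taken from the patch.

```python
class PublicIP:
    """Toy stand-in for a VNN entry; not ctdb's real data structure."""

    def __init__(self, addr):
        self.addr = addr
        self.iface = None              # interface CTDB believes hosts the IP
        self.update_in_flight = False  # hypothetical guard flag

    def start_update(self):
        """Claim the IP for one event; return False if one is already running."""
        if self.update_in_flight:
            return False
        self.update_in_flight = True
        return True

    def finish_update(self, iface):
        """Event callback: record the resulting interface and drop the guard."""
        self.iface = iface
        self.update_in_flight = False


ip = PublicIP("10.0.2.133")
ip.iface = "eth2"
release_started = ip.start_update()    # release begins; its event is running
takeover_started = ip.start_update()   # takeover arrives mid-release: rejected
ip.finish_update(None)                 # release callback clears the interface
retry_started = ip.start_update()      # a later takeover is accepted
```

With the guard in place, steps 2 and 3 of the timeline can no longer interleave: the second operation is refused until the first callback has run.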
* ctdbd: New tunable NoIPTakeoverOnDisabledMartin Schwenke2012-10-116-89/+115
| | | | | | | | | | | | Stops the behaviour where unhealthy nodes can host IPs when there are no healthy nodes. Set this to 1 when an immediate complete outage is preferred when all nodes are unhealthy. The alternative (i.e. default) can lead to undefined behaviour when the shared filesystem is unavailable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)
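The effect of the tunable on IP placement can be illustrated with a simplified model. The sketch below is Python, not the ctdb implementation; the function name `takeover_candidates` and the two-state "healthy"/"unhealthy" node map are assumptions made for the example.

```python
def takeover_candidates(nodes, no_ip_takeover_on_disabled):
    """Return the PNNs allowed to host public IPs.

    nodes maps PNN -> "healthy" or "unhealthy" (a simplification).
    """
    healthy = sorted(pnn for pnn, state in nodes.items() if state == "healthy")
    if healthy:
        return healthy
    if no_ip_takeover_on_disabled:
        # Tunable set to 1: prefer a complete outage over unhealthy hosts.
        return []
    # Default behaviour: fall back to unhealthy nodes.
    return sorted(nodes)


all_down = {0: "unhealthy", 1: "unhealthy"}
default_choice = takeover_candidates(all_down, False)
strict_choice = takeover_candidates(all_down, True)
```

The tunable only matters when no healthy node exists; with at least one healthy node both settings give the same answer.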
* Eventscripts: Add service-start and service-stop pseudo-eventsMartin Schwenke2012-10-101-2/+28
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit be4ad110ede9981b181ac28f31ffd855a879d5df)
* ctdbd: Avoid unnecessary updateip eventMartin Schwenke2012-10-101-5/+5
  The existing code makes one fatally bad assumption: vnn->iface->references can never be -1 (or max-uint32_t in this case).

  Right now the reference counting is broken, so a reference count of -1 is possible and causes a spurious updateip when vnn->iface is the same as best_iface. This can occur frequently because we get a lot of redundant takeovers, especially when each IP can only be hosted on one interface.

  This makes the code much more defensive by noting that when best_iface is the same as vnn->iface there is never a need for an updateip event.

  This effectively neuters the updateip code path when IPs can only be hosted by a single interface.

  This should obsolete 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 7054e4ded59c6b8f254dcfefaef64da05f25aecd)
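Two points in this entry lend themselves to a short illustration: what a count of -1 looks like in an unsigned 32-bit field, and the defensive rule itself. The Python sketch below is only a model; `updateip_possible` is a hypothetical name, and the real C code applies further conditions that are not shown here.

```python
UINT32_MAX = 2**32 - 1

# A reference count of -1 stored in an unsigned 32-bit field wraps around.
wrapped_count = -1 & 0xFFFFFFFF


def updateip_possible(current_iface, best_iface):
    """Defensive rule: an unchanged interface never needs an updateip event."""
    if best_iface == current_iface:
        return False
    return True  # other checks (not modelled) would decide from here


same_iface = updateip_possible("eth2", "eth2")
moved_iface = updateip_possible("eth2", "eth3")
```

Because the rule compares interfaces rather than reference counts, a corrupted count cannot trigger the event on its own.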
* Correct include for ctdb_protocol.hVolker Lendecke2012-10-091-1/+1
  With an old ctdb_protocol.h installed under /usr/local, ctdb will not compile because the <> form of include will find the header under /usr/local.

  (This used to be ctdb commit c4f5a58471b206e2287c7958c7f29c1f1c0626ac)
* Revert "when creating/adding a public ip, set the initial interface to be the first interface specified"Amitay Isaacs2012-10-071-3/+0
  This reverts commit 4308935ba48ac7a29e7523315acf580019715f0f.

  This fixes the 16_ctdb_config_add_ip.sh test when run against local daemons. When running against local daemons, if the interface is assigned as soon as an IP is added, then takeover would never assign this IP address.

  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 06dfd13604d08910e07cbf927c338d7b9fce9a2f)
* util: ctdb_fork() closes all sockets opened by the main daemonMartin Schwenke2012-10-052-18/+24
  Do some other housekeeping including stopping tevent.

  Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 212298279557a2833ef0f81809b4a5cdac72ca02)
* eventscripts: Auto-start/stop services in backgroundMartin Schwenke2012-10-037-25/+65
| | | | | | | | | | | If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done in the background with logging. Fix some unit tests for samba and winbind. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)
* Eventscripts: split 50.samba into 49.winbind and 50.sambaMartin Schwenke2012-10-0312-158/+225
  winbind and samba can be separately managed. This makes the service starting and stopping code way too complicated, and even adds a small amount of complexity to the monitoring code. The sensible option is to split this eventscript in two.

  There are two potentially backward incompatible changes here:

  * Functionality has been removed that allowed 50.samba to manage winbind when CTDB_MANAGES_WINBIND was unset but the smb.conf "security" parameter was set to "ADS" or "DOMAIN". Maintaining this functionality would have required moving the testparm-related code to the functions file, deciding where the cache file should go, and then calling it from both 49.winbind and 50.samba. This feature wasn't of great value and asking administrators to set an extra variable in exchange for code simplicity seems like a reasonable deal.

  * External code will need to be changed if it calls 50.samba directly with winbind-related expectations. This is fairly obvious!

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 34535ae64420926b9a3bf7d453fed4e6f4c90115)
* Initscript: Kill any existing ctdbd processes if the ping succeedsMartin Schwenke2012-10-021-0/+6
| | | | | | | | | Initialising a new ctdbd will destroy the Unix domain socket so existing processes will be useless anyway. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 043ef77086797a703aec436a26a05c56a1bcbf2b)
* tools/ctdb: Free the event contextMartin Schwenke2012-10-021-0/+1
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit dc2a8c638bd74b9f1dd75339cd2ae2f32ffa18a8)
* libctdb: Add comments to effect that some controls return result in statusMartin Schwenke2012-10-021-0/+3
  These controls include:

    CTDB_CONTROL_GET_RECMODE
    CTDB_CONTROL_GET_RECMASTER
    CTDB_CONTROL_GET_PID
    CTDB_CONTROL_GET_PNN
    CTDB_CONTROL_PING
    CTDB_CONTROL_GET_DB_PRIORITY

  In these cases the data field is empty.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit b89e959904d7d1b0e5525abd7789f5101537a46a)
* tests/tool: New tests for natgwlist, getcapabilities, lvs, lvsmasterMartin Schwenke2012-09-2811-0/+353
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6bd4feff7039138d435428eeded51975c44e567c)
* tests/tool: New function setup_natgw() to setup $CTDB_NATGW_NODESMartin Schwenke2012-09-281-0/+20
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0f0aef21a1bb2d88a8c184ef70c718e0c91acdc3)
* tools/ctdb: Clean up control_natgw()Martin Schwenke2012-09-281-63/+69
  * Factor out repeated code into new function find_natgw()
  * Support both machine and human readable output
  * Use libctdb

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit a56ec75edd1705b0539513d396d311f0e80a3bf5)
* tools/ctdb: Convert some commands over to libctdbMartin Schwenke2012-09-281-19/+24
| | | | | | | | | control_getcapabilities(), control_lvs(), control_lvsmaster() updated to use ctdb_getcapabilities(), ctdb_getnodemap() as appropriate. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c30ec02615183ecf9b412ad415bf1abd859aec45)
* tests: libctdb stubs initial ctdb_getcapabilities() implementationMartin Schwenke2012-09-281-0/+7
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 81af67c6959fdbe0566e3f1a00e2be58dd268dc6)
* tests: libctdb stubs must copy pointers rather than just returning themMartin Schwenke2012-09-281-6/+25
| | | | | | | | | Some code (e.g. NAT gateway code) modifies the returned result so was modifying the original. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a3f15d2828325bbfba5bc5c0a30429e2ce572a44)
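The aliasing problem behind this fix is easy to reproduce in any language. The Python sketch below is an analogy only: the canned `_nodemap` data and both accessor names are hypothetical and do not correspond to the real libctdb stub code.

```python
import copy

# Canned data owned by the stub (illustrative values).
_nodemap = [{"pnn": 0, "flags": 0}, {"pnn": 1, "flags": 0}]


def get_nodemap_aliasing():
    """Buggy pattern: hands out the stub's own object."""
    return _nodemap


def get_nodemap_copying():
    """Fixed pattern: the caller receives an independent copy."""
    return copy.deepcopy(_nodemap)


result = get_nodemap_copying()
result[1]["flags"] = 0x2          # caller edits its copy, as NAT gateway code might
original_flag_after_copy = _nodemap[1]["flags"]

aliased = get_nodemap_aliasing()
aliased[0]["flags"] = 0x1         # this edit leaks into the stub's data
original_flag_after_alias = _nodemap[0]["flags"]
```

Copying costs a little in a test stub but keeps one test's modifications from changing what the next call returns.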
* libctdb: add ctdb_getcapabilities()Martin Schwenke2012-09-285-8/+106
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 140fafef23050d40d66f5b5558c7efcb78f80cd2)
* tools/ctdb: Remove redundant filtering loop in control_natgwlist()Martin Schwenke2012-09-281-3/+0
| | | | | | | | | This used to catch trailing blank lines. However, these are caught just as effectively by the whitespace filtering in the loop below. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7b75a3bb722dc86139b1a07a0100d08c34620b91)
* tools/ctdb: natgwlist output is either human readable or machine readableMartin Schwenke2012-09-281-12/+28
| | | | | | | | | The first line is currently human readable and the rest is machine readable. This doesn't make sense. Do one or the other... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b29d5bbaa7048291c4b3a39bf12e04f0436f67da)
* tools/ctdb: Factor out printing of the machine readable status headerMartin Schwenke2012-09-281-4/+8
| | | | | | | | It is already in 2 places and we might use it in another. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 12a0a7a208d1c8fa8991894200d1dc133f3a2d1a)
* tools/ctdb: NAT gateway code should use CTDB_NATGW_NODESMartin Schwenke2012-09-281-1/+1
| | | | | | | | ... not NATGW_NODES. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 2da7730dc06153173778ab14e228960e72ff8a86)
* tests/eventscripts: New policy routing test with invalid table IDMartin Schwenke2012-09-111-0/+41
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 93c97c3ba3ff714dfa0d056a91ff45010a6e2d66)
* tests/eventscripts: Modify ip stub to simulate invalid table IDMartin Schwenke2012-09-111-15/+36
  This involves refactoring ip_route_check_table() into a new function ip_check_table() which takes the operation type (i.e. rule/route) as an argument.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit acdaa04079a9827885f32a7bc078d3365c89b474)
* Eventscripts: Indent error when a route delete fails in 11.per_ip_routingMartin Schwenke2012-09-111-2/+8
| | | | | | | | | This puts it under the umbrella of the previous warning that should also have been printed. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5c3be8f26dcde0b1b3d86928953e74d4a8b35958)
* tests/eventscript: unit test for 13.per_ip_routing bogus route removalMartin Schwenke2012-09-111-0/+47
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6d41208074f0e9b56c585bca7eb39aaed653c4ca)
* eventscripts: 13.per_ip_routing should remove bogus routes on ipreallocatedMartin Schwenke2012-09-111-0/+26
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d0d0a6f19960f233224970b8d5d19b0e37222616)
* tests/eventscripts: Add a policy routing unit test for "ip rule del" failureMartin Schwenke2012-09-111-0/+38
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0ce5b079f327aba55b62800ccb22d79976fac665)
* eventscripts: Print a warning on failure to delete a routing ruleMartin Schwenke2012-09-111-4/+12
| | | | | | | | | | | | del_routing_for_ip() currently fails silently, which could hide real errors. In add_routing_for_ip() we don't want to see any error when calling del_routing_for_ip(), since we don't expect the rule to be there. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 30d69defa7e97ab5e3ba0492a27868dde2616494)
* doc: Fix path string of /etc/sysconfig/ctdb fileAmitay Isaacs2012-08-201-1/+1
| | | | | | Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 49dd755fcd077c84eaf3d2fe5dd7757f5588d49c)
* recoverd: All inactive nodes should yield recovery master roleMartin Schwenke2012-08-081-2/+2
| | | | | | | | | | | | | Not just stopped nodes. In reality, this means that banned nodes will also yield, since nodes in the other inactive states won't be running a daemon. This seems sensible since if another node notices that an inactive node is the recovery master then it will force an election anyway. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit fc18188b7b63eb0dafbc47e3abf80e306e1dfc31)
* recoverd: An inactive node should not force recovery master electionsMartin Schwenke2012-08-081-2/+3
| | | | | | | | | | | | | | | | | | | An inactive node can't become the recovery master. So if an inactive node notices that the recovery master is inactive, it shouldn't force an election for recovery master and nominate itself as a candidate. This can cause the recovery master to flip-flop between nodes when all nodes are inactive. If there is actually an active node then it will trigger the election. This is fairly cosmetic but is a step along the way towards ironing out weirdness when all nodes are stopped. Also, fix a related comment. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e7dc10da3ced54ea9d719ad167ee42dcca8dce75)
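The rule in this entry, together with the yielding rule in the entry above it, reduces to two small predicates. The following Python sketch is a simplified model, not the recovery daemon's code; both function names and the boolean inputs are assumptions made for illustration.

```python
def should_force_election(self_active, recmaster_active):
    """Only an active node reacts to an inactive recovery master."""
    return self_active and not recmaster_active


def should_yield_recmaster(self_is_recmaster, self_active):
    """Any inactive node (not just a stopped one) gives up the role."""
    return self_is_recmaster and not self_active


# All nodes inactive: nobody forces an election, so the role stops flip-flopping.
inactive_sees_inactive = should_force_election(False, False)
# An active node notices an inactive recovery master and triggers the election.
active_sees_inactive = should_force_election(True, False)
banned_recmaster_yields = should_yield_recmaster(True, False)
```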
* recoverd: main_loop() should not verify local IPs if node is stoppedMartin Schwenke2012-08-081-0/+8
| | | | | | | | | Doing these checks is pointless and potentially causes unnecessary log messages. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a0c30c820fd47d4f8620dc060c825be10754f5d1)
* recoverd: verify_local_ip_allocation() should dup ifaces before early returnMartin Schwenke2012-08-081-3/+3
  If CTDB starts in STOPPED state then it thinks it is in the middle of a recovery. rec->ifaces is also NULL and an early exit further down (that checks to see if a recovery is in process) means that it stays that way. However, each time this function is entered the need for a takeover run is re-flagged. The takeover run never happens due to the early exit, causing a couple of unneeded messages to be logged each time.

  This is avoided by moving the code that sets rec->ifaces so that it is executed earlier and, in this case, in the middle of a recovery.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit f586e8a2911fc6e7f6698f516653145d8fd45dad)
* recoverd: Update a log message that has bit-rottedMartin Schwenke2012-08-081-3/+8
| | | | | | | | | | This message used to be correct because the ipreallocated event only handled updating the NAT gateway. However, that has changed so the message needs to be updated. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cc9d96f4248e45ea99c5f00db1526426ac26fbc2)
* recoverd: Fix bogus info in message about changed flagsMartin Schwenke2012-08-081-1/+1
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9119a568c2b4601318f7751f537dca2f92a7230b)
* tests/eventscripts: Extra cases for policy routing missing config testMartin Schwenke2012-07-301-2/+5
| | | | | | | | Test the startup and monitor events too. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c29a943f9bbcfecb861e71d007c7698a53dc8773)
* Eventscripts: 13.per_ip_routing should always fail if config is missingMartin Schwenke2012-07-301-2/+11
| | | | | | | | | | | | | | Currently, if the configuration file is specified by $CTDB_PER_IP_ROUTING_CONF but is missing, takeip fails but (the absent) monitor event "succeeds", so the state of a node will flip-flop. Instead of this, if the configuration file is missing then fail early on for all events. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c64c6c77c3f6aa2898e5a575547b587bea868c76)
* Revert "Eventscripts - make 13.per_ip_routing fail gracefully if config is missing"Martin Schwenke2012-07-301-7/+2
  When the configuration file is missing this causes the node to flip-flop between unhealthy (when takeip fails) and healthy (no monitor event here). Will reimplement this properly.

  This reverts commit 351ca413eec460330571ca8b01ad269728fe15df.

  (This used to be ctdb commit 5277d749c9111716fd723647d5421907476422bf)
* ctdb tool: recmaster command might as well be auto-allMartin Schwenke2012-07-301-1/+1
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 076282622fcb2663d378e0c90ed0d9c19f73c005)
* doc: Document the new onnode -P optionMartin Schwenke2012-07-303-14/+42
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit fa0f3cba5adaa38bed37dd8b121ad53e962a010d)
* tools/onnode: Add -P option to push files to given nodesMartin Schwenke2012-07-301-13/+36
| | | | | | | | | | | | | | A list of files is given rather than a command. These files are pushed to the specified nodes. Quoting is fragile/broken so filenames with spaces won't work - you win some, you lose some. :-) All of the other onnode options should work together with this option. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit aed9b98ddbbf3e81de4f7257a10676565f7d7507)
* Eventscripts: Clean up 11.routingMartin Schwenke2012-07-301-9/+8
| | | | | | | | | | The loops can all be done without cat or grep. The pair of loops in updateip is combined into a single loop. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 96fdda124f5511fb76190e7c7a7f0b98e6b01a31)
* ctdbd: Log a meaningful message if the nodes file/list is emptyMartin Schwenke2012-07-261-0/+9
| | | | | | | | | Right now the message says it can't bind to any of the addresses... even when there aren't any! Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 553455b386aa7848a516a921dfc14eb87c8a3fc1)
* ctdbd: Remove the word "Forced" from message about running eventscriptsMartin Schwenke2012-07-261-1/+1
  The eventscripts are run after a takeover run and in this case they're not forced. The message seems to imply that someone has run "ctdb eventscript" when that is not necessarily the case.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 3880589db4d563e438126cf5080261fa06b9e242)
* ctdbd: Fix ctdb_control_release_ip() on local daemonsMartin Schwenke2012-07-261-5/+13
  When running on local daemons no IPs are actually assigned to interfaces. Commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e broke ctdb_control_release_ip() for local daemons because it asks the system which interface the given IP is on, instead of the old behaviour of trusting CTDB's internal records.

  For local daemons (i.e. !ctdb->do_checkpublicip) revert to the old behaviour of looking up the interface internally. This is good enough, given that the tests don't tend to misconfigure the addresses.

  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 38e8651b955afdbaf0ae87c24c55c052f8209290)
* Initscript: clean up drop_all_public_ips()Martin Schwenke2012-07-261-7/+3
| | | | | | | | | This makes the case implicit where $CTDB_PUBLIC_ADDRESSES is unset. This is OK because that's not an interesting code path. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5b2725d1ae052e848c2487cb10c5393a877d118c)