Commit message  (author, age, files changed, lines -removed/+added)
...
| * scripts: Remove duplicate code from init script to set tunables  (Amitay Isaacs, 2012-10-17, 2 files, -21/+30)
| |   The tunable variables defined in the CTDB configuration file are
| |   currently set up from the init script as well as in the "setup" event
| |   of the 00.ctdb eventscript. Remove the duplicated code and set tunable
| |   variables only from the setup event.
| |
| |   During the "setup" event, ctdb tool commands can time out if the CTDB
| |   daemon is not ready. To guard against this, wait until the "ctdb ping"
| |   command succeeds before executing any other ctdb tool commands.
| |
| |   Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit 632c1b9c1cc2e242376358ce49fd2022b3f27aa2)
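The "wait until ctdb ping succeeds" guard described above can be sketched as a bounded retry loop. This is an illustrative Python model only, not the eventscript's shell code; the function name `wait_for_ping` and its `attempts`, `delay` and `sleep` parameters are hypothetical.

```python
import time
from typing import Callable


def wait_for_ping(ping: Callable[[], bool],
                  attempts: int = 10,
                  delay: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True as soon as ping() succeeds, False if every attempt fails.

    `ping` stands in for running "ctdb ping" and checking that it succeeded;
    it is injected so the loop can be exercised without a running daemon.
    """
    for attempt in range(attempts):
        if ping():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A caller would run the remaining setup commands only when `wait_for_ping(...)` returns True, and otherwise fail the event.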
| * doc: Fix the hyperlink for "Testing CTDB" page  (Amitay Isaacs, 2012-10-17, 1 file, -1/+1)
| |   Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit 08dbd9c7958f9a0ee3de314d49523d32e4be135c)
| * tests/eventscripts: add unit tests for policy routing reconfigure  (Martin Schwenke, 2012-10-11, 4 files, -0/+77)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit bd4ff176387372b1c233373c0bc8ced523fc9670)
| * tests/eventscripts: add extra infrastructure for policy routing tests  (Martin Schwenke, 2012-10-11, 16 files, -317/+170)
| |   Less copying and pasting is a good thing...
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 7d4b8cce96f33fff647a0c9d259c121dfc8403e9)
| * Eventscripts: Add support for "reconfigure" pseudo-event for policy routing  (Martin Schwenke, 2012-10-11, 1 file, -2/+17)
| |   This rebuilds all policy routes and can be used if the configuration
| |   changes.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit c185ffd2822fcee26d07398464c59b66c61f53fa)
| * recoverd: Track failure of "recovered" event, banning culprits  (Martin Schwenke, 2012-10-11, 1 file, -29/+42)
| |   Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 9550c497e6d6ef5ee44826c4bd9ed5ad65174263)
| * recoverd: When starting a takeover run disable IP verification  (Martin Schwenke, 2012-10-11, 2 files, -0/+20)
| |   Disable for TakeoverTimeout seconds. Otherwise the recovery daemon can
| |   get overzealous and start trying to add/delete addresses that it thinks
| |   are missing but where the eventscript just hasn't finished.
| |
| |   This didn't use to matter so much, but it is more important now that
| |   concurrent takeip/releaseip/updateip events generate errors - we want
| |   to avoid spamming the log.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 56fcee3c7730cb12fa666072d5400949af6e5f7c)
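The idea of switching IP verification off for a fixed window can be modelled as a deadline check. The sketch below is a hypothetical Python illustration, not recoverd's C code; the class name `IPVerification` and the injected `clock` are assumptions made so the behaviour can be tested without waiting.

```python
import time
from typing import Callable


class IPVerification:
    """Tracks whether local IP verification is currently allowed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._disabled_until = 0.0  # deadline on the injected clock

    def disable_for(self, seconds: float) -> None:
        """Suppress verification until `seconds` have elapsed from now."""
        self._disabled_until = self._clock() + seconds

    def enabled(self) -> bool:
        """True once the suppression window (if any) has expired."""
        return self._clock() >= self._disabled_until
```

In this model a takeover run would call `disable_for(takeover_timeout)` when it starts, and the periodic check would skip verification while `enabled()` is False.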
| * ctdbd: Stop takeovers and releases from colliding in mid-air  (Martin Schwenke, 2012-10-11, 2 files, -7/+74)
| |   There's a race here where release and takeover events for an IP can
| |   run at the same time. For example, a "ctdb deleteip" and a takeover
| |   initiated by the recovery daemon. The timeline is as follows:
| |
| |   1. The release code registers a callback to update the VNN. The
| |      callback is executed *after* the eventscripts run the releaseip
| |      event.
| |
| |   2. The release code calls the eventscripts for the releaseip event,
| |      removing IP from its interface.
| |
| |      The takeover code "updates" the VNN saying that IP is on some
| |      iface.... even if/though the address is already there.
| |
| |   3. The release callback runs, removing the iface associated with IP
| |      in the VNN.
| |
| |      The takeover code calls the eventscripts for the takeip event,
| |      adding IP to an interface.
| |
| |   As a result, CTDB doesn't think it should be hosting IP but IP is on
| |   an interface. The recovery daemon fixes this later... but it
| |   shouldn't happen.
| |
| |   This patch can cause some additional noise in the logs:
| |
| |     Release of IP 10.0.2.133/24 on interface eth2 node:2
| |     recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it.
| |     Release of IP 10.0.2.133/24 rejected update for this IP already in flight
| |     recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed
| |     recoverd:Failed to release local ip address
| |
| |   In this case the node has started releasing an IP when the recovery
| |   daemon notices the address is still hosted and initiates another
| |   release. This noise is harmless but annoying.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)
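The "rejected update for this IP already in flight" behaviour in the log above suggests a per-IP in-flight marker. The following Python sketch models that idea under stated assumptions: the `PublicIP` class and the `start_update`/`finish_update` helpers are hypothetical names, and the real daemon code is C and may be structured differently.

```python
from typing import Optional


class PublicIP:
    """Minimal stand-in for the per-address state (the VNN) in the daemon."""

    def __init__(self, addr: str, iface: Optional[str] = None) -> None:
        self.addr = addr
        self.iface = iface            # interface currently hosting the address
        self.update_in_flight = False


def start_update(ip: PublicIP) -> bool:
    """Claim the address for a takeip/releaseip/updateip operation.

    Returns False (reject) if another operation has not finished yet.
    """
    if ip.update_in_flight:
        return False
    ip.update_in_flight = True
    return True


def finish_update(ip: PublicIP, iface: Optional[str]) -> None:
    """Callback run after the eventscript completes: record state, unlock."""
    ip.iface = iface
    ip.update_in_flight = False
```

With this guard a takeover that arrives between steps 1 and 3 of the timeline is rejected instead of interleaving with the release, at the cost of the extra log noise described above.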
| * ctdbd: New tunable NoIPTakeoverOnDisabled  (Martin Schwenke, 2012-10-11, 6 files, -89/+115)
| |   Stops the behaviour where unhealthy nodes can host IPs when there are
| |   no healthy nodes. Set this to 1 when an immediate complete outage is
| |   preferred when all nodes are unhealthy. The alternative (i.e. the
| |   default) can lead to undefined behaviour when the shared filesystem is
| |   unavailable.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)
| * Eventscripts: Add service-start and service-stop pseudo-events  (Martin Schwenke, 2012-10-10, 1 file, -2/+28)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit be4ad110ede9981b181ac28f31ffd855a879d5df)
| * ctdbd: Avoid unnecessary updateip event  (Martin Schwenke, 2012-10-10, 1 file, -5/+5)
| |   The existing code makes one fatally bad assumption:
| |   vnn->iface->references can never be -1 (or the maximum uint32_t value
| |   in this case).
| |
| |   Right now the reference counting is broken, so a reference count of -1
| |   is possible and causes a spurious updateip when vnn->iface is the same
| |   as best_iface. This can occur frequently because we get a lot of
| |   redundant takeovers, especially when each IP can only be hosted on one
| |   interface.
| |
| |   This makes the code much more defensive by noting that when best_iface
| |   is the same as vnn->iface there is never a need for an updateip event.
| |   This effectively neuters the updateip code path when IPs can only be
| |   hosted by a single interface.
| |
| |   This should obsolete 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 7054e4ded59c6b8f254dcfefaef64da05f25aecd)
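Two points from this message lend themselves to a small illustration: decrementing an unsigned 32-bit counter that is already 0 wraps to the maximum value, and comparing interfaces first avoids depending on the counter at all. The Python below only models this; `dec_u32` and `needs_updateip` are hypothetical helpers, not functions from the ctdb source.

```python
UINT32_MAX = 0xFFFFFFFF


def dec_u32(value: int) -> int:
    """Decrement with uint32_t wrap-around semantics, as C would do."""
    return (value - 1) & UINT32_MAX


def needs_updateip(current_iface: str, best_iface: str) -> bool:
    """An updateip event only makes sense if the address would move."""
    return current_iface != best_iface


# A reference count that drops below zero becomes a huge positive number,
# so any comparison that treats it as "many references" is misled.
broken_refcount = dec_u32(0)
```

Here `broken_refcount` equals `UINT32_MAX`, while `needs_updateip("eth2", "eth2")` is False regardless of what the counters say.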
| * Correct include for ctdb_protocol.h  (Volker Lendecke, 2012-10-09, 1 file, -1/+1)
| |   With an old ctdb_protocol.h installed under /usr/local, ctdb will not
| |   compile because the <> form of include will find the header under
| |   /usr/local.
| |
| |   (This used to be ctdb commit c4f5a58471b206e2287c7958c7f29c1f1c0626ac)
| * Revert "when creating/adding a public ip, set the initial interface to be the first interface specified"  (Amitay Isaacs, 2012-10-07, 1 file, -3/+0)
| |   This reverts commit 4308935ba48ac7a29e7523315acf580019715f0f.
| |
| |   This fixes the 16_ctdb_config_add_ip.sh test when run against local
| |   daemons. When running against local daemons, if the interface is
| |   assigned as soon as an IP is added, then takeover would never assign
| |   this IP address.
| |
| |   Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit 06dfd13604d08910e07cbf927c338d7b9fce9a2f)
| * util: ctdb_fork() closes all sockets opened by the main daemon  (Martin Schwenke, 2012-10-05, 2 files, -18/+24)
| |   Do some other housekeeping including stopping tevent.
| |
| |   Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 212298279557a2833ef0f81809b4a5cdac72ca02)
| * eventscripts: Auto-start/stop services in background  (Martin Schwenke, 2012-10-03, 7 files, -25/+65)
| |   If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done
| |   in the background with logging.
| |
| |   Fix some unit tests for samba and winbind.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)
| * Eventscripts: split 50.samba into 49.winbind and 50.samba  (Martin Schwenke, 2012-10-03, 12 files, -158/+225)
| |   winbind and samba can be separately managed. This makes the service
| |   starting and stopping code way too complicated, and even adds a small
| |   amount of complexity to the monitoring code.
| |
| |   The sensible option is to split this eventscript in two.
| |
| |   There are two potentially backward incompatible changes here:
| |
| |   * Functionality has been removed that allowed 50.samba to manage
| |     winbind when CTDB_MANAGES_WINBIND was unset but the smb.conf
| |     "security" parameter was set to "ADS" or "DOMAIN".
| |
| |     Maintaining this functionality would have required moving the
| |     testparm-related code to the functions file, deciding where the
| |     cache file should go, and then calling it from both 49.winbind and
| |     50.samba. This feature wasn't of great value and asking
| |     administrators to set an extra variable in exchange for code
| |     simplicity seems like a reasonable deal.
| |
| |   * External code will need to be changed if it calls 50.samba directly
| |     with winbind-related expectations. This is fairly obvious!
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 34535ae64420926b9a3bf7d453fed4e6f4c90115)
| * Initscript: Kill any existing ctdbd processes if the ping succeeds  (Martin Schwenke, 2012-10-02, 1 file, -0/+6)
| |   Initialising a new ctdbd will destroy the Unix domain socket so
| |   existing processes will be useless anyway.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 043ef77086797a703aec436a26a05c56a1bcbf2b)
| * tools/ctdb: Free the event context  (Martin Schwenke, 2012-10-02, 1 file, -0/+1)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit dc2a8c638bd74b9f1dd75339cd2ae2f32ffa18a8)
| * libctdb: Add comments to effect that some controls return result in status  (Martin Schwenke, 2012-10-02, 1 file, -0/+3)
| |   These controls include:
| |
| |     CTDB_CONTROL_GET_RECMODE
| |     CTDB_CONTROL_GET_RECMASTER
| |     CTDB_CONTROL_GET_PID
| |     CTDB_CONTROL_GET_PNN
| |     CTDB_CONTROL_PING
| |     CTDB_CONTROL_GET_DB_PRIORITY
| |
| |   In these cases the data field is empty.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit b89e959904d7d1b0e5525abd7789f5101537a46a)
| * tests/tool: New tests for natgwlist, getcapabilities, lvs, lvsmaster  (Martin Schwenke, 2012-09-28, 11 files, -0/+353)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 6bd4feff7039138d435428eeded51975c44e567c)
| * tests/tool: New function setup_natgw() to setup $CTDB_NATGW_NODES  (Martin Schwenke, 2012-09-28, 1 file, -0/+20)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 0f0aef21a1bb2d88a8c184ef70c718e0c91acdc3)
| * tools/ctdb: Clean up control_natgw()  (Martin Schwenke, 2012-09-28, 1 file, -63/+69)
| |   * Factor out repeated code into new function find_natgw()
| |   * Support both machine and human readable output
| |   * Use libctdb
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit a56ec75edd1705b0539513d396d311f0e80a3bf5)
| * tools/ctdb: Convert some commands over to libctdb  (Martin Schwenke, 2012-09-28, 1 file, -19/+24)
| |   control_getcapabilities(), control_lvs(), control_lvsmaster() updated
| |   to use ctdb_getcapabilities(), ctdb_getnodemap() as appropriate.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit c30ec02615183ecf9b412ad415bf1abd859aec45)
| * tests: libctdb stubs initial ctdb_getcapabilities() implementation  (Martin Schwenke, 2012-09-28, 1 file, -0/+7)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 81af67c6959fdbe0566e3f1a00e2be58dd268dc6)
| * tests: libctdb stubs must copy pointers rather than just returning them  (Martin Schwenke, 2012-09-28, 1 file, -6/+25)
| |   Some code (e.g. the NAT gateway code) modifies the returned result, so
| |   it was modifying the stub's original data.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit a3f15d2828325bbfba5bc5c0a30429e2ce572a44)
| * libctdb: add ctdb_getcapabilities()  (Martin Schwenke, 2012-09-28, 5 files, -8/+106)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 140fafef23050d40d66f5b5558c7efcb78f80cd2)
| * tools/ctdb: Remove redundant filtering loop in control_natgwlist()  (Martin Schwenke, 2012-09-28, 1 file, -3/+0)
| |   This used to catch trailing blank lines. However, these are caught
| |   just as effectively by the whitespace filtering in the loop below.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 7b75a3bb722dc86139b1a07a0100d08c34620b91)
| * tools/ctdb: natgwlist output is either human readable or machine readable  (Martin Schwenke, 2012-09-28, 1 file, -12/+28)
| |   The first line is currently human readable and the rest is machine
| |   readable. This doesn't make sense. Do one or the other...
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit b29d5bbaa7048291c4b3a39bf12e04f0436f67da)
| * tools/ctdb: Factor out printing of the machine readable status header  (Martin Schwenke, 2012-09-28, 1 file, -4/+8)
| |   It is already in 2 places and we might use it in another.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 12a0a7a208d1c8fa8991894200d1dc133f3a2d1a)
| * tools/ctdb: NAT gateway code should use CTDB_NATGW_NODES  (Martin Schwenke, 2012-09-28, 1 file, -1/+1)
| |   ... not NATGW_NODES.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 2da7730dc06153173778ab14e228960e72ff8a86)
| * tests/eventscripts: New policy routing test with invalid table ID  (Martin Schwenke, 2012-09-11, 1 file, -0/+41)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 93c97c3ba3ff714dfa0d056a91ff45010a6e2d66)
| * tests/eventscripts: Modify ip stub to simulate invalid table ID  (Martin Schwenke, 2012-09-11, 1 file, -15/+36)
| |   This involves refactoring ip_route_check_table() into a new function
| |   ip_check_table() which takes the operation type (i.e. rule/route) as
| |   an argument.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit acdaa04079a9827885f32a7bc078d3365c89b474)
| * Eventscripts: Indent error when a route delete fails in 11.per_ip_routing  (Martin Schwenke, 2012-09-11, 1 file, -2/+8)
| |   This puts it under the umbrella of the previous warning that should
| |   also have been printed.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 5c3be8f26dcde0b1b3d86928953e74d4a8b35958)
| * tests/eventscript: unit test for 13.per_ip_routing bogus route removal  (Martin Schwenke, 2012-09-11, 1 file, -0/+47)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 6d41208074f0e9b56c585bca7eb39aaed653c4ca)
| * eventscripts: 13.per_ip_routing should remove bogus routes on ipreallocated  (Martin Schwenke, 2012-09-11, 1 file, -0/+26)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit d0d0a6f19960f233224970b8d5d19b0e37222616)
| * tests/eventscripts: Add a policy routing unit test for "ip rule del" failure  (Martin Schwenke, 2012-09-11, 1 file, -0/+38)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 0ce5b079f327aba55b62800ccb22d79976fac665)
| * eventscripts: Print a warning on failure to delete a routing rule  (Martin Schwenke, 2012-09-11, 1 file, -4/+12)
| |   del_routing_for_ip() currently fails silently, which could hide real
| |   errors.
| |
| |   In add_routing_for_ip() we don't want to see any error when calling
| |   del_routing_for_ip(), since we don't expect the rule to be there.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 30d69defa7e97ab5e3ba0492a27868dde2616494)
| * doc: Fix path string of /etc/sysconfig/ctdb file  (Amitay Isaacs, 2012-08-20, 1 file, -1/+1)
| |   Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| |   (This used to be ctdb commit 49dd755fcd077c84eaf3d2fe5dd7757f5588d49c)
| * recoverd: All inactive nodes should yield recovery master role  (Martin Schwenke, 2012-08-08, 1 file, -2/+2)
| |   Not just stopped nodes.
| |
| |   In reality, this means that banned nodes will also yield, since nodes
| |   in the other inactive states won't be running a daemon. This seems
| |   sensible since if another node notices that an inactive node is the
| |   recovery master then it will force an election anyway.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit fc18188b7b63eb0dafbc47e3abf80e306e1dfc31)
| * recoverd: An inactive node should not force recovery master elections  (Martin Schwenke, 2012-08-08, 1 file, -2/+3)
| |   An inactive node can't become the recovery master. So if an inactive
| |   node notices that the recovery master is inactive, it shouldn't force
| |   an election for recovery master and nominate itself as a candidate.
| |   This can cause the recovery master to flip-flop between nodes when all
| |   nodes are inactive.
| |
| |   If there is actually an active node then it will trigger the election.
| |
| |   This is fairly cosmetic but is a step along the way towards ironing
| |   out weirdness when all nodes are stopped.
| |
| |   Also, fix a related comment.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit e7dc10da3ced54ea9d719ad167ee42dcca8dce75)
| * recoverd: main_loop() should not verify local IPs if node is stopped  (Martin Schwenke, 2012-08-08, 1 file, -0/+8)
| |   Doing these checks is pointless and potentially causes unnecessary
| |   log messages.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit a0c30c820fd47d4f8620dc060c825be10754f5d1)
| * recoverd: verify_local_ip_allocation() should dup ifaces before early return  (Martin Schwenke, 2012-08-08, 1 file, -3/+3)
| |   If CTDB starts in STOPPED state then it thinks it is in the middle of
| |   a recovery. rec->ifaces is also NULL and an early exit further down
| |   (that checks to see if a recovery is in progress) means that it stays
| |   that way. However, each time this function is entered the need for a
| |   takeover run is re-flagged. The takeover run never happens due to the
| |   early exit, causing a couple of unneeded messages to be logged each
| |   time.
| |
| |   This is avoided by moving the code that sets rec->ifaces so that it is
| |   executed earlier and, in this case, in the middle of a recovery.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit f586e8a2911fc6e7f6698f516653145d8fd45dad)
| * recoverd: Update a log message that has bit-rotted  (Martin Schwenke, 2012-08-08, 1 file, -3/+8)
| |   This message used to be correct because the ipreallocated event only
| |   handled updating the NAT gateway. However, that has changed so the
| |   message needs to be updated.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit cc9d96f4248e45ea99c5f00db1526426ac26fbc2)
| * recoverd: Fix bogus info in message about changed flags  (Martin Schwenke, 2012-08-08, 1 file, -1/+1)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 9119a568c2b4601318f7751f537dca2f92a7230b)
| * tests/eventscripts: Extra cases for policy routing missing config test  (Martin Schwenke, 2012-07-30, 1 file, -2/+5)
| |   Test the startup and monitor events too.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit c29a943f9bbcfecb861e71d007c7698a53dc8773)
| * Eventscripts: 13.per_ip_routing should always fail if config is missing  (Martin Schwenke, 2012-07-30, 1 file, -2/+11)
| |   Currently, if the configuration file is specified by
| |   $CTDB_PER_IP_ROUTING_CONF but is missing, takeip fails but (the
| |   absent) monitor event "succeeds", so the state of a node will
| |   flip-flop.
| |
| |   Instead of this, if the configuration file is missing then fail early
| |   on for all events.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit c64c6c77c3f6aa2898e5a575547b587bea868c76)
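The "fail early for all events" rule can be sketched as a single check placed ahead of event dispatch. This is a hypothetical Python model of the idea, not the eventscript itself: `run_event` and its `handler` parameter are illustrative names, and treating an unset path as "feature not configured" is an assumption.

```python
import os
from typing import Callable, Optional


def run_event(event: str,
              conf_path: Optional[str],
              handler: Callable[[str], int]) -> int:
    """Return an exit status for `event`, failing first on a missing config.

    The check runs before any per-event logic, so startup, monitor and
    takeip all report the same failure instead of disagreeing.
    """
    if conf_path and not os.path.isfile(conf_path):
        return 1  # configured but missing: every event fails
    return handler(event)
```

Because the status no longer depends on which event happens to implement a check, a node with a missing configuration file stays consistently unhealthy rather than flip-flopping.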
| * Revert "Eventscripts - make 13.per_ip_routing fail gracefully if config is missing"  (Martin Schwenke, 2012-07-30, 1 file, -7/+2)
| |   When the configuration file is missing this causes the node to
| |   flip-flop between unhealthy (when takeip fails) and healthy (no
| |   monitor event here). Will reimplement this properly.
| |
| |   This reverts commit 351ca413eec460330571ca8b01ad269728fe15df.
| |
| |   (This used to be ctdb commit 5277d749c9111716fd723647d5421907476422bf)
| * ctdb tool: recmaster command might as well be auto-all  (Martin Schwenke, 2012-07-30, 1 file, -1/+1)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit 076282622fcb2663d378e0c90ed0d9c19f73c005)
| * doc: Document the new onnode -P option  (Martin Schwenke, 2012-07-30, 3 files, -14/+42)
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit fa0f3cba5adaa38bed37dd8b121ad53e962a010d)
| * tools/onnode: Add -P option to push files to given nodes  (Martin Schwenke, 2012-07-30, 1 file, -13/+36)
| |   A list of files is given rather than a command. These files are pushed
| |   to the specified nodes.
| |
| |   Quoting is fragile/broken so filenames with spaces won't work - you
| |   win some, you lose some. :-)
| |
| |   All of the other onnode options should work together with this option.
| |
| |   Signed-off-by: Martin Schwenke <martin@meltin.net>
| |   (This used to be ctdb commit aed9b98ddbbf3e81de4f7257a10676565f7d7507)