Commit message    Author    Age    Files    Lines
...
| * tools/ctdb: Remove extra header from natgwlist -Y output    Martin Schwenke    2012-10-18    1    -4/+0
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 59520c9785d113ad5063eb5fbe42a9efc7e30076)
| * recoverd: Verifying local IPs should only check for unhosted available IPs    Martin Schwenke    2012-10-18    1    -17/+34
| | Currently it checks for unhosted IPs among the known IPs rather than available IPs. This means that a takeover run can be flagged even when that takeover run will be unable to assign a known, unhosted IP.
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 3cc878bc97fdac764a60ed805f64d649eaab06e8)
| * Revert "Eventscripts - add facility to 10.interface to delete unmanaged IPs"    Martin Schwenke    2012-10-18    2    -51/+0
| | This reverts commit 88f88d86b0d08240f749fb721b8c401c2eeb1099.
| | This is dangerous and, on reflection, I can't see it being useful. There are often permanent IPs on interfaces that CTDB shares with its public IPs.
| | (This used to be ctdb commit 16aba4eb620844626a1c71c58b51658caf44dea6)
| * Eventscripts: "recovered" event should not fail on NATGW failure    Martin Schwenke    2012-10-18    1    -5/+25
| | The recovery process has no protection against the "recovered" event failing, so this can cause a recovery loop.
| | Instead of failing the "recovered" event, add a "monitor" event and fail that instead. In this case the failure semantics are well defined.
| | A separate patch should ban nodes if the "recovered" event fails for an unknown reason.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit eaa7c165f58abd7e259c37d76b7dd37c91e13d9f)
| * Logging: Map TEVENT_DEBUG_FATAL to DEBUG_CRIT    Martin Schwenke    2012-10-18    1    -2/+2
| | This is currently mapped to DEBUG_EMERG. CTDB really has no business logging anything at EMERG level since the whole system is not about to abort or catch fire. EMERG causes the message to appear on the console and on every terminal. That's a bit overzealous!
| | There would be very few situations where logs are being filtered at level below ERROR, so CRIT should certainly suffice.
| | The trigger for this was curious messages saying "No event for <n> seconds!" logged in a user's terminal.
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 0e56e2dad1861892aa8ba59494ad244f2498314e)
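The level change described in this commit can be pictured with a minimal sketch. The enums below are local stand-ins, not the real tevent or CTDB definitions, and every `sk_`/`SK_` name is hypothetical. Only the FATAL to CRIT mapping comes from the commit message; the other cases are assumed for illustration.

```c
/* Stand-in log levels for illustration only; the real values are defined
 * by tevent and CTDB and are not reproduced here. */
enum sk_tevent_level {
	SK_TEVENT_FATAL,
	SK_TEVENT_ERROR,
	SK_TEVENT_WARNING,
	SK_TEVENT_TRACE
};

enum sk_debug_level {
	SK_DEBUG_EMERG,   /* would reach the console and every terminal */
	SK_DEBUG_CRIT,
	SK_DEBUG_ERR,
	SK_DEBUG_WARNING,
	SK_DEBUG_DEBUG
};

/* Translate a tevent severity into a daemon log level. */
enum sk_debug_level sk_map_tevent_level(enum sk_tevent_level level)
{
	switch (level) {
	case SK_TEVENT_FATAL:
		return SK_DEBUG_CRIT;    /* from the commit: no longer EMERG */
	case SK_TEVENT_ERROR:
		return SK_DEBUG_ERR;     /* assumed mapping */
	case SK_TEVENT_WARNING:
		return SK_DEBUG_WARNING; /* assumed mapping */
	case SK_TEVENT_TRACE:
		return SK_DEBUG_DEBUG;   /* assumed mapping */
	}
	return SK_DEBUG_DEBUG;
}
```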
| * common: Debug ctdb_addr_to_str() using new function ctdb_external_trace()    Martin Schwenke    2012-10-18    5    -0/+31
| | We've seen this function report "Unknown family, 0" and then CTDB disappeared without a trace. If we can reproduce it then this might help us to debug it.
| | The idea is that you do something like the following in /etc/sysconfig/ctdb:
| |     export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"
| | When we hit this error then we call out to gcore to get a core file so we can do forensics. This might block CTDB for a few seconds.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)
| * config/functions: fix a comment    Michael Adam    2012-10-17    1    -1/+1
| | ctdb_check_counter_limits does not fail but succeeds if count >= limit
| | Signed-off-by: Michael Adam <obnox@samba.org>
| | (This used to be ctdb commit af540ef728303b4a0a188b17c695e9aefab34489)
| * doc: Add info about execute permissions on event scripts    Amitay Isaacs    2012-10-17    1    -0/+2
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| | (This used to be ctdb commit 25d886060b138bc5e78fe93d7bebe3990264f29d)
| * doc: Fix documentation for setup event    Amitay Isaacs    2012-10-17    1    -5/+3
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| | (This used to be ctdb commit 36d25e96a2f8ae1461c5a708a2922f0475a39900)
| * scripts: Remove duplicate code from init script to set tunables    Amitay Isaacs    2012-10-17    2    -21/+30
| | The tunable variables defined in the CTDB configuration file are currently set up from the init script as well as part of the "setup" event in the 00.ctdb eventscript. Remove the duplication of this code and set tunable variables only from the setup event.
| | During the "setup" event, it's possible that ctdb tool commands can time out if the CTDB daemon is not ready. To guard against this, wait until the "ctdb ping" command succeeds before executing any other ctdb tool commands.
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| | (This used to be ctdb commit 632c1b9c1cc2e242376358ce49fd2022b3f27aa2)
| * doc: Fix the hyperlink for "Testing CTDB" page    Amitay Isaacs    2012-10-17    1    -1/+1
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| | (This used to be ctdb commit 08dbd9c7958f9a0ee3de314d49523d32e4be135c)
| * tests/eventscripts: add unit tests for policy routing reconfigure    Martin Schwenke    2012-10-11    4    -0/+77
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit bd4ff176387372b1c233373c0bc8ced523fc9670)
| * tests/eventscripts: add extra infrastructure for policy routing tests    Martin Schwenke    2012-10-11    16    -317/+170
| | Less copying and pasting is a good thing...
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 7d4b8cce96f33fff647a0c9d259c121dfc8403e9)
| * Eventscripts: Add support for "reconfigure" pseudo-event for policy routing    Martin Schwenke    2012-10-11    1    -2/+17
| | This rebuilds all policy routes and can be used if the configuration changes.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit c185ffd2822fcee26d07398464c59b66c61f53fa)
| * recoverd: Track failure of "recovered" event, banning culprits    Martin Schwenke    2012-10-11    1    -29/+42
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 9550c497e6d6ef5ee44826c4bd9ed5ad65174263)
| * recoverd: When starting a takeover run disable IP verification    Martin Schwenke    2012-10-11    2    -0/+20
| | Disable for TakeoverTimeout seconds. Otherwise the recovery daemon can get overzealous and start trying to add/delete addresses that it thinks are missing but where the eventscript just hasn't finished.
| | This didn't use to matter so much but it is more important now that concurrent takeip/releaseip/updateip generate errors - we want to avoid spamming the log.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 56fcee3c7730cb12fa666072d5400949af6e5f7c)
| * ctdbd: Stop takeovers and releases from colliding in mid-air    Martin Schwenke    2012-10-11    2    -7/+74
| | There's a race here where release and takeover events for an IP can run at the same time. For example, a "ctdb deleteip" and a takeover initiated by the recovery daemon. The timeline is as follows:
| | 1. The release code registers a callback to update the VNN. The callback is executed *after* the eventscripts run the releaseip event.
| | 2. The release code calls the eventscripts for the releaseip event, removing IP from its interface.
| |    The takeover code "updates" the VNN saying that IP is on some iface.... even if/though the address is already there.
| | 3. The release callback runs, removing the iface associated with IP in the VNN.
| |    The takeover code calls the eventscripts for the takeip event, adding IP to an interface.
| | As a result, CTDB doesn't think it should be hosting IP but IP is on an interface. The recovery daemon fixes this later... but it shouldn't happen.
| | This patch can cause some additional noise in the logs:
| |     Release of IP 10.0.2.133/24 on interface eth2 node:2
| |     recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it.
| |     Release of IP 10.0.2.133/24 rejected update for this IP already in flight
| |     recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed
| |     recoverd:Failed to release local ip address
| | In this case the node has started releasing an IP when the recovery daemon notices the address is still hosted and initiates another release. This noise is harmless but annoying.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)
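The commit message does not say how the patch serialises the two operations, but the "rejected update for this IP already in flight" log line suggests a per-IP marker. The following is a minimal sketch under that assumption; the struct and all `sk_` names are hypothetical and do not mirror the real ctdbd data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-IP state. */
struct sk_vnn {
	const char *iface;      /* interface hosting the IP, NULL if none */
	bool update_in_flight;  /* a takeip/releaseip/updateip is running */
};

/* Try to start a release or takeover for this IP.
 * Returns 0 if the operation may proceed, -1 if another is in flight. */
int sk_begin_update(struct sk_vnn *vnn)
{
	if (vnn->update_in_flight) {
		return -1; /* caller would log the rejection and fail the control */
	}
	vnn->update_in_flight = true;
	return 0;
}

/* Callback run after the releaseip eventscript has finished. */
void sk_release_done(struct sk_vnn *vnn)
{
	vnn->iface = NULL;
	vnn->update_in_flight = false;
}

/* Callback run after the takeip eventscript has finished. */
void sk_takeover_done(struct sk_vnn *vnn, const char *iface)
{
	vnn->iface = iface;
	vnn->update_in_flight = false;
}
```

With a guard of this kind, a takeover that arrives between steps 1 and 3 of the timeline is refused instead of interleaving with the release, at the cost of the extra log noise the commit describes.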
| * ctdbd: New tunable NoIPTakeoverOnDisabled    Martin Schwenke    2012-10-11    6    -89/+115
| | Stops the behaviour where unhealthy nodes can host IPs when there are no healthy nodes. Set this to 1 when an immediate complete outage is preferred when all nodes are unhealthy. The alternative (i.e. default) can lead to undefined behaviour when the shared filesystem is unavailable.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)
| * Eventscripts: Add service-start and service-stop pseudo-events    Martin Schwenke    2012-10-10    1    -2/+28
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit be4ad110ede9981b181ac28f31ffd855a879d5df)
| * ctdbd: Avoid unnecessary updateip event    Martin Schwenke    2012-10-10    1    -5/+5
| | The existing code makes one fatally bad assumption: vnn->iface->references can never be -1 (or max-uint32_t in this case).
| | Right now the reference counting is broken so a reference count of -1 is possible and causes a spurious updateip when vnn->iface is the same as best_iface. This can occur frequently because we get a lot of redundant takeovers, especially when each IP can only be hosted on one interface.
| | This makes the code much more defensive by noting that when best_iface is the same as vnn->iface there is never a need for an updateip event. This effectively neuters the updateip code path when IPs can only be hosted by a single interface.
| | This should obsolete 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 7054e4ded59c6b8f254dcfefaef64da05f25aecd)
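The defensive check described in this commit amounts to comparing the two interface pointers before looking at anything else. This is a minimal sketch with hypothetical `sk_` types rather than the real ctdbd structures; any further criteria the real code applies when the interfaces differ are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interface record. An unsigned count decremented past
 * zero wraps to UINT32_MAX, which is the "-1" the commit refers to. */
struct sk_iface {
	const char *name;
	uint32_t references;
};

/* Decide whether moving an IP needs an updateip event.
 * The reference counts are deliberately not consulted. */
bool sk_updateip_needed(const struct sk_iface *current,
			const struct sk_iface *best)
{
	if (current == best) {
		return false; /* same interface: never an updateip */
	}
	return true; /* simplification: real code may check more */
}
```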
| * Correct include for ctdb_protocol.h    Volker Lendecke    2012-10-09    1    -1/+1
| | With an old ctdb_protocol.h installed under /usr/local, ctdb will not compile because the <> form of include will find the header under /usr/local
| | (This used to be ctdb commit c4f5a58471b206e2287c7958c7f29c1f1c0626ac)
| * Revert "when creating/adding a public ip, set the initial interface to be the first interface specified"    Amitay Isaacs    2012-10-07    1    -3/+0
| | This reverts commit 4308935ba48ac7a29e7523315acf580019715f0f.
| | This fixes 16_ctdb_config_add_ip.sh test when run against local daemons. When running against local daemons, if the interface is assigned as soon as an IP is added, then takeover would never assign this IP address.
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| | (This used to be ctdb commit 06dfd13604d08910e07cbf927c338d7b9fce9a2f)
| * util: ctdb_fork() closes all sockets opened by the main daemon    Martin Schwenke    2012-10-05    2    -18/+24
| | Do some other housekeeping including stopping tevent.
| | Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 212298279557a2833ef0f81809b4a5cdac72ca02)
| * eventscripts: Auto-start/stop services in background    Martin Schwenke    2012-10-03    7    -25/+65
| | If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done in the background with logging.
| | Fix some unit tests for samba and winbind.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)
| * Eventscripts: split 50.samba into 49.winbind and 50.samba    Martin Schwenke    2012-10-03    12    -158/+225
| | winbind and samba can be separately managed. This makes the service starting and stopping code way too complicated, and even adds a small amount of complexity to the monitoring code. The sensible option is to split this eventscript in two.
| | There are two potentially backward incompatible changes here:
| | * Functionality has been removed that allowed 50.samba to manage winbind when CTDB_MANAGES_WINBIND was unset but the smb.conf "security" parameter was set to "ADS" or "DOMAIN". Maintaining this functionality would have required moving the testparm-related code to the functions file, deciding where the cache file should go, and then calling it from both 49.winbind and 50.samba. This feature wasn't of great value and asking administrators to set an extra variable in exchange for code simplicity seems like a reasonable deal.
| | * External code will need to be changed if it calls 50.samba directly with winbind-related expectations. This is fairly obvious!
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 34535ae64420926b9a3bf7d453fed4e6f4c90115)
| * Initscript: Kill any existing ctdbd processes if the ping succeeds    Martin Schwenke    2012-10-02    1    -0/+6
| | Initialising a new ctdbd will destroy the Unix domain socket so existing processes will be useless anyway.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 043ef77086797a703aec436a26a05c56a1bcbf2b)
| * tools/ctdb: Free the event context    Martin Schwenke    2012-10-02    1    -0/+1
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit dc2a8c638bd74b9f1dd75339cd2ae2f32ffa18a8)
| * libctdb: Add comments to effect that some controls return result in status    Martin Schwenke    2012-10-02    1    -0/+3
| | These controls include:
| |     CTDB_CONTROL_GET_RECMODE
| |     CTDB_CONTROL_GET_RECMASTER
| |     CTDB_CONTROL_GET_PID
| |     CTDB_CONTROL_GET_PNN
| |     CTDB_CONTROL_PING
| |     CTDB_CONTROL_GET_DB_PRIORITY
| | In these cases the data field is empty.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit b89e959904d7d1b0e5525abd7789f5101537a46a)
| * tests/tool: New tests for natgwlist, getcapabilities, lvs, lvsmaster    Martin Schwenke    2012-09-28    11    -0/+353
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 6bd4feff7039138d435428eeded51975c44e567c)
| * tests/tool: New function setup_natgw() to setup $CTDB_NATGW_NODES    Martin Schwenke    2012-09-28    1    -0/+20
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 0f0aef21a1bb2d88a8c184ef70c718e0c91acdc3)
| * tools/ctdb: Clean up control_natgw()    Martin Schwenke    2012-09-28    1    -63/+69
| | * Factor out repeated code into new function find_natgw()
| | * Support both machine and human readable output
| | * Use libctdb
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit a56ec75edd1705b0539513d396d311f0e80a3bf5)
| * tools/ctdb: Convert some commands over to libctdb    Martin Schwenke    2012-09-28    1    -19/+24
| | control_getcapabilities(), control_lvs(), control_lvsmaster() updated to use ctdb_getcapabilities(), ctdb_getnodemap() as appropriate.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit c30ec02615183ecf9b412ad415bf1abd859aec45)
| * tests: libctdb stubs initial ctdb_getcapabilities() implementation    Martin Schwenke    2012-09-28    1    -0/+7
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 81af67c6959fdbe0566e3f1a00e2be58dd268dc6)
| * tests: libctdb stubs must copy pointers rather than just returning them    Martin Schwenke    2012-09-28    1    -6/+25
| | Some code (e.g. NAT gateway code) modifies the returned result so was modifying the original.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit a3f15d2828325bbfba5bc5c0a30429e2ce572a44)
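The problem this commit addresses, a stub handing out a pointer to its own fixture data, can be shown with a small sketch. The node map layout and the `sk_` names are hypothetical and are not the libctdb stub code; the point is only that the caller receives a private copy.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixture type standing in for data a stub would return. */
struct sk_nodemap {
	uint32_t num;
	uint32_t flags[4];
};

/* The stub's own canned data, which callers must never be able to modify. */
struct sk_nodemap sk_fixture = { 2, { 0, 0, 0, 0 } };

/* Return a heap copy of the fixture; the caller owns it and frees it.
 * Returns NULL if allocation fails. */
struct sk_nodemap *sk_stub_getnodemap(void)
{
	struct sk_nodemap *copy = malloc(sizeof(*copy));
	if (copy == NULL) {
		return NULL;
	}
	memcpy(copy, &sk_fixture, sizeof(*copy));
	return copy;
}
```

A caller that rewrites flags in the returned map, as the NAT gateway code is said to do, then leaves the fixture unchanged for the next call.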
| * libctdb: add ctdb_getcapabilities()    Martin Schwenke    2012-09-28    5    -8/+106
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 140fafef23050d40d66f5b5558c7efcb78f80cd2)
| * tools/ctdb: Remove redundant filtering loop in control_natgwlist()    Martin Schwenke    2012-09-28    1    -3/+0
| | This used to catch trailing blank lines. However, these are caught just as effectively by the whitespace filtering in the loop below.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 7b75a3bb722dc86139b1a07a0100d08c34620b91)
| * tools/ctdb: natgwlist output is either human readable or machine readable    Martin Schwenke    2012-09-28    1    -12/+28
| | The first line is currently human readable and the rest is machine readable. This doesn't make sense. Do one or the other...
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit b29d5bbaa7048291c4b3a39bf12e04f0436f67da)
| * tools/ctdb: Factor out printing of the machine readable status header    Martin Schwenke    2012-09-28    1    -4/+8
| | It is already in 2 places and we might use it in another.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 12a0a7a208d1c8fa8991894200d1dc133f3a2d1a)
| * tools/ctdb: NAT gateway code should use CTDB_NATGW_NODES    Martin Schwenke    2012-09-28    1    -1/+1
| | ... not NATGW_NODES.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 2da7730dc06153173778ab14e228960e72ff8a86)
| * tests/eventscripts: New policy routing test with invalid table ID    Martin Schwenke    2012-09-11    1    -0/+41
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 93c97c3ba3ff714dfa0d056a91ff45010a6e2d66)
| * tests/eventscripts: Modify ip stub to simulate invalid table ID    Martin Schwenke    2012-09-11    1    -15/+36
| | This involves refactoring ip_route_check_table() into a new function ip_check_table() which takes the operation type (i.e. rule/route) as an argument.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit acdaa04079a9827885f32a7bc078d3365c89b474)
| * Eventscripts: Indent error when a route delete fails in 11.per_ip_routing    Martin Schwenke    2012-09-11    1    -2/+8
| | This puts it under the umbrella of the previous warning that should also have been printed.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 5c3be8f26dcde0b1b3d86928953e74d4a8b35958)
| * tests/eventscript: unit test for 13.per_ip_routing bogus route removal    Martin Schwenke    2012-09-11    1    -0/+47
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 6d41208074f0e9b56c585bca7eb39aaed653c4ca)
| * eventscripts: 13.per_ip_routing should remove bogus routes on ipreallocated    Martin Schwenke    2012-09-11    1    -0/+26
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit d0d0a6f19960f233224970b8d5d19b0e37222616)
| * tests/eventscripts: Add a policy routing unit test for "ip rule del" failure    Martin Schwenke    2012-09-11    1    -0/+38
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 0ce5b079f327aba55b62800ccb22d79976fac665)
| * eventscripts: Print a warning on failure to delete a routing rule    Martin Schwenke    2012-09-11    1    -4/+12
| | del_routing_for_ip() currently fails silently, which could hide real errors.
| | In add_routing_for_ip() we don't want to see any error when calling del_routing_for_ip(), since we don't expect the rule to be there.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit 30d69defa7e97ab5e3ba0492a27868dde2616494)
| * doc: Fix path string of /etc/sysconfig/ctdb file    Amitay Isaacs    2012-08-20    1    -1/+1
| | Signed-off-by: Amitay Isaacs <amitay@gmail.com>
| | (This used to be ctdb commit 49dd755fcd077c84eaf3d2fe5dd7757f5588d49c)
| * recoverd: All inactive nodes should yield recovery master role    Martin Schwenke    2012-08-08    1    -2/+2
| | Not just stopped nodes. In reality, this means that banned nodes will also yield, since nodes in the other inactive states won't be running a daemon.
| | This seems sensible since if another node notices that an inactive node is the recovery master then it will force an election anyway.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit fc18188b7b63eb0dafbc47e3abf80e306e1dfc31)
| * recoverd: An inactive node should not force recovery master elections    Martin Schwenke    2012-08-08    1    -2/+3
| | An inactive node can't become the recovery master. So if an inactive node notices that the recovery master is inactive, it shouldn't force an election for recovery master and nominate itself as a candidate. This can cause the recovery master to flip-flop between nodes when all nodes are inactive. If there is actually an active node then it will trigger the election.
| | This is fairly cosmetic but is a step along the way towards ironing out weirdness when all nodes are stopped.
| | Also, fix a related comment.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit e7dc10da3ced54ea9d719ad167ee42dcca8dce75)
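The rule stated in this commit reduces to a two-input decision. The sketch below uses a hypothetical function name and plain booleans in place of the node flags that the recovery daemon actually inspects.

```c
#include <stdbool.h>

/* Decide whether this node should force a recovery master election.
 * self_inactive:      this node is stopped, banned, or otherwise inactive
 * recmaster_inactive: the current recovery master is inactive */
bool sk_should_force_election(bool self_inactive, bool recmaster_inactive)
{
	if (self_inactive) {
		/* An inactive node cannot become recovery master, so it
		 * leaves the election to an active node, if there is one. */
		return false;
	}
	return recmaster_inactive;
}
```

When every node is inactive no election is forced at all, which is what avoids the flip-flopping the commit mentions.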
| * recoverd: main_loop() should not verify local IPs if node is stopped    Martin Schwenke    2012-08-08    1    -0/+8
| | Doing these checks is pointless and potentially causes unnecessary log messages.
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | (This used to be ctdb commit a0c30c820fd47d4f8620dc060c825be10754f5d1)