path: root/ctdb
Commit message    Author    Age    Files    Lines
...
| * | libctdb: test: logging enhancement    Rusty Russell    2010-06-21    4    -20/+35
| | |
| | |     Make children log through a pipe to the parent, which then spits it
| | |     out only if the child has a problem.
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 8ac006cf6c6cbfd3fe1606178eb0f0127d33f632)
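The logging scheme described above (children write into a pipe, the parent prints the log only when the child has a problem) can be sketched roughly as follows. This is a minimal Python illustration for a POSIX system, not the ctdb-test code; `run_quietly`, `log`, `healthy` and `broken` are hypothetical names chosen for the example.

```python
import os


def run_quietly(child_fn):
    """Run child_fn(log) in a forked child; return its log only on failure."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: log into the pipe and never return to the caller.
        os.close(read_fd)

        def log(msg):
            os.write(write_fd, (msg + "\n").encode())

        try:
            child_fn(log)
        except BaseException:
            os._exit(1)
        os._exit(0)

    # Parent: drain the pipe until the child closes its end, then reap it.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    text = b"".join(chunks).decode()
    # A healthy child's log is discarded; a failing child's log is kept.
    return text if status != 0 else ""


def healthy(log):
    log("connecting")


def broken(log):
    log("connecting")
    log("about to fail")
    raise RuntimeError("simulated crash")
```

Calling `run_quietly(healthy)` yields an empty string, while `run_quietly(broken)` yields the two logged lines, which the caller could then print.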
| * | libctdb: test infrastructure    Rusty Russell    2010-07-16    22    -0/+2487
| | |
| | |     This introduces 'ctdb-test', a program for testing libctdb. It takes
| | |     commands on standard input (with reduced functionality) or an input
| | |     file.
| | |
| | |     It still needs some cleaning up, but you can uncover a bug in libctdb
| | |     today simply by running a simple attachdb test:
| | |
| | |         $ ctdb-test tests/attachdb1.txt
| | |
| | |     It will print out a crash, and the path of successful and failed
| | |     operations which led to it:
| | |
| | |         ...
| | |         Child signalled 11 on failure path: [malloc]:1:S[socket]:1:S[connect]:1:S[malloc]:1:S[malloc]:1:S[malloc]:1:S[malloc]:4:S[malloc]:4:F
| | |
| | |     Feed that failure path into ctdb-test using --failpath (under a
| | |     debugger):
| | |
| | |         gdb --args ctdb-test tests/attachdb1.txt --failpath=[malloc]:1:S[socket]:1:S[connect]:1:S[malloc]:1:S[malloc]:1:S[malloc]:1:S[malloc]:4:S[malloc]:4:F
| | |
| | |     And you hit the exact error.
| | |
| | |     It is based on the fork-to-fail model of nfsim. The relevant parts are
| | |     from page 154 of the proceedings of 2005 Ottawa Linux Symposium
| | |     Volume II:
| | |         http://www.linuxsymposium.org/2005/linuxsymposium_procv2.pdf
| | |
| | |     Or our presentation of same (from slide 21):
| | |         http://ozlabs.org/~jk/projects/nfsim/nfsim.sxi
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit b4aab4199a57898877b6545a54f212087ed4b35a)
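As a rough illustration of the fork-to-fail idea mentioned above: at every call that could fail, the harness forks, the child takes the failure branch, and the parent waits for it and then carries on down the success branch. The sketch below is a simplified Python model for a POSIX system, not the ctdb-test or nfsim code. `FailInjector`, `explore` and `attach` are hypothetical names, each child injects only a single failure, and the path format drops the numeric field seen in the real output.

```python
import os


class FailInjector:
    """Fork at every failable call; the child explores the failure branch."""

    def __init__(self):
        self.path = []        # successful decisions so far, e.g. "[malloc]:S"
        self.in_child = False
        self.bad_paths = []   # failure paths on which a child died abnormally

    def should_fail(self, name):
        if self.in_child:
            return False      # assumption: one injected failure per child
        pid = os.fork()
        if pid == 0:
            self.in_child = True
            return True       # child: this call fails
        _, status = os.waitpid(pid, 0)
        if status != 0:       # child crashed or exited non-zero
            self.bad_paths.append("".join(self.path) + "[%s]:F" % name)
        self.path.append("[%s]:S" % name)
        return False          # parent: this call succeeds


def explore(test):
    """Run test(injector) and return the failure paths that broke it."""
    inj = FailInjector()
    try:
        test(inj)
    except BaseException:
        if inj.in_child:
            os._exit(1)       # report the crash to the parent
        raise
    if inj.in_child:
        os._exit(0)           # child handled its injected failure cleanly
    return inj.bad_paths


def attach(inj):
    """Toy code under test with one unchecked allocation."""
    ctx = None if inj.should_fail("malloc") else {}
    if ctx is None:
        return "ENOMEM"
    fd = None if inj.should_fail("socket") else 3
    if fd is None:
        return "EIO"
    name = None if inj.should_fail("malloc") else "db"
    return name.upper()       # bug: name is not checked for None
```

Here `explore(attach)` reports the single path on which the unchecked allocation fails. Replaying a recorded path, as --failpath does, would mean following the recorded S/F decisions instead of forking; that part is left out of this sketch.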
| * | libctdb: implement synchronous readrecordlock interface.    Rusty Russell    2010-06-21    3    -1/+57
| | |
| | |     Because this doesn't use a generic callback, it's not quite as trivial
| | |     as the other sync wrappers.
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 1f20b938d46d4fcd50d2b473c1ab8dc31d178d2d)
| * | libctdb: implement ctdb_disconnect and ctdb_detachdb    Rusty Russell    2010-06-18    4    -12/+89
| | |
| | |     These are important for testing, since we can easily tell if we leak
| | |     memory if there are outstanding allocations after calling these.
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 18a212aa40d0ff9ff59775c6fcf9dc973e991460)
| * | libctdb: fix io_elem resource leak on realloc failure.    Rusty Russell    2010-06-18    1    -2/+4
| | |
| | |     Found by nfsim. I knew about this, but as we stop when it happens
| | |     anyway I didn't fix it. But it bugs nfsim, so fix it.
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 936b02443d36306407d6a26e8037cf31e3190b32)
| * | libctdb: fix writerecord() to actually write the record.    Rusty Russell    2010-06-21    1    -0/+2
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 680ee6afaa89f21115a1bf33a8b9e7e92084a1a1)
| * | libctdb: ctdb_service() never returns < 0    Rusty Russell    2010-06-18    1    -1/+1
| | |
| | |     Found by ctdb-test.
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 0e8210f19edf2ae14154afb85d9b96951881f31f)
| * | libctdb: check ctdb_request_free & ctdb_cancel used appropriately.    Rusty Russell    2010-06-18    1    -0/+15
| | |
| | |     Since I made this mistake myself, we should check for it. We could
| | |     have one function that does both, but from a user's point of view they
| | |     are very different and it's quite possibly a bug if they think the
| | |     request is finished/unfinished when it's not.
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 70f6ed2634fb10749cdad3deffa96a1aa439c235)
| * | libctdb: synchronous should be using ctdb_cancel to kill unfinished requests.    Rusty Russell    2010-06-18    1    -2/+6
| | |
| | |     Found by ctdb-test.
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit cd6b2f46075bfb64561496960af7fc2e95500e52)
| * | libctdb: fix uninitialized field usage on ctdb_attach failure path    Rusty Russell    2010-06-18    1    -1/+1
| | |
| | |     Found by ctdb-test.
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 54c1036090d930c19231038ca861297153c1d0cf)
| * | libctdb: removed unused lock field from struct ctdb_db    Rusty Russell    2010-06-18    1    -3/+0
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 256653a223c48ed932ce85f89fc2c2dda14f8c27)
* | | Correctly set docdir    Volker Lendecke    2010-08-16    1    -1/+1
| | |
| | |     (This used to be ctdb commit a69916d0687309766b0014dc9cee6a966aaa89da)
* | | tdb: workaround starvation problem in locking entire database.    Rusty Russell    2010-08-16    2    -18/+70
| | |
| | |     (Imported from SAMBA 11ab43084b10cf53b530cdc3a6036c898b79ca38)
| | |
| | |     We saw tdb_lockall() take 71 seconds under heavy load; this is because
| | |     Linux (at least) doesn't prevent new small locks being obtained while
| | |     we're waiting for a big lock.
| | |
| | |     The workaround is to do divide and conquer using non-blocking
| | |     chainlocks: if we get down to a single chain we block. Using a simple
| | |     test program where children did "hold lock for 100ms, sleep for
| | |     1 second" the time to do tdb_lockall() dropped significantly. There
| | |     are ln(hashsize) locks taken in the contended case, but that's slow
| | |     anyway.
| | |
| | |     More analysis is given in my blog at http://rusty.ozlabs.org/?p=120
| | |
| | |     This may also help transactions, though in that case it's the initial
| | |     read lock which uses this gradual locking routine; the
| | |     update-to-write-lock code is separate and still tries to update in
| | |     one go.
| | |
| | |     Even though ABI doesn't change, minor version bumped so behavior
| | |     change can be easily detected.
| | |
| | |     CQ:S1018154
| | |
| | |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |     (This used to be ctdb commit 9ec0009443a0ac4187ce5212a5143689daa58a02)
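The divide-and-conquer strategy described above can be modelled in a few lines: try a non-blocking lock on the whole range; if that fails, split the range in half and recurse; only block once the range is down to a single chain. The Python below is a simulation under stated assumptions, not the tdb code. `lock_gradual`, `try_lock` and `wait_lock` are hypothetical names, and the "locks" are plain sets and lists rather than fcntl byte-range locks.

```python
def lock_gradual(start, length, try_lock, wait_lock):
    """Lock chains [start, start + length), blocking only on single chains."""
    if try_lock(start, length):
        return                      # got the whole range without waiting
    if length == 1:
        wait_lock(start, 1)         # down to one chain: block for it
        return
    half = length // 2
    lock_gradual(start, half, try_lock, wait_lock)
    lock_gradual(start + half, length - half, try_lock, wait_lock)


# Simulated contention: another process currently holds chain 5.
busy = {5}
held = []        # (start, length) ranges we have locked, in order
blocked_on = []  # chains we had to wait for


def try_lock(start, length):
    """Non-blocking attempt: fails if any chain in the range is busy."""
    if any(chain in busy for chain in range(start, start + length)):
        return False
    held.append((start, length))
    return True


def wait_lock(start, length):
    """Blocking attempt: pretend the other holder eventually lets go."""
    blocked_on.append(start)
    busy.discard(start)
    held.append((start, length))


lock_gradual(0, 8, try_lock, wait_lock)
```

In this run the only blocking wait is on the one contended chain; every other chain is picked up by a non-blocking attempt on a larger range.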
* | | tdb: Fix tdb_check() to work with read-only tdb databases.    Rusty Russell    2010-08-16    1    -3/+3
| | |
| | |     (Import from SAMBA bc1c82ea137e1bf6cb55139a666c56ebb2226b23)
| | |
| | |     The function tdb_lockall() uses F_WRLCK internally, which doesn't work
| | |     on a fd opened with O_RDONLY. Use tdb_lockall_read() instead.
| | |
| | |     (This used to be ctdb commit a5db1122ec48d7e7384066848457c850c1a6cf3c)
* | | tdb: remove unused variable in tdb_new_database().    Rusty Russell    2010-08-16    1    -1/+0
| | |
| | |     (Imported from SAMBA 2eab1d7fdcb54f9ec27431ca4858eb64cb1bd835)
| | |     (This used to be ctdb commit 52a87e608d0406aee9df99f7ac3ce16e834b520b)
* | | tdb: fix short write logic in tdb_new_database    Rusty Russell    2010-08-16    3    -17/+17
| |/
|/|
| |     Commit 207a213c/24fed55d purported to fix the problem of signals during
| |     tdb_new_database (which could cause a spurious short write, hence a
| |     failure). However, the code is wrong: newdb+written is not correct.
| |
| |     Fix this by introducing a general tdb_write_all() and using it here and
| |     in the tracing code.
| |
| |     Cc: Stefan Metzmacher <metze@samba.org>
| |     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| |     (This used to be ctdb commit 27ba0e5a6681063225df7244a85aa304c51c6948)
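For readers unfamiliar with short writes: a single write call may accept only part of the buffer, so the caller has to loop and advance by the number of bytes actually written. The sketch below shows that general pattern in Python. It is not the tdb_write_all() source; `write_all` and `stingy_write` are hypothetical names, and the injectable `write` parameter exists only so a short write can be simulated.

```python
import os


def write_all(fd, data, write=os.write):
    """Keep writing until every byte of data has been accepted."""
    view = memoryview(data)
    while len(view) > 0:
        written = write(fd, view)
        # Advance by the byte count actually written, not by a guess.
        view = view[written:]


def stingy_write(fd, buf):
    """Simulate short writes: accept at most 3 bytes per call."""
    return os.write(fd, buf[:3])


read_fd, write_fd = os.pipe()
write_all(write_fd, b"tdb header bytes", write=stingy_write)
os.close(write_fd)
result = os.read(read_fd, 100)
os.close(read_fd)
```

Even though each simulated call accepts only three bytes, the reader sees the complete buffer.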
* | Create a new command "ctdb sync" that is just an alias for "ctdb ipreallocate"    Ronnie Sahlberg    2010-08-10    1    -0/+2
| |
| |     (This used to be ctdb commit eededd592c92c59b435f0046989b2327fcc280b1)
* | Update a log message to reflect that this no longer only happens    Ronnie Sahlberg    2010-08-10    1    -1/+1
| |
| |     when trying/failing to ban a node.
| |
| |     (This used to be ctdb commit dc6b143c4785449e8c4ef7a46bf16adba750ab56)
* | Merge remote branch 'martins/master'    Ronnie Sahlberg    2010-08-09    10    -195/+214
|\ \
| | |     (This used to be ctdb commit 9ca09ee9129b787428a2ceac9731b12166dc8718)
| * | Add some command-line options to ctdb_diagnostics.    Martin Schwenke    2010-08-06    1    -45/+118
| | |
| | |     In some contexts ctdb_diagnostics generates too many errors when it is
| | |     run on heterogeneous and machine-configured clusters. In some clusters
| | |     some nodes are expected to be differently configured, and
| | |     machine-generated configuration files can have comments containing
| | |     timestamps.
| | |
| | |     This adds some command-line options that can be used to reduce the
| | |     number of errors reported:
| | |
| | |         -n <nodes>  Comma separated list of nodes to operate on
| | |         -c          Ignore comment lines (starting with '#') in file comparisons
| | |         -w          Ignore whitespace in file comparisons
| | |         --no-ads    Do not use commands that assume an Active Directory Server
| | |
| | |     The -n option simply allows ctdb_diagnostics to operate on a subset of
| | |     nodes, avoiding file comparisons with and data collection on nodes
| | |     that are differently configured. For file comparisons, instead of
| | |     showing each file on the current node and then comparing other nodes
| | |     to that file, the file from the first (available or requested) node is
| | |     shown and then other nodes are compared to that. That has resulted in
| | |     changes in output: ctdb_diagnostics no longer prints messages
| | |     referencing the current node.
| | |
| | |     -c and -w are used to weaken comparisons between configuration files.
| | |
| | |     --no-ads can be used to avoid running ADS-specific commands if a
| | |     cluster uses LDAP (or other non-ADS) configuration.
| | |
| | |     This also fixes a number of bugs in related code:
| | |
| | |     * A call to onnode was losing the >> NODE ... << lines because they
| | |       now go to stderr. This was changed in onnode long ago but
| | |       ctdb_diagnostics was never updated to match.
| | |
| | |     * ctdb_diagnostics was counting lines in /etc/ctdb/nodes to determine
| | |       what nodes to operate on. For some time the nodes file has supported
| | |       syntax that makes this invalid. "ctdb listnodes -Y" is now used to
| | |       list available nodes.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit 36c8244a0f68c7c9bbee40982f230e9d14d3c0ea)
| * | Test suite: remove unnecessary verbosity from enable/continue tests.    Martin Schwenke    2010-08-05    2    -12/+2
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit 69c95b2a42f55b80cd8d91a90ab55166f964163b)
| * | Test suite: Fix typo in continue test.    Martin Schwenke    2010-08-05    1    -1/+1
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit c2bce140da7c4b118394ee77bb9d0348d27e7e95)
| * | Test suite: weaken ctdb continue/enable tests for non-deterministic IPs.    Martin Schwenke    2010-08-05    3    -13/+32
| | |
| | |     These tests currently wait for the old IPs to fail back to the test
| | |     node. This isn't guaranteed with DeterministicIPs disabled. This
| | |     changes those tests to wait until the test node gets at least 1 IP
| | |     assigned.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit e9b3f5b1b51d541a911a27eb4348b368f28d185e)
| * | initscript: wait until we can ping ctdbd before setting tunables.    Martin Schwenke    2010-08-05    1    -5/+31
| | |
| | |     Currently we do a "sleep 1" after starting and before running
| | |     set_ctdb_variables to set the tunables. This is too arbitrary and
| | |     might fail if the system is heavily loaded. This, for example, could
| | |     result in some nodes running with DeterministicIPs and some without,
| | |     in which case a different IP allocation algorithm would run depending
| | |     on who is the recmaster!
| | |
| | |     This makes the start function wait until "ctdb ping" succeeds (with
| | |     10 second timeout) before trying to run set_ctdb_variables. If a
| | |     timeout occurs then the start function attempts to kill ctdbd before
| | |     exiting with a failure.
| | |
| | |     It also cleans up the status reporting code for Red Hat and SUSE so
| | |     that the final status code is reported. Currently there are cases
| | |     where a correct status is prematurely reported before a failure
| | |     occurs.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit cdcd05662a30b51caaeeab4ac44138cac2474e0a)
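The change from a fixed "sleep 1" to "poll until the daemon answers, with a timeout" is a general pattern. The initscript itself is shell; the sketch below only illustrates the polling logic in Python. `wait_until` and its parameters are hypothetical names, and the `clock` and `sleep` arguments are injectable purely so the loop can be exercised without real delays.

```python
import time


def wait_until(check, timeout=10.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True      # e.g. the daemon answered the ping
        if clock() >= deadline:
            return False     # caller should clean up and report failure
        sleep(interval)


# Hypothetical usage (not executed here), assuming a ctdb binary on PATH:
#   import subprocess
#   ready = wait_until(lambda: subprocess.call(["ctdb", "ping"]) == 0)
#   if not ready:
#       ...  # stop the daemon and exit with a failure status
```

The caller decides what a timeout means; in the commit above, the start function kills ctdbd and exits with a failure.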
| * | Test suite - make the ctdb_fetch test cope with "Reqid wrap!" messages.    Martin Schwenke    2010-08-05    1    -1/+1
| | |
| | |     Recent CTDB versions notice the wrap and print this message. The test
| | |     needs to cope.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit b93b60ec96d02ce4f54921e85a5c5554d1fc0c55)
| * | Test suite: remove thaw/freeze tests.    Martin Schwenke    2010-08-05    2    -103/+0
| | |
| | |     They test debugging commands that no longer operate as expected.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit d33fa4d6557aab1938049f194c2de55f2c395bd2)
| * | Test suite - fix addip test.    Martin Schwenke    2010-08-04    1    -1/+1
| | |
| | |     The test currently checks that all existing IPs plus the newly added
| | |     IP are on the test node after "ctdb addip" is run. With
| | |     DeterministicIPs enabled, if the new IP is "before" other IPs then the
| | |     other IPs may be shuffled by the deterministic IPs modulo algorithm.
| | |     This will happen on the 1st recovery after the move. Sometimes this
| | |     recovery happens before we get the list of IPs to check and sometimes
| | |     after, so the test is racy.
| | |
| | |     The fix is to simply check for the presence of the new IP and not
| | |     worry about the others. This reduces whatever value this test had...
| | |     but you can't have everything.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit 1ef7c8e64c7a39330be09ae4d00b70238133e0b5)
| * | Merge remote branch 'martins/master'    Martin Schwenke    2010-08-04    29    -150/+1148
| |\ \
| | | |     (This used to be ctdb commit 5d9e4b6ee7d2b5290a74e7be79bdf51a43b72f43)
| | * | Testing: IP allocation simulation - add option to change odds of a failure.    Martin Schwenke    2010-08-03    1    -3/+6
| | | |
| | | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | | |     (This used to be ctdb commit b2a2e301025d7fbfe5eeaac436693cde6d404490)
| | * | Testing: IP allocation simulation - clean up usage message.    Martin Schwenke    2010-08-03    1    -8/+8
| | | |
| | | |     Group options better and make the language consistent between
| | | |     options.
| | | |
| | | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | | |     (This used to be ctdb commit bc38c17e4115fae00c89d00537fdcfe621111b37)
| | * | Testing: IP allocation simulation - print maximum number of unhealthy nodes.    Martin Schwenke    2010-08-03    1    -0/+7
| | | |
| | | |     This can imply something about imbalance.
| | | |
| | | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | | |     (This used to be ctdb commit ecb80e2b6be9326708d1fc87ad3028c6836d5858)
| | * | Testing: IP allocation simulation - improve help for options.    Martin Schwenke    2010-08-03    1    -4/+4
| | | |
| | | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | | |     (This used to be ctdb commit 058501b92f602e7d2240d1cb08ed78a807564c48)
| * | | Test suite - try to make addip test more reliable and add some debugging.    Martin Schwenke    2010-08-04    1    -8/+12
| | | |
| | | |     This test is failing in some situations. The "ctdb addip" command
| | | |     works but the IP never appears in the "ctdb ip" output.
| | | |
| | | |     Try restricting the last octet to be between 101 and 199. At the
| | | |     moment addresses like 10.0.2.1 are being chosen and these are often
| | | |     the address of the host machine in autocluster configurations... so
| | | |     might cause weirdness.
| | | |
| | | |     Also add some debugging if checking for the IP address times out.
| | | |
| | | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | | |     (This used to be ctdb commit ae52cb63756bc60de8d32e01bac5d70975a1c7a0)
| * | | Test suite: handle extra lines in statistics output.    Martin Schwenke    2010-07-26    1    -1/+1
| | | |
| | | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | | |     (This used to be ctdb commit a476a56da2219c1047081032595c045f65f8ad3f)
| * | | Test suite: handle change to disconnected node error message.    Martin Schwenke    2010-07-26    1    -1/+1
| | | |
| | | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | | |     (This used to be ctdb commit d75d7b49cf729bace820b3225e5c6d069bbcbc53)
* | | | update the docs to say that ctdb freeze is no more    Ronnie Sahlberg    2010-08-05    1    -8/+3
| | | |
| | | |     (This used to be ctdb commit 79ef9909dfa0904d789c69eb6b9c80e8908a1100)
* | | | remove the "ctdb freeze" debugging command    Ronnie Sahlberg    2010-08-05    2    -36/+0
| |/ /
|/| |
| | |     (This used to be ctdb commit bd005b987255eb65cd3826dce984281ee757daf6)
* | | Testing: IP allocation simulation - make usage/failure more obvious.    Martin Schwenke    2010-08-02    3    -58/+7
| | |
| | |     Tweak the usage message for -g option. Print an error if no node
| | |     groups are defined, instead of a curious Python error.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit 8b883eb9346b8278d268e35b56ac680cd9526b97)
* | | Testing: IP allocation simulation - rename an example to node_group_extra.py.    Martin Schwenke    2010-08-02    1    -0/+31
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit 974f849df0aca2cfedb38fa815894955e32803a8)
* | | Testing: IP allocation simulation - rename an example to node_group_simple.py.    Martin Schwenke    2010-08-02    1    -0/+26
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit 0a2a5602233a8208e2729192e50d816faed0151a)
* | | Testing: IP allocation simulation - add general node group example.    Martin Schwenke    2010-08-02    1    -0/+40
| | |
| | |     This allows node pool configuration to be specified on the
| | |     command-line.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit d382d9023928f75f360a115ae1e9c1036423416e)
* | | Testing: IP allocation simulation - update options processing in examples.    Martin Schwenke    2010-08-02    4    -5/+12
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit a65ca1a71386f40080dd553756f3600d3b20d523)
* | | Testing: IP allocation simulation - Update README.    Martin Schwenke    2010-08-02    1    -0/+3
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit ed64b7f2b3cd920bb0f5dfd7f64ed0afc0b99fc1)
* | | Testing: IP allocation simulation - fix nondeterminism in do_something_random().    Martin Schwenke    2010-08-02    1    -4/+6
| | |
| | |     The current code makes random choices from unsorted lists. This
| | |     ensures the lists are sorted. Also, make the code easier to read by
| | |     doing the random selection from lists of PNNs rather than lists of
| | |     Node objects.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit a01244499dc3567f5aa934b1864b9bc183a6c242)
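Why sorting matters here: a seeded random generator picks an index, so the same seed only yields the same element if the candidates are always in the same order. The sketch below illustrates the point in Python; `pick_pnn` is a hypothetical helper, not a function from the simulation code.

```python
import random


def pick_pnn(pnns, rng):
    """Choose one PNN reproducibly: sort first, then let the RNG pick."""
    return rng.choice(sorted(pnns))


# The same candidates arriving in different orders, with the same seed,
# now lead to the same choice.
first = pick_pnn([3, 1, 2, 0], random.Random(42))
second = pick_pnn([0, 2, 3, 1], random.Random(42))
```

Selecting from plain integers such as PNNs also sidesteps the question of how to order richer objects before choosing.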
* | | Testing: IP allocation simulation - Tweak options handling and Cluster.diff().    Martin Schwenke    2010-08-02    1    -31/+21
| | |
| | |     process_args() must now be called by programs importing this module.
| | |     Options are put into global variable "options", which can be
| | |     referenced using "ctdb_takeover.options". Can now pass extra option
| | |     specifications to process_args().
| | |
| | |     Remove global variable prev and make it a Cluster object variable.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit a32298e7bc819694518e859f100f9444ff5663cd)
* | | Testing: IP allocation simulation - update copyright message.    Martin Schwenke    2010-08-02    1    -2/+4
| | |
| | |     There's a lot of new code here, so let's make the copyright message
| | |     make sense.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit e6e56e5989def6704b116e806c1f261c7f3fc03f)
* | | Testing: IP allocation simulation - add command line option for random seed.    Martin Schwenke    2010-08-01    1    -0/+5
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit 8362029c7cfc1041e46ee2116aa5cade6edce435)
* | | Testing: IP allocation simulation - save some warnings for verbose mode.    Martin Schwenke    2010-08-01    1    -2/+2
| | |
| | |     We don't need to see warnings about unallocatable IPs unless we're in
| | |     verbose mode. Can now be run with -n (and without -v or -d) to see
| | |     just the statistics.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit 55370936ac5def5ebf138910388a2ddc2df9c20f)
* | | Testing: IP allocation simulation prints final imbalance in statistics.    Martin Schwenke    2010-08-01    1    -0/+1
| | |
| | |     This is useful to know. When things get unbalanced they tend to stay
| | |     that way.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit a40faa2096effc2657ac05b729f3259bbb2e1fed)
* | | Testing: In IP allocation simulation count total number of events.    Martin Schwenke    2010-08-01    1    -2/+6
| | |
| | |     This starts at -1 because we always have to do the initial allocation.
| | |     No longer print event number for each event by default, only when
| | |     verbose is enabled.
| | |
| | |     Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |     (This used to be ctdb commit c9a761726d141bcaa8ba7851150f71a8130b473a)