path: root/ctdb
Commit message  Author  Age  Files  Lines
* If the record is at the end of the database, pretending it has length 1 ↵Rusty Russell2009-08-041-4/+1
| might take us out-of-bounds. Only pretend to be length 1 for the malloc.
|
| (This used to be ctdb commit 6de2823f5f7976d4efa20761e518d6b67753f054)
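The fix described above can be illustrated with a minimal sketch. It is not the tdb code: `read_record` and its in-memory `db` image are hypothetical names used only to show that the bounds check should use the real length, while only the allocation is rounded up to one byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a tdb function): copy `len` bytes starting at
 * `off` out of an in-memory database image of `db_size` bytes.
 * Returns NULL if the range is out of bounds or allocation fails. */
unsigned char *read_record(const unsigned char *db, size_t db_size,
                           size_t off, size_t len)
{
    unsigned char *buf;

    assert(db != NULL);

    /* The bounds check uses the real length, so a zero-length record
     * sitting exactly at the end of the database is still accepted. */
    if (off > db_size || len > db_size - off)
        return NULL;

    /* Only the allocation pretends the record has length 1, because
     * malloc(0) is allowed to return NULL. */
    buf = malloc(len ? len : 1);
    if (buf == NULL)
        return NULL;

    memcpy(buf, db + off, len);
    return buf;
}
```

Had the bounds check also used the pretended length of 1, the zero-length record at the very end of the image would have been rejected as out of bounds.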
* Port from SAMBA tdb: commit 54a51839ea65aa788b18fce8de0ae4f9ba63e4e7 Author: ↵Rusty Russell2009-08-042-5/+14
| Rusty Russell <rusty@rustcorp.com.au>
| Date: Sat Jul 18 15:28:58 2009 +0930
|
| Make tdb transaction lock recursive (samba version)
|
| This patch replaces 6ed27edbcd3ba1893636a8072c8d7a621437daf7 and 1a416ff13ca7786f2e8d24c66addf00883e9cb12, which fixed the bug where traversals inside transactions would release the transaction lock early.
|
| This solution is more general, and solves the more minor symptom that nested traversals would also release the transaction lock early. (It was also suggested in Volker's comment in 6ed27ed).
|
| This patch also applies to ctdb, if the traverse.c part is removed (ctdb's tdb code never received the previous two fixes).
|
| Tested using the testsuite from ccan (adapted to the samba code). Thanks to Michael Adam for feedback.
|
| Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| Signed-off-by: Michael Adam <obnox@samba.org>
|
| commit 760104188d0d2ed96ec4a70138e6d0bf86d797ed
| Author: Rusty Russell <rusty@rustcorp.com.au>
| Date: Tue Jul 21 16:23:35 2009 +0930
|
| tdb: fix locking error
|
| 54a51839ea65aa788b18fce8de0ae4f9ba63e4e7 "Make tdb transaction lock recursive (samba version)" was broken: I "cleaned it up" and prevented it from ever unlocking.
|
| To see the problem:
| $ bin/tdbtorture -s 1248142523
| tdb_brlock failed (fd=3) at offset 8 rw_type=1 lck_type=14 len=1
| tdb_transaction_lock: failed to get transaction lock
| tdb_transaction_start failed: Resource deadlock avoided
|
| My testcase relied on the *count* being correct, which it was. Fixing that now.
|
| Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| Signed-off-by: Michael Adam <obnox@samba.org>
|
| (This used to be ctdb commit ce19658ba13272238058e9b9bc03e62f48b737c0)
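A recursive lock of the kind these two commits describe can be sketched with a nesting counter. This is an illustration only: `struct nest_lock` and both functions are hypothetical, and the `os_locked` flag stands in for the real fcntl-based lock. It shows both properties the messages mention: an inner release must not drop the lock, and the outermost release must actually drop it (a correct count alone is not enough).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock that must survive nested users. */
struct nest_lock {
    int count;      /* number of holders inside this process */
    int os_locked;  /* 1 while the underlying lock would be held */
};

int nest_lock_acquire(struct nest_lock *l)
{
    assert(l != NULL);
    if (l->count == 0)
        l->os_locked = 1;   /* real code would take the file lock here */
    l->count++;
    return 0;
}

int nest_lock_release(struct nest_lock *l)
{
    assert(l != NULL);
    if (l->count == 0)
        return -1;          /* unbalanced release */
    l->count--;
    if (l->count == 0)
        l->os_locked = 0;   /* only the outermost release unlocks */
    return 0;
}
```

A test that inspects only `count` would pass even if the final `os_locked = 0` were missing, which is the kind of gap the second commit message describes.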
* Port from SAMBA tdb: commit a6cc04a20089e8fbcce138c271961c37ddcd6c34 Author: ↵Rusty Russell2009-08-041-0/+3
| Andrew Tridgell <tridge@samba.org>
| Date: Mon Jun 1 13:13:07 2009 +1000
|
| overallocate all records by 25%
|
| This greatly reduces the fragmentation of databases where records tend to grow slowly by a small amount each time. The case where this is most seen is the ldb index records. Adding this overallocation reduced the size of the resulting database by more than 20x when running a test that adds 10k users.
|
| (This used to be ctdb commit e72974e5cefabc7035399d16633f727f868caa61)
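To see why 25% slack helps slowly growing records, here is a small self-contained model. It is a sketch under stated assumptions, not the tdb allocator: `overallocated_length` and `count_moves` are hypothetical names, and a "move" means the record no longer fits in its slot and has to be reallocated, leaving a hole behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Requested length plus 25% slack (integer arithmetic, rounded down). */
size_t overallocated_length(size_t length)
{
    assert(length <= SIZE_MAX - length / 4);
    return length + length / 4;
}

/* Count how often a record that grows by one byte per update outgrows
 * its slot. `slack` selects whether new slots are overallocated. */
size_t count_moves(size_t start, size_t steps, int slack)
{
    size_t capacity = slack ? overallocated_length(start) : start;
    size_t used = start;
    size_t moves = 0;

    for (size_t i = 0; i < steps; i++) {
        used++;
        if (used > capacity) {
            /* The record no longer fits: allocate a new slot. */
            capacity = slack ? overallocated_length(used) : used;
            moves++;
        }
    }
    return moves;
}
```

In this model a 100-byte record that grows by 100 bytes, one byte at a time, moves on every update without slack and only a handful of times with it.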
* Port from SAMBA tdb: commit a386173fa1c7c5bcc11ea9260d84b6c52c154b3d Author: ↵Rusty Russell2009-08-041-0/+12
| Andrew Tridgell <tridge@samba.org>
| Date: Mon Jun 1 13:11:39 2009 +1000
|
| auto-repack in transactions that expand the tdb
|
| The idea behind this is to recover from badly fragmented free lists. Choosing the point where the file expands is fairly arbitrary, but seems to work well.
|
| (This used to be ctdb commit 233c52bfb087f636ad61e95c12616c02901f4f83)
* Port from SAMBA ctdb: commit 936d76802f98d04d9743b2ca8eeeaadd4362db51 ↵Rusty Russell2009-08-042-1/+93
| Author: Andrew Tridgell <tridge@samba.org>
| Date: Tue Dec 16 14:38:17 2008 +1100
|
| imported the tdb_repack() code from CTDB
|
| The tdb_repack() function repacks a TDB so that it has a single freelist entry. The file doesn't shrink, but it does remove all freelist fragmentation. This code originated in the CTDB vacuuming code, but will now be used in ldb to cope with fragmentation from re-indexing.
|
| (This used to be ctdb commit fe3ceb101a5a9c336973c2c6c31406bd8181c2fe)
* Port from SAMBA tdb: commit 4b4fec65db4e202afa13b2d15867f4d8a54d154e Author: ↵Rusty Russell2009-08-041-5/+7
| Andrew Tridgell <tridge@samba.org>
| Date: Thu May 28 16:08:28 2009 +1000
|
| make TDB_NOSYNC affect all the fsync/msync calls in transactions
|
| During a transaction commit tdb normally uses fsync/msync calls to make it crash safe. This can be disabled using the TDB_NOSYNC flag, but it wasn't disabling all the code paths that caused a fsync/msync.
|
| (This used to be ctdb commit e03980add02a28609a7a0a0c87ebc85419b98144)
* Port from SAMBA tdb: commit a91bcbccf8a2243dac57cacec6fdfc9907580f69 Author: ↵Rusty Russell2009-08-041-0/+5
| Jim McDonough <jmcd@samba.org>
| Date: Thu May 21 16:26:26 2009 -0400
|
| Detect tight loop in tdb_find()
|
| (This used to be ctdb commit 5253a0ba3a34fbf5810f363ecc094203d49e835f)
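The commit message does not say how the loop is detected. One plausible reading, sketched here as an assumption, is that a hash-chain record whose next pointer refers to itself is treated as corruption instead of being followed forever. `struct rec`, `chain_find` and `CHAIN_CORRUPT` are hypothetical names, and array indices stand in for file offsets (index 0 ends a chain).

```c
#include <assert.h>
#include <stddef.h>

#define CHAIN_CORRUPT ((size_t)-1)

/* Toy chain record: `next` is the index of the next record, 0 ends the chain. */
struct rec {
    unsigned key;
    size_t next;
};

/* Walk the chain starting at `head` looking for `key`.
 * Returns the record index, 0 if not found, or CHAIN_CORRUPT if the
 * chain leaves the array or a record points at itself. */
size_t chain_find(const struct rec *recs, size_t nrecs,
                  size_t head, unsigned key)
{
    size_t cur = head;

    assert(recs != NULL);
    while (cur != 0) {
        if (cur >= nrecs)
            return CHAIN_CORRUPT;
        if (recs[cur].key == key)
            return cur;
        if (recs[cur].next == cur)
            return CHAIN_CORRUPT;   /* tight loop: stop instead of spinning */
        cur = recs[cur].next;
    }
    return 0;
}
```

This only catches the tightest possible cycle; longer cycles would need a step limit or a visited set.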
* Port from SAMBA tdb: commit 42c0931441ef53a3f977e1334355fa83f05ac184 Author: ↵Rusty Russell2009-08-041-1/+0
| Tim Prouty <tprouty@samba.org>
| Date: Tue Mar 31 16:24:07 2009 -0700
|
| tdb: Remove unused variable
|
| (This used to be ctdb commit aa22d1875b1997664af983c0baeabe34e40dd253)
* Port from SAMBA tdb:Rusty Russell2009-08-043-52/+133
| commit b90863c0b7b860b006ac49c9396711ff351f777f
| Author: Howard Chu <hyc@highlandsun.com>
| Date: Tue Mar 31 13:15:54 2009 +1100
|
| Add tdb_transaction_prepare_commit()
|
| Using tdb_transaction_prepare_commit() gives us 2-phase commits. This allows us to safely commit across multiple tdb databases at once, with reasonable transaction semantics.
|
| Signed-off-by: tridge@samba.org
|
| (This used to be ctdb commit 4c3dac215a088947f645f727343997f5d47e3260)
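The two-phase pattern that a prepare step enables can be sketched without linking against tdb. Everything below is a toy model: `struct toy_db` and the `toy_*` helpers are hypothetical stand-ins, with `toy_prepare` playing the role that the message assigns to tdb_transaction_prepare_commit(). The point is the ordering: prepare every database first, and commit only if all prepares succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Toy database with one value and one open transaction. */
struct toy_db {
    int committed;     /* durable value */
    int pending;       /* value staged by the open transaction */
    int prepared;      /* 1 after a successful prepare */
    int fail_prepare;  /* test hook: force the prepare step to fail */
};

static int toy_prepare(struct toy_db *db)
{
    if (db->fail_prepare)
        return -1;
    db->prepared = 1;
    return 0;
}

static void toy_cancel(struct toy_db *db)
{
    db->pending = db->committed;
    db->prepared = 0;
}

static void toy_commit(struct toy_db *db)
{
    db->committed = db->pending;
    db->prepared = 0;
}

/* Phase 1: prepare every database. Phase 2: commit them all.
 * If any prepare fails, cancel every transaction and change nothing. */
int commit_all(struct toy_db *dbs, size_t n)
{
    size_t i, j;

    assert(dbs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (toy_prepare(&dbs[i]) != 0) {
            for (j = 0; j < n; j++)
                toy_cancel(&dbs[j]);
            return -1;
        }
    }
    for (i = 0; i < n; i++)
        toy_commit(&dbs[i]);
    return 0;
}
```

The sketch assumes that a commit following a successful prepare cannot fail; how closely the real library meets that assumption is not stated in the message.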
* Merge commit 'martins/master'Ronnie Sahlberg2009-07-3010-206/+204
|\
| | (This used to be ctdb commit 32a69b0efa078b069802470be6488a4efe32961d)
| * Test suite: fix test file permissions in complex/44_failover_nfs_oneway.sh.Martin Schwenke2009-07-301-1/+2
| | Something, perhaps root_squash, is causing permission denied on the test file after we copy it over with scp. This sets the initial permissions to be friendly and adds -p to the scp command to maintain those friendly permissions.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 52f21f5a92eb14df7540a2ae9e212d936e646c06)
| * Test suite: fix the test suite's generic event script.Martin Schwenke2009-07-291-0/+5
| | Add a "stopped" case to log events and stop the event script from failing with an unknown event.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 7f67f7395e2233f0bba2e9662404aad49e13f645)
| * Test suite: Fixes for node state parsing plus new stop/continue tests.Martin Schwenke2009-07-297-203/+195
| | The parsing of "ctdb status -Y" output to determine various node states was implemented very strictly. Therefore, the parsing broke due to the addition of the new "stopped" state to the output of "ctdb status -Y". This relaxes the parsing so that it should work for versions prior to the introduction of the "stopped" state, as well as future versions that add new states to the end of the list of bits in output of "ctdb status -Y".
| |
| | Similarly the check for cluster unhealthy (in _cluster_is_healthy()) now just checks for a single 1 in any bit in the "ctdb status -Y" output, rather than checking for a particular number of 0s.
| |
| | New tests tests/simple/{41_ctdb_stop.sh,42_ctdb_continue.sh,43_stop_recmaster_yield.sh} do rudimentary testing of the stop and continue functions.
| |
| | Remove tests tests/simple/41_ctdb_ban.sh and tests/simple/42_ctdb_unban.sh. They were both unreliable.
| |
| | tests/simple/21_ctdb_disablemonitor.sh now schedules a restart, since one will be required.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 67c5bfb5f02c9d45a32d976021ede4fb2174dfe9)
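The relaxed check can be sketched as follows. The test suite itself is shell; this C version only illustrates the idea. The line layout is an assumption: colon-delimited, with a node number, an IPv4 address, and then any number of 0/1 flag fields, for example `:0:10.0.0.1:0:0:0:0:0:`. `node_has_flag_set` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if any flag field is exactly "1", 0 if none is,
 * and -1 if the line does not have the assumed shape.
 * The first two fields (node number, address) are skipped, and the
 * number of flag fields is deliberately not checked. */
int node_has_flag_set(const char *line)
{
    const char *p = line;
    int field = 0;

    assert(line != NULL);
    if (*p != ':')
        return -1;

    while (*p == ':') {
        const char *start = p + 1;
        const char *end = strchr(start, ':');

        if (end == NULL)
            break;              /* nothing after the last ':' matters */
        field++;
        if (field > 2 && end - start == 1 && *start == '1')
            return 1;
        p = end;
    }
    return field >= 3 ? 0 : -1;
}
```

Because the function never counts flag fields, a line with four flags and a line with five are handled the same way, which is the property the relaxed parsing is after. An address that itself contains colons would break this sketch.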
| * Merge commit 'origin/master'Martin Schwenke2009-07-292-4/+4
| |\
| | | (This used to be ctdb commit d7ff60a74595dcb4ae41f5a8193de5b898d61227)
| * | onnode: update tests for healthy and connected to cope with new stopped bit.Martin Schwenke2009-07-281-2/+2
| | | Signed-off-by: Martin Schwenke <martin@meltin.net>
| | |
| | | (This used to be ctdb commit bfc926c866e361ab28330747544b268ba130bf30)
* | | change the defaults for repacking to repack once every 120 seconds and ↵Ronnie Sahlberg2009-07-291-2/+2
| | | letting it work for 30 seconds before timing out.
| | |
| | | (This used to be ctdb commit 2aa5d18bb42dca4ef9cb049b4fa9d7bc999ce4ad)
* | | repack limit tunableWolfgang Mueller-Friedt2009-07-293-2/+4
| | | Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
| | |
| | | (This used to be ctdb commit a2768b0732f2ab2e3fafda55587bd2e99eedf0fa)
* | | remove repack from eventscriptWolfgang Mueller-Friedt2009-07-291-1/+1
| | | Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
| | |
| | | (This used to be ctdb commit dd334caa98882fc59765b7c84eca8e86de785487)
* | | added event repackingWolfgang Mueller-Friedt2009-07-292-3/+128
| | | Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
| | |
| | | (This used to be ctdb commit 78466364f22d6a183710338f138b8c808c6b7753)
* | | vacuum event frameworkRonnie Sahlberg2009-07-294-0/+221
| | | Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
| | | Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
| | |
| | | (This used to be ctdb commit 30cdad97706a9e9bb210120699aa939f6b16e8ca)
* | | initial part of new vacuuming patch.Ronnie Sahlberg2009-07-291-0/+5
| | | create some new fields for ctdb_db and tunables
| | |
| | | (This used to be ctdb commit 3a8e7d36cc42aedf4b7665364224140dcbfb3efa)
* | | From Michael Adam:Ronnie Sahlberg2009-07-291-34/+95
| | | Update the transaction test tool to the new api for transactions
| | |
| | | (This used to be ctdb commit 4d9a53f142deba6ab578af2fc35bfa99c29c3a99)
* | | client: refuse to do record_store() on a persistent tdb.Michael Adam2009-07-291-36/+2
| | | Only allow stores wrapped in transactions on persistent dbs.
| | |
| | | Michael
| | |
| | | (This used to be ctdb commit 9dea71cf72ef79a9aadf8ee7cf1a1899527459ff)
* | | ctdbd: refuse PERSISTENT_STORE if transaction is running.Michael Adam2009-07-291-0/+5
| | | Michael
| | |
| | | (This used to be ctdb commit c07d6d90f7afd19213ad44624c3e2b9c85f4eea8)
* | | Fix persistent transaction commit race condition.Michael Adam2009-07-294-7/+93
| | | In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second timeout being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed in starting transactions on the db in this gap and, even worse, work on top of the possibly already pushed changes. So the data diverges across the nodes.
| | |
| | | This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client, so that a client keeps track of _which_ tdb it has a transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start another transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop.
| | |
| | | This approach is deadlock-free and still allows the recovery process to be started in the retry gap between cancel and replay.
| | |
| | | Also note that this solution does not require any change on the client side.
| | |
| | | This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks!
| | |
| | | Michael
| | |
| | | (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)
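The bookkeeping described above can be modelled in a few lines. This is a toy, not the ctdbd code: the struct and function names are hypothetical, a `db_id` of 0 is assumed to mean "no commit running", and letting the owning client re-enter the commit is one guess at how the retry loop is accommodated.

```c
#include <assert.h>
#include <stdint.h>

/* Toy per-database state: is a transaction commit active on it? */
struct toy_db_context {
    uint32_t db_id;
    int transaction_active;
};

/* Toy per-client state: which db (if any) this client is committing on. */
struct toy_client {
    uint32_t db_id;     /* 0 means no commit in progress (assumption) */
};

/* Entering the commit marks both the database and the client.
 * A different client is refused while the mark is set. */
int toy_trans2_commit(struct toy_client *c, struct toy_db_context *db)
{
    assert(db->db_id != 0);
    if (db->transaction_active && c->db_id != db->db_id)
        return -1;
    db->transaction_active = 1;
    c->db_id = db->db_id;
    return 0;
}

/* Called for both the finished and the error case: clears the marks,
 * but only if this client is the one that set them. */
void toy_trans2_done(struct toy_client *c, struct toy_db_context *db)
{
    if (c->db_id == db->db_id) {
        db->transaction_active = 0;
        c->db_id = 0;
    }
}
```

Because the flag lives with the database rather than with a held lock, it stays set across the gap between cancel and replay, which is the window the commit message identifies.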
* | | client: set dmaster in ctdb_transaction_store() also when updating an ↵Michael Adam2009-07-291-2/+2
| |/
|/|
| | existing record
| |
| | Michael
| |
| | (This used to be ctdb commit e9194a130327d6b05a8ab90bd976475b0e93b06d)
* | When processing the stop node control reply in the client code we shouldRonnie Sahlberg2009-07-292-4/+4
|/
| also check the returned status code in case the _stop() command failed due to the eventscripts failing. If this happens, make "ctdb stop" log an error to the console and try the operation again.
|
| (This used to be ctdb commit 20e82e0c48e07d1012549f5277f1f5a3f4bd10d1)
* document the two new commands setlmasterrole and setrecmasterroleRonnie Sahlberg2009-07-283-66/+125
| (This used to be ctdb commit 1d7d7dd515e7ef62cacf2a712a2f4c4d62a38fa5)
* add two commands : setlmasterrole and setrecmasterrole to enable/disable ↵Ronnie Sahlberg2009-07-2811-186/+243
| these capabilities at runtime
|
| (This used to be ctdb commit 51aaed0e9e42e901451292e8dd545297ab725a62)
* Document the natgw flag and how this changes the output of "ctdbRonnie Sahlberg2009-07-286-106/+185
| getcapabilities"
|
| (This used to be ctdb commit 9b395986962909a5b0548eaea7e45215df72a08e)
* update the natgw eventscript to set the NATGW capability when this feature ↵Ronnie Sahlberg2009-07-281-0/+2
| is used
|
| This does not modify any behaviour of the daemon itself other than showing this flag as ON in the ctdb getcapabilities output
|
| (This used to be ctdb commit fb337c151bd16ad5ad0c99431224451979d8c651)
* add a command "setnatgwstate {on|off}" that can be used to indicate if this ↵Ronnie Sahlberg2009-07-289-100/+115
| node is using natgw functionality or not.
|
| (This used to be ctdb commit 89a9bb29a60a6fb1fba55987e6cf0a4baa695e50)
* describe how to activate NATGW without restarting the nodes on a runningRonnie Sahlberg2009-07-284-39/+97
| cluster
|
| (This used to be ctdb commit b6c8011024ce4574f945d5a470075c6779b34a43)
* new version 1.0.87Ronnie Sahlberg2009-07-172-3/+27
| (This used to be ctdb commit d187eb8507f35a650ff3ffc50fa49110eebca0bd)
* Merge commit 'martins/master'Ronnie Sahlberg2009-07-171-2/+2
|\
| | (This used to be ctdb commit febf3d6d3f2bdf187c042f560aefc54b8ac72454)
| * Test suite: Fix debug code for unexpectedly unhealthy cluster.Martin Schwenke2009-07-161-2/+2
| | The debug code should run "ctdb status" on a cluster node, not on the test client.
| |
| | Signed-off-by: Martin Schwenke <martin@meltin.net>
| |
| | (This used to be ctdb commit 34e6f8a04b12f8879eb42d417f9741502ccccf0f)
* | document the new stopped eventRonnie Sahlberg2009-07-171-0/+6
| | (This used to be ctdb commit 70603d9a79c80379bf65d9d703c399a65c109c52)
* | create a new event : stopped.Ronnie Sahlberg2009-07-175-5/+53
| | This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration, IP addresses or routing.
| |
| | Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered.
| |
| | (This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59)
* | When we create new election data to send during elections, we must re-read ↵Ronnie Sahlberg2009-07-171-1/+8
| | the node flags from the main daemon to catch when the STOPPED flag is changed.
| |
| | (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522)
* | update the eventscript to ensure that stopped nodes can not become the natgw ↵Ronnie Sahlberg2009-07-172-5/+9
| | master
| |
| | Also verify that we actually do have a natgw master available if this is configured, and make the node unhealthy if not.
| |
| | (This used to be ctdb commit 7f273ee769d671d8c8be87c9187302fb77e814f3)
* | if all nodes are STOPPED, pick one of the STOPPED nodes as natgw masterRonnie Sahlberg2009-07-171-0/+13
| | (This used to be ctdb commit 8bbd96cfbbe98f3fc19e432797cbf4478f753a0b)
* | Do not allow STOPPED or DELETED nodes to become the NATGW masterRonnie Sahlberg2009-07-171-2/+4
| | (This used to be ctdb commit 4505ea15408ad40dd8deb4041fd75a65a0ad9336)
* | stopped nodes can not win a recmaster electionRonnie Sahlberg2009-07-091-1/+18
| | stopped nodes must yield the recmaster role
| |
| | (This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba)
* | change the infolevel when logging stop/continue commandsRonnie Sahlberg2009-07-091-2/+2
| | (This used to be ctdb commit 1e007c833098b03dd81797c081da1ae1b10c971c)
* | recovery daemon needs to monitor when the local ctdb daemon is stopped and ↵Ronnie Sahlberg2009-07-091-0/+28
| | ensure that the databases get frozen and the node enters recovery mode
| |
| | (This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a)
* | document the new commands ctdb stop/continueRonnie Sahlberg2009-07-096-147/+251
| | (This used to be ctdb commit d6ddea4167ccdad05e88378ee3f22b6125969562)
* | don't let other nodes modify the STOPPED flag for the local process when ↵Ronnie Sahlberg2009-07-091-0/+10
| | pushing out flags changes
| |
| | (This used to be ctdb commit 501a2747d839ca291b70c761098549cf6d47a158)
* | add two new controls, STOP_NODE and CONTINUE_NODERonnie Sahlberg2009-07-096-8/+72
| | that are used to stop/continue a node instead of using modflags messages
| |
| | (This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a)
* | make it possible to start the daemon in STOPPED modeRonnie Sahlberg2009-07-095-0/+15
| | (This used to be ctdb commit 866aa995dc029db6e510060e9e95a8ca149094ac)
* | remove the header printed for the machinereadable output for natgwlistRonnie Sahlberg2009-07-091-1/+0
| | (This used to be ctdb commit 049271c83a09afb8d6c3e5212cf9ca782956b0c6)