| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
metze
(This used to be ctdb commit a2c9e4578e149eccb2c6183f64a6b657eb95c5e1)
|
|
|
|
|
|
|
|
|
|
|
| |
We know ask for the known and available interfaces.
This means a node gets a RELEASE_IP event for all interfaces
it "knows", but doesn't serve and a node only gets a TAKE_IP event
for "available" interfaces.
metze
(This used to be ctdb commit a695a38e49e7c3e15a9706392dc920eeab1f11ba)
|
|
|
|
|
|
| |
metze
(This used to be ctdb commit 6bd780510058e5589f2f7c3722d37acbba4935ab)
|
|
|
|
|
|
| |
metze
(This used to be ctdb commit 91122c322fbec08138b92c528d9a946f6727b4fd)
|
|
|
|
|
|
| |
metze
(This used to be ctdb commit ff5291778f0752e176539397e9530dcf0e546bea)
|
|
|
|
|
|
| |
metze
(This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)
|
|
|
|
|
|
| |
metze
(This used to be ctdb commit 400b4806c4a9686a2ee6398b5d7c3e0ca0793fd1)
|
|
|
|
|
|
|
|
|
| |
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.
metze
(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
|
|
|
|
|
|
| |
metze
(This used to be ctdb commit 8171d66f0061fe23ed6dfef87ffe63bfc19596eb)
|
|
|
|
|
|
| |
metze
(This used to be ctdb commit 4b4dd5d7f81bf226e05c7f3d40087043da1517a2)
|
|
|
|
|
|
|
|
| |
configureable using --log-ringbuf-size=<num-entries>.
Add an entry in the sysconfig file to set this persistently.
(This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Merge commit 'rusty/ctdb-no-setsched'
Conflicts:
server/ctdb_vacuum.c
(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays. But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.
This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB. We warn, but don't exit, if it fails.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
1) It's buggy. Code needs to be carefully written (ie. no busy
loops) to handle running with it, and we fork and run scripts.[1]
2) It makes debugging harder. If ctdbd loops (as has happened recently)
it can be extremely hard to get in and see what's happening. We've already
seen the valgrind hacks.
3) We have seen recent scheduler problems. Perhaps they are unrelated,
but removing this very unusual setup is unlikely to hurt.
4) It doesn't make anything faster. Under all but the most perverse of
circumstances, 99% of the cpu gives the same performance as 100%, and
we will always preempt normal processes anyway.
[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
each script" by removing the switch_from_server_to_client() which
restored it, but even that was only for monitor scripts. Others were
run with RT priority.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit. We can also happily move the kill-child eventscript hack under
this flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
|
| |
| |
| |
| |
| |
| | |
metze
(This used to be ctdb commit 1cdc8dbb9cb971cf6dd6cd22b1adaf70ddc77e65)
|
| |
| |
| |
| |
| |
| | |
metze
(This used to be ctdb commit 5abe44d0113839d3a45c9a31d30856aa70c2ea1f)
|
| |
| |
| |
| |
| |
| | |
metze
(This used to be ctdb commit 7332d900538f0cbcd953a723417a0fe31dc9807c)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.
The default is 0 which means to reject a startup with
unhealthy dbs.
The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" is deferred
until all persistent databases are healthy.
Databases can become healthy automaticly by a completely
HEALTHY node joining the cluster. Or by an administrator
with "ctdb backupdb/restoredb" or "ctdb wipedb".
metze
(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This node internal tdb will store the HEALTH state of persistent
tdbs.
metze
(This used to be ctdb commit cbda4666be88c11a810a192a70667b57f773ace1)
|
| |
| |
| |
| |
| |
| | |
metze
(This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)
|
|/
|
|
|
|
| |
metze
(This used to be ctdb commit 656a6ec5ed81ccfbb86144156a3158e48f105ee4)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
more :"
This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c.
Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c
(This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)
|
|
|
|
|
|
| |
This reverts commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d.
(This used to be ctdb commit 3d2d877d877146ca09a28a3a44f4840eb36fd377)
|
|\
| |
| |
| | |
(This used to be ctdb commit ac06a0e042e7d024060d6e87a49bda9ccc072c52)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch improves the handling of the fetch_lock operation on non-persistent
databases that ctdb clients have to do very frequently.
The normal flow how this goes is the following:
1. Client does a local fetch_lock on the database
2. Client looks if the local node is dmaster.
If yes, everything is fine
If no, continue here
3. Client unlocks the local record
4. Client issues a "get me the record" call to ctdbd
5. ctdbd goes out and fetches the dmaster role
6. ctdbd tells the client to retry
7. Client starts over again
The problem is between step 6 and 7: Before the client has had the chance to
retry (i.e. catch the record with a fetch_locked), another node might have come
asking ctdbd to migrate away the record again. This is a real problem, I've
seen >20 loops of this kind in real workloads.
This patch does the following: Whenever ctdb receives a record as result of
step 5, it puts the key on a "holdback list". As long as a key is on this list,
a request to migrate away the dmaster is put on hold. It is the client's duty
to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2
after having asked ctdb to fetch the record. This will release the key from the
"holdback list" and re-issue all dmaster migration requests.
As a safeguard against malicious clients, once a second (default 1000msecs,
tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of
held back keys, deletes them and releases all held back migration requests.
(This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)
|
| |
| |
| |
| |
| |
| | |
Michael
(This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996)
|
| |
| |
| |
| |
| |
| | |
Michael
(This used to be ctdb commit 4b1dbcf0853bdc4832d39a477823ae34f216da52)
|
| |
| |
| |
| | |
(This used to be ctdb commit 6af5e74a21546d723008d69d6752ebebf898c947)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is a simplified version of the trans2 commit control:
It just rolls out the marshall buffer to all active nodes.
It is the main ctdbd part of the re-implementation of the
persistent transactions. The client code is changed to
take a global lock to start a transactions and store into
the marshal buffer instead of writing to the local tdb
under a local transaction.
The old transaction implementation is going to be
removed in a later commit.
Michael
(This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)
|
| |
| |
| |
| |
| |
| |
| |
| | |
since we no longer ban nodes when dodgy scripts continue to hang.
We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban.
(This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)
|
| |
| |
| |
| |
| |
| | |
EventScriptUnhealthyOnTimeout
(This used to be ctdb commit 401f421fa003d9515df15e759b50b56e0c67d69c)
|
|/
|
|
|
|
| |
there is no rational need for a setting where we permanently mark nodes as disabled everytime an eventscript fails
(This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)
|
|
|
|
|
|
|
| |
Date: Wed, 9 Dec 2009 22:45:12 +0100
Subject: [PATCH] Revert an accidential commit
(This used to be ctdb commit af6656f2844d8fd72204a70358c9d589dbe1bd34)
|
|
|
|
|
|
|
|
| |
This might be a bit less efficient, but experience in winbind has shown that
event callbacks can trigger changes in the socket state in very hard to
diagnose ways.
(This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)
|
|
|
|
|
|
|
|
|
|
|
|
| |
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.
"ctdb scriptstatus all" returns all event script results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
|
|
|
|
|
|
|
|
|
| |
We're going to need this so ctdb can query non-monitor status.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)
|
|
|
|
|
|
|
|
|
|
| |
Rather than only tranferring to last_status for monitor events, do
it for every event (ctdb->last_status is now an array).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)
|
|
|
|
|
|
|
|
|
|
| |
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A new helper functions which sets up an event attached to the child's
stdout/stderr which gets routed to the logging callback after being
placed in the normal logs.
This is a generalization of the previous code which was hardcoded to
call ctdb_log_event_script_output.
The only subtlety is that we hang the child fds off the output buffer;
the destructor for that will flush, which means it has to be destroyed
before the output buffer is.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.
We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We put a "scripts" member in ctdb_event_script_state, rather than using
a special struct for monitor events. This will fit better as we further
unify the different events, and holds the reports from the child process
running each monitor script.
Rather than making the monitor state a child of current_monitor_status_ctx,
we just point current_monitor directly at it. This means we need to reset
that pointer in the destructor for ctdb_event_script_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have monitor_event_script_ctx and other_event_script_ctx, and
current_monitor_status_ctx in struct ctdb_context. This seems more
complex than it needs to be.
We use a single "event_script_ctx" as parent for all event script
state structures. Then we explicitly reparent monitor events under
current_monitor_status_ctx: this is freed every script invocation to
kill off any running scripts anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)
|
|
|
|
|
|
|
|
|
| |
eventscript.c uses this now, but our next patch makes others use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)
|
|
|
|
|
|
|
|
|
|
| |
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This starts the move toward more expressive encoding of return values:
positive values mean the script ran, negative means we had a problem with
the script (and the value is the errno).
This does timeout, but changes the ctdb tool to recognize it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
|
|
|
|
|
|
|
|
|
|
| |
This simplifies the code a little: last_status is now read to go
(it's only used by the scriptstatus command at the moment).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)
|
|
|
|
|
|
| |
Michael
(This used to be ctdb commit e5496a83ef4a01604195b27c4b97f50d4979510e)
|
|
|
|
|
|
|
|
| |
It is unlikely we will need something this verbose for normal troubleshooting.
This allows us to keep a significantly longer time interval of log messages
in the 500k slots available in the ringbuffer.
(This used to be ctdb commit cc99c05c0c6484ad574039a454e6133852cb41fa)
|
|
|
|
|
|
|
|
|
|
| |
This controls is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.
If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.
bz58185
(This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)
|