path: root/monitor.c
Commit message    Author    Age    Files    Lines
* mdmon: record sync_completed directly to the metadataDan Williams2010-06-151-3/+7
| When sync_action is idle mdmon takes the latest value of md/resync_start
| or md/<dev>/recovery_start to record the resync/rebuild checkpoint in the
| metadata. However, now that mdmon is reading sync_completed there is no
| longer a need to wait for, or force, an idle event to take a checkpoint.
| Simply update the forward progress of ->last_checkpoint at every wakeup
| event and force it to be recorded at least every 1/16th array-size
| interval. It may be recorded more frequently if a ->set_array_state()
| event occurs.
|
| This also cleans up some confusion in handling the dual-rebuild case. If
| more than one spare has been activated the kernel starts the rebuild at
| the lowest recovery offset, so we do not need to worry about
| min_recovery_start().
|
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
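The 1/16th-interval rule described in the message above can be sketched roughly as follows. This is only an illustration, not the code from monitor.c: `struct checkpoint_state`, its fields other than `last_checkpoint`, and `checkpoint_due()` are hypothetical names chosen for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-array state that mdmon tracks. */
struct checkpoint_state {
	unsigned long long array_sectors;   /* total array size, in sectors */
	unsigned long long last_checkpoint; /* latest forward progress seen */
	unsigned long long last_recorded;   /* progress last written to metadata */
};

/*
 * Advance last_checkpoint to the latest sync_completed value and report
 * whether at least 1/16th of the array has been covered since the last
 * checkpoint that was recorded in the metadata.
 */
static bool checkpoint_due(struct checkpoint_state *st,
			   unsigned long long sync_completed)
{
	unsigned long long interval = st->array_sectors / 16;

	/* Progress only ever moves forward. */
	if (sync_completed > st->last_checkpoint)
		st->last_checkpoint = sync_completed;

	/* Nothing new since the last recorded checkpoint. */
	if (st->last_checkpoint <= st->last_recorded)
		return false;

	/* Tiny arrays: any forward progress counts as an interval. */
	if (interval == 0)
		return true;

	return st->last_checkpoint - st->last_recorded >= interval;
}
```

A caller would set `last_recorded = last_checkpoint` after writing the metadata, so that the next interval is measured from the freshly recorded position.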
* mdmon: periodically checkpoint recoveryDan Williams2010-05-141-0/+33
| | | | | | | | | | | | The kernel updates and notifies md/sync_completed when it is time to take a checkpoint. When this occurs (at 1/16 array size intervals) write 'idle' to md/sync_action to have the current recovery position updated in recovery_start and resync_start. Requires the metadata handler to reset ->last_checkpoint when it has determined that recovery has ended. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: insist on creating .pid file at startup.NeilBrown2010-02-081-1/+5
| | | | | | | | | | | | | | | | Now that we don't "mdadm --takeover" until /var/run is writable there is no need to continually try to create files in there. So only create these files at startup and fail if they cannot be made. This means that to start an array with externally managed metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be writable. To 'takeover' from a previous mdmon instance, /var/run must be writable. This means we don't need to worry about SIGHUP (which was once used to tell us it was time to create .pid) and SIGALRM. Signed-off-by: NeilBrown <neilb@suse.de>
* Introduce MaxSectorDan Williams2009-12-211-1/+1
| | | | | | | Replace occurrences of ~0ULL to make it clear we are talking about maximal resync/recovery position. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Add scaffolding for handling md/dev-XXX/recovery_startDan Williams2009-12-211-1/+3
| | | | | | Prepare the code to handle saving a recovery checkpoint. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: cleanup resync_startDan Williams2009-12-141-13/+6
| | | | | | | | | | We don't need to sprinkle reads of this attribute all over the place, just once at the entry of read_and_act(). Also, the mdinfo structure for the array already has a 'resync_start' member, so just reuse that. Finally, rename get_resync_start() to read_resync_start to make it consistent with the other sysfs accessors in monitor.c. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Update copyright dates and remove references to @cse.unsw.edu.auNeilBrown2009-06-021-2/+2
| | | | | | Also removed 'paper' addresses. Signed-off-by: NeilBrown <neilb@suse.de>
* Wait for POLLPRI on /proc or /sys files.NeilBrown2009-04-141-1/+1
| From 2.6.30, /proc/mounts and various /sys files will probably always
| return 'readable' to select, so we will need to wait on POLLPRI to get
| the 'new data is available' signal.
|
| When using select, this corresponds to an 'exception', so adjust calls
| to select accordingly. In one case we sometimes wait on a socket and
| sometimes on /proc/mounts, so we need to test which.
|
| Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: fix resync completion detectionDan Williams2009-04-121-2/+4
| Starting with 2.6.30 the md/resync_start attribute will no longer
| return a nonsensical number when resync is complete; instead it now
| returns 'none'.
|
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
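A minimal sketch of parsing the attribute under the new behaviour is shown here. `parse_resync_start()` is a hypothetical helper, and the `MaxSector` definition is an assumption made for the example, based on the later "Introduce MaxSector" commit that replaces occurrences of ~0ULL.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumed definition: the maximal resync/recovery position. */
#define MaxSector (~0ULL)

/*
 * Parse the text read from md/resync_start.
 * 'none' means the resync is complete and maps to MaxSector.
 * Returns 0 on success, -1 if the text is neither 'none' nor a number.
 */
static int parse_resync_start(const char *buf, unsigned long long *start)
{
	char *end;
	unsigned long long value;

	if (strncmp(buf, "none", 4) == 0) {
		*start = MaxSector;
		return 0;
	}

	errno = 0;
	value = strtoull(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;

	*start = value;
	return 0;
}
```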
* mdmon: fix missed 'clean' eventDan Williams2009-02-241-28/+21
| | | | | | | | | | | | mdmon may miss events because it re-reads state after read_and_act. The additional read is used to determine dirty status before allowing a sigterm to proceed. Since read_and_act is in the best position to determine 'dirty' status and its return value is not used, modify it to return true if the array is dirty. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: pass symbolic name to mdmon instead of device name.NeilBrown2008-11-201-1/+1
| Now that names in /dev are usually created (eventually) by udev, it
| isn't really safe to rely on finding a name in /dev to pass to mdmon
| to identify which array to monitor. And it isn't really necessary to
| have a name in /dev.
|
| So just pass the symbolic name, e.g. md127 or md123. Change util.c to
| pass that name, and change mdmon to process the name sensibly.
|
| Signed-off-by: NeilBrown <neilb@suse.de>
* update copyright headersDan Williams2008-10-281-0/+19
| | | | Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: terminate cleanDan Williams2008-10-151-6/+38
| We generally don't want mdmon to be terminated, but if a SIGTERM gets
| through, try to leave the monitored arrays in a clean state, block
| attempts to mark the array dirty, and stop servicing the socket.
|
| When we are killed by SIGTERM, don't remove the pidfile; let that be
| cleaned up by the next monitor.
|
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
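One common way to get this behaviour is to have the signal handler only set a flag that the main loop consults. The sketch below shows that pattern under the assumption that mdmon does something similar; `sigterm_seen`, `install_sigterm_handler()` and `may_mark_dirty()` are hypothetical names, not taken from the source.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the handler, read from the monitor loop. */
static volatile sig_atomic_t sigterm_seen;

static void on_sigterm(int sig)
{
	(void)sig;
	sigterm_seen = 1; /* only async-signal-safe work in a handler */
}

/* Install the SIGTERM handler; returns 0 on success, -1 on failure. */
static int install_sigterm_handler(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_handler = on_sigterm;
	sigemptyset(&act.sa_mask);
	return sigaction(SIGTERM, &act, NULL);
}

/* Once termination was requested, refuse clean-to-dirty transitions. */
static int may_mark_dirty(void)
{
	return !sigterm_seen;
}
```

With this shape the loop can finish marking arrays clean and stop servicing the socket before exiting, while the pidfile is deliberately left in place.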
* trivial warn_unused_result squashingDan Williams2008-10-151-0/+1
| Made the mistake of recompiling the F9 mdadm rpm which has a patch to
| remove -Werror and add "-Wp,-D_FORTIFY_SOURCE -O2" which turns on lots
| of errors:
|
| config.c:568: warning: ignoring return value of asprintf
| Assemble.c:411: warning: ignoring return value of asprintf
| Assemble.c:413: warning: ignoring return value of asprintf
| super0.c:549: warning: ignoring return value of posix_memalign
| super0.c:742: warning: ignoring return value of posix_memalign
| super0.c:812: warning: ignoring return value of posix_memalign
| super1.c:692: warning: ignoring return value of posix_memalign
| super1.c:1039: warning: ignoring return value of posix_memalign
| super1.c:1155: warning: ignoring return value of posix_memalign
| super-ddf.c:508: warning: ignoring return value of posix_memalign
| super-ddf.c:645: warning: ignoring return value of posix_memalign
| super-ddf.c:696: warning: ignoring return value of posix_memalign
| super-ddf.c:715: warning: ignoring return value of posix_memalign
| super-ddf.c:1476: warning: ignoring return value of posix_memalign
| super-ddf.c:1603: warning: ignoring return value of posix_memalign
| super-ddf.c:1614: warning: ignoring return value of posix_memalign
| super-ddf.c:1842: warning: ignoring return value of posix_memalign
| super-ddf.c:2013: warning: ignoring return value of posix_memalign
| super-ddf.c:2140: warning: ignoring return value of write
| super-ddf.c:2143: warning: ignoring return value of write
| super-ddf.c:2147: warning: ignoring return value of write
| super-ddf.c:2150: warning: ignoring return value of write
| super-ddf.c:2162: warning: ignoring return value of write
| super-ddf.c:2169: warning: ignoring return value of write
| super-ddf.c:2172: warning: ignoring return value of write
| super-ddf.c:2176: warning: ignoring return value of write
| super-ddf.c:2181: warning: ignoring return value of write
| super-ddf.c:2686: warning: ignoring return value of posix_memalign
| super-ddf.c:2690: warning: ignoring return value of write
| super-ddf.c:3070: warning: ignoring return value of posix_memalign
| super-ddf.c:3254: warning: ignoring return value of posix_memalign
| bitmap.c:128: warning: ignoring return value of posix_memalign
| mdmon.c:94: warning: ignoring return value of write
| mdmon.c:221: warning: ignoring return value of pipe
| mdmon.c:327: warning: ignoring return value of write
| mdmon.c:330: warning: ignoring return value of chdir
| mdmon.c:335: warning: ignoring return value of dup
| monitor.c:415: warning: rv may be used uninitialized in this function
|
| ...some of these like the write() ones are not so trivial so save those
| fixes for the next patch.
|
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* monitor: clean up some debug messagesDan Williams2008-09-151-2/+3
| | | | Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'mdadm --wait-clean' wait for array to be marked cleanDan Williams2008-09-151-32/+7
| | | | | | | | | For use in distro shutdown scripts with a RAID root file system. Returns immediately if the array is 'readonly', or not an externally managed array. It is up to the distro's scripts to make sure no new writes hit the device after this returns 'true'. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* monitor: don't mark dirty on resync completeDan Williams2008-09-151-1/+1
| | | | | | ...instead look at array state to determine if the array is consistent Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* monitor: mark clean on active-idleDan Williams2008-09-151-3/+7
| | | | | | This also handles the case where 'clean' is set directly. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Allow an externally managed array to be marked readonlyNeilBrown2008-08-191-8/+14
| | | | | | | | | | | If the metadata_version is -mdXXX/whatever rather than /mdXXX/whatever then the array is readonly and should be left alone by mdmon. Signed-off-by: NeilBrown <neilb@suse.de>
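The check described above could be sketched as follows. `external_array_is_readonly()` is a hypothetical helper; the optional "external:" prefix it skips is an assumption about how the sysfs attribute is formatted and is not stated in the commit message.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Decide whether an externally managed array should be left alone.
 * A leading '-' (as in "-md127/0") marks the array readonly, while a
 * leading '/' (as in "/md127/0") means mdmon should manage it.
 */
static bool external_array_is_readonly(const char *metadata_version)
{
	static const char prefix[] = "external:"; /* assumed sysfs prefix */

	if (strncmp(metadata_version, prefix, sizeof(prefix) - 1) == 0)
		metadata_version += sizeof(prefix) - 1;

	return metadata_version[0] == '-';
}
```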
* Extra option for set_array_state: you choose dirty or clean.NeilBrown2008-08-191-11/+3
| | | | | | | | | | | | | | When we first start an array, it might be good to start recovery straight away. That requires setting the array to 'dirty', but only the metadata handler can know if that is required or not. So have a third possible 'consistent' option to set_array_state. Either 'no' or 'yes' or 'you choose'. Return value indicates what was chosen. '1' (no) should be chosen unless there is a good reason. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: handle failures versus readauto arraysDan Williams2008-08-151-4/+20
| | | | | | | | | Transition readauto arrays to active before failing drives. Hmm... why do we keep reblocking / renotifying in the readonly case? Need to bottom out on this, but not right now. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: use activate spare for re-addDan Williams2008-08-121-3/+8
| | | | | | | | | Disks that are not in-sync or failed are not assembled into member arrays by mdadm. Teach mdmon to resolve this situation by checking for spares at start. imsm_activate_spare() is updated to prefer devices that can be re-added versus new spares. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* imsm: handle degraded->normal transitions in set_diskDan Williams2008-07-241-1/+0
| | | | | | | Removes the need for the call to ->set_array_state when sync_action transitions from 'recover' to 'idle'. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* monitor: call get_resync_start on array shutdown.NeilBrown2008-07-181-0/+1
| | | | | | | | If the array is shutdown as soon as resync finishes, we might not notice the resync finish. So on array shutdown, check for current resync pos. Signed-off-by: Neil Brown <neilb@suse.de>
* mdmon: ping will wait for manage_mon to catch up.NeilBrown2008-07-181-0/+4
| | | | | | | | | | | When a 'ping' (empty message) is sent to mdmon, we wait for 'monitor' to do a full loop to make sure it has caught up with anything that needs doing. This allows synchronisation between mdadm and mdmon. Maybe monitor should signal managemon rather than managemon polling... Signed-off-by: Neil Brown <neilb@suse.de>
* Make sure resync_start is initialised properly and maintained properlyNeil Brown2008-07-181-1/+1
| | | | Signed-off-by: Neil Brown <neilb@suse.de>
* mdmon: close possibility of re-marking the metadata dirty on shutdownDan Williams2008-07-141-2/+4
| | | | Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: notify metadata of recovery completionDan Williams2008-07-141-0/+4
| | | | | | Array may no longer be degraded. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Make sure we remove pid file in monitor before manager exits.Neil Brown2008-07-121-1/+1
|
* Remove some noisy printfs.Neil Brown2008-07-121-1/+1
|
* Revise message passing code.Neil Brown2008-07-121-1/+2
| | | | More here
* Remove mgr_pipe for communicating from manage to monitor.Neil Brown2008-07-121-27/+10
| Data is being passed in shared memory, so the pipe is only being used
| as a wakeup. This can more easily be done with a thread-signal.
* Remove mon_pipe for communicating from monitor to managerNeil Brown2008-07-121-3/+2
| | | | | | The returned value was never used, and we don't really want this return path anyway as writing to a pipe could conceivably block, and the monitor must not block.
* Handle device removal from containerNeil Brown2008-07-121-43/+0
| This really should be done in mdadm, not mdmon. We ensure the device
| won't be suddenly committed as a hot-spare using O_EXCL, then check the
| 'holders' sysfs directory to make sure it is only in use once.
* mdmon: add debug print statements for profiling mdmonDan Williams2008-06-161-3/+48
| | | | | | | for development only as console output can block leading to monitor deadlocks in low mem situations Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Add DDF code for activate_spareNeil Brown2008-06-121-7/+6
| | | | Plus various bug fixes etc.
* Support adding a spare to a degraded array.Neil Brown2008-06-121-16/+5
| | | | | When signalled by the monitor, the manager will find spares and add them to the array and initiate a recovery.
* Some fixes to make failures in ddf get handled properly.Neil Brown2008-06-121-5/+5
|
* Allow passing metadata update to the monitor.Neil Brown2008-06-121-1/+13
| | | | | | Code in manager can now just call queue_metadata_update with a (freeable) buf holding the update, and it will get passed to the monitor and written out.
* Change mark_clean to set_array_state.Neil Brown2008-05-271-3/+6
| | | | DDF needs more fine grained understanding of the array state.
* Discard get_sync_pos. We should be using get_resync_start.Neil Brown2008-05-271-18/+2
| "sync_complete" just tracks the current resync/recover/check/whatever
| pass. "resync_start" tracks which parts of the array are known to be
| in-sync (modulo active writes). So it is what we need to use to update
| the metadata.
|
| Also we cannot call it when the array has stopped, as the value is no
| longer available then. We must call it when the resync completes.
| Possibly also call it periodically if the array is quiescent.
* Exit when there are no more arrays to manage.Neil Brown2008-05-271-2/+19
|
* Remove stopped arrays.Neil Brown2008-05-271-12/+30
| | | | | | When an array becomes inactive, clean up and forget it. This involves signalling the manager.
* Implement mark_clean for ddf and remove mark_dirty and mark_syncNeil Brown2008-05-271-6/+8
| | | | | | | mark_dirty is just a special case of mark_clean - with sync_pos == 0. mark_sync is not required. We don't modify the metadata when sync finishes. Only when the array becomes non-writeable at which point we use mark_clean to record how far the resync progressed.
* add infrastructure to receive higher order commands, like remove_deviceDan Williams2008-05-151-5/+61
| | | | | | | | | | | From: Dan Williams <dan.j.williams@intel.com> Each md_message encapsulates a single command. A command includes an 'action' member which describes what if any data comes after the action. Communication with the monitor involves updating the active_cmd pointer and then writing to mgr_pipe. Pass/fail status is returned via mon_pipe. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* when failures happen they should be propagated to all member arraysDan Williams2008-05-151-3/+43
| | | | | | From: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* handle disk failuresDan Williams2008-05-151-17/+29
| From: Dan Williams <dan.j.williams@intel.com>
|
| Added curr_state as a parameter to set_disk. Handlers look at this to
| record component failures, and set global 'degraded' or 'failed' status.
|
| When reading the state as faulty:
| 1/ mark the disk failed in the metadata
| 2/ write '-blocked' to the rdev state to allow the kernel's failure
|    mechanism to advance
| 3/ the kernel will take away the drive's role in remove_and_add_spares()
| 4/ once the disk no longer has a role, writing 'remove' to the rdev
|    state will get the disk out of the array.
|
| There is a window after writing '-blocked' where the kernel will return
| -EBUSY to remove requests. We rely on the fact that the disk will
| continue to show faulty so we lazily wait until the kernel is ready to
| remove the disk.
|
| If the manager thread needs to get the disk out of the way it can ping
| the monitor and wait, just like the replace_array() case.
|
| [buglet fix: swap the parameters of attr_match in read_dev_state]
|
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
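Steps 2/ and 4/ of the message above, including the lazy handling of -EBUSY, might be sketched like this. Both helpers are hypothetical and assume the caller already holds an open file descriptor for the rdev's sysfs state attribute; they are not the functions used in monitor.c.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Step 2/: write '-blocked' so the kernel's failure handling can advance.
 * Returns 0 on success, -1 on failure.
 */
static int unblock_rdev(int state_fd)
{
	static const char cmd[] = "-blocked";
	ssize_t len = (ssize_t)(sizeof(cmd) - 1);

	return write(state_fd, cmd, sizeof(cmd) - 1) == len ? 0 : -1;
}

/*
 * Step 4/: ask the kernel to remove the disk.
 * Returns 1 if the request was accepted, 0 if the kernel answered EBUSY
 * (the disk still has a role, so try again on a later wakeup), and -1 on
 * any other error.
 */
static int try_remove_rdev(int state_fd)
{
	static const char cmd[] = "remove";
	ssize_t len = (ssize_t)(sizeof(cmd) - 1);

	if (write(state_fd, cmd, sizeof(cmd) - 1) == len)
		return 1;
	return errno == EBUSY ? 0 : -1;
}
```

Because the disk keeps reporting faulty, a 0 result needs no special bookkeeping: the next pass over the device state simply calls `try_remove_rdev()` again.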
* Flag arrays for deletion after they have been stopped.Dan Williams2008-05-151-2/+14
| | | | | | | | | From: Dan Williams <dan.j.williams@intel.com> If they are later reassembled they will be replaced and deallocated via replace_array. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* handle resync completionDan Williams2008-05-151-4/+3
| | | | | | From: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* start resync when transitioning from initial readonly stateDan Williams2008-05-151-7/+24
| | | | | | | | | From: Dan Williams <dan.j.williams@intel.com> mdadm handles setting resync_start, monitor uses this value to determine whether to set the 'active' or 'readauto' state. Signed-off-by: Dan Williams <dan.j.williams@intel.com>