|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've been attempting to set up a (Host)RAID mirror with dm_mirror on
2.6.14.3, and I've been having a strange little problem. The configuration
in question is a set of 9GB SCSI disks that have 17942584 sectors. I set
up the dm_mirror table as such:
0 17942528 mirror core 2 2048 nosync 2 8:48 0 8:64 0
If I'm not mistaken, this sets up a 9GB RAID1 mriror with 1MB stripes
across both SCSI disks. The sector count of the dm device is less than the
size of the disks, so we shouldn't fall off the end. However, I always get
the messages like this in dmesg when I set up the dm table:
attempt to access beyond end of device
sdd: rw=0, want=17958656, limit=17942584
Clearly, something is trying to read sectors past the end of the drive. I
traced it down to the __rh_recovery_prepare function in dm-raid1.c, which
gets called when we're putting the mirror set together. This function
calls the dirty region log's get_resync_work function to see if there's any
resync that needs to be done, and queues up any areas that are out of sync.
The log's get_resync_work function is actually a pointer to the
core_get_resync_work function in dm-log.c.
The core_get_resync_work function queries a bitset lc->sync_bits to find
out if there are any regions that are out of date (i.e. the bit is 0),
which is where the problem occurs. If every bit in lc->sync_bits is 1
(which is the case when we've just configured a new RAID1 with the nosync
option), the find_next_zero_bit does NOT return the size parameter
(lc->region_count in this case), it returns the size parameter rounded up
to the nearest multiple of 32! I don't know if this is intentional, but
i386 and x86_64 both exhibit this behavior.
In any case, the statement "if (*region == lc->region_count)" looks like
it's supposed to catch the case where are no regions to resync and
return 0. Since find_next_zero_bit apparently has a habit of returning
a value that's larger than lc->region_count, the enclosed patch changes
the equality test to a greater-than test so that we don't try to resync
areas outside of the RAID1 region. Seeing as the HostRAID metadata
lives just past the end of the RAID1 data, mucking around in that area
is not a good idea.
I suppose another way to fix this would be to amend find_next_zero_bit so
that it doesn't return values larger than "size", but I don't know if
there's a reason for the current behavior.
Signed-Off-By: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The linux bitset operators (test_bit, set_bit etc) work on arrays of "unsigned
long". dm-log uses such bitsets but treats them as arrays of uint32_t, only
allocating and zeroing a multiple of 4 bytes (as 'clean_bits' is a uint32_t).
The patch below fixes this problem.
The problem is specific to 64-bit big endian machines such as s390x or ppc-64
and can prevent pvmove terminating.
In the simplest case, if "region_count" were (say) 30, then
bitset_size (below) would be 4 and bitset_uint32_count would be 1.
Thus the memory for this butset, after allocation and zeroing would
be
0 0 0 0 X X X X
On a bigendian 64bit machine, bit 0 for this bitset is in the 8th
byte! (and every bit that dm-log would use would be in the X area).
0 0 0 0 X X X X
^
here
which hasn't been cleared properly.
As the dm-raid1 code only syncs and counts regions which have a 0 in the
'sync_bits' bitset, and only finishes when it has counted high enough, a large
number of 1's among those 'X's will cause the sync to not complete.
It is worth noting that the code uses the same bitsets for in-memory and
on-disk logs. As these bitsets are host-endian and host-sized, this means
that they cannot safely be moved between computers with
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|