glusterfs.git/xlators/cluster, branch v3.10dev

dht, md-cache, upcall: Add invalidation of IATT when the layout changes

2016-08-31T06:08:54+00:00

Issue:
dht_layout is built only as a part of lookup, but the layout can be
modified by the rebalance process. Since every IO fop is preceded
by a lookup, a stale layout rarely causes problems today. With the
enhancements for aggressive caching of stats in md-cache, however,
lookups become less frequent and the stale layout issue is exposed
more often.

Solution:
Since stale layout is already an issue in dht, there is already
a plan to fix this at the dht layer, but that fix is not currently
planned for any release. Until it comes out, we can have a workaround
in which upcall sends a notification to md-cache when a layout xattr
is changed. As a part of the layout change notification, the existing
cache is invalidated and the next lookup fetches the latest layout.

This is not a foolproof solution, as there is still a window between the
layout change and the next lookup (after invalidation of stat) during which
the layout is stale. But until the final fix comes in, this reduces the
stale layout window.
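
The invalidation step can be illustrated with a minimal sketch. This is not
the md-cache or upcall code: struct cache_entry, on_xattr_upcall() and
can_serve_from_cache() are hypothetical names, and the layout xattr key used
here is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed key of the layout xattr, used only for this illustration. */
#define LAYOUT_XATTR "trusted.glusterfs.dht"

/* Hypothetical per-inode cache entry: a cached stat plus a validity flag. */
struct cache_entry {
    bool stat_valid;   /* true while the cached stat may be served */
    long cached_size;  /* stand-in for the cached iatt fields */
};

/* Called when an upcall reports that xattr `key` changed on the inode.
 * A layout change invalidates the cached stat, so the next operation
 * falls through to a real lookup, which rebuilds the layout.
 * Returns true if the entry was invalidated. */
static bool on_xattr_upcall(struct cache_entry *e, const char *key)
{
    assert(e != NULL && key != NULL);
    if (strcmp(key, LAYOUT_XATTR) != 0)
        return false;        /* unrelated xattr: keep the cache */
    e->stat_valid = false;   /* force a fresh lookup */
    return true;
}

/* A stat can be served from the cache only while the entry is valid. */
static bool can_serve_from_cache(const struct cache_entry *e)
{
    return e->stat_valid;
}
```

The stale window mentioned above is the time between the layout change on
the brick and the moment on_xattr_upcall() runs on the client.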

Change-Id: Iacf871a38b35880c1fc0bc68fe7ce291265e71d4
BUG: 1369638
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15300
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Raghavendra G

dht/tiering: fix unused variable warnings/errors

2016-08-30T18:45:03+00:00

http://review.gluster.org/14085 fixes a/the "leak" - via the
generated rpc/xdr headers - of pragmas that mask these warnings.

However 14085 won't pass the smoke test until all the warnings are
fixed.

Change-Id: I367a737570dd7d2f6cc25f4bf4299d31bb6826aa
BUG: 1369124
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: http://review.gluster.org/15242
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos 
Smoke: Gluster Build System 
Reviewed-by: Prashanth Pai 
Reviewed-by: Dan Lambright

glusterd : Introduce reset brick

2016-08-30T02:55:53+00:00

The command is essentially a replace-brick in which the src and
dst bricks are the same.

Usage:
gluster v reset-brick   start
This command kills the brick to be reset. Once this command is run,
the admin can perform any other manual operations that are needed,
such as configuring some options for the brick. Once this is done,
resetting the brick can be continued with the following command.

gluster v reset-brick    commit {force}

This does the job of resetting the brick. The 'force' option should be
used when the brick already contains the volinfo id.

Problem: On doing a disk-replacement of a brick in a replicate volume,
the following 2 scenarios may occur:

a) there is a chance that reads are served from this replaced-disk brick,
   which leads to empty reads.
b) potential data loss if the next writes succeed only on the replaced
   brick, and heal is done to other bricks from this one.

Solution: After disk-replacement, make sure that the reset-brick command
is run for that brick so that pending markers are set for the brick and it
is not chosen as source for reads and heal. But, as of now, replace-brick
for the same brick-path is not allowed. In order to fix the above
mentioned problem, a same brick-path replace-brick is needed.
With this patch reset-brick commit {force} will be allowed even when
source and destination  are identical as long as
1) destination brick is not alive
2) source and destination brick have the same brick uuid and path.
Also, the destination brick after replace-brick will use the same port
as the source brick.
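
The two conditions above can be written as a single predicate. The sketch
below is illustrative only and is not the glusterd code: struct brick and
both function names are hypothetical, and a brick is reduced to just the
fields the conditions mention.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified description of a brick. */
struct brick {
    const char *uuid;  /* brick uuid */
    const char *path;  /* brick path */
    bool alive;        /* true if the brick process is running */
};

/* Condition 2: same brick uuid and same path. */
static bool same_brick(const struct brick *a, const struct brick *b)
{
    return strcmp(a->uuid, b->uuid) == 0 && strcmp(a->path, b->path) == 0;
}

/* Decide whether a commit with identical source and destination may
 * proceed: the destination must not be alive (condition 1) and both
 * bricks must share uuid and path (condition 2). */
static bool reset_brick_commit_allowed(const struct brick *src,
                                       const struct brick *dst)
{
    assert(src != NULL && dst != NULL);
    return !dst->alive && same_brick(src, dst);
}
```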

Change-Id: I440b9e892ffb781ea4b8563688c3f85c7a7c89de
BUG: 1266876
Signed-off-by: Anuradha Talur 
Reviewed-on: http://review.gluster.org/12250
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ashish Pandey 
Reviewed-by: Pranith Kumar Karampuri

afr: fix unused variable warnings/errors

2016-08-29T16:21:23+00:00

http://review.gluster.org/14085 fixes a/the "leak" - via the
generated rpc/xdr headers - of pragmas that mask these warnings.

However 14085 won't pass the smoke test until all the warnings are
fixed.

Change-Id: I98e3308a2548ae095048caa99c86edec15b5e782
BUG: 1369124
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: http://review.gluster.org/15241
CentOS-regression: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
Smoke: Gluster Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Anuradha Talur 
Reviewed-by: Pranith Kumar Karampuri 
NetBSD-regression: NetBSD Build System

dht: Implement ipc fop

2016-08-27T12:48:36+00:00

ipc is used by md-cache to communicate the list of xattrs that
it is caching to the upcall xlator. Hence implement this in
dht, such that it winds to all the bricks if the ipc op is
GF_IPC_MDC_TARGET_UPCALL. The ipc should not fail if any of
the bricks is down, as md-cache will replay the ipc later when
the brick comes back up.
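
The fan-out behaviour described above can be sketched as follows. This is a
simplified model, not the dht implementation: struct brick, fanout_ipc() and
IPC_TARGET_UPCALL are hypothetical stand-ins, and how other ipc ops are
handled is left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op code standing in for GF_IPC_MDC_TARGET_UPCALL. */
enum { IPC_TARGET_UPCALL = 1 };

/* Hypothetical brick model: liveness plus a counter of delivered ipcs. */
struct brick {
    bool up;
    int ipc_received;
};

/* Send the ipc to every brick. Bricks that are down are skipped instead
 * of being treated as errors, on the assumption (taken from the commit
 * message) that md-cache replays the ipc once they come back up.
 * Returns 0 on success, -1 for an op this sketch does not handle. */
static int fanout_ipc(struct brick *bricks, size_t n, int op)
{
    assert(bricks != NULL || n == 0);
    if (op != IPC_TARGET_UPCALL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (bricks[i].up)
            bricks[i].ipc_received++;
    }
    return 0;
}
```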

Change-Id: Ica551a550c04cbb1240c0d211fe831c2e5eb6017
BUG: 1211863
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15225
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G

cluster/ec: Use locks for opendir

2016-08-25T13:48:33+00:00

Problem:
In some cases we see that readdir keeps winding to the brick that doesn't have
any blocked locks, i.e. the first brick. This leads the client to assume that
there are no blocked locks on the inode, so it won't give away the lock. Other
clients end up blocked on the lock as if the command hung.

Fix:
The proper way to fix this issue is to use the infra present in
http://review.gluster.org/14736. This is a stop-gap fix in which we start
taking inodelks in opendir; these go to all the bricks, so any contention
is detected.

BUG: 1346719
Change-Id: I91109107a26f6535b945ac476338e9f21dc31eb9
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15309
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Ashish Pandey

quotad: fix potential buffer overflows

2016-08-25T12:18:09+00:00

This converts sprintf to gf_asprintf in the following components:
* quotad.c
* dht
* afr
* protocol/client
* rpc/rpc-lib
* rpc/rpc-transport
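
To illustrate why an allocating formatter avoids the overflow that sprintf
into a fixed buffer can cause, here is a minimal sketch in standard C. It is
not gf_asprintf itself: alloc_sprintf() is a hypothetical stand-in that
measures the output first and then allocates exactly that much.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical asprintf-style helper: formats into a freshly allocated
 * buffer of exactly the required size. Returns the string length, or -1
 * on error. The caller frees *out. */
static int alloc_sprintf(char **out, const char *fmt, ...)
{
    va_list ap, ap2;
    int len;

    assert(out != NULL && fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);           /* first pass: measure */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return -1;
    }

    *out = malloc((size_t)len + 1);
    if (*out == NULL) {
        va_end(ap2);
        return -1;
    }
    vsnprintf(*out, (size_t)len + 1, fmt, ap2);  /* second pass: write */
    va_end(ap2);
    return len;
}
```

Because the buffer is sized from the formatted output, a longer-than-expected
argument makes the allocation larger instead of writing past the end of a
fixed array.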

Change-Id: If8a267bab3d91003bdef3a92664077a0136745ee
BUG: 1332073
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/14102
Tested-by: Manikandan Selvaganesh 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Manikandan Selvaganesh

cluster/ec: Do multi-threaded self-heal

2016-08-24T22:24:22+00:00

BUG: 1368451
Change-Id: I5d6b91d714ad6906dc478a401e614115c89a8fbb
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15083
Smoke: Gluster Build System 
Reviewed-by: Ashish Pandey 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/afr: Give option to do consistent-io

2016-08-22T20:55:42+00:00

Problem:
When tiering/rebalance does migrations and afr with 2-way replica is in the
picture, the migration can read stale data if the source brick goes down, and
write that stale data to the destination. Deleting the source file after the
migration then leads to permanent loss of data.

Fix:
Rebalance/tiering should migrate only when the data is definitely not stale. So
introduce an option in afr called consistent-io which will be enabled in
migration daemons.
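
One conservative way such an option could behave is sketched below. The
commit message does not spell out the rule, so this is an assumption made
for illustration: with the option on, IO is refused unless every replica is
up, so a read can never be served from a copy that missed writes.
check_consistent_io() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Returns 0 if IO may proceed, otherwise an errno value.
 * Assumed rule: with consistent_io enabled, all child_count replicas
 * must be up. Without it, one replica that is up is enough. */
static int check_consistent_io(bool consistent_io, int up_count,
                               int child_count)
{
    assert(up_count >= 0 && up_count <= child_count);
    if (up_count == 0)
        return ENOTCONN;   /* nothing to talk to */
    if (consistent_io && up_count != child_count)
        return ENOTCONN;   /* some replica is down: data may be stale */
    return 0;
}
```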

BUG: 1306398
Change-Id: I750f65091cc70a3ed4bf3c12f83d0949af43920a
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/13425
Reviewed-by: Anuradha Talur 
Reviewed-by: Krutika Dhananjay 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order

2016-08-22T09:38:36+00:00

When the bricks are brought offline and then online in cyclic
order while writes are in progress on a file, thanks to inode
refresh in write txns, AFR will mostly fail the write attempt
when the only good copy is offline. However, there is still a
remote possibility that the file will run into split-brain if
the brick that has the lone good copy goes offline *after* the
inode refresh but *before* the write txn completes (I call it
in-flight split-brain in the patch for ease of reference),
requiring intervention from admin to resolve the split-brain
before the IO can resume normally on the file. To get around this,
the patch does the following things:
i) retains the dirty xattrs on the file
ii) avoids marking the last of the good copies as bad (or accused)
    in case it is the one to go down during the course of a write.
iii) fails that particular write with the appropriate errno.

This way, we still have one good copy left despite the split-brain situation;
when it is back online, it will be chosen as the source for the heal.
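
The three steps can be modelled with a small sketch. It is a simplification
and not the AFR transaction code: struct txn_result and post_write() are
hypothetical, the per-copy state is reduced to boolean arrays, and EIO is
only an assumed choice for the "appropriate errno".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the post-write decision. */
struct txn_result {
    bool keep_dirty;  /* i) retain the dirty xattrs */
    int op_errno;     /* iii) 0 on success, else errno for the write */
};

/* was_good[i]: copy i was good at inode refresh time.
 * failed[i]:   the write failed on copy i.
 * accuse[i]:   (out) copy i should be marked bad. */
static struct txn_result post_write(const bool *was_good, const bool *failed,
                                    bool *accuse, size_t n)
{
    struct txn_result r = { false, 0 };
    bool good_survived = false;

    assert(was_good != NULL && failed != NULL && accuse != NULL);
    for (size_t i = 0; i < n; i++) {
        if (was_good[i] && !failed[i])
            good_survived = true;
    }
    /* ii) If no good copy took the write, accuse nobody, so the last
     * good copy is not marked bad. Otherwise accuse the failed copies. */
    for (size_t i = 0; i < n; i++)
        accuse[i] = good_survived && failed[i];

    if (!good_survived) {
        r.keep_dirty = true;
        r.op_errno = EIO;  /* assumed errno for this sketch */
    }
    return r;
}
```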

Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
BUG: 1363721
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15080
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Oleksandr Natalenko 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri