glusterfs.git/xlators, branch v8dev

glusterd/shd: Change shd logfile to a unique name

2019-06-24T12:18:37+00:00

With the shd mux changes, shd was using a logfile
named after the volname of the first started volume.

This was creating a lot of confusion, as other volumes' data
was also being logged to a logfile with a different volume name.

With this change the logfile gets a unique name,
i.e. "/var/log/glusterfs/glustershd.log". This was the
logfile name before the shd mux changes.

Change-Id: I2b94c1f0b2cf3c9493505dddf873687755a46dda
fixes: bz#1721601
Signed-off-by: Mohammed Rafi KC

shd/mux: Fix race between mux_proc unlink and stop

2019-06-24T05:01:44+00:00

There is a small race window where we have a shd proc
without a connection. This happens when we stop the
last shd running on a process. The entry was removed from
the list outside of a lock, just after stopping the process.

So there is a window where we have stopped the process, but
the shd proc list still contains the entry.
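
One way to close a window like this is to unlink the entry while holding
the lock and stop the process only afterwards. The following is a minimal
sketch of that pattern, not the glusterd code; `mux_proc`, `proc_list`,
`proc_lock` and the helper functions are hypothetical names, and "stopping"
is modelled by clearing a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a multiplexed shd process entry. */
struct mux_proc {
    struct mux_proc *next;
    int svc_count; /* services still attached to this process */
    bool running;
};

static pthread_mutex_t proc_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mux_proc *proc_list = NULL;

static void proc_add(struct mux_proc *p)
{
    assert(p != NULL);
    pthread_mutex_lock(&proc_lock);
    p->next = proc_list;
    proc_list = p;
    pthread_mutex_unlock(&proc_lock);
}

/* Returns true if the entry is still reachable from the list. */
static bool proc_is_listed(const struct mux_proc *p)
{
    bool found = false;
    pthread_mutex_lock(&proc_lock);
    for (struct mux_proc *it = proc_list; it != NULL; it = it->next) {
        if (it == p) {
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&proc_lock);
    return found;
}

/* Detach one service. If it was the last one, unlink the entry under
 * the lock and stop the process only after the entry is gone.
 * Returns true when the process was stopped. */
static bool proc_detach(struct mux_proc *p)
{
    bool last = false;
    assert(p != NULL);
    pthread_mutex_lock(&proc_lock);
    if (p->svc_count > 0 && --p->svc_count == 0) {
        struct mux_proc **pp = &proc_list;
        while (*pp != NULL && *pp != p)
            pp = &(*pp)->next;
        if (*pp != NULL)
            *pp = p->next;
        p->next = NULL;
        last = true;
    }
    pthread_mutex_unlock(&proc_lock);
    if (last)
        p->running = false; /* models stopping the process */
    return last;
}
```

With this ordering no other thread can find the entry in the list once the
process has been stopped.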

Change-Id: Id82a82509e5cd72acac24e8b7b87197626525441
fixes: bz#1722541
Signed-off-by: Mohammed Rafi KC

cluster/ec: Prevent double pre-op xattrops

2019-06-22T05:06:15+00:00

Problem:
Race:
Thread-1                                    Thread-2
1) Does ec_get_size_version() to perform
   pre-op fxattrop as part of write-1
                                            2) Calls ec_set_dirty_flag() in
                                               ec_get_size_version() for write-2.
                                               This sets dirty[] to 1
3) Completes executing
   ec_prepare_update_cbk leading to
   ctx->dirty[] = '1'
                                            4) Takes LOCK(inode->lock) to check if there are
                                               any flags and sets dirty-flag because
                                               lock->waiting_flag is 0 now. This leads to
                                               fxattrop to increment on-disk dirty[] to '2'

At the end of the writes the file will be marked for heal even when it doesn't need heal.

Fix:
Perform ec_set_dirty_flag() and the other checks inside LOCK() to prevent dirty[] from
being marked as '1' in step 2) above.
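
The idea of doing the check and the flag update in one critical section can
be sketched as follows. This is a simplified illustration, not the ec code:
`inode_ctx_sketch` and `write_preop_sketch` are hypothetical, and the
on-disk fxattrop is modelled by a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified inode context. */
struct inode_ctx_sketch {
    pthread_mutex_t lock;
    bool dirty;             /* in-memory dirty flag */
    unsigned on_disk_dirty; /* stands in for the on-disk dirty[] counter */
};

/* Checks and sets the dirty flag in one critical section, so only the
 * first writer performs the (modelled) pre-op increment.
 * Returns true if this caller performed it. */
static bool write_preop_sketch(struct inode_ctx_sketch *ctx)
{
    bool issued = false;
    assert(ctx != NULL);
    pthread_mutex_lock(&ctx->lock);
    if (!ctx->dirty) {
        ctx->dirty = true;
        ctx->on_disk_dirty++;
        issued = true;
    }
    pthread_mutex_unlock(&ctx->lock);
    return issued;
}
```

Because the check and the update happen under the same lock, a second
writer sees the flag already set and does not increment the counter again.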

Updates bz#1593224
Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865
Signed-off-by: Pranith Kumar K

posix/ctime: Fix ctime upgrade issue

2019-06-21T11:09:32+00:00

Problem:
On an EC volume, during an upgrade from an older version where the
ctime feature is not enabled (or not present) to a newer
version where the ctime feature is available (enabled by default),
self-heal hangs and doesn't complete.

Cause:
The ctime feature has both client-side code (utime) and
server-side code (posix). The feature is driven from the client:
the server side should set the time attributes in the xattr only
if the client side sets the time in the frame. But posix
setattr/fsetattr was not doing that. When one of the server
nodes is updated, since ctime is enabled by default, it
starts setting the xattr on setattr/fsetattr on the updated node/brick.

On an EC volume the first two updated nodes (bricks) are not a
problem because there are 4 other bricks with consistent data.
However, once the third brick is updated, the new attribute (mdata xattr)
causes a metadata inconsistency on 3 bricks, which
prevents the file from being repaired.

Fix:
Don't create mdata xattr with utimes/utimensat system call.
Only update if already present.
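
The "update only if present" rule can be illustrated with an in-memory
model. This is a sketch under stated assumptions, not the posix xlator
code: `file_meta_sketch` and `mdata_update_if_present` are hypothetical,
and the mdata xattr is modelled by a flag plus a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a file's metadata on one brick. */
struct file_meta_sketch {
    bool has_mdata;   /* whether the mdata xattr exists */
    long mdata_ctime; /* value stored in it, if any */
};

/* Updates the stored time only when the attribute already exists.
 * It never creates the attribute. Returns true if an update happened. */
static bool mdata_update_if_present(struct file_meta_sketch *m, long ctime_val)
{
    assert(m != NULL);
    if (!m->has_mdata)
        return false;
    m->mdata_ctime = ctime_val;
    return true;
}
```

A brick that has never had the attribute stays unchanged, so it does not
diverge from bricks still running the older version.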

Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c
fixes: bz#1720201
Signed-off-by: Kotresh HR

WORM-Xlator: Avoid performing fsetxattr if fd is NULL

2019-06-21T04:19:11+00:00

If worm_create_cbk receives an error (op_ret == -1), fd will be NULL,
so performing fsetxattr would lead to a segfault and crash the
brick process. To avoid this, fsetxattr is performed only
if op_ret >= 0. If an error happens, we explicitly unwind.
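
The guard can be sketched as below. This is an illustration only, not the
worm xlator code: `fd_sketch`, `set_worm_xattr_sketch` and
`create_cbk_sketch` are hypothetical stand-ins, and unwinding is modelled
by returning op_ret to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an fd object. */
struct fd_sketch {
    int xattr_set;
};

/* Models the fsetxattr step; it dereferences fd, so fd must be valid. */
static int set_worm_xattr_sketch(struct fd_sketch *fd)
{
    assert(fd != NULL);
    fd->xattr_set = 1;
    return 0;
}

/* Create callback: touch fd only when the create succeeded. */
static int create_cbk_sketch(int op_ret, struct fd_sketch *fd)
{
    if (op_ret >= 0)
        return set_worm_xattr_sketch(fd);
    /* Error path: fd may be NULL, so unwind without using it. */
    return op_ret;
}
```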

Change-Id: Ie7f8a198add93e5cd908eb7029cffc834c3b58a6
fixes: bz#1717757
Signed-off-by: David Spisla

ec-heal: check file's gfid when deleting stale name

2019-06-20T21:24:17+00:00

A name-less lookup does not contain the parent's stat, so
it is hard to check whether the looked-up file is at the right path.

This patch changes to a named lookup and checks the file's gfid against
the expected gfid. If the gfid is different, the entry is marked ESTALE.
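
The comparison step can be sketched as follows, assuming a gfid is a
16-byte identifier. `check_gfid_sketch` and `GFID_LEN` are hypothetical
names for illustration, not the ec-heal code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define GFID_LEN 16 /* assumed size of a gfid in bytes */

/* Returns 0 when the gfid found by the named lookup matches the expected
 * one, and -ESTALE otherwise. */
static int check_gfid_sketch(const unsigned char *found,
                             const unsigned char *expected)
{
    assert(found != NULL && expected != NULL);
    return memcmp(found, expected, GFID_LEN) == 0 ? 0 : -ESTALE;
}
```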

fixes: bz#1702131
Change-Id: I2de20b10d680eed1e2fb1d3830b3b3dec4520dbf
Signed-off-by: Kinglong Mee

afr/read: Implement latency based read child selection

2019-06-20T12:30:59+00:00

Network latency is an important factor when selecting a read subvolume,
so this patch adds two new policies.

1) We measure the latency of a child during a GF_DUMP rpc call.
   Then use this latency to pick a read subvol having the least
   latency.

2) The second one is a hybrid mode that calculates the effective
   latency by multiplying the number of outstanding pending read
   requests by the latency, and chooses the child with the least value.
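
The two selection rules can be sketched as below. This is an illustration
under stated assumptions, not the afr code: `child_stat`,
`pick_least_latency` and `pick_hybrid` are hypothetical names, a negative
latency is assumed to mean "no measurement", and ties go to the lowest
index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-child statistics. */
struct child_stat {
    int64_t latency_usec;   /* measured latency; < 0 means unknown */
    uint64_t pending_reads; /* outstanding read requests */
};

/* Policy 1: pick the child with the least measured latency.
 * Returns the child index, or -1 if no child has a measurement. */
static int pick_least_latency(const struct child_stat *c, size_t n)
{
    int best = -1;
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (c[i].latency_usec < 0)
            continue;
        if (best < 0 || c[i].latency_usec < c[best].latency_usec)
            best = (int)i;
    }
    return best;
}

/* Policy 2 (hybrid): pick the child with the least
 * latency * pending_reads. Returns the index, or -1 if none qualify. */
static int pick_hybrid(const struct child_stat *c, size_t n)
{
    int best = -1;
    uint64_t best_cost = 0;
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (c[i].latency_usec < 0)
            continue;
        uint64_t cost = (uint64_t)c[i].latency_usec * c[i].pending_reads;
        if (best < 0 || cost < best_cost) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

Note that with a plain product, a child with no pending reads has an
effective latency of zero regardless of its measured latency; whether the
real policy adjusts for that is not stated here.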

Change-Id: Ia49c8a08ab61f7dcdad8b8950aa4d338e7accf97
fixes: #520
Signed-off-by: Mohammed Rafi KC

posix: fix crash in posix_cs_set_state

2019-06-20T11:52:12+00:00

Fixes: bz#1721474
Change-Id: Ic2a53fa3d1e9e23424c6898e0986f80d52c5e3f6
Signed-off-by: Susant Palai

encryption/crypt: remove from volume file

2019-06-20T11:51:33+00:00

The feature is not supported and was moved out of the codebase as of the
glusterfs-5.x release. It doesn't make sense to keep the code that
supports it.

For those who want to upgrade from a version supporting it to a higher
version, please do a 'gluster volume reset $VOL encryption reset' and
then continue with the upgrade process.

updates: bz#1648169
Change-Id: I8cf822c0d7195940bd37f6af2432a3cac68d44d1
Signed-off-by: Amar Tumballi

md-cache: only update generation for inode at upcall and NULL stat

2019-06-19T09:13:03+00:00

1. For parallel writes from nfs-ganesha, two fops carry two generations,
   but the fop replies may be returned out of order.
2. The inode md-cache timeout should not increase conf->generation.

With this patch,
1. A fop only gets the generation from the inode md-cache or conf; it does not increase it.
2. The generation is increased on upcall invalidate, ESTALE/ENOENT error
   invalidate, and a reply with a zeroed-out stat from write-behind.

Change-Id: I897ecaa143fd18bc024c1948c7d1a6f831fd53da
Updates: bz#1683594
Signed-off-by: Kinglong Mee