glusterfs.git/tests/bugs/ec, branch release-5

cluster/ec: fix fd reopen

2020-02-20T08:26:40+00:00

Currently EC tries to reopen fd's that have been opened while a brick
was down. This is done as part of regular write operations, just after
having acquired the locks, and it's sent as a sub-fop of the main write
fop.

There were two problems:

1. The reopen was attempted on all UP bricks, even if a previous lock
didn't succeed. This is incorrect because most probably the open will
fail.

2. If reopen is sent and fails, the error is propagated to the main
operation, causing it to fail when it shouldn't.

To fix this, we only attempt reopens on bricks where the current fop
owns a lock, and we prevent any error from being propagated to the main
fop.
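
The selection rule above can be sketched as follows. This is a minimal
Python model, not the EC code itself (which is C, under
xlators/cluster/ec): the per-brick bitmasks and the reopen() callback
are assumptions made for illustration.

```python
def bricks(mask):
    """Yield the brick indices whose bit is set in mask."""
    idx = 0
    while mask:
        if mask & 1:
            yield idx
        mask >>= 1
        idx += 1


def reopen_before_write(up, locked, fd_open, reopen):
    """Reopen the fd where needed and return the updated fd_open mask.

    up, locked and fd_open are bitmasks with one bit per brick.
    reopen(brick) is a hypothetical callback that raises OSError on failure.
    """
    # Only bricks that are up, hold our lock and still lack the fd qualify.
    candidates = up & locked & ~fd_open
    for brick in bricks(candidates):
        try:
            reopen(brick)
            fd_open |= 1 << brick
        except OSError:
            # A failed reopen is not propagated to the main write; the
            # brick simply stays without an open fd.
            pass
    return fd_open
```

With four bricks up, a lock held on bricks 0 to 2 and the fd open only
on brick 0, the sketch attempts reopens on bricks 1 and 2 only, and a
failure on brick 2 does not raise to the caller.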

To implement this behaviour, an argument used to indicate the minimum
number of required answers has been overloaded to also include some
flags. To make the change consistent, it has been necessary to rename
the argument, which means that a lot of files have been changed.
However, the rename introduces no functional changes.
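
One common way to overload a count argument with flags is to reserve
the low bits for the count and the high bits for the flags. The sketch
below illustrates that idea only; the mask width, the flag names and
their values are hypothetical and are not taken from the EC sources.

```python
EC_MIN_MASK = 0x00FF            # hypothetical: low byte holds the minimum
EC_FLAG_LOCKED_ONLY = 0x0100    # hypothetical: target only locked bricks
EC_FLAG_IGNORE_ERRORS = 0x0200  # hypothetical: do not propagate errors


def pack_fop_flags(minimum, flags=0):
    """Combine a minimum answer count and flag bits into one integer."""
    if not 0 <= minimum <= EC_MIN_MASK:
        raise ValueError("minimum does not fit in the count bits")
    if flags & EC_MIN_MASK:
        raise ValueError("flags overlap the count bits")
    return flags | minimum


def unpack_fop_flags(value):
    """Split a combined value back into (minimum, flags)."""
    return value & EC_MIN_MASK, value & ~EC_MIN_MASK
```

A caller that passes a plain count keeps working unchanged, because a
value with no flag bits set unpacks to the same count and no flags.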

This change has also uncovered a problem in the discard code, which
didn't correctly process requests of small sizes because no real discard
fop was being processed, only a write of zeros on some region. In this
case some fields of the fop remained uninitialized or with incorrect
values. To fix this, a new function has been created to simulate success
on a fop, and it's used in the discard case.

Thanks to Pranith for providing a test script that has also detected an
issue in this patch. This patch includes a small modification of this
script to force data to be written into bricks before stopping them.

Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
Fixes: bz#1805047
Signed-off-by: Xavi Hernandez

cluster/ec: honor contention notifications for partially acquired locks

2019-06-28T11:07:08+00:00

EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but some others not yet) we can receive notifications
from acquired bricks, which should be honored, since we may not receive
more notifications after that.

Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.

This fix takes into consideration the notifications received before
having completed the full lock acquisition. After that, the lock will
be released as soon as possible.
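
The behaviour described above can be modelled roughly as follows. This
is a simplified Python sketch under stated assumptions: the class, its
method names and the release_delay() helper are illustrative and do not
mirror the EC data structures.

```python
class PartialLock:
    """Toy model of a lock being acquired on several bricks."""

    def __init__(self, bricks):
        self.pending = set(bricks)   # bricks that have not granted yet
        self.granted = set()         # bricks that already granted
        self.release_asap = False    # set when contention was reported

    def on_granted(self, brick):
        self.pending.discard(brick)
        self.granted.add(brick)

    def on_contention(self, brick):
        # Honor the notification even if acquisition is still in progress,
        # since the brick may not send another one later.
        if brick in self.granted:
            self.release_asap = True

    @property
    def acquired(self):
        return not self.pending

    def release_delay(self, eager_timeout):
        """Seconds to keep the lock once the current operation finishes."""
        return 0.0 if self.release_asap else eager_timeout
```

In this model, a notification that arrives from brick 0 before bricks 1
and 2 have granted the lock is remembered, so the fully acquired lock is
released without waiting for the eager-lock timeout.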

Backport of:
> BUG: bz#1708156
> Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
> Signed-off-by: Xavi Hernandez 

Fixes: bz#1717282
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez 
Signed-off-by: Hari Gowtham

Land part 2 of clang-format changes

2018-09-12T12:22:45+00:00

Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu

All: run codespell on the code and fix issues.

2018-07-22T14:40:16+00:00

Please review, it's not always just the comments that were fixed.
Of course, I had to revert all calls to creat() that were changed
to create() ...

Only compile-tested!

Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5
updates: bz#1193929
Signed-off-by: Yaniv Kaul

cluster/ec: avoid delays in self-heal

2018-03-14T03:12:27+00:00

Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so they find nothing to do and wait for the next
sweep, which happens once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads are started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. That pending heal is
done by the index sweeper thread on the next round, one minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal runs in bursts and takes a long time to finish, especially
on volumes with many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
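
The restart rule can be illustrated with a small Python model. It is a
sketch only: list_index() and heal() are hypothetical callbacks, and
the real sweeper is a C thread that sleeps between rounds.

```python
def sweep_until_quiet(list_index, heal, max_rounds=100):
    """Run index sweeps back to back while directories keep being healed.

    list_index() returns a snapshot of the current index entries.
    heal(entry) heals one entry and returns True if it was a directory,
    in which case new index entries may have appeared behind the scan.
    Returns the number of sweeps performed.
    """
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        healed_directory = False
        for entry in list_index():
            if heal(entry):
                healed_directory = True
        if not healed_directory:
            # Nothing new can have been queued; wait for the next interval.
            break
    return rounds
```

With a tree where "/" contains "a" and "b", and "a" contains "c", the
model needs three consecutive sweeps to drain the index. Without the
restart, each of those sweeps would be separated by a one-minute wait.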

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1547662
Signed-off-by: Xavi Hernandez

socket: pollerr event shouldn't trigger socket_connect_finish

2016-09-19T13:51:09+00:00

If connect fails with any error other than EINPROGRESS, we cannot get
the error status using getsockopt (... SO_ERROR ... ). Hence we need
to remember the state of connect and take appropriate action in the
event_handler for the same.

As an added note, an event can come where poll_err is HUP and we have
poll_in as well (i.e. some status was written to the socket), so for
such cases we need to finish the connect, process the data and then
handle the poll_err, as is the case in the current code.
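
One reading of the event handling described above is sketched below in
Python. The state names, the action strings and the exact decision
table are assumptions made for illustration, not the socket transport's
code. The rc argument is an errno-style result such as the value
returned by Python's socket.connect_ex().

```python
import errno
from enum import Enum


class ConnectState(Enum):
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"


def classify_connect(rc):
    """Map the errno-style result of a non-blocking connect to a state."""
    if rc == 0:
        return ConnectState.DONE
    if rc == errno.EINPROGRESS:
        return ConnectState.IN_PROGRESS
    # Any other error cannot be fetched later via SO_ERROR, so it is
    # remembered here.
    return ConnectState.FAILED


def plan_actions(state, poll_in, poll_err):
    """Return the ordered actions for one poll event (illustrative)."""
    actions = []
    # A bare poll_err must not finish the connect; poll_err together
    # with poll_in still does, so that the pending data can be read.
    if state is ConnectState.IN_PROGRESS and (poll_in or not poll_err):
        actions.append("finish_connect")
    if poll_in and state is not ConnectState.FAILED:
        actions.append("process_data")
    if poll_err or state is ConnectState.FAILED:
        actions.append("handle_error")
    return actions
```

In this model a HUP that arrives together with readable data yields
finish_connect, process_data and handle_error in that order, while a
bare error event yields handle_error alone.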

Special thanks to Kaushal M & Raghavendra G for figuring out the issue.

Change-Id: Ic45ad59ff8ab1d0a9d2cab2c924ad940b9d38528
BUG: 1372356
Signed-off-by: Atin Mukherjee 
Signed-off-by: Shyam 
Reviewed-on: http://review.gluster.org/15440
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

tests: Fix spurious failures because of wrong shd up function

2016-08-31T18:01:39+00:00

Fixed the way the shd up check is done, to prevent a "self-heal daemon
not running" style error when the heal full command is executed.

Change-Id: I93c4a0da12316373d62cd4ea74432cd9bf2b090c
BUG: 1370053
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15341
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Anuradha Talur

tests: Fix get_pending_heal_count check in ec

2016-07-29T08:10:53+00:00

Continuation of http://review.gluster.org/#/c/14985.
Also renamed tests/bugs/disperse to tests/bugs/ec for a better
correlation to tests/basic/ec and xlators/cluster/ec

Change-Id: I662b3477c12af8a0b94597769e8f00f354b1168c
BUG: 1332054
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/15006
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Xavier Hernandez