<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/replicate, branch v4.1.7</title>
<subtitle>GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>afr: assign gfid during name heal when no 'source' is present</title>
<updated>2018-12-03T12:56:09+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-12-03T12:47:34+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=1eef0a0f398413145063b1b8690213ca3aa31800'/>
<id>1eef0a0f398413145063b1b8690213ca3aa31800</id>
<content type='text'>
Backport of https://review.gluster.org/#/c/glusterfs/+/21297/

Problem:
If parent dir is in split-brain or has dirty xattrs set, and the file
has gfid missing on one of the bricks, then name heal won't assign the
gfid.

Fix:
Use the brick we select the gfid from as the 'source'.

Note: Problem was found while trying to debug a split-brain issue on
Cynthia Zhou's setup.

fixes: bz#1655561
Change-Id: Id088d4f0fb017aa35122de426654194e581ed742
Reported-by: Cynthia Zhou &lt;cynthia.zhou@nokia-sbell.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of https://review.gluster.org/#/c/glusterfs/+/21297/

Problem:
If parent dir is in split-brain or has dirty xattrs set, and the file
has gfid missing on one of the bricks, then name heal won't assign the
gfid.

Fix:
Use the brick we select the gfid from as the 'source'.

Note: Problem was found while trying to debug a split-brain issue on
Cynthia Zhou's setup.

fixes: bz#1655561
Change-Id: Id088d4f0fb017aa35122de426654194e581ed742
Reported-by: Cynthia Zhou &lt;cynthia.zhou@nokia-sbell.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
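A minimal C sketch of the idea, assuming hypothetical helper and variable names (not the actual patch): when the parent has no clean source, fall back to the brick whose reply supplied the gfid and treat it as the source for the name heal.
<pre>
#include &lt;uuid/uuid.h&gt;

/* Illustrative sketch: pick a gfid from the first brick that has one and
 * use that brick as the 'source', even when the parent dir is in
 * split-brain or carries dirty xattrs. */
static int
pick_gfid_source(uuid_t *gfids, int child_count, uuid_t out_gfid)
{
    int i;

    for (i = 0; i &lt; child_count; i++) {
        if (!uuid_is_null(gfids[i])) {
            uuid_copy(out_gfid, gfids[i]);
            return i;   /* index of the brick to use as source */
        }
    }
    return -1;          /* no brick has a gfid; nothing to assign */
}
</pre>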
</div>
</content>
</entry>
<entry>
<title>tests: check for shd up status in bug-1637802-arbiter-stale-data-heal-lock.t</title>
<updated>2018-10-22T16:36:03+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-10-21T12:02:52+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=b63dfd84fc8b3e08e3f005f71bf493c633452612'/>
<id>b63dfd84fc8b3e08e3f005f71bf493c633452612</id>
<content type='text'>
Problem:
https://review.gluster.org/#/c/glusterfs/+/21427/ seems to be failing
this .t spuriously. On checking one of the failure logs, I see:

22:05:44 Launching heal operation to perform index self heal on volume patchy has been unsuccessful:
22:05:44 Self-heal daemon is not running. Check self-heal daemon log file.
22:05:44 not ok 20 , LINENUM:38

In glusterd log:
[2018-10-18 22:05:44.298832] E [MSGID: 106301] [glusterd-syncop.c:1352:gd_stage_op_phase] 0-management: Staging of operation 'Volume Heal' failed on localhost : Self-heal daemon is not running. Check self-heal daemon log file

But the tests which precede this one check, via a statedump, whether the shd is
connected to the bricks, and they have succeeded and even started
healing. From glustershd.log:

[2018-10-18 22:05:40.975268] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-patchy-replicate-0: Completed data selfheal on 3b83d2dd-4cf2-4ea3-a33e-4275be40f440. sources=[0] 1  sinks=2

So the only reason I can see for launching heal via CLI failing is a race where
shd has been spawned but glusterd has not yet updated in-memory that it is up,
and hence the CLI fails.

Fix:
Check for shd up status before launching heal via CLI

Change-Id: Ic88abf14ad3d51c89cb438db601fae4df179e8f4
fixes: bz#1641761
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 3dea105556130abd4da0fd3f8f2c523ac52398d1)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
https://review.gluster.org/#/c/glusterfs/+/21427/ seems to be failing
this .t spuriously. On checking one of the failure logs, I see:

22:05:44 Launching heal operation to perform index self heal on volume patchy has been unsuccessful:
22:05:44 Self-heal daemon is not running. Check self-heal daemon log file.
22:05:44 not ok 20 , LINENUM:38

In glusterd log:
[2018-10-18 22:05:44.298832] E [MSGID: 106301] [glusterd-syncop.c:1352:gd_stage_op_phase] 0-management: Staging of operation 'Volume Heal' failed on localhost : Self-heal daemon is not running. Check self-heal daemon log file

But the tests which precede this one check, via a statedump, whether the shd is
connected to the bricks, and they have succeeded and even started
healing. From glustershd.log:

[2018-10-18 22:05:40.975268] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-patchy-replicate-0: Completed data selfheal on 3b83d2dd-4cf2-4ea3-a33e-4275be40f440. sources=[0] 1  sinks=2

So the only reason I can see for launching heal via CLI failing is a race where
shd has been spawned but glusterd has not yet updated in-memory that it is up,
and hence the CLI fails.

Fix:
Check for shd up status before launching heal via CLI

Change-Id: Ic88abf14ad3d51c89cb438db601fae4df179e8f4
fixes: bz#1641761
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 3dea105556130abd4da0fd3f8f2c523ac52398d1)
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: prevent winding inodelks twice for arbiter volumes</title>
<updated>2018-10-10T12:55:44+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-10-10T12:27:33+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=5b1a94468863451d1762063e954785f4ef065374'/>
<id>5b1a94468863451d1762063e954785f4ef065374</id>
<content type='text'>
Backport of https://review.gluster.org/#/c/glusterfs/+/21380/

Problem:
In an arbiter volume, if there is a pending data heal of a file only on
arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks
it only once, leaving behind a stale lock on the brick. This causes
the next write to the file to hang.

Fix:
Fix the code-bug to take the lock only once. This bug was introduced in master
with commit eb472d82a083883335bc494b87ea175ac43471ff

Thanks to  Pranith Kumar K &lt;pkarampu@redhat.com&gt; for finding the RCA.

fixes: bz#1637953
Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of https://review.gluster.org/#/c/glusterfs/+/21380/

Problem:
In an arbiter volume, if there is a pending data heal of a file only on
arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks
it only once, leaving behind a stale lock on the brick. This causes
the next write to the file to hang.

Fix:
Fix the code-bug to take the lock only once. This bug was introduced in master
with commit eb472d82a083883335bc494b87ea175ac43471ff

Thanks to  Pranith Kumar K &lt;pkarampu@redhat.com&gt; for finding the RCA.

fixes: bz#1637953
Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
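A rough, hypothetical C sketch of the shape of the fix (names invented for illustration): remember that the inodelk is already held so it is wound exactly once, and the single unlock at the end does not leave a stale lock behind.
<pre>
#include &lt;stdbool.h&gt;

/* Illustrative per-heal context; 'locked' guards against winding the
 * inodelk a second time for the same self-heal. */
struct heal_ctx {
    bool locked;
};

static int
take_inodelk_once(struct heal_ctx *ctx, int (*wind_inodelk)(void))
{
    if (ctx-&gt;locked)
        return 0;               /* already held; do not wind again */
    if (wind_inodelk() != 0)
        return -1;
    ctx-&gt;locked = true;
    return 0;
}
</pre>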
</div>
</content>
</entry>
<entry>
<title>afr: fix incorrect reporting of directory split-brain</title>
<updated>2018-10-05T14:43:20+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-09-27T12:13:34+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=e3e13d2d727bab46ce168c4a3b4cce2d476638ca'/>
<id>e3e13d2d727bab46ce168c4a3b4cce2d476638ca</id>
<content type='text'>
Backport of https://review.gluster.org/#/c/glusterfs/+/21135/

Problem:
When a directory has dirty xattrs due to failed post-ops or when
replace/reset brick is performed, AFR does a conservative merge as
expected, but heal-info reports it as split-brain because there are no
clear sources.

Fix:
Modify the pending flag to contain information about pending heals and
split-brains. For directories, if the split-brain flag is not set, just show
them as needing heal and not as being in split-brain.

Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383
fixes: bz#1633634
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of https://review.gluster.org/#/c/glusterfs/+/21135/

Problem:
When a directory has dirty xattrs due to failed post-ops or when
replace/reset brick is performed, AFR does a conservative merge as
expected, but heal-info reports it as split-brain because there are no
clear sources.

Fix:
Modify the pending flag to contain information about pending heals and
split-brains. For directories, if the split-brain flag is not set, just show
them as needing heal and not as being in split-brain.

Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383
fixes: bz#1633634
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
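A hedged C sketch of the reporting rule (flag names and status strings are invented for illustration): keep separate bits for 'heal pending' and 'split-brain' so a conservatively merged directory is reported as needing heal rather than as split-brain.
<pre>
/* Illustrative flag bits, not the actual AFR definitions. */
#define PFLAG_PENDING_HEAL  (1 &lt;&lt; 0)
#define PFLAG_SPLIT_BRAIN   (1 &lt;&lt; 1)

static const char *
heal_info_status(unsigned int pflag)
{
    if (pflag &amp; PFLAG_SPLIT_BRAIN)
        return "split-brain";
    if (pflag &amp; PFLAG_PENDING_HEAL)
        /* A dirty or conservatively merged directory lands here instead
         * of being mislabelled as split-brain. */
        return "heal-pending";
    return "clean";
}
</pre>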
</div>
</content>
</entry>
<entry>
<title>afr: heal gfids when file is not present on all bricks</title>
<updated>2018-07-09T15:18:31+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-06-14T07:29:06+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=6e4ca3168696fd85fec5a4ad29ecdbee4d2b2101'/>
<id>6e4ca3168696fd85fec5a4ad29ecdbee4d2b2101</id>
<content type='text'>
commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression
wherein if a file is present in only 1 brick of a replica *and* doesn't
have a gfid associated with it, it doesn't get healed upon the next
lookup from the client. Fix it.

Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599
fixes: bz#1597117
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit eb472d82a083883335bc494b87ea175ac43471ff)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression
wherein if a file is present in only 1 brick of a replica *and* doesn't
have a gfid associated with it, it doesn't get healed upon the next
lookup from the client. Fix it.

Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599
fixes: bz#1597117
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit eb472d82a083883335bc494b87ea175ac43471ff)
</pre>
</div>
</content>
</entry>
<entry>
<title>storage/posix: Fix posix_symlinks_match()</title>
<updated>2018-07-02T17:14:07+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-06-26T10:28:02+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=0894e1d3af83228fb02310e61c1dafea2dc56ef9'/>
<id>0894e1d3af83228fb02310e61c1dafea2dc56ef9</id>
<content type='text'>
1) snprintf into linkname_expected should happen with PATH_MAX
2) the comparison with linkname_actual should be against the complete
   string linkname_expected

fixes bz#1595524
Change-Id: Ic3b3c362dc6c69c046b9a13e031989be47ecff14
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
(cherry picked from commit 3099d3e6ba81d3e1abf37385b13aabf5837b9c5e)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1) snprintf into linkname_expected should happen with PATH_MAX
2) the comparison with linkname_actual should be against the complete
   string linkname_expected

fixes bz#1595524
Change-Id: Ic3b3c362dc6c69c046b9a13e031989be47ecff14
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
(cherry picked from commit 3099d3e6ba81d3e1abf37385b13aabf5837b9c5e)
</pre>
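A hedged C sketch of the two points (buffer and parameter names are illustrative): size the snprintf by the full destination buffer (PATH_MAX) and compare the complete expected string against the actual link target.
<pre>
#include &lt;limits.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Illustrative: build the expected symlink target into a PATH_MAX-sized
 * buffer and compare the whole strings instead of only a prefix. */
static bool
symlinks_match(const char *linkname_actual, const char *pgfid_path,
               const char *name)
{
    char linkname_expected[PATH_MAX];

    snprintf(linkname_expected, PATH_MAX, "%s/%s", pgfid_path, name);
    return strcmp(linkname_expected, linkname_actual) == 0;
}
</pre>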
</div>
</content>
</entry>
<entry>
<title>afr: fix bug-1363721.t failure</title>
<updated>2018-05-25T12:57:45+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-05-18T10:08:29+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=c6f21697a52200bbbf9af05dff2b9c3ae0a99b43'/>
<id>c6f21697a52200bbbf9af05dff2b9c3ae0a99b43</id>
<content type='text'>
Problem:
In the .t, when the only good brick was brought down, writes on the fd were
still succeeding on the bad bricks. The inflight split-brain check was
marking the write as failure but since the write succeeded on all the
bad bricks, afr_txn_nothing_failed() was set to true and we were
unwinding writev with success to DHT and then catching the failure in
post-op in the background.

Fix:
Don't wind the FOP phase if the write_subvol (which is populated with readable
subvols obtained in pre-op cbk) does not have at least 1 good brick which was up
when the transaction started.

Note: This fix is not related to brick multiplexing. I ran the .t
10 times with this fix and brick-mux enabled without any failures.

Change-Id: I915c9c366aa32cd342b1565827ca2d83cb02ae85
updates: bz#1581548
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 985a1d15db910e012ddc1dcdc2e333cc28a9968b)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
In the .t, when the only good brick was brought down, writes on the fd were
still succeeding on the bad bricks. The inflight split-brain check was
marking the write as failure but since the write succeeded on all the
bad bricks, afr_txn_nothing_failed() was set to true and we were
unwinding writev with success to DHT and then catching the failure in
post-op in the background.

Fix:
Don't wind the FOP phase if the write_subvol (which is populated with readable
subvols obtained in pre-op cbk) does not have at least 1 good brick which was up
when the transaction started.

Note: This fix is not related to brick multiplexing. I ran the .t
10 times with this fix and brick-mux enabled without any failures.

Change-Id: I915c9c366aa32cd342b1565827ca2d83cb02ae85
updates: bz#1581548
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 985a1d15db910e012ddc1dcdc2e333cc28a9968b)
</pre>
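A hedged C sketch of the guard (the bitmask layout is purely illustrative): before winding the FOP phase, require at least one brick that is both readable per the pre-op and was up when the transaction started.
<pre>
#include &lt;stdbool.h&gt;

/* Illustrative: each bit of 'write_subvol' marks a readable brick from
 * the pre-op cbk, each bit of 'up_bricks' marks a brick that was up when
 * the transaction started. */
static bool
can_wind_fop(unsigned int write_subvol, unsigned int up_bricks)
{
    /* Fail the write up front instead of letting it "succeed" on bad
     * bricks only and catching the failure later in post-op. */
    return (write_subvol &amp; up_bricks) != 0;
}
</pre>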
</div>
</content>
</entry>
<entry>
<title>afr: fixes to afr-eager locking</title>
<updated>2018-04-18T07:49:12+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-04-16T10:08:34+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=7b0781a45bba826a790aa4d5a125693ac2be28ab'/>
<id>7b0781a45bba826a790aa4d5a125693ac2be28ab</id>
<content type='text'>
1. If pre-op fails on all bricks, set lock-&gt;release to true in
afr_handle_lock_acquire_failure so that the GF_ASSERT in afr_unlock() does not
crash.

2. Added a missing 'return' after handling pre-op failure in
afr_transaction_perform_fop(), fixing a use-after-free issue.

Change-Id: If0627a9124cb5d6405037cab3f17f8325eed2d83
fixes: bz#1561129
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1. If pre-op fails on all bricks, set lock-&gt;release to true in
afr_handle_lock_acquire_failure so that the GF_ASSERT in afr_unlock() does not
crash.

2. Added a missing 'return' after handling pre-op failure in
afr_transaction_perform_fop(), fixing a use-after-free issue.

Change-Id: If0627a9124cb5d6405037cab3f17f8325eed2d83
fixes: bz#1561129
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
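A hedged C-style sketch of the two fixes (structs and helper names are invented for illustration): mark the lock for release when pre-op fails on all bricks, and return right after handling the failure so the torn-down transaction is not used any further.
<pre>
#include &lt;stdbool.h&gt;

struct lock { bool release; };
struct txn  { struct lock *lock; bool preop_failed_on_all; };

/* Fix 1: flag the lock for release so the later unlock path sees a
 * consistent state instead of tripping an assertion. */
static void
handle_lock_acquire_failure(struct txn *t)
{
    t-&gt;lock-&gt;release = true;
    /* ... unwind the fop with failure ... */
}

/* Fix 2: return immediately after handling the failure; falling through
 * would keep using a transaction that is already being torn down. */
static void
perform_fop(struct txn *t)
{
    if (t-&gt;preop_failed_on_all) {
        handle_lock_acquire_failure(t);
        return;                 /* the missing 'return' */
    }
    /* ... wind the fop phase ... */
}
</pre>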
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Make AFR eager-locking similar to EC</title>
<updated>2018-03-14T13:32:35+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-01-31T11:11:14+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=346714305f9de30d5f78494091770c1555c601bb'/>
<id>346714305f9de30d5f78494091770c1555c601bb</id>
<content type='text'>
Problem:
1) AFR's eager-lock only works for data transactions.
2) When there are conflicting writes, a write with a conflicting region initiates
unlock of the eager-lock, leading to extra pre-ops and post-ops on the file. When
eager-lock goes off, it leads to extra fsyncs for random-write workloads in AFR.

Solution (modeled after EC):
In EC, when there is a conflicting write, it waits for the current write to
complete before it winds the conflicting write. This leads to better utilization
of network and disk, because we will not be doing extra xattrops, FSYNCs and
inodelk/unlock. Moved fd-based counters to inode-based counters.

I tried to model the solution on EC's locking, but it is not identical
because AFR had to keep backward compatibility.

Lifecycle of lock:
==================
The first transaction is added to the inode-&gt;owners list and an inodelk is
sent on the wire. All subsequent transactions are put in the inode-&gt;waiters
list until the first transaction completes the inodelk and [f]xattrop fully.
Once the [f]xattrop also completes, every request in the inode-&gt;waiters list
is checked for conflicts with the existing locks in the inode-&gt;owners list;
those without a conflict are added to the inode-&gt;owners list and resumed to
carry out their transaction. When these transactions complete the fop phase
they are moved to the inode-&gt;post_op list and the transactions that were
paused because of conflicts are resumed. Post-op and unlock are not issued on
the wire until it is the last transaction on that inode. When the last
transaction has to perform post-op, it can choose to sleep for the
delayed-post-op-secs value. During that time, if any other transaction comes,
it wakes up the sleeping transaction, takes over ownership of the lock, and
the cycle continues. If delayed-post-op-secs expires, the timer thread wakes
up the sleeping transaction, which sets lock-&gt;release to true and starts
doing post-op and then unlock. During this time, if any other transactions
come, they are put in the inode-&gt;frozen list. Once the previous unlock
completes, the frozen list is moved to the waiters list, the first element of
the waiters list is moved to the owners list, the lock is attempted, and the
cycle continues. This is the general idea. There is logic at the time of
delaying, at the time of a new transaction, and in the flush fop to wake up
existing sleeping transactions or to choose whether to delay a transaction,
etc., which is subject to change based on future enhancements.

Fixes: #418
BUG: 1549606
Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
1) AFR's eager-lock only works for data transactions.
2) When there are conflicting writes, a write with a conflicting region initiates
unlock of the eager-lock, leading to extra pre-ops and post-ops on the file. When
eager-lock goes off, it leads to extra fsyncs for random-write workloads in AFR.

Solution (modeled after EC):
In EC, when there is a conflicting write, it waits for the current write to
complete before it winds the conflicting write. This leads to better utilization
of network and disk, because we will not be doing extra xattrops, FSYNCs and
inodelk/unlock. Moved fd-based counters to inode-based counters.

I tried to model the solution on EC's locking, but it is not identical
because AFR had to keep backward compatibility.

Lifecycle of lock:
==================
The first transaction is added to the inode-&gt;owners list and an inodelk is
sent on the wire. All subsequent transactions are put in the inode-&gt;waiters
list until the first transaction completes the inodelk and [f]xattrop fully.
Once the [f]xattrop also completes, every request in the inode-&gt;waiters list
is checked for conflicts with the existing locks in the inode-&gt;owners list;
those without a conflict are added to the inode-&gt;owners list and resumed to
carry out their transaction. When these transactions complete the fop phase
they are moved to the inode-&gt;post_op list and the transactions that were
paused because of conflicts are resumed. Post-op and unlock are not issued on
the wire until it is the last transaction on that inode. When the last
transaction has to perform post-op, it can choose to sleep for the
delayed-post-op-secs value. During that time, if any other transaction comes,
it wakes up the sleeping transaction, takes over ownership of the lock, and
the cycle continues. If delayed-post-op-secs expires, the timer thread wakes
up the sleeping transaction, which sets lock-&gt;release to true and starts
doing post-op and then unlock. During this time, if any other transactions
come, they are put in the inode-&gt;frozen list. Once the previous unlock
completes, the frozen list is moved to the waiters list, the first element of
the waiters list is moved to the owners list, the lock is attempted, and the
cycle continues. This is the general idea. There is logic at the time of
delaying, at the time of a new transaction, and in the flush fop to wake up
existing sleeping transactions or to choose whether to delay a transaction,
etc., which is subject to change based on future enhancements.

Fixes: #418
BUG: 1549606
Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
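A hedged C sketch of the per-inode lock context described above (the list type and field names are illustrative, not the real structures): transactions migrate between the owners, waiters, post_op and frozen lists over the life of one eager lock.
<pre>
#include &lt;stdbool.h&gt;

/* Stand-in for gluster's doubly-linked list head. */
struct list_head { struct list_head *next, *prev; };

/* One eager lock per inode; transactions move between these lists:
 *   waiters -&gt; owners  : no conflict with current owners, fop is wound
 *   owners  -&gt; post_op : fop phase done, post-op piggybacks on the last one
 *   frozen  -&gt; waiters : arrived while the previous unlock was in flight
 */
struct inode_lock_ctx {
    struct list_head owners;   /* transactions currently holding the lock   */
    struct list_head waiters;  /* transactions waiting for lock/[f]xattrop  */
    struct list_head post_op;  /* finished fops, post-op not yet issued     */
    struct list_head frozen;   /* queued while unlock was being wound       */
    bool release;              /* set when delayed-post-op-secs expires     */
};
</pre>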
</div>
</content>
</entry>
<entry>
<title>afr: don't treat all cases all bricks being blamed as split-brain</title>
<updated>2018-02-01T14:17:50+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-01-28T08:20:47+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=0e6e8216823c2d9dafb81aae0f6ee3497c23d140'/>
<id>0e6e8216823c2d9dafb81aae0f6ee3497c23d140</id>
<content type='text'>
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not
met. Though the FOP is still unwound with failure, the xattrs remain on
the disk. Due to these partial post-ops and partial heals (healing only when
2 bricks are up), we can end up in split-brain purely from the afr
xattrs point of view, i.e. each brick is blamed by at least one of the
others. These scenarios are hit when there is frequent
connect/disconnect of the client/shd to the bricks while I/O or heal
is in progress.

Fix:
Instead of undoing the post-op, pick a source based on the xattr values.
If 2 bricks blame one, the blamed one must be treated as a sink.
If there is no majority, all are sources. Once we pick a source,
self-heal will then do the heal instead of erroring out due to
split-brain.

Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae
BUG: 1539358
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not
met. Though the FOP is still unwound with failure, the xattrs remain on
the disk. Due to these partial post-ops and partial heals (healing only when
2 bricks are up), we can end up in split-brain purely from the afr
xattrs point of view, i.e. each brick is blamed by at least one of the
others. These scenarios are hit when there is frequent
connect/disconnect of the client/shd to the bricks while I/O or heal
is in progress.

Fix:
Instead of undoing the post-op, pick a source based on the xattr values.
If 2 bricks blame one, the blamed one must be treated as a sink.
If there is no majority, all are sources. Once we pick a source,
self-heal will then do the heal instead of erroring out due to
split-brain.

Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae
BUG: 1539358
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
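A hedged C sketch of the source-picking rule (the array layout is invented for illustration): count how many other bricks blame each brick; a brick blamed by all the others becomes the sink, and if no such majority exists every brick is a source.
<pre>
/* Illustrative: blamed_by[i] = number of other bricks whose afr xattrs
 * blame brick i. For replica 3, a brick blamed by the other 2 becomes
 * the sink; without such a majority, all bricks are sources. */
static void
pick_sources(const int *blamed_by, int child_count, int *is_source)
{
    int i, sink = -1;

    for (i = 0; i &lt; child_count; i++) {
        if (blamed_by[i] &gt;= child_count - 1) {
            sink = i;
            break;
        }
    }
    for (i = 0; i &lt; child_count; i++)
        is_source[i] = (i != sink);
}
</pre>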
</div>
</content>
</entry>
</feed>
