<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs, branch v6.8</title>
<subtitle>GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>afr: prevent spurious entry heals leading to gfid split-brain</title>
<updated>2020-02-28T06:06:10+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-02-11T09:04:48+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=559fd060c59edec69ba66be7e0a447c8e0408d51'/>
<id>559fd060c59edec69ba66be7e0a447c8e0408d51</id>
<content type='text'>
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry self-heal was doing a spurious heal
with just the two good bricks. It performed a post-op that removed the
filename from .glusterfs/indices/entry-changes and erroneously set afr
xattrs on the parent. When the brick came back up, the xattrs were
cleared, so the recreated file never got healed, leading to gfid
split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: bz#1804594
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
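
Purely as an illustration of the scenario above (the volume name
"repvol", brick paths and mount point are placeholders, not part of
this patch), a rough command-line sketch:

# Hypothetical reproduction sketch; repvol is a replica-3 volume with
# granular entry heal enabled and one brick process killed.
gluster volume set repvol cluster.granular-entry-heal on
pkill -f 'glusterfsd.*repvol-brick2'   # take one brick down (assumed process/brick naming)
rm -f /mnt/repvol/dir/file
touch /mnt/repvol/dir/file             # recreate the file while the brick is down
gluster volume heal repvol             # index heal with the brick still down
# Before this fix, the two good bricks did a spurious entry heal and the
# post-op; with the fix, entry heal waits until shd can reach all three
# bricks, so no gfid split-brain shows up after the brick returns:
gluster volume start repvol force
gluster volume heal repvol info split-brain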
</content>
</entry>
<entry>
<title>Cluster/afr: Don't treat all bricks having metadata pending as split-brain</title>
<updated>2020-02-25T07:06:51+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-06-06T05:29:42+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=96d326cc917baf6ac44f4deacef6d251ebcdf0ea'/>
<id>96d326cc917baf6ac44f4deacef6d251ebcdf0ea</id>
<content type='text'>
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs'
point of view, i.e. each brick is blamed by at least one of the others for
metadata. These scenarios are hit when the client/shd frequently connects
to and disconnects from the bricks.

Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
This patch also adds the restriction that all bricks must be up in order
to perform metadata heal, to avoid any metadata loss.

Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.

Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1805097
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
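
For illustration only (brick paths and the volume name "repvol" are
placeholders), the blame relationships this logic evaluates can be read
straight from the afr pending xattrs on the bricks:

# Illustrative sketch: dump the afr pending xattrs for one file on each
# brick of a replica-3 volume named repvol.
getfattr -d -m trusted.afr -e hex /bricks/brick0/dir/file
getfattr -d -m trusted.afr -e hex /bricks/brick1/dir/file
getfattr -d -m trusted.afr -e hex /bricks/brick2/dir/file
# A non-zero metadata-pending counter in trusted.afr.repvol-client-N
# means "this brick blames brick N".  With this patch, if two bricks
# blame the same brick it is picked as the sink; if there is no such
# majority, all bricks are treated as sources and the heal proceeds
# instead of erroring out as split-brain (and metadata heal now runs
# only when all bricks are up):
gluster volume heal repvol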
</content>
</entry>
<entry>
<title>server: Mount fails after reboot 1/3 gluster nodes</title>
<updated>2020-02-11T08:44:38+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawal@redhat.com</email>
</author>
<published>2020-01-21T15:39:56+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=cf68b3f73de6b531c15ea103d884e89aeee706b4'/>
<id>cf68b3f73de6b531c15ea103d884e89aeee706b4</id>
<content type='text'>
Problem: When one server node of a 1x3 volume comes up after a reboot,
the client gets unmounted. The client is unmounted because it receives
an AUTH_FAILED event and calls fini for the graph. The client gets
AUTH_FAILED because the brick is not attached to a graph at that
moment.

Solution: To avoid unmounting the client graph, return an ENOENT error
          from the server if the brick is not yet attached to the
          server at the time of authenticating clients.

&gt; Credits: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
&gt; Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
&gt; Fixes: bz#1793852
&gt; Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
&gt; (cherry picked from commit f6421dff22a6ddaf14134f6894deae219948c89d)

Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
Fixes: bz#1794020
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
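
For illustration only (server name, volume name and mount path are
placeholders), the symptom was visible from the client side as a FUSE
mount disappearing while one of the three server nodes rebooted:

# Illustrative sketch; server1 and repvol are placeholders.
mount -t glusterfs server1:/repvol /mnt/repvol
# Reboot one of the three server nodes.  Previously, if the client
# reconnected while the brick was not yet attached to the brick-process
# graph, it could receive AUTH_FAILED and tear down its graph.  With
# this change the server replies ENOENT instead, so the graph is not
# torn down and the mount survives:
mount | grep /mnt/repvol   # should still show the glusterfs mount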
</content>
</entry>
<entry>
<title>features/shard: Send correct size when reads are sent beyond file size</title>
<updated>2019-10-24T09:24:22+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-08-07T06:42:43+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=02f26172c00143953310e9dd4f8ec08f3de66954'/>
<id>02f26172c00143953310e9dd4f8ec08f3de66954</id>
<content type='text'>
Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes: bz#1737141
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)
</content>
</entry>
<entry>
<title>cluster/afr: Heal entries when there is a source &amp; no healed_sinks</title>
<updated>2019-10-17T10:52:54+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-09-05T10:44:50+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=b69eb47dd3ee968325fa070ee500c8b70fcc96bc'/>
<id>b69eb47dd3ee968325fa070ee500c8b70fcc96bc</id>
<content type='text'>
Problem:
In a situation where B1 blames B2, B2 blames B1, and B3 doesn't blame
anything for entry heal, the heal will not complete even though we have
a clear source and sinks. This happens because in
afr_selfheal_find_direction() only the bricks which are blamed by
non-accused bricks are considered sinks. Later, when
__afr_selfheal_entry_finalize_source() tries to mark all the
non-sources as sinks, it fails to do so because there are no
healed_sinks marked, no witness present, and there is a source.

Fix:
If there is a source and no healed_sinks, then reset all the locked
sources to 0 and healed sinks to 1 to do conservative merge.

Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548
Fixes: bz#1760706
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
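
An illustrative view of the effect (the volume name "repvol" is a
placeholder) for the B1/B2/B3 situation described above:

# Illustrative sketch; repvol is a replica-3 volume where brick1 and
# brick2 blame each other for a directory and brick3 blames nobody.
# Before this fix the directory stayed listed here indefinitely:
gluster volume heal repvol info
# With the fix, the finalize step drops the source and marks every
# brick as a sink, so the directory entries from all bricks are
# union-merged (conservative merge) and the pending count drops to zero:
gluster volume heal repvol
gluster volume heal repvol info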
</content>
</entry>
<entry>
<title>afr: support split-brain CLI for replica 3</title>
<updated>2019-10-17T10:51:33+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2019-09-28T03:23:08+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=50dbcd45fa3165247608e2b889d6a802ba5d6323'/>
<id>50dbcd45fa3165247608e2b889d6a802ba5d6323</id>
<content type='text'>
Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3, but we do
see (data/metadata) split-brain cases once in a while, which indicates
that there are a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit  8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).

Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1760792
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 47dbd753187f69b3835d2e42fdbe7485874c4b3e)
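
For reference, these are the kinds of resolution commands that start
working for replica 3 with this change (the volume name, file paths and
source brick below are placeholders):

# Illustrative usage of the split-brain CLI; repvol, /dir/file and the
# source brick are placeholders.
gluster volume heal repvol info split-brain
gluster volume heal repvol split-brain latest-mtime /dir/file
gluster volume heal repvol split-brain bigger-file /dir/file
gluster volume heal repvol split-brain source-brick server1:/bricks/brick1 /dir/file
# For sharded files, an individual shard can now be named directly,
# e.g. a path of the form /.shard/shard-file.xx, instead of failing
# with EPERM.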
</content>
</entry>
<entry>
<title>ctime/rebalance: Heal ctime xattr on directory during rebalance</title>
<updated>2019-09-27T11:34:25+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2019-07-29T13:00:42+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=e152f753013f923f95ebdd63ffc4de0cd44221d1'/>
<id>e152f753013f923f95ebdd63ffc4de0cd44221d1</id>
<content type='text'>
After add-brick and rebalance, the ctime xattr is not present
on rebalanced directories on the new brick. This patch fixes
that.

Note that ctime still doesn't support consistent times across
distribute sub-volumes.

This patch also fixes the in-memory inconsistency of time attributes
when metadata is self healed.

Backport of:
 &gt; Patch: https://review.gluster.org/23127
 &gt; Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
 &gt; BUG: 1734026
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

Patch: https://review.gluster.org/23127
Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
fixes: bz#1752413
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
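
An illustrative verification sketch (volume name, server and brick path
are placeholders; trusted.glusterfs.mdata is the xattr the ctime
feature stores its timestamps in):

# Illustrative sketch; distvol, server1 and the brick path are placeholders.
gluster volume add-brick distvol server1:/bricks/newbrick
gluster volume rebalance distvol start
gluster volume rebalance distvol status
# After rebalance, the ctime xattr should now be present on rebalanced
# directories on the new brick as well:
getfattr -d -m trusted.glusterfs.mdata -e hex /bricks/newbrick/somedir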
</content>
</entry>
<entry>
<title>dht: Custom xattrs are not healed in case of add-brick</title>
<updated>2019-09-27T11:26:15+00:00</updated>
<author>
<name>root</name>
<email>root@localhost.localdomain</email>
</author>
<published>2019-04-07T14:01:17+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=356d46b13031be82cc2e0b0eaad7e7fcfccc7c35'/>
<id>356d46b13031be82cc2e0b0eaad7e7fcfccc7c35</id>
<content type='text'>
Problem: If any custom xattrs are set on a directory before a brick
         is added, those xattrs are not healed on the directory
         after adding the brick.

Solution: The xattrs are not healed because dht_selfheal_dir_mkdir_lookup_cbk
          checks the value of the MDS xattr and, if it is not negative,
          the self-heal code path does not take a reference on the MDS
          xattrs. Change the condition to take a reference on the MDS
          xattr so that custom xattrs are populated on the newly added
          brick.

Backport of:
 &gt; Patch: https://review.gluster.org/22520
 &gt; BUG: bz#1702299
 &gt; Change-Id: Id14beedb98cce6928055f294e1594b22132e811c
 &gt; Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;

fixes: bz#1753561
Change-Id: Id14beedb98cce6928055f294e1594b22132e811c
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
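
A minimal before/after check, purely illustrative (volume name, paths
and the user xattr are placeholders):

# Illustrative sketch; distvol and the paths are placeholders.
setfattr -n user.myattr -v myvalue /mnt/distvol/dir   # set a custom xattr before add-brick
gluster volume add-brick distvol server1:/bricks/newbrick
stat /mnt/distvol/dir                                 # lookup triggers dht directory self-heal
# With this fix, the custom xattr is also populated on the directory on
# the newly added brick:
getfattr -n user.myattr /bricks/newbrick/dir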
</content>
</entry>
<entry>
<title>afr/lookup: Pass xattr_req in while doing a selfheal in lookup</title>
<updated>2019-09-23T07:00:22+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-07-10T16:14:38+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=850f20c5aeb0c935be99bef13e8a49bd52f186ff'/>
<id>850f20c5aeb0c935be99bef13e8a49bd52f186ff</id>
<content type='text'>
We were not passing xattr_req when doing a name self-heal
or a metadata heal. Because of this, some xdata was missing,
which caused I/O errors.

Backport of &gt; https://review.gluster.org/#/c/glusterfs/+/23024/
&gt;Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
&gt;Fixes: bz#1728770
&gt;Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;

Fixes: bz#1749307
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
(cherry picked from commit d026f0bcfd301712e4f0671ccf238f43f2e6dd30)

Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
</content>
</entry>
<entry>
<title>afr: wake up index healer threads</title>
<updated>2019-09-05T05:59:32+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2019-08-30T05:00:31+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=3f0658b54c92131ec468a66b9fe0c3ac86e42061'/>
<id>3f0658b54c92131ec468a66b9fe0c3ac86e42061</id>
<content type='text'>
(Backport of https://review.gluster.org/#/c/glusterfs/+/23288/)

...whenever shd is re-enabled after disabling or there is a change in
`cluster.heal-timeout`, without needing to restart shd or wait for the
current `cluster.heal-timeout` seconds to expire.

See BZ 1743988 for more details.

Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe
fixes: bz#1743988
Reported-by: Glen Kiessling &lt;glenk1973@hotmail.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
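
For context, these are the options involved (the volume name "repvol"
is a placeholder); with this patch, changing either of them wakes the
index healer threads immediately:

# Illustrative sketch; repvol is a placeholder volume name.
gluster volume set repvol cluster.self-heal-daemon on   # (re-)enable shd
gluster volume set repvol cluster.heal-timeout 60       # shrink the heal interval
# Previously the healer threads kept sleeping until the old
# cluster.heal-timeout expired; now they are woken up as soon as either
# option changes, so an index crawl starts right away:
gluster volume heal repvol info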
</content>
</entry>
</feed>
