<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/replicate, branch devel</title>
<subtitle>GlusterFS is a distributed file system capable of scaling to several petabytes. It aggregates various storage bricks over InfiniBand RDMA or TCP/IP interconnects into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>protocol/client: Fix lock memory leak (#2338)</title>
<updated>2021-04-22T05:53:05+00:00</updated>
<author>
<name>Pranith Kumar Karampuri</name>
<email>pranith.karampuri@phonepe.com</email>
</author>
<published>2021-04-22T05:53:05+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=4230e21f7c60387f818bdbbd3008484ca61640b2'/>
<id>4230e21f7c60387f818bdbbd3008484ca61640b2</id>
<content type='text'>
Problem-1:
When an overlapping lock is issued, the merged lock is not assigned an
owner. When a flush is issued on the fd, this particular lock is not
freed, leading to a memory leak.

Fix-1:
Assign the owner while merging the locks.

Problem-2:
On fd-destroy, lock structs could still be present in fdctx. For some
reason, an flock -x command followed by closing the bash fd leads to
this code path, which leaks the lock structs.
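
A minimal shell sketch of the scenario described above (the mount path and
file name are placeholders, not from the commit):

  # take an exclusive lock through a bash fd, then close the fd
  exec 200&gt;/mnt/glusterfs/testfile
  flock -x 200
  exec 200&gt;&amp;-   # closing the fd triggers flush/fd-destroy on the client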

Fix-2:
When fdctx is being destroyed in the client, make sure to clean up any
lock structs.

fixes: #2337
Change-Id: I298124213ce5a1cf2b1f1756d5e8a9745d9c0a1c
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;</content>
</entry>
<entry>
<title>afr: don't reopen fds on which POSIX locks are held (#1980)</title>
<updated>2021-03-27T10:14:04+00:00</updated>
<author>
<name>Karthik Subrahmanya</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2021-03-27T10:14:04+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=2a524ad0738be491dcac3cc96db1411320168c72'/>
<id>2a524ad0738be491dcac3cc96db1411320168c72</id>
<content type='text'>
When client.strict-locks is enabled on a volume and POSIX locks are held
on files, do not re-open such fds after a client disconnects and
reconnects, since re-opening them might lead to multiple clients
acquiring the locks and cause data corruption.
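
The option referenced above is a regular volume option and can be enabled
along these lines (the volume name is a placeholder):

  gluster volume set &lt;volname&gt; client.strict-locks on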

Change-Id: I8777ffbc2cc8d15ab57b58b72b56eb67521787c5
Fixes: #1977
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;</content>
</entry>
<entry>
<title>cluster/dht: Provide option to disable fsync in data migration (#2259)</title>
<updated>2021-03-17T05:32:21+00:00</updated>
<author>
<name>Pranith Kumar Karampuri</name>
<email>pranith.karampuri@phonepe.com</email>
</author>
<published>2021-03-17T05:32:21+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=088d8a575c59479defb5dbe8bf03f4211156df7f'/>
<id>088d8a575c59479defb5dbe8bf03f4211156df7f</id>
<content type='text'>
At the moment dht rebalance doesn't give any option to disable fsync
after data migration. Making this an option lets admins take
responsibility for their data in a way that is suitable for their
cluster. The default value is still 'on', so the behavior is unchanged
for people who don't care about this.

For example: if the data that is going to be migrated is already backed
up or snapshotted, there is no need for fsync to happen right after
migration, which can affect active I/O on the volume from applications.
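
If the option is exposed as a regular volume-set key, disabling it would
look something like the following (the key name here is hypothetical; the
commit defines the actual name):

  gluster volume set &lt;volname&gt; rebal-fsync off   # hypothetical key name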

fixes: #2258
Change-Id: I7a50b8d3a2f270d79920ef306ceb6ba6451150c4
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;</content>
</entry>
<entry>
<title>features/index: Optimize link-count fetching code path (#1789)</title>
<updated>2021-03-10T05:13:24+00:00</updated>
<author>
<name>Pranith Kumar Karampuri</name>
<email>pranith.karampuri@phonepe.com</email>
</author>
<published>2021-03-10T05:13:24+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=46949c4951eb1d2eb0a90c21db66c31e444bffe8'/>
<id>46949c4951eb1d2eb0a90c21db66c31e444bffe8</id>
<content type='text'>
* features/index: Optimize link-count fetching code path

Problem:
AFR requests 'link-count' in lookup to check if there are any pending
heals. Based on this information, afr will set dirent-&gt;inode to NULL in
readdirp when heals are ongoing, to prevent serving bad data. When heals
are completed, fetching the link-count xattr still leads to an opendir of
the xattrop directory and a read of its contents, for every lookup, just
to figure out that no healing is needed. This was not detected until this
github issue because ZFS in some cases can lead to very slow readdir()
calls. Since GlusterFS does a lot of lookups, this was slowing down
all operations and increasing load on the system.
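
For context, the xattrop index directory mentioned above lives under each
brick's .glusterfs directory and can be inspected directly on the brick
(the brick path is a placeholder):

  ls /bricks/brick1/.glusterfs/indices/xattrop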

Code problem:
The index xlator, on any xattrop operation, adds an index to the relevant
directories and, after the xattrop operation is done, deletes or keeps the
index in that directory based on the value fetched from posix in the
xattrop. AFR sends an all-zero xattrop for changelog xattrs. This leads to
priv-&gt;pending_count manipulation which sets the count back to -1. The
next lookup operation then triggers opendir/readdir to find the actual
link-count, because the in-memory priv-&gt;pending_count is negative.

Fix:
1) Don't add to index on all-zero xattrop for a key.
2) Set pending-count to -1 when the first gfid is added into xattrop
   directory, so that the next lookup can compute the link-count.

fixes: #1764
Change-Id: I8a02c7e811a72c46d78ddb2d9d4fdc2222a444e9
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;

* addressed comments

Change-Id: Ide42bb1c1237b525d168bf1a9b82eb1bdc3bc283
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;

* tests: Handle base index absence

Change-Id: I3cf11a8644ccf23e01537228766f864b63c49556
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;

* Addressed LOCK based comments, .t comments

Change-Id: I5f53e40820cade3a44259c1ac1a7f3c5f2f0f310
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;</content>
</entry>
<entry>
<title>afr: fix directory entry count (#2233)</title>
<updated>2021-03-08T23:24:07+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2021-03-08T23:24:07+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=dc9bab7959b068617ef00f355c63bdca060b9605'/>
<id>dc9bab7959b068617ef00f355c63bdca060b9605</id>
<content type='text'>
AFR may hide some existing entries from a directory when reading it
because they are generated internally for private management. However,
the number of entries returned by the readdir() function is not updated
accordingly, so it may report a number higher than the real number of
entries present in the gf_dirent list.

This may cause unexpected behavior in clients, including gfapi, which
incorrectly assumes that there was an entry when the list was actually
empty.

This patch also makes the check in gfapi more robust to avoid similar
issues that could appear in the future.

Fixes: #2232
Change-Id: I81ba3699248a53ebb0ee4e6e6231a4301436f763
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;</content>
</entry>
<entry>
<title>tests: Handle nanosecond duration in profile info (#2135)</title>
<updated>2021-02-08T08:59:08+00:00</updated>
<author>
<name>Pranith Kumar Karampuri</name>
<email>pranith.karampuri@phonepe.com</email>
</author>
<published>2021-02-08T08:59:08+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=00f141018f23d05703c8e3a730bb5134f95e6b60'/>
<id>00f141018f23d05703c8e3a730bb5134f95e6b60</id>
<content type='text'>
Problem:
volume profile info now prints durations in nanoseconds. The tests were
written when the duration was printed in microseconds, which leads to
spurious failures.
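
For reference, the profile output that these tests parse is produced by
commands along these lines (the volume name is a placeholder):

  gluster volume profile &lt;volname&gt; start
  gluster volume profile &lt;volname&gt; info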

Fix:
Change the tests to handle nanosecond durations.

fixes: #2134
Change-Id: I94722be87000a485d98c8b0f6d8b7e1a526b07e7
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;</content>
</entry>
<entry>
<title>tests: ./tests/bugs/replicate/bug-921231.t is continuously failing (#2006)</title>
<updated>2021-01-13T12:28:28+00:00</updated>
<author>
<name>mohit84</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2021-01-13T12:28:28+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=90f18d12f581e15e398ec84fe9355ad9b63b020a'/>
<id>90f18d12f581e15e398ec84fe9355ad9b63b020a</id>
<content type='text'>
The test case ./tests/bugs/replicate/bug-921231.t is continuously
failing. It fails because inodelk_max_latency shows a wrong value in the
profile output. The value is not correct because the profile timestamp
was recently changed from microseconds to nanoseconds by patch #1833.
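
The failing test can be run in isolation with something like the
following (assuming a built glusterfs source tree):

  prove -v ./tests/bugs/replicate/bug-921231.t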

Fixes: #2005
Change-Id: Ieb683836938d986b56f70b2380103efe95657821
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</content>
</entry>
<entry>
<title>core: Implement graceful shutdown for a brick process (#1751)</title>
<updated>2020-12-16T06:05:31+00:00</updated>
<author>
<name>mohit84</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2020-12-16T06:05:31+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=a7761a483dd43e5973e00d79cd0947aab789179a'/>
<id>a7761a483dd43e5973e00d79cd0947aab789179a</id>
<content type='text'>
* core: Implement graceful shutdown for a brick process

glusterd sends a SIGTERM to the brick process when stopping a volume if
brick_mux is not enabled. In the brick_mux case, on receiving the
terminate signal for the last brick, the brick process sends a SIGTERM
to its own process to stop itself. The current approach does not clean
up resources when either the last brick is detached or brick_mux is not
enabled.

Solution: glusterd sends a terminate notification to the brick process
when stopping a volume, allowing a graceful shutdown.

Change-Id: I49b729e1205e75760f6eff9bf6803ed0dbf876ae
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve some reviewer comments
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I50e6a9e2ec86256b349aef5b127cc5bbf32d2561

* core: Implement graceful shutdown for a brick process

Implement a key, cluster.brick-graceful-cleanup, to enable graceful
shutdown for a brick process. If the key value is on, glusterd sends a
detach request to stop the brick.
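
Assuming the key behaves like other cluster-wide options, enabling it
would look something like this (whether it is set globally or per volume
is not stated in this message):

  gluster volume set all cluster.brick-graceful-cleanup on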

Fixes: #1749
Change-Id: Iba8fb27ba15cc37ecd3eb48f0ea8f981633465c3
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve reviewer comments
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I2a8eb4cf25cd8fca98d099889e4cae3954c8579e

* core: Implement graceful shutdown for a brick process

Resolve reviewer comment specific to avoiding a memory leak

Fixes: #1749
Change-Id: Ic2f09efe6190fd3776f712afc2d49b4e63de7d1f
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve reviewer comment specific to avoiding a memory leak

Fixes: #1749
Change-Id: I68fbbb39160a4595fb8b1b19836f44b356e89716
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</content>
</entry>
<entry>
<title>glusterd/afr: enable granular-entry-heal by default (#1621)</title>
<updated>2020-10-22T09:36:41+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-10-22T09:36:41+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=f5e1eb87d4af44be3b317b7f99ab88f89c2f0b1a'/>
<id>f5e1eb87d4af44be3b317b7f99ab88f89c2f0b1a</id>
<content type='text'>
1. The option has been enabled and tested for quite some time now in RHHI-V
downstream and I think it is safe to make it 'on' by default. Since it
is not possible to simply change it from 'off' to 'on' without breaking
rolling upgrades, old clients etc., I have made it the default only for new
volumes starting from op-version GD_OP_VERSION_9_0.
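
On existing volumes the option can still be toggled explicitly, along
these lines (the volume name is a placeholder):

  gluster volume heal &lt;volname&gt; granular-entry-heal enable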

Note: If you do a volume reset, the option will be turned back off.
This is okay as the dir's gfid will be captured in the 'xattrop' folder and
heals will proceed. There might be stale entries inside the 'entry-changes'
folder, which will be removed when we enable the option again.

2. I encountered a customer issue where entry heal was pending on a directory
with 236436 files in it and the glustershd.log output was just stuck at
"performing entry selfheal", so I have added logs to give us
more info at DEBUG level about whether entry heal and data heal are
progressing (metadata heal doesn't take much time). That way, we have a
quick visual indication that things are not 'stuck' if we briefly
enable debug logs, instead of taking statedumps or checking profile info
etc.

Fixes: #1483
Change-Id: I4f116f8c92f8cd33f209b758ff14f3c7e1981422
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;</content>
</entry>
<entry>
<title>cluster/afr: Heal directory rename without rmdir/mkdir</title>
<updated>2020-04-13T14:01:51+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2020-04-13T14:01:51+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=2d58ec85816d3f03cfde23eea8b8ba03ce50a3a6'/>
<id>2d58ec85816d3f03cfde23eea8b8ba03ce50a3a6</id>
<content type='text'>
Problem1:
When a directory is renamed while a brick is down, entry-heal always did
an rm -rf on that directory on the sink at the old location, and then
did a mkdir and recreated the directory hierarchy at the new location.
This is inefficient.
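
A rough reproduction sketch in the style of the .t tests (assuming the
usual test-framework helpers and variables such as $V0, $B0 and $M0; the
exact steps are illustrative, not from the commit):

  TEST kill_brick $V0 $H0 $B0/${V0}0      # take one brick down
  TEST mv $M0/dir1 $M0/dir2               # rename the directory on the mount
  TEST $CLI volume start $V0 force        # bring the brick back; shd heals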

Problem2:
The rename-dir heal order may lead to a scenario where the directory in
the new location could be created before it is deleted from the old
location, leading to 2 directories with the same gfid in posix.

Fix:
As part of heal, if the old location is healed first and the directory is
not present on the source brick, always rename it into a hidden directory
inside the sink brick, so that when heal is triggered at the new location,
shd can rename it from this hidden directory to the new location.

If the new-location heal is triggered first and it detects that the
directory already exists on the brick, then it should skip healing the
directory until it appears in the hidden directory.

Credits: Ravi for rename-data-loss.t script

Fixes: #1211
Change-Id: I0cba2006f35cd03d314d18211ce0bd530e254843
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
</entry>
</feed>
