<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/glusterd, branch devel</title>
<subtitle>GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>cluster/dht: use readdir for fix-layout in rebalance (#2243)</title>
<updated>2021-03-22T04:49:27+00:00</updated>
<author>
<name>Pranith Kumar Karampuri</name>
<email>pranith.karampuri@phonepe.com</email>
</author>
<published>2021-03-22T04:49:27+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=ec189a499d85c2aad1d54e55e47df6b95ba02922'/>
<id>ec189a499d85c2aad1d54e55e47df6b95ba02922</id>
<content type='text'>
Problem:
On a cluster with 15 million files, when fix-layout was started, it was
not progressing at all. So we tried an os.walk() + os.stat() on the
backend filesystem directly. It took 2.5 days. We removed os.stat() and
re-ran it on another brick with a similar dataset. It took 15 minutes. We
realized that readdirp is extremely costly compared to readdir when the
stat is not useful. The fix-layout operation only needs to know that an
entry is a directory so that fix-layout can be triggered on it. Most
modern filesystems provide this information in the readdir operation, so
we don't need readdirp, i.e., readdir+stat.

Fix:
Use the readdir operation in fix-layout. Fall back to readdir+stat/lookup
for filesystems that don't provide d_type in their readdir operation.
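
The idea can be sketched in Python (the language used for the os.walk()
comparison above). This is a minimal illustration, not the rebalance code:
os.scandir() surfaces readdir's d_type via DirEntry.is_dir(), and with
follow_symlinks=False it only falls back to a stat() on filesystems that
report DT_UNKNOWN.

```python
import os

def walk_dirs(root):
    """Yield directory paths under root without stat()-ing every entry.

    os.scandir() exposes readdir's d_type through DirEntry.is_dir();
    plain files are never stat()-ed on filesystems that fill d_type.
    """
    stack = [root]
    while stack:
        path = stack.pop()
        yield path
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
```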

fixes: #2241
Change-Id: I5fe2ecea25a399ad58e31a2e322caf69fc7f49eb
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
On a cluster with 15 million files, when fix-layout was started, it was
not progressing at all. So we tried an os.walk() + os.stat() on the
backend filesystem directly. It took 2.5 days. We removed os.stat() and
re-ran it on another brick with a similar dataset. It took 15 minutes. We
realized that readdirp is extremely costly compared to readdir when the
stat is not useful. The fix-layout operation only needs to know that an
entry is a directory so that fix-layout can be triggered on it. Most
modern filesystems provide this information in the readdir operation, so
we don't need readdirp, i.e., readdir+stat.

Fix:
Use the readdir operation in fix-layout. Fall back to readdir+stat/lookup
for filesystems that don't provide d_type in their readdir operation.

fixes: #2241
Change-Id: I5fe2ecea25a399ad58e31a2e322caf69fc7f49eb
Signed-off-by: Pranith Kumar K &lt;pranith.karampuri@phonepe.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>core: Implement graceful shutdown for a brick process (#1751)</title>
<updated>2020-12-16T06:05:31+00:00</updated>
<author>
<name>mohit84</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2020-12-16T06:05:31+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=a7761a483dd43e5973e00d79cd0947aab789179a'/>
<id>a7761a483dd43e5973e00d79cd0947aab789179a</id>
<content type='text'>
* core: Implement graceful shutdown for a brick process

glusterd sends a SIGTERM to the brick process at the time
of stopping a volume if brick_mux is not enabled. In the
brick_mux case, on getting a terminate signal for the last
brick, the brick process sends a SIGTERM to its own process
to stop itself. The current approach does not clean up
resources when either the last brick is detached or
brick_mux is not enabled.

Solution: glusterd sends a terminate notification to the
brick process at the time of stopping a volume for a graceful
shutdown.

Change-Id: I49b729e1205e75760f6eff9bf6803ed0dbf876ae
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve some reviewer comments
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I50e6a9e2ec86256b349aef5b127cc5bbf32d2561

* core: Implement graceful shutdown for a brick process

Implement a key, cluster.brick-graceful-cleanup, to enable graceful
shutdown for a brick process. If the key value is on, glusterd sends
a detach request to stop the brick.
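
The difference between the two shutdown paths can be sketched as follows;
Brick and its method names are illustrative stand-ins, not glusterd code:

```python
class Brick:
    """Toy model of a brick process's two shutdown paths."""

    def __init__(self):
        self.resources = ["fd-table", "inode-table", "mem-pools"]
        self.released = []

    def sigterm_stop(self):
        # Old path: the process simply exits; nothing is released.
        return "exited, %d resources leaked" % len(self.resources)

    def graceful_stop(self):
        # New path: glusterd's detach/terminate notification lets the
        # brick release every resource before exiting.
        while self.resources:
            self.released.append(self.resources.pop())
        return "exited cleanly"
```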

Fixes: #1749
Change-Id: Iba8fb27ba15cc37ecd3eb48f0ea8f981633465c3
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve reviewer comments
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I2a8eb4cf25cd8fca98d099889e4cae3954c8579e

* core: Implement graceful shutdown for a brick process

Resolve reviewer comment specific to avoiding a memory leak

Fixes: #1749
Change-Id: Ic2f09efe6190fd3776f712afc2d49b4e63de7d1f
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve reviewer comment specific to avoiding a memory leak

Fixes: #1749
Change-Id: I68fbbb39160a4595fb8b1b19836f44b356e89716
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* core: Implement graceful shutdown for a brick process

glusterd sends a SIGTERM to the brick process at the time
of stopping a volume if brick_mux is not enabled. In the
brick_mux case, on getting a terminate signal for the last
brick, the brick process sends a SIGTERM to its own process
to stop itself. The current approach does not clean up
resources when either the last brick is detached or
brick_mux is not enabled.

Solution: glusterd sends a terminate notification to the
brick process at the time of stopping a volume for a graceful
shutdown.

Change-Id: I49b729e1205e75760f6eff9bf6803ed0dbf876ae
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve some reviewer comments
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I50e6a9e2ec86256b349aef5b127cc5bbf32d2561

* core: Implement graceful shutdown for a brick process

Implement a key, cluster.brick-graceful-cleanup, to enable graceful
shutdown for a brick process. If the key value is on, glusterd sends
a detach request to stop the brick.

Fixes: #1749
Change-Id: Iba8fb27ba15cc37ecd3eb48f0ea8f981633465c3
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve reviewer comments
Fixes: #1749
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I2a8eb4cf25cd8fca98d099889e4cae3954c8579e

* core: Implement graceful shutdown for a brick process

Resolve reviewer comment specific to avoiding a memory leak

Fixes: #1749
Change-Id: Ic2f09efe6190fd3776f712afc2d49b4e63de7d1f
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: Implement graceful shutdown for a brick process

Resolve reviewer comment specific to avoiding a memory leak

Fixes: #1749
Change-Id: I68fbbb39160a4595fb8b1b19836f44b356e89716
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd/cli: enhance rebalance-status after replace/reset-brick (#1869)</title>
<updated>2020-12-08T10:51:35+00:00</updated>
<author>
<name>Tamar Shacked</name>
<email>tshacked@redhat.com</email>
</author>
<published>2020-12-08T10:51:35+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=ae8cfe5baaff5b3e4c55f49ec71811e32a885271'/>
<id>ae8cfe5baaff5b3e4c55f49ec71811e32a885271</id>
<content type='text'>
* glusterd/cli: enhance rebalance-status after replace/reset-brick

Rebalance status is being reset during replace/reset-brick operations.
This causes 'volume status' to show rebalance as "not started".

Fix:
change rebalance-status to "reset due to (replace|reset)-brick"
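
The status change amounts to a simple mapping; reset_rebalance_status is a
hypothetical helper illustrating the CLI strings, not the glusterd code:

```python
def reset_rebalance_status(op):
    """Return the rebalance status string shown by 'volume status'
    after the given brick operation resets it."""
    if op not in ("replace-brick", "reset-brick"):
        raise ValueError("unexpected operation: " + op)
    return "reset due to " + op
```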

Change-Id: I6e3372d67355eb76c5965984a23f073289d4ff23
Signed-off-by: Tamar Shacked &lt;tshacked@redhat.com&gt;

* glusterd/cli: enhance rebalance-status after replace/reset-brick

Rebalance status is being reset during replace/reset-brick operations.
This causes 'volume status' to show rebalance as "not started".

Fix: change rebalance-status to "reset due to (replace|reset)-brick"

Fixes: #1717
Signed-off-by: Tamar Shacked &lt;tshacked@redhat.com&gt;

Change-Id: I1e3e373ca3b2007b5b7005b6c757fb43801fde33

* cli: changing rebal task ID to "None" in case status is being reset

Rebalance status is being reset during replace/reset-brick operations.
This causes 'volume status' to show rebalance as "not started".

Fix:
change rebalance-status to "reset due to (replace|reset)-brick"

Fixes: #1717

Change-Id: Ia73a8bea3dcd8e51acf4faa6434c3cb0d09856d0
Signed-off-by: Tamar Shacked &lt;tshacked@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* glusterd/cli: enhance rebalance-status after replace/reset-brick

Rebalance status is being reset during replace/reset-brick operations.
This causes 'volume status' to show rebalance as "not started".

Fix:
change rebalance-status to "reset due to (replace|reset)-brick"

Change-Id: I6e3372d67355eb76c5965984a23f073289d4ff23
Signed-off-by: Tamar Shacked &lt;tshacked@redhat.com&gt;

* glusterd/cli: enhance rebalance-status after replace/reset-brick

Rebalance status is being reset during replace/reset-brick operations.
This causes 'volume status' to show rebalance as "not started".

Fix: change rebalance-status to "reset due to (replace|reset)-brick"

Fixes: #1717
Signed-off-by: Tamar Shacked &lt;tshacked@redhat.com&gt;

Change-Id: I1e3e373ca3b2007b5b7005b6c757fb43801fde33

* cli: changing rebal task ID to "None" in case status is being reset

Rebalance status is being reset during replace/reset-brick operations.
This causes 'volume status' to show rebalance as "not started".

Fix:
change rebalance-status to "reset due to (replace|reset)-brick"

Fixes: #1717

Change-Id: Ia73a8bea3dcd8e51acf4faa6434c3cb0d09856d0
Signed-off-by: Tamar Shacked &lt;tshacked@redhat.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: modify logic for checking hostname in add-brick (#1781)</title>
<updated>2020-12-07T05:54:43+00:00</updated>
<author>
<name>Sheetal Pamecha</name>
<email>spamecha@redhat.com</email>
</author>
<published>2020-12-07T05:54:43+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=f8bd04fb3ba0fff04abd0c6fff19f42c59376617'/>
<id>f8bd04fb3ba0fff04abd0c6fff19f42c59376617</id>
<content type='text'>
* glusterd: modify logic for checking hostname in add-brick

Problem: the add-brick command parses only the bricks provided
on the CLI for a subvolume. If bricks in the same subvolume are
increased, they are not checked against the volume's existing bricks.
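
The intended check can be sketched as follows, assuming bricks are
'host:/path' strings; check_subvol_hosts is a hypothetical helper, not
the glusterd function:

```python
def check_subvol_hosts(existing_subvol, new_bricks):
    """Compare the hosts of newly added bricks against the bricks
    already present in the same subvolume, not just against each
    other. Returns False when two bricks of one replica set would
    land on the same host."""
    seen = {b.split(":", 1)[0] for b in existing_subvol}
    for brick in new_bricks:
        host = brick.split(":", 1)[0]
        if host in seen:
            return False
        seen.add(host)
    return True
```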

Fixes: #1779
Change-Id: I768bcf7359a008f2d6baccef50e582536473a9dc
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

* removed assignment of unused variable

Fixes: #1779
Change-Id: Id5ed776b28343e1225b9898e81502ce29fb480fa
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

* few more changes

Change-Id: I7bacedb984f968939b214f9d13546f4bf92e9df7
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

* few more changes

Change-Id: I7bacedb984f968939b214f9d13546f4bf92e9df7
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

* correction in last commit
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

Change-Id: I1fd0d941cf3f32aa6e8c7850def78e5af0d88782</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* glusterd: modify logic for checking hostname in add-brick

Problem: the add-brick command parses only the bricks provided
on the CLI for a subvolume. If bricks in the same subvolume are
increased, they are not checked against the volume's existing bricks.

Fixes: #1779
Change-Id: I768bcf7359a008f2d6baccef50e582536473a9dc
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

* removed assignment of unused variable

Fixes: #1779
Change-Id: Id5ed776b28343e1225b9898e81502ce29fb480fa
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

* few more changes

Change-Id: I7bacedb984f968939b214f9d13546f4bf92e9df7
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

* few more changes

Change-Id: I7bacedb984f968939b214f9d13546f4bf92e9df7
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

* correction in last commit
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;

Change-Id: I1fd0d941cf3f32aa6e8c7850def78e5af0d88782</pre>
</div>
</content>
</entry>
<entry>
<title>io-stats: Configure ios_sample_buf_size based on sample_interval value (#1574)</title>
<updated>2020-10-15T10:58:58+00:00</updated>
<author>
<name>mohit84</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2020-10-15T10:58:58+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=f71660eb879a9cd5761e5adbf10c783e959a990a'/>
<id>f71660eb879a9cd5761e5adbf10c783e959a990a</id>
<content type='text'>
The io-stats xlator allocates an ios_sample_buf_size buffer of 64k
objects (10M) per xlator, but when sample_interval is 0 this big buffer
is not required, so allocate the default-sized buffer only while
sample_interval is not 0. The change helps reduce the RSS size of brick
and shd processes when the number of volumes is huge.
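
The allocation logic amounts to the following sketch; the constant and
function names are illustrative, not the io-stats code:

```python
DEFAULT_SAMPLE_BUF_SIZE = 65536  # 64k sample slots, roughly 10M per xlator

def sample_buf_size(sample_interval):
    """Size the sample buffer based on whether sampling is enabled:
    the full default buffer when sample_interval is non-zero, a
    single slot otherwise."""
    return DEFAULT_SAMPLE_BUF_SIZE if sample_interval != 0 else 1
```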

Change-Id: I3e82cca92e40549355edfac32580169f3ce51af8
Fixes: #1542
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The io-stats xlator allocates an ios_sample_buf_size buffer of 64k
objects (10M) per xlator, but when sample_interval is 0 this big buffer
is not required, so allocate the default-sized buffer only while
sample_interval is not 0. The change helps reduce the RSS size of brick
and shd processes when the number of volumes is huge.

Change-Id: I3e82cca92e40549355edfac32580169f3ce51af8
Fixes: #1542
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Fix Add-brick with increasing replica count failure</title>
<updated>2020-09-23T12:11:46+00:00</updated>
<author>
<name>Sheetal Pamecha</name>
<email>spamecha@redhat.com</email>
</author>
<published>2020-09-23T12:11:46+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=adbb679370810b1802ac1791f44f9232c8f15f65'/>
<id>adbb679370810b1802ac1791f44f9232c8f15f65</id>
<content type='text'>
Problem: the add-brick operation fails with a "multiple bricks on the
same server" error when the replica count is increased.

This was happening because of extra runs of the loop that compares
hostnames: if fewer bricks than the "replica" count were supplied, a
brick would get compared against itself, resulting in the above error.
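
The corrected loop bounds can be illustrated as follows; same_host_in_set
is a hypothetical helper, not the glusterd loop, and bricks are
'host:/path' strings:

```python
def same_host_in_set(bricks):
    """Pairwise hostname comparison over the bricks actually supplied.
    Starting j at i + 1 ensures a brick is never compared with itself,
    even when fewer bricks than the replica count are supplied."""
    for i in range(len(bricks)):
        for j in range(i + 1, len(bricks)):
            if bricks[i].split(":", 1)[0] == bricks[j].split(":", 1)[0]:
                return True
    return False
```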

Fixes: #1508
Change-Id: I8668e964340b7bf59728bb838525d2db062197ed
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: the add-brick operation fails with a "multiple bricks on the
same server" error when the replica count is increased.

This was happening because of extra runs of the loop that compares
hostnames: if fewer bricks than the "replica" count were supplied, a
brick would get compared against itself, resulting in the above error.

Fixes: #1508
Change-Id: I8668e964340b7bf59728bb838525d2db062197ed
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: provide an option to mark tests as 'flaky'</title>
<updated>2020-08-18T08:38:20+00:00</updated>
<author>
<name>Amar Tumballi</name>
<email>amar@kadalu.io</email>
</author>
<published>2020-08-18T08:38:20+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=5731d25c9ff6a907fe68b99d1e79505b2331259d'/>
<id>5731d25c9ff6a907fe68b99d1e79505b2331259d</id>
<content type='text'>
* also add some time gap in other tests to see if we get things properly
* create a directory 'tests/000/', which can host any tests that are flaky.
* move all the tests mentioned in the issue to the above directory.
* as the above dir gets tested first, all flaky tests will be reported quickly.
* change `run-tests.sh` to continue tests even if flaky tests fail.

Reference: gluster/project-infrastructure#72
Updates: #1000
Change-Id: Ifdafa38d083ebd80f7ae3cbbc9aa3b68b6d21d0e
Signed-off-by: Amar Tumballi &lt;amar@kadalu.io&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* also add some time gap in other tests to see if we get things properly
* create a directory 'tests/000/', which can host any tests that are flaky.
* move all the tests mentioned in the issue to the above directory.
* as the above dir gets tested first, all flaky tests will be reported quickly.
* change `run-tests.sh` to continue tests even if flaky tests fail.

Reference: gluster/project-infrastructure#72
Updates: #1000
Change-Id: Ifdafa38d083ebd80f7ae3cbbc9aa3b68b6d21d0e
Signed-off-by: Amar Tumballi &lt;amar@kadalu.io&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: getspec() returns wrong response when volfile not found</title>
<updated>2020-07-21T06:31:18+00:00</updated>
<author>
<name>Tamar Shacked</name>
<email>tshacked@redhat.com</email>
</author>
<published>2020-07-21T06:31:18+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=bb10f9f86fcdf6fb9b4c9dc0e4c7ef3a88ccd44b'/>
<id>bb10f9f86fcdf6fb9b4c9dc0e4c7ef3a88ccd44b</id>
<content type='text'>
In a cluster env, getspec() detects that the volfile was not found,
but further on this return code is overwritten by another call, so
the error is lost and not handled. As a result the server responds
with an ambiguous message, {op_ret = -1, op_errno = 0..}, which
causes the client to get stuck.

Fix:
Server side: don't override the failure error.
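
The fix amounts to not letting a later call overwrite the recorded
failure; find_volfile and build_response below are hypothetical stand-ins
for the server-side handlers, each returning an (op_ret, op_errno) pair:

```python
def getspec(find_volfile, build_response):
    """Preserve the first failure: once op_ret records an error,
    skip the follow-up call so its return code cannot clobber it."""
    op_ret, op_errno = find_volfile()
    if op_ret == 0:
        # Only run the follow-up work when the volfile was found.
        op_ret, op_errno = build_response()
    return op_ret, op_errno
```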

fixes: #1375
Change-Id: Id394954d4d0746570c1ee7d98969649c305c6b0d
Signed-off-by: Tamar Shacked &lt;tshacked@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In a cluster env, getspec() detects that the volfile was not found,
but further on this return code is overwritten by another call, so
the error is lost and not handled. As a result the server responds
with an ambiguous message, {op_ret = -1, op_errno = 0..}, which
causes the client to get stuck.

Fix:
Server side: don't override the failure error.

fixes: #1375
Change-Id: Id394954d4d0746570c1ee7d98969649c305c6b0d
Signed-off-by: Tamar Shacked &lt;tshacked@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: added volume operations to increase code coverage</title>
<updated>2020-05-26T13:15:46+00:00</updated>
<author>
<name>nik-redhat</name>
<email>nladha@redhat.com</email>
</author>
<published>2020-05-26T13:15:46+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=dbaa39f0e4704323bccf9e6ee31498ec2dc725ab'/>
<id>dbaa39f0e4704323bccf9e6ee31498ec2dc725ab</id>
<content type='text'>
Added tests for volume options like localtime-logging, fixed
enable-shared-storage to include function coverage, and added a
few negative tests for other volume options to increase the
code coverage of the glusterd component.

Change-Id: Ib1706c1fd5bc98a64dcb5c8b15a121d639a597d7
Updates: #1052
Signed-off-by: nik-redhat &lt;nladha@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Added tests for volume options like localtime-logging, fixed
enable-shared-storage to include function coverage, and added a
few negative tests for other volume options to increase the
code coverage of the glusterd component.

Change-Id: Ib1706c1fd5bc98a64dcb5c8b15a121d639a597d7
Updates: #1052
Signed-off-by: nik-redhat &lt;nladha@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: add-brick command failure</title>
<updated>2020-06-16T12:33:21+00:00</updated>
<author>
<name>Sanju Rakonde</name>
<email>srakonde@redhat.com</email>
</author>
<published>2020-06-16T12:33:21+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=3ba1feb1608c94d34367fe903ec4bed4871db0ea'/>
<id>3ba1feb1608c94d34367fe903ec4bed4871db0ea</id>
<content type='text'>
Problem: the add-brick operation fails when the replica or disperse
count is not mentioned in the add-brick command.

Reason: with commit a113d93 we check the brick order while doing an
add-brick operation for replica and disperse volumes. If the replica
count or disperse count is not mentioned in the command, the dict get
fails, resulting in add-brick operation failure.
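
The tolerant lookup can be sketched with a plain dict standing in for
glusterd's dict_t; get_count is a hypothetical helper, not the actual fix:

```python
def get_count(volinfo, key, current_count):
    """Treat a missing replica/disperse count as 'unchanged' instead
    of treating the failed lookup as an error."""
    return volinfo.get(key, current_count)
```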

fixes: #1306

Change-Id: Ie957540e303bfb5f2d69015661a60d7e72557353
Signed-off-by: Sanju Rakonde &lt;srakonde@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: the add-brick operation fails when the replica or disperse
count is not mentioned in the add-brick command.

Reason: with commit a113d93 we check the brick order while doing an
add-brick operation for replica and disperse volumes. If the replica
count or disperse count is not mentioned in the command, the dict get
fails, resulting in add-brick operation failure.

fixes: #1306

Change-Id: Ie957540e303bfb5f2d69015661a60d7e72557353
Signed-off-by: Sanju Rakonde &lt;srakonde@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
