<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/libglusterfs/src/glusterfs, branch release-9</title>
<subtitle>GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>posix: fix chmod error on symlinks (#2158)</title>
<updated>2021-02-12T14:59:36+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@users.noreply.github.com</email>
</author>
<published>2021-02-12T14:59:36+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=f03593e67f7131db54b8cfcd5a4be9586d77078a'/>
<id>f03593e67f7131db54b8cfcd5a4be9586d77078a</id>
<content type='text'>
Since glibc 2.32, lchmod() returns EOPNOTSUPP instead of ENOSYS when
called on symlinks. The man page says that the returned code is ENOTSUP.
ENOTSUP and EOPNOTSUPP are the same on Linux, but this patch correctly
handles all of these error codes.

Fixes: #2154
Change-Id: Ib3bb3d86d421cba3d7ec8d66b6beb131ef6e0925
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since glibc 2.32, lchmod() returns EOPNOTSUPP instead of ENOSYS when
called on symlinks. The man page says that the returned code is ENOTSUP.
ENOTSUP and EOPNOTSUPP are the same on Linux, but this patch correctly
handles all of these error codes.

Fixes: #2154
Change-Id: Ib3bb3d86d421cba3d7ec8d66b6beb131ef6e0925
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;</pre>
</div>
</content>
</entry>
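<!--
A minimal sketch of the fallback described in the commit above (#2158); the
helper name and the decision to ignore the failure are assumptions, not the
actual posix xlator code:

#include <errno.h>
#include <sys/stat.h>

/* Treat ENOSYS (older glibc), EOPNOTSUPP (glibc 2.32 and later) and ENOTSUP
 * (the value documented in the man page) the same way when lchmod() is
 * attempted on a symlink. */
static int
sketch_lchmod(const char *path, mode_t mode)
{
    int ret = lchmod(path, mode);

    if (ret < 0 &&
        (errno == ENOSYS || errno == EOPNOTSUPP || errno == ENOTSUP)) {
        /* mode bits are not meaningful on symlinks; report success */
        ret = 0;
    }

    return ret;
}
-->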
<entry>
<title>glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail (#1614)</title>
<updated>2020-11-30T12:09:53+00:00</updated>
<author>
<name>mohit84</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2020-11-30T12:09:53+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=12545d91eed27ff9abb0505a12c7d4e75b45a53e'/>
<id>12545d91eed27ff9abb0505a12c7d4e75b45a53e</id>
<content type='text'>
During the glusterd handshake, glusterd receives a volume dictionary
from the peer end and compares it with its own volume dictionary data. If
the options differ, it sets a key to record that volume options have
changed and calls an import synctask to delete/start the volume. In a
brick_mux environment with a high number of volumes (5k), the dict API in
the function glusterd_compare_friend_volume takes a long time because the
function glusterd_handle_friend_req saves all peer volume data in a single
dictionary. Due to the time taken by glusterd_handle_friend_req, RPC
requests receive a call_bail from the peer end and gluster (CLI) is not
able to show the volume status.

Solution: To optimize the code, the following changes were made:
1) Populate a new, specific dictionary to save the peer end's
   version-specific data so that the function does not take much time to
   decide whether the peer end has any volume updates.
2) If a volume's version differs, set the key in status_arr instead of
   saving it in a dictionary, which makes the operation faster.

Note: To validate the changes, the following procedure was used:
1) Set up 5100 distributed volumes (3x1)
2) Enable brick_mux
3) Start all the volumes
4) Kill all gluster processes on the 3rd node
5) Run a loop on the 1st node to update a volume option
   for i in {1..5100}; do gluster v set vol$i performance.open-behind off; done
6) Start the glusterd process on the 3rd node
7) Wait for the handshake to finish and check that there are no call_bail
   messages in the logs

Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
Fixes: #1613
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During the glusterd handshake, glusterd receives a volume dictionary
from the peer end and compares it with its own volume dictionary data. If
the options differ, it sets a key to record that volume options have
changed and calls an import synctask to delete/start the volume. In a
brick_mux environment with a high number of volumes (5k), the dict API in
the function glusterd_compare_friend_volume takes a long time because the
function glusterd_handle_friend_req saves all peer volume data in a single
dictionary. Due to the time taken by glusterd_handle_friend_req, RPC
requests receive a call_bail from the peer end and gluster (CLI) is not
able to show the volume status.

Solution: To optimize the code, the following changes were made:
1) Populate a new, specific dictionary to save the peer end's
   version-specific data so that the function does not take much time to
   decide whether the peer end has any volume updates.
2) If a volume's version differs, set the key in status_arr instead of
   saving it in a dictionary, which makes the operation faster.

Note: To validate the changes, the following procedure was used:
1) Set up 5100 distributed volumes (3x1)
2) Enable brick_mux
3) Start all the volumes
4) Kill all gluster processes on the 3rd node
5) Run a loop on the 1st node to update a volume option
   for i in {1..5100}; do gluster v set vol$i performance.open-behind off; done
6) Start the glusterd process on the 3rd node
7) Wait for the handshake to finish and check that there are no call_bail
   messages in the logs

Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
Fixes: #1613
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</pre>
</div>
</content>
</entry>
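<!--
A minimal sketch of the status_arr idea from the commit above (#1614); the
bitmap layout and names are assumptions for illustration, not the glusterd
data structures:

#include <stdint.h>

#define MAX_VOLUMES 8192

/* One bit per volume: set when the peer reports a different volume version,
 * so the later import step walks a compact array instead of storing a key per
 * changed volume in a large dictionary. */
static uint64_t status_arr[MAX_VOLUMES / 64];

static void
mark_volume_changed(unsigned int idx)
{
    status_arr[idx / 64] |= (1ULL << (idx % 64));
}

static int
volume_changed(unsigned int idx)
{
    return (int)((status_arr[idx / 64] >> (idx % 64)) & 1ULL);
}
-->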
<entry>
<title>core: tcmu-runner process continuous growing logs lru_size showing -1 (#1776)</title>
<updated>2020-11-24T09:59:58+00:00</updated>
<author>
<name>mohit84</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2020-11-24T09:59:58+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=06d5cd06724bbef7970167df1cdd191717672cff'/>
<id>06d5cd06724bbef7970167df1cdd191717672cff</id>
<content type='text'>
* core: tcmu-runner process continuous growing logs lru_size showing -1

When inode_table_prune is called, it checks whether the current lru_size
is greater than lru_limit; if lru_list is empty, it throws the log message
"Empty inode lru list found but with (%d) lru_size". From reading the
code, it seems lru_size is out of sync with the actual number of inodes in
lru_list. Because these error messages are thrown continuously, the entire
disk fills up and the user has to restart the tcmu-runner process to keep
using the volumes. The log message was introduced by the patch
https://review.gluster.org/#/c/glusterfs/+/15087/.

Solution: Introduce a flag, in_lru_list, to decide whether an inode is
          part of lru_list or not.
Fixes: #1775
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I4b836bebf4b5db65fbf88ff41c6c88f4a7ac55c1

* core: tcmu-runner process continuous growing logs lru_size showing -1
Update the in_lru_list flag only while modifying lru_size

Fixes: #1775
Change-Id: I3bea1c6e748b4f50437999bae59edeb3d7677f47
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: tcmu-runner process continuous growing logs lru_size showing -1

Address review comments in inode_table_destroy and inode_table_prune

Fixes: #1775
Change-Id: I5aa4d8c254f0fe374daa5ec604f643dea8dd56ff
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: tcmu-runner process continuous growing logs lru_size showing -1

Update in_lru_list only while updating lru_size

Fixes: #1775
Change-Id: I950eb1f0010c3d4bcc44a33225a502d2291d1a83
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* core: tcmu-runner process continuous growing logs lru_size showing -1

When inode_table_prune is called, it checks whether the current lru_size
is greater than lru_limit; if lru_list is empty, it throws the log message
"Empty inode lru list found but with (%d) lru_size". From reading the
code, it seems lru_size is out of sync with the actual number of inodes in
lru_list. Because these error messages are thrown continuously, the entire
disk fills up and the user has to restart the tcmu-runner process to keep
using the volumes. The log message was introduced by the patch
https://review.gluster.org/#/c/glusterfs/+/15087/.

Solution: Introduce a flag, in_lru_list, to decide whether an inode is
          part of lru_list or not.
Fixes: #1775
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I4b836bebf4b5db65fbf88ff41c6c88f4a7ac55c1

* core: tcmu-runner process continuous growing logs lru_size showing -1
Update the in_lru_list flag only while modifying lru_size

Fixes: #1775
Change-Id: I3bea1c6e748b4f50437999bae59edeb3d7677f47
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: tcmu-runner process continuous growing logs lru_size showing -1

Address review comments in inode_table_destroy and inode_table_prune

Fixes: #1775
Change-Id: I5aa4d8c254f0fe374daa5ec604f643dea8dd56ff
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

* core: tcmu-runner process continuous growing logs lru_size showing -1

Update in_lru_list only while updating lru_size

Fixes: #1775
Change-Id: I950eb1f0010c3d4bcc44a33225a502d2291d1a83
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</pre>
</div>
</content>
</entry>
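<!--
A minimal sketch of the in_lru_list flag from the commit above (#1776); the
structures and helpers are assumptions, not the actual inode table code:

#include <stdbool.h>
#include <sys/queue.h>

struct sketch_inode {
    TAILQ_ENTRY(sketch_inode) lru;
    bool in_lru_list;              /* true only while on lru_list */
};

TAILQ_HEAD(sketch_lru_head, sketch_inode);

struct sketch_table {
    struct sketch_lru_head lru_list;
    int lru_size;                  /* changed only together with in_lru_list */
};

static void
lru_add(struct sketch_table *t, struct sketch_inode *i)
{
    if (!i->in_lru_list) {
        TAILQ_INSERT_TAIL(&t->lru_list, i, lru);
        i->in_lru_list = true;
        t->lru_size += 1;
    }
}

static void
lru_del(struct sketch_table *t, struct sketch_inode *i)
{
    if (i->in_lru_list) {
        TAILQ_REMOVE(&t->lru_list, i, lru);
        i->in_lru_list = false;
        t->lru_size -= 1;          /* lru_size can no longer drift from the list */
    }
}
-->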
<entry>
<title>enhancement/debug: Option to generate core dump without killing the process (#1814)</title>
<updated>2020-11-23T02:39:44+00:00</updated>
<author>
<name>Vinayak hariharmath</name>
<email>65405035+VHariharmath-rh@users.noreply.github.com</email>
</author>
<published>2020-11-23T02:39:44+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=f74d3ab643a00632fdeb9b4912211b1767fa120a'/>
<id>f74d3ab643a00632fdeb9b4912211b1767fa120a</id>
<content type='text'>
Comments and idea proposed by Xavi Hernandez (jahernan@redhat.com):

On production systems sometimes we see a log message saying that an assertion
has failed. But it's hard to track why it failed without additional information
(on debug builds, a GF_ASSERT() generates a core dump and kills the process,
so it can be used to debug the issue, but many times we are only able to
reproduce assertion failures on production systems, where GF_ASSERT() only logs
a message and continues).

In other cases we may have a core dump caused by a bug, but the core dump doesn't
necessarily happen when the bug has happened. Sometimes the crash happens so much
later that the causes that triggered the bug are lost. In these cases we can add
more assertions to the places that touch the potential candidates to cause the bug,
but the only thing we'll get is a log message, which may not be enough.

One solution would be to always generate a core dump in case of assertion failure,
but this was already discussed and it was decided that it was too drastic. If a
core dump was really needed, a new macro was created to do so: GF_ABORT(),
but GF_ASSERT() would continue to not kill the process on production systems.

I'm proposing to modify GF_ASSERT() on production builds so that it conditionally
triggers a signal when a debugger is attached. When this happens, the debugger
will generate a core dump and continue the process as if nothing had happened.
If there's no debugger attached, GF_ASSERT() will behave as always.

The idea I have is to use SIGCONT to do that. This signal is harmless, so we can
unmask it (we currently mask all unneeded signals) and raise it inside a GF_ASSERT()
when some global variable is set to true.

To produce the core dump, run the script extras/debug/gfcore.py in another
terminal. gdb breaks and produces a core dump when a GF_ASSERT is hit.

The script is copied from #1810, which was written by Xavi Hernandez (jahernan@redhat.com).

Fixes: #1810
Change-Id: I6566ca2cae15501d8835c36f56be4c6950cb2a53
Signed-off-by: Vinayakswami Hariharmath &lt;vharihar@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Comments and idea proposed by Xavi Hernandez (jahernan@redhat.com):

On production systems sometimes we see a log message saying that an assertion
has failed. But it's hard to track why it failed without additional information
(on debug builds, a GF_ASSERT() generates a core dump and kills the process,
so it can be used to debug the issue, but many times we are only able to
reproduce assertion failures on production systems, where GF_ASSERT() only logs
a message and continues).

In other cases we may have a core dump caused by a bug, but the core dump doesn't
necessarily happen when the bug has happened. Sometimes the crash happens so much
later that the causes that triggered the bug are lost. In these cases we can add
more assertions to the places that touch the potential candidates to cause the bug,
but the only thing we'll get is a log message, which may not be enough.

One solution would be to always generate a core dump in case of assertion failure,
but this was already discussed and it was decided that it was too drastic. If a
core dump was really needed, a new macro was created to do so: GF_ABORT(),
but GF_ASSERT() would continue to not kill the process on production systems.

I'm proposing to modify GF_ASSERT() on production builds so that it conditionally
triggers a signal when a debugger is attached. When this happens, the debugger
will generate a core dump and continue the process as if nothing had happened.
If there's no debugger attached, GF_ASSERT() will behave as always.

The idea I have is to use SIGCONT to do that. This signal is harmless, so we can
unmask it (we currently mask all unneeded signals) and raise it inside a GF_ASSERT()
when some global variable is set to true.

To produce the core dump, run the script extras/debug/gfcore.py in another
terminal. gdb breaks and produces a core dump when a GF_ASSERT is hit.

The script is copied from #1810, which was written by Xavi Hernandez (jahernan@redhat.com).

Fixes: #1810
Change-Id: I6566ca2cae15501d8835c36f56be4c6950cb2a53
Signed-off-by: Vinayakswami Hariharmath &lt;vharihar@redhat.com&gt;</pre>
</div>
</content>
</entry>
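<!--
A minimal sketch of the conditional-signal idea from the commit above (#1814);
the variable and macro names are assumptions, not the real GF_ASSERT definition:

#include <signal.h>
#include <stdio.h>

/* Flipped externally (e.g. by the gfcore.py helper through an attached gdb)
 * when a core dump is wanted on assertion failure; otherwise only a log line
 * is produced, as before. */
static volatile int dump_core_on_assert = 0;

#define SKETCH_ASSERT(expr)                                                  \
    do {                                                                     \
        if (!(expr)) {                                                       \
            /* SIGCONT is harmless; an attached debugger can turn it into a  \
             * core dump and then resume the process untouched */            \
            if (dump_core_on_assert)                                         \
                raise(SIGCONT);                                              \
            fprintf(stderr, "Assertion failed: %s\n", #expr);                \
        }                                                                    \
    } while (0)
-->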
<entry>
<title>posix: Attach a posix_spawn_disk_thread with glusterfs_ctx (#1595)</title>
<updated>2020-11-09T11:45:42+00:00</updated>
<author>
<name>mohit84</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2020-11-09T11:45:42+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=3f93be77e1acf5baacafa97a320e91e6879d1c0e'/>
<id>3f93be77e1acf5baacafa97a320e91e6879d1c0e</id>
<content type='text'>
Currently the posix xlator spawns a posix_disk_space_thread per brick. In
a brick_mux environment, when glusterd attaches bricks at the maximum
level (250) to a single brick process, 250 threads are spawned for all the
bricks and the brick process's memory size also increases.

Solution: Attach the posix_disk_space thread to glusterfs_ctx so that one
          thread is spawned per process instead of one per brick.

Fixes: #1482
Change-Id: I8dd88f252a950495b71742e2a7588bd5bb019ec7
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the posix xlator spawns a posix_disk_space_thread per brick. In
a brick_mux environment, when glusterd attaches bricks at the maximum
level (250) to a single brick process, 250 threads are spawned for all the
bricks and the brick process's memory size also increases.

Solution: Attach the posix_disk_space thread to glusterfs_ctx so that one
          thread is spawned per process instead of one per brick.

Fixes: #1482
Change-Id: I8dd88f252a950495b71742e2a7588bd5bb019ec7
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</pre>
</div>
</content>
</entry>
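<!--
A minimal sketch of the one-thread-per-process idea from the commit above
(#1595); the context structure and names are assumptions, not glusterfs_ctx:

#include <pthread.h>
#include <stddef.h>

struct sketch_ctx {
    pthread_mutex_t lock;
    int disk_space_thread_running;   /* shared by every brick in the process */
    pthread_t disk_space_thread;
};

static void *
disk_space_check_loop(void *arg)
{
    (void)arg;
    /* periodically check free space for all attached bricks here */
    return NULL;
}

/* Every brick calls this on init, but only the first call spawns the thread,
 * so a multiplexed process with 250 bricks still runs a single checker. */
static int
ensure_disk_space_thread(struct sketch_ctx *ctx)
{
    int ret = 0;

    pthread_mutex_lock(&ctx->lock);
    if (!ctx->disk_space_thread_running) {
        ret = pthread_create(&ctx->disk_space_thread, NULL,
                             disk_space_check_loop, ctx);
        if (ret == 0)
            ctx->disk_space_thread_running = 1;
    }
    pthread_mutex_unlock(&ctx->lock);

    return ret;
}
-->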
<entry>
<title>xlators: misc conscious language changes (#1715)</title>
<updated>2020-11-02T12:33:01+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-11-02T12:33:01+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=e4c9a14429c51d8d059287c2a2c7a76a5116a362'/>
<id>e4c9a14429c51d8d059287c2a2c7a76a5116a362</id>
<content type='text'>
core: change xlator_t-&gt;ctx-&gt;master to xlator_t-&gt;ctx-&gt;primary
afr: just changed comments.
meta: change .meta/master to .meta/primary. Might break scripts.
changelog: variable/function name changes only.

These are unrelated to geo-rep.
Fixes: #1713

Change-Id: I58eb5fcd75d65fc8269633acc41313503dccf5ff
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
core: change xlator_t-&gt;ctx-&gt;master to xlator_t-&gt;ctx-&gt;primary
afr: just changed comments.
meta: change .meta/master to .meta/primary. Might break scripts.
changelog: variable/function name changes only.

These are unrelated to geo-rep.
Fixes: #1713

Change-Id: I58eb5fcd75d65fc8269633acc41313503dccf5ff
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>libglusterfs/coverity: pointer to local outside the scope (#1675)</title>
<updated>2020-10-21T10:44:29+00:00</updated>
<author>
<name>Vinayak hariharmath</name>
<email>65405035+VHariharmath-rh@users.noreply.github.com</email>
</author>
<published>2020-10-21T10:44:29+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=bf2a76da976f90ba9e7ac049575b8620505b62a7'/>
<id>bf2a76da976f90ba9e7ac049575b8620505b62a7</id>
<content type='text'>
issue: gf_store_read_and_tokenize() returns the address
of a locally scoped string.

fix: pass the buf to gf_store_read_and_tokenize() and
use it for tokenizing.

CID: 1430143

Updates: #1060

Change-Id: Ifc346540c263f58f4014ba2ba8c1d491c20ac609
Signed-off-by: Vinayakswami Hariharmath &lt;vharihar@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
issue: gf_store_read_and_tokenize() returns the address
of a locally scoped string.

fix: pass the buf to gf_store_read_and_tokenize() and
use it for tokenizing.

CID: 1430143

Updates: #1060

Change-Id: Ifc346540c263f58f4014ba2ba8c1d491c20ac609
Signed-off-by: Vinayakswami Hariharmath &lt;vharihar@redhat.com&gt;</pre>
</div>
</content>
</entry>
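<!--
A minimal sketch of the pattern fixed by the commit above (CID 1430143); the
function names are placeholders, not the real gf_store_read_and_tokenize:

#include <stdio.h>
#include <string.h>

/* Broken pattern: the returned token points into a local buffer that goes out
 * of scope when the function returns. */
static char *
read_and_tokenize_broken(FILE *fp)
{
    char scan_str[4096];

    if (fgets(scan_str, sizeof(scan_str), fp) == NULL)
        return NULL;
    return strtok(scan_str, "=");    /* dangling pointer for the caller */
}

/* Fixed pattern: the caller owns the buffer and passes it in, so the token
 * stays valid for as long as the caller keeps the buffer alive. */
static char *
read_and_tokenize_fixed(FILE *fp, char *buf, int buflen)
{
    if (fgets(buf, buflen, fp) == NULL)
        return NULL;
    return strtok(buf, "=");
}
-->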
<entry>
<title>core: configure optimum inode table hash_size for shd (#1576)</title>
<updated>2020-10-11T05:26:57+00:00</updated>
<author>
<name>mohit84</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2020-10-11T05:26:57+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=ecdc77ceb9a5864be1fd0b3d7f919fa9ce60132e'/>
<id>ecdc77ceb9a5864be1fd0b3d7f919fa9ce60132e</id>
<content type='text'>
In a brick_mux environment the shd process consumes a lot of memory.
After printing the statedump I found that it allocates 1M per afr xlator
for all bricks. With 4k volumes configured it consumes almost 6G of RSS
in total, of which 4G is consumed by inode_tables:

[cluster/replicate.test1-replicate-0 - usage-type gf_common_mt_list_head memusage]
size=1273488
num_allocs=2
max_size=1273488
max_num_allocs=2
total_allocs=2

The inode_new_table function allocates memory (1M) for the inode and dentry
hash lists. For shd the lru_limit size is 1, so we don't need to create a
big hash table; to reduce the RSS size of the shd process, pass an optimum
bucket count at the time of creating the inode_table.

Change-Id: I039716d42321a232fdee1ee8fd50295e638715bb
Fixes: #1538
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In a brick_mux environment the shd process consumes a lot of memory.
After printing the statedump I found that it allocates 1M per afr xlator
for all bricks. With 4k volumes configured it consumes almost 6G of RSS
in total, of which 4G is consumed by inode_tables:

[cluster/replicate.test1-replicate-0 - usage-type gf_common_mt_list_head memusage]
size=1273488
num_allocs=2
max_size=1273488
max_num_allocs=2
total_allocs=2

The inode_new_table function allocates memory (1M) for the inode and dentry
hash lists. For shd the lru_limit size is 1, so we don't need to create a
big hash table; to reduce the RSS size of the shd process, pass an optimum
bucket count at the time of creating the inode_table.

Change-Id: I039716d42321a232fdee1ee8fd50295e638715bb
Fixes: #1538
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;</pre>
</div>
</content>
</entry>
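<!--
A minimal sketch of the bucket-count reasoning from the commit above (#1576);
the default bucket count used here is an assumption for illustration only:

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* Each hash bucket is a list head; the inode table keeps one bucket array for
 * inodes and one for dentries, so memory grows linearly with hash_size. */
static size_t
hash_bucket_bytes(size_t hash_size)
{
    return 2 * hash_size * sizeof(struct list_head);
}

int
main(void)
{
    /* a large default table vs. a table sized for shd, where lru_limit is 1 */
    printf("65536 buckets: %zu bytes\n", hash_bucket_bytes(65536));
    printf("    1 bucket : %zu bytes\n", hash_bucket_bytes(1));
    return 0;
}
-->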
<entry>
<title>rpcsvc: Add latency tracking for rpc programs</title>
<updated>2020-09-04T05:14:17+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2020-09-04T05:14:17+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=da3835971a5fa1b29fd792faa8339be428a5116c'/>
<id>da3835971a5fa1b29fd792faa8339be428a5116c</id>
<content type='text'>
Added latency tracking to the rpc-handling code. With this change we
should be able to monitor the amount of time the rpc-handling code
consumes for each rpc call.

fixes: #1466
Change-Id: I04fc7f3b12bfa5053c0fc36885f271cb78f581cd
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Added latency tracking to the rpc-handling code. With this change we
should be able to monitor the amount of time the rpc-handling code
consumes for each rpc call.

fixes: #1466
Change-Id: I04fc7f3b12bfa5053c0fc36885f271cb78f581cd
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
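<!--
A minimal sketch of per-call latency measurement as described in the commit
above (#1466); the accumulator and wrapper names are assumptions, not the
rpcsvc code:

#include <stdint.h>
#include <time.h>

struct latency_stat {
    uint64_t total_nsec;
    uint64_t count;
};

static uint64_t
now_nsec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Wrap the actual rpc handler: record how long the handling code itself takes
 * for each call so per-program latency can be monitored later. */
static int
handle_rpc_with_latency(int (*handler)(void *req), void *req,
                        struct latency_stat *stat)
{
    uint64_t start = now_nsec();
    int ret = handler(req);

    stat->total_nsec += now_nsec() - start;
    stat->count += 1;
    return ret;
}
-->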
<entry>
<title>build: extend --enable-valgrind to support Memcheck and DRD</title>
<updated>2020-08-31T11:17:24+00:00</updated>
<author>
<name>Dmitry Antipov</name>
<email>dmantipov@yandex.ru</email>
</author>
<published>2020-08-31T11:17:24+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=9ed1558ed088c8c4ddc372cb50a9ef8a04679f8f'/>
<id>9ed1558ed088c8c4ddc372cb50a9ef8a04679f8f</id>
<content type='text'>
Extend '--enable-valgrind' to '--enable-valgrind[=memcheck,drd]'
to enable the Memcheck or DRD Valgrind tool, respectively.

Change-Id: I80d13d72ba9756e0cbcdbeb6766b5c98e3e8c002
Signed-off-by: Dmitry Antipov &lt;dmantipov@yandex.ru&gt;
Updates: #1002
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extend '--enable-valgrind' to '--enable-valgrind[=memcheck,drd]'
to enable the Memcheck or DRD Valgrind tool, respectively.

Change-Id: I80d13d72ba9756e0cbcdbeb6766b5c98e3e8c002
Signed-off-by: Dmitry Antipov &lt;dmantipov@yandex.ru&gt;
Updates: #1002
</pre>
</div>
</content>
</entry>
</feed>
