glusterfs.git/glusterfsd/src/glusterfsd.c, branch v6.7

event: rename event_XXX with gf_ prefixed

2019-08-28T08:34:45+00:00

I hit one crash issue when using the libgfapi.

In the libgfapi it will call glfs_poller() --> event_dispatch()
in file api/src/glfs.c:721, and the event_dispatch() is defined
by libgluster locally, the problem is the name of event_dispatch()
is the extremly the same with the one from libevent package form
the OS.

For example, if a executable program Foo, which will also use and
link the libevent and the libgfapi at the same time, I can hit the
crash, like:

kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp
00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000]

The link for Foo is:
lib_foo_LADD = -levent $(GFAPI_LIBS)
It will crash.

This is because the glfs_poller() is calling the event_dispatch() from
the libevent, not the libglsuter.

The gfapi link info :
GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid

If I link Foo like:
lib_foo_LADD = $(GFAPI_LIBS) -levent
It will works well without any problem.

And if Foo call one private lib, such as handler_glfs.so, and the
handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't
and it will dlopen(handler_glfs.so), then the crash will be hit everytime.

The link info will be:
foo_LADD = -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like:
foo_LADD = $(GFAPI_LIBS) -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS.

And in some cases when the --as-needed link option is added(on many dists
it is added as default), then the crash is back again, the above workaround
won't work.

Backport of:
> https://review.gluster.org/#/c/glusterfs/+/23110/
> Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
> Fixes: #699
> Signed-off-by: Xiubo Li 

Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
updates: bz#1740525
Signed-off-by: Xiubo Li 
(cherry picked from commit 799edc73c3d4f694c365c6a7c27c9ab8eed5f260)

core: avoid dynamic TLS allocation when possible

2019-07-03T06:25:48+00:00

Some interdependencies between logging and memory management functions
make it impossible to use the logging framework before initializing
memory subsystem because they both depend on Thread Local Storage
allocated through pthread_key_create() during initialization.

This causes a crash when we try to log something very early in the
initialization phase.

To prevent this, several dynamically allocated TLS structures have
been replaced by static TLS reserved at compile time using '__thread'
keyword. This also reduces the number of error sources, making
initialization simpler.

Backport of:
> BUG: 1193929
> Change-Id: I8ea2e072411e30790d50084b6b7e909c7bb01d50
> Signed-off-by: Xavi Hernandez 

Updates: bz#1724210
Change-Id: I8ea2e072411e30790d50084b6b7e909c7bb01d50
Signed-off-by: Xavi Hernandez

core: Log level changes do not effect on running client process

2019-04-16T10:59:36+00:00

Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set log-level
         per xlator during reconfigure only for a brick process not for
         the client process.

Solution: 1) Change per xlator log-level only if brick_mux is enabled.To make sure
             about brick multiplex introudce a flag brick_mux at ctx->cmd_args.

Note: There are two other changes done with this patch
      1) Ignore client-log-level option to attach a brick with
         already running brick if brick_mux is enabled
      2) Add a log to print pid of the running process to make easier
         debugging

> Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9
> Signed-off-by: Mohit Agrawal 
> Fixes: bz#1696046
> (Cherry picked from commit 798aadbe51a9a02dd98a0f861cc239ecf7c8ed57)
> (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/22495/)

Change-Id: If91682830f894ab8f6857f19dcb1797fc15ca64c
Fixes: bz#1699715
Signed-off-by: Mohit Agrawal

glusterfsd: Multiple shd processes are spawned on brick_mux environment

2019-03-12T20:52:59+00:00

Problem: Multiple shd processes are spawned while starting volumes
         in the loop on brick_mux environment.glusterd spawn a process
         based on a pidfile and shd daemon is taking some time to
         update pid in pidfile due to that glusterd is not able to
         get shd pid

Solution: Commit cd249f4cb783f8d79e79468c455732669e835a4f changed
          the code to update pidfile in parent for any gluster daemon
          after getting the status of forking child in parent.To resolve
          the same correct the condition update pidfile in parent only
          for glusterd and for rest of the daemon pidfile is updated in
          child

> Change-Id: Ifd14797fa949562594a285ec82d58384ad717e81
> fixes: bz#1684404
> (Cherry pick from commit 66986594a9023c49e61b32769b7e6b260b600626)
> (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/22290/)

Change-Id: I9a68064d2da1acd0ec54b4071a9995ece0c3320c
fixes: bz#1683880
Signed-off-by: Mohit Agrawal

fuse: reflect the actual default for lru-limit option

2019-02-25T15:26:08+00:00

in both `--help` text and man page

updates: bz#1679998
Change-Id: I9aa9367c6863ac8e2403255280697c9e6be26cf0
Signed-off-by: Amar Tumballi

mount/fuse: expose auto-invalidation as a mount option

2019-02-02T03:07:35+00:00

Auto invalidation is necessary when same (meta)data is shared/access
across multiple mounts. However, if (meta)data is not shared, all
relevant I/O goes through the cache of single mount and hence is
coherent with (meta)data on bricks always. So, fuse-auto-invalidation
can be disabled for this case which gives a huge performance boost for
workloads that write data and then immediately read the data they just
wrote.

From glusterfs --help,


      --auto-invalidation[=BOOL]   controls whether fuse-kernel can
                             auto-invalidate attribute, dentry and page-cache.
                             Disable this only if same files/directories are
                             not accessed across two different mounts
                             concurrently [default: "on"]


Details on how disabling auto-invalidation helped to reduce pgbench
init times can be found at [1]. Time taken for pgbench init of scale
8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s)
with auto-invalidations turned off along with other
optimizations. Just disabling auto-invalidation contributed 56%
improvement by reducing the total time taken by 33260s.

[1] https://www.spinics.net/lists/gluster-devel/msg25907.html

Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce
Signed-off-by: Raghavendra Gowdappa 
updates: bz#1664934

core: Resolve dict_leak at the time of destroying graph

2019-01-14T04:01:54+00:00

Problem: In gluster code some of the places it call's get_new_dict
         to create a dictionary without taking reference so at the time
         of dict_unref it has become a leak

Solution: To resolve the same call dict_new instead of get_new_dict

updates bz#1650403
Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06
Signed-off-by: Mohit Agrawal

glusterd: kill the process without releasing the cleanup mutex lock

2019-01-02T07:23:14+00:00

Problem:
glusterd acquires a cleanup mutex lock before it starts
cleanup process, so that any other thread which tries to acquire
lock on any resource will be blocked on cleanup mutex lock.

We don't want any thread to try to acquire any resource, once
the cleanup is started. because other threads might try to acquire
lock on resources which are already freed by the thread which is
going though the cleanup phase.

previously we were releasing the cleanup mutex lock before the
process exit. As we are releasing the cleanup mutex lock, before
the process can exit some other thread which is blocked on
cleanup mutex lock is acquiring the cleanup mutex lock and
trying to acquire some resources which are already freed as a
part of cleanup. This is leading glusterd to crash.

Solution: We should exit the process without releasing the
cleanup mutex lock.

Change-Id: Ibae1c62260f141019017f7a547519a5d38dc2bb6
fixes: bz#1654270
Signed-off-by: Sanju Rakonde

posix: use synctask for janitor

2018-12-19T14:36:52+00:00

With brick mux, the number of threads increases as the number of
bricks increases. As an initiative to reduce the number of
threads in brick mux scenario, replacing janitor thread to use
synctask infra.

Now close() and closedir() handle by separate janitor
thread which is linked with glusterfs_ctx.

Updates #475
Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73
Signed-off-by: Poornima G

fuse: add --lru-limit option

2018-12-14T17:34:28+00:00

The inode LRU mechanism is moot in fuse xlator (ie. there is no
limit for the LRU list), as fuse inodes are referenced from
kernel context, and thus they can only be dropped on request of
the kernel. This might results in a high number of passive
inodes which are useless for the glusterfs client, causing a
significant memory overhead.

This change tries to remedy this by extending the LRU semantics
and allowing to set a finite limit on the fuse inode LRU.

A brief history of problem:

When gluster's inode table was designed, fuse didn't have any
'invalidate' method, which means, userspace application could
never ask kernel to send a 'forget()' fop, instead had to wait
for kernel to send it based on kernel's parameters. Inode table
remembers the number of times kernel has cached the inode based
on the 'nlookup' parameter. And 'nlookup' field is not used by
no other entry points (like server-protocol, gfapi etc).

Hence the inode_table of fuse module always has to have lru-limit
as '0', which means no limit. GlusterFS always had to keep all
inodes in memory as kernel would have had a reference to it.
Again, the reason for this is, kernel's glusterfs inode reference
was pointer of 'inode_t' structure in glusterfs. As it is a
pointer, we could never free it (to prevent segfault, or memory
corruption).

Solution:

In the inode table, handle the prune case of inodes with 'nlookup'
differently, and call a 'invalidator' method, which in this case is
fuse_invalidate(), and it sends the request to kernel for getting
the forget request.

When the kernel sends the forget, it means, it has dropped all
the reference to the inode, and it will send the forget with the
'nlookup' parameter too. We just need to make sure to reduce the
'nlookup' value we have when we get forget. That automatically
cause the relevant prune to happen.

Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B

fixes: bz#1560969
Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2
Signed-off-by: Amar Tumballi