glusterfs.git/api/src/glfs-internal.h, branch v3.11.0

gfapi/handleops: Introducing glfs_xreaddirplus_r() fop for handleops

2017-05-05T11:12:48+00:00

It is known that the readdirplus operation also fetches stat for each of
the dirents. But applications often need extra information; for example,
NFS-Ganesha, which operates on handles, needs a handle for each dirent
returned. This requires extra calls to the backend, in this case LOOKUP
(a very expensive operation), resulting in very low readdir performance.

To address that, this patch introduces a new API through which
applications can request any extra information to be returned as part of
the readdirplus response.

Currently this new API returns stat and handles, as requested by the
application. The synopsis of the API is noted in glfs.h.

@todo:
* Enhance test script using this new API

Below are the perf results on a single-brick volume with and without
these changes -

Dataset used -
10*100 directories, each containing 100 empty files.

I used the NFS-Ganesha application to test these changes -
>for i in {1..5}; do systemctl restart nfs-ganesha; sleep 10; mount -t nfs -o vers=4 localhost:/brick_vol /mnt; cd /mnt; echo "ITERATION$i"; date; find . > tmp-nfs.log; date; cd /; umount /mnt; sleep 2; done;

Without these changes -
ITERATION1
Mon Mar 20 17:22:26 IST 2017
Mon Mar 20 17:23:18 IST 2017
ITERATION2
Mon Mar 20 17:23:39 IST 2017
Mon Mar 20 17:24:28 IST 2017
ITERATION3
Mon Mar 20 17:24:49 IST 2017
Mon Mar 20 17:25:36 IST 2017
ITERATION4
Mon Mar 20 17:30:57 IST 2017
Mon Mar 20 17:31:37 IST 2017
ITERATION5
Mon Mar 20 17:31:57 IST 2017
Mon Mar 20 17:32:40 IST 2017

On average ~46.2 sec

With these changes applied -
ITERATION1
Mon Mar 20 17:35:03 IST 2017
Mon Mar 20 17:35:15 IST 2017
ITERATION2
Mon Mar 20 17:35:36 IST 2017
Mon Mar 20 17:35:46 IST 2017
ITERATION3
Mon Mar 20 17:36:06 IST 2017
Mon Mar 20 17:36:17 IST 2017
ITERATION4
Mon Mar 20 17:41:38 IST 2017
Mon Mar 20 17:41:49 IST 2017
ITERATION5
Mon Mar 20 17:42:10 IST 2017
Mon Mar 20 17:42:20 IST 2017

On average ~10.8 sec

This is a backport of the upstream patch below -
        https://review.gluster.org/15663

>Updates #174
>BUG: 1442950
>Change-Id: I0f74f74dc62085ca4c4a23c38e3edc84bd850876
>Signed-off-by: Soumya Koduri 
>Reviewed-on: https://review.gluster.org/15663
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Niels de Vos 
>CentOS-regression: Gluster Build System 

BUG: 1447571
Change-Id: I0f74f74dc62085ca4c4a23c38e3edc84bd850876
Signed-off-by: Soumya Koduri 
Reviewed-on: https://review.gluster.org/17164
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

gfapi: glfs_subvol_done should NOT wait for graph migration.

2016-11-30T07:52:42+00:00

In the graph_setup function, glfs_subvol_done is called; this runs in
an epoll thread. glfs_lock waits for another thread to finish graph
migration. This can lead to a deadlock if all the epoll threads end up
blocked this way.

In general, a callback function executed in an epoll thread should not
make any blocking call that waits, directly or indirectly, on a network
reply; e.g. syncop functions should not be called in these threads.

As a fix, do not wait for migration in the callback path.

Change-Id: If96d0689fe1b4d74631e383048cdc30b01690dc2
BUG: 1397754
Signed-off-by: Rajesh Joseph 
Reviewed-on: http://review.gluster.org/15913
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: Niels de Vos 
CentOS-regression: Gluster Build System

gfapi: redesign the public interface for upcall consumers

2016-09-28T18:00:38+00:00

The glfs_callback_arg and glfs_callback_inode_arg were allocated by
gfapi, and expected to be free()'d by the application. However, it is not
reasonable to expect that applications use the same memory allocator as
the compiled libgfapi.so. For instance, it is possible that gfapi uses
glibc malloc/free, while an application like NFS-Ganesha uses the
versions from jemalloc. Mismatched malloc() and free() functions cause
segmentation faults at best.

In order to prevent problems like this in the future, the API for
applications that consume upcalls has been remodeled. Any structure that
gfapi allocates should be free'd with glfs_free(). The members of the
structures can no longer be accessed directly; each has its own accessor
function now.

Correcting the naming of the functions, structures and constants is a
continuation of commit 2775dc64101ed37c8d9809bf9852dbf0746ee2b6. Not only
do these new improvements give the functions and structures correct
prefixes, the naming also better reflects the upcall framework and no
longer uses "callback".

Change-Id: I2b8bd5a0a82036d2abea1a217f5e5975a1d4fe93
BUG: 1344714
Signed-off-by: Niels de Vos 
Reviewed-on: http://review.gluster.org/14701
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Kaleb KEITHLEY 
Reviewed-by: soumya k 
Reviewed-by: jiffin tony Thottan

md-cache: Register the list of xattrs with cache-invalidation

2016-08-31T06:07:01+00:00

Issue:
md-cache caches a specified list of xattrs, and when cache invalidation
is enabled, it makes sense to receive invalidations only when those xattrs
are modified by other clients. But the current upcall implementation
sends an invalidation when any of the on-disk xattrs is modified.

Solution:
md-cache sends the list of xattrs that it is interested in to upcall by
issuing an ipc(). The challenge here is that every time a brick goes
offline and comes back up, the ipc() needs to be issued to that brick
again. Hence md-cache sends the ipc() on every CHILD_UP/CHILD_MODIFIED
event.

TODO:
Follow-up patches will implement the ipc fop in the cluster xlators.

Change-Id: I6efcf3df474f5ce6eabd3d6694c00c7bd89bc25d
BUG: 1211863
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15002
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Rajesh Joseph 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Prashanth Pai 
Reviewed-by: Raghavendra G

gfapi: do not cache upcalls if the application is not interested

2016-08-25T20:35:43+00:00

When the volume option 'features.cache-invalidation' is enabled, upcall
events are sent from the brick process to the client. Even if the client
is not interested in upcall events itself, md-cache or other xlators may
benefit from them.

By adding a new 'cache_upcalls' boolean in the 'struct glfs', we can
enable the caching of upcalls only once the application has called
glfs_h_poll_upcall(). NFS-Ganesha sets up a thread for handling upcalls
in the initialization phase, and calls glfs_h_poll_upcall() before any
NFS-client accesses the NFS-export.

In the future there will be a more flexible registration API for
enabling certain kinds of upcall events. Until that is available, this
should work just fine.

Verification of this change is not trivial within our current regression
test framework. The bug report describes how to reliably reproduce the
problem with the glusterfs-coreutils.

Change-Id: I818595c92db50e6e48f7bfe287ee05103a4a30a2
BUG: 1368842
Signed-off-by: Niels de Vos 
Reviewed-on: http://review.gluster.org/15191
Smoke: Gluster Build System 
Reviewed-by: Poornima G 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: soumya k 
Reviewed-by: Kaleb KEITHLEY

gfapi: Fix IO error caused when there are consecutive graph switches

2016-08-10T10:40:58+00:00

This is part 2 of the fix; part 1 can be found at:
http://review.gluster.org/#/c/14656/

Problem:
=======
Consider a race between __glfs_active_subvol() and graph_setup().
Let's say, at @TIME T1:
fs->active_subvol = A
fs->next_subvol = B
__glfs_active_subvol()                //under lock fs->mutex
{
  ....
  new_subvol = fs->next_subvol       //which is B
  ....                               //Start migration from A to B
  __glfs_first_lookup(){
     ....
     unlock fs->mutex                //@TIME T2
     network fop
     lock fs->mutex
     ....
  }
  ....                                //migration continues on B
  fs->active_subvol = fs->next_subvol //which is C (explained below)
  ....
}

At @TIME T2, let's say graph_setup() is called with C in another
thread; note that at T2, fs->mutex is unlocked.

graph_setup(C...)
{
  lock fs->mutex
  ....
  if (fs->next_subvol)                // which is B
      destroy subvol (fs->next_subvol)
  ....
  fs->next_subvol = C
  ....
  unlock fs->mutex
}

Thus at the end of this,
fs->old_subvol = A;
fs->active_subvol = C;
fs->next_subvol = NULL;
which is wrong, as B completed migration but was destroyed by
graph_setup(), and C was never migrated.

Solution:
=========
Any new graph can be in one of 2 states:
- Picked for migration, migration in progress (fs->mip_subvol)
- Not picked so far for migration (fs->next_subvol)
graph_setup() updates only fs->next_subvol. __glfs_active_subvol()
atomically moves fs->next_subvol to fs->mip_subvol and sets
fs->next_subvol = NULL, and then, once the migration is complete, makes
that graph the fs->active_subvol.

Change-Id: Ib6ff0565105c5eedb912a43da4017cd413243612
BUG: 1343038
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/14722
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Raghavendra Talur 
Reviewed-by: Rajesh Joseph 
Reviewed-by: Niels de Vos

libgfapi/upcall : prepend "glfs_" to callback_arg, callback_inode_arg

2016-06-10T21:09:32+00:00

Change-Id: I371525775db4f6a4d69beb94baaa53d17b16fb41
BUG: 1344714
Signed-off-by: Jiffin Tony Thottan 
Reviewed-on: http://review.gluster.org/14702
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Jeff Darcy 
Tested-by: Jeff Darcy 
Smoke: Gluster Build System

libgfapi: glfd close is not correctly handled for async fop

2016-02-11T05:46:30+00:00

There is a chance that the client sends a close before an async fop is
complete. libgfapi destroys the glfd on close, so this can lead to a
crash or unexpected behaviour when the pending fop reaches the libgfapi
layer. Currently we neither provide an API to cancel these outstanding
fops nor check whether the glfd is already closed.

Therefore, as a fix, a refcount is provided for glfd. Each fop (sync or
async) takes a ref, and drops it once the fop is complete. The registered
callback function should not be called if the glfd is already closed. To
achieve this, we maintain the state of the glfd so that we can safely
decide whether the fd has been closed.

Change-Id: Ibe71b2225312db3f1be66b244fcf8826c70c357d
BUG: 1303995
Signed-off-by: Rajesh Joseph 
Reviewed-on: http://review.gluster.org/13340
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan 
NetBSD-regression: NetBSD Build System

api: Fix errno being set to EINVAL even on success

2016-01-05T15:41:10+00:00

BUG: 1289068
Change-Id: I7905ac70a537f23e1844c097a24eaa6cb762fb82
Signed-off-by: Prashanth Pai 
Reviewed-on: http://review.gluster.org/12909
Tested-by: NetBSD Build System 
Tested-by: Gluster Build System 
Reviewed-by: jiffin tony Thottan 
Reviewed-by: Kaushal M 
Reviewed-by: Shyamsundar Ranganathan

libgfapi: non-default symbol version macros are incorrect

2015-08-19T16:29:13+00:00

Default symbol versions are in the form glfs_h_lookupat@@GFAPI_2.7.4,
whereas old, non-default versions are in the form glfs_h_lookup@GFAPI_2.4.2,
i.e. "@@" versus "@".

Change-Id: I88a6b129558c0b3a6064de7620b3b20425e80bc9
BUG: 1254863
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: http://review.gluster.org/11955
Tested-by: NetBSD Build System 
Tested-by: Gluster Build System 
Reviewed-by: Niels de Vos