glusterfs.git/libglusterfs/src/stack.h, branch v3.12.5

groups: don't allocate auxiliary gid list on stack

2017-07-06T18:26:08+00:00

When glusterfs wants to retrieve the list of auxiliary gids
of a user, it typically allocates a sufficiently big gid_t
array on stack and calls getgrouplist(3) with it. However,
"sufficiently big" means to be of maximum supported gid list
size, which in GlusterFS is GF_MAX_AUX_GROUPS = 64k.
That means a 64k * sizeof(gid_t) = 256k allocation, which is
big enough to overflow the stack in certain cases.

A further observation is that stack allocation of the gid list
brings no gain, as in all cases the content of the gid list
eventually gets copied over to a heap allocated buffer.

So we add a convenience wrapper of getgrouplist to libglusterfs
called gf_getgrouplist which calls getgrouplist with a sufficiently
big heap allocated buffer (it takes care of the allocation too).
We are porting all the getgrouplist invocations to gf_getgrouplist
and thus eliminate the huge stack allocation.

BUG: 1464327
Change-Id: Icea76d0d74dcf2f87d26cb299acc771ca3b32d2b
Signed-off-by: Csaba Henk 
Reviewed-on: https://review.gluster.org/17706
Smoke: Gluster Build System 
Reviewed-by: Niels de Vos 
Reviewed-by: Amar Tumballi 
CentOS-regression: Gluster Build System

multiple: fix struct/typedef inconsistencies

2017-06-30T10:31:02+00:00

The most common pattern, both in our code and elsewhere, is this:

   struct _xyz {
      ...
   };
   typedef struct _xyz xyz_t;

These exceptions - especially call_frame/call_stack - have been slowing
down code navigation for years.  By converging on a single pattern,
navigating from xyz_t in code to the actual definition of struct _xyz
(i.e. without having to visit the typedef first) might even be
automatable.

Change-Id: I0e5dd1f51f98e000173c62ef4ddc5b21d9ec44ed
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/17650
Smoke: Gluster Build System 
Tested-by: Jeff Darcy 
CentOS-regression: Gluster Build System 
Reviewed-by: Amar Tumballi 
Reviewed-by: Niels de Vos

libglusterfs/stack.h: reduce duplication of code

2017-04-27T14:54:40+00:00

* Use STACK_UNWIND_STRICT everywhere.
* Provide STACK_WIND_COMMON as both STACK_WIND_COOKIE
  and STACK_WIND differ by just 1 line and 1 option.

Updates gluster/glusterfs#137

Change-Id: Ifbb6b9c4702b02f4a02834824f509fd10c78f0ce
Signed-off-by: Amar Tumballi 
Reviewed-on: https://review.gluster.org/16915
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Jeff Darcy

dht: At places needed use STACK_WIND_COOKIE

2017-01-09T18:37:02+00:00

Issue:
frame has a void * cookie pointer.
In case of STACK_WIND_COOKIE frame->cookie is assigned
to what is sent by the caller.
In case of STACK_WIND frame->cookie is assigned to point
point to the frame itself.

For ease of coding, at many places, the cookie in the cbk
is used to get the pointer to the next xl. This is
inconsistent when STACK_WIND_TAIL comes into picture.

Eg: dht_setxattr () {
    for (i = 0 ; i < conf->subvolume_cnt ; i++) {
       STACK_WIND (..dht_checking_pathinfo_cbk,
                   conf->subvolumes[i] ..);
    }

    dht_checking_pathinfo_cbk (...void *cookie...) {
        prev = cookie;
        ...
        for (i = 0; i < conf->subvolume_cnt; i++) {
            if (conf->subvolumes[i] == prev->this) {
                 ...
            }
        }
    }

    Consider the below graph:
    dht (parent)
    readdir-ahead  => Doesn't define setxattr and uses STACK_WIND_TAIL
    protocol-client

    With this graph, when dht_checking_pathinfo_cbk is called,
    cookie will have frame pointing to protocol-client.
    i.e. prev->this will be protocol-client. But dht was expecting
    it to be readdir-ahead as it has stored in conf->subvolumes[i]

Solution:
    Hence, as a thumb rule, if cbk is using cookie, then we explicitly
    call STACK_WIND_COOKIE.

Change-Id: I83aea1e24c809c5a91a0db7283e908e125471bd4
BUG: 1401812
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/16039
Reviewed-by: Raghavendra G 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

libglusterfs: Fix a read hang

2016-11-29T03:59:08+00:00

Issue:
=====
In certain cases, there was no unwind of read
from read-ahead xlator, thus resulting in hang.

RCA:
====
In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead
of STACK_WIND(). One such case is when inode_ctx for that file
is not present (can happen if readdirp was called, and populates
md-cache and serves all the lookups from cache).

Consider the following graph:
...
io-cache (parent)
   |
readdir-ahead
   |
read-ahead
...

Below is the code snippet of ioc_readv calling STACK_WIND_TAIL:
ioc_readv()
{
...
 if (!inode_ctx)
   STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this),
                    FIRST_CHILD (frame->this)->fops->readv, fd,
                    size, offset, flags, xdata);
   /* Ideally, this stack_wind should wind to readdir-ahead:readv()
      but it winds to read-ahead:readv(). See below for
      explaination.
    */
...
}

STACK_WIND_TAIL (frame, obj, fn, ...)
{
  frame->this = obj;
  /* for the above mentioned graph, frame->this will be readdir-ahead
   * frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which
   * is as expected
   */
  ...
  THIS = obj;
  /* THIS will be read-ahead instead of readdir-ahead!, as obj expands
   * to "FIRST_CHILD (frame->this)" and frame->this was pointing
   * to readdir-ahead in the previous statement.
   */
  ...
  fn (frame, obj, params);
  /* fn will call read-ahead:readv() instead of readdir-ahead:readv()!
   * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and
   * frame->this was pointing ro readdir-ahead in the first statement
   */
  ...
}

Thus, the readdir-ahead's readv() implementation will be skipped, and
ra_readv() will be called with frame->this = "readdir-ahead" and
this = "read-ahead". This can lead to corruption / hang / other problems.
But in this perticular case, when 'frame->this' and 'this' passed
to ra_readv() doesn't match, it causes ra_readv() to call ra_readv()
again!. Thus the logic of read-ahead readv() falls apart and leads to
hang.

Solution:
=========
Modify STACK_WIND_TAIL() as:
STACK_WIND_TAIL (frame, obj, fn, ...)
{
  next_xl = obj /* resolve obj as the variables passed in obj macro
                   can be overwritten in the further instrucions */
  next_xl_fn = fn /* resolve fn and store in a tmp variable, before
                     modifying any variables */
  frame->this = next_xl;
  ...
  THIS = next_xl;
  ...
  next_xl_fn (frame, next_xl, params);
  ...
}

As a part of http://review.gluster.org/15901/ the caller io-cache
was fixed.

BUG: 1388292
Change-Id: Ie662ac8f18fa16909376f1e59387bc5b886bd0f9
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15923
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Rajesh Joseph 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

protocol/server: Print error-xlator name

2016-11-16T10:56:05+00:00

Problem:
At the moment from which xlator the errors are stemming from is a mystery.

Fix:
With this patch we can find on the server side which xlator first gave
the errno received by server xlator. I am not yet sure how to get this
done for client side which has lot of copy_frame()s. May be another
patch.

Change-Id: Ie13307b965facf2a496123e81ce0bd6756f98ac9
BUG: 1394548
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15836
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Vijay Bellur 
CentOS-regression: Gluster Build System

libglusterfs: Add debug and trace logs for stack trace

2016-04-27T19:26:31+00:00

It has become very difficult to identify the xlator which returned
negative op_ret. Being able to just change the log level and
visualize the stack is helpful in such cases.

Change-Id: I6545b4802c1ab4d0d230d5e9e036afb2384882e1
BUG: 1330052
Signed-off-by: Raghavendra Talur 
Reviewed-on: http://review.gluster.org/13448
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: Jeff Darcy 
Reviewed-by: Rajesh Joseph 
Reviewed-by: Vijay Bellur

debug/io-stats: Add FOP sampling feature

2015-11-01T17:14:34+00:00

Summary:
- Using sampling feature you can record details about every Nth FOP.
  The fields in each sample are: FOP type, hostname, uid, gid, FOP priority,
  port and time taken (latency) to fufill the request.
- Implemented using a ring buffer which is not (m/c) allocated in the IO path,
  this should make the sampling process pretty cheap.
- DNS resolution done @ dump time not @ sample time for performance w/
  cache
- Metrics can be used for both diagnostics, traffic/IO profiling as well
  as P95/P99 calculations
- To control this feature there are two new volume options:
  diagnostics.fop-sample-interval - The sampling interval, e.g. 1 means
  sample every FOP, 100 means sample every 100th FOP
  diagnostics.fop-sample-buf-size - The size (in bytes) of the ring
  buffer used to store the samples.  In the even more samples
  are collected in the stats dump interval than can be held in this buffer,
  the oldest samples shall be discarded.  Samples are stored in the log
  directory under /var/log/glusterfs/samples.
- Uses DNS cache written by sshreyas@fb.com (Thank-you!), the DNS cache
  TTL is controlled by the diagnostics.stats-dnscache-ttl-sec option
  and defaults to 24hrs.

Test Plan:
- Valgrind'd to ensure it's leak free
- Run prove test(s)
- Shadow testing on 100+ brick cluster

Change-Id: I9ee14c2fa18486b7efb38e59f70687249d3f96d8
BUG: 1271310
Signed-off-by: Jeff Darcy 
Reviewed-on: http://review.gluster.org/12210
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

mem-pool,stack,store,syncop,timer/libglusterfs : Porting to a new logging framework

2015-06-27T05:52:01+00:00

Change-Id: Idd3dcaf7eeea5207b3a5210676ce3df64153197f
BUG: 1194640
Signed-off-by: Mohamed Ashiq 
Reviewed-on: http://review.gluster.org/10827
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: NetBSD Build System

stack: use list_head for managing frames

2015-06-22T07:41:43+00:00

PROBLEM
--------

statedump requests that traverse call frames of all call stacks in
execution may race with a STACK_RESET on a stack.  This could crash the
corresponding glusterfs process. For e.g, recently we observed this in a
regression test case tests/basic/afr/sparse-self-heal.t.

FIX
---

gf_proc_dump_pending_frames takes a (TRY_LOCK) call_pool->lock before
iterating through call frames of all call stacks in progress.  With this
fix, STACK_RESET removes its call frames under the same lock.

Additional info
----------------

This fix makes call_stack_t to use struct list_head in place of custom
doubly-linked list implementation. This makes call_frame_t manipulation
easier to maintain in the context of STACK_WIND et al.

BUG: 1229658
Change-Id: I7e43bccd3994cd9184ab982dba3dbc10618f0d94
Signed-off-by: Krishnan Parthasarathi 
Reviewed-on: http://review.gluster.org/11095
Reviewed-by: Niels de Vos 
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: NetBSD Build System