<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/libglusterfs/src/xlator.c, branch v3.12.5</title>
<subtitle>GlusterFS is a distributed file system capable of scaling to several petabytes. It aggregates various storage bricks over InfiniBand RDMA or TCP/IP interconnect into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>core/memacct: save allocs in mem_acct_rec list</title>
<updated>2017-12-08T14:40:31+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2017-12-08T03:11:13+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=aae8eaa8104197652d487042a66fddd850da72f3'/>
<id>aae8eaa8104197652d487042a66fddd850da72f3</id>
<content type='text'>
With configure --enable-debug, add all object allocations
to a list in the corresponding mem_acct_rec. This
allows us to see all objects of a particular type
and allows for additional debugging in case of memory
leaks.

This is not compiled in by default and must be explicitly
enabled. It is intended to be used by developers.
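
A minimal sketch of the idea (hypothetical field names, not the actual
GlusterFS structs): each allocation header is linked into a per-type
list inside its accounting record, so every live object of a type can
be walked when hunting a leak.

  #include &lt;stddef.h&gt;

  /* compiled only for debug builds, e.g. under a DEBUG #ifdef */
  struct list_head {
      struct list_head *next;
      struct list_head *prev;
  };

  struct dbg_mem_header {
      struct list_head acct_list;   /* links this allocation into its type record */
      size_t           size;        /* requested size of the object */
  };

  struct dbg_mem_acct_rec {
      size_t           num_allocs;  /* live objects of this type */
      struct list_head obj_list;    /* list of all dbg_mem_header objects */
  };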

&gt; Change-Id: I7cf2dbeadecf994423d7e7591e85f18d2575cce8
&gt; BUG: 1522662
&gt; Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;

(cherry picked from commit 47d01546a1826dc14a8331ea8700015f1cfdc4db)
Change-Id: I7cf2dbeadecf994423d7e7591e85f18d2575cce8
BUG: 1523455
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
</content>
</entry>
<entry>
<title>mgmt/core: use sha hash function for volfile check</title>
<updated>2017-07-10T05:07:11+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2017-07-06T07:56:42+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=f2f3d74c835b68ad9ec63ec112870829a823a1fb'/>
<id>f2f3d74c835b68ad9ec63ec112870829a823a1fb</id>
<content type='text'>
We are storing the entire volfile and using it to check for volfile
changes. With brick multiplexing there will be a lot of graphs per
process, which will increase the memory footprint of the process. So
instead of storing the entire graph we can store its sha256 hash and
compare the hashes to see whether a volfile change happened or not.

Also, with brick multiplexing, a direct comparison of the volfile is
not correct. There are two problems.

Problem 1:

We are currently storing one single graph (the last updated volfile),
whereas what we need is the entire graph with all attached bricks.

If we fix this issue, we run into a second problem.

Problem 2:
With multiplexing we have a graph that contains multiple bricks, but
what we are checking as part of the reconfigure is the entire graph
against one single graph, which will always fail.

Solution:
We create a list in glusterfs_ctx_t that stores the sha256 hash of
each individual brick graph. When a graph change happens we compare
the stored hash and the current hash. If the hashes match, there is
no need for a reconfigure. Otherwise we first do the reconfigure and
then update the hash.
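
A minimal sketch of the comparison, assuming an OpenSSL SHA-256 helper
(the wrapper function and its parameter names are hypothetical, not
the actual GlusterFS code): the stored per-brick digest is compared
with the digest of the newly fetched volfile, and only on a mismatch
does the caller reconfigure and then store the new digest.

  #include &lt;openssl/sha.h&gt;
  #include &lt;string.h&gt;

  /* returns 1 when the stored digest matches the freshly fetched volfile */
  static int
  volfile_hash_matches(const unsigned char stored[SHA256_DIGEST_LENGTH],
                       const char *volfile, size_t len)
  {
      unsigned char cur[SHA256_DIGEST_LENGTH];

      SHA256((const unsigned char *)volfile, len, cur);
      return memcmp(stored, cur, SHA256_DIGEST_LENGTH) == 0;
  }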

For now, gfapi has not been changed this way, meaning that when a
gfapi volfile fetch or reconfigure happens, we still store the entire
graph and compare it in memory.

This is fine, because libgfapi will not load brick graphs. But
changing libgfapi as well would make the code similar in both
glusterfsd-mgmt and api, and it would also reduce some memory usage.

Change-Id: I9df917a771a52b95622ab8f63af34ec390163a77
BUG: 1467986
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17709
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</content>
</entry>
<entry>
<title>nl-cache: In case of nameless operations do not cache</title>
<updated>2017-05-22T12:39:59+00:00</updated>
<author>
<name>Poornima G</name>
<email>pgurusid@redhat.com</email>
</author>
<published>2017-05-16T13:55:20+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=284cd8851bfe60984d2f11b5c52fe3204ff43b06'/>
<id>284cd8851bfe60984d2f11b5c52fe3204ff43b06</id>
<content type='text'>
Issue:
In nameless lookups and other nameless fops, the parent inode will be
NULL; when we try to attach the cache to this NULL inode, it causes a
crash.

Hence, handle the scenario of nameless fops and do not cache or serve
them.
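
A minimal sketch of the guard (hypothetical types and names, not the
actual nl-cache code): a nameless fop carries only a gfid, with no
parent inode and no basename, so there is nothing to attach the
negative-lookup cache to and the fop is simply passed through.

  #include &lt;stddef.h&gt;

  struct loc_sketch {
      void       *parent;   /* parent inode; NULL for nameless fops */
      const char *name;     /* basename; NULL for nameless fops */
  };

  static int
  is_nameless_fop(const struct loc_sketch *loc)
  {
      return loc-&gt;parent == NULL || loc-&gt;name == NULL;
  }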

Change-Id: I3b90f882ac89e6aaf3419db89e6f890797f37700
BUG: 1451588
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17316
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
</entry>
<entry>
<title>dht: send lookup on old name inside rename with bname and pargfid</title>
<updated>2017-04-29T14:28:38+00:00</updated>
<author>
<name>Susant Palai</name>
<email>spalai@redhat.com</email>
</author>
<published>2017-01-11T10:34:47+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=8b2ef5076284e44a87698393c8094c925fa863fa'/>
<id>8b2ef5076284e44a87698393c8094c925fa863fa</id>
<content type='text'>
Inside rename, a lookup is done on the source name to make sure that
the file is there. But we used to do a gfid-based lookup, and hence,
even if the source name had been renamed to a new name from some other
client, the lookup would still be successful, as server3_3_lookup
fetches the new path based on the gfid.

So even if the source file no longer exists, rename will carry on, and
since server3_3_link (in the scenario where the destination hashes to
a brick other than the source's cached brick) also does a gfid-based
resolve, it won't detect that the source name does not exist, and
hardlink creation will be successful (since the gfid-based resolve
will get the new dentry).

To solve this problem, do a name-based lookup inside rename, so that
rename fails right away if the source does not exist.
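
A minimal sketch of the difference (hypothetical struct, not the
actual DHT/protocol code): a name-based lookup resolves the
pargfid+basename pair, so it fails with ENOENT once the dentry has
been renamed away, whereas a gfid-only lookup keeps succeeding as long
as the inode still exists under some other name.

  struct loc_sketch {
      unsigned char gfid[16];      /* set alone for a gfid-based resolve */
      unsigned char pargfid[16];   /* parent directory gfid */
      const char   *name;          /* basename; together with pargfid it
                                      forces a name-based lookup of the
                                      old path inside rename */
  };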

Change-Id: Ieba8bdd6675088dbf18de90ed4622df043d163bd
BUG: 1412135
Signed-off-by: Susant Palai &lt;spalai@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16375
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
</entry>
<entry>
<title>xlator: do not call dlclose() when debugging</title>
<updated>2017-04-07T17:17:12+00:00</updated>
<author>
<name>Niels de Vos</name>
<email>ndevos@redhat.com</email>
</author>
<published>2017-02-28T06:37:00+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=ef36ac0d1b72ab2c07ed6e0a3116b7265c3c0164'/>
<id>ef36ac0d1b72ab2c07ed6e0a3116b7265c3c0164</id>
<content type='text'>
Valgrind cannot show the symbols of a .so after dlclose() has been
called. The unhelpful ??? entries in the output get resolved properly
with this change:

  ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324
  ==25170==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
  ==25170==    by 0x52C7C0B: __gf_calloc (mem-pool.c:117)
  ==25170==    by 0x12B0638A: ???
  ==25170==    by 0x528FCE6: __xlator_init (xlator.c:472)
  ==25170==    by 0x528FE16: xlator_init (xlator.c:498)
  ==25170==    by 0x52DA8D6: glusterfs_graph_init (graph.c:321)
  ==25170==    by 0x52DB587: glusterfs_graph_activate (graph.c:695)
  ==25170==    by 0x5046407: glfs_process_volfp (glfs-mgmt.c:79)
  ==25170==    by 0x5043B9E: glfs_volumes_init (glfs.c:281)
  ==25170==    by 0x5044FEC: glfs_init_common (glfs.c:986)
  ==25170==    by 0x50451A7: glfs_init@@GFAPI_3.4.0 (glfs.c:1031)

By not calling dlclose(), the dynamically loaded .so is still available
upon program exit, and Valgrind is able to resolve the symbols. This
will add an additional leak, so dlclose() is called for normal builds,
but skipped when configuring with "./configure --enable-valgrind" or
passing the "run-with-valgrind" xlator option.
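
A minimal sketch of the conditional unload (hypothetical flag and
function names, not the actual xlator code): in a normal run the
handle is closed as usual, while under Valgrind it is intentionally
leaked so the symbols stay resolvable at exit.

  #include &lt;dlfcn.h&gt;

  static void
  xlator_unload_sketch(void *dl_handle, int run_with_valgrind)
  {
      if (!run_with_valgrind)
          dlclose(dl_handle);   /* normal build: release the .so */
      /* otherwise keep the .so mapped so Valgrind can symbolize frames */
  }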

URL: http://valgrind.org/docs/manual/faq.html#faq.unhelpful
Change-Id: I2044e21b1b8fcce32ad1a817fdd795218f967731
BUG: 1425623
Signed-off-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16809
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-by: Kaleb KEITHLEY &lt;kkeithle@redhat.com&gt;
</content>
</entry>
<entry>
<title>libglusterfs: provide standardized atomic operations</title>
<updated>2017-04-05T13:14:26+00:00</updated>
<author>
<name>Niels de Vos</name>
<email>ndevos@redhat.com</email>
</author>
<published>2017-03-29T11:44:03+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=93e3c9abce1a02ac724afa382751852fa5edf713'/>
<id>93e3c9abce1a02ac724afa382751852fa5edf713</id>
<content type='text'>
The current macros ATOMIC_INCREMENT() and ATOMIC_DECREMENT() expect a
lock as first argument. There are at least two issues with this
approach:

  1. this lock is unused on architectures that have atomic operations
  2. some structures use a single lock for multiple variables

By defining a gf_atomic_t type, the unused lock can be removed, saving a
few bytes on modern architectures.

Because the gf_atomic_t type keeps the lock together with the variable
(on older architectures), each variable is protected the same way on
all architectures. This makes the behaviour across all architectures
more consistent (per-variable protection, either by a gf_lock_t or by
the compiler's atomic operations).
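
A minimal sketch of the lock-free variant (hypothetical names, not the
actual gf_atomic_t definition): on compilers with atomic builtins the
counter needs no lock at all; on older architectures the type itself
would instead carry a per-variable lock.

  #include &lt;stdint.h&gt;

  typedef struct { int64_t cnt; } gf_atomic_sketch_t;

  #define GF_ATOMIC_SKETCH_INC(a) \
          __atomic_add_fetch(&amp;(a).cnt, 1, __ATOMIC_SEQ_CST)
  #define GF_ATOMIC_SKETCH_DEC(a) \
          __atomic_sub_fetch(&amp;(a).cnt, 1, __ATOMIC_SEQ_CST)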

BUG: 1437037
Change-Id: Ic164892b06ea676e6a9566f8a98b7faf0efe76d6
Signed-off-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16963
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
</entry>
<entry>
<title>libglusterfs: fix serious leak of xlator_t structures</title>
<updated>2017-02-09T13:49:10+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2017-02-09T00:45:46+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=2199c688b73dfe90868f9469f92e21b0e0795e57'/>
<id>2199c688b73dfe90868f9469f92e21b0e0795e57</id>
<content type='text'>
There's a lot of logic (and some long comments) around how to free
these structures safely, but then we didn't do it.  Now we do.

Change-Id: I9731ae75c60e99cc43d33d0813a86912db97fd96
BUG: 1420571
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16570
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Poornima G &lt;pgurusid@redhat.com&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
</entry>
<entry>
<title>core: run many bricks within one glusterfsd process</title>
<updated>2017-01-31T00:13:58+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2016-12-08T21:24:15+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=1a95fc3036db51b82b6a80952f0908bc2019d24a'/>
<id>1a95fc3036db51b82b6a80952f0908bc2019d24a</id>
<content type='text'>
This patch adds support for multiple brick translator stacks running
in a single brick server process.  This reduces our per-brick memory usage by
approximately 3x, and our appetite for TCP ports even more.  It also creates
potential to avoid process/thread thrashing, and to improve QoS by scheduling
more carefully across the bricks, but realizing that potential will require
further work.

Multiplexing is controlled by the "cluster.brick-multiplex" global option.  By
default it's off, and bricks are started in separate processes as before.  If
multiplexing is enabled, then *compatible* bricks (mostly those with the same
transport options) will be started in the same process.

Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
BUG: 1385758
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/14763
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>libglusterfs: serialize init/reconfigure calls</title>
<updated>2017-01-06T04:38:49+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2016-12-05T18:01:41+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=c6b0adb483c1d0c4922e6d4cb77abfb69d314a8e'/>
<id>c6b0adb483c1d0c4922e6d4cb77abfb69d314a8e</id>
<content type='text'>
These functions do not generally "expect" to be called more than once
in parallel, and many are likely to misbehave in that case (one such
case already exists in DHT).  Such parallel calls have not generally happened
because there are only a few places where we call these functions, and
those have been implicitly serialized until recently.  However, recent
changes in the epoll layer change that, as does brick multiplexing.
Therefore, the serialization is now explicit at the init/reconfigure
level.

It would be sufficient to serialize calls to a particular translator's
init and reconfigure functions, but that would require per-translator
locks and a bit more complexity in maintaining/using them.  Since
there's no clear reason why we would need or want to support a higher
level of parallelism, the simpler approach of a global lock should
suffice.
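
A minimal sketch of the approach (hypothetical names, not the actual
libglusterfs code): a single process-wide mutex wraps the dispatch of
init/reconfigure, so at most one such call runs at any time.

  #include &lt;pthread.h&gt;

  static pthread_mutex_t xlator_init_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

  static int
  serialized_call_sketch(int (*fn)(void *xl), void *xl)
  {
      int ret;

      pthread_mutex_lock(&amp;xlator_init_lock_sketch);
      ret = fn(xl);   /* init or reconfigure, one caller at a time */
      pthread_mutex_unlock(&amp;xlator_init_lock_sketch);
      return ret;
  }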

Change-Id: I26296c2826e91dc00b7f0c2061bcc2964ef90c4c
BUG: 1399134
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16030
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
</entry>
<entry>
<title>features/shard: Fill loc.pargfid too for named lookups on individual shards</title>
<updated>2016-11-08T11:05:27+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2016-11-07T10:36:56+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=e9023083b3a165390a8cc8fc77253f354744e81a'/>
<id>e9023083b3a165390a8cc8fc77253f354744e81a</id>
<content type='text'>
On a sharded volume, when a brick is replaced while IO is going on,
the named lookup on individual shards done as part of read/write was
failing with ENOENT on the replaced brick, and as a result AFR
initiated a name heal in the lookup callback. But since pargfid was
empty (which is what this patch fixes), the resolution of the shards
by protocol/server used to fail and the following pattern of logs was
seen:

Brick-logs:

[2016-11-08 07:41:49.387127] W [MSGID: 115009]
[server-resolve.c:566:server_resolve] 0-rep-server: no resolution type
for (null) (LOOKUP)
[2016-11-08 07:41:49.387157] E [MSGID: 115050]
[server-rpc-fops.c:156:server_lookup_cbk] 0-rep-server: 91833: LOOKUP(null)
(00000000-0000-0000-0000-000000000000/16d47463-ece5-4b33-9c93-470be918c0f6.82)
==&gt; (Invalid argument) [Invalid argument]

Client-logs:
[2016-11-08 07:41:27.497687] W [MSGID: 114031]
[client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-0: remote
operation failed. Path: (null) (00000000-0000-0000-0000-000000000000)
[Invalid argument]
[2016-11-08 07:41:27.497755] W [MSGID: 114031]
[client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-1: remote
operation failed. Path: (null) (00000000-0000-0000-0000-000000000000)
[Invalid argument]
[2016-11-08 07:41:27.498500] W [MSGID: 114031]
[client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-2: remote
operation failed. Path: (null) (00000000-0000-0000-0000-000000000000)
[Invalid argument]
[2016-11-08 07:41:27.499680] E [MSGID: 133010]

Also, this patch makes AFR by itself choose a non-NULL pargfid even if
its ancestors fail to initialize all pargfid placeholders.
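
A minimal sketch of the fix (hypothetical names, not the actual shard
xlator code): before issuing the named lookup on a shard, the parent
(.shard directory) gfid is copied into loc.pargfid so the server can
resolve the pargfid+basename pair instead of seeing a null parent
gfid.

  #include &lt;string.h&gt;

  struct shard_loc_sketch {
      unsigned char pargfid[16];   /* was left all-zero before this patch */
      const char   *name;          /* shard basename, as in the logs above */
  };

  static void
  shard_fill_pargfid_sketch(struct shard_loc_sketch *loc,
                            const unsigned char parent_gfid[16])
  {
      memcpy(loc-&gt;pargfid, parent_gfid, 16);
  }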

Change-Id: I5f85b303ede135baaf92e87ec8e09941f5ded6c1
BUG: 1392445
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15788
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
</entry>
</feed>
