glusterfs.git/tests/basic/ec, branch v3.12.5

cluster/ec: Test script failing with brick multiplexing enabled

2017-07-18T11:51:15+00:00

Problem:
Killing the bricks (using a kill signal) in test scripts
results in test failures when brick multiplexing is enabled.

Solution:
Updated the script to use the kill_brick function to bring down
the bricks.

BUG: 1472094
Change-Id: Ibbf1fdc1be660ad3cd93e95af2838c0aae0181af
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17809
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: Non-disruptive upgrade on EC volume fails

2017-07-14T00:26:04+00:00

Problem:
With optimistic changelog enabled on an EC volume, node-down
scenarios were not handled appropriately, which left the
volume data inaccessible.

Solution:
Update the dirty xattr appropriately on good bricks whenever
nodes are down. This lets heal fix the metadata information
and thus ensures data accessibility.

BUG: 1468261
Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17703
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: correctly handle end of file for seek

2017-07-06T06:17:44+00:00

When a SEEK_HOLE was issued near the end of a file, sometimes an
offset beyond the end of file was returned. Another problem was that
seeks using some offsets greater than the end of file returned
successfully instead of failing with ENXIO.
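
For reference, the behaviour the fix aims for can be observed on a local
file. The following is a minimal sketch, not part of the patch: it assumes
a Linux system where Python exposes os.SEEK_HOLE, and the helper names
seek_hole() and demo() are made up for illustration.

```python
import errno
import os
import tempfile


def seek_hole(fd, offset):
    """Return the next hole at or after offset, or None on ENXIO."""
    try:
        return os.lseek(fd, offset, os.SEEK_HOLE)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            # Offset is beyond the end of file: the seek must fail.
            return None
        raise


def demo(size=4096):
    """Probe SEEK_HOLE just before and well past the end of a file."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * size)  # fully written file, no explicit holes
        f.flush()
        fd = f.fileno()
        near_eof = seek_hole(fd, size - 1)    # must not exceed size
        past_eof = seek_hole(fd, size + 100)  # expected to fail with ENXIO
        return near_eof, past_eof
```

A seek near the end of file should never report an offset larger than
the file size, and a seek starting past the end of file should fail.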

Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5
BUG: 1449348
Signed-off-by: Xavier Hernandez 
Reviewed-on: https://review.gluster.org/17228
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Amar Tumballi 
Reviewed-by: Pranith Kumar Karampuri

ec: fix ec-data-heal.t failure

2017-06-12T16:35:22+00:00

With brick mux enabled, this test was constantly failing. Further, it was
found that the test does a series of killing a particular brick and
bringing it up from the command line, whereas just starting the volume
with force would suffice.

Change-Id: Iee491d0777eaa28dca5c78f92d4b400fcc897fd2
BUG: 1460638
Signed-off-by: Atin Mukherjee 
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/17508
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System

cluster/ec: Update xattr and heal size properly

2017-06-06T14:41:52+00:00

Problem-1: Recursive healing of the same file happens
when IO is going on, even after data heal completes.

Solution:
RCA: At the end of the write, when ec_update_size_version
gets called, we send it only to the good bricks and not
to the healing brick. Due to this, the xattr on the healing
brick always remains out of sync, and when the background
heal checks source and sink, it finds that this brick needs
to be healed and starts healing from scratch. That involves
ftruncate and writing all of the data again.

To solve this, send xattrop to all the good bricks as
well as the healing bricks.

Problem-2: The above fix exposes a data corruption
during heal. If a write on a file is going on when
heal finishes, we find that the file gets corrupted.

RCA:
The real problem happens in ec_rebuild_data(). Here we receive the
'size' argument which contains the real file size at the time of
starting self-heal and it's assigned to heal->total_size.

After that, a sequence of calls to ec_sync_heal_block() are done. Each
call ends up calling ec_manager_heal_block(), which does the actual work
of healing a block.

First a lock on the inode is taken in state EC_STATE_INIT using
ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is
called. This function calls ec_set_inode_size() to store the real size
of the inode (it uses heal->total_size).

The next step is to read the block to be healed. This is done using a
regular ec_readv(). One of the things this call does is to trim the
returned size if the file is smaller than the requested size.

In our case, when we read the last block of a file whose size was
congruent to 512 mod 1024 at the time of starting self-heal, ec_readv()
will return only the first 512 bytes, not the whole 1024 bytes.

This isn't a problem since the following ec_writev() sent from the heal
code only attempts to write the amount of data read, so it shouldn't
modify the remaining 512 bytes.

However, ec_writev() also checks the file size. If we are writing the
last block of the file (determined by the size stored on the inode that
we have set to heal->total_size), any data beyond the (imposed) end of
file will be cleared with 0's. This causes the 512 bytes after
heal->total_size to be cleared. Since the file was written after heal
started, these bytes contained data, so the block written to the
damaged brick will be incorrect.

Solution:
Align heal->total_size to a multiple of the stripe size.
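
The arithmetic behind this can be sketched as follows. This is only an
illustration, not the EC code: align_up() and cleared_bytes() are
hypothetical helpers, and the stripe size of 1024 is taken from the
example in the RCA above.

```python
def align_up(size, stripe_size):
    """Round size up to the next multiple of stripe_size."""
    return ((size + stripe_size - 1) // stripe_size) * stripe_size


def cleared_bytes(recorded_size, stripe_size):
    """Bytes between the recorded end of file and the end of its last
    stripe, i.e. what a last-block write would fill with zeros."""
    return align_up(recorded_size, stripe_size) - recorded_size


STRIPE = 1024           # assumed stripe size, as in the RCA example
size_at_heal = 1536     # 512 mod 1024 when self-heal started

# Unaligned: the write path would zero 512 bytes that may now hold data.
unaligned_loss = cleared_bytes(size_at_heal, STRIPE)

# Aligned: the recorded size ends on a stripe boundary, nothing is zeroed.
aligned_loss = cleared_bytes(align_up(size_at_heal, STRIPE), STRIPE)
```

With the size recorded for heal rounded up to a stripe boundary, the
last-block write has no tail left to clear.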

Thanks to Xavier Hernandez
for finding the root cause and fixing the issue.

Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
BUG: 1428673
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/16985
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Xavier Hernandez

cluster/ec: Implement FALLOCATE FOP for EC

2017-05-23T07:13:06+00:00

The FALLOCATE file operation is not implemented in the
existing EC code. This change set implements it
for EC.

BUG: 1448293
Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/15200
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System

Fixes quota aux mount failure

2017-05-08T06:15:55+00:00

The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted, or quota is disabled,
at which point we do a lazy unmount. If the process is terminated
uncleanly, the mount entry remains and we get a (Transport disconnected)
error on subsequent attempts to run quota list/limit-usage/remove commands.

Second issue: there is also a risk of an inadvertent rm -rf on
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temp path for application use, and removing it should not cause any
loss of data on persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat with doing mount/unmount on each command is that we cannot
use the same mount point for both list and limit commands.
The reason is that the list command needs the mount to be accessible
in the cli after the response from glusterd, so it could be unmounted by
a limit command executed in parallel (had we used the same mount point).
Hence we use separate mount points for list and limit commands.

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1433906
Signed-off-by: Sanoj Unnikrishnan 
Reviewed-on: https://review.gluster.org/16938
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: Manikandan Selvaganesh 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G 
Reviewed-by: Atin Mukherjee

cluster/ec: Introduce optimistic changelog in EC

2017-03-04T12:37:56+00:00

Problem: Fix to https://bugzilla.redhat.com/show_bug.cgi?id=1316873 has made
changes to set dirty flag before every update fop, data or metadata, and unset
it after successful operation. That makes some of the fops very slow such as
entry operations or metadata operations.

Solution: File data operations are the only operations that take a
significant amount of time. For them, setting the dirty flag before the fop
and unsetting it afterwards serves its purpose, since the probability of a
fop failing is higher the longer it runs. For all the other operations, set
the dirty flag at the end of the fop if any brick is down and needs heal.

The following option is provided to choose between high performance and
better heal marking for metadata and entry fops.

Set/Unset dirty flag for every update fop at the start of the fop. If ON, this
option impacts performance of entry operations or metadata operations as it
will set the dirty flag at the start and unset it at the end of ALL update
fops. If OFF and all the bricks are good, the dirty flag will be set at the
start only for file fops. For metadata and entry fops the dirty flag will not
be set at the start if all the bricks are good. This does not impact
performance for metadata operations and entry operations, but leaves a very
small window to miss marking an entry as dirty in case it needs to be healed.
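
The marking rules described above can be summarised as a small decision
function. This is a sketch of one reading of the description, not the
EC implementation: the function and parameter names are hypothetical,
"option_on" follows the ON/OFF wording used in this message, and marking
at the start when some brick is already bad is an inference from the
"if all the bricks are good" condition.

```python
def set_dirty_at_start(option_on, fop_kind, all_bricks_good):
    """Decide whether the dirty flag is set before the fop runs.

    fop_kind is one of "data", "metadata" or "entry".
    """
    if option_on:
        # Every update fop is marked at the start and unmarked at the end.
        return True
    if fop_kind == "data":
        # File data fops are always marked up front.
        return True
    # Metadata and entry fops skip the up-front marking only when
    # all bricks are good (assumption for the not-all-good case).
    return not all_bricks_good


def set_dirty_at_end(marked_at_start, any_brick_down):
    """Decide whether an unmarked fop must be marked dirty afterwards."""
    return (not marked_at_start) and any_brick_down
```

The second function covers the small window mentioned above: a metadata
or entry fop that was not marked at the start is marked at the end only
if a brick turned out to be down.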

Thanks to Xavi and Ashish for the design
Picked the .t file from Ashish's patch https://review.gluster.org/16298

BUG: 1408809
Change-Id: I3ce860063f0e2901e50754dcfc3e4ed22daf819f
Signed-off-by: Pranith Kumar K 
Reviewed-on: https://review.gluster.org/16821
Smoke: Gluster Build System 
Reviewed-by: Xavier Hernandez 
Tested-by: Xavier Hernandez 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: Don't trigger data/metadata heal on Lookups

2017-02-27T03:06:55+00:00

Problem-1:
If a Lookup, which doesn't take any locks, observes a version mismatch, that
observation can't be trusted. If we launch a heal based on this information,
it will lead to self-heals which affect I/O performance in the cases where
the Lookup is wrong. Considering that the self-heal-daemon and operations on
the inode from clients which take locks can still trigger heal, we can choose
not to attempt a heal on Lookup.

Problem-2:
Fixed spurious failure of
tests/bitrot/bug-1373520.t
For the issues above, what was happening was that ec_heal_inspect()
was preventing 'name' heal from happening.

Problem-3:
tests/basic/ec/ec-background-heals.t
To be honest, I don't know what the problem was. While fixing
the 2 problems above, I made some changes to ec_heal_inspect() and
ec_need_heal(), after which, when I tried to recreate the spurious
failure, it just didn't happen even after a long time.

BUG: 1414287
Signed-off-by: Pranith Kumar K 
Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834
Reviewed-on: https://review.gluster.org/16468
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez 
CentOS-regression: Gluster Build System 
Reviewed-by: Ashish Pandey

core: run many bricks within one glusterfsd process

2017-01-31T00:13:58+00:00

This patch adds support for multiple brick translator stacks running
in a single brick server process.  This reduces our per-brick memory usage by
approximately 3x, and our appetite for TCP ports even more.  It also creates
potential to avoid process/thread thrashing, and to improve QoS by scheduling
more carefully across the bricks, but realizing that potential will require
further work.

Multiplexing is controlled by the "cluster.brick-multiplex" global option.  By
default it's off, and bricks are started in separate processes as before.  If
multiplexing is enabled, then *compatible* bricks (mostly those with the same
transport options) will be started in the same process.
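
As a rough illustration of the grouping rule, the sketch below places
bricks with identical transport options into the same process when
multiplexing is on. It is a simplified model only: plan_processes() and
the brick dictionaries are hypothetical, and the real compatibility
check is not described in this message beyond "mostly those with the
same transport options".

```python
from collections import defaultdict


def plan_processes(bricks, multiplex):
    """Return a list of processes, each a list of brick names."""
    if not multiplex:
        # Default behaviour: one process per brick, as before.
        return [[brick["name"]] for brick in bricks]

    groups = defaultdict(list)
    for brick in bricks:
        # Bricks count as compatible here when their transport
        # options match exactly (simplifying assumption).
        key = tuple(sorted(brick["transport"].items()))
        groups[key].append(brick["name"])
    return list(groups.values())


# Hypothetical bricks, for illustration only.
bricks = [
    {"name": "b1", "transport": {"type": "tcp"}},
    {"name": "b2", "transport": {"type": "tcp"}},
    {"name": "b3", "transport": {"type": "rdma"}},
]
separate = plan_processes(bricks, multiplex=False)
shared = plan_processes(bricks, multiplex=True)
```

In this model the two tcp bricks share one process while the rdma brick
gets its own, which also shows why a plain kill signal on a multiplexed
process takes down more than one brick.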

Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
BUG: 1385758
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/14763
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Vijay Bellur