<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t, branch v8dev</title>
<subtitle>GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>performance/md-cache: update cache only from fops issued after previous invalidation</title>
<updated>2018-08-02T07:01:15+00:00</updated>
<author>
<name>Raghavendra G</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2018-06-28T00:25:09+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=47cbe34db67be7bfbf925b3d76bb25d48b9ed680'/>
<id>47cbe34db67be7bfbf925b3d76bb25d48b9ed680</id>
<content type='text'>
Invalidations are triggered mainly by two codepaths - upcall and
write-behind unwinding a cached write with a zeroed-out stat. For the
case of upcall, the following race can happen:
* stat s1 is fetched from the brick
* an invalidation is detected on the brick
* the invalidation is propagated to md-cache and the cache is invalidated
* s1 updates md-cache with a stale stat

For the case of write-behind, imagine the following sequence of operations:
* A stat s1 is issued from application thread t1 while the file is still
  at its original size
* stat s1 completes on the brick stack, but is yet to reach md-cache
* A write w1 from application thread t2 extends the file to size s2 and
  is cached in write-behind; the response is unwound with a zeroed-out
  stat
* md-cache, while handling the write-cbk, invalidates the cache
* md-cache receives the response for s1 and updates the cache with the
  stale stat (still carrying the old, smaller size), overwriting the
  invalidation state

The fix is to remember when s1 was incident on md-cache and to update
the cache with the results of s1 only if it was incident after the
invalidation of the cache.
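
A minimal illustrative sketch of the generation-counter idea (the names
stat_wind/stat_unwind and the single-inode globals are assumptions made
for illustration, not the actual md-cache code; locking and per-inode
context are omitted):

#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* One cached stat for a single inode, plus the bookkeeping counters. */
static uint64_t generation;       /* bumped on every event            */
static uint64_t invalidation_gen; /* generation of last invalidation  */
static uint64_t cached_size;
static bool     cache_valid;

/* Wind path: remember when the stat was incident on md-cache. */
static uint64_t
stat_wind (void)
{
    return ++generation;
}

/* Upcall, or write-behind unwinding a cached write, lands here. */
static void
cache_invalidate (void)
{
    cache_valid = false;
    invalidation_gen = ++generation;
}

/* Unwind path: update the cache only if the stat was incident after
 * the most recent invalidation; otherwise drop the stale reply. */
static void
stat_unwind (uint64_t incident_gen, uint64_t rsp_size)
{
    if (incident_gen &lt; invalidation_gen)
        return;
    cached_size = rsp_size;
    cache_valid = true;
}

In the write-behind race above, s1's incident generation predates the
generation recorded by the write-cbk invalidation, so its stale size is
discarded instead of overwriting the invalidation state.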

This patch identified some bugs in regression tests, which are tracked
in https://bugzilla.redhat.com/show_bug.cgi?id=1608158. As a stop-gap
measure I am marking the following tests as bad:
        basic/afr/split-brain-resolution.t
        bugs/bug-1368312.t
        bugs/replicate/bug-1238398-split-brain-resolution.t
        bugs/replicate/bug-1417522-block-split-brain-resolution.t
        bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t

Change-Id: Ia4bb9dd36494944e2d91e9e71a79b5a3974a8c77
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Updates: bz#1512691
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Invalidations are triggered mainly by two codepaths - upcall and
write-behind unwinding a cached write with a zeroed-out stat. For the
case of upcall, the following race can happen:
* stat s1 is fetched from the brick
* an invalidation is detected on the brick
* the invalidation is propagated to md-cache and the cache is invalidated
* s1 updates md-cache with a stale stat

For the case of write-behind, imagine the following sequence of operations:
* A stat s1 is issued from application thread t1 while the file is still
  at its original size
* stat s1 completes on the brick stack, but is yet to reach md-cache
* A write w1 from application thread t2 extends the file to size s2 and
  is cached in write-behind; the response is unwound with a zeroed-out
  stat
* md-cache, while handling the write-cbk, invalidates the cache
* md-cache receives the response for s1 and updates the cache with the
  stale stat (still carrying the old, smaller size), overwriting the
  invalidation state

The fix is to remember when s1 was incident on md-cache and to update
the cache with the results of s1 only if it was incident after the
invalidation of the cache.

This patch identified some bugs in regression tests, which are tracked
in https://bugzilla.redhat.com/show_bug.cgi?id=1608158. As a stop-gap
measure I am marking the following tests as bad:
        basic/afr/split-brain-resolution.t
        bugs/bug-1368312.t
        bugs/replicate/bug-1238398-split-brain-resolution.t
        bugs/replicate/bug-1417522-block-split-brain-resolution.t
        bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t

Change-Id: Ia4bb9dd36494944e2d91e9e71a79b5a3974a8c77
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Updates: bz#1512691
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: all children of AFR must be up to resolve s-brain</title>
<updated>2017-02-10T01:37:00+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2017-01-30T04:24:16+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=0e03336a9362e5717e561f76b0c543e5a197b31b'/>
<id>0e03336a9362e5717e561f76b0c543e5a197b31b</id>
<content type='text'>
Problem:
The various split-brain resolution policies (favorite-child-policy based,
CLI based and mount (get/setfattr) based) attempt to resolve split-brain
even when not all bricks of the replica are up. This can be a problem
when, say in a replica 3, the only good copy is down and the other 2
bricks are up and blame each other (i.e. split-brain). We end up healing
the file in such a case and allowing I/O on it.

Fix:
A decision on whether the file is in split-brain or not must be taken
only if we are able to examine the afr xattrs of *all* bricks of a given
replica.
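
A minimal illustrative sketch of that guard (the function name and
signature are assumptions made for illustration, not the actual AFR
code):

#include &lt;stdbool.h&gt;

/* Return true only if the afr xattrs of *all* bricks of the replica
 * can be examined, i.e. every child is currently up. */
static bool
can_decide_split_brain (const bool child_up[], int child_count)
{
    for (int i = 0; i &lt; child_count; i++) {
        if (!child_up[i])
            return false; /* a brick is down: its copy might be the
                             only good copy, so refuse to resolve */
    }
    return true;
}

With this check in place, the replica 3 scenario above (only good copy
down, the two remaining bricks blaming each other) is not treated as a
resolvable split-brain.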

Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
BUG: 1417522
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16476
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
The various split-brain resolution policies (favorite-child-policy based,
CLI based and mount (get/setfattr) based) attempt to resolve split-brain
even when not all bricks of the replica are up. This can be a problem
when, say in a replica 3, the only good copy is down and the other 2
bricks are up and blame each other (i.e. split-brain). We end up healing
the file in such a case and allowing I/O on it.

Fix:
A decision on whether the file is in split-brain or not must be taken
only if we are able to examine the afr xattrs of *all* bricks of a given
replica.

Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
BUG: 1417522
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16476
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
