* features/index: Optimize link-count fetching code path
Problem:
AFR requests 'link-count' in lookup to check whether there are any pending
heals. Based on this information, AFR sets dirent->inode to NULL in
readdirp while heals are ongoing, to prevent serving bad data. Even after
heals are completed, fetching the link-count xattr leads to an opendir of
the xattrop directory and a read of its contents, just to find out that no
healing is needed, for every lookup. This went undetected until this
GitHub issue because ZFS can in some cases make readdir() calls very slow.
Since GlusterFS performs a lot of lookups, this slowed down all operations
and increased load on the system.
Code problem:
On any xattrop operation, the index xlator adds an index entry to the
relevant directories and, once the xattrop completes, deletes or keeps the
index in that directory based on the value posix returns for the xattrop.
AFR sends an all-zero xattrop for the changelog xattrs. This manipulates
priv->pending_count and sets the count back to -1, so the next lookup
triggers an opendir/readdir to find the actual link-count, because the
in-memory priv->pending_count is negative.
Fix:
1) Don't add to index on all-zero xattrop for a key.
2) Set pending-count to -1 when the first gfid is added into xattrop
directory, so that the next lookup can compute the link-count.
fixes: #1764
Change-Id: I8a02c7e811a72c46d78ddb2d9d4fdc2222a444e9
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* addressed comments
Change-Id: Ide42bb1c1237b525d168bf1a9b82eb1bdc3bc283
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* tests: Handle base index absence
Change-Id: I3cf11a8644ccf23e01537228766f864b63c49556
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* Addressed LOCK based comments, .t comments
Change-Id: I5f53e40820cade3a44259c1ac1a7f3c5f2f0f310
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
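The two-part fix can be pictured with a small sketch. This is illustrative
Python, not the index xlator's C code; the state layout and names
(pending_count, entries, xattrop values as a dict) are assumptions made for
the example.

    def should_add_index(xattrop_values):
        # Fix 1: an all-zero xattrop (what AFR sends for its changelog xattrs)
        # must never create an index entry.
        return any(v != 0 for v in xattrop_values.values())

    class IndexDirState:
        def __init__(self):
            self.pending_count = 0   # in-memory count backing the link-count xattr
            self.entries = set()     # gfids currently present in the xattrop dir

        def add_entry(self, gfid, xattrop_values):
            if not should_add_index(xattrop_values):
                return
            if not self.entries:
                # Fix 2: the first real entry invalidates the cached count so
                # the next lookup recomputes the link-count.
                self.pending_count = -1
            self.entries.add(gfid)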
commit 8e7bfd6a58b444b26cb50fb98870e77302f3b9eb changed the syntax for
arbiter volume creation to 'replica 2 arbiter 1', while still allowing
the old syntax of 'replica 3 arbiter 1'. But while doing so, it also
removed a conditional check, thereby allowing replica count > 3. This
patch fixes it.
Fixes: #2192
Change-Id: Ie109325adb6d78e287e658fd5f59c26ad002e2d3
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
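As a rough sketch of the restored validation (illustrative only; the real
check lives in glusterd's volume-create path and these names are invented):

    def validate_arbiter_counts(replica_count, arbiter_count):
        if arbiter_count == 0:
            return                      # not an arbiter volume, nothing to check
        if arbiter_count != 1:
            raise ValueError("arbiter count must be 1")
        # New syntax: 'replica 2 arbiter 1'; old syntax: 'replica 3 arbiter 1'.
        # Without this check, replica counts > 3 slipped through.
        if replica_count not in (2, 3):
            raise ValueError("replica count must be 2 or 3 with an arbiter")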
TODO:
Remove 'slave-timeout' and 'slave-gluster-command-dir'.
These variables are defined in geo-replication/gsyncd.conf.in.
So I will remove them when I change that folder.
Change-Id: Ib9167ca586d83e01f8ec755cdf58b3438184c9dd
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
commit f5e1eb87d4af44be3b317b7f99ab88f89c2f0b1a meant to enable the
volume option only for replica volumes but inadvertently enabled
it for all volume types. Fixing it now.
Also found a bug in glusterd where disabling the option on plain
distribute volumes was succeeding even though setting it in the first
place fails. Fixed that too.
Fixes: #1483
Change-Id: Icb6c169a8eec44cc4fb4dd636405d3b3485e91b4
Reported-by: Sheetal Pamecha <spamecha@redhat.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
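A hedged sketch of the two gates described above; the volume-type names and
the function are placeholders, not glusterd's actual option table:

    REPLICATE_TYPES = {"replicate", "distributed-replicate"}

    def validate_option_change(volume_type):
        # Apply the same gate to enabling and disabling, so turning the option
        # off on a plain distribute volume fails just as turning it on does.
        if volume_type not in REPLICATE_TYPES:
            raise ValueError("option is only supported on replicate volumes")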
1. The option has been enabled and tested for quite some time now in RHHI-V
downstream and I think it is safe to make it 'on' by default. Since it
is not possible to simply change it from 'off' to 'on' without breaking
rolling upgrades, old clients etc., I have made it the default only for new
volumes starting from op-version GD_OP_VERSION_9_0.
Note: If you do a volume reset, the option will be turned back off.
This is okay, as the dir's gfid will be captured in the 'xattrop' folder and
heals will proceed. There might be stale entries inside the 'entry-changes'
folder, which will be removed when we enable the option again.
2. I encountered a customer issue where entry heal was pending on a directory
with 236436 files in it and the glustershd.log output was just stuck at
"performing entry selfheal", so I have added DEBUG-level logs to give us
more information about whether entry heal and data heal are progressing
(metadata heal doesn't take much time). That way, we have a quick visual
indication that things are not 'stuck' if we briefly enable debug logs,
instead of taking statedumps or checking profile info etc.
Fixes: #1483
Change-Id: I4f116f8c92f8cd33f209b758ff14f3c7e1981422
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
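The default-value rule in point 1 boils down to something like the following
sketch. GD_OP_VERSION_9_0 is the constant named in the text, but its value
here and the function itself are illustrative assumptions:

    GD_OP_VERSION_9_0 = 90000   # assumed numeric encoding, for illustration only

    def granular_entry_heal_default(volume_created_at_op_version):
        # Only volumes created at op-version >= 9.0 get the new default, so
        # rolling upgrades and old clients keep the previous 'off' behaviour.
        return "on" if volume_created_at_op_version >= GD_OP_VERSION_9_0 else "off"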
Problem 1:
When a directory was renamed while a brick was down, entry-heal always did
an rm -rf on that directory on the sink at the old location and then
recreated the directory hierarchy (mkdir) at the new location. This is
inefficient.
Problem 2:
The order in which the renamed directory is healed may lead to a scenario
where the directory at the new location is created before it is deleted
from the old location, leaving two directories with the same gfid in posix.
Fix:
As part of heal, if the old location is healed first and the directory is
not present on the source brick, always rename it into a hidden directory
inside the sink brick, so that when heal is triggered for the new location,
shd can rename it from this hidden directory to the new location.
If the new-location heal is triggered first and it detects that the
directory already exists on the brick, it should skip healing the
directory until it appears in the hidden directory.
Credits: Ravi, for the rename-data-loss.t script
Fixes: #1211
Change-Id: I0cba2006f35cd03d314d18211ce0bd530e254843
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
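A rough sketch of the ordering rule in the fix. The sink/brick object, its
methods and the hidden-directory naming are all invented for illustration;
AFR's real entry self-heal is considerably more involved:

    def heal_old_location(sink, old_path, gfid, present_on_source):
        if not present_on_source:
            # Instead of rm -rf, park the directory in a hidden location keyed
            # by gfid so its contents survive until the new location is healed.
            sink.rename(old_path, "<hidden>/" + gfid)

    def heal_new_location(sink, new_path, gfid):
        hidden = "<hidden>/" + gfid
        if sink.has_directory_with_gfid(gfid) and not sink.exists(hidden):
            # The old location is not parked yet; skip for now so the brick
            # never holds two directories with the same gfid.
            return "skip"
        if sink.exists(hidden):
            sink.rename(hidden, new_path)   # move the parked directory into place
        else:
            sink.mkdir(new_path, gfid)      # nothing was parked: plain creation
        return "healed"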
* Also add some time gaps in other tests to see if we get things right.
* Create a directory 'tests/000/', which can host any tests that are flaky.
* Move all the tests mentioned in the issue to the above directory.
* As the above directory gets tested first, all flaky tests are reported quickly.
* Change `run-tests.sh` to continue running tests even if flaky tests fail.
Reference: gluster/project-infrastructure#72
Updates: #1000
Change-Id: Ifdafa38d083ebd80f7ae3cbbc9aa3b68b6d21d0e
Signed-off-by: Amar Tumballi <amar@kadalu.io>
Problem:
If the favourite child policy is set, automatic split-brain resolution
should work in all cases. This was failing when quorum count was set to a
non-zero value: the initial lookup before the read txn failed with
ENOTCONN, and since we didn't have a readable subvol, we failed the fop.
We were only looking at the split-brain resolution choice set through the
CLI command.
Fix:
We will now consider the favourite child policy if a split-brain choice
has not been set via the CLI command.
Change-Id: Id2016c3a90d0763ac6f1a0131571053f595576f0
Fixes: #1404
Signed-off-by: Mohammed Rafi KC <rafi.kavungal@iternity.com>
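A minimal sketch of the fallback, assuming hypothetical names for the CLI
choice and the policy callback:

    def pick_split_brain_source(cli_choice, favourite_child_policy, replies):
        if cli_choice is not None:
            return cli_choice                        # explicit CLI choice always wins
        if favourite_child_policy is not None:
            return favourite_child_policy(replies)   # new: fall back to the policy
        return None                                  # still an unresolved split-brain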
Problem:
AFR doesn't delay the post-op for the fsync fop. For fsync-heavy workloads
this leads to an unnecessary fxattrop/finodelk for every fsync, resulting
in bad performance.
Fix:
Delay the post-op for fsync as well. Add a special flag in xdata to
indicate that AFR shouldn't delay the post-op in cases where either the
process will terminate or a graph switch will happen; otherwise it leads
to unnecessary heals when the graph switch or process termination happens
before the delayed post-op completes.
Fixes: #1253
Change-Id: I531940d13269a111c49e0510d49514dc169f4577
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
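A sketch of the resulting decision; the xdata key name here is a
placeholder, not the actual flag introduced by the patch:

    def should_delay_post_op(fop, xdata):
        if xdata.get("no-delayed-post-op"):     # hypothetical key: process is about
            return False                        # to exit or a graph switch is imminent
        # fsync now joins the fops whose post-op is delayed/batched, avoiding a
        # fxattrop/finodelk round trip per fsync.
        return fop in {"writev", "ftruncate", "fsync"}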
Problem:
In an add-brick that increases the replica count, SHD was restarted after
the pending xattrs were set on the new bricks and the bricks were added.
But before SHD is restarted, there is a possibility that the old SHD does
a scan on the root directory, sees that no heal is needed, and deletes the
index for the root dir, leading to no heals until a lookup is executed on
the mount.
Fix:
Stop SHD, perform the pending-xattr setting and the addition of the new
bricks, and then restart SHD.
Fixes: #1240
Change-Id: I94fd7c6c909211b597185dfe097a559db6c0d00f
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
The general idea of the changes is to prevent resetting event generation
to zero in the inode ctx, since event gen is something that should
follow 'causal order'.
Change #1:
For a read txn, in the inode refresh cbk, if event_generation is found to
be zero, we fail the read fop. This is not needed because a change in
event gen is only a marker for the next inode refresh to happen and should
not be taken into account by the current read txn.
Change #2:
The event gen being zero above can happen if there is a racing lookup,
which resets event gen (in afr_lookup_done) when there are non-zero afr
xattrs. The reset is done only to trigger an inode refresh and a possible
client-side heal on the next lookup. That can be achieved by setting the
need_refresh flag in the inode ctx. So all occurrences of resetting event
gen to zero have been replaced with a call to afr_inode_need_refresh_set().
Change #3:
In both the lookup and discover paths, we do an inode refresh which is not
required since all three essentially do the same thing: update the inode
ctx with the good/bad copies from the brick replies. Inode refresh also
triggers background heals, but I think it is okay to do that when we call
refresh during the read and write txns and not in the lookup path.
The .t tests which relied on the inode refresh in the lookup path to
trigger heals have been changed to do a read txn so that the inode refresh
and the heal happen.
Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
Fixes: #1179
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Erik Jacobson <erik.jacobson at hpe.com>
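Change #2 can be summarised with a small sketch; afr_inode_need_refresh_set()
is named in the text, while the ctx layout here is an illustrative stand-in:

    class AfrInodeCtx:
        def __init__(self):
            self.event_generation = 1
            self.need_refresh = False

    def afr_inode_need_refresh_set(ctx):
        ctx.need_refresh = True            # next lookup/txn refreshes the inode ctx

    def afr_lookup_done(ctx, has_nonzero_afr_xattrs):
        if has_nonzero_afr_xattrs:
            # Old behaviour: ctx.event_generation = 0  (breaks causal ordering
            # and makes a concurrent read txn fail needlessly)
            afr_inode_need_refresh_set(ctx)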
The current implementation assumes that the ping event will come after the
connect event, but that may not be the case when, after the socket
connection, fds need to be re-opened, which consumes more time. So handle
the ping/child-up events arriving in any order.
fixes: bz#1800583
Change-Id: I6bcdc0caa503bdc039ef2b4739fbf4afae121f05
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
fixes: bz#1760189
Change-Id: Iffbf8d6f4c50b8e2de8364658697bdbe96549f5d
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
After add-brick and rebalance, the ctime xattr is not present on
rebalanced directories on the new brick. This patch fixes that.
Note that ctime still doesn't support consistent time across distribute
sub-volumes.
This patch also fixes the in-memory inconsistency of the time attributes
when metadata is self-healed.
Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
fixes: bz#1734026
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Problem:
On an EC volume, during upgrade from an older version where the ctime
feature is not enabled (or not present) to a newer version where the
ctime feature is available (and enabled by default), self-heal hangs and
doesn't complete.
Cause:
The ctime feature has both client-side code (utime) and server-side code
(posix). The feature is driven from the client: only if the client side
sets the time in the frame should the server side set the time attributes
in the xattr. But posix setattr/fsetattr was not honouring that. When one
of the server nodes is upgraded, since ctime is enabled by default, it
starts setting the xattr on setattr/fsetattr on the upgraded node/brick.
On an EC volume the first two upgraded nodes (bricks) are not a problem
because there are 4 other bricks with consistent data. However, once the
third brick is upgraded, the new attribute (mdata xattr) causes a metadata
inconsistency across 3 bricks, which prevents the file from being
repaired.
Fix:
Don't create the mdata xattr on utimes/utimensat system calls.
Only update it if it is already present.
Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c
fixes: bz#1720201
Signed-off-by: Kotresh HR <khiremat@redhat.com>
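The fix reduces to a simple rule on the posix side, sketched here with
placeholder names (not the posix xlator's actual functions):

    def posix_setattr_mdata(mdata_xattr_present, new_times):
        if mdata_xattr_present:
            return "update mdata xattr to %s" % (new_times,)  # keep existing metadata fresh
        # Never *create* the xattr from a utimes/utimensat-driven setattr;
        # creation stays with the client-driven ctime path, so old and new
        # bricks remain consistent during a rolling upgrade.
        return "skip"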
Problem:
The test case fails to heal the volume within $HEAL_TIMEOUT @195.
This happens because, as part of split-brain resolution, the file gets
expunged from the sink and the new entry mark for that file is done on the
source bricks as part of impunging. Since the source bricks' shd threads
failed to get the heal-domain lock, they wait for the heal-timeout of 10
minutes, which is greater than $HEAL_TIMEOUT.
Fix:
Set cluster.heal-timeout to 5 seconds to trigger the heal so that one of
the source bricks heals the file within $HEAL_TIMEOUT.
Change-Id: Ie73c578cc5361c0d617a48ccc86026734d20ba8c
fixes: bz#1718998
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Problem:
afr_child_up_status_meta works only when a LOOKUP on $M0 is successful.
There are cases where quorum is not met and the LOOKUP fails on $M0, which
leads to failures similar to:
grep: /mnt/glusterfs/0/.meta/graphs/active/patchy-replicate-0/private: Transport endpoint is not connected
This was happening once in a while, depending on attribute-timeout and
md-cache not serving the lookup.
Fix:
Find the child-up status based on a statedump instead. Also changed the
mount options to include --entry-timeout=0 and --attribute-timeout=0.
updates bz#1193929
Change-Id: Ic0de72c3006d7399a5feb3e4d10d4748949b2ab3
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Change-Id: Iceefe22af754096c599dc570d4894d14fce4deae
Updates: bz#1193929
Signed-off-by: Xavier Hernandez <xhernandez@redhat.com>
When the split-brain choice is changed from one brick to another brick,
inode-invalidate is not called, so the readv call is served from the
cache, leading to failures in split-brain-resolution.t.
Fixed it by calling inode_invalidate() when this happens.
updates bz#1193929
Change-Id: I2624614eec38c0303f3e1dc55dfae3d4b864218b
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
If a fop to create an entry fails on one of the data bricks, we mark the
pending changelog on the entry on the brick for which it was successful.
This is done as part of the post-op phase, to make sure that the entry
gets healed even if it gets renamed to some other path where its parent
was not marked as bad.
As this happens as part of the post-op, we should consult the thin-arbiter
to check whether the brick that was successful is the good brick or not.
This will avoid split-brain and other issues.
Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1662264
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Fixes: bz#1665358
Change-Id: Idbf88ec3ac683733b32c313377eeb72f2819bf0d
Signed-off-by: Amar Tumballi <amarts@redhat.com>
With this changeset, the default value for the AFR client-side heal
volume option is set to "off".
fixes: bz#1663102
Change-Id: Ie4016932339c4896487e3e7cb5caca68739b7ba2
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Change-Id: Iec47856ce2819e7d7d38a60279602e53ba45858d
updates: bz#1624332
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Test: check the success/failure of the write fop while different
bricks/the TA process are down.
Change-Id: I3c376935df93ebf1f794c964bd19bc1280d91c59
updates: bz#1624332
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Based on the proposal to remove a few features as they are not actively
maintained [1], remove the tier translator from the build. Also make sure
no regression tests involving the tiering feature remain.
[1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html
Change-Id: I2c177f711f9b54b7b24e1a13525ff3132bd9a9c5
updates: bz#1642807
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on the TA takes the AFR_TA_DOM_NOTIFY range lock and the
AFR_TA_DOM_MODIFY full lock, does the xattrop on the TA, releases the
AFR_TA_DOM_MODIFY lock and stores in memory which brick is bad.
- All further write-txn failures are handled based on this in-memory value
without querying the TA.
- When shd heals the files, it does so by requesting a full lock on the
AFR_TA_DOM_NOTIFY domain. The client uses this as a cue (via upcall),
releases the AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write-txn failure is wound on the
TA to update the in-memory state again.
- Any write txns that were incomplete when the AFR_TA_DOM_NOTIFY upcall
release request arrives are completed before the lock is released.
- Any write txns that arrive after the release request are kept in a
ta_waitq.
- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.
- For fops that come in parallel when the in-memory bad brick is still
unknown, only one is wound to the TA on the wire. The others are kept in a
ta_onwireq which is processed after we get the response from the TA.
Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1579788
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
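A toy model of the client-side bookkeeping described above. The queue names
ta_waitq and ta_onwireq come from the text; the class, callbacks and method
names are invented for the sketch and are not AFR's actual structures:

    from collections import deque

    class TAWriteFailureState:
        def __init__(self, query_ta):
            self.query_ta = query_ta    # stand-in for the on-wire xattrop on the TA
            self.bad_brick = None       # in-memory notion of which brick is bad
            self.on_wire = False        # at most one TA query in flight
            self.ta_onwireq = deque()   # txns waiting on that in-flight query
            self.ta_waitq = deque()     # txns arriving after an upcall release request
            self.releasing = False

        def write_txn_failed(self, txn):
            if self.releasing:
                self.ta_waitq.append(txn)         # handled after the release completes
            elif self.bad_brick is not None:
                txn.resolve(bad=self.bad_brick)   # no TA round trip needed
            elif self.on_wire:
                self.ta_onwireq.append(txn)       # piggy-back on the in-flight query
            else:
                self.on_wire = True
                self.query_ta(txn)                # xattrop on the TA under the MODIFY lock

        def ta_reply(self, bad_brick):
            self.bad_brick, self.on_wire = bad_brick, False
            while self.ta_onwireq:
                self.ta_onwireq.popleft().resolve(bad=bad_brick)

        def notify_release_done(self):
            # shd took AFR_TA_DOM_NOTIFY: the cached state is stale; queued txns
            # are spliced out and re-driven one by one.
            self.bad_brick, self.releasing = None, False
            pending, self.ta_waitq = self.ta_waitq, deque()
            for txn in pending:
                self.write_txn_failed(txn)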
With this change, when SHD starts the index crawl it requests all the
clients to release the AFR_TA_DOM_NOTIFY lock, so that clients know the
in-memory state is no longer valid and any new operations need to query
the thin-arbiter if required.
When SHD completes healing all the files without any failure, it will
again take the AFR_TA_DOM_NOTIFY lock and get the xattrs on the TA to see
whether any new failures happened in the meantime.
If there are new failures marked on the TA, SHD will start the crawl
immediately to heal those failures as well. If there are no new failures,
SHD will take the AFR_TA_DOM_MODIFY lock and unset the xattrs on the TA,
so that both data bricks are considered good thereafter.
Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
fixes: bz#1579788
Signed-off-by: karthik-us <ksubrahm@redhat.com>
If both data bricks are up, the read subvol will be based on read_subvols.
If only one data brick is up:
- First query the data brick that is up. If it blames the other brick,
allow the reads.
- If it doesn't, query the TA to obtain the source of truth.
TODO: See if in-memory state can be maintained for read txns (BZ 1624358).
updates: bz#1579788
Change-Id: I61eec35592af3a1aaf9f90846d9a358b2e4b2fcc
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
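A minimal sketch of that read-path decision, with invented helper names:

    def pick_read_subvol(brick0, brick1, read_subvols, ta_marks_bad):
        """read_subvols: normal selection; ta_marks_bad(brick): TA blames `brick`."""
        up = [b for b in (brick0, brick1) if b.is_up]
        if len(up) == 2:
            return read_subvols(up)          # both data bricks up: usual selection
        if not up:
            return None                      # no readable subvol
        alive = up[0]
        down = brick1 if alive is brick0 else brick0
        if alive.blames(down):
            return alive                     # the up copy is known to be good
        # The up brick does not blame the down one, so the thin-arbiter is the
        # source of truth: serve reads only if the TA does not blame this brick.
        return None if ta_marks_bad(alive) else alive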
Problem:
When name self-heal is triggered on the mount, it blocks the lookup until
the name self-heal completes. This can lead to hangs when a lot of clients
access a directory which needs a name heal and all of them trigger heals,
each waiting for the other clients to complete the heal.
Fix:
When a name heal is needed but a quorum number of names have the file and
pending xattrs exist on the parent, it is better to delegate the heal to
SHD, where it will be completed as part of the entry heal of the parent
directory. We could also do the same when a quorum number of names are not
present, but we don't have any known use-case where this is a frequent
occurrence, so that part is not changed at the moment. When there is a
gfid mismatch or a missing gfid it is important to complete the heal right
away, so that the next rename doesn't assume everything is fine and
perform a rename etc.
fixes bz#1622821
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
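The delegation rule can be sketched as below; the predicate names simply
mirror the conditions in the message and are not AFR functions:

    def should_do_inline_name_heal(quorum_names_have_file, parent_has_pending_xattrs,
                                   gfid_mismatch, gfid_missing):
        if gfid_mismatch or gfid_missing:
            return True    # must be fixed now, before a rename builds on bad state
        if quorum_names_have_file and parent_has_pending_xattrs:
            return False   # delegate to SHD: the parent's entry heal will cover it
        return True        # default: heal inline as before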
Problem:
When metadata self-heal is triggered on the mount, it blocks the lookup
until the metadata self-heal completes. This can lead to hangs when a lot
of clients access a directory which needs a metadata heal and all of them
trigger heals, each waiting for the other clients to complete the heal.
Fix:
Trigger the metadata heal that could block the lookup only when the heal
is needed but the pending xattrs are not set. This is the only case where,
without the heal, different clients may serve different metadata, which
should be avoided.
Updates bz#1622821
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
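The companion rule for metadata heal, again with illustrative predicate
names:

    def should_do_inline_metadata_heal(heal_needed, pending_xattrs_set):
        # If pending xattrs are set, SHD (or a later txn) will heal it, so don't
        # block the lookup. Heal inline only when nothing else records that a
        # heal is needed, since that is the one case where clients could
        # otherwise keep serving differing metadata.
        return heal_needed and not pending_xattrs_set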
modifications
PROBLEM:
Stats of dentries that are readdirp'd ahead can become stale due to fops
like writes, truncates etc. that modify the file pointed to by those
dentries. When a readdir is finally wound at the offset corresponding to
these entries, the iatts returned to the application come from
readdir-ahead's cache, and are stale by then. The problem gets further
aggravated when caching translators/modules cache and continue to serve
this stale information.
FIX:
* Store the iatt in the context of the inode pointed to by the dentry.
* Whenever the inode pointed to by the dentry undergoes modification, in
  the cbk of the modification fop, update the iatt stored in the inode ctx
  to reflect the modification.
* When serving a readdirp response to the application, update the iatts of
  the dentries with the iatts stored in the context of the inodes pointed
  to by those dentries.
* Some fops don't have valid iatts in their responses. For example, a
  write response whose data is still cached in write-behind will have a
  zeroed-out stat. In this case keep only ia_type and ia_gfid and reset
  the rest of the iatt members to zero.
  - fuse-bridge in this case sends only the "entry" information back to
    the kernel and the attr is not sent.
  - gfapi sets entry->inode to NULL and zeroes out the entire stat.
* There is one tiny race between entry creation and a readdirp on its
  parent dir, which could cause the inode-ctx setting and inode-ctx
  reading to happen on two different inode objects. To prevent this, when
  entry->inode is not equal to linked_inode:
  - fuse-bridge is made to send only the "entry" information without
    attributes
  - gfapi sets entry->inode to NULL and zeroes out the entire stat.
Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
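An illustrative sketch of the FIX: remember the latest iatt per inode and
patch readdirp replies from it. Entries and inode contexts are plain dicts
here; readdir-ahead's real code works on gluster's structures:

    def on_modification_cbk(inode_ctx, post_iatt):
        # Writes, truncates, setattr etc. refresh the stat stored in the inode ctx.
        inode_ctx["iatt"] = post_iatt

    def fill_readdirp_reply(entries):
        for entry in entries:
            cached = entry["inode_ctx"].get("iatt")
            if cached:
                entry["stat"] = cached               # serve the refreshed stat
            else:
                # No trustworthy stat (e.g. write-behind unwound a zeroed iatt):
                # keep only ia_type/ia_gfid so upper layers don't cache garbage.
                entry["stat"] = {"ia_type": entry["stat"].get("ia_type"),
                                 "ia_gfid": entry["stat"].get("ia_gfid")}
        return entries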
fixes bz#1615789
Change-Id: I1f42e78fec5ddaf2a425dc4b82c9a20472aa146d
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
This test was retried once on build
https://build.gluster.org/job/regression-on-demand-multiplex/174/
(logs for the first try are not available with this build).
The test case was failing at line #47, where it checks for the heal count
to be 0. Line #51 had passed, which means the file's gfid split-brain got
resolved and both bricks had the same gfid.
At line #54 it failed again; that line checks the md5sum on both bricks.
At this point the md5sum on the brick where the file got impunged was the
same as that of the newly created empty file, which means the data heal
had not happened for the file.
At line #64 enabling granular-entry-heal failed, but without the logs it
is not possible to debug this issue.
Change-Id: I56d854dbb9e188cafedfd24a9d463603ae79bd06
fixes: bz#1615331
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Please see BZ for details.
Change-Id: Id9273432874bc6a452ac96b2b8c7a61ea6c5b98d
Fixes: bz#1615239
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Please see bug description for details.
Change-Id: Ieb6bce6d1d5c4c31f1878dd1a1c3d007d8ff81d5
fixes: bz#1614654
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
SHD keeps crawling in a loop as long as it healed at least one entry in
the previous run. A heal is counted as successful only if it completes
both the metadata heal and the entry/data heal, i.e. the entry needs to be
completely healed by that healer alone.
In the tests/basic/afr/granular-esh/replace-brick.t test, brick-0 is old
and brick-1 is new. After replace-brick only the root gfid will be present
in brick-0's index.
1) The shd thread corresponding to brick-0 does the metadata heal, which
creates the root gfid in brick-0's 'dirty' index.
2) Both healer threads corresponding to brick-0 and brick-1 now try to
heal the root gfid and brick-1 gets the heal-domain lock. brick-0's shd
thread experiences a failure and goes back to waiting for 10 minutes
(cluster.heal-timeout).
3) When brick-1's healer thread completes healing the root gfid it creates
5 files, which create indices on brick-0; so until brick-0 triggers one
more crawl, those won't be healed. $HEAL_TIMEOUT is set to 120 seconds,
which is less than cluster.heal-timeout, so decrease cluster.heal-timeout
to 5 seconds so that the next crawl is triggered and does the heals.
fixes bz#1613807
Change-Id: I881133fc28880d8615fbc4558a0dfa0dc63d7798
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
modified while in cache"
This reverts commit 7131de81f72dda0ef685ed60d0887c6e14289b8c.
With the latest master, I created a single brick volume and some files
inside it.
[root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
again"; ls -l /mnt/fuse1
umount: /mnt/fuse1: not mounted
total 0
----------. 0 root root 0 Jan 1 1970 file-1
----------. 0 root root 0 Jan 1 1970 file-2
----------. 0 root root 0 Jan 1 1970 file-3
----------. 0 root root 0 Jan 1 1970 file-4
----------. 0 root root 0 Jan 1 1970 file-5
d---------. 0 root root 0 Jan 1 1970 subdir
Trying again
total 3
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-1
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-2
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-3
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-4
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-5
d---------. 0 root root 0 Jan 1 1970 subdir
[root@rhgs313-6 ~]#
The conversation can be followed on gluster-devel in the thread with
subject "tests/bugs/distribute/bug-1122443.t - spurious failure".
git bisect pointed to this patch as the culprit.
Change-Id: I1eb46f6c196f44fde8ce991840a0e724e6f50862
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Updates: bz#1390050
invalidation
Invalidations are triggered mainly by two codepaths - upcall and
write-behind unwinding a cached write with a zeroed-out stat. For the
upcall case, the following race can happen:
* stat s1 is fetched from the brick
* an invalidation is detected on the brick
* the invalidation is propagated to md-cache and the cache is invalidated
* s1 updates md-cache with a stale state
For the write-behind case, imagine the following sequence of operations:
* A stat s1 was issued from application thread t1 when the size of the
  file was s1
* stat s1 completes on the brick stack, but is yet to reach md-cache
* A write w1 from application thread t2 extending the file to size s2 is
  cached in write-behind and the response is unwound with a zeroed-out stat
* md-cache, while handling the write cbk, invalidates its cache
* md-cache receives the response for s1 and updates the cache with the
  stale stat with size s1, overwriting the invalidation state
The fix is to remember when s1 was incident on md-cache and update the
cache with the results of s1 only if it was incident after the
invalidation of the cache.
This patch identified some bugs in regression tests, which are tracked in
https://bugzilla.redhat.com/show_bug.cgi?id=1608158. As a stop-gap measure
I am marking the following tests as bad:
basic/afr/split-brain-resolution.t
bugs/bug-1368312.t
bugs/replicate/bug-1238398-split-brain-resolution.t
bugs/replicate/bug-1417522-block-split-brain-resolution.t
bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t
Change-Id: Ia4bb9dd36494944e2d91e9e71a79b5a3974a8c77
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Updates: bz#1512691
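The "remember when s1 was incident" fix amounts to comparing generations,
roughly as in this sketch (field and method names are invented, not
md-cache's):

    import itertools

    class MdCacheEntry:
        _gen = itertools.count(1)        # global, monotonically increasing counter

        def __init__(self):
            self.stat = None
            self.invalidated_at = 0      # generation of the last invalidation

        def snapshot_generation(self):
            """Call when a stat/lookup is wound; keep the value with the call."""
            return next(self._gen)

        def invalidate(self):
            self.invalidated_at = next(self._gen)
            self.stat = None

        def update_from_reply(self, stat, issued_at):
            if issued_at > self.invalidated_at:
                self.stat = stat         # the reply post-dates the invalidation
            # else: the reply is older than the invalidation; drop it so a stale
            # stat can no longer overwrite the invalidated state.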
libargp or argp-standalone is available on all commonly used
distributions. There is no need to bundle an unmaintained version of
argp-standalone in this repository anymore.
FreeBSD places the argp.h file in /usr/local/include when argp-standalone
is installed. This path is not added to CPPFLAGS by default, so that's
done in configure.ac as well.
Change-Id: I384a53ab0a008ec9d48fd83afeaf8fbc197e91ee
Fixes: bz#1609337
Signed-off-by: Niels de Vos <ndevos@redhat.com>
while in cache
PROBLEM:
Entries that are readdirp'd ahead can undergo modification, in terms of
writes and truncates, which modify their iatts. When a readdir is finally
wound at the offset corresponding to these entries, the iatts that are
returned to the application come from readdir-ahead's cache, and are stale
by then. The problem gets further aggravated when caching
translators/modules cache and continue to serve this stale information.
FIX:
Whenever a dentry undergoes modification, in the cbk of the modification
fop, a "dirty" flag (default 0) is set in its inode ctx. When it's time
for readdir-ahead to serve these entries, it reads the inode ctx and
checks whether the entry is "dirty"; if it is, it sets the entry's attrs
to all zeroes, as an indicator to fuse, md-cache etc. not to cache these
attributes.
There is also one tiny race between entry creation and a readdirp on its
parent dir, which could cause the inode-ctx setting and inode-ctx reading
to happen on two different inode objects. To prevent this, fuse-bridge is
made to drop entries for which dentry->inode is not the same as the linked
inode, in the readdirp cbk.
Change-Id: If7396507632b5268442ca580473d5155fee9cbef
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
fixes bz#1597156
Change-Id: I323eb9190e40b12df216698dcdba74a6d336beeb
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
fixes: bz#1596524
updates: gluster/glusterd2#515
Change-Id: I8a46fa2fd1fd2b0e9fbcecd3bb18d348aed9c6a9
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BZ 1337791 marked this .t as bad based on the tar version being a likely
suspect. Undoing this to check as it passes on the latest jenkins slaves.
Change-Id: Ia581064a9c620351d3fe7aeef95d2644337952e1
fixes: bz#1595492
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Yaniv Kaul <ykaul@redhat.com>
Updates: #363
This new value (3) will try to wind read requests to the AFR child with
the least number of pending requests in its queue.
Change-Id: If6bda2aac9bf7aec3fc39622f78659313c4b6508
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
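The new mode is essentially a min-queue-depth selection, as in this small
sketch (child objects and fields are illustrative):

    def pick_read_child(children):
        readable = [c for c in children if c.readable]
        if not readable:
            return None
        # Value 3 of the option described above: least pending requests wins.
        return min(readable, key=lambda c: c.pending_requests)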
BUG: 1557932
Change-Id: I3783e41b3812267bc10c0d05d062a31396ce135b
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
We are not seeing much improvement with this change. So removing the
feature so that it doesn't need to be maintained anymore.
Fixes: #414
Change-Id: Ic7969b151544daf2547bd262a9fa03f575626411
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
This uses the 'timeout' command with a 300-second default. Right now,
there is just one test which takes more than that on a properly set up
machine.
Ideally the default would be something like 30 seconds, and if a test is
supposed to take more than that, the owner should knowingly add a timeout
line to the test. That way, it also makes test writers think about a time
limit.
Change-Id: I747005ce1f208aeb2ecbf899e8feea487ecd21a0
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not
met. Though the FOP is still unwound with a failure, the xattrs remain on
disk. Due to these partial post-ops and partial heals (healing only when
2 bricks are up), we can end up in split-brain purely from the AFR xattrs'
point of view, i.e. each brick is blamed by at least one of the others.
These scenarios are hit when there is frequent connect/disconnect of the
client/shd to the bricks while I/O or heal is in progress.
Fix:
Instead of undoing the post-op, pick a source based on the xattr values.
If 2 bricks blame one, the blamed one must be treated as the sink.
If there is no majority, all are sources. Once we pick a source,
self-heal will then do the heal instead of erroring out due to
split-brain.
Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae
BUG: 1539358
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
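A sketch of the source/sink selection in the fix for an n-brick replica;
blames[i][j] meaning "brick i's afr xattrs blame brick j" is an invented
representation, not AFR's actual data layout:

    def pick_sources(blames):
        n = len(blames)
        blame_count = [sum(1 for i in range(n) if i != j and blames[i][j])
                       for j in range(n)]
        majority = n // 2 + 1
        sinks = [j for j, c in enumerate(blame_count) if c >= majority]
        if sinks:
            # e.g. two bricks blame the third: that brick becomes the sink.
            return [j for j in range(n) if j not in sinks]
        # No brick is blamed by a majority: treat all as sources and let
        # self-heal reconcile instead of erroring out with split-brain.
        return list(range(n))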
This test was failing due to an infra issue. The infra issue is now
fixed.
BUG: 1517961
Change-Id: I09dfab9c0a3ebe73c738222e6269d9e35c85eddb
Signed-off-by: Nigel Babu <nigelb@redhat.com>
Change-Id: If6c36dc6c395730dfb17b5b4df6f24629d904926
BUG: 1517961
Signed-off-by: Amar Tumballi <amarts@redhat.com>