glusterfs.git - GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.

	Commit message	Author	Age	Files	Lines
*	afr: fix directory entry count	Xavi Hernandez	2021-04-09	2	-0/+119
	AFR may hide some existing entries from a directory when reading it, because they are generated internally for private management. However, the number of entries returned by the readdir() function is not updated accordingly, so it may return a number higher than the number of entries actually present in the gf_dirent list. This may cause unexpected behavior in clients, including gfapi, which incorrectly assumed that there was an entry when the list was actually empty. This patch also makes the check in gfapi more robust to avoid similar issues that could appear in the future. Fixes: #2232 Change-Id: I81ba3699248a53ebb0ee4e6e6231a4301436f763 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
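A minimal sketch of the invariant this fix restores: when internal entries are filtered out of a directory listing, the count handed back must be recomputed from what remains. The `struct entry` list and the `.private-` prefix used by `is_private_entry()` are hypothetical stand-ins, not AFR's actual gf_dirent_t handling or its real hidden-entry names.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a gf_dirent_t list node. */
struct entry {
    char name[64];
    struct entry *next;
};

/* Hypothetical predicate: names with this prefix are internal-only. */
static bool is_private_entry(const char *name)
{
    return strncmp(name, ".private-", strlen(".private-")) == 0;
}

/* Unlink and free private entries, and return the number of entries
 * that remain, so the count returned to the caller always matches the
 * list contents (0 for a list that ends up empty). */
static int filter_entries(struct entry **head)
{
    int count = 0;
    struct entry **link = head;

    while (*link != NULL) {
        struct entry *cur = *link;
        if (is_private_entry(cur->name)) {
            *link = cur->next; /* splice the hidden entry out */
            free(cur);
        } else {
            count++;
            link = &cur->next;
        }
    }
    return count;
}
```

Counting only while walking the surviving entries means the returned value cannot drift from the list, which is the mismatch the commit describes.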
*	cli: syntax check for arbiter volume creation (#2207) (#2222)	Ravishankar N	2021-03-23	1	-0/+1
	commit 8e7bfd6a58b444b26cb50fb98870e77302f3b9eb changed the syntax for arbiter volume creation to 'replica 2 arbiter 1', while still allowing the old syntax of 'replica 3 arbiter 1'. But in doing so, it also removed a conditional check, thereby allowing a replica count > 3. This patch fixes that. Updates: #2192 Change-Id: Ie109325adb6d78e287e658fd5f59c26ad002e2d3 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
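A hypothetical validator sketching the rule described above: both 'replica 2 arbiter 1' and 'replica 3 arbiter 1' are accepted, and larger replica counts are rejected. The function name is invented for illustration, and the requirement that the arbiter count be exactly 1 is an assumption of this sketch, not something stated in the commit.

```c
#include <stdbool.h>

/* Hypothetical check for "replica N arbiter M":
 * - the arbiter count must be 1 (assumption for this sketch);
 * - the replica count must be 2 (new syntax) or 3 (old syntax).
 * Anything larger, e.g. "replica 4 arbiter 1", is rejected. */
static bool arbiter_syntax_ok(int replica_count, int arbiter_count)
{
    if (arbiter_count != 1)
        return false;
    return replica_count == 2 || replica_count == 3;
}
```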
*	cluster/dht: Allow fix-layout only on directories (#2109) (#2114)	Pranith Kumar Karampuri	2021-02-04	1	-0/+33
	Problem: the fix-layout operation assumes that the path passed to it is a directory, i.e. layout->cnt == conf->subvolume_cnt. This leads to a crash when fix-layout is attempted on a file. Fix: disallow fix-layout on files. fixes: #2107 Change-Id: I2116b8773059f67e3260e9207e20eab3de711417 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
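The guard can be sketched at the level of a plain path with stat(2): refuse to proceed unless the target is a directory. This is an illustration using standard POSIX calls; `fix_layout_allowed()` is a hypothetical helper, and DHT itself makes this decision on its inode/iatt data rather than on a path.

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical guard: allow a fix-layout step only on directories.
 * Returns 0 when allowed, or a negative errno value otherwise. */
static int fix_layout_allowed(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -errno;          /* e.g. -ENOENT for a missing path */
    if (!S_ISDIR(st.st_mode))
        return -ENOTDIR;        /* regular files, devices, etc. */
    return 0;
}
```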
*	DHT/Rebalance - Ensure Rebalance reports status only once upon stopping (#1783)	Barak Sason Rofman	2020-11-24	1	-0/+75
	Upon issuing the rebalance stop command, the rebalance status was being logged twice to the log file, which could sometimes result in inconsistent reports (one report states that the status is stopped, while the other may report something else). This fix ensures that rebalance reports its status only once and that the correct status is reported. fixes: #1782 Change-Id: Id3206edfad33b3db60e9df8e95a519928dc7cb37 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
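One common way to guarantee that a final status is emitted exactly once, even if several code paths reach the reporting point, is a C11 atomic test-and-set flag. This is a generic sketch of the "report once" idea, not the mechanism the patch actually uses; `status_reported` and `should_report_status()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical "report once" guard. */
static atomic_flag status_reported = ATOMIC_FLAG_INIT;

/* atomic_flag_test_and_set() returns the previous value, so only the
 * very first caller sees false and gets to log the final status. */
static bool should_report_status(void)
{
    return !atomic_flag_test_and_set(&status_reported);
}
```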
*	posix: fix io_uring crash in reconfigure (#1804)	Ravishankar N	2020-11-17	1	-0/+20
	Call posix_io_uring_fini only if io_uring was initialized to begin with. Fixes: #1794 Reported-by: Mohit Agrawal <moagrawa@redhat.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: I0e840b6b1d1f26b104b30c8c4b88c14ce4aaac0d
*	tests: Fix issues in CentOS 8 (#1756)	Xavi Hernandez	2020-11-06	2	-6/+23
	* tests: Fix issues in CentOS 8
	Due to some configuration changes in CentOS 8/RHEL 8, ssl-ciphers.t and bug-1053579.t were failing. The first one was failing because TLS v1.0 is disabled by default; the test has been updated to check that at least one of TLS v1.0, v1.1 or v1.2 succeeds. In the second case, the test assumed that the group most recently added to a user is always listed last, but this is not always true because nsswitch.conf now uses 'sss' before 'files', which means that the data comes from a database that may not be sorted. Updates: #1009 Change-Id: I4ca01a099854ec25926c3d76b3a98072175bab06 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
	* tests: Fix TLS version detection
	The old test didn't correctly determine which version of TLS should be allowed by openssl. Change-Id: Ic081c329d5ed1842fa9f5fd23742ae007738aec0 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	glusterd: fix bug in enabling granular-entry-heal (#1752)	Ravishankar N	2020-11-05	1	-5/+20
	commit f5e1eb87d4af44be3b317b7f99ab88f89c2f0b1a meant to enable the volume option only for replica volumes, but inadvertently enabled it for all volume types. This fixes it. Also found a bug in glusterd where disabling the option on a plain distribute volume succeeded even though setting it in the first place fails. Fixed that too. Fixes: #1483 Change-Id: Icb6c169a8eec44cc4fb4dd636405d3b3485e91b4 Reported-by: Sheetal Pamecha <spamecha@redhat.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	xlators: misc conscious language changes (#1715)	Ravishankar N	2020-11-02	1	-2/+2
	core: change xlator_t->ctx->master to xlator_t->ctx->primary. afr: just changed comments. meta: change .meta/master to .meta/primary; this might break scripts. changelog: variable/function name changes only; these are unrelated to geo-rep. Fixes: #1713 Change-Id: I58eb5fcd75d65fc8269633acc41313503dccf5ff Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	extras/rebalance: Script to perform directory rebalance (#1676)	Pranith Kumar Karampuri	2020-10-30	2	-0/+133
	* extras/rebalance: Script to perform directory rebalance
	How should the script be executed? $ /path/to/directory-rebalance.py <dir-to-rebalance> will rebalance just that directory. The script assumes that the fix-layout operation has been completed for all the directories present inside <dir-to-rebalance>.
	How does it work? For the given directory path that needs to be rebalanced, a full crawl is performed, and the files that need to be healed and the size of each file are first written to the index. Once the index is built, it is read, and for each file the script executes the equivalent of setfattr -n trusted.distribute.migrate-data -v 1 <path/to/file>
	Why does the script take two passes? Printing a sensible ETA has been a primary goal of the script. Without knowing the approximate amount of data that will be rebalanced, it is difficult to estimate an ETA. Hence the script does one pass to find files and their sizes, which it writes to the index file, and then the next pass is done over the index file. It takes a minute or two for the ETA to converge, but in our testing it has been giving a reasonable ETA.
	What versions does the script support? For the script to work correctly, DHT should handle the "trusted.distribute.migrate-data" setxattr correctly.
	fixes: #1654 Change-Id: Ie5070127bd45f1a1b9cd18ed029e364420c971c1 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
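The two passes can be sketched in C (the script itself is Python). `migrate_file()` issues the same xattr as the setfattr command quoted above via setxattr(2); it assumes a GlusterFS mount and privileges for trusted.* attributes. `eta_seconds()` is a hypothetical estimator that assumes a constant average rate; the script's exact formula may differ.

```c
#include <stdint.h>
#include <sys/xattr.h>

/* Second pass, per file: C equivalent of
 *   setfattr -n trusted.distribute.migrate-data -v 1 <path/to/file>
 * (only meaningful on a GlusterFS mount, with sufficient privileges). */
static int migrate_file(const char *path)
{
    return setxattr(path, "trusted.distribute.migrate-data", "1", 1, 0);
}

/* ETA from the total size gathered in the first (index) pass:
 * remaining bytes at the average rate observed so far.
 * Returns -1 while no progress has been made yet. */
static int64_t eta_seconds(uint64_t total_bytes, uint64_t done_bytes,
                           uint64_t elapsed_seconds)
{
    if (done_bytes == 0 || done_bytes > total_bytes)
        return -1;
    return (int64_t)((double)elapsed_seconds *
                     (double)(total_bytes - done_bytes) / (double)done_bytes);
}
```

Knowing `total_bytes` up front is exactly what the index pass buys: without it, the remaining work, and therefore the ETA, is unknown.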
*	cluster/dht: Perform migrate-file with lk-owner (#1581)	Pranith Kumar Karampuri	2020-10-29	4	-0/+99
	* cluster/dht: Perform migrate-file with lk-owner
	1) Added GF_ASSERT() calls in client-xlator to find these issues sooner. 2) fuse was setting a zero lk-owner with len 8 when the fop doesn't have any lk-owner. Changed this to use len 0, just as in fops triggered from xlators below fuse.
	* syncop: Avoid frame allocation if we can
	* cluster/dht: Set lkowner in daemon rebalance code path
	* cluster/afr: Set lkowner for ta-selfheal
	* cluster/ec: Destroy frame after heal is done
	* Don't assert for lk-owner in lk call
	* set lkowner for mandatory lock heal tests
	fixes: #1529 Change-Id: Ia803db6b00869316893abb1cf435b898eec31228 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
*	tests: exclude more contrib/fuse-lib objects (#1694)	Dmitry Antipov	2020-10-27	1	-0/+3
	Exclude more contrib/fuse-lib objects to avoid silly tests/basic/0symbol-check.t breakage. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1692
*	glusterd/afr: enable granular-entry-heal by default (#1621)	Ravishankar N	2020-10-22	12	-18/+528
	1. The option has been enabled and tested for quite some time now in RHHI-V downstream, and I think it is safe to make it 'on' by default. Since it is not possible to simply change it from 'off' to 'on' without breaking rolling upgrades, old clients, etc., I have made it the default only for new volumes, starting from op-version GD_OP_VERSION_9_0. Note: if you do a volume reset, the option will be turned back off. This is okay, as the directory's gfid will be captured in the 'xattrop' folder and heals will proceed. There might be stale entries inside the 'entry-changes' folder, which will be removed when we enable the option again.
	2. I encountered a customer issue where entry heal was pending on a directory with 236436 files in it, and the glustershd.log output was just stuck at "performing entry selfheal", so I have added logs at DEBUG level that give more information about whether entry heal and data heal are progressing (metadata heal doesn't take much time). That way, if we briefly enable debug logs, we have a quick visual indication that things are not 'stuck', instead of having to take statedumps or check profile info, etc.
	Fixes: #1483 Change-Id: I4f116f8c92f8cd33f209b758ff14f3c7e1981422 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
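The op-version gating can be sketched as a pure function. `SKETCH_OP_VERSION_9_0` assumes glusterd's encoding of version 9.0 as 90000. The replica-only condition comes from the later fix in #1752 in this log. `granular_entry_heal_default()` is a hypothetical name, not glusterd code.

```c
#include <stdbool.h>

/* Assumed to mirror GD_OP_VERSION_9_0 (9.0 encoded as 90000). */
#define SKETCH_OP_VERSION_9_0 90000

/* Hypothetical default for a newly created volume: 'on' only for
 * replicate volumes created at op-version 9.0 or later, so existing
 * volumes and clusters in the middle of a rolling upgrade keep 'off'. */
static bool granular_entry_heal_default(int cluster_op_version,
                                        bool is_replicate)
{
    return is_replicate && cluster_op_version >= SKETCH_OP_VERSION_9_0;
}
```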
*	test: The test case tests/bugs/bug-1064147.t is failing (#1662)	mohit84	2020-10-19	1	-0/+3
	The test case tests/bugs/bug-1064147.t was failing when comparing the root permission with the permission changed while one of the bricks was down. The permissions did not match because the layout did not exist on root at the time the permission was healed, so the correct permission was not healed on the newly started brick. Fixes: #1661 Change-Id: If63ea47576dd14f4b91681dd390e2f84f8b6ac18 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	tests/00-geo-rep: 00-georep-verify-non-root-setup.t fails on devel branch (#1639)	Shwetha Acharya	2020-10-15	1	-1/+1
	Increase the timeout value to accommodate the delay encountered while executing the test. Change-Id: Id40d92366738439634a6b06d447a43a2c6cdbf44 Updates: #1594 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	io-stats: Configure ios_sample_buf_size based on sample_interval value (#1574)	mohit84	2020-10-15	1	-4/+4
	The io-stats xlator allocates a 64k-entry ios_sample_buf_size buffer (about 10M) per xlator, but when sample_interval is 0 this big buffer is not required, so allocate the default size only when sample_interval is not 0. This change helps reduce the RSS of brick and shd processes when the number of volumes is huge. Change-Id: I3e82cca92e40549355edfac32580169f3ce51af8 Fixes: #1542 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
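A sketch of the sizing rule: the sample buffer is sized, and allocated, only when sampling is enabled. The slot struct and `SKETCH_DEFAULT_SAMPLE_BUF_SIZE` (taken from the "64k" in the message) are hypothetical and do not reflect io-stats' real sample layout.

```c
#include <stdlib.h>

/* Assumed default number of sample slots ("64k" above). */
#define SKETCH_DEFAULT_SAMPLE_BUF_SIZE 65536

/* Hypothetical sample slot. */
struct ios_sample_sketch {
    long ts_sec;
    int fop;
};

/* With sample_interval == 0 no samples are ever recorded,
 * so no buffer is needed at all. */
static size_t sample_buf_entries(int sample_interval)
{
    return sample_interval == 0 ? 0 : SKETCH_DEFAULT_SAMPLE_BUF_SIZE;
}

static struct ios_sample_sketch *alloc_sample_buf(int sample_interval)
{
    size_t n = sample_buf_entries(sample_interval);
    return n ? calloc(n, sizeof(struct ios_sample_sketch)) : NULL;
}
```

With many volumes, each brick and shd process carries one io-stats instance per volume, so skipping the allocation when sampling is off adds up.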
*	tests/00-geo-rep: 00-georep-verify-non-root-setup.t fails on devel branch (#1617)	Shwetha Acharya	2020-10-13	1	-0/+4
	The command ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 nroot@127.0.0.1 /build/install/sbin/gluster --xml --remote-host=localhost volume info slave fails with error 255. Add ssh key cleanup code at the beginning of the test in order to clean up any stale entries. Updates: #1594 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	mount/fuse: Fix graph-switch when reader-thread-count is set	Pranith Kumar K	2020-10-05	1	-0/+65
	Problem: the current graph-switch code sets priv->handle_graph_switch to false even when a graph-switch is in progress, which leads to crashes in some cases. Fix: priv->handle_graph_switch should be set to false only when the graph-switch completes. fixes: #1539 Change-Id: I5b04f7220a0a6e65c5f5afa3e28d1afe9efcdc31 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
*	cluster/afr: Heal directory rename without rmdir/mkdir	Pranith Kumar K	2020-04-13	8	-46/+750
	Problem 1: When a directory is renamed while a brick is down, entry-heal always did an rm -rf on that directory at the old location on the sink, then did mkdir and recreated the directory hierarchy at the new location. This is inefficient. Problem 2: The rename-directory heal order may lead to a scenario where the directory at the new location is created before it is deleted from the old location, leading to 2 directories with the same gfid in posix. Fix: As part of heal, if the old location is healed first and the directory is not present on the source brick, always rename it into a hidden directory inside the sink brick, so that when heal is triggered on the new location, shd can rename it from this hidden directory to the new location. If the new-location heal is triggered first and it detects that the directory already exists in the brick, it should skip healing the directory until it appears in the hidden directory. Credits: Ravi for the rename-data-loss.t script Fixes: #1211 Change-Id: I0cba2006f35cd03d314d18211ce0bd530e254843 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	gfapi: Move the SECURE_ACCESS_FILE check out of glfs_mgmt_init	Môshe van der Sterre	2020-09-28	3	-0/+218
	glfs_mgmt_init is only called for glfs_set_volfile_server, but secure_mgmt is also required to use glfs_set_volfile with SSL. fixes: #829 Change-Id: Ibc769fe634d805e085232f85ce6e1c48bf4acc66
*	glusterd: Fix Add-brick with increasing replica count failure	Sheetal Pamecha	2020-09-23	1	-0/+21
	Problem: the add-brick operation fails with a "multiple bricks on same server" error when the replica count is increased. This happened because a loop comparing hostnames ran extra iterations: if fewer bricks were supplied than the "replica" count, a brick would get compared to itself, resulting in the above error. Fixes: #1508 Change-Id: I8668e964340b7bf59728bb838525d2db062197ed Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
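The loop-bound issue can be illustrated with a hypothetical duplicate-host check over the bricks actually supplied. Bounding both loops by the number of supplied bricks (not the replica count) and starting the inner index at `i + 1` means a brick is never compared with itself. This is a sketch, not glusterd's brick-order code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: do any two supplied bricks share a host? */
static bool has_duplicate_host(const char *const hosts[], int n_supplied)
{
    for (int i = 0; i < n_supplied; i++)
        for (int j = i + 1; j < n_supplied; j++)
            if (strcmp(hosts[i], hosts[j]) == 0)
                return true;
    return false;
}
```

For example, adding 2 bricks on `server1` and `server2` while raising the replica count to 3 yields no duplicate, whereas iterating up to the replica count would run past the supplied list.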
*	fuse: fetch arbitrary number of groups from /proc/[pid]/status	Csaba Henk	2020-07-17	1	-2/+11
	So far, GlusterFS constrained itself to an arbitrary limit (32) on the number of groups read from /proc/[pid]/status (this was the number of groups shown there prior to Linux commit v3.7-9553-g8d238027b87e (v3.8-rc1~74^2~59); since this commit, all groups are shown). With this change we'll read groups up to the number GlusterFS supports in general (64k). Note: the actual number of groups that are made use of in a regular GlusterFS setup shall still be capped at ~93 due to limitations of the RPC transport. To be able to handle more groups than that, brick-side gid resolution (server.manage-gids option) can be used along with NIS, LDAP or another such networked directory service (see https://github.com/gluster/glusterdocs/blob/5ba15a2/docs/Administrator%20Guide/Handling-of-users-with-many-groups.md#limit-in-the-glusterfs-protocol ). Also adding some diagnostic messages to frame_fill_groups(). Change-Id: I271f3dc3e6d3c44d6d989c7a2073ea5f16c26ee0 fixes: #1075 Signed-off-by: Csaba Henk <csaba@redhat.com>
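A sketch of reading an arbitrary number of supplementary groups from the text of /proc/[pid]/status, capped by a caller-supplied maximum instead of a fixed 32. `parse_status_groups()` is hypothetical: it parses an in-memory buffer rather than reading the file, and it is not the fuse-bridge implementation.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Parse the "Groups:" line of /proc/[pid]/status text into gids[],
 * storing at most max_groups entries. Returns the number stored,
 * or -1 if there is no Groups line. */
static int parse_status_groups(const char *status, gid_t *gids, int max_groups)
{
    const char *p = strstr(status, "\nGroups:");
    if (p == NULL)
        return -1;
    p += strlen("\nGroups:");

    int n = 0;
    while (n < max_groups) {
        while (*p == ' ' || *p == '\t')   /* skip separators only */
            p++;
        if (*p < '0' || *p > '9')         /* end of line or of list */
            break;
        char *end;
        gids[n++] = (gid_t)strtoul(p, &end, 10);
        p = end;
    }
    return n;
}
```

Skipping only spaces and tabs, rather than letting strtoul() skip newlines, keeps the parser from running into the next line of the status file.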
*	metadisp: new translator for data and metadata separation	Sheena Artrip	2020-01-29	6	-0/+641
	Summary: feature/metadisp is an xlator for performing "metadata dispersal" across multiple children. It does this by flattening the complex POSIX paths into /$GFID-style paths, then forwarding the metadata operations to its first child and the data operations to its second child. The purpose of this xlator is to allow separation of data and metadata in cases where metadata might be stored in another format (embedded kv?), on another disk (ssd), or on another host (dht2). Change-Id: I392c8bd0c867a3237d144aea327323f700a2728d Updates: #816 Signed-Off-By: Sheena Artrip <sheenobu@fb.com> Tested-By: Amar Tumballi <amar@kadalu.io>
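The two ideas in this summary are flattening to a /$GFID-style path and routing metadata versus data fops to different children. Both can be sketched in isolation. `enum fop_kind`, `gfid_to_flat_path()` and `pick_child()` are hypothetical, and the canonical UUID text layout is an assumption about what the flattened name looks like.

```c
#include <stdio.h>

/* Hypothetical classification of fops for this sketch. */
enum fop_kind { FOP_METADATA, FOP_DATA };

/* Format a 16-byte GFID as "/<uuid>", a flat /$GFID-style path.
 * buf must hold 38 bytes: '/', 36 UUID characters, and the NUL. */
static void gfid_to_flat_path(const unsigned char gfid[16], char buf[38])
{
    snprintf(buf, 38,
             "/%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             gfid[0], gfid[1], gfid[2], gfid[3], gfid[4], gfid[5],
             gfid[6], gfid[7], gfid[8], gfid[9], gfid[10], gfid[11],
             gfid[12], gfid[13], gfid[14], gfid[15]);
}

/* Metadata fops go to the first child, data fops to the second. */
static int pick_child(enum fop_kind kind)
{
    return kind == FOP_METADATA ? 0 : 1;
}
```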
*	tests: provide an option to mark tests as 'flaky'	Amar Tumballi	2020-08-18	14	-36/+35
	* also add some time gaps in other tests to see if things then work properly
	* create a directory 'tests/000/' to host any tests that are flaky
	* move all the tests mentioned in the issue to that directory
	* as that directory is tested first, all flaky tests are reported quickly
	* change `run-tests.sh` to continue testing even if flaky tests fail
	Reference: gluster/project-infrastructure#72 Updates: #1000 Change-Id: Ifdafa38d083ebd80f7ae3cbbc9aa3b68b6d21d0e Signed-off-by: Amar Tumballi <amar@kadalu.io>
*	features/shard: optimization over shard lookup in case of prealloc	Vinayakswami Hariharmath	2020-08-06	1	-0/+45
	Assume that we are preallocating a 1TB VM image with a shard block size of 64MB; then there will be ~16k shards. In the shard_fallocate() path, creation happens in 2 steps: 1. look up any shards that are already present, and 2. mknod the shards that do not exist. But in the case of a fresh creation, we don't have to look up shards that are not present, since the file size will be 0. This saves lookups on all the non-existent shards, which is quite useful when preallocating a big VM. Also, if the file is already present and the call extends it to a bigger size, we need not look up the non-existent shards: just look up the preexisting shards, populate their inodes, and issue mknod for the extended size. Fixes: #1425 Change-Id: I60036fe8302c696e0ca80ff11ab0ef5bcdbd7880 Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
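The arithmetic behind the optimisation, as a sketch: blocks that cover the current file size may exist and need a lookup, while blocks beyond it can go straight to mknod. `plan_prealloc()` is a hypothetical helper. It counts every block uniformly and glosses over how the shard translator actually stores the first block.

```c
#include <stdint.h>

/* Number of block_size blocks needed to cover size bytes. */
static uint64_t blocks_for(uint64_t size, uint64_t block_size)
{
    return (size + block_size - 1) / block_size;
}

/* When fallocate grows a file from cur_size to new_size, only the
 * blocks covering cur_size may already exist (lookup needed); the
 * rest can be created directly. */
static void plan_prealloc(uint64_t cur_size, uint64_t new_size,
                          uint64_t block_size,
                          uint64_t *n_lookup, uint64_t *n_mknod)
{
    uint64_t have = blocks_for(cur_size, block_size);
    uint64_t need = blocks_for(new_size, block_size);

    *n_lookup = have;
    *n_mknod = need > have ? need - have : 0;
}
```

For a fresh 1TB file with 64MB blocks this gives 0 lookups and 2^40 / 2^26 = 16384 creations, matching the "~16k shards" above.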
*	afr/split-brain: fix client side split-brain resolution when quorum is enabled	Mohammed Rafi KC	2020-07-29	1	-0/+124
	Problem: if the favourite-child policy is set, automatic split-brain resolution should work in all cases. This was failing when the quorum count was set to a non-zero value: the initial lookup before the read transaction failed with ENOTCONN, because there was no readable subvolume and we only looked at the split-brain resolution choice set through the CLI command. Fix: we now consider the favourite-child policy if no split-brain choice has been set via the CLI command. Change-Id: Id2016c3a90d0763ac6f1a0131571053f595576f0 Fixes: #1404 Signed-off-by: Mohammed Rafi KC <rafi.kavungal@iternity.com>
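The precedence described by the fix can be sketched as a small selection function. `pick_split_brain_source()` is hypothetical, and it encodes "unset" as -1, which is a convention of this sketch only.

```c
#include <errno.h>

/* Hypothetical source selection when no subvolume is readable due to
 * split-brain. Both choices are subvolume indices, or -1 when unset
 * (or when the policy could not pick one). */
static int pick_split_brain_source(int cli_choice, int policy_choice)
{
    if (cli_choice >= 0)       /* explicit CLI choice wins */
        return cli_choice;
    if (policy_choice >= 0)    /* otherwise, favourite-child policy */
        return policy_choice;
    return -ENOTCONN;          /* nothing readable */
}
```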
*	tests: Fix regression failures of 01-georep-glusterd-tests.t	Shwetha K Acharya	2020-08-03	1	-1/+2
	The step TEST $GEOREP_CLI $master $slave1 create push-pem force times out on CentOS 7 builders. Increase GEO_REP_TIMEOUT and SCRIPT_TIMEOUT to address this. Fixes: #1410 Change-Id: I81b5590e33f40ea4210cc56d18e2b9fa34033cd8 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	test: ./tests/features/ssl-ciphers.t fail on centos 8	Mohit Agrawal	2020-07-30	1	-2/+9
	Check the tlsv1 openssl connection based on the openssl version: if the openssl version is 1.1, it supports the tls1 protocol; otherwise it supports the tlsv1_2 protocol. Fixes: #1403 Change-Id: I3ca286492049e6f84de70e3b969fa41db10378ab Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	glusterd: getspec() returns wrong response when volfile not found	Tamar Shacked	2020-07-21	1	-0/+3
	In a cluster environment, getspec() detects that the volfile is not found, but further on this return code is overwritten by another call, so the error is lost and not handled. As a result, the server responds with an ambiguous message, {op_ret = -1, op_errno = 0..}, which causes the client to get stuck. Fix (server side): don't override the failure error. fixes: #1375 Change-Id: Id394954d4d0746570c1ee7d98969649c305c6b0d Signed-off-by: Tamar Shacked <tshacked@redhat.com>
*	dht - fixing xattr inconsistency	Barak Sason Rofman	2020-07-07	1	-0/+54
	Setting an xattr on a directory, killing one of the bricks, removing the xattr, and bringing the brick back results in an xattr inconsistency: the brick that was down still has the xattr, but the rest don't. This patch adds a mechanism that removes the extra xattrs during lookup. It is a modification of a previous patch, based on comments that were made after that patch was merged: https://review.gluster.org/#/c/glusterfs/+/24613/ fixes: #1324 Change-Id: Ifec0b7aea6cd40daa8b0319b881191cf83e031d1 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
*	tests/features/interrupt.t: fixes	Csaba Henk	2020-07-08	1	-5/+9
	- Modify the patterns we grep the logs for so that they don't match themselves. The test runner inserts the invocation of the test cases into the log, so the patterns occur in the logs verbatim. If a pattern matches itself, the test case is moot (it always reports success).
	- Invoke the test utility (open-and-sleep) on unique paths so that the file at the passed path is created on each invocation. The kernel does not send an interrupt if the file already exists. (This was masked by the above mistake in result evaluation.)
	- Modify the pattern we grep the log for in the test case where interrupt handling is expected, so that it asserts that the interrupt was handled. (So far we did not exclude the possibility of the interrupt being triggered but not handled due to a race; however, it seems that this theoretical race does not have the potential to prevent interrupt handling. And if this ever changes in the future, we'd rather be notified about it.)
	Change-Id: I606da2b4064c1ecc4781c7dfdefed95a433478ce Updates: #1374 Signed-off-by: Csaba Henk <csaba@redhat.com>
*	dht: Heal missing dir entry on brick in revalidate path	Susant Palai	2020-06-23	1	-0/+33
	Mark the directory as missing in the layout structure so that it is healed in dht_selfheal_directory. fixes: #1327 Change-Id: If2c69294bd8107c26624cfe220f008bc3b952a4e Signed-off-by: Susant Palai <spalai@redhat.com>
*	tests: added volume operations to increase code coverage	nik-redhat	2020-05-26	5	-14/+144
	Added tests for volume options such as localtime-logging, fixed the enable-shared-storage test to include function coverage, and added a few negative tests for other volume options, to increase code coverage in the glusterd component. Change-Id: Ib1706c1fd5bc98a64dcb5c8b15a121d639a597d7 Updates: #1052 Signed-off-by: nik-redhat <nladha@redhat.com>
*	Revert "dht - fixing xattr inconsistency"	Barak Sason Rofman	2020-06-25	1	-54/+0
	This reverts commit 620158475f462251c996901a8e24306ef6cb4c42. The patch to revert is https://review.gluster.org/#/c/glusterfs/+/24613/ Reverting is required because comments suggesting a more efficient implementation were made after the patch was merged. A new patch addressing the comments will be posted. updates: #1324 Change-Id: I59205baefe1cada033c736d41ce9c51b21727d3f Signed-off-by: Barak Sason Rofman <redhat@gmail.com>
*	dht - fixing xattr inconsistency	Barak Sason Rofman	2020-06-21	1	-0/+54
	Setting an xattr on a directory, killing one of the bricks, removing the xattr, and bringing the brick back results in an xattr inconsistency: the brick that was down still has the xattr, but the rest don't. This patch adds a mechanism that removes the extra xattrs during lookup. fixes: #1324 Change-Id: Ibcc449bad6c7cb46bcae380e42e4496d733b453d Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
*	glusterd: add-brick command failure	Sanju Rakonde	2020-06-16	2	-3/+48
	Problem: the add-brick operation fails when the replica or disperse count is not mentioned in the add-brick command. Reason: since commit a113d93, we check the brick order during the add-brick operation for replica and disperse volumes. If the replica count or disperse count is not mentioned in the command, the dict get fails, resulting in the add-brick failure. fixes: #1306 Change-Id: Ie957540e303bfb5f2d69015661a60d7e72557353 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	mount/fuse: use cookies to get fuse-interrupt-record instead of xdata	Pranith Kumar K	2020-06-17	1	-1/+0
	Problem: on executing tests/features/flock_interrupt.t, the following error log appears: [2020-06-16 11:51:54.631072 +0000] E [fuse-bridge.c:4791:fuse_setlk_interrupt_handler_cbk] 0-glusterfs-fuse: interrupt record not found. This happens because the fuse-interrupt-record is never sent on the wire by the getxattr fop, and there is no guarantee that it will be available in the cbk in case of failures. Fix: wind the getxattr fop with the fuse-interrupt-record as the cookie and recover it in the cbk. Fixes: #1310 Change-Id: I4cfff154321a449114fc26e9440db0f08e5c7daa Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests/glusterd: spurious failure of tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t	Sanju Rakonde	2020-05-29	1	-3/+15
	Test Summary Report
	-------------------
	tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t (Wstat: 0 Tests: 23 Failed: 3)
	  Failed tests: 21-23
	After a glusterd restart, volume start fails. It looks like glusterd needs some time to sync the data, so add a sleep for that. Note: all other changes are made to avoid spurious failures in the future. fixes: #1272 Change-Id: Ib184757fb936e03b5b6208465e44a8e790b71c1c Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	afr: more quorum checks in lookup and new entry marking	Ravishankar N	2020-05-27	1	-2/+0
	Problem: see the GitHub issue for details. Fix: - In lookup, if the entry exists on 2 out of 3 bricks, don't fail the lookup with ENOENT just because there is an entrylk on the parent; consider quorum before deciding. - If an entry FOP does not succeed on a quorum number of bricks, do not perform new-entry marking. Fixes: #1303 Change-Id: I56df8c89ad53b29fa450c7930a7b7ccec9f4a6c5 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
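Both rules above reduce to asking whether a count of bricks constitutes quorum. This sketch assumes a simple strict majority. AFR's quorum-type and quorum-count options allow other rules, for example for even replica counts, so treat `has_quorum()` as hypothetical.

```c
#include <stdbool.h>

/* Strict-majority quorum (an assumption for this sketch).
 * Used two ways per the fix above:
 *  - lookup: an entry present on a quorum of bricks is reported as
 *    existing even if the parent is entry-locked;
 *  - new-entry marking: only mark when the entry fop reached quorum. */
static bool has_quorum(int successes, int child_count)
{
    return successes > child_count / 2;
}
```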
*	Indicate timezone offsets in timestamps	Csaba Henk	2020-03-12	1	-4/+4
	Logs and other output carrying timestamps will now have timezone offsets indicated, e.g.: [2020-03-12 07:01:05.584482 +0000] I [MSGID: 106143] [glusterd-pmap.c:388:pmap_registry_remove] 0-pmap: removing brick (null) on port 49153
	To this end,
	- gf_time_fmt() now inserts the timezone offset via the %z strftime(3) template.
	- A new utility function has been added, gf_time_fmt_tv(), that takes a struct timeval pointer (tv) instead of a time_t value to specify the time. If tv->tv_usec is negative, gf_time_fmt_tv(... tv ...) is equivalent to gf_time_fmt(... tv->tv_sec ...). Otherwise it also inserts tv->tv_usec into the formatted string.
	- Building timestamps of usec precision has been converted to gf_time_fmt_tv. This is necessary because appending a period and the usec value to the end of the timestamp does not work if the timestamp has a zone offset, and it is also beneficial in terms of eliminating repetition.
	- The buffer passed to gf_time_fmt/gf_time_fmt_tv has been unified to be of GF_TIMESTR_SIZE size (256). We need slightly larger buffer space to accommodate the zone offset, and it's preferable to use a buffer which is undisputedly large enough.
	This change does not do the following:
	- Retain a method of timestamp creation without timezone offset. As I understand it, we don't need such backward compatibility, as the code just emits timestamps to logs and other diagnostic texts and doesn't do any later processing on them that would rely on their format. An exception to this, i.e. a case where a timestamp is built for internal use, is graph.c:fill_uuid(). As far as I can see, what matters in that case is the uniqueness of the produced string, not the format.
	- Implement a single-token (space-free) timestamp format. While some timestamp formats used to be single-token, now all of them will include a space preceding the offset indicator. Again, I did not see a use case where this could be significant in terms of representation.
	- Move the codebase to a single unified timestamp format and drop the fmt argument of gf_time_fmt/gf_time_fmt_tv. While the gf_timefmt_FT format is almost ubiquitous, there are a few cases where different formats are used. I'm not convinced there is any reason not to use gf_timefmt_FT in those cases too, but I did not want to make a decision in this regard.
	Change-Id: I0af73ab5d490cca7ed8d07a2ce7ac22a6df2920a Updates: #837 Signed-off-by: Csaba Henk <csaba@redhat.com>
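A standalone sketch of why microseconds can no longer simply be appended: with a %z offset at the end, the fractional part has to be inserted before the offset. `fmt_timeval_utc()` is a hypothetical helper in the spirit of gf_time_fmt_tv(). It uses UTC via gmtime_r() so the output is reproducible, and it is not GlusterFS's implementation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for gmtime_r() */
#endif
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Format tv as "YYYY-MM-DD HH:MM:SS[.uuuuuu] +hhmm" in UTC.
 * A negative tv_usec means seconds precision only, mirroring the
 * gf_time_fmt_tv() convention described above. */
static int fmt_timeval_utc(char *dst, size_t len, const struct timeval *tv)
{
    struct tm tm;
    char date[32], zone[8];
    time_t sec = tv->tv_sec;

    if (gmtime_r(&sec, &tm) == NULL)
        return -1;
    if (strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", &tm) == 0 ||
        strftime(zone, sizeof zone, "%z", &tm) == 0)
        return -1;

    if (tv->tv_usec < 0)
        return snprintf(dst, len, "%s %s", date, zone);
    /* usec goes between the seconds and the zone offset */
    return snprintf(dst, len, "%s.%06ld %s", date, (long)tv->tv_usec, zone);
}
```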
*	features/shard: Use fd lookup post file open	Vinayakswami Hariharmath	2020-06-03	1	-0/+34
	Issue: when a process has an open fd and the same file is unlinked in the middle of operations, a file-based lookup fails with ENOENT or a stale file handle error. Solution: when the file is already open and an fd is available, use fstat to get the file attributes. Change-Id: I0e83aee9f11b616dcfe13769ebfcda6742e4e0f4 Fixes: #1281 Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
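The POSIX behaviour this fix relies on can be demonstrated on a local file system: after unlink(), a path-based stat() fails with ENOENT, while fstat() on a descriptor opened earlier still succeeds. `fstat_survives_unlink()` is a hypothetical demo function, not shard code, and the test assumes /tmp is writable.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if, after unlinking an open file, stat() by path fails
 * with ENOENT but fstat() on the still-open fd succeeds. */
static int fstat_survives_unlink(const char *path)
{
    struct stat st;
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int ok = (unlink(path) == 0) &&
             (stat(path, &st) != 0 && errno == ENOENT) &&
             (fstat(fd, &st) == 0);
    close(fd);
    return ok ? 0 : -1;
}
```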
*	test: Test case brick-mux-validation-in-cluster.t is failing on RHEL-8	Mohit Agrawal	2020-06-09	1	-3/+1
	Brick processes are not properly attached on a cluster node when some volume options are changed on a peer node while glusterd is down on that specific node. Solution: when glusterd restarts, it receives a friend update request from a peer node if the peer has changes to a volume. If the brick process is started before the friend update request is received, brick_mux does not behave properly: all bricks are attached to the same process even though the volume options are not the same. To avoid the issue, introduce an atomic flag, volpeerupdate, and set it when glusterd has received a friend update request from a peer for a specific volume. If the volpeerupdate flag is 1, the volume is started by the glusterd_import_friend_volume synctask. Change-Id: I4c026f1e7807ded249153670e6967a2be8d22cb7 Credit: Sanju Rakaonde <srakonde@redhat.com> fixes: #1290 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	cluster/afr: Delay post-op for fsync	Pranith Kumar K	2020-05-29	3	-0/+175
	Problem: AFR doesn't delay the post-op for the fsync fop. For fsync-heavy workloads, this leads to an unnecessary fxattrop/finodelk for every fsync, resulting in bad performance. Fix: use delayed post-op for fsync. Add a special flag in xdata to indicate that AFR shouldn't delay the post-op in cases where either the process will terminate or a graph-switch will happen; otherwise it leads to unnecessary heals when the graph-switch/process-termination happens before the delayed post-op completes. Fixes: #1253 Change-Id: I531940d13269a111c49e0510d49514dc169f4577 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Prioritize ENOSPC over other errors	karthik-us	2020-05-21	1	-0/+80
	Problem: in a replicate/arbiter volume, if file creations or writes fail on a quorum number of bricks, and on one brick the failure is due to ENOSPC while on another brick it is due to a different reason, the fop may fail with an error other than ENOSPC in some cases. Fix: prioritize ENOSPC over other, lower-priority errors, and do not set op_errno in posix_gfid_set if op_ret is 0, to avoid returning an errno that could be misinterpreted by __afr_dir_write_finalize(). Also remove the function afr_has_arbiter_fop_cbk_quorum(), which might consider a successful reply from a single brick as quorum success in some cases, whereas in an arbiter configuration we always need the fop to succeed on a quorum number of bricks. Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08 Fixes: #1254 Signed-off-by: karthik-us <ksubrahm@redhat.com>
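A sketch of folding per-brick errors into one reported errno, ranking ENOSPC above every other error so that the application sees "no space left on device". The two-level ranking in `errno_rank()` and the helper names are hypothetical simplifications, not AFR's actual errno-priority code.

```c
#include <errno.h>

/* Hypothetical ranking: success < any other error < ENOSPC. */
static int errno_rank(int err)
{
    switch (err) {
    case 0:
        return 0;
    case ENOSPC:
        return 2;
    default:
        return 1;
    }
}

/* Pick the errno to report from the per-brick results; among errors
 * of equal rank, the first one seen is kept. */
static int pick_reported_errno(const int *errs, int n)
{
    int best = 0;
    for (int i = 0; i < n; i++)
        if (errno_rank(errs[i]) > errno_rank(best))
            best = errs[i];
    return best;
}
```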
*	open-behind: rewrite of internal logic	Xavi Hernandez	2020-05-12	5	-0/+872
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a critical flaw in the previous implementation of open-behind. When an open is done in the background, it's necessary to take a reference on the fd_t object, because once we "fake" the open answer, the fd could be destroyed. However, as long as there's a reference, the release function won't be called. So, if the application closes the file descriptor without having actually opened it, at least 1 reference will always remain, causing a leak. To avoid this problem, the previous implementation didn't take a reference on the fd_t, so there were races where the fd could be destroyed while it was still in use. To fix this, I've implemented a new xlator cbk that gets called from fuse when the application closes a file descriptor. The whole logic of handling background opens has been simplified, and it's more efficient now. A stub is created only if the fop needs to be delayed until an open completes; otherwise, no memory allocations are needed. Correctly handling the close request while the open is still pending has added a bit of complexity, but overall, normal operation is simpler. Change-Id: I6376a5491368e0e1c283cc452849032636261592 Fixes: #1225 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
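The reference and stub handling described above can be modeled as a small state machine. This is a heavily simplified, hypothetical model, not the open-behind xlator's code: `struct ob_fd_sketch` and the `ob_*` functions are illustrative names, and real stubs, fd_t refcounts and locking are reduced to counters and booleans. It shows the three points from the commit: a reference is held while the open is deferred, a stub is queued only while a real open is in flight, and a close from fuse drops the reference (or defers dropping it until a pending open finishes).

```c
#include <stdbool.h>

/* Hypothetical states of a background-opened fd. */
enum ob_state { OB_NOT_OPENED, OB_OPENING, OB_OPENED };

struct ob_fd_sketch {
    enum ob_state state;
    bool holds_ref;     /* open-behind's own reference on the fd */
    unsigned queued;    /* stubs waiting for the real open to complete */
    bool close_pending; /* app closed the fd while the open was in flight */
};

/* Fake the open answer and keep the fd alive with a reference. */
static void ob_fake_open(struct ob_fd_sketch *fd)
{
    fd->state = OB_NOT_OPENED;
    fd->holds_ref = true;
    fd->queued = 0;
    fd->close_pending = false;
}

/* A fop that needs the real open. Returns true if it had to be queued
 * (a stub is created); false means it passes through with no allocation. */
static bool ob_fop_needs_open(struct ob_fd_sketch *fd)
{
    if (fd->state == OB_OPENED)
        return false;
    fd->state = OB_OPENING; /* trigger the real open if not started yet */
    fd->queued++;
    return true;
}

/* The real open finished: resume queued stubs and honor a deferred close.
 * Returns the number of stubs resumed. */
static unsigned ob_open_done(struct ob_fd_sketch *fd)
{
    unsigned resumed = fd->queued;

    fd->state = OB_OPENED;
    fd->queued = 0;
    if (fd->close_pending) {
        fd->close_pending = false;
        fd->holds_ref = false;
    }
    return resumed;
}

/* Close notification from fuse: without it, a never-opened fd would keep
 * its reference forever (the leak described above). */
static void ob_close_cbk(struct ob_fd_sketch *fd)
{
    if (fd->state == OB_OPENING) {
        fd->close_pending = true; /* drop the ref once the open completes */
        return;
    }
    fd->holds_ref = false;
}
```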
*	io-cache,quick-read: deprecate volume options with flawed semantics or naming	Csaba Henk	2020-05-14	4	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- performance.cache-size has flawed semantics, as it's dispatched to two independent translators, io-cache and quick-read. - performance.qr-cache-timeout has a confusing name, as other options affecting quick-read have an unabbreviated "quick-read-..." prefix in their names. These options keep working unchanged, but the help output indicates that they are deprecated. The following better alternatives are introduced: - performance.io-cache-size to tune the cache-size option of io-cache - performance.quick-read-cache-size to tune the cache-size option of quick-read - performance.quick-read-cache-timeout as a preferred synonym for performance.qr-cache-timeout Fixes: #952 Change-Id: Ibd04fb638de8cac450ba992ad8a415154f9f4281 Signed-off-by: Csaba Henk <csaba@redhat.com>
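A minimal sketch of how a deprecated-option table could drive help-output hints. The option names come from the commit message, but the table, the hint wording and `deprecation_hint()` are hypothetical; glusterd's actual option tables and help text are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a deprecated option to a help-output hint. */
struct deprecated_option {
    const char *name;
    const char *hint;
};

static const struct deprecated_option deprecated_options[] = {
    {"performance.cache-size",
     "use performance.io-cache-size and/or performance.quick-read-cache-size"},
    {"performance.qr-cache-timeout",
     "use performance.quick-read-cache-timeout"},
};

/* Return the deprecation hint for an option, or NULL if it is not deprecated.
 * Deprecated options still work; only the help output changes. */
static const char *deprecation_hint(const char *option)
{
    size_t n = sizeof(deprecated_options) / sizeof(deprecated_options[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(option, deprecated_options[i].name) == 0)
            return deprecated_options[i].hint;
    }
    return NULL;
}
```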
*	dht - sparse files rebalance enhancements	Barak Sason Rofman	2020-05-06	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, data migration in rebalance reads sparse files sequentially, disregarding which segments are holes and which are data. This can lead to extremely long migration times for large sparse files. The data migration mechanism needs to be enhanced so that only data segments are read and migrated. This can be achieved by using lseek to seek for holes and data in the file. This enhancement is a consequence of https://bugzilla.redhat.com/show_bug.cgi?id=1823703 fixes: #1222 Change-Id: If5f448a0c532926464e1f34f504c5c94749b08c3 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
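The lseek-based approach can be illustrated with `SEEK_DATA`/`SEEK_HOLE` on a local file. This is a standalone sketch, not the DHT rebalance code: `copy_sparse()` and `copy_sparse_file()` are hypothetical helpers, and it assumes a POSIX system that defines `SEEK_DATA` (Linux with glibc exposes it under `_GNU_SOURCE`). On a filesystem without hole support, the whole file simply reads as one data segment, so the copy stays correct.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy only the data segments of src into dst, skipping holes.
 * Returns 0 on success, -1 on error (errno set). */
static int copy_sparse(int src, int dst)
{
    struct stat st;
    char buf[64 * 1024];
    off_t data, hole;

    if (fstat(src, &st) < 0)
        return -1;
    /* Size dst up front so holes (including a trailing one) are preserved. */
    if (ftruncate(dst, st.st_size) < 0)
        return -1;

    for (data = 0; data < st.st_size; data = hole) {
        data = lseek(src, data, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? 0 : -1; /* ENXIO: no data past this offset */
        hole = lseek(src, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        /* Copy the data segment [data, hole) in buffer-sized chunks. */
        for (off_t off = data; off < hole;) {
            size_t want = (size_t)(hole - off) < sizeof(buf)
                              ? (size_t)(hole - off) : sizeof(buf);
            ssize_t n = pread(src, buf, want, off);
            if (n <= 0)
                return -1;
            for (ssize_t done = 0; done < n;) { /* handle short writes */
                ssize_t w = pwrite(dst, buf + done, (size_t)(n - done), off + done);
                if (w < 0)
                    return -1;
                done += w;
            }
            off += n;
        }
    }
    return 0;
}

/* Convenience wrapper taking paths. */
static int copy_sparse_file(const char *src_path, const char *dst_path)
{
    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) {
        close(src);
        return -1;
    }
    int ret = copy_sparse(src, dst);
    close(src);
    close(dst);
    return ret;
}
```

Sizing the destination with `ftruncate()` first means skipped regions read back as zeros without being written, which is what avoids reading and transferring the holes.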
*	features/shard: Aggregate file size, block-count before unwinding removexattr	Krutika Dhananjay	2020-05-22	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	The posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops. These iatts are further cached at layers like md-cache. The shard translator, in its current state, simply returns these values without updating the aggregated file size and block-count. This patch fixes this problem. Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872 Fixes: #1243 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	features/shard: Aggregate size, block-count in iatt before unwinding setxattr	Krutika Dhananjay	2020-05-15	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	The posix translator returns pre and postbufs in the dict in {F}SETXATTR fops. These iatts are further cached at layers like md-cache. The shard translator, in its current state, simply returns these values without updating the aggregated file size and block-count. This patch fixes this problem. Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea Fixes: #1243 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	mgmt/glusterd: Stop old shd before increasing replica count	Pranith Kumar K	2020-05-13	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In an add-brick that increases the replica count, SHD was restarted after the pending xattrs were set on the new bricks and the bricks were added. But before SHD restarts, the old SHD may scan the root directory, see that no heal is needed, and delete the index for root-dir, leading to no heals until a lookup is executed on the mount. Fix: Stop shd, set the pending xattrs/add the new bricks, and then restart shd. Fixes: #1240 Change-Id: I94fd7c6c909211b597185dfe097a559db6c0d00f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests: Disable client-heals	Pranith Kumar K	2020-05-15	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: ok 32 [ 11/ 9] < 46> 'gf_rm_file_and_gfid_link /d/backends/patchy0 del-file' not ok 33 [ 13/ 131] < 48> '! dd if=/dev/zero of=/mnt/glusterfs/0/del-file bs=1M count=1 oflag=direct' -> '' The test above assumes that the file doesn't exist when dd runs. But in some cases heal can recreate the file, causing spurious failures. Fix: Disable client-side heal. Fixes: #1245 Change-Id: I96b2b45528f9dfb3199d503a467cafafba9b387f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>