Commit message | Author | Age | Files | Lines
...
* iobuf: use lists instead of iobufs in iobuf_arena struct (#2097) | Yaniv Kaul | 2021-02-16 | 3 | -17/+15
  We only need passive and active lists; there's no need for a full iobuf variable. Also ensured passive_list is placed before active_list, as it is always accessed first. Note: this almost brings us to using only 2 cachelines for that structure. We can easily make other variables smaller (page_size could be 4 bytes) and fit exactly 2 cache lines.
  Fixes: #2096
  Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
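The layout idea above can be sketched with a hypothetical struct (not the real GlusterFS definitions): two embedded list heads replace the full iobuf members, with passive_list deliberately first because it is always accessed first.

```c
#include <stddef.h>

/* Illustrative sketch only: names and layout are assumptions, not the
 * patched GlusterFS code. */
struct list_head_sketch {
    struct list_head_sketch *next;
    struct list_head_sketch *prev;
};

struct iobuf_arena_sketch {
    struct list_head_sketch passive_list; /* accessed first, so placed first */
    struct list_head_sketch active_list;
    size_t page_size; /* could shrink to 4 bytes to pack tighter */
};
```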
* Remove tests from components that are no longer in the tree (#2160) | Pranith Kumar Karampuri | 2021-02-13 | 10 | -294/+0
  fixes: #2159
  Change-Id: Ibaaebc48b803ca6ad4335c11818c0c71a13e9f07
  Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* mount: optimize parameter backup-volfile-servers (#2043) | chenglin130 | 2021-02-11 | 3 | -2/+51
  Optimize the backup-volfile-servers parameter to support IPv6 addresses.
  Fixes: #2042
  Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
* String not null terminated (#2112) | nishith-vihar | 2021-02-11 | 1 | -5/+8
  CID: 1214629, 1274235, 1430115, 1437648
  A null character is added at the end of the buffer, which corrects the issue.
  Change-Id: I8f7016520ffd41b2c68fe3c7f053e0e04f306c84
  Updates: #1060
  Signed-off-by: Nishith Vihar Sakinala <nsakinal@redhat.com>
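The usual shape of this class of Coverity fix can be sketched as a bounded copy that always NUL-terminates, since strncpy() alone leaves the buffer unterminated when the source fills it completely. The function name is illustrative, not the patched code:

```c
#include <string.h>

/* Bounded copy that guarantees a terminating NUL byte. */
static char *copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return dst;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0'; /* the explicitly added null character */
    return dst;
}
```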
* posix: fix chmod error on symlinks (#2155) | Xavi Hernandez | 2021-02-11 | 4 | -5/+19
  After glibc 2.32, lchmod() returns EOPNOTSUPP instead of ENOSYS when called on symlinks. The man page says that the returned code is ENOTSUP. They are the same on Linux, but this patch correctly handles all of these errors.
  Fixes: #2154
  Change-Id: Ib3bb3d86d421cba3d7ec8d66b6beb131ef6e0925
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
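A minimal sketch of the errno handling described above: treat ENOSYS (older glibc), EOPNOTSUPP (glibc >= 2.32) and ENOTSUP (the man-page value) as the same "lchmod not supported here" condition. On Linux, ENOTSUP and EOPNOTSUPP have the same value, so an if-chain is used rather than a switch (duplicate case labels would not compile). The helper name is an assumption, not the actual patch:

```c
#include <errno.h>

/* Return non-zero when err means "lchmod is not supported on this path". */
static int lchmod_unsupported(int err)
{
    return err == ENOSYS || err == EOPNOTSUPP || err == ENOTSUP;
}
```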
* glusterd: fix for starting brick on new port (#2090) | Nikhil Ladha | 2021-02-10 | 2 | -16/+22
  The errno set by the runner code was not correct when bind() failed to assign an already occupied port in __socket_server_bind().
  Fix: Updated the code to return the correct errno from __socket_server_bind() if bind() fails due to EADDRINUSE, and use the errno returned from runner_run() to retry allocating a new port for the brick process.
  Fixes: #1101
  Change-Id: If124337f41344a04f050754e402490529ef4ecdc
  Signed-off-by: nik-redhat <nladha@redhat.com>
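The pattern the fix describes can be sketched as a bind helper that returns the real errno (negated) instead of a bare -1, so the caller can retry with a new port only on EADDRINUSE. The function name is illustrative; this is not the GlusterFS socket code:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket on loopback. Returns the fd on success, -errno on
 * failure; optionally reports the kernel-chosen port via *bound_port. */
static int bind_tcp(uint16_t port, uint16_t *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int err = errno; /* preserve the real error ... */
        close(fd);
        return -err;     /* ... and hand it back, not a generic -1 */
    }
    if (bound_port) {
        getsockname(fd, (struct sockaddr *)&addr, &len);
        *bound_port = ntohs(addr.sin_port);
    }
    return fd;
}
```

A caller can then retry on `-EADDRINUSE` specifically, instead of guessing why the bind failed.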
* Glustereventsd Default port change (#2091) | schaffung | 2021-02-10 | 3 | -3/+3
  Issue: The default port of glustereventsd is currently 24009, which is preventing glustereventsd from binding to the UDP port due to selinux policies.
  Fix: Change the default port to something in the ephemeral range.
  Fixes: #2080
  Change-Id: Ibdc87f83f82f69660dca95d6d14b226e10d8bd33
  Signed-off-by: srijan-sivakumar <ssivakum@redhat.com>
* gfapi: avoid crash while logging message. (#2139) | Rinku Kothiya | 2021-02-09 | 1 | -1/+1
  Break the parameter into two different parameters to avoid a crash.
  fixes: #2138
  Change-Id: Idd5f3631488c1d892748f83e6847fb6fd2d0802a
  Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* stack.h/c: remove unused variable and reorder struct | Yaniv Kaul | 2021-02-08 | 3 | -17/+6
  - Removed the unused ref_count variable
  - Reordered the struct to bring related variables closer together
  - Changed 'complete' from a '_Bool' to an 'int32_t'

  Before:
  ```
  struct _call_frame {
      call_stack_t *    root;        /*   0   8 */
      call_frame_t *    parent;      /*   8   8 */
      struct list_head  frames;      /*  16  16 */
      void *            local;       /*  32   8 */
      xlator_t *        this;        /*  40   8 */
      ret_fn_t          ret;         /*  48   8 */
      int32_t           ref_count;   /*  56   4 */

      /* XXX 4 bytes hole, try to pack */
      /* --- cacheline 1 boundary (64 bytes) --- */
      gf_lock_t         lock;        /*  64  40 */
      void *            cookie;      /* 104   8 */
      _Bool             complete;    /* 112   1 */

      /* XXX 3 bytes hole, try to pack */
      glusterfs_fop_t   op;          /* 116   4 */
      struct timespec   begin;       /* 120  16 */
      /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
      struct timespec   end;         /* 136  16 */
      const char *      wind_from;   /* 152   8 */
      const char *      wind_to;     /* 160   8 */
      const char *      unwind_from; /* 168   8 */
      const char *      unwind_to;   /* 176   8 */

      /* size: 184, cachelines: 3, members: 17 */
      /* sum members: 177, holes: 2, sum holes: 7 */
      /* last cacheline: 56 bytes */
  ```

  After:
  ```
  struct _call_frame {
      call_stack_t *    root;        /*   0   8 */
      call_frame_t *    parent;      /*   8   8 */
      struct list_head  frames;      /*  16  16 */
      struct timespec   begin;       /*  32  16 */
      struct timespec   end;         /*  48  16 */
      /* --- cacheline 1 boundary (64 bytes) --- */
      void *            local;       /*  64   8 */
      gf_lock_t         lock;        /*  72  40 */
      void *            cookie;      /* 112   8 */
      xlator_t *        this;        /* 120   8 */
      /* --- cacheline 2 boundary (128 bytes) --- */
      ret_fn_t          ret;         /* 128   8 */
      glusterfs_fop_t   op;          /* 136   4 */
      int32_t           complete;    /* 140   4 */
      const char *      wind_from;   /* 144   8 */
      const char *      wind_to;     /* 152   8 */
      const char *      unwind_from; /* 160   8 */
      const char *      unwind_to;   /* 168   8 */

      /* size: 176, cachelines: 3, members: 16 */
      /* last cacheline: 48 bytes */
  ```

  Fixes: #2130
  Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* tests: Handle nanosecond duration in profile info (#2135) | Pranith Kumar Karampuri | 2021-02-08 | 4 | -5/+5
  Problem: volume profile info now prints durations in nanoseconds. The tests were written when the duration was printed in microseconds. This leads to spurious failures.
  Fix: Change the tests to handle nanosecond durations.
  fixes: #2134
  Change-Id: I94722be87000a485d98c8b0f6d8b7e1a526b07e7
  Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* cluster/ec: Change self-heal-window-size to 4MiB by default (#2071) | Xavi Hernandez | 2021-02-06 | 1 | -1/+1
  The current block size used for self-heal by default is 128 KiB. This requires a significant amount of management requests for a very small portion of data healed. With this patch the block size is increased to 4 MiB. For a standard EC volume configuration of 4+2, this means that each healed block of a file will update 1 MiB on each brick.
  Change-Id: Ifeec4a2d54988017d038085720513c121b03445b
  Updates: #2067
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* introduce microsleep to improve sleep precision (#2104) | renlei4 | 2021-02-06 | 4 | -3/+33
  * syncop: introduce microsecond sleep support
  Introduce the microsecond sleep function synctask_usleep, which can be used instead of synctask_sleep to improve precision.
  Change-Id: Ie7a15dda4afc09828bfbee13cb8683713d7902de
  * glusterd: use synctask_usleep in glusterd_proc_stop()
  glusterd_proc_stop() sleeps for 1s before force-killing a process, but in most cases the process stops within 100ms. This patch uses synctask_usleep to check the proc's running state every 100ms instead of sleeping for 1s, which can reduce the stop time by up to 1s. In some cases, like enabling quota on 100 volumes, the average execution time dropped from 2500ms to 500ms.
  fixes: #2116
  Change-Id: I645e083076c205aa23b219abd0de652f7d95dca7
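The poll-every-100ms idea in glusterd_proc_stop() can be sketched as a generic wait loop. Everything here (names, predicate) is illustrative; nanosleep stands in for synctask_usleep:

```c
#include <time.h>

/* Poll still_running(ctx) every interval_ms until it returns false or
 * timeout_ms elapses. Returns 0 if the "process" stopped in time, -1 if
 * the caller should fall back to a force kill. */
static int wait_until_stopped(int (*still_running)(void *), void *ctx,
                              unsigned int interval_ms, unsigned int timeout_ms)
{
    struct timespec ts = { interval_ms / 1000,
                           (long)(interval_ms % 1000) * 1000000L };
    unsigned int waited = 0;
    while (still_running(ctx)) {
        if (waited >= timeout_ms)
            return -1; /* still running after the deadline */
        nanosleep(&ts, NULL);
        waited += interval_ms;
    }
    return 0; /* stopped early: no need to wait the full second */
}

/* tiny test helper: the fake "process" stops after N polls */
static int countdown_running(void *ctx)
{
    int *n = ctx;
    return (*n)-- > 0;
}
```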
* glusterd-volgen: Add functionality to accept any custom xlator (#1974) | Ryo Furuhashi | 2021-02-05 | 4 | -31/+343
  Add a new function which allows users to insert any custom xlator. It provides a way to add custom processing into file operations. Users can deploy a plugin (an xlator shared object) and integrate it into glusterfsd. To enable a custom xlator:
  1. put the xlator object (.so file) into "XLATOR_DIR/user/"
  2. set the option user.xlator.<xlator> to an existing xlator-name to specify its position in the graph
  3. restart the gluster volume
  Options for the custom xlator can be set as "user.xlator.<xlator>.<optkey>".
  Fixes: #1943
  Signed-off-by: Ryo Furuhashi <ryo.furuhashi.nh@hitachi.com>
  Co-authored-by: Yaniv Kaul <ykaul@redhat.com>
  Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
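The three steps above might look like the following in practice. This is a hypothetical walk-through: the volume name, xlator name, option key, and install path are all invented for illustration; only the `user.xlator.<xlator>` key format comes from the commit message:

```shell
# 1. deploy the plugin into the user xlator directory (path varies by distro)
cp ro_filter.so /usr/lib64/glusterfs/<version>/xlator/user/

# 2. anchor it in the graph by naming an existing xlator as its position
gluster volume set myvol user.xlator.ro_filter posix

# optional: pass an option to it via user.xlator.<xlator>.<optkey>
gluster volume set myvol user.xlator.ro_filter.log-level DEBUG

# 3. restart the volume so the new graph takes effect
gluster volume stop myvol && gluster volume start myvol
```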
* Fix comments format, remove 'is a folder' warning | Michael Scherer | 2021-02-04 | 2 | -2/+2
  // is for C and C++; shell uses #. Vim's syntax coloration is misleading. This was displayed in each Jenkins log:
  ./tests/00-geo-rep/../include.rc: line 1: //: is a folder
  Likely no impact besides the wrong warning.
  Fix #2093
* dht: don't parse decommissioned-bricks option when in decommission (#2088) | Tamar Shacked | 2021-02-04 | 1 | -3/+6
  Scenario:
  1) decommission start: the option decommissioned-bricks is added to the vol file and parsed by dht.
  2) another configuration change (like setting a new loglevel): the decommissioned-bricks option still exists in the vol file and is parsed again, which leads to invalid data.
  Fix: Prevent the parsing of "decommissioned-bricks" while a decommission is running. This relies on the fact that once a decommission is running it cannot be started again.
  Fixes: #1992
  Change-Id: I7a016750e2f865aee4cd620bd9033ec19421d47d
  Signed-off-by: Tamar Shacked <tshacked@redhat.com>
* cluster/dht: Allow fix-layout only on directories (#2109) | Pranith Kumar Karampuri | 2021-02-03 | 2 | -0/+37
  Problem: the fix-layout operation assumes that the path passed is a directory, i.e. layout->cnt == conf->subvolume_cnt. This leads to a crash when fix-layout is attempted on a file.
  Fix: Disallow fix-layout on files.
  fixes: #2107
  Change-Id: I2116b8773059f67e3260e9207e20eab3de711417
  Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* cluster/afr: Change default self-heal-window-size to 1MB (#2068) | Pranith Kumar Karampuri | 2021-02-03 | 2 | -3/+9
  At the moment self-heal-window-size is 128KB, which leads to healing data in 128KB chunks. With the growth of data and today's average file sizes, 1MB seems like a better default.
  Change-Id: I70c42c83b16c7adb53d6b5762969e878477efb5c
  Fixes: #2067
  Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* features/shard: delay unlink of a file that has fd_count > 0 (#1563) | Vinayak hariharmath | 2021-02-03 | 7 | -25/+391
  When multiple processes are working on a file and any one of them unlinks it, the unlink operation shouldn't harm the other processes working on it. This is POSIX-compliant behavior, and it should also be supported when the shard feature is enabled.
  Problem description: Consider 2 clients C1 and C2 working on a file F1 with 5 shards on a gluster mount, where the gluster server has 4 bricks B1, B2, B3, B4. Assume that the base file/shard is on B1, the 1st and 2nd shards on B2, the 3rd and 4th shards on B3, and the 5th shard on B4. C1 has opened F1 in append mode and is writing to it; the write FOP goes to the 5th shard, so inode->fd_count = 1 on B1 (base file) and B4 (5th shard). C2 at the same time issues an unlink of F1. On the server, the base file has fd_count = 1 (since C1 has the file open), so the base file is renamed under .glusterfs/unlink and the call returns to C2. Unlink is then sent to the shards on all bricks, and the shards on B2 and B3, which have no open references yet, are deleted. C1 then starts getting errors while accessing the remaining shards even though it holds open references to the file. This is one such undefined behavior; we would encounter many more, since we don't have one global lock to access all shards as one. Of course, having such a global lock would hurt performance, as it reduces the window for parallel access to the shards.
  Solution: The undefined behavior above can be addressed by delaying the unlink of a file while there are open references on it. File unlink happens in 2 steps:
  step 1: the client creates a marker file under .shard/remove_me and sends the unlink of the base file to the server
  step 2: on return from the server, the associated shards are cleaned up and finally the marker file is removed
  In step 2, the background deletion process does a nameless lookup, using the marker file name (the marker file is named after the gfid of the base file), in the .glusterfs/unlink dir. If the nameless lookup succeeds, the gfid still has open fds and the deletion of shards has to be delayed. If the nameless lookup fails, the gfid is unlinked and there are no open fds on that file (the gfid path is unlinked during the final close on the file). Shards whose deletion was delayed are unlinked once all open fds are closed; this is done by a thread which wakes up every 10 mins.
  Also removed active_fd_count from the inode structure, referring to fd_count wherever active_fd_count was used.
  fixes: #1358
  Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
  Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
  Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
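The two-step protocol above can be modeled with plain files under a scratch directory: the marker is named after the (fake) gfid, and the "nameless lookup" becomes an existence check in the unlink directory. Everything here (paths, names) is a toy illustration, not GlusterFS code:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* step 1: record the pending delete by creating a marker named after
 * the gfid of the base file */
static int create_marker(const char *dir, const char *gfid)
{
    char path[4096];
    snprintf(path, sizeof(path), "%s/%s", dir, gfid);
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* step 2: shards may be reclaimed only when the lookup for the gfid in
 * the unlink dir fails, i.e. no open fd pins the file any more */
static int shards_reclaimable(const char *unlink_dir, const char *gfid)
{
    char path[4096];
    snprintf(path, sizeof(path), "%s/%s", unlink_dir, gfid);
    return access(path, F_OK) != 0;
}
```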
* tests: 00-georep-verify-non-root-setup.t back to tests/00-geo-rep/ (#2102) | Shwetha Acharya | 2021-02-03 | 1 | -0/+0
  00-georep-verify-non-root-setup.t should be moved back to tests/00-geo-rep/ from the tests/000-flaky/ directory, as the recent failures on this test case were not linked to the test case itself; they were linked to the libtirpc installed in the build environment.
  Fixes: #2101
  Change-Id: I2b35e9ed95ad3de68ad8574ff76805f5db64c0b2
  Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* Move 00-georep-verify-non-root-setup.t to tests/000-flaky/ (#2087) | Shwetha Acharya | 2021-02-01 | 1 | -0/+0
  Spurious failures of 00-georep-verify-non-root-setup.t are seen only on the build machines. These failures are not reproducible on softserve / centos / fedora machines. So, moving 00-georep-verify-non-root-setup.t to tests/000-flaky/ until the issue is RCA'd on the build machines.
  Fixes: #2086
  Change-Id: Id1eab598fa0f9ba5ba019e6b3f057a5b10fdb0ea
  Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* features/shard: unlink fails due to nospace to mknod marker file | Vinayakswami Hariharmath | 2021-01-26 | 2 | -0/+76
  When we hit the max capacity of the storage space, shard_unlink() starts failing if there is no space left on the brick to create a marker file. shard_unlink() happens in the steps below:
  1. create a marker file named after the gfid of the base file under BRICK_PATH/.shard/.remove_me
  2. unlink the base file
  3. shard_delete_shards() deletes the shards in the background by picking up the entries in BRICK_PATH/.shard/.remove_me
  If the marker file creation fails, we can't really delete the shards, which is eventually a problem for a user who is trying to make space by deleting unwanted data.
  Solution: Create the marker file with xdata = GLUSTERFS_INTERNAL_FOP_KEY, which marks it as an internal op that is allowed to create files under the reserved space.
  Fixes: #2038
  Change-Id: I7facebab940f9aeee81d489df429e00ef4fb7c5d
  Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* Revert "skip the lock when refcount is not zero" (#2053) | Vinayak hariharmath | 2021-01-26 | 1 | -5/+8
  This reverts commit 50e953e2450b5183988c12e87bdfbc997e0ad8a8.
  Fixes: #2052
  Change-Id: Ic0670a63423b5d79c3d48001e18910b1dbf7e98d
* glusterd: do not allow changing storage.linux-aio for running volumeDmitry Antipov2021-01-221-10/+19
| | | | | | | | Do not allow changing storage.linux-aio for running volume, cleanup nearby storage.linux-io_uring error message as well. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Updates: #2039
* AFR - fixing coverity issue (Argument cannot be negative) (#2026)Barak Sason Rofman2021-01-222-2/+2
| | | | | | | | | CID 1430124 A negative value is being passed to a parameter that cannot be negative. Modified the value which is being passed. Change-Id: I06dca105f7a78ae16145b0876910851fb631e366 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* tests: fix tests/bugs/nfs/bug-1053579.t (#2034)Xavi Hernandez2021-01-221-23/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | * tests: fix tests/bugs/nfs/bug-1053579.t On NFS the number of groups associated to a user that can be passed to the server is limited. This test created a user with 200 groups and checked that a file owned by the latest created group couldn't be accessed, under the assumption that the last group won't be passed to the server. However there's no guarantee on how the list of groups is generated, so the latest created group could be passed as one of the initial groups, allowing access to the file and causing the test to fail (because it expected access to be denied). Given that there's no way to be sure which groups will be passed, this patch changes the test so that a check is done for each group the user belongs to. Then we check that there have been some successes and some failures. Once 'manage-gids' is set, we do the same, but this time the number of failures must be 0. Fixes: #2033 Change-Id: Ide06da2861fcade2166372d1f3e9eb4ff2dd5f58 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* posix - fix coverity issue (Unchecked return value)Barak Sason Rofman2021-01-212-2/+9
| | | | | | | | | | CID 1291733 The return value of the method pthread_cancel was not being checked. Added a return value check and proper error handling. Change-Id: I8c52b0e462461fc59718deb3b7c2f1b4e55613c7 updates: #1060 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* glusterd - fixing coverity issues (#1947)Barak Sason Rofman2021-01-211-16/+26
| | | | | | | | | | | | | | * glusterd - fixing coverity issues - Dereference after null check (CID 1437686) - Dereference null return value (CID 1437687) - A check for the return value of a memory allocation was missing, added it. - A value of a pointer was being dereferenced after a NULL-pointer check. With this change the pointer is no longer dereferenced. Change-Id: I10bf8a2cb08612981dbb788315dad7dbb4efe2cb updates: #1060 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* posix: implement AIO-based GF_FOP_FSYNC (#1953)Dmitry Antipov2021-01-211-65/+179
| | | | | | | | | | Implement GF_FOP_FSYNC using io_submit() with IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC operations. Refactor common code to posix_aio_cb_init() and posix_aio_cb_fini() as suggested by Ravishankar N. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Updates: #1952
* Dereference after null reference (CID:1124543) (#2023)nishith-vihar2021-01-201-6/+0
| | | | | | | 'this' pointer was being dereferenced after null check. This change avoids it. Change-Id: I7dedee44c08df481d2a037eb601f3d5c4d9284f5 Updates: #1060 Signed-off-by: Nishith Vihar Sakinala <nsakinal@redhat.com>
* doc: Remove the unneeded backslash for glusterfs manpage (#2003)winndows2021-01-191-1/+1
| | | | | | | | | Remove the unneeded backslash for glusterfs manpage, so we can get "PATH" instead of "PATHR": --dump-fuse=PATHR -> --dump-fuse=PATH Updates: #1000 Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
* dht: don't ignore xdata in fgetxattr (#2020)Xavi Hernandez2021-01-191-2/+2
| | | | | | | | | | | DHT was passing NULL for xdata in fgetxattr() request, ignoring any data sent by upper xlators. This patch fixes the issue by sending the received xdata to lower xlators, as it's currently done for getxattr(). Fixes: #1991 Change-Id: If3d3f1f2ce6215f3b1acc46480e133cb4294eaec Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* locks: remove unused conditional switch to spin_lock code (#2007)Vinayak hariharmath2021-01-195-91/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * locks: remove unused conditional switch to spin_lock code The use of spin_locks depends on the variable use_spinlocks, but that code has been commented out in the current code base through https://review.gluster.org/#/c/glusterfs/+/14763/. So there is no point in conditionally switching between spin_lock and mutex. Removing the dead code as part of this patch. Fixes: #1996 Change-Id: Ib005dd86969ce33d3409164ef3e1011bb3169129 Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* dict: dict_reset() delete all elements using iterationTamar Shacked2021-01-181-19/+35
| | | | | | | Enhance the dict_reset() implementation by deleting all elements using iteration Fixes: #1536 Change-Id: Ib4d4f80bd30d52c891eb0fd4b563db19134e2328 Signed-off-by: Tamar Shacked <tshacked@redhat.com>
* shard: Fixed redundant checks done (#1769)Rinku Kothiya2021-01-181-19/+20
| | | | | | | | | This patch fixes redundant checks done while calling shard_modify_size_and_block_count. Fixes: #1703 Change-Id: I735e532c78cbb181afa4b51480ad742ef4a75f77 Signed-off-by: Rinku Kothiya rkothiya@redhat.com
* Removing unused memory allocationRinku Kothiya2021-01-1852-75/+59
| | | | | | | | | | Removing extra unused type. Removing leftovers from the RDMA Fixes: #904 Change-Id: Id5d28622120578b7076d112e355ad8df116021dd Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* glusterd: Removing redundant NULL checks for this and other cleanups. (#1735)schaffung2021-01-1844-2971/+1009
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * glusterd: Removing redundant NULL checks for this Issue: It has been noticed that the NULL checks performed on `this` are actually being done on `THIS` as `this` is derived from `THIS`. If `THIS` had been NULL, the crash would have happened earlier. Fix: Basically removing the validations and assertion functions which check if `this` is NULL. Fixes: #1596 Signed-off-by: srijan-sivakumar <ssivakum@redhat.com> * Made changes wrt review comments received. Fixes: #1596 Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com> * glusterd: The efficient usage of `THIS` and `this`. This commit addresses the review comments and tries to change code in more places wherein the `THIS` and `this` can be handled efficiently. Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com> * Updated commit to address review comments. Updates: #1596 Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com> * Addressing Review comments. Updates: #1596 Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com> * Made changes after regression failure. Updates: #1596 Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com> * One has to be careful while working with C: instead of a `||` operation, the cleanup left a single `|`. Does the compiler check for these things? Updates: #1596 Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com> * Fixing clang-format issues. Change-Id: I68c52249af66080f59f57e558901f2654bd43cd8 Updates: #1596 Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com> Co-authored-by: srijan-sivakumar <ssivakumar@redhat.com>
* afr: remove memcpy() + ntoh32() pattern (#1998)Ravishankar N2021-01-154-34/+7
| | | | | | | | | Remove memcpy and/or byte order conversions when fetching values from the dictionary. Fixes: #504 Change-Id: Idf2367bac8cc592c419a11ea751495e1c664ec4d Reported-by: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* posix: Reduce posix_fdstat() calls in IO paths (#1994)mohit842021-01-151-5/+9
| | | | | | | | | The fops (posix_seek, posix_open, posix_readv) call posix_fdstat even when cloud sync is not enabled; for these specific fops, prestat is used only by the cloud-specific function posix_cs_maintenance(). Fixes: #1981 Change-Id: I4d3b6c41e88925456d2f957aba6b1d2441904f73 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* posix: use malloc in page_aligned_alloc() when possible (#2009)Ravishankar N2021-01-151-8/+11
| | | | | | | | | | | | - Some callers of this function do not require that the allocated buffer be zeroed out. Use GF_MALLOC instead of GF_CALLOC for such cases. - posix_rchecksum seems to be using the incorrect buffer size for computing the checksum. Fixed it. Updates: #1885 Reported-by: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: I44413b1efd7b69d3a4d318639d5ebdb38a99af7f
* core: Reduce calls to THIS wherever possible (#2010)Karthik Subrahmanya2021-01-153-26/+35
| | | | | | | | | In few functions 'THIS' is called inside a loop and saved for later use in 'old_THIS'. Instead we can call 'THIS' only when 'old_THIS' is NULL and reuse that itself to reduce redundant calls. Change-Id: Ie5d4e5fe42bd4df02d101b4c199759cb84e6aee1 Fixes: #1755 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* tests: ./tests/bugs/replicate/bug-921231.t is continuously failing (#2006)mohit842021-01-132-2/+2
| | | | | | | | | | | | The test case (./tests/bugs/replicate/bug-921231.t) is continuously failing. The test case is failing because inodelk_max_latency shows a wrong value in the profile. The value is not correct because the profile timestamp was recently changed from microseconds to nanoseconds by patch #1833. Fixes: #2005 Change-Id: Ieb683836938d986b56f70b2380103efe95657821 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* glusterfs-events: Fix incorrect attribute access (#2002)Leela Venkaiah G2021-01-131-4/+4
| | | | | | | | | | | | Issue: When GlusterCmdException is raised, the current code tries to access the message attribute, which doesn't exist, resulting in a malformed error string on failed operations Code Change: Replace `message` with `args[0]` Fixes: #2001 Change-Id: I65c9f0ee79310937a384025b8d454acda154e4bb Signed-off-by: Leela Venkaiah G <lgangava@redhat.com>
* avoiding memory allocation while printing traceRinku Kothiya2021-01-111-12/+3
| | | | | | | | | | Printing a trace can fail due to memory allocation issues; this patch avoids that. Fixes: #1966 Change-Id: I14157303a2ff5d19de0e4ece0a460ff0cbd58c26 Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* features/shard: avoid repetitive calls to gf_uuid_unparse() (#1689)Vinayak hariharmath2021-01-112-63/+65
| | | | | | | | | | | The issue is that shard_make_block_abspath() calls gf_uuid_unparse() every time a shard path is constructed. The gfid can be unparsed and saved once, then passed along while constructing the path, avoiding repeated gf_uuid_unparse() calls. Fixes: #1423 Change-Id: Ia26fbd5f09e812bbad9e5715242f14143c013c9c Signed-off-by: Vinayakswami Hariharmath vharihar@redhat.com
* geo-rep : Change in attribute for getting function name in py 3 (#1900)schaffung2021-01-091-1/+1
| | | | | | | | | | | | | | Issue: The schedule_geo-rep script uses `func_name` to obtain the name of the function being referred to but from python3 onwards, the attribute has been changed to `__name__`. Code Change: Changing `func_name` to `__name__`. Fixes: #1898 Change-Id: I4ed69a06cffed9db17c8f8949b8000c74be1d717 Signed-off-by: srijan-sivakumar <ssivakum@redhat.com> Co-authored-by: srijan-sivakumar <ssivakumar@redhat.com>
* glusterd: fix resource leak (#1970)Sheetal Pamecha2021-01-082-0/+6
| | | | | | | | | | | | | * glusterd: fix resource leak Change-Id: I03b4ad477b70eeeda387ff0d161d08a7353f147e CID: 1438341, 1438342 Updates: #1060 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com> * Add check for resource leak Change-Id: If34c8074fa4b70184d8103fd4d09695c84b907f5 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* skip the lock when refcount is not zeroRinku Kothiya2021-01-081-8/+5
| | | | | | | Fixes: #1380 Change-Id: I68bb46d2cf8b41c8e709fbeee4778e3cdfc2d46c Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* ./tests/bugs/core/bug-1432542-mpx-restart-crash.t is getting crashedMohit Agrawal2021-01-081-4/+13
| | | | | | | | | | | | | | The test case ./tests/bugs/core/bug-1432542-mpx-restart-crash.t is crashing while detaching a brick. The brick process crashes because there is a race between sending a disconnect on the rpc associated with the victim brick and handling GF_EVENT_CLEANUP for the same brick. Solution: Save victim_name in a local variable to avoid the crash. Fixes: #1978 Change-Id: I76877f20b6ac0eecc39f1fa7d82afc9744dc5e04 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* glusterd: Fix for shared storage in ipv6 env (#1972)Nikhil Ladha2021-01-081-1/+1
| | | | | Change-Id: Ib38993724c709b35b603f9ac666630c50c932c3e Fixes: #1406 Signed-off-by: nik-redhat <nladha@redhat.com>
* configure: disable LTO when building with debug (#1967)Tamar Shacked2021-01-061-1/+1
| | | | | | | | LTO isn't added to the build when it is configured with "--enable-debug" Fixes: #1772 Change-Id: I87300d950871bdda6542d9bbfb6bdffd500585cc Signed-off-by: Tamar Shacked <tshacked@redhat.com>