| Commit message (Collapse) | Author | Age | Files | Lines |
| |
We only need passive and active lists, there's no need for a full
iobuf variable.
Also ensured passive_list is before active_list, as it's always accessed
first.
Note: this almost brings us to using 2 cachelines only for that structure.
We can easily make other variables smaller (page_size could be 4 bytes) and fit
exactly 2 cache lines.
Fixes: #2096
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
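The layout idea can be sketched and checked at compile time; field names and sizes below are illustrative only, not the actual gluster definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: a minimal doubly-linked list head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Hypothetical pool layout: passive_list placed first because it is
 * always accessed first; page_size shrunk to 4 bytes as the note
 * above suggests. */
struct iobuf_pool_sketch {
    struct list_head passive_list;
    struct list_head active_list;
    size_t default_page_size;
    unsigned int page_size;
};

/* Enforce the two-cacheline goal at compile time (64-byte lines). */
_Static_assert(sizeof(struct iobuf_pool_sketch) <= 2 * 64,
               "iobuf pool must fit in two cachelines");
```

A `_Static_assert` like this makes any future field addition that spills past the second cacheline a compile error rather than a silent regression.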
|
|
|
|
|
| |
fixes: #2159
Change-Id: Ibaaebc48b803ca6ad4335c11818c0c71a13e9f07
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
|
|
|
|
|
|
| |
Optimize parameter backup-volfile-servers to support IPv6 address.
Fixes: #2042
Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
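The core difficulty is that ':' is both the list separator and part of an IPv6 literal. A hypothetical splitter (not the actual mount.glusterfs code) can keep bracketed IPv6 literals intact:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: find the next list separator ':' in a
 * "srv1:srv2:[::1]:srv3"-style string, skipping over any bracketed
 * IPv6 literal so its internal colons are not treated as separators. */
static const char *next_separator(const char *s)
{
    while (*s) {
        if (*s == '[') {                 /* skip a [..] IPv6 literal */
            const char *close = strchr(s, ']');
            if (!close)
                return NULL;             /* malformed: unterminated bracket */
            s = close + 1;
        } else if (*s == ':') {
            return s;
        } else {
            s++;
        }
    }
    return NULL;                         /* last element of the list */
}
```

The same scan works for plain hostnames, IPv4 addresses, and bracketed IPv6 literals without needing to know which form each element takes.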
|
|
|
|
|
|
|
|
|
|
| |
CID: 1214629,1274235,1430115,1437648
A null character is added at the end of the buffer, which fixes the
issue.
Change-Id: I8f7016520ffd41b2c68fe3c7f053e0e04f306c84
Updates: #1060
Signed-off-by: Nishith Vihar Sakinala <nsakinal@redhat.com>
|
|
|
|
|
|
|
|
|
| |
After glibc 2.32, lchmod() returns EOPNOTSUPP instead of ENOSYS when
called on symlinks. The man page says that the returned code is ENOTSUP.
They are the same on Linux, but this patch correctly handles all three errors.
Fixes: #2154
Change-Id: Ib3bb3d86d421cba3d7ec8d66b6beb131ef6e0925
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
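The error handling this implies can be sketched with a hypothetical helper (not the patch itself) that treats all three codes as "not supported on this file type":

```c
#include <assert.h>
#include <errno.h>

/* glibc < 2.32 returned ENOSYS for lchmod() on a symlink, glibc >= 2.32
 * returns EOPNOTSUPP, and the man page documents ENOTSUP. On Linux
 * ENOTSUP == EOPNOTSUPP, but checking all three keeps the code portable. */
static int lchmod_not_supported(int err)
{
    return err == ENOSYS || err == EOPNOTSUPP || err == ENOTSUP;
}
```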
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The errno set by the runner code was not correct when bind() failed
to assign an already occupied port in __socket_server_bind().
Fix:
Updated the code to return the correct errno from
__socket_server_bind() if bind() fails with EADDRINUSE, and to
use the errno returned from runner_run() to retry allocating a new port
for the brick process.
Fixes: #1101
Change-Id: If124337f41344a04f050754e402490529ef4ecdc
Signed-off-by: nik-redhat <nladha@redhat.com>
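The retry logic can be sketched as a simplified stand-alone version (not the actual __socket_server_bind()): retry on EADDRINUSE only, and propagate every other errno to the caller:

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind 'sock' on the loopback address, starting at 'port' and trying
 * the next port only when bind() fails with EADDRINUSE, mirroring the
 * retry described above. Returns the bound port, or -1 with errno
 * preserved for any other failure. */
static int bind_with_retry(int sock, int port, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons((uint16_t)(port + i));
        if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return port + i;
        if (errno != EADDRINUSE)
            return -1;   /* return the real errno to the caller */
    }
    return -1;           /* all candidate ports were in use */
}
```

The key point matching the fix: any errno other than EADDRINUSE is surfaced unchanged, so the caller can distinguish "port taken, retry" from a genuine failure.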
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue: The default port of glustereventsd is currently 24009, which
prevents glustereventsd from binding to the UDP port due to
SELinux policies.
Fix: Change the default port to one in the ephemeral range.
Fixes: #2080
Change-Id: Ibdc87f83f82f69660dca95d6d14b226e10d8bd33
Signed-off-by: srijan-sivakumar <ssivakum@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Break the parameter into two different parameters
to avoid a crash.
fixes: #2138
Change-Id: Idd5f3631488c1d892748f83e6847fb6fd2d0802a
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
| |
- Removed unused ref_count variable
- Reordered the struct to get related variables closer together.
- Changed 'complete' from a '_Bool' to an 'int32_t'
Before:
```
struct _call_frame {
call_stack_t * root; /* 0 8 */
call_frame_t * parent; /* 8 8 */
struct list_head frames; /* 16 16 */
void * local; /* 32 8 */
xlator_t * this; /* 40 8 */
ret_fn_t ret; /* 48 8 */
int32_t ref_count; /* 56 4 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
gf_lock_t lock; /* 64 40 */
void * cookie; /* 104 8 */
_Bool complete; /* 112 1 */
/* XXX 3 bytes hole, try to pack */
glusterfs_fop_t op; /* 116 4 */
struct timespec begin; /* 120 16 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
struct timespec end; /* 136 16 */
const char * wind_from; /* 152 8 */
const char * wind_to; /* 160 8 */
const char * unwind_from; /* 168 8 */
const char * unwind_to; /* 176 8 */
/* size: 184, cachelines: 3, members: 17 */
/* sum members: 177, holes: 2, sum holes: 7 */
/* last cacheline: 56 bytes */
```
After:
```
struct _call_frame {
call_stack_t * root; /* 0 8 */
call_frame_t * parent; /* 8 8 */
struct list_head frames; /* 16 16 */
struct timespec begin; /* 32 16 */
struct timespec end; /* 48 16 */
/* --- cacheline 1 boundary (64 bytes) --- */
void * local; /* 64 8 */
gf_lock_t lock; /* 72 40 */
void * cookie; /* 112 8 */
xlator_t * this; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
ret_fn_t ret; /* 128 8 */
glusterfs_fop_t op; /* 136 4 */
int32_t complete; /* 140 4 */
const char * wind_from; /* 144 8 */
const char * wind_to; /* 152 8 */
const char * unwind_from; /* 160 8 */
const char * unwind_to; /* 168 8 */
/* size: 176, cachelines: 3, members: 16 */
/* last cacheline: 48 bytes */
```
Fixes: #2130
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
volume profile info now prints durations in nanoseconds. Tests were
written when the duration was printed in microseconds. This leads to
spurious failures.
Fix:
Change tests to handle nanosecond durations
fixes: #2134
Change-Id: I94722be87000a485d98c8b0f6d8b7e1a526b07e7
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current block size used for self-heal by default is 128 KiB. This
requires a significant number of management requests for a very small
amount of healed data.
With this patch the block size is increased to 4 MiB. For a standard
EC volume configuration of 4+2, this means that each healed block of
a file will update 1 MiB on each brick.
Change-Id: Ifeec4a2d54988017d038085720513c121b03445b
Updates: #2067
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
| |
* syncop: introduce microsecond sleep support
Introduce the microsecond sleep function synctask_usleep,
which can be used instead of synctask_sleep to improve precision.
Change-Id: Ie7a15dda4afc09828bfbee13cb8683713d7902de
* glusterd: use synctask_usleep in glusterd_proc_stop()
glusterd_proc_stop() slept for 1s before force-killing a process,
but in most cases the process stops within 100ms.
This patch uses synctask_usleep to check the process's running state
every 100ms instead of sleeping for 1s, reducing stop time by up to 1s.
In some cases, like enabling quota on 100 volumes, average execution
time dropped from 2500ms to 500ms.
fixes: #2116
Change-Id: I645e083076c205aa23b219abd0de652f7d95dca7
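The polling pattern can be sketched in plain C, using usleep() where glusterd uses synctask_usleep(); the names and the demo predicate are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Check is_stopped() every 100ms for up to timeout_ms, returning as
 * soon as it reports success instead of always sleeping a full second. */
static bool wait_for_stop(bool (*is_stopped)(void *), void *ctx,
                          int timeout_ms)
{
    for (int waited = 0; waited < timeout_ms; waited += 100) {
        if (is_stopped(ctx))
            return true;
        usleep(100 * 1000); /* 100ms; glusterd uses synctask_usleep() */
    }
    return is_stopped(ctx); /* one final check at the deadline */
}

/* Demo predicate for illustration: "stops" on its third check. */
static int g_checks;
static bool stopped_after_three(void *ctx)
{
    (void)ctx;
    return ++g_checks >= 3;
}
```

With a process that stops in ~200ms, the loop returns after two 100ms sleeps instead of a fixed 1s wait, which is where the reported 2500ms-to-500ms improvement comes from.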
|
| |
* glusterd-volgen: Add functionality to accept any custom xlator
Add a new function that allows users to insert any custom xlator.
This provides a way to add custom processing into file operations.
Users can deploy a plugin (an xlator shared object) and integrate it into glusterfsd.
To enable a custom xlator:
1. put the xlator object (.so file) into "XLATOR_DIR/user/"
2. set the option user.xlator.<xlator> to an existing xlator name to specify its position in the graph
3. restart the gluster volume
Options for a custom xlator can be set as "user.xlator.<xlator>.<optkey>".
Fixes: #1943
Signed-off-by: Ryo Furuhashi <ryo.furuhashi.nh@hitachi.com>
Co-authored-by: Yaniv Kaul <ykaul@redhat.com>
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
// is for C and C++; shell uses #. Vim syntax coloring is
misleading.
This was displayed in each Jenkins log:
./tests/00-geo-rep/../include.rc: line 1: //: is a folder
Likely no impact besides a wrong warning.
Fix #2093
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scenario:
1) decommission start: the option decommissioned-bricks is added to the volfile
and parsed by dht.
2) another configuration change (like setting a new log level): the decommissioned-bricks
option still exists in the volfile and is parsed again, which leads to invalid data.
Fix:
Prevent the parsing of "decommissioned-bricks" while a decommission is running.
This relies on the fact that once a decommission is running it cannot be started again.
Fixes: #1992
Change-Id: I7a016750e2f865aee4cd620bd9033ec19421d47d
Signed-off-by: Tamar Shacked <tshacked@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The fix-layout operation assumes that the path passed is a directory, i.e.
layout->cnt == conf->subvolume_cnt. This leads to a crash when
fix-layout is attempted on a file.
Fix:
Disallow fix-layout on files
fixes: #2107
Change-Id: I2116b8773059f67e3260e9207e20eab3de711417
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
|
|
|
|
|
|
|
|
|
| |
At the moment self-heal-window-size is 128KB, which leads to healing data
in 128KB chunks. With the growth of data and today's average file
sizes, 1MB seems like a better default.
Change-Id: I70c42c83b16c7adb53d6b5762969e878477efb5c
Fixes: #2067
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
|
| |
* features/shard: delay unlink of a file that has fd_count > 0
When multiple processes are working on a file and one of them
unlinks it, the unlink operation shouldn't harm the other
processes working on it. This is POSIX-compliant behavior, and
it should also be supported when the shard feature is enabled.
Problem description:
Consider 2 clients, C1 and C2, working on a file F1 with 5
shards on a gluster mount, where the gluster server has 4
bricks B1, B2, B3, B4.
Assume that the base file/shard is on B1, the 1st and 2nd
shards on B2, the 3rd and 4th shards on B3, and the 5th shard
on B4. C1 has opened F1 in append mode and is writing to it.
The write FOP goes to the 5th shard in this case, so
inode->fd_count = 1 on B1 (base file) and B4 (5th shard).
C2 issues an unlink of F1 at the same time. On the server, the
base file has fd_count = 1 (since C1 has opened the file), so
the base file is renamed under .glusterfs/unlink and success
is returned to C2. Unlink is then sent to the shards on all
bricks, and the shards on B2 and B3, which have no open
references yet, are deleted. C1 starts getting errors while
accessing the remaining shards even though it holds open
references to the file.
This is one such undefined behavior. We will likewise
encounter many such undefined behaviors, as we don't have one
global lock to access all shards as one. Of course, such a
global lock would hurt performance, as it reduces the window
for parallel access to shards.
Solution:
The above undefined behavior can be addressed by delaying the
unlink of a file while there are open references on it.
File unlink happens in 2 steps:
step 1: the client creates a marker file under .shard/remove_me
and sends the unlink of the base file to the server
step 2: on return from the server, the associated shards are
cleaned up and finally the marker file is removed.
In step 2, the background deletion process does a nameless
lookup using the marker file name (the marker file is named
after the gfid of the base file) in the .glusterfs/unlink dir.
If the nameless lookup succeeds, the gfid still has open fds
and deletion of the shards has to be delayed. If the nameless
lookup fails, the gfid has been unlinked and there are no open
fds on that file (the gfid path is unlinked during the final
close of the file). Shards whose deletion was delayed are
unlinked once all open fds are closed; this is done by a
thread that wakes up every 10 minutes.
Also removed active_fd_count from the inode structure, using
fd_count wherever active_fd_count was used.
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
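The step-2 decision can be sketched as a pure predicate (a hypothetical helper; the real code performs the nameless lookup through gluster's syncop layer):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The marker file is named after the base file's gfid. If a nameless
 * lookup of that gfid under .glusterfs/unlink succeeds, some client
 * still holds an open fd, so shard deletion must be delayed.
 * The non-ENOENT branch below is an assumed safety choice for this
 * sketch, not necessarily what the shard xlator does. */
static bool must_delay_shard_deletion(int lookup_ret, int lookup_errno)
{
    if (lookup_ret == 0)
        return true;               /* gfid still linked: open fds remain */
    return lookup_errno != ENOENT; /* unexpected error: play it safe */
}
```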
* features/shard: delay unlink of a file that has fd_count > 0
Change-Id: Iec16d7ff5e05f29255491a43fbb6270c72868999
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
Change-Id: I07e5a5bf9d33c24b63da72d4f3f59392c5421652
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
Change-Id: I3679de8545f2e5b8027c4d5a6fd0592092e8dfbd
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* Update xlators/storage/posix/src/posix-entry-ops.c
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
* features/shard: delay unlink of a file that has fd_count > 0
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* Update fd.c
* features/shard: delay unlink of a file that has fd_count > 0
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
|
|
|
|
|
|
|
|
|
|
|
| |
00-georep-verify-non-root-setup.t should be moved back to
tests/00-geo-rep/ from the tests/000-flaky/ directory, as the recent
failures of this test case were not caused by the test itself
but by the libtirpc installed in the build environment.
Fixes: #2101
Change-Id: I2b35e9ed95ad3de68ad8574ff76805f5db64c0b2
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Spurious failures of 00-georep-verify-non-root-setup.t are
seen only on build machines. These failures are not
reproducible on softserve / CentOS / Fedora machines.
So, move 00-georep-verify-non-root-setup.t to
tests/000-flaky/ until the issue is root-caused on the build
machines.
Fixes: #2086
Change-Id: Id1eab598fa0f9ba5ba019e6b3f057a5b10fdb0ea
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
|
| |
When we hit the maximum capacity of the storage space, shard_unlink()
starts failing if there is no space left on the brick to create a
marker file.
shard_unlink() happens in the steps below:
1. create a marker file named after the gfid of the base file under
BRICK_PATH/.shard/.remove_me
2. unlink the base file
3. shard_delete_shards() deletes the shards in the background by
picking up the entries in BRICK_PATH/.shard/.remove_me
If marker file creation fails, we can't delete the shards, which is
eventually a problem for a user who is trying to make space by
deleting unwanted data.
Solution:
Create the marker file with xdata = GLUSTERFS_INTERNAL_FOP_KEY,
which marks it as an internal op that is allowed to create files
within the reserved space.
Fixes: #2038
Change-Id: I7facebab940f9aeee81d489df429e00ef4fb7c5d
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
|
|
|
|
|
|
| |
This reverts commit 50e953e2450b5183988c12e87bdfbc997e0ad8a8.
Fixes: #2052
Change-Id: Ic0670a63423b5d79c3d48001e18910b1dbf7e98d
|
|
|
|
|
|
|
|
| |
Do not allow changing storage.linux-aio for a running volume;
clean up the nearby storage.linux-io_uring error message as well.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #2039
|
|
|
|
|
|
|
|
|
| |
CID 1430124
A negative value was being passed to a parameter that cannot be
negative. Modified the value being passed.
Change-Id: I06dca105f7a78ae16145b0876910851fb631e366
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
| |
* tests: fix tests/bugs/nfs/bug-1053579.t
On NFS, the number of groups associated with a user that can be passed
to the server is limited. This test created a user with 200 groups
and checked that a file owned by the latest created group couldn't
be accessed, under the assumption that the last group wouldn't be
passed to the server.
However, there's no guarantee about how the list of groups is
generated, so the latest created group could be passed as one of the
initial groups, allowing access to the file and causing the test
to fail (because it expected access to be impossible).
Given that there's no way to be sure which groups will be passed, this
patch changes the test so that a check is done for each group the user
belongs to. Then we check that there have been some successes and some
failures.
Once 'manage-gids' is set, we do the same, but this time the number of
failures must be 0.
Fixes: #2033
Change-Id: Ide06da2861fcade2166372d1f3e9eb4ff2dd5f58
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
CID 1291733
The return value of pthread_cancel was not being checked.
Added a return value check and proper error handling.
Change-Id: I8c52b0e462461fc59718deb3b7c2f1b4e55613c7
updates: #1060
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* glusterd - fixing coverity issues
- Dereference after null check (CID 1437686)
- Dereference null return value (CID 1437687)
- A check for the return value of a memory allocation was missing;
added it.
- A pointer was being dereferenced after a NULL-pointer check.
With this change the pointer is no longer dereferenced.
Change-Id: I10bf8a2cb08612981dbb788315dad7dbb4efe2cb
updates: #1060
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
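The two coverity patterns above can be sketched in one illustrative helper (names are hypothetical, not the glusterd code): check the allocation result, and never dereference a pointer after a NULL check has been reached:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a string with both checks in place. */
static char *
dup_string_checked(const char *src)
{
    if (src == NULL)      /* NULL-pointer check first ... */
        return NULL;      /* ... and no dereference after it */

    size_t len = strlen(src) + 1;
    char *copy = malloc(len);
    if (copy == NULL)     /* the allocation check that was missing */
        return NULL;

    memcpy(copy, src, len);
    return copy;
}
```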
|
|
|
|
|
|
|
|
|
|
| |
Implement GF_FOP_FSYNC using io_submit() with IOCB_CMD_FSYNC
and IOCB_CMD_FDSYNC operations.
Refactor common code to posix_aio_cb_init() and posix_aio_cb_fini()
as suggested by Ravishankar N.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #1952
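The submission side can be sketched like this — an illustrative helper (`prep_fsync_iocb` and `opcode_for` are hypothetical names, not the posix xlator code) preparing an iocb for the fsync/fdatasync opcodes that io_submit(2) accepts:

```c
#include <assert.h>
#include <linux/aio_abi.h>
#include <string.h>

/* Fill an iocb for an fsync (or fdatasync) request on fd. */
static void
prep_fsync_iocb(struct iocb *cb, int fd, int datasync)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = fd;
    cb->aio_lio_opcode = datasync ? IOCB_CMD_FDSYNC : IOCB_CMD_FSYNC;
}

/* Helper exposing the chosen opcode for inspection. */
static int
opcode_for(int datasync)
{
    struct iocb cb;
    prep_fsync_iocb(&cb, 3, datasync);
    return cb.aio_lio_opcode;
}
```

The real code would pass the iocb pointer to io_submit() on a context created with io_setup(); the sketch only covers the opcode selection.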
|
|
|
|
|
|
|
| |
'this' pointer was being dereferenced after null check. This change avoids it.
Change-Id: I7dedee44c08df481d2a037eb601f3d5c4d9284f5
Updates: #1060
Signed-off-by: Nishith Vihar Sakinala <nsakinal@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Remove the unneeded backslash for glusterfs manpage, so
we can get "PATH" instead of "PATHR":
--dump-fuse=PATHR -> --dump-fuse=PATH
Updates: #1000
Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
|
|
|
|
|
|
|
|
|
|
|
| |
DHT was passing NULL for xdata in fgetxattr() request, ignoring any
data sent by upper xlators.
This patch fixes the issue by sending the received xdata to lower
xlators, as it's currently done for getxattr().
Fixes: #1991
Change-Id: If3d3f1f2ce6215f3b1acc46480e133cb4294eaec
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* locks: remove unused conditional switch to spin_lock code
The use of spin_locks depends on the variable use_spinlocks, but that
variable was commented out of the code base through
https://review.gluster.org/#/c/glusterfs/+/14763/, so conditionally
switching between spin_lock and mutex serves no purpose. Remove the
dead code as part of this patch.
Fixes: #1996
Change-Id: Ib005dd86969ce33d3409164ef3e1011bb3169129
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
|
|
|
|
|
|
|
| |
Enhance the dict_reset() implementation by deleting all elements using iteration
Fixes: #1536
Change-Id: Ib4d4f80bd30d52c891eb0fd4b563db19134e2328
Signed-off-by: Tamar Shacked <tshacked@redhat.com>
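The iteration pattern can be sketched on a minimal singly linked list standing in for dict's member chain (the struct layout and names are illustrative, not the real dict_t): save each node's `next` before freeing it, then clear the head:

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *next;
};

struct list {
    struct node *head;
    int count;
};

/* Reset by walking the chain and freeing every element. */
static void
list_reset(struct list *l)
{
    struct node *n = l->head;
    while (n != NULL) {
        struct node *next = n->next; /* save before free */
        free(n);
        n = next;
        l->count--;
    }
    l->head = NULL;
}

/* Build a three-element list, reset it, verify it is empty. */
static int
demo_reset(void)
{
    struct list l = { NULL, 0 };
    for (int i = 0; i < 3; i++) {
        struct node *n = malloc(sizeof(*n));
        if (n == NULL)
            return -1;
        n->next = l.head;
        l.head = n;
        l.count++;
    }
    list_reset(&l);
    return (l.head == NULL && l.count == 0) ? 0 : -1;
}
```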
|
|
|
|
|
|
|
|
|
| |
This patch fixes redundant checks done while calling
shard_modify_size_and_block_count.
Fixes: #1703
Change-Id: I735e532c78cbb181afa4b51480ad742ef4a75f77
Signed-off-by: Rinku Kothiya rkothiya@redhat.com
|
|
|
|
|
|
|
|
|
|
| |
Removing an extra unused type.
Removing leftovers from the RDMA transport.
Fixes: #904
Change-Id: Id5d28622120578b7076d112e355ad8df116021dd
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* glusterd: Removing redundant NULL checks for this
Issue: It has been noticed that the NULL checks performed
on `this` are actually being done on `THIS`, as `this` is
derived from `THIS`. If `THIS` had been NULL, the crash
would already have happened earlier.
Fix: Basically removing the validations and assertion
functions which check if `this` is NULL.
Fixes: #1596
Signed-off-by: srijan-sivakumar <ssivakum@redhat.com>
* Made changes wrt review comments received.
Fixes: #1596
Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com>
* glusterd: The efficient usage of `THIS` and `this`.
This commit addresses the review comments and tries to
change code in more places wherein the `THIS` and `this` can
be handled efficiently.
Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com>
* Updated commit to address review comments.
Updates: #1596
Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com>
* Addressing Review comments.
Updates: #1596
Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com>
* Made changes after regression failure.
Updates: #1596
Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com>
* One has to be careful while working with C
Instead of a `||` operation, the cleanup left a `|` behind.
Does the compiler check for these things?
Updates: #1596
Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com>
* Fixing clang-format issues.
Change-Id: I68c52249af66080f59f57e558901f2654bd43cd8
Updates: #1596
Signed-off-by: srijan-sivakumar <ssivakumar@redhat.com>
Co-authored-by: srijan-sivakumar <ssivakumar@redhat.com>
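The `|` vs `||` slip mentioned in the commits above is easy to miss because both compile cleanly; the difference is that `||` short-circuits while bitwise `|` always evaluates both operands. A minimal sketch (counter and function names are illustrative):

```c
#include <assert.h>

static int calls;

/* Side-effecting operand so we can count evaluations. */
static int
probe(int v)
{
    calls++;
    return v;
}

/* `a || b` short-circuits: the second probe is skipped. */
static int
count_logical(void)
{
    calls = 0;
    (void)(probe(1) || probe(1));
    return calls;
}

/* `a | b` does not short-circuit: both probes run. */
static int
count_bitwise(void)
{
    calls = 0;
    (void)(probe(1) | probe(1));
    return calls;
}
```

Compilers generally accept `|` on boolean-ish operands without a diagnostic (some warn only under extra flags), so review has to catch it.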
|
|
|
|
|
|
|
|
|
| |
Remove memcpy and/or byte order conversions when fetching values from
the dictionary.
Fixes: #504
Change-Id: Idf2367bac8cc592c419a11ea751495e1c664ec4d
Reported-by: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
| |
The fops (posix_seek, posix_open, posix_readv) call posix_fdstat
even when cloud sync is not enabled; for these specific fops the
prestat is used only by the cloud-specific function
(posix_cs_maintenance).
Fixes: #1981
Change-Id: I4d3b6c41e88925456d2f957aba6b1d2441904f73
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Some callers of this function do not require that the allocated buffer
be zeroed out. Use GF_MALLOC instead of GF_CALLOC for such cases.
- posix_rchecksum seems to be using the incorrect buffer size for
computing the checksum. Fixed it.
Updates: #1885
Reported-by: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Change-Id: I44413b1efd7b69d3a4d318639d5ebdb38a99af7f
|
|
|
|
|
|
|
|
|
| |
In a few functions, 'THIS' is called inside a loop and saved for later
use in 'old_THIS'. Instead, we can call 'THIS' only when 'old_THIS'
is NULL and reuse that value, reducing redundant calls.
Change-Id: Ie5d4e5fe42bd4df02d101b4c199759cb84e6aee1
Fixes: #1755
Signed-off-by: karthik-us <ksubrahm@redhat.com>
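The caching pattern can be sketched with a stand-in getter whose calls are counted (`get_this`, `loop_uncached`, `loop_cached` are illustrative names, not the glusterfs THIS macro):

```c
#include <assert.h>
#include <stddef.h>

static int this_calls;
static int the_xlator; /* dummy object the getter returns */

/* Stand-in for THIS: a getter we can count. */
static int *
get_this(void)
{
    this_calls++;
    return &the_xlator;
}

/* Calling the getter on every iteration costs n calls. */
static int
loop_uncached(int n)
{
    this_calls = 0;
    for (int i = 0; i < n; i++) {
        int *cur = get_this();
        (void)cur;
    }
    return this_calls;
}

/* Fetch only when the saved value is still NULL: one call total. */
static int
loop_cached(int n)
{
    this_calls = 0;
    int *old_this = NULL;
    for (int i = 0; i < n; i++) {
        if (old_this == NULL)
            old_this = get_this();
        (void)old_this;
    }
    return this_calls;
}
```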
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test case (./tests/bugs/replicate/bug-921231.t)
is continuously failing. The test case fails because
inodelk_max_latency shows a wrong value in the profile.
The value is not correct because the profile timestamp
was recently changed from microseconds to nanoseconds
by patch #1833.
Fixes: #2005
Change-Id: Ieb683836938d986b56f70b2380103efe95657821
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue: When GlusterCmdException is raised, the current code tries to
access the `message` attribute, which doesn't exist, resulting in a
malformed error string on failed operations
Code Change: Replace `message` with `args[0]`
Fixes: #2001
Change-Id: I65c9f0ee79310937a384025b8d454acda154e4bb
Signed-off-by: Leela Venkaiah G <lgangava@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Printing a trace can fail due to memory allocation issues;
this patch avoids that.
Fixes: #1966
Change-Id: I14157303a2ff5d19de0e4ece0a460ff0cbd58c26
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The issue is that shard_make_block_abspath() calls gf_uuid_unparse()
every time it constructs a shard path. The gfid can be unparsed
and saved once, then passed in while constructing each path, thus
avoiding the repeated gf_uuid_unparse() calls.
Fixes: #1423
Change-Id: Ia26fbd5f09e812bbad9e5715242f14143c013c9c
Signed-off-by: Vinayakswami Hariharmath vharihar@redhat.com
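The unparse-once idea can be sketched without libuuid — the formatting helper below is a hand-rolled stand-in for gf_uuid_unparse(), and the `/.shard/<gfid>.<block>` path layout is illustrative, not the exact shard xlator format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format 16 raw gfid bytes into the canonical 36-char uuid string. */
static void
unparse_gfid(const unsigned char gfid[16], char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             gfid[0], gfid[1], gfid[2], gfid[3], gfid[4], gfid[5],
             gfid[6], gfid[7], gfid[8], gfid[9], gfid[10], gfid[11],
             gfid[12], gfid[13], gfid[14], gfid[15]);
}

/* Build a shard path from the pre-formatted string, so the unparse
 * cost is paid once rather than once per shard block. */
static void
shard_abspath(const char *gfid_str, int block, char *out, size_t len)
{
    snprintf(out, len, "/.shard/%s.%d", gfid_str, block);
}

/* Unparse once, then reuse the string for path construction. */
static int
demo_shard_path(void)
{
    unsigned char g[16] = { 0 };
    g[15] = 0xab;
    char s[37];
    unparse_gfid(g, s);
    char path[64];
    shard_abspath(s, 7, path, sizeof(path));
    return strcmp(path,
                  "/.shard/00000000-0000-0000-0000-0000000000ab.7") == 0
               ? 0
               : -1;
}
```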
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue: The schedule_geo-rep script uses `func_name` to obtain
the name of the function being referred to, but from Python 3
onwards the attribute has been renamed to `__name__`.
Code Change:
Changing `func_name` to `__name__`.
Fixes: #1898
Change-Id: I4ed69a06cffed9db17c8f8949b8000c74be1d717
Signed-off-by: srijan-sivakumar <ssivakum@redhat.com>
Co-authored-by: srijan-sivakumar <ssivakumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* glusterd: fix resource leak
Change-Id: I03b4ad477b70eeeda387ff0d161d08a7353f147e
CID: 1438341, 1438342
Updates: #1060
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* Add check for resource leak
Change-Id: If34c8074fa4b70184d8103fd4d09695c84b907f5
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
|
|
|
|
|
|
|
| |
Fixes: #1380
Change-Id: I68bb46d2cf8b41c8e709fbeee4778e3cdfc2d46c
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test case ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
crashes at the time of detaching a brick. The brick
process crashes because there is a race condition
between sending a disconnect on the rpc associated with the
victim brick and handling GF_EVENT_CLEANUP for the victim brick.
Solution: Save victim_name on local variable to avoid crash.
Fixes: #1978
Change-Id: I76877f20b6ac0eecc39f1fa7d82afc9744dc5e04
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
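The fix pattern — take a private copy of the name before the racing cleanup can free the original — can be sketched as follows (buffer size and names are illustrative, not the actual brick-mux code):

```c
#include <assert.h>
#include <string.h>

/* Copy the victim name into a caller-owned local buffer so later
 * use never dereferences memory another thread may free during
 * GF_EVENT_CLEANUP handling. */
static void
save_victim_name(const char *victim_name, char *local, size_t len)
{
    local[0] = '\0';
    if (victim_name != NULL && len > 0) {
        strncpy(local, victim_name, len - 1);
        local[len - 1] = '\0';
    }
}

/* Save the name up front, then pretend the source went away. */
static int
demo_save(void)
{
    char shared[16];
    strcpy(shared, "brick-0");
    char local[16];
    save_victim_name(shared, local, sizeof(local));
    memset(shared, 0, sizeof(shared)); /* simulate concurrent free */
    return strcmp(local, "brick-0") == 0 ? 0 : -1;
}
```

The copy only removes the use-after-free on the name itself; the underlying disconnect/cleanup ordering still has to be correct.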
|
|
|
|
|
| |
Change-Id: Ib38993724c709b35b603f9ac666630c50c932c3e
Fixes: #1406
Signed-off-by: nik-redhat <nladha@redhat.com>
|
|
|
|
|
|
|
|
| |
LTO isn't added to the build when it is
configured with "--enable-debug"
Fixes: #1772
Change-Id: I87300d950871bdda6542d9bbfb6bdffd500585cc
Signed-off-by: Tamar Shacked <tshacked@redhat.com>
|