| Commit message (Collapse) | Author | Age | Files | Lines |
| |
1. Replace the poison values with the values the Linux kernel uses.
2. Introduce an internal function, _list_del(), that can be used when
list->next and list->prev are going to be assigned later on.
(Too bad the code does not have enough uses of list_move() and
list_move_tail(), by the way; that would also have contributed to readability.)
* list.h: define LIST_POISON1 and LIST_POISON2, similar to the Linux kernel
defines
Fixes: #2025
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
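The pattern described above can be sketched in plain C (a minimal, illustrative sketch, not the actual list.h change; the poison constants use the Linux kernel's base values):

```c
#include <stddef.h>

/* Non-NULL poison values (Linux kernel base values): dereferencing a
 * deleted entry faults noticeably instead of silently working. */
#define LIST_POISON1 ((void *)0x100)
#define LIST_POISON2 ((void *)0x122)

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* _list_del(): unhook the entry, leaving entry->next/entry->prev to be
 * reassigned later by the caller (e.g. a list_move()-style operation). */
static void _list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
}

/* list_del(): unhook and poison, for entries that are truly removed. */
static void list_del(struct list_head *entry)
{
    _list_del(entry);
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}
```
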
| |
Problem:
On a cluster with 15 million files, when fix-layout was started, it was
not progressing at all. So we tried an os.walk() + os.stat() on the
backend filesystem directly: it took 2.5 days. We removed the os.stat() and
re-ran it on another brick with a similar data set: it took 15 minutes. We
realized that readdirp is extremely costly compared to readdir when the
stat is not useful. The fix-layout operation only needs to know that an
entry is a directory so that fix-layout can be triggered on
it. Most modern filesystems provide this information in the readdir
operation, so we don't need readdirp, i.e. readdir + stat.
Fix:
Use the readdir operation in fix-layout. Do readdir + stat/lookup only for
filesystems that don't provide d_type in the readdir operation.
fixes: #2241
Change-Id: I5fe2ecea25a399ad58e31a2e322caf69fc7f49eb
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
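The fix can be sketched in plain C (a hedged sketch, not the actual DHT code; is_directory() is an illustrative helper): trust d_type when the filesystem provides it, and fall back to lstat() only for DT_UNKNOWN.

```c
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

/* Illustrative helper: decide whether a directory entry is a directory
 * without an unconditional stat(). Falls back to lstat() only when the
 * filesystem reports DT_UNKNOWN in readdir(). */
static int is_directory(const char *parent, const struct dirent *entry)
{
    if (entry->d_type != DT_UNKNOWN)
        return entry->d_type == DT_DIR;

    /* Rare fallback path: filesystem gave no d_type, so stat the entry. */
    char path[PATH_MAX];
    snprintf(path, sizeof(path), "%s/%s", parent, entry->d_name);

    struct stat st;
    if (lstat(path, &st) != 0)
        return 0;
    return S_ISDIR(st.st_mode);
}
```
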
| |
* fuse: add an option to specify the mount display name
There are two things this PR is fixing.
1. When a mount is specified with the volfile (-f) option, today you can't tell it's from glusterfs, as only the volfile is added as 'fsname'; so we add it as 'glusterfs:/<volname>'.
2. Provide an option for admins who want to show a source of the mount other than the default (useful when one is not using 'mount.glusterfs' but their own scripts).
Updates: #1000
Change-Id: I19e78f309a33807dc5f1d1608a300d93c9996a2f
Signed-off-by: Amar Tumballi <amar@kadalu.io>
| |
We only need the passive and active lists; there's no need for a full
iobuf variable.
Also ensured passive_list is placed before active_list, as it's always accessed
first.
Note: this almost brings us down to using only 2 cache lines for that structure.
We could easily make other variables smaller (page_size could be 4 bytes) and fit
exactly 2 cache lines.
Fixes: #2096
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
| |
Since glibc 2.32, lchmod() returns EOPNOTSUPP instead of ENOSYS when
called on symlinks. The man page says that the returned code is ENOTSUP.
They are the same on Linux, but this patch correctly handles all the errors.
Fixes: #2154
Change-Id: Ib3bb3d86d421cba3d7ec8d66b6beb131ef6e0925
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
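A minimal sketch of the error check (chmod_not_supported is an illustrative helper, not the actual patch): treat all three codes as "operation not supported", keeping in mind that on Linux ENOTSUP and EOPNOTSUPP are the same value, so they cannot appear as two separate switch cases.

```c
#include <errno.h>

/* Illustrative helper: lchmod() on a symlink may fail with ENOSYS (older
 * glibc), ENOTSUP (per the man page), or EOPNOTSUPP (glibc >= 2.32).
 * On Linux, ENOTSUP == EOPNOTSUPP, so use '||' here: two switch cases
 * with the same value would not compile. */
static int chmod_not_supported(int err)
{
    return err == ENOSYS || err == ENOTSUP || err == EOPNOTSUPP;
}
```
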
| |
- Removed the unused ref_count variable
- Reordered the struct to bring related variables closer together
- Changed 'complete' from a '_Bool' to an 'int32_t'
Before:
```
struct _call_frame {
call_stack_t * root; /* 0 8 */
call_frame_t * parent; /* 8 8 */
struct list_head frames; /* 16 16 */
void * local; /* 32 8 */
xlator_t * this; /* 40 8 */
ret_fn_t ret; /* 48 8 */
int32_t ref_count; /* 56 4 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
gf_lock_t lock; /* 64 40 */
void * cookie; /* 104 8 */
_Bool complete; /* 112 1 */
/* XXX 3 bytes hole, try to pack */
glusterfs_fop_t op; /* 116 4 */
struct timespec begin; /* 120 16 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
struct timespec end; /* 136 16 */
const char * wind_from; /* 152 8 */
const char * wind_to; /* 160 8 */
const char * unwind_from; /* 168 8 */
const char * unwind_to; /* 176 8 */
/* size: 184, cachelines: 3, members: 17 */
/* sum members: 177, holes: 2, sum holes: 7 */
/* last cacheline: 56 bytes */
```
After:
```
struct _call_frame {
call_stack_t * root; /* 0 8 */
call_frame_t * parent; /* 8 8 */
struct list_head frames; /* 16 16 */
struct timespec begin; /* 32 16 */
struct timespec end; /* 48 16 */
/* --- cacheline 1 boundary (64 bytes) --- */
void * local; /* 64 8 */
gf_lock_t lock; /* 72 40 */
void * cookie; /* 112 8 */
xlator_t * this; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
ret_fn_t ret; /* 128 8 */
glusterfs_fop_t op; /* 136 4 */
int32_t complete; /* 140 4 */
const char * wind_from; /* 144 8 */
const char * wind_to; /* 152 8 */
const char * unwind_from; /* 160 8 */
const char * unwind_to; /* 168 8 */
/* size: 176, cachelines: 3, members: 16 */
/* last cacheline: 48 bytes */
```
Fixes: #2130
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
| |
* syncop: introduce microsecond sleep support
Introduce the microsecond sleep function synctask_usleep, which can be
used instead of synctask_sleep when more precision is needed.
Change-Id: Ie7a15dda4afc09828bfbee13cb8683713d7902de
* glusterd: use synctask_usleep in glusterd_proc_stop()
glusterd_proc_stop() sleeps 1s waiting for the process to stop before
force-killing it, but in most cases the process stops within 100ms.
This patch uses synctask_usleep to check the process's running state
every 100ms instead of sleeping 1s, which can reduce the stop time by up
to 1s. In some cases, such as enabling quota on 100 volumes, average
execution time dropped from 2500ms to 500ms.
fixes: #2116
Change-Id: I645e083076c205aa23b219abd0de652f7d95dca7
| |
* features/shard: delay unlink of a file that has fd_count > 0
When multiple processes are working on a file and one of them unlinks
it, the unlink operation shouldn't harm the other processes working on
it. This is POSIX-compliant behavior, and it should also be supported
when the shard feature is enabled.
Problem description:
Consider 2 clients C1 and C2 working on a file F1 with 5 shards on a
gluster mount, where the gluster server has 4 bricks B1, B2, B3, B4.
Assume that the base file/shard is on B1, the 1st and 2nd shards on B2,
the 3rd and 4th shards on B3, and the 5th shard on B4. C1 has opened F1
in append mode and is writing to it. The write FOP goes to the 5th shard
in this case, so inode->fd_count = 1 on B1 (base file) and B4 (5th shard).
C2 at the same time issues an unlink of F1. On the server, the base
file has fd_count = 1 (since C1 has opened the file), so the base file
is renamed under .glusterfs/unlink and success is returned to C2. Then
unlink is sent to the shards on all bricks, and the shards on B2 and
B3, which have no open reference yet, are deleted. C1 starts getting
errors while accessing the remaining shards even though it has open
references to the file.
This is one such undefined behavior. We would likewise encounter many
such undefined behaviors, since we don't have one global lock to access
all shards as one. Of course, having such a global lock would hurt
performance, as it reduces the window for parallel access to shards.
Solution:
The above undefined behavior can be addressed by delaying the unlink of
a file while there are open references on it.
File unlink happens in 2 steps:
step 1: the client creates a marker file under .shard/remove_me and
sends unlink on the base file to the server
step 2: on return from the server, the associated shards are cleaned up
and finally the marker file is removed
In step 2, the background deletion process does a nameless lookup using
the marker file name (the marker file is named after the gfid of the
base file) in the .glusterfs/unlink dir. If the nameless lookup
succeeds, the gfid still has open fds and the deletion of shards has to
be delayed. If the nameless lookup fails, the gfid is unlinked and there
are no open fds on that file (the gfid path is unlinked during the final
close on the file). Shards whose deletion was delayed are unlinked once
all open fds are closed; this is done by a thread that wakes up every
10 minutes.
Also removed active_fd_count from the inode structure, referring to
fd_count wherever active_fd_count was used.
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: Iec16d7ff5e05f29255491a43fbb6270c72868999
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I07e5a5bf9d33c24b63da72d4f3f59392c5421652
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I3679de8545f2e5b8027c4d5a6fd0592092e8dfbd
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* Update xlators/storage/posix/src/posix-entry-ops.c
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* Update fd.c
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
| |
CID 1430124
A negative value was being passed to a parameter that cannot be negative.
Modified the value being passed.
Change-Id: I06dca105f7a78ae16145b0876910851fb631e366
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
| |
* locks: remove unused conditional switch to spin_lock code
The use of spin_locks depends on the variable use_spinlocks, but that
code was commented out in the current code base through
https://review.gluster.org/#/c/glusterfs/+/14763/. So there is no use
in having a conditional switch between spin_lock and mutex. This patch
removes the dead code.
Fixes: #1996
Change-Id: Ib005dd86969ce33d3409164ef3e1011bb3169129
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
| |
Removing an extra unused type.
Removing leftovers from the RDMA code.
Fixes: #904
Change-Id: Id5d28622120578b7076d112e355ad8df116021dd
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
| |
* posix: avoiding redundant access of dictionary
This patch fixes the redundant access of the dictionary for the same
information by the macro PL_LOCAL_GET_REQUESTS
fixes: #1707
Change-Id: I48047537436ce920e74bc11cecd9773d7fe4457c
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* posix: avoiding redundant access of dictionary
- Converted the macro SET_BIT to function set_bit
- Removed the code to delete the key GLUSTERFS_INODELK_DOM_COUNT
- Assigned the value to local->bitfield
Change-Id: I101f3fda65e9e75e05907d671203c5d7f072fa8f
Fixes: #1707
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* posix: avoiding redundant access of dictionary
deleted GLUSTERFS_INODELK_DOM_COUNT key
Change-Id: I638269e6a9f6fc11351eaede4c103e032881fe12
Fixes: #1707
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* posix: avoiding redundant access of dictionary
Smoke test warnings fixed.
Fixes: #1707
Change-Id: I8682bd0e49f44cbc1442324e1756b56481f18ccd
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
| |
- In the 'BUMP_THROUGHPUT' macro, changed 'elapsed' from microseconds to nanoseconds.
- In the 'update_ios_latency' function, 'elapsed' is now in nanoseconds.
- In the 'collect_ios_latency_sample' function, removed the conversion from nano to micro;
the value is instead assigned directly to 'tv_nsec' of 'struct timespec'.
- In 'ios_sample_t', changed 'timeval' to 'timespec' to support the above change.
- In the '_io_stats_write_latency_sample' function, changed the formula from 1e+6 to 1e+9,
since 'ios_sample_t' now uses 'timespec'.
- In 'BUMP_THROUGHPUT', '_ios_sample_t', 'collect_ios_latency_sample' & 'update_ios_latency',
changed the 'elapsed' datatype from 'double' to 'int64_t'.
- In glusterfs/libglusterfs/src/glusterfs/common-utils.h, changed the return type of the
'gf_tsdiff' function from 'double' to 'int64_t', since it can return negative values.
- In glusterfs/libglusterfs/src/latency.c, libglusterfs/src/glusterfs/common-utils.h,
xlators/debug/io-stats/src/io-stats.c & xlators/storage/posix/src/posix-helpers.c,
'elapsed' is now of type 'int64_t'.
Fixes: #1825
Signed-off-by: Shree Vatsa N <vatsa@kadalu.io>
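The signed nanosecond difference can be sketched as follows (ts_diff_ns is an illustrative stand-in for the reworked gf_tsdiff; int64_t keeps a negative elapsed time representable):

```c
#include <stdint.h>
#include <time.h>

/* Nanosecond-resolution difference of two timespecs, returned as a
 * signed 64-bit value so that end < begin yields a negative result
 * instead of silently losing the sign. */
static int64_t ts_diff_ns(const struct timespec *begin,
                          const struct timespec *end)
{
    return (int64_t)(end->tv_sec - begin->tv_sec) * 1000000000LL +
           (end->tv_nsec - begin->tv_nsec);
}
```
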
| |
* core: Implement graceful shutdown for a brick process
glusterd sends a SIGTERM to the brick process when stopping a volume if
brick_mux is not enabled. In the brick_mux case, on receiving a
terminate signal for the last brick, the brick process sends a SIGTERM
to its own process to stop itself. The current approach does not clean
up resources when either the last brick is detached or brick_mux is not
enabled.
Solution: glusterd sends a terminate notification to the brick process
when stopping a volume, for a graceful shutdown.
Change-Id: I49b729e1205e75760f6eff9bf6803ed0dbf876ae
Fixes: #1749
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* core: Implement graceful shutdown for a brick process
Resolve some reviewer comments
Fixes: #1749
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I50e6a9e2ec86256b349aef5b127cc5bbf32d2561
* core: Implement graceful shutdown for a brick process
Implement a key, cluster.brick-graceful-cleanup, to enable graceful
shutdown for a brick process. If the key's value is on, glusterd sends a
detach request to stop the brick.
Fixes: #1749
Change-Id: Iba8fb27ba15cc37ecd3eb48f0ea8f981633465c3
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* core: Implement graceful shutdown for a brick process
Resolve reviewer comments
Fixes: #1749
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I2a8eb4cf25cd8fca98d099889e4cae3954c8579e
* core: Implement graceful shutdown for a brick process
Resolve reviewer comment specific to avoiding a memory leak
Fixes: #1749
Change-Id: Ic2f09efe6190fd3776f712afc2d49b4e63de7d1f
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* core: Implement graceful shutdown for a brick process
Resolve reviewer comment specific to avoiding a memory leak
Fixes: #1749
Change-Id: I68fbbb39160a4595fb8b1b19836f44b356e89716
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
| |
fixes: #1888
Change-Id: Ibe336f6f7f19cd148523f65b6fa2b81dca1bd7b6
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
| |
* core: Convert mem_get(0) and mem_put functions to Macors
Problem: Currently mem_get(0) and mem_put functions access
memory pools those are not required while mem-pool
is disabled.
Change-Id: Ief9bdaeb8637f5bc2b097eb6099fb942130e08ae
Solution: Convert mem_get(0) functions as a Macros
Fixes: #1359
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* core: Convert mem_get(0) and mem_put functions to Macros
Resolve reviewer comments
Fixes: #1359
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I8dfdfc1a1cd9906e442271abefc7a635e632581e
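The commit above does not show the macro bodies; a minimal sketch of the idea — collapsing pool accessors to plain allocation when mem-pools are compiled out, so no pool structures are touched — might look like this (the `my_` names are illustrative, not the real GlusterFS API):

```c
#include <stdlib.h>

#define DISABLE_MEMPOOL 1

#ifdef DISABLE_MEMPOOL
/* With pools disabled, "get" is just a zero-filled allocation and
 * "put" a plain free(): no pool bookkeeping code is executed at all. */
#define my_mem_get0(size) calloc(1, (size))
#define my_mem_put(ptr)   free(ptr)
#endif

/* Demo: allocate, verify zero-fill, release. Returns 0 on success. */
static int demo(void)
{
    char *buf = my_mem_get0(64);
    if (!buf)
        return -1;
    int zeroed = (buf[0] == 0 && buf[63] == 0);
    my_mem_put(buf);
    return zeroed ? 0 : -1;
}
```

Because the macro expands at the call site, a disabled-mempool build never even references the pool structures.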
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* core: Optimize _xlator->stats structure to make memory access friendly
The current xlator->stats layout is not efficient for frequently accessed
memory variables; optimize the stats structure to make it access friendly.
Fixes: #1583
Change-Id: I5c9d263b11d9bbf0bf5501e461bdd3cce03591f9
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* core: Optimize _xlator->stats structure to make memory access friendly
Resolve reviewer comments
Fixes: #1583
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I44a728263bfc397158dc95e4a9bae393fd3c9883
* core: Optimize _xlator->stats structure to make memory access friendly
Resolve reviewer comments
Fixes: #1583
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I55e093e3f639052644ce6379cbbe2a15b0ef4be7
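The commit does not show the new layout; one common way to make a frequently-updated counters structure "memory access friendly" is to group hot counters together and align them to a cache line so they don't share lines with cold data (false sharing). Everything below — struct name, fields, the 64-byte line size — is illustrative, not the actual _xlator->stats layout:

```c
#include <stdalign.h>
#include <stdint.h>

#define CACHELINE 64 /* typical x86-64 cache line size (assumption) */

/* Hot, per-fop counters placed together on their own cache line so
 * frequent increments don't invalidate lines holding unrelated data. */
struct hot_stats {
    alignas(CACHELINE) uint64_t total_fops;    /* bumped on every fop */
    uint64_t interval_fops;                    /* also hot */
    /* cold, rarely-read fields would live on separate cache lines */
};
```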
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a part of offensive language removal, we changed 'master' to 'primary' in
some parts of the code that are *not* related to geo-replication via
commits e4c9a14429c51d8d059287c2a2c7a76a5116a362 and
0fd92465333be674485b984e54b08df3e431bb0d.
But it is better to use 'root' in some places to distinguish it from the
geo-rep changes which use 'primary/secondary' instead of 'master/slave'.
This patch mainly changes glusterfs_ctx_t->primary to
glusterfs_ctx_t->root. Other places like meta xlator is also changed.
gf-changelog.c is not changed since it is related to geo-rep.
Updates: #1000
Change-Id: I3cd610f7bea06c7a28ae2c0104f34291023d1daf
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* runner: moving to posix_spawnp instead of fork
Removed the fork() and implemented
posix_spawnp() accordingly, as it provides much better
performance than fork(). A more detailed description of
the benefits can be found in the description of the issue
linked below.
Fixes: #810
Signed-off-by: nik-redhat <nladha@redhat.com>
* Added the close_fds_except call
Signed-off-by: nik-redhat <nladha@redhat.com>
* Added comments
Signed-off-by: nik-redhat <nladha@redhat.com>
* Made the functions static
Signed-off-by: nik-redhat <nladha@redhat.com>
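The fork-to-posix_spawnp change described above can be sketched roughly as follows; `spawn_and_wait` is an illustrative helper, not the actual runner code. posix_spawnp() avoids duplicating the parent's address space, which is the performance win on large processes:

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn argv[0] (searched on PATH) and wait for it.
 * Returns the child's exit status, or -1 on error. */
static int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The real change also uses file actions to close inherited descriptors (the `close_fds_except` call mentioned above), which posix_spawn file actions can express without the fork window.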
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the glusterd handshake, glusterd receives a volume dictionary
from the peer end to compare against its own volume dictionary data. If
the options differ, it sets a key to indicate that volume options have
changed and calls the import synctask to delete/start the volume. In a
brick_mux environment, while the number of volumes is high (5k), the dict
API in the function glusterd_compare_friend_volume takes time because the
function glusterd_handle_friend_req saves all peer volume data in a single
dictionary. Due to the time taken by that function, RPC requests receive
a call_bail from the peer end and gluster (CLI) won't be able to show volume status.
Solution: To optimize the code, the changes below were done:
1) Populate a new, specific dictionary to save the peer end's version-specific
data so that the function won't take much time to decide whether the
peer end has some volume updates.
2) In case a volume has a differing version, set the key in status_arr instead
of saving it in a dictionary to make the operation faster.
Note: To validate the changes followed below procedure
1) Setup 5100 distributed volumes 3x1
2) Enable brick_mux
3) Start all the volumes
4) Kill all gluster processes on 3rd node
5) Run a loop to update volume option on a 1st node
for i in {1..5100}; do gluster v set vol$i performance.open-behind off; done
6) Start the glusterd process on the 3rd node
7) Wait to finish handshake and check there should not be any call_bail message
in the logs
Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
Fixes: #1613
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* core: tcmu-runner process continuous growing logs lru_size showing -1
At the time of calling inode_table_prune, it checks whether the current
lru_size is greater than lru_limit, but if lru_list is empty it throws the
log message "Empty inode lru list found but with (%d) lru_size". As per code
reading, it seems lru_size is out of sync with the actual number of inodes in
lru_list. Due to the continuously thrown error messages the entire disk
gets full, and the user has to restart the tcmu-runner process to use
the volumes. The log message was introduced by the patch
https://review.gluster.org/#/c/glusterfs/+/15087/.
Solution: Introduce a flag in_lru_list to decide whether an inode is
part of lru_list or not.
Fixes: #1775
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I4b836bebf4b5db65fbf88ff41c6c88f4a7ac55c1
* core: tcmu-runner process continuous growing logs lru_size showing -1
Update in_lru_list flag only while modify lru_size
Fixes: #1775
Change-Id: I3bea1c6e748b4f50437999bae59edeb3d7677f47
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* core: tcmu-runner process continuous growing logs lru_size showing -1
Resolve comments in inode_table_destroy and inode_table_prune
Fixes: #1775
Change-Id: I5aa4d8c254f0fe374daa5ec604f643dea8dd56ff
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* core: tcmu-runner process continuous growing logs lru_size showing -1
Update in_lru_list only while update lru_size
Fixes: #1775
Change-Id: I950eb1f0010c3d4bcc44a33225a502d2291d1a83
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#1814)
Comments and idea proposed by: Xavi Hernandez(jahernan@redhat.com):
On production systems sometimes we see a log message saying that an assertion
has failed. But it's hard to track why it failed without additional information
(on debug builds, a GF_ASSERT() generates a core dump and kills the process,
so it can be used to debug the issue, but many times we are only able to
reproduce assertion failures on production systems, where GF_ASSERT() only logs
a message and continues).
In other cases we may have a core dump caused by a bug, but the core dump doesn't
necessarily happen when the bug has happened. Sometimes the crash happens so much
later that the causes that triggered the bug are lost. In these cases we can add
more assertions to the places that touch the potential candidates to cause the bug,
but the only thing we'll get is a log message, which may not be enough.
One solution would be to always generate a core dump in case of assertion failure,
but this was already discussed and it was decided that it was too drastic. If a
core dump was really needed, a new macro was created to do so: GF_ABORT(),
but GF_ASSERT() would continue to not kill the process on production systems.
I'm proposing to modify GF_ASSERT() on production builds so that it conditionally
triggers a signal when a debugger is attached. When this happens, the debugger
will generate a core dump and continue the process as if nothing had happened.
If there's no debugger attached, GF_ASSERT() will behave as always.
The idea I have is to use SIGCONT to do that. This signal is harmless, so we can
unmask it (we currently mask all unneeded signals) and raise it inside a GF_ASSERT()
when some global variable is set to true.
To produce the core dump, run the script extras/debug/gfcore.py in another
terminal. gdb breaks and produces a core dump when GF_ASSERT is hit.
The script is copied from #1810 which is written by Xavi Hernandez(jahernan@redhat.com)
Fixes: #1810
Change-Id: I6566ca2cae15501d8835c36f56be4c6950cb2a53
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
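The mechanism proposed above can be sketched as follows; `MY_ASSERT` and `debugger_attached` are illustrative stand-ins for the real GF_ASSERT macro and its global flag. SIGCONT is harmless by default, so raising it only affects the process when a debugger has set a catchpoint on it:

```c
#include <signal.h>
#include <stdio.h>

/* Set externally (e.g. by an attached gdb driven by the gfcore.py
 * helper) to make assertion failures observable without killing the
 * process on production builds. */
static volatile sig_atomic_t debugger_attached = 0;

#define MY_ASSERT(x)                                                     \
    do {                                                                 \
        if (!(x)) {                                                      \
            fprintf(stderr, "Assertion failed: %s\n", #x);               \
            if (debugger_attached)                                       \
                raise(SIGCONT); /* debugger dumps core, then resumes */  \
        }                                                                \
    } while (0)
```

With no debugger attached the macro only logs, matching the existing production behavior; with one attached, the raised SIGCONT gives gdb a hook to snapshot a core and continue.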
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the posix xlator spawns one posix_disk_space thread per brick. In
a brick_mux environment, when glusterd attaches bricks at the maximum
level (250) to a single brick process, 250 such threads are spawned for
all bricks and the brick process memory size also increases.
Solution: Attach the posix_disk_space thread to glusterfs_ctx to
spawn one thread per process instead of one per brick.
Fixes: #1482
Change-Id: I8dd88f252a950495b71742e2a7588bd5bb019ec7
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
core:change xlator_t->ctx->master to xlator_t->ctx->primary
afr: just changed comments.
meta: change .meta/master to .meta/primary. Might break scripts.
changelog: variable/function name changes only.
These are unrelated to geo-rep.
Fixes: #1713
Change-Id: I58eb5fcd75d65fc8269633acc41313503dccf5ff
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
issue: gf_store_read_and_tokenize() returns the address
of a locally scoped string.
fix: pass the buf to gf_store_read_and_tokenize() and
use it for tokenizing.
CID: 1430143
Updates: #1060
Change-Id: Ifc346540c263f58f4014ba2ba8c1d491c20ac609
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a brick_mux environment an shd process consumes a lot of memory.
After printing the statedump I found it allocates 1M per afr xlator
for all bricks. With 4k volumes configured it consumes almost 6G total
RSS, of which 4G is consumed by inode tables:
[cluster/replicate.test1-replicate-0 - usage-type gf_common_mt_list_head memusage]
size=1273488
num_allocs=2
max_size=1273488
max_num_allocs=2
total_allocs=2
The inode_new_table function allocates memory (1M) for the inode and dentry
hash lists. For shd the lru_limit size is 1, so we don't need to create a
big hash table; to reduce the RSS size of the shd process, pass an optimal
bucket count at the time of creating the inode_table.
Change-Id: I039716d42321a232fdee1ee8fd50295e638715bb
Fixes: #1538
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Added latency tracking of rpc-handling code. With this change we
should be able to monitor the amount of time the rpc-handling code
consumes for each rpc call.
fixes: #1466
Change-Id: I04fc7f3b12bfa5053c0fc36885f271cb78f581cd
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Extend '--enable-valgrind' to '--enable-valgrind[=memcheck,drd]'
to enable Memcheck or DRD Valgrind tool, respectively.
Change-Id: I80d13d72ba9756e0cbcdbeb6766b5c98e3e8c002
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #1002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
gf_rev_dns_lookup_cached() allocated struct dnscache->dict if it was null
but the freeing was left to the caller.
Fix:
Moved dict allocation and freeing into corresponding init and fini
routines so that it's easier for the caller to avoid such leaks.
Updates: #1000
Change-Id: I90d6a6f85ca2dd4fe0ab461177aaa9ac9c1fbcf9
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
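The init/fini pairing described above — the structure owning its inner dictionary so callers can never leak it — can be sketched like this. The `dict` member is modeled as an opaque allocation; the real code uses GlusterFS dict_t with dict_new()/dict_unref():

```c
#include <stdlib.h>

struct dnscache {
    void *dict; /* owned by the cache; stands in for a real dict_t */
};

/* Allocate the cache together with its dictionary. */
static struct dnscache *dnscache_init(void)
{
    struct dnscache *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    c->dict = calloc(1, 128); /* stands in for dict_new() */
    if (!c->dict) {
        free(c);
        return NULL;
    }
    return c;
}

/* Release the dictionary and the cache in one place. */
static void dnscache_fini(struct dnscache *c)
{
    if (!c)
        return;
    free(c->dict); /* stands in for dict_unref() */
    free(c);
}
```

Callers now only pair init with fini; they never touch the inner member, which is what closes the leak described in the commit.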
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Glusterfs so far constrained itself with an arbitrary limit (32)
for the number of groups read from /proc/[pid]/status (this was
the number of groups shown there prior to Linux commit
v3.7-9553-g8d238027b87e (v3.8-rc1~74^2~59); since this commit, all
groups are shown).
With this change we'll read groups up to the number Glusterfs
supports in general (64k).
Note: the actual number of groups that are made use of in a
regular Glusterfs setup shall still be capped at ~93 due to limitations
of the RPC transport. To be able to handle more groups than that,
brick side gid resolution (server.manage-gids option) can be used along
with NIS, LDAP or other such networked directory service (see
https://github.com/gluster/glusterdocs/blob/5ba15a2/docs/Administrator%20Guide/Handling-of-users-with-many-groups.md#limit-in-the-glusterfs-protocol
).
Also adding some diagnostic messages to frame_fill_groups().
Change-Id: I271f3dc3e6d3c44d6d989c7a2073ea5f16c26ee0
fixes: #1075
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Add gf_tvdiff() and gf_tsdiff() to calculate the difference
between 'struct timeval' and 'struct timespec' values, use
them where appropriate.
Change-Id: I172be06ee84e99a1da76847c15e5ea3fbc059338
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #1002
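A plausible shape for the two helpers described above — differences returned in microseconds for 'struct timeval' and nanoseconds for 'struct timespec'. The names and exact signatures here are illustrative; the real gf_tvdiff()/gf_tsdiff() may differ:

```c
#include <sys/time.h>
#include <time.h>

/* Microseconds elapsed from *a to *b. */
static long long tv_diff_us(const struct timeval *a, const struct timeval *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000LL +
           (b->tv_usec - a->tv_usec);
}

/* Nanoseconds elapsed from *a to *b. */
static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
           (b->tv_nsec - a->tv_nsec);
}
```

Doing the widening multiply before adding the sub-second part avoids both overflow and the classic borrow bug when the later timestamp has a smaller usec/nsec field.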
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In the commit fb20713b380e1df8d7f9e9df96563be2f9144fd6 we used a
synctask to close fds, but we have found that the patch reduces
performance.
Solution: Use a janitor thread to close fds, save the pfd ctx in the
ctx janitor list, and also save the posix xlator in the pfd object to
avoid a race condition during cleanup in a brick_mux environment.
Change-Id: Ifb3d18a854b267333a3a9e39845bfefb83fbc092
Fixes: #1396
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
| |
fixes: #1428
Change-Id: I0cb1c42d620ac1aeab8da25a2e1d7835219d2e4a
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
|
|
|
|
|
| |
Change-Id: Ieebd9a54307813954011ac8833824831dce6da10
Fixes: #1376
|
|
|
|
|
|
|
|
|
| |
Use trivial no-op mempool if configured with --disable-mempool.
Cleanup OLD_MEM_POOLS leftovers, adjust related statedumps.
Change-Id: Ibaa90e538a34f6dcd216e45c05dd32d955b151f6
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1359
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue:
If the pointer tmptier is destroyed in the function, the code still
checks for it in the out label and tries to destroy the same pointer
again.
Fix:
Instead of passing the ptr by value, pass it by reference; then, on
NULLing the ptr in the function, the change persists in the calling
function, and the next time gf_store_iter_destroy() is called it
won't try to free the ptr again.
CID: 1430122
Updates: #1060
Change-Id: I019cea8e301c7cc87be792c03b58722fc96f04ef
Signed-off-by: nik-redhat <nladha@redhat.com>
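The pass-by-reference fix described above is a standard double-free defense; a minimal sketch (illustrative name, not the real gf_store_iter_destroy signature):

```c
#include <stdlib.h>

/* Taking char ** lets the destroy helper NULL the caller's pointer,
 * so a second call on the same variable is a harmless no-op instead
 * of a double free. */
static void destroy_iter(char **ptr)
{
    if (!ptr || !*ptr)
        return;
    free(*ptr);
    *ptr = NULL; /* the caller's copy is cleared too */
}
```

Had the pointer been passed by value, the caller's variable would still hold the stale address after the free, and the check in the out label would see it as live.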
|
|
|
|
|
|
|
|
|
| |
Add thin convenient library wrapper gf_time(),
adjust related users and comments as well.
Change-Id: If8969af2f45ee69c30c3406bce5baa8305fb7f80
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #1002
|
|
|
|
|
|
|
|
|
|
| |
If --enable-tsan is specified and API headers are detected,
annotate synctask context initialization and context switch
with ThreadSanitizer API.
Change-Id: I7ac4085d7ed055448f1fc80df7d59905d129f186
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1400
|
|
|
|
|
|
|
|
|
|
|
|
| |
To return the length of a serialized dictionary, the function currently
iterates over the full dictionary and accesses every key/value member.
Instead of iterating the full dictionary, introduce a variable totkvlen
in the dictionary to accumulate the key/value lengths at the time of
storing each key/value pair in the dictionary.
Change-Id: Ie8cfdea1cc335bce51f59179281df3c89afab68b
Fixes: #1395
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
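The running-total optimization described above can be sketched with a toy dictionary; everything here (struct and function names) is illustrative, not the real dict_t code:

```c
#include <stddef.h>
#include <string.h>

struct mini_dict {
    size_t totkvlen; /* bytes of all keys + values stored so far */
    int count;
};

/* Maintain the total as pairs are inserted... */
static void mini_dict_set(struct mini_dict *d, const char *key,
                          const char *val)
{
    d->totkvlen += strlen(key) + strlen(val);
    d->count++;
    /* real code would also store the pair itself */
}

/* ...so the serialized key/value length is O(1) to answer
 * instead of requiring a walk over every entry. */
static size_t mini_dict_serialized_kvlen(const struct mini_dict *d)
{
    return d->totkvlen;
}
```

The trade-off is that every mutation path (set, delete, replace) must keep the counter in sync, which is why it is updated at the single point where pairs are stored.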
|
|
|
|
|
|
| |
Fixes: #1399
Change-Id: I11cf75a0ea9a16724f36f73feb1c90dabed25c4b
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scenario of setting an xattr to a dir, killing one of the bricks,
removing the xattr, bringing back the brick results in xattr
inconsistency - The downed brick will still have the xattr, but the rest
won't.
This patch adds a mechanism that removes the extra xattrs during
lookup.
This patch is a modification to a previous patch based on comments that
were made after merge:
https://review.gluster.org/#/c/glusterfs/+/24613/
fixes: #1324
Change-Id: Ifec0b7aea6cd40daa8b0319b881191cf83e031d1
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
These macros help to clearly identify that all negative checks are 'errors'
and all values of 0 and above are success. With this classification, we can
add more error codes to the process / method / function.
Updates: #280
Change-Id: I0ebc5c4ad41eb78e4f2c1b98648986be62e7b521
Signed-off-by: Amar Tumballi <amar@kadalu.io>
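The convention described above — negative means error, zero or positive means success, leaving the whole negative range free for distinct error codes — could be expressed as macros like these (names illustrative):

```c
/* Any negative return is an error; 0 and above is success.
 * This leaves room for many distinct negative error codes. */
#define IS_ERROR(r)   ((r) < 0)
#define IS_SUCCESS(r) ((r) >= 0)
```

Call sites then read `if (IS_ERROR(ret))` instead of the ambiguous `if (ret)` or `if (ret == -1)`, which only catches one error code.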
|
|
|
|
|
|
|
|
|
| |
Replace an over-engineered GF_SKIP_IRRELEVANT_ENTRIES() with
inline function gf_irrelevant_entry(), adjust related users.
Change-Id: I6f66c460f22a82dd9ebeeedc2c55fdbc10f4eec5
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1350
|
|
|
|
|
|
|
|
| |
Convert an ad-hoc hack to a regular library function gf_syncfs().
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Change-Id: I3ed93e9f28f22c273df1466ba4a458eacb8df395
Fixes: #1329
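A gf_syncfs()-style wrapper plausibly looks like the sketch below: open the directory, call Linux-specific syncfs(2) on its filesystem, and report success or failure. The name `my_syncfs` is illustrative, and the real helper may carry a portability fallback to sync():

```c
#define _GNU_SOURCE /* syncfs() is a GNU/Linux extension */
#include <fcntl.h>
#include <unistd.h>

/* Sync the filesystem containing `path`.
 * Returns 0 on success, -1 on error. */
static int my_syncfs(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int ret = syncfs(fd);
    close(fd);
    return ret;
}
```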
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found with GCC UBsan:
rpcsvc.c:102:36: runtime error: passing zero to ctz(), which is not a valid argument
#0 0x7fcd1ff6faa4 in rpcsvc_get_free_queue_index /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:102
#1 0x7fcd1ff81e12 in rpcsvc_handle_rpc_call /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:837
#2 0x7fcd1ff833ad in rpcsvc_notify /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:1000
#3 0x7fcd1ff8829d in rpc_transport_notify /path/to/glusterfs/rpc/rpc-lib/src/rpc-transport.c:520
#4 0x7fcd0dd72f16 in socket_event_poll_in_async /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2502
#5 0x7fcd0dd8986a in gf_async ../../../../libglusterfs/src/glusterfs/async.h:189
#6 0x7fcd0dd8986a in socket_event_poll_in /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2543
#7 0x7fcd0dd8986a in socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2934
#8 0x7fcd0dd8986a in socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2854
#9 0x7fcd2048aff7 in event_dispatch_epoll_handler /path/to/glusterfs/libglusterfs/src/event-epoll.c:640
#10 0x7fcd2048aff7 in event_dispatch_epoll_worker /path/to/glusterfs/libglusterfs/src/event-epoll.c:751
...
Fix, simplify, and prefer 'unsigned long' as underlying bitmap type.
Change-Id: If3f24dfe7bef8bc7a11a679366e219a73caeb9e4
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1283
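The fix pattern for the UBSan report above is simply to guard the ctz call: `__builtin_ctzl()` is undefined for a zero argument, so the (inverted) bitmap word must be tested first. This is an illustrative reconstruction, not the exact rpcsvc code:

```c
#include <stddef.h>

/* Find and claim the lowest free slot in a bitmap of busy bits.
 * Returns the slot index, or -1 if every slot is busy. */
static int free_queue_index(unsigned long *map, int nwords)
{
    for (int i = 0; i < nwords; i++) {
        unsigned long inverted = ~map[i];
        if (inverted != 0) { /* guard: never pass 0 to ctz */
            int bit = __builtin_ctzl(inverted);
            map[i] |= 1UL << bit; /* mark the slot busy */
            return i * (int)(8 * sizeof(unsigned long)) + bit;
        }
    }
    return -1;
}
```

Using 'unsigned long' as the word type, as the commit suggests, matches the `l` variant of the builtin and keeps the shift widths consistent.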
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Logs and other output carrying timestamps
will now have timezone offsets indicated, e.g.:
[2020-03-12 07:01:05.584482 +0000] I [MSGID: 106143] [glusterd-pmap.c:388:pmap_registry_remove] 0-pmap: removing brick (null) on port 49153
To this end,
- gf_time_fmt() now inserts timezone offset via %z strftime(3) template.
- A new utility function has been added, gf_time_fmt_tv(), that
takes a struct timeval pointer (*tv) instead of a time_t value to
specify the time. If tv->tv_usec is negative,
gf_time_fmt_tv(... tv ...)
is equivalent to
gf_time_fmt(... tv->tv_sec ...)
Otherwise it also inserts tv->tv_usec to the formatted string.
- Building timestamps of usec precision has been converted to
gf_time_fmt_tv, which is necessary because the method of appending
a period and the usec value to the end of the timestamp does not work
if the timestamp has zone offset, but it's also beneficial in terms of
eliminating repetition.
- The buffer passed to gf_time_fmt/gf_time_fmt_tv has been unified to
be of GF_TIMESTR_SIZE size (256). We need slightly larger buffer space
to accommodate the zone offset and it's preferable to use a buffer
which is undisputedly large enough.
This change does *not* do the following:
- Retaining a method of timestamp creation without timezone offset.
As to my understanding we don't need such backward compatibility
as the code just emits timestamps to logs and other diagnostic
texts, and doesn't do any later processing on them that would rely
on their format. An exception to this, ie. a case where timestamp
is built for internal use, is graph.c:fill_uuid(). As far as I can
see, what matters in that case is the uniqueness of the produced
string, not the format.
- Implementing a single-token (space free) timestamp format.
While some timestamp formats used to be single-token, now all of
them will include a space preceding the offset indicator. Again,
I did not see a use case where this could be significant in terms
of representation.
- Moving the codebase to a single unified timestamp format and
dropping the fmt argument of gf_time_fmt/gf_time_fmt_tv.
While the gf_timefmt_FT format is almost ubiquitous, there are
a few cases where different formats are used. I'm not convinced
there is any reason to not use gf_timefmt_FT in those cases too,
but I did not want to make a decision in this regard.
Change-Id: I0af73ab5d490cca7ed8d07a2ce7ac22a6df2920a
Updates: #837
Signed-off-by: Csaba Henk <csaba@redhat.com>
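The formatting described above — usec precision plus a %z zone offset in one string — can be sketched as follows. `fmt_ts` is illustrative and does not match the exact gf_time_fmt_tv() signature; the 256-byte buffer mirrors the unified GF_TIMESTR_SIZE mentioned above:

```c
#include <stdio.h>
#include <time.h>

#define TIMESTR_SIZE 256 /* mirrors the enlarged unified buffer */

/* Format "YYYY-MM-DD HH:MM:SS.uuuuuu +zzzz" for the given seconds
 * and microseconds, inserting the zone offset via strftime %z. */
static void fmt_ts(char *buf, size_t len, time_t sec, long usec)
{
    struct tm tm;
    char base[TIMESTR_SIZE];
    char zone[16];

    localtime_r(&sec, &tm);
    strftime(base, sizeof(base), "%Y-%m-%d %H:%M:%S", &tm);
    strftime(zone, sizeof(zone), "%z", &tm);
    snprintf(buf, len, "%s.%06ld %s", base, usec, zone);
}
```

Note how the usec part is inserted before the zone suffix: appending ".usec" to the end of an already zone-suffixed string, the old method, would land after the offset, which is why the commit moves usec handling into the formatter itself.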
|
|
|
|
|
|
|
|
|
| |
Using the 'const' qualifier on a function return type has no effect in
ISO C, as reported with -Wignored-qualifiers of gcc-10 and clang-10.
Change-Id: I83de7ab4c8255284683bb462cd9f584ebf0f983b
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1249
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a critical flaw in the previous implementation of open-behind.
When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.
To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.
To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.
The whole logic of handling background opens has been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.
Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.
Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current scaling of the syncop thread pool is not working properly
and can leave some tasks in the run queue for more time than necessary
when the maximum number of threads is not reached.
This patch provides a better scaling condition to react faster to
pending work.
Condition variables and sleep in the context of a synctask have also
been implemented. Their purpose is to replace regular condition
variables and sleeps that block synctask threads and prevent other
tasks from being executed.
The new features have been applied to several places in glusterd.
Change-Id: Ic50b7c73c104f9e41f08101a357d30b95efccfbf
Fixes: #1116
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
| |
fixes: #1204
Change-Id: Ied5d4d553771ff315ed3f1a7229f96733fe7ed00
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|