* glusterd: After upgrade to release 9.1 the glusterd protocol is broken
After an upgrade to release-9 the glusterd protocol is broken
because glusterd on the upgraded nodes is not able to find an
actor at the expected index in the rpc procedure table. The new proc
(GLUSTERD_MGMT_V3_POST_COMMIT) was introduced by a patch
(https://review.gluster.org/#/c/glusterfs/+/24771/) in the middle of
the table; because of that, the index of the existing actors changed
and glusterd on the upgraded nodes is failing.
Solution: Move the proc (GLUSTERD_MGMT_V3_POST_COMMIT) to the last
position in the proc table to avoid the issue.
Fixes: #2351
Change-Id: I36575fd4302944336a75a8d4a305401a7128fd84
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
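The compatibility constraint above can be sketched with a toy dispatch table. Only GLUSTERD_MGMT_V3_POST_COMMIT comes from the commit message; the other entries and the dispatch helper are invented for illustration, not the real glusterd proc table:

```python
# Peers dispatch RPC actors by numeric index, so inserting a new proc in
# the middle of the table shifts every later entry. Table contents here
# are illustrative placeholders, not actual glusterd procedures.
OLD_TABLE = ["NULL", "LOCK", "PRE_VALIDATE", "COMMIT", "UNLOCK"]

# Wrong: inserting POST_COMMIT after COMMIT shifts UNLOCK from index 4 to 5.
BROKEN_TABLE = ["NULL", "LOCK", "PRE_VALIDATE", "COMMIT", "POST_COMMIT", "UNLOCK"]

# Right: appending at the end leaves every pre-existing index unchanged.
FIXED_TABLE = OLD_TABLE + ["POST_COMMIT"]

def dispatch(table, idx):
    # Return the actor an old peer reaches when it sends procedure 'idx'.
    return table[idx] if idx < len(table) else None

# An old (non-upgraded) peer sending index 4 expects UNLOCK.
assert dispatch(OLD_TABLE, 4) == "UNLOCK"
assert dispatch(BROKEN_TABLE, 4) == "POST_COMMIT"   # mismatch: protocol break
assert dispatch(FIXED_TABLE, 4) == "UNLOCK"         # compatible
```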
Problem-1:
When an overlapping lock is issued, the merged lock is not assigned an
owner. When flush is issued on the fd, this particular lock is not freed,
leading to a memory leak.
Fix-1:
Assign the owner while merging the locks.
Problem-2:
On fd-destroy, lock structs could still be present in the fdctx. For some
reason, with the flock -x command and closing of the bash fd, this code
path is hit, which leaks the lock structs.
Fix-2:
When the fdctx is being destroyed in the client, make sure to clean up
any lock structs.
fixes: #2337
Change-Id: I298124213ce5a1cf2b1f1756d5e8a9745d9c0a1c
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
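Fix-1 can be sketched in miniature: a merged byte-range lock must carry an owner, or the flush path (which frees locks by owner) never matches it. This is an illustrative toy, not the locks xlator's actual structures; all names are made up:

```python
# Toy byte-range lock with an owner field, mirroring the leak scenario.
class Lock:
    def __init__(self, start, end, owner=None):
        self.start, self.end, self.owner = start, end, owner

def merge_locks(a, b):
    # The fix: the merged lock inherits an owner so flush can find it.
    merged = Lock(min(a.start, b.start), max(a.end, b.end))
    merged.owner = a.owner or b.owner
    return merged

def flush_owner(locks, owner):
    # Releases every lock belonging to 'owner'; an ownerless merged lock
    # would survive here forever -- the memory leak from Problem-1.
    return [l for l in locks if l.owner != owner]

m = merge_locks(Lock(0, 100, owner="c1"), Lock(50, 200, owner="c1"))
assert (m.start, m.end, m.owner) == (0, 200, "c1")
assert flush_owner([m], "c1") == []   # merged lock is freed on flush
```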
Issue:
The `for` loop was executed only once, leading to structurally
dead code reported by coverity.
Fix:
Updated the code to use an `if` condition instead of the
`for` loop.
CID: 1437779
Updates: #1060
Change-Id: I2ca1d2c9d2842d586161fe971bb8c7b3444dfb2b
Signed-off-by: nik-redhat <nladha@redhat.com>
Current implementation of rebalance for sparse files has a bug that,
in some cases, causes a read of 0 bytes from the source subvolume.
Posix xlator doesn't allow 0 byte reads and fails them with EINVAL,
which causes rebalance to abort the migration.
This patch implements a more robust way of finding data segments in
a sparse file that avoids 0 byte reads, allowing the file to be
migrated successfully.
Fixes: #2317
Change-Id: Iff168dda2fb0f2edf716b21eb04cc2cc8ac3915c
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
The force option fails for the snapshot create command even when
the quorum is satisfied, and it is redundant.
This change deprecates the force option for the snapshot create command
and checks whether all bricks are online, instead of checking for quorum,
when creating a snapshot.
Fixes: #2099
Change-Id: I45d866e67052fef982a60aebe8dec069e78015bd
Signed-off-by: Nishith Vihar Sakinala <nsakinal@redhat.com>
* marker: initiate quota xattrs for empty volume
When a volume is empty, listing quota info fails after setting limit-usage.
# gluster volume quota gv0 list
/ N/A N/A N/A N/A N/A N/A
This is because there is no QUOTA_SIZE_KEY in the xattrs of the volume's root directory.
# getfattr -d -m. -e hex /data/brick2/gv0
getfattr: Removing leading '/' from absolute path names
# file: data/brick2/gv0
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.mdata=0x01000000000000000000000000603e70f6000000003b3f3c8000000000603e70f6000000003351d14000000000603e70f9000000000ff95b00
trusted.glusterfs.quota.limit-set.1=0x0000000000a00000ffffffffffffffff
trusted.glusterfs.volume-id=0xe27d61be048c4195a9e1ee349775eb59
This patch fixes it by setting QUOTA_SIZE_KEY on the empty volume's root directory when quota is enabled.
# gluster volume quota gv0 list
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ 4.0MB 80%(3.2MB) 0Bytes 4.0MB No No
Fixes: #2260
Change-Id: I6ab3e43d6ef33e5ce9531b48e62fce9e8b3fc555
Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
Fixing "Null pointer dereferences"
fixes: #2129
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
When client.strict-locks is enabled on a volume and there are POSIX
locks held on the files, do not re-open such fds after a disconnect and
reconnection of the clients, as that might lead to multiple clients
acquiring the locks and cause data corruption.
Change-Id: I8777ffbc2cc8d15ab57b58b72b56eb67521787c5
Fixes: #1977
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Problem:
Since commit bd540db1e, eager-locking was enabled for fsync. But on
certain VM workloads with sharding enabled, the shard xlator keeps sending
fsync on the base shard. This can cause blocked inodelks from other
clients (including shd) to time out due to call bail.
Fix:
Make afr fsync aware of inodelk count and not delay post-op + unlock
when inodelk count > 1, just like writev.
Code is restructured so that any fd based AFR_DATA_TRANSACTION can be made
aware by setting GLUSTERFS_INODELK_DOM_COUNT in xdata request.
Note: We do not know yet why VMs go into a paused state because of the
blocked inodelks, but this patch should be a first step in reducing the
occurrence.
Updates: #2198
Change-Id: Ib91ebdd3101d590c326e69c829cf9335003e260b
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Problem:
On a cluster with 15 million files, when fix-layout was started, it was
not progressing at all. So we tried to do a os.walk() + os.stat() on the
backend filesystem directly. It took 2.5 days. We removed os.stat() and
re-ran it on another brick with similar data-set. It took 15 minutes. We
realized that readdirp is extremely costly compared to readdir if the
stat is not useful. fix-layout operation only needs to know that the
entry is a directory so that fix-layout operation can be triggered on
it. Most of the modern filesystems provide this information in readdir
operation. We don't need readdirp i.e. readdir+stat.
Fix:
Use readdir operation in fix-layout. Do readdir+stat/lookup for
filesystems that don't provide d_type in readdir operation.
fixes: #2241
Change-Id: I5fe2ecea25a399ad58e31a2e322caf69fc7f49eb
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
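The readdir-vs-readdirp idea above can be sketched with Python's os.scandir(), which exposes the d_type information from readdir when the filesystem provides it and falls back to stat only when needed. This is an illustration of the technique, not the dht fix-layout code:

```python
import os
import tempfile

def list_subdirs(path):
    # Find directory entries without a per-entry stat() where possible:
    # entry.is_dir() uses d_type from readdir when the filesystem fills
    # it in, which is exactly what makes readdir so much cheaper than
    # readdirp (readdir + stat) for fix-layout-style scans.
    dirs = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
    return sorted(dirs)

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "subdir"))
open(os.path.join(root, "file"), "w").close()
assert list_subdirs(root) == ["subdir"]
```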
fixes: #2268
Change-Id: If00ee847e15ac7f7e5b0e12125a7d02a610b9708
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
Also moved options to NO_DOC
Change-Id: I86623f4139d156812e622a87655483c9d2491052
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
1447088 - Resource leak
1447089 - Buffer not null terminated
updates: #2216
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
priv->root_inode seems to be a remnant of the pump xlator and was getting
populated in the discover code path. The thin-arbiter code used it to
populate loc info, but it seems that in the case of some daemons, like
quotad, the discover path for the root gfid is not hit, causing a crash.
Fix:
The root inode can be accessed via this->itable->root, so use that and
remove the priv->root_inode instances from the afr code.
Fixes: #2234
Change-Id: Iec59c157f963a4dc455652a5c85a797d00cba52a
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
At the moment dht rebalance doesn't give any option to disable fsync
after data migration. Making this an option lets admins take
responsibility for their data in a way that is suitable for their cluster.
The default value is still 'on', so the behavior is intact for people
who don't care about this.
For example: If the data that is going to be migrated is already backed
up or snapshotted, there is no need for fsync to happen right after
migration which can affect active I/O on the volume from applications.
fixes: #2258
Change-Id: I7a50b8d3a2f270d79920ef306ceb6ba6451150c4
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
CID: 1214629,1274235,1437648
The buffer has been null-terminated, thus resolving the issue.
Change-Id: Ieb1d067d8dd860c55a8091dd6fbba1bcbb4dc19f
Updates: #1060
Signed-off-by: Nishith Vihar Sakinala <nsakinal@redhat.com>
Problem:
In dht_queue_readdir(p), 'frame' is accessed after the unwind. This leads
to undefined behavior, as the frame is freed upon unwind.
Fix:
Store the variables that are needed in local variables and use them
instead.
fixes: #2239
Change-Id: I6b2e48e87c85de27fad67a12d97abd91fa27c0c1
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* features/index: Optimize link-count fetching code path
Problem:
AFR requests 'link-count' in lookup to check if there are any pending
heals. Based on this information, afr will set dirent->inode to NULL in
readdirp when heals are ongoing to prevent serving bad data. When heals
are completed, fetching the link-count xattr leads to an opendir of the
xattrop directory and then reading its contents, just to figure out on
every lookup that no healing is needed. This was not detected until this
github issue because ZFS in some cases can have very slow readdir()
calls. Since Glusterfs does a lot of lookups, this was slowing down
all operations and increasing load on the system.
Code problem:
On any xattrop operation, the index xlator adds an index to the relevant
dirs, and after the xattrop operation is done, it deletes or keeps the
index in that directory based on the value fetched in the xattrop from
posix. AFR sends an all-zero xattrop for changelog xattrs. This leads to
priv->pending_count manipulation which sets the count back to -1. The
next lookup operation then triggers opendir/readdir to find the actual
link-count, because the in-memory priv->pending_count is negative.
Fix:
1) Don't add to index on all-zero xattrop for a key.
2) Set pending-count to -1 when the first gfid is added into xattrop
directory, so that the next lookup can compute the link-count.
fixes: #1764
Change-Id: I8a02c7e811a72c46d78ddb2d9d4fdc2222a444e9
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* addressed comments
Change-Id: Ide42bb1c1237b525d168bf1a9b82eb1bdc3bc283
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* tests: Handle base index absence
Change-Id: I3cf11a8644ccf23e01537228766f864b63c49556
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* Addressed LOCK based comments, .t comments
Change-Id: I5f53e40820cade3a44259c1ac1a7f3c5f2f0f310
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
AFR may hide some existing entries from a directory when reading it
because they are generated internally for private management. However,
the number of entries returned by the readdir() function is not updated
accordingly, so it may be higher than the real number of entries
present in the gf_dirent list.
This may cause unexpected behavior of clients, including gfapi which
incorrectly assumes that there was an entry when the list was actually
empty.
This patch also makes the check in gfapi more robust to avoid similar
issues that could appear in the future.
Fixes: #2232
Change-Id: I81ba3699248a53ebb0ee4e6e6231a4301436f763
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
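The core of the fix can be sketched as a filter that recomputes the count from the surviving list instead of reusing the pre-filter number. The internal entry name below is an invented placeholder, not AFR's actual private entry:

```python
# Entries generated internally for private management must be hidden
# from clients; the name here is purely illustrative.
INTERNAL = {".internal-mgmt-entry"}

def readdir(raw_entries):
    # Filter out internal entries AND recompute the count from the
    # filtered list, so callers (e.g. gfapi) never see a count larger
    # than the entries actually present.
    entries = [e for e in raw_entries if e not in INTERNAL]
    return entries, len(entries)

entries, count = readdir([".internal-mgmt-entry"])
assert entries == [] and count == 0   # caller sees a truly empty directory
```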
CID: 1444461
A lock is being destroyed, but in some code-flows it might be used later
on; the code-flow was modified to make sure the destroyed lock is not
used in any case.
Change-Id: I9610d56d9cb8a8ab7062e9094493dba9afdd0b30
updates: #1060
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
Fixes CID: 1124725
Updates: #1060
Change-Id: Iced092c5ad1a9445e4c758f09a481501bae7275f
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
In glusterd_svc_start:
1) synctaskA gets attach_lock and then releases big_lock to execute runner_run.
2) synctaskB then gets big_lock but cannot get attach_lock, so it waits.
3) After executing runner_run, synctaskA tries to get big_lock again, but synctaskB holds it, so it waits.
This leads to a deadlock.
This patch uses runner_run_nowait to avoid the deadlock.
fixes: #2117
Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn>
Fixes coverity issues 1447029 and 1447028.
Updates: #2161
Change-Id: I6a564231d6aeb76de20675b7ced5d45eed8c377f
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: I97e73c0aae74fc5d80c975f56f2f7a64e3e1ae95
Updates: #2169
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/afr: Fix race in lockinfo (f)getxattr
A shared dictionary was updated outside the lock after having updated
the number of remaining answers. This means that one thread may be
processing the last answer and unwinding the request before another
thread completes updating the dict.
    Thread 1                        Thread 2

    LOCK()
    call_cnt-- (=1)
    UNLOCK()
                                    LOCK()
                                    call_cnt-- (=0)
                                    UNLOCK()
                                    update_dict(dict)
                                    if (call_cnt == 0) {
                                        STACK_UNWIND(dict);
                                    }
    update_dict(dict)
    if (call_cnt == 0) {
        STACK_UNWIND(dict);
    }
The updates from thread 1 are lost.
This patch also reduces the work done inside the locked region and
reduces code duplication.
Fixes: #2161
Change-Id: Idc0d34ab19ea6031de0641f7b05c624d90fac8fa
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
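The fixed ordering can be demonstrated with a toy version of the callback: updating the shared dict under the same lock that decrements the counter guarantees the unwinding thread sees every update. This is a sketch of the pattern, not the afr code itself:

```python
import threading

lock = threading.Lock()
state = {"call_cnt": 2, "dict": {}}
result = {}

def cbk(key, value):
    # Update the shared dict BEFORE publishing the decrement, all under
    # one lock; whichever thread observes call_cnt == 0 is then
    # guaranteed to see both updates when it "unwinds".
    with lock:
        state["dict"][key] = value
        state["call_cnt"] -= 1
        last = state["call_cnt"] == 0
    if last:
        result.update(state["dict"])  # stands in for STACK_UNWIND(dict)

t1 = threading.Thread(target=cbk, args=("a", 1))
t2 = threading.Thread(target=cbk, args=("b", 2))
t1.start(); t2.start()
t1.join(); t2.join()
assert result == {"a": 1, "b": 2}   # no update is lost
```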
When parallel-readdir is enabled, readdir(p) requests sent by DHT can be
immediately processed and answered in the same thread before the call to
STACK_WIND_COOKIE() completes.
This means that the readdir(p) cbk is processed synchronously. In some
cases it may decide to send another readdir(p) request, which causes a
recursive call.
When some special conditions happen and the directories are big, it's
possible that the number of nested calls is so high that the process
crashes because of a stack overflow.
This patch fixes this by not allowing nested readdir(p) calls. When a
nested call is detected, it's queued instead of sending it. The queued
request is processed when the current call finishes by the top level
stack function.
Fixes: #2169
Change-Id: Id763a8a51fb3c3314588ec7c162f649babf33099
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
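The queue-instead-of-recurse technique can be shown in miniature: a synchronous callback that would have re-issued the request recursively only appends to a queue, and the top-level frame drains it iteratively. Illustrative sketch only, not the dht code:

```python
def run(total):
    queue = []
    active = False
    handled = []

    def send(n):
        # Stands in for issuing a readdir(p) request whose callback may
        # fire synchronously and try to send the next request.
        nonlocal active
        queue.append(n)
        if active:
            return          # nested call detected: queued, not recursed
        active = True
        while queue:        # only the top-level frame processes requests
            cur = queue.pop(0)
            handled.append(cur)
            if cur + 1 < total:
                send(cur + 1)   # "synchronous callback" re-issuing
        active = False

    send(0)
    return handled

# 10000 chained requests with bounded stack depth (naive recursion
# would blow Python's default recursion limit of 1000).
assert run(10000) == list(range(10000))
```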
Commit (c878174) introduced a check to avoid a stale layout issue.
To avoid the stale layout issue, dht sets a key along with the layout
at the time of winding a create fop, and posix validates the parent
layout based on the key value. If the layout does not match, it throws
an error. In the case of a volume shrink, the layout has been changed
by the rebalance daemon, and if the layout does not match, dht is not
able to wind a create fop successfully.
Solution: To avoid the issue, populate the key only when dht winds the
fop for the first time. After getting an error, on the 2nd attempt dht
takes a lock and then reattempts to wind the fop.
Fixes: #2187
Change-Id: Ie018386e7823a11eea415496bb226ca032453a55
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
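A hedged sketch of the two-attempt scheme described above; names and return codes are invented for illustration. The validation key is sent only on the first, optimistic wind; the retry refreshes the layout (conceptually under a lock) and winds without the key:

```python
def wind_create(client_layout, brick_layout, with_validation):
    # posix-side check: reject the create if the validation key is set
    # and the client's cached parent layout is stale.
    if with_validation and client_layout != brick_layout:
        return "ESTALE"
    return "OK"

def create(cached_layout, read_brick_layout):
    # Attempt 1: lockless, with the layout-validation key populated.
    if wind_create(cached_layout, read_brick_layout(), True) == "OK":
        return "OK"
    # Attempt 2: take a lock, re-read the layout, wind without the key.
    fresh = read_brick_layout()
    return wind_create(fresh, fresh, False)

# Rebalance changed the layout after the client cached it ("L1" vs "L2"):
# the first wind fails, the locked retry succeeds.
assert create("L1", lambda: "L2") == "OK"
```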
* fuse: add an option to specify the mount display name
There are two things this PR is fixing.
1. When a mount is specified with the volfile (-f) option, today you can't tell that it's from glusterfs, as only the volfile is added as 'fsname'; so we add it as 'glusterfs:/<volname>'.
2. Provide an option for admins who want to show a source of the mount other than the default (useful when one is not providing 'mount.glusterfs' but using their own scripts).
Updates: #1000
Change-Id: I19e78f309a33807dc5f1d1608a300d93c9996a2f
Signed-off-by: Amar Tumballi <amar@kadalu.io>
When passing a wrong volume-name which doesn't exist, the mount gets stuck.
The errno is initialized to 0 in glusterd-handshake.c. After initing the errno,
the process blocks in gf_fuse_umount.
Commit 61ae58e67567ea4de8f8efc6b70a9b1f8e0f1bea introduced a coverity
bug: the object is used after it has been cleaned up.
Clean up the memory after coming out of the critical section.
Fixes: #2180
Change-Id: Iee2050c4883a0dd44b8523bb822b664462ab6041
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
A couple of methods are not being used, removing them.
Change-Id: I5bb4b7f04bae9486cf9b7960cf5ed91d0b59c8c7
updates: #1000
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
The rebalance cli is not showing the correct status after a reboot.
The CLI does not show the correct status because the defrag object is
not valid at the time of creating an rpc connection to show the status.
The defrag object is not valid because, at the time glusterd starts,
glusterd_restart_rebalance can be called almost at the same time by two
different synctasks, and when glusterd gets a disconnect on the rpc
object it cleans up the defrag object.
Solution: To keep the defrag object valid, take a reference count before
creating a defrag rpc object.
Fixes: #1339
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: Ia284015d79beaa3d703ebabb92f26870a5aaafba
Optimize the backup-volfile-servers parameter to support IPv6 addresses.
Fixes: #2042
Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
After glibc 2.32, lchmod() is returning EOPNOTSUPP instead of ENOSYS when
called on symlinks. The man page says that the returned code is ENOTSUP.
They are the same on Linux, but this patch correctly handles all errors.
Fixes: #2154
Change-Id: Ib3bb3d86d421cba3d7ec8d66b6beb131ef6e0925
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
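The "handle all error codes" idea can be sketched in a few lines: treat ENOTSUP, EOPNOTSUPP and ENOSYS uniformly when the symlink chmod fails. This is an illustration, not the posix xlator code:

```python
import errno

def is_not_supported(err):
    # glibc >= 2.32 returns EOPNOTSUPP, the man page documents ENOTSUP
    # (the same value on Linux), and older glibc returned ENOSYS; a
    # robust check accepts any of them.
    return err in (errno.ENOTSUP, errno.EOPNOTSUPP, errno.ENOSYS)

assert is_not_supported(errno.EOPNOTSUPP)   # newer glibc
assert is_not_supported(errno.ENOSYS)       # older glibc
assert not is_not_supported(errno.EINVAL)   # real errors still propagate
```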
The errno set by the runner code was not correct when bind() failed
because the port was already occupied in __socket_server_bind().
Fix:
Updated the code to return the correct errno from the
__socket_server_bind() if the bind() fails due to EADDRINUSE error. And,
use the returned errno from runner_run() to retry allocating a new port
to the brick process.
Fixes: #1101
Change-Id: If124337f41344a04f050754e402490529ef4ecdc
Signed-off-by: nik-redhat <nladha@redhat.com>
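The retry loop this fix enables can be sketched with plain sockets: propagate EADDRINUSE from bind() so the caller can move on to the next port instead of giving up. A sketch of the technique, not the socket transport code:

```python
import errno
import socket

def bind_first_free(start_port, attempts=20):
    # Try successive ports; only EADDRINUSE triggers a retry, any other
    # errno is a real failure and is re-raised -- which is exactly why
    # the correct errno must be propagated from the bind path.
    for port in range(start_port, start_port + attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("127.0.0.1", port))
            return s, port
        except OSError as e:
            s.close()
            if e.errno != errno.EADDRINUSE:
                raise
    raise OSError(errno.EADDRINUSE, "no free port in range")

blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
blocker.bind(("127.0.0.1", 0))          # occupy an arbitrary port
busy = blocker.getsockname()[1]
sock, port = bind_first_free(busy)      # busy port is skipped
assert port != busy
sock.close()
blocker.close()
```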
- Removed unused ref_count variable
- Reordered the struct to get related variables closer together.
- Changed 'complete' from a '_Bool' to an 'int32_t'
Before:
```
struct _call_frame {
call_stack_t * root; /* 0 8 */
call_frame_t * parent; /* 8 8 */
struct list_head frames; /* 16 16 */
void * local; /* 32 8 */
xlator_t * this; /* 40 8 */
ret_fn_t ret; /* 48 8 */
int32_t ref_count; /* 56 4 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
gf_lock_t lock; /* 64 40 */
void * cookie; /* 104 8 */
_Bool complete; /* 112 1 */
/* XXX 3 bytes hole, try to pack */
glusterfs_fop_t op; /* 116 4 */
struct timespec begin; /* 120 16 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
struct timespec end; /* 136 16 */
const char * wind_from; /* 152 8 */
const char * wind_to; /* 160 8 */
const char * unwind_from; /* 168 8 */
const char * unwind_to; /* 176 8 */
/* size: 184, cachelines: 3, members: 17 */
/* sum members: 177, holes: 2, sum holes: 7 */
/* last cacheline: 56 bytes */
```
After:
```
struct _call_frame {
call_stack_t * root; /* 0 8 */
call_frame_t * parent; /* 8 8 */
struct list_head frames; /* 16 16 */
struct timespec begin; /* 32 16 */
struct timespec end; /* 48 16 */
/* --- cacheline 1 boundary (64 bytes) --- */
void * local; /* 64 8 */
gf_lock_t lock; /* 72 40 */
void * cookie; /* 112 8 */
xlator_t * this; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
ret_fn_t ret; /* 128 8 */
glusterfs_fop_t op; /* 136 4 */
int32_t complete; /* 140 4 */
const char * wind_from; /* 144 8 */
const char * wind_to; /* 152 8 */
const char * unwind_from; /* 160 8 */
const char * unwind_to; /* 168 8 */
/* size: 176, cachelines: 3, members: 16 */
/* last cacheline: 48 bytes */
```
Fixes: #2130
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
The current block size used for self-heal by default is 128 KiB. This
requires a significant amount of management requests for a very small
portion of data healed.
With this patch the block size is increased to 4 MiB. For a standard
EC volume configuration of 4+2, this means that each healed block of
a file will update 1 MiB on each brick.
Change-Id: Ifeec4a2d54988017d038085720513c121b03445b
Updates: #2067
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* syncop: introduce microsecond sleep support
Introduce the microsecond sleep function synctask_usleep,
which can be used to improve precision compared to synctask_sleep.
Change-Id: Ie7a15dda4afc09828bfbee13cb8683713d7902de
* glusterd: use synctask_usleep in glusterd_proc_stop()
glusterd_proc_stop() sleeps 1s for the proc to stop before force killing it,
but in most cases the process can be stopped within 100ms.
This patch uses synctask_usleep to check the proc's running state
every 100ms instead of sleeping for 1s, which can reduce the stop time by up to 1s.
In some cases, like enabling quota on 100 volumes, the average execution
time was reduced from 2500ms to 500ms.
fixes: #2116
Change-Id: I645e083076c205aa23b219abd0de652f7d95dca7
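The polling idea can be sketched generically: instead of one unconditional 1s sleep before the force kill, check the stop condition every 100ms and return as soon as it holds. Illustrative only; names and parameters are not glusterd's:

```python
import time

def wait_for_stop(is_stopped, timeout=1.0, interval=0.1):
    # Poll the stop condition every 'interval' seconds up to 'timeout',
    # returning early instead of always sleeping the full second.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_stopped():
            return True
        time.sleep(interval)
    return is_stopped()

# Pretend the process exits after ~250ms: the poll loop notices well
# before the old fixed 1s sleep would have elapsed.
stop_at = time.monotonic() + 0.25
t0 = time.monotonic()
assert wait_for_stop(lambda: time.monotonic() >= stop_at)
assert time.monotonic() - t0 < 0.95
```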
* glusterd-volgen: Add functionality to accept any custom xlator
Add a new function which allows users to insert any custom xlator.
This provides a way to add custom processing into file operations.
Users can deploy the plugin (xlator shared object) and integrate it into glusterfsd.
If users want to enable a custom xlator, do the following:
1. put the xlator object (.so file) into "XLATOR_DIR/user/"
2. set the option user.xlator.<xlator> to an existing xlator-name to specify the position in the graph
3. restart the gluster volume
Options for a custom xlator can be set as "user.xlator.<xlator>.<optkey>".
Fixes: #1943
Signed-off-by: Ryo Furuhashi <ryo.furuhashi.nh@hitachi.com>
Co-authored-by: Yaniv Kaul <ykaul@redhat.com>
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
Scenario:
1) decommission start: the option decommissioned-bricks is added to the vol file
and being parsed by dht.
2) another configuration change (like setting a new loglevel): the decommissioned-bricks option
still exists on the vol file and being parsed again, this leads to invalid data.
Fix:
Prevent the parsing of "decommissioned-bricks" when decommission is running.
This counts on the fact that once a decommission is running it cannot be started again.
Fixes: #1992
Change-Id: I7a016750e2f865aee4cd620bd9033ec19421d47d
Signed-off-by: Tamar Shacked <tshacked@redhat.com>
Problem:
The fix-layout operation assumes that the path passed is a directory,
i.e. layout->cnt == conf->subvolume_cnt. This leads to a crash when
fix-layout is attempted on a file.
Fix:
Disallow fix-layout on files
fixes: #2107
Change-Id: I2116b8773059f67e3260e9207e20eab3de711417
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
At the moment self-heal-window-size is 128KB. This leads to healing data
in 128KB chunks. With the growth of data and the avg file sizes
nowadays, 1MB seems like a better default.
Change-Id: I70c42c83b16c7adb53d6b5762969e878477efb5c
Fixes: #2067
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* features/shard: delay unlink of a file that has fd_count > 0
When there are multiple processes working on a file and any
process unlinks that file, the unlink operation shouldn't harm the
other processes working on it. This is POSIX-compliant
behavior and it should also be supported when the shard feature is
enabled.
Problem description:
Let's consider 2 clients, C1 and C2, working on a file F1 with 5
shards on a gluster mount, where the gluster server has 4 bricks
B1, B2, B3, B4.
Assume that the base file/shard is present on B1, the 1st and 2nd shards
on B2, the 3rd and 4th shards on B3, and the 5th shard falls on B4. C1
has opened F1 in append mode and is writing to it. The
write FOP goes to the 5th shard in this case. So the
inode->fd_count = 1 on B1(base file) and B4 (5th shard).
C2 at the same time issued unlink to F1. On the server, the
base file has fd_count = 1 (since C1 has opened the file),
the base file is renamed under .glusterfs/unlink and
returned to C2. Then unlink will be sent to shards on all
bricks and shards on B2 and B3 will be deleted which have
no open reference yet. C1 starts getting errors while
accessing the remaining shards though it has open references
for the file.
This is one such undefined behavior. Likewise we will
encounter many such undefined behaviors, as we don't have one
global lock to access all shards as one. Of course, having such a
global lock would lead to a performance hit, as it reduces the window
for parallel access of shards.
Solution:
The above undefined behavior can be addressed by delaying the
unlink of a file when there are open references on it.
File unlink happens in 2 steps.
step 1: client creates marker file under .shard/remove_me and
sends unlink on base file to the server
step 2: on return from the server, the associated shards will
be cleaned up and finally marker file will be removed.
In step 2, the background deletion process does a nameless
lookup using the marker file name (the marker file is named after the
gfid of the base file) in the .glusterfs/unlink dir. If the nameless
lookup is successful then that means the gfid still has open
fds and the deletion of shards has to be delayed. If the nameless
lookup fails then that indicates the gfid is unlinked and there are no
open fds on that file (the gfid path is unlinked during the final
close on the file). The shards on which deletion is delayed
are unlinked once all open fds are closed; this is
done through a thread which wakes up every 10 mins.
Also removed active_fd_count from the inode structure, referring to
fd_count wherever active_fd_count was used.
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: Iec16d7ff5e05f29255491a43fbb6270c72868999
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I07e5a5bf9d33c24b63da72d4f3f59392c5421652
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I3679de8545f2e5b8027c4d5a6fd0592092e8dfbd
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* Update xlators/storage/posix/src/posix-entry-ops.c
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* Update fd.c
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* features/shard: delay unlink of a file that has fd_count > 0
fixes: #1358
Change-Id: I8985093386e26215e0b0dce294c534a66f6ca11c
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we hit the max capacity of the storage space, shard_unlink()
starts failing if there is no space left on the brick to create a
marker file.
shard_unlink() happens in below steps:
1. create a marker file in the name of gfid of the base file under
BRICK_PATH/.shard/.remove_me
2. unlink the base file
3. shard_delete_shards() deletes the shards in background by
picking the entries in BRICK_PATH/.shard/.remove_me
If marker file creation fails, the shards cannot be deleted,
which eventually becomes a problem for a user who is trying to
free space by deleting unwanted data.
Solution:
Create the marker file with GLUSTERFS_INTERNAL_FOP_KEY set in
xdata, so it is treated as an internal op and is allowed to be
created in the reserved space.
Fixes: #2038
Change-Id: I7facebab940f9aeee81d489df429e00ef4fb7c5d
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
|
|
|
|
|
|
|
|
| |
Do not allow changing storage.linux-aio for running volume,
cleanup nearby storage.linux-io_uring error message as well.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #2039
|
|
|
|
|
|
|
|
|
| |
CID 1430124
A negative value was being passed to a parameter that cannot be negative.
Modified the value which is being passed.
Change-Id: I06dca105f7a78ae16145b0876910851fb631e366
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
CID 1291733
The return value of the method pthread_cancel was not being checked.
Added a return value check and proper error handling.
Change-Id: I8c52b0e462461fc59718deb3b7c2f1b4e55613c7
updates: #1060
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* glusterd - fixing coverity issues
- Dereference after null check (CID 1437686)
- Dereference null return value (CID 1437687)
- A check for the return value of a memory allocation was missing;
it has been added.
- A pointer was being dereferenced after a NULL-pointer check.
With this change the pointer is no longer dereferenced on that path.
Change-Id: I10bf8a2cb08612981dbb788315dad7dbb4efe2cb
updates: #1060
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Implement GF_FOP_FSYNC using io_submit() with IOCB_CMD_FSYNC
and IOCB_CMD_FDSYNC operations.
Refactor common code to posix_aio_cb_init() and posix_aio_cb_fini()
as suggested by Ravishankar N.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #1952
|
|
|
|
|
|
|
| |
The 'this' pointer was being dereferenced after a null check. This change avoids the dereference.
Change-Id: I7dedee44c08df481d2a037eb601f3d5c4d9284f5
Updates: #1060
Signed-off-by: Nishith Vihar Sakinala <nsakinal@redhat.com>
|