| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The current graph-switch code sets priv->handle_graph_switch to false even
when graph-switch is in progress which leads to crashes in some cases
Fix:
priv->handle_graph_switch should be set to false only when graph-switch
completes.
fixes: #1539
Change-Id: I5b04f7220a0a6e65c5f5afa3e28d1afe9efcdc31
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
|
|
|
|
|
|
|
|
|
|
| |
Prefer timespec_now_realtime() and gf_time() over clock_gettime()
and time(), use gf_tvdiff() and gf_tsdiff() where appropriate,
drop unused time_elapsed() and leftovers in 'struct posix_private'.
Change-Id: Ie1f0229df5b03d0862193ce2b7fb91d27b0981b6
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #1002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Glusterfs so far constrained itself with an arbitrary limit (32)
for the number of groups read from /proc/[pid]/status (this was
the number of groups shown there prior to Linux commit
v3.7-9553-g8d238027b87e (v3.8-rc1~74^2~59); since this commit, all
groups are shown).
With this change we'll read groups up to the number Glusterfs
supports in general (64k).
Note: the actual number of groups that are made use of in a
regular Glusterfs setup shall still be capped at ~93 due to limitations
of the RPC transport. To be able to handle more groups than that,
brick side gid resolution (server.manage-gids option) can be used along
with NIS, LDAP or other such networked directory service (see
https://github.com/gluster/glusterdocs/blob/5ba15a2/docs/Administrator%20Guide/Handling-of-users-with-many-groups.md#limit-in-the-glusterfs-protocol
).
Also adding some diagnostic messages to frame_fill_groups().
Change-Id: I271f3dc3e6d3c44d6d989c7a2073ea5f16c26ee0
fixes: #1075
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
| |
Change-Id: Ib2bac85c28905bb8997fbb64db2308f2a6f31720
Fixes: #1376
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The setlk interrupt handler uses a 'fork' of the resolved
fuse state from setlk (a copy with some edits) to initiate
its own auxiliary fop. Thus the references stored in the
fuse states of the setlk fop and of its interrupt handler
are shared (apart from the ones edited by the interrupt
handler -- but the bulk of them remain as is). The lifetimes
of these references are tied to the setlk fop, which has
established them by properly claiming their backing
resources. To guarantee the validity of these references in
the interrupt context, we need to make sure that the setlk
fop did not reclaim the fuse state while the interrupt
handler is running.
In other words, the setlk fop needs to wait for the
termination of the interrupt handler, which is accomplished
by the 'sync' strategy of the interrupt API (passing
true for the 'sync' argument of
fuse_interrupt_finish_{fop,interrupt} functions).
Change-Id: I9a6dc76972507be4b7ba8d023cc876e5fddf813f
Updates: #1374
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With 'sync' strategy, a fop's cbk waits for
the interrupt handler to finish by making a
call to fuse_interrupt_finish_fop() with
sync = true.
The wait is implemented by monitoring an
interrupt_state struct member via a condition
variable. However, due to broken code logic,
the pthread_cond_wait() call is never reached.
This change introduces a new member to the
fuse_interrupt_state_t enum (the type of
aforementioned struct member),
FUSE_INTERRUPT_WAITING_HANDLER, which is then
used for indicating the state of waiting for
the interrupt handler.
Change-Id: I72ab06c37f45ff8f212a6a632bac1f647af05cbd
Updates: #1374
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
NetBSD FUSE does not implement FUSE notification yet. This changes
makes this feature a configure time option so that it can be disabled.
Fixes: #1381
Change-Id: I3d977d8d69b57e1ac6957be84a9ddbb69b100893
Type: Bug
Signed-off-by: Emmanuel Dreyfus manu@netbsd.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
On executing tests/features/flock_interrupt.t the following error log
appears
[2020-06-16 11:51:54.631072 +0000] E
[fuse-bridge.c:4791:fuse_setlk_interrupt_handler_cbk] 0-glusterfs-fuse:
interrupt record not found
This happens because fuse-interrupt-record is never sent on the wire by
getxattr fop and there is no guarantee that in the cbk it will be
available in case of failures.
Fix:
wind getxattr fop with fuse-interrupt-record as cookie and recover it
in the cbk
Fixes: #1310
Change-Id: I4cfff154321a449114fc26e9440db0f08e5c7daa
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Logs and other output carrying timestamps
will have now timezone offsets indicated, eg.:
[2020-03-12 07:01:05.584482 +0000] I [MSGID: 106143] [glusterd-pmap.c:388:pmap_registry_remove] 0-pmap: removing brick (null) on port 49153
To this end,
- gf_time_fmt() now inserts timezone offset via %z strftime(3) template.
- A new utility function has been added, gf_time_fmt_tv(), that
takes a struct timeval pointer (*tv) instead of a time_t value to
specify the time. If tv->tv_usec is negative,
gf_time_fmt_tv(... tv ...)
is equivalent to
gf_time_fmt(... tv->tv_sec ...)
Otherwise it also inserts tv->tv_usec to the formatted string.
- Building timestamps of usec precision has been converted to
gf_time_fmt_tv, which is necessary because the method of appending
a period and the usec value to the end of the timestamp does not work
if the timestamp has zone offset, but it's also beneficial in terms of
eliminating repetition.
- The buffer passed to gf_time_fmt/gf_time_fmt_tv has been unified to
be of GF_TIMESTR_SIZE size (256). We need slightly larger buffer space
to accommodate the zone offset and it's preferable to use a buffer
which is undisputedly large enough.
This change does *not* do the following:
- Retaining a method of timestamp creation without timezone offset.
As to my understanding we don't need such backward compatibility
as the code just emits timestamps to logs and other diagnostic
texts, and doesn't do any later processing on them that would rely
on their format. An exception to this, ie. a case where timestamp
is built for internal use, is graph.c:fill_uuid(). As far as I can
see, what matters in that case is the uniqueness of the produced
string, not the format.
- Implementing a single-token (space free) timestamp format.
While some timestamp formats used to be single-token, now all of
them will include a space preceding the offset indicator. Again,
I did not see a use case where this could be significant in terms
of representation.
- Moving the codebase to a single unified timestamp format and
dropping the fmt argument of gf_time_fmt/gf_time_fmt_tv.
While the gf_timefmt_FT format is almost ubiquitous, there are
a few cases where different formats are used. I'm not convinced
there is any reason to not use gf_timefmt_FT in those cases too,
but I did not want to make a decision in this regard.
Change-Id: I0af73ab5d490cca7ed8d07a2ce7ac22a6df2920a
Updates: #837
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
AFR doesn't delay post-op for fsync fop. For fsync heavy workloads
this leads to un-necessary fxattrop/finodelk for every fsync leading
to bad performance.
Fix:
Have delayed post-op for fsync. Add special flag in xdata to indicate
that afr shouldn't delay post-op in cases where either the
process will terminate or graph-switch would happen. Otherwise it leads
to un-necessary heals when the graph-switch/process-termination
happens before delayed-post-op completes.
Fixes: #1253
Change-Id: I531940d13269a111c49e0510d49514dc169f4577
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a critical flaw in the previous implementation of open-behind.
When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.
To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.
To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.
The whole logic of handling background opens have been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.
Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.
Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Per a "TODO" comment in the code, a code block can be removed as it's no
longer required
Change-Id: I60e064ece985ff2ea2a686bbd2f0e6cc850899e9
updates: #1000
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change is a followup to
I510158843e4b1d482bdc496c2e97b1860dc1ba93.
In referred change we pushed log messages about 'weird'
write errors to fuse device out of sight, by reporting
them at Debug loglevel instead of Error (where
'weird' means errno is not POSIX compliant but having
meaningful semantics for FUSE protocol).
This solved the issue of spurious error reporting.
And so far so good: these messages don't indicate
an error condition by themselves. However, when they
come in high repetitions, that indicates a suboptimal
condition which should be reported.[1]
Therefore now we shall emit a Warning if a certain
errno occurs a certain number of times[2] as the
outcome of a write to the fuse device.
___
[1] typically ENOENTs and ENOTDIRs accumulate
when glusterfs' inode invalidation lags behind
the kernel's internal inode garbage collection
(in this case above errnos mean that the inode
which we requested to be invalidated is not found
in kernel). This can be mitigated with the
invalidate-limit command line / mount option,
cf. bz#1732717.
[2] 256, as of the current implementation.
Change-Id: I8cc7fe104da43a88875f93b0db49d5677cc16045
Updates: #1000
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
tests/bugs/protocol/bug-1433815-auth-allow.t fails
sometimes because of stale mount. This stale mount
comes into picture when parent process dies without
waiting for the child process which mounts fuse fs
to die
Fix:
Wait for mounting child process to die before dying.
Fixes: #1152
Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
| |
Change-Id: Ib07c31dc6669aa9d9f84a21e6160237290ed9afb
Fixes: bz#1804786
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
FUSE uses failures of communicating with /dev/fuse with various
errnos to indicate in-kernel conditions to userspace. Some of these
shouldn't be handled as an application error. Also the standard
POSIX errno description should not be shown as they are misleading
in this context.
Solution:
When writing to the fuse device, the caller of the respective
convenience routine can mask those errnos which don't qualify to
be an error for the application in that context, so then those
shall be reported at DEBUG level.
The possible non-standard errnos are reported with their
POSIX name instead of their description to avoid confusion.
(Eg. for ENOENT we don't log "no such file or directory",
we log indeed literal "ENOENT".)
Change-Id: I510158843e4b1d482bdc496c2e97b1860dc1ba93
updates: bz#1193929
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Code like:
f(..., uuid_utoa(x), uuid_utoa(y));
is not valid (causes undefined behaviour) because uuid_utoa()
uses the only static thread-local buffer which will be overwritten
by the subsequent call. All such cases should be converted to use
uuid_utoa_r() with explicitly specified buffer.
Change-Id: I5e72bab806d96a9dd1707c28ed69ca033b9c8d6c
Updates: bz#1193929
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
|
|
|
|
|
|
| |
updates: bz#1193929
Change-Id: I50f75f730ea6970e99347fcee661ce9dc8477725
Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When touch is used to create a file, the ctime is not matching
atime and mtime which ideally should match. There is a difference
in nano seconds.
Cause:
When touch is used modify atime or mtime to current time (UTIME_NOW),
the current time is taken from kernel. The ctime gets updated to current
time when atime or mtime is updated. But the current time to update
ctime is taken from utime xlator. Hence the difference in nano seconds.
Fix:
When utimesat uses UTIME_NOW, use the current time from kernel.
fixes: bz#1773530
Change-Id: I9ccfa47dcd39df23396852b4216f1773c49250ce
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
| |
updates bz#1193929
Signed-off-by: Csaba Henk <csaba@redhat.com>
Change-Id: I12cbe1d87f60fb497654d0e13e12171940867f76
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In many places we use it, compare to it, etc. It could be a static variable,
as it really doesn't change. I think it's better than initializing to 0
and then doing gfid[15] = 1 or other tricks.
I think there are additional oppportunuties to make more variables static.
This is an attempt at an easy one.
Change-Id: I7f23a30a94056d8f043645371ab841cbd0f90d19
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
The current lru-limit value still uses memory for
upto 128K inodes.
Reduce the default value of lru-limit to 64K.
Change-Id: Ica2dd4f8f5fde45cb5180d8f02c3d86114ac52b3
Fixes: bz#1753880
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Fixed resource leak of dict_value and newkey variables
CID: 1398630
Updates: bz#789278
Change-Id: I589fdc0aecaeb4f446cd29f95bad36ccd7a35beb
Signed-off-by: Barak Sason <bsasonro@redhat.com>
|
|
|
|
|
|
| |
Fixes: bz#1158130
Change-Id: Ifdeaed7c9fbe85f7ce421f7c89cbe7265e45f77c
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the glusterfs fuse client process is unable to
process the invalidate requests quickly enough, the
number of such requests quickly grows large enough
to use a significant amount of memory.
We are now introducing another option to set an upper
limit on these to prevent runaway memory usage.
Change-Id: Iddfff1ee2de1466223e6717f7abd4b28ed947788
Fixes: bz#1732717
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
| |
Fixes: bz#1644322
Change-Id: I53e8fa362cd8c7d04fb1c4abb606a9abb642c592
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
| |
Change-Id: Id7e003e4a53d0a0057c1c84e1cd704c80a6cb015
Fixes: bz#1728047
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have proposed about this an year ago, and with recent smoke
failures, it looks like the right time to take such call.
ref: https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html
With this, glusterfs-8.0 wouldn't have rdma feature, and would
allow some modularity changes possible with rpc layer (as we would
have just 1 transport)
Updates: bz#1635688
Change-Id: Ia277dca4d4b1f0cffae20819024a52b075b775e5
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
| |
As Hadoop is no longer supported, dropping code for
handling Hadoop access.
Fixes: bz#1728417
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I8fcf4faacb364f1c9a8abb0c48faec337087f845
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to man page:
mount -t glusterfs [-o <options>] <server1>,<server2>,
<server3>,..<serverN>:/<volname>[/<subdir>] <mount_point>
When we try
mount -t glusterfs 52:54:00:23:5a:b6,52:54:00:a0:be:ed:/ta-vol1 /mnt/ta
we see the below msg in log:
[2019-06-17 11:41:36.085809] I [MSGID: 100030] [glusterfsd.c:2867:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 7dev (args: /usr/local/sbin/glusterfs --process-name fuse --volfile-server=52:54:00:23:5a --volfile-id=/ta-vol1 /mnt/ta)
With this change, we'll able to give multiple volfile servers
while mounting the volume using ipv6.
After the change, I see the following in log:
[2019-06-17 12:00:21.183658] I [MSGID: 100030] [glusterfsd.c:2867:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 7dev (args: /usr/local/sbin/glusterfs --process-name fuse --volfile-server=52:54:00:23:5a:b6 --volfile-server=52:54:00:a0:be:ed --volfile-id=/ta-vol1 /mnt/ta)
fixes: bz#1719290
credits: Aga <aga_1990@hotmail.com>
Change-Id: Icf89bea3ba15d8374ef428aeb59f2ef55ad544ec
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the moment new stack doesn't populate frame->root->unique in all cases. This
makes it difficult to debug hung frames by examining successive state dumps.
Fuse and server xlators populate it whenever they can, but other xlators won't
be able to assign 'unique' when they need to create a new frame/stack because
they don't know what 'unique' fuse/server xlators already used. What we need is
for unique to be correct. If a stack with same unique is present in successive
statedumps, that means the same operation is still in progress. This makes
'finding hung frames' part of debugging hung frames easier.
fixes bz#1714098
Change-Id: I3e9a8f6b4111e260106c48a2ac3a41ef29361b9e
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In scenarios where a mount fails before creating log file, doesn't
make sense to give message to 'check log file'. See below:
```
ERROR: failed to create logfile "/var/log/glusterfs/mnt.log" (No space left on device)
ERROR: failed to open logfile /var/log/glusterfs/mnt.log
Mount failed. Please check the log file for more details.
```
Fixes: bz#1688068
Change-Id: I1d837caa4f9bc9f1a37780783e95007e01ae4e3f
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixed coverity issue in fuse-bridge.c.
CID : 1398630 : Resource leak
CID : 1399757 : Uninitialized pointer read
updates: bz#789278
Change-Id: I69f8591400ee56a5d215eeac443a8e3d7777db27
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
| |
updates bz#1193929
Change-Id: I55ffa8f086ad9570f2526d91c196d7de9ffe6add
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes memory leak reported by ASan.
Tracebacks:
ERROR: LeakSanitizer: detected memory leaks
Direct leak of 712 byte(s) in 1 object(s) allocated from:
#0 0x7f35139dc848 in __interceptor_malloc (/lib64/libasan.so.5+0xef848)
#1 0x7f35136efb29 in __gf_malloc ../libglusterfs/src/mem-pool.c:136
#2 0x7f3510591ce9 in fuse_thread_proc ../xlators/mount/fuse/src/fuse-bridge.c:5929
#3 0x7f351336d58d in start_thread (/lib64/libpthread.so.0+0x858d)
SUMMARY: AddressSanitizer: 712 byte(s) leaked in 1 allocation(s).
updates: bz#1633930
Change-Id: Ie5b4da6b338d8e5fc770c5b2da1238e3462468ac
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements a thread pool that is wait-free for adding jobs to
the queue and uses a very small locked region to get jobs. This makes it
possible to decrease contention drastically. It's based on wfcqueue
structure provided by urcu library.
It automatically enables more threads when load demands it, and stops
them when not needed. There's a maximum number of threads that can be
used. This value can be configured.
Depending on the workload, the maximum number of threads plays an
important role. So it needs to be configured for optimal performance.
Currently the thread pool doesn't self adjust the maximum for the
workload, so this configuration needs to be changed manually.
For this reason, the global thread pool has been made optional, so that
volumes can still use the thread pool provided by io-threads.
To enable it for bricks, the following option needs to be set:
config.global-threading = on
This option has no effect if bricks are already running. A restart is
required to activate it. It's recommended to also enable the following
option when running bricks with the global thread pool:
performance.iot-pass-through = on
To enable it for a FUSE mount point, the option '--global-threading'
must be added to the mount command. To change it, an umount and remount
is needed. It's recommended to disable the following option when using
global threading on a mount point:
performance.client-io-threads = off
To enable it for services managed by glusterd, glusterd needs to be
started with option '--global-threading'. In this case all daemons, like
self-heal, will be using the global thread pool.
Currently it can only be enabled for bricks, FUSE mounts and glusterd
services.
The maximum number of threads for clients and bricks can be configured
using the following options:
config.client-threads
config.brick-threads
These options can be applied online and its effect is immediate most of
the times. If one of them is set to 0, the maximum number of threads
will be calcutated as #cores * 2.
Some distributions use a very old userspace-rcu library (version 0.7)
for this reason, some header files from version 0.10 have been copied
into contrib/userspace-rcu and are used if the detected version is 0.7
or older.
An additional change has been made to io-threads to prevent that threads
are started when iot-pass-through is set.
Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b
updates: #532
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When "auto-invalidation" option was not specified for mount script,
glusterfs cmdline ended with "--auto-invalidation=" option. This patch
fixes that bug in mount script.
Thanks to Amar for reporting it.
Change-Id: Ie5cd4c6ffb3ac644d9d2b032035f914a935d05a8
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
updates: bz#1664934
|
|
|
|
|
|
|
|
|
|
| |
The setxattr function receives a pointer to raw data, which may not be
null-terminated. When this data needs to be interpreted as a string, an
explicit null termination needs to be added before using the value.
Change-Id: Id110f9b215b22786da5782adec9449ce38d0d563
updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Auto invalidation is necessary when same (meta)data is shared/access
across multiple mounts. However, if (meta)data is not shared, all
relevant I/O goes through the cache of single mount and hence is
coherent with (meta)data on bricks always. So, fuse-auto-invalidation
can be disabled for this case which gives a huge performance boost for
workloads that write data and then immediately read the data they just
wrote.
From glusterfs --help,
<snip>
--auto-invalidation[=BOOL] controls whether fuse-kernel can
auto-invalidate attribute, dentry and page-cache.
Disable this only if same files/directories are
not accessed across two different mounts
concurrently [default: "on"]
</snip>
Details on how disabling auto-invalidation helped to reduce pgbench
init times can be found at [1]. Time taken for pgbench init of scale
8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s)
with auto-invalidations turned off along with other
optimizations. Just disabling auto-invalidation contributed 56%
improvement by reducing the total time taken by 33260s.
[1] https://www.spinics.net/lists/gluster-devel/msg25907.html
Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
updates: bz#1664934
|
|
|
|
|
|
|
| |
fixes: bz#1622665
Change-Id: I777d67b1b62c284c62a02277238ad7538eef001e
Signed-off-by: Iraj Jamali <ijamali@redhat.com>
|
|
|
|
|
|
|
|
|
| |
This reverts commit b87c397091bac6a4a6dec4e45a7671fad4a11770.
There seems to be some performance regression with the patch and hence recommended to have it reverted.
Updates: #325
Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09
|
|
|
|
|
|
| |
updates: bz#1622665
Change-Id: I9f3a75ed9be3d90f37843a140563c356830ef945
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation of iobuf_pool has two problems:
- prealloc of 12.5MB memory, this limits the scale factor of the gluster
processes due to RAM requirements
- lock contention, as the current implementation has one global
iobuf_pool lock. Credits for debugging and addressing the same goes to
Krutika Dhananjay <kdhananj@redhat.com>. Issue: #410
Hence changing the iobuf implementation to use per thread mem pool.
This may theoritically appear to cause perf dip as there is no preallocation.
But per thread mem pool will not have significant perf impact as the last
allocated memory is kept alive for subsequent allocs, for some time.
The worst case would be if iobufs requested are of random sizes each time.
The best case is, if we get iobuf request of the same size. From the perf
tests, this patch did not seem to cause any perf decrease.
Note that, with this patch, the rdma performance is going to degrade
drastically. In one of the previous patchsets we had fixes to not
degrade rdma perf, but rdma is not supported and also not tested [1].
Hence the decision was to not have code in rdma that is not tested
and not supported.
[1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html
Updates: #325
Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27
Signed-off-by: Poornima G <pgurusid@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Use the (f)getxattr based clearlocks interface to
interrupt a pending lock request.
updates: #465
Change-Id: I4e91a4d8791fc688fed400a02de4c53487e61be2
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inode LRU mechanism is moot in fuse xlator (ie. there is no
limit for the LRU list), as fuse inodes are referenced from
kernel context, and thus they can only be dropped on request of
the kernel. This might results in a high number of passive
inodes which are useless for the glusterfs client, causing a
significant memory overhead.
This change tries to remedy this by extending the LRU semantics
and allowing to set a finite limit on the fuse inode LRU.
A brief history of problem:
When gluster's inode table was designed, fuse didn't have any
'invalidate' method, which means, userspace application could
never ask kernel to send a 'forget()' fop, instead had to wait
for kernel to send it based on kernel's parameters. Inode table
remembers the number of times kernel has cached the inode based
on the 'nlookup' parameter. And 'nlookup' field is not used by
no other entry points (like server-protocol, gfapi etc).
Hence the inode_table of fuse module always has to have lru-limit
as '0', which means no limit. GlusterFS always had to keep all
inodes in memory as kernel would have had a reference to it.
Again, the reason for this is, kernel's glusterfs inode reference
was pointer of 'inode_t' structure in glusterfs. As it is a
pointer, we could never free it (to prevent segfault, or memory
corruption).
Solution:
In the inode table, handle the prune case of inodes with 'nlookup'
differently, and call a 'invalidator' method, which in this case is
fuse_invalidate(), and it sends the request to kernel for getting
the forget request.
When the kernel sends the forget, it means, it has dropped all
the reference to the inode, and it will send the forget with the
'nlookup' parameter too. We just need to make sure to reduce the
'nlookup' value we have when we get forget. That automatically
cause the relevant prune to happen.
Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B
fixes: bz#1560969
Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* libglusterfs changes to add new fop
* Fuse changes:
- Changes in fuse bridge xlator to receive and send responses
* posix changes to perform the op on the backend filesystem
* protocol and rpc changes for sending and receiving the fop
* gfapi changes for performing the fop
* tools: glfs-copy-file-range tool for testing copy_file_range fop
- Although, copy_file_range support has been added to the upstream
fuse kernel module, no release has been made yet of a kernel
which contains the support. It is expected to come in the
upcoming release of linux-4.20
So, as of now, executing copy_file_range fop on a fused based
filesystem results in fuse kernel module sending read on the
source fd and write on the destination fd.
Therefore a small gfapi based tool has been written to be able
test the copy_file_range fop. This tool is similar (in functionality)
to the example program given in copy_file_range man page.
So, running regular copy_file_range on a fuse mount point and
running gfapi based glfs-copy-file-range tool gives some idea about
how fast, the copy_file_range (or reflink) can be.
On the local machine this was the result obtained.
mount -t glusterfs workstation:new /mnt/glusterfs
[root@workstation ~]# cd /mnt/glusterfs/
[root@workstation glusterfs]# ls
file
[root@workstation glusterfs]# cd
[root@workstation ~]# time /tmp/a.out /mnt/glusterfs/file /mnt/glusterfs/new
real 0m6.495s
user 0m0.000s
sys 0m1.439s
[root@workstation ~]# time glfs-copy-file-range $(hostname) new /tmp/glfs.log /file /rrr
OPEN_SRC: opening /file is success
OPEN_DST: opening /rrr is success
FSTAT_SRC: fstat on /rrr is success
copy_file_range successful
real 0m0.309s
user 0m0.039s
sys 0m0.017s
This tool needs following arguments
1) hostname
2) volume name
3) log file path
4) source file path (relative to the gluster volume root)
5) destination file path (relative to the gluster volume root)
"glfs-copy-file-range <hostname> <volume> <log file path> <source> <destination>"
- Added a testcase as well to run glfs-copy-file-range tool
* io-stats changes to capture the fop for profiling
* NOTE:
- Added conditional check to see whether the copy_file_range syscall
is available or not. If not, then return ENOSYS.
- Added conditional check for kernel minor version in fuse_kernel.h
and fuse-bridge while referring to copy_file_range. And the kernel
minor version is kept as it is. i.e. 24. Increment it in future
when there is a kernel release which contains the support for
copy_file_range fop in fuse kernel module.
* The document which contains a writeup on this enhancement can be found at
https://docs.google.com/document/d/1BSILbXr_knynNwxSyyu503JoTz5QFM_4suNIh2WwrSc/edit
Change-Id: I280069c814dd21ce6ec3be00a884fc24ab692367
updates: #536
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
| |
Fixes: #164
Change-Id: I93ad6f0232a1dc534df099059f69951e1339086f
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.
Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation <> in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs
This change although big, is just moving around the headers and
making it correct when including these headers from other sources.
This helps us correctly include libglusterfs includes without
namespace conflicts.
Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since gcc-8.2.x (fedora-28 or so) gcc has been emitting warnings
about buggy use of strncpy.
e.g.
warning: ‘strncpy’ output truncated before terminating nul
copying as many bytes from a string as its length
and
warning: ‘strncpy’ specified bound depends on the length of the
source argument
Since we're copying string fragments and explicitly null terminating
use memcpy to silence the warning
Change-Id: I413d84b5f4157f15c99e9af3e154ce594d5bcdc1
updates: bz#1193929
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We add dummy interrupt handling for the FLUSH
fuse message. It can be enabled by the
"--fuse-flush-handle-interrupt" hidden command line
option, or "-ofuse-flush-handle-interrupt=yes"
mount option.
It serves no other than diagnostic & demonstational
purposes -- to exercise the interrupt handling framework
a bit and to give an usage example.
Documentation is also provided that showcases interrupt
handling via FLUSH.
Change-Id: I522f1e798501d06b74ac3592a5f73c1ab0590c60
updates: #465
Signed-off-by: Csaba Henk <csaba@redhat.com>
|