glusterfs.git - GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.

	Commit message (Collapse)	Author	Age	Files	Lines
*	rpc: fix missing unref on reconnect	Zhang Huan	2019-10-02	1	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	On protocol client connecting to brick, client will firstly contact glusterd to get port, then reconnect to glusterfsd. Reconnect cancels the reconnect timer and start a new one. However, cancelling the timer does not unref rpc ref-ed for it. That leads to refcount leak. Fix this issue by unref-ing rpc if reconnect timer is canceled. Change-Id: Ice89dcd93cb283a0c7250c369cc8961d52fb2022 Fixes: bz#1538900 BUG: 1538900 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
*	glusterd, rpc, glusterfsd: fix coverity defects and put required annotations	Atin Mukherjee	2019-09-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	1404965 - Null pointer dereference 1404316 - Program hangs 1401715 - Program hangs 1401713 - Program hangs Updates: bz#789278 Change-Id: I6e6575daafcb067bc910445f82a9d564f43b75a2 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	rpc/xdr: fixes in Makefile	Amar Tumballi	2019-09-05	1	-1/+6
\| \| \| \| \| \| \| \|	there is no need to cleanup the .x files. Fixes: bz#1743094 Change-Id: I89d8deb3939c83069709c701cb8f1972e3746168 Signed-off-by: Amar Tumballi <amarts@gmail.com>
*	graph/cleanup: Fix race in graph cleanup	Mohammed Rafi KC	2019-09-05	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were unconditionally cleaning up the grap when we get child_down followed by parent_down. But this is prone to race condition when some of the bricks are already disconnected. In this case, even before the last child down is executed in the client xlator code,we might have freed the graph. Because the child_down event is alreadt recevied. To fix this race, we have introduced a check to see if all client xlator have cleared thier reconnect chain, and called the child_down for last time. Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042 fixes: bz#1727329 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	rpc: Update address family if it is not provide in cmd-line arguments	Mohit Agrawal	2019-09-02	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: After enabling transport-type to inet6 and passed ipv6 transport.socket.bind-address in glusterd.vol clients are not started. Solution: Need to update address-family based on remote-address for all gluster client process Change-Id: Iaa3588cd87cebc45231bfd675745c1a457dc9b31 Fixes: bz#1747746 Credits: Amgad Saleh <amgad.saleh@nokia.com> Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	rpc: glusterd start is failed and throwing an error Address already in use	Mohit Agrawal	2019-08-18	1	-7/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Some of the .t are failed due to bind is throwing an error EADDRINUSE Solution: After killing all gluster processes .t is trying to start glusterd but somehow if kernel has not cleaned up resources(socket) then glusterd startup is failed due to bind system call failure.To avoid the issue retries to call bind 10 times to execute system call succesfully Change-Id: Ia5fd6b788f7b211c1508c1b7304fc08a32266629 Fixes: bz#1743020 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	libglusterfs: remove dependency of rpc	Amar Tumballi	2019-08-16	3	-66/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Goal: 'libglusterfs' files shouldn't have any dependency outside of the tree, specially the header files, shouldn't have '#include' from outside the tree. Fixes: * Had to introduce libglusterd so, methods and structures required for only mgmt/glusterd, and cli/ are separated from 'libglusterfs/' * Remove rpc/xdr/gen from build, which was used mainly so dependency for libglusterfs could be properly satisfied. * Move rpcsvc_auth_data to client_t.h, so all dependencies could be handled. Updates: bz#1636297 Change-Id: I0e80243a5a3f4615e6fac6e1b947ad08a9363fce Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	rpc/transport: have default listen-port	Atin Mukherjee	2019-08-06	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With release-6, we now can have transport.socket.listen-port parameter configurable in glusterd.vol. However the default value wasn't defined in the code and this breaks the backward compatibility where if one has a modified glusterd.vol file, then post upgrade the same file will be retained and the new changes introduced as part of the release wouldn't be available in the glusterd.vol. So it's important that for each new options introduced in glusterd.vol file backward compatibility is guaranteed. Fixes: bz#1737676 Change-Id: I776b28bff786320cda299fe673d824024dc9803e Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	event: rename event_XXX with gf_ prefixed	Xiubo Li	2019-07-29	2	-26/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I hit one crash issue when using the libgfapi. In the libgfapi it will call glfs_poller() --> event_dispatch() in file api/src/glfs.c:721, and the event_dispatch() is defined by libgluster locally, the problem is the name of event_dispatch() is the extremly the same with the one from libevent package form the OS. For example, if a executable program Foo, which will also use and link the libevent and the libgfapi at the same time, I can hit the crash, like: kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp 00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000] The link for Foo is: lib_foo_LADD = -levent $(GFAPI_LIBS) It will crash. This is because the glfs_poller() is calling the event_dispatch() from the libevent, not the libglsuter. The gfapi link info : GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid If I link Foo like: lib_foo_LADD = $(GFAPI_LIBS) -levent It will works well without any problem. And if Foo call one private lib, such as handler_glfs.so, and the handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't and it will dlopen(handler_glfs.so), then the crash will be hit everytime. The link info will be: foo_LADD = -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like: foo_LADD = $(GFAPI_LIBS) -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS. And in some cases when the --as-needed link option is added(on many dists it is added as default), then the crash is back again, the above workaround won't work. Fixes: #699 Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa Signed-off-by: Xiubo Li <xiubli@redhat.com>
*	ctime: Set mdata xattr on legacy files	Kotresh HR	2019-07-22	3	-1/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The files which were created before ctime enabled would not have "trusted.glusterfs.mdata"(stores time attributes) xattr. Upon fops which modifies either ctime or mtime, the xattr gets created with latest ctime, mtime and atime, which is incorrect. It should update only the corresponding time attribute and rest from backend Solution: Creating xattr with values from brick is not possible as each brick of replica set would have different times. So create the xattr upon successful lookup if the xattr is not created Note To Reviewers: The time attributes used to set xattr is got from successful lookup. Instead of sending the whole iatt over the wire via setxattr, a structure called mdata_iatt is sent. The mdata_iatt contains only time attributes. Change-Id: I5e535631ddef04195361ae0364336410a2895dd4 fixes: bz#1593542 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	ibverbs/rdma: remove from build	Amar Tumballi	2019-07-13	8	-6126/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have proposed about this an year ago, and with recent smoke failures, it looks like the right time to take such call. ref: https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html With this, glusterfs-8.0 wouldn't have rdma feature, and would allow some modularity changes possible with rpc layer (as we would have just 1 transport) Updates: bz#1635688 Change-Id: Ia277dca4d4b1f0cffae20819024a52b075b775e5 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	rpc/xdr: include nfs specific files in build only if gNFS is enabled	Amar Tumballi	2019-07-10	2	-10/+23
\| \| \| \| \| \|	updates: bz#1193929 Change-Id: I2b85fd0a04c77815a154f445ec8fb4da37dcbe40 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glusterd/svc: update pid of mux volumes from the shd process	Mohammed Rafi KC	2019-07-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For a normal volume, we are updating the pid from a the process while we do a daemonization or at the end of the init if it is no-daemon mode. Along with updating the pid we also lock the file, to make sure that the process is running fine. With brick mux, we were updating the pidfile from gluterd after an attach/detach request. There are two problems with this approach. 1) We are not holding a pidlock for any file other than parent process. 2) There is a chance for possible race conditions with attach/detach. For example, shd start and a volume stop could race. Let's say we are starting an shd and it is attached to a volume. While we trying to link the pid file to the running process, this would have deleted by the thread that doing a volume stop. Change-Id: I29a00352102877ce09ea3f376ca52affceb5cf1a Updates: bz#1722541 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	glusterfs-fops: fix the modularity	Amar Tumballi	2019-07-02	7	-261/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	glusterfs-fops.h was moved to rpc/xdr to support compound fops. (ref: https://review.gluster.org/14032, 2f945b86d3) This was fine as long as all these header files were in single include directory after 'install'. With the move to separate out glusterfs specific header files into another directory inside /usr/include (ref: https://review.gluster.org/21746, 20ef211cfa), glusterfs-fops.h file was not in the proper path when an external .c file tried to include any of glusterfs specific .h file (like xlator.h). Now, we have removed compound-fops, with that, none of the enums declared in glusterfs-fops.h are actually getting used on wire anymore. Hence, it makes sense to get this to libglusterfs/src as a single point of definition. With this change, the external programs can use glusterfs header files. also remove some enum definitions which are not used in code anymore. Updates: bz#1636297 Change-Id: I423c44d3dbe2efc777299c544ece3cb172fc7e44 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	core: fedora 30 compiler warnings	SheetalPamecha	2019-06-18	1	-1/+1
\| \| \| \| \| \| \| \|	warning: ‘%s’ directive argument is null [-Wformat-overflow=] Change-Id: I69b8d47f0002c58b00d1cc947fac6f1c64e0b295 updates: bz#1193929 Signed-off-by: SheetalPamecha <spamecha@redhat.com>
*	multiple files: another attempt to remove includes	Yaniv Kaul	2019-06-14	17	-45/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are many include statements that are not needed. A previous more ambitious attempt failed because of *BSD plafrom (see https://review.gluster.org/#/c/glusterfs/+/21929/ ) Now trying a more conservative reduction. It does not solve all circular deps that we have, but it does reduce some of them. There is just too much to handle reasonably (dht-common.h includes dht-lock.h which includes dht-common.h ...), but it does reduce the overall number of lines of include we need to look at in the future to understand and fix the mess later one. Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	across: clang-scan: fix NULL dereferencing warnings	Amar Tumballi	2019-06-04	1	-5/+3
\| \| \| \| \| \| \| \| \|	All these checks are done after analyzing clang-scan report produced by the CI job @ https://build.gluster.org/job/clang-scan updates: bz#1622665 Change-Id: I590305af4ceb779be952974b2a36066ffc4865ca Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	If bind-address is IPv6 return it successfully	Amgad Saleh	2019-05-28	1	-6/+11
\| \| \| \| \| \|	Change-Id: Ibd37b6ea82b781a1a266b95f7596874134f30079 fixes: bz#1713730 Signed-off-by: Amgad Saleh <amgad.saleh@nokia.com>
*	Fix some "Null pointer dereference" coverity issues	Xavi Hernandez	2019-05-26	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the following CID's: * 1124829 * 1274075 * 1274083 * 1274128 * 1274135 * 1274141 * 1274143 * 1274197 * 1274205 * 1274210 * 1274211 * 1288801 * 1398629 Change-Id: Ia7c86cfab3245b20777ffa296e1a59748040f558 Updates: bz#789278 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	Revert "rpc: implement reconnect back-off strategy"	Amar Tumballi	2019-05-21	2	-18/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 59841f7e1ff0511b04884015441a181a56d07bea. This revert is done as a 'possible' fix for frequent regression failures, which are random in nature too (ie, different tests fails in different runs). Why exactly this patch? Because this patch seemed like most probable candidate which got merged in last 15days, and after which regressions are failing more often. Updates: bz#1711827 Change-Id: I35333162fcd4064f9609525ca93c666053c6d959
*	rpc: implement reconnect back-off strategy	Xavier Hernandez	2019-05-11	2	-16/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a connection failure happens, gluster tries to reconnect every 3 seconds. In some cases the failure is spurious, so a delay of 3 seconds could be unnecessarily long. This patch implements a back-off strategy that tries a reconnect as soon as 1 tenth of a second. If this fails, the time is doubled until it's around 3 seconds. After that, the reconnect is attempted every 3 seconds as before. Change-Id: Icb3fbe20d618f50cbbb599dce542b4e871c22149 Updates: bz#1193929 Signed-off-by: Xavier Hernandez <xhernandez@redhat.com>
*	protocol: remove compound fop	Amar Tumballi	2019-04-29	2	-240/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Compound fops are kept on wire as a backward compatibility with older AFR modules. The AFR module used beyond 4.x releases are not using compound fops. Hence removing the compound fop in the protocol code. Note that, compound-fops was already an 'option' in AFR, and completely removed since 4.1.x releases. So, point to note is, with this change, we have 2 ways to upgrade when clients of 3.x series are present. i) set 'use-compound-fops' option to 'false' on any volume which is of replica type. And then upgrade the servers. ii) Do a two step upgrade. First from current version (which will already be EOL if it's using compound) to a 4.1..6.x version, and then an upgrade to 7.x. Consider the overall code which we are removing for the option seems quite high, I believe it is worth it. updates: bz#1693692 Signed-off-by: Amar Tumballi <amarts@redhat.com> Change-Id: I0a8876d0367a15e1410ec845f251d5d3097ee593
*	rpclib: slow floating point math and libm	Kaleb S. KEITHLEY	2019-04-03	2	-9/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In release-6 rpc/rpc-lib (libgfrpc) added the function get_rightmost_set_bit() which calls log2(3), a call that takes a floating point parameter. It's used thusly: right_most_unset_bit = get_rightmost_set_bit(...); (So is it really the right-most unset bit, or the right-most set bit?) It's unclear to me whether this is in the data path or not. If it is, it's rather scary to think about integer-to-float conversions and slow calls to libm functions in the data path. gcc and clang have __builtin_ctz() which returns the same result as get_rightmost_set_bit(), and does it substantially faster. Approx 20M iterations of get_rightmost_set_bit() took ~33sec of wall clock time on my devel machine, while 20M iterations of __builtin_ctz() took < 9sec; get_rightmost_set_bit() is 3x slower than __builtin_ctz(). And as a side benefit, we can again eliminate the need to link libgfrpc with libm. Change-Id: If9e7e80874577c52223f8125b385fc930de20699 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	transport/socket: log shutdown msg occasionally	Raghavendra G	2019-04-03	2	-2/+3
\| \| \| \| \| \|	Change-Id: If3fc0884e7e2f45de2d278b98693b7a473220a5f Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1691616
*	mgmt/shd: Implement multiplexing in self heal daemon	Mohammed Rafi KC	2019-04-01	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Shd daemon is per node, which means they create a graph with all volumes on it. While this is a great for utilizing resources, it is so good in terms of performance and managebility. Because self-heal daemons doesn't have capability to automatically reconfigure their graphs. So each time when any configurations changes happens to the volumes(replicate/disperse), we need to restart shd to bring the changes into the graph. Because of this all on going heal for all other volumes has to be stopped in the middle, and need to restart all over again. Solution: This changes makes shd as a per volume daemon, so that the graph will be generated for each volumes. When we want to start/reconfigure shd for a volume, we first search for an existing shd running on the node, if there is none, we will start a new process. If already a daemon is running for shd, then we will simply detach a graph for a volume and reatach the updated graph for the volume. This won't touch any of the on going operations for any other volumes on the shd daemon. Example of an shd graph when it is per volume graph ----------------------- \| debug-iostat \| ----------------------- / \| \ / \| \ --------- --------- ---------- \| AFR-1 \| \| AFR-2 \| \| AFR-3 \| -------- --------- ---------- A running shd daemon with 3 volumes will be like--> graph ----------------------- \| debug-iostat \| ----------------------- / \| \ / \| \ ------------ ------------ ------------ \| volume-1 \| \| volume-2 \| \| volume-3 \| ------------ ------------ ------------ Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99 fixes: bz#1659708 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	rpc: Remove duplicate code	Pranith Kumar K	2019-03-28	3	-77/+1
\| \| \| \| \| \| \| \| \| \|	rpc_clnt_disable() and rpc_clnt_disconnect() have same code. Removed rpc_clnt_disconnect() and moved calls to rpc_clnt_disconnect() to rpc_clnt_disable() updates bz#1193929 Change-Id: I965f57cc1d5af36d266810125558b6f5e5f279d4 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	build: link libgfrpc with MATH_LIB (libm, -lm)	Kaleb S. KEITHLEY	2019-03-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tl;dnr: libgfrpc.so calls log2(3) from libm; it should be explicitly linked with -lm the autoconf/automake/libtool stack is more or less forgiving on different distributions. On forgiving systems libtool will semi- magically link with implicit dependencies. But on Ubuntu, which seems to be tending toward being less forgiving, the link of libgfrpc will fail with an unresolved referencee to log2(3). Change-Id: I9fae09ddb81e49004fbea4d7d83b95fb64a484b0 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	rpc/transport: Missing a ref on dict while creating transport object	Mohammed Rafi KC	2019-03-20	5	-43/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	while creating rpc_tranpsort object, we store a dictionary without taking a ref on dict but it does an unref during the cleaning of the transport object. So the rpc layer expect the caller to take a ref on the dictionary before passing dict to rpc layer. This leads to a lot of confusion across the code base and leads to ref leaks. Semantically, this is not correct. It is the rpc layer responsibility to take a ref when storing it, and free during the cleanup. I'm listing down the total issues or leaks across the code base because of this confusion. These issues are currently present in the upstream master. 1) changelog_rpc_client_init 2) quota_enforcer_init 3) rpcsvc_create_listeners : when there are two transport, like tcp,rdma. 4) quotad_aggregator_init 5) glusterd: init 6) nfs3_init_state 7) server: init 8) client:init This patch does the cleanup according to the semantics. Change-Id: I46373af9630373eb375ee6de0e6f2bbe2a677425 updates: bz#1659708 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	socket/ssl: fix crl handling	Milind Changire	2019-03-19	2	-19/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Just setting the path to the CRL directory in socket_init() wasn't working. Solution: Need to use special API to retrieve and set X509_VERIFY_PARAM and set the CRL checking flags explicitly. Also, setting the CRL checking flags is a big pain, since the connection is declared as failed if any CRL isn't found in the designated file or directory. A comment has been added to the code appropriately. Change-Id: I8a8ed2ddaf4b5eb974387d2f7b1a85c1ca39fe79 fixes: bz#1687326 Signed-off-by: Milind Changire <mchangir@redhat.com>
*	dict: handle STR_OLD data type in xdr conversions	Amar Tumballi	2019-03-08	2	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently a dict conversion on wire for 3.x protocol happens using `dict_unserialize()`, which sets the type of data as STR_OLD. But the new protocol doesn't send it over the wire as its not considered as a valid format in new processes. But considering we deal with old and new protocol when we do a rolling upgrade, it will allow us to get all the information properly with new protocol. Credits: Krutika Dhananjay Fixes: bz#1684385 Change-Id: I165c0021fb195b399790b9cf14a7416ae75ec84f Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	core: implement a global thread pool	Xavi Hernandez	2019-02-18	4	-22/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements a thread pool that is wait-free for adding jobs to the queue and uses a very small locked region to get jobs. This makes it possible to decrease contention drastically. It's based on wfcqueue structure provided by urcu library. It automatically enables more threads when load demands it, and stops them when not needed. There's a maximum number of threads that can be used. This value can be configured. Depending on the workload, the maximum number of threads plays an important role. So it needs to be configured for optimal performance. Currently the thread pool doesn't self adjust the maximum for the workload, so this configuration needs to be changed manually. For this reason, the global thread pool has been made optional, so that volumes can still use the thread pool provided by io-threads. To enable it for bricks, the following option needs to be set: config.global-threading = on This option has no effect if bricks are already running. A restart is required to activate it. It's recommended to also enable the following option when running bricks with the global thread pool: performance.iot-pass-through = on To enable it for a FUSE mount point, the option '--global-threading' must be added to the mount command. To change it, an umount and remount is needed. It's recommended to disable the following option when using global threading on a mount point: performance.client-io-threads = off To enable it for services managed by glusterd, glusterd needs to be started with option '--global-threading'. In this case all daemons, like self-heal, will be using the global thread pool. Currently it can only be enabled for bricks, FUSE mounts and glusterd services. The maximum number of threads for clients and bricks can be configured using the following options: config.client-threads config.brick-threads These options can be applied online and its effect is immediate most of the times. If one of them is set to 0, the maximum number of threads will be calcutated as #cores * 2. Some distributions use a very old userspace-rcu library (version 0.7) for this reason, some header files from version 0.10 have been copied into contrib/userspace-rcu and are used if the detected version is 0.7 or older. An additional change has been made to io-threads to prevent that threads are started when iot-pass-through is set. Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b updates: #532 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	socket: socket event handlers now return void	Milind Changire	2019-02-18	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Returning any value from socket event handlers to the event sub-system doesn't make sense since event sub-system cannot handle socket sub-system errors. Solution: Change return type of all socket event handlers to 'void' Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f Fixes: bz#1651246 Signed-off-by: Milind Changire <mchangir@redhat.com>
*	clnt/rpc: ref leak during disconnect.	Mohammed Rafi KC	2019-02-12	1	-1/+10
\| \| \| \| \| \| \| \| \| \|	During disconnect cleanup, we are not cancelling reconnect timer, which causes a ref leak each time when a disconnect happen. Change-Id: I9d05d1f368d080e04836bf6a0bb018bf8f7b5b8a updates: bz#1659708 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	Multiple files: reduce work while under lock.	Yaniv Kaul	2019-01-29	1	-17/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mostly, unlock before logging. In some cases, moved different code that was not needed to be under lock (for example, taking time, or malloc'ing) to be executed before taking the lock. Note: logging might be slightly less accurate in order, since it may not be done now under the lock, so order of logs is racy. I think it's a reasonable compromise. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89
*	core: heketi-cli is throwing error "target is busy"	Mohit Agrawal	2019-01-24	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: At the time of deleting block hosting volume through heketi-cli , it is throwing an error "target is busy". cli is throwing an error because brick is not detached successfully and brick is not detached due to race condition to cleanp xprt associated with detached brick Solution: To avoid xprt specifc race condition introduce an atomic flag on rpc_transport Change-Id: Id4ff1fe8375a63be71fb3343f455190a1b8bb6d4 fixes: bz#1668190 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	core: move logs which are only developer relevant to DEBUG level	Amar Tumballi	2019-01-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	We had only changed the log level to DEBUG in release branch earlier. But considering 90%+ of our deployments happen in same env, we can look at these specific logs on need basis. With this change, the master branch will be easier to debug with lesser logs. Change-Id: I4157a7ec7d5ec9c2948b2bbc1e4cb8317f28d6b8 Updates: bz#1666833 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	rpc: use address-family option from vol file	Milind Changire	2019-01-22	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch helps enable IPv6 connections in the cluster. The default address-family is IPv4 without using this option explicitly. When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol file, the mount command-line also needs to have -o xlator-option="transport.address-family=inet6" added to it. This option also gets added to the brick command-line. Snapshot and gfapi use-cases should also use this option to pass in the inet6 address-family. Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270 fixes: bz#1635863 Signed-off-by: Milind Changire <mchangir@redhat.com>
*	socket: don't pass return value from protocol handler to event handler	Zhang Huan	2019-01-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Event handler handles socket level error only, while protocol handler handles in protocol level error. If protocol handler decides to disconnect on error in any case, it should call disconnect instead of return an error back to event handler. Change-Id: I9375be98cc52cb969085333f3c7229a91207d1bd updates: bz#1666143 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
*	socket: fix issue when socket read return with EAGAIN	Zhang Huan	2019-01-22	1	-7/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the case socket read returns EAGAIN, positive value about remaining vector to send is returned. This return value will be passed all the way back to event handler, making it complains. [2018-12-29 08:02:25.603199] T [socket.c:1640:__socket_read_simple_payload] 0-test-client-0-extra.0: partial read on non-blocking socket. [2018-12-29 08:02:25.603201] T [rpc-clnt.c:654:rpc_clnt_reply_init] 0-test-client-2-extra.1: received rpc message (RPC XID: 0xfa6 Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 12) from rpc-transport (test-client-2-extra.1) [2018-12-29 08:02:25.603207] T [socket.c:3129:socket_event_handler] 0-test-client-0-extra.0: (sock:32) socket_event_poll_in returned 1 Formerly, in socket_proto_state_machine, return value of socket_readv is used to check if message is all read-in. In this commit, it is checked whether size of bytes indicated in header are all read in. In this way, only 0 and -1 will be returned from socket_proto_state_machine(), indicating whether there is error in the underlying socket. Change-Id: I8be0d178b049f0720d738a03aec41c4b375d2972 updates: bz#1666143 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
*	socket: fix issue when socket write return with EAGAIN	Zhang Huan	2019-01-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	In the case socket write return with EAGAIN, the remaining vector count is return all way back to event handler, making followup pollin event to skip handling and dispatch loop complains about failure. Even thought temporary write failure is not an error. [2018-12-29 07:31:41.772310] E [MSGID: 101191] [event-epoll.c:674:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler Change-Id: Idf03d120b5f7619eda19720a583cbcc3e7da2504 updates: bz#1666143 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
*	socket: fix counting of socket total_bytes_read and total_bytes_write	Zhang Huan	2019-01-17	1	-4/+4
\| \| \| \| \| \|	Change-Id: If35d0dbae963facf00ab6bcf07c6e4d1706ed982 updates: bz#1666143 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
*	Revert "iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool"	Amar Tumballi	2019-01-08	2	-4/+248
\| \| \| \| \| \| \| \| \|	This reverts commit b87c397091bac6a4a6dec4e45a7671fad4a11770. There seems to be some performance regression with the patch and hence recommended to have it reverted. Updates: #325 Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09
*	rpc-clnt: reduce transport connect log for EINPROGRESS	Kinglong Mee	2019-01-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	quotad and ganesha.nfsd prints many logs as, [rpc-clnt.c:1739:rpc_clnt_submit ] 0-<VOLUME_NAME>-quota: error returned while attempting to connect to host: (null), port 0 Change-Id: Ic0c815400619e4a87a772a51b19822920228c1ef Updates: bz#1596787 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
*	rpcsvc: Don't expect dictionary values to be available	Pranith Kumar K	2019-01-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	When reconfigure happens, string values from one dictionary are directly set in another dictionary. This can lead to invalid memory when the first dictionary is freed up. So do dict_set_dynstr_with_alloc instead of dict_set_str updates bz#1650403 Change-Id: Id53236467521cfdeb07e7178d87ba6cf88d17003 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	rpc/rpc-lib: fix coverity issue	Sheetal Pamecha	2018-12-28	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \|	Defect: Code can never be reached because of the condition queue_index > 1024 cannot be true. CID: 1398471 Logically dead code updates: bz#789278 Change-Id: I367cda7e734f6d774900a58d8664cffcab69126f Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
*	rpc : fix coverity in rpc/rpc-lib/src/rpcsvc.c	Sunny Kumar	2018-12-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	This patch fixes newly introduced coverity. CID: 1398472: Dereference before null check. updates: bz#789278 Change-Id: Ie9b13084097de8f24b138acd7608c3e15b3bba9c Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	socket: Remove redundant in_lock in incoming message handling	Krutika Dhananjay	2018-12-26	2	-37/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	A given epoll thread can handle only one incoming (POLLIN) request. And until the socket is rearmed for listening, it is guaranteed that there won't be any new incoming requests. As a result, the priv->in_lock which guards the socket proto state machine seems redundant. This patch removes priv->in_lock. Change-Id: I26b6ddd852aba8c10385833b85ffd2e53e46cb8c updates: bz#1467614 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	rpc: Use adaptive mutex in rpcsvc_program_register	Mohit Agrawal	2018-12-20	1	-2/+8
\| \| \| \| \| \| \| \| \| \|	Adaptive mutexes are used to protect critical/shared data items that are held for short periods.It provides a balance between spin locks and traditional mutex.We have observed after use adaptive mutex in rpcsvc_program_register got some improvement. Change-Id: I7905744b32516ac4e4ca3c83c2e8e5e306093add fixes: bz#1660701
*	rdma: fix possible buffer overflow	Rinku Kothiya	2018-12-19	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	used snprintf instead of sprintf and if the source string is bigger than destination then logged a warning message. clang warning: ‘%s’ directive writing up to 1024 bytes into a region of size 108. updates: bz#1622665 Change-Id: Ia5e7c53d35d8178dd2c75708698599fe8bded5de Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
*	iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool	Poornima G	2018-12-18	2	-248/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation of iobuf_pool has two problems: - prealloc of 12.5MB memory, this limits the scale factor of the gluster processes due to RAM requirements - lock contention, as the current implementation has one global iobuf_pool lock. Credits for debugging and addressing the same goes to Krutika Dhananjay <kdhananj@redhat.com>. Issue: #410 Hence changing the iobuf implementation to use per thread mem pool. This may theoritically appear to cause perf dip as there is no preallocation. But per thread mem pool will not have significant perf impact as the last allocated memory is kept alive for subsequent allocs, for some time. The worst case would be if iobufs requested are of random sizes each time. The best case is, if we get iobuf request of the same size. From the perf tests, this patch did not seem to cause any perf decrease. Note that, with this patch, the rdma performance is going to degrade drastically. In one of the previous patchsets we had fixes to not degrade rdma perf, but rdma is not supported and also not tested [1]. Hence the decision was to not have code in rdma that is not tested and not supported. [1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html Updates: #325 Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27 Signed-off-by: Poornima G <pgurusid@redhat.com>