summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src/inode.c
Commit message (Collapse)AuthorAgeFilesLines
* core: Reduce calls to THIS wherever possible (#2010)Karthik Subrahmanya2021-01-151-17/+21
| | | | | | | | | In few functions 'THIS' is called inside a loop and saved for later use in 'old_THIS'. Instead we can call 'THIS' only when 'old_THIS' is NULL and reuse that itself to reduce redundant calls. Change-Id: Ie5d4e5fe42bd4df02d101b4c199759cb84e6aee1 Fixes: #1755 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* all: change 'primary' to 'root' where it makes senseRavishankar N2020-12-021-2/+2
| | | | | | | | | | | | | | | | | | As a part of offensive language removal, we changed 'master' to 'primary' in some parts of the code that are *not* related to geo-replication via commits e4c9a14429c51d8d059287c2a2c7a76a5116a362 and 0fd92465333be674485b984e54b08df3e431bb0d. But it is better to use 'root' in some places to distinguish it from the geo-rep changes which use 'primary/secondary' instead of 'master/slave'. This patch mainly changes glusterfs_ctx_t->primary to glusterfs_ctx_t->root. Other places like meta xlator is also changed. gf-changelog.c is not changed since it is related to geo-rep. Updates: #1000 Change-Id: I3cd610f7bea06c7a28ae2c0104f34291023d1daf Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* change 'master' xlator to 'primary' xlatorRavishankar N2020-11-301-1/+1
| | | | | | | | | These were the only offensive language occurences in the code (.c) after making the changes for geo-rep (whichis tracked in issue 1415). Change-Id: I21cd558fdcf8098e988617991bd3673ef86e120d Updates: #1000 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* core: tcmu-runner process continuous growing logs lru_size showing -1 (#1776)mohit842020-11-241-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * core: tcmu-runner process continuous growing logs lru_size showing -1 At the time of calling inode_table_prune it checks if current lru_size is greater than lru_limit but lru_list is empty it throws a log message "Empty inode lru list found but with (%d) lru_size".As per code reading it seems lru_size is out of sync with the actual number of inodes in lru_list. Due to throwing continuous error messages entire disk is getting full and the user has to restart the tcmu-runner process to use the volumes.The log message was introduce by a patch https://review.gluster.org/#/c/glusterfs/+/15087/. Solution: Introduce a flag in_lru_list to take decision about inode is being part of lru_list or not. Fixes: #1775 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I4b836bebf4b5db65fbf88ff41c6c88f4a7ac55c1 * core: tcmu-runner process continuous growing logs lru_size showing -1 Update in_lru_list flag only while modify lru_size Fixes: #1775 Change-Id: I3bea1c6e748b4f50437999bae59edeb3d7677f47 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> * core: tcmu-runner process continuous growing logs lru_size showing -1 Resolve comments in inode_table_destroy and inode_table_prune Fixes: #1775 Change-Id: I5aa4d8c254f0fe374daa5ec604f643dea8dd56ff Signed-off-by: Mohit Agrawal moagrawa@redhat.com * core: tcmu-runner process continuous growing logs lru_size showing -1 Update in_lru_list only while update lru_size Fixes: #1775 Change-Id: I950eb1f0010c3d4bcc44a33225a502d2291d1a83 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* xlators: misc conscious language changes (#1715)Ravishankar N2020-11-021-7/+5
| | | | | | | | | | | | core:change xlator_t->ctx->master to xlator_t->ctx->primary afr: just changed comments. meta: change .meta/master to .meta/primary. Might break scripts. changelog: variable/function name changes only. These are unrelated to geo-rep. Fixes: #1713 Change-Id: I58eb5fcd75d65fc8269633acc41313503dccf5ff Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* core: configure optimum inode table hash_size for shd (#1576)mohit842020-10-111-16/+35
| | | | | | | | | | | | | | | | | | | | | In brick_mux environment a shd process consume high memory. After print the statedump i have found it allocates 1M per afr xlator for all bricks.In case of configure 4k volumes it consumes almost total 6G RSS size in which 4G consumes by inode_tables [cluster/replicate.test1-replicate-0 - usage-type gf_common_mt_list_head memusage] size=1273488 num_allocs=2 max_size=1273488 max_num_allocs=2 total_allocs=2 inode_new_table function allocates memory(1M) for a list of inode and dentry hash. For shd lru_limit size is 1 so we don't need to create a big hash table so to reduce RSS size for shd process pass optimum bucket count at the time of creating inode_table. Change-Id: I039716d42321a232fdee1ee8fd50295e638715bb Fixes: #1538 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* glusterfsd: structure loggingyatipadia2019-11-051-19/+14
| | | | | | | | convert gf_msg() to gf_smsg() Change-Id: I1cd6a5ac6f4361195d5d925efb2cc194045d0bba Updates: #657 Signed-off-by: yatip <ypadia@redhat.com>
* Avoid buffer overwrite due to uuid_utoa() misuseDmitry Antipov2019-12-261-5/+7
| | | | | | | | | | | | | | | Code like: f(..., uuid_utoa(x), uuid_utoa(y)); is not valid (causes undefined behaviour) because uuid_utoa() uses the only static thread-local buffer which will be overwritten by the subsequent call. All such cases should be converted to use uuid_utoa_r() with explicitly specified buffer. Change-Id: I5e72bab806d96a9dd1707c28ed69ca033b9c8d6c Updates: bz#1193929 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* fuse: Set limit on invalidate queue sizeN Balachandran2019-08-091-8/+23
| | | | | | | | | | | | | If the glusterfs fuse client process is unable to process the invalidate requests quickly enough, the number of such requests quickly grows large enough to use a significant amount of memory. We are now introducing another option to set an upper limit on these to prevent runaway memory usage. Change-Id: Iddfff1ee2de1466223e6717f7abd4b28ed947788 Fixes: bz#1732717 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* * core: do not assert in inode_unref if the inode table cleanup has startedRaghavendra Bhat2019-04-301-0/+38
| | | | | | | | | | | | | | | | | | | | | | There is a good chance that, the inode on which unref came has already been zero refed and added to the purge list. This can happen when inode table is being destroyed (glfs_fini is something which destroys the inode table). Consider a directory 'a' which has a file 'b'. Now as part of inode table destruction zero refing of inodes does not happen from leaf to the root. It happens in the order inodes are present in the list. So, in this example, the dentry of 'b' would have its parent set to the inode of 'a'. So if 'a' gets zero refed first (as part of inode table cleanup) and then 'b' has to zero refed, then dentry_unset is called on the dentry of 'b' and it further goes on to call inode_unref on b's parent which is 'a'. In this situation, GF_ASSERT would be called as the refcount of 'a' has been already set to zero. So, return the inode (in the function inode_unref without doing anything) if the inode table cleanup has already started and inode's refcount is zero. Change-Id: I91e0a807d5c9ce0daae5a611c38da379fd11076e fixes: bz#1722546 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* multiple files: another attempt to remove includesYaniv Kaul2019-06-091-1/+0
| | | | | | | | | | | | | | | | | | There are many include statements that are not needed. A previous more ambitious attempt failed because of *BSD plafrom (see https://review.gluster.org/#/c/glusterfs/+/21929/ ) Now trying a more conservative reduction. It does not solve all circular deps that we have, but it does reduce some of them. There is just too much to handle reasonably (dht-common.h includes dht-lock.h which includes dht-common.h ...), but it does reduce the overall number of lines of include we need to look at in the future to understand and fix the mess later one. Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* Fix some "Null pointer dereference" coverity issuesXavi Hernandez2019-05-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | This patch fixes the following CID's: * 1124829 * 1274075 * 1274083 * 1274128 * 1274135 * 1274141 * 1274143 * 1274197 * 1274205 * 1274210 * 1274211 * 1288801 * 1398629 Change-Id: Ia7c86cfab3245b20777ffa296e1a59748040f558 Updates: bz#789278 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* inode: fix wrong loop count in __inode_ctx_freeXie Changlong2019-05-171-5/+6
| | | | | | | | Avoid serious memory leak fixes: bz#1711240 Change-Id: Ic61a8fdd0e941e136c98376a87b5a77fa8c22316 Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
* inode: fix unused varsAtin Mukherjee2019-03-211-1/+2
| | | | | | | | | Commit 6d6a3b2 introduced some unused vars. This patch defines them within #ifdef DEBUG Fixes: bz#1580315 Change-Id: I8a332b00c3ffb66689b4b6480c490b9436c17d63 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* inode: don't dump the whole table to CLIAmar Tumballi2019-03-131-0/+13
| | | | | | | | | | | dumping the whole inode table detail to screen doesn't solve any purpose. We should be getting only toplevel details on CLI, and then if one wants to debug further, then they need to get to 'statedump' to get full details. Fixes: bz#1580315 Change-Id: Iaf3de94602f9c76832c3c918ffe2ad13c0b0e448 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* inode: make critical section smallerAmar Tumballi2019-02-091-213/+111
| | | | | | | | | | | | | | | | do all the 'static' tasks outside of locked region. * hash_dentry() and hash_gfid() are now called outside locked region. * remove extra __dentry_hash exported in libglusterfs.sym * avoid checks in locked functions, if the check is done in calling function. * implement dentry_destroy(), which handles freeing of dentry separately, from that of dentry_unset (which takes care of separating dentry from inode, and table) Updates: bz#1670031 Change-Id: I584213e0748464bb427fbdef3c4ab6615d7d5eb0 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* inode: create inode outside locked regionAmar Tumballi2019-02-091-11/+13
| | | | | | | | | Only linking of inode to the table, and inserting it in a list needs to be in locked region. Updates: bz#1670031 Change-Id: I6ea7e956b80cf2765c2233d761909c4bf9c7253c Signed-off-by: Amar Tumballi <amarts@redhat.com>
* fix 32-bit-build-smoke warningsIraj Jamali2018-12-171-1/+1
| | | | | | | fixes: bz#1622665 Change-Id: I777d67b1b62c284c62a02277238ad7538eef001e Signed-off-by: Iraj Jamali <ijamali@redhat.com>
* fuse: add --lru-limit optionAmar Tumballi2018-10-161-35/+219
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode LRU mechanism is moot in fuse xlator (ie. there is no limit for the LRU list), as fuse inodes are referenced from kernel context, and thus they can only be dropped on request of the kernel. This might results in a high number of passive inodes which are useless for the glusterfs client, causing a significant memory overhead. This change tries to remedy this by extending the LRU semantics and allowing to set a finite limit on the fuse inode LRU. A brief history of problem: When gluster's inode table was designed, fuse didn't have any 'invalidate' method, which means, userspace application could never ask kernel to send a 'forget()' fop, instead had to wait for kernel to send it based on kernel's parameters. Inode table remembers the number of times kernel has cached the inode based on the 'nlookup' parameter. And 'nlookup' field is not used by no other entry points (like server-protocol, gfapi etc). Hence the inode_table of fuse module always has to have lru-limit as '0', which means no limit. GlusterFS always had to keep all inodes in memory as kernel would have had a reference to it. Again, the reason for this is, kernel's glusterfs inode reference was pointer of 'inode_t' structure in glusterfs. As it is a pointer, we could never free it (to prevent segfault, or memory corruption). Solution: In the inode table, handle the prune case of inodes with 'nlookup' differently, and call a 'invalidator' method, which in this case is fuse_invalidate(), and it sends the request to kernel for getting the forget request. When the kernel sends the forget, it means, it has dropped all the reference to the inode, and it will send the forget with the 'nlookup' parameter too. We just need to make sure to reduce the 'nlookup' value we have when we get forget. That automatically cause the relevant prune to happen. Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B fixes: bz#1560969 Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-11-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* core: fix strncpy warningsKaleb S. KEITHLE2018-11-091-2/+2
| | | | | | | | | | | | | | | | | | | Since gcc-8.2.x (fedora-28 or so) gcc has been emitting warnings about buggy use of strncpy. e.g. warning: ‘strncpy’ output truncated before terminating nul copying as many bytes from a string as its length and warning: ‘strncpy’ specified bound depends on the length of the source argument Since we're copying string fragments and explicitly null terminating use memcpy to silence the warning Change-Id: I413d84b5f4157f15c99e9af3e154ce594d5bcdc1 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* libglusterfs multiple files: remove dead initilizationYaniv Kaul2018-10-211-0/+2
| | | | | | | | | | | | | Per newer GCC releases and clang-scan, some trivial dead initialization (values that were set but were never read) were removed. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ia9959b2ff87d2e9cb46864e68ffe7dccb984db34
* all: fix the format string exceptionsAmar Tumballi2018-11-011-2/+2
| | | | | | | | | | | | | | | | Currently, there are possibilities in few places, where a user-controlled (like filename, program parameter etc) string can be passed as 'fmt' for printf(), which can lead to segfault, if the user's string contains '%s', '%d' in it. While fixing it, makes sense to make the explicit check for such issues across the codebase, by making the format call properly. Fixes: CVE-2018-14661 Fixes: bz#1644763 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* core: Use GF_ATOMIC ops to update inode->nlookupMohit Agrawal2018-10-301-38/+26
| | | | | | | fixes: bz#1644164 Change-Id: I0ac5aff565b3a30d5ff25ec5a3f20e0bda424a5d Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* libglusterfs: NULL pointer dereferencing clang fixIraj Jamali2018-09-181-1/+1
| | | | | | | | | | | Problem: trav could be NULL. Solution: Adding a check to avoid clang error. Updates: bz#1622665 Change-Id: If26be82edea5e33c2356cea3769496f1cbd3774c Signed-off-by: Iraj Jamali <ijamali@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-121-1807/+1755
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* multiple files: remove unndeeded memset()Yaniv Kaul2018-08-241-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of multiple commits: contrib/fuse-lib/misc.c: remove unneeded memset() All flock variables are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I8e0512c5a88daadb0e587f545fdb9b32ca8858a2 libglusterfs/src/{client_t|fd|inode|stack}.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2be9ccc3a5cb5da51a92af73488cdabd1c527f59 libglusterfs/src/xlator.c: remove unneeded memset() All xl->mem_acct members are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I7f264cd47e7a06255a3f3943c583de77ae8e3147 xlators/cluster/afr/src/afr-self-heal-common.c: remove unneeded memset() Since we are going over the whole array anyway, initialize it properly, to either 1 or 0. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ied4210388976b6a7a2e91cc3de334534d6fef201 xlators/cluster/dht/src/dht-common.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Idc436d2bd0563b6582908d7cbebf9dbc66a42c9a xlators/cluster/ec/src/ec-helpers.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I81bf971f7fcecb4599e807d37f426f55711978fa xlators/mgmt/glusterd/src/glusterd-volgen.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I476ea59ba53546b5153c269692cd5383da81ce2d xlators/mgmt/glusterd/src/glusterd-geo-rep.c: read() in 4K blocks The current 1K seems small. 4K is usually better (in Linux). Also remove a memset() that I don't think is needed between reads. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I5fb7950c92d282948376db14919ad12e589eac2b xlators/storage/posix/src/posix-{gfid-path|inode-fd-ops}.c: remove memset() before sys_*xattr() functions. I don't see a reason to memset the array sent to the functions sys_llistxattr(), sys_lgetxattr(), sys_lgetxattr(), sys_flistxattr(), sys_fgetxattr(). (Note: it's unclear to me why we are calling sys_*txattr() functions with XATTR_VAL_BUF_SIZE-1 size instead of XATTR_VAL_BUF_SIZE ). Only compile-tested! Change-Id: Ief2103b56ba6c71e40ed343a93684eef6b771346 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* Revert "performance/write-behind: better invalidation in readdirp"Raghavendra G2018-08-201-6/+1
| | | | | | | | | | | This reverts commit 4d3c62e71f3250f10aa0344085a5ec2d45458d5c. Traversing all children of a directory in wb_readdirp caused significant performance regression. Hence reverting this patch Change-Id: I6c3b6cee2dd2aca41d49fe55ecdc6262e7cc5f34 updates: bz#1512691 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* All: remove memset() before sprintf()Yaniv Kaul2018-08-021-10/+0
| | | | | | | | | | | | It's not needed. There's a good chance the compiler is smart enough to remove it anyway, but it can't hurt - I hope. Compile-tested only! Change-Id: Id7c054e146ba630227affa591007803f3046416b updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* performance/write-behind: better invalidation in readdirpRaghavendra G2018-06-271-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | Current invalidation of stats in wb_readdirp_cbk is prone to races. As the deleted comment explains, <snip> We cannot guarantee integrity of entry->d_stat as there are cached writes. The stat is most likely stale as it doesn't account the cached writes. However, checking for non-empty liability list here is not a fool-proof solution as there can be races like, 1. readdirp is successful on posix 2. sync of cached write is successful on posix 3. write-behind received sync response and removed the request from liability queue 4. readdirp response is processed at write-behind. In the above scenario, stat for the file is sent back in readdirp response but it is stale. </snip> Change-Id: I6ce170985cc6ce3df2382ec038dd5415beefded5 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-161-1/+1
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* gluster: Sometimes Brick process is crashed at the time of stopping brickMohit Agrawal2018-03-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Sometimes brick process is getting crashed at the time of stop brick while brick mux is enabled. Solution: Brick process was getting crashed because of rpc connection was not cleaning properly while brick mux is enabled.In this patch after sending GF_EVENT_CLEANUP notification to xlator(server) waits for all rpc client connection destroy for specific xlator.Once rpc connections are destroyed in server_rpc_notify for all associated client for that brick then call xlator_mem_cleanup for for brick xlator as well as all child xlators.To avoid races at the time of cleanup introduce two new flags at each xlator cleanup_starting, call_cleanup. BUG: 1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Note: Run all test-cases in separate build (https://review.gluster.org/#/c/19700/) with same patch after enable brick mux forcefully, all test cases are passed. Change-Id: Ic4ab9c128df282d146cf1135640281fcb31997bf updates: bz#1544090
* storage/posix: Add active-fd-count option in glusterPranith Kumar K2018-03-191-0/+2
| | | | | | | | | | | | | | | | | | | | Problem: when dd happens on sharded replicate volume all the writes on shards happen through anon-fd. When the writes don't come quick enough, old anon-fd closes and new fd gets created to serve the new writes. open-fd-count is decremented only after the fd is closed as part of fd_destroy(). So even when one fd is on the way to be closed a new fd will be created and during this short period it appears as though there are multiple fds opened on the file. AFR thinks another application opened the same file and switches off eager-lock leading to extra latency. Fix: Have a different option called active-fd whose life cycle starts at fd_bind() and ends just before fd_destroy() BUG: 1557932 Change-Id: I2e221f6030feeedf29fbb3bd6554673b8a5b9c94 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* Coverity Issue: PW.INCLUDE_RECURSION in several filesGirjesh Rajoria2017-11-031-1/+0
| | | | | | | | | | | | | | Coverity ID: 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 423, 424, 425, 426, 427, 428, 429, 436, 437, 438, 439, 440, 441, 442, 443 Issue: Event include_recursion Removed redundant, recursive includes from the files. Change-Id: I920776b1fa089a2d4917ca722d0075a9239911a7 BUG: 789278 Signed-off-by: Girjesh Rajoria <grajoria@redhat.com>
* gfapi : Resolve "." and ".." only for named lookupsJiffin Tony Thottan2017-06-111-100/+18
| | | | | | | | | | | | | | | | | | | | The patch https://review.gluster.org/#/c/17177 resolves "." and ".." to corrosponding inodes and names before sending the request to the backend server. But this will only work if inode and its parent is linked properly. Incase of nameless lookup(applications like ganesha) the inode of parent can be NULL(only gfid is send). So this patch will resolve "." and ".." only if proper parent is available Change-Id: I4c50258b0d896dabf000a547ab180b57df308a0b BUG: 1460514 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: https://review.gluster.org/17502 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* gfapi: fix handling of dot and double dot in pathMohammed Rafi KC2017-05-031-0/+110
| | | | | | | | | | | | | | | | This patch is to handle "." and ".." in file path. Which means this special dentry names will be resolved before sending fops on the path. Change-Id: I5e92f6d1ad1412bf432eb2488e53fb7731edb013 BUG: 1447266 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/17177 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* dht: Add readdir-ahead in rebalance graph if parallel-readdir is onPoornima G2017-04-131-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: The value of linkto xattr is generally the name of the dht's next subvol, this requires that the next subvol of dht is not changed for the life time of the volume. But with parallel readdir enabled, the readdir-ahead loaded below dht, is optional. The linkto xattr for first subvol, when: - parallel readdir is enabled : "<volname>-readdir-head-0" - plain distribute volume : "<volname>-client-0" - distribute replicate volume : "<volname>-afr-0" The value of linkto xattr is "<volname>-readdir-head-0" when parallel readdir is enabled, and is "<volname>-client-0" if its disabled. But the dht_lookup takes care of healing if it cannot identify which linkto subvol, the xattr points to. In dht_lookup_cbk, if linkto xattr is found to be "<volname>-client-0" and parallel readdir is enabled, then it cannot understand the value "<volname>-client-0" as it expects "<volname>-readdir-head-0". In that case, dht_lookup_everywhere is issued and then the linkto file is unlinked and recreated with the right linkto xattr. The issue is when parallel readdir is enabled, mount point accesses the file that is currently being migrated. Since rebalance process doesn't have parallel-readdir feature, it expects "<volname>-client-0" where as mount expects "<volname>-readdir-head-0". Thus at some point either the mount or rebalance will fail. Solution: Enable parallel-readdir for rebalance as well and then do not allow enabling/disabling parallel-readdir if rebalance is in progress. Change-Id: I241ab966bdd850e667f7768840540546f5289483 BUG: 1436090 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17056 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* libglusterfs: Fix a crash due to race between inode_ctx_set and inode_refPoornima G2017-02-151-34/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: Currently inode ref count is gaurded by inode_table->lock, and inode_ctx is gauarded by inode->lock. With the new patch [1] inode_ref was modified to change the inode_ctx to track the ref count per xlator. Thus inode_ref performed under inode_table->lock is modifying inode_ctx which has to be modified only under inode->lock Solution: When a inode is created, inode_ctx holder is allocated for all the xlators. Hence in case of inode_ctx_set instead of using the first free index in inode ctx holder, we can have predecided index for every xlator in the graph. Credits Pranith K <pkarampu@redhat.com> [1] http://review.gluster.org/13736 Change-Id: I1bfe111c211fcc4fcd761bba01dc87c4c69b5170 BUG: 1423373 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16622 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* inode: Add per xlator ref count for inodePoornima G2016-03-151-16/+60
| | | | | | | | | | | | | | | | | Debugging inode ref leaks is very difficult as there is no info except for the ref count on the inode. Hence this patch is first step towards debugging inode ref leaks. With this patch, in the statedump we get additional info that tells the ref count taken by each xlator on the inode. Change-Id: I7802f7e7b13c04eb4d41fdf52d5469fd2c2a185a BUG: 1325531 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13736 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* performance/readdir-ahead: limit cache sizeRaghavendra G2016-11-241-0/+32
| | | | | | | | | | | | | | | | This patch introduces a new option called "rda-cache-limit", which is the maximum value the entire readdir-ahead cache can grow into. Since, readdir-ahead holds a reference to inode through dentries, this patch also accounts memory stored by various xlators in inode contexts. Change-Id: I84cc0ca812f35e0f9041f8cc71effae53a9e7f99 BUG: 1356960 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/16137 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* inode: Adjust lru_size while retiring entries in lru listSoumya Koduri2016-08-041-0/+9
| | | | | | | | | | | | | | | | | As part of inode_table_destroy(), we first retire entries in the lru list but the lru_size is not adjusted accordingly. This may result in invalid memory reference in inode_table_prune if the lru_size > lru_limit. Change-Id: I29ee3c03b0eaa8a118d06dc0cefba85877daf963 BUG: 1364026 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/15087 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* core: use readdir(3) with glibc, and associated cleanupKaleb S. KEITHLEY2016-07-071-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting with glibc-2.23 (i.e. what's in Fedora 25), readdir_r(3) is marked as deprecated. Specifically the function decl in <dirent.h> has the deprecated attribute, and now warnings are thrown during the compile on Fedora 25 builds. The readdir(_r)(3) man page (on Fedora 25 at least) and World+Dog say that glibc's readdir(3) is, and always has been, MT-SAFE as long as only one thread is accessing the directory object returned by opendir(). World+Dog also says there is a potential buffer overflow in readdir_r(). World+Dog suggests that it is preferable to simply use readdir(). There's an implication that eventually readdir_r(3) will be removed from glibc. POSIX has, apparently deprecated it in the standard, or even removed it entirely. Over and above that, our source near the various uses of readdir(_r)(3) has a few unsafe uses of strcpy()+strcat(). (AFAIK nobody has looked at the readdir(3) implemenation in *BSD to see if the same is true on those platforms, and we can't be sure of MacOS even though we know it's based on *BSD.) Change-Id: I5481f18ba1eebe7ee177895eecc9a80a71b60568 BUG: 1356998 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14838 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* core: correct the if-statment in inode_set_need_lookup()Niels de Vos2016-05-161-1/+1
| | | | | | | | | | | | | | | | There does not seem to be an ill side-effect from the incorrect if-statement. But we should really stick to the same checks we do everywhere. BUG: 1236009 Change-Id: If2b787287ac0d87712840b15b8c914e3dc5ffcde Reported-by: kinsu <vpolakis@gmail.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14363 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs/gfapi: set appropriate errno for inode_link failuresSoumya Koduri2016-05-101-4/+17
| | | | | | | | | | | | | | | | | | We do not seem to be setting errno appropriately in case of inode_link failures. This errno may be used by any application (for eg., nfs-ganesha) to determine the error encountered. This patch addresses the same. Change-Id: I674f747c73369d0597a9c463e6ea4c85b9091355 BUG: 1334621 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/14278 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* inode: Always fetch first entry from the inode lists during inode_table_destroySoumya Koduri2016-04-131-8/+8
| | | | | | | | | | | | | | | | | | | | | | | In inode_table_destroy, we iterate through lru and active lists to move the entries to purge list so that they can be destroyed during inode_table_prune. But if used "list_for_each_entry" or "list_for_each_entry_safe" to iterate, we could end up accessing the entries which may have got moved to different(purge) lists in the process and can result in either infinite loop or crash. The safe approach seems to fetch the first entry of the list in each iteration till it gets empty. Change-Id: I24a18881833bd9419c2d8e5e8807bc71ec396479 BUG: 1326627 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/13987 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* inode: Retire the inodes from the lru list in inode_table_destroySoumya Koduri2015-12-311-3/+30
| | | | | | | | | | | | | | | | | | | | | | | Inodes from the lru list are not moved to purge list unless they are retired. Also process the lru list first to unset their parent as we need to unset their dentry entries (the ones which may not be unset during '__inode_passivate' as they were hashed) which in turn shall unref their parent inodes which could be in active list. These parent inodes when unref'ed may well again fall into lru list and if we are at the end of traversing the list, we may miss to delete/retire that entry. Hence traverse the lru list till it gets empty. Change-Id: Ib7666e235e9b9644144a7c7933afb5e407e506ca BUG: 1295107 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/13125 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: send lookup if inode_ctx is not setMohammed Rafi KC2016-01-121-4/+14
| | | | | | | | | | | | | | | During resolving of an entry or inode, if inode ctx was not set, we will send a lookup. This patch also make sure that inode_ctx will be created after every inode_link Change-Id: I4211533ca96a51b89d9f010fc57133470e52dc11 BUG: 1297311 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13225 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* protocol/server: forget the inodes which got ENOENT in lookupRaghavendra Bhat2015-07-011-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a looked up object is removed from the backend, then upon getting a revalidated lookup on that object ENOENT error is received. protocol/server xlator handles it by removing dentry upon which ENOENT is received. But the inode associated with it still remains in the inode table, and whoever does nameless lookup on the gfid of that object will be able to do it successfully despite the object being not present. For handling this issue, upon getting ENOENT on a looked up entry in revalidate lookups, protocol/server should forget the inode as well. Though removing files directly from the backend is not allowed, in case of objects corrupted due to bitrot and marked as bad by scrubber, objects are removed directly from the backend in case of replicate volumes, so that the object is healed from the good copy. For handling this, the inode of the bad object removed from the backend should be forgotten. Otherwise, the inode which knows the object it represents is bad, does not allow read/write operations happening as part of self-heal. Change-Id: I23b7a5bef919c98eea684aa1e977e317066cfc71 BUG: 1238188 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11489 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* libgfapi: send explicit lookups on inodes linked in readdirpRaghavendra Bhat2015-06-121-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | If the inode is linked via readdirp, then the consuners of gfapi which are using handles (got either in lookup or readdirp) might not send an explicit lookup on that object again (ex: NFS, samba, USS). If there is a replicate volume where the replicas of the object are not in sync, then readdirp followed by fops might lead data being served from the subvolume which is not in sync with latest data. And since lookup is needed to trigger self-heal on that object the consumers might keep getting wrong data until an explicit lookup is not done. Fuse handles this situation by sending an explicit lookup by itself (fuse xlator) on those inodes which are linked via readdirp, whenever a fop comes on that inode. The same procedure is done in gfapi as well to address this situation. Thanks to shyam(srangana@redhat.com) for valuable inputs Change-Id: I64f0591495dddc1dea7f8dc319f2558a7e342871 BUG: 1236009 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11236 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/afr: Pick gfid from poststat during fresh lookup for read child ↵Krutika Dhananjay2015-06-241-0/+22
| | | | | | | | | | | calculation Change-Id: I12c1e4f67f4ec4affbe13d7daf871044a8a2a12e BUG: 1235216 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11373 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>