glusterfs.git - GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.

	Commit message (Collapse)	Author	Age	Files	Lines
*	cluster/ec: Don't trigger heal for stale index	Pranith Kumar K	2020-09-04	1	-0/+1
\| \| \| \| \| \|	Fixes: #1385 Change-Id: I3609dd2e1f63c4bd6a19d528b935bf5b05443824 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: Change stale index handling	Pranith Kumar K	2020-08-27	1	-9/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Earlier approach is setting dirty bit which requires extra heal Fix: Send zero-xattrop which deletes stale index without any need for extra heal. Fixes: #1385 Change-Id: I7e97a1d8b5516f7be47cae55d0e56b14332b6cae Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: Remove stale entries from indices/xattrop folder	Ashish Pandey	2020-07-29	1	-1/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If a gfid is present in indices/xattrop folder while the file/dir is actaully healthy and all the xattrs are healthy, it causes lot of lookups by shd on an entry which does not need to be healed. This whole process eats up lot of CPU usage without doing meaningful work. Solution: Set trusted.ec.dirty xattr of the entry so that actual heal process happens and at the end of it, during unset of dirty, gfid enrty from indices/xattrop will be removed. Change-Id: Ib1b9377d8dda384bba49523e9ff6ba9f0699cc1b Fixes: #1385 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	cluster/ec: Improve detection of new heals	Xavi Hernandez	2020-07-22	1	-17/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When EC successfully healed a directory it assumed that maybe other entries inside that directory could have been created, which could require additional heal cycles. For this reason, when the heal happened as part of one index heal iteration, it triggered a new iteration. The problem happened when the directory was healthy, so no new entries were added, but its index entry was not removed for some reason. In this case self-heal started and endless loop healing the same directory continuously, cause high CPU utilization. This patch improves detection of new files added to the heal index so that a new index heal iteration is only triggered if there is new work to do. Change-Id: I2355742b85fbfa6de758bccc5d2e1a283c82b53f Fixes: #1354 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	cluster/ec: Change handling of heal failure to avoid crash	Ashish Pandey	2019-11-04	1	-11/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: ec_getxattr_heal_cbk was called with NULL as second argument in case heal was failing. This function was dereferencing "cookie" argument which caused crash. Solution: Cookie is changed to carry the value that was supposed to be stored in fop->data, so even in the case when fop is NULL in error case, there won't be any NULL dereference. Thanks to Xavi for the suggestion about the fix. Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c fixes: bz#1729085
*	ctime/rebalance: Heal ctime xattr on directory during rebalance	Kotresh HR	2019-09-16	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After add-brick and rebalance, the ctime xattr is not present on rebalanced directories on new brick. This patch fixes the same. Note that ctime still doesn't support consistent time across distribute sub-volume. This patch also fixes the in-memory inconsistency of time attributes when metadata is self healed. Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df fixes: bz#1734026 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	cluster/ec: Fix coverity issue.	Ashish Pandey	2019-08-13	1	-1/+1
\| \| \| \| \| \|	Change-Id: I727287784a15d89441865de7f438002e4a370250 fixes: bz#1738763 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	cluster/ec: Create heal task with heal process id	Ashish Pandey	2019-07-30	1	-1/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: ec_data_undo_pending calls syncop_fxattrop->SYNCOP without a frame. In this case SYNCOP gets the frame of the task. However, when we create a synctask for heal we provide frame as NULL. Now, if the read-only feature is ON, it will receive the process ID of the shd as 0 and will consider that it as not an internal process. This will prevent healing of a file with "Read-only file system" error message log. Solution: While launching heal, create a synctask using frame and set process id of the SHD which is -6. Change-Id: I37195399c85de322cbcac75633888922c4e3db4a Fixes: bz#1734252
*	ec-heal: check file's gfid when deleting stale name	Kinglong Mee	2019-06-20	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \|	A name-less lookup does not contain parent's stat, It is hard to check the lookuped file is at the right path. This patch changes to a name lookup, and check file's gfid with expected gfid. If the gfid is different, mark it estale. fixes: bz#1702131 Change-Id: I2de20b10d680eed1e2fb1d3830b3b3dec4520dbf Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
*	multiple files: another attempt to remove includes	Yaniv Kaul	2019-06-14	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are many include statements that are not needed. A previous more ambitious attempt failed because of *BSD plafrom (see https://review.gluster.org/#/c/glusterfs/+/21929/ ) Now trying a more conservative reduction. It does not solve all circular deps that we have, but it does reduce some of them. There is just too much to handle reasonably (dht-common.h includes dht-lock.h which includes dht-common.h ...), but it does reduce the overall number of lines of include we need to look at in the future to understand and fix the mess later one. Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	ec/fini: Fix race between xlator cleanup and on going async fop	Mohammed Rafi KC	2019-06-08	1	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: While we process a cleanup, there is a chance for a race between async operations, for example ec_launch_replace_heal. So this can lead to invalid mem access. Solution: Just like we track on going heal fops, we can also track fops like ec_launch_replace_heal, so that we can decide when to send a PARENT_DOWN request. Change-Id: I055391c5c6c34d58aef7336847f3b570cb831298 fixes: bz#1703948 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	ec/fini: Fix race with ec_fini and ec_notify	Mohammed Rafi KC	2019-05-21	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During a graph cleanup, we first sent a PARENT_DOWN and wait for a child down to ultimately free the xlator and the graph. In the ec xlator, we cleanup the threads when we get a PARENT_DOWN event. But a racing event like CHILD_UP or event xl_op may trigger healing threads after threads cleanup. So there is a chance that the threads might access a freed private variabe Change-Id: I252d10181bb67b95900c903d479de707a8489532 fixes: bz#1703948 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	cluster/ec: fix fd reopen	Xavi Hernandez	2019-04-23	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently EC tries to reopen fd's that have been opened while a brick was down. This is done as part of regular write operations, just after having acquired the locks, and it's sent as a sub-fop of the main write fop. There were two problems: 1. The reopen was attempted on all UP bricks, even if a previous lock didn't succeed. This is incorrect because most probably the open will fail. 2. If reopen is sent and fails, the error is propagated to the main operation, causing it to fail when it shouldn't. To fix this, we only attempt reopens on bricks where the current fop owns a lock, and we prevent any error to be propagated to the main fop. To implement this behaviour an argument used to indicate the minimum number of required answers has overloaded to also include some flags. To make the change consistent, it has been necessary to rename the argument, which means that a lot of files have been changed. However there are no functional changes. This change has also uncovered a problem in discard code, which didn't correctely process requests of small sizes because no real discard fop was being processed, only a write of 0's on some region. In this case some fields of the fop remained uninitialized or with incorrect values. To fix this, a new function has been created to simulate success on a fop and it's used in the discard case. Thanks to Pranith for providing a test script that has also detected an issue in this patch. This patch includes a small modification of this script to force data to be written into bricks before stopping them. Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec Fixes: bz#1699866 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	cluster/ec: Fix handling of heal info cases without locks	Ashish Pandey	2019-04-04	1	-25/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we use heal info command, it takes lot of time as in some cases it takes lock on entries to find out if the entry actually needs heal or not. There are some cases where we can avoid these locks and can conclude if the entry needs heal or not. 1 - We do a lookup (without lock) on an entry, which we found in .glusterfs/indices/xattrop, and find that lock count is zero. Now if the file contains dirty bit set on all or any brick, we can say that this entry needs heal. 2 - If the lock count is one and dirty is greater than 1, then it also means that some fop had left the dirty bit set which made the dirty count of current fop (which has taken lock) more than one. At this point also we can definitely say that this entry needs heal. This patch is modifying code to take into consideration above two points. It is also changing code to not to call ec_heal_inspect if ec_heal_do was called from client side heal. Client side heal triggeres heal only when it is sure that it requires heal. [We have changed the code to not to call heal for lookup] updates bz#1689799 Change-Id: I7f09f0ecd12f65a353297aefd57026fd2bebdf9c Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	cluster/ec: Don't enqueue an entry if it is already healing	Ashish Pandey	2019-03-27	1	-14/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: 1 - heal-wait-qlength is by default 128. If shd is disabled and we need to heal files, client side heal is needed. If we access these files that will trigger the heal. However, it has been observed that a file will be enqueued multiple times in the heal wait queue, which in turn causes queue to be filled and prevent other files to be enqueued. 2 - While a file is going through healing and a write fop from mount comes on that file, it sends write on all the bricks including healing one. At the end it updates version and size on all the bricks. However, it does not unset dirty flag on all the bricks, even if this write fop was successful on all the bricks. After healing completion this dirty flag remain set and never gets cleaned up if SHD is disabled. Solution: 1 - If an entry is already in queue or going through heal process, don't enqueue next client side request to heal the same file. 2 - Unset dirty on all the bricks at the end if fop has succeeded on all the bricks even if some of the bricks are going through heal. Change-Id: Ia61ffe230c6502ce6cb934425d55e2f40dd1a727 updates: bz#1593224 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	libglusterfs: Move devel headers under glusterfs directory	ShyamsundarR	2018-12-05	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	all: fix warnings on non 64-bits architectures	Xavi Hernandez	2018-10-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	When compiling in other architectures there appear many warnings. Some of them are actual problems that prevent gluster to work correctly on those architectures. Change-Id: Icdc7107a2bc2da662903c51910beddb84bdf03c0 fixes: bz#1632717 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	Land part 2 of clang-format changes	Gluster Ant	2018-09-12	1	-2467/+2442
\| \| \| \| \|	Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
*	ec-heal: remove a duplicate definition of alloca0	Amar Tumballi	2018-09-10	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	the same macro is defined in common-utils.h, which seems to be much better place for the same. Updates: bz#1193929 Change-Id: I409b719c291102136500b955e5827a550142ed96 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	multiple files: move from strlen() to sizeof()	Yaniv Kaul	2018-08-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	{ec-heal\|ec-combine\|ec-helpers\|ec-inode-read}.c For const strings, just do compile time size calc instead of runtime. Compile-tested only! Change-Id: If92ba0a7a20f64b898d01c6e3b6708190ca93e04 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	cluster/ec: FORWARD_NULL coverity fix	Sunil Kumar Acharya	2018-08-16	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	Fixing FORWARD_NULL coverify errors with EC. CID: 1394650 BUG: 789278 Change-Id: I52c99dac3483ca31a86cd7e3a959d4010b195f32 updates: bz#789278 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	All: run codespell on the code and fix issues.	Yaniv Kaul	2018-07-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	libglusterfs: Capture the dict response in syncop_xattrop_cbk	karthik-us	2018-04-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Currently it is not possible to capture the xattrs values which are set on the bricks by calling syncop_(f)xattrop, because the response dict is not being assigned to any of the dictionaries. Fix: In the xattrop callback capture the response dict and send it back to the caller if it is requested. Change-Id: I9de9bcd97d6008091c9b060bcca3676cb9ae8ef9 fixes: bz#1572076 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	cluster/ec: avoid delays in self-heal	Xavi Hernandez	2018-03-14	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Self-heal creates a thread per brick to sweep the index looking for files that need to be healed. These threads are started before the volume comes online, so nothing is done but waiting for the next sweep. This happens once per minute. When a replace brick command is executed, the new graph is loaded and all index sweeper threads started. When all bricks have reported, a getxattr request is sent to the root directory of the volume. This causes a heal on it (because the new brick doesn't have good data), and marks its contents as pending to be healed. This is done by the index sweeper thread on the next round, one minute later. This patch solves this problem by waking all index sweeper threads after a successful check on the root directory. Additionally, the index sweep thread scans the index directory sequentially, but it might happen that after healing a directory entry more index entries are created but skipped by the current directory scan. This causes the remaining entries to be processed on the next round, one minute later. The same can happen in the next round, so the heal is running in bursts and taking a lot to finish, specially on volumes with many directory levels. This patch solves this problem by immediately restarting the index sweep if a directory has been healed. Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e BUG: 1547662 Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
*	cluster/ec: Prevent self-heal to work after PARENT_DOWN	Xavier Hernandez	2017-11-28	1	-10/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the volume is being stopped, PARENT_DOWN event is received. This instructs EC to wait until all pending operations are completed before declaring itself down. However heal operations are ignored and allowed to continue even after having said it was down. This may cause unexpected results and crashes. To solve this, heal operations are considered exactly equal as any other operation and EC won't propagate PARENT_DOWN until all operations, including healing, are complete. To avoid big delays if this happens in the middle of a big heal, a check has been added to quit current heal if shutdown is detected. Change-Id: I26645e236ebd115eb22c7ad4972461111a2d2034 BUG: 1515266 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
*	ec: Use tiebreaker_inodelk where necessary	Pranith Kumar K	2017-11-22	1	-8/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there are big directories or files that need to be healed, other shds are stuck on getting lock on self-heal domain for these directories/files. If there is a tie-breaker logic, other shds can heal some other files/directories while 1 of the shds is healing the big file/directory. Before this patch: 96.67 4890.64 us 12.89 us 646115887.30us 340869 INODELK After this patch: 40.76 42.35 us 15.09 us 6546.50us 438478 INODELK Fixes gluster/glusterfs#354 Change-Id: Ia995b5576b44f770c064090705c78459e543cc64 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: UNUSED_VALUE coverity fix	Sunil Kumar Acharya	2017-11-08	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Problem: Return values are not used. Solution: Removed the unused values. BUG: 789278 Change-Id: Ica2b95bb659dfc7ec5346de0996b9d2fcd340f8d Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	cluster/ec: REVERSE_INULL coverity fix	Sunil Kumar Acharya	2017-11-08	1	-2/+1
\| \| \| \| \| \| \| \| \| \|	Problem: fop could be NULL. Solution: Check has been added to verify fop. BUG: 789278 Change-Id: I7e8d2c1bdd8960c609aa485f180688a95606ebf7 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	cluster/ec: NEGATIVE_RETURNS coverity fix	Sunil Kumar Acharya	2017-11-07	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	Problem: Source could be negative. Solution: Check has been added to verify the same. BUG: 789278 Change-Id: I8cca6297327e7923b25949c20ec8d1a711207a76 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	cluster/ec: FORWARD_NULL coverity fix	Sunil Kumar Acharya	2017-11-07	1	-4/+6
\| \| \| \| \| \| \| \| \| \|	Problem: xattr could be NULL. Solution: Added check to verify the same. BUG: 789278 Change-Id: Ie013f5655f4621434e5023dd76cef44b976adc68 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	cluster/ec: FORWARD_NULL coverity fix	Sunil Kumar Acharya	2017-11-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Problem: cbk could be NULL. Solution: Assigned appropriate value to cbk. BUG: 789278 Change-Id: I2e4bba9a54f965c6a7bccf0b0cb6c5f75399f6e6 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	cluster/ec: MISSING_BREAK coverity fix	Sunil Kumar Acharya	2017-10-31	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Problem: switch case syntax issue. Solution: syntax fixed. BUG: 789278 Change-Id: I76da72c3ab6ffc5db671686a71d6a596beaf496e Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	cluster/ec: Coverity Fix UNUSED_VALUE in ec_create_name	Kamal Mohanan	2017-10-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The value returned by cluster_mkdir is assigned to ret at ec-heal.c:1076. But this value is overwritten before it can be used. Solution: The return value of cluster_mkdir is ignored. It is not assigned to ret. Change-Id: Iee6b8d8b04e0bd800dd30d2c24cab755b9e63443 BUG: 789278 Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
*	cluster/ec: Improve heal info command to handle obvious cases	Ashish Pandey	2017-10-16	1	-23/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: 1 - If a brick is down and we see an index entry in .glusterfs/indices, we should show it in heal info output as it most certainly needs heal. 2 - The first problem is also not getting handled after ec_heal_inspect. Even if in ec_heal_inspect, lookup will mark need_heal as true, we don't handle it properly in ec_get_heal_info and continue with locked inspect which takes lot of time. Solution: 1 - In first case we need not to do any further invstigation. As soon as we see that a brick is down, we should say that this index entry needs heal for sure. 2 - In second case, if we have need_heal as _gf_true after ec_heal_inspect, we should show it as heal requires. Change-Id: Ibe7f9d7602cc0b382ba53bddaf75a2a2c3326aa6 BUG: 1476668 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	cluster/ec: add functions for stripe alignment	Xavier Hernandez	2017-10-13	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes old functions to align offsets and sizes to stripe size boundaries and adds new ones to offer more possibilities. The new functions are: * ec_adjust_offset_down() Aligns a given offset to a multiple of the stripe size equal or smaller than the initial one. It returns the size of the gap between the aligned offset and the given one. * ec_adjust_offset_up() Aligns a given offset to a multiple of the stripe size equal or greater than the initial one. It returns the size of the skipped region between the given offset and the aligned one. If an overflow happens, the returned valid has negative sign (but correct value) and the offset is set to the maximum value (not aligned). * ec_adjust_size_down() Aligns the given size to a multiple of the stripe size equal or smaller than the initial one. It returns the size of the missed region between the aligned size and the given one. * ec_adjust_size_up() Aligns the given size to a multiple of the stripe size equal or greater than the initial one. It returns the size of the gap between the given size and the aligned one. If an overflow happens, the returned value has negative sign (but correct value) and the size is set to the maximum value (not aligned). These functions have been defined in ec-helpers.h as static inline since they are very small and compilers can optimize them (specially the 'scale' argument). Change-Id: I4c91009ad02f76c73772034dfde27ee1c78a80d7 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
*	cluster/ec: Improve performance with xattrop update	Sunil Kumar Acharya	2017-10-06	1	-24/+102
\| \| \| \| \| \| \| \| \| \| \| \|	Existing EC code updates the xattr on the subvolume in a sequential pattern resulting in very poor performance. With this fix EC now updates the xattr on the subvolume in parallel which improves the xattr update performance. BUG: 1445663 Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	cluster/ec: Update xattr and heal size properly	Ashish Pandey	2017-06-06	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem-1 : Recursive healing of same file is happening when IO is going on even after data heal completes. Solution: RCA: At the end of the write, when ec_update_size_version gets called, we send it only on good bricks and not on healing brick. Due to this, xattr on healing brick will always remain out of sync and when the background heal check source and sink, it finds this brick to be healed and start healing from scratch. That involve ftruncate and writing all of the data again. To solve this, send xattrop on all the good bricks as well as healing bricks. Problem-2: The above fix exposes the data corruption during heal. If the write on a file is going on and heal finishes, we find that the file gets corrupted. RCA: The real problem happens in ec_rebuild_data(). Here we receive the 'size' argument which contains the real file size at the time of starting self-heal and it's assigned to heal->total_size. After that, a sequence of calls to ec_sync_heal_block() are done. Each call ends up calling ec_manager_heal_block(), which does the actual work of healing a block. First a lock on the inode is taken in state EC_STATE_INIT using ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is called. This function calls ec_set_inode_size() to store the real size of the inode (it uses heal->total_size). The next step is to read the block to be healed. This is done using a regular ec_readv(). One of the things this call does is to trim the returned size if the file is smaller than the requested size. In our case, when we read the last block of a file whose size was = 512 mod 1024 at the time of starting self-heal, ec_readv() will return only the first 512 bytes, not the whole 1024 bytes. This isn't a problem since the following ec_writev() sent from the heal code only attempts to write the amount of data read, so it shouldn't modify the remaining 512 bytes. However ec_writev() also checks the file size. If we are writing the last block of the file (determined by the size stored on the inode that we have set to heal->total_size), any data beyond the (imposed) end of file will be cleared with 0's. This causes the 512 bytes after the heal->total_size to be cleared. Since the file was written after heal started, the these bytes contained data, so the block written to the damaged brick will be incorrect. Solution: Align heal->total_size to a multiple of the stripe size. Thanks "Xavier Hernandez" <xhernandez@datalab.es> to find out the root cause and to fix the issue. Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e BUG: 1428673 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/16985 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Implement self-heal-window_size option	Sunil Kumar Acharya	2017-04-25	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix implements the heal window size option for EC. This option control the maximum size of read/write operation carried out in self-heal process. BUG: 1441491 Change-Id: I6c0ef65c9ca18b0828f91b319d4f52ac5b77d0d8 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/17098 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	cluster/ec: Metadata healing fails to update the version	Sunil Kumar Acharya	2017-03-21	1	-7/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During meatadata heal, we were not updating the version though all the inode attributes were in sync. Updated the code to adjust version when all the inode attributes are in sync. BUG: 1425703 Change-Id: I6723be3c5f748b286d4efdaf3c71e9d2087c7235 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/16772 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	cluster/ec: Don't trigger data/metadata heal on Lookups	Pranith Kumar K	2017-02-26	1	-82/+239
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem-1 If Lookup which doesn't take any locks observes version mismatch it can't be trusted. If we launch a heal based on this information it will lead to self-heals which will affect I/O performance in the cases where Lookup is wrong. Considering self-heal-daemon and operations on the inode from client which take locks can still trigger heal we can choose to not attempt a heal on Lookup. Problem-2: Fixed spurious failure of tests/bitrot/bug-1373520.t For the issues above, what was happening was that ec_heal_inspect() is preventing 'name' heal to happen Problem-3: tests/basic/ec/ec-background-heals.t To be honest I don't know what the problem was, while fixing the 2 problems above, I made some changes to ec_heal_inspect() and ec_need_heal() after which when I tried to recreate the spurious failure it just didn't happen even after a long time. BUG: 1414287 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834 Reviewed-on: https://review.gluster.org/16468 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com>
*	cluster/ec: Change log level of messages to DEBUG	Sunil Kumar Acharya	2017-02-15	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Heal failed or passed should not be logged as info. These can be observed from heal info if the heal is happening or not. If we require to debug a case where heal is not happening, we can set the level to DEBUG. Change-Id: I062668eadd145ef809b25e818e6bca1094f54cd6 BUG: 1420619 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/16580 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com>
*	cluster/ec: Do not start heal on good file while IO is going on	Ashish Pandey	2017-01-20	1	-4/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Write on a file has been slowed down significantly after http://review.gluster.org/#/c/13733/ RC : When update fop starts on a file, it sets dirty flag at the start and remove it at the end which make an index entry in indices/xattrop. During IO, SHD scans this and finds out an index and starts heal even if all the fragments are healthy and up tp date. This heal takes inodelk for different types of heal. If the IO is for long time this will happen in every 60 seconds. Due to this extra, unneccessary locking, IO gets slowed down. Solution: Before starting any type of heal check if file needs heal or not. Change-Id: Ib9519a43e7e4b2565d3f3153f9ca0fb92174fe51 BUG: 1409191 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/16377 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Fix lk-owner set race in ec_unlock	Pranith Kumar K	2016-12-13	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Rename does two locks. There is a case where when it tries to unlock it sends xattrop of the directory with new version, callback of these two xattrops can be picked up by two separate epoll threads. Both of them will try to set the lk-owner for unlock in parallel on the same frame so one of these unlocks will fail because the lk-owner doesn't match. Fix: Specify the lk-owner which will be set on inodelk frame which will not be over written by any other thread/operation. BUG: 1402710 Change-Id: I666ffc931440dc5253d72df666efe0ef1d73f99a Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/16074 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	afr,ec: Heal device files with correct major, minor numbers	Pranith Kumar K	2016-10-26	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thanks a lot to xiaoping.wu@nokia.com from Nokia for the bug and the fix. BUG: 1384297 Change-Id: Ie443237e85d34633b5dd30f85eaa2ac34e45754c Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15728 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	cluster/ec: Implement heal info with lock	Ashish Pandey	2016-10-11	1	-15/+239
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Currently heal info command prints all the files/directories if the index for the file/directory is present in .glusterfs/indices folder. After implementing patch http://review.gluster.org/#/c/13733/ indices of the file which is going through update fop will also be present in .glusterfs/indices even if the fop is successful on all the brick. At this time if heal info command is being used, it will also display this file which is actually healthy and does not require any heal. Solution: Take lock on a file corresponding to the indices and inspect xattrs to decide if the file needs heal or not. Change-Id: I6361e2813ece369be12d02e74816df4eddb81cfa BUG: 1366815 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/15543 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
*	cluster/ec: Add support for hardware acceleration	Xavier Hernandez	2016-09-08	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements functionalities for fast encoding/decoding using hardware support. Currently optimized x86_64, SSE and AVX is added. Additionally this patch implements a caching mecanism for inverse matrices to reduce computation time, as well as a new method for computing the inverse that takes quadratic time instead of cubic. Finally some unnecessary memory copies have been eliminated to further increase performance. Change-Id: I26c75f26fb4201bd22b51335448ea4357235065a BUG: 1289922 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/12837 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	libglusterfs: move alloca0 definition to common-utils	Ravishankar N	2016-08-12	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	...and remove their definitons from EC and AFR. Also `s/alloca+memset0/alloca0` wherever it is used. Change-Id: I3b71e596d12a7d8900f5d761af6b98305c8874d5 BUG: 1366226 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/15147 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	cluster/ec: Automate heal for replace brick	Ashish Pandey	2016-02-08	1	-0/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: After a replace brick command, newly added brick does not contain data which existed on old brick. Solution: Do getxattr after initialization of all the bricks. This will trigger heal for brick root as soon as it finds the version mismatch on newly added brick. Removing tests from ec-new-entry.t which were required to simulate automation of heal after replace brick. Change-Id: I08e3dfa565374097f6c08856325ea77727437e11 BUG: 1304686 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13353 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec: Mark self-heal fops as internal	Pranith Kumar K	2015-11-18	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	Change-Id: I8ae7af266d3e00460f0cfdc9389a926e5f2fee36 BUG: 1282761 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12598 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec : Mark new entry changelog in entry self-heal	Ashish Pandey	2015-10-06	1	-6/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem : When a new entry is created dirty mark xattrs are not created this will need full heal to be performed, even when there are partial failures. Solution : Marks new entry changelog in self-heal. PS: Also fixed erasing of dirty markers when no data heal is required. BUG: 1254121 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Change-Id: I156e3d3201afa77efe118e1aaace1d91c90a9613 Reviewed-on: http://review.gluster.org/11938 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>