glusterfs.git/xlators/cluster/dht/src/dht-shared.c, branch v4.0dev

cluster/dht: Use size to calculate estimates

2017-07-10T14:35:34+00:00

The earlier approach of using the number of files
to determine when the rebalance would complete did
not work well when file sizes differed widely.

The new approach now gets the total data size and
uses that information to determine how long
the rebalance is expected to take.

Change-Id: I84e80a0893efab72ff06130e4596fa71c9c8c868
BUG: 1467209
Signed-off-by: N Balachandran 
Reviewed-on: https://review.gluster.org/17668
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: MOHIT AGRAWAL 
Reviewed-by: Raghavendra G

core: assorted typos and spelling mistakes from Debian lintian

2017-07-03T12:47:13+00:00

Plus minor readability improvements.

Reported-by: pmatthaei@debian.org

Change-Id: I5393819a2fc9f240a19811143bb57b127df717cf
BUG: 1466785
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: https://review.gluster.org/17660
Smoke: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan 
CentOS-regression: Gluster Build System

cluster/dht: rebalance gets file count periodically

2017-06-23T10:12:17+00:00

The rebalance used to get the file count in the beginning
and not update it. This caused estimates to fail
if the number changed during the rebalance.

The rebalance now updates the file count periodically.
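A minimal sketch of a periodically refreshed count, assuming a fixed refresh interval. The interval value and all names here are hypothetical; the commit does not say how often the count is refreshed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical interval; not taken from the commit. */
#define FILE_COUNT_REFRESH_SECS 600

struct file_count_cache {
        uint64_t count;        /* last fetched number of files */
        time_t   last_refresh; /* when count was fetched */
        bool     valid;        /* false until the first fetch */
};

/* True when the cached count is missing or older than the interval. */
static bool
file_count_needs_refresh (const struct file_count_cache *c, time_t now)
{
        assert (c != NULL);
        return !c->valid ||
               difftime (now, c->last_refresh) >= FILE_COUNT_REFRESH_SECS;
}

/* Store a freshly fetched count. */
static void
file_count_update (struct file_count_cache *c, uint64_t count, time_t now)
{
        assert (c != NULL);
        c->count = count;
        c->last_refresh = now;
        c->valid = true;
}
```

A rebalance loop would call file_count_needs_refresh() on each pass and, when it returns true, fetch a new count and hand it to file_count_update(), so the estimate follows files created or deleted mid-rebalance.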

Change-Id: I1667ee69e8a1d7d6bc6bc2f060fad7f989d19ed4
BUG: 1464110
Signed-off-by: N Balachandran 
Reviewed-on: https://review.gluster.org/17607
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

core: fix spelling errors

2017-06-02T11:50:43+00:00

Fixes for various minor spelling errors and typos.

Reported-by: Patrick Matthäi 
Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
BUG: 1457808
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: https://review.gluster.org/17442
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos 
Smoke: Gluster Build System

cluster/dht: fix on demand migration files from client

2017-05-30T00:42:58+00:00

On-demand migration of files, i.e. migration done by clients and
triggered by a setfattr, was broken.

A dependency on defrag led to a crash when migration was triggered from
a client.

Note: This functionality is not available for tiered volumes. Migration
triggered from a client of a tiered volume will fail with ENOTSUP.

Usage (but refer to the steps below to avoid any issues):
setfattr -n "trusted.distribute.migrate-data" -v "1"

The purpose of fixing on-demand client migration was to provide a
workaround when the user has many more empty directories than
files and wants to run a remove-brick process.

Here are the steps to trigger file migration for the remove-brick process from a
client. (It is highly recommended to follow these steps exactly as written.)

Let's say it is a replica volume and the user wants to remove the replica pair
brick1 and brick2. (Make sure healing has completed before you run
these steps.)

Step-1: Start remove-brick process
- gluster v remove-brick brick1 brick2 start
Step-2: Kill the rebalance daemon
- ps aux | grep glusterfs | grep rebalance\/ | awk '{print $2}' | xargs kill
Step-3: Do a fresh mount as mentioned here
- glusterfs -s ${localhostname} --volfile-id rebalance/$volume-name /tmp/mount/point
Step-4: Go to one of the bricks (among brick1 and brick2)
- cd
Step-5: Run the following command.
- find . -not \( -path ./.glusterfs -prune \) -type f -not -perm 01000 -exec bash -c 'setfattr -n "distribute.fix.layout" -v "1" ${mountpoint}/$(dirname '{}')' \; -exec setfattr -n "trusted.distribute.migrate-data" -v "1" ${mountpoint}/'{}' \;

This command ignores linkto files and empty directories, does a fix-layout of
each file's parent directory, and triggers a migration of the file.

Step-6: Once this process has completed, run "remove-brick force"
- gluster v remove-brick brick1 brick2 force

Note: Use the above script only when there is a large number of empty directories.
Since the script crawls the brick directly and skips empty directories,
the time spent fixing the layout of those directories is eliminated. (Even though
the script does not fix the layout of empty directories, a fresh layout will be
built for each directory after remove-brick, so application continuity is not affected.)

Expected behavior for hardlink migration with this patch:
Hardlinks are migrated only for the remove-brick process. A fresh mount (Step-3)
is essential for hardlink migration to happen. Why? setfattr is an inode-based
operation. Since we are doing setfattr from a fuse mount here, inode_path will try
to build the path from the dentries linked to the inode. For a file without
hardlinks the path construction will be correct. But for hardlinks, the inode
will have multiple dentries linked.

Without a fresh mount, inode_path will always get the most recently linked dentry.
E.g. if there are three hardlinks named dir1/link1, dir2/link2 and dir3/link3 on a client
where all of these hardlinks have been looked up, inode_path will always return the path
dir3/link3 if dir3/link3 was looked up most recently. Hence, we won't be able to create
linkto files on the destination for all the other hardlinks (read gf_defrag_handle_hardlink
for more details on hardlink migration).

With a fresh mount, the lookup and setfattr are serialized, e.g. link2 won't be
looked up until link1 has been looked up and migrated. Hence, inode_path will always have
the correct path: in this case the link1 dentry is picked up (as it is the most recently
looked up) and the path is built correctly.
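The dentry behavior described above can be illustrated with a toy model, assuming (as the text states) that the path builder picks the most recently linked dentry. This is not the libglusterfs inode table; the types and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DENTRIES 8

/* Toy inode: dentries[0] is always the most recently linked name. */
struct toy_inode {
        const char *dentries[MAX_DENTRIES];
        size_t      count;
};

/* A lookup links a new name at the head of the list. */
static void
toy_link_dentry (struct toy_inode *inode, const char *path)
{
        assert (inode != NULL && inode->count < MAX_DENTRIES);
        /* Shift older names down and put the new one first. */
        for (size_t i = inode->count; i > 0; i--)
                inode->dentries[i] = inode->dentries[i - 1];
        inode->dentries[0] = path;
        inode->count++;
}

/* Path building picks the head, i.e. the most recently linked name. */
static const char *
toy_inode_path (const struct toy_inode *inode)
{
        assert (inode != NULL);
        return inode->count ? inode->dentries[0] : NULL;
}
```

On an old mount where dir1/link1, dir2/link2 and dir3/link3 were all looked up, toy_inode_path() returns dir3/link3 for every setfattr. On a fresh mount with serialized lookups, only link1 is linked when link1 is migrated, so its own path is returned.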

Note: If you run the above script on an existing mount (where all entries have already been
looked up), hardlinks may not be migrated, but there should not be any other issue. Please
raise a bug if you find any issue.

Tests: Manual

Change-Id: I9854cdd4955d9e24494f348fb29ba856ea7ac50a
BUG: 1450975
Signed-off-by: Susant Palai
Reviewed-on: https://review.gluster.org/17115
NetBSD-regression: NetBSD Build System
CentOS-regression: Gluster Build System
Smoke: Gluster Build System
Reviewed-by: Raghavendra G

cluster/dht: initialize throttle option "normal" to same in init and reconfigure

2017-05-18T03:27:38+00:00

The "normal" value was different in dht_init and dht_reconfigure.
Initialization and reconfiguration of the throttle option are now carved out
into a separate function (dht_configure_throttle). The "normal" value will be "2".

Change-Id: Ie323eae019af41d6bef0a136e3d284dc82bab9a1
BUG: 1451162
Signed-off-by: Susant Palai 
Reviewed-on: https://review.gluster.org/17303
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Zhou Zhengping 
Reviewed-by: Raghavendra G

cluster/dht: Make rebalance throttle option tuned by number

2017-04-29T14:29:34+00:00

The current rebalance throttle options (lazy/normal/aggressive) may not always be
sufficient for throttling. In a recent test, we observed that for certain
setups, the normal and aggressive modes behaved similarly, both consuming the
full disk bandwidth. In cases like this, the admin should be able to tune the
throttle down (or up) depending on the need.

In addition to the old throttle configurations, the thread count can now be set as a number,
e.g. gluster v set vol-name cluster.rebal-throttle 5

Admin can tune up/down between 0 and the number of cores available.
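A minimal sketch of validating a numeric throttle value, assuming the caller has already ruled out the named modes and passes in the core count. The function name is hypothetical, and whether 0 itself is accepted is not stated here, so this sketch requires at least 1 thread (an assumption).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Accepts a plain decimal integer in [1, ncores]; stores it in *threads. */
static bool
parse_throttle_number (const char *value, long ncores, long *threads)
{
        char *end = NULL;
        long  n;

        assert (value != NULL && threads != NULL);
        errno = 0;
        n = strtol (value, &end, 10);
        if (errno != 0 || end == value || *end != '\0')
                return false;   /* not a plain decimal integer */
        if (n < 1 || n > ncores)
                return false;   /* outside the allowed range (assumed >= 1) */
        *threads = n;
        return true;
}
```

On Linux and the BSDs, ncores could come from sysconf(_SC_NPROCESSORS_ONLN). A value such as "normal" fails here, which lets the caller try the named modes first and fall back to this check.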

Note: For heterogeneous servers, validation will fail on the old server if a
number is given for the throttle configuration.
The message looks something like this:
"volume set: failed: Staging failed on vm2. Error: cluster.rebal-throttle should be {lazy|normal|aggressive}"

Test: Manual test by logging active thread number after reconfiguring throttle option.
testcase: tests/basic/distribute/throttle-rebal.t

Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3
BUG: 1438370
Signed-off-by: Susant Palai 
Reviewed-on: https://review.gluster.org/16980
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Raghavendra G

cluster/dht: Fix memory corruption while accessing regex stored in private

2016-12-08T17:56:41+00:00

If reconfigure is executed in parallel (or concurrently with dht_init),
there are races that can corrupt memory. One such race is the modification
of regexes stored in conf (conf->rsync_regex_valid and
conf->extra_regex_valid) through dht_init_regex. With change [1], the
reconfigure codepath can be executed in parallel (with itself or with
dht_init), so this fix is needed.

Also, a reconfigure can race with any thread doing dht_layout_search,
resulting in dht_layout_search accessing a regex freed by reconfigure
(as in bz 1399134).
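A minimal sketch of one way to close this kind of race, assuming a mutex guards the compiled regex and readers match only while holding it. The struct and function names are hypothetical; this is not the dht_init_regex change itself.

```c
#include <assert.h>
#include <pthread.h>
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the regex fields of the dht private conf. */
struct regex_conf {
        pthread_mutex_t lock;        /* guards rsync_regex */
        regex_t        *rsync_regex; /* NULL when no valid regex is set */
};

/* Reconfigure path: compile outside the lock, publish the new regex
 * under the lock, and free the old one only after it is unpublished. */
static int
regex_conf_set (struct regex_conf *conf, const char *pattern)
{
        regex_t *fresh, *old;

        assert (conf != NULL && pattern != NULL);
        fresh = malloc (sizeof (*fresh));
        if (fresh == NULL)
                return -1;
        if (regcomp (fresh, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
                free (fresh);
                return -1;
        }

        pthread_mutex_lock (&conf->lock);
        old = conf->rsync_regex;
        conf->rsync_regex = fresh;
        pthread_mutex_unlock (&conf->lock);

        /* Readers use the regex only while holding the lock, so nobody
         * can still be matching against `old` at this point. */
        if (old != NULL) {
                regfree (old);
                free (old);
        }
        return 0;
}

/* Lookup path (the role dht_layout_search plays): match under the lock. */
static bool
regex_conf_match (struct regex_conf *conf, const char *name)
{
        bool matched = false;

        assert (conf != NULL && name != NULL);
        pthread_mutex_lock (&conf->lock);
        if (conf->rsync_regex != NULL)
                matched = regexec (conf->rsync_regex, name, 0, NULL, 0) == 0;
        pthread_mutex_unlock (&conf->lock);
        return matched;
}
```

Compiling outside the lock keeps the critical section short, and a failed regcomp leaves the previous regex in place instead of leaving readers with nothing.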

[1] http://review.gluster.org/15046

Change-Id: I039422a65374cf0ccbe0073441f0e8c442ebf830
BUG: 1399134
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/15945
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: N Balachandran 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

cluster/tier: handle fast demotions

2016-10-19T19:51:48+00:00

Demote files on priority if hi-watermark has been breached and continue
to demote until the watermark drops below hi-watermark.

Monitor watermark more frequently.
Trigger demotion as soon as hi-watermark is breached.
Add cluster.tier-emergency-demote-query-limit option to limit number
of files returned from the database query for every iteration of
tier_migrate_using_query_file(). If the watermark hasn't dropped below
hi-watermark after the first iteration, the next iteration will be
triggered approximately 1 second after tier_demote() returns to the
main tiering loop.
Update changetimerecorder xlator to handle query for emergency demote
mode.

Add tier-ctr-interface.h:
Move tier and ctr interface specific macros and struct definition from
libglusterfs/src/gfdb/gfdb_data_store.h to new header
libglusterfs/src/tier-ctr-interface.h

Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811
BUG: 1366648
Signed-off-by: Milind Changire 
Reviewed-on: http://review.gluster.org/15158
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Dan Lambright

cluster/tier: Adding compaction option for metadata databases

2016-09-05T01:37:57+00:00

Problem: As metadata in the database fills up, querying the database
takes a long time. As a result, tier migration slows down. To
counteract this, we added a way to enable the compaction methods of
the underlying database. The goal is to reduce the size of the
underlying file by eliminating database fragmentation.

NOTE: There is currently a bug where a brick sometimes attempts to
activate compaction even when compaction is already turned on.

The cause has been narrowed down to compact_mode_switch flipping its value.

Changes: libglusterfs/src/gfdb - Added a gfdb function, compact_db(), to
compact the underlying database. This is a no-op if the database has
no such option.

- Added a compaction function for SQLite3 that does the following

1) Changes the auto_vacuum pragma of the database
2) Compacts the database according to the type of compaction requested

- Compaction type can be changed by changing the macro
  GF_SQL_COMPACT_DEF to one of the 4 compaction types in
  gfdb_sqlite3.h

  It is currently set to GF_SQL_COMPACT_INCR, or incremental
  vacuuming.
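A minimal sketch of how compaction types could map onto SQLite statements. The SQLite semantics are real: auto_vacuum may be NONE, FULL or INCREMENTAL; FULL truncates free pages at every commit; INCREMENTAL reclaims them only when PRAGMA incremental_vacuum runs; and a changed auto_vacuum mode takes effect on an existing database only after a VACUUM. The enum and functions are hypothetical and are not the four types defined in gfdb_sqlite3.h.

```c
#include <stddef.h>

/* Hypothetical compaction types (not the gfdb_sqlite3.h names). */
enum toy_compact_type {
        TOY_COMPACT_NONE,   /* never reclaim space */
        TOY_COMPACT_FULL,   /* SQLite reclaims at every commit */
        TOY_COMPACT_INCR,   /* reclaim when asked, page by page */
        TOY_COMPACT_MANUAL  /* rebuild the whole file on demand */
};

/* Step 1: the auto_vacuum pragma for each type. */
static const char *
toy_auto_vacuum_pragma (enum toy_compact_type type)
{
        switch (type) {
        case TOY_COMPACT_FULL:
                return "PRAGMA auto_vacuum = FULL;";
        case TOY_COMPACT_INCR:
                return "PRAGMA auto_vacuum = INCREMENTAL;";
        case TOY_COMPACT_NONE:
        case TOY_COMPACT_MANUAL:
        default:
                return "PRAGMA auto_vacuum = NONE;";
        }
}

/* Step 2: the statement run on each compaction request, or NULL when
 * nothing needs to be run explicitly. */
static const char *
toy_compact_statement (enum toy_compact_type type)
{
        switch (type) {
        case TOY_COMPACT_INCR:
                return "PRAGMA incremental_vacuum;";
        case TOY_COMPACT_MANUAL:
                return "VACUUM;";
        case TOY_COMPACT_FULL:   /* already done at every commit */
        case TOY_COMPACT_NONE:
        default:
                return NULL;
        }
}
```

In this model, the default GF_SQL_COMPACT_INCR would correspond to TOY_COMPACT_INCR: each compaction request runs PRAGMA incremental_vacuum, which is cheaper than rebuilding the whole file with VACUUM.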

xlators/cluster/dht/src - Added the following command-line option to
enable SQLite3 compaction.

gluster volume set  tier-compact 

- Added the following command-line options to change how frequently the
  hot and cold tiers are ordered to compact.

gluster volume set  tier-hot-compact-frequency 
gluster volume set  tier-cold-compact-frequency 

- The tier daemon periodically sends the (new)
  GFDB_IPC_CTR_SET_COMPACT_PRAGMA IPC to the CTR xlator. The IPC
  triggers compaction of the database.

  The inputs are both gf_boolean_t.

  IPC Input:

  compact_active: Is compaction currently on for the db.
  compact_mode_switched: Did we flip the compaction switch recently?

  IPC Output:

  0 if the compaction succeeds.
  Non-zero otherwise.
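One possible reading of the two IPC inputs, sketched as a decision table. It assumes that a flipped switch requires rewriting the auto_vacuum pragma followed by a VACUUM (which SQLite needs to apply a new mode to an existing database), and that an unchanged active switch only needs the periodic compaction. The flags and function are hypothetical, not the CTR xlator's actual logic.

```c
#include <stdbool.h>

/* Hypothetical action flags. */
enum toy_compact_action {
        TOY_ACTION_NONE     = 0,
        TOY_ACTION_SET_MODE = 1 << 0, /* rewrite the auto_vacuum pragma */
        TOY_ACTION_VACUUM   = 1 << 1, /* full VACUUM to apply the new mode */
        TOY_ACTION_COMPACT  = 1 << 2  /* the mode's periodic compaction */
};

static int
toy_compact_actions (bool compact_active, bool compact_mode_switched)
{
        if (compact_mode_switched)
                /* Turning compaction on or off changes the pragma, which
                 * only takes effect after a VACUUM. */
                return TOY_ACTION_SET_MODE | TOY_ACTION_VACUUM;
        if (compact_active)
                return TOY_ACTION_COMPACT;
        return TOY_ACTION_NONE;
}
```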

xlators/features/changetimerecorder/src/ - When the CTR gets the
compaction IPC, it launches a thread that will perform the
compaction. The IPC ends after the thread is launched. To avoid extra
allocations, the parameters are passed using static variables.
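A minimal sketch of the launch-and-return pattern with static parameters, using POSIX threads. The variable and function names are hypothetical; the worker only records what it saw instead of touching a database.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Inputs handed to the worker without allocating an argument struct. */
static bool compact_active_arg;
static bool compact_mode_switched_arg;

/* Written by the worker; read by the caller only after pthread_join. */
static bool worker_saw_active;
static bool worker_saw_switched;

static void *
toy_compact_worker (void *unused)
{
        (void) unused;
        worker_saw_active = compact_active_arg;
        worker_saw_switched = compact_mode_switched_arg;
        /* A real worker would run the pragma/VACUUM statements here. */
        return NULL;
}

/* IPC handler: stash the inputs, start the worker and return at once,
 * without waiting for compaction to finish. Returns pthread_create's
 * result (0 on success). */
static int
toy_handle_compact_ipc (bool active, bool switched, pthread_t *tid)
{
        compact_active_arg = active;
        compact_mode_switched_arg = switched;
        return pthread_create (tid, NULL, toy_compact_worker, NULL);
}
```

The cost of static parameters is that overlapping IPCs share the same variables: a second request arriving before a running worker has read them could overwrite its inputs. The sketch does not guard against that.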

Change-Id: I5e1433becb9eeff2afe8dcb4a5798977bf5ba0dd
Signed-off-by: Diogenes Nunez 
Reviewed-on: http://review.gluster.org/15031
Reviewed-by: Milind Changire 
Reviewed-by: Dan Lambright 
Tested-by: Dan Lambright 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System