<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/geo-replication/syncdaemon/monitor.py, branch v4.1.7</title>
<subtitle>GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>geo-rep: Fix deadlock during worker start</title>
<updated>2018-09-21T13:25:43+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2018-08-10T12:14:14+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=72514f20d2ae947529cd1c4b4b009f27bae7032a'/>
<id>72514f20d2ae947529cd1c4b4b009f27bae7032a</id>
<content type='text'>
Analysis:
The monitor process spawns monitor threads (one per brick).
Each monitor thread forks worker and agent processes.
Each monitor thread, while initializing, updates the
monitor status file. This is synchronized using flock.
The race is that one thread can fork a worker while
another thread has the status file open, so the worker
process ends up holding a reference to that fd.

Cause:
flock gets unlocked either by explicitly unlocking it
or by closing all duplicate fds referring to the file.
The code relied on the fd being closed, so a reference
inherited by the worker/agent process across fork could
cause the deadlock.

Fix:
1. The flock is unlocked explicitly.
2. The status file is updated at appropriate places so that
the fd reference is not leaked to the worker/agent process.

With this fix, both the deadlock and the possible fd
leaks are resolved.
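
A minimal sketch of the distinction (illustrative only, not
the actual monitor.py code): relying on close() keeps the
lock alive as long as a forked child still holds a duplicate
of the fd, while an explicit LOCK_UN releases it right away.

    import fcntl
    import os

    def update_status_file(path, data):
        # Take an exclusive lock while updating the shared status file.
        fd = os.open(path, os.O_RDWR | os.O_CREAT)
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            os.write(fd, data)
        finally:
            # Unlock explicitly; merely closing the fd would leave the
            # lock held if a concurrent fork() duplicated this fd into
            # a worker/agent child process.
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)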

Backport of:
 &gt; Patch: https://review.gluster.org/20704
 &gt; BUG: bz#1614799
 &gt; Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

fixes: bz#1630145
Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Analysis:
The monitor process spawns monitor threads (one per brick).
Each monitor thread forks worker and agent processes.
Each monitor thread, while initializing, updates the
monitor status file. This is synchronized using flock.
The race is that one thread can fork a worker while
another thread has the status file open, so the worker
process ends up holding a reference to that fd.

Cause:
flock gets unlocked either by explicitly unlocking it
or by closing all duplicate fds referring to the file.
The code relied on the fd being closed, so a reference
inherited by the worker/agent process across fork could
cause the deadlock.

Fix:
1. The flock is unlocked explicitly.
2. The status file is updated at appropriate places so that
the fd reference is not leaked to the worker/agent process.

With this fix, both the deadlock and the possible fd
leaks are resolved.

Backport of:
 &gt; Patch: https://review.gluster.org/20704
 &gt; BUG: bz#1614799
 &gt; Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

fixes: bz#1630145
Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Fix geo-rep for older versions of unshare</title>
<updated>2018-08-16T04:20:47+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2018-06-07T08:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=050969fb3cfb176fa206d3ae8169d6021879d9db'/>
<id>050969fb3cfb176fa206d3ae8169d6021879d9db</id>
<content type='text'>
Geo-rep mounts are private to the worker. This is
achieved by running the worker in a mount namespace
using the unshare command. However, the unshare
command has to support the '--propagation' option,
so geo-rep breaks on systems with an older unshare
version. This patch makes it fall back to the
lazy-umount behaviour if unshare does not support
the propagation option.
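
One possible way to probe for this at runtime (a sketch
that assumes support can be detected from the installed
unshare's help output; the patch may detect it differently):

    import subprocess

    def unshare_supports_propagation():
        # Probe the installed unshare(1) for the --propagation option.
        try:
            out = subprocess.check_output(["unshare", "--help"],
                                          stderr=subprocess.STDOUT)
        except (OSError, subprocess.CalledProcessError):
            return False
        return b"--propagation" in out

    # Use mount namespaces when available, otherwise keep the old
    # lazy-umount behaviour.
    use_mount_namespace = unshare_supports_propagation()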

Backport of:
 &gt; BUG: 1589782
 &gt; Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

fixes: bz#1611111
Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Geo-rep mounts are private to the worker. This is
achieved by running the worker in a mount namespace
using the unshare command. However, the unshare
command has to support the '--propagation' option,
so geo-rep breaks on systems with an older unshare
version. This patch makes it fall back to the
lazy-umount behaviour if unshare does not support
the propagation option.

Backport of:
 &gt; BUG: 1589782
 &gt; Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

fixes: bz#1611111
Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Remove lazy umount and use mount namespaces</title>
<updated>2018-02-22T05:40:35+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2018-02-12T08:11:04+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=e4ca0b3df379c553e220f929f0203175bd536b61'/>
<id>e4ca0b3df379c553e220f929f0203175bd536b61</id>
<content type='text'>
Lazy umounting the master volume from the worker
causes issues with rsync's use of getcwd. Hence
the lazy umount is removed and a private mount
namespace is used instead. On the slave, the lazy
umount is retained, since a private namespace
cannot be used in a non-root geo-rep setup.
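
Conceptually, the worker now runs inside a private mount
namespace, roughly along these lines (an illustrative
sketch, not the exact gsyncd invocation):

    import subprocess

    # Stand-in for whatever command the monitor uses to launch a
    # worker; the exact gsyncd invocation is not shown here.
    worker_cmd = ["python", "worker.py"]

    # Spawn the worker inside a private mount namespace: the aux mount
    # it creates is visible only to that worker and goes away with it,
    # so no lazy umount ('umount -l') is needed afterwards.
    subprocess.check_call(["unshare", "-m", "--propagation", "private"]
                          + worker_cmd)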

Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a
BUG: 1546129
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Lazy umounting the master volume from the worker
causes issues with rsync's use of getcwd. Hence
the lazy umount is removed and a private mount
namespace is used instead. On the slave, the lazy
umount is retained, since a private namespace
cannot be used in a non-root geo-rep setup.

Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a
BUG: 1546129
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Support for using Volinfo from Conf file</title>
<updated>2018-01-23T03:03:01+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2017-11-30T07:22:30+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=7c9b62cfff34d1ac4c8fa0822b18e51c15e6db81'/>
<id>7c9b62cfff34d1ac4c8fa0822b18e51c15e6db81</id>
<content type='text'>
Once Geo-replication is started, it runs Gluster commands to get Volume
info from the Master and the Slave. With this patch, Geo-rep can get the
Volume info from a Conf file if the `--use-gconf-volinfo` argument is
passed to the monitor.

Create a config (or add to the config if it exists) with the following fields:

    [vars]
    master-bricks=NODEID:HOSTNAME:PATH,..
    slave-bricks=NODEID:HOSTNAME,..
    master-volume-id=
    slave-volume-id=
    master-replica-count=
    master-disperse_count=

Note: Existing Geo-replication is not affected since this is activated
only when `--use-gconf-volinfo` is passed while spawning `gsyncd
monitor`.

Tiering support is not yet added since Tiering + Glusterd2 is still
under discussion.
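
For illustration, such a section could be read with Python's
standard configparser (a sketch; the section and field names are
the ones listed above, the file path is hypothetical):

    from configparser import ConfigParser

    conf = ConfigParser()
    conf.read("/path/to/gsyncd.conf")  # hypothetical location

    # Each master-bricks entry is NODEID:HOSTNAME:PATH
    master_bricks = [entry.split(":", 2)
                     for entry in conf.get("vars", "master-bricks").split(",")]
    master_volume_id = conf.get("vars", "master-volume-id")
    replica_count = int(conf.get("vars", "master-replica-count", fallback="1"))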

Fixes: #396
Change-Id: I281baccbad03686c00f6488a8511dd6db0edc57a
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Once Geo-replication is started, it runs Gluster commands to get Volume
info from the Master and the Slave. With this patch, Geo-rep can get the
Volume info from a Conf file if the `--use-gconf-volinfo` argument is
passed to the monitor.

Create a config (or add to the config if it exists) with the following fields:

    [vars]
    master-bricks=NODEID:HOSTNAME:PATH,..
    slave-bricks=NODEID:HOSTNAME,..
    master-volume-id=
    slave-volume-id=
    master-replica-count=
    master-disperse_count=

Note: Existing Geo-replication is not affected since this is activated
only when `--use-gconf-volinfo` is passed while spawning `gsyncd
monitor`.

Tiering support is not yet added since Tiering + Glusterd2 is still
under discussion.

Fixes: #396
Change-Id: I281baccbad03686c00f6488a8511dd6db0edc57a
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Refactoring Config and Arguments parsing</title>
<updated>2017-11-15T05:20:08+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2017-06-21T07:26:14+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=705ec055040268f876d04fe5743a6ce4738d6e02'/>
<id>705ec055040268f876d04fe5743a6ce4738d6e02</id>
<content type='text'>
- Fixed Python pep8 issues
- Removed dead code
- Rewritten configuration management
- Rewritten Arguments/subcommands handling
- Added Args upgrade to accommodate all these changes without changing
  glusterd code
- Removed the use of md5, which was used to hash the brick path for
  the workdir

Both Master and Slave nodes will have a subdirectory per session, in
the format "&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;"

  $GLUSTER_LOGDIR/geo-replication/&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;
  $GLUSTER_LOGDIR/geo-replication-slaves/&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;

Log file paths are renamed, since the session info is available in
the directory name itself.

  $LOG_DIR_MASTER/
      - gsyncd.log - Gsyncd, Worker monitor logs
      - mnt-&lt;brick-path&gt;.log - Aux mount logs, mounted by each worker
      - changes-&lt;brick-path&gt;.log - Changelog-related logs (one per brick)

  $LOG_DIR_SLAVE/
      - gsyncd.log - Slave Gsyncd logs
      - mnt-&lt;master-node&gt;-&lt;master-brick-path&gt;.log - Aux mount logs,
        mounted for each connection from master-node:master-brick
      - mnt-mbr-&lt;master-node&gt;-&lt;master-brick-path&gt;.log - Same as above,
        but mountbroker setup

Fixes: #73
Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Fixed Python pep8 issues
- Removed dead code
- Rewritten configuration management
- Rewritten Arguments/subcommands handling
- Added Args upgrade to accommodate all these changes without changing
  glusterd code
- Removed the use of md5, which was used to hash the brick path for
  the workdir

Both Master and Slave nodes will have a subdirectory per session, in
the format "&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;"

  $GLUSTER_LOGDIR/geo-replication/&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;
  $GLUSTER_LOGDIR/geo-replication-slaves/&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;

Log file paths are renamed, since the session info is available in
the directory name itself.

  $LOG_DIR_MASTER/
      - gsyncd.log - Gsyncd, Worker monitor logs
      - mnt-&lt;brick-path&gt;.log - Aux mount logs, mounted by each worker
      - changes-&lt;brick-path&gt;.log - Changelog-related logs (one per brick)

  $LOG_DIR_SLAVE/
      - gsyncd.log - Slave Gsyncd logs
      - mnt-&lt;master-node&gt;-&lt;master-brick-path&gt;.log - Aux mount logs,
        mounted for each connection from master-node:master-brick
      - mnt-mbr-&lt;master-node&gt;-&lt;master-brick-path&gt;.log - Same as above,
        but mountbroker setup

Fixes: #73
Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Fix rename of directory in hybrid crawl</title>
<updated>2017-11-10T05:36:22+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2017-09-21T22:11:15+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=0f524f0710229a7f8de3a4e1e6a2790d40f67a8e'/>
<id>0f524f0710229a7f8de3a4e1e6a2790d40f67a8e</id>
<content type='text'>
In hybrid crawl, renames and unlinks can't be
synced, but directory renames can be detected.
While syncing a directory on the slave, if the
gfid already exists, it should be renamed.
Hence, if the directory gfid already exists,
rename it.

Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6
BUG: 1499566
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In hybrid crawl, renames and unlinks can't be
synced, but directory renames can be detected.
While syncing a directory on the slave, if the
gfid already exists, it should be renamed.
Hence, if the directory gfid already exists,
rename it.

Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6
BUG: 1499566
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Fix status transition</title>
<updated>2017-10-11T10:13:35+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2017-10-10T09:54:04+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=3edf926a1bda43879c09694cf3904c214c94c9dc'/>
<id>3edf926a1bda43879c09694cf3904c214c94c9dc</id>
<content type='text'>
The status transition is as below, which is
wrong.

Created-&gt;Initializing-&gt;Active-&gt;Active/Passive-&gt;Stopped

As soon as the monitor spawns the worker, the state
is changed from 'Initializing' to 'Active' and then
to 'Active/Passive' based on whether the worker gets
the lock or not. This is wrong; it should transition
directly as below.

Created-&gt;Initializing-&gt;Active/Passive-&gt;Stopped
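
A rough way to picture the intended transitions (illustrative
only, not the actual gsyncdstatus code); the worker goes to
Active or Passive directly, depending on the lock:

    # Allowed transitions per state (simplified sketch; switching
    # between Active and Passive at runtime is omitted here).
    NEXT_STATES = {
        "Created":      ["Initializing"],
        "Initializing": ["Active", "Passive"],  # decided by the worker lock
        "Active":       ["Stopped"],
        "Passive":      ["Stopped"],
    }

    def valid_transition(current, new):
        return new in NEXT_STATES.get(current, [])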

Change-Id: Ibf5ca5c4fdf168c403c6da01db60b93f0604aae7
BUG: 1500284
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The status transition is as below, which is
wrong.

Created-&gt;Initializing-&gt;Active-&gt;Active/Passive-&gt;Stopped

As soon as the monitor spawns the worker, the state
is changed from 'Initializing' to 'Active' and then
to 'Active/Passive' based on whether the worker gets
the lock or not. This is wrong; it should transition
directly as below.

Created-&gt;Initializing-&gt;Active/Passive-&gt;Stopped

Change-Id: Ibf5ca5c4fdf168c403c6da01db60b93f0604aae7
BUG: 1500284
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Structured log support</title>
<updated>2017-06-20T06:00:47+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2017-06-15T12:39:36+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=0a8dac38ac4415ea770fb36b34e3c494e8713e6e'/>
<id>0a8dac38ac4415ea770fb36b34e3c494e8713e6e</id>
<content type='text'>
Changed all log messages to structured log format
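
As an illustration of the style (the helper name and exact
formatting here are hypothetical, not necessarily what gsyncd
uses): the message text stays constant and the variable parts
are appended as key=value fields.

    import logging

    logging.basicConfig(level=logging.INFO)

    def lf(msg, **kwargs):
        # Hypothetical helper: append variable data as key=value fields
        # so the message text itself stays constant and grep-able.
        if not kwargs:
            return msg
        fields = ", ".join("%s=%s" % (k, v) for k, v in sorted(kwargs.items()))
        return "%s\t[%s]" % (msg, fields)

    logging.info(lf("Worker spawn failed", brick="/bricks/b1", timeout=60))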

Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b
Updates: #240
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17551
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Changed all log messages to structured log format

Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b
Updates: #240
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17551
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Improve worker log messages</title>
<updated>2017-04-07T06:09:34+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2017-04-04T19:39:46+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=e01025973c73e2bd0eda8cfed22b75617305d740'/>
<id>e01025973c73e2bd0eda8cfed22b75617305d740</id>
<content type='text'>
The monitor process expects the worker to establish an SSH tunnel to the
slave node, mount the master volume locally within 60 secs, and acknowledge
the monitor process by closing the feedback fd. If something goes wrong and
the worker does not close the feedback fd within 60 secs, the monitor kills
the worker. But there was no clue in the log message about the actual issue.
This patch adds a log message indicating whether the worker is hung during
the SSH tunnel setup or the master mount.
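
A sketch of the handshake (illustrative and simplified compared
to monitor.py): the worker acknowledges readiness by closing its
end of a pipe, and the monitor waits on it with a 60-second
timeout.

    import os
    import select
    import signal

    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Worker: establish the SSH tunnel, mount the master volume,
        # then acknowledge by closing the feedback fd (here simply by
        # exiting, which closes both inherited ends).
        os._exit(0)

    os.close(write_end)
    ready, _, _ = select.select([read_end], [], [], 60)
    if not ready:
        # No acknowledgement within 60 secs: kill the worker and log
        # which phase (SSH tunnel or master mount) it was stuck in.
        os.kill(pid, signal.SIGTERM)
    os.waitpid(pid, 0)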

Change-Id: Id08a12fa6f3bba1d4fe8036728dbc290e6c14c8c
BUG: 1261689
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16997
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The monitor process expects the worker to establish an SSH tunnel to the
slave node, mount the master volume locally within 60 secs, and acknowledge
the monitor process by closing the feedback fd. If something goes wrong and
the worker does not close the feedback fd within 60 secs, the monitor kills
the worker. But there was no clue in the log message about the actual issue.
This patch adds a log message indicating whether the worker is hung during
the SSH tunnel setup or the master mount.

Change-Id: Id08a12fa6f3bba1d4fe8036728dbc290e6c14c8c
BUG: 1261689
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16997
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Use Host UUID to find local Gluster node</title>
<updated>2016-12-14T06:38:14+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2016-12-06T06:41:35+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=009454de29d6653e07ac090af1c5d233c7150dd4'/>
<id>009454de29d6653e07ac090af1c5d233c7150dd4</id>
<content type='text'>
To spawn workers for each local brick, Geo-rep was collecting all
the machine IPs based on the hostname and picking local bricks based
on connectivity.

With this patch, Geo-rep identifies a brick as local if the host
UUID matches the UUID of the brick from the Volume info.
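
Schematically (hypothetical data, not the actual volinfo
parsing code):

    # Host UUID of this node, as reported by glusterd (hypothetical value).
    host_uuid = "11111111-aaaa-bbbb-cccc-222222222222"

    # Bricks from the volume info, each carrying the UUID of its node.
    bricks = [
        {"uuid": "11111111-aaaa-bbbb-cccc-222222222222", "path": "/bricks/b1"},
        {"uuid": "33333333-dddd-eeee-ffff-444444444444", "path": "/bricks/b1"},
    ]

    # New approach: a brick is local iff its node UUID matches this host's
    # UUID, with no hostname/IP resolution or connectivity checks involved.
    local_bricks = [b for b in bricks if b["uuid"] == host_uuid]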

BUG: 1401801
Change-Id: Ic83c65df89e43cb86346e3ede227aa84d17ffd79
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16035
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To spawn workers for each local brick, Geo-rep was collecting all
the machine IPs based on the hostname and picking local bricks based
on connectivity.

With this patch, Geo-rep identifies a brick as local if the host
UUID matches the UUID of the brick from the Volume info.

BUG: 1401801
Change-Id: Ic83c65df89e43cb86346e3ede227aa84d17ffd79
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16035
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
