<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/geo-replication/syncdaemon/monitor.py, branch v4.1.7</title>
<subtitle>GlusterFS is a distributed file-system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system.</subtitle>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/'/>
<entry>
<title>geo-rep: Fix deadlock during worker start</title>
<updated>2018-09-21T13:25:43+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2018-08-10T12:14:14+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=72514f20d2ae947529cd1c4b4b009f27bae7032a'/>
<id>72514f20d2ae947529cd1c4b4b009f27bae7032a</id>
<content type='text'>
Analysis:
The monitor process spawns monitor threads (one per brick).
Each monitor thread forks worker and agent processes.
Each monitor thread, while initializing, updates the
monitor status file. This is synchronized using flock.
The race is that one thread can fork a worker while
another thread has the status file open, so the worker
process ends up holding a reference to that fd.

Cause:
flock gets unlocked either by explicitly unlocking it
or by closing all duplicate fds referring to the file.
The code relied on the fd being closed, so a reference
inherited by the worker/agent process across fork could
cause the deadlock.

Fix:
1. The flock is unlocked explicitly.
2. The status file is updated at appropriate places so that
the fd reference is not leaked to the worker/agent process.

With this fix, both the deadlock and the possible fd
leaks are resolved.
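
A minimal sketch of the distinction (illustrative only, not
the actual monitor.py code): relying on close() keeps the
lock alive as long as a forked child still holds a duplicate
of the fd, while an explicit LOCK_UN releases it right away.

    import fcntl
    import os

    def update_status_file(path, data):
        # Take an exclusive lock while updating the shared status file.
        fd = os.open(path, os.O_RDWR | os.O_CREAT)
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            os.write(fd, data)
        finally:
            # Unlock explicitly; merely closing the fd would leave the
            # lock held if a concurrent fork() duplicated this fd into
            # a worker/agent child process.
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)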

Backport of:
 &gt; Patch: https://review.gluster.org/20704
 &gt; BUG: bz#1614799
 &gt; Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

fixes: bz#1630145
Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Analysis:
The monitor process spawns monitor threads (one per brick).
Each monitor thread forks worker and agent processes.
Each monitor thread, while initializing, updates the
monitor status file. This is synchronized using flock.
The race is that one thread can fork a worker while
another thread has the status file open, so the worker
process ends up holding a reference to that fd.

Cause:
flock gets unlocked either by explicitly unlocking it
or by closing all duplicate fds referring to the file.
The code relied on the fd being closed, so a reference
inherited by the worker/agent process across fork could
cause the deadlock.

Fix:
1. The flock is unlocked explicitly.
2. The status file is updated at appropriate places so that
the fd reference is not leaked to the worker/agent process.

With this fix, both the deadlock and the possible fd
leaks are resolved.

Backport of:
 &gt; Patch: https://review.gluster.org/20704
 &gt; BUG: bz#1614799
 &gt; Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

fixes: bz#1630145
Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Fix geo-rep for older versions of unshare</title>
<updated>2018-08-16T04:20:47+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2018-06-07T08:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=050969fb3cfb176fa206d3ae8169d6021879d9db'/>
<id>050969fb3cfb176fa206d3ae8169d6021879d9db</id>
<content type='text'>
Geo-rep mounts are private to the worker. This is
achieved by running the worker in a mount namespace
using the unshare command. However, the unshare
command has to support the '--propagation' option,
so geo-rep breaks on systems with an older unshare
version. This patch makes it fall back to the
lazy-umount behaviour if unshare does not support
the propagation option.
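
One possible way to probe for this at runtime (a sketch
that assumes support can be detected from the installed
unshare's help output; the patch may detect it differently):

    import subprocess

    def unshare_supports_propagation():
        # Probe the installed unshare(1) for the --propagation option.
        try:
            out = subprocess.check_output(["unshare", "--help"],
                                          stderr=subprocess.STDOUT)
        except (OSError, subprocess.CalledProcessError):
            return False
        return b"--propagation" in out

    # Use mount namespaces when available, otherwise keep the old
    # lazy-umount behaviour.
    use_mount_namespace = unshare_supports_propagation()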

Backport of:
 &gt; BUG: 1589782
 &gt; Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

fixes: bz#1611111
Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Geo-rep mounts are private to the worker. This is
achieved by running the worker in a mount namespace
using the unshare command. However, the unshare
command has to support the '--propagation' option,
so geo-rep breaks on systems with an older unshare
version. This patch makes it fall back to the
lazy-umount behaviour if unshare does not support
the propagation option.

Backport of:
 &gt; BUG: 1589782
 &gt; Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
 &gt; Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;

fixes: bz#1611111
Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Remove lazy umount and use mount namespaces</title>
<updated>2018-02-22T05:40:35+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2018-02-12T08:11:04+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=e4ca0b3df379c553e220f929f0203175bd536b61'/>
<id>e4ca0b3df379c553e220f929f0203175bd536b61</id>
<content type='text'>
Lazy umounting the master volume from the worker
causes issues with rsync's use of getcwd. Hence
the lazy umount is removed and a private mount
namespace is used instead. On the slave, the lazy
umount is retained, since a private namespace
cannot be used in a non-root geo-rep setup.
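
Conceptually, the worker now runs inside a private mount
namespace, roughly along these lines (an illustrative
sketch, not the exact gsyncd invocation):

    import subprocess

    # Stand-in for whatever command the monitor uses to launch a
    # worker; the exact gsyncd invocation is not shown here.
    worker_cmd = ["python", "worker.py"]

    # Spawn the worker inside a private mount namespace: the aux mount
    # it creates is visible only to that worker and goes away with it,
    # so no lazy umount ('umount -l') is needed afterwards.
    subprocess.check_call(["unshare", "-m", "--propagation", "private"]
                          + worker_cmd)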

Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a
BUG: 1546129
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Lazy umounting the master volume from the worker
causes issues with rsync's use of getcwd. Hence
the lazy umount is removed and a private mount
namespace is used instead. On the slave, the lazy
umount is retained, since a private namespace
cannot be used in a non-root geo-rep setup.

Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a
BUG: 1546129
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Support for using Volinfo from Conf file</title>
<updated>2018-01-23T03:03:01+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2017-11-30T07:22:30+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=7c9b62cfff34d1ac4c8fa0822b18e51c15e6db81'/>
<id>7c9b62cfff34d1ac4c8fa0822b18e51c15e6db81</id>
<content type='text'>
Once Geo-replication is started, it runs Gluster commands to get Volume
info from the Master and the Slave. With this patch, Geo-rep can get the
Volume info from a Conf file if the `--use-gconf-volinfo` argument is
passed to the monitor.

Create a config (or add to the config if it exists) with the following fields:

    [vars]
    master-bricks=NODEID:HOSTNAME:PATH,..
    slave-bricks=NODEID:HOSTNAME,..
    master-volume-id=
    slave-volume-id=
    master-replica-count=
    master-disperse_count=

Note: Existing Geo-replication is not affected since this is activated
only when `--use-gconf-volinfo` is passed while spawning `gsyncd
monitor`.

Tiering support is not yet added since Tiering + Glusterd2 is still
under discussion.
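
For illustration, such a section could be read with Python's
standard configparser (a sketch; the section and field names are
the ones listed above, the file path is hypothetical):

    from configparser import ConfigParser

    conf = ConfigParser()
    conf.read("/path/to/gsyncd.conf")  # hypothetical location

    # Each master-bricks entry is NODEID:HOSTNAME:PATH
    master_bricks = [entry.split(":", 2)
                     for entry in conf.get("vars", "master-bricks").split(",")]
    master_volume_id = conf.get("vars", "master-volume-id")
    replica_count = int(conf.get("vars", "master-replica-count", fallback="1"))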

Fixes: #396
Change-Id: I281baccbad03686c00f6488a8511dd6db0edc57a
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Once Geo-replication is started, it runs Gluster commands to get Volume
info from the Master and the Slave. With this patch, Geo-rep can get the
Volume info from a Conf file if the `--use-gconf-volinfo` argument is
passed to the monitor.

Create a config (or add to the config if it exists) with the following fields:

    [vars]
    master-bricks=NODEID:HOSTNAME:PATH,..
    slave-bricks=NODEID:HOSTNAME,..
    master-volume-id=
    slave-volume-id=
    master-replica-count=
    master-disperse_count=

Note: Existing Geo-replication is not affected since this is activated
only when `--use-gconf-volinfo` is passed while spawning `gsyncd
monitor`.

Tiering support is not yet added since Tiering + Glusterd2 is still
under discussion.

Fixes: #396
Change-Id: I281baccbad03686c00f6488a8511dd6db0edc57a
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Refactoring Config and Arguments parsing</title>
<updated>2017-11-15T05:20:08+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2017-06-21T07:26:14+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=705ec055040268f876d04fe5743a6ce4738d6e02'/>
<id>705ec055040268f876d04fe5743a6ce4738d6e02</id>
<content type='text'>
- Fixed Python pep8 issues
- Removed dead code
- Rewritten configuration management
- Rewritten Arguments/subcommands handling
- Added Args upgrade to accommodate all these changes without changing
  glusterd code
- Removed the use of md5, which was used to hash the brick path for
  the workdir

Both Master and Slave nodes will have a subdirectory per session, in
the format "&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;"

  $GLUSTER_LOGDIR/geo-replication/&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;
  $GLUSTER_LOGDIR/geo-replication-slaves/&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;

Log file paths are renamed, since the session info is available in
the directory name itself.

  $LOG_DIR_MASTER/
      - gsyncd.log - Gsyncd, Worker monitor logs
      - mnt-&lt;brick-path&gt;.log - Aux mount logs, mounted by each worker
      - changes-&lt;brick-path&gt;.log - Changelog-related logs (one per brick)

  $LOG_DIR_SLAVE/
      - gsyncd.log - Slave Gsyncd logs
      - mnt-&lt;master-node&gt;-&lt;master-brick-path&gt;.log - Aux mount logs,
        mounted for each connection from master-node:master-brick
      - mnt-mbr-&lt;master-node&gt;-&lt;master-brick-path&gt;.log - Same as above,
        but mountbroker setup

Fixes: #73
Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Fixed Python pep8 issues
- Removed dead code
- Rewritten configuration management
- Rewritten Arguments/subcommands handling
- Added Args upgrade to accommodate all these changes without changing
  glusterd code
- Removed the use of md5, which was used to hash the brick path for
  the workdir

Both Master and Slave nodes will have a subdirectory per session, in
the format "&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;"

  $GLUSTER_LOGDIR/geo-replication/&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;
  $GLUSTER_LOGDIR/geo-replication-slaves/&lt;mastervol&gt;_&lt;primary_slave_host&gt;_&lt;slavevol&gt;

Log file paths are renamed, since the session info is available in
the directory name itself.

  $LOG_DIR_MASTER/
      - gsyncd.log - Gsyncd, Worker monitor logs
      - mnt-&lt;brick-path&gt;.log - Aux mount logs, mounted by each worker
      - changes-&lt;brick-path&gt;.log - Changelog-related logs (one per brick)

  $LOG_DIR_SLAVE/
      - gsyncd.log - Slave Gsyncd logs
      - mnt-&lt;master-node&gt;-&lt;master-brick-path&gt;.log - Aux mount logs,
        mounted for each connection from master-node:master-brick
      - mnt-mbr-&lt;master-node&gt;-&lt;master-brick-path&gt;.log - Same as above,
        but mountbroker setup

Fixes: #73
Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Fix rename of directory in hybrid crawl</title>
<updated>2017-11-10T05:36:22+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2017-09-21T22:11:15+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=0f524f0710229a7f8de3a4e1e6a2790d40f67a8e'/>
<id>0f524f0710229a7f8de3a4e1e6a2790d40f67a8e</id>
<content type='text'>
In hybrid crawl, renames and unlinks can't be
synced, but directory renames can be detected.
While syncing a directory on the slave, if the
gfid already exists, it should be renamed.
Hence, if the directory gfid already exists,
rename it.

Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6
BUG: 1499566
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In hybrid crawl, renames and unlinks can't be
synced, but directory renames can be detected.
While syncing a directory on the slave, if the
gfid already exists, it should be renamed.
Hence, if the directory gfid already exists,
rename it.

Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6
BUG: 1499566
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Fix status transition</title>
<updated>2017-10-11T10:13:35+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2017-10-10T09:54:04+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=3edf926a1bda43879c09694cf3904c214c94c9dc'/>
<id>3edf926a1bda43879c09694cf3904c214c94c9dc</id>
<content type='text'>
The status transition is as below, which is
wrong.

Created-&gt;Initializing-&gt;Active-&gt;Active/Passive-&gt;Stopped

As soon as the monitor spawns the worker, the state
is changed from 'Initializing' to 'Active' and then
to 'Active/Passive' based on whether the worker gets
the lock or not. This is wrong; it should transition
directly as below.

Created-&gt;Initializing-&gt;Active/Passive-&gt;Stopped
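
A rough way to picture the intended transitions (illustrative
only, not the actual gsyncdstatus code); the worker goes to
Active or Passive directly, depending on the lock:

    # Allowed transitions per state (simplified sketch; switching
    # between Active and Passive at runtime is omitted here).
    NEXT_STATES = {
        "Created":      ["Initializing"],
        "Initializing": ["Active", "Passive"],  # decided by the worker lock
        "Active":       ["Stopped"],
        "Passive":      ["Stopped"],
    }

    def valid_transition(current, new):
        return new in NEXT_STATES.get(current, [])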

Change-Id: Ibf5ca5c4fdf168c403c6da01db60b93f0604aae7
BUG: 1500284
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The status transition is as below, which is
wrong.

Created-&gt;Initializing-&gt;Active-&gt;Active/Passive-&gt;Stopped

As soon as the monitor spawns the worker, the state
is changed from 'Initializing' to 'Active' and then
to 'Active/Passive' based on whether the worker gets
the lock or not. This is wrong; it should transition
directly as below.

Created-&gt;Initializing-&gt;Active/Passive-&gt;Stopped

Change-Id: Ibf5ca5c4fdf168c403c6da01db60b93f0604aae7
BUG: 1500284
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Structured log support</title>
<updated>2017-06-20T06:00:47+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2017-06-15T12:39:36+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=0a8dac38ac4415ea770fb36b34e3c494e8713e6e'/>
<id>0a8dac38ac4415ea770fb36b34e3c494e8713e6e</id>
<content type='text'>
Changed all log messages to structured log format
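
As an illustration of the style (the helper name and exact
formatting here are hypothetical, not necessarily what gsyncd
uses): the message text stays constant and the variable parts
are appended as key=value fields.

    import logging

    logging.basicConfig(level=logging.INFO)

    def lf(msg, **kwargs):
        # Hypothetical helper: append variable data as key=value fields
        # so the message text itself stays constant and grep-able.
        if not kwargs:
            return msg
        fields = ", ".join("%s=%s" % (k, v) for k, v in sorted(kwargs.items()))
        return "%s\t[%s]" % (msg, fields)

    logging.info(lf("Worker spawn failed", brick="/bricks/b1", timeout=60))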

Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b
Updates: #240
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17551
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Changed all log messages to structured log format

Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b
Updates: #240
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17551
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Improve worker log messages</title>
<updated>2017-04-07T06:09:34+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2017-04-04T19:39:46+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=e01025973c73e2bd0eda8cfed22b75617305d740'/>
<id>e01025973c73e2bd0eda8cfed22b75617305d740</id>
<content type='text'>
The monitor process expects the worker to establish an SSH tunnel to the
slave node, mount the master volume locally within 60 secs, and acknowledge
the monitor process by closing the feedback fd. If something goes wrong and
the worker does not close the feedback fd within 60 secs, the monitor kills
the worker. But there was no clue in the log message about the actual issue.
This patch adds a log message indicating whether the worker is hung during
the SSH tunnel setup or the master mount.
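
A sketch of the handshake (illustrative and simplified compared
to monitor.py): the worker acknowledges readiness by closing its
end of a pipe, and the monitor waits on it with a 60-second
timeout.

    import os
    import select
    import signal

    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Worker: establish the SSH tunnel, mount the master volume,
        # then acknowledge by closing the feedback fd (here simply by
        # exiting, which closes both inherited ends).
        os._exit(0)

    os.close(write_end)
    ready, _, _ = select.select([read_end], [], [], 60)
    if not ready:
        # No acknowledgement within 60 secs: kill the worker and log
        # which phase (SSH tunnel or master mount) it was stuck in.
        os.kill(pid, signal.SIGTERM)
    os.waitpid(pid, 0)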

Change-Id: Id08a12fa6f3bba1d4fe8036728dbc290e6c14c8c
BUG: 1261689
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16997
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The monitor process expects the worker to establish an SSH tunnel to the
slave node, mount the master volume locally within 60 secs, and acknowledge
the monitor process by closing the feedback fd. If something goes wrong and
the worker does not close the feedback fd within 60 secs, the monitor kills
the worker. But there was no clue in the log message about the actual issue.
This patch adds a log message indicating whether the worker is hung during
the SSH tunnel setup or the master mount.

Change-Id: Id08a12fa6f3bba1d4fe8036728dbc290e6c14c8c
BUG: 1261689
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16997
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>geo-rep: Use Host UUID to find local Gluster node</title>
<updated>2016-12-14T06:38:14+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2016-12-06T06:41:35+00:00</published>
<link rel='alternate' type='text/html' href='https://fedorapeople.org/cgit/anoopcs/public_git/glusterfs.git/commit/?id=009454de29d6653e07ac090af1c5d233c7150dd4'/>
<id>009454de29d6653e07ac090af1c5d233c7150dd4</id>
<content type='text'>
To spawn workers for each local brick, Geo-rep was collecting all
the machine IPs based on the hostname and picking local bricks based
on connectivity.

With this patch, Geo-rep identifies a brick as local if the host
UUID matches the UUID of the brick from the Volume info.
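
Schematically (hypothetical data, not the actual volinfo
parsing code):

    # Host UUID of this node, as reported by glusterd (hypothetical value).
    host_uuid = "11111111-aaaa-bbbb-cccc-222222222222"

    # Bricks from the volume info, each carrying the UUID of its node.
    bricks = [
        {"uuid": "11111111-aaaa-bbbb-cccc-222222222222", "path": "/bricks/b1"},
        {"uuid": "33333333-dddd-eeee-ffff-444444444444", "path": "/bricks/b1"},
    ]

    # New approach: a brick is local iff its node UUID matches this host's
    # UUID, with no hostname/IP resolution or connectivity checks involved.
    local_bricks = [b for b in bricks if b["uuid"] == host_uuid]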

BUG: 1401801
Change-Id: Ic83c65df89e43cb86346e3ede227aa84d17ffd79
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16035
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To spawn workers for each local brick, Geo-rep was collecting all
the machine IPs based on the hostname and picking local bricks based
on connectivity.

With this patch, Geo-rep identifies a brick as local if the host
UUID matches the UUID of the brick from the Volume info.

BUG: 1401801
Change-Id: Ic83c65df89e43cb86346e3ede227aa84d17ffd79
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16035
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
