570667 - MMR: simultaneous total updates on the masters cause deadlock and data loss

https://bugzilla.redhat.com/show_bug.cgi?id=570667

Description: In an MMR topology, if a master receives a total update request to initialize the other master while it is itself being initialized by that master, the two replication threads hang and the replicated backend instance can be wiped out. To prevent a server from running as total update supplier and consumer at the same time, the REPLICA_TOTAL_EXCL_SEND and REPLICA_TOTAL_EXCL_RECV bits have been introduced. While the server is sending a total update to other replicas, it rejects any total update request on that backend; it can, however, send total updates to multiple replicas at the same time. While a total update from another master is in progress on the server, the server rejects a total update from yet another master as well as any request to initialize other replicas.
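
The exclusion rules described above can be sketched as a small standalone model. This is an illustration only, not the server's code: excl_state and the excl_* functions are hypothetical names, and the send counter is a simplification assumed here to make "multiple concurrent sends" explicit (the patch itself uses the two flag bits on the replica). The sketch does no locking; a real implementation would need to do each check-and-set atomically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-replica state; NOT the server's Replica type. */
typedef struct {
    int  sends_in_progress; /* total updates this server is currently sending */
    bool recv_in_progress;  /* this server is currently being initialized */
} excl_state;

void excl_init(excl_state *s)
{
    assert(s != NULL);
    s->sends_in_progress = 0;
    s->recv_in_progress = false;
}

/* Supplier side: refuse to start sending while we are being initialized.
 * Several sends may be active at once. */
bool excl_begin_send(excl_state *s)
{
    if (s->recv_in_progress) {
        return false;
    }
    s->sends_in_progress++;
    return true;
}

void excl_end_send(excl_state *s)
{
    if (s->sends_in_progress > 0) {
        s->sends_in_progress--;
    }
}

/* Consumer side: refuse an incoming total update while any send is active
 * or while another incoming total update is already in progress. */
bool excl_begin_recv(excl_state *s)
{
    if (s->recv_in_progress || s->sends_in_progress > 0) {
        return false;
    }
    s->recv_in_progress = true;
    return true;
}

void excl_end_recv(excl_state *s)
{
    s->recv_in_progress = false;
}
```

In this model a second excl_begin_send() succeeds while the first is still active, whereas excl_begin_recv() fails until every send has ended, and excl_begin_send() fails for as long as a receive is in progress.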
author: Noriko Hosoi <nhosoi@redhat.com> 2010-03-05 10:07:38 -0800
committer: Noriko Hosoi <nhosoi@redhat.com> 2010-03-05 10:07:38 -0800
commit: 0b95451c7e50cb6b2d0cb310dddca18336e1b2ac (patch)
tree: 82cab73fc5a8326f0f4e5bd5869f154895b98532 /ldap/servers/plugins/replication/repl5_protocol.c
parent: d66eb3dd9fdb9648b5058161bf8a7740a16fb2d8 (diff)
download: ds-0b95451c7e50cb6b2d0cb310dddca18336e1b2ac.tar.gz
ds-0b95451c7e50cb6b2d0cb310dddca18336e1b2ac.tar.xz
ds-0b95451c7e50cb6b2d0cb310dddca18336e1b2ac.zip
1 files changed, 28 insertions, 0 deletions
diff --git a/ldap/servers/plugins/replication/repl5_protocol.c b/ldap/servers/plugins/replication/repl5_protocol.c
index 927c450a..efb32716 100644
--- a/ldap/servers/plugins/replication/repl5_protocol.c
+++ b/ldap/servers/plugins/replication/repl5_protocol.c
@@ -317,6 +317,28 @@ prot_thread_main(void *arg)
 		dev_debug("prot_thread_main(STATE_PERFORMING_INCREMENTAL_UPDATE): end");
 		break;
 	      case STATE_PERFORMING_TOTAL_UPDATE:
+		{
+		Slapi_DN *dn = agmt_get_replarea(agmt);
+		Replica *replica = NULL;
+		Object *replica_obj = replica_get_replica_from_dn(dn);
+		if (replica_obj)
+		{
+		    replica = (Replica*) object_get_data (replica_obj);
+		    /* If total update against this replica is in progress,
+		     * we should not initiate the total update to other replicas. */
+		    if (replica_is_state_flag_set(replica, REPLICA_TOTAL_EXCL_RECV))
+		    {
+		        object_release(replica_obj);
+                slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name,
+                    "%s: total update on the replica is in progress.  Cannot initiate the total update.\n", agmt_get_long_name(rp->agmt));
+		        break;
+		    }
+		    else
+		    {
+		        replica_set_state_flag (replica, REPLICA_TOTAL_EXCL_SEND, 0);
+		    }
+		}
+
 		PR_Lock(rp->lock);
     
 		/* stop incremental protocol if running */
@@ -332,7 +354,13 @@ prot_thread_main(void *arg)
 		   replica initialization is completed. */
 		agmt_replica_init_done (agmt);
     
+		if (replica_obj)
+		{
+		    replica_set_state_flag (replica, REPLICA_TOTAL_EXCL_SEND, 1);
+		    object_release(replica_obj);
+		}
 		break;
+		}
 	      case STATE_FINISHED:
 		dev_debug("prot_thread_main(STATE_FINISHED): exiting prot_thread_main");
 		done = 1;