IN THE LIGHT OF RGMANAGER-PACEMAKER CONVERSION: 03/RESOURCE GROUP PROPERTIES

Copyright 2016 Red Hat, Inc., Jan Pokorný <jpokorny @at@ Red Hat .dot. com>
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".


Preface
=======

This document elaborates on how selected resource group internal
relationship properties (denoting the run-time behavior), formalized
by means of LTL (linear temporal logic), map to particular RGManager (R)
and Pacemaker (P) configuration arrangements.
Given the purpose of this document, "selected" here means the set of
properties commonly used with the former cluster resource manager (R).

Properties are categorised, and each is further dissected based on
its variants (basically, the property holds or doesn't, but it can be
more convoluted); for each variant, the LTL model and R+P specifics
are provided (when possible or practical).


Outline
-------

Group properties derived from resource properties
Group member vs. rest of group properties, PROPERTY(GROUP, RESOURCE)
. FAILURE-ISOLATION
Other group properties, PROPERTY(GROUP)
. RECOVERY
. ENABLED


Group properties derived from resource properties
=================================================

Resource group (group) is an ordered set of resources:

GROUP ::= { RESOURCE1, ..., RESOURCEn },
          RESOURCE1 < RESOURCE2
          ...
          RESOURCEn-1 < RESOURCEn

and is a product of two resource properties applied to each pair
of consecutive resources in a linear fashion:

. ORDERING
  ORDERING(RESOURCE1, RESOURCE2, STRONG)
  ...
  ORDERING(RESOURCEn-1, RESOURCEn, STRONG)

. COOCCURRENCE
  COOCCURRENCE(RESOURCE1, RESOURCE2, POSITIVE)
  ...
  COOCCURRENCE(RESOURCEn-1, RESOURCEn, POSITIVE)

As the set is ordered, let's introduce two shortcut functions:

. BEFORE(GROUP, RESOURCE) -> { R | R in GROUP, R < RESOURCE }
. AFTER(GROUP, RESOURCE)  -> { R | R in GROUP, R > RESOURCE }
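A minimal Python sketch of the definitions above, assuming a group is
represented as a list of resource names in start order (the names `ip`,
`fs`, `db` and the helper functions are hypothetical); it expands a group
into the pairwise ORDERING/COOCCURRENCE predicates and computes BEFORE/AFTER:

```python
# Sketch: a resource group as an ordered list of resource names
# (start order). Resource names used below are hypothetical.


def pairwise_constraints(group):
    """Expand GROUP into ORDERING/COOCCURRENCE predicates, one pair of
    predicates per consecutive pair of members (a linear chain)."""
    constraints = []
    for first, second in zip(group, group[1:]):
        constraints.append(("ORDERING", first, second, "STRONG"))
        constraints.append(("COOCCURRENCE", first, second, "POSITIVE"))
    return constraints


def before(group, resource):
    """BEFORE(GROUP, RESOURCE): members ordered before RESOURCE."""
    return set(group[:group.index(resource)])


def after(group, resource):
    """AFTER(GROUP, RESOURCE): members ordered after RESOURCE."""
    return set(group[group.index(resource) + 1:])


if __name__ == "__main__":
    group = ["ip", "fs", "db"]
    for predicate in pairwise_constraints(group):
        print(predicate)
    print(before(group, "fs"), after(group, "fs"))
```

For `["ip", "fs", "db"]` this yields four predicates (two per consecutive
pair), BEFORE(GROUP, fs) = {ip} and AFTER(GROUP, fs) = {db}; relations
between non-adjacent members follow transitively from the chain.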


Group member vs. rest of group properties
=========================================

Generally, these are relations expressed by a predicate
PROPERTY(GROUP, RESOURCE), assuming RESOURCE in GROUP, that modify
the behavior of the cluster with respect to the group-resource pair:

PROPERTY(GROUP, RESOURCE) -> ALTER(BEFORE(GROUP, RESOURCE))


Independence between failing resource and its group predecessors
----------------------------------------------------------------

FAILURE-ISOLATION ::= FAILURE-ISOLATION(GROUP, RESOURCE, NONE)
                    | FAILURE-ISOLATION(GROUP, RESOURCE, TRY-RESTART)
                    | FAILURE-ISOLATION(GROUP, RESOURCE, STOP)
. FAILURE-ISOLATION(GROUP, RESOURCE, NONE)  ... RESOURCE failure leads to
                                                recovery of the whole group
. FAILURE-ISOLATION(GROUP, RESOURCE, TRY-RESTART)
                                            ... RESOURCE failure leads to
                                                (bounded) local restarts
                                                of RESOURCE and its successors
                                                (AFTER(GROUP, RESOURCE)) first
. FAILURE-ISOLATION(GROUP, RESOURCE, STOP)  ... RESOURCE failure leads to
                                                stopping and disabling
                                                of RESOURCE and its successors
                                                (AFTER(GROUP, RESOURCE))

R: driven by `@__independent_subtree` attribute of RESOURCE within GROUP

P: in part, driven by `on-fail` property of `monitor` and `stop` operations
   for RESOURCE
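To make the three variants concrete, here is a hedged Python sketch of
which group members are touched by the first reaction to a failure of
RESOURCE. The action labels (`recover-group`, `restart`,
`stop-and-disable`) and the function name are made up for illustration,
and neither the restart bound of TRY-RESTART nor the escalation after it
is exhausted is modelled:

```python
from enum import Enum


class Isolation(Enum):
    """Variants of FAILURE-ISOLATION(GROUP, RESOURCE, ...)."""
    NONE = "NONE"
    TRY_RESTART = "TRY-RESTART"
    STOP = "STOP"


def failure_reaction(group, resource, variant):
    """Return (action, members) for the first reaction to a failure of
    RESOURCE within GROUP (an ordered list of resource names).

    Action labels are illustrative only; the bounded number of local
    restarts for TRY-RESTART is not modelled.
    """
    position = group.index(resource)
    # RESOURCE followed by AFTER(GROUP, RESOURCE), in group (start) order
    resource_and_successors = group[position:]
    if variant is Isolation.NONE:
        # the whole group is recovered according to its recovery policy
        return ("recover-group", list(group))
    if variant is Isolation.TRY_RESTART:
        return ("restart", resource_and_successors)
    if variant is Isolation.STOP:
        return ("stop-and-disable", resource_and_successors)
    raise ValueError(f"unknown FAILURE-ISOLATION variant: {variant!r}")


if __name__ == "__main__":
    group = ["ip", "fs", "db", "app"]
    for variant in Isolation:
        print(variant.value, failure_reaction(group, "db", variant))
```

In every variant other than NONE, the members of BEFORE(GROUP, RESOURCE)
are left untouched, which is the independence this section is about.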

FAILURE-ISOLATION(GROUP, RESOURCE, NONE)  [1. recover the whole group]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

R: default, no need for that, otherwise specifying `@__independent_subtree`
   as `0` for RESOURCE within GROUP

P: specifying `migration-threshold` as 1 (+default `on-fail` values)
   for RESOURCE, but only if the original recovery policy was `relocate`;
   otherwise, it is probably better not to do anything???


FAILURE-ISOLATION(GROUP, RESOURCE, TRY-RESTART)  [2. begin with local restarts]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

R: specifying `@__independent_subtree` as `1` or `yes`
   + `@__max_restarts` and `@__restart_expire_time`

P: specifying `migration-threshold` as a value between 2 and INFINITY
   (inclusive) (+default `on-fail` values) for RESOURCE, but only if
   the original recovery policy was `relocate`; otherwise, it is
   probably better not to do anything???

FAILURE-ISOLATION(GROUP, RESOURCE, STOP)  [3. disable unconditionally]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

R: specifying `@__independent_subtree` as `2` or `non-critical`

P: default `on-fail` values, except for `ignore` for the `monitor`
   (or `status`) operation and `stop` for the `stop` operation,
   for RESOURCE ???
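The R side of the three variants above boils down to a lookup on the
`@__independent_subtree` value. A small Python sketch, assuming the
attribute arrives as a raw string (or `None` when absent); the function
name is hypothetical, and values not listed in this document are rejected
rather than guessed:

```python
def isolation_from_rgmanager(value=None):
    """Map a raw `__independent_subtree` attribute value (a string, or
    None when the attribute is absent) to a FAILURE-ISOLATION variant.

    Only the values listed in this document are accepted; anything
    else is rejected rather than guessed.
    """
    if value is None or value == "0":
        return "NONE"            # default: failure recovers the whole group
    if value in ("1", "yes"):
        return "TRY-RESTART"     # local restarts first, bounded by
                                 # __max_restarts/__restart_expire_time
    if value in ("2", "non-critical"):
        return "STOP"            # stop and disable RESOURCE + successors
    raise ValueError(f"unrecognized __independent_subtree value: {value!r}")
```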


Other group properties
=========================

Recovery policy group property
---------------------------------

RECOVERY ::= RECOVERY(GROUP, RESTART-ONLY)
           | RECOVERY(GROUP, RESTART-UNTIL1, MAX-RESTARTS)
           | RECOVERY(GROUP, RESTART-UNTIL2, MAX-RESTARTS, EXPIRE-TIME)
           | RECOVERY(GROUP, RELOCATE)
           | RECOVERY(GROUP, DISABLE)
. RECOVERY(GROUP, RESTART-ONLY)
                            ... "attempt to restart in place", unlimited
. RECOVERY(GROUP, RESTART-UNTIL1, MAX-RESTARTS)
                            ... ditto, but after MAX-RESTARTS attempts
                                (for the whole period of group-node
                                assignment) attempt to relocate
. RECOVERY(GROUP, RESTART-UNTIL2, MAX-RESTARTS, EXPIRE-TIME)
                            ... ditto, but after MAX-RESTARTS attempts
                                accumulated within an EXPIRE-TIME window,
                                attempt to relocate
. RECOVERY(GROUP, RELOCATE) ... move to another node
. RECOVERY(GROUP, DISABLE)  ... do not attempt anything, stop
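RESTART-UNTIL1 and RESTART-UNTIL2 differ only in which past restarts count
toward MAX-RESTARTS. A hedged Python sketch of that bookkeeping, assuming
failure timestamps in seconds on the node currently hosting the group; the
function name is hypothetical, and the strict window boundary as well as
clearing the history after a relocation (treated as a new group-node
assignment) are modelling assumptions, not documented RGManager behavior:

```python
def recovery_decisions(failure_times, max_restarts, expire_time=None):
    """Decide "restart" or "relocate" for each failure of the group.

    failure_times: ascending timestamps (seconds) of failures.
    expire_time=None models RESTART-UNTIL1: every restart since the
    current group-node assignment counts.  A positive expire_time
    models RESTART-UNTIL2: only restarts within the last expire_time
    seconds count (strict boundary, an assumption of this sketch).
    A relocation starts a new group-node assignment, so the restart
    history is cleared (also an assumption for RESTART-UNTIL2).
    """
    restarts = []                # timestamps of in-place restarts
    decisions = []
    for now in failure_times:
        if expire_time is not None:
            # forget restarts that fell out of the sliding window
            restarts = [t for t in restarts if now - t < expire_time]
        if len(restarts) < max_restarts:
            restarts.append(now)
            decisions.append("restart")
        else:
            decisions.append("relocate")
            restarts = []
    return decisions


if __name__ == "__main__":
    failures = [0, 10, 20]
    print(recovery_decisions(failures, max_restarts=2))
    print(recovery_decisions(failures, max_restarts=2, expire_time=15))
```

With `max_restarts=2`, failures at 0, 10 and 20 s yield restart, restart,
relocate under RESTART-UNTIL1, whereas with `expire_time=15` the restart
at 0 s has expired by the third failure, so all three are restarts.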

R: driven by `/cluster/rm/(service|vm)/@recovery`

P: driven by OCF RA return code and/or `migration-threshold`

RECOVERY(GROUP, RESTART-ONLY)  [1. restart in place, unlimited]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

R: default, no need for that, otherwise specifying `@recovery` as `restart`
   (and specifying neither `@max_restarts` nor `@restart_expire_time`,
   or keeping `@max_restarts` at zero!)

P: default, no need for that, otherwise specifying `migration-threshold`
   as `INFINITY` (or zero?; can be overridden by OCF RA return code, anyway?)

RECOVERY(GROUP, RESTART-UNTIL1, MAX-RESTARTS)  [2. restart + absolute limit]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

R: driven by specifying `@max_restarts` as `MAX-RESTARTS` (value, non-positive
   number boils down to case 1.)
   - and, optionally, specifying `@recovery` as `restart` (or not at all!)

P: driven by specifying `migration-threshold` as `MAX-RESTARTS` (value,
   presumably non-negative; `INFINITY` or zero? boils down to case 1.)
   (but can be overridden by OCF RA return code, anyway?)

[3. restart + relative limit for number of restarts/period]
RECOVERY(GROUP, RESTART-UNTIL2, MAX-RESTARTS, EXPIRE-TIME)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

R: driven by specifying `@max_restarts` as `MAX-RESTARTS` (value, non-positive
   number boils down to case 1.) and `@restart_expire_time`
   as `EXPIRE-TIME` (value, negative after expansion boils down to
   case 1., zero to case 2.)
   - and, optionally, specifying `@recovery` as `restart` (or not at all!)

P: driven by specifying `migration-threshold` as `MAX-RESTARTS` (value,
   presumably non-negative; `INFINITY` or zero? boils down to case 1.) and
   `failure-timeout` as `EXPIRE-TIME` (value, presumably positive; zero
   boils down to case 2.)
   (but can be overridden by OCF RA return code, anyway?)
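The "boils down to" rules for the R side of cases 1. to 3. can be captured
as a small classifier. A Python sketch, assuming `@recovery` is `restart`
or unset, that both attributes were already expanded to integers (seconds
for the expire time), and that an absent attribute behaves like zero; the
function name is hypothetical:

```python
def classify_rgmanager_restart(max_restarts=0, restart_expire_time=0):
    """Classify `@max_restarts`/`@restart_expire_time` of a service/vm
    whose `@recovery` is `restart` (or unset) into RECOVERY cases 1.-3.

    Both values are assumed to be already expanded to integers (the
    expire time in seconds); an absent attribute is passed as 0.
    """
    if max_restarts <= 0:
        # non-positive MAX-RESTARTS: case 1., expire time is irrelevant
        return ("RESTART-ONLY",)
    if restart_expire_time < 0:
        # negative EXPIRE-TIME after expansion: case 1.
        return ("RESTART-ONLY",)
    if restart_expire_time == 0:
        # no expiry: absolute limit, case 2.
        return ("RESTART-UNTIL1", max_restarts)
    # positive expiry: limit per sliding window, case 3.
    return ("RESTART-UNTIL2", max_restarts, restart_expire_time)
```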

RECOVERY(GROUP, RELOCATE)  [4. move to another node]
~~~~~~~~~~~~~~~~~~~~~~~~~

R: driven by specifying `@recovery` as `relocate`

P: driven by specifying `migration-threshold` as 1
   (or possibly a negative number?; regardless of `failure-timeout`)
   (but can be overridden by OCF RA return code, anyway?)

RECOVERY(GROUP, DISABLE)  [5. no more attempts]
~~~~~~~~~~~~~~~~~~~~~~~~

R: driven by specifying `@recovery` as `disable`

P: can only be achieved with AFFINITY(GROUP, NODE, FALSE) for all
   nodes except one, combined with `migration-threshold` set to `1`:
   upon a single failure, the remaining AFFINITY(RESOURCE, NODE, FALSE)
   rule for the yet-enabled NODE will be added, effectively preventing
   RESOURCE from running anywhere
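Conversely, the P side of the five cases can be turned into `pcs` command
lines. This Python sketch follows the mapping stated above literally
(including using MAX-RESTARTS directly as `migration-threshold`), and
expresses AFFINITY(GROUP, NODE, FALSE) as a `pcs constraint location ...
avoids ...` constraint. The function name and its parameters are
hypothetical, and whether setting these meta-attributes on the group,
rather than on its members, has exactly the intended effect is not
verified here:

```python
def pcs_recovery_commands(group, variant, max_restarts=None,
                          expire_time=None, nodes=(), keep_node=None):
    """Return pcs command lines approximating RECOVERY(GROUP, variant).

    max_restarts/expire_time are needed for the RESTART-UNTIL* cases;
    nodes/keep_node are needed for DISABLE (all nodes but keep_node are
    banned, i.e. AFFINITY(GROUP, NODE, FALSE)).
    """
    if variant == "RESTART-ONLY":
        return []                                  # Pacemaker default
    if variant in ("RESTART-UNTIL1", "RESTART-UNTIL2"):
        if max_restarts is None or max_restarts < 1:
            raise ValueError("positive max_restarts required")
        meta = {"migration-threshold": max_restarts}
        if variant == "RESTART-UNTIL2":
            if expire_time is None or expire_time <= 0:
                raise ValueError("positive expire_time (seconds) required")
            meta["failure-timeout"] = expire_time
    elif variant in ("RELOCATE", "DISABLE"):
        meta = {"migration-threshold": 1}
    else:
        raise ValueError(f"unknown RECOVERY variant: {variant!r}")

    options = " ".join(f"{name}={value}" for name, value in meta.items())
    commands = [f"pcs resource meta {group} {options}"]
    if variant == "DISABLE":
        if keep_node not in nodes:
            raise ValueError("keep_node must be one of nodes")
        # ban the group from every node except keep_node
        commands += [f"pcs constraint location {group} avoids {node}"
                     for node in nodes if node != keep_node]
    return commands


if __name__ == "__main__":
    for line in pcs_recovery_commands("web", "RESTART-UNTIL2",
                                      max_restarts=3, expire_time=600):
        print(line)
```

For DISABLE, the node left out of `avoids` is the only place the group
can still run; once a single failure there reaches `migration-threshold`
of 1, no eligible node remains.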


Is-enabled group property
-------------------------

ENABLED ::= ENABLED(GROUP, TRUE)
          | ENABLED(GROUP, FALSE)
. ENABLED(GROUP, TRUE)   ... group is enabled (default assumption)
. ENABLED(GROUP, FALSE)  ... group is disabled

notes
. see also 01/cluster: FUNCTION

R: except for static disabling of everything (RGManager avoidance),
   can be partially driven by `/cluster/rm/(service|vm)/@autostart`
   and/or run-time modification using `clusvcadm`
   (or at least it is close???)

P: via `target-role` (or possibly `is-managed`) meta-attribute [1]

ENABLED(GROUP, TRUE)  [1. group is enabled]
~~~~~~~~~~~~~~~~~~~~~~~

R: (partially) driven by specifying `@autostart` as non-zero
   (has to be a sequence of digits for sure, though!)
   - default, no need for that
   # clusvcadm -e GROUP  <-- enable; whole service/vm only
   or
   # clusvcadm -U GROUP  <-- unfreeze; whole service/vm only

P: default, no need for that, otherwise specifying `target-role` as `Started`
   (or possibly `is-managed` as `true`)
   # pcs resource enable GROUP
   # pcs resource meta GROUP target-role=
   # pcs resource meta GROUP target-role=Started
   or
   # pcs resource manage GROUP
   # pcs resource meta GROUP is-managed=
   # pcs resource meta GROUP is-managed=true

ENABLED(GROUP, FALSE)  [2. group is disabled]
~~~~~~~~~~~~~~~~~~~~~

R: (partially?) driven by specifying `@autostart` as `0` (or `no`)
   # clusvcadm -d GROUP  <-- disable; whole service/vm only
   or
   # clusvcadm -Z GROUP  <-- freeze; whole service/vm only

P: # pcs resource disable GROUP
   # pcs resource meta GROUP target-role=Stopped
   or
   # pcs resource unmanage GROUP
   # pcs resource meta GROUP is-managed=false
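Finally, a hedged Python sketch of the ENABLED mapping, assuming
`@autostart` arrives as a raw string (or `None` when absent). It accepts
only what is stated above (digits, or `no`) and rejects anything else,
and it uses only the `target-role` route (`pcs resource enable`/`disable`),
not `is-managed`; the function names are hypothetical:

```python
import re


def enabled_from_autostart(value=None):
    """Map a raw `@autostart` value (string, or None when absent) to
    ENABLED(GROUP, TRUE/FALSE); accepts only digits or `no`."""
    if value is None:
        return True                      # default: group is enabled
    if value == "no":
        return False
    if re.fullmatch(r"[0-9]+", value):
        return int(value) != 0           # "0" disables, non-zero enables
    raise ValueError(f"unsupported @autostart value: {value!r}")


def pcs_enable_command(group, enabled):
    """Pacemaker counterpart via target-role (not is-managed)."""
    action = "enable" if enabled else "disable"
    return f"pcs resource {action} {group}"


if __name__ == "__main__":
    for raw in (None, "1", "0", "no"):
        print(raw, pcs_enable_command("web", enabled_from_autostart(raw)))
```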


References
==========

: vim: set ft=rst:  <-- not exactly, but better than nothing