=== Design Overview ===

The NIS plugin module's aim is to serve up data from the directory
server using the NIS protocols.  It does this by doing what any gateway
would do: it queries the directory server for entries which would
correspond to the contents of maps, reads the contents of various
attributes from those entries, and uses that data to synthesize entries
for maps which it serves to clients.

In broad strokes, it might look like this:

   ┌──────────┐   NIS   ┌───────────┐   LDAP   ┌────────────────────┐
   │  Client  │─────────│  Gateway  │──────────│  Directory Server  │
   └──────────┘         └───────────┘          └────────────────────┘

The links in this diagram represent network traffic.  The client uses
the NIS protocol to communicate with the gateway, and the gateway uses
the LDAP protocol to communicate with the directory server.

Such an implementation would require that the gateway be robust against
variations in directory server availability, and that it be flexible
enough to use any of a number of methods of authenticating to the
directory server.  It might additionally require the presence of
specific extensions on the server in order to be even reasonably certain
that its results are consistent with the directory's contents.

In order to sidestep these requirements, and the complexity they add to
an implementation, we decided to implement the gateway as a plugin.  As
a plugin, the gateway starts and stops with the directory server, it
does not need to authenticate as a normal client would, and it can be
expected to work with any directory server which is able to load it.

Taking just the gateway and directory server portions of the above
diagram, and breaking them down further, we arrive at this:

   ┌──────────────┐   ┌─────────┐   ┌────────────────────────────┐
   │ NIS Protocol │───│ Mapping │───│ Directory Server Back Ends │
   └──────────────┘   └─────────┘   └────────────────────────────┘

The links in this diagram are all API calls.  We've relegated to the
Mapping module all of the work of reading a query (received from the NIS
client and parsed by the NIS Protocol handler), converting that query to
a directory server search operation, and marshalling the results of that
search into a format suitable for transmission as a NIS response.  The
directory server back ends are exposed by SLAPI, of course.

This approach does have its problems, though.

NIS, as a protocol, requires that the server be able to supply a few
bits of information which can't (or shouldn't) readily be retrieved this
way.

NIS requires that a server be able to report a revision number for a
map, akin to the serial number used in a DNS SOA record.  A slave server
can use this information to poll for changes in map contents on the
master, possibly beginning a full map enumeration to read those new
contents in order to serve its clients.

A directory server, if it stores revision information at all, stores
it on a per-entry basis.  So when a gateway designed as we diagrammed
above is asked for this information, it has at least these options:
  a) use an ever-increasing value, such as the current time
     - This causes frequent map updates on clients when they don't need
       them, and completely unnecessary network traffic.
  b) always use the same value
     - This keeps clients from ever noticing that a map has changed.
  c) return the latest revision of any of the results which formed the
     contents of the map
     - This could severely load a directory server if the information
       needs to be generated dynamically and frequently.

NIS also requires that a server be able to answer whether or not it
services a specified domain, and which maps it serves for a domain that
it serves.  While the mapping module could search the directory's
configuration space whenever it is asked these questions, the first
question is asked repeatedly by each running copy of ypbind, which could
also bog servers down (though admittedly, less than the previous case).

If we break the mapping portion up further, we can introduce a map
cache.  In this module we can maintain a cache of the NIS server's data
set, taking care to construct it at startup time, to update it as the
contents of the directory server change, and to always serve clients
using data from the cache.

   ┌──────────────┐  ┌───────────┐  ┌──────────────┐  ┌──────┐
   │ NIS Protocol │──│ Map Cache │──│ Map Back End │──│ Data │
   └──────────────┘  └───────────┘  └──────────────┘  └──────┘

This takes us to the current design.  The NIS protocol handler reads
data from the map cache, and the map back end uses SLAPI to populate the
map cache at startup time, as well as to watch for changes in the
directory's contents which would need to be reflected in the map cache.

=== Components ===

== Protocol Handler ==

The NIS protocol handler module takes the opportunity to set up
listening sockets and register with the local portmapper at module
initialization time.  (It does so at this point because the directory
server has not yet dropped privileges, and the portmapper will not allow
registrations from unprivileged clients.)  The plugin then starts a
listening thread to handle its clients.
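The ordering constraint can be sketched in C as follows.  This is a minimal illustration assuming IPv4 only.  `open_nis_socket()` and `socket_port()` are hypothetical helpers.  Registration with the portmapper (for example via `pmap_set()` from the Sun RPC library) is deliberately left out so that the sketch stays self-contained.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens an IPv4 socket of the given type (SOCK_DGRAM or SOCK_STREAM) bound
 * to port on all interfaces, listening if it is a stream socket.  Binding a
 * port below 1024 needs privileges, which is why this must run during
 * plugin initialization, before the server drops them.  Returns -1 on error. */
int open_nis_socket(int type, unsigned short port)
{
    struct sockaddr_in sin;
    int fd = socket(AF_INET, type, 0);

    if (fd == -1)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        (type == SOCK_STREAM && listen(fd, SOMAXCONN) == -1)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns the port a socket is actually bound to (useful when port 0 was
 * requested).  This is the number that would then be registered with the
 * portmapper; the registration call itself is not made here. */
int socket_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    if (getsockname(fd, (struct sockaddr *)&sin, &len) == -1)
        return -1;
    return ntohs(sin.sin_port);
}
```

Passing port 0 lets the kernel pick an unprivileged port, which is convenient when testing as an ordinary user.  A request for a port below 1024 would only succeed while the process still holds its privileges.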

The plugin listens for datagram queries from clients, processing them
as they come in, and also accepts connections from clients.  Because
connected clients may not always transmit an entire request at once, and
because the server may find itself unable to transmit an entire response
at once, it buffers traffic for connected clients, multiplexing the work
it does for all of its clients from inside that single thread.  The
actual parsing of protocol datagrams is performed by libnsl, which is
provided as a part of the C library.
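Over TCP, ONC RPC frames each message using the record-marking scheme from RFC 5531.  Each fragment starts with a 4-byte big-endian header: the high bit flags the last fragment and the low 31 bits give the fragment's length.  A connection's buffer is therefore only ready for parsing once it holds a complete record.  The hypothetical `rpc_record_complete()` below sketches that check; it is not taken from this module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the number of bytes at the front of buf that form one complete
 * RPC record (fragment headers included), or 0 if more input is needed. */
size_t rpc_record_complete(const unsigned char *buf, size_t len)
{
    size_t off = 0;

    for (;;) {
        if (len - off < 4)
            return 0;                   /* fragment header not yet here */
        uint32_t hdr = ((uint32_t)buf[off] << 24) |
                       ((uint32_t)buf[off + 1] << 16) |
                       ((uint32_t)buf[off + 2] << 8) |
                       (uint32_t)buf[off + 3];
        size_t frag_len = hdr & 0x7fffffffu;
        off += 4;
        if (len - off < frag_len)
            return 0;                   /* fragment body not yet here */
        off += frag_len;
        if (hdr & 0x80000000u)
            return off;                 /* last fragment: record complete */
    }
}
```

The listening thread could append each read to the connection's buffer and call this check.  It would pass the record to the parser only when the check returns a non-zero length, keeping any trailing bytes for the next request.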

[Unless explicitly disabled in the module's configuration or in a
 map's configuration, the local /etc/securenets file is consulted to
 restrict access to map information to specific clients.  The list of
 securenets entries can also be stored in the module's or the map's
 configuration.]
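At its core, a securenets check masks the client's address.  The sketch below assumes the common `netmask network` line format (for example `255.255.255.0 192.168.1.0`) and IPv4 only.  Other line forms that a real securenets parser may accept are not handled.  `securenets_line_allows()` is a hypothetical helper; a full check would admit the client if any line matches.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true if the IPv4 client address lies within the network named by
 * one securenets-style line of the form "netmask network".  Comment lines,
 * blank lines, and lines that do not parse never match. */
bool securenets_line_allows(const char *line, struct in_addr client)
{
    char mask_str[INET_ADDRSTRLEN], net_str[INET_ADDRSTRLEN];
    struct in_addr mask, net;

    if (sscanf(line, " %15s %15s", mask_str, net_str) != 2 ||
        mask_str[0] == '#')
        return false;
    if (inet_pton(AF_INET, mask_str, &mask) != 1 ||
        inet_pton(AF_INET, net_str, &net) != 1)
        return false;
    /* Both sides are in network byte order, so masking works directly. */
    return (client.s_addr & mask.s_addr) == (net.s_addr & mask.s_addr);
}
```

Note that the line `0.0.0.0 0.0.0.0` matches every client.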

== Map Cache ==

The map cache keeps a dynamically-constructed set of maps in memory,
grouped by domain, and for each map maintains information regarding the
last time its contents were modified (to answer client requests for a
map's order).  The map cache can quickly answer whether or not a domain
is being served by checking whether or not any maps are defined for it.
The definitions of which maps are served for which domains are
configurable via internal APIs; the map cache itself has no prior
knowledge of domain names, map names, or formats, as it merely models
data in the way that a NIS server might.
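Here is a minimal sketch of the cache's shape.  It uses flat arrays where a real implementation would more likely use hash tables or trees, and all type and function names are hypothetical.  The sketch shows why these questions are cheap: a domain is served exactly when some map exists for it, and a map's order is just a stored timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

struct cache_entry {
    const char *key;
    const char *value;
};

struct cache_map {
    const char *domain;
    const char *name;
    time_t last_changed;              /* reported to clients as the order */
    const struct cache_entry *entries;
    size_t n_entries;
};

struct map_cache {
    const struct cache_map *maps;
    size_t n_maps;
};

/* A domain is served exactly when at least one map is defined for it. */
bool cache_serves_domain(const struct map_cache *c, const char *domain)
{
    for (size_t i = 0; i < c->n_maps; i++)
        if (strcmp(c->maps[i].domain, domain) == 0)
            return true;
    return false;
}

const struct cache_map *cache_find_map(const struct map_cache *c,
                                       const char *domain, const char *name)
{
    for (size_t i = 0; i < c->n_maps; i++)
        if (strcmp(c->maps[i].domain, domain) == 0 &&
            strcmp(c->maps[i].name, name) == 0)
            return &c->maps[i];
    return NULL;
}

/* Keys are compared byte-for-byte, so lookups are case-sensitive. */
const char *cache_lookup(const struct cache_map *m, const char *key)
{
    for (size_t i = 0; i < m->n_entries; i++)
        if (strcmp(m->entries[i].key, key) == 0)
            return m->entries[i].value;
    return NULL;
}
```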

[The backend requires that the cache also be able to track one or more DNs
 which are relevant to the value which is being stored for a given key
 in the map, so that it can be updated if a directory entry with that DN
 is added, removed, modified, or renamed.]
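The DN bookkeeping can be sketched as follows.  Each cached entry records the DNs its value was built from: its own DN plus those of any entries it referenced.  A change to any of those DNs marks the entry for regeneration.  The names and the fixed `MAX_RELATED_DNS` bound are hypothetical.  Comparing DNs with `strcasecmp()` assumes they were normalized first; a real back end would use the directory server's own DN comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

#define MAX_RELATED_DNS 4   /* hypothetical fixed bound, for brevity */

/* A cached key plus the DNs its value was generated from. */
struct tracked_entry {
    const char *key;
    const char *related_dns[MAX_RELATED_DNS];
    size_t n_related;
    int stale;               /* set when the value must be regenerated */
};

/* Marks every entry whose value depended on changed_dn and returns how many
 * entries were marked. */
size_t cache_mark_dn_changed(struct tracked_entry *entries, size_t n,
                             const char *changed_dn)
{
    size_t marked = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < entries[i].n_related; j++) {
            if (strcasecmp(entries[i].related_dns[j], changed_dn) == 0) {
                entries[i].stale = 1;
                marked++;
                break;
            }
        }
    }
    return marked;
}
```

For example, if a group's value was built partly from a referenced user entry, renaming or modifying that user marks both the user's own entry and the group's entry.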

Forcing queries to use the cache provides a couple of benefits over an
alternative approach of performing an LDAP query for each NIS query:
* While the directory server's matching is generally case-insensitive
  (it is only case-preserving), the NIS server can be case-sensitive,
  which is preferred by NIS clients and a requirement for some customers.
* Because the client's query is never used to construct an LDAP filter
  or search, we don't have to worry about escaping text to avoid LDAP
  filter injection attacks.
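For contrast, a gateway that turned each NIS key into an LDAP filter would have to escape the key as RFC 4515 requires.  That means replacing `*`, `(`, `)`, and `\` (and NUL) with a backslash followed by two hex digits.  The hypothetical helper below shows what the cache-based design lets the module skip.  Without such escaping, a key such as `*)(uid=*` would change the meaning of the filter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Escapes an LDAP filter assertion value per RFC 4515.  NUL cannot occur
 * inside a C string, so only the four printable specials are handled.
 * Returns a newly allocated string (caller frees) or NULL. */
char *ldap_filter_escape(const char *value)
{
    size_t len = strlen(value);
    char *out = malloc(len * 3 + 1);   /* worst case: every byte escaped */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (const char *s = value; *s != '\0'; s++) {
        if (*s == '*' || *s == '(' || *s == ')' || *s == '\\')
            p += sprintf(p, "\\%02x", (unsigned char)*s);
        else
            *p++ = *s;
    }
    *p = '\0';
    return out;
}
```

For example, `*)(uid=*` escapes to `\2a\29\28uid=\2a`, which matches only that literal string.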

== Back End ==

The backend interface module sets up, populates, and maintains the map
cache.  At startup time, it configures the map cache with the list of
domains and maps, and populates the maps with initial data.  Using
postoperation plugin hooks, the backend interface also notes when
entries are added, modified, renamed (modrdn'd), or deleted from the
directory server.  It uses this information to create or destroy maps in
the map cache, and to add, remove, or update entries in the map cache's
maps, thereby ensuring that the map cache always reflects the current
contents of the directory server.
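The decision the back end makes after each operation might be sketched as below.  The enums and `postop_actions()` are hypothetical.  The two scope flags stand in for checking the entry against each map's base and filter, once as it was before the operation and once as it is after.  Modifies and renames are handled as a removal followed by an addition, since either can change the generated key as well as the value.

```c
#include <assert.h>

enum dir_op { OP_ADD, OP_MODIFY, OP_MODRDN, OP_DELETE };

/* Hypothetical requests the back end could make of the map cache. */
enum cache_action {
    ACT_NONE = 0,
    ACT_ADD_ENTRY = 1 << 0,     /* generate key/value and store them */
    ACT_REMOVE_ENTRY = 1 << 1,  /* drop whatever the old entry produced */
    ACT_RELOAD_MAPS = 1 << 2,   /* map definitions changed */
};

/* is_config: the entry lives under the plugin's configuration entry.
 * old_in_scope / new_in_scope: the entry, before / after the operation,
 * falls under some map's base and matches that map's filter. */
unsigned postop_actions(enum dir_op op, int is_config,
                        int old_in_scope, int new_in_scope)
{
    unsigned act = ACT_NONE;

    if (is_config)
        return ACT_RELOAD_MAPS;
    switch (op) {
    case OP_ADD:
        if (new_in_scope)
            act |= ACT_ADD_ENTRY;
        break;
    case OP_DELETE:
        if (old_in_scope)
            act |= ACT_REMOVE_ENTRY;
        break;
    case OP_MODIFY:
    case OP_MODRDN:
        if (old_in_scope)
            act |= ACT_REMOVE_ENTRY;
        if (new_in_scope)
            act |= ACT_ADD_ENTRY;
        break;
    }
    return act;
}
```

A change under the plugin's own configuration entry redefines maps rather than map contents, so the sketch simply asks for the maps to be rebuilt.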

The backend interface reads the configuration it should use for the map
cache from its configuration area in the directory server.  Beneath the
plugin's entry, the backend checks for entries with these attributes:
 * domain
 * map
 * base
 * filter
 * keyFormat
 * valueFormat
The backend then instructs the map cache to prepare to hold a map in the
given domain with the given map name, and then performs a subtree search
under the specified base for entries which match the provided filter.
Each found entry is then "added" to the map, using the format specifier
stored in "keyFormat" to construct the key for the entry in the map,
with the corresponding value in the map being constructed using the
format specifier given as the "valueFormat".
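Below is a sketch of one configuration entry as a C structure, plus the subtree-scope test that both the initial search and the post-operation hooks need.  The structure and function names are hypothetical, and filter matching is left out.  `dn_in_subtree()` assumes both DNs are already normalized (no stray spaces, consistent escaping).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical in-memory form of one map definition entry. */
struct nis_map_config {
    const char *domain;       /* "domain" */
    const char *map;          /* "map" */
    const char *base;         /* "base" */
    const char *filter;       /* "filter" */
    const char *key_format;   /* "keyFormat" */
    const char *value_format; /* "valueFormat" */
};

/* True if dn is base itself or lies anywhere beneath it (subtree scope).
 * Comparing case-insensitively is a simplification of real DN matching. */
bool dn_in_subtree(const char *dn, const char *base)
{
    size_t dn_len = strlen(dn), base_len = strlen(base);

    if (dn_len < base_len || strcasecmp(dn + dn_len - base_len, base) != 0)
        return false;
    /* Require an RDN boundary: "xou=people,..." is not under "ou=people,...". */
    return dn_len == base_len || dn[dn_len - base_len - 1] == ',';
}

/* Could this entry contribute to the map?  Filter evaluation is omitted. */
bool map_config_covers_dn(const struct nis_map_config *cfg, const char *dn)
{
    return dn_in_subtree(dn, cfg->base);
}
```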

The "valueFormat" specifier resembles an RPM format specifier, and can
include the values of multiple attributes in any part of the specifier.
The backend composes the string using the attribute values stored in
the directory server entry, using the format specifier as a guide.  In
this way, the NIS map's contents can be constructed to almost any
specification, and can make use of data stored using any schema.

An example specification for a user's entry would look like this:
  %{uid}:%{userPassword:-*}:%{uidNumber}:%{gidNumber}:%{gecos:-%{cn:-}}:%{homeDirectory}:%{loginShell:-/bin/sh}
The syntax borrows from RPM's syntax, which in turn borrows from shell
syntax, to allow the specification of alternate values to be used when
the directory server entry doesn't include a "userPassword", "gecos",
"cn", or "loginShell" attribute.

To ensure safety, any reference to an attribute value which does not
also specify an alternate value will cause the directory server entry
to be ignored if the referenced attribute has no value defined for that
entry, or contains multiple values.  In the above example, the entry
would be ignored if the "uid", "uidNumber", "gidNumber", or
"homeDirectory" attributes of the entry did not each contain exactly
one value.
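Here is a minimal sketch of the `%{attr}` and `%{attr:-alternate}` expansion described above.  It includes the rule that a reference without an alternate rejects the entry when the attribute is missing or multi-valued.  The entry representation and function names are hypothetical, and attribute names are compared case-insensitively.  The sketch also assumes (the text above does not say) that an alternate is used when the attribute has several values.  Alternates may themselves contain references, as in `%{gecos:-%{cn:-}}`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical flattened entry: one pair per value, so a multi-valued
 * attribute appears several times. */
struct attr_value { const char *attr; const char *value; };
struct dir_entry { const struct attr_value *av; size_t n; };

/* Returns the attribute's only value, or NULL if it has none or several. */
static const char *single_value(const struct dir_entry *e,
                                const char *attr, size_t attr_len)
{
    const char *found = NULL;
    for (size_t i = 0; i < e->n; i++) {
        if (strlen(e->av[i].attr) == attr_len &&
            strncasecmp(e->av[i].attr, attr, attr_len) == 0) {
            if (found != NULL)
                return NULL;                     /* multi-valued */
            found = e->av[i].value;
        }
    }
    return found;
}

/* Appends len bytes of s to out; false if out would overflow. */
static bool append(char *out, size_t size, size_t *used,
                   const char *s, size_t len)
{
    if (*used + len >= size)
        return false;
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
    return true;
}

/* Expands fmt[0..fmt_len); alternates are expanded recursively. */
static bool expand(const struct dir_entry *e, const char *fmt, size_t fmt_len,
                   char *out, size_t size, size_t *used)
{
    size_t i = 0;
    while (i < fmt_len) {
        if (fmt[i] != '%' || i + 1 >= fmt_len || fmt[i + 1] != '{') {
            if (!append(out, size, used, fmt + i, 1))   /* literal text */
                return false;
            i++;
            continue;
        }
        size_t start = i + 2, j = start, depth = 1;
        while (j < fmt_len) {              /* find the matching '}' */
            if (fmt[j] == '{')
                depth++;
            else if (fmt[j] == '}' && --depth == 0)
                break;
            j++;
        }
        if (j >= fmt_len)
            return false;                  /* unterminated reference */
        size_t name_len = 0;
        while (start + name_len < j && fmt[start + name_len] != ':')
            name_len++;
        size_t rest = start + name_len;    /* ':' of ":-", or j */
        bool has_alt = rest < j;
        if (has_alt && (rest + 1 >= j || fmt[rest + 1] != '-'))
            return false;                  /* only ":-" is understood */
        const char *value = single_value(e, fmt + start, name_len);
        if (value != NULL) {
            if (!append(out, size, used, value, strlen(value)))
                return false;
        } else if (!has_alt ||
                   !expand(e, fmt + rest + 2, j - (rest + 2), out, size, used)) {
            return false;                  /* no usable value: skip entry */
        }
        i = j + 1;
    }
    return true;
}

/* Returns false if the entry must be ignored (or fmt is malformed/too long). */
bool format_entry(const struct dir_entry *e, const char *fmt,
                  char *out, size_t size)
{
    size_t used = 0;
    if (size == 0)
        return false;
    out[0] = '\0';
    return expand(e, fmt, strlen(fmt), out, size, &used);
}
```

Consider an entry with "uid" alice, "uidNumber" and "gidNumber" 1000, "cn" "Alice Example", "homeDirectory" /home/alice, and no "userPassword", "gecos", or "loginShell".  Applied to that entry, the passwd format above yields `alice:*:1000:1000:Alice Example:/home/alice:/bin/sh`.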

The syntax further defines "functions" which can be used to concatenate
lists of multiple values into a single result, for example for groups:
  %{cn}:%{userPassword:-*}:%{gidNumber}:%list(",","memberUid")
This format takes advantage of a built-in "list" function, which
processes zero or more values of the "memberUid" attribute and
concatenates them together with a "," separator, to generate the list
of group members.
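The list function is the counterpart of the single-value rule: rather than rejecting an entry whose attribute has several values, it joins them.  The sketch below is hypothetical.  It uses a simple attribute/value array in which a multi-valued attribute appears once per value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

struct attr_value { const char *attr; const char *value; };

/* Joins every value of attr with sep into out.  Zero values produce an
 * empty string, as for a group with no members.  Returns false if out is
 * too small to hold the result. */
bool format_list(const struct attr_value *av, size_t n, const char *sep,
                 const char *attr, char *out, size_t size)
{
    size_t used = 0, sep_len = strlen(sep);
    bool first = true;

    if (size == 0)
        return false;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(av[i].attr, attr) != 0)
            continue;
        size_t vlen = strlen(av[i].value);
        if (used + (first ? 0 : sep_len) + vlen >= size)
            return false;
        if (!first) {
            memcpy(out + used, sep, sep_len);
            used += sep_len;
        }
        memcpy(out + used, av[i].value, vlen);
        used += vlen;
        out[used] = '\0';
        first = false;
    }
    return true;
}
```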

The filter, key, and value have sensible defaults for the maps which we
expect to be serving.  This is important because it's easy to construct
subtly malformed result strings which could trigger undefined behavior
on clients, for example by leaving the user's numeric UID empty in a
passwd entry, which inattentive clients may treat as "0".
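A generated value could also be checked for this kind of mistake before it is stored.  This is a hypothetical safeguard, not something the text above says the module does.  It requires a passwd-style value to have exactly seven ':'-separated fields, with non-empty decimal UID and GID fields.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Rejects passwd values whose third (UID) or fourth (GID) field is empty
 * or non-numeric, or which do not have exactly seven fields. */
bool passwd_value_is_sane(const char *value)
{
    int field = 0;
    size_t field_len = 0;
    bool numeric = true;

    for (const char *p = value;; p++) {
        if (*p == ':' || *p == '\0') {
            if ((field == 2 || field == 3) && (field_len == 0 || !numeric))
                return false;
            field++;
            field_len = 0;
            numeric = true;
            if (*p == '\0')
                break;
        } else {
            field_len++;
            if (!isdigit((unsigned char)*p))
                numeric = false;
        }
    }
    return field == 7;
}
```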

The format specifier includes function-like invocations to allow the
backend to be instructed to chase references to other entries, for
example to handle flattening of nested groups or netgroups.
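Chasing references means following member links until only users remain, while guarding against loops (group A contains B, which contains A).  The sketch below flattens nested groups held in a hypothetical in-memory table.  In the module itself, the members would come from referenced directory entries.  The fixed limits only keep the example short: once they are reached, further groups or users are silently dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MEMBERS 8
#define MAX_GROUPS 16

struct group {
    const char *name;
    const char *users[MAX_MEMBERS];   /* direct user members */
    const char *groups[MAX_MEMBERS];  /* nested groups, by name */
};

static const struct group *find_group(const struct group *gs, size_t n,
                                      const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(gs[i].name, name) == 0)
            return &gs[i];
    return NULL;
}

/* Depth-first walk; `visited` stops reference loops, duplicates are skipped. */
static void flatten(const struct group *gs, size_t n, const char *name,
                    const char **visited, size_t *n_visited,
                    const char **users, size_t *n_users, size_t max_users)
{
    for (size_t i = 0; i < *n_visited; i++)
        if (strcmp(visited[i], name) == 0)
            return;
    if (*n_visited == MAX_GROUPS)
        return;
    visited[(*n_visited)++] = name;
    const struct group *g = find_group(gs, n, name);
    if (g == NULL)
        return;                       /* dangling reference: ignore it */
    for (size_t i = 0; i < MAX_MEMBERS && g->users[i] != NULL; i++) {
        bool dup = false;
        for (size_t k = 0; k < *n_users && !dup; k++)
            dup = strcmp(users[k], g->users[i]) == 0;
        if (!dup && *n_users < max_users)
            users[(*n_users)++] = g->users[i];
    }
    for (size_t i = 0; i < MAX_MEMBERS && g->groups[i] != NULL; i++)
        flatten(gs, n, g->groups[i], visited, n_visited,
                users, n_users, max_users);
}

/* Fills users with every user reachable from group `name`; returns count. */
size_t flatten_group(const struct group *gs, size_t n, const char *name,
                     const char **users, size_t max_users)
{
    const char *visited[MAX_GROUPS];
    size_t n_visited = 0, n_users = 0;

    flatten(gs, n, name, visited, &n_visited, users, &n_users, max_users);
    return n_users;
}
```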

A function-like invocation expects a comma-separated list of
double-quoted arguments.  Any argument which contains a double-quote
must escape it with a '\' character, and the '\' character itself must
also be escaped whenever it appears.
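Here is a sketch of parsing such an argument list, for example the `",","memberUid"` arguments given to the list function.  `parse_args()` and its fixed limits are hypothetical.  It accepts only `\"` and `\\` as escapes and, as an assumption, allows no whitespace around the commas.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 8
#define MAX_ARG_LEN 64

/* Parses the text between a function's parentheses into unescaped
 * arguments.  Returns the argument count, or -1 on malformed input: an
 * unquoted argument, a missing comma, an unterminated string, an unknown
 * escape, or more/longer arguments than the fixed limits allow. */
int parse_args(const char *s, char args[MAX_ARGS][MAX_ARG_LEN])
{
    int n = 0;

    if (*s == '\0')
        return 0;                           /* no arguments at all */
    for (;;) {
        if (*s != '"' || n == MAX_ARGS)
            return -1;
        s++;                                /* skip opening quote */
        size_t len = 0;
        while (*s != '"') {
            if (*s == '\0')
                return -1;                  /* unterminated argument */
            if (*s == '\\') {
                s++;
                if (*s != '"' && *s != '\\')
                    return -1;              /* only \" and \\ are defined */
            }
            if (len + 1 >= MAX_ARG_LEN)
                return -1;
            args[n][len++] = *s++;
        }
        args[n][len] = '\0';
        n++;
        s++;                                /* skip closing quote */
        if (*s == '\0')
            return n;
        if (*s != ',')
            return -1;
        s++;
    }
}
```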