= Design Overview =

The NIS Server plugin's aim is to serve up data from the directory
server using the NIS protocols.  It does this by doing what any gateway
would do: it queries the directory server for entries which would
correspond to the contents of maps, reads the contents of various
attributes from those entries, and uses that data to synthesize entries
for maps which it serves to clients.

In broad strokes, one design might look like this:

   ┌──────────┐   NIS   ┌───────────┐   LDAP   ┌────────────────────┐
   │  Client  │─────────│  Gateway  │──────────│  Directory Server  │
   └──────────┘         └───────────┘          └────────────────────┘

The links in this diagram represent network traffic.  The client uses
the NIS protocol to communicate with the gateway, and the gateway uses
the LDAP protocol to communicate with the directory server.

Such an implementation requires that the gateway be robust against
variations in directory server availability and flexible enough to use
any of a number of methods of authenticating to the directory server.
It may additionally require the presence of specific extensions on the
server in order to be even reasonably certain of consistency with the
directory's contents.

In order to sidestep these requirements, and the complexity they add to
an implementation, we decided to implement the gateway as a plugin.  As
a plugin, the gateway starts and stops with the directory server, does
not need to authenticate as a normal client would, and can be expected
to work with any server which is able to load it.

Taking just the gateway and directory server portions of the above
diagram, and breaking them down further, we can come to this:

   ┌──────────────┐   ┌─────────┐   ┌────────────────────────────┐
   │ NIS Protocol │───│ Mapping │───│ Directory Server Back Ends │
   └──────────────┘   └─────────┘   └────────────────────────────┘

The links in this diagram are now API calls.  We've relegated the work
of reading a query (parsed from the NIS client by the NIS Protocol
handler), converting that query to a directory server search operation,
and marshalling the results of that search into a format suitable for
transmission as a NIS response, all to the Mapping module.  The
directory server back ends are exposed by SLAPI, of course.

This approach does have its problems, though.

NIS, as a protocol, requires that the server be able to supply a few
bits of information which can't readily (or shouldn't) be retrieved this
way, information which is not normally kept in a directory server.

NIS requires that a server be able to report a revision number for a
map, which is used as an indicator of the time when the map was last
modified.  A slave server can use this information to poll for changes
in map contents on the master, possibly beginning a full map enumeration
to read those new contents in order to serve its clients.

A directory server, if it stores revision information at all, stores
it on a per-entry basis.  So when a gateway designed as we diagrammed
above is asked for this information, it has at least these options:
  a) always use the current time
     - This causes frequent map updates on clients when they don't need
       them, and completely unnecessary network traffic.
  b) always use the same value
     - This keeps clients from ever noticing that a map has changed.
  c) return the latest revision of any of the results which formed the
     contents of the map
     - This could severely load a directory server if the information
       needs to be generated by reading the last-modified timestamps
       from many directory server entries.

NIS also requires that a server be able to answer whether or not it
services a specified domain, and which maps it serves for a domain that
it serves.  While the mapping module could search the directory's
configuration space whenever it is asked these questions, the first
question is asked regularly by each running copy of ypbind, which could
also bog servers down (though admittedly, less than the previous case).

If we break the mapping portion up further, we can introduce a map
cache.  In this module we can maintain a cache of the NIS server's data
set, taking care to construct it at startup-time, updating it as the
contents of the directory server change, and always serving clients
using data from the cache.

   ┌──────────────┐  ┌───────────┐  ┌──────────────┐  ┌──────┐
   │ NIS Protocol │──│ Map Cache │──│ Map Back End │──│ Data │
   └──────────────┘  └───────────┘  └──────────────┘  └──────┘

This takes us to the current design.  The NIS protocol handler reads
data from the map cache, and the map back end uses SLAPI to obtain data
which is used to populate the map cache at startup-time, as well as to
watch for changes in the directory's contents which would need to be
reflected in the map cache.

= Components =

== Protocol Handler ==

At module initialization time, the NIS protocol handler module sets up
listening sockets (listening on the port specified in the plugin
configuration entry's "nsslapd-pluginarg0" attribute, or an unused port
if none is specified) and registers with the local portmapper.  The
plugin then starts a listening thread to service its clients.

The plugin listens for datagram queries from clients, processing them as
they come in, as well as accepting connections from clients.  Because
connected clients may not always transmit an entire request at once, and
because the server may find itself unable to transmit an entire response
at once, it buffers traffic for connected clients, multiplexing the work
it does for all of its clients from inside of its thread.  The actual
protocol datagram parsing is performed by libnsl, which is provided as a
part of the C library.

Datagram responses which exceed the "nis-max-dgram-size" threshold (by
default, 1024 bytes, or 1 kilobyte) are simply dropped.  Response
records larger than "nis-max-value-size" (by default, 256 kilobytes) are
also ignored, even for connected clients.
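The two size limits above can be pictured as a simple check made before
a response is sent.  The following is only a sketch: the function and
constant names are hypothetical, and it assumes that both limits are
compared against the encoded size of a single response.

```python
# Hypothetical names; the defaults mirror the values quoted in the text.
NIS_MAX_DGRAM_SIZE = 1024          # "nis-max-dgram-size" default, in bytes
NIS_MAX_VALUE_SIZE = 256 * 1024    # "nis-max-value-size" default, in bytes


def should_send(response, is_datagram,
                max_dgram=NIS_MAX_DGRAM_SIZE,
                max_value=NIS_MAX_VALUE_SIZE):
    """Return True if a response of this size would be transmitted."""
    if len(response) > max_value:
        return False               # too large for any kind of client
    if is_datagram and len(response) > max_dgram:
        return False               # too large for a datagram reply
    return True
```

A connected client is only subject to the larger limit, while a
datagram client is subject to both.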

Client access is limited by the local tcp_wrappers configuration on the
directory server, with a tcp_wrappers service name as dictated by the
"nis-tcp-wrappers-name" attribute (by default, "nis-plugin") in the
plugin's configuration.  If the tcp_wrappers configuration denies access
for the client, a connected client's connection will be closed, and a
datagram client's request will be discarded.

Client requests are also limited based on a client's address using
"securenet"-style settings in the module's configuration entry's
"nis-securenet" attribute.  If no values are specified, access is
allowed to all clients.  If the securenet configuration denies access
for the client, a connected client's connection will be closed, and a
datagram client's request will be discarded.
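A securenet-style check might look roughly like the sketch below.  It
assumes that each "nis-securenet" value is a "NETMASK NETWORK" pair, as
in a traditional securenets file, and that only IPv4 addresses are
involved; the function names are hypothetical and the plugin's actual
value syntax may differ.

```python
import ipaddress


def parse_securenet(value):
    """Parse an assumed 'NETMASK NETWORK' pair into a network object."""
    netmask, network = value.split()
    return ipaddress.IPv4Network(f"{network}/{netmask}")


def client_allowed(client_addr, securenets):
    """Allow every client when nothing is configured, else require a match."""
    if not securenets:
        return True
    addr = ipaddress.IPv4Address(client_addr)
    return any(addr in parse_securenet(value) for value in securenets)
```

For example, with the single value "255.255.255.0 192.168.1.0", a
client at 192.168.1.7 would be allowed and a client at 192.168.2.7
would be refused.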

Client requests are further classed as "secure" or not, based on the
query's originating port.  This information is used elsewhere for
additional access control.

== NIS Layer ==

The NIS layer processes complete requests, whether they come in from
connected or datagram clients, fetches the requested information from
the map cache, and uses callbacks provided by the protocol handler to
respond.  Before doing so, if a value is being retrieved from a map, it
checks if the map's contents are restricted to "secure" clients.  If
they are, but the client is not a "secure" client, the NIS layer will
respond as if no data were present in the map.
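The "secure" check can be sketched as follows.  This is an illustration
only: it assumes that a "secure" client is one whose request comes from
a privileged port (below 1024), it models the map cache as nested
dictionaries, and it collapses every failure case into a single "no
data" result.  All names here are hypothetical.

```python
IPPORT_RESERVED = 1024  # assumption: "secure" means a privileged source port


def is_secure_client(source_port):
    """Classify a client by its originating port (assumed rule)."""
    return 0 < source_port < IPPORT_RESERVED


def nis_match(cache, domain, map_name, key, source_port):
    """Return the value stored for key, or None as if no data were present."""
    nis_map = cache.get(domain, {}).get(map_name)
    if nis_map is None:
        return None
    if nis_map["secure"] and not is_secure_client(source_port):
        return None  # restricted map, insecure client: pretend it is empty
    return nis_map["entries"].get(key)


# A toy cache: domain -> map name -> {"secure": bool, "entries": {key: value}}
cache = {
    "example.com": {
        "passwd.byname": {
            "secure": False,
            "entries": {"alice": "alice:*:1000:1000::/home/alice:/bin/sh"},
        },
        "shadow.byname": {
            "secure": True,
            "entries": {"alice": "alice:HASH:::::::"},
        },
    }
}
```

A request for "shadow.byname" from port 40000 gets the same answer as a
request for a key which does not exist, while the same request from
port 600 gets the stored value.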

== Map Cache ==

The map cache keeps a dynamically-constructed set of maps in memory,
grouped by domain, and for each map maintains information regarding the
last time its contents were modified (to answer client requests for a
map's order) and whether or not the map's contents should be restricted
to "secure" clients.  The map cache can quickly answer whether or not a
domain is being served by checking whether or not any maps are defined
for it.  The set of maps served for each domain is configurable via
internal APIs -- the map cache itself has no advance knowledge of domain
names, map names, or formats, as it merely models data in the way that a
conventional NIS server might.

Forcing queries to use the cache provides a couple of benefits over an
alternate approach of performing an LDAP query for each NIS query:
* While the directory server is generally only case-preserving, the NIS
  server can be case-sensitive, which is preferred by NIS clients and
  a requirement for some customers.
* Because the client's query text is never used to construct an LDAP
  filter or query, we don't have to worry about escaping text to avoid
  string injection attacks.

=== Internal Representation ===

At the topmost level, the map cache is a table.  Each entry in the table
is the name of a domain and a table of maps.

Each entry in a domain's table of maps contains the map's name, the time
the map was last modified, a note indicating whether or not the map is a
"secure" map, a linked list of map entries, and a set of indexes into
the list.  Each map can also hold a data pointer on behalf of the
backend.

Each item in the map's list of entries contains an array of NIS keys, an
array of corresponding values, a unique identifier (which, currently,
stores the NDN of the directory server entry which was used to create
this list item) and a data pointer which is kept on behalf of the
backend.

The map indexes its entry list using an entry's unique identifier, and
each of its keys.
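The layout described above can be modelled with a few small classes.
This is a sketch rather than the plugin's actual data structures: the
class and field names are hypothetical, a Python list stands in for the
linked list, dictionaries stand in for the indexes, and keys and values
are assumed to pair up by position.

```python
import time
from dataclasses import dataclass, field


@dataclass(eq=False)
class MapEntry:
    entry_id: str                 # unique identifier, e.g. the source entry's NDN
    keys: list                    # NIS keys
    values: list                  # values, assumed to pair with keys by position
    backend_data: object = None   # opaque pointer kept for the backend


@dataclass
class NisMap:
    name: str
    secure: bool = False
    last_changed: int = field(default_factory=lambda: int(time.time()))
    entries: list = field(default_factory=list)   # stands in for the linked list
    by_id: dict = field(default_factory=dict)     # index: entry_id -> MapEntry
    by_key: dict = field(default_factory=dict)    # index: key -> (MapEntry, position)
    backend_data: object = None

    def add(self, entry):
        self.remove(entry.entry_id)               # replace any older version
        self.entries.append(entry)
        self.by_id[entry.entry_id] = entry
        for pos, key in enumerate(entry.keys):
            self.by_key[key] = (entry, pos)
        self.last_changed = int(time.time())

    def remove(self, entry_id):
        entry = self.by_id.pop(entry_id, None)
        if entry is None:
            return False
        self.entries.remove(entry)
        for key in entry.keys:
            # Drop the index slot only if it still points at this entry.
            if self.by_key.get(key, (None,))[0] is entry:
                del self.by_key[key]
        self.last_changed = int(time.time())
        return True

    def match(self, key):
        hit = self.by_key.get(key)
        return None if hit is None else hit[0].values[hit[1]]


def domain_is_served(cache, domain):
    """cache is the top-level table: domain name -> {map name -> NisMap}."""
    return bool(cache.get(domain))
```

Because the modification time is updated whenever an entry is added or
removed, a request for a map's order can be answered without touching
the directory server at all.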

== Back End ==

The backend interface module sets up, populates, and maintains the map
cache.  At startup time, it configures the map cache with the list of
domains and maps, and populates the maps with initial data.  Using
postoperation plugin hooks, the backend interface also notes when
directory server entries are added, modified, renamed (modrdn'd), or
deleted.  It uses this information to create or destroy maps in
the map cache, and to add, remove, or update entries in the map cache's
maps, thereby ensuring that the map cache always reflects the current
contents of the directory server.

The backend interface reads the configuration it should use for the map
cache from its configuration area in the directory server.  Beneath the
plugin's entry, the backend checks for entries with these attributes:
* nis-domain
* nis-map
* nis-secure
* nis-base
* nis-filter
* nis-key-format
* nis-keys-format
* nis-value-format
* nis-values-format
The backend then instructs the map cache to prepare to hold a map in the
given domain (or domains) with the given map name (or names), and then
performs a subtree search under the specified base (or bases, if there's
more than one ''nis-base'' value) for entries which match the provided
filter.  Each entry found is then "added" to the map, using the format
specifiers stored in the ''nis-key-format'' and ''nis-keys-format''
attributes to construct the keys for the entry in the map, with the
corresponding value in the map being constructed using the format
specifiers stored in the ''nis-value-format'' and ''nis-values-format''
attributes.  The map is also marked as a "secure" map according to the
''nis-secure'' attribute, if so set.

For each ''nis-key-format'' value, exactly one entry will be created in a
NIS map.  (If a ''nis-key-format'' does not yield a single value, the
directory server entry will not appear in the NIS map.)  For each
''nis-keys-format'' value, any number of entries will be created in a NIS
map.  The method by which these attributes (and the ''nis-value-format''
and ''nis-values-format'') are interpreted is described below.

Should one of the directory server entries which was used to construct
one or more NIS map entries be modified or removed, the corresponding
entries in every applicable NIS map are updated or removed.  Likewise,
if an entry is added to the directory server which would correspond to
an entry in a NIS map, entries are created in the corresponding NIS
maps.
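The update rule described above amounts to "forget what the old entry
contributed, then re-add whatever the new entry contributes".  The
sketch below illustrates this for a single map.  It does not use SLAPI:
the filter and the format specifiers are represented by plain callables,
the subtree test is a crude string comparison, and all names are
hypothetical.

```python
def entry_in_scope(dn, base):
    """Crude subtree test on DN strings (an assumption made for the sketch)."""
    dn, base = dn.lower(), base.lower()
    return dn == base or dn.endswith("," + base)


def apply_change(nis_map, base, matches, make_pairs, dn, entry):
    """Bring nis_map in line with one directory change.

    nis_map    -- dict: DN -> list of (key, value) pairs built from that entry
    matches    -- callable standing in for the configured filter
    make_pairs -- callable standing in for the key and value format
                  specifiers; returns [] if the entry should not appear
    entry      -- the entry after the change, or None if it was deleted
    """
    nis_map.pop(dn, None)          # forget whatever the old entry produced
    if entry is None or not entry_in_scope(dn, base) or not matches(entry):
        return
    pairs = make_pairs(entry)
    if pairs:
        nis_map[dn] = pairs
```

In this model a rename is handled as a deletion under the old DN
followed by an add under the new one, and a modification which makes an
entry stop matching the filter removes it from the map.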

== Formatting Data for NIS ==

The ''nis-key-format'' and ''nis-value-format'' specifiers resemble an RPM
format specifier, and can include the values of multiple attributes in
any part of the specifier.  The backend composes the string using the
attribute values stored in the directory server entry, using the format
specifier as a guide.  In this way, the NIS map's contents can be
constructed to almost any specification, and can make use of data stored
using any schema.

An example specification for the ''nis-value-format'' for a user's entry
could look something like this:
  %{uid}:%{userPassword:-*}:%{uidNumber}:%{gidNumber}:%{gecos:-%{cn:-}}:%{homeDirectory}:%{loginShell:-/bin/sh}
The syntax borrows from RPM's syntax, which in turn borrows from shell
syntax, to allow the specification of alternate values to be used when
the directory server entry doesn't include a ''userPassword'' or ''gecos''
attribute.  Additional operators include "#", "##", "%", "%%", "/", and
"//", which operate in ways similar to their shell counterparts (with
one notable exception being that patterns for the "/" operator cannot
currently be anchored to the beginning or end of the string).

A format specifier can actually be interpreted in two ways: it can be
interpreted as a single value (when given as ''nis-key-format'' or
''nis-value-format''), or it can be interpreted as providing a list of
values (''nis-keys-format'' or ''nis-values-format'').  When the format
specifier is being interpreted as a single value, any reference to an
attribute value which does not also specify an alternate value will
cause the directory server entry to be ignored if the referenced
attribute has no value defined for that entry, or contains multiple
values.  In the above example, the entry would be ignored if the ''uid'',
''uidNumber'', ''gidNumber'', or ''homeDirectory'' attributes of the entry did
not each contain exactly one value.
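To make the single-value rule concrete, here is a small sketch of an
expander which understands only plain references and the ":-" alternate
value form, including nested references inside the alternate value.  It
is not the plugin's parser: the function names are hypothetical, the
other operators are rejected, and it assumes that an alternate value is
used only when the attribute has no values at all (a multi-valued
attribute still causes the entry to be ignored).

```python
def expand(fmt, entry):
    """Expand fmt against entry (dict: attribute name -> list of values).

    Returns the resulting string, or None if the entry should be ignored.
    """
    result, _ = _expand(fmt, 0, entry, nested=False)
    return result


def _expand(fmt, pos, entry, nested):
    parts, ok = [], True
    while pos < len(fmt) and not (nested and fmt[pos] == "}"):
        if fmt.startswith("%{", pos):
            value, pos = _reference(fmt, pos + 2, entry)
            if value is None:
                ok = False         # keep parsing so the position stays right
            else:
                parts.append(value)
        else:
            parts.append(fmt[pos])
            pos += 1
    return ("".join(parts) if ok else None), pos


def _reference(fmt, pos, entry):
    start = pos
    while pos < len(fmt) and fmt[pos] not in ":}":
        pos += 1
    name = fmt[start:pos]
    has_default, default = False, None
    if fmt.startswith(":-", pos):
        has_default = True
        default, pos = _expand(fmt, pos + 2, entry, nested=True)
    if pos >= len(fmt) or fmt[pos] != "}":
        raise ValueError("unterminated or unsupported reference")
    pos += 1                       # skip the closing brace
    values = entry.get(name, [])
    if len(values) == 1:
        return values[0], pos
    if not values and has_default:
        return default, pos        # assumed: alternate used only when empty
    return None, pos               # missing or multi-valued: ignore the entry
```

Given the passwd example above and an entry with single ''uid'',
''uidNumber'', ''gidNumber'', ''cn'', and ''homeDirectory'' values, this
sketch fills in "*" for the password, the ''cn'' value for the missing
''gecos'', and "/bin/sh" for the shell.  Removing ''uidNumber'' from the
entry makes expand() return None.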

The ''nis-filter'', ''nis-key-format'', and ''nis-value-format'' settings have
sensible defaults for the maps which we expect to be commonly used.
This is important because it's easy to construct subtly malformed result
specifiers which could trigger undefined behavior on clients, for
example by leaving the user's numeric UID empty in a passwd entry, which
may be treated as "0" by inattentive clients.

The format specifier syntax further defines "functions" which can be
used to concatenate lists of multiple values into a single result, for
example for groups:
  %{cn}:%{userPassword:-*}:%{gidNumber}:%merge(",","%{memberUid}")
This specifier takes advantage of a built-in ''merge'' function, which
processes zero or more single or list values and concatenates them
together with a "," separator, to generate the list of the group's
members.  The available functions are described more fully in
"format-specifiers.txt".