\input texinfo @c -*-texinfo-*-
@c
@c Note: the above texinfo file must include the "doubleleftarrow"
@c definitions added by jcb.
@c %**start of header
@c guide
@setfilename krb5-implement.info
@settitle Kerberos V5 Installation Guide
@setchapternewpage odd                  @c chapter begins on next odd page
@c @setchapternewpage on                @c chapter begins on next page
@c @smallbook                           @c Format for 7" X 9.25" paper
@c %**end of header
@paragraphindent 0
@iftex
@parskip 6pt plus 6pt
@end iftex

@include definitions.texinfo
@set EDITION b7-1

@finalout                               @c don't print black warning boxes

@titlepage
@title @value{PRODUCT} Implementor's Guide
@subtitle Release: @value{RELEASE}
@subtitle Document Edition: @value{EDITION}
@subtitle Last updated: @value{UPDATED}
@author @value{COMPANY}

@page
@vskip 0pt plus 1filll

@iftex
@include copyright.texinfo
@end iftex
@end titlepage

@node Top, Introduction, (dir), (dir)
@comment node-name, next, previous, up

@ifinfo
This file contains internal implementor's information for the
@value{RELEASE} release of @value{PRODUCT}.

@include copyright.texinfo
@end ifinfo

@c The master menu is updated using emacs19's M-x texinfo-all-menus-update
@c function.  Don't forget to run M-x texinfo-every-node-update after
@c you add a new section or subsection, or after you've rearranged the
@c order of sections or subsections.  Also, don't forget to add an @node
@c command before each @section or @subsection!  All you need to enter
@c is:
@c
@c @node New Section Name
@c @section New Section Name
@c
@c M-x texinfo-every-node-update will take care of calculating the
@c node's forward and back pointers.
@c
@c ---------------------------------------------------------------------

@menu
* Introduction::
* Socket API::
* IPv6 Support::
* Local Addresses::
* Host Address Lookup::
* Thread Safety::
* Shared Libraries::
@end menu

@node Introduction, Socket API, Top, Top
@chapter Introduction

This file contains internal implementor's information for
@value{PRODUCT}.
It currently contains information that was removed from install.texi;
eventually it will have more detailed information on the internals of
@value{PRODUCT}.

@node Socket API, IPv6 Support, Introduction, Top
@chapter Socket API

Someone should describe the API subset we're allowed to use with
sockets, how and when to use @code{SOCKET_ERRNO}, @i{etc}.

Note that all new code doing hostname and address translation should
use @code{getaddrinfo} and friends.  (@xref{Host Address Lookup}.)

@node IPv6 Support, Local Addresses, Socket API, Top
@chapter IPv6 Support

Most of the IPv6 support is keyed on the macro @code{KRB5_USE_INET6}.
If this macro is not defined, there should be no references to
@code{AF_INET6}, @code{struct sockaddr_in6}, @i{etc}.

The @code{configure} scripts will check for the existence of various
functions, macros and structure types to decide whether to enable the
IPv6 support.  You can also use the @samp{--enable-ipv6} or
@samp{--disable-ipv6} options to override this decision.

Regardless of the setting of @code{KRB5_USE_INET6}, some aspects of the
new APIs devised for IPv6 are used throughout the code, because it would
be too difficult to maintain code for the IPv6 APIs and for the old APIs
at the same time.  But for backwards compatibility, we try to fake them
if the system libraries don't provide them, at least for now.  This
means we sometimes use slightly modified versions of the APIs, but we
try to keep the modifications as non-intrusive as possible.  Macros are
used to rename struct tags and function names, so don't @code{#undef}
any of these names.

@table @code

@item getaddrinfo
@itemx getnameinfo
@itemx freeaddrinfo
@itemx gai_strerror
@itemx struct addrinfo
Always include the header file @code{fake-addrinfo.h} before using
these.  If the native system doesn't provide them, the header file will,
using static functions that will call @code{gethostbyname} and the like
in the native libraries.
(This also happens to be the way the Winsock 2 headers work, depending
on some of the predefined macros indicating the target OS version.)

We also provide ``wrapper'' versions on some systems where a native
implementation exists but the data it returns is broken in some way.

So these may not always be thread-safe, and they may not always provide
IPv6 support, but the API will be consistent.

@item struct sockaddr_storage
@itemx socklen_t
These are provided by @code{socket-utils.h}, if the native headers don't
provide them.  @code{sockaddr_storage} contains a @code{sockaddr_in}, so
by definition it's big enough to hold one; it also has some extra
padding which will probably make it big enough to hold a
@code{sockaddr_in6} if the resulting binary should get run on a kernel
with IPv6 support.

Question: Should these simply be moved into @code{port-sockets.h}?

@end table

IRIX 6.5.7 has no IPv6 support.  Of the systems most actively used in
MIT's Athena environment (used by MIT's Kerberos UNIX developers), this
is the only one without built-in IPv6 support.  In another year or so we
probably won't be using those systems any more, and we may consider
dropping support for systems without IPv6 support.

Somewhere between IRIX 6.5.14 and 6.5.16, partial IPv6 support was
introduced to the extent that the configuration system detects the IPv6
support and attempts to use it.  Code compiles, but then upon linking,
one discovers that ``in6addr_any'' is not defined in any system library.
As a workaround, the header file @code{fake-addrinfo.h} provides a
static copy.  This run-time IPv6 code has still not been tested.

Some utility functions or macros are also provided to give a convenient
shorthand for some operations, and to retain compile-time type checking
when possible (generally using inline functions but only when compiling
with GCC).

@table @code

@item socklen(struct sockaddr *)
Returns the length of the @code{sockaddr} structure, by looking at the
@code{sa_len} field if it exists, or by returning the known sizes of
@code{AF_INET} and @code{AF_INET6} address structures.

@item sa2sin(struct sockaddr *)
@itemx sa2sin6(struct sockaddr *)
@itemx ss2sa(struct sockaddr_storage *)
@itemx ss2sin(struct sockaddr_storage *)
@itemx ss2sin6(struct sockaddr_storage *)
Pointer type conversions.  Use these instead of plain casts, to get type
checking under GCC.

@end table

@node Local Addresses, Host Address Lookup, IPv6 Support, Top
@chapter Local Addresses

(Last update: 2002-03-13.)

Different systems have different ways of finding the local network
addresses.

On Windows, @code{gethostbyname} is called on the local host name to get
a set of addresses.  If that fails, a UDP socket is ``connected'' to a
particular IPv4 address, and the local socket name is retrieved, its
address being treated as the one local network address.  Future versions
of the Windows code should be able to actually examine local interfaces.

On Mac OS 9 and earlier, a Mac-specific interface is used to look up
local addresses.  Presumably, on Mac OS X we'll use that or the general
UNIX code.

On (most?) UNIX systems, there is an @code{ioctl} called
@code{SIOCGIFCONF} which gets interface configuration information.  The
behavior of this @code{ioctl} varies across UNIX systems though.  It
takes as input a buffer to fill with data structures, but if the buffer
isn't big enough, the behavior isn't well defined.  Sometimes you get an
error, sometimes you get incomplete data.  Sometimes you get a clear
indication that more space was needed, sometimes not.  A couple of
systems have additional @code{ioctl}s that can be used to determine or
at least estimate the correct size for the buffer.  Solaris has
introduced @code{SIOCGLIFCONF} for querying IPv6 addresses, and
restricts @code{SIOCGIFCONF} to IPv4 only.  (** We should actually check
if that's true.)

We (Ken Raeburn in particular) ran some tests on various systems to see
what would happen with buffers of various sizes from much smaller to
much larger than needed for the actual data.  The buffers were filled
with specific byte values, and then checked to see how much of the
buffer was actually written to.  The "largest gap" values listed below
are the largest number of bytes we've seen left unused at the end of the
supplied buffer when there were more entries to return.  These values
may of course be dependent on the configurations of the particular
systems we were testing with.  (See @code{lib/krb5/os/t_gifconf.c} for
the test program.)

NetBSD 1.5-alpha: The returned @code{ifc_len} is the desired amount of
space, always.  The returned list may be truncated if there isn't enough
room; no overrun.  Largest gap: 43.  However, NetBSD has
@code{getifaddrs}, which hides all the ugliness within the C library.

BSD/OS 4.0.1 (courtesy djm): The returned @code{ifc_len} is equal to or
less than the supplied @code{ifc_len}.  Sometimes the entire buffer is
used; sometimes N-1 bytes; occasionally, the buffer must have quite a
bit of extra room before the next structure will be added.  Largest gap:
39.

Solaris 7,8: Return @code{EINVAL} if the buffer space is too small for
all the data to be returned, including when @code{ifc_len} is 0.
Solaris is the only system I've found so far that actually returns an
error.  No gap.  However, @code{SIOCGIFNUM} may be used to query the
number of interfaces.

Linux 2.2.12 (Red Hat 6.1 distribution, x86), 2.4.9 (RH 7.1, x86): The
buffer is filled in with as many entries as will fit, and the size used
is returned in @code{ifc_len}.  The list is truncated if needed, with no
indication.  Largest gap: 31.  @emph{However}, this interface does not
return any IPv6 addresses.  They must be read from a file under
@code{/proc}.  (This appears to be what the @samp{ifconfig} program
does.)

IRIX 6.5.7: The buffer is filled in with as many entries as will fit in
N-1 bytes, and the size used is returned in @code{ifc_len}.  Providing
exactly the desired number of bytes is inadequate; the buffer must be
@emph{bigger} than needed.  (E.g., 32->0, 33->32.)  The returned
@code{ifc_len} is always less than the supplied one.  Largest gap: 32.

AIX 4.3.3: Sometimes the returned @code{ifc_len} is bigger than the
supplied one, but it may not be big enough for @emph{all} the
interfaces.  Sometimes it's smaller than the supplied value, even if the
returned list is truncated.  The list is filled in with as many entries
as will fit; no overrun.  Largest gap: 143.

Older AIX: We're told by W. David Shambroom (DShambroom@@gte.com) in
PR krb5-kdc/919 that older versions of AIX have a bug in the
@code{SIOCGIFCONF} @code{ioctl} which can cause them to overrun the
supplied buffer.  However, we don't yet have details as to which
version, whether the overrun amount was bounded (e.g., one
@code{ifreq}'s worth) or not, whether it's a real buffer overrun or
someone assuming it was because @code{ifc_len} was increased, etc.
Once we've got details, we can try to work around the problem.

Digital UNIX 4.0F: If input @code{ifc_len} is zero, return an
@code{ifc_len} that's big enough to include all entries.  (Actually, on
our system, it appears to be larger than that by 32.)  If input
@code{ifc_len} is nonzero, fill in as many entries as will fit, and set
@code{ifc_len} accordingly.  (Tested only with buffer previously filled
with zeros.)

Tru64 UNIX 5.1A: Like Digital UNIX 4.0F, except the ``extra'' space
indicated when the input @code{ifc_len} is zero is larger.  (We got 400
out when 320 appeared to be needed.)

So... if the returned @code{ifc_len} is bigger than the supplied one,
we'll need at least that much space -- but possibly more -- to hold all
the results.  If the returned value is smaller or the same, we may still
need more space.

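
One conservative way to act on these observations is to treat a result
as complete only when a comfortable amount of the buffer was left
unused, and to retry with a larger buffer otherwise.  The following is
a minimal sketch of that decision, not the code used in the tree: the
names @code{ifconf_may_be_truncated} and @code{ifconf_next_size}, the
@code{margin} parameter, and the doubling policy are all assumptions
made for illustration.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the krb5 tree).  Given the buffer
 * size passed to SIOCGIFCONF, the ifc_len that came back, the size of
 * one entry (for example sizeof(struct ifreq)) and an assumed safety
 * margin, report whether the list might be truncated, in which case
 * the caller should retry with a larger buffer.
 */
static int
ifconf_may_be_truncated(size_t supplied, size_t returned,
                        size_t entry_size, size_t margin)
{
    assert(entry_size > 0);

    /* A larger returned length means we need at least that much. */
    if (returned > supplied)
        return 1;

    /* Otherwise trust the result only if the unused tail could have
       held one more entry plus the margin. */
    return supplied - returned <= entry_size + margin;
}

/*
 * Hypothetical growth policy: at least what the system asked for,
 * and in any case double the previous buffer size.
 */
static size_t
ifconf_next_size(size_t supplied, size_t returned)
{
    size_t doubled = supplied * 2;

    return returned > doubled ? returned : doubled;
}
```
@end verbatim

A caller would loop, issuing the @code{ioctl} and growing the buffer
with @code{ifconf_next_size} until @code{ifconf_may_be_truncated}
returns zero.  The margin should be chosen with the ``largest gap''
figures above in mind.  A system that reports an error for a short
buffer, as Solaris does, needs a separate check.
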
The heuristic we're using on most systems now is to keep growing the
buffer until the unused space is larger than an @code{ifreq} structure
by some safe margin.

@node Host Address Lookup, Thread Safety, Local Addresses, Top
@chapter Host Address Lookup

The traditional @code{gethostbyname} function is not thread-safe, and
does not support looking up IPv6 addresses, both of which are becoming
more important.  New standards have been in development that should
address both of these problems.  The most promising is
@code{getaddrinfo} and friends, which is part of the Austin Group and
UNIX 98(?) specifications.  Code in the MIT tree is gradually being
converted to use this interface.

@quotation
(Question: What about @code{inet_ntop} and @code{inet_pton}?  We're not
using them at the moment, but some bits of code would be simplified if
we were to do so, when plain addresses and not socket addresses are
already presented to us.)
@end quotation

The @code{getaddrinfo} function takes a host name and service name and
returns a linked list of structures indicating the address family,
length, and actual data in ``sockaddr'' form.  (That is, it includes a
pointer to a @code{sockaddr_in} or @code{sockaddr_in6} structure.)
Depending on options set via the @code{hints} input argument, the
results can be limited to a single address family (@i{e.g.}, for IPv4
applications), and the canonical name of the indicated host can be
returned.  Either the host or service can be a null pointer, in which
case only the other is looked up; they can also be expressed in numeric
form.  This interface is extensible to additional address families in
the future.  The returned linked list can be freed with the
@code{freeaddrinfo} function.

The @code{getnameinfo} function does the reverse -- given an address in
``sockaddr'' form, it converts the address and port values into
printable forms.

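
As a rough illustration of how these calls fit together, the sketch
below parses a numeric host and service with @code{getaddrinfo},
converts the first result back to strings with @code{getnameinfo}, and
frees the list.  The helper name @code{numeric_round_trip} is
hypothetical, and the example assumes a system whose native
@code{<netdb.h>} provides these functions; code in the tree would
include @code{fake-addrinfo.h} instead, as described in
@ref{IPv6 Support}.

@verbatim
```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper: look up a numeric host and service, then turn
 * the first returned address back into printable strings.  Returns 0
 * on success, or the nonzero error code from getaddrinfo or
 * getnameinfo.
 */
static int
numeric_round_trip(const char *host, const char *serv,
                   char *hostbuf, socklen_t hostlen,
                   char *servbuf, socklen_t servlen)
{
    struct addrinfo hints, *res = NULL;
    int err;

    /* Either argument may be null, but not both. */
    assert(host != NULL || serv != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* AF_INET would limit to IPv4 */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one entry per socket type */
    hints.ai_flags = AI_NUMERICHOST;  /* no name service lookups */

    err = getaddrinfo(host, serv, &hints, &res);
    if (err != 0)
        return err;

    /* Only the first entry of the linked list is used here. */
    err = getnameinfo(res->ai_addr, res->ai_addrlen,
                      hostbuf, hostlen, servbuf, servlen,
                      NI_NUMERICHOST | NI_NUMERICSERV);
    freeaddrinfo(res);
    return err;
}
```
@end verbatim

Real callers would normally walk the whole list through
@code{ai_next} rather than stopping at the first entry.
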
Errors returned by either of these functions -- as return values, not
global variables -- can be translated into printable form with the
@code{gai_strerror} function.

Some vendors are starting to implement @code{getaddrinfo} and friends;
however, some of the implementations are deficient in one way or
another.

@table @asis

@item AIX
As of AIX 4.3.3, @code{getaddrinfo} returns sockaddr structures without
the family and length fields filled in.

@item GNU libc
The GNU C library, used on GNU/Linux systems, has had a few problems in
this area.  One version would drop some IPv4 addresses for some hosts
that had multiple IPv4 and IPv6 addresses.

In GNU libc 2.2.4, when the DNS is used, the name referred to by PTR
records for each of the addresses is looked up and stored in the
@code{ai_canonname} field, or the printed numeric form of the address
is, both of which are wrong.

@item IRIX
No known bugs here, but as of IRIX 6.5.7, the version we're using at
MIT, these functions had not been implemented.

@item NetBSD
As of NetBSD 1.5, this function is not thread-safe.  In 1.5X
(intermediate code snapshot between 1.5 and 1.6 releases), the
@code{ai_canonname} field can be empty, even if the @code{AI_CANONNAME}
flag was passed.  In particular, this can happen if a numeric host
address string is provided.  Also, numeric service names appear not to
work unless the stream type is given; specifying the TCP protocol is not
enough.

@item Tru64 UNIX
In Tru64 UNIX 5.0, @code{getaddrinfo} is available, but requires that
@code{} be included before its use; that header file defines
@code{getaddrinfo} as a macro expanding to either @code{ogetaddrinfo} or
@code{ngetaddrinfo}, and apparently the symbol @code{getaddrinfo} is not
present in the system library, causing the @code{configure} test for it
to fail.  Technically speaking, I [Ken] think Compaq has it wrong here;
I think the symbol is supposed to be available even if the application
uses @code{#undef}, but I have not confirmed it in the spec.

@item Windows
According to Windows documentation, the returned @code{ai_canonname}
field can be null even if the @code{AI_CANONNAME} flag is given.

@end table

For most systems where @code{getaddrinfo} returns incorrect data, we've
provided wrapper versions that call the system version and then try to
fix up the returned data.

For systems that don't provide these functions at all, we've provided
replacement versions that neither are thread-safe nor support IPv6, but
will allow us to convert the rest of our code to assume the
availability of @code{getaddrinfo}, rather than having to use two
branches everywhere, one for @code{getaddrinfo} and one for
@code{gethostbyname}.  These replacement functions do use
@code{gethostbyname} and the like; for some systems it would be possible
to use @code{gethostbyname2} or @code{gethostbyname_r} or other such
functions, to provide thread safety or IPv6 support, but this has not
been a priority for us, since most modern systems have these functions
anyway.  And if they don't, they probably don't have real IPv6 support
either.

Including @code{fake-addrinfo.h} will enable the wrapper or replacement
versions when needed.  Depending on the system configuration, this
header file may define several static functions (and declare them
@code{inline} under GNU C), and leave it to the compiler to discard any
unused code.  This may produce warnings on some systems, and if the
compiler isn't being too clever, may cause several kilobytes of excess
storage to be consumed on these backwards systems.

Do not assume that @code{ai_canonname} will be set when the
@code{AI_CANONNAME} flag is set.  Check for a null pointer before using
it.

@node Thread Safety, Shared Libraries, Host Address Lookup, Top
@chapter Thread Safety

Hahahahahaha...  We're not even close.

We have started talking about it, though.

Some stuff is ``kind of'' thread safe because it operates on a
@code{krb5_context} and we simply assert that a context can be used only
in one thread at a time.  But there are places where we use unsafe C
library functions, and a few places where we have modifiable static data
in the libraries.

Even if the Kerberos or C library functions aren't using static data
themselves, there are other instances of per-process data that have to
be dealt with before our library can become thread-safe.  For example,
file locking with UNIX @code{flock()} is on a per-process basis; for a
single thread to be able to lock a file against accesses from other
threads, we'll have to implement per-thread locks for files on top of
the operating system per-process locks, and that means a global
(per-process) table listing all the locks.  So it seems unlikely that we
will find an approach that eliminates all static modifiable data from
the library.

A rough proposal for hooks for implementing locking was put forth, and
an IBM Linux group is experimenting with a trial implementation of it,
with a few changes.  A few issues with the proposal have been discussed
on the @samp{krbdev} mailing list, and you can find the discussion in
the list archives.

@node Shared Libraries, , Thread Safety, Top
@chapter Shared Libraries

(These sections are old -- they should get updated.)

@menu
* Shared Library Theory::
* Operating System Notes for Shared Libraries::
@end menu

@node Shared Library Theory, Operating System Notes for Shared Libraries, Shared Libraries, Shared Libraries
@section Theory of How Shared Libraries are Used

An explanation of how shared libraries are implemented on a given
platform is too broad a topic for this manual.  Instead this will touch
on some of the issues that the Kerberos V5 tree uses to support version
numbering and alternate install locations.

Normally when one builds a shared library and then links with it, the
name of the shared library is stored in the object (i.e. libfoo.so).
Most operating systems allow one to change the name that is referenced,
and we have done so, placing the version number into the shared library
name (i.e. libfoo.so.0.1).  At link time, one would reference libfoo.so,
but when one executes the program, the shared library loader would then
look for the shared library with the alternate name.  Hence multiple
versions of shared libraries may be supported relatively easily.
@footnote{Under AIX for the RISC/6000, multiple versions of shared
libraries are supported by combining two or more versions of the shared
library into one file.  The Kerberos build procedure produces shared
libraries with version numbers in the internal module names, so that the
shared libraries are compatible with this scheme.  Unfortunately,
combining two shared libraries requires internal knowledge of the AIX
shared library system beyond the scope of this document.  Practically
speaking, only one version of AIX shared libraries can be supported on a
system, unless the multi-version library is constructed by a programmer
familiar with the AIX internals.}

All operating systems (that we have seen) provide a means for programs
to specify the location of shared libraries.  On different operating
systems, this is specified when creating the shared library, at link
time, or both.@footnote{Both are necessary sometimes as the shared
libraries are dependent on other shared libraries.}  The build process
will hardwire a path to the installed destination.

@node Operating System Notes for Shared Libraries, , Shared Library Theory, Shared Libraries
@section Operating System Notes for Shared Libraries

From time to time users or developers suggest using GNU @code{Libtool}
or some other mechanism to generate shared libraries.  Experience with
other packages suggests that Libtool tends to be difficult to debug,
and that when it works incorrectly, the generated scripts must be
patched to work around problems.
So far, the Kerberos shared library build mechanism, which sets a
variety of makefile variables based on operating system type and then
uses those variables in the build process, has proven to be easier to
debug and adequate to the task of building shared libraries for
Kerberos.

@menu
* NetBSD Shared Library Support::
* AIX Shared Library Support::
* Solaris Shared Library Support::
* Alpha OSF/1 Shared Library Support::
@end menu

@node NetBSD Shared Library Support, AIX Shared Library Support, Operating System Notes for Shared Libraries, Operating System Notes for Shared Libraries
@subsection NetBSD Shared Library Support

XXX I think this is horribly out of date and reflects pre-ELF NetBSD.

Shared library support has been tested under NetBSD 1.0A using GCC
2.4.5.  Due to the vagaries of the loader in the operating system, the
library load path needs to be specified in building libraries and in
linking with them.  Unless the library is placed in a standard location
to search for libraries, this may make it difficult for developers to
work with the shared libraries.

@node AIX Shared Library Support, Solaris Shared Library Support, NetBSD Shared Library Support, Operating System Notes for Shared Libraries
@subsection AIX Shared Library Support

AIX specifies shared library versions by combining multiple versions
into a single file.  Because of the complexity of this process, no
automatic procedure for building multi-versioned shared libraries is
provided.  Therefore, supporting multiple versions of the Kerberos
shared libraries under AIX will require significant work on the part of
a programmer familiar with AIX internals.

AIX allows a single library to be used both as a static library and as a
shared library.  For this reason, the @samp{--enable-shared} switch to
configure builds only shared libraries.  On other operating systems,
both shared and static libraries are built when this switch is
specified.
As with all other operating systems, only non-shared static libraries
are built when @samp{--enable-shared} is not specified.

The AIX 3.2.5 linker dumps core trying to build a shared
@samp{libkrb5.a} produced with the GNU C compiler.  The native AIX
compiler works fine.  In addition, the AIX 4.1 linker is able to build a
shared @samp{libkrb5.a} when GNU C is used.

@node Solaris Shared Library Support, Alpha OSF/1 Shared Library Support, AIX Shared Library Support, Operating System Notes for Shared Libraries
@subsection Solaris Shared Library Support

Shared library support only works when using the Sunsoft C compiler.  We
are currently using version 3.0.1.  Modern versions of Solaris do not
have this problem.

The path to the shared library must be specified at link time as well
as when creating libraries.

@node Alpha OSF/1 Shared Library Support, , Solaris Shared Library Support, Operating System Notes for Shared Libraries
@subsection Alpha OSF/1 Shared Library Support

Shared library support has been tested with V2.1 and higher of the
operating system.  Shared libraries may be compiled both with GCC and
the native compiler.

One of the nice features on this platform is that the path to the
shared libraries is specified in the library itself without requiring
that one specify the same at link time.

We are using the @samp{-rpath} option to @samp{ld} to place the library
load path into the executables.  The one disadvantage of this is during
testing where we want to make sure that we are using the build tree
instead of a possibly installed library.  The loader uses the contents
of @samp{-rpath} before LD_LIBRARY_PATH so we must specify a dummy
_RLD_ROOT and complete LD_LIBRARY_PATH in our tests.

The one disadvantage with the method we are using....

@contents

@bye