summaryrefslogtreecommitdiffstats
path: root/doc/rfc/rfc1122.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc1122.txt')
-rw-r--r--doc/rfc/rfc1122.txt6844
1 files changed, 6844 insertions, 0 deletions
diff --git a/doc/rfc/rfc1122.txt b/doc/rfc/rfc1122.txt
new file mode 100644
index 0000000..c14f2e5
--- /dev/null
+++ b/doc/rfc/rfc1122.txt
@@ -0,0 +1,6844 @@
+
+
+
+
+
+
+Network Working Group Internet Engineering Task Force
+Request for Comments: 1122 R. Braden, Editor
+ October 1989
+
+
+ Requirements for Internet Hosts -- Communication Layers
+
+
+Status of This Memo
+
+ This RFC is an official specification for the Internet community. It
+ incorporates by reference, amends, corrects, and supplements the
+ primary protocol standards documents relating to hosts. Distribution
+ of this document is unlimited.
+
+Summary
+
+ This is one RFC of a pair that defines and discusses the requirements
+ for Internet host software. This RFC covers the communications
+ protocol layers: link layer, IP layer, and transport layer; its
+ companion RFC-1123 covers the application and support protocols.
+
+
+
+ Table of Contents
+
+
+
+
+ 1. INTRODUCTION ............................................... 5
+ 1.1 The Internet Architecture .............................. 6
+ 1.1.1 Internet Hosts .................................... 6
+ 1.1.2 Architectural Assumptions ......................... 7
+ 1.1.3 Internet Protocol Suite ........................... 8
+ 1.1.4 Embedded Gateway Code ............................. 10
+ 1.2 General Considerations ................................. 12
+ 1.2.1 Continuing Internet Evolution ..................... 12
+ 1.2.2 Robustness Principle .............................. 12
+ 1.2.3 Error Logging ..................................... 13
+ 1.2.4 Configuration ..................................... 14
+ 1.3 Reading this Document .................................. 15
+ 1.3.1 Organization ...................................... 15
+ 1.3.2 Requirements ...................................... 16
+ 1.3.3 Terminology ....................................... 17
+ 1.4 Acknowledgments ........................................ 20
+
+ 2. LINK LAYER .................................................. 21
+ 2.1 INTRODUCTION ........................................... 21
+
+
+
+Internet Engineering Task Force [Page 1]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ 2.2 PROTOCOL WALK-THROUGH .................................. 21
+ 2.3 SPECIFIC ISSUES ........................................ 21
+ 2.3.1 Trailer Protocol Negotiation ...................... 21
+ 2.3.2 Address Resolution Protocol -- ARP ................ 22
+ 2.3.2.1 ARP Cache Validation ......................... 22
+ 2.3.2.2 ARP Packet Queue ............................. 24
+ 2.3.3 Ethernet and IEEE 802 Encapsulation ............... 24
+ 2.4 LINK/INTERNET LAYER INTERFACE .......................... 25
+ 2.5 LINK LAYER REQUIREMENTS SUMMARY ........................ 26
+
+ 3. INTERNET LAYER PROTOCOLS .................................... 27
+ 3.1 INTRODUCTION ............................................ 27
+ 3.2 PROTOCOL WALK-THROUGH .................................. 29
+ 3.2.1 Internet Protocol -- IP ............................ 29
+ 3.2.1.1 Version Number ............................... 29
+ 3.2.1.2 Checksum ..................................... 29
+ 3.2.1.3 Addressing ................................... 29
+ 3.2.1.4 Fragmentation and Reassembly ................. 32
+ 3.2.1.5 Identification ............................... 32
+ 3.2.1.6 Type-of-Service .............................. 33
+ 3.2.1.7 Time-to-Live ................................. 34
+ 3.2.1.8 Options ...................................... 35
+ 3.2.2 Internet Control Message Protocol -- ICMP .......... 38
+ 3.2.2.1 Destination Unreachable ...................... 39
+ 3.2.2.2 Redirect ..................................... 40
+ 3.2.2.3 Source Quench ................................ 41
+ 3.2.2.4 Time Exceeded ................................ 41
+ 3.2.2.5 Parameter Problem ............................ 42
+ 3.2.2.6 Echo Request/Reply ........................... 42
+ 3.2.2.7 Information Request/Reply .................... 43
+ 3.2.2.8 Timestamp and Timestamp Reply ................ 43
+ 3.2.2.9 Address Mask Request/Reply ................... 45
+ 3.2.3 Internet Group Management Protocol IGMP ........... 47
+ 3.3 SPECIFIC ISSUES ........................................ 47
+ 3.3.1 Routing Outbound Datagrams ........................ 47
+ 3.3.1.1 Local/Remote Decision ........................ 47
+ 3.3.1.2 Gateway Selection ............................ 48
+ 3.3.1.3 Route Cache .................................. 49
+ 3.3.1.4 Dead Gateway Detection ....................... 51
+ 3.3.1.5 New Gateway Selection ........................ 55
+ 3.3.1.6 Initialization ............................... 56
+ 3.3.2 Reassembly ........................................ 56
+ 3.3.3 Fragmentation ..................................... 58
+ 3.3.4 Local Multihoming ................................. 60
+ 3.3.4.1 Introduction ................................. 60
+ 3.3.4.2 Multihoming Requirements ..................... 61
+ 3.3.4.3 Choosing a Source Address .................... 64
+ 3.3.5 Source Route Forwarding ........................... 65
+
+
+
+Internet Engineering Task Force [Page 2]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ 3.3.6 Broadcasts ........................................ 66
+ 3.3.7 IP Multicasting ................................... 67
+ 3.3.8 Error Reporting ................................... 69
+ 3.4 INTERNET/TRANSPORT LAYER INTERFACE ..................... 69
+ 3.5 INTERNET LAYER REQUIREMENTS SUMMARY .................... 72
+
+ 4. TRANSPORT PROTOCOLS ......................................... 77
+ 4.1 USER DATAGRAM PROTOCOL -- UDP .......................... 77
+ 4.1.1 INTRODUCTION ...................................... 77
+ 4.1.2 PROTOCOL WALK-THROUGH ............................. 77
+ 4.1.3 SPECIFIC ISSUES ................................... 77
+ 4.1.3.1 Ports ........................................ 77
+ 4.1.3.2 IP Options ................................... 77
+ 4.1.3.3 ICMP Messages ................................ 78
+ 4.1.3.4 UDP Checksums ................................ 78
+ 4.1.3.5 UDP Multihoming .............................. 79
+ 4.1.3.6 Invalid Addresses ............................ 79
+ 4.1.4 UDP/APPLICATION LAYER INTERFACE ................... 79
+ 4.1.5 UDP REQUIREMENTS SUMMARY .......................... 80
+ 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP ................... 82
+ 4.2.1 INTRODUCTION ...................................... 82
+ 4.2.2 PROTOCOL WALK-THROUGH ............................. 82
+ 4.2.2.1 Well-Known Ports ............................. 82
+ 4.2.2.2 Use of Push .................................. 82
+ 4.2.2.3 Window Size .................................. 83
+ 4.2.2.4 Urgent Pointer ............................... 84
+ 4.2.2.5 TCP Options .................................. 85
+ 4.2.2.6 Maximum Segment Size Option .................. 85
+ 4.2.2.7 TCP Checksum ................................. 86
+ 4.2.2.8 TCP Connection State Diagram ................. 86
+ 4.2.2.9 Initial Sequence Number Selection ............ 87
+ 4.2.2.10 Simultaneous Open Attempts .................. 87
+ 4.2.2.11 Recovery from Old Duplicate SYN ............. 87
+ 4.2.2.12 RST Segment ................................. 87
+ 4.2.2.13 Closing a Connection ........................ 87
+ 4.2.2.14 Data Communication .......................... 89
+ 4.2.2.15 Retransmission Timeout ...................... 90
+ 4.2.2.16 Managing the Window ......................... 91
+ 4.2.2.17 Probing Zero Windows ........................ 92
+ 4.2.2.18 Passive OPEN Calls .......................... 92
+ 4.2.2.19 Time to Live ................................ 93
+ 4.2.2.20 Event Processing ............................ 93
+ 4.2.2.21 Acknowledging Queued Segments ............... 94
+ 4.2.3 SPECIFIC ISSUES ................................... 95
+ 4.2.3.1 Retransmission Timeout Calculation ........... 95
+ 4.2.3.2 When to Send an ACK Segment .................. 96
+ 4.2.3.3 When to Send a Window Update ................. 97
+ 4.2.3.4 When to Send Data ............................ 98
+
+
+
+Internet Engineering Task Force [Page 3]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ 4.2.3.5 TCP Connection Failures ...................... 100
+ 4.2.3.6 TCP Keep-Alives .............................. 101
+ 4.2.3.7 TCP Multihoming .............................. 103
+ 4.2.3.8 IP Options ................................... 103
+ 4.2.3.9 ICMP Messages ................................ 103
+ 4.2.3.10 Remote Address Validation ................... 104
+ 4.2.3.11 TCP Traffic Patterns ........................ 104
+ 4.2.3.12 Efficiency .................................. 105
+ 4.2.4 TCP/APPLICATION LAYER INTERFACE ................... 106
+ 4.2.4.1 Asynchronous Reports ......................... 106
+ 4.2.4.2 Type-of-Service .............................. 107
+ 4.2.4.3 Flush Call ................................... 107
+ 4.2.4.4 Multihoming .................................. 108
+ 4.2.5 TCP REQUIREMENT SUMMARY ........................... 108
+
+ 5. REFERENCES ................................................. 112
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 4]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+1. INTRODUCTION
+
+ This document is one of a pair that defines and discusses the
+ requirements for host system implementations of the Internet protocol
+ suite. This RFC covers the communication protocol layers: link
+ layer, IP layer, and transport layer. Its companion RFC,
+ "Requirements for Internet Hosts -- Application and Support"
+ [INTRO:1], covers the application layer protocols. This document
+ should also be read in conjunction with "Requirements for Internet
+ Gateways" [INTRO:2].
+
+ These documents are intended to provide guidance for vendors,
+ implementors, and users of Internet communication software. They
+ represent the consensus of a large body of technical experience and
+ wisdom, contributed by the members of the Internet research and
+ vendor communities.
+
+ This RFC enumerates standard protocols that a host connected to the
+ Internet must use, and it incorporates by reference the RFCs and
+ other documents describing the current specifications for these
+ protocols. It corrects errors in the referenced documents and adds
+ additional discussion and guidance for an implementor.
+
+ For each protocol, this document also contains an explicit set of
+ requirements, recommendations, and options. The reader must
+ understand that the list of requirements in this document is
+ incomplete by itself; the complete set of requirements for an
+ Internet host is primarily defined in the standard protocol
+ specification documents, with the corrections, amendments, and
+ supplements contained in this RFC.
+
+ A good-faith implementation of the protocols that was produced after
+ careful reading of the RFC's and with some interaction with the
+ Internet technical community, and that followed good communications
+ software engineering practices, should differ from the requirements
+ of this document in only minor ways. Thus, in many cases, the
+ "requirements" in this RFC are already stated or implied in the
+ standard protocol documents, so that their inclusion here is, in a
+ sense, redundant. However, they were included because some past
+ implementation has made the wrong choice, causing problems of
+ interoperability, performance, and/or robustness.
+
+ This document includes discussion and explanation of many of the
+ requirements and recommendations. A simple list of requirements
+ would be dangerous, because:
+
+ o Some required features are more important than others, and some
+ features are optional.
+
+
+
+Internet Engineering Task Force [Page 5]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ o There may be valid reasons why particular vendor products that
+ are designed for restricted contexts might choose to use
+ different specifications.
+
+ However, the specifications of this document must be followed to meet
+ the general goal of arbitrary host interoperation across the
+ diversity and complexity of the Internet system. Although most
+ current implementations fail to meet these requirements in various
+ ways, some minor and some major, this specification is the ideal
+ towards which we need to move.
+
+ These requirements are based on the current level of Internet
+ architecture. This document will be updated as required to provide
+ additional clarifications or to include additional information in
+ those areas in which specifications are still evolving.
+
+ This introductory section begins with a brief overview of the
+ Internet architecture as it relates to hosts, and then gives some
+ general advice to host software vendors. Finally, there is some
+ guidance on reading the rest of the document and some terminology.
+
+ 1.1 The Internet Architecture
+
+ General background and discussion on the Internet architecture and
+ supporting protocol suite can be found in the DDN Protocol
+ Handbook [INTRO:3]; for background see for example [INTRO:9],
+ [INTRO:10], and [INTRO:11]. Reference [INTRO:5] describes the
+ procedure for obtaining Internet protocol documents, while
+ [INTRO:6] contains a list of the numbers assigned within Internet
+ protocols.
+
+ 1.1.1 Internet Hosts
+
+ A host computer, or simply "host," is the ultimate consumer of
+ communication services. A host generally executes application
+ programs on behalf of user(s), employing network and/or
+ Internet communication services in support of this function.
+ An Internet host corresponds to the concept of an "End-System"
+ used in the OSI protocol suite [INTRO:13].
+
+ An Internet communication system consists of interconnected
+ packet networks supporting communication among host computers
+ using the Internet protocols. The networks are interconnected
+ using packet-switching computers called "gateways" or "IP
+ routers" by the Internet community, and "Intermediate Systems"
+ by the OSI world [INTRO:13]. The RFC "Requirements for
+ Internet Gateways" [INTRO:2] contains the official
+ specifications for Internet gateways. That RFC together with
+
+
+
+Internet Engineering Task Force [Page 6]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ the present document and its companion [INTRO:1] define the
+ rules for the current realization of the Internet architecture.
+
+ Internet hosts span a wide range of size, speed, and function.
+ They range in size from small microprocessors through
+ workstations to mainframes and supercomputers. In function,
+ they range from single-purpose hosts (such as terminal servers)
+ to full-service hosts that support a variety of online network
+ services, typically including remote login, file transfer, and
+ electronic mail.
+
+ A host is generally said to be multihomed if it has more than
+ one interface to the same or to different networks. See
+ Section 1.1.3 on "Terminology".
+
+ 1.1.2 Architectural Assumptions
+
+ The current Internet architecture is based on a set of
+ assumptions about the communication system. The assumptions
+ most relevant to hosts are as follows:
+
+ (a) The Internet is a network of networks.
+
+ Each host is directly connected to some particular
+ network(s); its connection to the Internet is only
+ conceptual. Two hosts on the same network communicate
+ with each other using the same set of protocols that they
+ would use to communicate with hosts on distant networks.
+
+ (b) Gateways don't keep connection state information.
+
+ To improve robustness of the communication system,
+ gateways are designed to be stateless, forwarding each IP
+ datagram independently of other datagrams. As a result,
+ redundant paths can be exploited to provide robust service
+ in spite of failures of intervening gateways and networks.
+
+ All state information required for end-to-end flow control
+ and reliability is implemented in the hosts, in the
+ transport layer or in application programs. All
+ connection control information is thus co-located with the
+ end points of the communication, so it will be lost only
+ if an end point fails.
+
+ (c) Routing complexity should be in the gateways.
+
+ Routing is a complex and difficult problem, and ought to
+ be performed by the gateways, not the hosts. An important
+
+
+
+Internet Engineering Task Force [Page 7]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ objective is to insulate host software from changes caused
+ by the inevitable evolution of the Internet routing
+ architecture.
+
+ (d) The System must tolerate wide network variation.
+
+ A basic objective of the Internet design is to tolerate a
+ wide range of network characteristics -- e.g., bandwidth,
+ delay, packet loss, packet reordering, and maximum packet
+ size. Another objective is robustness against failure of
+ individual networks, gateways, and hosts, using whatever
+ bandwidth is still available. Finally, the goal is full
+ "open system interconnection": an Internet host must be
+ able to interoperate robustly and effectively with any
+ other Internet host, across diverse Internet paths.
+
+ Sometimes host implementors have designed for less
+ ambitious goals. For example, the LAN environment is
+ typically much more benign than the Internet as a whole;
+ LANs have low packet loss and delay and do not reorder
+ packets. Some vendors have fielded host implementations
+ that are adequate for a simple LAN environment, but work
+ badly for general interoperation. The vendor justifies
+ such a product as being economical within the restricted
+ LAN market. However, isolated LANs seldom stay isolated
+ for long; they are soon gatewayed to each other, to
+ organization-wide internets, and eventually to the global
+ Internet system. In the end, neither the customer nor the
+ vendor is served by incomplete or substandard Internet
+ host software.
+
+ The requirements spelled out in this document are designed
+ for a full-function Internet host, capable of full
+ interoperation over an arbitrary Internet path.
+
+
+ 1.1.3 Internet Protocol Suite
+
+ To communicate using the Internet system, a host must implement
+ the layered set of protocols comprising the Internet protocol
+ suite. A host typically must implement at least one protocol
+ from each layer.
+
+ The protocol layers used in the Internet architecture are as
+ follows [INTRO:4]:
+
+
+ o Application Layer
+
+
+
+Internet Engineering Task Force [Page 8]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ The application layer is the top layer of the Internet
+ protocol suite. The Internet suite does not further
+ subdivide the application layer, although some of the
+ Internet application layer protocols do contain some
+ internal sub-layering. The application layer of the
+ Internet suite essentially combines the functions of the
+ top two layers -- Presentation and Application -- of the
+ OSI reference model.
+
+ We distinguish two categories of application layer
+ protocols: user protocols that provide service directly
+ to users, and support protocols that provide common system
+ functions. Requirements for user and support protocols
+ will be found in the companion RFC [INTRO:1].
+
+ The most common Internet user protocols are:
+
+ o Telnet (remote login)
+ o FTP (file transfer)
+ o SMTP (electronic mail delivery)
+
+ There are a number of other standardized user protocols
+ [INTRO:4] and many private user protocols.
+
+ Support protocols, used for host name mapping, booting,
+ and management, include SNMP, BOOTP, RARP, and the Domain
+ Name System (DNS) protocols.
+
+
+ o Transport Layer
+
+ The transport layer provides end-to-end communication
+ services for applications. There are two primary
+ transport layer protocols at present:
+
+ o Transmission Control Protocol (TCP)
+ o User Datagram Protocol (UDP)
+
+ TCP is a reliable connection-oriented transport service
+ that provides end-to-end reliability, resequencing, and
+ flow control. UDP is a connectionless ("datagram")
+ transport service.
+
+ Other transport protocols have been developed by the
+ research community, and the set of official Internet
+ transport protocols may be expanded in the future.
+
+ Transport layer protocols are discussed in Chapter 4.
+
+
+
+Internet Engineering Task Force [Page 9]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ o Internet Layer
+
+ All Internet transport protocols use the Internet Protocol
+ (IP) to carry data from source host to destination host.
+ IP is a connectionless or datagram internetwork service,
+ providing no end-to-end delivery guarantees. Thus, IP
+ datagrams may arrive at the destination host damaged,
+ duplicated, out of order, or not at all. The layers above
+ IP are responsible for reliable delivery service when it
+ is required. The IP protocol includes provision for
+ addressing, type-of-service specification, fragmentation
+ and reassembly, and security information.
+
+ The datagram or connectionless nature of the IP protocol
+ is a fundamental and characteristic feature of the
+ Internet architecture. Internet IP was the model for the
+ OSI Connectionless Network Protocol [INTRO:12].
+
+ ICMP is a control protocol that is considered to be an
+ integral part of IP, although it is architecturally
+ layered upon IP, i.e., it uses IP to carry its data end-
+ to-end just as a transport protocol like TCP or UDP does.
+ ICMP provides error reporting, congestion reporting, and
+ first-hop gateway redirection.
+
+ IGMP is an Internet layer protocol used for establishing
+ dynamic host groups for IP multicasting.
+
+ The Internet layer protocols IP, ICMP, and IGMP are
+ discussed in Chapter 3.
+
+
+ o Link Layer
+
+ To communicate on its directly-connected network, a host
+ must implement the communication protocol used to
+ interface to that network. We call this a link layer or
+ media-access layer protocol.
+
+ There is a wide variety of link layer protocols,
+ corresponding to the many different types of networks.
+ See Chapter 2.
+
+
+ 1.1.4 Embedded Gateway Code
+
+ Some Internet host software includes embedded gateway
+ functionality, so that these hosts can forward packets as a
+
+
+
+Internet Engineering Task Force [Page 10]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ gateway would, while still performing the application layer
+ functions of a host.
+
+ Such dual-purpose systems must follow the Gateway Requirements
+ RFC [INTRO:2] with respect to their gateway functions, and
+ must follow the present document with respect to their host
+ functions. In all overlapping cases, the two specifications
+ should be in agreement.
+
+ There are varying opinions in the Internet community about
+ embedded gateway functionality. The main arguments are as
+ follows:
+
+ o Pro: in a local network environment where networking is
+ informal, or in isolated internets, it may be convenient
+ and economical to use existing host systems as gateways.
+
+ There is also an architectural argument for embedded
+ gateway functionality: multihoming is much more common
+ than originally foreseen, and multihoming forces a host to
+ make routing decisions as if it were a gateway. If the
+ multihomed host contains an embedded gateway, it will
+ have full routing knowledge and as a result will be able
+ to make more optimal routing decisions.
+
+ o Con: Gateway algorithms and protocols are still changing,
+ and they will continue to change as the Internet system
+ grows larger. Attempting to include a general gateway
+ function within the host IP layer will force host system
+ maintainers to track these (more frequent) changes. Also,
+ a larger pool of gateway implementations will make
+ coordinating the changes more difficult. Finally, the
+ complexity of a gateway IP layer is somewhat greater than
+ that of a host, making the implementation and operation
+ tasks more complex.
+
+ In addition, the style of operation of some hosts is not
+ appropriate for providing stable and robust gateway
+ service.
+
+ There is considerable merit in both of these viewpoints. One
+ conclusion can be drawn: an host administrator must have
+ conscious control over whether or not a given host acts as a
+ gateway. See Section 3.1 for the detailed requirements.
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 11]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ 1.2 General Considerations
+
+ There are two important lessons that vendors of Internet host
+ software have learned and which a new vendor should consider
+ seriously.
+
+ 1.2.1 Continuing Internet Evolution
+
+ The enormous growth of the Internet has revealed problems of
+ management and scaling in a large datagram-based packet
+ communication system. These problems are being addressed, and
+ as a result there will be continuing evolution of the
+ specifications described in this document. These changes will
+ be carefully planned and controlled, since there is extensive
+ participation in this planning by the vendors and by the
+ organizations responsible for operations of the networks.
+
+ Development, evolution, and revision are characteristic of
+ computer network protocols today, and this situation will
+ persist for some years. A vendor who develops computer
+ communication software for the Internet protocol suite (or any
+ other protocol suite!) and then fails to maintain and update
+ that software for changing specifications is going to leave a
+ trail of unhappy customers. The Internet is a large
+ communication network, and the users are in constant contact
+ through it. Experience has shown that knowledge of
+ deficiencies in vendor software propagates quickly through the
+ Internet technical community.
+
+ 1.2.2 Robustness Principle
+
+ At every layer of the protocols, there is a general rule whose
+ application can lead to enormous benefits in robustness and
+ interoperability [IP:1]:
+
+ "Be liberal in what you accept, and
+ conservative in what you send"
+
+ Software should be written to deal with every conceivable
+ error, no matter how unlikely; sooner or later a packet will
+ come in with that particular combination of errors and
+ attributes, and unless the software is prepared, chaos can
+ ensue. In general, it is best to assume that the network is
+ filled with malevolent entities that will send in packets
+ designed to have the worst possible effect. This assumption
+ will lead to suitable protective design, although the most
+ serious problems in the Internet have been caused by
+ unenvisaged mechanisms triggered by low-probability events;
+
+
+
+Internet Engineering Task Force [Page 12]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ mere human malice would never have taken so devious a course!
+
+ Adaptability to change must be designed into all levels of
+ Internet host software. As a simple example, consider a
+ protocol specification that contains an enumeration of values
+ for a particular header field -- e.g., a type field, a port
+ number, or an error code; this enumeration must be assumed to
+ be incomplete. Thus, if a protocol specification defines four
+ possible error codes, the software must not break when a fifth
+ code shows up. An undefined code might be logged (see below),
+ but it must not cause a failure.
+
+ The second part of the principle is almost as important:
+ software on other hosts may contain deficiencies that make it
+ unwise to exploit legal but obscure protocol features. It is
+ unwise to stray far from the obvious and simple, lest untoward
+ effects result elsewhere. A corollary of this is "watch out
+ for misbehaving hosts"; host software should be prepared, not
+ just to survive other misbehaving hosts, but also to cooperate
+ to limit the amount of disruption such hosts can cause to the
+ shared communication facility.
+
+ 1.2.3 Error Logging
+
+ The Internet includes a great variety of host and gateway
+ systems, each implementing many protocols and protocol layers,
+ and some of these contain bugs and mis-features in their
+ Internet protocol software. As a result of complexity,
+ diversity, and distribution of function, the diagnosis of
+ Internet problems is often very difficult.
+
+ Problem diagnosis will be aided if host implementations include
+ a carefully designed facility for logging erroneous or
+ "strange" protocol events. It is important to include as much
+ diagnostic information as possible when an error is logged. In
+ particular, it is often useful to record the header(s) of a
+ packet that caused an error. However, care must be taken to
+ ensure that error logging does not consume prohibitive amounts
+ of resources or otherwise interfere with the operation of the
+ host.
+
+ There is a tendency for abnormal but harmless protocol events
+ to overflow error logging files; this can be avoided by using a
+ "circular" log, or by enabling logging only while diagnosing a
+ known failure. It may be useful to filter and count duplicate
+ successive messages. One strategy that seems to work well is:
+ (1) always count abnormalities and make such counts accessible
+ through the management protocol (see [INTRO:1]); and (2) allow
+
+
+
+Internet Engineering Task Force [Page 13]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ the logging of a great variety of events to be selectively
+ enabled. For example, it might useful to be able to "log
+ everything" or to "log everything for host X".
+
+ Note that different managements may have differing policies
+ about the amount of error logging that they want normally
+ enabled in a host. Some will say, "if it doesn't hurt me, I
+ don't want to know about it", while others will want to take a
+ more watchful and aggressive attitude about detecting and
+ removing protocol abnormalities.
+
+ 1.2.4 Configuration
+
+ It would be ideal if a host implementation of the Internet
+ protocol suite could be entirely self-configuring. This would
+ allow the whole suite to be implemented in ROM or cast into
+ silicon, it would simplify diskless workstations, and it would
+ be an immense boon to harried LAN administrators as well as
+ system vendors. We have not reached this ideal; in fact, we
+ are not even close.
+
+ At many points in this document, you will find a requirement
+ that a parameter be a configurable option. There are several
+ different reasons behind such requirements. In a few cases,
+ there is current uncertainty or disagreement about the best
+ value, and it may be necessary to update the recommended value
+ in the future. In other cases, the value really depends on
+ external factors -- e.g., the size of the host and the
+ distribution of its communication load, or the speeds and
+ topology of nearby networks -- and self-tuning algorithms are
+ unavailable and may be insufficient. In some cases,
+ configurability is needed because of administrative
+ requirements.
+
+ Finally, some configuration options are required to communicate
+ with obsolete or incorrect implementations of the protocols,
+ distributed without sources, that unfortunately persist in many
+ parts of the Internet. To make correct systems coexist with
+ these faulty systems, administrators often have to "mis-
+ configure" the correct systems. This problem will correct
+ itself gradually as the faulty systems are retired, but it
+ cannot be ignored by vendors.
+
+ When we say that a parameter must be configurable, we do not
+ intend to require that its value be explicitly read from a
+ configuration file at every boot time. We recommend that
+ implementors set up a default for each parameter, so a
+ configuration file is only necessary to override those defaults
+
+
+
+Internet Engineering Task Force [Page 14]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ that are inappropriate in a particular installation. Thus, the
+ configurability requirement is an assurance that it will be
+ POSSIBLE to override the default when necessary, even in a
+ binary-only or ROM-based product.
+
+ This document requires a particular value for such defaults in
+ some cases. The choice of default is a sensitive issue when
+ the configuration item controls the accommodation to existing
+ faulty systems. If the Internet is to converge successfully to
+ complete interoperability, the default values built into
+ implementations must implement the official protocol, not
+ "mis-configurations" to accommodate faulty implementations.
+ Although marketing considerations have led some vendors to
+ choose mis-configuration defaults, we urge vendors to choose
+ defaults that will conform to the standard.
+
+ Finally, we note that a vendor needs to provide adequate
+ documentation on all configuration parameters, their limits and
+ effects.
+
+
+ 1.3 Reading this Document
+
+ 1.3.1 Organization
+
+ Protocol layering, which is generally used as an organizing
+ principle in implementing network software, has also been used
+ to organize this document. In describing the rules, we assume
+ that an implementation does strictly mirror the layering of the
+ protocols. Thus, the following three major sections specify
+ the requirements for the link layer, the internet layer, and
+ the transport layer, respectively. A companion RFC [INTRO:1]
+ covers application level software. This layerist organization
+ was chosen for simplicity and clarity.
+
+ However, strict layering is an imperfect model, both for the
+ protocol suite and for recommended implementation approaches.
+ Protocols in different layers interact in complex and sometimes
+ subtle ways, and particular functions often involve multiple
+ layers. There are many design choices in an implementation,
+ many of which involve creative "breaking" of strict layering.
+ Every implementor is urged to read references [INTRO:7] and
+ [INTRO:8].
+
+ This document describes the conceptual service interface
+ between layers using a functional ("procedure call") notation,
+ like that used in the TCP specification [TCP:1]. A host
+ implementation must support the logical information flow
+
+
+
+Internet Engineering Task Force [Page 15]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ implied by these calls, but need not literally implement the
+ calls themselves. For example, many implementations reflect
+ the coupling between the transport layer and the IP layer by
+ giving them shared access to common data structures. These
+ data structures, rather than explicit procedure calls, are then
+ the agency for passing much of the information that is
+ required.
+
+ In general, each major section of this document is organized
+ into the following subsections:
+
+ (1) Introduction
+
+ (2) Protocol Walk-Through -- considers the protocol
+ specification documents section-by-section, correcting
+ errors, stating requirements that may be ambiguous or
+ ill-defined, and providing further clarification or
+ explanation.
+
+ (3) Specific Issues -- discusses protocol design and
+ implementation issues that were not included in the walk-
+ through.
+
+ (4) Interfaces -- discusses the service interface to the next
+ higher layer.
+
+ (5) Summary -- contains a summary of the requirements of the
+ section.
+
+
+ Under many of the individual topics in this document, there is
+ parenthetical material labeled "DISCUSSION" or
+ "IMPLEMENTATION". This material is intended to give
+ clarification and explanation of the preceding requirements
+ text. It also includes some suggestions on possible future
+ directions or developments. The implementation material
+ contains suggested approaches that an implementor may want to
+ consider.
+
+ The summary sections are intended to be guides and indexes to
+ the text, but are necessarily cryptic and incomplete. The
+ summaries should never be used or referenced separately from
+ the complete RFC.
+
+ 1.3.2 Requirements
+
+ In this document, the words that are used to define the
+ significance of each particular requirement are capitalized.
+
+
+
+Internet Engineering Task Force [Page 16]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ These words are:
+
+ * "MUST"
+
+ This word or the adjective "REQUIRED" means that the item
+ is an absolute requirement of the specification.
+
+ * "SHOULD"
+
+ This word or the adjective "RECOMMENDED" means that there
+ may exist valid reasons in particular circumstances to
+ ignore this item, but the full implications should be
+ understood and the case carefully weighed before choosing
+ a different course.
+
+ * "MAY"
+
+ This word or the adjective "OPTIONAL" means that this item
+ is truly optional. One vendor may choose to include the
+ item because a particular marketplace requires it or
+ because it enhances the product, for example; another
+ vendor may omit the same item.
+
+
+ An implementation is not compliant if it fails to satisfy one
+ or more of the MUST requirements for the protocols it
+ implements. An implementation that satisfies all the MUST and
+ all the SHOULD requirements for its protocols is said to be
+ "unconditionally compliant"; one that satisfies all the MUST
+ requirements but not all the SHOULD requirements for its
+ protocols is said to be "conditionally compliant".
+
+ 1.3.3 Terminology
+
+ This document uses the following technical terms:
+
+ Segment
+ A segment is the unit of end-to-end transmission in the
+ TCP protocol. A segment consists of a TCP header followed
+ by application data. A segment is transmitted by
+ encapsulation inside an IP datagram.
+
+ Message
+ In this description of the lower-layer protocols, a
+ message is the unit of transmission in a transport layer
+ protocol. In particular, a TCP segment is a message. A
+ message consists of a transport protocol header followed
+ by application protocol data. To be transmitted end-to-
+
+
+
+Internet Engineering Task Force [Page 17]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ end through the Internet, a message must be encapsulated
+ inside a datagram.
+
+ IP Datagram
+ An IP datagram is the unit of end-to-end transmission in
+ the IP protocol. An IP datagram consists of an IP header
+ followed by transport layer data, i.e., of an IP header
+ followed by a message.
+
+ In the description of the internet layer (Section 3), the
+ unqualified term "datagram" should be understood to refer
+ to an IP datagram.
+
+ Packet
+ A packet is the unit of data passed across the interface
+ between the internet layer and the link layer. It
+ includes an IP header and data. A packet may be a
+ complete IP datagram or a fragment of an IP datagram.
+
+ Frame
+ A frame is the unit of transmission in a link layer
+ protocol, and consists of a link-layer header followed by
+ a packet.
+
+ Connected Network
+ A network to which a host is interfaced is often known as
+ the "local network" or the "subnetwork" relative to that
+ host. However, these terms can cause confusion, and
+ therefore we use the term "connected network" in this
+ document.
+
+ Multihomed
+ A host is said to be multihomed if it has multiple IP
+ addresses. For a discussion of multihoming, see Section
+ 3.3.4 below.
+
+ Physical network interface
+ This is a physical interface to a connected network and
+ has a (possibly unique) link-layer address. Multiple
+ physical network interfaces on a single host may share the
+ same link-layer address, but the address must be unique
+ for different hosts on the same physical network.
+
+ Logical [network] interface
+ We define a logical [network] interface to be a logical
+ path, distinguished by a unique IP address, to a connected
+ network. See Section 3.3.4.
+
+
+
+
+Internet Engineering Task Force [Page 18]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ Specific-destination address
+ This is the effective destination address of a datagram,
+ even if it is broadcast or multicast; see Section 3.2.1.3.
+
+ Path
+ At a given moment, all the IP datagrams from a particular
+ source host to a particular destination host will
+ typically traverse the same sequence of gateways. We use
+ the term "path" for this sequence. Note that a path is
+ uni-directional; it is not unusual to have different paths
+ in the two directions between a given host pair.
+
+ MTU
+ The maximum transmission unit, i.e., the size of the
+ largest packet that can be transmitted.
+
+
+ The terms frame, packet, datagram, message, and segment are
+ illustrated by the following schematic diagrams:
+
+ A. Transmission on connected network:
+ _______________________________________________
+ | LL hdr | IP hdr | (data) |
+ |________|________|_____________________________|
+
+ <---------- Frame ----------------------------->
+ <----------Packet -------------------->
+
+
+ B. Before IP fragmentation or after IP reassembly:
+ ______________________________________
+ | IP hdr | transport| Application Data |
+ |________|____hdr___|__________________|
+
+ <-------- Datagram ------------------>
+ <-------- Message ----------->
+ or, for TCP:
+ ______________________________________
+ | IP hdr | TCP hdr | Application Data |
+ |________|__________|__________________|
+
+ <-------- Datagram ------------------>
+ <-------- Segment ----------->
+
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 19]
+
+
+
+
+RFC1122 INTRODUCTION October 1989
+
+
+ 1.4 Acknowledgments
+
+ This document incorporates contributions and comments from a large
+ group of Internet protocol experts, including representatives of
+ university and research labs, vendors, and government agencies.
+ It was assembled primarily by the Host Requirements Working Group
+ of the Internet Engineering Task Force (IETF).
+
+ The Editor would especially like to acknowledge the tireless
+ dedication of the following people, who attended many long
+ meetings and generated 3 million bytes of electronic mail over the
+ past 18 months in pursuit of this document: Philip Almquist, Dave
+ Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve
+ Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore),
+ John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG),
+ Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge
+ (BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software).
+
+ In addition, the following people made major contributions to the
+ effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia
+ (BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA),
+ Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL),
+ John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill
+ Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul
+ (DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue
+ Technology), and Mike StJohns (DCA). The following also made
+ significant contributions to particular areas: Eric Allman
+ (Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic
+ (Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn
+ (IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann
+ (Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun
+ Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen
+ (Toronto).
+
+ We are grateful to all, including any contributors who may have
+ been inadvertently omitted from this list.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 20]
+
+
+
+
+RFC1122 LINK LAYER October 1989
+
+
+2. LINK LAYER
+
+ 2.1 INTRODUCTION
+
+ All Internet systems, both hosts and gateways, have the same
+ requirements for link layer protocols. These requirements are
+ given in Chapter 3 of "Requirements for Internet Gateways"
+ [INTRO:2], augmented with the material in this section.
+
+ 2.2 PROTOCOL WALK-THROUGH
+
+ None.
+
+ 2.3 SPECIFIC ISSUES
+
+ 2.3.1 Trailer Protocol Negotiation
+
+ The trailer protocol [LINK:1] for link-layer encapsulation MAY
+ be used, but only when it has been verified that both systems
+ (host or gateway) involved in the link-layer communication
+ implement trailers. If the system does not dynamically
+ negotiate use of the trailer protocol on a per-destination
+ basis, the default configuration MUST disable the protocol.
+
+ DISCUSSION:
+ The trailer protocol is a link-layer encapsulation
+ technique that rearranges the data contents of packets
+ sent on the physical network. In some cases, trailers
+ improve the throughput of higher layer protocols by
+ reducing the amount of data copying within the operating
+ system. Higher layer protocols are unaware of trailer
+ use, but both the sending and receiving host MUST
+ understand the protocol if it is used.
+
+ Improper use of trailers can result in very confusing
+ symptoms. Only packets with specific size attributes are
+ encapsulated using trailers, and typically only a small
+ fraction of the packets being exchanged have these
+ attributes. Thus, if a system using trailers exchanges
+ packets with a system that does not, some packets
+ disappear into a black hole while others are delivered
+ successfully.
+
+ IMPLEMENTATION:
+ On an Ethernet, packets encapsulated with trailers use a
+ distinct Ethernet type [LINK:1], and trailer negotiation
+ is performed at the time that ARP is used to discover the
+ link-layer address of a destination system.
+
+
+
+Internet Engineering Task Force [Page 21]
+
+
+
+
+RFC1122 LINK LAYER October 1989
+
+
+ Specifically, the ARP exchange is completed in the usual
+ manner using the normal IP protocol type, but a host that
+ wants to speak trailers will send an additional "trailer
+ ARP reply" packet, i.e., an ARP reply that specifies the
+ trailer encapsulation protocol type but otherwise has the
+ format of a normal ARP reply. If a host configured to use
+ trailers receives a trailer ARP reply message from a
+ remote machine, it can add that machine to the list of
+ machines that understand trailers, e.g., by marking the
+ corresponding entry in the ARP cache.
+
+ Hosts wishing to receive trailer encapsulations send
+ trailer ARP replies whenever they complete exchanges of
+ normal ARP messages for IP. Thus, a host that received an
+ ARP request for its IP protocol address would send a
+ trailer ARP reply in addition to the normal IP ARP reply;
+ a host that sent the IP ARP request would send a trailer
+ ARP reply when it received the corresponding IP ARP reply.
+ In this way, either the requesting or responding host in
+ an IP ARP exchange may request that it receive trailer
+ encapsulations.
+
+ This scheme, using extra trailer ARP reply packets rather
+ than sending an ARP request for the trailer protocol type,
+ was designed to avoid a continuous exchange of ARP packets
+ with a misbehaving host that, contrary to any
+ specification or common sense, responded to an ARP reply
+ for trailers with another ARP reply for IP. This problem
+ is avoided by sending a trailer ARP reply in response to
+ an IP ARP reply only when the IP ARP reply answers an
+ outstanding request; this is true when the hardware
+ address for the host is still unknown when the IP ARP
+ reply is received. A trailer ARP reply may always be sent
+ along with an IP ARP reply responding to an IP ARP
+ request.
+
+ 2.3.2 Address Resolution Protocol -- ARP
+
+ 2.3.2.1 ARP Cache Validation
+
+ An implementation of the Address Resolution Protocol (ARP)
+ [LINK:2] MUST provide a mechanism to flush out-of-date cache
+ entries. If this mechanism involves a timeout, it SHOULD be
+ possible to configure the timeout value.
+
+ A mechanism to prevent ARP flooding (repeatedly sending an
+ ARP Request for the same IP address, at a high rate) MUST be
+ included. The recommended maximum rate is 1 per second per
+
+
+
+Internet Engineering Task Force [Page 22]
+
+
+
+
+RFC1122 LINK LAYER October 1989
+
+
+ destination.
+
+ DISCUSSION:
+ The ARP specification [LINK:2] suggests but does not
+ require a timeout mechanism to invalidate cache entries
+ when hosts change their Ethernet addresses. The
+ prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
+ has significantly increased the likelihood that cache
+ entries in hosts will become invalid, and therefore
+ some ARP-cache invalidation mechanism is now required
+ for hosts. Even in the absence of proxy ARP, a long-
+ period cache timeout is useful in order to
+ automatically correct any bad ARP data that might have
+ been cached.
+
+ IMPLEMENTATION:
+ Four mechanisms have been used, sometimes in
+ combination, to flush out-of-date cache entries.
+
+ (1) Timeout -- Periodically time out cache entries,
+ even if they are in use. Note that this timeout
+ should be restarted when the cache entry is
+ "refreshed" (by observing the source fields,
+ regardless of target address, of an ARP broadcast
+ from the system in question). For proxy ARP
+ situations, the timeout needs to be on the order
+ of a minute.
+
+ (2) Unicast Poll -- Actively poll the remote host by
+ periodically sending a point-to-point ARP Request
+ to it, and delete the entry if no ARP Reply is
+ received from N successive polls. Again, the
+ timeout should be on the order of a minute, and
+ typically N is 2.
+
+ (3) Link-Layer Advice -- If the link-layer driver
+ detects a delivery problem, flush the
+ corresponding ARP cache entry.
+
+ (4) Higher-layer Advice -- Provide a call from the
+ Internet layer to the link layer to indicate a
+ delivery problem. The effect of this call would
+ be to invalidate the corresponding cache entry.
+ This call would be analogous to the
+ "ADVISE_DELIVPROB()" call from the transport layer
+ to the Internet layer (see Section 3.4), and in
+ fact the ADVISE_DELIVPROB routine might in turn
+ call the link-layer advice routine to invalidate
+
+
+
+Internet Engineering Task Force [Page 23]
+
+
+
+
+RFC1122 LINK LAYER October 1989
+
+
+ the ARP cache entry.
+
+ Approaches (1) and (2) involve ARP cache timeouts on
+ the order of a minute or less. In the absence of proxy
+ ARP, a timeout this short could create noticeable
+ overhead traffic on a very large Ethernet. Therefore,
+ it may be necessary to configure a host to lengthen the
+ ARP cache timeout.
+
+ 2.3.2.2 ARP Packet Queue
+
+ The link layer SHOULD save (rather than discard) at least
+ one (the latest) packet of each set of packets destined to
+ the same unresolved IP address, and transmit the saved
+ packet when the address has been resolved.
+
+ DISCUSSION:
+ Failure to follow this recommendation causes the first
+ packet of every exchange to be lost. Although higher-
+ layer protocols can generally cope with packet loss by
+ retransmission, packet loss does impact performance.
+ For example, loss of a TCP open request causes the
+ initial round-trip time estimate to be inflated. UDP-
+ based applications such as the Domain Name System are
+ more seriously affected.
+
+ 2.3.3 Ethernet and IEEE 802 Encapsulation
+
+ The IP encapsulation for Ethernets is described in RFC-894
+ [LINK:3], while RFC-1042 [LINK:4] describes the IP
+ encapsulation for IEEE 802 networks. RFC-1042 elaborates and
+ replaces the discussion in Section 3.4 of [INTRO:2].
+
+ Every Internet host connected to a 10Mbps Ethernet cable:
+
+ o MUST be able to send and receive packets using RFC-894
+ encapsulation;
+
+ o SHOULD be able to receive RFC-1042 packets, intermixed
+ with RFC-894 packets; and
+
+ o MAY be able to send packets using RFC-1042 encapsulation.
+
+
+ An Internet host that implements sending both the RFC-894 and
+ the RFC-1042 encapsulations MUST provide a configuration switch
+ to select which is sent, and this switch MUST default to RFC-
+ 894.
+
+
+
+Internet Engineering Task Force [Page 24]
+
+
+
+
+RFC1122 LINK LAYER October 1989
+
+
+ Note that the standard IP encapsulation in RFC-1042 does not
+ use the protocol id value (K1=6) that IEEE reserved for IP;
+ instead, it uses a value (K1=170) that implies an extension
+ (the "SNAP") which can be used to hold the Ether-Type field.
+ An Internet system MUST NOT send 802 packets using K1=6.
+
+ Address translation from Internet addresses to link-layer
+ addresses on Ethernet and IEEE 802 networks MUST be managed by
+ the Address Resolution Protocol (ARP).
+
+ The MTU for an Ethernet is 1500 and for 802.3 is 1492.
+
+ DISCUSSION:
+ The IEEE 802.3 specification provides for operation over a
+ 10Mbps Ethernet cable, in which case Ethernet and IEEE
+ 802.3 frames can be physically intermixed. A receiver can
+ distinguish Ethernet and 802.3 frames by the value of the
+ 802.3 Length field; this two-octet field coincides in the
+ header with the Ether-Type field of an Ethernet frame. In
+ particular, the 802.3 Length field must be less than or
+ equal to 1500, while all valid Ether-Type values are
+ greater than 1500.
+
+ Another compatibility problem arises with link-layer
+ broadcasts. A broadcast sent with one framing will not be
+ seen by hosts that can receive only the other framing.
+
+ The provisions of this section were designed to provide
+ direct interoperation between 894-capable and 1042-capable
+ systems on the same cable, to the maximum extent possible.
+ It is intended to support the present situation where
+ 894-only systems predominate, while providing an easy
+ transition to a possible future in which 1042-capable
+ systems become common.
+
+ Note that 894-only systems cannot interoperate directly
+ with 1042-only systems. If the two system types are set
+ up as two different logical networks on the same cable,
+ they can communicate only through an IP gateway.
+ Furthermore, it is not useful or even possible for a
+ dual-format host to discover automatically which format to
+ send, because of the problem of link-layer broadcasts.
+
+ 2.4 LINK/INTERNET LAYER INTERFACE
+
+ The packet receive interface between the IP layer and the link
+ layer MUST include a flag to indicate whether the incoming packet
+ was addressed to a link-layer broadcast address.
+
+
+
+Internet Engineering Task Force [Page 25]
+
+
+
+
+RFC1122 LINK LAYER October 1989
+
+
+ DISCUSSION
+ Although the IP layer does not generally know link layer
+ addresses (since every different network medium typically has
+ a different address format), the broadcast address on a
+ broadcast-capable medium is an important special case. See
+ Section 3.2.2, especially the DISCUSSION concerning broadcast
+ storms.
+
+ The packet send interface between the IP and link layers MUST
+ include the 5-bit TOS field (see Section 3.2.1.6).
+
+ The link layer MUST NOT report a Destination Unreachable error to
+ IP solely because there is no ARP cache entry for a destination.
+
+ 2.5 LINK LAYER REQUIREMENTS SUMMARY
+
+ | | | | |S| |
+ | | | | |H| |F
+ | | | | |O|M|o
+ | | |S| |U|U|o
+ | | |H| |L|S|t
+ | |M|O| |D|T|n
+ | |U|U|M| | |o
+ | |S|L|A|N|N|t
+ | |T|D|Y|O|O|t
+FEATURE |SECTION| | | |T|T|e
+--------------------------------------------------|-------|-|-|-|-|-|--
+ | | | | | | |
+Trailer encapsulation |2.3.1 | | |x| | |
+Send Trailers by default without negotiation |2.3.1 | | | | |x|
+ARP |2.3.2 | | | | | |
+ Flush out-of-date ARP cache entries |2.3.2.1|x| | | | |
+ Prevent ARP floods |2.3.2.1|x| | | | |
+ Cache timeout configurable |2.3.2.1| |x| | | |
+ Save at least one (latest) unresolved pkt |2.3.2.2| |x| | | |
+Ethernet and IEEE 802 Encapsulation |2.3.3 | | | | | |
+ Host able to: |2.3.3 | | | | | |
+ Send & receive RFC-894 encapsulation |2.3.3 |x| | | | |
+ Receive RFC-1042 encapsulation |2.3.3 | |x| | | |
+ Send RFC-1042 encapsulation |2.3.3 | | |x| | |
+ Then config. sw. to select, RFC-894 dflt |2.3.3 |x| | | | |
+ Send K1=6 encapsulation |2.3.3 | | | | |x|
+ Use ARP on Ethernet and IEEE 802 nets |2.3.3 |x| | | | |
+Link layer report b'casts to IP layer |2.4 |x| | | | |
+IP layer pass TOS to link layer |2.4 |x| | | | |
+No ARP cache entry treated as Dest. Unreach. |2.4 | | | | |x|
+
+
+
+
+
+Internet Engineering Task Force [Page 26]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+3. INTERNET LAYER PROTOCOLS
+
+ 3.1 INTRODUCTION
+
+ The Robustness Principle: "Be liberal in what you accept, and
+ conservative in what you send" is particularly important in the
+ Internet layer, where one misbehaving host can deny Internet
+ service to many other hosts.
+
+ The protocol standards used in the Internet layer are:
+
+ o RFC-791 [IP:1] defines the IP protocol and gives an
+ introduction to the architecture of the Internet.
+
+ o RFC-792 [IP:2] defines ICMP, which provides routing,
+ diagnostic and error functionality for IP. Although ICMP
+ messages are encapsulated within IP datagrams, ICMP
+ processing is considered to be (and is typically implemented
+ as) part of the IP layer. See Section 3.2.2.
+
+ o RFC-950 [IP:3] defines the mandatory subnet extension to the
+ addressing architecture.
+
+ o RFC-1112 [IP:4] defines the Internet Group Management
+ Protocol IGMP, as part of a recommended extension to hosts
+ and to the host-gateway interface to support Internet-wide
+ multicasting at the IP level. See Section 3.2.3.
+
+ The target of an IP multicast may be an arbitrary group of
+ Internet hosts. IP multicasting is designed as a natural
+ extension of the link-layer multicasting facilities of some
+ networks, and it provides a standard means for local access
+ to such link-layer multicasting facilities.
+
+ Other important references are listed in Section 5 of this
+ document.
+
+ The Internet layer of host software MUST implement both IP and
+ ICMP. See Section 3.3.7 for the requirements on support of IGMP.
+
+ The host IP layer has two basic functions: (1) choose the "next
+ hop" gateway or host for outgoing IP datagrams and (2) reassemble
+ incoming IP datagrams. The IP layer may also (3) implement
+ intentional fragmentation of outgoing datagrams. Finally, the IP
+ layer must (4) provide diagnostic and error functionality. We
+ expect that IP layer functions may increase somewhat in the
+ future, as further Internet control and management facilities are
+ developed.
+
+
+
+Internet Engineering Task Force [Page 27]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ For normal datagrams, the processing is straightforward. For
+ incoming datagrams, the IP layer:
+
+ (1) verifies that the datagram is correctly formatted;
+
+ (2) verifies that it is destined to the local host;
+
+ (3) processes options;
+
+ (4) reassembles the datagram if necessary; and
+
+ (5) passes the encapsulated message to the appropriate
+ transport-layer protocol module.
+
+ For outgoing datagrams, the IP layer:
+
+ (1) sets any fields not set by the transport layer;
+
+ (2) selects the correct first hop on the connected network (a
+ process called "routing");
+
+ (3) fragments the datagram if necessary and if intentional
+ fragmentation is implemented (see Section 3.3.3); and
+
+ (4) passes the packet(s) to the appropriate link-layer driver.
+
+
+ A host is said to be multihomed if it has multiple IP addresses.
+ Multihoming introduces considerable confusion and complexity into
+ the protocol suite, and it is an area in which the Internet
+ architecture falls seriously short of solving all problems. There
+ are two distinct problem areas in multihoming:
+
+ (1) Local multihoming -- the host itself is multihomed; or
+
+ (2) Remote multihoming -- the local host needs to communicate
+ with a remote multihomed host.
+
+ At present, remote multihoming MUST be handled at the application
+ layer, as discussed in the companion RFC [INTRO:1]. A host MAY
+ support local multihoming, which is discussed in this document,
+ and in particular in Section 3.3.4.
+
+ Any host that forwards datagrams generated by another host is
+ acting as a gateway and MUST also meet the specifications laid out
+ in the gateway requirements RFC [INTRO:2]. An Internet host that
+ includes embedded gateway code MUST have a configuration switch to
+ disable the gateway function, and this switch MUST default to the
+
+
+
+Internet Engineering Task Force [Page 28]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ non-gateway mode. In this mode, a datagram arriving through one
+ interface will not be forwarded to another host or gateway (unless
+ it is source-routed), regardless of whether the host is single-
+ homed or multihomed. The host software MUST NOT automatically
+ move into gateway mode if the host has more than one interface, as
+ the operator of the machine may neither want to provide that
+ service nor be competent to do so.
+
+ In the following, the action specified in certain cases is to
+ "silently discard" a received datagram. This means that the
+ datagram will be discarded without further processing and that the
+ host will not send any ICMP error message (see Section 3.2.2) as a
+ result. However, for diagnosis of problems a host SHOULD provide
+ the capability of logging the error (see Section 1.2.3), including
+ the contents of the silently-discarded datagram, and SHOULD record
+ the event in a statistics counter.
+
+ DISCUSSION:
+ Silent discard of erroneous datagrams is generally intended
+ to prevent "broadcast storms".
+
+ 3.2 PROTOCOL WALK-THROUGH
+
+ 3.2.1 Internet Protocol -- IP
+
+ 3.2.1.1 Version Number: RFC-791 Section 3.1
+
+ A datagram whose version number is not 4 MUST be silently
+ discarded.
+
+ 3.2.1.2 Checksum: RFC-791 Section 3.1
+
+ A host MUST verify the IP header checksum on every received
+ datagram and silently discard every datagram that has a bad
+ checksum.
+
+ 3.2.1.3 Addressing: RFC-791 Section 3.2
+
+ There are now five classes of IP addresses: Class A through
+ Class E. Class D addresses are used for IP multicasting
+ [IP:4], while Class E addresses are reserved for
+ experimental use.
+
+ A multicast (Class D) address is a 28-bit logical address
+ that stands for a group of hosts, and may be either
+ permanent or transient. Permanent multicast addresses are
+ allocated by the Internet Assigned Number Authority
+ [INTRO:6], while transient addresses may be allocated
+
+
+
+Internet Engineering Task Force [Page 29]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ dynamically to transient groups. Group membership is
+ determined dynamically using IGMP [IP:4].
+
+ We now summarize the important special cases for Class A, B,
+ and C IP addresses, using the following notation for an IP
+ address:
+
+ { <Network-number>, <Host-number> }
+
+ or
+ { <Network-number>, <Subnet-number>, <Host-number> }
+
+ and the notation "-1" for a field that contains all 1 bits.
+ This notation is not intended to imply that the 1-bits in an
+ address mask need be contiguous.
+
+ (a) { 0, 0 }
+
+ This host on this network. MUST NOT be sent, except as
+ a source address as part of an initialization procedure
+ by which the host learns its own IP address.
+
+ See also Section 3.3.6 for a non-standard use of {0,0}.
+
+ (b) { 0, <Host-number> }
+
+ Specified host on this network. It MUST NOT be sent,
+ except as a source address as part of an initialization
+ procedure by which the host learns its full IP address.
+
+ (c) { -1, -1 }
+
+ Limited broadcast. It MUST NOT be used as a source
+ address.
+
+ A datagram with this destination address will be
+ received by every host on the connected physical
+ network but will not be forwarded outside that network.
+
+ (d) { <Network-number>, -1 }
+
+ Directed broadcast to the specified network. It MUST
+ NOT be used as a source address.
+
+ (e) { <Network-number>, <Subnet-number>, -1 }
+
+ Directed broadcast to the specified subnet. It MUST
+ NOT be used as a source address.
+
+
+
+Internet Engineering Task Force [Page 30]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ (f) { <Network-number>, -1, -1 }
+
+ Directed broadcast to all subnets of the specified
+ subnetted network. It MUST NOT be used as a source
+ address.
+
+ (g) { 127, <any> }
+
+ Internal host loopback address. Addresses of this form
+ MUST NOT appear outside a host.
+
+ The <Network-number> is administratively assigned so that
+ its value will be unique in the entire world.
+
+ IP addresses are not permitted to have the value 0 or -1 for
+ any of the <Host-number>, <Network-number>, or <Subnet-
+ number> fields (except in the special cases listed above).
+ This implies that each of these fields will be at least two
+ bits long.
+
+ For further discussion of broadcast addresses, see Section
+ 3.3.6.
+
+ A host MUST support the subnet extensions to IP [IP:3]. As
+ a result, there will be an address mask of the form:
+ {-1, -1, 0} associated with each of the host's local IP
+ addresses; see Sections 3.2.2.9 and 3.3.1.1.
+
+ When a host sends any datagram, the IP source address MUST
+ be one of its own IP addresses (but not a broadcast or
+ multicast address).
+
+ A host MUST silently discard an incoming datagram that is
+ not destined for the host. An incoming datagram is destined
+ for the host if the datagram's destination address field is:
+
+ (1) (one of) the host's IP address(es); or
+
+ (2) an IP broadcast address valid for the connected
+ network; or
+
+ (3) the address for a multicast group of which the host is
+ a member on the incoming physical interface.
+
+ For most purposes, a datagram addressed to a broadcast or
+ multicast destination is processed as if it had been
+ addressed to one of the host's IP addresses; we use the term
+ "specific-destination address" for the equivalent local IP
+
+
+
+Internet Engineering Task Force [Page 31]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ address of the host. The specific-destination address is
+ defined to be the destination address in the IP header
+ unless the header contains a broadcast or multicast address,
+ in which case the specific-destination is an IP address
+ assigned to the physical interface on which the datagram
+ arrived.
+
+ A host MUST silently discard an incoming datagram containing
+ an IP source address that is invalid by the rules of this
+ section. This validation could be done in either the IP
+ layer or by each protocol in the transport layer.
+
+ DISCUSSION:
+ A mis-addressed datagram might be caused by a link-
+ layer broadcast of a unicast datagram or by a gateway
+ or host that is confused or mis-configured.
+
+ An architectural goal for Internet hosts was to allow
+ IP addresses to be featureless 32-bit numbers, avoiding
+ algorithms that required a knowledge of the IP address
+ format. Otherwise, any future change in the format or
+ interpretation of IP addresses will require host
+ software changes. However, validation of broadcast and
+ multicast addresses violates this goal; a few other
+ violations are described elsewhere in this document.
+
+ Implementers should be aware that applications
+ depending upon the all-subnets directed broadcast
+ address (f) may be unusable on some networks. All-
+ subnets broadcast is not widely implemented in vendor
+ gateways at present, and even when it is implemented, a
+ particular network administration may disable it in the
+ gateway configuration.
+
+ 3.2.1.4 Fragmentation and Reassembly: RFC-791 Section 3.2
+
+ The Internet model requires that every host support
+ reassembly. See Sections 3.3.2 and 3.3.3 for the
+ requirements on fragmentation and reassembly.
+
+ 3.2.1.5 Identification: RFC-791 Section 3.2
+
+ When sending an identical copy of an earlier datagram, a
+ host MAY optionally retain the same Identification field in
+ the copy.
+
+
+
+
+
+
+Internet Engineering Task Force [Page 32]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ DISCUSSION:
+ Some Internet protocol experts have maintained that
+ when a host sends an identical copy of an earlier
+ datagram, the new copy should contain the same
+ Identification value as the original. There are two
+ suggested advantages: (1) if the datagrams are
+ fragmented and some of the fragments are lost, the
+ receiver may be able to reconstruct a complete datagram
+ from fragments of the original and the copies; (2) a
+ congested gateway might use the IP Identification field
+ (and Fragment Offset) to discard duplicate datagrams
+ from the queue.
+
+ However, the observed patterns of datagram loss in the
+ Internet do not favor the probability of retransmitted
+ fragments filling reassembly gaps, while other
+ mechanisms (e.g., TCP repacketizing upon
+ retransmission) tend to prevent retransmission of an
+ identical datagram [IP:9]. Therefore, we believe that
+ retransmitting the same Identification field is not
+ useful. Also, a connectionless transport protocol like
+ UDP would require the cooperation of the application
+ programs to retain the same Identification value in
+ identical datagrams.
+
+ 3.2.1.6 Type-of-Service: RFC-791 Section 3.2
+
+ The "Type-of-Service" byte in the IP header is divided into
+ two sections: the Precedence field (high-order 3 bits), and
+ a field that is customarily called "Type-of-Service" or
+ "TOS" (low-order 5 bits). In this document, all references
+ to "TOS" or the "TOS field" refer to the low-order 5 bits
+ only.
+
+ The Precedence field is intended for Department of Defense
+ applications of the Internet protocols. The use of non-zero
+ values in this field is outside the scope of this document
+ and the IP standard specification. Vendors should consult
+ the Defense Communication Agency (DCA) for guidance on the
+ IP Precedence field and its implications for other protocol
+ layers. However, vendors should note that the use of
+ precedence will most likely require that its value be passed
+ between protocol layers in just the same way as the TOS
+ field is passed.
+
+ The IP layer MUST provide a means for the transport layer to
+ set the TOS field of every datagram that is sent; the
+ default is all zero bits. The IP layer SHOULD pass received
+
+
+
+Internet Engineering Task Force [Page 33]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ TOS values up to the transport layer.
+
+ The particular link-layer mappings of TOS contained in RFC-
+ 795 SHOULD NOT be implemented.
+
+ DISCUSSION:
+ While the TOS field has been little used in the past,
+ it is expected to play an increasing role in the near
+ future. The TOS field is expected to be used to
+ control two aspects of gateway operations: routing and
+ queueing algorithms. See Section 2 of [INTRO:1] for
+ the requirements on application programs to specify TOS
+ values.
+
+ The TOS field may also be mapped into link-layer
+ service selectors. This has been applied to provide
+ effective sharing of serial lines by different classes
+ of TCP traffic, for example. However, the mappings
+ suggested in RFC-795 for networks that were included in
+ the Internet as of 1981 are now obsolete.
+
+ 3.2.1.7 Time-to-Live: RFC-791 Section 3.2
+
+ A host MUST NOT send a datagram with a Time-to-Live (TTL)
+ value of zero.
+
+ A host MUST NOT discard a datagram just because it was
+ received with TTL less than 2.
+
+ The IP layer MUST provide a means for the transport layer to
+ set the TTL field of every datagram that is sent. When a
+ fixed TTL value is used, it MUST be configurable. The
+ current suggested value will be published in the "Assigned
+ Numbers" RFC.
+
+ DISCUSSION:
+ The TTL field has two functions: limit the lifetime of
+ TCP segments (see RFC-793 [TCP:1], p. 28), and
+ terminate Internet routing loops. Although TTL is a
+ time in seconds, it also has some attributes of a hop-
+ count, since each gateway is required to reduce the TTL
+ field by at least one.
+
+ The intent is that TTL expiration will cause a datagram
+ to be discarded by a gateway but not by the destination
+ host; however, hosts that act as gateways by forwarding
+ datagrams must follow the gateway rules for TTL.
+
+
+
+
+Internet Engineering Task Force [Page 34]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ A higher-layer protocol may want to set the TTL in
+ order to implement an "expanding scope" search for some
+ Internet resource. This is used by some diagnostic
+ tools, and is expected to be useful for locating the
+ "nearest" server of a given class using IP
+ multicasting, for example. A particular transport
+ protocol may also want to specify its own TTL bound on
+ maximum datagram lifetime.
+
+ A fixed value must be at least big enough for the
+ Internet "diameter," i.e., the longest possible path.
+ A reasonable value is about twice the diameter, to
+ allow for continued Internet growth.
+
+ 3.2.1.8 Options: RFC-791 Section 3.2
+
+ There MUST be a means for the transport layer to specify IP
+ options to be included in transmitted IP datagrams (see
+ Section 3.4).
+
+ All IP options (except NOP or END-OF-LIST) received in
+ datagrams MUST be passed to the transport layer (or to ICMP
+ processing when the datagram is an ICMP message). The IP
+ and transport layer MUST each interpret those IP options
+ that they understand and silently ignore the others.
+
+ Later sections of this document discuss specific IP option
+ support required by each of ICMP, TCP, and UDP.
+
+ DISCUSSION:
+ Passing all received IP options to the transport layer
+ is a deliberate "violation of strict layering" that is
+ designed to ease the introduction of new transport-
+ relevant IP options in the future. Each layer must
+ pick out any options that are relevant to its own
+ processing and ignore the rest. For this purpose,
+ every IP option except NOP and END-OF-LIST will include
+ a specification of its own length.
+
+ This document does not define the order in which a
+ receiver must process multiple options in the same IP
+ header. Hosts sending multiple options must be aware
+ that this introduces an ambiguity in the meaning of
+ certain options when combined with a source-route
+ option.
+
+ IMPLEMENTATION:
+ The IP layer must not crash as the result of an option
+
+
+
+Internet Engineering Task Force [Page 35]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ length that is outside the possible range. For
+ example, erroneous option lengths have been observed to
+ put some IP implementations into infinite loops.
+
+ Here are the requirements for specific IP options:
+
+
+ (a) Security Option
+
+ Some environments require the Security option in every
+ datagram; such a requirement is outside the scope of
+ this document and the IP standard specification. Note,
+ however, that the security options described in RFC-791
+ and RFC-1038 are obsolete. For DoD applications,
+ vendors should consult [IP:8] for guidance.
+
+
+ (b) Stream Identifier Option
+
+ This option is obsolete; it SHOULD NOT be sent, and it
+ MUST be silently ignored if received.
+
+
+ (c) Source Route Options
+
+ A host MUST support originating a source route and MUST
+ be able to act as the final destination of a source
+ route.
+
+ If host receives a datagram containing a completed
+ source route (i.e., the pointer points beyond the last
+ field), the datagram has reached its final destination;
+ the option as received (the recorded route) MUST be
+ passed up to the transport layer (or to ICMP message
+ processing). This recorded route will be reversed and
+ used to form a return source route for reply datagrams
+ (see discussion of IP Options in Section 4). When a
+ return source route is built, it MUST be correctly
+ formed even if the recorded route included the source
+ host (see case (B) in the discussion below).
+
+ An IP header containing more than one Source Route
+ option MUST NOT be sent; the effect on routing of
+ multiple Source Route options is implementation-
+ specific.
+
+ Section 3.3.5 presents the rules for a host acting as
+ an intermediate hop in a source route, i.e., forwarding
+
+
+
+Internet Engineering Task Force [Page 36]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ a source-routed datagram.
+
+ DISCUSSION:
+ If a source-routed datagram is fragmented, each
+ fragment will contain a copy of the source route.
+ Since the processing of IP options (including a
+ source route) must precede reassembly, the
+ original datagram will not be reassembled until
+ the final destination is reached.
+
+ Suppose a source routed datagram is to be routed
+ from host S to host D via gateways G1, G2, ... Gn.
+ There was an ambiguity in the specification over
+ whether the source route option in a datagram sent
+ out by S should be (A) or (B):
+
+ (A): {>>G2, G3, ... Gn, D} <--- CORRECT
+
+ (B): {S, >>G2, G3, ... Gn, D} <---- WRONG
+
+ (where >> represents the pointer). If (A) is
+ sent, the datagram received at D will contain the
+ option: {G1, G2, ... Gn >>}, with S and D as the
+ IP source and destination addresses. If (B) were
+ sent, the datagram received at D would again
+ contain S and D as the same IP source and
+ destination addresses, but the option would be:
+ {S, G1, ...Gn >>}; i.e., the originating host
+ would be the first hop in the route.
+
+
+ (d) Record Route Option
+
+ Implementation of originating and processing the Record
+ Route option is OPTIONAL.
+
+
+ (e) Timestamp Option
+
+ Implementation of originating and processing the
+ Timestamp option is OPTIONAL. If it is implemented,
+ the following rules apply:
+
+ o The originating host MUST record a timestamp in a
+ Timestamp option whose Internet address fields are
+ not pre-specified or whose first pre-specified
+ address is the host's interface address.
+
+
+
+
+Internet Engineering Task Force [Page 37]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ o The destination host MUST (if possible) add the
+ current timestamp to a Timestamp option before
+ passing the option to the transport layer or to
+ ICMP for processing.
+
+ o A timestamp value MUST follow the rules given in
+ Section 3.2.2.8 for the ICMP Timestamp message.
+
+
+ 3.2.2 Internet Control Message Protocol -- ICMP
+
+ ICMP messages are grouped into two classes.
+
+ *
+ ICMP error messages:
+
+ Destination Unreachable (see Section 3.2.2.1)
+ Redirect (see Section 3.2.2.2)
+ Source Quench (see Section 3.2.2.3)
+ Time Exceeded (see Section 3.2.2.4)
+ Parameter Problem (see Section 3.2.2.5)
+
+
+ *
+ ICMP query messages:
+
+ Echo (see Section 3.2.2.6)
+ Information (see Section 3.2.2.7)
+ Timestamp (see Section 3.2.2.8)
+ Address Mask (see Section 3.2.2.9)
+
+
+ If an ICMP message of unknown type is received, it MUST be
+ silently discarded.
+
+ Every ICMP error message includes the Internet header and at
+ least the first 8 data octets of the datagram that triggered
+ the error; more than 8 octets MAY be sent; this header and data
+ MUST be unchanged from the received datagram.
+
+ In those cases where the Internet layer is required to pass an
+ ICMP error message to the transport layer, the IP protocol
+ number MUST be extracted from the original header and used to
+ select the appropriate transport protocol entity to handle the
+ error.
+
+ An ICMP error message SHOULD be sent with normal (i.e., zero)
+ TOS bits.
+
+
+
+Internet Engineering Task Force [Page 38]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ An ICMP error message MUST NOT be sent as the result of
+ receiving:
+
+ * an ICMP error message, or
+
+ * a datagram destined to an IP broadcast or IP multicast
+ address, or
+
+ * a datagram sent as a link-layer broadcast, or
+
+ * a non-initial fragment, or
+
+ * a datagram whose source address does not define a single
+ host -- e.g., a zero address, a loopback address, a
+ broadcast address, a multicast address, or a Class E
+ address.
+
+ NOTE: THESE RESTRICTIONS TAKE PRECEDENCE OVER ANY REQUIREMENT
+ ELSEWHERE IN THIS DOCUMENT FOR SENDING ICMP ERROR MESSAGES.
+
+ DISCUSSION:
+ These rules will prevent the "broadcast storms" that have
+ resulted from hosts returning ICMP error messages in
+ response to broadcast datagrams. For example, a broadcast
+ UDP segment to a non-existent port could trigger a flood
+ of ICMP Destination Unreachable datagrams from all
+ machines that do not have a client for that destination
+ port. On a large Ethernet, the resulting collisions can
+ render the network useless for a second or more.
+
+ Every datagram that is broadcast on the connected network
+ should have a valid IP broadcast address as its IP
+ destination (see Section 3.3.6). However, some hosts
+ violate this rule. To be certain to detect broadcast
+ datagrams, therefore, hosts are required to check for a
+ link-layer broadcast as well as an IP-layer broadcast
+ address.
+
+ IMPLEMENTATION:
+ This requires that the link layer inform the IP layer when
+ a link-layer broadcast datagram has been received; see
+ Section 2.4.
+
+ 3.2.2.1 Destination Unreachable: RFC-792
+
+ The following additional codes are hereby defined:
+
+ 6 = destination network unknown
+
+
+
+Internet Engineering Task Force [Page 39]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ 7 = destination host unknown
+
+ 8 = source host isolated
+
+ 9 = communication with destination network
+ administratively prohibited
+
+ 10 = communication with destination host
+ administratively prohibited
+
+ 11 = network unreachable for type of service
+
+ 12 = host unreachable for type of service
+
+ A host SHOULD generate Destination Unreachable messages with
+ code:
+
+ 2 (Protocol Unreachable), when the designated transport
+ protocol is not supported; or
+
+ 3 (Port Unreachable), when the designated transport
+ protocol (e.g., UDP) is unable to demultiplex the
+ datagram but has no protocol mechanism to inform the
+ sender.
+
+ A Destination Unreachable message that is received MUST be
+ reported to the transport layer. The transport layer SHOULD
+ use the information appropriately; for example, see Sections
+ 4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol
+ that has its own mechanism for notifying the sender that a
+ port is unreachable (e.g., TCP, which sends RST segments)
+ MUST nevertheless accept an ICMP Port Unreachable for the
+ same purpose.
+
+ A Destination Unreachable message that is received with code
+ 0 (Net), 1 (Host), or 5 (Bad Source Route) may result from a
+ routing transient and MUST therefore be interpreted as only
+ a hint, not proof, that the specified destination is
+ unreachable [IP:11]. For example, it MUST NOT be used as
+ proof of a dead gateway (see Section 3.3.1).
+
+ 3.2.2.2 Redirect: RFC-792
+
+ A host SHOULD NOT send an ICMP Redirect message; Redirects
+ are to be sent only by gateways.
+
+ A host receiving a Redirect message MUST update its routing
+ information accordingly. Every host MUST be prepared to
+
+
+
+Internet Engineering Task Force [Page 40]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ accept both Host and Network Redirects and to process them
+ as described in Section 3.3.1.2 below.
+
+ A Redirect message SHOULD be silently discarded if the new
+ gateway address it specifies is not on the same connected
+ (sub-) net through which the Redirect arrived [INTRO:2,
+ Appendix A], or if the source of the Redirect is not the
+ current first-hop gateway for the specified destination (see
+ Section 3.3.1).
+
+ 3.2.2.3 Source Quench: RFC-792
+
+ A host MAY send a Source Quench message if it is
+ approaching, or has reached, the point at which it is forced
+ to discard incoming datagrams due to a shortage of
+ reassembly buffers or other resources. See Section 2.2.3 of
+ [INTRO:2] for suggestions on when to send Source Quench.
+
+ If a Source Quench message is received, the IP layer MUST
+ report it to the transport layer (or ICMP processing). In
+ general, the transport or application layer SHOULD implement
+ a mechanism to respond to Source Quench for any protocol
+ that can send a sequence of datagrams to the same
+ destination and which can reasonably be expected to maintain
+ enough state information to make this feasible. See Section
+ 4 for the handling of Source Quench by TCP and UDP.
+
+ DISCUSSION:
+ A Source Quench may be generated by the target host or
+ by some gateway in the path of a datagram. The host
+ receiving a Source Quench should throttle itself back
+ for a period of time, then gradually increase the
+ transmission rate again. The mechanism to respond to
+ Source Quench may be in the transport layer (for
+ connection-oriented protocols like TCP) or in the
+ application layer (for protocols that are built on top
+ of UDP).
+
+ A mechanism has been proposed [IP:14] to make the IP
+ layer respond directly to Source Quench by controlling
+ the rate at which datagrams are sent, however, this
+ proposal is currently experimental and not currently
+ recommended.
+
+ 3.2.2.4 Time Exceeded: RFC-792
+
+ An incoming Time Exceeded message MUST be passed to the
+ transport layer.
+
+
+
+Internet Engineering Task Force [Page 41]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ DISCUSSION:
+ A gateway will send a Time Exceeded Code 0 (In Transit)
+ message when it discards a datagram due to an expired
+ TTL field. This indicates either a gateway routing
+ loop or too small an initial TTL value.
+
+ A host may receive a Time Exceeded Code 1 (Reassembly
+ Timeout) message from a destination host that has timed
+ out and discarded an incomplete datagram; see Section
+ 3.3.2 below. In the future, receipt of this message
+ might be part of some "MTU discovery" procedure, to
+ discover the maximum datagram size that can be sent on
+ the path without fragmentation.
+
+ 3.2.2.5 Parameter Problem: RFC-792
+
+ A host SHOULD generate Parameter Problem messages. An
+ incoming Parameter Problem message MUST be passed to the
+ transport layer, and it MAY be reported to the user.
+
+ DISCUSSION:
+ The ICMP Parameter Problem message is sent to the
+ source host for any problem not specifically covered by
+ another ICMP message. Receipt of a Parameter Problem
+ message generally indicates some local or remote
+ implementation error.
+
+ A new variant on the Parameter Problem message is hereby
+ defined:
+ Code 1 = required option is missing.
+
+ DISCUSSION:
+ This variant is currently in use in the military
+ community for a missing security option.
+
+ 3.2.2.6 Echo Request/Reply: RFC-792
+
+ Every host MUST implement an ICMP Echo server function that
+ receives Echo Requests and sends corresponding Echo Replies.
+ A host SHOULD also implement an application-layer interface
+ for sending an Echo Request and receiving an Echo Reply, for
+ diagnostic purposes.
+
+ An ICMP Echo Request destined to an IP broadcast or IP
+ multicast address MAY be silently discarded.
+
+
+
+
+
+
+Internet Engineering Task Force [Page 42]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ DISCUSSION:
+ This neutral provision results from a passionate debate
+ between those who feel that ICMP Echo to a broadcast
+ address provides a valuable diagnostic capability and
+ those who feel that misuse of this feature can too
+ easily create packet storms.
+
+ The IP source address in an ICMP Echo Reply MUST be the same
+ as the specific-destination address (defined in Section
+ 3.2.1.3) of the corresponding ICMP Echo Request message.
+
+ Data received in an ICMP Echo Request MUST be entirely
+ included in the resulting Echo Reply. However, if sending
+ the Echo Reply requires intentional fragmentation that is
+ not implemented, the datagram MUST be truncated to maximum
+ transmission size (see Section 3.3.3) and sent.
+
+ Echo Reply messages MUST be passed to the ICMP user
+ interface, unless the corresponding Echo Request originated
+ in the IP layer.
+
+ If a Record Route and/or Time Stamp option is received in an
+ ICMP Echo Request, this option (these options) SHOULD be
+ updated to include the current host and included in the IP
+ header of the Echo Reply message, without "truncation".
+ Thus, the recorded route will be for the entire round trip.
+
+ If a Source Route option is received in an ICMP Echo
+ Request, the return route MUST be reversed and used as a
+ Source Route option for the Echo Reply message.
+
+ 3.2.2.7 Information Request/Reply: RFC-792
+
+ A host SHOULD NOT implement these messages.
+
+ DISCUSSION:
+ The Information Request/Reply pair was intended to
+ support self-configuring systems such as diskless
+ workstations, to allow them to discover their IP
+ network numbers at boot time. However, the RARP and
+ BOOTP protocols provide better mechanisms for a host to
+ discover its own IP address.
+
+ 3.2.2.8 Timestamp and Timestamp Reply: RFC-792
+
+ A host MAY implement Timestamp and Timestamp Reply. If they
+ are implemented, the following rules MUST be followed.
+
+
+
+
+Internet Engineering Task Force [Page 43]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ o The ICMP Timestamp server function returns a Timestamp
+ Reply to every Timestamp message that is received. If
+ this function is implemented, it SHOULD be designed for
+ minimum variability in delay (e.g., implemented in the
+ kernel to avoid delay in scheduling a user process).
+
+ The following cases for Timestamp are to be handled
+ according to the corresponding rules for ICMP Echo:
+
+ o An ICMP Timestamp Request message to an IP broadcast or
+ IP multicast address MAY be silently discarded.
+
+ o The IP source address in an ICMP Timestamp Reply MUST
+ be the same as the specific-destination address of the
+ corresponding Timestamp Request message.
+
+ o If a Source-route option is received in an ICMP Echo
+ Request, the return route MUST be reversed and used as
+ a Source Route option for the Timestamp Reply message.
+
+ o If a Record Route and/or Timestamp option is received
+ in a Timestamp Request, this (these) option(s) SHOULD
+ be updated to include the current host and included in
+ the IP header of the Timestamp Reply message.
+
+ o Incoming Timestamp Reply messages MUST be passed up to
+ the ICMP user interface.
+
+ The preferred form for a timestamp value (the "standard
+ value") is in units of milliseconds since midnight Universal
+ Time. However, it may be difficult to provide this value
+ with millisecond resolution. For example, many systems use
+ clocks that update only at line frequency, 50 or 60 times
+ per second. Therefore, some latitude is allowed in a
+ "standard value":
+
+ (a) A "standard value" MUST be updated at least 15 times
+ per second (i.e., at most the six low-order bits of the
+ value may be undefined).
+
+ (b) The accuracy of a "standard value" MUST approximate
+ that of operator-set CPU clocks, i.e., correct within a
+ few minutes.
+
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 44]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ 3.2.2.9 Address Mask Request/Reply: RFC-950
+
+ A host MUST support the first, and MAY implement all three,
+ of the following methods for determining the address mask(s)
+ corresponding to its IP address(es):
+
+ (1) static configuration information;
+
+ (2) obtaining the address mask(s) dynamically as a side-
+ effect of the system initialization process (see
+ [INTRO:1]); and
+
+ (3) sending ICMP Address Mask Request(s) and receiving ICMP
+ Address Mask Reply(s).
+
+ The choice of method to be used in a particular host MUST be
+ configurable.
+
+ When method (3), the use of Address Mask messages, is
+ enabled, then:
+
+ (a) When it initializes, the host MUST broadcast an Address
+ Mask Request message on the connected network
+ corresponding to the IP address. It MUST retransmit
+ this message a small number of times if it does not
+ receive an immediate Address Mask Reply.
+
+ (b) Until it has received an Address Mask Reply, the host
+ SHOULD assume a mask appropriate for the address class
+ of the IP address, i.e., assume that the connected
+ network is not subnetted.
+
+ (c) The first Address Mask Reply message received MUST be
+ used to set the address mask corresponding to the
+ particular local IP address. This is true even if the
+ first Address Mask Reply message is "unsolicited", in
+ which case it will have been broadcast and may arrive
+ after the host has ceased to retransmit Address Mask
+ Requests. Once the mask has been set by an Address
+ Mask Reply, later Address Mask Reply messages MUST be
+ (silently) ignored.
+
+ Conversely, if Address Mask messages are disabled, then no
+ ICMP Address Mask Requests will be sent, and any ICMP
+ Address Mask Replies received for that local IP address MUST
+ be (silently) ignored.
+
+ A host SHOULD make some reasonableness check on any address
+
+
+
+Internet Engineering Task Force [Page 45]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ mask it installs; see IMPLEMENTATION section below.
+
+ A system MUST NOT send an Address Mask Reply unless it is an
+ authoritative agent for address masks. An authoritative
+ agent may be a host or a gateway, but it MUST be explicitly
+ configured as a address mask agent. Receiving an address
+ mask via an Address Mask Reply does not give the receiver
+ authority and MUST NOT be used as the basis for issuing
+ Address Mask Replies.
+
+ With a statically configured address mask, there SHOULD be
+ an additional configuration flag that determines whether the
+ host is to act as an authoritative agent for this mask,
+ i.e., whether it will answer Address Mask Request messages
+ using this mask.
+
+ If it is configured as an agent, the host MUST broadcast an
+ Address Mask Reply for the mask on the appropriate interface
+ when it initializes.
+
+ See "System Initialization" in [INTRO:1] for more
+ information about the use of Address Mask Request/Reply
+ messages.
+
+ DISCUSSION
+ Hosts that casually send Address Mask Replies with
+ invalid address masks have often been a serious
+ nuisance. To prevent this, Address Mask Replies ought
+ to be sent only by authoritative agents that have been
+ selected by explicit administrative action.
+
+ When an authoritative agent receives an Address Mask
+ Request message, it will send a unicast Address Mask
+ Reply to the source IP address. If the network part of
+ this address is zero (see (a) and (b) in 3.2.1.3), the
+ Reply will be broadcast.
+
+ Getting no reply to its Address Mask Request messages,
+ a host will assume there is no agent and use an
+ unsubnetted mask, but the agent may be only temporarily
+ unreachable. An agent will broadcast an unsolicited
+ Address Mask Reply whenever it initializes, in order to
+ update the masks of all hosts that have initialized in
+ the meantime.
+
+ IMPLEMENTATION:
+ The following reasonableness check on an address mask
+ is suggested: the mask is not all 1 bits, and it is
+
+
+
+Internet Engineering Task Force [Page 46]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ either zero or else the 8 highest-order bits are on.
+
+ 3.2.3 Internet Group Management Protocol IGMP
+
+ IGMP [IP:4] is a protocol used between hosts and gateways on a
+ single network to establish hosts' membership in particular
+ multicast groups. The gateways use this information, in
+ conjunction with a multicast routing protocol, to support IP
+ multicasting across the Internet.
+
+ At this time, implementation of IGMP is OPTIONAL; see Section
+ 3.3.7 for more information. Without IGMP, a host can still
+ participate in multicasting local to its connected networks.
+
+ 3.3 SPECIFIC ISSUES
+
+ 3.3.1 Routing Outbound Datagrams
+
+ The IP layer chooses the correct next hop for each datagram it
+ sends. If the destination is on a connected network, the
+ datagram is sent directly to the destination host; otherwise,
+ it has to be routed to a gateway on a connected network.
+
+ 3.3.1.1 Local/Remote Decision
+
+ To decide if the destination is on a connected network, the
+ following algorithm MUST be used [see IP:3]:
+
+ (a) The address mask (particular to a local IP address for
+ a multihomed host) is a 32-bit mask that selects the
+ network number and subnet number fields of the
+ corresponding IP address.
+
+ (b) If the IP destination address bits extracted by the
+ address mask match the IP source address bits extracted
+ by the same mask, then the destination is on the
+ corresponding connected network, and the datagram is to
+ be transmitted directly to the destination host.
+
+ (c) If not, then the destination is accessible only through
+ a gateway. Selection of a gateway is described below
+ (3.3.1.2).
+
+ A special-case destination address is handled as follows:
+
+ * For a limited broadcast or a multicast address, simply
+ pass the datagram to the link layer for the appropriate
+ interface.
+
+
+
+Internet Engineering Task Force [Page 47]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ * For a (network or subnet) directed broadcast, the
+ datagram can use the standard routing algorithms.
+
+ The host IP layer MUST operate correctly in a minimal
+ network environment, and in particular, when there are no
+ gateways. For example, if the IP layer of a host insists on
+ finding at least one gateway to initialize, the host will be
+ unable to operate on a single isolated broadcast net.
+
+ 3.3.1.2 Gateway Selection
+
+ To efficiently route a series of datagrams to the same
+ destination, the source host MUST keep a "route cache" of
+ mappings to next-hop gateways. A host uses the following
+ basic algorithm on this cache to route a datagram; this
+ algorithm is designed to put the primary routing burden on
+ the gateways [IP:11].
+
+ (a) If the route cache contains no information for a
+ particular destination, the host chooses a "default"
+ gateway and sends the datagram to it. It also builds a
+ corresponding Route Cache entry.
+
+ (b) If that gateway is not the best next hop to the
+ destination, the gateway will forward the datagram to
+ the best next-hop gateway and return an ICMP Redirect
+ message to the source host.
+
+ (c) When it receives a Redirect, the host updates the
+ next-hop gateway in the appropriate route cache entry,
+ so later datagrams to the same destination will go
+ directly to the best gateway.
+
+ Since the subnet mask appropriate to the destination address
+ is generally not known, a Network Redirect message SHOULD be
+ treated identically to a Host Redirect message; i.e., the
+ cache entry for the destination host (only) would be updated
+ (or created, if an entry for that host did not exist) for
+ the new gateway.
+
+ DISCUSSION:
+ This recommendation is to protect against gateways that
+ erroneously send Network Redirects for a subnetted
+ network, in violation of the gateway requirements
+ [INTRO:2].
+
+ When there is no route cache entry for the destination host
+ address (and the destination is not on the connected
+
+
+
+Internet Engineering Task Force [Page 48]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ network), the IP layer MUST pick a gateway from its list of
+ "default" gateways. The IP layer MUST support multiple
+ default gateways.
+
+ As an extra feature, a host IP layer MAY implement a table
+ of "static routes". Each such static route MAY include a
+ flag specifying whether it may be overridden by ICMP
+ Redirects.
+
+ DISCUSSION:
+ A host generally needs to know at least one default
+ gateway to get started. This information can be
+ obtained from a configuration file or else from the
+ host startup sequence, e.g., the BOOTP protocol (see
+ [INTRO:1]).
+
+ It has been suggested that a host can augment its list
+ of default gateways by recording any new gateways it
+ learns about. For example, it can record every gateway
+ to which it is ever redirected. Such a feature, while
+ possibly useful in some circumstances, may cause
+ problems in other cases (e.g., gateways are not all
+ equal), and it is not recommended.
+
+ A static route is typically a particular preset mapping
+ from destination host or network into a particular
+ next-hop gateway; it might also depend on the Type-of-
+ Service (see next section). Static routes would be set
+ up by system administrators to override the normal
+ automatic routing mechanism, to handle exceptional
+ situations. However, any static routing information is
+ a potential source of failure as configurations change
+ or equipment fails.
+
+ 3.3.1.3 Route Cache
+
+ Each route cache entry needs to include the following
+ fields:
+
+ (1) Local IP address (for a multihomed host)
+
+ (2) Destination IP address
+
+ (3) Type(s)-of-Service
+
+ (4) Next-hop gateway IP address
+
+ Field (2) MAY be the full IP address of the destination
+
+
+
+Internet Engineering Task Force [Page 49]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ host, or only the destination network number. Field (3),
+ the TOS, SHOULD be included.
+
+ See Section 3.3.4.2 for a discussion of the implications of
+ multihoming for the lookup procedure in this cache.
+
+ DISCUSSION:
+ Including the Type-of-Service field in the route cache
+ and considering it in the host route algorithm will
+ provide the necessary mechanism for the future when
+ Type-of-Service routing is commonly used in the
+ Internet. See Section 3.2.1.6.
+
+ Each route cache entry defines the endpoints of an
+ Internet path. Although the connecting path may change
+ dynamically in an arbitrary way, the transmission
+ characteristics of the path tend to remain
+ approximately constant over a time period longer than a
+ single typical host-host transport connection.
+ Therefore, a route cache entry is a natural place to
+ cache data on the properties of the path. Examples of
+ such properties might be the maximum unfragmented
+ datagram size (see Section 3.3.3), or the average
+ round-trip delay measured by a transport protocol.
+ This data will generally be both gathered and used by a
+ higher layer protocol, e.g., by TCP, or by an
+ application using UDP. Experiments are currently in
+ progress on caching path properties in this manner.
+
+ There is no consensus on whether the route cache should
+ be keyed on destination host addresses alone, or allow
+ both host and network addresses. Those who favor the
+ use of only host addresses argue that:
+
+ (1) As required in Section 3.3.1.2, Redirect messages
+ will generally result in entries keyed on
+ destination host addresses; the simplest and most
+ general scheme would be to use host addresses
+ always.
+
+ (2) The IP layer may not always know the address mask
+ for a network address in a complex subnetted
+ environment.
+
+ (3) The use of only host addresses allows the
+ destination address to be used as a pure 32-bit
+ number, which may allow the Internet architecture
+ to be more easily extended in the future without
+
+
+
+Internet Engineering Task Force [Page 50]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ any change to the hosts.
+
+ The opposing view is that allowing a mixture of
+ destination hosts and networks in the route cache:
+
+ (1) Saves memory space.
+
+ (2) Leads to a simpler data structure, easily
+ combining the cache with the tables of default and
+ static routes (see below).
+
+ (3) Provides a more useful place to cache path
+ properties, as discussed earlier.
+
+
+ IMPLEMENTATION:
+ The cache needs to be large enough to include entries
+ for the maximum number of destination hosts that may be
+ in use at one time.
+
+ A route cache entry may also include control
+ information used to choose an entry for replacement.
+ This might take the form of a "recently used" bit, a
+ use count, or a last-used timestamp, for example. It
+ is recommended that it include the time of last
+ modification of the entry, for diagnostic purposes.
+
+ An implementation may wish to reduce the overhead of
+ scanning the route cache for every datagram to be
+ transmitted. This may be accomplished with a hash
+ table to speed the lookup, or by giving a connection-
+ oriented transport protocol a "hint" or temporary
+ handle on the appropriate cache entry, to be passed to
+ the IP layer with each subsequent datagram.
+
+ Although we have described the route cache, the lists
+ of default gateways, and a table of static routes as
+ conceptually distinct, in practice they may be combined
+ into a single "routing table" data structure.
+
+ 3.3.1.4 Dead Gateway Detection
+
+ The IP layer MUST be able to detect the failure of a "next-
+ hop" gateway that is listed in its route cache and to choose
+ an alternate gateway (see Section 3.3.1.5).
+
+ Dead gateway detection is covered in some detail in RFC-816
+ [IP:11]. Experience to date has not produced a complete
+
+
+
+Internet Engineering Task Force [Page 51]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ algorithm which is totally satisfactory, though it has
+ identified several forbidden paths and promising techniques.
+
+ * A particular gateway SHOULD NOT be used indefinitely in
+ the absence of positive indications that it is
+ functioning.
+
+ * Active probes such as "pinging" (i.e., using an ICMP
+ Echo Request/Reply exchange) are expensive and scale
+ poorly. In particular, hosts MUST NOT actively check
+ the status of a first-hop gateway by simply pinging the
+ gateway continuously.
+
+ * Even when it is the only effective way to verify a
+ gateway's status, pinging MUST be used only when
+ traffic is being sent to the gateway and when there is
+ no other positive indication to suggest that the
+ gateway is functioning.
+
+ * To avoid pinging, the layers above and/or below the
+ Internet layer SHOULD be able to give "advice" on the
+ status of route cache entries when either positive
+ (gateway OK) or negative (gateway dead) information is
+ available.
+
+
+ DISCUSSION:
+ If an implementation does not include an adequate
+ mechanism for detecting a dead gateway and re-routing,
+ a gateway failure may cause datagrams to apparently
+ vanish into a "black hole". This failure can be
+ extremely confusing for users and difficult for network
+ personnel to debug.
+
+ The dead-gateway detection mechanism must not cause
+ unacceptable load on the host, on connected networks,
+ or on first-hop gateway(s). The exact constraints on
+ the timeliness of dead gateway detection and on
+ acceptable load may vary somewhat depending on the
+ nature of the host's mission, but a host generally
+ needs to detect a failed first-hop gateway quickly
+ enough that transport-layer connections will not break
+ before an alternate gateway can be selected.
+
+ Passing advice from other layers of the protocol stack
+ complicates the interfaces between the layers, but it
+ is the preferred approach to dead gateway detection.
+ Advice can come from almost any part of the IP/TCP
+
+
+
+Internet Engineering Task Force [Page 52]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ architecture, but it is expected to come primarily from
+ the transport and link layers. Here are some possible
+ sources for gateway advice:
+
+ o TCP or any connection-oriented transport protocol
+ should be able to give negative advice, e.g.,
+ triggered by excessive retransmissions.
+
+ o TCP may give positive advice when (new) data is
+ acknowledged. Even though the route may be
+ asymmetric, an ACK for new data proves that the
+ acknowleged data must have been transmitted
+ successfully.
+
+ o An ICMP Redirect message from a particular gateway
+ should be used as positive advice about that
+ gateway.
+
+ o Link-layer information that reliably detects and
+ reports host failures (e.g., ARPANET Destination
+ Dead messages) should be used as negative advice.
+
+ o Failure to ARP or to re-validate ARP mappings may
+ be used as negative advice for the corresponding
+ IP address.
+
+ o Packets arriving from a particular link-layer
+ address are evidence that the system at this
+ address is alive. However, turning this
+ information into advice about gateways requires
+ mapping the link-layer address into an IP address,
+ and then checking that IP address against the
+ gateways pointed to by the route cache. This is
+ probably prohibitively inefficient.
+
+ Note that positive advice that is given for every
+ datagram received may cause unacceptable overhead in
+ the implementation.
+
+ While advice might be passed using required arguments
+ in all interfaces to the IP layer, some transport and
+ application layer protocols cannot deduce the correct
+ advice. These interfaces must therefore allow a
+ neutral value for advice, since either always-positive
+ or always-negative advice leads to incorrect behavior.
+
+ There is another technique for dead gateway detection
+ that has been commonly used but is not recommended.
+
+
+
+Internet Engineering Task Force [Page 53]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ This technique depends upon the host passively
+ receiving ("wiretapping") the Interior Gateway Protocol
+ (IGP) datagrams that the gateways are broadcasting to
+ each other. This approach has the drawback that a host
+ needs to recognize all the interior gateway protocols
+ that gateways may use (see [INTRO:2]). In addition, it
+ only works on a broadcast network.
+
+ At present, pinging (i.e., using ICMP Echo messages) is
+ the mechanism for gateway probing when absolutely
+ required. A successful ping guarantees that the
+ addressed interface and its associated machine are up,
+ but it does not guarantee that the machine is a gateway
+ as opposed to a host. The normal inference is that if
+ a Redirect or other evidence indicates that a machine
+ was a gateway, successful pings will indicate that the
+ machine is still up and hence still a gateway.
+ However, since a host silently discards packets that a
+ gateway would forward or redirect, this assumption
+ could sometimes fail. To avoid this problem, a new
+ ICMP message under development will ask "are you a
+ gateway?"
+
+ IMPLEMENTATION:
+ The following specific algorithm has been suggested:
+
+ o Associate a "reroute timer" with each gateway
+ pointed to by the route cache. Initialize the
+ timer to a value Tr, which must be small enough to
+ allow detection of a dead gateway before transport
+ connections time out.
+
+ o Positive advice would reset the reroute timer to
+ Tr. Negative advice would reduce or zero the
+ reroute timer.
+
+ o Whenever the IP layer used a particular gateway to
+ route a datagram, it would check the corresponding
+ reroute timer. If the timer had expired (reached
+ zero), the IP layer would send a ping to the
+ gateway, followed immediately by the datagram.
+
+ o The ping (ICMP Echo) would be sent again if
+ necessary, up to N times. If no ping reply was
+ received in N tries, the gateway would be assumed
+ to have failed, and a new first-hop gateway would
+ be chosen for all cache entries pointing to the
+ failed gateway.
+
+
+
+Internet Engineering Task Force [Page 54]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ Note that the size of Tr is inversely related to the
+ amount of advice available. Tr should be large enough
+ to insure that:
+
+ * Any pinging will be at a low level (e.g., <10%) of
+ all packets sent to a gateway from the host, AND
+
+ * pinging is infrequent (e.g., every 3 minutes)
+
+ Since the recommended algorithm is concerned with the
+ gateways pointed to by route cache entries, rather than
+ the cache entries themselves, a two level data
+ structure (perhaps coordinated with ARP or similar
+ caches) may be desirable for implementing a route
+ cache.
+
+ 3.3.1.5 New Gateway Selection
+
+ If the failed gateway is not the current default, the IP
+ layer can immediately switch to a default gateway. If it is
+ the current default that failed, the IP layer MUST select a
+ different default gateway (assuming more than one default is
+ known) for the failed route and for establishing new routes.
+
+ DISCUSSION:
+ When a gateway does fail, the other gateways on the
+ connected network will learn of the failure through
+ some inter-gateway routing protocol. However, this
+ will not happen instantaneously, since gateway routing
+ protocols typically have a settling time of 30-60
+ seconds. If the host switches to an alternative
+ gateway before the gateways have agreed on the failure,
+ the new target gateway will probably forward the
+ datagram to the failed gateway and send a Redirect back
+ to the host pointing to the failed gateway (!). The
+ result is likely to be a rapid oscillation in the
+ contents of the host's route cache during the gateway
+ settling period. It has been proposed that the dead-
+ gateway logic should include some hysteresis mechanism
+ to prevent such oscillations. However, experience has
+ not shown any harm from such oscillations, since
+ service cannot be restored to the host until the
+ gateways' routing information does settle down.
+
+ IMPLEMENTATION:
+ One implementation technique for choosing a new default
+ gateway is to simply round-robin among the default
+ gateways in the host's list. Another is to rank the
+
+
+
+Internet Engineering Task Force [Page 55]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ gateways in priority order, and when the current
+ default gateway is not the highest priority one, to
+ "ping" the higher-priority gateways slowly to detect
+ when they return to service. This pinging can be at a
+ very low rate, e.g., 0.005 per second.
+
+ 3.3.1.6 Initialization
+
+ The following information MUST be configurable:
+
+ (1) IP address(es).
+
+ (2) Address mask(s).
+
+ (3) A list of default gateways, with a preference level.
+
+ A manual method of entering this configuration data MUST be
+ provided. In addition, a variety of methods can be used to
+ determine this information dynamically; see the section on
+ "Host Initialization" in [INTRO:1].
+
+ DISCUSSION:
+ Some host implementations use "wiretapping" of gateway
+ protocols on a broadcast network to learn what gateways
+ exist. A standard method for default gateway discovery
+ is under development.
+
+ 3.3.2 Reassembly
+
+ The IP layer MUST implement reassembly of IP datagrams.
+
+ We designate the largest datagram size that can be reassembled
+ by EMTU_R ("Effective MTU to receive"); this is sometimes
+ called the "reassembly buffer size". EMTU_R MUST be greater
+ than or equal to 576, SHOULD be either configurable or
+ indefinite, and SHOULD be greater than or equal to the MTU of
+ the connected network(s).
+
+ DISCUSSION:
+ A fixed EMTU_R limit should not be built into the code
+ because some application layer protocols require EMTU_R
+ values larger than 576.
+
+ IMPLEMENTATION:
+ An implementation may use a contiguous reassembly buffer
+ for each datagram, or it may use a more complex data
+ structure that places no definite limit on the reassembled
+ datagram size; in the latter case, EMTU_R is said to be
+
+
+
+Internet Engineering Task Force [Page 56]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ "indefinite".
+
+ Logically, reassembly is performed by simply copying each
+ fragment into the packet buffer at the proper offset.
+ Note that fragments may overlap if successive
+ retransmissions use different packetizing but the same
+ reassembly Id.
+
+ The tricky part of reassembly is the bookkeeping to
+ determine when all bytes of the datagram have been
+ reassembled. We recommend Clark's algorithm [IP:10] that
+ requires no additional data space for the bookkeeping.
+ However, note that, contrary to [IP:10], the first
+ fragment header needs to be saved for inclusion in a
+ possible ICMP Time Exceeded (Reassembly Timeout) message.
+
+ There MUST be a mechanism by which the transport layer can
+ learn MMS_R, the maximum message size that can be received and
+ reassembled in an IP datagram (see GET_MAXSIZES calls in
+ Section 3.4). If EMTU_R is not indefinite, then the value of
+ MMS_R is given by:
+
+ MMS_R = EMTU_R - 20
+
+ since 20 is the minimum size of an IP header.
+
+ There MUST be a reassembly timeout. The reassembly timeout
+ value SHOULD be a fixed value, not set from the remaining TTL.
+ It is recommended that the value lie between 60 seconds and 120
+ seconds. If this timeout expires, the partially-reassembled
+ datagram MUST be discarded and an ICMP Time Exceeded message
+ sent to the source host (if fragment zero has been received).
+
+ DISCUSSION:
+ The IP specification says that the reassembly timeout
+ should be the remaining TTL from the IP header, but this
+ does not work well because gateways generally treat TTL as
+ a simple hop count rather than an elapsed time. If the
+ reassembly timeout is too small, datagrams will be
+ discarded unnecessarily, and communication may fail. The
+ timeout needs to be at least as large as the typical
+ maximum delay across the Internet. A realistic minimum
+ reassembly timeout would be 60 seconds.
+
+ It has been suggested that a cache might be kept of
+ round-trip times measured by transport protocols for
+ various destinations, and that these values might be used
+ to dynamically determine a reasonable reassembly timeout
+
+
+
+Internet Engineering Task Force [Page 57]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ value. Further investigation of this approach is
+ required.
+
+ If the reassembly timeout is set too high, buffer
+ resources in the receiving host will be tied up too long,
+ and the MSL (Maximum Segment Lifetime) [TCP:1] will be
+ larger than necessary. The MSL controls the maximum rate
+ at which fragmented datagrams can be sent using distinct
+ values of the 16-bit Ident field; a larger MSL lowers the
+ maximum rate. The TCP specification [TCP:1] arbitrarily
+ assumes a value of 2 minutes for MSL. This sets an upper
+ limit on a reasonable reassembly timeout value.
+
+ 3.3.3 Fragmentation
+
+ Optionally, the IP layer MAY implement a mechanism to fragment
+ outgoing datagrams intentionally.
+
+ We designate by EMTU_S ("Effective MTU for sending") the
+ maximum IP datagram size that may be sent, for a particular
+ combination of IP source and destination addresses and perhaps
+ TOS.
+
+ A host MUST implement a mechanism to allow the transport layer
+ to learn MMS_S, the maximum transport-layer message size that
+ may be sent for a given {source, destination, TOS} triplet (see
+ GET_MAXSIZES call in Section 3.4). If no local fragmentation
+ is performed, the value of MMS_S will be:
+
+ MMS_S = EMTU_S - <IP header size>
+
+ and EMTU_S must be less than or equal to the MTU of the network
+ interface corresponding to the source address of the datagram.
+ Note that <IP header size> in this equation will be 20, unless
+ the IP reserves space to insert IP options for its own purposes
+ in addition to any options inserted by the transport layer.
+
+ A host that does not implement local fragmentation MUST ensure
+ that the transport layer (for TCP) or the application layer
+ (for UDP) obtains MMS_S from the IP layer and does not send a
+ datagram exceeding MMS_S in size.
+
+ It is generally desirable to avoid local fragmentation and to
+ choose EMTU_S low enough to avoid fragmentation in any gateway
+ along the path. In the absence of actual knowledge of the
+ minimum MTU along the path, the IP layer SHOULD use
+ EMTU_S <= 576 whenever the destination address is not on a
+ connected network, and otherwise use the connected network's
+
+
+
+Internet Engineering Task Force [Page 58]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ MTU.
+
+ The MTU of each physical interface MUST be configurable.
+
+ A host IP layer implementation MAY have a configuration flag
+ "All-Subnets-MTU", indicating that the MTU of the connected
+ network is to be used for destinations on different subnets
+ within the same network, but not for other networks. Thus,
+ this flag causes the network class mask, rather than the subnet
+ address mask, to be used to choose an EMTU_S. For a multihomed
+ host, an "All-Subnets-MTU" flag is needed for each network
+ interface.
+
+ DISCUSSION:
+ Picking the correct datagram size to use when sending data
+ is a complex topic [IP:9].
+
+ (a) In general, no host is required to accept an IP
+ datagram larger than 576 bytes (including header and
+ data), so a host must not send a larger datagram
+ without explicit knowledge or prior arrangement with
+ the destination host. Thus, MMS_S is only an upper
+ bound on the datagram size that a transport protocol
+ may send; even when MMS_S exceeds 556, the transport
+ layer must limit its messages to 556 bytes in the
+ absence of other knowledge about the destination
+ host.
+
+ (b) Some transport protocols (e.g., TCP) provide a way to
+ explicitly inform the sender about the largest
+ datagram the other end can receive and reassemble
+ [IP:7]. There is no corresponding mechanism in the
+ IP layer.
+
+ A transport protocol that assumes an EMTU_R larger
+ than 576 (see Section 3.3.2), can send a datagram of
+ this larger size to another host that implements the
+ same protocol.
+
+ (c) Hosts should ideally limit their EMTU_S for a given
+ destination to the minimum MTU of all the networks
+ along the path, to avoid any fragmentation. IP
+ fragmentation, while formally correct, can create a
+ serious transport protocol performance problem,
+ because loss of a single fragment means all the
+ fragments in the segment must be retransmitted
+ [IP:9].
+
+
+
+
+Internet Engineering Task Force [Page 59]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ Since nearly all networks in the Internet currently
+ support an MTU of 576 or greater, we strongly recommend
+ the use of 576 for datagrams sent to non-local networks.
+
+ It has been suggested that a host could determine the MTU
+ over a given path by sending a zero-offset datagram
+ fragment and waiting for the receiver to time out the
+ reassembly (which cannot complete!) and return an ICMP
+ Time Exceeded message. This message would include the
+ largest remaining fragment header in its body. More
+ direct mechanisms are being experimented with, but have
+ not yet been adopted (see e.g., RFC-1063).
+
+ 3.3.4 Local Multihoming
+
+ 3.3.4.1 Introduction
+
+ A multihomed host has multiple IP addresses, which we may
+ think of as "logical interfaces". These logical interfaces
+ may be associated with one or more physical interfaces, and
+ these physical interfaces may be connected to the same or
+ different networks.
+
+ Here are some important cases of multihoming:
+
+ (a) Multiple Logical Networks
+
+ The Internet architects envisioned that each physical
+ network would have a single unique IP network (or
+ subnet) number. However, LAN administrators have
+ sometimes found it useful to violate this assumption,
+ operating a LAN with multiple logical networks per
+ physical connected network.
+
+ If a host connected to such a physical network is
+ configured to handle traffic for each of N different
+ logical networks, then the host will have N logical
+ interfaces. These could share a single physical
+ interface, or might use N physical interfaces to the
+ same network.
+
+ (b) Multiple Logical Hosts
+
+ When a host has multiple IP addresses that all have the
+ same <Network-number> part (and the same <Subnet-
+ number> part, if any), the logical interfaces are known
+ as "logical hosts". These logical interfaces might
+ share a single physical interface or might use separate
+
+
+
+Internet Engineering Task Force [Page 60]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ physical interfaces to the same physical network.
+
+ (c) Simple Multihoming
+
+ In this case, each logical interface is mapped into a
+ separate physical interface and each physical interface
+ is connected to a different physical network. The term
+ "multihoming" was originally applied only to this case,
+ but it is now applied more generally.
+
+ A host with embedded gateway functionality will
+ typically fall into the simple multihoming case. Note,
+ however, that a host may be simply multihomed without
+ containing an embedded gateway, i.e., without
+ forwarding datagrams from one connected network to
+ another.
+
+ This case presents the most difficult routing problems.
+ The choice of interface (i.e., the choice of first-hop
+ network) may significantly affect performance or even
+ reachability of remote parts of the Internet.
+
+
+ Finally, we note another possibility that is NOT
+ multihoming: one logical interface may be bound to multiple
+ physical interfaces, in order to increase the reliability or
+ throughput between directly connected machines by providing
+ alternative physical paths between them. For instance, two
+ systems might be connected by multiple point-to-point links.
+ We call this "link-layer multiplexing". With link-layer
+ multiplexing, the protocols above the link layer are unaware
+ that multiple physical interfaces are present; the link-
+ layer device driver is responsible for multiplexing and
+ routing packets across the physical interfaces.
+
+ In the Internet protocol architecture, a transport protocol
+ instance ("entity") has no address of its own, but instead
+ uses a single Internet Protocol (IP) address. This has
+ implications for the IP, transport, and application layers,
+ and for the interfaces between them. In particular, the
+ application software may have to be aware of the multiple IP
+ addresses of a multihomed host; in other cases, the choice
+ can be made within the network software.
+
+ 3.3.4.2 Multihoming Requirements
+
+ The following general rules apply to the selection of an IP
+ source address for sending a datagram from a multihomed
+
+
+
+Internet Engineering Task Force [Page 61]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ host.
+
+ (1) If the datagram is sent in response to a received
+ datagram, the source address for the response SHOULD be
+ the specific-destination address of the request. See
+ Sections 4.1.3.5 and 4.2.3.7 and the "General Issues"
+ section of [INTRO:1] for more specific requirements on
+ higher layers.
+
+ Otherwise, a source address must be selected.
+
+ (2) An application MUST be able to explicitly specify the
+ source address for initiating a connection or a
+ request.
+
+ (3) In the absence of such a specification, the networking
+ software MUST choose a source address. Rules for this
+ choice are described below.
+
+
+ There are two key requirement issues related to multihoming:
+
+ (A) A host MAY silently discard an incoming datagram whose
+ destination address does not correspond to the physical
+ interface through which it is received.
+
+ (B) A host MAY restrict itself to sending (non-source-
+ routed) IP datagrams only through the physical
+ interface that corresponds to the IP source address of
+ the datagrams.
+
+
+ DISCUSSION:
+ Internet host implementors have used two different
+ conceptual models for multihoming, briefly summarized
+ in the following discussion. This document takes no
+ stand on which model is preferred; each seems to have a
+ place. This ambivalence is reflected in the issues (A)
+ and (B) being optional.
+
+ o Strong ES Model
+
+ The Strong ES (End System, i.e., host) model
+ emphasizes the host/gateway (ES/IS) distinction,
+ and would therefore substitute MUST for MAY in
+ issues (A) and (B) above. It tends to model a
+ multihomed host as a set of logical hosts within
+ the same physical host.
+
+
+
+Internet Engineering Task Force [Page 62]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ With respect to (A), proponents of the Strong ES
+ model note that automatic Internet routing
+ mechanisms could not route a datagram to a
+ physical interface that did not correspond to the
+ destination address.
+
+ Under the Strong ES model, the route computation
+ for an outgoing datagram is the mapping:
+
+ route(src IP addr, dest IP addr, TOS)
+ -> gateway
+
+ Here the source address is included as a parameter
+ in order to select a gateway that is directly
+ reachable on the corresponding physical interface.
+ Note that this model logically requires that in
+ general there be at least one default gateway, and
+ preferably multiple defaults, for each IP source
+ address.
+
+ o Weak ES Model
+
+ This view de-emphasizes the ES/IS distinction, and
+ would therefore substitute MUST NOT for MAY in
+ issues (A) and (B). This model may be the more
+ natural one for hosts that wiretap gateway routing
+ protocols, and is necessary for hosts that have
+ embedded gateway functionality.
+
+ The Weak ES Model may cause the Redirect mechanism
+ to fail. If a datagram is sent out a physical
+ interface that does not correspond to the
+ destination address, the first-hop gateway will
+ not realize when it needs to send a Redirect. On
+ the other hand, if the host has embedded gateway
+ functionality, then it has routing information
+ without listening to Redirects.
+
+ In the Weak ES model, the route computation for an
+ outgoing datagram is the mapping:
+
+ route(dest IP addr, TOS) -> gateway, interface
+
+
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 63]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ 3.3.4.3 Choosing a Source Address
+
+ DISCUSSION:
+ When it sends an initial connection request (e.g., a
+ TCP "SYN" segment) or a datagram service request (e.g.,
+ a UDP-based query), the transport layer on a multihomed
+ host needs to know which source address to use. If the
+ application does not specify it, the transport layer
+ must ask the IP layer to perform the conceptual
+ mapping:
+
+ GET_SRCADDR(remote IP addr, TOS)
+ -> local IP address
+
+ Here TOS is the Type-of-Service value (see Section
+ 3.2.1.6), and the result is the desired source address.
+ The following rules are suggested for implementing this
+ mapping:
+
+ (a) If the remote Internet address lies on one of the
+ (sub-) nets to which the host is directly
+ connected, a corresponding source address may be
+ chosen, unless the corresponding interface is
+ known to be down.
+
+ (b) The route cache may be consulted, to see if there
+ is an active route to the specified destination
+ network through any network interface; if so, a
+ local IP address corresponding to that interface
+ may be chosen.
+
+ (c) The table of static routes, if any (see Section
+ 3.3.1.2) may be similarly consulted.
+
+ (d) The default gateways may be consulted. If these
+ gateways are assigned to different interfaces, the
+ interface corresponding to the gateway with the
+ highest preference may be chosen.
+
+ In the future, there may be a defined way for a
+ multihomed host to ask the gateways on all connected
+ networks for advice about the best network to use for a
+ given destination.
+
+ IMPLEMENTATION:
+ It will be noted that this process is essentially the
+ same as datagram routing (see Section 3.3.1), and
+ therefore hosts may be able to combine the
+
+
+
+Internet Engineering Task Force [Page 64]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ implementation of the two functions.
+
+ 3.3.5 Source Route Forwarding
+
+ Subject to restrictions given below, a host MAY be able to act
+ as an intermediate hop in a source route, forwarding a source-
+ routed datagram to the next specified hop.
+
+ However, in performing this gateway-like function, the host
+ MUST obey all the relevant rules for a gateway forwarding
+ source-routed datagrams [INTRO:2]. This includes the following
+ specific provisions, which override the corresponding host
+ provisions given earlier in this document:
+
+ (A) TTL (ref. Section 3.2.1.7)
+
+ The TTL field MUST be decremented and the datagram perhaps
+ discarded as specified for a gateway in [INTRO:2].
+
+ (B) ICMP Destination Unreachable (ref. Section 3.2.2.1)
+
+ A host MUST be able to generate Destination Unreachable
+ messages with the following codes:
+
+ 4 (Fragmentation Required but DF Set) when a source-
+ routed datagram cannot be fragmented to fit into the
+ target network;
+
+ 5 (Source Route Failed) when a source-routed datagram
+ cannot be forwarded, e.g., because of a routing
+ problem or because the next hop of a strict source
+ route is not on a connected network.
+
+ (C) IP Source Address (ref. Section 3.2.1.3)
+
+ A source-routed datagram being forwarded MAY (and normally
+ will) have a source address that is not one of the IP
+ addresses of the forwarding host.
+
+ (D) Record Route Option (ref. Section 3.2.1.8d)
+
+ A host that is forwarding a source-routed datagram
+ containing a Record Route option MUST update that option,
+ if it has room.
+
+ (E) Timestamp Option (ref. Section 3.2.1.8e)
+
+ A host that is forwarding a source-routed datagram
+
+
+
+Internet Engineering Task Force [Page 65]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ containing a Timestamp Option MUST add the current
+ timestamp to that option, according to the rules for this
+ option.
+
+ To define the rules restricting host forwarding of source-
+ routed datagrams, we use the term "local source-routing" if the
+ next hop will be through the same physical interface through
+ which the datagram arrived; otherwise, it is "non-local
+ source-routing".
+
+ o A host is permitted to perform local source-routing
+ without restriction.
+
+ o A host that supports non-local source-routing MUST have a
+ configurable switch to disable forwarding, and this switch
+ MUST default to disabled.
+
+ o The host MUST satisfy all gateway requirements for
+ configurable policy filters [INTRO:2] restricting non-
+ local forwarding.
+
+ If a host receives a datagram with an incomplete source route
+ but does not forward it for some reason, the host SHOULD return
+ an ICMP Destination Unreachable (code 5, Source Route Failed)
+ message, unless the datagram was itself an ICMP error message.
+
+ 3.3.6 Broadcasts
+
+ Section 3.2.1.3 defined the four standard IP broadcast address
+ forms:
+
+ Limited Broadcast: {-1, -1}
+
+ Directed Broadcast: {<Network-number>,-1}
+
+ Subnet Directed Broadcast:
+ {<Network-number>,<Subnet-number>,-1}
+
+ All-Subnets Directed Broadcast: {<Network-number>,-1,-1}
+
+ A host MUST recognize any of these forms in the destination
+ address of an incoming datagram.
+
+ There is a class of hosts* that use non-standard broadcast
+ address forms, substituting 0 for -1. All hosts SHOULD
+_________________________
+*4.2BSD Unix and its derivatives, but not 4.3BSD.
+
+
+
+
+Internet Engineering Task Force [Page 66]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ recognize and accept any of these non-standard broadcast
+ addresses as the destination address of an incoming datagram.
+ A host MAY optionally have a configuration option to choose the
+ 0 or the -1 form of broadcast address, for each physical
+ interface, but this option SHOULD default to the standard (-1)
+ form.
+
+ When a host sends a datagram to a link-layer broadcast address,
+ the IP destination address MUST be a legal IP broadcast or IP
+ multicast address.
+
+ A host SHOULD silently discard a datagram that is received via
+ a link-layer broadcast (see Section 2.4) but does not specify
+ an IP multicast or broadcast destination address.
+
+ Hosts SHOULD use the Limited Broadcast address to broadcast to
+ a connected network.
+
+
+ DISCUSSION:
+ Using the Limited Broadcast address instead of a Directed
+ Broadcast address may improve system robustness. Problems
+ are often caused by machines that do not understand the
+ plethora of broadcast addresses (see Section 3.2.1.3), or
+ that may have different ideas about which broadcast
+ addresses are in use. The prime example of the latter is
+ machines that do not understand subnetting but are
+ attached to a subnetted net. Sending a Subnet Broadcast
+ for the connected network will confuse those machines,
+ which will see it as a message to some other host.
+
+ There has been discussion on whether a datagram addressed
+ to the Limited Broadcast address ought to be sent from all
+ the interfaces of a multihomed host. This specification
+ takes no stand on the issue.
+
+ 3.3.7 IP Multicasting
+
+ A host SHOULD support local IP multicasting on all connected
+ networks for which a mapping from Class D IP addresses to
+ link-layer addresses has been specified (see below). Support
+ for local IP multicasting includes sending multicast datagrams,
+ joining multicast groups and receiving multicast datagrams, and
+ leaving multicast groups. This implies support for all of
+ [IP:4] except the IGMP protocol itself, which is OPTIONAL.
+
+
+
+
+
+
+Internet Engineering Task Force [Page 67]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ DISCUSSION:
+ IGMP provides gateways that are capable of multicast
+ routing with the information required to support IP
+ multicasting across multiple networks. At this time,
+ multicast-routing gateways are in the experimental stage
+ and are not widely available. For hosts that are not
+ connected to networks with multicast-routing gateways or
+ that do not need to receive multicast datagrams
+ originating on other networks, IGMP serves no purpose and
+ is therefore optional for now. However, the rest of
+ [IP:4] is currently recommended for the purpose of
+ providing IP-layer access to local network multicast
+ addressing, as a preferable alternative to local broadcast
+ addressing. It is expected that IGMP will become
+ recommended at some future date, when multicast-routing
+ gateways have become more widely available.
+
+ If IGMP is not implemented, a host SHOULD still join the "all-
+ hosts" group (224.0.0.1) when the IP layer is initialized and
+ remain a member for as long as the IP layer is active.
+
+ DISCUSSION:
+ Joining the "all-hosts" group will support strictly local
+ uses of multicasting, e.g., a gateway discovery protocol,
+ even if IGMP is not implemented.
+
+ The mapping of IP Class D addresses to local addresses is
+ currently specified for the following types of networks:
+
+ o Ethernet/IEEE 802.3, as defined in [IP:4].
+
+ o Any network that supports broadcast but not multicast,
+ addressing: all IP Class D addresses map to the local
+ broadcast address.
+
+ o Any type of point-to-point link (e.g., SLIP or HDLC
+ links): no mapping required. All IP multicast datagrams
+ are sent as-is, inside the local framing.
+
+ Mappings for other types of networks will be specified in the
+ future.
+
+ A host SHOULD provide a way for higher-layer protocols or
+ applications to determine which of the host's connected
+ network(s) support IP multicast addressing.
+
+
+
+
+
+
+Internet Engineering Task Force [Page 68]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ 3.3.8 Error Reporting
+
+ Wherever practical, hosts MUST return ICMP error datagrams on
+ detection of an error, except in those cases where returning an
+ ICMP error message is specifically prohibited.
+
+ DISCUSSION:
+ A common phenomenon in datagram networks is the "black
+ hole disease": datagrams are sent out, but nothing comes
+ back. Without any error datagrams, it is difficult for
+ the user to figure out what the problem is.
+
+ 3.4 INTERNET/TRANSPORT LAYER INTERFACE
+
+ The interface between the IP layer and the transport layer MUST
+ provide full access to all the mechanisms of the IP layer,
+ including options, Type-of-Service, and Time-to-Live. The
+ transport layer MUST either have mechanisms to set these interface
+ parameters, or provide a path to pass them through from an
+ application, or both.
+
+ DISCUSSION:
+ Applications are urged to make use of these mechanisms where
+ applicable, even when the mechanisms are not currently
+ effective in the Internet (e.g., TOS). This will allow these
+ mechanisms to be immediately useful when they do become
+ effective, without a large amount of retrofitting of host
+ software.
+
+ We now describe a conceptual interface between the transport layer
+ and the IP layer, as a set of procedure calls. This is an
+ extension of the information in Section 3.3 of RFC-791 [IP:1].
+
+
+ * Send Datagram
+
+ SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt
+ => result )
+
+ where the parameters are defined in RFC-791. Passing an Id
+ parameter is optional; see Section 3.2.1.5.
+
+
+ * Receive Datagram
+
+ RECV(BufPTR, prot
+ => result, src, dst, SpecDest, TOS, len, opt)
+
+
+
+
+Internet Engineering Task Force [Page 69]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ All the parameters are defined in RFC-791, except for:
+
+ SpecDest = specific-destination address of datagram
+ (defined in Section 3.2.1.3)
+
+ The result parameter dst contains the datagram's destination
+ address. Since this may be a broadcast or multicast address,
+ the SpecDest parameter (not shown in RFC-791) MUST be passed.
+ The parameter opt contains all the IP options received in the
+ datagram; these MUST also be passed to the transport layer.
+
+
+ * Select Source Address
+
+ GET_SRCADDR(remote, TOS) -> local
+
+ remote = remote IP address
+ TOS = Type-of-Service
+ local = local IP address
+
+ See Section 3.3.4.3.
+
+
+ * Find Maximum Datagram Sizes
+
+ GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S
+
+ MMS_R = maximum receive transport-message size.
+ MMS_S = maximum send transport-message size.
+ (local, remote, TOS defined above)
+
+ See Sections 3.3.2 and 3.3.3.
+
+
+ * Advice on Delivery Success
+
+ ADVISE_DELIVPROB(sense, local, remote, TOS)
+
+ Here the parameter sense is a 1-bit flag indicating whether
+ positive or negative advice is being given; see the
+ discussion in Section 3.3.1.4. The other parameters were
+ defined earlier.
+
+
+ * Send ICMP Message
+
+ SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt)
+ -> result
+
+
+
+Internet Engineering Task Force [Page 70]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ (Parameters defined in RFC-791).
+
+ Passing an Id parameter is optional; see Section 3.2.1.5.
+ The transport layer MUST be able to send certain ICMP
+ messages: Port Unreachable or any of the query-type
+ messages. This function could be considered to be a special
+ case of the SEND() call, of course; we describe it separately
+ for clarity.
+
+
+ * Receive ICMP Message
+
+ RECV_ICMP(BufPTR ) -> result, src, dst, len, opt
+
+ (Parameters defined in RFC-791).
+
+ The IP layer MUST pass certain ICMP messages up to the
+ appropriate transport-layer routine. This function could be
+ considered to be a special case of the RECV() call, of
+ course; we describe it separately for clarity.
+
+ For an ICMP error message, the data that is passed up MUST
+ include the original Internet header plus all the octets of
+ the original message that are included in the ICMP message.
+ This data will be used by the transport layer to locate the
+ connection state information, if any.
+
+ In particular, the following ICMP messages are to be passed
+ up:
+
+ o Destination Unreachable
+
+ o Source Quench
+
+ o Echo Reply (to ICMP user interface, unless the Echo
+ Request originated in the IP layer)
+
+ o Timestamp Reply (to ICMP user interface)
+
+ o Time Exceeded
+
+
+ DISCUSSION:
+ In the future, there may be additions to this interface to
+ pass path data (see Section 3.3.1.3) between the IP and
+ transport layers.
+
+
+
+
+
+Internet Engineering Task Force [Page 71]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ 3.5 INTERNET LAYER REQUIREMENTS SUMMARY
+
+
+ | | | | |S| |
+ | | | | |H| |F
+ | | | | |O|M|o
+ | | |S| |U|U|o
+ | | |H| |L|S|t
+ | |M|O| |D|T|n
+ | |U|U|M| | |o
+ | |S|L|A|N|N|t
+ | |T|D|Y|O|O|t
+FEATURE |SECTION | | | |T|T|e
+-------------------------------------------------|--------|-|-|-|-|-|--
+ | | | | | | |
+Implement IP and ICMP |3.1 |x| | | | |
+Handle remote multihoming in application layer |3.1 |x| | | | |
+Support local multihoming |3.1 | | |x| | |
+Meet gateway specs if forward datagrams |3.1 |x| | | | |
+Configuration switch for embedded gateway |3.1 |x| | | | |1
+ Config switch default to non-gateway |3.1 |x| | | | |1
+ Auto-config based on number of interfaces |3.1 | | | | |x|1
+Able to log discarded datagrams |3.1 | |x| | | |
+ Record in counter |3.1 | |x| | | |
+ | | | | | | |
+Silently discard Version != 4 |3.2.1.1 |x| | | | |
+Verify IP checksum, silently discard bad dgram |3.2.1.2 |x| | | | |
+Addressing: | | | | | | |
+ Subnet addressing (RFC-950) |3.2.1.3 |x| | | | |
+ Src address must be host's own IP address |3.2.1.3 |x| | | | |
+ Silently discard datagram with bad dest addr |3.2.1.3 |x| | | | |
+ Silently discard datagram with bad src addr |3.2.1.3 |x| | | | |
+Support reassembly |3.2.1.4 |x| | | | |
+Retain same Id field in identical datagram |3.2.1.5 | | |x| | |
+ | | | | | | |
+TOS: | | | | | | |
+ Allow transport layer to set TOS |3.2.1.6 |x| | | | |
+ Pass received TOS up to transport layer |3.2.1.6 | |x| | | |
+ Use RFC-795 link-layer mappings for TOS |3.2.1.6 | | | |x| |
+TTL: | | | | | | |
+ Send packet with TTL of 0 |3.2.1.7 | | | | |x|
+ Discard received packets with TTL < 2 |3.2.1.7 | | | | |x|
+ Allow transport layer to set TTL |3.2.1.7 |x| | | | |
+ Fixed TTL is configurable |3.2.1.7 |x| | | | |
+ | | | | | | |
+IP Options: | | | | | | |
+ Allow transport layer to send IP options |3.2.1.8 |x| | | | |
+ Pass all IP options rcvd to higher layer |3.2.1.8 |x| | | | |
+
+
+
+Internet Engineering Task Force [Page 72]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ IP layer silently ignore unknown options |3.2.1.8 |x| | | | |
+ Security option |3.2.1.8a| | |x| | |
+ Send Stream Identifier option |3.2.1.8b| | | |x| |
+ Silently ignore Stream Identifer option |3.2.1.8b|x| | | | |
+ Record Route option |3.2.1.8d| | |x| | |
+ Timestamp option |3.2.1.8e| | |x| | |
+Source Route Option: | | | | | | |
+ Originate & terminate Source Route options |3.2.1.8c|x| | | | |
+ Datagram with completed SR passed up to TL |3.2.1.8c|x| | | | |
+ Build correct (non-redundant) return route |3.2.1.8c|x| | | | |
+ Send multiple SR options in one header |3.2.1.8c| | | | |x|
+ | | | | | | |
+ICMP: | | | | | | |
+ Silently discard ICMP msg with unknown type |3.2.2 |x| | | | |
+ Include more than 8 octets of orig datagram |3.2.2 | | |x| | |
+ Included octets same as received |3.2.2 |x| | | | |
+ Demux ICMP Error to transport protocol |3.2.2 |x| | | | |
+ Send ICMP error message with TOS=0 |3.2.2 | |x| | | |
+ Send ICMP error message for: | | | | | | |
+ - ICMP error msg |3.2.2 | | | | |x|
+ - IP b'cast or IP m'cast |3.2.2 | | | | |x|
+ - Link-layer b'cast |3.2.2 | | | | |x|
+ - Non-initial fragment |3.2.2 | | | | |x|
+ - Datagram with non-unique src address |3.2.2 | | | | |x|
+ Return ICMP error msgs (when not prohibited) |3.3.8 |x| | | | |
+ | | | | | | |
+ Dest Unreachable: | | | | | | |
+ Generate Dest Unreachable (code 2/3) |3.2.2.1 | |x| | | |
+ Pass ICMP Dest Unreachable to higher layer |3.2.2.1 |x| | | | |
+ Higher layer act on Dest Unreach |3.2.2.1 | |x| | | |
+ Interpret Dest Unreach as only hint |3.2.2.1 |x| | | | |
+ Redirect: | | | | | | |
+ Host send Redirect |3.2.2.2 | | | |x| |
+ Update route cache when recv Redirect |3.2.2.2 |x| | | | |
+ Handle both Host and Net Redirects |3.2.2.2 |x| | | | |
+ Discard illegal Redirect |3.2.2.2 | |x| | | |
+ Source Quench: | | | | | | |
+ Send Source Quench if buffering exceeded |3.2.2.3 | | |x| | |
+ Pass Source Quench to higher layer |3.2.2.3 |x| | | | |
+ Higher layer act on Source Quench |3.2.2.3 | |x| | | |
+ Time Exceeded: pass to higher layer |3.2.2.4 |x| | | | |
+ Parameter Problem: | | | | | | |
+ Send Parameter Problem messages |3.2.2.5 | |x| | | |
+ Pass Parameter Problem to higher layer |3.2.2.5 |x| | | | |
+ Report Parameter Problem to user |3.2.2.5 | | |x| | |
+ | | | | | | |
+ ICMP Echo Request or Reply: | | | | | | |
+ Echo server and Echo client |3.2.2.6 |x| | | | |
+
+
+
+Internet Engineering Task Force [Page 73]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ Echo client |3.2.2.6 | |x| | | |
+ Discard Echo Request to broadcast address |3.2.2.6 | | |x| | |
+ Discard Echo Request to multicast address |3.2.2.6 | | |x| | |
+ Use specific-dest addr as Echo Reply src |3.2.2.6 |x| | | | |
+ Send same data in Echo Reply |3.2.2.6 |x| | | | |
+ Pass Echo Reply to higher layer |3.2.2.6 |x| | | | |
+ Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | |
+ Reverse and reflect Source Route option |3.2.2.6 |x| | | | |
+ | | | | | | |
+ ICMP Information Request or Reply: |3.2.2.7 | | | |x| |
+ ICMP Timestamp and Timestamp Reply: |3.2.2.8 | | |x| | |
+ Minimize delay variability |3.2.2.8 | |x| | | |1
+ Silently discard b'cast Timestamp |3.2.2.8 | | |x| | |1
+ Silently discard m'cast Timestamp |3.2.2.8 | | |x| | |1
+ Use specific-dest addr as TS Reply src |3.2.2.8 |x| | | | |1
+ Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | |1
+ Reverse and reflect Source Route option |3.2.2.8 |x| | | | |1
+ Pass Timestamp Reply to higher layer |3.2.2.8 |x| | | | |1
+ Obey rules for "standard value" |3.2.2.8 |x| | | | |1
+ | | | | | | |
+ ICMP Address Mask Request and Reply: | | | | | | |
+ Addr Mask source configurable |3.2.2.9 |x| | | | |
+ Support static configuration of addr mask |3.2.2.9 |x| | | | |
+ Get addr mask dynamically during booting |3.2.2.9 | | |x| | |
+ Get addr via ICMP Addr Mask Request/Reply |3.2.2.9 | | |x| | |
+ Retransmit Addr Mask Req if no Reply |3.2.2.9 |x| | | | |3
+ Assume default mask if no Reply |3.2.2.9 | |x| | | |3
+ Update address mask from first Reply only |3.2.2.9 |x| | | | |3
+ Reasonableness check on Addr Mask |3.2.2.9 | |x| | | |
+ Send unauthorized Addr Mask Reply msgs |3.2.2.9 | | | | |x|
+ Explicitly configured to be agent |3.2.2.9 |x| | | | |
+ Static config=> Addr-Mask-Authoritative flag |3.2.2.9 | |x| | | |
+ Broadcast Addr Mask Reply when init. |3.2.2.9 |x| | | | |3
+ | | | | | | |
+ROUTING OUTBOUND DATAGRAMS: | | | | | | |
+ Use address mask in local/remote decision |3.3.1.1 |x| | | | |
+ Operate with no gateways on conn network |3.3.1.1 |x| | | | |
+ Maintain "route cache" of next-hop gateways |3.3.1.2 |x| | | | |
+ Treat Host and Net Redirect the same |3.3.1.2 | |x| | | |
+ If no cache entry, use default gateway |3.3.1.2 |x| | | | |
+ Support multiple default gateways |3.3.1.2 |x| | | | |
+ Provide table of static routes |3.3.1.2 | | |x| | |
+ Flag: route overridable by Redirects |3.3.1.2 | | |x| | |
+ Key route cache on host, not net address |3.3.1.3 | | |x| | |
+ Include TOS in route cache |3.3.1.3 | |x| | | |
+ | | | | | | |
+ Able to detect failure of next-hop gateway |3.3.1.4 |x| | | | |
+ Assume route is good forever |3.3.1.4 | | | |x| |
+
+
+
+Internet Engineering Task Force [Page 74]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ Ping gateways continuously |3.3.1.4 | | | | |x|
+ Ping only when traffic being sent |3.3.1.4 |x| | | | |
+ Ping only when no positive indication |3.3.1.4 |x| | | | |
+ Higher and lower layers give advice |3.3.1.4 | |x| | | |
+ Switch from failed default g'way to another |3.3.1.5 |x| | | | |
+ Manual method of entering config info |3.3.1.6 |x| | | | |
+ | | | | | | |
+REASSEMBLY and FRAGMENTATION: | | | | | | |
+ Able to reassemble incoming datagrams |3.3.2 |x| | | | |
+ At least 576 byte datagrams |3.3.2 |x| | | | |
+ EMTU_R configurable or indefinite |3.3.2 | |x| | | |
+ Transport layer able to learn MMS_R |3.3.2 |x| | | | |
+ Send ICMP Time Exceeded on reassembly timeout |3.3.2 |x| | | | |
+ Fixed reassembly timeout value |3.3.2 | |x| | | |
+ | | | | | | |
+ Pass MMS_S to higher layers |3.3.3 |x| | | | |
+ Local fragmentation of outgoing packets |3.3.3 | | |x| | |
+ Else don't send bigger than MMS_S |3.3.3 |x| | | | |
+ Send max 576 to off-net destination |3.3.3 | |x| | | |
+ All-Subnets-MTU configuration flag |3.3.3 | | |x| | |
+ | | | | | | |
+MULTIHOMING: | | | | | | |
+ Reply with same addr as spec-dest addr |3.3.4.2 | |x| | | |
+ Allow application to choose local IP addr |3.3.4.2 |x| | | | |
+ Silently discard d'gram in "wrong" interface |3.3.4.2 | | |x| | |
+ Only send d'gram through "right" interface |3.3.4.2 | | |x| | |4
+ | | | | | | |
+SOURCE-ROUTE FORWARDING: | | | | | | |
+ Forward datagram with Source Route option |3.3.5 | | |x| | |1
+ Obey corresponding gateway rules |3.3.5 |x| | | | |1
+ Update TTL by gateway rules |3.3.5 |x| | | | |1
+ Able to generate ICMP err code 4, 5 |3.3.5 |x| | | | |1
+ IP src addr not local host |3.3.5 | | |x| | |1
+ Update Timestamp, Record Route options |3.3.5 |x| | | | |1
+ Configurable switch for non-local SRing |3.3.5 |x| | | | |1
+ Defaults to OFF |3.3.5 |x| | | | |1
+ Satisfy gwy access rules for non-local SRing |3.3.5 |x| | | | |1
+ If not forward, send Dest Unreach (cd 5) |3.3.5 | |x| | | |2
+ | | | | | | |
+BROADCAST: | | | | | | |
+ Broadcast addr as IP source addr |3.2.1.3 | | | | |x|
+ Receive 0 or -1 broadcast formats OK |3.3.6 | |x| | | |
+ Config'ble option to send 0 or -1 b'cast |3.3.6 | | |x| | |
+ Default to -1 broadcast |3.3.6 | |x| | | |
+ Recognize all broadcast address formats |3.3.6 |x| | | | |
+ Use IP b'cast/m'cast addr in link-layer b'cast |3.3.6 |x| | | | |
+ Silently discard link-layer-only b'cast dg's |3.3.6 | |x| | | |
+ Use Limited Broadcast addr for connected net |3.3.6 | |x| | | |
+
+
+
+Internet Engineering Task Force [Page 75]
+
+
+
+
+RFC1122 INTERNET LAYER October 1989
+
+
+ | | | | | | |
+MULTICAST: | | | | | | |
+ Support local IP multicasting (RFC-1112) |3.3.7 | |x| | | |
+ Support IGMP (RFC-1112) |3.3.7 | | |x| | |
+ Join all-hosts group at startup |3.3.7 | |x| | | |
+ Higher layers learn i'face m'cast capability |3.3.7 | |x| | | |
+ | | | | | | |
+INTERFACE: | | | | | | |
+ Allow transport layer to use all IP mechanisms |3.4 |x| | | | |
+ Pass interface ident up to transport layer |3.4 |x| | | | |
+ Pass all IP options up to transport layer |3.4 |x| | | | |
+ Transport layer can send certain ICMP messages |3.4 |x| | | | |
+ Pass spec'd ICMP messages up to transp. layer |3.4 |x| | | | |
+ Include IP hdr+8 octets or more from orig. |3.4 |x| | | | |
+ Able to leap tall buildings at a single bound |3.5 | |x| | | |
+
+Footnotes:
+
+(1) Only if feature is implemented.
+
+(2) This requirement is overruled if datagram is an ICMP error message.
+
+(3) Only if feature is implemented and is configured "on".
+
+(4) Unless has embedded gateway functionality or is source routed.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 76]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- UDP October 1989
+
+
+4. TRANSPORT PROTOCOLS
+
+ 4.1 USER DATAGRAM PROTOCOL -- UDP
+
+ 4.1.1 INTRODUCTION
+
+ The User Datagram Protocol UDP [UDP:1] offers only a minimal
+ transport service -- non-guaranteed datagram delivery -- and
+ gives applications direct access to the datagram service of the
+ IP layer. UDP is used by applications that do not require the
+ level of service of TCP or that wish to use communications
+ services (e.g., multicast or broadcast delivery) not available
+ from TCP.
+
+ UDP is almost a null protocol; the only services it provides
+ over IP are checksumming of data and multiplexing by port
+ number. Therefore, an application program running over UDP
+ must deal directly with end-to-end communication problems that
+ a connection-oriented protocol would have handled -- e.g.,
+ retransmission for reliable delivery, packetization and
+ reassembly, flow control, congestion avoidance, etc., when
+ these are required. The fairly complex coupling between IP and
+ TCP will be mirrored in the coupling between UDP and many
+ applications using UDP.
+
+ 4.1.2 PROTOCOL WALK-THROUGH
+
+ There are no known errors in the specification of UDP.
+
+ 4.1.3 SPECIFIC ISSUES
+
+ 4.1.3.1 Ports
+
+ UDP well-known ports follow the same rules as TCP well-known
+ ports; see Section 4.2.2.1 below.
+
+ If a datagram arrives addressed to a UDP port for which
+ there is no pending LISTEN call, UDP SHOULD send an ICMP
+ Port Unreachable message.
+
+ 4.1.3.2 IP Options
+
+ UDP MUST pass any IP option that it receives from the IP
+ layer transparently to the application layer.
+
+ An application MUST be able to specify IP options to be sent
+ in its UDP datagrams, and UDP MUST pass these options to the
+ IP layer.
+
+
+
+Internet Engineering Task Force [Page 77]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- UDP October 1989
+
+
+ DISCUSSION:
+ At present, the only options that need be passed
+ through UDP are Source Route, Record Route, and Time
+ Stamp. However, new options may be defined in the
+ future, and UDP need not and should not make any
+ assumptions about the format or content of options it
+ passes to or from the application; an exception to this
+ might be an IP-layer security option.
+
+ An application based on UDP will need to obtain a
+ source route from a request datagram and supply a
+ reversed route for sending the corresponding reply.
+
+ 4.1.3.3 ICMP Messages
+
+ UDP MUST pass to the application layer all ICMP error
+ messages that it receives from the IP layer. Conceptually
+ at least, this may be accomplished with an upcall to the
+ ERROR_REPORT routine (see Section 4.2.4.1).
+
+ DISCUSSION:
+ Note that ICMP error messages resulting from sending a
+ UDP datagram are received asynchronously. A UDP-based
+ application that wants to receive ICMP error messages
+ is responsible for maintaining the state necessary to
+ demultiplex these messages when they arrive; for
+ example, the application may keep a pending receive
+ operation for this purpose. The application is also
+ responsible to avoid confusion from a delayed ICMP
+ error message resulting from an earlier use of the same
+ port(s).
+
+ 4.1.3.4 UDP Checksums
+
+ A host MUST implement the facility to generate and validate
+ UDP checksums. An application MAY optionally be able to
+ control whether a UDP checksum will be generated, but it
+ MUST default to checksumming on.
+
+ If a UDP datagram is received with a checksum that is non-
+ zero and invalid, UDP MUST silently discard the datagram.
+ An application MAY optionally be able to control whether UDP
+ datagrams without checksums should be discarded or passed to
+ the application.
+
+ DISCUSSION:
+ Some applications that normally run only across local
+ area networks have chosen to turn off UDP checksums for
+
+
+
+Internet Engineering Task Force [Page 78]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- UDP October 1989
+
+
+ efficiency. As a result, numerous cases of undetected
+ errors have been reported. The advisability of ever
+ turning off UDP checksumming is very controversial.
+
+ IMPLEMENTATION:
+ There is a common implementation error in UDP
+ checksums. Unlike the TCP checksum, the UDP checksum
+ is optional; the value zero is transmitted in the
+ checksum field of a UDP header to indicate the absence
+ of a checksum. If the transmitter really calculates a
+ UDP checksum of zero, it must transmit the checksum as
+ all 1's (65535). No special action is required at the
+ receiver, since zero and 65535 are equivalent in 1's
+ complement arithmetic.
+
+ 4.1.3.5 UDP Multihoming
+
+ When a UDP datagram is received, its specific-destination
+ address MUST be passed up to the application layer.
+
+ An application program MUST be able to specify the IP source
+ address to be used for sending a UDP datagram or to leave it
+ unspecified (in which case the networking software will
+ choose an appropriate source address). There SHOULD be a
+ way to communicate the chosen source address up to the
+ application layer (e.g, so that the application can later
+ receive a reply datagram only from the corresponding
+ interface).
+
+ DISCUSSION:
+ A request/response application that uses UDP should use
+ a source address for the response that is the same as
+ the specific destination address of the request. See
+ the "General Issues" section of [INTRO:1].
+
+ 4.1.3.6 Invalid Addresses
+
+ A UDP datagram received with an invalid IP source address
+ (e.g., a broadcast or multicast address) must be discarded
+ by UDP or by the IP layer (see Section 3.2.1.3).
+
+ When a host sends a UDP datagram, the source address MUST be
+ (one of) the IP address(es) of the host.
+
+ 4.1.4 UDP/APPLICATION LAYER INTERFACE
+
+ The application interface to UDP MUST provide the full services
+ of the IP/transport interface described in Section 3.4 of this
+
+
+
+Internet Engineering Task Force [Page 79]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- UDP October 1989
+
+
+ document. Thus, an application using UDP needs the functions
+ of the GET_SRCADDR(), GET_MAXSIZES(), ADVISE_DELIVPROB(), and
+ RECV_ICMP() calls described in Section 3.4. For example,
+ GET_MAXSIZES() can be used to learn the effective maximum UDP
+ maximum datagram size for a particular {interface,remote
+ host,TOS} triplet.
+
+ An application-layer program MUST be able to set the TTL and
+ TOS values as well as IP options for sending a UDP datagram,
+ and these values must be passed transparently to the IP layer.
+ UDP MAY pass the received TOS up to the application layer.
+
+ 4.1.5 UDP REQUIREMENTS SUMMARY
+
+
+ | | | | |S| |
+ | | | | |H| |F
+ | | | | |O|M|o
+ | | |S| |U|U|o
+ | | |H| |L|S|t
+ | |M|O| |D|T|n
+ | |U|U|M| | |o
+ | |S|L|A|N|N|t
+ | |T|D|Y|O|O|t
+FEATURE |SECTION | | | |T|T|e
+-------------------------------------------------|--------|-|-|-|-|-|--
+ | | | | | | |
+ UDP | | | | | | |
+-------------------------------------------------|--------|-|-|-|-|-|--
+ | | | | | | |
+UDP send Port Unreachable |4.1.3.1 | |x| | | |
+ | | | | | | |
+IP Options in UDP | | | | | | |
+ - Pass rcv'd IP options to applic layer |4.1.3.2 |x| | | | |
+ - Applic layer can specify IP options in Send |4.1.3.2 |x| | | | |
+ - UDP passes IP options down to IP layer |4.1.3.2 |x| | | | |
+ | | | | | | |
+Pass ICMP msgs up to applic layer |4.1.3.3 |x| | | | |
+ | | | | | | |
+UDP checksums: | | | | | | |
+ - Able to generate/check checksum |4.1.3.4 |x| | | | |
+ - Silently discard bad checksum |4.1.3.4 |x| | | | |
+ - Sender Option to not generate checksum |4.1.3.4 | | |x| | |
+ - Default is to checksum |4.1.3.4 |x| | | | |
+ - Receiver Option to require checksum |4.1.3.4 | | |x| | |
+ | | | | | | |
+UDP Multihoming | | | | | | |
+ - Pass spec-dest addr to application |4.1.3.5 |x| | | | |
+
+
+
+Internet Engineering Task Force [Page 80]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- UDP October 1989
+
+
+ - Applic layer can specify Local IP addr |4.1.3.5 |x| | | | |
+ - Applic layer specify wild Local IP addr |4.1.3.5 |x| | | | |
+ - Applic layer notified of Local IP addr used |4.1.3.5 | |x| | | |
+ | | | | | | |
+Bad IP src addr silently discarded by UDP/IP |4.1.3.6 |x| | | | |
+Only send valid IP source address |4.1.3.6 |x| | | | |
+UDP Application Interface Services | | | | | | |
+Full IP interface of 3.4 for application |4.1.4 |x| | | | |
+ - Able to spec TTL, TOS, IP opts when send dg |4.1.4 |x| | | | |
+ - Pass received TOS up to applic layer |4.1.4 | | |x| | |
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 81]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP
+
+ 4.2.1 INTRODUCTION
+
+ The Transmission Control Protocol TCP [TCP:1] is the primary
+ virtual-circuit transport protocol for the Internet suite. TCP
+ provides reliable, in-sequence delivery of a full-duplex stream
+ of octets (8-bit bytes). TCP is used by those applications
+ needing reliable, connection-oriented transport service, e.g.,
+ mail (SMTP), file transfer (FTP), and virtual terminal service
+ (Telnet); requirements for these application-layer protocols
+ are described in [INTRO:1].
+
+ 4.2.2 PROTOCOL WALK-THROUGH
+
+ 4.2.2.1 Well-Known Ports: RFC-793 Section 2.7
+
+ DISCUSSION:
+ TCP reserves port numbers in the range 0-255 for
+ "well-known" ports, used to access services that are
+ standardized across the Internet. The remainder of the
+ port space can be freely allocated to application
+ processes. Current well-known port definitions are
+ listed in the RFC entitled "Assigned Numbers"
+ [INTRO:6]. A prerequisite for defining a new well-
+ known port is an RFC documenting the proposed service
+ in enough detail to allow new implementations.
+
+ Some systems extend this notion by adding a third
+ subdivision of the TCP port space: reserved ports,
+ which are generally used for operating-system-specific
+ services. For example, reserved ports might fall
+ between 256 and some system-dependent upper limit.
+ Some systems further choose to protect well-known and
+ reserved ports by permitting only privileged users to
+ open TCP connections with those port values. This is
+ perfectly reasonable as long as the host does not
+ assume that all hosts protect their low-numbered ports
+ in this manner.
+
+ 4.2.2.2 Use of Push: RFC-793 Section 2.8
+
+ When an application issues a series of SEND calls without
+ setting the PUSH flag, the TCP MAY aggregate the data
+ internally without sending it. Similarly, when a series of
+ segments is received without the PSH bit, a TCP MAY queue
+ the data internally without passing it to the receiving
+ application.
+
+
+
+Internet Engineering Task Force [Page 82]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ The PSH bit is not a record marker and is independent of
+ segment boundaries. The transmitter SHOULD collapse
+ successive PSH bits when it packetizes data, to send the
+ largest possible segment.
+
+ A TCP MAY implement PUSH flags on SEND calls. If PUSH flags
+ are not implemented, then the sending TCP: (1) must not
+ buffer data indefinitely, and (2) MUST set the PSH bit in
+ the last buffered segment (i.e., when there is no more
+ queued data to be sent).
+
+ The discussion in RFC-793 on pages 48, 50, and 74
+ erroneously implies that a received PSH flag must be passed
+ to the application layer. Passing a received PSH flag to
+ the application layer is now OPTIONAL.
+
+ An application program is logically required to set the PUSH
+ flag in a SEND call whenever it needs to force delivery of
+ the data to avoid a communication deadlock. However, a TCP
+ SHOULD send a maximum-sized segment whenever possible, to
+ improve performance (see Section 4.2.3.4).
+
+ DISCUSSION:
+ When the PUSH flag is not implemented on SEND calls,
+ i.e., when the application/TCP interface uses a pure
+ streaming model, responsibility for aggregating any
+ tiny data fragments to form reasonable sized segments
+ is partially borne by the application layer.
+
+ Generally, an interactive application protocol must set
+ the PUSH flag at least in the last SEND call in each
+ command or response sequence. A bulk transfer protocol
+ like FTP should set the PUSH flag on the last segment
+ of a file or when necessary to prevent buffer deadlock.
+
+ At the receiver, the PSH bit forces buffered data to be
+ delivered to the application (even if less than a full
+ buffer has been received). Conversely, the lack of a
+ PSH bit can be used to avoid unnecessary wakeup calls
+ to the application process; this can be an important
+ performance optimization for large timesharing hosts.
+ Passing the PSH bit to the receiving application allows
+ an analogous optimization within the application.
+
+ 4.2.2.3 Window Size: RFC-793 Section 3.1
+
+ The window size MUST be treated as an unsigned number, or
+ else large window sizes will appear like negative windows
+
+
+
+Internet Engineering Task Force [Page 83]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ and TCP will not work. It is RECOMMENDED that
+ implementations reserve 32-bit fields for the send and
+ receive window sizes in the connection record and do all
+ window computations with 32 bits.
+
+ DISCUSSION:
+ It is known that the window field in the TCP header is
+ too small for high-speed, long-delay paths.
+ Experimental TCP options have been defined to extend
+ the window size; see for example [TCP:11]. In
+ anticipation of the adoption of such an extension, TCP
+ implementors should treat windows as 32 bits.
+
+ 4.2.2.4 Urgent Pointer: RFC-793 Section 3.1
+
+ The second sentence is in error: the urgent pointer points
+ to the sequence number of the LAST octet (not LAST+1) in a
+ sequence of urgent data. The description on page 56 (last
+ sentence) is correct.
+
+ A TCP MUST support a sequence of urgent data of any length.
+
+ A TCP MUST inform the application layer asynchronously
+ whenever it receives an Urgent pointer and there was
+ previously no pending urgent data, or whenever the Urgent
+ pointer advances in the data stream. There MUST be a way
+ for the application to learn how much urgent data remains to
+ be read from the connection, or at least to determine
+ whether or not more urgent data remains to be read.
+
+ DISCUSSION:
+ Although the Urgent mechanism may be used for any
+ application, it is normally used to send "interrupt"-
+ type commands to a Telnet program (see "Using Telnet
+ Synch Sequence" section in [INTRO:1]).
+
+ The asynchronous or "out-of-band" notification will
+ allow the application to go into "urgent mode", reading
+ data from the TCP connection. This allows control
+ commands to be sent to an application whose normal
+ input buffers are full of unprocessed data.
+
+ IMPLEMENTATION:
+ The generic ERROR-REPORT() upcall described in Section
+ 4.2.4.1 is a possible mechanism for informing the
+ application of the arrival of urgent data.
+
+
+
+
+
+Internet Engineering Task Force [Page 84]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ 4.2.2.5 TCP Options: RFC-793 Section 3.1
+
+ A TCP MUST be able to receive a TCP option in any segment.
+ A TCP MUST ignore without error any TCP option it does not
+ implement, assuming that the option has a length field (all
+ TCP options defined in the future will have length fields).
+ TCP MUST be prepared to handle an illegal option length
+ (e.g., zero) without crashing; a suggested procedure is to
+ reset the connection and log the reason.
+
+ 4.2.2.6 Maximum Segment Size Option: RFC-793 Section 3.1
+
+ TCP MUST implement both sending and receiving the Maximum
+ Segment Size option [TCP:4].
+
+ TCP SHOULD send an MSS (Maximum Segment Size) option in
+ every SYN segment when its receive MSS differs from the
+ default 536, and MAY send it always.
+
+ If an MSS option is not received at connection setup, TCP
+ MUST assume a default send MSS of 536 (576-40) [TCP:4].
+
+ The maximum size of a segment that TCP really sends, the
+ "effective send MSS," MUST be the smaller of the send MSS
+ (which reflects the available reassembly buffer size at the
+ remote host) and the largest size permitted by the IP layer:
+
+ Eff.snd.MSS =
+
+ min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize
+
+ where:
+
+ * SendMSS is the MSS value received from the remote host,
+ or the default 536 if no MSS option is received.
+
+ * MMS_S is the maximum size for a transport-layer message
+ that TCP may send.
+
+ * TCPhdrsize is the size of the TCP header; this is
+ normally 20, but may be larger if TCP options are to be
+ sent.
+
+ * IPoptionsize is the size of any IP options that TCP
+ will pass to the IP layer with the current message.
+
+
+ The MSS value to be sent in an MSS option must be less than
+
+
+
+Internet Engineering Task Force [Page 85]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ or equal to:
+
+ MMS_R - 20
+
+ where MMS_R is the maximum size for a transport-layer
+ message that can be received (and reassembled). TCP obtains
+ MMS_R and MMS_S from the IP layer; see the generic call
+ GET_MAXSIZES in Section 3.4.
+
+ DISCUSSION:
+ The choice of TCP segment size has a strong effect on
+ performance. Larger segments increase throughput by
+ amortizing header size and per-datagram processing
+ overhead over more data bytes; however, if the packet
+ is so large that it causes IP fragmentation, efficiency
+ drops sharply if any fragments are lost [IP:9].
+
+ Some TCP implementations send an MSS option only if the
+ destination host is on a non-connected network.
+ However, in general the TCP layer may not have the
+ appropriate information to make this decision, so it is
+ preferable to leave to the IP layer the task of
+ determining a suitable MTU for the Internet path. We
+ therefore recommend that TCP always send the option (if
+ not 536) and that the IP layer determine MMS_R as
+ specified in 3.3.3 and 3.4. A proposed IP-layer
+ mechanism to measure the MTU would then modify the IP
+ layer without changing TCP.
+
+ 4.2.2.7 TCP Checksum: RFC-793 Section 3.1
+
+ Unlike the UDP checksum (see Section 4.1.3.4), the TCP
+ checksum is never optional. The sender MUST generate it and
+ the receiver MUST check it.
+
+ 4.2.2.8 TCP Connection State Diagram: RFC-793 Section 3.2,
+ page 23
+
+ There are several problems with this diagram:
+
+ (a) The arrow from SYN-SENT to SYN-RCVD should be labeled
+ with "snd SYN,ACK", to agree with the text on page 68
+ and with Figure 8.
+
+ (b) There could be an arrow from SYN-RCVD state to LISTEN
+ state, conditioned on receiving a RST after a passive
+ open (see text page 70).
+
+
+
+
+Internet Engineering Task Force [Page 86]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ (c) It is possible to go directly from FIN-WAIT-1 to the
+ TIME-WAIT state (see page 75 of the spec).
+
+
+ 4.2.2.9 Initial Sequence Number Selection: RFC-793 Section
+ 3.3, page 27
+
+ A TCP MUST use the specified clock-driven selection of
+ initial sequence numbers.
+
+ 4.2.2.10 Simultaneous Open Attempts: RFC-793 Section 3.4, page
+ 32
+
+ There is an error in Figure 8: the packet on line 7 should
+ be identical to the packet on line 5.
+
+ A TCP MUST support simultaneous open attempts.
+
+ DISCUSSION:
+ It sometimes surprises implementors that if two
+ applications attempt to simultaneously connect to each
+ other, only one connection is generated instead of two.
+ This was an intentional design decision; don't try to
+ "fix" it.
+
+ 4.2.2.11 Recovery from Old Duplicate SYN: RFC-793 Section 3.4,
+ page 33
+
+ Note that a TCP implementation MUST keep track of whether a
+ connection has reached SYN_RCVD state as the result of a
+ passive OPEN or an active OPEN.
+
+ 4.2.2.12 RST Segment: RFC-793 Section 3.4
+
+ A TCP SHOULD allow a received RST segment to include data.
+
+ DISCUSSION
+ It has been suggested that a RST segment could contain
+ ASCII text that encoded and explained the cause of the
+ RST. No standard has yet been established for such
+ data.
+
+ 4.2.2.13 Closing a Connection: RFC-793 Section 3.5
+
+ A TCP connection may terminate in two ways: (1) the normal
+ TCP close sequence using a FIN handshake, and (2) an "abort"
+ in which one or more RST segments are sent and the
+ connection state is immediately discarded. If a TCP
+
+
+
+Internet Engineering Task Force [Page 87]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ connection is closed by the remote site, the local
+ application MUST be informed whether it closed normally or
+ was aborted.
+
+ The normal TCP close sequence delivers buffered data
+ reliably in both directions. Since the two directions of a
+ TCP connection are closed independently, it is possible for
+ a connection to be "half closed," i.e., closed in only one
+ direction, and a host is permitted to continue sending data
+ in the open direction on a half-closed connection.
+
+ A host MAY implement a "half-duplex" TCP close sequence, so
+ that an application that has called CLOSE cannot continue to
+ read data from the connection. If such a host issues a
+ CLOSE call while received data is still pending in TCP, or
+ if new data is received after CLOSE is called, its TCP
+ SHOULD send a RST to show that data was lost.
+
+ When a connection is closed actively, it MUST linger in
+ TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
+ However, it MAY accept a new SYN from the remote TCP to
+ reopen the connection directly from TIME-WAIT state, if it:
+
+ (1) assigns its initial sequence number for the new
+ connection to be larger than the largest sequence
+ number it used on the previous connection incarnation,
+ and
+
+ (2) returns to TIME-WAIT state if the SYN turns out to be
+ an old duplicate.
+
+
+ DISCUSSION:
+ TCP's full-duplex data-preserving close is a feature
+ that is not included in the analogous ISO transport
+ protocol TP4.
+
+ Some systems have not implemented half-closed
+ connections, presumably because they do not fit into
+ the I/O model of their particular operating system. On
+ these systems, once an application has called CLOSE, it
+ can no longer read input data from the connection; this
+ is referred to as a "half-duplex" TCP close sequence.
+
+ The graceful close algorithm of TCP requires that the
+ connection state remain defined on (at least) one end
+ of the connection, for a timeout period of 2xMSL, i.e.,
+ 4 minutes. During this period, the (remote socket,
+
+
+
+Internet Engineering Task Force [Page 88]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ local socket) pair that defines the connection is busy
+ and cannot be reused. To shorten the time that a given
+ port pair is tied up, some TCPs allow a new SYN to be
+ accepted in TIME-WAIT state.
+
+ 4.2.2.14 Data Communication: RFC-793 Section 3.7, page 40
+
+ Since RFC-793 was written, there has been extensive work on
+ TCP algorithms to achieve efficient data communication.
+ Later sections of the present document describe required and
+ recommended TCP algorithms to determine when to send data
+ (Section 4.2.3.4), when to send an acknowledgment (Section
+ 4.2.3.2), and when to update the window (Section 4.2.3.3).
+
+ DISCUSSION:
+ One important performance issue is "Silly Window
+ Syndrome" or "SWS" [TCP:5], a stable pattern of small
+ incremental window movements resulting in extremely
+ poor TCP performance. Algorithms to avoid SWS are
+ described below for both the sending side (Section
+ 4.2.3.4) and the receiving side (Section 4.2.3.3).
+
+ In brief, SWS is caused by the receiver advancing the
+ right window edge whenever it has any new buffer space
+ available to receive data and by the sender using any
+ incremental window, no matter how small, to send more
+ data [TCP:5]. The result can be a stable pattern of
+ sending tiny data segments, even though both sender and
+ receiver have a large total buffer space for the
+ connection. SWS can only occur during the transmission
+ of a large amount of data; if the connection goes
+ quiescent, the problem will disappear. It is caused by
+ typical straightforward implementation of window
+ management, but the sender and receiver algorithms
+ given below will avoid it.
+
+ Another important TCP performance issue is that some
+ applications, especially remote login to character-at-
+ a-time hosts, tend to send streams of one-octet data
+ segments. To avoid deadlocks, every TCP SEND call from
+ such applications must be "pushed", either explicitly
+ by the application or else implicitly by TCP. The
+ result may be a stream of TCP segments that contain one
+ data octet each, which makes very inefficient use of
+ the Internet and contributes to Internet congestion.
+ The Nagle Algorithm described in Section 4.2.3.4
+ provides a simple and effective solution to this
+ problem. It does have the effect of clumping
+
+
+
+Internet Engineering Task Force [Page 89]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ characters over Telnet connections; this may initially
+ surprise users accustomed to single-character echo, but
+ user acceptance has not been a problem.
+
+ Note that the Nagle algorithm and the send SWS
+ avoidance algorithm play complementary roles in
+ improving performance. The Nagle algorithm discourages
+ sending tiny segments when the data to be sent
+ increases in small increments, while the SWS avoidance
+ algorithm discourages small segments resulting from the
+ right window edge advancing in small increments.
+
+ A careless implementation can send two or more
+ acknowledgment segments per data segment received. For
+ example, suppose the receiver acknowledges every data
+ segment immediately. When the application program
+ subsequently consumes the data and increases the
+ available receive buffer space again, the receiver may
+ send a second acknowledgment segment to update the
+ window at the sender. The extreme case occurs with
+ single-character segments on TCP connections using the
+ Telnet protocol for remote login service. Some
+ implementations have been observed in which each
+ incoming 1-character segment generates three return
+ segments: (1) the acknowledgment, (2) a one byte
+ increase in the window, and (3) the echoed character,
+ respectively.
+
+ 4.2.2.15 Retransmission Timeout: RFC-793 Section 3.7, page 41
+
+ The algorithm suggested in RFC-793 for calculating the
+ retransmission timeout is now known to be inadequate; see
+ Section 4.2.3.1 below.
+
+ Recent work by Jacobson [TCP:7] on Internet congestion and
+ TCP retransmission stability has produced a transmission
+ algorithm combining "slow start" with "congestion
+ avoidance". A TCP MUST implement this algorithm.
+
+ If a retransmitted packet is identical to the original
+ packet (which implies not only that the data boundaries have
+ not changed, but also that the window and acknowledgment
+ fields of the header have not changed), then the same IP
+ Identification field MAY be used (see Section 3.2.1.5).
+
+ IMPLEMENTATION:
+ Some TCP implementors have chosen to "packetize" the
+ data stream, i.e., to pick segment boundaries when
+
+
+
+Internet Engineering Task Force [Page 90]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ segments are originally sent and to queue these
+ segments in a "retransmission queue" until they are
+ acknowledged. Another design (which may be simpler) is
+ to defer packetizing until each time data is
+ transmitted or retransmitted, so there will be no
+ segment retransmission queue.
+
+ In an implementation with a segment retransmission
+ queue, TCP performance may be enhanced by repacketizing
+ the segments awaiting acknowledgment when the first
+ retransmission timeout occurs. That is, the
+ outstanding segments that fitted would be combined into
+ one maximum-sized segment, with a new IP Identification
+ value. The TCP would then retain this combined segment
+ in the retransmit queue until it was acknowledged.
+ However, if the first two segments in the
+ retransmission queue totalled more than one maximum-
+ sized segment, the TCP would retransmit only the first
+ segment using the original IP Identification field.
+
+ 4.2.2.16 Managing the Window: RFC-793 Section 3.7, page 41
+
+ A TCP receiver SHOULD NOT shrink the window, i.e., move the
+ right window edge to the left. However, a sending TCP MUST
+ be robust against window shrinking, which may cause the
+ "useable window" (see Section 4.2.3.4) to become negative.
+
+ If this happens, the sender SHOULD NOT send new data, but
+ SHOULD retransmit normally the old unacknowledged data
+ between SND.UNA and SND.UNA+SND.WND. The sender MAY also
+ retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT
+ time out the connection if data beyond the right window edge
+ is not acknowledged. If the window shrinks to zero, the TCP
+ MUST probe it in the standard way (see next Section).
+
+ DISCUSSION:
+ Many TCP implementations become confused if the window
+ shrinks from the right after data has been sent into a
+ larger window. Note that TCP has a heuristic to select
+ the latest window update despite possible datagram
+ reordering; as a result, it may ignore a window update
+ with a smaller window than previously offered if
+ neither the sequence number nor the acknowledgment
+ number is increased.
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 91]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ 4.2.2.17 Probing Zero Windows: RFC-793 Section 3.7, page 42
+
+ Probing of zero (offered) windows MUST be supported.
+
+ A TCP MAY keep its offered receive window closed
+ indefinitely. As long as the receiving TCP continues to
+ send acknowledgments in response to the probe segments, the
+ sending TCP MUST allow the connection to stay open.
+
+ DISCUSSION:
+ It is extremely important to remember that ACK
+ (acknowledgment) segments that contain no data are not
+ reliably transmitted by TCP. If zero window probing is
+ not supported, a connection may hang forever when an
+ ACK segment that re-opens the window is lost.
+
+ The delay in opening a zero window generally occurs
+ when the receiving application stops taking data from
+ its TCP. For example, consider a printer daemon
+ application, stopped because the printer ran out of
+ paper.
+
+ The transmitting host SHOULD send the first zero-window
+ probe when a zero window has existed for the retransmission
+ timeout period (see Section 4.2.2.15), and SHOULD increase
+ exponentially the interval between successive probes.
+
+ DISCUSSION:
+ This procedure minimizes delay if the zero-window
+ condition is due to a lost ACK segment containing a
+ window-opening update. Exponential backoff is
+ recommended, possibly with some maximum interval not
+ specified here. This procedure is similar to that of
+ the retransmission algorithm, and it may be possible to
+ combine the two procedures in the implementation.
+
+ 4.2.2.18 Passive OPEN Calls: RFC-793 Section 3.8
+
+ Every passive OPEN call either creates a new connection
+ record in LISTEN state, or it returns an error; it MUST NOT
+ affect any previously created connection record.
+
+ A TCP that supports multiple concurrent users MUST provide
+ an OPEN call that will functionally allow an application to
+ LISTEN on a port while a connection block with the same
+ local port is in SYN-SENT or SYN-RECEIVED state.
+
+ DISCUSSION:
+
+
+
+Internet Engineering Task Force [Page 92]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ Some applications (e.g., SMTP servers) may need to
+ handle multiple connection attempts at about the same
+ time. The probability of a connection attempt failing
+ is reduced by giving the application some means of
+ listening for a new connection at the same time that an
+ earlier connection attempt is going through the three-
+ way handshake.
+
+ IMPLEMENTATION:
+ Acceptable implementations of concurrent opens may
+ permit multiple passive OPEN calls, or they may allow
+ "cloning" of LISTEN-state connections from a single
+ passive OPEN call.
+
+ 4.2.2.19 Time to Live: RFC-793 Section 3.9, page 52
+
+ RFC-793 specified that TCP was to request the IP layer to
+ send TCP segments with TTL = 60. This is obsolete; the TTL
+ value used to send TCP segments MUST be configurable. See
+ Section 3.2.1.7 for discussion.
+
+ 4.2.2.20 Event Processing: RFC-793 Section 3.9
+
+ While it is not strictly required, a TCP SHOULD be capable
+ of queueing out-of-order TCP segments. Change the "may" in
+ the last sentence of the first paragraph on page 70 to
+ "should".
+
+ DISCUSSION:
+ Some small-host implementations have omitted segment
+ queueing because of limited buffer space. This
+ omission may be expected to adversely affect TCP
+ throughput, since loss of a single segment causes all
+ later segments to appear to be "out of sequence".
+
+ In general, the processing of received segments MUST be
+ implemented to aggregate ACK segments whenever possible.
+ For example, if the TCP is processing a series of queued
+ segments, it MUST process them all before sending any ACK
+ segments.
+
+ Here are some detailed error corrections and notes on the
+ Event Processing section of RFC-793.
+
+ (a) CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK
+ state, not CLOSING.
+
+ (b) LISTEN state, check for SYN (pp. 65, 66): With a SYN
+
+
+
+Internet Engineering Task Force [Page 93]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ bit, if the security/compartment or the precedence is
+ wrong for the segment, a reset is sent. The wrong form
+ of reset is shown in the text; it should be:
+
+ <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>
+
+
+ (c) SYN-SENT state, Check for SYN, p. 68: When the
+ connection enters ESTABLISHED state, the following
+ variables must be set:
+ SND.WND <- SEG.WND
+ SND.WL1 <- SEG.SEQ
+ SND.WL2 <- SEG.ACK
+
+
+ (d) Check security and precedence, p. 71: The first heading
+ "ESTABLISHED STATE" should really be a list of all
+ states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT-
+ 1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and
+ TIME-WAIT.
+
+ (e) Check SYN bit, p. 71: "In SYN-RECEIVED state and if
+ the connection was initiated with a passive OPEN, then
+ return this connection to the LISTEN state and return.
+ Otherwise...".
+
+ (f) Check ACK field, SYN-RECEIVED state, p. 72: When the
+ connection enters ESTABLISHED state, the variables
+ listed in (c) must be set.
+
+ (g) Check ACK field, ESTABLISHED state, p. 72: The ACK is a
+ duplicate if SEG.ACK =< SND.UNA (the = was omitted).
+ Similarly, the window should be updated if: SND.UNA =<
+ SEG.ACK =< SND.NXT.
+
+ (h) USER TIMEOUT, p. 77:
+
+ It would be better to notify the application of the
+ timeout rather than letting TCP force the connection
+ closed. However, see also Section 4.2.3.5.
+
+
+ 4.2.2.21 Acknowledging Queued Segments: RFC-793 Section 3.9
+
+ A TCP MAY send an ACK segment acknowledging RCV.NXT when a
+ valid segment arrives that is in the window but not at the
+ left window edge.
+
+
+
+
+Internet Engineering Task Force [Page 94]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ DISCUSSION:
+ RFC-793 (see page 74) was ambiguous about whether or
+ not an ACK segment should be sent when an out-of-order
+ segment was received, i.e., when SEG.SEQ was unequal to
+ RCV.NXT.
+
+ One reason for ACKing out-of-order segments might be to
+ support an experimental algorithm known as "fast
+ retransmit". With this algorithm, the sender uses the
+ "redundant" ACK's to deduce that a segment has been
+ lost before the retransmission timer has expired. It
+ counts the number of times an ACK has been received
+ with the same value of SEG.ACK and with the same right
+ window edge. If more than a threshold number of such
+ ACK's is received, then the segment containing the
+ octets starting at SEG.ACK is assumed to have been lost
+ and is retransmitted, without awaiting a timeout. The
+ threshold is chosen to compensate for the maximum
+ likely segment reordering in the Internet. There is
+ not yet enough experience with the fast retransmit
+ algorithm to determine how useful it is.
+
+ 4.2.3 SPECIFIC ISSUES
+
+ 4.2.3.1 Retransmission Timeout Calculation
+
+ A host TCP MUST implement Karn's algorithm and Jacobson's
+ algorithm for computing the retransmission timeout ("RTO").
+
+ o Jacobson's algorithm for computing the smoothed round-
+ trip ("RTT") time incorporates a simple measure of the
+ variance [TCP:7].
+
+ o Karn's algorithm for selecting RTT measurements ensures
+ that ambiguous round-trip times will not corrupt the
+ calculation of the smoothed round-trip time [TCP:6].
+
+ This implementation also MUST include "exponential backoff"
+ for successive RTO values for the same segment.
+ Retransmission of SYN segments SHOULD use the same algorithm
+ as data segments.
+
+ DISCUSSION:
+ There were two known problems with the RTO calculations
+ specified in RFC-793. First, the accurate measurement
+ of RTTs is difficult when there are retransmissions.
+ Second, the algorithm to compute the smoothed round-
+ trip time is inadequate [TCP:7], because it incorrectly
+
+
+
+Internet Engineering Task Force [Page 95]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ assumed that the variance in RTT values would be small
+ and constant. These problems were solved by Karn's and
+ Jacobson's algorithm, respectively.
+
+ The performance increase resulting from the use of
+ these improvements varies from noticeable to dramatic.
+ Jacobson's algorithm for incorporating the measured RTT
+ variance is especially important on a low-speed link,
+ where the natural variation of packet sizes causes a
+ large variation in RTT. One vendor found link
+ utilization on a 9.6kb line went from 10% to 90% as a
+ result of implementing Jacobson's variance algorithm in
+ TCP.
+
+ The following values SHOULD be used to initialize the
+ estimation parameters for a new connection:
+
+ (a) RTT = 0 seconds.
+
+ (b) RTO = 3 seconds. (The smoothed variance is to be
+ initialized to the value that will result in this RTO).
+
+ The recommended upper and lower bounds on the RTO are known
+ to be inadequate on large internets. The lower bound SHOULD
+ be measured in fractions of a second (to accommodate high
+ speed LANs) and the upper bound should be 2*MSL, i.e., 240
+ seconds.
+
+ DISCUSSION:
+ Experience has shown that these initialization values
+ are reasonable, and that in any case the Karn and
+ Jacobson algorithms make TCP behavior reasonably
+ insensitive to the initial parameter choices.
+
+ 4.2.3.2 When to Send an ACK Segment
+
+ A host that is receiving a stream of TCP data segments can
+ increase efficiency in both the Internet and the hosts by
+ sending fewer than one ACK (acknowledgment) segment per data
+ segment received; this is known as a "delayed ACK" [TCP:5].
+
+ A TCP SHOULD implement a delayed ACK, but an ACK should not
+ be excessively delayed; in particular, the delay MUST be
+ less than 0.5 seconds, and in a stream of full-sized
+ segments there SHOULD be an ACK for at least every second
+ segment.
+
+ DISCUSSION:
+
+
+
+Internet Engineering Task Force [Page 96]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ A delayed ACK gives the application an opportunity to
+ update the window and perhaps to send an immediate
+ response. In particular, in the case of character-mode
+ remote login, a delayed ACK can reduce the number of
+ segments sent by the server by a factor of 3 (ACK,
+ window update, and echo character all combined in one
+ segment).
+
+ In addition, on some large multi-user hosts, a delayed
+ ACK can substantially reduce protocol processing
+ overhead by reducing the total number of packets to be
+ processed [TCP:5]. However, excessive delays on ACK's
+ can disturb the round-trip timing and packet "clocking"
+ algorithms [TCP:7].
+
+ 4.2.3.3 When to Send a Window Update
+
+ A TCP MUST include a SWS avoidance algorithm in the receiver
+ [TCP:5].
+
+ IMPLEMENTATION:
+ The receiver's SWS avoidance algorithm determines when
+ the right window edge may be advanced; this is
+ customarily known as "updating the window". This
+ algorithm combines with the delayed ACK algorithm (see
+ Section 4.2.3.2) to determine when an ACK segment
+ containing the current window will really be sent to
+ the receiver. We use the notation of RFC-793; see
+ Figures 4 and 5 in that document.
+
+ The solution to receiver SWS is to avoid advancing the
+ right window edge RCV.NXT+RCV.WND in small increments,
+ even if data is received from the network in small
+ segments.
+
+ Suppose the total receive buffer space is RCV.BUFF. At
+ any given moment, RCV.USER octets of this total may be
+ tied up with data that has been received and
+ acknowledged but which the user process has not yet
+ consumed. When the connection is quiescent, RCV.WND =
+ RCV.BUFF and RCV.USER = 0.
+
+ Keeping the right window edge fixed as data arrives and
+ is acknowledged requires that the receiver offer less
+ than its full buffer space, i.e., the receiver must
+ specify a RCV.WND that keeps RCV.NXT+RCV.WND constant
+ as RCV.NXT increases. Thus, the total buffer space
+ RCV.BUFF is generally divided into three parts:
+
+
+
+Internet Engineering Task Force [Page 97]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+
+ |<------- RCV.BUFF ---------------->|
+ 1 2 3
+ ----|---------|------------------|------|----
+ RCV.NXT ^
+ (Fixed)
+
+ 1 - RCV.USER = data received but not yet consumed;
+ 2 - RCV.WND = space advertised to sender;
+ 3 - Reduction = space available but not yet
+ advertised.
+
+
+ The suggested SWS avoidance algorithm for the receiver
+ is to keep RCV.NXT+RCV.WND fixed until the reduction
+ satisfies:
+
+ RCV.BUFF - RCV.USER - RCV.WND >=
+
+ min( Fr * RCV.BUFF, Eff.snd.MSS )
+
+ where Fr is a fraction whose recommended value is 1/2,
+ and Eff.snd.MSS is the effective send MSS for the
+ connection (see Section 4.2.2.6). When the inequality
+ is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER.
+
+ Note that the general effect of this algorithm is to
+ advance RCV.WND in increments of Eff.snd.MSS (for
+ realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2).
+ Note also that the receiver must use its own
+ Eff.snd.MSS, assuming it is the same as the sender's.
+
+ 4.2.3.4 When to Send Data
+
+ A TCP MUST include a SWS avoidance algorithm in the sender.
+
+ A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
+ coalesce short segments. However, there MUST be a way for
+ an application to disable the Nagle algorithm on an
+ individual connection. In all cases, sending data is also
+ subject to the limitation imposed by the Slow Start
+ algorithm (Section 4.2.2.15).
+
+ DISCUSSION:
+ The Nagle algorithm is generally as follows:
+
+ If there is unacknowledged data (i.e., SND.NXT >
+ SND.UNA), then the sending TCP buffers all user
+
+
+
+Internet Engineering Task Force [Page 98]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ data (regardless of the PSH bit), until the
+ outstanding data has been acknowledged or until
+ the TCP can send a full-sized segment (Eff.snd.MSS
+ bytes; see Section 4.2.2.6).
+
+ Some applications (e.g., real-time display window
+ updates) require that the Nagle algorithm be turned
+ off, so small data segments can be streamed out at the
+ maximum rate.
+
+ IMPLEMENTATION:
+ The sender's SWS avoidance algorithm is more difficult
+ than the receivers's, because the sender does not know
+ (directly) the receiver's total buffer space RCV.BUFF.
+ An approach which has been found to work well is for
+ the sender to calculate Max(SND.WND), the maximum send
+ window it has seen so far on the connection, and to use
+ this value as an estimate of RCV.BUFF. Unfortunately,
+ this can only be an estimate; the receiver may at any
+ time reduce the size of RCV.BUFF. To avoid a resulting
+ deadlock, it is necessary to have a timeout to force
+ transmission of data, overriding the SWS avoidance
+ algorithm. In practice, this timeout should seldom
+ occur.
+
+ The "useable window" [TCP:5] is:
+
+ U = SND.UNA + SND.WND - SND.NXT
+
+ i.e., the offered window less the amount of data sent
+ but not acknowledged. If D is the amount of data
+ queued in the sending TCP but not yet sent, then the
+ following set of rules is recommended.
+
+ Send data:
+
+ (1) if a maximum-sized segment can be sent, i.e, if:
+
+ min(D,U) >= Eff.snd.MSS;
+
+
+ (2) or if the data is pushed and all queued data can
+ be sent now, i.e., if:
+
+ [SND.NXT = SND.UNA and] PUSHED and D <= U
+
+ (the bracketed condition is imposed by the Nagle
+ algorithm);
+
+
+
+Internet Engineering Task Force [Page 99]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ (3) or if at least a fraction Fs of the maximum window
+ can be sent, i.e., if:
+
+ [SND.NXT = SND.UNA and]
+
+ min(D.U) >= Fs * Max(SND.WND);
+
+
+ (4) or if data is PUSHed and the override timeout
+ occurs.
+
+ Here Fs is a fraction whose recommended value is 1/2.
+ The override timeout should be in the range 0.1 - 1.0
+ seconds. It may be convenient to combine this timer
+ with the timer used to probe zero windows (Section
+ 4.2.2.17).
+
+ Finally, note that the SWS avoidance algorithm just
+ specified is to be used instead of the sender-side
+ algorithm contained in [TCP:5].
+
+ 4.2.3.5 TCP Connection Failures
+
+ Excessive retransmission of the same segment by TCP
+ indicates some failure of the remote host or the Internet
+ path. This failure may be of short or long duration. The
+ following procedure MUST be used to handle excessive
+ retransmissions of data segments [IP:11]:
+
+ (a) There are two thresholds R1 and R2 measuring the amount
+ of retransmission that has occurred for the same
+ segment. R1 and R2 might be measured in time units or
+ as a count of retransmissions.
+
+ (b) When the number of transmissions of the same segment
+ reaches or exceeds threshold R1, pass negative advice
+ (see Section 3.3.1.4) to the IP layer, to trigger
+ dead-gateway diagnosis.
+
+ (c) When the number of transmissions of the same segment
+ reaches a threshold R2 greater than R1, close the
+ connection.
+
+ (d) An application MUST be able to set the value for R2 for
+ a particular connection. For example, an interactive
+ application might set R2 to "infinity," giving the user
+ control over when to disconnect.
+
+
+
+
+Internet Engineering Task Force [Page 100]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ (d) TCP SHOULD inform the application of the delivery
+ problem (unless such information has been disabled by
+ the application; see Section 4.2.4.1), when R1 is
+ reached and before R2. This will allow a remote login
+ (User Telnet) application program to inform the user,
+ for example.
+
+ The value of R1 SHOULD correspond to at least 3
+ retransmissions, at the current RTO. The value of R2 SHOULD
+ correspond to at least 100 seconds.
+
+ An attempt to open a TCP connection could fail with
+ excessive retransmissions of the SYN segment or by receipt
+ of a RST segment or an ICMP Port Unreachable. SYN
+ retransmissions MUST be handled in the general way just
+ described for data retransmissions, including notification
+ of the application layer.
+
+ However, the values of R1 and R2 may be different for SYN
+ and data segments. In particular, R2 for a SYN segment MUST
+ be set large enough to provide retransmission of the segment
+ for at least 3 minutes. The application can close the
+ connection (i.e., give up on the open attempt) sooner, of
+ course.
+
+ DISCUSSION:
+ Some Internet paths have significant setup times, and
+ the number of such paths is likely to increase in the
+ future.
+
+ 4.2.3.6 TCP Keep-Alives
+
+ Implementors MAY include "keep-alives" in their TCP
+ implementations, although this practice is not universally
+ accepted. If keep-alives are included, the application MUST
+ be able to turn them on or off for each TCP connection, and
+ they MUST default to off.
+
+ Keep-alive packets MUST only be sent when no data or
+ acknowledgement packets have been received for the
+ connection within an interval. This interval MUST be
+ configurable and MUST default to no less than two hours.
+
+ It is extremely important to remember that ACK segments that
+ contain no data are not reliably transmitted by TCP.
+ Consequently, if a keep-alive mechanism is implemented it
+ MUST NOT interpret failure to respond to any specific probe
+ as a dead connection.
+
+
+
+Internet Engineering Task Force [Page 101]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ An implementation SHOULD send a keep-alive segment with no
+ data; however, it MAY be configurable to send a keep-alive
+ segment containing one garbage octet, for compatibility with
+ erroneous TCP implementations.
+
+ DISCUSSION:
+ A "keep-alive" mechanism periodically probes the other
+ end of a connection when the connection is otherwise
+ idle, even when there is no data to be sent. The TCP
+ specification does not include a keep-alive mechanism
+ because it could: (1) cause perfectly good connections
+ to break during transient Internet failures; (2)
+ consume unnecessary bandwidth ("if no one is using the
+ connection, who cares if it is still good?"); and (3)
+ cost money for an Internet path that charges for
+ packets.
+
+ Some TCP implementations, however, have included a
+ keep-alive mechanism. To confirm that an idle
+ connection is still active, these implementations send
+ a probe segment designed to elicit a response from the
+ peer TCP. Such a segment generally contains SEG.SEQ =
+ SND.NXT-1 and may or may not contain one garbage octet
+ of data. Note that on a quiet connection SND.NXT =
+ RCV.NXT, so that this SEG.SEQ will be outside the
+ window. Therefore, the probe causes the receiver to
+ return an acknowledgment segment, confirming that the
+ connection is still live. If the peer has dropped the
+ connection due to a network partition or a crash, it
+ will respond with a RST instead of an acknowledgment
+ segment.
+
+ Unfortunately, some misbehaved TCP implementations fail
+ to respond to a segment with SEG.SEQ = SND.NXT-1 unless
+ the segment contains data. Alternatively, an
+ implementation could determine whether a peer responded
+ correctly to keep-alive packets with no garbage data
+ octet.
+
+ A TCP keep-alive mechanism should only be invoked in
+ server applications that might otherwise hang
+ indefinitely and consume resources unnecessarily if a
+ client crashes or aborts a connection during a network
+ failure.
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 102]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ 4.2.3.7 TCP Multihoming
+
+ If an application on a multihomed host does not specify the
+ local IP address when actively opening a TCP connection,
+ then the TCP MUST ask the IP layer to select a local IP
+ address before sending the (first) SYN. See the function
+ GET_SRCADDR() in Section 3.4.
+
+ At all other times, a previous segment has either been sent
+ or received on this connection, and TCP MUST use the same
+ local address is used that was used in those previous
+ segments.
+
+ 4.2.3.8 IP Options
+
+ When received options are passed up to TCP from the IP
+ layer, TCP MUST ignore options that it does not understand.
+
+ A TCP MAY support the Time Stamp and Record Route options.
+
+ An application MUST be able to specify a source route when
+ it actively opens a TCP connection, and this MUST take
+ precedence over a source route received in a datagram.
+
+ When a TCP connection is OPENed passively and a packet
+ arrives with a completed IP Source Route option (containing
+ a return route), TCP MUST save the return route and use it
+ for all segments sent on this connection. If a different
+ source route arrives in a later segment, the later
+ definition SHOULD override the earlier one.
+
+ 4.2.3.9 ICMP Messages
+
+ TCP MUST act on an ICMP error message passed up from the IP
+ layer, directing it to the connection that created the
+ error. The necessary demultiplexing information can be
+ found in the IP header contained within the ICMP message.
+
+ o Source Quench
+
+ TCP MUST react to a Source Quench by slowing
+ transmission on the connection. The RECOMMENDED
+ procedure is for a Source Quench to trigger a "slow
+ start," as if a retransmission timeout had occurred.
+
+ o Destination Unreachable -- codes 0, 1, 5
+
+ Since these Unreachable messages indicate soft error
+
+
+
+Internet Engineering Task Force [Page 103]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ conditions, TCP MUST NOT abort the connection, and it
+ SHOULD make the information available to the
+ application.
+
+ DISCUSSION:
+ TCP could report the soft error condition directly
+ to the application layer with an upcall to the
+ ERROR_REPORT routine, or it could merely note the
+ message and report it to the application only when
+ and if the TCP connection times out.
+
+ o Destination Unreachable -- codes 2-4
+
+ These are hard error conditions, so TCP SHOULD abort
+ the connection.
+
+ o Time Exceeded -- codes 0, 1
+
+ This should be handled the same way as Destination
+ Unreachable codes 0, 1, 5 (see above).
+
+ o Parameter Problem
+
+ This should be handled the same way as Destination
+ Unreachable codes 0, 1, 5 (see above).
+
+
+ 4.2.3.10 Remote Address Validation
+
+ A TCP implementation MUST reject as an error a local OPEN
+ call for an invalid remote IP address (e.g., a broadcast or
+ multicast address).
+
+ An incoming SYN with an invalid source address must be
+ ignored either by TCP or by the IP layer (see Section
+ 3.2.1.3).
+
+ A TCP implementation MUST silently discard an incoming SYN
+ segment that is addressed to a broadcast or multicast
+ address.
+
+ 4.2.3.11 TCP Traffic Patterns
+
+ IMPLEMENTATION:
+ The TCP protocol specification [TCP:1] gives the
+ implementor much freedom in designing the algorithms
+ that control the message flow over the connection --
+ packetizing, managing the window, sending
+
+
+
+Internet Engineering Task Force [Page 104]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ acknowledgments, etc. These design decisions are
+ difficult because a TCP must adapt to a wide range of
+ traffic patterns. Experience has shown that a TCP
+ implementor needs to verify the design on two extreme
+ traffic patterns:
+
+ o Single-character Segments
+
+ Even if the sender is using the Nagle Algorithm,
+ when a TCP connection carries remote login traffic
+ across a low-delay LAN the receiver will generally
+ get a stream of single-character segments. If
+ remote terminal echo mode is in effect, the
+ receiver's system will generally echo each
+ character as it is received.
+
+ o Bulk Transfer
+
+ When TCP is used for bulk transfer, the data
+ stream should be made up (almost) entirely of
+ segments of the size of the effective MSS.
+ Although TCP uses a sequence number space with
+ byte (octet) granularity, in bulk-transfer mode
+ its operation should be as if TCP used a sequence
+ space that counted only segments.
+
+ Experience has furthermore shown that a single TCP can
+ effectively and efficiently handle these two extremes.
+
+ The most important tool for verifying a new TCP
+ implementation is a packet trace program. There is a
+ large volume of experience showing the importance of
+ tracing a variety of traffic patterns with other TCP
+ implementations and studying the results carefully.
+
+
+ 4.2.3.12 Efficiency
+
+ IMPLEMENTATION:
+ Extensive experience has led to the following
+ suggestions for efficient implementation of TCP:
+
+ (a) Don't Copy Data
+
+ In bulk data transfer, the primary CPU-intensive
+ tasks are copying data from one place to another
+ and checksumming the data. It is vital to
+ minimize the number of copies of TCP data. Since
+
+
+
+Internet Engineering Task Force [Page 105]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ the ultimate speed limitation may be fetching data
+ across the memory bus, it may be useful to combine
+ the copy with checksumming, doing both with a
+ single memory fetch.
+
+ (b) Hand-Craft the Checksum Routine
+
+ A good TCP checksumming routine is typically two
+ to five times faster than a simple and direct
+ implementation of the definition. Great care and
+ clever coding are often required and advisable to
+ make the checksumming code "blazing fast". See
+ [TCP:10].
+
+ (c) Code for the Common Case
+
+ TCP protocol processing can be complicated, but
+ for most segments there are only a few simple
+ decisions to be made. Per-segment processing will
+ be greatly speeded up by coding the main line to
+ minimize the number of decisions in the most
+ common case.
+
+
+ 4.2.4 TCP/APPLICATION LAYER INTERFACE
+
+ 4.2.4.1 Asynchronous Reports
+
+ There MUST be a mechanism for reporting soft TCP error
+ conditions to the application. Generically, we assume this
+ takes the form of an application-supplied ERROR_REPORT
+ routine that may be upcalled [INTRO:7] asynchronously from
+ the transport layer:
+
+ ERROR_REPORT(local connection name, reason, subreason)
+
+ The precise encoding of the reason and subreason parameters
+ is not specified here. However, the conditions that are
+ reported asynchronously to the application MUST include:
+
+ * ICMP error message arrived (see 4.2.3.9)
+
+ * Excessive retransmissions (see 4.2.3.5)
+
+ * Urgent pointer advance (see 4.2.2.4).
+
+ However, an application program that does not want to
+ receive such ERROR_REPORT calls SHOULD be able to
+
+
+
+Internet Engineering Task Force [Page 106]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ effectively disable these calls.
+
+ DISCUSSION:
+ These error reports generally reflect soft errors that
+ can be ignored without harm by many applications. It
+ has been suggested that these error report calls should
+ default to "disabled," but this is not required.
+
+ 4.2.4.2 Type-of-Service
+
+ The application layer MUST be able to specify the Type-of-
+ Service (TOS) for segments that are sent on a connection.
+ It not required, but the application SHOULD be able to
+ change the TOS during the connection lifetime. TCP SHOULD
+ pass the current TOS value without change to the IP layer,
+ when it sends segments on the connection.
+
+ The TOS will be specified independently in each direction on
+ the connection, so that the receiver application will
+ specify the TOS used for ACK segments.
+
+ TCP MAY pass the most recently received TOS up to the
+ application.
+
+ DISCUSSION
+ Some applications (e.g., SMTP) change the nature of
+ their communication during the lifetime of a
+ connection, and therefore would like to change the TOS
+ specification.
+
+ Note also that the OPEN call specified in RFC-793
+ includes a parameter ("options") in which the caller
+ can specify IP options such as source route, record
+ route, or timestamp.
+
+ 4.2.4.3 Flush Call
+
+ Some TCP implementations have included a FLUSH call, which
+ will empty the TCP send queue of any data for which the user
+ has issued SEND calls but which is still to the right of the
+ current send window. That is, it flushes as much queued
+ send data as possible without losing sequence number
+ synchronization. This is useful for implementing the "abort
+ output" function of Telnet.
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 107]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ 4.2.4.4 Multihoming
+
+ The user interface outlined in sections 2.7 and 3.8 of RFC-
+ 793 needs to be extended for multihoming. The OPEN call
+ MUST have an optional parameter:
+
+ OPEN( ... [local IP address,] ... )
+
+ to allow the specification of the local IP address.
+
+ DISCUSSION:
+ Some TCP-based applications need to specify the local
+ IP address to be used to open a particular connection;
+ FTP is an example.
+
+ IMPLEMENTATION:
+ A passive OPEN call with a specified "local IP address"
+ parameter will await an incoming connection request to
+ that address. If the parameter is unspecified, a
+ passive OPEN will await an incoming connection request
+ to any local IP address, and then bind the local IP
+ address of the connection to the particular address
+ that is used.
+
+ For an active OPEN call, a specified "local IP address"
+ parameter will be used for opening the connection. If
+ the parameter is unspecified, the networking software
+ will choose an appropriate local IP address (see
+ Section 3.3.4.2) for the connection
+
+ 4.2.5 TCP REQUIREMENT SUMMARY
+
+ | | | | |S| |
+ | | | | |H| |F
+ | | | | |O|M|o
+ | | |S| |U|U|o
+ | | |H| |L|S|t
+ | |M|O| |D|T|n
+ | |U|U|M| | |o
+ | |S|L|A|N|N|t
+ | |T|D|Y|O|O|t
+FEATURE |SECTION | | | |T|T|e
+-------------------------------------------------|--------|-|-|-|-|-|--
+ | | | | | | |
+Push flag | | | | | | |
+ Aggregate or queue un-pushed data |4.2.2.2 | | |x| | |
+ Sender collapse successive PSH flags |4.2.2.2 | |x| | | |
+ SEND call can specify PUSH |4.2.2.2 | | |x| | |
+
+
+
+Internet Engineering Task Force [Page 108]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ If cannot: sender buffer indefinitely |4.2.2.2 | | | | |x|
+ If cannot: PSH last segment |4.2.2.2 |x| | | | |
+ Notify receiving ALP of PSH |4.2.2.2 | | |x| | |1
+ Send max size segment when possible |4.2.2.2 | |x| | | |
+ | | | | | | |
+Window | | | | | | |
+ Treat as unsigned number |4.2.2.3 |x| | | | |
+ Handle as 32-bit number |4.2.2.3 | |x| | | |
+ Shrink window from right |4.2.2.16| | | |x| |
+ Robust against shrinking window |4.2.2.16|x| | | | |
+ Receiver's window closed indefinitely |4.2.2.17| | |x| | |
+ Sender probe zero window |4.2.2.17|x| | | | |
+ First probe after RTO |4.2.2.17| |x| | | |
+ Exponential backoff |4.2.2.17| |x| | | |
+ Allow window stay zero indefinitely |4.2.2.17|x| | | | |
+ Sender timeout OK conn with zero wind |4.2.2.17| | | | |x|
+ | | | | | | |
+Urgent Data | | | | | | |
+ Pointer points to last octet |4.2.2.4 |x| | | | |
+ Arbitrary length urgent data sequence |4.2.2.4 |x| | | | |
+ Inform ALP asynchronously of urgent data |4.2.2.4 |x| | | | |1
+ ALP can learn if/how much urgent data Q'd |4.2.2.4 |x| | | | |1
+ | | | | | | |
+TCP Options | | | | | | |
+ Receive TCP option in any segment |4.2.2.5 |x| | | | |
+ Ignore unsupported options |4.2.2.5 |x| | | | |
+ Cope with illegal option length |4.2.2.5 |x| | | | |
+ Implement sending & receiving MSS option |4.2.2.6 |x| | | | |
+ Send MSS option unless 536 |4.2.2.6 | |x| | | |
+ Send MSS option always |4.2.2.6 | | |x| | |
+ Send-MSS default is 536 |4.2.2.6 |x| | | | |
+ Calculate effective send seg size |4.2.2.6 |x| | | | |
+ | | | | | | |
+TCP Checksums | | | | | | |
+ Sender compute checksum |4.2.2.7 |x| | | | |
+ Receiver check checksum |4.2.2.7 |x| | | | |
+ | | | | | | |
+Use clock-driven ISN selection |4.2.2.9 |x| | | | |
+ | | | | | | |
+Opening Connections | | | | | | |
+ Support simultaneous open attempts |4.2.2.10|x| | | | |
+ SYN-RCVD remembers last state |4.2.2.11|x| | | | |
+ Passive Open call interfere with others |4.2.2.18| | | | |x|
+ Function: simultan. LISTENs for same port |4.2.2.18|x| | | | |
+ Ask IP for src address for SYN if necc. |4.2.3.7 |x| | | | |
+ Otherwise, use local addr of conn. |4.2.3.7 |x| | | | |
+ OPEN to broadcast/multicast IP Address |4.2.3.14| | | | |x|
+ Silently discard seg to bcast/mcast addr |4.2.3.14|x| | | | |
+
+
+
+Internet Engineering Task Force [Page 109]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ | | | | | | |
+Closing Connections | | | | | | |
+ RST can contain data |4.2.2.12| |x| | | |
+ Inform application of aborted conn |4.2.2.13|x| | | | |
+ Half-duplex close connections |4.2.2.13| | |x| | |
+ Send RST to indicate data lost |4.2.2.13| |x| | | |
+ In TIME-WAIT state for 2xMSL seconds |4.2.2.13|x| | | | |
+ Accept SYN from TIME-WAIT state |4.2.2.13| | |x| | |
+ | | | | | | |
+Retransmissions | | | | | | |
+ Jacobson Slow Start algorithm |4.2.2.15|x| | | | |
+ Jacobson Congestion-Avoidance algorithm |4.2.2.15|x| | | | |
+ Retransmit with same IP ident |4.2.2.15| | |x| | |
+ Karn's algorithm |4.2.3.1 |x| | | | |
+ Jacobson's RTO estimation alg. |4.2.3.1 |x| | | | |
+ Exponential backoff |4.2.3.1 |x| | | | |
+ SYN RTO calc same as data |4.2.3.1 | |x| | | |
+ Recommended initial values and bounds |4.2.3.1 | |x| | | |
+ | | | | | | |
+Generating ACK's: | | | | | | |
+ Queue out-of-order segments |4.2.2.20| |x| | | |
+ Process all Q'd before send ACK |4.2.2.20|x| | | | |
+ Send ACK for out-of-order segment |4.2.2.21| | |x| | |
+ Delayed ACK's |4.2.3.2 | |x| | | |
+ Delay < 0.5 seconds |4.2.3.2 |x| | | | |
+ Every 2nd full-sized segment ACK'd |4.2.3.2 |x| | | | |
+ Receiver SWS-Avoidance Algorithm |4.2.3.3 |x| | | | |
+ | | | | | | |
+Sending data | | | | | | |
+ Configurable TTL |4.2.2.19|x| | | | |
+ Sender SWS-Avoidance Algorithm |4.2.3.4 |x| | | | |
+ Nagle algorithm |4.2.3.4 | |x| | | |
+ Application can disable Nagle algorithm |4.2.3.4 |x| | | | |
+ | | | | | | |
+Connection Failures: | | | | | | |
+ Negative advice to IP on R1 retxs |4.2.3.5 |x| | | | |
+ Close connection on R2 retxs |4.2.3.5 |x| | | | |
+ ALP can set R2 |4.2.3.5 |x| | | | |1
+ Inform ALP of R1<=retxs<R2 |4.2.3.5 | |x| | | |1
+ Recommended values for R1, R2 |4.2.3.5 | |x| | | |
+ Same mechanism for SYNs |4.2.3.5 |x| | | | |
+ R2 at least 3 minutes for SYN |4.2.3.5 |x| | | | |
+ | | | | | | |
+Send Keep-alive Packets: |4.2.3.6 | | |x| | |
+ - Application can request |4.2.3.6 |x| | | | |
+ - Default is "off" |4.2.3.6 |x| | | | |
+ - Only send if idle for interval |4.2.3.6 |x| | | | |
+ - Interval configurable |4.2.3.6 |x| | | | |
+
+
+
+Internet Engineering Task Force [Page 110]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ - Default at least 2 hrs. |4.2.3.6 |x| | | | |
+ - Tolerant of lost ACK's |4.2.3.6 |x| | | | |
+ | | | | | | |
+IP Options | | | | | | |
+ Ignore options TCP doesn't understand |4.2.3.8 |x| | | | |
+ Time Stamp support |4.2.3.8 | | |x| | |
+ Record Route support |4.2.3.8 | | |x| | |
+ Source Route: | | | | | | |
+ ALP can specify |4.2.3.8 |x| | | | |1
+ Overrides src rt in datagram |4.2.3.8 |x| | | | |
+ Build return route from src rt |4.2.3.8 |x| | | | |
+ Later src route overrides |4.2.3.8 | |x| | | |
+ | | | | | | |
+Receiving ICMP Messages from IP |4.2.3.9 |x| | | | |
+ Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | |
+ Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x|
+ Dest. Unreach (2-4) => abort conn |4.2.3.9 | |x| | | |
+ Source Quench => slow start |4.2.3.9 | |x| | | |
+ Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | |
+ Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | |
+ | | | | | | |
+Address Validation | | | | | | |
+ Reject OPEN call to invalid IP address |4.2.3.10|x| | | | |
+ Reject SYN from invalid IP address |4.2.3.10|x| | | | |
+ Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | |
+ | | | | | | |
+TCP/ALP Interface Services | | | | | | |
+ Error Report mechanism |4.2.4.1 |x| | | | |
+ ALP can disable Error Report Routine |4.2.4.1 | |x| | | |
+ ALP can specify TOS for sending |4.2.4.2 |x| | | | |
+ Passed unchanged to IP |4.2.4.2 | |x| | | |
+ ALP can change TOS during connection |4.2.4.2 | |x| | | |
+ Pass received TOS up to ALP |4.2.4.2 | | |x| | |
+ FLUSH call |4.2.4.3 | | |x| | |
+ Optional local IP addr parm. in OPEN |4.2.4.4 |x| | | | |
+-------------------------------------------------|--------|-|-|-|-|-|--
+-------------------------------------------------|--------|-|-|-|-|-|--
+
+FOOTNOTES:
+
+(1) "ALP" means Application-Layer program.
+
+
+
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 111]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+5. REFERENCES
+
+INTRODUCTORY REFERENCES
+
+
+[INTRO:1] "Requirements for Internet Hosts -- Application and Support,"
+ IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123,
+ October 1989.
+
+[INTRO:2] "Requirements for Internet Gateways," R. Braden and J.
+ Postel, RFC-1009, June 1987.
+
+[INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006,
+ (three volumes), SRI International, December 1985.
+
+[INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel,
+ RFC-1011, May 1987.
+
+ This document is republished periodically with new RFC numbers; the
+ latest version must be used.
+
+[INTRO:5] "Protocol Document Order Information," O. Jacobsen and J.
+ Postel, RFC-980, March 1986.
+
+[INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May
+ 1987.
+
+ This document is republished periodically with new RFC numbers; the
+ latest version must be used.
+
+[INTRO:7] "Modularity and Efficiency in Protocol Implementations," D.
+ Clark, RFC-817, July 1982.
+
+[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM
+ SOSP, Orcas Island, Washington, December 1985.
+
+
+Secondary References:
+
+
+[INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf
+ and R. Kahn, IEEE Transactions on Communication, May 1974.
+
+[INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D.
+ Cohen, Computer Networks, Vol. 5, No. 4, July 1981.
+
+[INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel,
+ R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC,
+
+
+
+Internet Engineering Task Force [Page 112]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ March 1985. Also in: IEEE Communications Magazine, March 1985.
+ Also available as ISI-RS-85-153.
+
+[INTRO:12] "Final Text of DIS8473, Protocol for Providing the
+ Connectionless Mode Network Service," ANSI, published as RFC-994,
+ March 1986.
+
+[INTRO:13] "End System to Intermediate System Routing Exchange
+ Protocol," ANSI X3S3.3, published as RFC-995, April 1986.
+
+
+LINK LAYER REFERENCES
+
+
+[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893,
+ April 1984.
+
+[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826,
+ November 1982.
+
+[LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet
+ Networks," C. Hornig, RFC-894, April 1984.
+
+[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802
+ "Networks," J. Postel and J. Reynolds, RFC-1042, February 1988.
+
+ This RFC contains a great deal of information of importance to
+ Internet implementers planning to use IEEE 802 networks.
+
+
+IP LAYER REFERENCES
+
+
+[IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981.
+
+[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792,
+ September 1981.
+
+[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel,
+ RFC-950, August 1985.
+
+[IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112,
+ August 1989.
+
+[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department
+ of Defense, August 1983.
+
+ This specification, as amended by RFC-963, is intended to describe
+
+
+
+Internet Engineering Task Force [Page 113]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+ the Internet Protocol but has some serious omissions (e.g., the
+ mandatory subnet extension [IP:3] and the optional multicasting
+ extension [IP:4]). It is also out of date. If there is a
+ conflict, RFC-791, RFC-792, and RFC-950 must be taken as
+ authoritative, while the present document is authoritative over
+ all.
+
+[IP:6] "Some Problems with the Specification of the Military Standard
+ Internet Protocol," D. Sidhu, RFC-963, November 1985.
+
+[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel,
+ RFC-879, November 1983.
+
+ Discusses and clarifies the relationship between the TCP Maximum
+ Segment Size option and the IP datagram size.
+
+[IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108,
+ October 1989.
+
+[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM
+ SIGCOMM-87, August 1987. Published as ACM Comp Comm Review, Vol.
+ 17, no. 5.
+
+ This useful paper discusses the problems created by Internet
+ fragmentation and presents alternative solutions.
+
+[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July
+ 1982.
+
+ This and the following paper should be read by every implementor.
+
+[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982.
+
+SECONDARY IP REFERENCES:
+
+
+[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J.
+ Mogul, RFC-922, October 1984.
+
+[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July
+ 1982.
+
+[IP:14] "Something a Host Could Do with Source Quench: The Source Quench
+ Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July
+ 1987.
+
+ This RFC first described directed broadcast addresses. However,
+ the bulk of the RFC is concerned with gateways, not hosts.
+
+
+
+Internet Engineering Task Force [Page 114]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+UDP REFERENCES:
+
+
+[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980.
+
+
+TCP REFERENCES:
+
+
+[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September
+ 1981.
+
+
+[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of
+ Defense, August 1984.
+
+ This specification as amended by RFC-964 is intended to describe
+ the same protocol as RFC-793 [TCP:1]. If there is a conflict,
+ RFC-793 takes precedence, and the present document is authoritative
+ over both.
+
+
+[TCP:3] "Some Problems with the Specification of the Military Standard
+ Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964,
+ November 1985.
+
+
+[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel,
+ RFC-879, November 1983.
+
+
+[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813,
+ July 1982.
+
+
+[TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM
+ SIGCOMM-87, August 1987.
+
+
+[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88,
+ August 1988.
+
+
+SECONDARY TCP REFERENCES:
+
+
+[TCP:8] "Modularity and Efficiency in Protocol Implementation," D.
+ Clark, RFC-817, July 1982.
+
+
+
+Internet Engineering Task Force [Page 115]
+
+
+
+
+RFC1122 TRANSPORT LAYER -- TCP October 1989
+
+
+[TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984.
+
+
+[TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C.
+ Partridge, RFC-1071, September 1988.
+
+
+[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson & R. Braden,
+ RFC-1072, October 1988.
+
+
+Security Considerations
+
+ There are many security issues in the communication layers of host
+ software, but a full discussion is beyond the scope of this RFC.
+
+ The Internet architecture generally provides little protection
+ against spoofing of IP source addresses, so any security mechanism
+ that is based upon verifying the IP source address of a datagram
+ should be treated with suspicion. However, in restricted
+ environments some source-address checking may be possible. For
+ example, there might be a secure LAN whose gateway to the rest of the
+ Internet discarded any incoming datagram with a source address that
+ spoofed the LAN address. In this case, a host on the LAN could use
+ the source address to test for local vs. remote source. This problem
+ is complicated by source routing, and some have suggested that
+ source-routed datagram forwarding by hosts (see Section 3.3.5) should
+ be outlawed for security reasons.
+
+ Security-related issues are mentioned in sections concerning the IP
+ Security option (Section 3.2.1.8), the ICMP Parameter Problem message
+ (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and
+ reserved TCP ports (Section 4.2.2.1).
+
+Author's Address
+
+ Robert Braden
+ USC/Information Sciences Institute
+ 4676 Admiralty Way
+ Marina del Rey, CA 90292-6695
+
+ Phone: (213) 822 1511
+
+ EMail: Braden@ISI.EDU
+
+
+
+
+
+
+
+Internet Engineering Task Force [Page 116]
+