HTTP/1
HTTP (Hypertext Transfer Protocol) is an application-layer protocol for communication between a client and a server.
Introduction
HTTP is a request/response protocol. It specifies what clients can send to a server, and what they can expect to receive back [1, P. 683].
Originally HTTP was intended to transfer HTML documents from servers to browsers, but it’s now used for many different kinds of media.
HTTP/1.1 requests are made up of multiple lines. The first line, the most important one, contains the request method, the path of the requested resource, and the HTTP version number:
GET /glossary/internet HTTP/1.1
URIs
URIs (Uniform Resource Identifiers) are strings that identify a resource [2, P. 18].
URIs can be represented either in absolute form or in relative form (relative to some base URI). An absolute URI begins with a scheme name (e.g. https) [2, P. 19].
HTTP URLs (Uniform Resource Locators) include the information required to reach the resource. They have the following structure:
"http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
Connections
HTTP usually runs over TCP. In HTTP/1.0, the TCP connection was closed after each HTTP response. This meant every request needed a new connection with its own TCP handshake, even when multiple requests were made to the same server while loading a webpage (a common scenario) [1, P. 684].
To solve this, HTTP/1.1 supports persistent connections. The TCP connection can be kept alive, and additional requests can be sent over the same connection (also known as connection reuse). This improves performance [1, P. 684].
It’s also possible to pipeline requests: a client can send further requests over the same connection without waiting for the responses to earlier ones [1, P. 684].
Connections are typically closed after being idle for a short time (e.g. 60 seconds) to avoid servers holding too many connections open [1, P. 685].
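To make connection reuse and pipelining concrete, here is a minimal sketch that serializes several GET requests back to back, as they could be written to one TCP connection. The helper name `build_pipelined_requests` is hypothetical. The `Host` header and the final `Connection: close` header are not discussed above; they are included because HTTP/1.1 requires `Host` and uses `Connection: close` to signal that the connection should not be kept open.

```python
def build_pipelined_requests(host, paths):
    """Serialize one GET per path into a single byte string for one connection."""
    requests = []
    for index, path in enumerate(paths):
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        if index == len(paths) - 1:
            # Ask the server to close the connection after the final response.
            lines.append("Connection: close")
        # Every request ends with an empty line (CRLF CRLF).
        requests.append("\r\n".join(lines) + "\r\n\r\n")
    return "".join(requests).encode("ascii")


if __name__ == "__main__":
    print(build_pipelined_requests("example.com", ["/a", "/b"]).decode("ascii"))
```

The bytes are sent in one go, and the server is expected to answer the requests in the order they were sent.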
HTTP Methods
An HTTP request has an associated method.
The first word on the first line of an HTTP request (the Request-Line) is its method name:
Request-Line = Method Request-URI HTTP-Version CRLF
Method names indicate the intent of the request:
Method | Description |
---|---|
GET | Read a Web page. |
HEAD | Read a Web page’s header. |
POST | Append to a Web page. |
PUT | Store a Web page. |
DELETE | Remove the Web page. |
TRACE | Echo the incoming request. |
CONNECT | Connect through a proxy. |
OPTIONS | Query options for a page. |
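The Request-Line grammar above can be sketched as a small parser. The names `RequestLine` and `parse_request_line` are illustrative only. Rejecting any method outside the table is a simplification made for this example, since HTTP also permits extension methods.

```python
from typing import NamedTuple

# Method names from the table above. They are case-sensitive.
METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "CONNECT", "OPTIONS"}


class RequestLine(NamedTuple):
    method: str
    request_uri: str
    http_version: str


def parse_request_line(line):
    """Parse 'Method SP Request-URI SP HTTP-Version CRLF'."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed Request-Line: {line!r}")
    method, request_uri, http_version = parts
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if not http_version.startswith("HTTP/"):
        raise ValueError(f"unexpected version: {http_version!r}")
    return RequestLine(method, request_uri, http_version)


if __name__ == "__main__":
    print(parse_request_line("GET /glossary/internet HTTP/1.1\r\n"))
```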
HTTP Headers
The first line of an HTTP request can be followed by additional lines called request headers. Likewise, the first line of a response can be followed by response headers [1, P. 688].
Each header field is made of a name and a field value, separated by a colon (:):
message-header = field-name ":" [ field-value ]
Headers can be used to set caching policies, provide authorization, and provide metadata about the user agent making the request (as well as many other uses) [1, P. 688].
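The message-header grammar above can be sketched as a parser that turns header lines into a dictionary. `parse_headers` is a hypothetical helper. It lower-cases field names because they are case-insensitive, and it joins repeated fields with a comma, which is how HTTP/1.1 allows repeated fields to be combined. It does not handle folded (multi-line) header values.

```python
def parse_headers(lines):
    """Parse header lines into a dict keyed by lower-cased field name."""
    headers = {}
    for line in lines:
        name, separator, value = line.partition(":")
        if not separator or not name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        key = name.strip().lower()
        value = value.strip()
        # Repeated fields are combined into one comma-separated value.
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return headers


if __name__ == "__main__":
    print(parse_headers(["Host: example.com", "Accept: text/html"]))
```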
HTTP status codes
HTTP status codes are included in HTTP responses, as part of the Status-Line:
Status-Line = HTTP-Version Status-Code Reason-Phrase CRLF
The status code is a 3-digit integer that describes the status of the HTTP response [2, P. 39].
Status codes are grouped into classes by their first digit:
Code | Meaning | Examples |
---|---|---|
1xx | Information | 100 = server agrees to handle client’s request. |
2xx | Success | 200 = request succeeded. 204 = no content present. |
3xx | Redirection | 301 = page moved. 304 = cached page still valid. |
4xx | Client error | 403 = forbidden page. 404 = page not found. |
5xx | Server error | 500 = internal server error. 503 = try again later. |
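As a sketch of how the Status-Line and the classes in the table fit together, the function below parses a status line and looks up the class from the first digit of the code. `parse_status_line` and `STATUS_CLASSES` are names chosen for this example. The class labels are copied from the table above.

```python
# Class labels taken from the table above, keyed by the first digit.
STATUS_CLASSES = {
    "1": "Information",
    "2": "Success",
    "3": "Redirection",
    "4": "Client error",
    "5": "Server error",
}


def parse_status_line(line):
    """Return (http_version, status_code, reason_phrase, status_class)."""
    # The reason phrase may contain spaces, so split at most twice.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed Status-Line: {line!r}")
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    valid = len(code) == 3 and code.isascii() and code.isdigit()
    if not valid or code[0] not in STATUS_CLASSES:
        raise ValueError(f"invalid status code: {code!r}")
    return version, int(code), reason, STATUS_CLASSES[code[0]]


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```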
Caching
Caching is the process of storing HTTP responses to be used later. Caching improves performance by reducing network traffic and latency [1, P. 690].
In HTTP, a cache is defined as the cache entries and the program that manages the cache entries. The primary cache key consists of the request method and URI, and the value is the HTTP response [3, Pp. 5-6].
Vary headers can be used to create secondary cache keys [3, Pp. 9-10].
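The primary and secondary keys can be sketched with a small in-memory cache. `VaryCache` and `secondary_key` are hypothetical names, and comparing header values as exact strings is a simplification. A stored response whose Vary value is `*` is never matched, so the sketch returns no secondary key for it.

```python
def secondary_key(vary, request_headers):
    """Build a key from the request headers nominated by a Vary field value."""
    names = sorted({name.strip().lower() for name in vary.split(",") if name.strip()})
    if "*" in names:
        # "Vary: *" never matches, so the entry cannot be selected by key.
        return None
    lowered = {name.lower(): value.strip() for name, value in request_headers.items()}
    return tuple((name, lowered.get(name, "")) for name in names)


class VaryCache:
    def __init__(self):
        # Primary key (method, URI) -> list of (vary, secondary key, response).
        self._entries = {}

    def store(self, method, uri, request_headers, vary, response):
        key = secondary_key(vary, request_headers)
        self._entries.setdefault((method, uri), []).append((vary, key, response))

    def lookup(self, method, uri, request_headers):
        for vary, stored_key, response in self._entries.get((method, uri), []):
            key = secondary_key(vary, request_headers)
            if key is not None and key == stored_key:
                return response
        return None


if __name__ == "__main__":
    cache = VaryCache()
    cache.store("GET", "/page", {"Accept-Encoding": "gzip"}, "Accept-Encoding", "gzip body")
    print(cache.lookup("GET", "/page", {"accept-encoding": "gzip"}))
```

Two requests for the same URI with different `Accept-Encoding` values end up as separate entries under one primary key.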
HTTP has shared caches and private caches. Shared caches can be accessed by multiple users, whereas private caches are user-specific (e.g. a browser cache). A response can be restricted to private caches with the Cache-Control private directive [3, P. 4].
A cache entry can be either fresh or stale. “A fresh response is one whose age has not yet exceeded its freshness lifetime. Conversely, a stale response is one where it has” [3, P. 11].
The freshness lifetime of a response is the time between the generation of the response by an origin server and the expiration time of the response. Explicit expiration times can be set with HTTP headers (e.g. the Cache-Control max-age directive), but an expiration time can be implicitly calculated if the response is missing the required headers and the response is considered cacheable [3, P. 11].
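A minimal sketch of the explicit case: read max-age from a Cache-Control value and compare it with the response's age. The helper names are made up for this note. Splitting on commas ignores quoted directive arguments that contain commas, and heuristic (implicit) lifetimes are left out. The comparison follows the RFC 7234 rule that a response is fresh while its freshness lifetime is greater than its current age.

```python
def parse_cache_control(value):
    """Parse a Cache-Control field value into {directive: argument or None}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, separator, argument = item.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') if separator else None
    return directives


def is_fresh(cache_control, age_seconds):
    """Return True while the response's age is below its max-age lifetime."""
    max_age = parse_cache_control(cache_control).get("max-age")
    if max_age is None:
        raise ValueError("no explicit max-age; a heuristic lifetime would be needed")
    return int(max_age) > age_seconds


if __name__ == "__main__":
    print(is_fresh("public, max-age=600", age_seconds=120))
```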
Note: a cacheable response is defined as a response to a request with a cacheable method (e.g. GET) that has either a cacheable status code (e.g. 200, 301, 404) or a header marking the response as cacheable (e.g. the public response directive) [3, P. 13].
A fresh response can be returned to a client by a cache without the cache contacting the origin server. A stale response can be served if the origin server returns an error, or while the cache is revalidating the response (using the stale-if-error and stale-while-revalidate directives) [3, Pp. 11, 15][4, P. 2].
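The decision described above can be sketched as a single function. `cache_action`, its parameters, and the returned labels are hypothetical. The two window arguments stand for the number of seconds given by the stale-while-revalidate and stale-if-error directives, counted from the moment the response became stale. Treating the end of each window as exclusive is an assumption of this sketch.

```python
def cache_action(age, lifetime, stale_while_revalidate=0, stale_if_error=0,
                 origin_failed=False):
    """Decide what a cache may do with a stored response (all values in seconds)."""
    staleness = age - lifetime
    if staleness < 0:
        # Still fresh: answer from the cache without contacting the origin.
        return "serve-fresh"
    if origin_failed:
        # The origin returned an error: a stale copy may be used for a while.
        if staleness < stale_if_error:
            return "serve-stale"
        return "return-error"
    if staleness < stale_while_revalidate:
        # Answer with the stale copy now and revalidate in the background.
        return "serve-stale-and-revalidate"
    return "revalidate-first"


if __name__ == "__main__":
    print(cache_action(age=610, lifetime=600, stale_while_revalidate=30))
```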
When a stored response is stale, the cache must revalidate the response. The cache can reduce the amount of data transferred by making a conditional request to the origin server. If the origin server validates the resource (usually meaning the stored response hasn’t changed), the origin server responds with a 304 and the cache can respond to the client with its stored response. If the response has changed, the origin server responds with the full response [5, P. 15].
The origin server can validate a resource by using a variety of methods, such as the Last-Modified header [3, P. 15].
Another way to validate a resource is by using a response’s ETag. An ETag (entity tag) is an identifier that can be added to a response (commonly an ETag is generated by taking a hash of the response body). To validate a response using an ETag, a cache can send a GET request with an If-None-Match header containing the ETag of the response that is being validated. A recipient server can then respond with a 304 Not Modified if the response has the same ETag as provided in the If-None-Match header [3, P. 15][5, P. 15].
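ETag validation can be sketched from the origin server's side. The functions `make_etag` and `conditional_get` are illustrative, and hashing the body with SHA-256 is just one way to generate a tag. If-None-Match is compared with the weak comparison, so a `W/` prefix is ignored. Splitting the header on commas assumes the tags themselves contain no commas.

```python
import hashlib


def make_etag(body):
    """Derive an entity tag from the response body (one common approach)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body, if_none_match=None):
    """Return (status_code, headers, payload) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # Weak comparison: drop the W/ prefix before comparing.
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            # The cached copy is still valid: no body is sent.
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


if __name__ == "__main__":
    status, headers, _ = conditional_get(b"hello")
    print(status, headers)
    print(conditional_get(b"hello", headers["ETag"])[0])
```

A cache would store the ETag from the first response and send it back in If-None-Match when the stored response becomes stale.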
References
- [1] A. Tanenbaum and D. Wetherall, Computer Networks, 5th ed. 2011.
- [2] H. F. Nielsen et al., “Hypertext Transfer Protocol – HTTP/1.1,” no. 2616. RFC Editor, Jun-1999.
- [3] R. T. Fielding, M. Nottingham, and J. Reschke, “Hypertext Transfer Protocol (HTTP/1.1): Caching,” no. 7234. RFC Editor, Jun-2014.
- [4] M. Nottingham, “HTTP Cache-Control Extensions for Stale Content,” no. 5861. RFC Editor, May-2010.
- [5] R. T. Fielding and J. Reschke, “Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests,” no. 7232. RFC Editor, Jun-2014.