doc: general improvements to url.md copy
General cleanup and restructuring of the doc. Added additional detail to how URLs are serialized.

Showing 1 changed file with 191 additions and 82 deletions.

Stability: 2 - Stable

The `url` module provides utilities for URL resolution and parsing. It can be
accessed using:

```js
const url = require('url');
```

## URL Strings and URL Objects

A URL string is a structured string containing multiple meaningful components.
When parsed, a URL object is returned containing properties for each of these
components.

The following details each of the components of a parsed URL. The example
`'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'` is used to
illustrate each.

```
+---------------------------------------------------------------------------+
|                                   href                                    |
+----------++-----------+-----------------+-------------------------+-------+
| protocol ||   auth    |      host       |           path          | hash  |
|          ||           +----------+------+----------+--------------+       |
|          ||           | hostname | port | pathname |    search    |       |
|          ||           |          |      |          +-+------------+       |
|          ||           |          |      |          | |    query   |       |
"  http:   // user:pass @ host.com : 8080   /p/a/t/h  ? query=string  #hash "
|          ||           |          |      |          | |            |       |
+----------++-----------+----------+------+----------+-+------------+-------+
(all spaces in the "" line should be ignored -- they're purely for formatting)
```
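
The sketch below shows how the diagram maps onto a parsed object. It runs the same example URL through `url.parse()` and prints each property. It assumes the legacy `url.parse()` behavior described in this document. The `keys` list and the loop are illustrative only.

```js
const url = require('url');

// Parse the example URL used throughout this section.
const parsed = url.parse(
  'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'
);

// Print every component labelled in the diagram.
const keys = ['href', 'protocol', 'auth', 'host', 'hostname', 'port',
  'path', 'pathname', 'search', 'query', 'hash'];
for (const key of keys) {
  console.log(`${key}: ${parsed[key]}`);
}
```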

### urlObject.href

The `href` property is the full URL string that was parsed with both the
`protocol` and `host` components converted to lower-case.

For example: `'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'`
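
Here is a minimal sketch of the lower-casing rule, using a made-up mixed-case URL. Only the protocol and host are lower-cased. The auth and path keep their original case.

```js
const url = require('url');

// Protocol and host are mixed case, and so are the auth and path.
const parsed = url.parse('HTTP://User:Pass@HOST.com:8080/P/a/t/h');

console.log(parsed.protocol); // Prints: http:
console.log(parsed.host);     // Prints: host.com:8080
console.log(parsed.href);     // Prints: http://User:Pass@host.com:8080/P/a/t/h
```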

### urlObject.protocol

The `protocol` property identifies the URL's lower-cased protocol scheme.

For example: `'http:'`

### urlObject.slashes

The `slashes` property is a `boolean` with a value of `true` if two ASCII
forward-slash characters (`/`) are required following the colon in the
`protocol`.
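
The following sketch contrasts a protocol written with `//` against one written without it. `example.com` and the `mailto:` address are placeholders. For the second URL, the code relies only on `slashes` not being `true`, because the exact falsy value is not specified above.

```js
const url = require('url');

// 'http:' is followed by '//', so slashes is true.
console.log(url.parse('http://example.com/').slashes); // Prints: true

// 'mailto:' is not followed by '//', so slashes is not true.
console.log(url.parse('mailto:user@example.com').slashes === true); // Prints: false
```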

### urlObject.host

The `host` property is the full lower-cased host portion of the URL, including
the `port` if specified.

For example: `'host.com:8080'`

### urlObject.auth

The `auth` property is the username and password portion of the URL, also
referred to as "userinfo". This string subset follows the `protocol` and
double slashes (if present) and precedes the `host` component, delimited by an
ASCII "at sign" (`@`). The format of the string is `{username}[:{password}]`,
with the `[:{password}]` portion being optional.

For example: `'user:pass'`
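
This short sketch shows how the optional `[:{password}]` portion appears in `auth`. The host name and credentials are placeholder values. The `null` result for a URL without userinfo reflects the legacy parser's default for a missing component.

```js
const url = require('url');

// Username and password, separated by ':'.
console.log(url.parse('ftp://alice:s3cret@files.example.com/').auth);
// Prints: alice:s3cret

// Username only; the optional password part is absent.
console.log(url.parse('ftp://alice@files.example.com/').auth);
// Prints: alice

// No userinfo at all.
console.log(url.parse('ftp://files.example.com/').auth);
// Prints: null
```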

### urlObject.hostname

The `hostname` property is the lower-cased host name portion of the `host`
component *without* the `port` included.

For example: `'host.com'`

### urlObject.port

The `port` property is the numeric port portion of the `host` component,
represented as a string.

For example: `'8080'`
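
The relationship between `host`, `hostname`, and `port` can be sketched as follows, using placeholder URLs. In the legacy parser, `port` comes back as a string, as in the `'8080'` example, and is `null` when the URL has no explicit port.

```js
const url = require('url');

const withPort = url.parse('http://Example.COM:3000/index.html');
console.log(withPort.host);     // Prints: example.com:3000
console.log(withPort.hostname); // Prints: example.com
console.log(typeof withPort.port, withPort.port); // Prints: string 3000

const noPort = url.parse('http://example.com/index.html');
console.log(noPort.host); // Prints: example.com
console.log(noPort.port); // Prints: null
```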

### urlObject.pathname

The `pathname` property consists of the entire path section of the URL. This
is everything following the `host` (including the `port`) and before the start
of the `query` or `hash` components, delimited by either the ASCII question
mark (`?`) or hash (`#`) characters.

For example: `'/p/a/t/h'`

No decoding of the path string is performed.
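
This minimal sketch shows that percent-encoded sequences stay encoded in `pathname`. Decoding with `decodeURIComponent()` is left to the caller. The file name in the URL is a placeholder.

```js
const url = require('url');

const parsed = url.parse('http://example.com/docs/my%20file.txt?x=1#top');

// The encoded space is left untouched in pathname.
console.log(parsed.pathname); // Prints: /docs/my%20file.txt

// Decoding is up to the caller.
console.log(decodeURIComponent(parsed.pathname)); // Prints: /docs/my file.txt
```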

### urlObject.search

The `search` property consists of the entire "query string" portion of the
URL, including the leading ASCII question mark (`?`) character.

For example: `'?query=string'`

No decoding of the query string is performed.
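
The same holds for `search`: the leading `?` is kept and encoded characters are not decoded. When a URL has no query at all, the legacy parser leaves `search` as `null` (with the default `parseQueryString` of `false`). The URLs are placeholders.

```js
const url = require('url');

const parsed = url.parse('http://example.com/search?q=caf%C3%A9&lang=fr#results');
// Leading '?' kept, percent-encoding not decoded, fragment excluded.
console.log(parsed.search); // Prints: ?q=caf%C3%A9&lang=fr

// No query string at all.
console.log(url.parse('http://example.com/').search); // Prints: null
```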

### urlObject.path

The `path` property is a concatenation of the `pathname` and `search`
components.

For example: `'/p/a/t/h?query=string'`

No decoding of the `path` is performed.
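
Because `path` is just `pathname` followed by `search`, the fragment never appears in it. This small sketch shows that with the example URL.

```js
const url = require('url');

const parsed = url.parse('http://example.com/p/a/t/h?query=string#hash');

// path = pathname + search; the '#hash' fragment is not included.
console.log(parsed.path); // Prints: /p/a/t/h?query=string
console.log(parsed.path === parsed.pathname + parsed.search); // Prints: true
```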

### urlObject.query

The `query` property is either the "params" portion of the query string
(everything *except* the leading ASCII question mark (`?`)), or an object
returned by the [`querystring`][] module's `parse()` method.

For example: `'query=string'` or `{'query': 'string'}`

If returned as a string, no decoding of the query string is performed. If
returned as an object, both keys and values are decoded.
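
Which form `query` takes depends on the `parseQueryString` argument of `url.parse()`. This sketch uses a placeholder URL with a percent-encoded value and a repeated key. With `true`, the repeated key becomes an array, as `querystring.parse()` produces.

```js
const url = require('url');

const href = 'http://example.com/?name=J%C3%BCrgen&tag=a&tag=b';

// Default (parseQueryString = false): the raw, undecoded string.
console.log(url.parse(href).query);
// Prints: name=J%C3%BCrgen&tag=a&tag=b

// parseQueryString = true: an object from querystring.parse(), values decoded.
const query = url.parse(href, true).query;
console.log(query.name); // Prints: Jürgen
console.log(query.tag);  // Prints: [ 'a', 'b' ]
```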

### urlObject.hash

The `hash` property consists of the "fragment" portion of the URL including
the leading ASCII hash (`#`) character.

For example: `'#hash'`
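
This quick check, with a placeholder URL, confirms that the fragment lands in `hash` and stays out of `path`:

```js
const url = require('url');

const parsed = url.parse('http://example.com/guide#section-2');
console.log(parsed.hash); // Prints: #section-2
console.log(parsed.path); // Prints: /guide
```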

## url.format(urlObject)
<!-- YAML
added: v0.1.25
-->

* `urlObject` {Object} A URL object (either as returned by `url.parse()` or
  constructed otherwise).

The `url.format()` method processes the given URL object and returns a formatted
URL string.

The formatting process essentially operates as follows:

* A new empty string `result` is created.
* If `urlObject.protocol` is a string, it is appended as-is to `result`.
* Otherwise, if `urlObject.protocol` is not `undefined` and is not a string, an
  [`Error`][] is thrown.
* For all string values of `urlObject.protocol` that *do not end* with an ASCII
  colon (`:`) character, the literal string `:` will be appended to `result`.
* If the `urlObject.slashes` property is true, if `urlObject.protocol` begins
  with one of `http`, `https`, `ftp`, `gopher`, or `file`, or if
  `urlObject.protocol` is `undefined`, the literal string `//` will be appended
  to `result`.
* If the value of the `urlObject.auth` property is truthy, and either
  `urlObject.host` or `urlObject.hostname` is not `undefined`, the value of
  `urlObject.auth` will be coerced into a string and appended to `result`
  followed by the literal string `@`.
* If the `urlObject.host` property is `undefined` then:
  * If `urlObject.hostname` is a string, it is appended to `result`.
  * Otherwise, if `urlObject.hostname` is not `undefined` and is not a string,
    an [`Error`][] is thrown.
  * If the `urlObject.port` property value is truthy, and `urlObject.hostname`
    is not `undefined`:
    * The literal string `:` is appended to `result`, and
    * The value of `urlObject.port` is coerced to a string and appended to
      `result`.
* Otherwise, if the `urlObject.host` property value is truthy, the value of
  `urlObject.host` is coerced to a string and appended to `result`.
* If the `urlObject.pathname` property is a string that is not an empty string:
  * If the `urlObject.pathname` *does not start* with an ASCII forward slash
    (`/`), then the literal string `/` is appended to `result`.
  * The value of `urlObject.pathname` is appended to `result`.
* Otherwise, if `urlObject.pathname` is not `undefined` and is not a string, an
  [`Error`][] is thrown.
* If the `urlObject.search` property is `undefined` and if the `urlObject.query`
  property is an `Object`, the literal string `?` is appended to `result`
  followed by the output of calling the [`querystring`][] module's `stringify()`
  method passing the value of `urlObject.query`.
* Otherwise, if `urlObject.search` is a string:
  * If the value of `urlObject.search` *does not start* with the ASCII question
    mark (`?`) character, the literal string `?` is appended to `result`.
  * The value of `urlObject.search` is appended to `result`.
* Otherwise, if `urlObject.search` is not `undefined` and is not a string, an
  [`Error`][] is thrown.
* If the `urlObject.hash` property is a string:
  * If the value of `urlObject.hash` *does not start* with the ASCII hash (`#`)
    character, the literal string `#` is appended to `result`.
  * The value of `urlObject.hash` is appended to `result`.
* Otherwise, if the `urlObject.hash` property is not `undefined` and is not a
  string, an [`Error`][] is thrown.
* `result` is returned.
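
Here is a minimal sketch of the steps above, using a hand-built object rather than one from `url.parse()`. All values are placeholders. Because `host` is absent, `hostname` and `port` are used. The `protocol` lacks its trailing colon and the `pathname` lacks its leading slash. `query` is an object, which applies because `search` is absent.

```js
const url = require('url');

const formatted = url.format({
  protocol: 'https',                // ':' appended: it does not end with one
  hostname: 'example.com',          // used because host is undefined
  port: 8443,                       // appended after ':' because it is truthy
  pathname: 'api/items',            // '/' prepended: it does not start with one
  query: { page: 2, sort: 'name' }, // stringified because search is undefined
  hash: 'top',                      // '#' prepended
});

console.log(formatted);
// Prints: https://example.com:8443/api/items?page=2&sort=name#top
```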

## url.parse(urlString[, parseQueryString[, slashesDenoteHost]])
<!-- YAML
added: v0.1.25
-->

* `urlString` {string} The URL string to parse.
* `parseQueryString` {boolean} If `true`, the `query` property will always
  be set to an object returned by the [`querystring`][] module's `parse()`
  method. If `false`, the `query` property on the returned URL object will be an
  unparsed, undecoded string. Defaults to `false`.
* `slashesDenoteHost` {boolean} If `true`, the first token after the literal
  string `//` and preceding the next `/` will be interpreted as the `host`.
  For instance, given `//foo/bar`, the result would be
  `{host: 'foo', pathname: '/bar'}` rather than `{pathname: '//foo/bar'}`.
  Defaults to `false`.

The `url.parse()` method takes a URL string, parses it, and returns a URL
object.
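
The effect of `slashesDenoteHost` on a protocol-relative string such as `//foo/bar` can be sketched as follows. `false` is passed for `parseQueryString` only so that the third argument can be supplied.

```js
const url = require('url');

// Default: without a protocol, a leading '//' stays part of the path.
const asPath = url.parse('//foo/bar');
console.log(asPath.host);     // Prints: null
console.log(asPath.pathname); // Prints: //foo/bar

// slashesDenoteHost = true: the token after '//' becomes the host.
const asHost = url.parse('//foo/bar', false, true);
console.log(asHost.host);     // Prints: foo
console.log(asHost.pathname); // Prints: /bar
```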

## url.resolve(from, to)
<!-- YAML
added: v0.1.25
-->

* `from` {string} The base URL being resolved against.
* `to` {string} The HREF URL being resolved.

The `url.resolve()` method resolves a target URL relative to a base URL in a
manner similar to that of a Web browser resolving an anchor tag HREF.

For example:

```js
const url = require('url');

url.resolve('/one/two/three', 'four');         // '/one/two/four'
url.resolve('http://example.com/', '/one');    // 'http://example.com/one'
url.resolve('http://example.com/one', '/two'); // 'http://example.com/two'
```

## Escaped Characters

URLs are only permitted to contain a certain range of characters. Spaces (`' '`)
and the following characters will be automatically escaped in the
properties of URL objects:

```
< > " ` \r \n \t { } | \ ^ '
```

For example, the ASCII space character (`' '`) is encoded as `%20`. The ASCII
less-than sign (`<`) character is encoded as `%3C`.
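
Here is a minimal sketch of automatic escaping, using a placeholder URL whose path contains a space and a `<`:

```js
const url = require('url');

// The space becomes %20 and '<' becomes %3C in the parsed pathname.
const parsed = url.parse('http://example.com/a b<c');
console.log(parsed.pathname); // Prints: /a%20b%3Cc
```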

[`Error`]: errors.html#errors_class_error
[`querystring`]: querystring.html