docs: Add parsing custom HTML to README.md (#326)
toufic-m authored and adampash committed Mar 25, 2019
1 parent b3e2a0f commit da9606a
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -64,6 +64,8 @@ If Mercury is unable to find a field, that field will return `null`.

#### `parse()` Options

##### Content Formats

By default, Mercury Parser returns the `content` field as HTML. However, you can override this behavior by passing in options to the `parse` function, specifying whether or not to scrape all pages of an article, and what type of output to return (valid values are `'html'`, `'markdown'`, and `'text'`). For example:

```javascript
@@ -78,6 +80,19 @@ This returns the page's `content` as GitHub-flavored Markdown:
"content": "...**Thunder** is the [stage name](https://en.wikipedia.org/wiki/Stage_name) for the..."
```
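
As a rough sketch of the options described above, the helper below checks a requested output format against the three documented values before building the options object. `buildParseOptions` is a hypothetical name, and the `contentType` key is assumed from the Mercury Parser README rather than shown in the lines of this diff.

```javascript
// Output formats the README lists as valid for the `content` field.
const VALID_CONTENT_TYPES = ['html', 'markdown', 'text'];

// Hypothetical helper (not part of Mercury Parser): builds the options
// object for `parse`, defaulting to HTML as the README describes.
function buildParseOptions(contentType = 'html') {
  if (!VALID_CONTENT_TYPES.includes(contentType)) {
    throw new RangeError(
      `Unsupported content type "${contentType}"; expected one of ${VALID_CONTENT_TYPES.join(', ')}`
    );
  }
  // Assumption: the option key is `contentType`.
  return { contentType };
}

// Intended usage, assuming `Mercury` and `url` are already in scope:
// Mercury.parse(url, buildParseOptions('markdown')).then(result => console.log(result));
```

Rejecting an unknown format up front gives a clear error at the call site instead of relying on whatever the parser does with an unrecognized value.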

##### Pre-fetched HTML

You can use Mercury Parser to parse custom or pre-fetched HTML by passing an HTML string to the `parse` function as follows:

```javascript
Mercury.parse(url, {
  html:
    '<html><body><article><h1>Thunder (mascot)</h1><p>Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos</p></article></body></html>',
}).then(result => console.log(result));
```

Note that the URL argument is still supplied: it identifies the website so that its custom parser, if one exists, is used, but it is not used to fetch content.
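
The behavior described above can be wrapped in a small helper. This is a minimal sketch, not part of the library: `parsePrefetched` is a hypothetical name, and the parser is passed in as an argument (any object with a `parse(url, options)` method, such as `Mercury`) so the snippet makes no assumption about how the library is imported.

```javascript
// Hypothetical helper (not part of Mercury Parser): parses HTML that was
// fetched elsewhere. `parser` is any object exposing parse(url, options).
function parsePrefetched(parser, url, html) {
  if (typeof html !== 'string' || html.length === 0) {
    throw new TypeError('html must be a non-empty string');
  }
  if (typeof url !== 'string' || url.length === 0) {
    // The URL is still needed so a site-specific custom parser can be chosen.
    throw new TypeError('url must be a non-empty string');
  }
  // Passing `html` means the URL is not used for fetching content.
  return parser.parse(url, { html });
}

// Intended usage, assuming `Mercury` is already in scope:
// parsePrefetched(Mercury, 'https://example.com/article', '<html>...</html>')
//   .then(result => console.log(result));
```

Validating both arguments before calling `parse` makes the two roles explicit: the HTML string is the content, and the URL only selects the parser.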

#### The command-line parser

Mercury Parser also ships with a CLI, meaning you can use the Mercury Parser