Skip to content

Commit

Permalink
Merge commit from fork
Browse files Browse the repository at this point in the history
  • Loading branch information
colinodell authored Dec 7, 2024
2 parents cb026a5 + d777db8 commit 2f1e520
Show file tree
Hide file tree
Showing 42 changed files with 1,655 additions and 131 deletions.
18 changes: 18 additions & 0 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,24 @@ jobs:

- run: vendor/bin/psalm --no-progress --stats --threads=$(nproc) --output-format=github --shepherd

pathological:
name: Pathological Tests
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3

- uses: shivammathur/setup-php@v2
with:
php-version: 8.1
extensions: curl, mbstring, yaml
coverage: none
tools: composer:v2

- run: composer update --no-progress

- run: php tests/pathological/test.php

docs-lint:
permissions:
contents: read # for actions/checkout to fetch code
Expand Down
2 changes: 2 additions & 0 deletions .phpstorm.meta.php
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
'html_input',
'allow_unsafe_links',
'max_nesting_level',
'max_delimiters_per_line',
'renderer',
'renderer/block_separator',
'renderer/inner_separator',
Expand Down Expand Up @@ -89,6 +90,7 @@
'table/alignment_attributes/left',
'table/alignment_attributes/center',
'table/alignment_attributes/right',
'table/max_autocompleted_cells',
'table_of_contents',
'table_of_contents/html_class',
'table_of_contents/max_heading_level',
Expand Down
44 changes: 44 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,31 @@ Updates should follow the [Keep a CHANGELOG](https://keepachangelog.com/) princi

## [Unreleased][unreleased]

This is a **security release** to address potential denial of service attacks when parsing specially crafted,
malicious input from untrusted sources (like user input).

### Added

- Added `max_delimiters_per_line` config option to prevent denial of service attacks when parsing malicious input
- Added `table/max_autocompleted_cells` config option to prevent denial of service attacks when parsing large tables
- The `AttributesExtension` now supports attributes without values (#985, #986)
- The `AutolinkExtension` exposes two new configuration options to override the default behavior (#969, #987):
- `autolink/allowed_protocols` - an array of protocols to allow autolinking for
- `autolink/default_protocol` - the default protocol to use when none is specified
- Added `RegexHelper::isWhitespace()` method to check if a given character is an ASCII whitespace character
- Added `CacheableDelimiterProcessorInterface` to ensure linear complexity for dynamic delimiter processing
- Added `Bracket` delimiter type to optimize bracket parsing

### Changed

- `[` and `]` are no longer added as `Delimiter` objects on the stack; a new `Bracket` type with its own stack is used instead
- `UrlAutolinkParser` no longer parses URLs with more than 127 subdomains
- Expanded reference links can no longer exceed 100kb, or the size of the input document (whichever is greater)
- Delimiters should always provide a non-null value via `DelimiterInterface::getIndex()`
- We'll attempt to infer the index based on surrounding delimiters where possible
- The `DelimiterStack` now accepts integer positions for any `$stackBottom` argument
- Several small performance optimizations

## [2.5.3] - 2024-08-16

### Changed
Expand Down Expand Up @@ -77,6 +102,25 @@ Updates should follow the [Keep a CHANGELOG](https://keepachangelog.com/) princi
- Fixed declaration parser being too strict
- `FencedCodeRenderer`: don't add `language-` to class if already prefixed

### Deprecated

- Returning dynamic values from `DelimiterProcessorInterface::getDelimiterUse()` is deprecated
- You should instead implement `CacheableDelimiterProcessorInterface` to help the engine perform caching to avoid performance issues.
- Failing to set a delimiter's index (or returning `null` from `DelimiterInterface::getIndex()`) is deprecated and will not be supported in 3.0
- Deprecated `DelimiterInterface::isActive()` and `DelimiterInterface::setActive()`, as these are no longer used by the engine
- Deprecated `DelimiterStack::removeEarlierMatches()` and `DelimiterStack::searchByCharacter()`, as these are no longer used by the engine
- Passing a `DelimiterInterface` as the `$stackBottom` argument to `DelimiterStack::processDelimiters()` or `::removeAll()` is deprecated and will not be supported in 3.0; pass the integer position instead.

### Fixed

- Fixed NUL characters not being replaced in the input
- Fixed quadratic complexity parsing unclosed inline links
- Fixed quadratic complexity parsing emphasis and strikethrough delimiters
- Fixed issue where having 500,000+ delimiters could trigger a [known segmentation fault issue in PHP's garbage collection](https://bugs.php.net/bug.php?id=68606)
- Fixed quadratic complexity deactivating link openers
- Fixed quadratic complexity parsing long backtick code spans with no matching closers
- Fixed catastrophic backtracking when parsing link labels/titles

## [2.4.1] - 2023-08-30

### Fixed
Expand Down
9 changes: 6 additions & 3 deletions composer.json
Original file line number Diff line number Diff line change
Expand Up @@ -42,8 +42,9 @@
"phpstan/phpstan": "^1.8.2",
"phpunit/phpunit": "^9.5.21 || ^10.5.9 || ^11.0.0",
"scrutinizer/ocular": "^1.8.1",
"symfony/finder": "^5.3 | ^6.0 || ^7.0",
"symfony/yaml": "^2.3 | ^3.0 | ^4.0 | ^5.0 | ^6.0 || ^7.0",
"symfony/finder": "^5.3 | ^6.0 | ^7.0",
"symfony/process": "^5.4 | ^6.0 | ^7.0",
"symfony/yaml": "^2.3 | ^3.0 | ^4.0 | ^5.0 | ^6.0 | ^7.0",
"unleashedtech/php-coding-standard": "^3.1.1",
"vimeo/psalm": "^4.24.0 || ^5.0.0"
},
Expand Down Expand Up @@ -103,11 +104,13 @@
"phpstan": "phpstan analyse",
"phpunit": "phpunit --no-coverage",
"psalm": "psalm --stats",
"pathological": "tests/pathological/test.php",
"test": [
"@phpcs",
"@phpstan",
"@psalm",
"@phpunit"
"@phpunit",
"@pathological"
]
},
"extra": {
Expand Down
2 changes: 2 additions & 0 deletions docs/2.5/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ $config = [
'html_input' => 'escape',
'allow_unsafe_links' => false,
'max_nesting_level' => PHP_INT_MAX,
'max_delimiters_per_line' => PHP_INT_MAX,
'slug_normalizer' => [
'max_length' => 255,
],
Expand Down Expand Up @@ -73,6 +74,7 @@ Here's a list of the core configuration options available:
- `escape` - Escape all HTML
- `allow_unsafe_links` - Remove risky link and image URLs by setting this to `false` (default: `true`)
- `max_nesting_level` - The maximum nesting level for blocks (default: `PHP_INT_MAX`). Setting this to a positive integer can help protect against long parse times and/or segfaults if blocks are too deeply-nested.
- `max_delimiters_per_line` - The maximum number of delimiters (e.g. `*` or `_`) allowed in a single line (default: `PHP_INT_MAX`). Setting this to a positive integer can help protect against long parse times and/or segfaults if lines are too long.
- `slug_normalizer` - Array of options for configuring how URL-safe slugs are created; see [the slug normalizer docs](/2.5/customization/slug-normalizer/#configuration) for more details
- `instance` - An alternative normalizer to use (defaults to the included `SlugNormalizer`)
- `max_length` - Limits the size of generated slugs (defaults to 255 characters)
Expand Down
2 changes: 2 additions & 0 deletions docs/2.5/customization/delimiter-processing.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ public function getDelimiterUse(DelimiterInterface $opener, DelimiterInterface $

This method is used to tell the engine how many characters from the matching delimiters should be consumed. For simple processors you'll likely return `1` (or whatever your minimum length is). In more advanced cases, you can examine the opening and closing delimiters and perform additional logic to determine whether they should be fully or partially consumed. You can also return `0` if you'd like.

**Note:** Unless you're returning a hard-coded value, you should probably implement `CacheableDelimiterProcessorInterface` instead of `DelimiterProcessorInterface` - this will allow the engine to perform additional caching for better performance.

### `process()`

```php
Expand Down
9 changes: 9 additions & 0 deletions docs/2.5/extensions/tables.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ $config = [
'center' => ['align' => 'center'],
'right' => ['align' => 'right'],
],
'max_autocompleted_cells' => 10_000,
],
];

Expand Down Expand Up @@ -159,6 +160,14 @@ $config = [

Or any other HTML attributes you'd like!

### Limiting Auto-Completed Cells

The GFM specification says that the:

> table’s rows may vary in the number of cells. If there are a number of cells fewer than the number of cells in the header row, empty cells are inserted.
This feature could be abused to create very large tables. To prevent this, you can configure the `max_autocompleted_cells` option to limit the number of empty cells that will be autocompleted. If the limit is reached, further parsing of the table will be aborted.

## Credits

The Table functionality was originally built by [Martin Hasoň](https://github.com/hason) and [Webuni s.r.o.](https://www.webuni.cz) before it was merged into the core parser.
22 changes: 21 additions & 1 deletion docs/2.5/security.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@ In order to be fully compliant with the CommonMark spec, certain security settin

- `html_input`: How to handle raw HTML
- `allow_unsafe_links`: Whether unsafe links are permitted
- `max_nesting_level`: Protected against long render times or segfaults
- `max_nesting_level`: Protect against long render times or segfaults
- `max_delimiters_per_line`: Protect against long parse times or rendering segfaults

Further information about each option can be found below.

Expand Down Expand Up @@ -88,6 +89,25 @@ echo $converter->convert($markdown);

See the [configuration](/2.5/configuration/) section for more information.

## Max Delimiters Per Line

Similarly to the maximum nesting level, **no maximum number of delimiters per line is enforced by default.** Delimiters can be nested (like `*a **b** c*`) or un-nested (like `*a* *b* *c*`) - in either case, having too many in a single line can result in long parse times. We therefore have a separate option to limit the number of delimiters per line.

If you need to parse untrusted input, consider setting a reasonable `max_delimiters_per_line` (perhaps 100-1000) depending on your needs. Once this level is hit, any subsequent delimiters on that line will be rendered as plain text.

### Example - Prevent too many delimiters

```php
use League\CommonMark\CommonMarkConverter;

$markdown = '*a* **b *c **d** c* b**'; // 8 delimiters (* and **)

$converter = new CommonMarkConverter(['max_delimiters_per_line' => 6]);
echo $converter->convert($markdown);

// <p><em>a</em> **b *c <strong>d</strong> c* b**</p>
```

## Additional Filtering

Although this library does offer these security features out-of-the-box, some users may opt to also run the HTML output through additional filtering layers (like HTMLPurifier). If you do this, make sure you **thoroughly** test your additional post-processing steps and configure them to work properly with the types of HTML elements and attributes that converted Markdown might produce, otherwise, you may end up with weird behavior like missing images, broken links, mismatched HTML tags, etc.
24 changes: 24 additions & 0 deletions docs/2.6/upgrading.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,27 @@ redirect_from: /upgrading/
---

# Upgrading from 2.5 to 2.6

## `max_delimiters_per_line` Configuration Option

The `max_delimiters_per_line` configuration option was added in 2.6 to help protect against malicious input that could
cause excessive memory usage or denial of service attacks. It defaults to `PHP_INT_MAX` (no limit) for backwards
compatibility, which is safe when parsing trusted input. However, if you're parsing untrusted input from users, you
should probably set this to a reasonable value (somewhere between `100` and `1000`) to protect against malicious inputs.

## Custom Delimiter Processors

If you're implementing a custom delimiter processor, and `getDelimiterUse()` has more logic than just a
simple `return` statement, you should implement `CacheableDelimiterProcessorInterface` instead of
`DelimiterProcessorInterface` to improve performance and avoid possible quadratic performance issues.

`DelimiterProcessorInterface` has a `getDelimiterUse()` method that tells the engine how many characters from the
matching delimiters should be consumed. Simple processors usually always return a hard-coded integer like `1` or `2`.
However, some more advanced processors may need to examine the opening and closing delimiters and perform additional
logic to determine whether they should be fully or partially consumed. Previously, these results could not be safely
cached, resulting in possible quadratic performance issues.

In 2.6, the `CacheableDelimiterProcessorInterface` was introduced to allow these "dynamic" processors to be safely
cached. It requires a new `getCacheKey()` method that returns a string that uniquely identifies the combination of
factors considered when determining the delimiter use. This key is then used to cache the results of the search for
a matching delimiter.
2 changes: 2 additions & 0 deletions phpstan.neon.dist
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ parameters:
message: '#Parameter .+ of class .+Reference constructor expects string, string\|null given#'
- path: src/Util/RegexHelper.php
message: '#Method .+RegexHelper::unescape\(\) should return string but returns string\|null#'
- path: src/Delimiter/DelimiterStack.php
message: '#unknown class WeakMap#'
exceptions:
uncheckedExceptionClasses:
# Exceptions caused by bad developer logic that should always bubble up:
Expand Down
83 changes: 83 additions & 0 deletions src/Delimiter/Bracket.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
<?php

declare(strict_types=1);

/*
* This file is part of the league/commonmark package.
*
* (c) Colin O'Dell <[email protected]>
*
* For the full copyright and license information, please view the LICENSE
* file that was distributed with this source code.
*/

namespace League\CommonMark\Delimiter;

use League\CommonMark\Node\Node;

final class Bracket
{
private Node $node;
private ?Bracket $previous;
private bool $hasNext = false;
private int $position;
private bool $image;
private bool $active = true;

public function __construct(Node $node, ?Bracket $previous, int $position, bool $image)
{
$this->node = $node;
$this->previous = $previous;
$this->position = $position;
$this->image = $image;
}

public function getNode(): Node
{
return $this->node;
}

public function getPrevious(): ?Bracket
{
return $this->previous;
}

public function hasNext(): bool
{
return $this->hasNext;
}

public function getPosition(): int
{
return $this->position;
}

public function isImage(): bool
{
return $this->image;
}

/**
* Only valid in the context of non-images (links)
*/
public function isActive(): bool
{
return $this->active;
}

/**
* @internal
*/
public function setHasNext(bool $hasNext): void
{
$this->hasNext = $hasNext;
}

/**
* @internal
*/
public function setActive(bool $active): void
{
$this->active = $active;
}
}
6 changes: 6 additions & 0 deletions src/Delimiter/DelimiterInterface.php
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,14 @@ public function canClose(): bool;

public function canOpen(): bool;

/**
* @deprecated This method is no longer used internally and will be removed in 3.0
*/
public function isActive(): bool;

/**
* @deprecated This method is no longer used internally and will be removed in 3.0
*/
public function setActive(bool $active): void;

public function getChar(): string;
Expand Down
12 changes: 8 additions & 4 deletions src/Delimiter/DelimiterParser.php
Original file line number Diff line number Diff line change
Expand Up @@ -62,16 +62,20 @@ public function parse(InlineParserContext $inlineContext): bool

[$canOpen, $canClose] = self::determineCanOpenOrClose($charBefore, $charAfter, $character, $processor);

if (! ($canOpen || $canClose)) {
$inlineContext->getContainer()->appendChild(new Text(\str_repeat($character, $numDelims)));

return true;
}

$node = new Text(\str_repeat($character, $numDelims), [
'delim' => true,
]);
$inlineContext->getContainer()->appendChild($node);

// Add entry to stack to this opener
if ($canOpen || $canClose) {
$delimiter = new Delimiter($character, $numDelims, $node, $canOpen, $canClose);
$inlineContext->getDelimiterStack()->push($delimiter);
}
$delimiter = new Delimiter($character, $numDelims, $node, $canOpen, $canClose, $inlineContext->getCursor()->getPosition());
$inlineContext->getDelimiterStack()->push($delimiter);

return true;
}
Expand Down
Loading

0 comments on commit 2f1e520

Please sign in to comment.