diff --git a/README.md b/README.md
index dcc0edf..de2dfe9 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,94 @@
 # TYPO3 HTML Sanitizer
+
+> :warning: This package is still experimental, common safe HTML tags & attributes
+> as given in [`\TYPO3\HtmlSanitizer\Builder\CommonBuilder`](src/Builder/CommonBuilder.php)
+> still might be adjusted.
+
+## In a Nutshell
+
++ [`\TYPO3\HtmlSanitizer\Behavior`](src/Behavior.php) contains declarative settings for
+  a particular process for sanitizing HTML.
++ [`\TYPO3\HtmlSanitizer\Visitor\VisitorInterface`](src/Visitor/VisitorInterface.php)
+  (multiple different visitors can exist at the same time) are actually doing the work
+  based on the declared `Behavior`. Visitors can modify nodes or mark them for deletion.
++ [`\TYPO3\HtmlSanitizer\Sanitizer`](src/Sanitizer.php) can be considered as the working
+  instance, invoking visitors, parsing and serializing HTML. In general this instance does
+  not contain much logic on how to handle particular nodes, attributes or values
++ [`\TYPO3\HtmlSanitizer\Builder\BuilderInterface`](src/Builder/BuilderInterface.php) can
+  be used to create multiple different builder instances - in terms of "presets" - which
+  combine declaring a particular `Behavior`, initialization of `VisitorInterface` instances,
+  and finally returning a ready-to-use `Sanitizer` instance
+
+## Example & API
+
+```php
+addValues(new Behavior\RegExpAttrValue('#^https?://#'));
+
+// attention: only `Behavior` implementation uses immutability
+// (invoking `withFlags()` or `withTags()` returns new instance)
+$behavior = (new Behavior())
+    ->withFlags(Behavior::ENCODE_INVALID_TAG)
+    ->withTags(
+        (new Behavior\Tag('div', Behavior\Tag::ALLOW_CHILDREN))
+            ->addAttrs(...$commonAttrs),
+        (new Behavior\Tag('a', Behavior\Tag::ALLOW_CHILDREN))
+            ->addAttrs($hrefAttr, ...$commonAttrs),
+        (new Behavior\Tag('br')),
+    );
+
+$visitors = [new CommonVisitor($behavior)];
+$sanitizer = new Sanitizer(...$visitors);
+
+$html = <<< EOH
+
+EOH;
+
+echo $sanitizer->sanitize($html);
+```
+
+will result in the following sanitized output
+
+```html
+