Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sanitizer config - multiple rules #11133

Merged
merged 8 commits into from
Apr 29, 2020

Conversation

cipherboy
Copy link
Contributor

In #9888, it was reported that my earlier pull request #9075 didn't quite function as expected. I was quite hopeful the ValuesWithShadow() worked as expected (and, I thought my testing showed it did) but I guess not. @zeripath proposed an alternative syntax which I like:

[markup.sanitizer.1]
ELEMENT=a
ALLOW_ATTR=target
REGEXP=something
[markup.sanitizer.2]
ELEMENT=a
ALLOW_ATTR=target
REGEXP=something

This was quite easy to adopt into the existing code. I've done so in a semi-backwards-compatible manner:

  • The value from .Value() is used for each element.
  • We parse [markup.sanitizer] and all [markup.sanitizer.*] sections and add them as rules.

This means that existing configs will load one rule (not all rules). It also means people can use string identifiers ([markup.sanitiser.KaTeX]) if they prefer, instead of numbered ones.

Signed-off-by: Alexander Scheel <[email protected]>
Resolves: 9888

Signed-off-by: Alexander Scheel <[email protected]>
@cipherboy cipherboy force-pushed the fix-sanitizer-config branch from b216247 to 4042adb Compare April 19, 2020 14:16
@lunny lunny added the type/enhancement An improvement of existing functionality label Apr 19, 2020
@lunny lunny added this to the 1.12.0 milestone Apr 19, 2020
@codecov-io
Copy link

Codecov Report

Merging #11133 into master will increase coverage by 0.00%.
The diff coverage is 0.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #11133   +/-   ##
=======================================
  Coverage   43.47%   43.47%           
=======================================
  Files         600      600           
  Lines       85108    85104    -4     
=======================================
  Hits        37002    37002           
+ Misses      43537    43534    -3     
+ Partials     4569     4568    -1     
Impacted Files Coverage Δ
modules/setting/markup.go 1.29% <0.00%> (+0.06%) ⬆️
modules/git/repo.go 49.79% <0.00%> (-1.26%) ⬇️
models/gpg_key.go 53.42% <0.00%> (-0.53%) ⬇️
services/pull/pull.go 35.30% <0.00%> (+1.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9588d2c...4042adb. Read the comment docs.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Apr 19, 2020
@6543
Copy link
Member

6543 commented Apr 19, 2020

first thought: since #9075 got into v1.11 -> we will break existing config based on that app.ini settings
so if we change the configuration we have to migrate old config if they exist 👀

@lafriks
Copy link
Member

lafriks commented Apr 19, 2020

Maybe it could be possible to also support old syntax for backward compatibility

@cipherboy
Copy link
Contributor Author

I think the point is that the old config doesn't work well -- it is best we migrate completely rather than support what doesn't work. In particular, this won't work:

[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = a
ALLOW_ATTR = rel
REGEXP = $2

Because ValuesWithShadows() returns an array of size 1 rather than of size 2. This means we can't align the values in a three rule set:

[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = a
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3

Our data is:

ELEMENT = ['a', 'img']
ALLOW_ATTR = ['target', 'rel', 'src']
REGEX = [$1, $2, $3]

So we don't know if a is duplicated or img is duplicated!


Rewriting the ini file (with ini) thus isn't possible either. Some other tool would be needed to do this.

There's then two working scenarios:

  1. All distinct element/allowed attrs/regexp tuples.
  2. Only one element/attr/regexp (what's currently documented -- KaTeX ruleset).

With this PR currently, we support number two just fine. That leaves number one -- since it borders closely on what doesn't work, I'm inclined to require manual migration. Thoughts?

@zeripath
Copy link
Contributor

Yeah the old system just doesn't work.

Copy link
Contributor

@zeripath zeripath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments as above otherwise approve

docs/content/doc/advanced/external-renderers.en-us.md Outdated Show resolved Hide resolved
docs/content/doc/advanced/config-cheat-sheet.en-us.md Outdated Show resolved Hide resolved
@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Apr 20, 2020
@zeripath zeripath added the pr/breaking Merging this PR means builds will break. Needs a description what exactly breaks, and how to fix it! label Apr 20, 2020
Signed-off-by: Alexander Scheel <[email protected]>
@cipherboy
Copy link
Contributor Author

Thanks @zeripath for the docs review. I wrote the docs first (and originally was thinking numbered sections) but then I looked at the code and figured numbering was a needless restriction.

I've added an example of what was broken with the old config format to the cheat sheet as well.

; The following keys can be used multiple times to define sanitation policy rules.
[markup.sanitizer.1]
; The following keys can appear once to define a sanitation policy rule.
; This section can appear with an incremenented number to define multiple rules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; This section can appear with an incremenented number to define multiple rules.
; This section can appear with an incrementing numbers or any distinct alphanumeric string to define multiple rules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "this section can appear with an incrementing numbers to define multiple rules" is correct English either.

I've made it just "This section can appear again with a unique alphanumeric string to define multiple rules".

custom/conf/app.ini.sample Outdated Show resolved Hide resolved
Comment on lines 663 to 695
This was changed because the implementation with the ini parser used was flawed; the following configs were indistinguishable after parsing:

```ini
[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = a
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3
```

and

```ini
[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = img
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3
```

Because of limitations in the ini library, we are unable to automatically migrate configurations.

We will still parse the first rule from a `[markup.sanitizer]` section if present, but multiple rules must be manually migrated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This was changed because the implementation with the ini parser used was flawed; the following configs were indistinguishable after parsing:
```ini
[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = a
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3
```
and
```ini
[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = img
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3
```
Because of limitations in the ini library, we are unable to automatically migrate configurations.
We will still parse the first rule from a `[markup.sanitizer]` section if present, but multiple rules must be manually migrated.

I'd rather remove this section. If you feel like an explanation is needed, please reference the version where it was introduced and link the relevant issue or this PR (e.g. "this was changed from the original implementation in 1.11 due to severe limitations; see #1234"). But we tend to avoid making explicit version references in the docs, so maybe just remove it and let the Blog speak. 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a blog entry is fine with @zeripath I could write such, but I must confess I've never looked at the blog entries and mostly read documentation. :-) On the breaking section of release notes, there's a link to the PR, but not to the blog entry -- do none of those have blog entries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... So this bit could be dropped and instead placed in the PR description - during release we can then precis this and put it into the blog post.

Comment on lines 95 to 96

**Note**: The above section numbering policy is new; previously the section was `[markup.sanitizer]` and keys could be redefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**Note**: The above section numbering policy is new; previously the section was `[markup.sanitizer]` and keys could be redefined.

Same as above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had specifically requested some information as this is a breaking change requiring change to config (although the previous config simply does not work.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid duplicating too much info, I've added a reference to the cheat sheet here but explicitly left this line. Its either that or add a reference to this pull request (in both places) if we don't want it in the documentation -- but to someone wanting to migrate, it isn't clear what the problem is just by looking at our extended discussion here. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

Suggested change
**Note**: The above section numbering policy is new; previously the section was `[markup.sanitizer]` and keys could be redefined.
**Note**: Prior to Gitea 1.12 there was a single `markup.sanitiser` section and keys could be redefined, however, there were significant problems with this method of configuration necessitating this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had specifically requested some information as this is a breaking change requiring change to config (although the previous config simply does not work.)

Sorry, @zeripath. I missed your comment, which was already resolved.

modules/setting/markup.go Show resolved Hide resolved
@@ -965,8 +965,8 @@ SHOW_FOOTER_TEMPLATE_LOAD_TIME = true

[markup.sanitizer.1]
; The following keys can appear once to define a sanitation policy rule.
; This section can appear with an incremenented number to define multiple rules.
; e.g., [markup.sanitizer.1] -> [markup.sanitizer.2]
; This section can appear again with a unique alphanmuric string to define multiple rules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; This section can appear again with a unique alphanmuric string to define multiple rules.
; This section can appear multiple times by adding a unique alphanumeric suffix to define multiple rules.

Comment on lines 95 to 96

**Note**: The above section numbering policy is new; previously the section was `[markup.sanitizer]` and keys could be redefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had specifically requested some information as this is a breaking change requiring change to config (although the previous config simply does not work.)

Sorry, @zeripath. I missed your comment, which was already resolved.

Copy link
Member

@guillep2k guillep2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one typo. Otherwise LG-TM.

custom/conf/app.ini.sample Outdated Show resolved Hide resolved
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Apr 29, 2020
@zeripath
Copy link
Contributor

make lg-tm work

@zeripath zeripath merged commit 1bf9e44 into go-gitea:master Apr 29, 2020
ydelafollye pushed a commit to ydelafollye/gitea that referenced this pull request Jul 31, 2020
In go-gitea#9888, it was reported that my earlier pull request go-gitea#9075 didn't quite function as expected. I was quite hopeful the `ValuesWithShadow()` worked as expected (and, I thought my testing showed it did) but I guess not. @zeripath proposed an alternative syntax which I like:

```ini
[markup.sanitizer.1]
ELEMENT=a
ALLOW_ATTR=target
REGEXP=something
[markup.sanitizer.2]
ELEMENT=a
ALLOW_ATTR=target
REGEXP=something
```

This was quite easy to adopt into the existing code. I've done so in a semi-backwards-compatible manner:

 - The value from `.Value()` is used for each element.
 - We parse `[markup.sanitizer]` and all `[markup.sanitizer.*]` sections and add them as rules.

This means that existing configs will load one rule (not all rules). It also means people can use string identifiers (`[markup.sanitiser.KaTeX]`) if they prefer, instead of numbered ones.

Co-authored-by: Andrew Thornton <[email protected]>
Co-authored-by: guillep2k <[email protected]>
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. pr/breaking Merging this PR means builds will break. Needs a description what exactly breaks, and how to fix it! type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants