[Bug]: Apostrophes are considered as word separators in "whole-word" filters, causing false-positives #875

Closed
nekohayo opened this issue Mar 26, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@nekohayo

Describe the bug

In French, many words are joined with an apostrophe (much as English writes "it's" for "it is").

The problem is that Tuba's filters seem to treat those apostrophes as word delimiters.

Steps To Reproduce

Set up some filters like these:

(screenshot of the filter settings; image not preserved)

Notice the filter on the whole-word string AI

Then write some toots that contain that string as part of a compound word with an apostrophe (typographical or not), such as "Ah, cette douleur que j'ai, que j'ai !" ("Ah, this pain that I have, that I have!") or "Je n'ai rien fait de mal" ("I did nothing wrong").

Result: those toots get filtered.

I suppose there might be other cases like this (typographic apostrophes? other punctuation marks? special characters?), but I haven't tested them...

Logs and/or Screenshots

No response

Instance Backend

Mastodon

Operating System

Fedora 39

Package

Flatpak

Troubleshooting information

No response

Additional Context

No response

@nekohayo nekohayo added the bug Something isn't working label Mar 26, 2024
@GeopJr
Owner

GeopJr commented Mar 26, 2024

Okay so, I followed your instructions, added AI to whole word filters and sent myself a toot from another account with (API response):
content: '<p>test <span class="h-card" translate="no"><a href="https://mastodon.social/@agenthsudabbuwiadn" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>agenthsudabbuwiadn</span></a></span> Ah, cette douleur que j'ai, que j'ai !</p>'

j'ai is in there

It got filtered with (API response):

{
	"filtered": [
		{
			"filter": {
				"id": "60558",
				"title": "a",
				"context": [
					"home",
					"notifications",
					"public",
					"thread",
					"account"
				],
				"expires_at": null,
				"filter_action": "warn"
			},
			"keyword_matches": [
				"ai"
			],
			"status_matches": null
		}
	]
}

So Tuba is doing the right thing; the problem is in Mastodon's filter matching. I'll check their issue tracker to see whether it has already been reported.
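To make the point concrete, here is a minimal Python sketch of a client that simply trusts the server's `filtered` array. The field names come from the API response above; the function name `matched_filters` and the sample dictionary are hypothetical, and this is not Tuba's actual code.

```python
from __future__ import annotations


def matched_filters(status: dict) -> list[tuple[str, str, list[str]]]:
    """Return (filter title, filter action, matched keywords) per server-side match.

    Assumes `status` is the parsed JSON of a status that may carry a
    `filtered` array shaped like the response quoted above.
    """
    results = []
    for entry in status.get("filtered") or []:
        flt = entry.get("filter") or {}
        results.append(
            (
                flt.get("title", ""),
                flt.get("filter_action", ""),
                entry.get("keyword_matches") or [],
            )
        )
    return results


# Trimmed copy of the response quoted above (hypothetical variable name).
SAMPLE_STATUS = {
    "filtered": [
        {
            "filter": {"id": "60558", "title": "a", "filter_action": "warn"},
            "keyword_matches": ["ai"],
            "status_matches": None,
        }
    ]
}
```

In this sketch the client never inspects the post text itself: if the server reports a keyword match, the post is treated as filtered.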

@GeopJr
Owner

GeopJr commented Mar 26, 2024

This looks similar:
mastodon/mastodon#8405 (whole word filter applies to urls, between / and .)

I'll close this, as there's nothing I can do: Mastodon tells Tuba the post should be filtered, and Tuba filters it.

:/

@GeopJr GeopJr closed this as not planned Mar 26, 2024
@GeopJr
Owner

GeopJr commented Mar 26, 2024

I kind of get why this is happening and don't know how they are going to solve it, but it's up to them.

If whole-word matching treated characters such as ', ", . and , as part of the word, then almost anything could bypass the filter.

For example:
The CEO said: 'we replaced everyone with AI'
This was made with AI.
Both would bypass the filter if it weren't handled the way it currently is.
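The trade-off can be sketched in Python. This assumes a simple matcher that treats any character outside `\w` as a word boundary; it is an illustration only, not Mastodon's actual implementation, and both the function `whole_word_match` and its `extra_word_chars` parameter are hypothetical.

```python
import re


def whole_word_match(keyword: str, text: str, extra_word_chars: str = "") -> bool:
    """Return True if `keyword` occurs in `text` as a whole word.

    `extra_word_chars` lists characters that should count as part of a word
    (for example apostrophes) instead of acting as word boundaries.
    """
    # Character class of everything considered "inside a word".
    word_class = r"\w" + re.escape(extra_word_chars)
    # The keyword must not be directly preceded or followed by a word character.
    pattern = rf"(?<![{word_class}]){re.escape(keyword)}(?![{word_class}])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


FRENCH = "Ah, cette douleur que j'ai, que j'ai !"
QUOTED = "The CEO said: 'we replaced everyone with AI'"
SENTENCE = "This was made with AI."

# Apostrophe is a boundary: the French false positive appears.
print(whole_word_match("AI", FRENCH))
# Apostrophes are word characters: the false positive goes away...
print(whole_word_match("AI", FRENCH, extra_word_chars="'’"))
# ...but the quoted English sentence now slips past the filter too.
print(whole_word_match("AI", QUOTED, extra_word_chars="'’"))
# Likewise, counting "." as a word character lets the last example through.
print(whole_word_match("AI", SENTENCE, extra_word_chars="."))
```

Under these assumptions, the same change that fixes "j'ai" also stops matching "AI'" at the end of a quotation, which is the dilemma described above.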
