-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add safe search filter for NSFW rooms #208
Conversation
// Browse the room directory searching the room directory for those NSFW rooms | ||
// (narrowing down results). | ||
archiveUrl = matrixPublicArchiveURLCreator.roomDirectoryUrl({ | ||
searchTerm: roomPlanetPrefix, | ||
}); | ||
const { data: roomDirectoryWithSearchPageHtml } = await fetchEndpointAsText(archiveUrl); | ||
const domWithSearch = parseHTML(roomDirectoryWithSearchPageHtml); | ||
|
||
const roomsCardsOnPageWithSearch = [ | ||
...domWithSearch.document.querySelectorAll(`[data-testid="room-card"]`), | ||
]; | ||
|
||
// Assert that the rooms we searched for are on the page | ||
const roomsIdsOnPageWithSearch = roomsCardsOnPageWithSearch.map((roomCardEl) => { | ||
return roomCardEl.getAttribute('data-room-id'); | ||
}); | ||
assert.deepStrictEqual(roomsIdsOnPageWithSearch.sort(), [roomUranusId, roomMarsId].sort()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test flakey (it's failed 3/4 times I've retried in CI)? Example failure: https://github.com/matrix-org/matrix-public-archive/actions/runs/4868717759/jobs/8682443906#step:7:203
Synapse doesn't cache responses for searches on /publicRooms
so it shouldn't be a stale results problem with other tests cases interfering. I even see the Bypassing cache as search request.
log line on the server when running the test:
$ docker logs -f --tail 10 matrix_public_archive_test_hs1_1
...
2023-05-03 06:51:17,515 - synapse.handlers.room_list - 90 - INFO - POST-25759 - Getting public room list: limit=9, since=None, search=True, network=None
2023-05-03 06:51:17,515 - synapse.handlers.room_list - 101 - INFO - POST-25759 - Bypassing cache as search request.
2023-05-03 06:51:17,546 - synapse.access.http.8008 - 460 - INFO - POST-25759 - ::ffff:172.22.0.1 - 8008 - {@archiver:hs1} Processed request: 0.031sec/0.001sec (0.020sec, 0.001sec) (0.008sec/0.022sec/2) 554B 200 "POST /_matrix/client/v3/publicRooms HTTP/1.1" "undici" [0 dbevts]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can actually reproduce consistently locally after re-pulling the latest
Synapse image (minimal test run reproduction: npm run test -- --grep "Room directory"
)
Just did some bisecting of Synapse versions to figure out where the Synapse change was introduced:
v1.81.0
: Test fails consistently ❌v1.80.0
: Test fails consistently ❌v1.80.0rc1
: Test fails consistently ❌v1.79.0
: Working ✅v1.68.0
: Working ✅
I don't know which Synapse commit/PR specifically yet: https://github.com/matrix-org/synapse/blob/develop/CHANGES.md#synapse-1800rc1-2023-03-21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a bisect on Synapse commits and found the culprit -> matrix-org/synapse#15229. Not totally obvious what's going wrong/different to cause this but probably something with visibility
.
Bisect (relevant page of commits):
4953cd71dfbb1925dc4a477efd2ed48c2dfd70d6
(2023-03-16): Test fails consistently ❌ (locally builtmatrixdotorg/synapse:4953cd71
)
a1c986939444db9bbf2343224091c88d18b3f598
(2023-03-13): Test fails consistently ❌ (locally builtmatrixdotorg/synapse:a1c98693
)
be4ea209e8bf92fb3660807c1fe8ad3d7d05621f
(2023-03-08): Test fails consistently ❌ (locally builtmatrixdotorg/synapse:be4ea209
)
88efc75bab2849b7b1cee52770dea3cf9925b2e8
(2023-03-08): Working ✅ (locally builtmatrixdotorg/synapse:88efc75b
)
8314646cd3563d9aaaf8028c9e58989b4ed980ba
(2023-03-07): Working ✅ (locally builtmatrixdotorg/synapse:8314646c
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem to be a visibility
issue as the rooms still end up as preset_config=public_chat
and visibility=public
.
Seems like a race condition because it passes if I add a delay after the room creation. And I would think this is a read-after-write consistency issue since we're not even using workers. Created matrix-org/synapse#15526 to track this ⏩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pinned the Synapse version to v1.79.0
before the regression for now ⏩
If we need to test with the latest Synapse, we will need to introduce some wait/retry complexity like this:
// Try to avoid flakey tests where the homeserver hasn't added the rooms
// to the room directory yet
await waitForRoomIdsInHomeserverRoomDirectory([roomUranusId, roomMarsId]);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waitForResultsInHomeserverRoomDirectory(...)
was added in #276
await waitForResultsInHomeserverRoomDirectory({
client,
searchTerm: 'room-foo',
});
It's not possible to search directly for a room_id
(at least in Synapse) so I've opted to searching for the room name.
Relevant Synapse issue: matrix-org/synapse#15526 Full context: #208 (comment)
Add safe search filter for NSFW rooms
Alternative community PR (unfinished): #155
Fix #89
Todo
Should we add safe search to the query parameters:?safe-search=false