Hard-code check for Multilingual Wikisource to avoid errors #470

Open: wants to merge 1 commit into base: main
14 changes: 8 additions & 6 deletions src/Wikidata.php
```diff
@@ -37,7 +37,7 @@ public function getWikisourceLangs( string $lang ): array {
 			$cacheItem->expiresAfter( new DateInterval( 'P1M' ) );
 			$this->logger->debug( "Requesting list of Wikisources from Wikidata" );
 			$query =
-				"SELECT ?label ?website WHERE { "
+				"SELECT ?item ?label ?website WHERE { "
 				// Instance of Wikisource language edition but not of closed wiki.
 				. "?item wdt:P31 wd:Q15156455 . "
 				. "MINUS { ?item wdt:P31 wd:Q47495990 . } "
@@ -54,12 +54,14 @@ public function getWikisourceLangs( string $lang ): array {
 			$data = $this->fetch( $query );
 			$out = [];
 			foreach ( $data as $datum ) {
-				preg_match( '|https://([a-z-_]*)\.?wikisource\.org|', $datum['website'], $matches );
-				$subdomain = $matches[1];
-				if ( empty( $subdomain ) ) {
-					$subdomain = 'mul';
+				// Hard-code Multilingual Wikisource, to avoid issues with incubator Wikisources
+				// being given the same domain name as P856 (official website). T342520.
+				if ( str_ends_with( $datum['item'], 'Q18198097' ) ) {
+					$out['mul'] = $datum['label'];
+					continue;
 				}
-				$out[$subdomain] = $datum['label'];
+				preg_match( '|https://([a-z-_]*)\.?wikisource\.org|', $datum['website'], $matches );
+				$out[$matches[1]] = $datum['label'];
```
Collaborator

$matches[1] seems a bit fragile to me, in case the Wikidata data are invalid for some reason.

Member Author

Good point. I initially tried using the Wikimedia language code (P424), but that doesn't give the correct value for some editions, such as syl. Although perhaps https://www.wikidata.org/wiki/Q120844812 shouldn't be called a Wikisource language edition yet, while it's still in Incubator.

If someone enters an invalid value for Official website (P856), then you're right, this would break. But anything we pull from Wikidata could break at some point, so I'm not sure this case is much different.

Collaborator

Sorry, I let this discussion slip. Indeed, that's a good point. I'm fine with merging this PR.

```diff
 			}
 			return $out;
 		} );
```
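The reviewer's concern is that `$matches[1]` is read without checking whether `preg_match()` found a match; in PHP 8 a non-matching Official website value would trigger an "Undefined array key 1" warning and store the label under an empty-string key. A minimal sketch of a more defensive loop follows, written as a hypothetical standalone function `buildWikisourceLangs()` that is not part of this PR. It assumes each SPARQL result row is an array with `item` (the full entity URI, as the PR's `str_ends_with()` check implies), `label` and `website` keys. It also assumes, as a design choice the PR does not make, that the pattern should require a non-empty subdomain, since `mul` is now hard-coded and an empty subdomain would only come from an incubator edition.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not in the PR): builds a subdomain => label map from
 * SPARQL rows, skipping rows whose website cannot be parsed instead of
 * reading $matches[1] unconditionally.
 *
 * @param array<int, array{item: string, label: string, website: string}> $rows
 * @return array<string, string>
 */
function buildWikisourceLangs( array $rows ): array {
	$out = [];
	foreach ( $rows as $row ) {
		// Multilingual Wikisource (Q18198097) is hard-coded, as in the PR.
		if ( str_ends_with( $row['item'], 'Q18198097' ) ) {
			$out['mul'] = $row['label'];
			continue;
		}
		// preg_match() returns 1 on a match, 0 on no match and false on error.
		// Assumption: a non-empty subdomain followed by ".wikisource.org" is required.
		if ( preg_match( '|^https://([a-z_-]+)\.wikisource\.org|', $row['website'], $matches ) !== 1 ) {
			// Invalid or unexpected official website value: skip this row.
			continue;
		}
		$out[$matches[1]] = $row['label'];
	}
	return $out;
}

// Illustrative rows only; item IDs other than Q18198097 are placeholders.
$rows = [
	[ 'item' => 'http://www.wikidata.org/entity/Q18198097', 'label' => 'Multilingual Wikisource', 'website' => 'https://wikisource.org/' ],
	[ 'item' => 'http://www.wikidata.org/entity/Q-placeholder-en', 'label' => 'English Wikisource', 'website' => 'https://en.wikisource.org/' ],
	[ 'item' => 'http://www.wikidata.org/entity/Q-placeholder-bad', 'label' => 'Broken entry', 'website' => 'not a URL' ],
];

// Prints an array with keys 'mul' and 'en'; the 'not a URL' row is skipped.
print_r( buildWikisourceLangs( $rows ) );
```

Skipping a row means a broken P856 value silently drops one language from the list rather than adding a bogus entry; whether that is preferable to logging a warning is the same trade-off discussed in the review thread.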
12 changes: 6 additions & 6 deletions tests/WikidataTest.php
```diff
@@ -25,14 +25,14 @@ protected function setUp(): void {
 	public function testLanguageList() {
 		// Get the list in English and most are in their own language.
 		$langs = $this->wikidata->getWikisourceLangs( 'en' );
-		$this->assertSame( $langs['sv'], 'svenskspråkiga Wikisource' );
-		$this->assertSame( $langs['en'], 'English Wikisource' );
-		$this->assertSame( $langs['mul'], 'Multilingual Wikisource' );
+		$this->assertSame( 'svenskspråkiga Wikisource', $langs['sv'] );
+		$this->assertSame( 'English Wikisource', $langs['en'] );
+		$this->assertSame( 'Multilingual Wikisource', $langs['mul'] );
 		// Get the list in a different language and the only one changed should be mul.
 		$langs2 = $this->wikidata->getWikisourceLangs( 'fr' );
-		$this->assertSame( $langs2['en'], 'English Wikisource' );
-		$this->assertSame( $langs2['sv'], 'svenskspråkiga Wikisource' );
-		$this->assertSame( $langs2['mul'], 'Wikisource multilingue' );
+		$this->assertSame( 'English Wikisource', $langs2['en'] );
+		$this->assertSame( 'svenskspråkiga Wikisource', $langs2['sv'] );
+		$this->assertSame( 'Wikisource multilingue', $langs2['mul'] );
 		// Note that this test doesn't test the fallback to the interface language for missing local labels,
 		// because this will hopefully be fixed on Wikidata and so wouldn't be repeatable.
 	}
```