Quality check: Title should not contain URL #12354

Open
koppor opened this issue Sep 2, 2022 · 7 comments · May be fixed by #12431
Assignees
Labels
good first issue An issue intended for project-newcomers. Varies in difficulty.

Comments

@koppor
Member

koppor commented Sep 2, 2022

Example:

booktitle = {in Symposium on Automotive/Avionics Systems Engineering SAASE, http://www.jacobsschool.ucsd.edu/GordonCenter/g_leadership/l_summer/docs/saase/papers/MeedeniyaAleti Buhnova.pdf},

This is not a valid booktitle; the URL should not be contained. A warning should be displayed.
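A minimal sketch of the requested warning, assuming a standalone checker rather than JabRef's real integrity-check classes. The class name `UrlInFieldSketch`, the method `checkValue`, and the warning text are hypothetical and chosen only for illustration. This naive version flags any value that contains `http://` or `https://`.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not part of JabRef's code base.
class UrlInFieldSketch {
    // Matches the scheme of a web URL anywhere in the field value.
    private static final Pattern URL_SCHEME =
            Pattern.compile("https?://", Pattern.CASE_INSENSITIVE);

    // Returns a warning message if the value contains a URL scheme, otherwise empty.
    static Optional<String> checkValue(String value) {
        if (value != null && URL_SCHEME.matcher(value).find()) {
            return Optional.of("should not contain a URL"); // assumed wording
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String booktitle = "in Symposium on Automotive/Avionics Systems Engineering SAASE, "
                + "http://www.jacobsschool.ucsd.edu/GordonCenter/g_leadership/l_summer/docs/saase/papers/MeedeniyaAleti Buhnova.pdf";
        System.out.println(checkValue(booktitle).orElse("no warning"));
    }
}
```

Returning an `Optional` keeps "no problem found" distinct from an empty message, so a caller can attach the warning to the field only when one is present.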

@lzmmxh
Contributor

lzmmxh commented Oct 9, 2024

Hi, I’m interested in taking on this issue. Could you assign it to me? @koppor

@koppor
Member Author

koppor commented Oct 13, 2024

@lzmmxh Done. You can find the user-facing documentation at https://docs.jabref.org/finding-sorting-and-cleaning-entries/checkintegrity. Use Ctrl+Shift+F to find the code.

@koppor koppor transferred this issue from JabRef/jabref-koppor Jan 5, 2025
@koppor koppor added the good first issue An issue intended for project-newcomers. Varies in difficulty. label Jan 5, 2025
@github-project-automation github-project-automation bot moved this to Free to take in Good First Issues Jan 5, 2025
@11raphael

Hello, I am working with @LinusDietz as a member of a KCL student team and we are interested in this issue. Could we be assigned this?

@LinusDietz
Member

LinusDietz commented Jan 24, 2025

I also think this is a good feature to work on. I would suggest you give a short description (here in the issue) on how you want to minimize false positives and consider which fields (besides the title/booktitle) this integrity check should refer to. For example, the author field? This means we (and the @JabRef/developers) can refine the requirements for this issue a bit more before you start writing much code.

For example, these real papers should ideally not be flagged:

@calixtus
Member

Since there was no activity by the formerly assigned contributor, I'm reassigning this issue to @11raphael. As @LinusDietz is monitoring and supporting your progress, I'm looking forward to your pull request. Please open a draft PR early so we can follow your changes too.

@calixtus calixtus assigned 11raphael and unassigned lzmmxh Jan 25, 2025
@calixtus calixtus moved this from Free to take to Assigned in Good First Issues Jan 25, 2025
@RapidShotzz

Hello @LinusDietz @11raphael, we were thinking of flagging full URLs that are not supposed to appear in the title/booktitle field, e.g. "Exploring the Impact of Social Media on Education: https://www.example.com/education-impact". We also aim to flag URLs that are embedded in the middle of title text, as well as URLs that are not related to a research topic.

To avoid false positives and incorrect flagging, we want to accept titles that mention website names as part of the topic, e.g. "Applying Trip@dvice Recommendation Technology to www.visiteurope.com". Moreover, we aim to allow partial URLs/website references in context and domains that are part of a technical term.

The integrity check will focus on ensuring that URLs that start with http://, https:// or www. are not mistakenly included in the title/booktitle fields. To minimise false positives, the check will only flag full URLs that are followed by a path and will avoid flagging domain names or references that are linked to valid research titles.

Besides the title and booktitle fields, we thought about the impact of checking other fields where URLs are not usually expected, such as the author field. This would ensure that the author field doesn't contain a URL next to the author's name.
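The heuristic described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked pull request: the class `FullUrlHeuristicSketch`, its methods, the regular expression, and the representation of an entry as a plain `Map` are all hypothetical. The pattern only matches a `http://`, `https://` or `www.` prefix when the host is followed by a `/` and at least one path character, so a bare domain such as `www.visiteurope.com` is left alone.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed heuristic; not JabRef's actual implementation.
class FullUrlHeuristicSketch {
    // Scheme or "www." prefix, then a host, then "/" and at least one path character.
    private static final Pattern FULL_URL_WITH_PATH =
            Pattern.compile("\\b(?:https?://|www\\.)[^\\s/]+/\\S+", Pattern.CASE_INSENSITIVE);

    // Fields named in the discussion; the exact list is an assumption.
    private static final List<String> CHECKED_FIELDS = List.of("title", "booktitle", "author");

    // True if the value contains a URL that includes a path after the host.
    static boolean containsFullUrl(String value) {
        return value != null && FULL_URL_WITH_PATH.matcher(value).find();
    }

    // Maps each offending field name to a warning message (assumed wording).
    static Map<String, String> findOffendingFields(Map<String, String> entry) {
        Map<String, String> warnings = new LinkedHashMap<>();
        for (String field : CHECKED_FIELDS) {
            if (containsFullUrl(entry.get(field))) {
                warnings.put(field, "should not contain a URL");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> entry = Map.of(
                "title", "Exploring the Impact of Social Media on Education: https://www.example.com/education-impact",
                "booktitle", "Applying Trip@dvice Recommendation Technology to www.visiteurope.com");
        System.out.println(findOffendingFields(entry));
    }
}
```

Requiring a path trades recall for precision: a scheme-prefixed URL without a path, such as `http://example.com`, would not be flagged by this sketch, which matches the stated goal of flagging only full URLs followed by a path.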

@LinusDietz
Member

Sounds good. Go ahead and open the PR. It makes sense to open it early (when it's still work in progress), so we can give feedback earlier.

@RapidShotzz RapidShotzz linked a pull request Jan 30, 2025 that will close this issue
7 tasks
Projects
Status: Assigned
6 participants