Quality check: Title should not contain URL #12354
Comments
Hi, I’m interested in taking on this issue. Could you assign it to me? @koppor
@lzmmxh Done. You can find the user-facing documentation at https://docs.jabref.org/finding-sorting-and-cleaning-entries/checkintegrity. You can find the code with Ctrl+Shift+F.
Hello, I am working with @LinusDietz as a member of a KCL student team and we are interested in this issue. Could we be assigned to it?
I also think this is a good feature to work on. I would suggest you give a short description (here in the issue) on how you want to minimize false positives and consider which fields (besides the title/booktitle) this integrity check should refer to. For example, the author field? This means we (and the @JabRef/developers) can refine the requirements for this issue a bit more before you start writing much code. For example, these real papers should ideally not be flagged:
Since there was no activity from the previously assigned contributor, I'm reassigning this issue to @11raphael. As @LinusDietz is monitoring and supporting your progress, I'm looking forward to your pull request. Please open a draft PR early so we can follow your changes too.
Hello @LinusDietz @11raphael, we were thinking of flagging full URLs that are not supposed to appear in the title/booktitle field, e.g. "Exploring the Impact of Social Media on Education: https://www.example.com/education-impact". We also aim to flag URLs that are embedded in the middle of the title text, as well as URLs that are not related to a research topic. To avoid false positives, we want to accept titles that mention website names as part of the topic, e.g. "Applying Trip@dvice Recommendation Technology to www.visiteurope.com". We also aim to allow partial URLs or website references used in context, and domains that are part of a technical term. The integrity check will focus on URLs that start with http://, https:// or www. and ensure they are not mistakenly included in the title/booktitle fields. To minimise false positives, the check will only flag full URLs that are followed by a path, and it will not flag bare domain names or references that belong to valid research titles. Besides the title and booktitle fields, we also considered checking other fields where URLs are not usually expected, such as the author field. This would ensure that the author field doesn't contain a URL next to the author's name.
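The rule proposed above (a `http://`, `https://` or `www.` prefix, followed by a host and a non-empty path) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `UrlInTitleSketch`, its methods, and the list of checked fields are hypothetical names and are not part of JabRef's actual integrity-check code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not JabRef's integrity-check API.
class UrlInTitleSketch {

    // A "full URL" here means: http://, https:// or www. prefix, a host
    // (no whitespace or slash), then "/" and at least one path character.
    // Bare domains such as "www.visiteurope.com" therefore do not match.
    private static final Pattern FULL_URL = Pattern.compile(
            "\\b(?:https?://|www\\.)[^\\s/]+/\\S+",
            Pattern.CASE_INSENSITIVE);

    // Assumed set of fields to check, taken from the discussion above.
    private static final List<String> CHECKED_FIELDS =
            List.of("title", "booktitle", "author");

    // Returns true if the field value contains a URL with a path.
    static boolean containsFullUrl(String value) {
        return value != null && FULL_URL.matcher(value).find();
    }

    // Returns the names of the checked fields whose value contains a full URL.
    static List<String> fieldsWithUrl(Map<String, String> entry) {
        List<String> flagged = new ArrayList<>();
        for (String field : CHECKED_FIELDS) {
            if (containsFullUrl(entry.get(field))) {
                flagged.add(field);
            }
        }
        return flagged;
    }
}
```

With this pattern, the "Social Media on Education" title above would be flagged, while the "Trip@dvice" title would pass because `www.visiteurope.com` has no path. A URL without a path, such as a bare `https://` link to a domain, would also pass, which follows the proposal as written but may be worth revisiting.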
Sounds good. Go ahead and open the PR. It makes sense to open it early (while it's still a work in progress), so we can give feedback sooner.
Example:
booktitle = {in Symposium on Automotive/Avionics Systems Engineering SAASE, http://www.jacobsschool.ucsd.edu/GordonCenter/g_leadership/l_summer/docs/saase/papers/MeedeniyaAleti Buhnova.pdf},
This is not a valid booktitle; the URL should not be contained. A warning should be displayed.