compare bibtex files #1

Open
haesleinhuepf opened this issue Nov 28, 2024 · 3 comments

@haesleinhuepf
Member

Assume I have two bibtex files from different sources. How can I measure similarities between entries? For example, I would like to search for all entries in file 1 which are not in file 2.

git-bob think about this

This message was generated by git-bob (version: 0.14.1dev, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

To compare BibTeX files and identify entries in one file that are not in the other, you can use Python with the bibtexparser library. Here's code you can run in a Jupyter notebook to get you started:

import bibtexparser

def load_bibtex(file_path):
    # Assumes the file is UTF-8 encoded; adjust the encoding if yours differs
    with open(file_path, 'r', encoding='utf-8') as bib_file:
        return bibtexparser.load(bib_file)

def compare_bibtex_entries(bib1, bib2):
    entries1 = {entry['ID']: entry for entry in bib1.entries}
    entries2 = {entry['ID']: entry for entry in bib2.entries}
    
    unique_to_bib1 = {key: entries1[key] for key in entries1 if key not in entries2}
    return unique_to_bib1

# Load the BibTeX files
bib1 = load_bibtex('path/to/file1.bib')
bib2 = load_bibtex('path/to/file2.bib')

# Find entries unique to the first BibTeX file
unique_entries = compare_bibtex_entries(bib1, bib2)

# Display unique entries
unique_entries

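This code reads two BibTeX files, compares their entries by ID, and returns the entries whose ID appears only in the first file. Replace 'path/to/file1.bib' and 'path/to/file2.bib' with the paths to your files. Note that it relies on the bibtexparser 1.x API (`bibtexparser.load`, dict-style entries).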
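
As a quick check of the ID-based logic without touching any files, the same comparison can be sketched on plain dictionaries shaped like the items in `bib.entries`. The sample entries and the helper name `ids_only_in_first` are made up for illustration and are not part of bibtexparser:

```python
# Hypothetical sample entries, shaped like the dicts in bibtexparser's `entries` list
entries_a = [
    {"ID": "smith2020", "ENTRYTYPE": "article", "title": "Deep Learning for Cells"},
    {"ID": "doe2021", "ENTRYTYPE": "book", "title": "Image Analysis"},
]
entries_b = [
    {"ID": "smith2020", "ENTRYTYPE": "article", "title": "Deep learning for cells"},
]

def ids_only_in_first(first, second):
    """Return the IDs present in `first` but missing from `second`."""
    second_ids = {entry["ID"] for entry in second}
    return [entry["ID"] for entry in first if entry["ID"] not in second_ids]

missing = ids_only_in_first(entries_a, entries_b)
print(missing)
```

Only the ID is compared here, so two records for the same paper with different citation keys would both be reported as unique.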

@haesleinhuepf
Member Author

Let's assume the IDs are not identical. I'd prefer to count how many parameters (author, url, title, ...) are identical between two BibTeX entries. Then we can assume that if more than 5 of those parameters are identical, the entries are the same record.

git-bob think about how to implement this.

This message was generated by git-bob (version: 0.14.1dev, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

To account for non-identical IDs while comparing BibTeX entries, we can modify the code to count identical parameters and use a threshold to determine similarity:

import bibtexparser

def load_bibtex(file_path):
    # Assumes the file is UTF-8 encoded; adjust the encoding if yours differs
    with open(file_path, 'r', encoding='utf-8') as bib_file:
        return bibtexparser.load(bib_file)

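def compare_bibtex_entries(bib1, bib2, threshold=5):
    # 'ID' and 'ENTRYTYPE' are bookkeeping keys added by bibtexparser,
    # not bibliographic fields, so they are excluded from the count.
    ignored_keys = {'ID', 'ENTRYTYPE'}

    def count_matching_parameters(entry1, entry2):
        return sum(
            1 for key in entry1
            if key not in ignored_keys and key in entry2 and entry1[key] == entry2[key]
        )

    # Keep every entry of bib1 that has no sufficiently similar entry in bib2
    unique_to_bib1 = []
    for entry1 in bib1.entries:
        match_found = False
        for entry2 in bib2.entries:
            # More than `threshold` identical fields counts as the same record
            if count_matching_parameters(entry1, entry2) > threshold:
                match_found = True
                break
        if not match_found:
            unique_to_bib1.append(entry1)

    return unique_to_bib1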

# Load the BibTeX files
bib1 = load_bibtex('path/to/file1.bib')
bib2 = load_bibtex('path/to/file2.bib')

# Find entries unique to the first BibTeX file
unique_entries = compare_bibtex_entries(bib1, bib2)

# Display unique entries
unique_entries

This approach allows us to identify similar entries based on parameter matching, even if IDs differ, with a customizable threshold for similarity.
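
One caveat: exact string equality treats "{Deep} Learning" and "deep learning" as different values, so entries from different sources may rarely reach the threshold. A minimal sketch of a more tolerant count follows, assuming entries are plain dicts with string values as in bibtexparser 1.x. The names `IGNORED_KEYS`, `normalize`, `count_matching_fields`, and `best_match` are hypothetical helpers, and the normalization rules (lowercasing, dropping braces, collapsing whitespace) are one possible choice, not a standard:

```python
import re

# Bookkeeping keys that bibtexparser 1.x adds to each entry (assumed not to count as fields)
IGNORED_KEYS = {"ID", "ENTRYTYPE"}

def normalize(value):
    """Lowercase, drop braces, and collapse runs of whitespace."""
    value = value.replace("{", "").replace("}", "")
    return re.sub(r"\s+", " ", value).strip().lower()

def count_matching_fields(entry1, entry2):
    """Count fields present in both entries whose normalized values are equal."""
    shared = (set(entry1) & set(entry2)) - IGNORED_KEYS
    return sum(1 for key in shared if normalize(entry1[key]) == normalize(entry2[key]))

def best_match(entry, candidates):
    """Return (score, candidate) for the candidate sharing the most fields."""
    scored = [(count_matching_fields(entry, c), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0], default=(0, None))

# Made-up entries describing the same paper with different keys and formatting
a = {"ID": "smith2020", "ENTRYTYPE": "article", "title": "{Deep} Learning  for Cells",
     "author": "Smith, Jane", "year": "2020", "journal": "J. Imaging"}
b = {"ID": "Smith_2020", "ENTRYTYPE": "article", "title": "deep learning for cells",
     "author": "Smith, Jane", "year": "2020", "journal": "Other Journal"}

score, match = best_match(a, [b])
print(score, match["ID"])
```

Returning the best score instead of a yes/no answer makes it easier to inspect borderline pairs and to tune the threshold for your files.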

@haesleinhuepf changed the title from "compare bibtext files" to "compare bibtex files" on Nov 28, 2024