compare bibtex files #1

haesleinhuepf · 2024-11-28T17:32:37Z

Assume I have two bibtex files from different sources. How can I measure similarities between entries? For example, I would like to search for all entries in file 1 which are not in file 2.

git-bob think about this

github-actions · 2024-11-28T17:33:51Z

This message was generated by git-bob (version: 0.14.1dev, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

To compare BibTeX files and identify entries in one file that are not in the other, you can use Python with the bibtexparser library. Here's a Jupyter Notebook to get you started:

import bibtexparser

def load_bibtex(file_path):
    # Parse a .bib file into a database object (uses the bibtexparser v1 `load` API)
    with open(file_path, 'r', encoding='utf-8') as bib_file:
        return bibtexparser.load(bib_file)

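def compare_bibtex_entries(bib1, bib2):
    # Index both files by citation key (bibtexparser stores it under 'ID')
    entries1 = {entry['ID']: entry for entry in bib1.entries}
    entries2 = {entry['ID']: entry for entry in bib2.entries}

    # Keep only the entries whose key does not occur in the second file
    unique_to_bib1 = {key: entries1[key] for key in entries1 if key not in entries2}
    return unique_to_bib1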

# Load the BibTeX files
bib1 = load_bibtex('path/to/file1.bib')
bib2 = load_bibtex('path/to/file2.bib')

# Find entries unique to the first BibTeX file
unique_entries = compare_bibtex_entries(bib1, bib2)

# Display unique entries
unique_entries

This code reads two BibTeX files and compares their entries by IDs, listing which ones are unique to the first file. Adjust 'path/to/file1.bib' and 'path/to/file2.bib' to the paths of your files.
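To see what the ID comparison produces without touching any files, here is a minimal sketch on plain dictionaries shaped like the `entries` list used above. The sample entries and the helper name `split_by_id` are made up for illustration; it also reports the reverse direction and the shared IDs, which may be useful when checking both files.

```python
# Hypothetical sample data: each entry is a dict with an 'ID' key plus its fields,
# mirroring the structure of `bib.entries` in the code above.
entries_file1 = [
    {'ID': 'smith2020', 'title': 'Deep Learning for Cells'},
    {'ID': 'doe2021', 'title': 'Image Analysis Basics'},
]
entries_file2 = [
    {'ID': 'doe2021', 'title': 'Image Analysis Basics'},
    {'ID': 'lee2019', 'title': 'Segmentation Methods'},
]

def split_by_id(entries1, entries2):
    """Return IDs only in the first list, only in the second, and in both."""
    ids1 = {entry['ID'] for entry in entries1}
    ids2 = {entry['ID'] for entry in entries2}
    return sorted(ids1 - ids2), sorted(ids2 - ids1), sorted(ids1 & ids2)

only_in_1, only_in_2, shared = split_by_id(entries_file1, entries_file2)
print('Only in file 1:', only_in_1)
print('Only in file 2:', only_in_2)
print('In both files:', shared)
```

Note that this only works when both sources use the same citation keys for the same publication.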

haesleinhuepf · 2024-11-28T17:46:05Z

Let's assume IDs are not identical. I'd prefer to count how many parameters (author, url, title, ...) are identical between two bibtex entries. Then, we can assume that if > 5 of those parameters are identical, they are the same record.

git-bob think about how to implement this.

github-actions · 2024-11-28T17:47:24Z

This message was generated by git-bob (version: 0.14.1dev, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

To account for non-identical IDs while comparing BibTeX entries, we can modify the code to count identical parameters and use a threshold to determine similarity:

import bibtexparser

def load_bibtex(file_path):
    # Parse a .bib file into a database object (uses the bibtexparser v1 `load` API)
    with open(file_path, 'r', encoding='utf-8') as bib_file:
        return bibtexparser.load(bib_file)

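def compare_bibtex_entries(bib1, bib2, threshold=5):
    # bibtexparser stores the citation key and entry type under 'ID' and 'ENTRYTYPE';
    # skip them so that only real parameters (author, url, title, ...) are counted
    ignored_keys = {'ID', 'ENTRYTYPE'}

    def count_matching_parameters(entry1, entry2):
        return sum(
            1 for key in entry1
            if key not in ignored_keys and key in entry2 and entry1[key] == entry2[key]
        )

    # Collect entries from bib1 that have no counterpart in bib2 with
    # more than `threshold` identical parameters
    unique_to_bib1 = []
    for entry1 in bib1.entries:
        match_found = False
        for entry2 in bib2.entries:
            if count_matching_parameters(entry1, entry2) > threshold:
                match_found = True
                break
        if not match_found:
            unique_to_bib1.append(entry1)

    return unique_to_bib1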

# Load the BibTeX files
bib1 = load_bibtex('path/to/file1.bib')
bib2 = load_bibtex('path/to/file2.bib')

# Find entries unique to the first BibTeX file
unique_entries = compare_bibtex_entries(bib1, bib2)

# Display unique entries
unique_entries

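This approach treats two entries as the same record when more than `threshold` parameters have identical values, even if their IDs differ. The threshold is customizable. Values are compared exactly, so differences in capitalization, braces, or whitespace count as mismatches.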
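Entries from different sources often differ only in formatting, for example `{Deep} Learning` versus `Deep learning`. One possible refinement, sketched here under assumptions, is to normalize values before counting matches. The helper names (`normalize`, `count_matching_fields`, `find_unmatched`), the normalization rules, and the sample entries are all hypothetical; the functions work on plain lists of dictionaries shaped like `bib1.entries` and assume every value is a string.

```python
import re

# Keys that hold the citation key and entry type rather than bibliographic fields
IGNORED_KEYS = {'ID', 'ENTRYTYPE'}

def normalize(value):
    """Drop braces, collapse whitespace, and lowercase a field value."""
    value = value.replace('{', '').replace('}', '')
    return re.sub(r'\s+', ' ', value).strip().lower()

def count_matching_fields(entry1, entry2):
    """Count fields present in both entries whose normalized values agree."""
    return sum(
        1
        for key, value in entry1.items()
        if key not in IGNORED_KEYS
        and key in entry2
        and normalize(value) == normalize(entry2[key])
    )

def find_unmatched(entries1, entries2, threshold=5):
    """Return entries of entries1 with no partner in entries2 sharing > threshold fields."""
    return [
        e1 for e1 in entries1
        if not any(count_matching_fields(e1, e2) > threshold for e2 in entries2)
    ]

# Hypothetical sample entries: `a` and `b` describe the same paper with
# different keys and slightly different formatting; `c` is unrelated.
a = {'ID': 'smith2020', 'ENTRYTYPE': 'article', 'author': 'Smith, Jane',
     'title': '{Deep} Learning for Cells', 'journal': 'J. Imaging',
     'year': '2020', 'volume': '12', 'pages': '1--10', 'doi': '10.0000/example'}
b = {'ID': 'Smith_2020', 'ENTRYTYPE': 'article', 'author': 'Smith, Jane',
     'title': 'Deep learning for cells', 'journal': 'J.  Imaging',
     'year': '2020', 'volume': '12', 'pages': '1--10', 'doi': '10.0000/example'}
c = {'ID': 'lee2019', 'ENTRYTYPE': 'article', 'author': 'Lee, Sam',
     'title': 'Segmentation Methods', 'year': '2019'}

unmatched = find_unmatched([a, c], [b])
print([entry['ID'] for entry in unmatched])
```

With exact comparison, `a` and `b` would agree on only five fields and fall below the `> 5` rule; after normalization all seven fields agree, so only `c` is reported as missing from the second list.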

haesleinhuepf changed the title ~~compare bibtext files~~ compare bibtex files Nov 28, 2024
