How to process big puzzle archive (like

    rm -rf bwh-zips/ && mkdir bwh-zips
    ./scripts/ -o bwh-zips/ --source bwh-2015.tgz bwh/
    ./scripts/ -o gxd/ bwh-zips/

How to manually process puzzles from the same publisher obtained from different sources

    ./scripts/ <branch_name> <> "<source_a_name>"
    [ example: ./scripts/ latime_02 "LA Times" ]
    cd gxd && git checkout <branch_name> && git add .
    [ example: cd gxd && git checkout latime_01 && git add . ]
    # Check for quality before commit
    git commit -m 'message about source_a'
    cd ..
    ./scripts/ -o gxd/ <> --extsrc "<ext_src_b>" --intsrc "<int_src_b>"
    [ example: ./scripts/ -o gxd/ bwh-zips/ --extsrc "bwh" --intsrc "bwh-2015.tgz" ]
    cd gxd && ../scripts/ <branch_a_gitcode> <branch_b_gitcode> <outdir>
    [ example: cd gxd && ../scripts/ xml bwh latimes/ ]
    # outdir: where the output of the previous scripts goes, usually named after the publisher
    # Check for quality before commit
    cd .. && ./scripts/

How to check receipts.tsv for duplicate values

filter out duplicates based on InternalSource & Filename

    awk -F'\t' '!seen[$5 FS $6]++' receipts.tsv

number of receipts

    cat receipts.tsv | wc -l

enumerate ExternalSources with the number of receipts for each

    cat receipts.tsv | cut -f 4 | sort | uniq -c | sort -n

enumerate InternalSources with the number of receipts for each

    cat receipts.tsv | cut -f 5 | sort | uniq -c | sort -n
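
The two per-source tallies above differ only in the column number, so they can be folded into one helper. This is a minimal sketch, assuming receipts.tsv is tab-separated with ExternalSource in column 4 and InternalSource in column 5, as the `cut` calls above imply. `count_by_col` is a hypothetical name, not a script from this repository.

```shell
# Hypothetical helper: tally the values of one tab-separated column.
# $1 = path to the TSV file, $2 = 1-based column number.
count_by_col() {
    # cut splits on tabs by default; uniq -c needs sorted input,
    # and the final sort -n orders the tallies from rarest to most common.
    cut -f "$2" "$1" | sort | uniq -c | sort -n
}
```

Under these assumptions, `count_by_col receipts.tsv 4` would list ExternalSources and `count_by_col receipts.tsv 5` would list InternalSources. If the file starts with a header row, the column name shows up as one extra value.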

print duplicate receipts based on InternalSource & Filename

    cat receipts.tsv | cut -f 5,6 | sort | uniq -d -c
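
The command above prints only the duplicated InternalSource and Filename pairs. To inspect the complete receipt rows behind each pair, a two-pass `awk` can help. This is a sketch under the same column assumption (InternalSource in column 5, Filename in column 6); `dup_rows` is a hypothetical name.

```shell
# Hypothetical helper: print every full row whose (InternalSource, Filename)
# pair occurs more than once in the file given as $1.
dup_rows() {
    # The file is read twice. While NR == FNR (first pass) each key is counted;
    # on the second pass only rows whose key was seen more than once are printed.
    awk -F'\t' 'NR == FNR { n[$5 FS $6]++; next } n[$5 FS $6] > 1' "$1" "$1"
}
```

Joining the two fields with `FS` keeps keys such as `ab` + `c` and `a` + `bc` distinct.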

print duplicate receipts based on receiptid

    cat receipts.tsv | cut -f 1 | sort -n | uniq -c -d

check for puzzle duplicates and generate diffs

    cd gxd/
    ../scripts/ <origin> <input> -R <dir to process>

check meta.db for receipts with an empty xdid

    select * from receipts where xdid=='';

count the xdid values in meta.db that appear on more than one receipt

    select count(*) from (select xdid, count(*) as c from receipts group by xdid having c>=2 order by c);
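
The query above reports only how many xdid values are duplicated. To see which ones, the inner query can be run on its own from the shell. This is a sketch that assumes meta.db is a SQLite database and that the `sqlite3` command-line tool is installed; `dup_xdids` is a hypothetical name.

```shell
# Hypothetical helper: list each xdid that appears on two or more receipts,
# together with its receipt count, most duplicated first.
# $1 = path to the database file (assumed to be SQLite).
dup_xdids() {
    sqlite3 "$1" "select xdid, count(*) as c from receipts group by xdid having c >= 2 order by c desc;"
}
```

With the default `sqlite3` output mode, each line has the form `xdid|count`.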