Dataset JSONs are not minified #1098

Open
tsibley opened this issue Feb 15, 2024 · 0 comments
Labels: bug (Something isn't working)

tsibley (Member) commented Feb 15, 2024

Current Behavior
Dataset JSONs are not minified. For example, ncov_open_global_2m is pretty-printed, whereas zika is minified:

$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | head -n10 | cut -c 1-120
{
  "version": "v2",
  "meta": {
    "title": "Genomic epidemiology of SARS-CoV-2 with subsampling focused globally over the past 2 months",
    "updated": "2024-02-15",
    "build_url": "https://github.com/nextstrain/ncov",
    "data_provenance": [
      {
        "name": "GenBank",
        "url": "https://www.ncbi.nlm.nih.gov/genbank/"

$ curl -s --compressed https://data.nextstrain.org/zika.json | head -n10 | cut -c 1-120
{"version":"v2","meta":{"title":"Real-time tracking of Zika virus evolution","updated":"2024-02-05","build_url":"https:/

Minification would make a big difference in (uncompressed) size: about 33.6 MB vs. 2.8 MB, roughly 12× smaller:

$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | wc --bytes
33630950

$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | jq -c | wc --bytes
2841344
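The difference is essentially all whitespace: newlines, indentation, and the space after each colon. Indentation cost grows with nesting depth, and Auspice trees are deeply nested. A minimal sketch with Python's standard json module illustrates the effect. `serialized_sizes` and `toy_tree` are hypothetical helpers, and the toy tree only loosely mimics an Auspice tree, so its ratio won't match the real dataset's.

```python
import json

def serialized_sizes(obj):
    """Return (pretty, compact) UTF-8 byte sizes of obj serialized two ways."""
    # indent=2 mirrors the json.dump(..., indent=2) call sites listed in this issue
    pretty = json.dumps(obj, indent=2)
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    compact = json.dumps(obj, separators=(",", ":"))
    return len(pretty.encode("utf-8")), len(compact.encode("utf-8"))

def toy_tree(depth):
    """Hypothetical binary tree loosely shaped like an Auspice "tree" node."""
    node = {"name": f"node{depth}", "node_attrs": {"div": depth}}
    if depth > 0:
        node["children"] = [toy_tree(depth - 1), toy_tree(depth - 1)]
    return node

pretty_size, compact_size = serialized_sizes({"version": "v2", "tree": toy_tree(8)})
print(pretty_size, compact_size, round(pretty_size / compact_size, 1))
```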

We apparently never enabled augur export v2's optional minification (--minify-json) for production builds (an unfortunate oversight!). But even the automatic minification done by recent Augur versions is subverted by our custom post-processing scripts, which rewrite the JSON pretty-printed via json.dump(..., indent=2). Oops.

$ g -F json.dump
scripts/add_labels.py
65:        json.dump(input_json, f, indent=2)

scripts/add_priorities_to_meta.py
44:        json.dump(input_json, fh, indent=2)

scripts/construct-recency-from-submission-date.py
44:        json.dump(node_data, fh)

scripts/developer_scripts/parse_mutational_fitness_tsv_into_distance_map.py
68:        json.dump(json_output, f, indent=2)

scripts/explicit_translation.py
75:        json.dump({"nodes":node_data, "annotations":annotations, "reference":root_sequence_translations}, fh)

scripts/fix-colorings.py
89:        json.dump(input_json, f, indent=2)

scripts/include_prefix.py
32:        json.dump(auspice_json, f, indent=2)
52:        json.dump(modified_tip_frequencies_json, f, indent=2)

workflow/snakemake_rules/export_for_nextstrain.smk
323:            json.dump(data, fh, indent=2)
487:    response = requests.post("https://slack.com/api/chat.postMessage", headers=headers, data=json.dumps(data))
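The grep above is purely textual, so it also lists compact calls (e.g. construct-recency-from-submission-date.py:44) and the Slack json.dumps() call, which don't affect dataset size. A sketch using Python's ast module that flags only json.dump()/json.dumps() calls passing indent= might look like this. The function names are hypothetical; it only recognizes calls spelled json.dump/json.dumps (not `from json import dump` aliases), and Snakemake files like export_for_nextstrain.smk aren't plain Python, so it skips them with a warning.

```python
import ast
import sys

def _passes_indent(call):
    """True if the call passes an indent= keyword other than indent=None."""
    for kw in call.keywords:
        if kw.arg == "indent":
            # indent=None produces single-line output, so it is not flagged
            return not (isinstance(kw.value, ast.Constant) and kw.value.value is None)
    return False

def find_pretty_dumps(source, filename="<string>"):
    """Return sorted (lineno, name) pairs for json.dump()/json.dumps() calls passing indent=."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "json"
            and node.func.attr in ("dump", "dumps")
            and _passes_indent(node)
        ):
            hits.append((node.lineno, "json." + node.func.attr))
    return sorted(hits)

def main(paths):
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            source = fh.read()
        try:
            hits = find_pretty_dumps(source, path)
        except SyntaxError:
            # Snakemake .smk files are not plain Python, so they need a manual look
            print(f"{path}: not parseable as Python, check by hand", file=sys.stderr)
            continue
        for lineno, name in hits:
            print(f"{path}:{lineno}: {name}(..., indent=...)")

if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it over the Python scripts, e.g. `python find_pretty_dumps.py scripts/*.py`, where the script name is whatever you save it as.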

Expected behavior
All JSONs are minified.
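One way to make this expectation checkable (e.g. in CI before upload) is a small script that fails if any given JSON has whitespace between tokens. This is a hypothetical sketch, not part of this repo, and allowing a single trailing newline is our own assumption about what counts as minified.

```python
import json
import sys

def has_insignificant_whitespace(text):
    """True if JSON whitespace (space, tab, LF, CR) appears outside string literals."""
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in " \t\n\r":
            return True
    return False

def is_minified(text):
    """True if text is valid JSON with no whitespace between tokens (one trailing newline allowed)."""
    json.loads(text)  # raises json.JSONDecodeError if the text isn't valid JSON
    body = text[:-1] if text.endswith("\n") else text
    return not has_insignificant_whitespace(body)

if __name__ == "__main__":
    not_minified = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            if not is_minified(fh.read()):
                not_minified.append(path)
    for path in not_minified:
        print(f"not minified: {path}", file=sys.stderr)
    if not_minified:
        sys.exit(1)
```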

Possible solution

  1. Adjust json.dump() and json.dumps() call sites to respect AUGUR_MINIFY_JSON (or alternatively to always minify)
  2. Replace json.dump() and json.dumps() call sites with augur.utils.write_json(), which respects AUGUR_MINIFY_JSON and also minifies automatically based on output size. But we should probably promote it to Augur's public API first.
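A minimal sketch of option 1, assuming a hypothetical shared helper that the scripts above would import instead of calling json.dump(..., indent=2) directly. It assumes any non-empty AUGUR_MINIFY_JSON value means "minify", which may not exactly match how Augur interprets the variable; the env parameter exists only to make the helper testable.

```python
import json
import os

def dumps_json(obj, *, env=None):
    """Serialize obj, minified if AUGUR_MINIFY_JSON is set (assumed: any non-empty value)."""
    env = os.environ if env is None else env
    if env.get("AUGUR_MINIFY_JSON"):
        # No whitespace at all between tokens
        return json.dumps(obj, separators=(",", ":"))
    # Current behavior of the post-processing scripts
    return json.dumps(obj, indent=2)

def dump_json(obj, fh, *, env=None):
    """Drop-in replacement for json.dump(obj, fh, indent=2) at the call sites above."""
    fh.write(dumps_json(obj, env=env))
```

Each call site would then change from json.dump(input_json, f, indent=2) to dump_json(input_json, f). Dropping the indent=2 branch gives the "always minify" alternative.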

Additional context
@miparedes was having a heck of a time getting his custom builds (based on an older version of this repo) to minify.

tsibley added the bug label on Feb 15, 2024