v7.2.1 (2023-10-20)

  • Revert deployment tagging changes, since they didn't solve the duplicate workflow trigger problem.

v7.2.0 (2023-10-19)

  • Force parameter facets based on GCMD keywords to be upper-case.
  • Only use short name for sensor facets in which the short name and long name are identical.

v7.1.0 (2023-10-11)

  • Updating harvesting to put storage system and spatial coverage into separate facets, instead of a single combined facet_featured facet.

v7.0.0 (2023-10-09)

  • Updating harvesting to include cumulus value
  • Adding a new facet that will cover both cumulus and global spatial coverage.

v6.5.1 (2023-09-26)

  • Updating logging default configuration

v6.5.0 (2023-09-21)

  • Adding logging functionality to the code, including the ability to specify log file destination and log level for both the file and console output
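A minimal sketch of how separate log levels for file and console output could be wired up with Ruby's standard `Logger`. The `MultiLogger` class, the `build_logger` helper, and its parameter names are hypothetical and are not taken from this gem; the actual configuration options may differ.

```ruby
require 'logger'

# Hypothetical wrapper that forwards each message to several loggers.
# Each underlying Logger applies its own level filter.
class MultiLogger
  def initialize(*loggers)
    @loggers = loggers
  end

  %i[debug info warn error fatal].each do |level|
    define_method(level) do |message|
      @loggers.each { |logger| logger.public_send(level, message) }
    end
  end
end

# Hypothetical factory: file_dest may be a path or an IO object.
def build_logger(file_dest:, console: $stdout,
                 file_level: Logger::DEBUG, console_level: Logger::INFO)
  MultiLogger.new(
    Logger.new(file_dest, level: file_level),
    Logger.new(console, level: console_level)
  )
end
```

With these assumed defaults, a call such as `build_logger(file_dest: 'harvest.log').debug('details')` would write to the file only, while `info` and above would reach both destinations.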

v6.4.1 (2023-09-15)

  • Added GitHub Action workflows for continuous integration features
  • Updated bump rake task to use Bump gem
  • Removed release rake task, moved it to the CI workflow

v6.4.0 (2023-08-14)

  • Fixed a bug with the sanitization, which was trying to modify the string directly (causing problems with frozen strings). Changed to return a new, sanitized string.
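The general pattern behind this fix can be sketched as follows. The `sanitize` helper and the characters it strips are assumptions made for illustration; the gem's real sanitizer may clean different input. The point is only that non-bang methods such as `gsub` and `strip` return new strings, so a frozen argument is never modified.

```ruby
# Hypothetical sanitizer: replaces control characters with spaces and trims.
# gsub and strip both return new strings, so frozen input is left untouched.
def sanitize(text)
  text.gsub(/[[:cntrl:]]/, ' ').strip
end

title = "Sea Ice\tExtent\n".freeze
clean = sanitize(title)

# An in-place variant would fail on the same input:
begin
  title.gsub!(/[[:cntrl:]]/, ' ')
rescue FrozenError
  # Frozen strings cannot be modified in place.
end
```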

v6.3.0 (2023-07-24)

  • Update Rubocop configuration to actually run against files, and make necessary corrections to comply with Rubocop styling.

v6.2.0 (2023-07-18)

  • Remove deprecated harvesters and associated tests, helpers, etc.

v6.1.0 (2023-07-14)

  • Updated a few other dependencies that weren't at the newest versions.

v6.0.0 (2023-07-14)

  • Updated Ruby to 3.2.2, updated gem dependencies to more recent versions.

v5.2.0 (2022-08-31)

  • Updated the call for identifiers for the json harvester to use the proper "metadataPrefix" parameter, and request the dif identifiers instead of iso.

v5.1.0 (2020-07-23)

  • Added a CLI method to "ping" the Solr and Source servers for a given data center.
  • Added a CLI method "errcode" to get information about the various error codes that may be returned during harvest
  • Updated the CLI harvest to return more useful error codes on failure.

v5.0.1 (2020-07-02)

  • Bug fix: added some require statements that were missing.

v5.0.0 (2020-07-02)

  • Update Ruby to 2.6.5, update gem dependencies to more recent versions.
  • Updates to correspond with an update to Solr 8.5.2

v4.2.1 (2019-08-13)

  • Patch release to include updated CHANGELOG.

v4.2.0 (2019-08-12)

  • Update dataset-catalog-services URL to only fetch current (not retired) metadata records.
  • Add a few more gem release notes to README.

Note: v4.1.0 was prematurely released and, in theory, yanked. However, on the second try at publishing 4.1.0, Rubygems complained about the attempt to republish a gem. The version was therefore bumped again to 4.2.0 as the path of least resistance to a successful publish. v4.1.0 should not be used.

v4.0.1 (2019-07-08)

  • Update CHANGELOG and release instructions.
  • Fix README typo.

v4.0.0 (2019-07-08)


  • Update spatial field formatting to work with Solr 8.1.1.

v3.11.0 (2019-06-10)


  • Update Ruby, Nokogiri, RestClient, Rubocop, and Webmock versions to address security warnings.
  • Update syntax as necessary for new versions of Rubocop and RestClient.

v3.10.0 (2017-04-10)


  • Constrain ADC and ECHO feeds to only fetch records in the arctic.


  • v3.9.1 and v3.10.0 were mistakenly released after version 3.9.0 was tagged. All three versions are identical, although v3.9.0 was never released to

v3.8.4 (2017-03-30)


  • Fix deleting old records after harvest for ADE auto suggest.

v3.8.3 (2017-03-29)


  • Add dependency on ffi-geos to fix issue where RGeo::Geos.factory returned nil on Ubuntu 14 when parsing the BCO-DMO feed.

v3.8.2 (2017-03-29)


  • Update NOAA WDS Paleo feed URL to use https.

v3.8.1 (2017-03-29)


  • Fix BCO-DMO harvester to only fail when there are issues with individual records if --die-on-failure is given.

v3.8.0 (2017-03-28)


  • Change ECHO harvester to harvest 100 records at a time, rather than 1000, to avoid timeout/hanging issues with the large requests.
  • Change "CISL"/ACADIS Gateway harvester to "NSF Arctic Data Center": the gateway redirects to another site, and the data center's name was changed. The feed format was also changed; the harvester was updated to consume the new feed.


  • Update NODC feed URL to use https.
  • Update RDA feed URL to use https.
  • Update handling of geometries to match new format provided by BCO-DMO feed.
  • Update NMI feed URL; the feed was relocated.
  • Harvesting tDAR starts from record 0 instead of record 1.
  • tDAR harvester no longer attempts to obtain another page of records after all the records have been harvested; where other feeds return an empty response that our harvester handles without issue, tDAR throws an error if the "startRecord" parameter is higher than their last record.
  • Exit with a non-0 status when a problem with the whole feed is encountered, even if --die-on-failure is not passed. That flag should only cause failures when there are issues with individual records; we don't want harvesting to stop due to a metadata issue with a small number of records.
  • Include BCO-DMO URL in the harvester output the same way all the other URLs are displayed.

v3.7.1 (2016-05-18)

  • RuboCop fixes.

v3.7.0 (2016-05-18)

New Features

  • Add sponsored programs to NSIDC harvesting.
  • Add support for ingesting Data Access Links from NSIDC JSON


  • Fix dependency issue with gem "listen".
  • Fix bad configuration for OAI feed URLs.

v3.5.1 (2016-02-15)


  • Add temporal duration facet for GTN-P data center.

v3.5.0 (2016-02-11)


  • Update long name for GTN-P data center.

v3.4.0 (2016-02-11)

New Features

  • Add harvester for GTN-P.

v3.3.4 (2016-02-08)

See v3.4.0.

v3.3.3 (2016-01-14)


  • Added quote checking to the CISL offset parsing check.

v3.3.1 (2015-09-25)


  • Remove strange facet string for temporal duration from NOAA Paleo search results.

v3.3.0 (2015-09-24)

New Features

v3.2.1 (2015-09-23)


  • Catch a timeout error earlier in the stack to prevent an infinite loop of retries; this bug caused the PDC harvester to attempt to access the feed 150 times, instead of simply failing after 3 failed attempts. Pivotal 103057378


  • Change NODC harvester's default page size from 100 to 50. The NODC feed is responding with HTTP 500 when requesting records 301-400, but not when requesting 301-350 or 351-400.
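The timeout fix above is about keeping retries bounded. A minimal sketch of that idea, assuming a single retry loop that counts attempts in one place; `fetch_with_retries` and `MAX_ATTEMPTS` are hypothetical names, and the gem's own retry code may be structured differently.

```ruby
require 'timeout'

MAX_ATTEMPTS = 3 # assumed limit, matching the "3 failed attempts" noted above

# Hypothetical helper: runs the block, retrying on Timeout::Error until
# max_attempts is reached, then re-raises the last error to the caller.
def fetch_with_retries(max_attempts: MAX_ATTEMPTS)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue Timeout::Error
    retry if attempts < max_attempts
    raise
  end
end
```

Because the attempt counter lives outside the `begin` block, it survives each `retry`, and no outer layer needs to rescue and retry the same error again.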

v3.2.0 (2015-07-01)

New Features

v3.1.2 (2015-06-30)


v3.1.1 (2015-06-29)


  • Updated deletion constraints such that lucene special characters in dataset names do not cause deletion of that data provider's data to fail.
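For illustration, escaping Lucene query syntax characters in a value before building a delete query might look like the sketch below. `escape_lucene` is a hypothetical helper, not the gem's implementation; it backslash-escapes each character that the Lucene query parser treats as special.

```ruby
# Characters with special meaning in the Lucene query parser syntax.
LUCENE_SPECIAL_CHARS = /[+\-&|!(){}\[\]^"~*?:\\\/]/

# Hypothetical helper: prefixes every special character with a backslash.
# The block form avoids backreference handling in the replacement string.
def escape_lucene(value)
  value.gsub(LUCENE_SPECIAL_CHARS) { |char| "\\#{char}" }
end

escaped = escape_lucene('UCAR/NCAR (EOL)')
```

Whitespace is not escaped here, so a multi-word value would still need to be quoted or otherwise handled when it is placed into a query.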

v3.1.0 (2015-06-25)


  • Remove gi-cat as a dependency as no harvesters utilize it.
  • Harvest the UCAR NCAR - Earth Observing Laboratory (UCAR/NCAR EOL) from EOL's THREDDS endpoint instead of GI-Cat
  • Harvest the Norwegian Meteorological Institute feed directly instead of via GI-Cat.


  • Fix broken configuration, where production was attempting to use the Blue, rather than the production, Solr host for harvesting. (see PCT-410)

v3.0.1 (2015-06-18)


  • Fix broken delete_all commands.

v3.0.0 (2015-06-15)

  • Packaged as a gem with a new executable file, providing a new interface to harvest feeds into solr.
  • Change the RDA and EOL harvesters to store the data center name as "UCAR NCAR", rather than "UCAR/NCAR". This fixes a bug with deleting the datasets; the query to Solr was failing because the "/" character could not be correctly escaped.

v2.0.0 (2015-06-08)

v1.1.0 (2015-06-05)


  • Add support to harvest RDA directly from their feed, rather than through GI-Cat.

v1.0.0 (2015-06-02)


  • Fix missing accented characters in datasets from Polar Data Catalogue

v0.4.0 (2015-02-25)


  • Added TDAR translator for harvesting into Solr
  • Added PDC (Polar Data Catalog) translator for harvesting into Solr
  • Revised CISL endpoint to harvest a subset of data. Created translator to harvest directly rather than through GI-Cat


  • Fixed USGS harvesting issue where it was timing out on specific records
  • Fixed EOL translator for processing spatial bounds properly
  • Validate bounding boxes for documents being added to Solr
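As a rough sketch of what bounding box validation can involve, the check below assumes boxes given as north/south/east/west values in decimal degrees. `valid_bounding_box?` and its rules are assumptions for illustration; the checks the gem actually performs are not described here.

```ruby
# Hypothetical check: all four edges must be finite numbers within the
# usual latitude/longitude ranges, and south must not exceed north.
# West may be greater than east, which allows boxes crossing the antimeridian.
def valid_bounding_box?(north:, south:, east:, west:)
  edges = [north, south, east, west]
  return false unless edges.all? { |v| v.is_a?(Numeric) && v.finite? }
  return false unless [north, south].all? { |lat| (-90..90).cover?(lat) }
  return false unless [east, west].all? { |lon| (-180..180).cover?(lon) }

  south <= north
end
```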

v0.2.0 (2015-02-19)


  • Set USGS page size from 100 to 10 to reduce Solr load
  • Added exception handling for REST POSTs to Solr

v0.0.2 (2015-02-11)


  • Updated project to use new CI tools and processes