Hathi online #209

Merged on Jun 4, 2020 (17 commits)
24 changes: 16 additions & 8 deletions .rubocop_todo.yml
@@ -1,46 +1,54 @@
# This configuration was generated by
# `rubocop --auto-gen-config`
# on 2019-09-04 12:26:22 -0400 using RuboCop version 0.74.0.
# on 2020-06-02 23:04:19 -0400 using RuboCop version 0.84.0.
# The point is for the user to remove these configuration records
# one by one as the offenses are removed from the code base.
# Note that changes in the inspected code, or installation of new
# versions of RuboCop, may require this file to be generated again.

# Offense count: 17
# Offense count: 16
# Configuration parameters: IgnoredMethods.
Metrics/AbcSize:
Max: 29

# Offense count: 18
# Offense count: 19
# Configuration parameters: CountComments, ExcludedMethods.
# ExcludedMethods: refine
Metrics/BlockLength:
Max: 162
Max: 179

# Offense count: 2
# Configuration parameters: CountComments.
Metrics/ClassLength:
Max: 134
Max: 136

# Offense count: 5
# Configuration parameters: IgnoredMethods.
Metrics/CyclomaticComplexity:
Max: 11

# Offense count: 14
# Offense count: 15
# Configuration parameters: CountComments, ExcludedMethods.
Metrics/MethodLength:
Max: 21

# Offense count: 1
# Configuration parameters: CountComments.
Metrics/ModuleLength:
Max: 145
Max: 156

# Offense count: 1
# Configuration parameters: MinBodyLength.
Style/GuardClause:
Exclude:
- 'lib/marc_format_processor.rb'

# Offense count: 3
Style/MixinUsage:
Exclude:
- 'lib/traject/psulib_config.rb'

# Offense count: 291
# Offense count: 55
# Cop supports --auto-correct.
# Configuration parameters: AutoCorrect, AllowHeredoc, AllowURI, URISchemes, IgnoreCopDirectives, IgnoredPatterns.
# URISchemes: http, https
5 changes: 3 additions & 2 deletions Gemfile
@@ -7,13 +7,14 @@ gem 'mail'
gem 'marc'
gem 'rake'
gem 'rsolr'
gem 'traject', '3.1.0'
gem 'traject'
gem 'traject-marc4j_reader', platform: :jruby
gem 'whenever', require: false

group :development, :test do
gem 'byebug', platform: :mri
gem 'pry', platform: :mri
gem 'rspec'
gem 'rubocop'
gem 'simplecov', '< 0.18'
gem 'simplecov', '< 0.18' # CodeClimate does not work with .18 or later
end
39 changes: 30 additions & 9 deletions Gemfile.lock
@@ -5,7 +5,9 @@ GEM
public_suffix (>= 2.0.2, < 5.0)
ast (2.4.0)
builder (3.2.4)
byebug (11.1.3)
chronic (0.10.2)
coderay (1.1.2)
concurrent-ruby (1.1.6)
diff-lcs (1.3)
docile (1.3.2)
@@ -14,17 +16,24 @@ GEM
dot-properties (0.1.3)
faraday (1.0.1)
multipart-post (>= 1.2, < 3)
hashie (3.6.0)
http (3.3.0)
ffi (1.13.0)
ffi (1.13.0-java)
ffi-compiler (1.0.1)
ffi (>= 1.0.0)
rake
hashie (4.1.0)
http (4.4.1)
addressable (~> 2.3)
http-cookie (~> 1.0)
http-form_data (~> 2.0)
http_parser.rb (~> 0.6.0)
http-form_data (~> 2.2)
http-parser (~> 1.2.0)
http-cookie (1.0.3)
domain_name (~> 0.5)
http-form_data (2.3.0)
http_parser.rb (0.6.0-java)
http-parser (1.2.1)
ffi-compiler (>= 1.0, < 2.0)
httpclient (2.8.3)
json (2.3.0)
json (2.3.0-java)
library_stdnums (1.6.0)
mail (2.7.1)
@@ -36,12 +45,19 @@ GEM
marc (~> 1.0)
marc-marc4j (1.0.0-java)
marc (~> 1)
method_source (1.0.0)
mini_mime (1.0.2)
mini_portile2 (2.4.0)
multipart-post (2.1.1)
nokogiri (1.10.9)
mini_portile2 (~> 2.4.0)
nokogiri (1.10.9-java)
parallel (1.19.1)
parser (2.7.1.2)
ast (~> 2.4.0)
pry (0.13.1)
coderay (~> 1.1)
method_source (~> 1.0)
public_suffix (4.0.5)
rainbow (3.0.0)
rake (13.0.1)
@@ -80,11 +96,11 @@ GEM
simplecov-html (~> 0.10.0)
simplecov-html (0.10.2)
slop (3.6.0)
traject (3.1.0)
traject (3.3.0)
concurrent-ruby (>= 0.8.0)
dot-properties (>= 0.1.1)
hashie (~> 3.1)
http (~> 3.0)
hashie (>= 3.1, < 5)
http (>= 3.0, < 5)
httpclient (~> 2.5)
marc (~> 1.0)
marc-fastxmlwriter (~> 1.0)
@@ -94,26 +110,31 @@ GEM
traject-marc4j_reader (1.1.0-java)
marc (~> 1.0)
marc-marc4j (~> 1.0)
unf (0.1.4)
unf_ext
unf (0.1.4-java)
unf_ext (0.0.7.7)
unicode-display_width (1.7.0)
whenever (1.0.0)
chronic (>= 0.6.3)
yell (2.2.2)

PLATFORMS
java
ruby

DEPENDENCIES
byebug
library_stdnums
mail
marc
pry
rake
rsolr
rspec
rubocop
simplecov (< 0.18)
traject (= 3.1.0)
traject
traject-marc4j_reader
whenever
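
The lockfile churn follows from unpinning `traject` in the Gemfile: the traject 3.1.0 entry declared `hashie (~> 3.1)` and `http (~> 3.0)`, while the 3.3.0 entry declares `hashie (>= 3.1, < 5)` and `http (>= 3.0, < 5)`. The sketch below is an illustration only, not part of the PR; it uses RubyGems' `Gem::Requirement` to check the newly locked `hashie 4.1.0` and `http 4.4.1` against both sets of constraints copied from the lockfile.

```ruby
require 'rubygems'

# Dependency constraints copied from the two traject entries in Gemfile.lock.
TRAJECT_DEPS = {
  '3.1.0' => { 'hashie' => ['~> 3.1'], 'http' => ['~> 3.0'] },
  '3.3.0' => { 'hashie' => ['>= 3.1', '< 5'], 'http' => ['>= 3.0', '< 5'] }
}.freeze

# Versions locked after this PR.
LOCKED = { 'hashie' => '4.1.0', 'http' => '4.4.1' }.freeze

# True if every locked gem satisfies the given traject version's constraints.
def compatible?(traject_version)
  TRAJECT_DEPS.fetch(traject_version).all? do |gem_name, constraints|
    requirement = Gem::Requirement.new(*constraints)
    requirement.satisfied_by?(Gem::Version.new(LOCKED.fetch(gem_name)))
  end
end

puts compatible?('3.1.0') # prints false: '~> 3.1' means >= 3.1 and < 4.0
puts compatible?('3.3.0') # prints true
```

The pessimistic operator `~> 3.1` excludes every 4.x release, which is why the old `traject (= 3.1.0)` pin would have held `hashie` and `http` back at 3.x.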

12 changes: 2 additions & 10 deletions config/indexer_settings_dev.yml
@@ -1,13 +1,5 @@
---

solr_url: http://localhost:8983/solr/psul_blacklight
log_batch_size: 100_000
solr_version: 7.4.0
log_file: log/traject.log
log_error_file: log/traject_error.log
solr_writer_commit_on_close: true
reader_class_name: Traject::MarcCombiningReader
marc4j_reader_permissive: true
marc4j_reader_source_encoding: UTF-8
processing_thread_pool: 5
commit_timeout: 10000
hathi_overlap_path: ignorethis_hathi/
hathi_overlap_file: final_overlap_may.csv
9 changes: 1 addition & 8 deletions config/indexer_settings_production.yml
@@ -1,12 +1,5 @@
---

log_batch_size: 100_000
solr_version: 7.4.0
log_file: /var/log/traject/traject_prod.log
log_error_file: /var/log/traject/traject_error_prod.log
solr_writer_commit_on_close: true
reader_class_name: Traject::MarcCombiningReader
marc4j_reader_permissive: true
marc4j_reader_source_encoding: UTF-8
processing_thread_pool: 7
commit_timeout: 900
commit_timeout: 900
9 changes: 0 additions & 9 deletions config/indexer_settings_qa.yml
@@ -1,13 +1,4 @@
---

solr_url: http://localhost:8983/solr/psul_blacklight
log_batch_size: 100_000
solr_version: 7.4.0
log_file: /var/log/traject/traject_qa.log
log_error_file: /var/log/traject/traject_error_qa.log
solr_writer_commit_on_close: true
reader_class_name: Traject::MarcCombiningReader
marc4j_reader_permissive: true
marc4j_reader_source_encoding: UTF-8
processing_thread_pool: 7
commit_timeout: 10000
5 changes: 5 additions & 0 deletions config/indexer_settings_test.yml
@@ -0,0 +1,5 @@
---

processing_thread_pool: 5
hathi_overlap_path: spec/fixtures/hathitrust/
hathi_overlap_file: mock_overlap.csv
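
The PR introduces the `hathi_overlap_path` and `hathi_overlap_file` settings, and `extract_ht_id` (in `lib/traject/macros/custom.rb`) reads `HATHI_ETAS_OVERLAP[oclc_number]`, but the code that loads the CSV into that constant is not part of this diff. A minimal sketch of one way to do it follows. The column names `oclc` and `ht_id` are hypothetical; the PR does not show the layout of `final_overlap_may.csv` or `mock_overlap.csv`.

```ruby
require 'csv'

# Hypothetical column names: the real overlap CSV layout is not shown in the PR.
OCLC_COLUMN = 'oclc'
HT_ID_COLUMN = 'ht_id'

# Joins the two settings added in this PR into one file path.
# File.join collapses the trailing slash on hathi_overlap_path.
def hathi_overlap_location(settings)
  File.join(settings.fetch('hathi_overlap_path'), settings.fetch('hathi_overlap_file'))
end

# Builds an OCLC number => HathiTrust id Hash from CSV text with a header row.
def parse_hathi_overlap(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, map|
    map[row[OCLC_COLUMN]] = row[HT_ID_COLUMN]
  end
end

# Usage (hypothetical): parse_hathi_overlap(File.read(hathi_overlap_location(settings)))
```

Keeping path resolution separate from parsing lets a spec feed CSV text directly, while the test settings point the real loader at `spec/fixtures/hathitrust/mock_overlap.csv`.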
52 changes: 24 additions & 28 deletions lib/marc_access_facet_processor.rb
@@ -1,49 +1,45 @@
# frozen_string_literal: true

# Determines the access status of a record, how patrons are able to acquire an item.
# https://github.com/psu-libraries/psulib_blacklight/wiki/Access-Facet
class MarcAccessFacetProcessor
LIBRARIES_MAP = Traject::TranslationMap.new('libraries')

def initialize
freeze
end

# Extract 949m for access facet
def extract_access_data(record)
return unless record.fields('949').any?

access_data = []
libraries_map = Traject::TranslationMap.new('libraries')

Traject::MarcExtractor.cached('949m').collect_matching_lines(record) do |field, spec, extractor|
def extract_access_data(record, context)
access = Traject::MarcExtractor.cached('949m').collect_matching_lines(record) do |field, spec, extractor|
library_code = extractor.collect_subfields(field, spec).first
access_data << case library_code
when 'ONLINE'
'Online'
when 'ACQ_DSL', 'ACQUISTNS', 'SERIAL-SRV'
'On Order'
when 'ZREMOVED', 'XTERNAL'
next
else
resolve_library_code(field, libraries_map.translate_array([library_code])[0])
end
case library_code
when 'ONLINE'
'Online'
when 'ACQ_DSL', 'ACQUISTNS', 'SERIAL-SRV'
'On Order'
when 'ZREMOVED', 'XTERNAL'
next
else
resolve_library_code field, LIBRARIES_MAP[library_code]
end
end
access_data.compact!
access_data.uniq!
access_data.delete('On Order') if not_only_on_order?(access_data)
access_data

access << 'Online' if context.output_hash&.dig('ht_id_ssim')
access.compact.uniq
access.delete 'On Order' if not_only_on_order? access
access
end

# If there is anything other than On Order, we DO NOT include On Order
def not_only_on_order?(access_data)
access_data.include?('On Order') && (access_data.length > 1)
access_data.include?('On Order') && access_data.length > 1
end

def resolve_library_code(field, library)
return 'Other' if library.nil?
return 'Other' if library.nil? # Something unexpected
return 'In the Library' unless field['l'] == 'ON-ORDER'

if !field['l'].nil? && field['l'] == 'ON-ORDER'
'On Order'
else
'In the Library'
end
'On Order'
end
end
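
The access-facet rules above can be exercised without MARC records. The sketch below is a standalone approximation, not the project's class: each holding is a hypothetical `[library_code, location]` pair standing in for 949 subfields `m` and `l`, `LIBRARIES` is a hypothetical stand-in for the `libraries` translation map, and the `ht_id:` argument stands in for `context.output_hash&.dig('ht_id_ssim')`. Note that `Array#compact` and `Array#uniq` return new arrays, so a bare `access.compact.uniq` statement leaves `access` unchanged; the sketch reassigns the result instead.

```ruby
# Hypothetical stand-in for the 'libraries' translation map; real codes differ.
LIBRARIES = { 'MAIN' => 'Main Library' }.freeze

# Each holding is a [library_code, location] pair (949m, 949l).
def access_facet(holdings, ht_id: nil)
  access = holdings.map do |library_code, location|
    case library_code
    when 'ONLINE' then 'Online'
    when 'ACQ_DSL', 'ACQUISTNS', 'SERIAL-SRV' then 'On Order'
    when 'ZREMOVED', 'XTERNAL' then nil # excluded from the facet
    else resolve_library(library_code, location)
    end
  end

  access << 'Online' if ht_id # a HathiTrust ETAS match also counts as Online
  access = access.compact.uniq # compact/uniq return new arrays, so reassign
  # If there is anything other than On Order, drop On Order.
  access.delete('On Order') if access.include?('On Order') && access.length > 1
  access
end

def resolve_library(library_code, location)
  return 'Other' if LIBRARIES[library_code].nil? # unknown library code
  return 'In the Library' unless location == 'ON-ORDER'

  'On Order'
end
```

For example, a record held only in `ZREMOVED` but matched in the HathiTrust overlap ends up with just `Online`.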
9 changes: 9 additions & 0 deletions lib/traject/macros/custom.rb
@@ -204,6 +204,15 @@ def self.includes_oclc_indicators?(sf_a)
sf_a.include?('ocm') ||
sf_a.include?('OCLC')
end

# Extract ht_id
def extract_ht_id
lambda do |_record, accumulator, context|
oclc_number = context.output_hash&.dig('oclc_number_ssim')&.first
accumulator << HATHI_ETAS_OVERLAP[oclc_number]
accumulator.compact
end
end
end
end
end
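
`extract_ht_id` reads `oclc_number_ssim` from `context.output_hash`, and the access facet reads `ht_id_ssim`, so both rely on earlier fields already being in the output hash; presumably the `to_field` declarations are ordered accordingly in the config, which this diff does not show. The sketch below exercises the lambda outside Traject. `HATHI_ETAS_OVERLAP` contents and the `FakeContext` struct are hypothetical test stand-ins. As a variant, it skips `nil` before pushing, rather than calling `accumulator.compact`, whose return value is discarded.

```ruby
# Hypothetical OCLC => HathiTrust id lookup; the real constant is built elsewhere.
HATHI_ETAS_OVERLAP = { '12345' => 'mdp.39015000000000' }.freeze

# Minimal stand-in for Traject's indexer context: only output_hash is used here.
FakeContext = Struct.new(:output_hash)

def extract_ht_id
  lambda do |_record, accumulator, context|
    oclc_number = context.output_hash&.dig('oclc_number_ssim')&.first
    ht_id = HATHI_ETAS_OVERLAP[oclc_number]
    accumulator << ht_id if ht_id # push only real matches, so no nil reaches the field
  end
end

acc = []
extract_ht_id.call(nil, acc, FakeContext.new({ 'oclc_number_ssim' => ['12345'] }))
p acc # prints ["mdp.39015000000000"]
```

A record with no OCLC number, or one not in the overlap file, leaves the accumulator empty.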