Modify examples in cdx-indexer help text to do as stated #683

Merged 1 commit on Dec 8, 2021

@ldko commented on Nov 22, 2021

Description

Modifies the examples given in the help text of the cdx-indexer script.

Motivation and Context

Currently, the examples at the end of the cdx-indexer script's help text do not work as stated, because the intended output locations are interpreted as input locations. Here is the end of the help text printed by cdx-indexer -h:

Some examples:

* Create "example.cdx" index from example.warc.gz
cdxindexer.py ./cdx/example.cdx ./warcs/example.warc.gz

* Create "combined.cdx", a combined, sorted index of all warcs in ./warcs/
cdxindexer.py --sort combined.cdx ./warcs/

* Create a sorted cdx per file in ./cdx/ for each archive file in ./warcs/
cdxindexer.py --sort ./cdx/ ./warcs/

Here is the output of running the first example, even though both the ./cdx directory and ./warcs/example.warc.gz exist:

$ ls -l ./cdx ./warcs/example.warc.gz 
-rw-r--r-- 1 user group 309111734 Nov 22 15:15 ./warcs/example.warc.gz

./cdx:
total 0
$ cdx-indexer ./cdx/example.cdx ./warcs/example.warc.gz
 CDX N b a m s k r M S V g
Traceback (most recent call last):
  File "/home/me/virtualenvs/pywb/bin/cdx-indexer", line 33, in <module>
    sys.exit(load_entry_point('pywb==2.6.2', 'console_scripts', 'cdx-indexer')())
  File "/home/me/virtualenvs/pywb/lib/python3.7/site-packages/pywb-2.6.2-py3.7.egg/pywb/indexer/cdxindexer.py", line 471, in main
    minimal=cmd.minimal_cdxj)
  File "/home/me/virtualenvs/pywb/lib/python3.7/site-packages/pywb-2.6.2-py3.7.egg/pywb/indexer/cdxindexer.py", line 298, in write_multi_cdx_index
    with open(fullpath, 'rb') as infile:
FileNotFoundError: [Errno 2] No such file or directory: './cdx/example.cdx'
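
The traceback is consistent with both paths being treated as inputs: the CDX header goes to the terminal and the indexer then tries to open ./cdx/example.cdx for reading. Below is a minimal sketch of that parsing behavior, assuming a parser with an optional --output flag that defaults to stdout (written as '-') and one or more positional inputs. The parser is a hypothetical stand-in for illustration, not pywb's actual code.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the cdx-indexer parser (an assumption,
    # not copied from pywb): output is an option, inputs are positional.
    parser = argparse.ArgumentParser(prog='cdx-indexer')
    parser.add_argument('-s', '--sort', action='store_true')
    parser.add_argument('-o', '--output', default='-')
    parser.add_argument('inputs', nargs='+')
    return parser


parser = build_parser()

# Old-style example: with no --output flag, both paths land in `inputs`,
# so the .cdx path would be opened for reading.
old = parser.parse_args(['./cdx/example.cdx', './warcs/example.warc.gz'])
print(old.output, old.inputs)
# prints: - ['./cdx/example.cdx', './warcs/example.warc.gz']

# Revised example: --output claims the .cdx path, leaving one input.
new = parser.parse_args(
    ['--output', './cdx/example.cdx', './warcs/example.warc.gz'])
print(new.output, new.inputs)
# prints: ./cdx/example.cdx ['./warcs/example.warc.gz']
```

Under these assumptions, the only difference between the failing and working invocations is whether the first path is claimed by --output or falls through to the positional inputs.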

This tiny PR changes the examples so that they work as stated:

Some examples:

* Create "example.cdx" index from example.warc.gz
cdx-indexer --output ./cdx/example.cdx ./warcs/example.warc.gz

* Create "combined.cdx", a combined, sorted index of all warcs in ./warcs/
cdx-indexer --sort --output combined.cdx ./warcs/

* Create a sorted cdx per file in ./cdx/ for each archive file in ./warcs/
cdx-indexer --sort --output ./cdx/ ./warcs/

@ikreymer merged commit 5c35a43 into webrecorder:main on Dec 8, 2021