Remove RDD option in app; DataFrame only now. #450
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
GitHub issue(s): #449
What does this Pull Request do?
Remove RDD option in app; DataFrame only now.
I'll get an associated documentation PR with this as well.
How should this be tested?
If you want to robust, the following:
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/DomainGraphExtractorGEXF --output-format GEXF
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/DomainGraphExtractorTEXT
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor DomainFrequencyExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/DomainFrequencyExtractor
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor ImageGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/ImageGraphExtractor
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor PlainTextExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/PlainTextExtractor
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor WebPagesExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/WebPagesExtractor
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor DomainFrequencyExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/DomainFrequencyExtractorSingle --partition 1
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor ImageGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/ImageGraphExtractorSingle --partition 1
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor PlainTextExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/PlainTextExtractorSingle --partition 1
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor WebPagesExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/WebPagesExtractorSingle --partition 1
Should produce the following: