Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add graphml output to CommandLineApp and DomainGraphExtractor. #438

Merged
merged 10 commits into from
Apr 11, 2020

Conversation

ruebot
Copy link
Member

@ruebot ruebot commented Apr 9, 2020

GitHub issue(s): #435

What does this Pull Request do?

Add graphml output to CommandLineApp and DomainGraphExtractor.

How should this be tested?

TravisCI +

bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/graphml-rdd --output-format GRAPHML
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/graphml-df --df --output-format GRAPHML

Those both should have the same line count.

For example:

$ wc -l graphml-df/GRAPHML.xml graphml-rdd/GRAPHML.xml
  33911 graphml-df/GRAPHML.xml
  33911 graphml-rdd/GRAPHML.xml
  67822 total

Additional Notes:

I should add a test as well for WriteGraphmlDF similar to what we have for WriteGexf. So, don't merge this right away, but feel free to test it and make sure it works.

This might also take care of the remainder of #223 and supersede #397? 👋 @SinghGursimran

@ruebot ruebot requested review from lintool and ianmilligan1 April 9, 2020 02:14
@codecov
Copy link

codecov bot commented Apr 9, 2020

Codecov Report

Merging #438 into master will increase coverage by 0.13%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master     #438      +/-   ##
==========================================
+ Coverage   78.04%   78.18%   +0.13%     
==========================================
  Files          43       43              
  Lines        1558     1586      +28     
  Branches      286      289       +3     
==========================================
+ Hits         1216     1240      +24     
- Misses        217      218       +1     
- Partials      125      128       +3     

Copy link
Member

@ianmilligan1 ianmilligan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 🤦 comment from me below (but the file outputs are great!).

Copy link
Member

@ianmilligan1 ianmilligan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks great on all fronts! (give me a thumbs up if you're happy to have this PR merged, @ruebot, looks good to me)

@ruebot
Copy link
Member Author

ruebot commented Apr 11, 2020

@ianmilligan1 sure, let's use the following as a commit message:

Add graphml output to CommandLineApp and DomainGraphExtractor.

@ianmilligan1 ianmilligan1 merged commit 943cf92 into master Apr 11, 2020
@ianmilligan1 ianmilligan1 deleted the issue-435 branch April 11, 2020 21:56
ruebot added a commit to archivesunleashed/aut-docs that referenced this pull request Apr 13, 2020
ianmilligan1 pushed a commit to archivesunleashed/aut-docs that referenced this pull request Apr 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add graphml output to DomainGraphExtractor
3 participants