
Large data sets #126

Open
bkmgit opened this issue Mar 25, 2022 · 2 comments

@bkmgit
Contributor

bkmgit commented Mar 25, 2022

It may be good to have a different way to work with large datasets. For example, the https://ldbcouncil.org/benchmarks/graphalytics/ data sets are 1.1 TB in total.

@kou
Member

kou commented Mar 25, 2022

Do you have any ideas?

What should we care about? Local storage size? Download time? ...?

@bkmgit
Contributor Author

bkmgit commented Mar 27, 2022

  1. We could have a threshold for streaming data from disk instead of reading all data into memory, with a default such as 500 MB that the user can adjust (see the first sketch after this list).
  2. Local storage may be an issue; perhaps ask the user whether they want to proceed, giving an estimate of the storage space required (second sketch below).
  3. For download time, we cannot do much; on Linux, wget -c is helpful for continuing an incomplete download without starting again (third sketch below). If the data is stored in the cloud in a suitable form, one can stream just the interesting portion, but this requires infrastructure that allows it and is perhaps a step for the future. At present we want to consider datasets up to 100 GB, which may be analyzed on a workstation.
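
A minimal sketch of the threshold idea (point 1) in Python; the `each_record` helper and the `DATASET_STREAMING_THRESHOLD` environment variable are illustrative names, not part of any existing API:

```python
import os

# Hypothetical threshold in bytes above which a dataset file is streamed
# from disk instead of being read fully into memory; defaults to 500 MB
# and can be overridden by the user via an environment variable.
STREAMING_THRESHOLD = int(
    os.environ.get("DATASET_STREAMING_THRESHOLD", 500 * 1024 * 1024))

def each_record(path):
    """Yield one line at a time for large files; slurp small ones."""
    if os.path.getsize(path) <= STREAMING_THRESHOLD:
        # Small file: reading everything at once is fine.
        with open(path) as f:
            yield from f.read().splitlines()
    else:
        # Large file: iterate lazily so only one line is held in memory.
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n")
```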
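For the storage check (point 2), a sketch using Python's standard `shutil.disk_usage`; the `confirm_download` name and the prompt wording are made up for illustration:

```python
import shutil

def confirm_download(estimated_bytes, target_dir="."):
    """Show required vs. available space and ask before downloading."""
    free = shutil.disk_usage(target_dir).free
    print(f"This dataset needs about {estimated_bytes / 1e9:.1f} GB; "
          f"{free / 1e9:.1f} GB are free in {target_dir!r}.")
    if estimated_bytes > free:
        print("Not enough free space; aborting.")
        return False
    return input("Proceed with the download? [y/N] ").strip().lower() == "y"
```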
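And for resuming downloads programmatically (point 3, the same idea as wget -c), a sketch using an HTTP Range request via `urllib`; it assumes the server honors Range and falls back to a fresh download otherwise:

```python
import os
import urllib.request

def resume_download(url, path, chunk_size=1 << 20):
    """Continue a partial download with an HTTP Range request."""
    start = os.path.getsize(path) if os.path.exists(path) else 0
    request = urllib.request.Request(url)
    if start:
        # Ask the server for only the bytes we do not have yet.
        request.add_header("Range", f"bytes={start}-")
    response = urllib.request.urlopen(request)
    # 206 means the server honored the Range header; anything else
    # (typically 200) means we must start the download from scratch.
    mode = "ab" if start and response.status == 206 else "wb"
    with response, open(path, mode) as f:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
```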
