There could be a threshold above which data is streamed from disk rather than read entirely into memory, with a sensible default (say 500 MB) that the user can adjust. A sketch of this decision follows.
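A minimal sketch of how such a threshold might work; the `load_dataset` entry point, the 500 MB default, and the chunked reader are illustrative assumptions, not an existing API:

```python
import os

DEFAULT_STREAM_THRESHOLD = 500 * 1024**2  # 500 MB, user-adjustable

def load_dataset(path, stream_threshold=DEFAULT_STREAM_THRESHOLD):
    """Read small files into memory; stream large ones from disk."""
    size = os.path.getsize(path)
    if size <= stream_threshold:
        # Small enough: read the whole file into memory at once.
        with open(path, "rb") as f:
            return f.read()
    # Too large: return a generator that yields fixed-size chunks.
    def chunks(chunk_size=64 * 1024**2):  # 64 MB per chunk
        with open(path, "rb") as f:
            while block := f.read(chunk_size):
                yield block
    return chunks()
```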
Local storage may be an issue; we could give the user an estimate of the storage space required and ask whether they want to proceed.
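One possible shape for that prompt, assuming the download size is known up front (the function name and interface here are hypothetical):

```python
import shutil

def confirm_download(required_bytes, dest="."):
    """Estimate space needs and ask the user before downloading."""
    free = shutil.disk_usage(dest).free
    gb = 1024**3
    print(f"This download needs about {required_bytes / gb:.1f} GB; "
          f"{free / gb:.1f} GB is free on the target disk.")
    if required_bytes > free:
        print("Not enough local storage; aborting.")
        return False
    return input("Proceed? [y/N] ").strip().lower() == "y"
```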
We cannot do much about download time itself; on Linux, `wget -c` is helpful for resuming an incomplete download rather than starting again. If the data is stored in the cloud in a suitable form, one can stream just the interesting portion, but this requires infrastructure that supports it and is perhaps a step for the future. At present we want to handle datasets up to 100 GB that can be analyzed on a workstation.
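For completeness, the same resume behaviour can be had programmatically with an HTTP `Range` request, which is also the mechanism that would let us stream only part of a cloud-hosted file later on. A sketch using the `requests` library, assuming the server honours range requests:

```python
import os
import requests

def resume_download(url, dest):
    """Resume a partial download via an HTTP Range request
    (the same mechanism wget -c relies on)."""
    pos = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={pos}-"} if pos else {}
    with requests.get(url, headers=headers, stream=True, timeout=30) as r:
        if pos and r.status_code != 206:
            raise RuntimeError("Server does not support resuming (no 206 reply)")
        r.raise_for_status()
        with open(dest, "ab") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MB chunks
                f.write(chunk)
```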
It may be worth having a different way of working with very large datasets altogether. For example, the https://ldbcouncil.org/benchmarks/graphalytics/ datasets are 1.1 TB in total.