Load functionality for multiple .npy files #1388

Collaborator

@Reisii Reisii commented Feb 28, 2024

Due Diligence

  • General:
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • documentation updated where needed

Description

Many smaller .npy files are loaded at the same time and combined into one large DNDarray.
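To make the idea concrete, here is a minimal NumPy-only sketch of how such a loader could distribute the files across processes and concatenate each process's share along the split axis. It is not the code from this PR and does not use MPI or Heat: the process count and rank are plain integers, and the helper names `partition_files` and `load_local_chunk` are hypothetical. It also assumes that files are processed in sorted-name order, that the last rank takes the remainder, and that having fewer files than processes is treated as an error.

```python
import os
import tempfile

import numpy as np


def partition_files(files, nprocs, rank):
    """Hypothetical helper: return the contiguous slice of `files` assigned to `rank`.

    Every rank gets len(files) // nprocs files; the last rank also takes the remainder.
    """
    if len(files) < nprocs:
        # Assumption: with fewer files than processes, some rank would receive nothing.
        raise RuntimeError("number of processes exceeds number of files")
    per_rank = len(files) // nprocs
    start = rank * per_rank
    stop = start + per_rank
    if rank == nprocs - 1:
        stop = len(files)
    return files[start:stop]


def load_local_chunk(path, nprocs, rank, axis=0):
    """Hypothetical helper: load this rank's .npy files and concatenate them along `axis`."""
    files = sorted(f for f in os.listdir(path) if f.endswith(".npy"))
    if not files:
        raise ValueError("no .npy files found in " + path)
    mine = partition_files(files, nprocs, rank)
    return np.concatenate([np.load(os.path.join(path, f)) for f in mine], axis=axis)


# Usage: write 5 small files, then "load" them as 2 simulated processes.
with tempfile.TemporaryDirectory() as tmp:
    parts = [np.full((2, 3), i, dtype=np.int64) for i in range(5)]
    for i, part in enumerate(parts):
        np.save(os.path.join(tmp, f"data{i}.npy"), part)

    chunks = [load_local_chunk(tmp, nprocs=2, rank=r) for r in range(2)]
    global_array = np.concatenate(chunks, axis=0)

print([c.shape for c in chunks])  # [(4, 3), (6, 3)]
print(global_array.shape)  # (10, 3)
```

Sorting by file name keeps the global order identical on every rank; note that plain lexicographic order puts `data10.npy` before `data2.npy`, so zero-padded file names are the safer choice.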

Issue/s resolved: #900

Changes proposed:

Type of change

New feature

Memory requirements

Performance

Does this change modify the behaviour of other functions? If so, which?

no

Contributor

Thank you for the PR!

@@ -1,6 +1,7 @@
"""
Manipulation operations for (potentially distributed) `DNDarray`s.
"""

Collaborator

I guess this has been added unintentionally

codecov bot commented Feb 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.91%. Comparing base (225dc96) to head (b5a06b2).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1388      +/-   ##
==========================================
+ Coverage   91.88%   91.91%   +0.02%     
==========================================
  Files          80       80              
  Lines       11916    11942      +26     
==========================================
+ Hits        10949    10976      +27     
+ Misses        967      966       -1     
Flag Coverage Δ
unit 91.91% <100.00%> (+0.02%) ⬆️

crea_array = np.concatenate(x, 0)
np.save("data" + str(i), x)

array1 = ht.load_npy_from_path("/home/nguy_t4/home/heat/heat/core/tests")
Collaborator

Of course the path is not found, since the tests run on the CI runners and not on your machine... you need to specify a relative path here, probably /heat/core/tests or similar.
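A common way to avoid machine-specific absolute paths is to build the data directory from a known base and to save and load through that same path; the excerpt above saves with `np.save("data" + str(i), x)`, which writes relative to the current working directory. In a test module the base would typically be `os.path.dirname(os.path.abspath(__file__))`; the sketch below uses a temporary directory instead so it runs anywhere, and the helper `data_dir` and the folder name `npy_files` are hypothetical.

```python
import os
import tempfile

import numpy as np


def data_dir(base, *parts):
    """Hypothetical helper: build a data directory path relative to `base` and create it."""
    path = os.path.join(base, *parts)
    os.makedirs(path, exist_ok=True)
    return path


# In a test module, `base` would typically be os.path.dirname(os.path.abspath(__file__)),
# so the path resolves the same way on a developer machine and on a CI runner.
# Here a temporary directory stands in for it so the example runs anywhere.
with tempfile.TemporaryDirectory() as base:
    target = data_dir(base, "npy_files")
    x = np.arange(6).reshape(2, 3)
    for i in range(3):
        # Save with an explicit directory; np.save appends the .npy extension itself.
        np.save(os.path.join(target, "data" + str(i)), x)
    saved = sorted(os.listdir(target))

print(saved)  # ['data0.npy', 'data1.npy', 'data2.npy']
```

The loader would then be given `target` as well, so that saving and loading can never point at different directories.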

Collaborator

mrfh92 commented Mar 11, 2024

The error

ERROR: test_load_npy (heat.core.tests.test_io.TestIO)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/heat/heat/heat/core/tests/test_io.py", line 760, in test_load_npy
    self.assertEqual(load_array.numpy(), int_array)
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/unittest/case.py", line 912, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/unittest/case.py", line 902, in _baseAssertEqual
    if not first == second:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

shows that comparing two NumPy arrays a and b with == is ambiguous here: the comparison returns an element-wise array of True/False entries with the same shape as a and b, and in general no single True/False can be assigned to such an array.

In our case, however, the outcome should be True if and only if the two arrays coincide in every element. Thus, the .all() method has to be applied to the element-wise result, as follows:

self.assertTrue((a==b).all())

I have made these changes.
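The behaviour described above can be reproduced with plain NumPy. This minimal illustration (not the test code from this PR) shows why `bool(a == b)` fails, how `.all()` reduces the result to one boolean, and one caveat: because of broadcasting, `(a == b).all()` can be True for arrays of different shapes, whereas `np.array_equal` also compares shapes.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# `a == b` is element-wise: it returns array([ True,  True,  True]), not a single bool.
try:
    bool(a == b)  # the same conversion `if not first == second` triggers in assertEqual
    ambiguous = False
except ValueError:
    ambiguous = True

# Reducing with .all() yields exactly one boolean.
same = bool((a == b).all())

# Caveat: broadcasting lets arrays of different shapes compare "equal" element-wise.
c = np.array([1, 1, 1])
d = np.array([1])
broadcast_equal = bool((c == d).all())  # True, although the shapes differ
strict_equal = np.array_equal(c, d)     # False, because np.array_equal also checks shape

print(ambiguous, same, broadcast_equal, strict_equal)  # True True True False
```

In a unittest, `np.testing.assert_array_equal(a, b)` is another option: it also rejects mismatched shapes and reports which elements differ when the assertion fails.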

Contributor

github-actions bot commented Jul 2, 2024

Thank you for the PR!

Hoppe added 3 commits July 3, 2024 09:03
…iles_into_one_DNDarray' of github.com:helmholtz-analytics/heat into features/900-Improve_load-functionality_load_multiple_files_into_one_DNDarray

np.save(os.path.join(os.getcwd(), "heat/datasets", "float_data"), x)
ht.MPI_WORLD.Barrier()
with self.assertRaises(RuntimeError):
ht.load_npy_from_path("heat/datasets/npy_dummy", dtype=ht.int64, split=0)
Collaborator Author

You are loading the dummy file from the wrong directory or you saved it in the wrong directory

Collaborator

thx for the hint 👍

Collaborator

now it should work again

@mrfh92 mrfh92 self-requested a review July 3, 2024 12:38
mrfh92 previously approved these changes Jul 3, 2024
Collaborator

@mrfh92 mrfh92 left a comment

approve to let full CI-matrix run (test combinations of different Python and PyTorch versions)

@mrfh92 mrfh92 requested a review from mtar July 3, 2024 12:39
Collaborator

mrfh92 commented Jul 3, 2024

@mtar, if you have time, could you take a second look at this PR and review it? (This follows the convention of not reviewing one's own students' work.)

mrfh92 previously approved these changes Jul 3, 2024
Collaborator

@mrfh92 mrfh92 left a comment

approve again to let CI-matrix run

heat/core/io.py (outdated, resolved)
Co-authored-by: Michael Tarnawa <[email protected]>
@mrfh92 mrfh92 self-requested a review July 4, 2024 14:06
Collaborator

@mrfh92 mrfh92 left a comment

approve to let CI run; also, @mtar already reviewed it yesterday

@mrfh92 mrfh92 merged commit a774559 into main Jul 4, 2024
53 checks passed
@mrfh92 mrfh92 deleted the features/900-Improve_load-functionality_load_multiple_files_into_one_DNDarray branch July 4, 2024 14:58

Successfully merging this pull request may close these issues.

Improve load-functionality: load multiple files into one DNDarray
4 participants