As discussed offline with @pmeier, this is the 4th time that platform-specific issues have crept into the datasets, which shows they need to be tested on each platform. Given that the prototype execution time is only 5 minutes, adding 2 jobs for Windows and macOS sounds like a reasonable tradeoff.
@seemethere @bigfootjon Just highlighting this case as another example of why we often need to run tests across different setups. This was debated at #5479, so I thought I'd provide a bit more evidence. Of course, this doesn't mean that we shouldn't reduce the costs. I think the solution is to restructure the CI jobs to separate tests that can break on different platforms from those that can't. This way we can significantly reduce the costs by running across all configs only the tests that must run everywhere. We are in a bit of a crunch resource-wise, but we hope to pick this up within Q2.
There are now three separate occasions of bugs in the prototype datasets that were not picked up by our CI.
fromfile on Windows #4980
We didn't notice them because they happen on Windows or macOS, but we only run the prototype tests on Linux. Given that the prototypes work with files and path handling differs between platforms, this can be problematic.
The torchdata team brought the failures to our attention, since they run our dataset tests on the full matrix. I propose we also run the prototype datasets tests on Windows and macOS to avoid these post-mortems. To limit the extra CI resources needed, I would run these tests only on Python 3.7.
That means we are looking at 2 extra CI runs per PR with roughly 10 minutes runtime.
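To illustrate the path handling point above, here is a minimal sketch using only the standard library. The path string and the `describe` helper are made up for illustration and are not taken from the prototype datasets code; the actual failures may well have had other causes. It only shows that the same path string is parsed differently under POSIX and Windows rules, which is the kind of difference a Linux-only run cannot catch.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical relative path, e.g. as a dataset might record it for a file
# inside an archive. Not taken from the real code base.
RAW = "train/cat/001.png"


def describe(path_cls, raw):
    """Parse `raw` with the given pure path class and report how it is seen."""
    path = path_cls(raw)
    return {"str": str(path), "name": path.name, "parts": path.parts}


# Pure paths do no I/O, so both flavours can be inspected on any OS.
posix = describe(PurePosixPath, RAW)
windows = describe(PureWindowsPath, RAW)

# Windows rules accept "/" as a separator but render the path with "\",
# so comparing str(path) against a "/"-joined string fails there.
print(posix["str"])
print(windows["str"])

# The reverse case: a backslash-separated string is a single component
# under POSIX rules, so .name returns the whole string.
backslash_on_posix = describe(PurePosixPath, "train\\cat\\001.png")
print(backslash_on_posix["parts"])
```

Code that compares or splits paths as plain strings can therefore pass on Linux and still break on Windows, which is why running the same tests on each platform seems worthwhile.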
cc @pmeier @seemethere @bjuncek