Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance cross-platform compatibility for loading PySRRegressor models #681

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

zzccchen
Copy link

This PR improves the cross-platform compatibility of the PySRRegressor.from_file method, allowing seamless loading of models saved on different operating systems (Windows and Linux).

Key changes:

  1. Introduced a custom unpickler (CustomUnpickler) to handle path objects from different operating systems.
  2. Added a path_to_str function to convert path objects to strings, ensuring consistent path handling across platforms.
  3. Updated the model loading process to use the custom unpickler and path conversion.
  4. Maintained the existing functionality for creating models from scratch when a pickle file is not found.

These changes allow users to load PySRRegressor models on any supported platform, regardless of where the model was originally saved. This enhancement is particularly useful for users working in mixed environments or collaborating across different operating systems.

The modifications are contained within the from_file method to minimize impact on other parts of the codebase. The method remains backwards compatible with existing usage patterns.

Testing:

  • Tested loading models saved on Linux from a Windows environment.
  • Verified that the existing functionality for creating models from scratch remains intact.

This enhancement addresses the issue of cross-platform incompatibility when loading saved models, improving the overall user experience and flexibility of PySRRegressor.

@coveralls
Copy link

coveralls commented Jul 27, 2024

Pull Request Test Coverage Report for Build 10138721258

Details

  • 15 of 17 (88.24%) changed or added relevant lines in 2 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.3%) to 93.448%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pysr/utils.py 12 14 85.71%
Files with Coverage Reduction New Missed Lines %
pysr/export_sympy.py 3 77.27%
Totals Coverage Status
Change from base Build 10137135457: -0.3%
Covered Lines: 1141
Relevant Lines: 1221

💛 - Coveralls

@MilesCranmer
Copy link
Owner

Thanks, this is great! I'll do a code review for suggesting some changes

pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
Copy link
Owner

@MilesCranmer MilesCranmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, added some comments. Also could you add some unittests for CustomUnpickler? The test should be one that is currently failing, so this CustomUnpickler is required to get that unittest working.

pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
@MilesCranmer
Copy link
Owner

(By the way, feel free to click "Resolved" on any of the comments once they are completed)

@zzccchen
Copy link
Author

(By the way, feel free to click "Resolved" on any of the comments once they are completed)

Got it :)

@MilesCranmer
Copy link
Owner

Looking good! Thanks. I think it just needs some tests now – ideally tests that fail with the current master branch, but are fixed by this PR

@MilesCranmer
Copy link
Owner

MilesCranmer commented Jul 29, 2024

One option is to add a test that is only run on the Windows tests – it could generate and then pickle the model on Windows, and then create and run a dockerfile that attempts to unpickle the model? (Since docker runs linux).

@zzccchen
Copy link
Author

Looking good! Thanks. I think it just needs some tests now – ideally tests that fail with the current master branch, but are fixed by this PR

Thank you for the feedback. I agree tests would be valuable, but I'm not experienced in writing test cases. Could you provide some guidance or examples for appropriate tests for these changes?

@zzccchen
Copy link
Author

One option is to add a test that is only run on the Windows tests – it could generate and then pickle the model on Windows, and then create and run a dockerfile that attempts to unpickle the model? (Since docker runs linux).

I appreciate your suggestion, but I'm afraid I'm a bit lost on how to implement this. The process of generating, pickling, and then using Docker for unpickling sounds complex. I'm not sure where to start. Would you be able to help implement this test, or provide a more detailed guide?

@MilesCranmer
Copy link
Owner

MilesCranmer commented Jul 29, 2024

Ok what I would do is:

  1. Copy https://github.com/MilesCranmer/PySR/blob/master/pysr/test/test_dev.py to pysr/test/test_pickle.py
  2. Copy https://github.com/MilesCranmer/PySR/blob/master/pysr/test/test_dev_pysr.dockerfile to pysr/test/test_pickle.dockerfile
  3. Modify the code
    def test_simple_change_to_backend(self):
    """Test that we can use a development version of SymbolicRegression.jl"""
    PYSR_TEST_JULIA_VERSION = os.environ.get("PYSR_TEST_JULIA_VERSION", "1.6")
    PYSR_TEST_PYTHON_VERSION = os.environ.get("PYSR_TEST_PYTHON_VERSION", "3.9")
    build_result = subprocess.run(
    [
    "docker",
    "build",
    "-t",
    "pysr-dev",
    "--build-arg",
    f"JLVERSION={PYSR_TEST_JULIA_VERSION}",
    "--build-arg",
    f"PYVERSION={PYSR_TEST_PYTHON_VERSION}",
    "-f",
    "pysr/test/test_dev_pysr.dockerfile",
    ".",
    ],
    env=os.environ,
    cwd=Path(__file__).parent.parent.parent,
    universal_newlines=True,
    )
    self.assertEqual(build_result.returncode, 0)
    test_result = subprocess.run(
    [
    "docker",
    "run",
    "--rm",
    "pysr-dev",
    "python3",
    "-c",
    "from pysr import SymbolicRegression as SR; print(SR.__test_function())",
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    env=os.environ,
    cwd=Path(__file__).parent.parent.parent,
    )
    self.assertEqual(test_result.returncode, 0)
    self.assertEqual(test_result.stdout.decode("utf-8").strip(), "2.3")
    within test_pickle.py to run a simple PySR example where the equation file parameter set to some Path to a folder created with https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory. Also change the class name to TestPickle or something.
  4. Modify
    tests = [TestDev]
    if just_tests:
    return tests
    suite = unittest.TestSuite()
    loader = unittest.TestLoader()
    for test in tests:
    suite.addTests(loader.loadTestsFromTestCase(test))
    runner = unittest.TextTestRunner()
    return runner.run(suite)
    to reference TestPickle.
  5. Search for any appearances of test_dev throughout the codebase, and use that to figure out where you need to add the equivalent test_pickle.
  6. Modify the test_pickle.dockerfile, specifically all of these lines:
    RUN mkdir /pysr/pysr/test
    # Now, we create a custom version of SymbolicRegression.jl
    # First, we get the version from juliapkg.json:
    RUN python3 -c 'import json; print(json.load(open("/pysr/pysr/juliapkg.json", "r"))["packages"]["SymbolicRegression"]["version"])' > /pysr/sr_version
    # Remove any = or ^ or ~ from the version:
    RUN cat /pysr/sr_version | sed 's/[\^=~]//g' > /pysr/sr_version_processed
    # Now, we check out the version of SymbolicRegression.jl that PySR is using:
    RUN git clone -b "v$(cat /pysr/sr_version_processed)" --single-branch https://github.com/MilesCranmer/SymbolicRegression.jl /srjl
    # Edit SymbolicRegression.jl to create a new function.
    # We want to put this function immediately after `module SymbolicRegression`:
    RUN sed -i 's/module SymbolicRegression/module SymbolicRegression\n__test_function() = 2.3/' /srjl/src/SymbolicRegression.jl
    # Edit PySR to use the custom version of SymbolicRegression.jl:
    ADD ./pysr/test/generate_dev_juliapkg.py /generate_dev_juliapkg.py
    RUN python3 /generate_dev_juliapkg.py /pysr/pysr/juliapkg.json /srjl
    # Install and pre-compile
    RUN pip3 install --no-cache-dir . && python3 -c 'import pysr'
    to try to load the pickle file generated by the test you created in (3). And basically just verify that the equations loaded are correct.
  7. Note how the code here:
    build_result = subprocess.run(
    [
    "docker",
    "build",
    "-t",
    "pysr-dev",
    "--build-arg",
    f"JLVERSION={PYSR_TEST_JULIA_VERSION}",
    "--build-arg",
    f"PYVERSION={PYSR_TEST_PYTHON_VERSION}",
    "-f",
    "pysr/test/test_dev_pysr.dockerfile",
    ".",
    ],
    env=os.environ,
    cwd=Path(__file__).parent.parent.parent,
    universal_newlines=True,
    )
    self.assertEqual(build_result.returncode, 0)
    test_result = subprocess.run(
    [
    "docker",
    "run",
    "--rm",
    "pysr-dev",
    "python3",
    "-c",
    "from pysr import SymbolicRegression as SR; print(SR.__test_function())",
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    env=os.environ,
    cwd=Path(__file__).parent.parent.parent,
    )
    self.assertEqual(test_result.returncode, 0)
    self.assertEqual(test_result.stdout.decode("utf-8").strip(), "2.3")
    builds and runs a dockerfile. Do the same in your new test_pickle.py, but reference the test_pickle.dockerfile. You should be able to pass the filename as an argument.

I think this should work. It might be overcomplicated though, maybe there's an easier way to test this. Feel free to take a different approach to test your new addition. For simple pure-Python tests you can just add a new test case to the class in test.py, such as

PySR/pysr/test/test.py

Lines 285 to 297 in 3aee19e

X = self.rstate.randn(100, 3) + 1j * self.rstate.randn(100, 3)
y = (2 + 1j) * np.cos(X[:, 0] * (0.5 - 0.3j))
model = PySRRegressor(
binary_operators=["+", "-", "*"],
unary_operators=["cos"],
**self.default_test_kwargs,
early_stop_condition="(loss, complexity) -> loss <= 1e-4 && complexity <= 6",
)
model.niterations = DEFAULT_NITERATIONS * 10
model.fit(X, y)
test_y = model.predict(X)
self.assertTrue(np.issubdtype(test_y.dtype, np.complexfloating))
self.assertLessEqual(np.average(np.abs(test_y - y) ** 2), 1e-4)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants