Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Releasing 2019.1.0 #239

Closed
chryswoods opened this issue Jan 31, 2019 · 32 comments
Closed

Releasing 2019.1.0 #239

chryswoods opened this issue Jan 31, 2019 · 32 comments

Comments

@chryswoods
Copy link
Member

We really need to push on and release 2019.1.0. As you may have seen, I've cleaned out all of the old issues, fixed many of them, and have now fully got the Linux CI/CD (with all passing tests) working (OS X build is running for the first time now). Paolo has got a visual studio build working, which I'd like to grab and add to pipelines so that we build all three main OS's at once.

Todo for 2019.1.0

@ppxasjsm - point me in the direction of your openmm optimisation script so that I can add this to package_sire.py as a script to run when a Sire binary is installed.

@lohedges - Could you revisit issue #229 and see if you can help Toni implement this fix? If this won't be done, then we will drop the issue.

@lohedges and @jmichel80 - Could you get example files and get sander to give you Amber-calculated energies so that we can double-check that the AmberPrm parser is reading/writing input files that have equivalent energies as sander. Tests could include "calculate energy in sander, then read/write the file using Sire, then calculate energy in sander using new file to ensure that the energy hasn't changed". Also, the bond, angle and dihedral+improper energies should be equal (dihedral+improper are summed together). The intramolecular coulomb and LJ energy can be compared if you use a long cutoff. The test_nrg function in test_amber2.py shows how to calculate intramolecular energies in Sire that match those calculated in Amber.

@chryswoods - I will make sure that the OS X (and perhaps Windows) devel pipeline builds all work, will fix the issue that the pipeline is still not stopped by a failed sire_tests, and will then add the hooks for pushes to "master" which will automatically create a new release on every push (the release will be automatically named from the version number reported by Sire). If I am feeling adventurous, I may even get it to auto-update the website with the new release links ;-)

Can we target getting all of this complete before Feb 8th (next Friday)? I'd really like to get Sire, and then BioSimSpace out as stable, released binaries before we all go away or get busy.

@jmichel80
Copy link
Contributor

Yes I have a dataset of 50+ solvated protein-ligand complexes now. Will upload a unit test soon (unless it's not a good idea to upload >2 GB of input files...)

@chryswoods
Copy link
Member Author

Not a good idea to put >2GB into SireUnitTests or BioSimSpace, as this will severely slow down the CI/CD (it would take too long to download for each build).

Can you create another repository that we can use for running manual tests against these files.

@ppxasjsm
Copy link
Collaborator

ppxasjsm commented Feb 1, 2019

The script for fixing CUDA platforms is here:
https://github.com/michellab/Sire/blob/devel/build/optimise_openmm.py

@lohedges
Copy link
Member

lohedges commented Feb 1, 2019

I notice in the optimise_openmm.py and build_sire.py scripts that we don't lock down the version of OpenMM, e.g. to 7.3. Would it be better to do that, then optimise the CUDA driver for a specific version. (I'm not sure how much changes between OpenMM versions, and hopefully their API is backwards compatible.)

@chryswoods
Copy link
Member Author

I'm pretty sure that the version in optimise_openmm.py is locked down to the version that was installed in Sire. I don't think it is worth locking down the version in build_sire.py as, that way, we always get the latest version, and then know asap if the new version breaks things.

@chryswoods
Copy link
Member Author

(also the sire_devel_latest_osx.run binary is now available and linked to from the website. It takes a LONG time to build, during which I discovered that azure pipelines has a 1 hour timeout)

@lohedges
Copy link
Member

lohedges commented Feb 1, 2019

Ah yes, on re-reading the optimise script I see that it grabs the version of OpenMM from the original install. Great!

@ppxasjsm
Copy link
Collaborator

ppxasjsm commented Feb 1, 2019

Yeah version is locked down, from the original install. We may want to lock down on a major release though, as in if OpenMM moved to 8.0 then we would need to test compatibility first.

@chryswoods
Copy link
Member Author

I spoke too soon. The OS X binaries made on azure pipelines have a broken python. Discovered this when trying to sort out the OS X BioSimSpace binaries. Python exits with

Fatal Python error: PyThreadState_Get: no current thread

Abort trap: 6

when I try to import Sire.Mol...

Very strange as the tests all worked. I will keep debugging.

@ppxasjsm
Copy link
Collaborator

ppxasjsm commented Feb 1, 2019

I think this is the same I have seen previously on my Mac. I still haven't done a new install on it.

@chryswoods
Copy link
Member Author

When I am back in the office I will build and upload a sire_devel_latest_osx.run from my "old" tabletop mac, and see if the problem persists. Strange as my day-to-day mac is always kept up to date so that I can "enjoy" these compatibility issues first ;-)

@chryswoods
Copy link
Member Author

I've fixed the OS X issues. The compile problem was because newer macs can't compile code that supports 10.8, so I had to update the minimum supported to 10.9 (Mavericks - released 2013). To make sure this worked I also had to update Qt5 to the latest version.

The OS X "no current thread" issue is a conda problem. Rather confusingly, they have statically linked libPython into the python binary shipped with conda on OS X. This is the wrong thing to do, as it means that all compiled python extensions CANNOT link to libPython for fear of duplicating symbols. My understanding was that best practice was for the python executable to shared link to libPython so that this could be shared by all compiled modules. Fortunately, Sire ships its own Python (sire_python) which is shared linked to libPython. To fix this, on OS X, I will change the python symlink so that it points to sire_python rather than python3.6m. Practically this makes no difference, other than adding the "--ppn" option to support TBB parallel libraries.

sire_devel_latest_osx.run has the compiled version. It should work if you use "sire_python". Equally, compiling on your laptop should also now work. I'm now adding in the code to change the symlink and add call optimise_openmm at install time. I will then look at the AmberPrm crash Julien reported (which should be a simple fix)

@chryswoods
Copy link
Member Author

No - didn't work. openmm and numpy are compiled without libPython, so are not compatible with sire_python. Got to work out how to now remove libPython from boost_python... This is not trivial as boost_python has to be shared otherwise we end up with multiple copies of global classes. Yet another string to the "I hate anaconda" bow... :-(

@chryswoods
Copy link
Member Author

I've fixed the miniconda version to 4.2.12 (2016 miniconda, python 3.5.0) on OS X. This is the last miniconda that used a separate libPython. This is a workaround until I can work out how to solve this problem on a more long-term basis. For linux, I've updated to the latest miniconda as there isn't an issue here. I am building a new sire-devel-latest-osx and sire-devel-latest-linux which should hopefully be uploaded by the end of today.

@chryswoods
Copy link
Member Author

Eeeee! Conda packages like numpy don't work with the old miniconda. Now decided to go all in with conda and use it for dependencies (GSL, boost, tbb etc.). If I like against these, and don't explicitly include libPython, then I think it will work...

@chryswoods
Copy link
Member Author

Useful thread discussing the static/dynamic linking of anaconda python. conda-forge/python-feedstock#222

I've updated the cmake files so that we now install Qt, GSL, boost and TBB from conda, and then Sire builds and links against these. So far it has worked for corelib and wrappers are now compiling on OS X. Compile is much quicker as we aren't building Qt or others :-). Should also make it easier in the future to create a "conda install sire" package, as now we can just list our dependencies, and don't have many things ourselves to worry about. Going to commit this for linux and OS X and see how it compiles on azure pipelines...

@chryswoods
Copy link
Member Author

Nope - while it builds on my mac, it fails on azure pipelines because the TBB from miniconda needs a newer OS X version than I want to support. The linux build similarly fails because it needs a newer compiler than I'd like to use (gcc 7). Will have to think about how to deal with this going forward...

@chryswoods
Copy link
Member Author

Even this did not fix numpy.... numpy refuses to load in ipython, or in python after importing Sire.Mol. sire_test doesn't run at all. Very annoying. But it does feel like I am making progress...

@chryswoods
Copy link
Member Author

Now deciding to use the conda build infrastructure. I'm almost at the point of writing a conda build recipe, so might as well go the whole way.

@ppxasjsm
Copy link
Collaborator

ppxasjsm commented Feb 5, 2019

Good luck!

@chryswoods
Copy link
Member Author

Ok - I now have Sire building, linking and (seemingly) mostly running correctly built using the conda-supplied compilers, cmake, make and all dependencies. I've updated all of the build scripts and removed the bundled source from the repo. You can now compile Sire using "compile_sire.sh" with only wget or curl on your system :-). There is still an issue with a crash caused by the python wrapping some return values (clone_const_reference of ViewsOfMol seems to cause a crash - likely caused by extra cleverness used with this class?).

I've got builds running for OS X and Linux to create new run files. I expect the linux build to timeout so will also build it locally so can push to docker hub.

@chryswoods
Copy link
Member Author

(except that gcc7 is really slow, so both the linux and OS X builds time out well before completing...)

@chryswoods
Copy link
Member Author

Linux build using conda for everything is now working and is uploaded (sire_devel_latest_linux.run and siremol/sire-devel:latest both point to the new version). CI/CD builds on linux now work with all unit tests pass, at least on the CI/CD server. I still see the crashes with double-free in sireparsertests, which I will investigate.

The OS X build crashes randomly still, which also needs investigating.

@chryswoods
Copy link
Member Author

OS X build now runs without crashing (fixed the clone_const_reference() bug). CI/CD now works on Linux and OS X - just waiting to see if all of the unit tests pass. This also appears to have fixed the crash for sireparsertests, at least on OS X. Have run all of them with ppn=4 and they run without crashing.

I've also updated the BioSimSpace pipelines and dockerfiles so that they use the new Sire layout. I think we are pretty much ready to release :-) (or, at least I will hold onto this happy thought and not look at the results and experience disappointment until Monday)

@chryswoods
Copy link
Member Author

From emailing with Paolo and my own tests the AmberPrm bug is still present when running in parallel over more than 4 cores. I will debug and fix this as a high priority.

Also, thanks to Paolo we have a full windows Conda build. I’d like to add this as well before releasing as it would be great to have 2019.1 released simultaneously on all three major OS platforms.

@jmichel80
Copy link
Contributor

hi @chryswoods what's the status with the AmberPrm parser bug ? Is there anything else holding us from a proper 2019.1 release ?

@chryswoods
Copy link
Member Author

My push to get this out a couple of weeks back was because I had time then to concentrate on fixing bugs, merging pull requests and debugging the CI/CD service. Unfortunately that time has now passed and I am too busy with other work. I will have another window in April...

The best route forwards for the AmberPrm bug is to disable parallelisation of the 14-scaling factor parsing. Parallelisation is controlled individually using code in amberprm.cpp that looks like this;

//assign all of the parameters

    //assign all of the parameters
    if (usesParallel())
    {
        AmberParams atoms, bonds, angles, dihedrals, excluded;

        tbb::parallel_invoke
        (
            [&](){ atoms = assign_atoms(); },
            [&](){ bonds = assign_bonds(); },
            [&](){ angles = assign_angles(); },
            [&](){ dihedrals = assign_dihedrals(); },
            [&](){ excluded = assign_excluded(); }
        );

        params += atoms;
        params += bonds;
        params += angles;
        params += dihedrals;
        params += excluded;
    }
    else
    {
        params += assign_atoms();
        params += assign_bonds();
        params += assign_angles();
        params += assign_dihedrals();
        params += assign_excluded();
    }

The best thing to do is replace

if (usesParallel())

with

if (false)

to disable parallelisation of that section, and then see if that fixes the crash. Keep disabling parallel sections until it is fixed.

For 2019.1 we have the additional issue that the libcpuid merge from Paolo still doesn't work on my mac and there are general issues that Lester is fixing as he is preparing a "conda install". These will all be resolved, but pushing for 2019.1 will now have to wait until April when I will next have time to dedicate some time to Sire.

@chryswoods
Copy link
Member Author

(should add, only commenting on a Sunday as I am on a train to Durham now - normally I would encourage everyone not to work at the weekend)

@mark-mackey-cresset
Copy link

What's the current status of 2019.1 - are we close to a final release now?

@lohedges
Copy link
Member

lohedges commented May 2, 2019

Hi Mark. I'm working on this right this moment. Everything has been updated in devel and I will soon create a pull request for a merge to master. A 2019.1.0 package can be found in our conda channel. Follow the installation instructions in the GitHub README to install. I've already uploaded a binary package to our cloud infrastructure and I can email you a download link if you'd like to test.

Cheers.

@lohedges
Copy link
Member

lohedges commented May 3, 2019

Sire 2019.1.0 is now released.

@lohedges lohedges closed this as completed May 3, 2019
@mark-mackey-cresset
Copy link

mark-mackey-cresset commented May 3, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants