-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apparent unexpected difference in results when re-using a binary Network file #116
Comments
Thank you for raising this issue. We are aware of some small non-determinisms when using multiple threads to set up the network of people and places. (Look for the omp critical pragmas in the code). This has historically been considered acceptable because of the general stochastic nature of the model. We do expect the code to be deterministic when using a single-thread. Please also note that the networks are highly sensitive to the input files and command line options. The network files are also platform dependent. We are planning on making the parsing and reading of input files more robust. It would be helpful if you could give us the command lines you were using to demonstrate these differences, so that we can investigate further. |
This is on the same machine, running single threaded, with no variation in parameters. Completely identical command lines other than replacing /S: with /L: in the second case. You can test this with any example you like, but I can provide the specific commands I ran if you don't mind all the paths and input files being local to my machine:
You can see numerical differences in the results output to stdout: You can also compare the severity/age outputs - they differ significantly as I showed in the plots above. Simply removing the "/L:" argument causes the code to run correctly, albeit without the speed improvement of re-using the network. Note that this is not a random fluctuation but a constant effect: multiple sets of results all re-using the same network file (with no variation of parameters) will be identical, all with the same discrepancy from the original. |
Regarding randomness: If the RNG is used in the generation of the network, but not when a cached network is loaded, this might explain the discrepancy. The random sequence in the former case would be advanced relative to the latter. This would produce a constant effect since the random seeds are identical for the jobs. A simple fix would be to re-initialise the RNG after generation of the network. However, it could of course be a straightforward bug in the loading/saving of the file. |
There were a couple of places where we use P.newseed1 = (int)(ranf() * 1e8); to make new seeds, but Rand's setall already implements RNG splitting, so we now use that instead. This needed a few knock-on changes in the places where we seed the RNG, as if we wanted to reuse the seed values then we now need to store them. Finally, we now also split the seeds used for the setup phase, and reinitialise the RNG with them after the network has been created or loaded. It looks like historically the second pair of seeds had been used at this point, to make the runs identical regardless of how the network was made, but that this had been changed when seed-resetting was implemented. Now the determinism is back, fixing #116.
This is posted from the UK RAMP “RED TEAM” at Edinburgh University, as invited through UKRI and RAMP with agreement of Microsoft. We have taken the charge of kicking the tires on the code as a pseudo public user. We are testing documentation, stability and usability based upon the documentation we can find on the git repository. Thus, if we report something that is “well known” then it is in effect a report on the lack of documentation which would have made this clear.
Issue: Apparent unexpected difference in results when re-using a binary Network file
The expected mode of operation for the covid-sim package seems to be a single run with nominal conditions, followed by subsequent runs with variations to parameters: this is logical.
Time-saving measures are implemented by allowing the nominal run to save binary representations of the population density and network data. These files can be loaded and re-used in subsequent runs, rather than being re-created each time (using the /S: and /L: command line arguments)
However, there appears to be a bug in either the creation or re-use of the network file. If we attempt two completely identical runs, only varying in that the second should use the network file produced by the first, the results are quite different:
This mostly manifests as a time-offset, but there are numerical changes beyond that.
Re-use of the binary population density file seems to have no associated problems.
The text was updated successfully, but these errors were encountered: