Loading from a backend #2195

Kieran108 · 2017-05-18T13:20:25Z

I am new to PyMC3 and have an issue loading a saved trace. I have a simple model with two variables, of the form:

import pymc3 as pm
with pm.Model() as model:
    theta = pm.Uniform('theta', 0.0, 1.0)
    sigma = pm.HalfNormal('sigma', sd=1)

I sample from the model and save the trace using the Text backend:

db = pm.backends.Text('trace')
trace = pm.sample(draws=numSamples, step=step, start=start, trace=db)

This produces a data file with four columns: theta_interval_, sigma_log_, theta and sigma. When I then try to load the trace, I get a MultiTrace object with 0 variables:

basic_model = pm.Model()
with basic_model:
    trace = pm.backends.text.load('trace')
    print(trace)

Results in:
<MultiTrace: 1 chains, 1348 iterations, 0 variables>

I cannot use this MultiTrace object. Even simply plotting the data fails:

pm.forestplot(trace)

ValueError: zero-size array to reduction operation maximum which has no identity

I also cannot extract the variables from the loaded trace, even though the saved file looks correct when I view it. Is there an issue with loading from the Text backend, or is there a problem with how I am trying to load it? I have also tried the SQL backend, with similar results.
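For context on the theta_interval_ and sigma_log_ columns: PyMC3 samples bounded variables on an unconstrained scale and stores those transformed values alongside the originals. Assuming the standard interval transform for Uniform(0, 1) and the log transform for HalfNormal, the mapping can be sketched in NumPy as follows (the helper names are hypothetical, not PyMC3 functions):

```python
import numpy as np

# Hypothetical helpers mirroring an interval transform on (a, b) and a log transform

def interval_forward(x, a=0.0, b=1.0):
    """Map a value in (a, b) to the real line: log(x - a) - log(b - x)."""
    return np.log(x - a) - np.log(b - x)

def interval_backward(y, a=0.0, b=1.0):
    """Map a real value back into (a, b) with a scaled logistic sigmoid."""
    return a + (b - a) / (1.0 + np.exp(-y))

def log_forward(x):
    """Map a positive value to the real line."""
    return np.log(x)

def log_backward(y):
    """Map a real value back to the positive half-line."""
    return np.exp(y)

theta = np.array([0.1, 0.5, 0.9])
sigma = np.array([0.2, 1.0, 3.0])

theta_interval = interval_forward(theta)  # analogous to the theta_interval_ column
sigma_log = log_forward(sigma)            # analogous to the sigma_log_ column

# Both transforms are invertible, so the original columns can be recovered
print(np.allclose(interval_backward(theta_interval), theta))  # True
print(np.allclose(log_backward(sigma_log), sigma))            # True
```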


ColCarroll · 2017-05-18T13:33:38Z

Thanks for reporting! I'm closing this to consolidate discussion in #2189, but it is helpful to know that you're using the backends and that this is one of the problems.

The tl;dr: adding Hamiltonian Monte Carlo samplers means there is less reason to persist the trace to disk (since you can get a good sample in memory), and the backends are a holdover from PyMC2, when this wasn't the case. For this reason, we're considering deprecating the backends in favor of supporting only ndarrays, and adding utilities for reading from and writing to disk.

If you are running this on your own machine, the suggestion in the other issue to use pickle should suffice, though remember not to unpickle files from untrusted sources!
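A minimal sketch of the pickle approach, using a plain dict of NumPy arrays as a stand-in for a MultiTrace (an assumption; in practice you would pickle the trace object itself). The file name is hypothetical:

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for a sampled trace: variable name -> array of draws (assumption)
trace_like = {
    "theta": np.random.uniform(0.0, 1.0, size=1000),
    "sigma": np.abs(np.random.normal(0.0, 1.0, size=1000)),
}

path = os.path.join(tempfile.mkdtemp(), "trace.pkl")  # hypothetical file name

# Write once, after sampling has finished
with open(path, "wb") as f:
    pickle.dump(trace_like, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read back later, e.g. in a separate analysis script.
# Only unpickle files you created yourself: unpickling can execute arbitrary code.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(sorted(restored))  # ['sigma', 'theta']
```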

kyleabeauchamp · 2017-05-18T15:09:06Z

I am still skeptical of the idea that one can do away with burn-in, sub-sampling, and disk persistence entirely. It's nice to be able to serialize results for later consumption, for example to keep model building and sampling in one script and downstream analysis in another.

ColCarroll · 2017-05-18T15:35:39Z

The part I object to is that, while sampling, every step makes a call to record, which writes data to disk in all but the default ndarray backend. This feels bad both for the performance overhead (I haven't checked it; it's just a guess) and for the maintenance overhead (~1000 lines).

What has been discussed is getting rid of that and only supporting serialization of the finished trace to disk. It sounds like that would address your concerns?
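A minimal sketch of the "serialize once at the end" idea, using NumPy's .npz format. The dict of preallocated arrays stands in for an in-memory ndarray trace, and the loop is a placeholder for sampler steps, not PyMC3's sampler; the file name is hypothetical:

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_draws = 500

# Keep every draw in memory while sampling (placeholder "sampler" steps)
draws = {"theta": np.empty(n_draws), "sigma": np.empty(n_draws)}
for i in range(n_draws):
    draws["theta"][i] = rng.uniform(0.0, 1.0)
    draws["sigma"][i] = abs(rng.normal(0.0, 1.0))

# A single write to disk once sampling is done, instead of one write per step
path = os.path.join(tempfile.mkdtemp(), "trace.npz")  # hypothetical file name
np.savez(path, **draws)

# Later (possibly in another script): load every stored array by name
with np.load(path) as data:
    loaded = {name: data[name] for name in data.files}

print({name: arr.shape for name, arr in loaded.items()})
```

Writing at the end trades the per-step I/O for holding all draws in memory, which is the trade-off described above.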

kyleabeauchamp · 2017-05-18T15:52:04Z

SGTM

fonnesbeck · 2017-05-18T16:22:27Z

Currently, I like to use trace_to_dataframe to serialize:

trace_to_dataframe(trace).to_csv('output/trace.csv')
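A sketch of reading such a CSV back with pandas, assuming one column per scalar variable as in the two-variable model above (vector-valued variables would be spread over several columns, which this does not regroup). The result is a dict of arrays, not a MultiTrace, and an in-memory buffer replaces 'output/trace.csv':

```python
import io

import numpy as np
import pandas as pd

# Stand-in for trace_to_dataframe(trace): one column per scalar variable (assumption)
df = pd.DataFrame({
    "theta": np.random.uniform(0.0, 1.0, size=200),
    "sigma": np.abs(np.random.normal(0.0, 1.0, size=200)),
})

buf = io.StringIO()  # in-memory file instead of 'output/trace.csv'
df.to_csv(buf)       # the index is written as the first column
buf.seek(0)

# index_col=0 restores the draw index instead of adding it as a data column
restored = pd.read_csv(buf, index_col=0)
samples = {name: restored[name].to_numpy() for name in restored.columns}

print(restored.describe().loc[["mean", "std"]])
```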

twiecki · 2017-05-19T07:23:02Z

@fonnesbeck But how can you load traces back into a trace object?

fonnesbeck · 2017-05-19T14:06:48Z

I don't usually load them back into their objects. Typically, I am generating output from many, many runs and then, for example, generating faceted plots in Seaborn with them, or some other type of post-processing. So, data frames are what I want.
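For the many-runs workflow described above, one possible sketch stacks per-run data frames into a single long-form frame, the shape faceted plotting tools usually expect. The per-run frames are synthetic stand-ins (an assumption), not real trace output:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Stand-ins for several per-run trace data frames (assumption)
runs = {
    f"run{i}": pd.DataFrame({
        "theta": rng.uniform(0.0, 1.0, size=100),
        "sigma": np.abs(rng.normal(0.0, 1.0, size=100)),
    })
    for i in range(3)
}

# Stack the runs, naming the index levels, then move them into columns
combined = pd.concat(runs, names=["run", "draw"]).reset_index()

# Long form: one row per (run, draw, variable)
long = combined.melt(id_vars=["run", "draw"], var_name="variable", value_name="value")

print(long.groupby(["run", "variable"])["value"].mean())
```

A frame in this shape could then be handed to something like Seaborn's FacetGrid with row="run" and col="variable".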

junpenglao added the bug label May 18, 2017

ColCarroll closed this as completed May 18, 2017
