Loading from a backend #2195

Closed
Kieran108 opened this issue May 18, 2017 · 7 comments

@Kieran108

I am new to pymc3, and have an issue loading a saved trace. I have a simple model with two variables, of the form:

theta = pm.Uniform('theta',0.0,1.0) 
sigma = pm.HalfNormal('sigma',sd=1)

I run the model, and save the trace using the Text backend:

db = pm.backends.Text('trace')
trace = pm.sample(draws=numSamples, step=step, start=start, trace=db)

This produces a data file with 4 columns: theta_interval_, sigma_log_, theta, and sigma. When I then try to load the trace, I get a MultiTrace object with 0 variables:

basic_model = pm.Model()
with basic_model:
    trace = pm.backends.text.load('trace')
    print(trace)

Results in:
<MultiTrace: 1 chains, 1348 iterations, 0 variables>

I cannot use this MultiTrace object. If I simply try to plot the data, I get:

pm.forestplot(trace)

ValueError: zero-size array to reduction operation maximum which has no identity

I also cannot extract the variables from the loaded trace, even though when I view the data, I see it has saved correctly. Is there an issue with loading a text backend, or is there a problem with how I am trying to load it? I have also tried using the SQL backend with similar results.

@junpenglao added the bug label May 18, 2017
@ColCarroll
Member

Thanks for reporting -- I'm closing this to consolidate discussion in #2189, but it is helpful to know that you're using the backends and that this is one of the problems.

The tl;dr: adding Hamiltonian Monte Carlo samplers means that there is less reason to persist the trace to disk (since you can get a good sample in memory), and the backends are a holdover from pymc2, when this wasn't the case. For this reason, we're considering deprecating the backends in favor of only supporting ndarrays, and adding utilities for reading/writing to disk.

If you are running this on your own machine, the suggestion in the other issue to use pickle should suffice, though remember not to unpickle files from untrusted sources!
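A minimal sketch of the pickle approach mentioned above. The helper names save_trace and load_trace are hypothetical (they are not part of pymc3), and a plain dict stands in for the sampled trace so the snippet runs without a model; the same two calls should apply to a real trace object, assuming it pickles cleanly.

```python
import os
import pickle
import tempfile


def save_trace(trace, path):
    """Serialize a finished trace object to disk in a single write."""
    with open(path, "wb") as fh:
        pickle.dump(trace, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_trace(path):
    """Load a previously pickled trace. Only use on files you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a sampled trace (assumption): variable name -> list of draws.
example_trace = {"theta": [0.21, 0.35, 0.28], "sigma": [1.1, 0.9, 1.3]}

path = os.path.join(tempfile.mkdtemp(), "trace.pkl")
save_trace(example_trace, path)
restored = load_trace(path)
```

Because pickle can execute arbitrary code on load, this only makes sense for files written by your own scripts.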

@kyleabeauchamp
Contributor

I am still skeptical of the idea that one can do away with burn-in, sub-sampling, and disk persistence entirely. It's nice to be able to serialize results for later consumption, possibly to have separate scripts for sampling and model generation and for downstream analysis.

@ColCarroll
Member

The part that I object to is that, at every sampling step, there is a call to record, which writes data to disk in every backend except the default ndarray one. This feels bad both for the performance overhead (haven't checked it, just a guess), and for the maintenance overhead (~1000 lines).

What has been discussed is getting rid of that, and only supporting serializing the resulting trace to disk, which sounds like it addresses your concerns?
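The design being discussed could be sketched roughly as follows: record keeps each draw in memory, and disk is touched only once, after sampling has finished. InMemoryRecorder and its methods are hypothetical names chosen for illustration, not pymc3 classes, and the loop uses placeholder values rather than real sampler output.

```python
import json
import os
import tempfile


class InMemoryRecorder:
    """Hypothetical recorder: keeps draws in memory, writes to disk only on request."""

    def __init__(self, varnames):
        self.samples = {name: [] for name in varnames}

    def record(self, point):
        # Called once per sampling step; no disk I/O happens here.
        for name, values in self.samples.items():
            values.append(point[name])

    def save(self, path):
        # Single write after sampling has finished.
        with open(path, "w") as fh:
            json.dump(self.samples, fh)


recorder = InMemoryRecorder(["theta", "sigma"])
for step in range(5):
    # Placeholder values standing in for real sampler output.
    recorder.record({"theta": 0.1 * step, "sigma": 1.0 + step})

out_path = os.path.join(tempfile.mkdtemp(), "trace.json")
recorder.save(out_path)

# Read the file back to confirm the round trip.
with open(out_path) as fh:
    reloaded = json.load(fh)
```

The trade-off is that an interrupted run loses its draws, which is the case the per-step disk backends were covering.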

@kyleabeauchamp
Contributor

SGTM

@fonnesbeck
Member

Currently, I like to use trace_to_dataframe to serialize:

trace_to_dataframe(trace).to_csv('output/trace.csv')

@twiecki
Member

twiecki commented May 19, 2017

@fonnesbeck But how can you load traces back into a trace object?

@fonnesbeck
Member

I don't usually load them back into their objects. Typically, I am generating output from many, many runs and then, for example, generating faceted plots in Seaborn with them, or some other type of post-processing. So, data frames are what I want.
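For readers who do want the values back in Python without rebuilding a trace object, a CSV of draws can be read into per-variable columns. This is a sketch using only the standard library: read_trace_csv is a hypothetical helper, and the example file is written by hand to mimic the layout pandas produces by default (an unnamed index column followed by one column per variable).

```python
import csv
import os
import tempfile


def read_trace_csv(path):
    """Read a CSV of draws into a dict mapping column name -> list of floats."""
    columns = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            for name, value in row.items():
                if name == "":
                    continue  # skip the unnamed index column, if present
                columns.setdefault(name, []).append(float(value))
    return columns


# Hand-written example file (assumption): index column plus two variables.
csv_path = os.path.join(tempfile.mkdtemp(), "trace.csv")
with open(csv_path, "w", newline="") as fh:
    fh.write(",theta,sigma\n0,0.21,1.1\n1,0.35,0.9\n")

draws = read_trace_csv(csv_path)
```

This recovers the sampled values only; chain structure and sampler statistics are not stored in the CSV, so the result is not a substitute for a MultiTrace.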
