Add method to get functional representation of engine #9

Closed
wdm0006 opened this issue Mar 5, 2017 · 6 comments

Comments

@wdm0006
Collaborator

wdm0006 commented Mar 5, 2017

Currently, the engines are all stateful classes. This is useful for many things, but the final use case is an operation that looks much like a pure function:

func(url) = source

It is possible to add a method into the base engine class that will return a purely functional equivalent of the get_page_source method such that if you do (in pseudocode):

foo = Engine()
a = foo.get_page_source_as_func()
foo.set_state(bar='baz')
b = foo.get_page_source_as_func()

a != b

Further, the functions returned, a and b here, would take just one argument, url, and have all of the state of the parent object baked in. This allows us to do some interesting things more performantly, like:

source_generator = (a(url) for url in urls)

or

from joblib import Parallel, delayed
Parallel(n_jobs=10, backend="threading")(delayed(a)(url) for url in urls)
@wdm0006 wdm0006 added this to the v0.0.2 milestone Mar 5, 2017
@wdm0006 wdm0006 changed the title Add functional interface to base engine class Add method to get functional representation of engine Mar 5, 2017
@wdm0006 wdm0006 added the E:hard label Mar 5, 2017
@dlrobertson

👍 This is a great idea. Definitely agree with the E:hard label though 😄

@coxjonc

coxjonc commented Mar 6, 2017

Could we do it with a simple implementation like this, or do we want something that would support parallel execution?

@wdm0006
Collaborator Author

wdm0006 commented Mar 6, 2017

I think that might work, but we would need a test similar to the pseudocode outlined above. You're basically currying the function Engine.get_source(self, url) with the instance of self at that point in time. If self gets modified after, does it also modify the previously returned function? If so, it will be a little more involved: we would use something like functools or toolz to curry the method with a copy of the instance that won't get modified.

@coxjonc

coxjonc commented Mar 6, 2017

Ah yeah you're right - any modifications to that instance of the class would be reflected by the function. I'll read up on functools and work on currying the function with a copy of all the instance variables. Good catch.
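
The behavior discussed in the last two comments can be checked with a small sketch. The Engine class below is a hypothetical stand-in, not the project's actual base class; its state dict, set_state, and stubbed get_page_source are assumptions made only so the example runs. It compares a plain bound method with functools.partial applied to a deep copy of the instance:

```python
import copy
from functools import partial


class Engine:
    """Hypothetical stand-in for the project's base engine class."""

    def __init__(self, **state):
        self.state = dict(state)

    def set_state(self, **kwargs):
        self.state.update(kwargs)

    def get_page_source(self, url):
        # A real engine would fetch the page; this stub just echoes its state.
        return "{} fetched with {}".format(url, sorted(self.state.items()))

    def get_page_source_as_func(self):
        # Curry get_page_source with a deep copy of self, so later changes
        # to the original instance are not seen by the returned function.
        snapshot = copy.deepcopy(self)
        return partial(Engine.get_page_source, snapshot)


foo = Engine()
bound = foo.get_page_source        # bound method, still tied to foo
a = foo.get_page_source_as_func()  # snapshot taken before the change
foo.set_state(bar="baz")
b = foo.get_page_source_as_func()  # snapshot taken after the change

url = "http://example.com"
print(bound(url) == b(url))  # the bound method follows foo's new state
print(a(url) != b(url))      # a keeps the state it was created with
```

The two comparisons could serve as the basis for the test mentioned above. Whether a deep copy is cheap or even possible depends on what a real engine holds, such as open sessions or browser handles.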

@omnunum
Collaborator

omnunum commented Mar 9, 2017

This idea works as-is under the assumption that the state of the engine will not change at any point during the execution of the get_page_source method.

To answer this

If self gets modified after, does it also modify the previously returned function?

It will, since it is still bound to that engine object, even though we no longer have to explicitly reference it when we call the get_page_source method/fake function.

If we can't guarantee this, we should be able to accomplish what we want by implementing __deepcopy__ and performing that before the functional currying. This won't guarantee a run-to-run lack of state changes within a thread, but each instance will now be threadsafe (assuming one copy per thread).
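
A minimal sketch of the copy-then-curry approach, assuming an engine that holds one resource the default deep copy cannot handle. The Engine class, its state attribute, the as_func helper, and the use of threading.Lock as a placeholder for a session or driver are all hypothetical:

```python
import copy
import threading
from functools import partial


class Engine:
    """Hypothetical engine with plain state plus a non-copyable resource."""

    def __init__(self, **state):
        self.state = dict(state)
        self._lock = threading.Lock()  # placeholder for a session, driver, etc.

    def __deepcopy__(self, memo):
        # Build a new instance without calling __init__, deep-copy the plain
        # state, and give the copy its own fresh resource rather than sharing.
        clone = self.__class__.__new__(self.__class__)
        memo[id(self)] = clone
        clone.state = copy.deepcopy(self.state, memo)
        clone._lock = threading.Lock()
        return clone

    def get_page_source(self, url):
        # Stub: a real engine would fetch the page here.
        with self._lock:
            return "{} fetched with {}".format(url, sorted(self.state.items()))


def as_func(engine):
    # Copy first, then curry: the returned callable owns its own engine.
    return partial(Engine.get_page_source, copy.deepcopy(engine))


engine = Engine(bar="baz")
workers = [as_func(engine) for _ in range(3)]  # one copy per thread
engine.state["bar"] = "changed"                # does not reach the copies
results = [f("http://example.com") for f in workers]
```

Each callable in workers wraps a separate Engine instance, so handing one callable to each thread avoids shared mutable state between threads. As noted above, nothing here prevents a copy from changing its own state between calls.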

@omnunum
Collaborator

omnunum commented Mar 19, 2017

Assigned to @geezhawk

@omnunum omnunum modified the milestones: v0.0.1, v0.0.2 Mar 21, 2017

4 participants