Working with non-hardcoded data #141
As you pointed out, hyperas is currently a simple wrapper that uses data() and model() as templates from which it formats code that it then executes. This means that within data() you would define everything, just like in a regular script. In all of your examples, you basically want to be able to generate new templates that hyperas can call. For example, let's say you have an application that uses hyperas based on an input dataset:

data_template = "def data(): \n{pipeline} \nreturn x_train,y_train,x_test,y_test"
pipelines = {'mnist': ' import something \n# some reshape \n# some scaling', ...}

def get_data_func(dset):
    pipeline = pipelines[dset]
    return data_template.format(pipeline=pipeline)

def model(x_train, x_test, y_train, y_test):
    # define model
    return {'loss': -acc, ...}

def do_optimize(input_dset):
    data_func_string = get_data_func(input_dset)
    best_run, best_model = optim.minimize(model=model,
                                          data=data_func_string,
                                          ...)
    return best_run, best_model

if __name__ == '__main__':
    input_dset = input('What dataset do you want to optimize a model for?')
    best_run, best_model = do_optimize(input_dset)

get_data_func('mnist') would return a string like:

'''def data():
    import something
    # some reshape
    # some scaling
    return x_train,y_train,x_test,y_test'''

This is currently not allowed, but it shouldn't take too long to hack in. It is basically a matter of making sure that the formatting is consistent with the internals of hyperas. The source that you'd want to touch is here, around line 194 or so. Something like:

if not isinstance(data, str):
    # line 194
else:
    data_string = data

The example above is also not how you should template strings in this situation. I recommend something like jinja if you are really going to go down that path and need flexibility. It may be better to just go with regular hyperopt in this situation. Does this help?
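
For anyone experimenting with the templating idea, here is a minimal sketch of generating a data() source string with consistent indentation. It uses the standard library's textwrap instead of jinja, purely to stay dependency-free. The names PIPELINES, DATA_TEMPLATE and the "toy" dataset are made up for illustration, and the sketch only checks that the generated string is valid Python; it does not pass the string to hyperas, which (as noted above) does not currently accept one.

```python
import textwrap

# Hypothetical per-dataset pipelines; the bodies are placeholders, not real loaders.
PIPELINES = {
    "toy": """
        x_train, y_train = [[0.0], [1.0]], [0, 1]
        x_test, y_test = [[0.5]], [1]
    """,
}

DATA_TEMPLATE = "def data():\n{pipeline}\n    return x_train, y_train, x_test, y_test\n"


def get_data_func(dset):
    """Return the source of a data() function for the chosen dataset."""
    body = textwrap.dedent(PIPELINES[dset]).strip()
    # Indent every pipeline line by four spaces so it sits inside the function body.
    return DATA_TEMPLATE.format(pipeline=textwrap.indent(body, "    "))


source = get_data_func("toy")
compile(source, "<data>", "exec")  # raises SyntaxError if the formatting is off

# Execute the generated source and call the resulting data() function.
namespace = {}
exec(source, namespace)
x_train, y_train, x_test, y_test = namespace["data"]()
```

Dedenting and re-indenting the pipeline body is what keeps the generated function well-formed regardless of how the pipeline snippet was written.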
Another way is to pickle the arguments so that data() can load them. A simple example:

def data():
    import argparse
    import pickle
    args_file = 'data_args.pkl'
    # Load the arguments that the main script pickled earlier.
    with open(args_file, 'rb') as f:
        args = pickle.load(f)
    (X_train, y_train) = some_file_loader(args.train)
    (X_valid, y_valid) = some_file_loader(args.valid)
    return X_train, y_train, X_valid, y_valid

import argparse
import pickle

parser = argparse.ArgumentParser()
parser.add_argument('--train', help='Training data file', type=str, required=True)
parser.add_argument('--valid', help='Validation data file', type=str, required=True)
args = parser.parse_args()
args_file = 'data_args.pkl'
# Save the parsed arguments so that data() can read them back.
with open(args_file, 'wb') as f:
    pickle.dump(args, f)
X_train, y_train, X_valid, y_valid = data()
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      ...)
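
To see the pickle round trip in isolation, here is a small self-contained sketch. The file name data_args_sketch.pkl, the temp-directory location and the example paths are assumptions for illustration, and data() here just returns the two paths rather than loading any files.

```python
import argparse
import os
import pickle
import tempfile


def data():
    """Takes no parameters; reads its settings from the pickle file instead."""
    import os
    import pickle
    import tempfile
    # Hypothetical location; it must match the path used by the launching script.
    path = os.path.join(tempfile.gettempdir(), "data_args_sketch.pkl")
    with open(path, "rb") as f:
        args = pickle.load(f)
    return args.train, args.valid


parser = argparse.ArgumentParser()
parser.add_argument("--train", required=True)
parser.add_argument("--valid", required=True)
# An explicit argument list stands in for the real command line.
args = parser.parse_args(["--train", "train.csv", "--valid", "valid.csv"])

# Pickle the parsed Namespace where data() expects to find it.
with open(os.path.join(tempfile.gettempdir(), "data_args_sketch.pkl"), "wb") as f:
    pickle.dump(args, f)

train_path, valid_path = data()
```

Everything data() needs is imported and computed inside the function body, so it does not depend on variables defined elsewhere in the script.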
For future reference, in case someone else has this issue, there is a simple way to do it. We just have to write a function that returns the args:

import argparse

def my_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='Training data file', type=str, required=True)
    parser.add_argument('--valid', help='Validation data file', type=str, required=True)
    args = parser.parse_args()
    return args

Then we can pass it to optim.minimize through the functions argument:

best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      functions=[my_args],
                                      ...)

and then call it in model():

def model(x_train, x_test, y_train, y_test):
    args = my_args()
    train_file = args.train
    valid_file = args.valid
    # define model
    return {'loss': -acc, ...}
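
A small variant of my_args() that may be easier to try outside of a real command line is sketched below. The optional argv parameter is an addition for illustration and is not part of the original suggestion; when it is left as None, argparse falls back to sys.argv[1:], so calling my_args() with no arguments behaves the same as the version above.

```python
import argparse


def my_args(argv=None):
    """Parse the dataset paths; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train", help="Training data file", type=str, required=True)
    parser.add_argument("--valid", help="Validation data file", type=str, required=True)
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of the command line.
args = my_args(["--train", "train.csv", "--valid", "valid.csv"])
train_file = args.train
valid_file = args.valid
```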
As of now, I haven't found a way to pass parameters to the data() function, and it appears to ignore everything in the code except imports. This is because Hyperas creates a new Python file out of the data, the model and everything else before attempting to train. This works well if you're training on MNIST or some other data set that came with the framework, or on random data. But what if the data set is selected from a drop-down or retrieved from a URL? What if you want to run it from a script that has a config file specifying the path to the data? What if it needs to read a database? Is there a way to do this the way Hyperas is currently set up? If not, is there anything on the roadmap?
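
The limitation can be pictured with plain Python, without hyperas. The sketch below is a simplified model, assuming the generated file behaves roughly like executing the function's source text in a fresh namespace; the variable data_path and the string DATA_SOURCE are made up for illustration, and how Hyperas actually assembles its file is not shown here.

```python
# A variable that exists only in the launching script.
data_path = "/tmp/my_dataset.csv"

# Source text of a data() function that relies on that variable.
DATA_SOURCE = '''
def data():
    return data_path
'''

# Run the source in a fresh namespace, roughly as a generated file would.
fresh_namespace = {}
exec(DATA_SOURCE, fresh_namespace)

try:
    fresh_namespace["data"]()
    lost_variable = False
except NameError:
    # data_path is not defined in the fresh namespace, so the lookup fails.
    lost_variable = True
```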