WIP: Passing model as callable to step #141 #111 #148
Conversation
Please make sure this indeed works.
art/step/step.py
Outdated
        trainer_kwargs: Dict = {},
        logger: Optional[Union[Logger, Iterable[Logger], bool]] = None,
    ):
        """
        Initialize a model-based step.

        Args:
-           model (ArtModule): The model associated with this step.
+           model_func (ArtModule): The model associated with this step.
Please correct the docs: `model_func` is a callable that creates the model, not the model itself.
art/step/step.py
Outdated
@@ -227,6 +229,12 @@ def __call__(
        """
        self.model.set_metric_calculator(metric_calculator)
        super().__call__(previous_states, datamodule, metric_calculator)
+       # save model to file
+       ModelSaver().save(self.model, self.get_step_id(), self.name)
For me it would be ideal if every run saved the model. I'd hate to have my best model overwritten just because I wanted to try the step with a different configuration.
Since Lightning can save models by itself, I wonder if this functionality is really necessary. We shouldn't write new functionality that is already provided.
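For context, Lightning's built-in `ModelCheckpoint` callback already covers both points raised above (saving on every run and not overwriting earlier checkpoints). A minimal sketch, assuming a standard `pytorch_lightning` setup; the paths and `Trainer` arguments are illustrative, not the step's actual configuration:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# save_top_k=-1 keeps every checkpoint instead of deleting older ones,
# and the filename template makes individual runs/epochs distinguishable.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/overfit_step",  # illustrative location
    filename="{epoch}-{step}",
    save_top_k=-1,
)

trainer = Trainer(max_epochs=10, callbacks=[checkpoint_cb])
# trainer.fit(model, datamodule=datamodule)  # model/datamodule would come from the step
```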
art/step/step.py
Outdated
@@ -227,6 +229,12 @@ def __call__(
        """
        self.model.set_metric_calculator(metric_calculator)
        super().__call__(previous_states, datamodule, metric_calculator)
+       # save model to file
+       ModelSaver().save(self.model, self.get_step_id(), self.name)
+       del self.model
Have you checked that this actually works and that the memory is freed?
What if someone wants to read `step.model` afterwards? Maybe it could be replaced with the model path or something similar?
This won't solve the issue; the model will still be reachable from the trainer.
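A small, self-contained illustration of the problem: deleting the step's attribute only removes one reference, so anything else that still points at the model (here a stand-in class playing the trainer's role) keeps the weights alive. This is an analogy, not the actual step/trainer code:

```python
import gc
import weakref

import torch.nn as nn


class Holder:
    """Stands in for the Trainer, which keeps its own reference to the module."""
    def __init__(self, model):
        self.model = model


class Step:
    def __init__(self):
        self.model = nn.Linear(10, 10)
        self.trainer = Holder(self.model)  # the "trainer" now also references the model


step = Step()
alive = weakref.ref(step.model)

del step.model  # drops only the step's reference
gc.collect()
print(alive() is None)  # False: the weights survive via step.trainer.model
```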
art/step/step.py
Outdated
+       # save model to file
+       ModelSaver().save(self.model, self.get_step_id(), self.name)
+       del self.model
+       if torch.cuda.is_available():
I'd suggest moving this into a method named something like `clean_after_step`.
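Something along these lines, as a sketch only; the method name `clean_after_step` and the exact cleanup sequence are assumptions, not part of this PR:

```python
import gc

import torch


class ModelStep:  # sketch: only the cleanup hook is shown
    def clean_after_step(self) -> None:
        """Release the trained model once its checkpoint has been written."""
        if hasattr(self, "model"):
            del self.model            # drop the step's own reference
        gc.collect()                  # collect cycles that may keep tensors alive
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached CUDA blocks to the allocator
```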
art/step/step.py
Outdated
            trainer_kwargs (Dict, optional): Arguments to be passed to the trainer. Defaults to {}.
            logger (Optional[Union[Logger, Iterable[Logger], bool]], optional): Logger to be used. Defaults to None.
        """
        super().__init__()
        if logger is not None:
            logger.add_tags(self.name)

-       self.model = model
+       assert isinstance(model_func, Callable)
This is too weak; basically any torch module is Callable.
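A quick demonstration of why the check passes for things it shouldn't, plus one possible tightening; rejecting `nn.Module` instances is only a sketch of a stricter contract, not necessarily the right one for `model_func`:

```python
from typing import Callable

import torch.nn as nn

model_instance = nn.Linear(4, 2)         # an already built model
model_factory = lambda: nn.Linear(4, 2)  # what model_func is meant to be

# Both pass the current assertion, because nn.Module defines __call__:
assert isinstance(model_instance, Callable)
assert isinstance(model_factory, Callable)


def check_model_func(model_func):
    """Stricter (still imperfect) guard: accept factories, reject built models."""
    if isinstance(model_func, nn.Module) or not callable(model_func):
        raise TypeError("model_func must be a zero-argument factory returning a model")
```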
art/step/step.py
Outdated
-       self.model = model
+       assert isinstance(model_func, Callable)
+       self.model = model_func()
This shouldn't be run during initialization but rather at step execution.
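A sketch of the deferral being suggested; the method name `initialize_model` and the abbreviated `__call__` body are assumptions. The point is only that `__init__` stores the factory and the model is built when the step actually runs:

```python
from typing import Callable


class ModelStep:  # sketch, not the actual class
    def __init__(self, model_func: Callable):
        self.model_func = model_func  # keep only the factory; nothing heavy yet
        self.model = None

    def initialize_model(self):
        # Build the model lazily, right before the step runs.
        if self.model is None:
            self.model = self.model_func()

    def __call__(self, *args, **kwargs):
        self.initialize_model()
        # ... train / validate, save the checkpoint, then clean up the model ...
```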
My findings are as follows:

Here are the results I obtained. Note that memory does not change while steps are being added, and artificially triggering dataloader creation also does not increase memory consumption. After the first step runs, consumption increases and then stays constant across all subsequent step runs.

[screenshots: MNISTModel, ResNet-152]

Here is what happens when we don't clean anything: memory consumption increases with every model added.

[screenshot: ResNet-152]
To reproduce the memory consumption I used the following script; it must be placed inside the exp1 folder to work.

```python
import os

import datasets
import psutil
import torch
import torch.nn as nn
from art_data import MNISTDataModule
from einops import rearrange
from models import MNISTModel
from torchvision.models import resnet152

from art.core.base_components.base_model import ArtModule
from art.experiment.Experiment import ArtProject
from art.step.checks import CheckScoreCloseTo
from art.step.steps import CheckLossOnInit, Overfit, OverfitOneBatch
from art.utils.enums import (BATCH, INPUT, LOSS, PREDICTION, TARGET,
                             TRAIN_LOSS, VALIDATION_LOSS)

process = psutil.Process()


def get_data_module(n_train=200):
    mnist_data = datasets.load_dataset("mnist")
    mnist_data = mnist_data.rename_columns({"image": INPUT, "label": TARGET})
    mnist_data['train'] = mnist_data['train'].select(range(n_train))
    return MNISTDataModule(mnist_data)


class DummyModule(ArtModule):
    def __init__(self):
        super().__init__()
        self.model = resnet152(pretrained=False)
        self.linear = nn.Linear(1000, 10)

    def forward(self, x):
        return self.linear(self.model(x))

    def log_params(self):
        return {
            "lr": 0.001,
        }

    def parse_data(self, data):
        X = rearrange(data[BATCH][INPUT], "b h w -> b 1 h w").float()
        X = X.repeat(1, 3, 1, 1)
        X /= 255
        X = (X - 0.1307) / 0.3081  # normalize with MNIST mean/std
        return {INPUT: X, TARGET: data[BATCH][TARGET]}

    def predict(self, data):
        return {PREDICTION: self.model(data[INPUT]), **data}

    def compute_loss(self, data):
        loss = data["CrossEntropyLoss"]  # the user must know the metric class names
        return {LOSS: loss}

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.001)
        return optimizer


for model_class in [MNISTModel, DummyModule]:
    mnist_data_module = get_data_module()
    memory_used = {
        "before": [process.memory_info().rss],
        "at_step_adding": [],
        "just_before_step_run": [],
        "after_each_step_run": [],
    }
    mnist_data_module.train_dataloader()
    mnist_data_module.val_dataloader()
    memory_used['after_loaders'] = process.memory_info().rss

    project = ArtProject("exp1", mnist_data_module)
    ce_loss = nn.CrossEntropyLoss()
    project.register_metrics([ce_loss])

    for i in range(1):
        project.add_step(
            CheckLossOnInit(lambda: model_class()),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])
        project.add_step(
            Overfit(lambda: model_class()),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])
        project.add_step(
            OverfitOneBatch(lambda: model_class(), number_of_steps=1),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])
        memory_used["at_step_adding"].append(process.memory_info().rss)

    for step in project.steps:
        project.metric_calculator.compile(step["skipped_metrics"])
        step, checks = step["step"], step["checks"]
        project.state.current_step = step
        memory_used["just_before_step_run"].append(process.memory_info().rss)
        step(project.state.step_states, project.datamodule, project.metric_calculator)
        memory_used["after_each_step_run"].append(process.memory_info().rss)

    print(model_class.__name__)
    print(memory_used)
```
@@ -34,7 +34,8 @@ def wrapper_visualize_input(*args, **kwargs):
        function: Decorated function.
    """
    if visualizing_function_in is not None:
-       visualizing_function_in(*args, **kwargs)
+       to_be_passed = args[1:]
Why do we need this?
Now that we decorate classes rather than instances, `self` is passed as the first argument here. I think we shouldn't give the visualizing function access to the model; that's why I did this.
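A minimal reproduction of that difference, with an assumed decorator shape (`visualize_input` and its structure here are illustrative, not the library's exact code):

```python
def visualize_input(visualizing_function_in):
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Applied inside a class body, `func` is a plain function, so
            # args[0] is `self`; forward only the real inputs to the visualizer.
            to_be_passed = args[1:]
            visualizing_function_in(*to_be_passed, **kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorator


class Model:
    @visualize_input(print)  # decorating the method on the class, not a bound instance method
    def parse_data(self, data):
        return data


Model().parse_data({"x": 1})  # prints {'x': 1} without ever handing `self` (the model) to print
```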
@@ -165,7 +168,6 @@ def do(self, previous_states: Dict):
        Args:
            previous_states (Dict): previous states
        """
-       self.model.turn_on_model_regularizations()
Why did you delete this?
We don't have access to the model here anymore. I think this should be rethought when the Regularize step is developed.
@@ -78,7 +81,7 @@ class OverfitOneBatch(ModelStep):
    def __init__(
        self,
        model: ArtModule,
-       number_of_steps: int = 100,
+       number_of_steps: int = 50,
Okay, but does it matter?
Oh, I changed it for the new tutorial. Honestly, I don't know what the best default value is.
        model.ml_train({"dataloader": self.datamodule.train_dataloader()})
        model.set_metric_calculator(self.metric_calculator)
        result = self.trainer.validate(model=model, datamodule=self.datamodule)
        self.results["scores"].update(result[0])
I wonder if we could move `ml_train` into the step; then these lines would be handled by default inside `initialize_model`.
Since `ml_train` will only be used by the EvaluateBaseline step, I would rather keep it here.
Force-pushed from 37f6bc6 to c66c722.
This is a work-in-progress initial idea of how to handle it. If it looks good, I will adjust art_template to match it.