
WIP: Passing model as callable to step#141 #111 (#148)

Merged: 7 commits merged into develop on Nov 16, 2023

Conversation

trebacz626 (Collaborator):

This is a work-in-progress initial idea of how to handle it. If it looks good, I will adjust art_template accordingly.
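
For illustration, a minimal sketch of the intended usage under this change (MyArtModule is a placeholder for any ArtModule subclass; the exact step signatures are still in flux in this PR):

from art.step.steps import Overfit

# The step receives a zero-argument callable that builds the model, rather than
# an already-constructed ArtModule, so each step can create (and later release)
# its own instance.
overfit_step = Overfit(lambda: MyArtModule())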

@SebChw (Owner) left a comment:

Please make sure this indeed works.

art/step/step.py Outdated
trainer_kwargs: Dict = {},
logger: Optional[Union[Logger, Iterable[Logger], bool]] = None,
):
"""
Initialize a model-based step.

Args:
model (ArtModule): The model associated with this step.
model_func (ArtModule): The model associated with this step.
SebChw (Owner):
correct docs

art/step/step.py Outdated
@@ -227,6 +229,12 @@ def __call__(
"""
self.model.set_metric_calculator(metric_calculator)
super().__call__(previous_states, datamodule, metric_calculator)
#save model to file
ModelSaver().save(self.model, self.get_step_id(), self.name)
SebChw (Owner):
For me it would be lovely if every run saved the model. I'd really hate for my best model to be overwritten just because I wanted to try the step with a different configuration.

SebChw (Owner), Nov 10, 2023:

Since Lightning can save models by itself, I wonder whether this functionality is really necessary. We shouldn't write new functionality that is already provided.
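
For reference, a rough sketch of leaning on Lightning's built-in checkpointing instead of a custom ModelSaver (the import path, dirpath and monitor values here are illustrative and may differ from how the project configures its trainer):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Lightning persists checkpoints itself when a ModelCheckpoint callback is
# attached, so the step would not need to save the model manually.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/overfit",  # illustrative location
    save_top_k=1,                   # keep only the best checkpoint
    monitor="validation_loss",      # illustrative logged metric name
)
trainer = Trainer(callbacks=[checkpoint_cb], max_epochs=10)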

art/step/step.py Outdated
@@ -227,6 +229,12 @@ def __call__(
"""
self.model.set_metric_calculator(metric_calculator)
super().__call__(previous_states, datamodule, metric_calculator)
#save model to file
ModelSaver().save(self.model, self.get_step_id(), self.name)
del self.model
SebChw (Owner):
Have you checked that this actually works and the memory is freed?

What if someone wants to read step.model now? Maybe it could be changed into a model path or something?

SebChw (Owner):

This won't solve the issue; the model will still be reachable from the trainer.

art/step/step.py Outdated
#save model to file
ModelSaver().save(self.model, self.get_step_id(), self.name)
del self.model
if torch.cuda.is_available():
SebChw (Owner):

I'd suggest moving this into a function like clean_after_step or something similar.
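
A sketch of what such a helper could look like (the name clean_after_step and the exact set of attributes it drops are assumptions, not the final API):

import gc

import torch

def clean_after_step(step) -> None:
    """Drop the step's model and trainer references and release cached memory."""
    # Deleting only step.model is not enough: the trainer keeps its own
    # reference to the model, so both have to go before the memory is freed.
    if hasattr(step, "model"):
        del step.model
    if hasattr(step, "trainer"):
        del step.trainer
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()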

art/step/step.py Outdated
trainer_kwargs (Dict, optional): Arguments to be passed to the trainer. Defaults to {}.
logger (Optional[Union[Logger, Iterable[Logger], bool]], optional): Logger to be used. Defaults to None.
"""
super().__init__()
if logger is not None:
logger.add_tags(self.name)

self.model = model
assert isinstance(model_func, Callable)
SebChw (Owner):

This is too weak; basically any torch module is Callable.
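
One possible tightening of the check (the exact condition is only a suggestion; the point is that nn.Module instances are themselves callable, so callable() alone lets an already-built model through):

import torch.nn as nn

# Require a factory, not a model: reject anything that is already an nn.Module.
if not callable(model_func) or isinstance(model_func, nn.Module):
    raise TypeError(
        "model_func must be a zero-argument callable returning an ArtModule, "
        "not an already-constructed model."
    )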

SebChw (Owner): [screenshot attached]

art/step/step.py Outdated

self.model = model
assert isinstance(model_func, Callable)
self.model = model_func()
SebChw (Owner):

This shouldn't be run during initialization but rather at step execution.
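
A minimal sketch of deferring the call until the step actually runs (the class and method names here are illustrative, not the repo's final ModelStep API):

from typing import Callable

class LazyModelStep:
    """Stores only the factory; builds the model when the step executes."""

    def __init__(self, model_func: Callable):
        self.model_func = model_func  # keep the factory, not the model
        self.model = None

    def initialize_model(self):
        # Construct the model lazily so it does not occupy memory while the
        # step merely sits registered in the project.
        self.model = self.model_func()

    def __call__(self, *args, **kwargs):
        self.initialize_model()
        # ... run training/validation, then drop self.model and the trainer ...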

SebChw (Owner) commented Nov 14, 2023:

My findings are as follows:

  1. If we use self.model and just write del self.model, the object itself won't be removed, as it is still stored in self.trainer.
  2. It would be better if we never initialize self.model inside the step (it is just much safer; other functions in our code depend on self.model, and we will simply remove this dependency). If we do, we then need to clean both self.model and self.trainer.
  3. del self.trainer cleans up all the memory held by the model.
  4. There is still some leak after each step. The more self.train or self.validate calls, the bigger this leak; I know this because Overfit leads to a bigger leak than OverfitOneBatch or Validate.
  5. This leak depends on whether we passed an already-initialized dataloader or not.
  6. I compared MNISTModel with my model that uses ResNet152 and got exactly the same leaks, so the model itself is no longer the problem. I think the remaining leak can be neglected.

Here are the results I obtained. Note that memory doesn't change when steps are added. Also, artificially triggering dataloader creation doesn't lead to increased memory consumption. After I run the first step, consumption increases and then stays constant throughout all subsequent step runs.

MNISTModel
{'before': [417382400], 'at_step_adding': [425689088, 425689088], 'just_before_step_run': [425689088, 2268672000, 2317844480], 'after_each_step_run': [2268672000, 2317844480, 2318102528], 'after_loaders': 417390592}

ResNet-152
{'before': [2319036416], 'at_step_adding': [2319081472, 2319081472], 'just_before_step_run': [2319081472, 2335186944, 2474815488], 'after_each_step_run': [2335186944, 2474815488, 2474905600], 'after_loaders': 2319036416}

Here is what happens when we don't clean anything: memory consumption increases with every model added.

ResNet-152
{'before': [417574912], 'at_step_adding': [425877504, 425877504], 'just_before_step_run': [425877504, 2513481728, 3409813504], 'after_each_step_run': [2513481728, 3409813504, 4340330496], 'after_loaders': 417579008}

SebChw (Owner) commented Nov 14, 2023:

To reproduce the memory consumption I used the following script, which must be placed inside the exp1 folder to work:

import os

import datasets
import psutil
import torch
import torch.nn as nn
from art_data import MNISTDataModule
from einops import rearrange
from models import MNISTModel
from torchvision.models import resnet152

from art.core.base_components.base_model import ArtModule
from art.experiment.Experiment import ArtProject
from art.step.checks import CheckScoreCloseTo
from art.step.steps import CheckLossOnInit, Overfit, OverfitOneBatch
from art.utils.enums import (BATCH, INPUT, LOSS, PREDICTION, TARGET,
                             TRAIN_LOSS, VALIDATION_LOSS)

process = psutil.Process()

def get_data_module(n_train=200):
    mnist_data = datasets.load_dataset("mnist")

    mnist_data = mnist_data.rename_columns({"image": INPUT, "label": TARGET})
    mnist_data['train'] = mnist_data['train'].select(range(n_train))

    return MNISTDataModule(mnist_data)

class DummyModule(ArtModule):
    def __init__(self):
        super().__init__()
        self.model = resnet152(pretrained=False)
        self.linear = nn.Linear(1000, 10)

    def forward(self, x):
        return self.linear(self.model(x))
    
    def log_params(self):
        return {
            "lr": 0.001,
        }
    
    def parse_data(self, data):
        X = rearrange(data[BATCH][INPUT], "b h w -> b 1 h w").float()
        X = X.repeat(1, 3, 1, 1)
        X /= 255
        X = (X - 0.1307) / 0.3081  # mean value of MNIST dataset
        return {INPUT: X, TARGET: data[BATCH][TARGET]}

    def predict(self, data):
        return {PREDICTION: self.model(data[INPUT]), **data}

    def compute_loss(self, data):
        loss = data["CrossEntropyLoss"]  # user must know the classes names
        return {LOSS: loss}

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.001)
        return optimizer


for model_class in [MNISTModel, DummyModule]:
    mnist_data_module = get_data_module()

    memory_used = {
        "before": [process.memory_info().rss],
        "at_step_adding": [],
        "just_before_step_run": [],
        "after_each_step_run": [],
    }

    mnist_data_module.train_dataloader()
    mnist_data_module.val_dataloader()
    memory_used['after_loaders'] = process.memory_info().rss

    project = ArtProject("exp1", mnist_data_module)
    ce_loss = nn.CrossEntropyLoss()


    project.register_metrics([ce_loss])
    for i in range(1):
        project.add_step(
            CheckLossOnInit(lambda: model_class()),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])
        project.add_step(
            Overfit(lambda: model_class()),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])
        project.add_step(
            OverfitOneBatch(lambda: model_class(), number_of_steps=1),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])

        # Memory is sampled twice here, which is why the reported
        # 'at_step_adding' list contains two identical values per iteration.
        memory_used["at_step_adding"].append(process.memory_info().rss)
        memory_used["at_step_adding"].append(process.memory_info().rss)


    for step in project.steps:
        project.metric_calculator.compile(step["skipped_metrics"])
        step, checks = step["step"], step["checks"]
        project.state.current_step = step
        memory_used["just_before_step_run"].append(process.memory_info().rss)
        step(project.state.step_states, project.datamodule, project.metric_calculator)
        memory_used["after_each_step_run"].append(process.memory_info().rss)

    print(model_class.__name__)
    print(memory_used)

@@ -34,7 +34,8 @@ def wrapper_visualize_input(*args, **kwargs):
function: Decorated function.
"""
if visualizing_function_in is not None:
visualizing_function_in(*args, **kwargs)
to_be_passed = args[1:]
Collaborator:

Why do we need this?

SebChw (Owner):

Now that we decorate classes and not instances, self is passed as the first argument here. I think we shouldn't give the visualizing function access to the model; that's why I did this.
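
A simplified reconstruction of the idea (the outer decorator name and full body below are assumptions; only the args[1:] handling mirrors the diff above):

def visualize_input(visualizing_function_in=None):
    def decorator(func):
        def wrapper_visualize_input(*args, **kwargs):
            if visualizing_function_in is not None:
                # The decorator now wraps methods on the class, so args[0] is
                # `self`. Only the actual inputs are forwarded, which keeps the
                # visualizing function from getting a handle to the model.
                to_be_passed = args[1:]
                visualizing_function_in(*to_be_passed, **kwargs)
            return func(*args, **kwargs)
        return wrapper_visualize_input
    return decorator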

@@ -165,7 +168,6 @@ def do(self, previous_states: Dict):
Args:
previous_states (Dict): previous states
"""
self.model.turn_on_model_regularizations()
Collaborator:

Why did you delete this?

SebChw (Owner):

We don't have access to the model here anymore. I think this should be rethought when the Regularize step is developed.

@@ -78,7 +81,7 @@ class OverfitOneBatch(ModelStep):
def __init__(
self,
model: ArtModule,
number_of_steps: int = 100,
number_of_steps: int = 50,
Collaborator:

Okay, but does it matter?

SebChw (Owner):

Oh, I changed it for the new tutorial. Actually, I'm not sure what the best default value is.

model.ml_train({"dataloader": self.datamodule.train_dataloader()})
model.set_metric_calculator(self.metric_calculator)
result = self.trainer.validate(model=model, datamodule= self.datamodule)
self.results["scores"].update(result[0])
Collaborator:

I wonder if we could move ml_train into the step; then those lines would be handled by default inside initialize_model.

SebChw (Owner):

Since ml_train will be used just by the EvaluateBaseline step, I would rather keep it here.

@SebChw force-pushed the feature/141-passing-model-to-step branch from 37f6bc6 to c66c722 on November 16, 2023 08:54
@SebChw changed the title from "WIP: Passing model as callable to step#141" to "WIP: Passing model as callable to step#141 #111" on Nov 16, 2023
@SebChw merged commit b54a8d8 into develop on Nov 16, 2023
@SebChw deleted the feature/141-passing-model-to-step branch on November 16, 2023 09:35