
WIP: Passing model as callable to step#141 #111 (#148)

Merged: 7 commits merged into develop on Nov 16, 2023

Conversation

trebacz626 (Collaborator):

This is a work-in-progress initial idea of how to handle it. If it looks good, I will adjust art_template accordingly.
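
For illustration, a minimal sketch of the intended usage under this change (MyArtModule is a placeholder for any ArtModule subclass; the exact step signatures are still in flux in this PR):

from art.step.steps import Overfit

# The step receives a zero-argument callable that builds the model, rather than
# an already-constructed ArtModule, so each step can create (and later release)
# its own instance.
overfit_step = Overfit(lambda: MyArtModule())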

@SebChw (Owner) left a comment:

Please make sure this indeed works.

art/step/step.py Outdated
trainer_kwargs: Dict = {},
logger: Optional[Union[Logger, Iterable[Logger], bool]] = None,
):
"""
Initialize a model-based step.

Args:
model (ArtModule): The model associated with this step.
model_func (ArtModule): The model associated with this step.
SebChw (Owner):
correct docs

art/step/step.py Outdated
@@ -227,6 +229,12 @@ def __call__(
"""
self.model.set_metric_calculator(metric_calculator)
super().__call__(previous_states, datamodule, metric_calculator)
#save model to file
ModelSaver().save(self.model, self.get_step_id(), self.name)
SebChw (Owner):
For me it would be lovely if every run saved the model. I'd really hate for my best model to be overwritten just because I wanted to try the step with a different configuration.

SebChw (Owner), Nov 10, 2023:

Since Lightning can save models by itself, I wonder whether this functionality is really necessary. We shouldn't write new functionality that is already provided.
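
For reference, a rough sketch of leaning on Lightning's built-in checkpointing instead of a custom ModelSaver (the import path, dirpath and monitor values here are illustrative and may differ from how the project configures its trainer):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Lightning persists checkpoints itself when a ModelCheckpoint callback is
# attached, so the step would not need to save the model manually.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/overfit",  # illustrative location
    save_top_k=1,                   # keep only the best checkpoint
    monitor="validation_loss",      # illustrative logged metric name
)
trainer = Trainer(callbacks=[checkpoint_cb], max_epochs=10)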

art/step/step.py Outdated
@@ -227,6 +229,12 @@ def __call__(
"""
self.model.set_metric_calculator(metric_calculator)
super().__call__(previous_states, datamodule, metric_calculator)
#save model to file
ModelSaver().save(self.model, self.get_step_id(), self.name)
del self.model
SebChw (Owner):
Have you checked that this actually works and the memory is freed?

What if someone wants to read step.model now? Maybe it could be changed into a model path or something?

SebChw (Owner):

This won't solve the issue; the model will still be reachable from the trainer.

art/step/step.py Outdated
#save model to file
ModelSaver().save(self.model, self.get_step_id(), self.name)
del self.model
if torch.cuda.is_available():
SebChw (Owner):

I'd suggest moving this into a function like clean_after_step or something similar.
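
A sketch of what such a helper could look like (the name clean_after_step and the exact set of attributes it drops are assumptions, not the final API):

import gc

import torch

def clean_after_step(step) -> None:
    """Drop the step's model and trainer references and release cached memory."""
    # Deleting only step.model is not enough: the trainer keeps its own
    # reference to the model, so both have to go before the memory is freed.
    if hasattr(step, "model"):
        del step.model
    if hasattr(step, "trainer"):
        del step.trainer
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()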

art/step/step.py Outdated
trainer_kwargs (Dict, optional): Arguments to be passed to the trainer. Defaults to {}.
logger (Optional[Union[Logger, Iterable[Logger], bool]], optional): Logger to be used. Defaults to None.
"""
super().__init__()
if logger is not None:
logger.add_tags(self.name)

self.model = model
assert isinstance(model_func, Callable)
SebChw (Owner):

This is too weak; basically any torch module is Callable.
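
One possible tightening of the check (the exact condition is only a suggestion; the point is that nn.Module instances are themselves callable, so callable() alone lets an already-built model through):

import torch.nn as nn

# Require a factory, not a model: reject anything that is already an nn.Module.
if not callable(model_func) or isinstance(model_func, nn.Module):
    raise TypeError(
        "model_func must be a zero-argument callable returning an ArtModule, "
        "not an already-constructed model."
    )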

SebChw (Owner): [screenshot attached]

art/step/step.py Outdated

self.model = model
assert isinstance(model_func, Callable)
self.model = model_func()
SebChw (Owner):

This shouldn't be run during initialization but rather at step execution.
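
A minimal sketch of deferring the call until the step actually runs (the class and method names here are illustrative, not the repo's final ModelStep API):

from typing import Callable

class LazyModelStep:
    """Stores only the factory; builds the model when the step executes."""

    def __init__(self, model_func: Callable):
        self.model_func = model_func  # keep the factory, not the model
        self.model = None

    def initialize_model(self):
        # Construct the model lazily so it does not occupy memory while the
        # step merely sits registered in the project.
        self.model = self.model_func()

    def __call__(self, *args, **kwargs):
        self.initialize_model()
        # ... run training/validation, then drop self.model and the trainer ...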

SebChw (Owner) commented Nov 14, 2023:

My findings are as follows:

  1. If we use self.model and just write del self.model, the object itself won't be removed, as it is still stored in self.trainer.
  2. It would be better if we never initialize self.model inside the step (it is just much safer; other functions in our code depend on self.model, and we will simply remove this dependency). If we do, we then need to clean both self.model and self.trainer.
  3. del self.trainer cleans up all the memory held by the model.
  4. There is still some leak after each step. The more self.train or self.validate calls, the bigger this leak; I know this because Overfit leads to a bigger leak than OverfitOneBatch or Validate.
  5. This leak depends on whether we passed an already-initialized dataloader or not.
  6. I compared MNISTModel with my model that uses ResNet152 and got exactly the same leaks, so the model itself is no longer the problem. I think the remaining leak can be neglected.

Here are the results I obtained. Note that memory doesn't change when steps are added. Also, artificially triggering dataloader creation doesn't lead to increased memory consumption. After I run the first step, consumption increases and then stays constant throughout all subsequent step runs.

MNISTModel
{'before': [417382400], 'at_step_adding': [425689088, 425689088], 'just_before_step_run': [425689088, 2268672000, 2317844480], 'after_each_step_run': [2268672000, 2317844480, 2318102528], 'after_loaders': 417390592}

ResNet-152
{'before': [2319036416], 'at_step_adding': [2319081472, 2319081472], 'just_before_step_run': [2319081472, 2335186944, 2474815488], 'after_each_step_run': [2335186944, 2474815488, 2474905600], 'after_loaders': 2319036416}

Here is what happens when we don't clean anything: memory consumption increases with every model added.

ResNet-152
{'before': [417574912], 'at_step_adding': [425877504, 425877504], 'just_before_step_run': [425877504, 2513481728, 3409813504], 'after_each_step_run': [2513481728, 3409813504, 4340330496], 'after_loaders': 417579008}

SebChw (Owner) commented Nov 14, 2023:

To reproduce the memory consumption I used the following script, which must be placed inside the exp1 folder to work:

import os

import datasets
import psutil
import torch
import torch.nn as nn
from art_data import MNISTDataModule
from einops import rearrange
from models import MNISTModel
from torchvision.models import resnet152

from art.core.base_components.base_model import ArtModule
from art.experiment.Experiment import ArtProject
from art.step.checks import CheckScoreCloseTo
from art.step.steps import CheckLossOnInit, Overfit, OverfitOneBatch
from art.utils.enums import (BATCH, INPUT, LOSS, PREDICTION, TARGET,
                             TRAIN_LOSS, VALIDATION_LOSS)

process = psutil.Process()

def get_data_module(n_train=200):
    mnist_data = datasets.load_dataset("mnist")

    mnist_data = mnist_data.rename_columns({"image": INPUT, "label": TARGET})
    mnist_data['train'] = mnist_data['train'].select(range(n_train))

    return MNISTDataModule(mnist_data)

class DummyModule(ArtModule):
    def __init__(self):
        super().__init__()
        self.model = resnet152(pretrained=False)
        self.linear = nn.Linear(1000, 10)

    def forward(self, x):
        return self.linear(self.model(x))
    
    def log_params(self):
        return {
            "lr": 0.001,
        }
    
    def parse_data(self, data):
        X = rearrange(data[BATCH][INPUT], "b h w -> b 1 h w").float()
        X = X.repeat(1, 3, 1, 1)
        X /= 255
        X = (X - 0.1307) / 0.3081  # mean value of MNIST dataset
        return {INPUT: X, TARGET: data[BATCH][TARGET]}

    def predict(self, data):
        return {PREDICTION: self.model(data[INPUT]), **data}

    def compute_loss(self, data):
        loss = data["CrossEntropyLoss"]  # user must know the classes names
        return {LOSS: loss}

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.001)
        return optimizer


for model_class in [MNISTModel, DummyModule]:
    mnist_data_module = get_data_module()

    memory_used = {
        "before": [process.memory_info().rss],
        "at_step_adding": [],
        "just_before_step_run": [],
        "after_each_step_run": [],
    }

    mnist_data_module.train_dataloader()
    mnist_data_module.val_dataloader()
    memory_used['after_loaders'] = process.memory_info().rss

    project = ArtProject("exp1", mnist_data_module)
    ce_loss = nn.CrossEntropyLoss()


    project.register_metrics([ce_loss])
    for i in range(1):
        project.add_step(
            CheckLossOnInit(lambda: model_class()),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])
        project.add_step(
            Overfit(lambda: model_class()),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])
        project.add_step(
            OverfitOneBatch(lambda: model_class(), number_of_steps=1),
            [CheckScoreCloseTo(metric=ce_loss, value=2.3, rel_tol=0.1)])

        # Memory is sampled twice here, which is why the reported
        # 'at_step_adding' list contains two identical values per iteration.
        memory_used["at_step_adding"].append(process.memory_info().rss)
        memory_used["at_step_adding"].append(process.memory_info().rss)


    for step in project.steps:
        project.metric_calculator.compile(step["skipped_metrics"])
        step, checks = step["step"], step["checks"]
        project.state.current_step = step
        memory_used["just_before_step_run"].append(process.memory_info().rss)
        step(project.state.step_states, project.datamodule, project.metric_calculator)
        memory_used["after_each_step_run"].append(process.memory_info().rss)

    print(model_class.__name__)
    print(memory_used)

@@ -34,7 +34,8 @@ def wrapper_visualize_input(*args, **kwargs):
function: Decorated function.
"""
if visualizing_function_in is not None:
visualizing_function_in(*args, **kwargs)
to_be_passed = args[1:]
Collaborator:

Why do we need this?

SebChw (Owner):

Now that we decorate classes and not instances, self is passed as the first argument here. I think we shouldn't give the visualizing function access to the model; that's why I did this.
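
A simplified reconstruction of the idea (the outer decorator name and full body below are assumptions; only the args[1:] handling mirrors the diff above):

def visualize_input(visualizing_function_in=None):
    def decorator(func):
        def wrapper_visualize_input(*args, **kwargs):
            if visualizing_function_in is not None:
                # The decorator now wraps methods on the class, so args[0] is
                # `self`. Only the actual inputs are forwarded, which keeps the
                # visualizing function from getting a handle to the model.
                to_be_passed = args[1:]
                visualizing_function_in(*to_be_passed, **kwargs)
            return func(*args, **kwargs)
        return wrapper_visualize_input
    return decorator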

@@ -165,7 +168,6 @@ def do(self, previous_states: Dict):
Args:
previous_states (Dict): previous states
"""
self.model.turn_on_model_regularizations()
Collaborator:

Why did you delete this?

SebChw (Owner):

We don't have access to the model here anymore. I think this should be rethought when the Regularize step is developed.

@@ -78,7 +81,7 @@ class OverfitOneBatch(ModelStep):
def __init__(
self,
model: ArtModule,
number_of_steps: int = 100,
number_of_steps: int = 50,
Collaborator:

Okay, but does it matter?

SebChw (Owner):

Oh, I changed it for the new tutorial. Actually, I'm not sure what the best default value is.

model.ml_train({"dataloader": self.datamodule.train_dataloader()})
model.set_metric_calculator(self.metric_calculator)
result = self.trainer.validate(model=model, datamodule= self.datamodule)
self.results["scores"].update(result[0])
Collaborator:

I wonder if we could move ml_train into the step; then those lines would be handled by default inside initialize_model.

SebChw (Owner):

Since ml_train will be used just by the EvaluateBaseline step, I would rather keep it here.

@SebChw force-pushed the feature/141-passing-model-to-step branch from 37f6bc6 to c66c722 on November 16, 2023 08:54
@SebChw changed the title from "WIP: Passing model as callable to step#141" to "WIP: Passing model as callable to step#141 #111" on Nov 16, 2023
@SebChw merged commit b54a8d8 into develop on Nov 16, 2023
@SebChw deleted the feature/141-passing-model-to-step branch on November 16, 2023 09:35