
Several issues in nni.nas.pytorch #3047

Closed
HeekangPark opened this issue Oct 30, 2020 · 1 comment
@HeekangPark
Contributor

Environment:

  • NNI version: master (999.0.0-developing)
  • NNI mode (local|remote|pai): local
  • Client OS: Ubuntu 18.04 LTS
  • Server OS (for remote mode only):
  • Python version: 3.7
  • PyTorch/TensorFlow version: 1.6
  • Is conda/virtualenv/venv used?: conda used
  • Is running in Docker?: no

SPOS Trainer

In the SPOS example (examples/nas/spos), DALI is used to load data. DALI automatically puts the data on the GPU, so the trainer (nni.nas.pytorch.spos.trainer.py) works without issue. However, if I use the default PyTorch DataLoader, it fails to run.

I believe nni.nas.pytorch.spos.trainer.py should include lines that move the data to the GPU:

def train_one_epoch(self, epoch):
    self.model.train()
    meters = AverageMeterGroup()
    for step, (x, y) in enumerate(self.train_loader):
        x, y = x.to(self.device), y.to(self.device)  # added line
        self.optimizer.zero_grad()
        self.mutator.reset()
        # ....

def validate_one_epoch(self, epoch):
    self.model.eval()
    meters = AverageMeterGroup()
    with torch.no_grad():
        for step, (x, y) in enumerate(self.valid_loader):
            x, y = x.to(self.device), y.to(self.device)  # added line
            self.mutator.reset()
            logits = self.model(x)
        # ....
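
For context, here is a minimal self-contained sketch (toy dataset and model, not the SPOS supernet) of why the extra lines are needed: a plain torch.utils.data.DataLoader yields CPU tensors, while a DALI pipeline already places batches on the GPU.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 2).to(device)  # stand-in for the supernet
dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))
loader = DataLoader(dataset, batch_size=4)

for x, y in loader:
    # x and y come back on the CPU; DALI would already return GPU tensors
    x, y = x.to(device), y.to(device)  # without this, model(x) errors out on a CUDA model
    logits = model(x)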

I only checked SPOS, but I think other trainers have similar problems. They should be fixed as well.

apply_fixed_architecture

I think the function apply_fixed_architecture() in nni.nas.pytorch.fixed should have an option to suppress its log messages, for example:

class FixedArchitecture(Mutator):
    def __init__(self, model, fixed_arc, strict=True, verbose=True):  # updated line
        super().__init__(model)
        self._fixed_arc = fixed_arc
        self._verbose = verbose  # added line
        # ....

    # ...

    def replace_layer_choice(self, module=None, prefix=""):
        if module is None:
            module = self.model
        for name, mutable in module.named_children():
            global_name = (prefix + "." if prefix else "") + name
            if isinstance(mutable, LayerChoice):
                chosen = self._fixed_arc[mutable.key]
                if sum(chosen) == 1 and max(chosen) == 1 and not mutable.return_mask:
                    if self._verbose:  # added line
                        _logger.info("Replacing %s with candidate number %d.", global_name, chosen.index(1))  # updated line
                    setattr(module, name, mutable[chosen.index(1)])
                else:
                    if self._verbose and mutable.return_mask:  # updated line
                        _logger.info("`return_mask` flag of %s is true. As it relies on the behavior of LayerChoice, "
                                     "LayerChoice will not be replaced.", global_name)
                    # remove unused parameters
                    for ch, n in zip(chosen, mutable.names):
                        if ch == 0 and not isinstance(ch, float):
                            setattr(mutable, n, None)
            else:
                self.replace_layer_choice(mutable, global_name)


def apply_fixed_architecture(model, fixed_arc, verbose=True):   # updated line
    if isinstance(fixed_arc, str):
        with open(fixed_arc) as f:
            fixed_arc = json.load(f)
    architecture = FixedArchitecture(model, fixed_arc, verbose=verbose)  # updated line
    architecture.reset()
    # ....
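
For illustration, usage with the proposed flag would look like the sketch below; the model and JSON path are placeholders, and the verbose argument is of course hypothetical until this change is merged.

from nni.nas.pytorch.fixed import apply_fixed_architecture

# "final_arch.json" is a placeholder path to an exported architecture;
# model is any module containing LayerChoice mutables.
apply_fixed_architecture(model, "final_arch.json", verbose=False)  # proposed: silences the replacement logs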
ultmaster self-assigned this on Oct 30, 2020
@ultmaster
Contributor

Hi @HeekangPark, thanks for your kind suggestions. I think both of them are great and you can submit PRs for that. Would love to have you as a contributor. :)
