
Several issues in nni.nas.pytorch #3047

Closed
HeekangPark opened this issue Oct 30, 2020 · 1 comment
@HeekangPark
Contributor

Environment:

  • NNI version: master (999.0.0-developing)
  • NNI mode (local|remote|pai): local
  • Client OS: Ubuntu 18.04 LTS
  • Server OS (for remote mode only):
  • Python version: 3.7
  • PyTorch/TensorFlow version: 1.6
  • Is conda/virtualenv/venv used?: conda used
  • Is running in Docker?: no

SPOS Trainer

In the SPOS example (examples/nas/spos), DALI is used to load data. DALI automatically puts the data on the GPU, so the trainer (nni.nas.pytorch.spos.trainer.py) works without issue. However, if I use the default PyTorch DataLoader, it fails to run.

I believe nni.nas.pytorch.spos.trainer.py should include lines that move the data to the GPU:

def train_one_epoch(self, epoch):
    self.model.train()
    meters = AverageMeterGroup()
    for step, (x, y) in enumerate(self.train_loader):
        x, y = x.to(self.device), y.to(self.device)  # added line
        self.optimizer.zero_grad()
        self.mutator.reset()
        # ....

def validate_one_epoch(self, epoch):
    self.model.eval()
    meters = AverageMeterGroup()
    with torch.no_grad():
        for step, (x, y) in enumerate(self.valid_loader):
            x, y = x.to(self.device), y.to(self.device)  # added line
            self.mutator.reset()
            logits = self.model(x)
        # ....
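
For context, here is a minimal self-contained sketch (toy dataset and model, not the SPOS supernet) of why the extra lines are needed: a plain torch.utils.data.DataLoader yields CPU tensors, while a DALI pipeline already places batches on the GPU.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 2).to(device)  # stand-in for the supernet
dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))
loader = DataLoader(dataset, batch_size=4)

for x, y in loader:
    # x and y come back on the CPU; DALI would already return GPU tensors
    x, y = x.to(device), y.to(device)  # without this, model(x) errors out on a CUDA model
    logits = model(x)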

I only checked SPOS, but I think other trainers have similar problems. They should be fixed as well.

apply_fixed_architecture

I think the function apply_fixed_architecture() in nni.nas.pytorch.fixed should have an option to suppress its log messages, for example:

class FixedArchitecture(Mutator):
    def __init__(self, model, fixed_arc, strict=True, verbose=True):  # updated line
        super().__init__(model)
        self._fixed_arc = fixed_arc
        self._verbose = verbose  # added line
        # ....

    # ...

    def replace_layer_choice(self, module=None, prefix=""):
        if module is None:
            module = self.model
        for name, mutable in module.named_children():
            global_name = (prefix + "." if prefix else "") + name
            if isinstance(mutable, LayerChoice):
                chosen = self._fixed_arc[mutable.key]
                if sum(chosen) == 1 and max(chosen) == 1 and not mutable.return_mask:
                    if self._verbose:  # added line
                        _logger.info("Replacing %s with candidate number %d.", global_name, chosen.index(1))  # updated line
                    setattr(module, name, mutable[chosen.index(1)])
                else:
                    if self._verbose and mutable.return_mask:  # updated line
                        _logger.info("`return_mask` flag of %s is true. As it relies on the behavior of LayerChoice, "
                                     "LayerChoice will not be replaced.", global_name)
                    # remove unused parameters
                    for ch, n in zip(chosen, mutable.names):
                        if ch == 0 and not isinstance(ch, float):
                            setattr(mutable, n, None)
            else:
                self.replace_layer_choice(mutable, global_name)


def apply_fixed_architecture(model, fixed_arc, verbose=True):   # updated line
    if isinstance(fixed_arc, str):
        with open(fixed_arc) as f:
            fixed_arc = json.load(f)
    architecture = FixedArchitecture(model, fixed_arc, verbose=verbose)  # updated line
    architecture.reset()
    # ....
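
For illustration, usage with the proposed flag would look like the sketch below; the model and JSON path are placeholders, and the verbose argument is of course hypothetical until this change is merged.

from nni.nas.pytorch.fixed import apply_fixed_architecture

# "final_arch.json" is a placeholder path to an exported architecture;
# model is any module containing LayerChoice mutables.
apply_fixed_architecture(model, "final_arch.json", verbose=False)  # proposed: silences the replacement logs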
ultmaster self-assigned this on Oct 30, 2020
@ultmaster
Contributor

Hi @HeekangPark, thanks for your kind suggestions. I think both of them are great and you can submit PRs for that. Would love to have you as a contributor. :)
