This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

error when using 'nnictl view' #3495

Closed
Roy-Kid opened this issue Mar 29, 2021 · 10 comments

Comments

@Roy-Kid

Roy-Kid commented Mar 29, 2021

Environment:

  • NNI version: v2.1
  • NNI mode (local|remote|pai): local (after experiment)
  • Client OS: ubuntu
  • Server OS (for remote mode only):
  • Python version: 3.8
  • PyTorch/TensorFlow version: pytorch
  • Is conda/virtualenv/venv used?: no, then yes
  • Is running in Docker?: no

INFO: view experiment sr5k10ga...
Traceback (most recent call last):
  File "/home/roy/.local/bin/nnictl", line 8, in <module>
    sys.exit(parse_args())
  File "/home/roy/.local/lib/python3.8/site-packages/nni/tools/nnictl/nnictl.py", line 272, in parse_args
    args.func(args)
  File "/home/roy/.local/lib/python3.8/site-packages/nni/tools/nnictl/launcher.py", line 646, in view_experiment
    manage_stopped_experiment(args, 'view')
  File "/home/roy/.local/lib/python3.8/site-packages/nni/tools/nnictl/launcher.py", line 633, in manage_stopped_experiment
    experiment_config = Config(experiment_id, experiments_dict[args.id]['logDir']).get_config()
  File "/home/roy/.local/lib/python3.8/site-packages/nni/tools/nnictl/config_utils.py", line 88, in __init__
    self.conn = sqlite3.connect(os.path.join(log_dir, experiment_id, 'db', 'nni.sqlite'))
  File "/usr/lib/python3.8/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

After the trial finished, I used nnictl view to check the result, but the error above was raised. I searched the issues, and someone said that using conda would fix this, but it did not fix it for me. Can someone tell me how to fix this problem?

@J-shang
Contributor

J-shang commented Mar 30, 2021

Please check whether you have metadata for sr5k10ga in your ~/nni-experiments/.experiment, like:

{
    "sr5k10ga": {
        "id": "sr5k10ga",
        "port": 8080,
        "startTime": 1617081086228,
        "endTime": "N/A",
        "status": "STOPPED",
        "platform": "local",
        "experimentName": "your_exp_name",
        "tag": [],
        "pid": 2233,
        "webuiUrl": [
            "http://127.0.0.1:8080"
        ],
        "logDir": "/home/your_user_name/nni-experiments"
    }
}

It seems the logDir may be empty in your metadata.
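
The traceback is consistent with this: os.path.join raises exactly that TypeError when its first argument is None. A minimal sketch for spotting affected entries follows, assuming the metadata file is plain JSON shaped like the sample above and that a blank logDir is stored as null, an empty string, or omitted. The function name is hypothetical and not part of NNI.

```python
import json
from pathlib import Path

# Assumed location, taken from the comment above; adjust if yours differs.
METADATA_PATH = Path.home() / "nni-experiments" / ".experiment"


def experiments_missing_log_dir(metadata: dict) -> list:
    """Return the IDs of experiments whose 'logDir' is absent, None, or empty."""
    return [exp_id for exp_id, entry in metadata.items() if not entry.get("logDir")]


def load_metadata(path: Path = METADATA_PATH) -> dict:
    """Read the experiment metadata file as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling experiments_missing_log_dir(load_metadata()) would list the experiment IDs that nnictl view is likely to fail on for this reason.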

@Roy-Kid
Author

Roy-Kid commented Mar 30, 2021

No, the metadata does exist in .experiment:

image

And I could use the web UI during the experiment.

@J-shang
Contributor

J-shang commented Mar 30, 2021

For this experiment, you can edit the metadata and set "logDir": "/home/your_user_name/nni-experiments" (or whatever logDir you specified in your config), then try to view again.
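
The manual edit described above could also be scripted. This is only a sketch under the assumption that the metadata file is plain JSON keyed by experiment ID, as in the earlier sample. set_log_dir is a hypothetical helper, not an NNI API, and it keeps a .bak copy because nnictl reads and writes the same file.

```python
import json
import shutil
from pathlib import Path


def set_log_dir(metadata_path: Path, experiment_id: str, log_dir: str) -> None:
    """Set 'logDir' for one experiment, keeping a .bak copy of the original file."""
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)
    if experiment_id not in metadata:
        raise KeyError(f"no metadata entry for experiment {experiment_id!r}")
    # Back up before modifying, so the original metadata can be restored.
    shutil.copy2(metadata_path, str(metadata_path) + ".bak")
    metadata[experiment_id]["logDir"] = log_dir
    with open(metadata_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=4)
```

For example, set_log_dir(Path.home() / "nni-experiments" / ".experiment", "sr5k10ga", "/home/your_user_name/nni-experiments"), run while no experiment is active.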

To help us fix the underlying issue, could you share your config.yml, launch command, or any other information we could use to reproduce the problem?

@Roy-Kid
Author

Roy-Kid commented Mar 30, 2021

After setting the logDir, it works! Here is my config:
image
All of the metadata entries leave logDir blank.

@kvartet
Contributor

kvartet commented Apr 5, 2021

@J-shang, I encountered this problem when using nnictl resume. I also launched the experiment from Python and did not set logDir in the configuration file. In my metadata, both webuiUrl and logDir are empty. In addition, I used experiment.start instead of experiment.run, and manually stopped the experiment with Ctrl-C. I'm not sure whether this is related.

When I set experiment_working_directory in the V2 schema and try to resume, it fails again with this error:

INFO:  resume experiment 39a5yzhv...
Traceback (most recent call last):
  File "/home/v-yiruxu/anaconda3/envs/env/bin/nnictl", line 33, in <module>
    sys.exit(load_entry_point('nni', 'console_scripts', 'nnictl')())
  File "/home/v-yiruxu/nni/nni/tools/nnictl/nnictl.py", line 272, in parse_args
    args.func(args)
  File "/home/v-yiruxu/nni/nni/tools/nnictl/launcher.py", line 649, in resume_experiment
    manage_stopped_experiment(args, 'resume')
  File "/home/v-yiruxu/nni/nni/tools/nnictl/launcher.py", line 632, in manage_stopped_experiment
    experiment_config = Config(experiment_id, experiments_dict[args.id]['logDir']).get_config()
  File "/home/v-yiruxu/nni/nni/tools/nnictl/config_utils.py", line 88, in __init__
    self.conn = sqlite3.connect(os.path.join(log_dir, experiment_id, 'db', 'nni.sqlite'))
sqlite3.OperationalError: unable to open database file

@J-shang
Contributor

J-shang commented Apr 6, 2021

@kvartet In fact, NNI v2.1 does not support resuming or viewing experiments launched from Python; this feature will be supported in NNI v2.2 by #3490.
To work around this issue, try modifying the logDir in the metadata, and check whether ./EXPERIMENT_ID/db/nni.sqlite exists under logDir.
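
The check described above can be sketched as follows. The path layout is copied from the os.path.join call in the traceback; the function names are hypothetical. sqlite3 reports "unable to open database file" when the file or one of its parent directories cannot be opened, so a missing path under logDir would be consistent with the OperationalError shown.

```python
import os
import sqlite3


def experiment_db_path(log_dir: str, experiment_id: str) -> str:
    """Build the path that the traceback shows nnictl opening."""
    return os.path.join(log_dir, experiment_id, "db", "nni.sqlite")


def can_open_experiment_db(log_dir: str, experiment_id: str) -> bool:
    """Return True if the database file exists and sqlite3 can open it read-only."""
    path = experiment_db_path(log_dir, experiment_id)
    if not os.path.isfile(path):
        return False
    try:
        # mode=ro avoids creating or modifying the file while checking.
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    except sqlite3.OperationalError:
        return False
    conn.close()
    return True
```

If can_open_experiment_db(log_dir, "39a5yzhv") returns False for the logDir in the metadata, the logDir value probably points at the wrong directory.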

@kvartet
Contributor

kvartet commented Apr 6, 2021

Oops! Looking forward to NNI v2.2!

@Roy-Kid
Author

Roy-Kid commented May 20, 2021

@J-shang Hi, I updated NNI to v2.2 with sudo pip3 install --upgrade nni. The nni package is now at version 2.2, but nnictl is still 2.1. As a result, after I run an experiment from Python, logDir is still left blank. Could you tell me how to fix this?

@J-shang
Contributor

J-shang commented May 20, 2021

@Roy-Kid Hello, this may be because pip3 and sudo pip3 resolve to pip3 in different locations. You could try python3 -m pip uninstall nni, make sure nni is really uninstalled, and then run python3 -m pip install nni. You may also need sudo pip3 uninstall nni to remove the copy installed by sudo pip3.
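
One way to see such a mismatch is to compare the package version the current interpreter sees with the nnictl script found on PATH. The sketch below uses only the standard library (importlib.metadata requires Python 3.8+); the helper names are hypothetical.

```python
import shutil
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_install(package: str = "nni", script: str = "nnictl") -> dict:
    """Collect the interpreter path, package version, and script location."""
    return {
        "interpreter": sys.executable,
        "package_version": installed_version(package),
        "script_on_path": shutil.which(script),
    }
```

Running print(describe_install()) with python3 shows which nnictl is first on PATH. If that script lives under a different prefix than the interpreter's packages (for example a system-wide location versus ~/.local), a leftover install from sudo pip3 is a likely cause.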

By the way, why do you use sudo?

@Roy-Kid
Author

Roy-Kid commented May 20, 2021

@J-shang Thanks! I fixed it. Using sudo is a very bad habit, and I will fix that too in the future. Thanks again for your kind help!

4 participants