Bus Error : totalMemory: 11.17GiB freeMemory: 11.10GiB #46

Closed
aaronrmm opened this issue Nov 5, 2018 · 1 comment

aaronrmm commented Nov 5, 2018

I'm getting a bus error when trying to load train_x_lpd_5_phr.npz, even when I only attempt to load a pretrained model.

musegan.interpolation INFO     Using parameters:
{'beat_resolution': 12,
 'condition_track_idx': 3,
 'data_shape': [4, 48, 84, 5],
 'is_accompaniment': True,
 'is_conditional': False,
 'latent_dim': 128,
 'nets': {'discriminator': 'default', 'generator': 'accompaniment'},
 'use_binary_neurons': False}
musegan.interpolation INFO     Using configurations:
{'adam': {'beta1': 0.5, 'beta2': 0.9},
 'batch_size': 64,
 'checkpoint_dir': './musegan/exp/accompaniment/bass/model',
 'colormap': [[1.0, 0.0, 0.0],
              [1.0, 0.5, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.5, 1.0]],
 'columns': 5,
 'config': './musegan/exp/accompaniment/bass/config.yaml',
 'data_filename': 'train_x_lpd_5_phr',
 'data_root': None,
 'data_source': 'sa',
 'evaluate_steps': 100,
 'gan_loss_type': 'wasserstein',
 'gpu': '0',
 'initial_learning_rate': 0.001,
 'learning_rate_schedule': {'end': 50000, 'end_value': 0.0, 'start': 45000},
 'log_loss_steps': 100,
 'lower': 0.0,
 'midi': {'is_drums': [1, 0, 0, 0, 0],
          'lowest_pitch': 24,
          'programs': [0, 0, 25, 33, 48],
          'tempo': 100},
 'mode': 'lerp',
 'n_dis_updates_per_gen_update': 5,
 'n_jobs': 20,
 'params': './musegan/exp/accompaniment/bass/params.yaml',
 'result_dir': './musegan/exp/accompaniment/bass/results/interpolation',
 'rows': 5,
 'runs': 10,
 'sample_grid': [8, 8],
 'save_array_samples': True,
 'save_checkpoint_steps': 10000,
 'save_image_samples': True,
 'save_pianoroll_samples': True,
 'save_samples_steps': 100,
 'save_summaries_steps': 0,
 'slope_schedule': {'end': 50000, 'end_value': 5.0, 'start': 10000},
 'steps': 50000,
 'upper': 1.0,
 'use_gradient_penalties': True,
 'use_learning_rate_decay': True,
 'use_random_transpose': False,
 'use_slope_annealing': False,
 'use_train_test_split': False}
musegan.model        INFO     Building model.
musegan.model        INFO     Building training nodes.
musegan.model        INFO     Building losses.
musegan.model        INFO     Building training ops.
musegan.model        INFO     Building summaries.
musegan.model        INFO     Building prediction nodes.
2018-11-05 22:08:23.124707: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-05 22:08:23.125223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-11-05 22:08:23.125265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-05 22:08:23.549934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-05 22:08:23.550032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-11-05 22:08:23.550059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-11-05 22:08:23.550423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10758 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
musegan.interpolation INFO     Restoring the latest checkpoint.
INFO:tensorflow:Restoring parameters from /content/musegan/exp/accompaniment/bass/model/model.ckpt-300450
tensorflow           INFO     Restoring parameters from /content/musegan/exp/accompaniment/bass/model/model.ckpt-300450
./musegan/scripts/run_interpolation.sh: line 24:   694 Bus error               (core dumped) python3 "$DIR/../src/interpolation.py" --checkpoint_dir "$1/model" --result_dir "$1/results/interpolation" --params "$1/params.yaml" --config "$1/config.yaml" --lower 0.0 --upper 1.0 --runs 10 --gpu "$gpu"
@salu133445
Owner

I guess these are the same problem, since the accompaniment model also loads the training data at the inference stage. You need more than 5 GB of RAM to load the entire training data. If you don't have enough RAM, one solution is to load only part of the training data. You can modify the following function to do this (by setting the length of the first axis to a smaller number).

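import numpy as np

def load_data_from_npz(filename):
    """Load and return the training data from a npz file (sparse format)."""
    with np.load(filename) as f:
        # Allocate the full dense array; this is the step that needs the RAM.
        data = np.zeros(f['shape'], np.bool_)
        # Index with a tuple of per-axis index arrays (a plain list of arrays
        # is not accepted as a multidimensional index by recent NumPy).
        data[tuple(f['nonzero'])] = True
    return data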
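
As a rough sketch of the modification described above, something like the following could work. The function name `load_partial_data_from_npz` and the `max_samples` argument are hypothetical (they are not part of the repository), and the sketch assumes that `f['nonzero']` is an integer array of shape `(ndim, n_nonzero)` whose first row holds the sample index. It shrinks the first axis and drops the nonzero entries that fall outside it:

```python
import numpy as np


def load_partial_data_from_npz(filename, max_samples):
    """Load at most `max_samples` samples from a npz file (sparse format).

    Hypothetical variant of `load_data_from_npz`; `max_samples` is an
    illustrative parameter, not an option of the original code.
    """
    with np.load(filename) as f:
        full_shape = tuple(int(n) for n in f['shape'])
        # Assumed layout: one row of indices per axis, shape (ndim, n_nonzero).
        nonzero = np.asarray(f['nonzero'])

    n_samples = min(max_samples, full_shape[0])
    # Keep only the entries whose first-axis index is inside the reduced range.
    keep = nonzero[0] < n_samples
    data = np.zeros((n_samples,) + full_shape[1:], np.bool_)
    data[tuple(nonzero[:, keep])] = True
    return data
```

Note that the index array itself is still read in full, so this only avoids allocating the full dense array; whether that is enough depends on how much RAM is available.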
