Bus Error : totalMemory: 11.17GiB freeMemory: 11.10GiB #46

aaronrmm · 2018-11-05T22:15:17Z

I'm getting a bus error when trying to load train_x_lpd_5_phr.npz, even when I'm only loading a pretrained model.

musegan.interpolation INFO     Using parameters:
{'beat_resolution': 12,
 'condition_track_idx': 3,
 'data_shape': [4, 48, 84, 5],
 'is_accompaniment': True,
 'is_conditional': False,
 'latent_dim': 128,
 'nets': {'discriminator': 'default', 'generator': 'accompaniment'},
 'use_binary_neurons': False}
musegan.interpolation INFO     Using configurations:
{'adam': {'beta1': 0.5, 'beta2': 0.9},
 'batch_size': 64,
 'checkpoint_dir': './musegan/exp/accompaniment/bass/model',
 'colormap': [[1.0, 0.0, 0.0],
              [1.0, 0.5, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.5, 1.0]],
 'columns': 5,
 'config': './musegan/exp/accompaniment/bass/config.yaml',
 'data_filename': 'train_x_lpd_5_phr',
 'data_root': None,
 'data_source': 'sa',
 'evaluate_steps': 100,
 'gan_loss_type': 'wasserstein',
 'gpu': '0',
 'initial_learning_rate': 0.001,
 'learning_rate_schedule': {'end': 50000, 'end_value': 0.0, 'start': 45000},
 'log_loss_steps': 100,
 'lower': 0.0,
 'midi': {'is_drums': [1, 0, 0, 0, 0],
          'lowest_pitch': 24,
          'programs': [0, 0, 25, 33, 48],
          'tempo': 100},
 'mode': 'lerp',
 'n_dis_updates_per_gen_update': 5,
 'n_jobs': 20,
 'params': './musegan/exp/accompaniment/bass/params.yaml',
 'result_dir': './musegan/exp/accompaniment/bass/results/interpolation',
 'rows': 5,
 'runs': 10,
 'sample_grid': [8, 8],
 'save_array_samples': True,
 'save_checkpoint_steps': 10000,
 'save_image_samples': True,
 'save_pianoroll_samples': True,
 'save_samples_steps': 100,
 'save_summaries_steps': 0,
 'slope_schedule': {'end': 50000, 'end_value': 5.0, 'start': 10000},
 'steps': 50000,
 'upper': 1.0,
 'use_gradient_penalties': True,
 'use_learning_rate_decay': True,
 'use_random_transpose': False,
 'use_slope_annealing': False,
 'use_train_test_split': False}
musegan.model        INFO     Building model.
musegan.model        INFO     Building training nodes.
musegan.model        INFO     Building losses.
musegan.model        INFO     Building training ops.
musegan.model        INFO     Building summaries.
musegan.model        INFO     Building prediction nodes.
2018-11-05 22:08:23.124707: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-05 22:08:23.125223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-11-05 22:08:23.125265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-05 22:08:23.549934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-05 22:08:23.550032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-11-05 22:08:23.550059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-11-05 22:08:23.550423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10758 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
musegan.interpolation INFO     Restoring the latest checkpoint.
INFO:tensorflow:Restoring parameters from /content/musegan/exp/accompaniment/bass/model/model.ckpt-300450
tensorflow           INFO     Restoring parameters from /content/musegan/exp/accompaniment/bass/model/model.ckpt-300450
./musegan/scripts/run_interpolation.sh: line 24:   694 Bus error               (core dumped) python3 "$DIR/../src/interpolation.py" --checkpoint_dir "$1/model" --result_dir "$1/results/interpolation" --params "$1/params.yaml" --config "$1/config.yaml" --lower 0.0 --upper 1.0 --runs 10 --gpu "$gpu"

salu133445 · 2018-11-06T08:23:03Z

I guess both are the same problem, since the accompaniment model also loads the training data at the inference stage. Loading the entire training data requires more than 5 GB of RAM. If you don't have enough RAM, one solution is to load only part of the training data. You can modify the following function to do this (by setting the length of the first axis to a smaller number).

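import numpy as np

def load_data_from_npz(filename):
    """Load and return the training data from a npz file (sparse format)."""
    with np.load(filename) as f:
        # Allocate the full dense boolean array (one byte per element).
        data = np.zeros(f['shape'], np.bool_)
        # Index with a tuple of per-axis index arrays; indexing with a list
        # of arrays is deprecated in newer NumPy versions.
        data[tuple(f['nonzero'])] = True
    return data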
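
As a rough sketch of the "smaller first axis" idea, the function above could be adapted as follows. The names `dense_size_in_bytes`, `load_partial_data_from_npz`, and `n_samples` are hypothetical and not part of the repository. The sketch assumes that `f['nonzero']` holds one row of indices per axis (as returned by `np.nonzero`), which is what the original indexing implies. Note that the index array itself is still read in full, so this only avoids allocating the full dense array.

```python
import numpy as np


def dense_size_in_bytes(filename):
    """Return the bytes needed to hold the full dense boolean array."""
    with np.load(filename) as f:
        # np.bool_ uses one byte per element, so bytes == number of elements.
        return int(np.prod(f['shape'], dtype=np.int64))


def load_partial_data_from_npz(filename, n_samples):
    """Load only the first `n_samples` entries along the first axis."""
    with np.load(filename) as f:
        shape = tuple(int(s) for s in f['shape'])
        n_samples = min(n_samples, shape[0])
        # Assumption: one row of indices per axis, i.e. (ndim, n_nonzero).
        nonzero = f['nonzero']
        # Keep only the entries whose first-axis index falls in the subset.
        keep = nonzero[0] < n_samples
        data = np.zeros((n_samples,) + shape[1:], np.bool_)
        data[tuple(idx[keep] for idx in nonzero)] = True
    return data
```

Calling `dense_size_in_bytes` first gives an idea of how much RAM the full array would need, and `n_samples` can then be chosen to fit the available memory. Taking the first `n_samples` entries is only the simplest choice; a random subset may be preferable if the data are ordered.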

salu133445 closed this as completed Nov 14, 2018

w00zie mentioned this issue May 27, 2020

Questions about the train_x_lpd_5_phr.npz file #100

Closed
