
Training on a single GPU #5

Open
azamshoaib opened this issue Mar 5, 2020 · 9 comments

@azamshoaib commented Mar 5, 2020

Hi,
I would like to know whether this network can be trained on a single GPU. When I train it, I get a CUDA out of memory error. Please help me in this regard.

@yaohungt commented Mar 5, 2020

Can you try this alternative codebase:
https://github.com/yaohungt/Capsules-Inverted-Attention-Routing

This uses less memory and has better inference speed.

@azamshoaib (Author)

@yaohungt Thank you so much. I have reduced the batch size and now it is training.
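For anyone hitting the same CUDA out-of-memory error: activation memory grows roughly with the batch size, so lowering the batch size handed to the DataLoader is usually the quickest fix (the exact flag or config key depends on this repo's training script). A minimal, generic PyTorch sketch of the idea:

import torch
import torchvision
import torchvision.transforms as transforms

# Generic example, not this repo's exact script: if batch_size=128 runs out of
# GPU memory, try 64 (or smaller) until the model fits on a single GPU.
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                        transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)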

@AliS567 commented Mar 24, 2020

Hello,
I have a similar problem running on all 3 GPUs; my input size, however, is 84x84.
Thanks!

@yaohungt

Hi, can you be more specific?

If your input has a larger size, then you may need a larger network to fit the training.
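To make that concrete: with the same convolutional backbone, an 84x84 input leaves much larger feature maps than a 32x32 one, so whatever layers sit on top see a different (and bigger) shape and need to be resized, or the backbone needs more downsampling. A rough illustration with a generic conv stack (not this repo's backbone):

import torch
import torch.nn as nn

# Three stride-2 conv layers, purely for illustration.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)

print(backbone(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 128, 4, 4])
print(backbone(torch.randn(1, 3, 84, 84)).shape)  # torch.Size([1, 128, 11, 11])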

@AliS567 commented Mar 24, 2020

Yes, of course. I was attempting to input .mat files of 84 x 84 with only 1 channel. While working my way through some errors, I decided to alter the dimensions of my input image to 32 x 32 to match the CIFAR10 data used in this example; I feel this should fix the memory problems. However, I now have an error to do with batch size matching: ValueError: Expected input batch_size (128) to match target batch_size (5).

I believe this is because I am inputting 32 x 32 with no padding.

Apologies for taking up your time; I am fairly new to PyTorch!

Thanks a lot for the swift reply! 😃

@yaohungt

I haven't seen your code, but my guess is that it's because of your input size: 84x84x1, while CIFAR10 is 32x32x3.

You can modify the config file in ./configs so that the code can work on your dataset.

@AliS567 commented Mar 24, 2020

I have altered the backbone code to accept one channel.

import numpy as np
import scipy.io as sio
import torch
import torch.utils.data as tudata


def DataGenerationwt():
    # Load the wavelet-transform features and their labels from .mat files.
    data_path = '/home/icos/Desktop/Ali/compute/WT_features/'
    original_path = '/home/icos/Desktop/Imene/1d_dataset4_updated/'

    data = sio.loadmat(data_path + 'wt_real.mat', squeeze_me=False,
                       chars_as_strings=False, mat_dtype=True, struct_as_record=True)
    label = sio.loadmat(original_path + 'EMIdatav2.mat', squeeze_me=False,
                        chars_as_strings=False, mat_dtype=True, struct_as_record=True)

    data_array = np.array(data['wt_real'])
    label_array = np.array(label['Y'])

    data_final = torch.from_numpy(data_array)
    label_final = torch.from_numpy(label_array)

    return data_final, label_final


# Data
print('==> Preparing data..')
# assert args.dataset == 'CIFAR10' or args.dataset == 'CIFAR100'
# transform_train = transforms.Compose([
#         transforms.RandomCrop(32, padding=4),
#         transforms.RandomHorizontalFlip(),
#         transforms.ToTensor(),
#         transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
#     ])
# transform_test = transforms.Compose([
#         transforms.ToTensor(),
#         transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
#     ])
assert args.dataset == 'wt'
data, label = DataGenerationwt()
print(data.size())
data = data.view(-1, 1, 32, 32)  # reshape to (N, C=1, H=32, W=32)
print(label.size())
input(data.size())  # debug pause to inspect the shape

# 75/25 train/test split; the two sizes must sum to the dataset length
samples = int(data.size(0) * 0.75)
samples2 = data.size(0) - samples
print(samples)
mainset = tudata.TensorDataset(data.float(), label)

my_train, my_test = torch.utils.data.random_split(mainset, [samples, samples2])
trainloader = tudata.DataLoader(my_train, batch_size=128, shuffle=True,
                                num_workers=args.num_workers)
testloader = tudata.DataLoader(my_test, batch_size=100, shuffle=False,
                               num_workers=args.num_workers)

print('==> Building model..')
# Model parameters

I think my problem will be to do with the way I'm loading data?

ValueError: Expected input batch_size (128) to match target batch_size (5).

Thanks again for your time

@yaohungt

I'm not sure. I think you can print 1) the shape of the default CIFAR10 data and 2) the shape of your own data. They should look alike.
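A minimal way to do that comparison, assuming the standard torchvision CIFAR10 loader for the reference shapes:

import torch
import torchvision
import torchvision.transforms as transforms

# Reference: what one CIFAR10 batch looks like.
cifar = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                     transform=transforms.ToTensor())
images, targets = next(iter(torch.utils.data.DataLoader(cifar, batch_size=128)))
print(images.shape, targets.shape, targets.dtype)
# torch.Size([128, 3, 32, 32]) torch.Size([128]) torch.int64

# Your own loader should produce the same pattern: images of shape (B, C, H, W)
# and labels as a 1-D LongTensor of class indices of length B.  If the labels
# come out of the .mat file with an extra dimension (e.g. one-hot rows), that is
# a common cause of a batch-size mismatch like the one above; in that case
# label = label.argmax(dim=1).long() would collapse one-hot rows to indices
# (assuming the labels really are one-hot).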

@AliS567 commented Mar 24, 2020

Yeah, it's been quite mind-boggling so far; I'll keep working!

Thank you for all your good work!
Ali
