Issues on creating HDF5 file #10

Open
QLgogo opened this issue Oct 2, 2017 · 16 comments
Comments

@QLgogo

QLgogo commented Oct 2, 2017

I ran "python mk_dataset.py ..." from ./pyscripts. The screen printed the following lines:

...
Loading dataset.
Loading ingr vocab.
('Image path is:', '/home/yue_zhang/Desktop/im2recipe/data/recipe1M/images')
('H5 file is', '/home/yue_zhang/Desktop/im2recipe/data/h5/data.h5')
{'test': 100808, 'train': 471475, 'val': 100297}
Assembling dataset.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
Could not load image...Using black one instead.
...
Could not load image...Using black one instead.
Writing out data.

However, the run ends by producing a 115 GB file. Is something wrong here? Can this 115 GB file be used for the subsequent test runs?

@QLgogo
Author

QLgogo commented Oct 2, 2017

In addition, I used the 115 GB file to run the test step with "th main.lua -test 1 -loadsnap im2recipe_model.t7" and got many 150*1024 matrices. Could you tell me how to interpret them? That is, given an image, how does such a matrix tell us the probability of each recognised ingredient?

@nhynes

nhynes commented Oct 2, 2017

> What is wrong with this?

Nothing, other than that several images couldn't be loaded and were replaced with zeros instead of actual data. You should try to figure out why your images weren't loaded :)
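One way to start tracking down the images that could not be loaded is to check the files on disk before rebuilding the HDF5 file. The sketch below is only an illustration and is not part of mk_dataset.py: the function name `find_unreadable_images` is hypothetical, and it assumes the images are JPEG files, so it flags paths that are missing, empty, or do not start with the JPEG start-of-image marker. It does not decode the images, so a file that passes these checks can still be corrupt.

```python
import os
import tempfile

JPEG_SOI = b"\xff\xd8"  # every JPEG file starts with these two bytes


def find_unreadable_images(image_paths):
    """Group paths by the first cheap check they fail (hypothetical helper)."""
    problems = {"missing": [], "empty": [], "not_jpeg": []}
    for path in image_paths:
        if not os.path.isfile(path):
            problems["missing"].append(path)
        elif os.path.getsize(path) == 0:
            problems["empty"].append(path)
        else:
            with open(path, "rb") as handle:
                if handle.read(2) != JPEG_SOI:
                    problems["not_jpeg"].append(path)
    return problems


# Small self-contained demo using temporary files instead of real dataset paths.
with tempfile.TemporaryDirectory() as root:
    good = os.path.join(root, "good.jpg")
    empty = os.path.join(root, "empty.jpg")
    text = os.path.join(root, "text.jpg")
    absent = os.path.join(root, "absent.jpg")
    with open(good, "wb") as handle:
        handle.write(JPEG_SOI + b"\xff\xe0")
    open(empty, "wb").close()
    with open(text, "wb") as handle:
        handle.write(b"<html>not an image</html>")
    report = find_unreadable_images([good, empty, text, absent])

for kind, paths in report.items():
    print(kind, len(paths))
```

Running a check like this over the paths under the image directory printed in the log would show whether the black replacement images come from files that were never downloaded or from files that exist but are damaged.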

> I got many 150*1024 matrixes

Each matrix is a mini-batch of embeddings. There are two sets of embeddings: one for images and one for recipes. If you want to do embedding->ingredient prediction, you'll want to take the image embeddings and then train a new model to go from those to ingredients. I can tell you already that unless you add some special trick, it's not going to "just work."

@QLgogo
Author

QLgogo commented Oct 3, 2017

Hi nhynes, thanks for your answers and patience. I ran into different problems, so I opened this new issue.
Could you tell me how much RAM you had when you ran "th main.lua -test 1 -loadsnap im2recipe_model.t7"? I have 32 GB of RAM and set up another 70 GB of swap, but I still get an "out of memory" error.
For embedding->ingredient prediction/recipe, do you mean that I cannot directly get results like those on your demo website (e.g., pineapple pie 0.89)? That is, do I have to build another deep learning model myself that does the classification, using the test results (i.e., the embeddings) as input?

@QLgogo
Author

QLgogo commented Oct 3, 2017

Sorry, I want to add more details about the "out of memory" error. I kept checking disk space and RAM usage while the code was running, and the error occurred while there was still enough of both. Specifically, after the screen printed "[torch.DoubleTensor of size 150x3x224x224]" and "[torch.DoubleTensor of size 15x150x1024]", the following error appeared:

nil	
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-4200/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/yue_zhang/torch/install/bin/luajit: ...ue_zhang/torch/install/share/lua/5.1/threads/threads.lua:179: [thread 1 endcallback] ...e/yue_zhang/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 1 module of nn.Sequential:
In 1 module of nn.ParallelTable:
In 5 module of nn.Sequential:
In 1 module of nn.Sequential:
In 3 module of nn.Sequential:
/home/yue_zhang/torch/install/share/lua/5.1/nn/THNN.lua:110: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-4200/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
	[C]: in function 'v'
	/home/yue_zhang/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'BatchNormalization_updateOutput'
	...ng/torch/install/share/lua/5.1/nn/BatchNormalization.lua:124: in function <...ng/torch/install/share/lua/5.1/nn/BatchNormalization.lua:113>
	[C]: in function 'xpcall'
	...e/yue_zhang/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	.../yue_zhang/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../yue_zhang/torch/install/share/lua/5.1/nn/Sequential.lua:41>
	[C]: in function 'xpcall'
	...e/yue_zhang/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	.../yue_zhang/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../yue_zhang/torch/install/share/lua/5.1/nn/Sequential.lua:41>
	[C]: in function 'xpcall'
	...
	[C]: in function 'xpcall'
	...ue_zhang/torch/install/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
	...ue_zhang/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
	/home/yue_zhang/Desktop/im2recipe/drivers/test.lua:38: in function </home/yue_zhang/Desktop/im2recipe/drivers/test.lua:24>
	/home/yue_zhang/Desktop/im2recipe/drivers/init.lua:47: in function </home/yue_zhang/Desktop/im2recipe/drivers/init.lua:45>
	/home/yue_zhang/Desktop/im2recipe/drivers/init.lua:43: in function 'test'
	main.lua:52: in main chunk
	[C]: in function 'dofile'
	...hang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
	[C]: in function 'error'
	...e/yue_zhang/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
	.../yue_zhang/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	/home/yue_zhang/Desktop/im2recipe/drivers/test.lua:55: in function </home/yue_zhang/Desktop/im2recipe/drivers/test.lua:40>
	[C]: in function 'xpcall'
	...ue_zhang/torch/install/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
	...ue_zhang/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
	/home/yue_zhang/Desktop/im2recipe/drivers/test.lua:38: in function </home/yue_zhang/Desktop/im2recipe/drivers/test.lua:24>
	/home/yue_zhang/Desktop/im2recipe/drivers/init.lua:47: in function </home/yue_zhang/Desktop/im2recipe/drivers/init.lua:45>
	/home/yue_zhang/Desktop/im2recipe/drivers/init.lua:43: in function 'test'
	main.lua:52: in main chunk
	[C]: in function 'dofile'
	...hang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50
stack traceback:
	[C]: in function 'error'
	...ue_zhang/torch/install/share/lua/5.1/threads/threads.lua:179: in function 'dojob'
	...ue_zhang/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
	/home/yue_zhang/Desktop/im2recipe/drivers/test.lua:38: in function </home/yue_zhang/Desktop/im2recipe/drivers/test.lua:24>
	/home/yue_zhang/Desktop/im2recipe/drivers/init.lua:47: in function </home/yue_zhang/Desktop/im2recipe/drivers/init.lua:45>
	/home/yue_zhang/Desktop/im2recipe/drivers/init.lua:43: in function 'test'
	main.lua:52: in main chunk
	[C]: in function 'dofile'
	...hang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

Do you have any idea? Many thanks in advance!

@nhynes

nhynes commented Oct 3, 2017

You ran out of CUDA (i.e. GPU) memory. You need to either use more GPUs via the -ngpus option or reduce the batch size. Here's the error message from the trace you posted, in case you want to search the internet for more information: cuda runtime error (2) : out of memory
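To see why reducing the batch size helps, it can be useful to look at how the size of a single input batch scales. The sketch below is a rough illustration only: it assumes 4 bytes per value (float32) once the batch is on the GPU, which the thread does not confirm, and it counts only the 150x3x224x224 input tensor from the log. The intermediate activations of the network, which are not estimated here, also grow with the batch size and usually take far more memory than the input.

```python
def batch_bytes(batch_size, channels=3, height=224, width=224, bytes_per_value=4):
    """Bytes needed for one input batch (bytes_per_value=4 assumes float32)."""
    return batch_size * channels * height * width * bytes_per_value


# 150 is the batch size visible in the log; 50 is a smaller value for comparison.
for size in (150, 50):
    megabytes = batch_bytes(size) / 1e6
    print("batch size %d: %.1f MB for the input tensor alone" % (size, megabytes))
```

Because every per-sample buffer scales the same way, cutting the batch size to a third cuts the batch-dependent part of GPU memory use to roughly a third as well.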

@QLgogo
Author

QLgogo commented Oct 3, 2017

Yes, you are right. I set -batchSize 50 and the error disappeared. What about embedding->ingredient prediction/recipe? Do you mean that I cannot directly get results like those on your demo website (e.g., pineapple pie 0.89)? That is, do I have to build another deep learning model myself that does the classification, using the test results (i.e., the embeddings) as input?

@nhynes

nhynes commented Oct 3, 2017

> how about embedding->ingredient prediction/recipe?

The demo only does one of those things. I suggest that you read the paper to get a better sense of how the model might be used for your specific application :)

@QLgogo
Author

QLgogo commented Oct 4, 2017

Hi nhynes, I read your paper and the rank.py file. To get the ingredients for a given image, I think I can directly modify rank.py to print the test_ids whose image-recipe pairs have the highest similarity, and then use those test_ids to look up the recipe (ingredients and instructions). Is that right?

@nhynes

nhynes commented Oct 4, 2017

> use the test_ids to find back the recipe

For sure! What I was saying was that you'd have a hard time directly predicting the ingredients from the embedding. You can always go im2recipe/recipe2im.
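The im2recipe direction discussed here amounts to a nearest-neighbour search in the shared embedding space. The sketch below illustrates that idea with plain Python lists and is not taken from rank.py: the function names, the toy two-dimensional embeddings, and the ids are all hypothetical, and cosine similarity is an assumption, since the similarity measure used by rank.py is not shown in this thread.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_recipes(image_embedding, recipe_embeddings, recipe_ids, top_k=1):
    """Return the top_k (similarity, recipe_id) pairs, best match first."""
    scored = [
        (cosine(image_embedding, embedding), recipe_id)
        for embedding, recipe_id in zip(recipe_embeddings, recipe_ids)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: real embeddings in this thread have 1024 dimensions.
recipe_ids = ["r1", "r2", "r3"]
recipe_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = rank_recipes([0.9, 0.1], recipe_embeddings, recipe_ids, top_k=2)

for similarity, recipe_id in best:
    print(recipe_id, round(similarity, 3))
```

The ids returned this way are what would then be used to look up the ingredients and instructions of the matched recipes.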

@QLgogo
Author

QLgogo commented Oct 4, 2017

Sorry, one more question: are the test_ids the same as the ids in the layer1.json file? If they are, I should be able to trace them back, right?

@nhynes

nhynes commented Oct 4, 2017

Yes.
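Since the ids match, tracing a test_id back to its recipe is a dictionary lookup. The sketch below is a minimal illustration under an assumption about the file layout: it treats layer1.json as a JSON list of recipe objects with an "id" field and lists of {"text": ...} entries for "ingredients" and "instructions". The sample record, its id, and the helper name `index_by_id` are made up for the example; with the real file, `json.load` on the opened file would replace the inline string.

```python
import json


def index_by_id(recipes):
    """Map each recipe's id to the recipe record (hypothetical helper)."""
    return {recipe["id"]: recipe for recipe in recipes}


# Made-up stand-in for the contents of layer1.json (assumed structure).
sample_json = """
[
  {"id": "abc123",
   "title": "Example pie",
   "ingredients": [{"text": "1 cup flour"}, {"text": "2 eggs"}],
   "instructions": [{"text": "Mix."}, {"text": "Bake."}]}
]
"""

recipes_by_id = index_by_id(json.loads(sample_json))

test_id = "abc123"  # would come from the ranking output
recipe = recipes_by_id[test_id]
ingredients = [item["text"] for item in recipe["ingredients"]]
instructions = [step["text"] for step in recipe["instructions"]]
print(recipe["title"], ingredients, instructions)
```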

@QLgogo
Author

QLgogo commented Oct 11, 2017

Hi nhynes. Sorry, it's me again. I finally figured out how to modify rank.py to get the ingredients and recipe. Thanks for all the help above!
However, when I replaced some of the test images (those partitioned as 'test') with my own images (12000+ of them), updated the data.h5 file (which includes 80000+ test images), and then ran th main.lua -test 1 -loadsnap im2recipe_model.t7, I found that the number of test_ids saved in the output dropped to 50000+ and did not include any of the replaced test images. Could you tell me whether any filtering or subsampling is applied when running th main.lua -test 1 -loadsnap im2recipe_model.t7?

@nhynes

nhynes commented Oct 11, 2017

Hey @QLgogo, no worries! You're helping to expose challenges with deploying the model :)

To solve the problem, generally, you'll need to sort the images by recipe id so that they are in the order given by ids_test. You'll then need to update ilens_test to have the correct number of images for each recipe.
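The reordering described above can be sketched as follows. This is only an illustration of the bookkeeping, not code from the repository: the function name `order_images`, the toy ids, and the file names are hypothetical, and ilens is taken here to mean the number of images per recipe, as the comment describes. How the real data.h5 stores these arrays is not shown in the thread.

```python
from collections import defaultdict


def order_images(ids_test, image_records):
    """Order images to follow ids_test and count images per recipe.

    image_records is an iterable of (recipe_id, image_name) pairs in any order.
    Returns (ordered_image_names, ilens), where ilens[i] is the number of
    images that belong to ids_test[i].
    """
    by_recipe = defaultdict(list)
    for recipe_id, image_name in image_records:
        by_recipe[recipe_id].append(image_name)

    ordered, ilens = [], []
    for recipe_id in ids_test:
        names = by_recipe.get(recipe_id, [])
        ordered.extend(names)
        ilens.append(len(names))
    return ordered, ilens


# Toy example: recipe "b" comes first in ids_test and has one image.
ids_test = ["b", "a"]
records = [("a", "a1.jpg"), ("b", "b1.jpg"), ("a", "a2.jpg")]
ordered_names, ilens = order_images(ids_test, records)
print(ordered_names, ilens)
```

Images whose recipe id does not appear in ids_test are silently left out in this sketch, which is worth checking for when replacing test images with new ones.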

You only want the image embeddings though, right? In that case, it's much easier: just pick the image model out of the pretrained one and feed the images through that directly.

@QLgogo
Author

QLgogo commented Oct 21, 2017

Hi nhynes. Yes, I only want the image embeddings, which I will feed into rank.py to get test_ids. To check the structure of data.h5 (downloaded from the recipe1M website), I ran h5ls data.h5, which shows the following:
classes_test Dataset {51334}
classes_train Dataset {238459}
classes_val Dataset {51129}
ids_test Dataset {51334}
ids_train Dataset {238459}
ids_val Dataset {51129}
ilens_test Dataset {51334}
ilens_train Dataset {238459}
ilens_val Dataset {51129}
imnames_test Dataset {82392}
imnames_train Dataset {383779}
imnames_val Dataset {82093}
impos_test Dataset {51334, 5}
impos_train Dataset {238459, 5}
impos_val Dataset {51129, 5}
ims_test Dataset {100848, 3, 256, 256}
ims_train Dataset {471557, 3, 256, 256}
ims_val Dataset {100311, 3, 256, 256}
ingrs_test Dataset {51334, 20}
ingrs_train Dataset {238459, 20}
ingrs_val Dataset {51129, 20}
numims_test Dataset {51334}
numims_train Dataset {238459}
numims_val Dataset {51129}
rbps_test Dataset {51334}
rbps_train Dataset {238459}
rbps_val Dataset {51129}
rlens_test Dataset {51334}
rlens_train Dataset {238459}
rlens_val Dataset {51129}
stvecs_test Dataset {464115, 1024}
stvecs_train Dataset {2163659, 1024}
stvecs_val Dataset {464059, 1024}
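The shapes in this listing also give a rough sense of why the file is so large. The sketch below multiplies out the three image datasets, assuming one byte per value (uint8 storage) and no compression. Both are assumptions, because h5ls as shown here does not print the data type or storage layout. Under those assumptions the image arrays alone come to well over 100 GB, the same order of magnitude as the 115 GB file reported at the top of the thread.

```python
# Shapes copied from the h5ls listing above.
image_shapes = {
    "ims_test": (100848, 3, 256, 256),
    "ims_train": (471557, 3, 256, 256),
    "ims_val": (100311, 3, 256, 256),
}

BYTES_PER_VALUE = 1  # assumption: pixels stored as uint8


def dataset_bytes(shape, bytes_per_value=BYTES_PER_VALUE):
    """Uncompressed size in bytes of an array with the given shape."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total


total_bytes = sum(dataset_bytes(shape) for shape in image_shapes.values())
print("image datasets: %.1f GB uncompressed" % (total_bytes / 1e9))
```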

So a test_id (ids_test has 51334 entries) can be associated with more than one image (imnames_test has 82392 entries). However, rank.py reads in test_ids.t7 and returns one image name per test_id. That is, each test_id is associated with only one image (the list of names in rank.py has length 51334). As a result, I can only collect the corresponding recipes for 51334 images.

To find the corresponding recipes for my own images, I plan to directly replace the content of your .jpg files with the content of my .jpg files, so I need to make sure that every image I replace is among those 51334 images. Could you tell me how you handle multiple images within a test_id? That is, how are the 82392 images narrowed down to 51334? For example, do you randomly select one of the images within a test_id, or do you concatenate them?

@nhynes

nhynes commented Oct 21, 2017

Since you only want the image embeddings, I highly recommend that you do not mess with the hdf5 file. Instead, try the approach of using the fine-tuned CNN directly to get the embeddings for your images.

Something like this.

@GZWQ

GZWQ commented Oct 22, 2023

Hi QLgogo, could you share the h5 data file (data.h5) with me? Here is my email: [email protected]

I would greatly appreciate your assistance in sharing this data. Please let me know if you can provide this data or if there are any specific procedures I should follow to access it.


3 participants