This repository has been archived by the owner on Apr 9, 2024. It is now read-only.

Warning from clu.metrics when running evaluation #4

Open

YingtianDt opened this issue Aug 3, 2022 · 6 comments

Comments

@YingtianDt

Hi, thank you very much for your wonderful work!

I just tried running it on the MOVi-A dataset. At evaluation time, clu.metrics logs the following warning (I adjusted the batch size to 8):
metrics.py:232] Ignoring mask for model output 'loss' because of shape mismatch: output.shape=() vs. mask.shape=(8,)

I read through the code and think this is fine, but I am writing to make sure.

Also, could you tell me whether my evaluation metrics on MOVi-A look reasonable? eval_ari=0.8829; eval_ari_nobg=0.9373; eval_loss=28674.48.

Thank you in advance.

@nilushacj

Hi @YingtianDt. Could you kindly share the code you used to run inference and visualize the results?

I was able to train the model, but I cannot figure out how to use the output in the resulting directory (which contains the checkpoints and plugins subfolders from training). How do I use the trained model to test it on new video data and visualize the predictions?

@YingtianDt
Author

Hi @nilushacj. I am also working on this, so I don't have complete code for it yet. But I suggest checking out lib/trainer/evaluate.py:300, where the "evaluate" function uses a writer to record first the evaluation metrics (write_scalars) and then the predictions (write_images). It seems to generate a TensorBoard event file, but I haven't used it.
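I haven't tried it myself, but if you mainly want the raw prediction arrays rather than the TensorBoard file, one option might be to dump the same dictionary to disk right next to the write_images call. This is only a sketch: the names `eval_images` and `step` are my guess at what is in scope at that point in evaluate.py.

```python
import numpy as np

# Hypothetical addition next to the writer.write_images(...) call in evaluate.py:
# save the same dictionary of arrays to disk so it can be inspected without TensorBoard.
np.savez(
    f"predictions_step_{step}.npz",
    **{key: np.asarray(value) for key, value in eval_images.items()},
)
# Later: data = np.load("predictions_step_100.npz"); print(data["segmentations"].shape)
```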

@nilushacj

Hi @YingtianDt. Would you happen to know specifically which file that is? When I open TensorBoard with the output checkpoint folder as the log directory, I can visualize the training results there. However, I am looking for the actual arrays/tensors corresponding to the predictions, so that I can save and visualize each output video (segmentations, flow and bounding boxes).

As you correctly pointed out, I have been looking at the evaluate function (in both the trainer and the evaluator). I am not quite sure how to use the output from "writer.write_images", and the CLU documentation for this is also a bit unclear to me.

I printed the keys and value shapes of the dictionary produced by the "jax.tree_map" call whose result is passed to "writer.write_images" (screenshot attached below). For the keys "video", "segmentations", "flow" and "boxes", the shape is (5, 64, 320, 3) in all four cases. I believe the 1st dimension (5) is the number of frames, the 2nd (64) is the batch size, and the 4th (3) is the number of channels. However, what is the 3rd dimension (320)?

Any help is immensely appreciated
[Screenshot: keys and value shapes of the dictionary passed to writer.write_images]
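For reference, this is roughly how I printed them (assuming `eval_images` is the dictionary that gets passed to `writer.write_images`):

```python
import jax
import numpy as np

# Roughly what produced the screenshot above: convert the device arrays to
# NumPy and print each key together with its shape.
images = jax.tree_map(np.array, eval_images)
for key, value in images.items():
    print(key, value.shape)  # e.g. video (5, 64, 320, 3)
```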

@nilushacj

Hi again @YingtianDt. For the second part of my earlier message, I will go ahead and answer it myself :D. I believe this shape corresponds to an image grid built from the video arrays (by the function "video_to_image_grid" in "utils.py").
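I haven't checked the exact implementation, but conceptually I assume it does something like concatenating the frames of each example along the width axis. A rough sketch, not the actual code from utils.py:

```python
import numpy as np

def frames_to_row(video):
    # video: (num_frames, height, width, channels)
    # Place the frames side by side along the width axis, giving an image of
    # shape (height, num_frames * width, channels), e.g. (64, 5 * 64, 3) = (64, 320, 3).
    return np.concatenate(list(video), axis=1)

# Applied per example, a batch of such grids has shape
# (num_examples, height, num_frames * width, channels), matching (5, 64, 320, 3) above.
```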

@tkipf
Contributor

tkipf commented Aug 17, 2022

Hi @YingtianDt, thanks a lot for bringing this warning to our attention! You can ignore this warning for the sake of model training, but it highlights that the displayed eval_loss is not entirely correct. The other metrics, however, are completely unaffected by this (incl. the loss reported for training).

Let me try to explain what's going on: to make sure we can evaluate on all validation/test set examples irrespective of our batch size (while keeping the batch size constant for all individual forward passes through the model), we append "dummy" examples at the end of the validation/test set, which are flagged using the mask variable. Our metrics take this mask variable into account and ignore all dummy examples when computing the metrics. The one exception is eval_loss, which we do not compute per example in a separate metric; we simply take the batch-averaged loss that our loss function produces and store it in our metrics writer. This ignores the mask variable, and hence the logged eval_loss on the final batch (depending on the batch size) can be incorrect.
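As a toy illustration of the difference (made-up numbers, not our actual metric code):

```python
import numpy as np

# The last example in the batch is a padded dummy example, marked by a 0 in the mask.
per_example_loss = np.array([1.2, 0.8, 1.0, 3.0])
mask = np.array([1.0, 1.0, 1.0, 0.0])

masked_mean = (per_example_loss * mask).sum() / mask.sum()  # what the masked metrics effectively compute
plain_mean = per_example_loss.mean()                        # what eval_loss currently reports

print(masked_mean)  # 1.0
print(plain_mean)   # 1.5 -- biased by the dummy example
```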

Fixing this requires a slight refactoring of our loss functions. Since this doesn't have any effect on model training and since we do not report eval_loss in our paper, this is not our highest priority to fix right now, but I will leave this issue open for now. Thanks again for bringing this up!

@YingtianDt
Author

YingtianDt commented Aug 17, 2022

@tkipf Thank you very much for your reply! I will definitely watch out for the loss metric. Please feel free to close the issue if there are no more questions from other people.

@nilushacj Hi, regarding the first part of your question: the output of jax.tree_map(np.array, ...) should already be a dictionary of np.array. As you've realized, the shapes of these arrays correspond to big image grids that concatenate all frames per example (not per frame). If you just save them, for example with plt.imshow(video[0]) and plt.imsave, you will see the frames of example 0 laid out side by side (height 64, width 64 × 5 frames = 320), and so on.
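Something like this should work (a rough sketch; `images` is the dictionary discussed above, and float arrays may need to be clipped or normalized to [0, 1] before saving):

```python
import numpy as np
import matplotlib.pyplot as plt

# Save one PNG per example for each output, with the frames of that example
# laid out side by side (each grid has shape (64, 320, 3)).
for key in ("video", "segmentations", "flow", "boxes"):
    for i, grid in enumerate(images[key]):
        plt.imsave(f"{key}_example_{i:02d}.png", np.clip(grid, 0.0, 1.0))
```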
