
OOM's. But not always. #360

Open
deadman3000 opened this issue Jan 17, 2025 · 1 comment

@deadman3000

I have a 16GB 4080 Super and 32GB DDR on a 5800X3D setup. I've been having trouble with OOMs on some of my videos going through VFI to Video Combine (VHS), but not always; occasionally they pass through. I think it may be down to the resolution being too large to fit into my system's memory? I've managed to get 512x960 @ 129 frames through, but not 544x960 @ 129 frames.

Is this the cause, or something else entirely?

BTW, I had an even lower success rate with the latest Python/CUDA than with the older 2.3.1+cu121 setup.

FILM VFI:

```
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/interpolator.py", line 15, in forward
x1: Tensor,
batch_dt: Tensor) -> Tensor:
_0 = (self).debug_forward(x0, x1, batch_dt, )

return (_0["image"])[0]
def debug_forward(self: __torch__.interpolator.Interpolator,
File "code/__torch__/interpolator.py", line 64, in debug_forward
aligned_pyramid1 = __torch__.util.concatenate_pyramids(aligned_pyramid0, forward_flow, )
fuse = self.fuse
_18 = [(fuse).forward(aligned_pyramid1, )]
~~~~~~~~~~~~~ <--- HERE
_19 = {"image": _18, "forward_residual_flow_pyramid": forward_residual_flow_pyramid, "backward_residual_flow_pyramid": backward_residual_flow_pyramid, "forward_flow_pyramid": forward_flow_pyramid, "backward_flow_pyramid": backward_flow_pyramid}
return _19
File "code/__torch__/fusion.py", line 55, in forward
net15 = _0(net14, level_size2, None, "nearest", None, None, False, )
_04 = getattr(_3, "0")
net16 = (_04).forward(net15, )
~~~~~~~~~~~~ <--- HERE
net17 = torch.cat([pyramid[i2], net16], 1)
_13 = getattr(_3, "1")
File "code/__torch__/torch/nn/modules/conv/___torch_mangle_61.py", line 23, in forward
weight = self.weight
bias = self.bias
_0 = (self)._conv_forward(input, weight, bias, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
return _0
def _conv_forward(self: __torch__.torch.nn.modules.conv.___torch_mangle_61.Conv2d,
File "code/__torch__/torch/nn/modules/conv/___torch_mangle_61.py", line 29, in _conv_forward
weight: Tensor,
bias: Optional[Tensor]) -> Tensor:
_1 = torch.conv2d(input, weight, bias, [1, 1], "same", [1, 1])
~~~~~~~~~~~~ <--- HERE
return _1

Traceback of TorchScript, original code (most recent call last):
File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\interpolator.py", line 160, in forward
@torch.jit.export
def forward(self, x0, x1, batch_dt) -> torch.Tensor:
return self.debug_forward(x0, x1, batch_dt)['image'][0]
~~~~~~~~~~~~~~~~~~ <--- HERE
File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\interpolator.py", line 151, in debug_forward

return {
'image': [self.fuse(aligned_pyramid)],
~~~~~~~~~ <--- HERE
'forward_residual_flow_pyramid': forward_residual_flow_pyramid,
'backward_residual_flow_pyramid': backward_residual_flow_pyramid,
File "C:\Users\Danylo\PycharmProjects\frame-interpolation-pytorch\fusion.py", line 115, in forward
level_size = pyramid[i].shape[2:4]
net = F.interpolate(net, size=level_size, mode='nearest')
net = layers[0](net)
~~~~~~~~~ <--- HERE
net = torch.cat([pyramid[i], net], dim=1)
net = layers[1](net)
File "C:\Users\Danylo\anaconda3\envs\research\lib\site-packages\torch\nn\modules\conv.py", line 457, in forward
def forward(self, input: Tensor) -> Tensor:
return self._conv_forward(input, self.weight, self.bias)
~~~~~~~~~~~~~~~~~~ <--- HERE
File "C:\Users\Danylo\anaconda3\envs\research\lib\site-packages\torch\nn\modules\conv.py", line 453, in _conv_forward
weight, bias, self.stride,
_pair(0), self.dilation, self.groups)
return F.conv2d(input, weight, bias, self.stride,
~~~~~~~~ <--- HERE
self.padding, self.dilation, self.groups)
RuntimeError: Allocation on device
```
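
For what it's worth, the op that actually fails is a plain 'same'-padded conv2d. A minimal standalone probe, with shapes guessed rather than taken from FILM (its real channel counts and batching may differ), can show whether a single conv at this resolution fits on the card:

```python
import torch
import torch.nn.functional as F

# Guessed shapes: one frame pair (batch 1), 64 feature channels at the
# full 544x960 resolution. FILM's actual pyramid shapes may differ.
x = torch.randn(1, 64, 960, 544, device="cuda")
w = torch.randn(64, 64, 3, 3, device="cuda")

y = F.conv2d(x, w, padding="same")  # same op as in the traceback
print(y.shape, f"{torch.cuda.memory_allocated() / 1024**2:.0f} MiB allocated")
```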

@AustinMroz
Collaborator

Sorry for the late reply, I was out of town. It does appear to be a lack of memory, but the error isn't being thrown from a VHS node, and the numbers seem off to me.

544x960 @ 129 frames works out to roughly 0.8 GB of memory; even if VFI doubles the frame count and the operation isn't done in place, ~3 GB should be well within what your system can handle.
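
As a sanity check, here's the back-of-envelope arithmetic (assuming float32 RGB frames; the actual dtype used by the nodes may differ):

```python
# Rough estimate of the raw frame-stack size, assuming float32 RGB.
width, height, frames, channels = 544, 960, 129, 3
bytes_per_float32 = 4

stack_gb = width * height * frames * channels * bytes_per_float32 / 1024**3
print(f"{stack_gb:.2f} GB")  # ~0.75 GB for the input frames alone
# Doubling the frame count via VFI, plus one non-in-place copy,
# still lands around ~3 GB.
```

A couple of debugging steps I'd start with: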

  • Open the task manager and double-check that nothing else is hogging your VRAM when ComfyUI isn't running (the snippet after this list shows how to check from Python, too).
  • If you aren't already, try a minimal workflow that's just Load Video -> VFI -> Video Combine.
  • Try launching ComfyUI with --lowvram. If you somehow have highvram/gpuonly enabled, it's possible models are still loaded from previous executions.
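
If it's easier to check from Python than the task manager, something like this reports VRAM as the CUDA driver sees it (assumes a CUDA build of torch):

```python
import torch

# Free/total VRAM as reported by the CUDA driver, in bytes.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.2f} GB of {total / 1024**3:.2f} GB")

# What this torch process itself is holding on to.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GB")
```

If `free` is far below `total` before you've run anything, something else (or a previous ComfyUI execution) is still holding VRAM.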
