Too many open files error #11201
Comments
@whucdf Thanks for reporting this issue. It is expected because the default
Let me know if there is still any issue.
Closing this now; please feel free to reopen it if needed.
Hi @weiyangfb
Hey! traceback:
I did include the proper configurations:
Thanks!
Please use a deep copy when appending dataloader output to a list. Taking @whucdf's code as an example: `index` held on to the output of the data_loader, so the connections among the multiprocessing worker processes could not be closed. A deep copy is useful in this scenario.
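A minimal sketch of what that looks like (assuming the batches from the loader are being collected into a list, as in the report above):

```python
import copy

batches = []
for batch in data_loader:
    # copy.deepcopy gives the batch its own storage, detached from the worker's
    # shared-memory segment, so the file handle behind the original tensor can
    # be released once the worker is done with it.
    batches.append(copy.deepcopy(batch))
```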
I get the error
Set PyTorch's shared memory strategy to "file_system", which uses file names to identify shared memory regions, rather than the default "file_descriptor" strategy, which uses file descriptors as shared memory handles. This fixes the problem of exceeding the system-wide limit on the number of open files a process can have. See https://github.com/pytorch/pytorch/issues/11201#issuecomment-421146936 and https://pytorch.org/docs/master/multiprocessing.html?highlight=sharing%20strategy#sharing-strategies.
Is this supposed to be run by the main process (the one doing …)? Thanks! Ref: https://pytorch.org/docs/stable/multiprocessing.html#file-descriptor-file-descriptor
I applied
```python
import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')
```
yet I am still getting the same error.
For anyone else seeing such an error even after setting …
This finally fixed it for me. @brando90 This relates to your earlier question. I could confirm that the strategy in the workers is not set to the same strategy as in the main process by printing the value of …
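For reference, one way to make sure every worker ends up with the same strategy is to set it again inside a `worker_init_fn` (a small sketch, assuming the "file_system" strategy is the one wanted; the dataset here is only a placeholder):

```python
import torch
import torch.multiprocessing
from torch.utils.data import DataLoader, TensorDataset

def set_worker_sharing_strategy(worker_id: int) -> None:
    # Re-apply the strategy inside each worker process; the value set in the
    # main process is not necessarily what the worker reports.
    torch.multiprocessing.set_sharing_strategy("file_system")

torch.multiprocessing.set_sharing_strategy("file_system")  # main process
loader = DataLoader(TensorDataset(torch.randn(100, 3)), num_workers=2,
                    worker_init_fn=set_worker_sharing_strategy)
```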
@schuhschuh did your solution require you to change the …? My current setup function looks like this:
Does the solution of using …? Thanks!
My issue is that even with …
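For completeness, the per-process open-file limit itself can also be raised from Python (Unix only); a minimal sketch using the standard `resource` module, separate from the sharing-strategy setting:

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# Raise the soft limit up to the hard limit; only root can raise the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"open-file soft limit raised from {soft} to {hard}")
```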
On a slightly related note, in my training script, if I don't use the … But if I add it, then it all runs fine, but at the very end of my script all the processes just hang and never terminate, even if I add a …
I experience the same issue with the latest macOS nightly build. I am able to chew through a couple of epochs, but at some point the number of open file descriptors becomes too large -- they are simply not being closed properly. My dataset returns a dictionary with 3 keys, two for float tensors and one for a string.

```python
import json

import numpy as np
import torch
from torch.utils.data import Dataset


class PhysicsDataset(Dataset):
    def __init__(self, data_dir, transform=None):
        super().__init__()
        # data_dir is expected to be a pathlib.Path
        self.data_dir = data_dir
        self.transform = transform
        self.gt_spectra = list(self.data_dir.glob("*.npz"))
        self.gt_parameters = json.load(
            open(self.data_dir / "all_params.json", 'r'))

    def __len__(self):
        return len(self.gt_spectra)

    def __getitem__(self, index):
        with np.load(self.gt_spectra[index]) as data:
            pdata = data['spectrum']
        pdata = (pdata - pdata.min()) / (pdata.max() - pdata.min())
        pdata = torch.from_numpy(pdata).float()
        parameters = self.gt_parameters[self.gt_spectra[index].name.replace(
            ".npz", "")]
        if self.transform:
            pdata = self.transform(pdata)
        # create output tensor with normalised weights
        # (KEYS is a module-level dict of parameter ranges defined elsewhere)
        gt_tensor = torch.from_numpy(
            np.asarray([(parameters[k] - KEYS[k]['min']) /
                        (KEYS[k]['max'] - KEYS[k]['min'])
                        for k in KEYS])).float()
        return {
            "spectrum": pdata,
            "gt_tensor": gt_tensor,
            "filename": self.gt_spectra[index].name
        }
```

Any ideas why the fds are not closed after each epoch terminates? I suspect this may be due to …
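One way to confirm the leak is to log how many descriptors the process holds after each epoch; a small sketch (assuming `psutil` is installed; this helper is not part of the original script):

```python
import psutil

process = psutil.Process()

def log_open_fds(tag: str) -> None:
    # num_fds() is Unix/macOS only; it counts the descriptors this process holds.
    print(f"{tag}: {process.num_fds()} open file descriptors")

# Call log_open_fds(f"epoch {epoch}") at the end of every epoch in the training loop.
log_open_fds("startup")
```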
Same here. Have you figured out how to solve it? Thank you!
Not sure how relevant this would be for you. In my case, I have my training dataset in a JSON format (one that we've developed internally at our institute) similar to the COCO format. The dataset is opened through a wrapper class that provides an API for reading it, again similar to COCO. In my earlier attempts at distributed training, each process ended up opening the same JSON file on its own and trying to read annotations from it with a bunch of workers (…). Something like this, basically:

```python
dataset = JSONDataset("/datasets/coco/annotations/train.json")
train_data = torch.utils.data.Dataset(dataset, ...)
train_loader = torch.utils.data.DataLoader(train_data, num_workers=16, ...)
```

Instead, I made sure to first parse the entire dataset, read the full list of image files and the corresponding labels, and then only pass a list of files and labels to the … And then I don't touch the …
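A rough sketch of that pattern (the class and the JSON field names below are hypothetical, not from the original code):

```python
import json

from PIL import Image
from torch.utils.data import DataLoader, Dataset


class ListDataset(Dataset):
    """Holds only plain Python lists, so nothing file-backed is shared with the workers."""

    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        # The image is opened lazily inside the worker; only the path string was shared.
        image = Image.open(self.image_paths[index]).convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image, self.labels[index]


# Parse the annotation file once, in the main process only
# (field names are hypothetical; adapt them to the actual JSON layout).
with open("/datasets/coco/annotations/train.json") as f:
    annotations = json.load(f)
image_paths = [img["file_name"] for img in annotations["images"]]
labels = [ann["category_id"] for ann in annotations["annotations"]]

train_loader = DataLoader(ListDataset(image_paths, labels), num_workers=16)
```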
automatically sets `torch.multiprocessing.set_sharing_strategy("file_system")` during opensoundscape import. We may want to revisit this decision, but it seems that this is the recommended setting for avoiding issues seen when using a parallelized DataLoader; see discussion and recommended solution here: pytorch/pytorch#11201 (comment)
This commit adds native PyTorch xpu support for yolov5 sample, i.e. IPEX is not needed in this mode. XPU backend in PyTorch is under active development and is not finished yet. Focus is on functional side of key things and performance is expected to be low. Future improvements should bring it up. As of now this mode of operation is experimental in the sample and is not default, use `--ipex no` to enable. * Can be run as: ./run_model.sh --ipex no * Tried on: * pytorch: 91d565da0c5 ("[dynamo] Add support for tensor's is_complex method") * vision: 89d2b38cbc ("Updated compatibility table") * Status: * --jit script|none: fail on autocast * --jit trace works x16 times slower vs. IPEX (13 img/s vs. 208 img/s), likely some operations are done on CPU since blitter is loaded + seeing this warning: torch/utils/data/_utils/pin_memory.py:58: UserWarning: Aten Op fallback from XPU to CPU happends. This may have performance implications. If need debug the fallback ops please set environment variable `PYTORCH_DEBUG_XPU_FALLBACK=1` (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:11.) /home/dvrogozh/git/pytorch/torch/nn/functional.py:2103: UserWarning: The operator 'aten::silu.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch._C._nn.silu_(input) /home/dvrogozh/git/pytorch/torch/nn/functional.py:4045: UserWarning: The operator 'aten::upsample_nearest2d.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors) /home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/models/common.py:303: UserWarning: The operator 'aten::cat.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch.cat(x, self.d) /home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/models/common.py:158: UserWarning: The operator 'aten::cat.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1)) /home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/models/yolo.py:66: UserWarning: The operator 'aten::sigmoid.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) y = x[i].sigmoid() /home/dvrogozh/git/pytorch/torch/_tensor.py:40: UserWarning: The operator 'aten::pow.Tensor_Scalar_outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return f(*args, **kwargs) /home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/models/yolo.py:77: UserWarning: The operator 'aten::cat.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) 
return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x) /home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/utils/general.py:834: UserWarning: The operator 'aten::gt.Scalar_outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) xc = prediction[..., 4] > conf_thres # candidates /home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/utils/general.py:854: UserWarning: The operator 'aten::nonzeroon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) x = x[xc[xi]] # confidence /home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/utils/general.py:854: UserWarning: The operator 'aten::index.Tensor_outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) x = x[xc[xi]] # confidence See: pytorch/pytorch#11201 See: pytorch/pytorch#114723 Signed-off-by: Dmitry Rogozhkin <[email protected]>
This commit adds native PyTorch xpu support for efficientnet sample, i.e. IPEX is not needed in this mode. XPU backend in PyTorch is under active development and is not finished yet. Focus is on functional side of key things and performance is expected to be low. Future improvements should bring it up. As of now this mode of operation is experimental in the sample and is not default, use `--ipex yes` to enable. Commit also switches enet sample to torch variant of multiprocessing module and uses set_sharing_strategy('file_system') to avoid too many open files error on dataloader. * Can be run as: ./run_model.sh --ipex no * Tried on: * pytorch: 4e66aaa0109 ("update kineto submodel commit id...") * vision: 96640af090 ("add float support to...") * Status: * --jit script|none: fail on autocast * --jit trace works x30 times slower vs. IPEX (5 img/s vs. 150 img/s), likely some operations are done on CPU since blitter is loaded + seeing this warning: torch/utils/data/_utils/pin_memory.py:58: UserWarning: Aten Op fallback from XPU to CPU happends. This may have performance implications. If need debug the fallbac k ops please set environment variable `PYTORCH_DEBUG_XPU_FALLBACK=1` (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:11.) /home/dvrogozh/git/pytorch/torch/nn/functional.py:2511: UserWarning: The operator 'aten::native_batch_normon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch.batch_norm( /home/dvrogozh/git/pytorch/torch/nn/functional.py:2103: UserWarning: The operator 'aten::silu.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch._C._nn.silu_(input) /home/dvrogozh/git/pytorch/torch/nn/functional.py:1260: UserWarning: The operator 'aten::_adaptive_avg_pool2don the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch._C._nn.adaptive_avg_pool2d(input, _output_size) /home/dvrogozh/git/pytorch/torch/nn/modules/activation.py:292: UserWarning: The operator 'aten::sigmoid.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch.sigmoid(input) See: pytorch/pytorch#11201 See: pytorch/pytorch#114723 Signed-off-by: Dmitry Rogozhkin <[email protected]>
* Refactor DLRMv1 to models_v2 format (#2170) Signed-off-by: Minh1 Le<[email protected]> Signed-off-by: Mahathi Vatsal <[email protected]> * enable yolov7 (#2002) * update document for yolov7 and resnet50 (#2009) * Molly/yolov7 bkc update (#2021) * make num_iter flexbile * bugfix for bert-large ddp * bkc for rn50 ddp training update * bkc for rn50 ddp training update * bkc for dlrm_v1 ddp training update * yolov7 bkc update * fix yolov7 Int8 Inductor Dynamic shape issue (#2034) * minor fix for throughput (#2093) * Update yolov7 (#2209) * separate dataset setup from model runtime * create a script of yolov7.py for the heavy change in yolov7_ipex_and_inductor.patch * update document * do not count NMS time * add yolov7 to README (#2180) * YOLOv7 Inference container (#2154) * build initial container version * add libgl and tests * add pycocotools and more tests * add ubuntu dockerfile * add more dependencies * remove inductor tests * Update pytorch-yolov7-inference.Dockerfile-centos * remove extra components * Update pytorch-yolov7-inference.Dockerfile-centos * remove gcc source * add container doc * correct link * remove env var * remove bf32 * Update pytorch-yolov7-inference.Dockerfile-ubuntu * Update pytorch-yolov7-inference.Dockerfile-centos * modify container.md * Update CONTAINER.md * remove ISA * Update pytorch-yolov7-inference.Dockerfile-ubuntu * Update pytorch-yolov7-inference.Dockerfile-ubuntu * use default maloc conf * Enable PyTorch yolov7 inference (#2181) * enable PyTorch yolov7 inference * do not count NMS time * change the automatic download of the pre-trained model to manual download * add calibration.sh and description for int8 qparams json file * update document * minor changes * update document and add descriptions * use torch.compile with ipex backend for ipex int8 * Molly/refine summary output (#2205) * make num_iter flexbile * bugfix for bert-large ddp * bkc for rn50 ddp training update * bkc for rn50 ddp training update * bkc for dlrm_v1 ddp training update * refine summary outputs for pytorch cpu * refine for LCM * refine for distilbert * [Inductor][YoloV7] Enable Stock Pytorch launcher (#2159) * yolov7 enable torch launcher * fix log postfix issue * fix yolov7 throughput log bug * add num_warmup and num_iters for inductor path * fix coco path * Revert "fix coco path" This reverts commit 3c5f09fdc849229950cd4dc7cc2984939fd9dc2e. * add coco soft link * fix path * add warm-up and steps for ipex path * use NUMAS for yolov7 realtime instances * remove dataset download * merge develop * remove iter num for yolo * use static shape for IPEX int8 and improve its accuracy (#2078) * [TensorFlow]: Added HuggingFace model for BERT-large SQuAD for FP32 and BF16 (#1988) * * Added HuggingFace model for BERT-large SQuAD for FP32 and BF16. * Updated model scripts to be similar to estimator-based bert_large model for latency. NOTE: This is a SavedModel containing ReadVariable ops in the graph. Hence, it is not optimized for inference when running it with grappler-based TensorFlow. This SavedModel needs to be converted into a frozen graph when running with grappler-based TensorFlow for best performance. 
The original HuggingFace model can be found here: https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering Co-authored-by: mahathis <[email protected]> Co-authored-by: Nick Camarena <[email protected]> * Added public link for weights (#2200) * [TensorFlow] Enable fp16 and bfloat32 for Bert_large Hugging Face (#2215) * add fp16 support Co-authored-by: nick.camarena <[email protected]> * add xla to training_args list and enable variable batch size (#2268) * add xla to training_args list and enable variable batch size --------- Co-authored-by: AnetaKaczynska <[email protected]> * remove jit_compile=true to enable grappler AMP (#2280) * Enable Graphsage Inference (#1216) * enable graphsage model * clean up pretrained model path and fix amp issues (#1414) * added int8 support for graphsage (#1536) * changing numa core per instance (#1691) * GraphSAGE: added warmup steps (#1766) * added warmup steps * fix style error * added unit test fix * Graphsage: adding support for env var (#1836) * adding support for env var * Update correct number of cores * removing unnecessary condition * Graphsage: added advanced env var (#1848) * added advanced env var for graphsage * [Tensorflow]: Enable xla for GraphSAGE Inference (#2097) * Updated start.sh for graphsage * ipex/fbnet: add fbnet as enet clone (#2064) - FBNet sample is a clone of ENet from latest develop. - Uses huggingface (timm) instead of tochvision for model download - syncronizes only once per pass through dataset rather than once per batch as FBNet does - Includes monkey patch from MA KPI - Perf is matching MA KPI sample Signed-off-by: Voas, Tanner <[email protected]> * ipex: minor fixes for benchmark.sh in various samples - !bin/bash needs to be at top of file - profiles for fp32 included batch sizes that occasionally RTE due to lack of reosurces Signed-off-by: Voas, Tanner <[email protected]> * ipex/enet&fbnet: fix sample issues when running with data Signed-off-by: Voas, Tanner <[email protected]> * ipex/fbnet: switch to explicit device specification This change allows to run inference on the specified device (cpu, xpu or cuda) if few different devices present on the system. Signed-off-by: Dmitry Rogozhkin <[email protected]> * pytorch/fbnet: enable native xpu support path This commit adds native PyTorch xpu support for fbnet sample, i.e. IPEX is not needed in this mode. XPU backend in PyTorch is under active development and is not finished yet. Focus is on functional side of key things and performance is expected to be low. Future improvements should bring it up. As of now this mode of operation is experimental in the sample and is not default, use `--ipex yes` to enable. Commit also switches fbnet sample to torch variant of multiprocessing module and uses set_sharing_strategy('file_system') to avoid too many open files error on dataloader. * Can be run as: ./run_model.sh --ipex no * Tried on: * pytorch: 91d565da0c5 ("[dynamo] Add support for tensor's is_complex method") * vision: 89d2b38cbc ("Updated compatibility table") * Status: * --jit script|none: fail on autocast * --jit trace works x5.5 times slower vs. IPEX (16 img/s vs. 88 img/s), likely some operations are done on CPU since blitter is loaded + seeing this warning: torch/utils/data/_utils/pin_memory.py:58: UserWarning: Aten Op fallback from XPU to CPU happends. This may have performance implications. 
If need debug the fallback ops please set environment variable `PYTORCH_DEBUG_XPU_FALLBACK=1` (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:11.) /home/dvrogozh/git/pytorch/torch/nn/functional.py:2511: UserWarning: The operator 'aten::native_batch_normon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch.batch_norm( /home/dvrogozh/git/pytorch/torch/nn/functional.py:1498: UserWarning: The operator 'aten::relu_on the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) result = torch.relu_(input) /home/dvrogozh/git/pytorch/torch/nn/functional.py:1260: UserWarning: The operator 'aten::_adaptive_avg_pool2don the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.) return torch._C._nn.adaptive_avg_pool2d(input, _output_size) See: pytorch/pytorch#11201 See: pytorch/pytorch#114723 Signed-off-by: Dmitry Rogozhkin <[email protected]> * Sort requirement for fbnet * ipex: fixes for enet, fbnet, rife and ifrnet models * ipex/fbnet: minor fixes when loading the model * ipex/ifrnet: minor fixes on CUDA * ipex/rife: minor fixes on CUDA * ipex/enet: reduce memory usage on dataset execution at the cost of reduced throughput * ipex/fbnet: reduce memory usage on dataset execution at the cost of reduced throughput On memory usage fixes: * Change disables images pre-load on dataset mode * This has the effect of reducing dataset reported throughput on BS 128 with single-stream execution to 89% of dummy throughput on B4 version of enet and 22% on C100 version of fbnet: ** Note 1: sample execution takes similar amount of time. We now are just including data loading in throughput calculation vs dummy execution which excludes this processing. ** Note 2: the impact is much higher on fbnet because this model runs 10x the speed of enet B4. As such the impact of including data loading and processing is more significant. Signed-off-by: Voas, Tanner <[email protected]> * Fixed linter issues * Add Pytorch IFRNet Interpolation sample to models_v2 (#1962) * ipex: removed references to enet from samples that cloned it Signed-off-by: Voas, Tanner <[email protected]> * Additional Feature Support for IFRNet * Add new feature support: async, multi-stream, precision and amp * Use common js_sysinfo to report system config information * Align to schema in generated reports * Enable telemetry collection * CUDA Docker Images for Interpolation samples * ipex: minor fixes for benchmark.sh in various samples - !bin/bash needs to be at top of file - profiles for fp32 included batch sizes that occasionally RTE due to lack of reosurces Signed-off-by: Voas, Tanner <[email protected]> * Fixed linter issues * pytorch/ifrnet: enable native xpu support path This commit adds native PyTorch xpu support for ifrnet sample, i.e. IPEX is not needed in this mode. XPU backend in PyTorch is under active development and is not finished yet. Focus is on functional side of key things and performance is expected to be low. Future improvements should bring it up. As of now this mode of operation is experimental in the sample and is not default, use `--ipex no` to enable. 
* Can be run as: `./run_model.sh --ipex no` * Tried on: * pytorch: `91d565da0c5 ("[dynamo] Add support for tensor's is_complex method")` * vision: `89d2b38cbc ("Updated compatibility table")` * Status: * Loading weights with `map_location` fails (can be scipped) * `--precision bf16`: fails with `RuntimeError: grid_sampler_2d_cpu not implemented for BFloat16` * `--precision fp16`: fails with `RuntimeError: grid_sampler_2d_cpu not implemented for Half` * `--precision fp32`: works, perf is x4.4 times lower than IPEX (2.8 vs. 12.4 frames/s), cpu fallback occurs See: pytorch/pytorch#114723 Signed-off-by: Dmitry Rogozhkin <[email protected]> * ipex: fixes for enet, fbnet, rife and ifrnet models * ipex/fbnet: minor fixes when loading the model * ipex/ifrnet: minor fixes on CUDA * ipex/rife: minor fixes on CUDA * ipex/enet: reduce memory usage on dataset execution at the cost of reduced throughput * ipex/fbnet: reduce memory usage on dataset execution at the cost of reduced throughput On memory usage fixes: * Change disables images pre-load on dataset mode * This has the effect of reducing dataset reported throughput on BS 128 with single-stream execution to 89% of dummy throughput on B4 version of enet and 22% on C100 version of fbnet: ** Note 1: sample execution takes similar amount of time. We now are just including data loading in throughput calculation vs dummy execution which excludes this processing. ** Note 2: the impact is much higher on fbnet because this model runs 10x the speed of enet B4. As such the impact of including data loading and processing is more significant. Signed-off-by: Voas, Tanner <[email protected]> * IFRNet improvements + alignment of RIFE/IFRNet * Resolve issues in dataset mode * Added printout of test summary * Exposed resolution controls * Implemented better progress updates Signed-off-by: Voas, Tanner <[email protected]> * ipex/ifrnet&rife: report accuracy in run printout progress * ipex/ifrnet: add accuracy printout in dataset mode * ipex/rife: add accuracy printout in dataset mode Signed-off-by: Voas, Tanner <[email protected]> * Add Pytorch RIFE Interpolation sample to models_v2 (#2059) - Initial version of RIFE code, supporting 1 stream, Float16 XPU execution, BatchSize=1 with performance/quality modes - Included a ReadMe document for RIFE - Added docker files, baremetal scripts and basic tests - Pull ArXiV version of RIFE and apply local patches -- Using ArXiV version of model and corresponding weights. 
This is referenced by the main repository as the one corresponding to the ArXiV publication for RIFE (https://github.com/megvii-research/ECCV2022-RIFE#evaluation) -- Added patch file on top of model for XPU support -- Implemented get_model.sh script for RIFE to fetch and patch model - Added top level readme and copyright modifications Related Jira: https://jira.devtools.intel.com/browse/AIAE-336 * Additional Feature Support for RIFE * Add async submission configurability * Multi-stream support for RIFE * Add precision and AMP support * Align to reporting schema * ipex: removed references to enet from samples that cloned it Signed-off-by: Voas, Tanner <[email protected]> * Enable Telemetry and benchmarking scripts for RIFE * CUDA Docker Images for Interpolation samples * ipex: minor fixes for benchmark.sh in various samples - !bin/bash needs to be at top of file - profiles for fp32 included batch sizes that occasionally RTE due to lack of reosurces Signed-off-by: Voas, Tanner <[email protected]> * pytorch/rife: enable native xpu support path This commit adds native PyTorch xpu support for rife sample, i.e. IPEX is not needed in this mode. XPU backend in PyTorch is under active development and is not finished yet. Focus is on functional side of key things and performance is expected to be low. Future improvements should bring it up. As of now this mode of operation is experimental in the sample and is not default, use `--ipex no` to enable. * Can be run as: `./run_model.sh --ipex no` * Tried on: * pytorch: `91d565da0c5 ("[dynamo] Add support for tensor's is_complex method")` * vision: `89d2b38cbc ("Updated compatibility table")` * Status: * Loading weights with `map_location=torch.device('xpu')` fails (can be substituited with `map_location=torch.device('cpu')`) * `--precision bf16`: fails with `RuntimeError: grid_sampler_2d_cpu not implemented for BFloat16` * `--precision fp16`: fails with `RuntimeError: grid_sampler_2d_cpu not implemented for Half` * `--precision fp32`: works, perf is 6.6 times lower than IPEX (2.5 vs. 16.5 frames/s), cpu fallback occurs See: pytorch/pytorch#114723 Signed-off-by: Dmitry Rogozhkin <[email protected]> * ipex: fixes for enet, fbnet, rife and ifrnet models * ipex/fbnet: minor fixes when loading the model * ipex/ifrnet: minor fixes on CUDA * ipex/rife: minor fixes on CUDA * ipex/enet: reduce memory usage on dataset execution at the cost of reduced throughput * ipex/fbnet: reduce memory usage on dataset execution at the cost of reduced throughput On memory usage fixes: * Change disables images pre-load on dataset mode * This has the effect of reducing dataset reported throughput on BS 128 with single-stream execution to 89% of dummy throughput on B4 version of enet and 22% on C100 version of fbnet: ** Note 1: sample execution takes similar amount of time. We now are just including data loading in throughput calculation vs dummy execution which excludes this processing. ** Note 2: the impact is much higher on fbnet because this model runs 10x the speed of enet B4. As such the impact of including data loading and processing is more significant. 
Signed-off-by: Voas, Tanner <[email protected]> * ipex/refe: sample improvements * Added printout of test summary * Exposed resolution controls * Implemented better progress updates Signed-off-by: Voas, Tanner <[email protected]> * IFRNet improvements + alignment of RIFE/IFRNet * Resolve issues in dataset mode * Added printout of test summary * Exposed resolution controls * Implemented better progress updates Signed-off-by: Voas, Tanner <[email protected]> * ipex/ifrnet&rife: report accuracy in run printout progress * ipex/ifrnet: add accuracy printout in dataset mode * ipex/rife: add accuracy printout in dataset mode Signed-off-by: Voas, Tanner <[email protected]> * Sort requirements.txt * bkc update (#2288) Co-authored-by: Chunyuan WU <[email protected]> * update launcher usage for yolov7 (#2290) * update yolov7 patch to support drop_last for performance test and enable more than 1 inference epoch (#2321) --------- Signed-off-by: Minh1 Le<[email protected]> Signed-off-by: Mahathi Vatsal <[email protected]> Signed-off-by: Voas, Tanner <[email protected]> Signed-off-by: Dmitry Rogozhkin <[email protected]> Co-authored-by: Cao E <[email protected]> Co-authored-by: Srikanth Ramakrishna <[email protected]> Co-authored-by: WeizhuoZhang-intel <[email protected]> Co-authored-by: Bhavani Subramanian <[email protected]> Co-authored-by: Nick Camarena <[email protected]> Co-authored-by: AnetaKaczynska <[email protected]> Co-authored-by: Ashiq Imran <[email protected]> Co-authored-by: Dmitry Rogozhkin <[email protected]> Co-authored-by: sandeep-maddipatla <[email protected]> Co-authored-by: sandeep-maddipatla <[email protected]> Co-authored-by: Chunyuan WU <[email protected]> Co-authored-by: xiaofeij <[email protected]>
Issue description
While using the DataLoader from PyTorch 0.4.1:
With num_workers > 0, the workers store the tensors in shared memory, but do not release the shared-memory file handles after they return the tensor to the main process and the handles are no longer needed. The worker then runs out of file handles if one stores the tensors in a list.
Code example
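A minimal sketch of the pattern described above (hypothetical, not the original reproduction): every batch returned by the DataLoader is kept alive in a Python list, so the shared-memory handles backing those tensors are never released.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10000, 3))
loader = DataLoader(dataset, batch_size=1, num_workers=4)

cached = []
for (batch,) in loader:
    # Keeping a reference to the batch keeps its shared-memory handle open.
    cached.append(batch)
```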
The error:
System Info