Too many open files error #11201

Closed · whucdf opened this issue Sep 3, 2018 · 17 comments

@whucdf commented Sep 3, 2018

Issue description

While using the DataLoader from PyTorch 0.4.1 with num_workers > 0, the workers store the tensors in shared memory but do not release the shared-memory file handles after returning the tensors to the main process, even though the handles are no longer needed. The process then runs out of file handles if the tensors are stored in a list.

Code example


import torch
from torch.utils.data import Dataset

class testSet(Dataset):
    def __init__(self):
        super(testSet, self).__init__()

    def __len__(self):
        return 1000000

    def __getitem__(self, index):
        return {"index": index}

test_data = testSet()
test_data_loader = torch.utils.data.DataLoader(dataset=test_data, batch_size=1, num_workers=1)
index = []
for sample in test_data_loader:
    index.append(sample['index'])

The error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-cf6ed576bc1c> in <module>()
----> 1 for sample in test_data_loader:
      2     #print(sample['index'])
      3     index.append(sample['index'])

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    328         while True:
    329             assert (not self.shutdown and self.batches_outstanding > 0)
--> 330             idx, batch = self._get_batch()
    331             self.batches_outstanding -= 1
    332             if idx != self.rcvd_idx:

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _get_batch(self)
    307                 raise RuntimeError('DataLoader timed out after {} seconds'.format(self.timeout))
    308         else:
--> 309             return self.data_queue.get()
    310 
    311     def __next__(self):

~/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py in get(self)
    335             res = self._reader.recv_bytes()
    336         # unserialize the data after having released the lock
--> 337         return _ForkingPickler.loads(res)
    338 
    339     def put(self, obj):

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py in rebuild_storage_fd(cls, df, size)
    149         fd = multiprocessing.reduction.rebuild_handle(df)
    150     else:
--> 151         fd = df.detach()
    152     try:
    153         storage = storage_from_cache(cls, fd_id(fd))

~/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py in detach(self)
     56             '''Get the fd.  This should only be called once.'''
     57             with _resource_sharer.get_connection(self._id) as conn:
---> 58                 return reduction.recv_handle(conn)
     59 
     60 

~/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/reduction.py in recv_handle(conn)
    180         '''Receive a handle over a local connection.'''
    181         with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
--> 182             return recvfds(s, 1)[0]
    183 
    184     def DupFd(fd):

~/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/reduction.py in recvfds(sock, size)
    159             if len(ancdata) != 1:
    160                 raise RuntimeError('received %d items of ancdata' %
--> 161                                    len(ancdata))
    162             cmsg_level, cmsg_type, cmsg_data = ancdata[0]
    163             if (cmsg_level == socket.SOL_SOCKET and

RuntimeError: received 0 items of ancdata

System Info

  • OS: Ubuntu 16.04
  • PyTorch version: 0.4.1
@weiyangfb (Contributor)

@whucdf Thanks for reporting this issue. It is expected because the default file_descriptor share strategy uses file descriptors as shared memory handles, and this will hit the limit when there are too many batches at DataLoader. To get around this, you can switch to file_system strategy by adding this to your script.

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')

Let me know if there is still any issue.
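
For reference, the strategies available on the current platform and the one currently in effect can be inspected before and after switching; a minimal sketch using the documented torch.multiprocessing helpers:

import torch.multiprocessing

# 'file_descriptor' is the default on Linux; macOS typically only offers 'file_system'.
print(torch.multiprocessing.get_all_sharing_strategies())
print(torch.multiprocessing.get_sharing_strategy())

torch.multiprocessing.set_sharing_strategy('file_system')
assert torch.multiprocessing.get_sharing_strategy() == 'file_system'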

@weiyangfb (Contributor)

Closing this now; please feel free to reopen it if needed.

@zimenglan-sysu-512

Hi @weiyangfb, thanks for your help, it does solve the problem.
By the way, will it slow down the training speed?

@cyzanfar commented Mar 9, 2019

Hey!
I am still getting the same "too many open files" error.
Running on CPU on macOS.

traceback:

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-cc88ea5f8bd3>", line 2, in <module>
    num_epochs=25)
  File "<ipython-input-3-c38b0d739ba0>", line 23, in train_model
    for inputs, labels in dataloaders[phase]:
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __iter__
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 545, in __init__
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/multiprocessing/context.py", line 102, in Queue
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/multiprocessing/queues.py", line 42, in __init__
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/multiprocessing/context.py", line 67, in Lock
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 163, in __init__
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/multiprocessing/synchronize.py", line 60, in __init__
OSError: [Errno 24] Too many open files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 1863, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'OSError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/site-packages/IPython/core/ultratb.py", line 1095, in get_records
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/site-packages/IPython/core/ultratb.py", line 311, in wrapped
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/site-packages/IPython/core/ultratb.py", line 345, in _fixed_getinnerframes
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/inspect.py", line 1483, in getinnerframes
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/inspect.py", line 1441, in getframeinfo
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/inspect.py", line 696, in getsourcefile
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/inspect.py", line 725, in getmodule
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/inspect.py", line 709, in getabsfile
  File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/posixpath.py", line 376, in abspath
OSError: [Errno 24] Too many open files

I did include the proper configurations:

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')

thanks
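
If the error persists with the file_system strategy, the per-process open-file limit itself may be the bottleneck (macOS defaults to a fairly low soft limit). A minimal sketch for raising it from Python with the standard resource module, assuming the hard limit allows it:

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit; the new value must not exceed the hard limit.
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

The same effect can usually be achieved from the shell with ulimit -n before launching the script.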

@Beastmaster

Please use a deep copy when appending DataLoader output to a list. Take @whucdf's code as an example:

test_data = testSet()
test_data_loader = torch.utils.data.DataLoader(dataset=test_data, batch_size=1, num_workers=1)
index = []
for sample in test_data_loader:
    index.append(sample['index'])

The index list keeps references to the DataLoader output, so the connections among the multiprocessing processes cannot be closed. A deep copy is useful in this scenario:

import copy

test_data = testSet()
test_data_loader = torch.utils.data.DataLoader(dataset=test_data, batch_size=1, num_workers=1)
index = []
for sample in test_data_loader:
    sample_cp = copy.deepcopy(sample)
    del sample
    index.append(sample_cp['index'])
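
If only the values are needed, as with the integer indices in this example, converting the batch tensors to plain Python objects has the same effect as the deep copy, since no reference to the worker's shared-memory tensor is kept. A minimal sketch:

index = []
for sample in test_data_loader:
    # .tolist() copies the values out of the shared-memory tensor into
    # ordinary Python ints, so the tensor and its file handle can be released.
    index.extend(sample['index'].tolist())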

@soulslicer

@whucdf Thanks for reporting this issue. It is expected because the default file_descriptor share strategy uses file descriptors as shared memory handles, and this will hit the limit when there are too many batches at DataLoader. To get around this, you can switch to file_system strategy by adding this to your script.

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')

Let me know if there is still any issue.

I get the error

torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory

kakusikun added a commit to kakusikun/deep-learning-works that referenced this issue May 7, 2020
p-patil referenced this issue in p-patil/continual-learning Oct 27, 2020
Set PyTorch's shared memory strategy to "file_system", which uses file
names to identify shared memory regions, rather than the default
"file_descriptors", which uses file descriptors as shared memory
handles. This fixes the problem of exceeding the system-wide limit on
the number of open files a process can have. See
https://github.com/pytorch/pytorch/issues/11201#issuecomment-421146936
and
https://pytorch.org/docs/master/multiprocessing.html?highlight=sharing%20strategy#sharing-strategies.
p-patil referenced this issue in p-patil/continual-learning Oct 29, 2020
@brando90 commented Feb 28, 2021

@whucdf Thanks for reporting this issue. It is expected because the default file_descriptor share strategy uses file descriptors as shared memory handles, and this will hit the limit when there are too many batches at DataLoader. To get around this, you can switch to file_system strategy by adding this to your script.

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')

Let me know if there is still any issue.

Is this supposed to be run by the main process (the one doing mp.spawn), or should EVERY process run it inside its run function?

Thanks!

ref: https://pytorch.org/docs/stable/multiprocessing.html#file-descriptor-file-descriptor

https://discuss.pytorch.org/t/how-does-one-setp-up-the-set-sharing-strategy-strategy-for-multiprocessing/113302
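
As discussed further down the thread, the sharing strategy appears to be per-process state, so a conservative approach is to set it both in the parent and at the top of the function each spawned process runs. A minimal sketch (run and its arguments are placeholders, not code from this thread):

import torch.multiprocessing as mp

def run(rank, world_size):
    # Per-process setting: do it first thing in every spawned process.
    mp.set_sharing_strategy('file_system')
    ...  # set up process group, DataLoader, training loop

if __name__ == '__main__':
    mp.set_sharing_strategy('file_system')
    world_size = 4
    mp.spawn(run, args=(world_size,), nprocs=world_size)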

@FarisHijazi

I applied

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')

yet I am still getting the same error.

@schuhschuh commented Aug 9, 2021

For anyone else seeing this error even after setting torch.multiprocessing.set_sharing_strategy('file_system') in their main process: the DataLoader worker processes apparently do not inherit this setting. I had to use a worker_init_fn such as:

import torch.multiprocessing
from torch.utils.data import DataLoader

sharing_strategy = "file_system"
torch.multiprocessing.set_sharing_strategy(sharing_strategy)

def set_worker_sharing_strategy(worker_id: int) -> None:
    torch.multiprocessing.set_sharing_strategy(sharing_strategy)

loader = DataLoader(dataset, num_workers=4, worker_init_fn=set_worker_sharing_strategy)

This finally fixed it for me.

@brando90 This relates to your earlier question. I could confirm that the strategy is not set to the same strategy as in the main process by printing the value of torch.multiprocessing.get_sharing_strategy() in worker_init_fn.
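
The check described above can be reproduced with a worker_init_fn that only prints the strategy each worker sees; a minimal sketch (debug_worker_sharing_strategy and dataset are illustrative names):

import torch.multiprocessing
from torch.utils.data import DataLoader

def debug_worker_sharing_strategy(worker_id: int) -> None:
    # If this prints 'file_descriptor' while the parent already switched to
    # 'file_system', the workers did not inherit the setting.
    print(worker_id, torch.multiprocessing.get_sharing_strategy())

loader = DataLoader(dataset, num_workers=4, worker_init_fn=debug_worker_sharing_strategy)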

@mdabbah commented Aug 24, 2021

@schuhschuh did your solution require you to change the setup() function? (I'm assuming you are doing distributed training/inference.)

my current setup function looks like this

def setup(rank, world_size, port):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = f'{port}'

    # initialize the process group
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

Does using the file_system sharing strategy mean that I must change
dist.init_process_group("nccl", rank=rank, world_size=world_size) to something like
dist.init_process_group("nccl", init_method="file::/~/somefile", rank=rank, world_size=world_size)?

Thanks!

@basilevh commented Jul 19, 2022

My issue is that even with torch.multiprocessing.set_sharing_strategy('file_system'), after some time (typically in the second half of training), my job crashes with RuntimeError: unable to open shared memory object </torch_2283204_110829360> in read-write mode. This is much more likely to happen whenever I'm training more than one model in parallel on different GPUs. I verified that there is more than enough RAM and disk space available. Is there any other fix? Thank you.

@Xonxt commented Jul 23, 2022

On a slightly related note, in my training script, if I don't use the set_sharing_strategy('file_system'), I also get the "too many open files" error.

But if I add it, then it all runs fine, but at the very end of my script, all the processes just hang and never terminate. Even if I add a torch.distributed.barrier() or a torch.distributed.destroy_process_group().

@LemurPwned

I experience the same issue with the latest macOS nightly build. I am able to chew through a couple of epochs, but at some point the number of open file descriptors becomes too large -- they are simply not being closed properly. set_sharing_strategy is not helping at all.

My dataset returns a dictionary with 3 keys: two float tensors and one string.

class PhysicsDataset(Dataset):
    def __init__(self, data_dir, transform=None):
        super().__init__()
        self.data_dir = data_dir
        self.transform = transform
        self.gt_spectra = list(self.data_dir.glob("*.npz"))
        self.gt_parameters = json.load(
            open(self.data_dir / "all_params.json", 'r'))

    def __len__(self):
        return len(self.gt_spectra)

    def __getitem__(self, index):
        with np.load(self.gt_spectra[index]) as data:
            pdata = data['spectrum']
        pdata = (pdata - pdata.min()) / (pdata.max() - pdata.min())
        pdata = torch.from_numpy(pdata).float()
        parameters = self.gt_parameters[self.gt_spectra[index].name.replace(
            ".npz", "")]
        if self.transform:
            pdata = self.transform(pdata)

        # create output tensor with normalised weights
        gt_tensor = torch.from_numpy(
            np.asarray([(parameters[k] - KEYS[k]['min']) /
                        (KEYS[k]['max'] - KEYS[k]['min'])
                        for k in KEYS])).float()
        return {
            "spectrum": pdata,
            "gt_tensor": gt_tensor,
            "filename": self.gt_spectra[index].name
        }

Any idea why the fds are not closed after each epoch terminates? I suspect this may be due to the np.load in __getitem__, but I have no idea how to fix that.
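
One way to narrow this down is to count the descriptors the process holds after each epoch and see whether the number keeps growing; a minimal sketch (num_epochs and loader are placeholders), assuming /dev/fd is available, which it is on both Linux and macOS:

import os

def open_fd_count() -> int:
    # Each entry in /dev/fd is one descriptor currently open in this process.
    return len(os.listdir('/dev/fd'))

for epoch in range(num_epochs):
    for batch in loader:
        pass  # training step
    print(f"epoch {epoch}: {open_fd_count()} open file descriptors")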

@xiyanghu

On a slightly related note, in my training script, if I don't use the set_sharing_strategy('file_system'), I also get the "too many open files" error.

But if I add it, then it all runs fine, but at the very end of my script, all the processes just hang and never terminate. Even if I add a torch.distributed.barrier() or a torch.distributed.destroy_process_group().

same here. Have you figured out how to solve it? Thank you!

@Xonxt commented Jul 29, 2022

same here. Have you figured out how to solve it? Thank you!

Not sure how relevant this will be for you. In my case, I have my training dataset in a JSON format (one that we've developed internally at our institute), similar to the COCO format. The dataset is opened through a wrapper class that provides an API for reading it, again similar to COCO.

In my earlier attempts at distributed training, each process ended up opening the same JSON file on its own and trying to read annotations from it with a bunch of workers (num_workers=16).

Something like this, basically:

dataset = JSONDataset("/datasets/coco/annotations/train.json")
train_data = torch.utils.data.Dataset(dataset, ...)
train_loader = torch.utils.data.dataloader.DataLoader(train_data, num_workers=16, ...)

Instead, I made sure to first parse the entire dataset, read the full list of image files and the corresponding labels, and only then pass the list of files and labels to the torch.utils.data.Dataset object, so the workers would only read the image files and not try to share the same JSON file.

And then I don't touch the set_sharing_strategy function at all, just leaving it at the default value, and just put a destroy_process_group() at the end of the application.
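
A minimal sketch of that pattern, with the parsing done once in the main process and only plain lists handed to the Dataset (ImageListDataset, load_image and parse_annotations are illustrative names, not code from this thread):

import json
from torch.utils.data import Dataset

class ImageListDataset(Dataset):
    def __init__(self, image_paths, labels, transform=None):
        # Only plain Python lists are stored, so the workers never share a
        # handle to the parsed annotation file.
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = load_image(self.image_paths[idx])  # e.g. PIL.Image.open(...).convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image, self.labels[idx]

with open("/datasets/coco/annotations/train.json") as f:
    annotations = json.load(f)
image_paths, labels = parse_annotations(annotations)  # assumed helper
train_data = ImageListDataset(image_paths, labels)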

geometrikal added a commit to microfossil/particle-object-detection that referenced this issue Nov 30, 2022
verystrongjoe added a commit to verystrongjoe/wafer_aug_rl that referenced this issue Dec 8, 2022
julienroyd added a commit to recursionpharma/gflownet that referenced this issue Mar 24, 2023
chanshing added a commit to OxWearables/stepcount that referenced this issue Aug 17, 2023
chanshing added a commit to OxWearables/stepcount that referenced this issue Aug 29, 2023
thorstenwagner added a commit to MPI-Dortmund/tomotwin-cryoet that referenced this issue Oct 17, 2023
sammlapp added a commit to kitzeslab/opensoundscape that referenced this issue May 20, 2024
automatically sets `torch.multiprocessing.set_sharing_strategy("file_system")` during opensoundscape import. We may want to revisit this decision, but it seems that this is the recommended setting for avoiding issues seen when using parallelized DataLoader

see discussion and recommended solution here pytorch/pytorch#11201 (comment)
luis-real pushed a commit to intel/ai-reference-models that referenced this issue Aug 2, 2024
This commit adds native PyTorch xpu support for yolov5 sample,
i.e. IPEX is not needed in this mode. XPU backend in PyTorch is
under active development and is not finished yet. Focus is on functional
side of key things and performance is expected to be low. Future
improvements should bring it up.

As of now this mode of operation is experimental in the sample
and is not default, use `--ipex no` to enable.

* Can be run as: ./run_model.sh --ipex no
* Tried on:
  * pytorch: 91d565da0c5 ("[dynamo] Add support for tensor's is_complex method")
  * vision: 89d2b38cbc ("Updated compatibility table")
* Status:
  * --jit script|none: fail on autocast
  * --jit trace works x16 times slower vs. IPEX (13 img/s vs. 208 img/s),
    likely some operations are done on CPU since blitter is loaded + seeing
    this warning:

torch/utils/data/_utils/pin_memory.py:58: UserWarning: Aten Op fallback from XPU to CPU happends. This may have performance implications. If need debug the fallback
ops please set environment variable `PYTORCH_DEBUG_XPU_FALLBACK=1`  (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:11.)

/home/dvrogozh/git/pytorch/torch/nn/functional.py:2103: UserWarning: The operator 'aten::silu.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return torch._C._nn.silu_(input)
/home/dvrogozh/git/pytorch/torch/nn/functional.py:4045: UserWarning: The operator 'aten::upsample_nearest2d.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
/home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/models/common.py:303: UserWarning: The operator 'aten::cat.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return torch.cat(x, self.d)
/home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/models/common.py:158: UserWarning: The operator 'aten::cat.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))
/home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/models/yolo.py:66: UserWarning: The operator 'aten::sigmoid.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  y = x[i].sigmoid()
/home/dvrogozh/git/pytorch/torch/_tensor.py:40: UserWarning: The operator 'aten::pow.Tensor_Scalar_outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return f(*args, **kwargs)
/home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/models/yolo.py:77: UserWarning: The operator 'aten::cat.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)
/home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/utils/general.py:834: UserWarning: The operator 'aten::gt.Scalar_outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  xc = prediction[..., 4] > conf_thres  # candidates
/home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/utils/general.py:854: UserWarning: The operator 'aten::nonzeroon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  x = x[xc[xi]]  # confidence
/home/dvrogozh/git/frameworks.ai.models.intel-models/models_v2/pytorch/yolov5/inference/gpu/yolov5/utils/general.py:854: UserWarning: The operator 'aten::index.Tensor_outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  x = x[xc[xi]]  # confidence

See: pytorch/pytorch#11201
See: pytorch/pytorch#114723

Signed-off-by: Dmitry Rogozhkin <[email protected]>
luis-real pushed a commit to intel/ai-reference-models that referenced this issue Aug 2, 2024
luis-real pushed a commit to intel/ai-reference-models that referenced this issue Aug 2, 2024
This commit adds native PyTorch xpu support for efficientnet sample,
i.e. IPEX is not needed in this mode. XPU backend in PyTorch is
under active development and is not finished yet. Focus is on functional
side of key things and performance is expected to be low. Future
improvements should bring it up.

As of now this mode of operation is experimental in the sample
and is not default, use `--ipex yes` to enable.

Commit also switches enet sample to torch variant of multiprocessing
module and uses set_sharing_strategy('file_system') to avoid too many open
files error on dataloader.

* Can be run as: ./run_model.sh --ipex no
* Tried on:
  * pytorch: 4e66aaa0109 ("update kineto submodel commit id...")
  * vision: 96640af090 ("add float support to...")
* Status:
  * --jit script|none: fail on autocast
  * --jit trace works x30 times slower vs. IPEX (5 img/s vs. 150 img/s),
    likely some operations are done on CPU since blitter is loaded + seeing
    this warning:

torch/utils/data/_utils/pin_memory.py:58: UserWarning: Aten Op fallback from XPU to CPU happends. This may have performance implications. If need debug the fallbac
k ops please set environment variable `PYTORCH_DEBUG_XPU_FALLBACK=1`  (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:11.)

/home/dvrogozh/git/pytorch/torch/nn/functional.py:2511: UserWarning: The operator 'aten::native_batch_normon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return torch.batch_norm(
/home/dvrogozh/git/pytorch/torch/nn/functional.py:2103: UserWarning: The operator 'aten::silu.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return torch._C._nn.silu_(input)
/home/dvrogozh/git/pytorch/torch/nn/functional.py:1260: UserWarning: The operator 'aten::_adaptive_avg_pool2don the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return torch._C._nn.adaptive_avg_pool2d(input, _output_size)
/home/dvrogozh/git/pytorch/torch/nn/modules/activation.py:292: UserWarning: The operator 'aten::sigmoid.outon the XPU backend and will fall back to run on the CPU. (Triggered internally at /home/dvrogozh/git/pytorch/third_party/torch-xpu-ops/src/aten/XPUFallback.cpp:16.)
  return torch.sigmoid(input)

See: pytorch/pytorch#11201
See: pytorch/pytorch#114723
Signed-off-by: Dmitry Rogozhkin <[email protected]>
luis-real pushed a commit to intel/ai-reference-models that referenced this issue Aug 2, 2024
luis-real pushed a commit to intel/ai-reference-models that referenced this issue Aug 2, 2024
[Large squashed merge commit ("Refactor DLRMv1 to models_v2 format (#2170)" and many unrelated changes); the part relevant to this issue switches the enet/fbnet samples to the torch variant of the multiprocessing module and uses set_sharing_strategy('file_system') to avoid the too many open files error in the DataLoader. See: pytorch/pytorch#11201, pytorch/pytorch#114723.]