
TypeError: can't pickle Environment objects on Windows/MacOs #14

Open
fortepianissimo opened this issue Sep 26, 2018 · 14 comments

fortepianissimo commented Sep 26, 2018

I'm running on Windows 10, following the instructions in the README. When trying to retrain the model with this command

python nerTagger.py --dataset-type conll2003 train_eval

I ran into the following exception (right after compiling embeddings) - any tips?

Thank you for the wonderful work!

Compiling embeddings... (this is done only one time per embeddings at first launch)
path: d:\Projects\embeddings\glove.840B.300d.txt
100%|████████████████████████████████████████████████████████████████████| 2196017/2196017 [08:06<00:00, 4517.80it/s] embeddings loaded for 2196006 words and 300 dimensions
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
char_input (InputLayer)         (None, None, 30)     0
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, 30, 25) 2150        char_input[0][0]
__________________________________________________________________________________________________
word_input (InputLayer)         (None, None, 300)    0
__________________________________________________________________________________________________
time_distributed_2 (TimeDistrib (None, None, 50)     10200       time_distributed_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, None, 350)    0           word_input[0][0]
                                                                 time_distributed_2[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, None, 350)    0           concatenate_1[0][0]
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, None, 200)    360800      dropout_1[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, None, 200)    0           bidirectional_2[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, None, 100)    20100       dropout_2[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, None, 10)     1010        dense_1[0][0]
__________________________________________________________________________________________________
chain_crf_1 (ChainCRF)          (None, None, 10)     120         dense_2[0][0]
==================================================================================================
Total params: 394,380
Trainable params: 394,380
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/60
Exception in thread Thread-2:
Traceback (most recent call last):
  File "d:\Anaconda3\Lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "d:\Anaconda3\Lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "d:\Projects\delft\env\lib\site-packages\keras\utils\data_utils.py", line 548, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "d:\Projects\delft\env\lib\site-packages\keras\utils\data_utils.py", line 522, in <lambda>
    initargs=(seqs,))
  File "d:\Anaconda3\Lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "d:\Anaconda3\Lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "d:\Anaconda3\Lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "d:\Anaconda3\Lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "d:\Anaconda3\Lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "d:\Anaconda3\Lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "d:\Anaconda3\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle Environment objects
fortepianissimo commented Sep 26, 2018

Okay - disabling lmdb in embedding-registry.json seems to make that exception go away. But now there's another exception:

__________________________________________________________________________________________________
Epoch 1/60
d:\Projects\delft\env\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
d:\Projects\delft\env\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "d:\Anaconda3\Lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "d:\Anaconda3\Lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "d:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
    return getattr(self.model, name)
  File "d:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
    return getattr(self.model, name)
  File "d:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
    return getattr(self.model, name)
  [Previous line repeated 328 more times]

pjox commented Sep 26, 2018

Hello! I haven't been able to reproduce the exception on Linux, so it might be Windows-related. I'm trying to get a Windows machine in order to try again. In the meantime, can you tell us a little more about your setup? For instance, are you using a GPU? Did you use the requirements-gpu.txt file to set it up? Also, which version of Python are you using?

Thanks!

@fortepianissimo

Hi, sorry I wasn't very clear about my specs:

  • Windows: Windows 10
  • GPU: Tesla Quadro P4000; yes I did install requirements-gpu.txt
  • Python: 3.6.6 (via Anaconda).

@fortepianissimo

By the way, I also solved another error along the way: a DLL load failed message when scikit-learn is imported.

The solution is to install numpy‑1.14.6+mkl‑cp36‑cp36m‑win_amd64.whl (depending on the architecture and Python version) from https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy

pjox commented Sep 26, 2018

Ok, we had some problems before with Python 3.6. I honestly don't think the Python version is the problem, but if you have the time, could you try creating a Python 3.5 environment with conda (conda create -n myenv python=3.5) and see if you encounter the same problems? As soon as I get to try DeLFT on Windows I'll get back to you.

@fortepianissimo

Ok, I set up a Python 3.5 environment (version 3.5.6 via Anaconda) and created another env_python35 under the delft dir. Here are the errors (infinite recursion):

Epoch 1/60
D:\Projects\delft\env_python35\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
D:\Projects\delft\env_python35\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
  File "D:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
    return getattr(self.model, name)
  File "D:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
    return getattr(self.model, name)
  File "D:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
    return getattr(self.model, name)
  File "D:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
... (more same lines like the above) ...
RecursionError: maximum recursion depth exceeded while calling a Python object
Exception in thread Thread-1:
Traceback (most recent call last):
  File "d:\Anaconda3\envs\python35_env\Lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "d:\Anaconda3\envs\python35_env\Lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Projects\delft\env_python35\lib\site-packages\keras\utils\data_utils.py", line 548, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "D:\Projects\delft\env_python35\lib\site-packages\keras\utils\data_utils.py", line 522, in <lambda>
    initargs=(seqs,))
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\context.py", line 118, in Pool
    context=self.get_context())
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

pjox commented Sep 27, 2018

Thanks for the info! I have been looking around, and apparently the multiprocessing library works differently on Windows, so this series of errors might be caused by that. However, I haven't been able to find a Windows machine to test on yet; as soon as I can get hold of one I'll get back to you.
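The platform difference is easy to observe: Python's multiprocessing defaults to the fork start method on Linux (workers inherit objects without pickling) but to spawn on Windows (and on macOS since Python 3.8), which pickles everything handed to a worker process. A minimal illustration, using a thread lock as an easy stand-in for an unpicklable handle like an lmdb Environment:

```python
import multiprocessing as mp
import pickle
import threading

# 'fork' on Linux; 'spawn' on Windows and recent macOS. Under 'spawn',
# every object passed to a worker must survive pickling.
print(mp.get_start_method())

# Unpicklable handles (sockets, lmdb environments, thread locks) all
# fail the same way under 'spawn':
try:
    pickle.dumps(threading.Lock())
except TypeError as e:
    print(e)  # a "cannot pickle ... lock" TypeError, same family as the error above
```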

pjox commented Oct 18, 2018

@fortepianissimo I finally got hold of a Windows machine and was able to reproduce the error. Could you please comment out lines 77 and 78 in the file utilities/Embeddings.py, that is, these lines:

def __getattr__(self, name):
    return getattr(self.model, name)

and try again?

Note 1: Please also disable lmdb in embedding-registry.json
Note 2: This is a workaround rather than a fix; I'll work on a definitive fix in the future

Also, please let me know if the workaround works!
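For reference, the recursion in the earlier tracebacks happens because during unpickling __getattr__ runs before the instance dict (and thus self.model) exists, so getattr(self.model, name) re-enters __getattr__ forever. A recursion-safe variant, sketched here on a hypothetical minimal stand-in for the Embeddings wrapper (not the actual delft code), reads the instance dict directly:

```python
import pickle

class EmbeddingsProxy:
    """Hypothetical minimal stand-in for the Embeddings wrapper."""

    def __init__(self, model=None):
        self.model = model

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup fails -- including
        # during unpickling, before __init__ has run and 'model' exists.
        # Reading the instance dict directly avoids re-entering __getattr__.
        model = self.__dict__.get('model')
        if model is None:
            raise AttributeError(name)  # lets pickle fall back gracefully
        return getattr(model, name)

# Round-trips through pickle without infinite recursion:
restored = pickle.loads(pickle.dumps(EmbeddingsProxy(model="glove")))
```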

ghost commented Apr 21, 2019

Hello, I'm new to this. My specs are:

  • OS: Windows 10 Pro 64-bit
  • GPU: NVIDIA 1050Ti-mobile (4 GB) [I've already installed tensorflow-gpu as mentioned in requirements.txt]
  • Python 3.7

And I want to ask two things:

  • First: how do I disable lmdb in embedding-registry.json?
  • Second: I've already commented out lines 77 & 78 in utilities/Embeddings.py, but I encountered this problem (NOTE: I even tried to use pickle protocol 4, but nothing changed):
Using TensorFlow backend.
D:\Anaconda3\envs\ULR\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\ULR\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\ULR\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Edit: For the first question, I've found the answer (set "embedding-lmdb-path" to "None").
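Based on that answer, the relevant fragment of embedding-registry.json would presumably look like this (all other keys elided; only the key name stated in the comment above is shown):

```json
{
    "embedding-lmdb-path": "None"
}
```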

@davidlenz
I face the same issue as @Protossnam: EOFError: Ran out of input.
I'm on Windows 10 with Python 3.5. Any updates on this?

ghost commented May 23, 2019

@davidlenz Sadly, I had to boot my laptop into Linux (Ubuntu) and run the tool there. On Linux, I didn't face that issue. It may be a problem with Windows; I'm also looking forward to hearing updates on this.

@oterrier
Hi all,
An easy workaround would be to disable multiprocessing when running on Windows.
To do that, you need to pass multiprocessing=False each time a new Sequence object is created in nerTagger.py.

My 2 cts

Olivier
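A sketch of what that workaround could look like as a small helper. The keyword names (use_multiprocessing, workers) come from Keras's fit_generator signature, not from delft itself, and loader_kwargs is a hypothetical helper, so the exact wiring into nerTagger.py may differ:

```python
import sys

def loader_kwargs():
    """Pick data-loading settings per platform (illustrative helper,
    not part of delft). On Windows and macOS the 'spawn' start method
    re-pickles the data generator for each worker process, which fails
    on unpicklable lmdb Environment handles; running single-process
    avoids the pickling step entirely."""
    if sys.platform in ('win32', 'darwin'):
        return {'use_multiprocessing': False, 'workers': 0}
    return {'use_multiprocessing': True, 'workers': 4}

# Usage (hypothetical): model.fit_generator(train_generator, **loader_kwargs())
```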

lfoppiano commented Mar 16, 2022

I have this issue when the download fails and the database is not correctly initialised, I suppose:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/site-packages/keras/utils/data_utils.py", line 744, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/site-packages/keras/utils/data_utils.py", line 721, in pool_fn
    pool = get_pool_class(True)(
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object

Update: I tried to run it again and the database was correctly created (via a local version of glove); however, the problem still occurs, probably due to multiprocessing...

To reproduce it I used:

python -m delft.applications.citationClassifier train_eval

Update: I'm having this problem with macOS.

@kermitt2 kermitt2 changed the title TypeError: can't pickle Environment objects TypeError: can't pickle Environment objects on Windows Mar 16, 2022
lfoppiano commented Mar 28, 2022

I have the same problem on macOS.

The solution is to disable multiprocessing by setting nb_workers = 0. Depending on the task being performed, it should be modified in both sequenceLabelling/wrapper.py and trainer.py (line 172).

@lfoppiano lfoppiano changed the title TypeError: can't pickle Environment objects on Windows TypeError: can't pickle Environment objects on Windows/MacOs May 11, 2022