
Word2Vec keeps on training during on_batch_end call #2182

Closed
jbayardo opened this issue Sep 12, 2018 · 13 comments · Fixed by #3078

Labels: bug, difficulty medium

jbayardo commented Sep 12, 2018

Description

Saving a Word2Vec model during an on_batch_end call fails because of what looks a lot like a race condition: some internal dict within gensim is still being modified while save iterates over it.

Steps/Code/Corpus to Reproduce

Train Word2Vec with a callback whose on_batch_end looks like this (get_output_path and the surrounding attributes are project-specific helpers):

    def on_batch_end(self, model):
        current_timestamp = datetime.utcnow()
        if current_timestamp - self._last_temporary_save >= timedelta(hours=1):
            relative_path = get_output_path(
                'PartialCheckpoint', add_kwargs=True, relative=self.base_path, epoch=self.epoch, batch=self.batch)
            output_path = os.path.join(self.base_path, relative_path)

            model.save(output_path)
            self._last_temporary_save = current_timestamp

        self.batch += 1

Expected Results

A model checkpoint is saved after every hour of training.

Actual Results

While train() was running, the worker threads failed with:

Exception in thread Thread-15:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 167, in _worker_loop
    callback.on_batch_end(self)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/meli_recsys/pipelines/machine_learning/meta_prod2vec/mp2v_train.py", line 60, in on_batch_end
    self._save_checkpoint(model, output_path)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/meli_recsys/pipelines/machine_learning/meta_prod2vec/mp2v_train.py", line 90, in _save_checkpoint
    model.save(path)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1214, in save
    super(Word2Vec, self).save(*args, **kwargs)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 501, in save
    super(BaseAny2VecModel, self).save(fname_or_handle, **kwargs)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 682, in save
    self._smart_save(fname_or_handle, separately, sep_limit, ignore, pickle_protocol=pickle_protocol)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 536, in _smart_save
    compress, subname)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 592, in _save_specials
    for attrib, val in iteritems(self.__dict__):
RuntimeError: dictionary changed size during iteration

(An identical traceback, ending in the same RuntimeError, was raised in Thread-17.)

Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
    cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

(Identical tracebacks, ending in the same AttributeError, were raised in Thread-19, Thread-18, Thread-11, Thread-16, Thread-8, Thread-13, Thread-12, Thread-14 and Thread-9.)

Versions

Linux-4.4.0-1062-aws-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Dec 4 2017, 14:50:18) \n[GCC 5.4.0 20160609]')
('NumPy', '1.15.1')
('SciPy', '0.19.1')
('gensim', '3.5.0')
('FAST_VERSION', 1)

@piskvorky piskvorky added the bug Issue described a bug label Sep 12, 2018
menshikh-iv (Contributor)

Hello @jbayardo, thanks for the report. I reproduced the issue (not exactly the same traceback, but it looks similar and is also a race condition).
I'm using gensim==3.5.0 and Python 2.7.

from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile
import gensim.downloader as api


corpus = api.load("text8")


class BatchSaver(CallbackAny2Vec):
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.batch = 0

    def on_batch_end(self, model):
        output_path = get_tmpfile('{}_batch_{}.model'.format(self.path_prefix, self.batch))
        model.save(output_path)
        print("Model saved to {}".format(output_path))
        self.batch += 1


bs = BatchSaver("w2v")
model = Word2Vec(corpus, iter=5, callbacks=[bs])

Expected result

Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_1.model
Model saved to /tmp/w2v_batch_2.model
Model saved to /tmp/w2v_batch_3.model
...

Actual result

First variant (happens almost always):

Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'

Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-11:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'

Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_3.model
Model saved to /tmp/w2v_batch_4.model
Model saved to /tmp/w2v_batch_5.model
...
Model saved to /tmp/w2v_batch_100.model

Second variant (happened only once):

Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'

Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'

Model saved to /tmp/w2v_batch_1.model
Model saved to /tmp/w2v_batch_4.model
Fatal Python error: GC object already tracked
Aborted (core dumped)

@menshikh-iv menshikh-iv added the difficulty medium Medium issue: required good gensim understanding & python skills label Sep 13, 2018
jbayardo (Author)

For some reason, this happened to me every time I ran training. A fact to keep in mind: I was using 14 workers.

gustavgransbo commented Mar 21, 2019

I tried to do something very similar to @jbayardo's setup and ran into a very similar problem.

I'm also saving from on_batch_end, once an hour of training has elapsed.

import time

from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile


class BatchSaver(CallbackAny2Vec):
    def __init__(self, path_prefix, start_time):
        self.path_prefix = path_prefix
        self.last_checkpoint = start_time

    def on_batch_end(self, model):
        cur_time = time.time()
        if cur_time - self.last_checkpoint > 60 * 60:  # at most one save per hour
            output_path = get_tmpfile('/localdata/gustav/backup/w2v/{}_backup.model'.format(self.path_prefix))
            model.save(output_path)
            self.last_checkpoint = cur_time

The first time the model is saved, 5 out of 6 worker threads crash:

...
2019-03-19 15:17:48,068: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 40836 words/s, in_qsize 0, out_qsize 0
2019-03-19 15:17:48,598: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:17:48,603: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:17:48,604: INFO: storing np array 'vectors' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.wv.vectors.npy
2019-03-19 15:17:48,615: INFO: storing np array 'syn1' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.trainables.syn1.npy
2019-03-19 15:17:48,730: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

2019-03-19 15:17:48,978: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:20:13,438: INFO: storing np array 'syn1neg' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.trainables.syn1neg.npy
2019-03-19 15:20:25,566: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

2019-03-19 15:20:25,569: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 39100 words/s, in_qsize 0, out_qsize 0
2019-03-19 15:20:40,967: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
2019-03-19 15:20:40,979: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 38940 words/s, in_qsize 0, out_qsize 0
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

2019-03-19 15:22:03,181: INFO: not storing attribute vectors_norm
2019-03-19 15:22:03,182: INFO: not storing attribute cum_table
2019-03-19 15:24:40,421: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 484, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecTrainables' object has no attribute 'syn1'
                                                                                        
2019-03-19 15:24:40,423: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
2019-03-19 15:24:40,442: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 36581 words/s, in_qsize 10, out_qsize 1
...

After this, training continues with one worker thread until the first epoch is finished, at which point the process starts waiting for the workers that were killed during the first save.

Note that in my case four of the threads crash with AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors', and one with AttributeError: 'Word2VecTrainables' object has no attribute 'syn1'.

I'm using gensim 3.7.1 and python 3.6.8.

gojomo (Collaborator) commented Jul 12, 2020

on_batch_end() is a really bad place to try a whole-model save, because it's called inside every separate worker thread, and many times per training-epoch, with absolutely no coordination with other in-progress worker-threads. Race issues using that callback are to be expected.

on_epoch_end(), instead, is only called from the single manager thread, when all worker-threads have finished their work. It's a more appropriate place for something that wants to write the whole state of an unchanging model.
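
For illustration, a minimal sketch of that approach (EpochCheckpointer and the path prefix are made-up names for this example, not gensim API; gensim 4.x spells the epochs argument epochs, older 3.x releases use iter):

from gensim.models.callbacks import CallbackAny2Vec


class EpochCheckpointer(CallbackAny2Vec):
    """Save the whole model once per epoch, from the single manager thread."""

    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0

    def on_epoch_end(self, model):
        # No worker threads are mutating the model at this point, so saving is safe.
        model.save("{}_epoch_{}.model".format(self.path_prefix, self.epoch))
        self.epoch += 1


# e.g. model = Word2Vec(corpus, epochs=5, callbacks=[EpochCheckpointer("/tmp/w2v")])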

In fact, I'd recommend removing the on_batch_begin() and on_batch_end() callbacks from CallbackAny2Vec entirely. They arrived in the under-reviewed #1777 PR and it's hard for me to imagine a compelling use for them, occurring as they do many essentially-random times, within each worker-thread, each training epoch. (If not removed, their doc-comments should warn that they're happening in a worker thread while lots of other worker threads are mutating the model or executing other simultaneous on_batch_end() calls.)

gojomo (Collaborator) commented Jan 18, 2021

Also, per a question on StackOverflow, I've just noticed the batch-related callbacks have never been fired for the worker_loop code paths.

So I'd again recommend removing on_batch_begin and on_batch_end from all modes & documentation, entirely, ASAP.

@piskvorky piskvorky added this to the 4.0.0 milestone Jan 18, 2021
piskvorky (Owner) commented Jan 18, 2021

Thanks for following up @gojomo. I'm marking this ticket for 4.0.0; I think it fits a major release well.

I'll do another Gensim sprint soon to finish 4.0.0. I haven't seen any worthwhile feedback from beta users, so the plan is to just tie up any loose ends & release.

ghost commented Feb 1, 2021

Is there any recommended way to run arbitrary code at a finer granularity than the epoch level?

For example, it might be useful to interleave a secondary training objective on the word vectors, which would require saving/loading the vectors. It would be nice to be able to do this every 1000 batches, say, rather than once per epoch.

gojomo (Collaborator) commented Feb 2, 2021

To an extent, the 'epochs' are arbitrary. While you'd want to provide your whole corpus to build_vocab() (to ensure all words & word-frequencies are properly measured), you could probably split it into as many small pseudo-epochs as you like before feeding it to train(), via some custom iterable that on each restart cycles over only some 1/Nth of the full corpus before starting over again. In that way you could use the epoch callbacks at a fairly fine resolution (though really small splits might greatly interfere with multithreaded efficiency).

I've not written code to do this, but it's probably just a few lines of custom Iterable-wrapper. You'd want to increase the epochs specified to train() to ensure the right number of "real" passes occur, and I think you'd want to adjust the total_examples hint to train() to 1/Nth the real number (the actual count per fake-iteration), to get more accurate within-'epoch' learning-rate decay & progress estimates.
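
A minimal sketch of such an Iterable-wrapper, assuming an in-memory list of tokenized sentences and gensim 4.x-style parameter names (SlicedCorpus, PseudoEpochCallback and n_slices are made-up names for illustration, not gensim API):

from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec


class SlicedCorpus:
    """Restartable iterable: each pass yields a different 1/Nth of the corpus."""

    def __init__(self, sentences, n_slices):
        self.sentences = sentences
        self.n_slices = n_slices
        self.current = 0

    def __iter__(self):
        size = max(1, len(self.sentences) // self.n_slices)
        start = self.current * size
        self.current = (self.current + 1) % self.n_slices
        return iter(self.sentences[start:start + size])


class PseudoEpochCallback(CallbackAny2Vec):
    def on_epoch_end(self, model):
        # Runs in the manager thread after every 1/Nth slice; checkpoint here.
        model.save("/tmp/w2v_partial.model")


sentences = [["hello", "world"]] * 10000   # toy corpus
n_slices, real_passes = 10, 5

model = Word2Vec(vector_size=50, min_count=1, workers=4)
model.build_vocab(sentences)               # vocab still built from the full corpus
model.train(
    SlicedCorpus(sentences, n_slices),
    total_examples=len(sentences) // n_slices,   # examples per pseudo-epoch
    epochs=n_slices * real_passes,               # N pseudo-epochs per real pass
    callbacks=[PseudoEpochCallback()],
)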

Alternatively: you can always edit any of the source code to do anything at any step/interval, but it can get a bit hairy in the Cython code. (Potentially, also, Gensim could in the future offer more callbacks in a better-thought-out way - probably just before/after the batches-to-each thread.)

mpenkov (Collaborator) commented Feb 26, 2021

@piskvorky Just wanted to confirm that the action to take here is

removing on_batch_begin and on_batch_end from all modes & documentation, entirely, ASAP.

Is my understanding correct?

piskvorky (Owner)

Not sure, I'll have to read this thread. I'll try that after fixing my 4.0 tickets, OK?

mpenkov (Collaborator) commented Feb 26, 2021

Sure, I have other 4.0 stuff to work on in the meantime, so this isn't a blocker.

mpenkov (Collaborator) commented Mar 9, 2021

TODO for Misha:

  • Replace the batch callbacks with functions that raise a runtime error when called, alerting the user that these callbacks are no longer supported (a hypothetical sketch follows after this list)
  • Update migration docs
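
For illustration only, a hypothetical sketch of what such raising stubs could look like; this is not the actual gensim patch:

class CallbackAny2Vec:
    # Epoch- and train-level hooks stay as no-op overridables.
    def on_epoch_begin(self, model):
        pass

    def on_epoch_end(self, model):
        pass

    def on_train_begin(self, model):
        pass

    def on_train_end(self, model):
        pass

    # Batch-level hooks fail loudly instead of racing with worker threads.
    def on_batch_begin(self, model):
        raise RuntimeError(
            "on_batch_begin is no longer supported; use on_epoch_begin instead")

    def on_batch_end(self, model):
        raise RuntimeError(
            "on_batch_end is no longer supported; use on_epoch_end instead")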

piskvorky (Owner)

Is my understanding correct?

I re-read the thread; dropping on_batch_begin and on_batch_end is indeed the action here (plus any other such poorly designed callbacks – hopefully there aren't more of them).

Plus update migration docs, yes.
