Word2Vec keeps on training during on_batch_end call #2182
Hello @jbayardo, thanks for the report. I reproduced an issue (not the exact same one, but it looks similar; a race condition too):

```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile
import gensim.downloader as api

corpus = api.load("text8")


class BatchSaver(CallbackAny2Vec):
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.batch = 0

    def on_batch_end(self, model):
        output_path = get_tmpfile('{}_batch_{}.model'.format(self.path_prefix, self.batch))
        model.save(output_path)
        print("Model saved to {}".format(output_path))
        self.batch += 1


bs = BatchSaver("w2v")
model = Word2Vec(corpus, iter=5, callbacks=[bs])
```

Expected result
Actual result

First variant (almost always):

```
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-11:
[same traceback as above]
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'
Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_3.model
Model saved to /tmp/w2v_batch_4.model
[... identical "Model saved" lines for batches 5 through 99 ...]
Model saved to /tmp/w2v_batch_100.model
```

Second variant (happened only once):

```
Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-9:
[same traceback as above]
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-10:
[same traceback as above]
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'
Model saved to /tmp/w2v_batch_1.model
Model saved to /tmp/w2v_batch_4.model
Fatal Python error: GC object already tracked
Aborted (core dumped)
```
For some reason, this happened to me every time I ran the training. A fact to keep in mind: I was using 14 workers.
I tried to do something very similar to @jbayardo, and hit a very similar problem. I'm also saving the model from a batch-level callback.

The first time the model is saved, 5 out of 6 worker threads crash.

After this, training continues with one worker thread until the first epoch is finished, at which point the process starts waiting for the workers that were killed during the first save. Note that in my case the threads crash because of the configuration I'm using.
In fact, I'd recommend removing the on_batch_end callback entirely.
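For readers hitting this today, a safer pattern is to checkpoint from on_epoch_end, which gensim invokes between epochs, after the worker threads of the pass have finished, so it avoids the mid-training race described above. The sketch below is illustrative (class and path names are mine); the try/except fallback only exists so the snippet runs without gensim installed.

```python
try:
    from gensim.models.callbacks import CallbackAny2Vec
except ImportError:  # fallback so the sketch can run without gensim installed
    class CallbackAny2Vec(object):
        """Stand-in for gensim's callback base class."""


class EpochSaver(CallbackAny2Vec):
    """Save a checkpoint at the end of each epoch.

    on_epoch_end fires between epochs, when no worker thread is mid-batch,
    unlike on_batch_end, which races with still-running workers.
    """

    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0

    def on_epoch_end(self, model):
        output_path = '{}_epoch_{}.model'.format(self.path_prefix, self.epoch)
        model.save(output_path)  # safe: training is paused between epochs
        print('Model saved to {}'.format(output_path))
        self.epoch += 1
```

Usage is the same as in the reproduction above, e.g. `Word2Vec(corpus, iter=5, callbacks=[EpochSaver("w2v")])`.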
Also, per a question on StackOverflow, I've just noticed this failure reported again. So I'd again recommend removing on_batch_end.
Thanks for following up @gojomo. I'm marking this ticket for 4.0.0; I think it fits a major release well. I'll do another Gensim sprint soon, to finish 4.0.0. I haven't seen any worthwhile feedback from beta users, so the plan is to just tie up any loose ends & release.
Is there any recommended way to run arbitrary code at a finer granularity than the epoch level? For example, it might be useful to interleave a secondary training objective on the word vectors, which would require saving/loading the vectors. It would be nice to be able to do this every 1000 batches, say, rather than every epoch.
To an extent, the 'epochs' are arbitrary. While you'd want to provide your whole corpus to the vocabulary-building step, you could present it to training in smaller slices, so that each gensim 'epoch' (and thus each per-epoch callback) covers only a fraction of the corpus.

I've not written code to do this, but it's probably just a few lines of custom Iterable-wrapper. You'd want to increase the number of epochs proportionally, so the same total number of training passes still occurs.

Alternatively: you can always edit any of the source code to do anything at any step/interval, but it can get a bit hairy in the Cython code.

(Potentially, also, Gensim could in the future offer more callbacks in a better-thought-out way - probably just before/after the batches sent to each thread.)
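Such an Iterable-wrapper might look like the following sketch (ChunkedCorpus is a hypothetical name, not gensim API): each full iteration yields only the next chunk of sentences, so each gensim 'epoch' covers one chunk, and per-epoch callbacks fire at that finer granularity.

```python
class ChunkedCorpus(object):
    """Yield the corpus in fixed-size chunks, one chunk per iteration.

    Each call to iter() returns only the next chunk_size sentences,
    wrapping around at the end, so a gensim "epoch" over this object is
    really a fraction of a true pass over the corpus. (Sketch only; the
    name and behaviour are assumptions, not part of gensim.)
    """

    def __init__(self, sentences, chunk_size):
        self.sentences = list(sentences)
        self.chunk_size = chunk_size
        self.pos = 0

    def __iter__(self):
        chunk = self.sentences[self.pos:self.pos + self.chunk_size]
        self.pos += len(chunk)
        if self.pos >= len(self.sentences):
            self.pos = 0  # wrap to the start for the next "epoch"
        for sentence in chunk:
            yield sentence
```

To keep the total amount of training the same, you would multiply the epochs parameter by the number of chunks, e.g. call build_vocab on the whole corpus, then train on `ChunkedCorpus(corpus, chunk_size)` with `epochs=true_epochs * n_chunks`.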
@piskvorky Just wanted to confirm that the action to take here is to drop the on_batch_end callback (and update the migration docs). Is my understanding correct?
Not sure, I'll have to read this thread. I'll try that after fixing my 4.0 tickets, OK?
Sure, I have other 4.0 stuff to work on in the meanwhile, so this isn't a blocker.
TODO for Misha:
I re-read the thread; dropping on_batch_end. Plus update migration docs, yes.
Description
Saving a Word2Vec model during an on_batch_end call fails with something that looks a lot like a race condition: some internal dict within gensim is still being changed during the call to save.
Steps/Code/Corpus to Reproduce
Train W2V with a callback that looks like:
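The original snippet didn't survive formatting; below is a hedged reconstruction of a time-based on_batch_end checkpointer matching the expected result ("checkpoint after every hour of training"). Names are illustrative, and this is precisely the pattern that triggers the race (see the tracebacks earlier in the thread). The try/except fallback only exists so the sketch runs without gensim installed.

```python
import time

try:
    from gensim.models.callbacks import CallbackAny2Vec
except ImportError:  # fallback so the sketch can run without gensim installed
    class CallbackAny2Vec(object):
        """Stand-in for gensim's callback base class."""


class HourlySaver(CallbackAny2Vec):
    """Checkpoint the model from on_batch_end at most once per interval.

    Illustrative reconstruction only: model.save() here runs while worker
    threads are still training, which is what races with gensim's internals.
    """

    def __init__(self, path_prefix, interval=3600.0):
        self.path_prefix = path_prefix
        self.interval = interval  # seconds between checkpoints
        self.last_save = time.time()
        self.count = 0

    def on_batch_end(self, model):
        now = time.time()
        if now - self.last_save >= self.interval:
            model.save('{}_checkpoint_{}.model'.format(self.path_prefix, self.count))
            self.last_save = now
            self.count += 1
```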
Expected Results
Model checkpoint after every hour of training
Actual Results
While running train:
Versions
```
Linux-4.4.0-1062-aws-x86_64-with-Ubuntu-16.04-xenial
Python 2.7.12 (default, Dec 4 2017, 14:50:18) [GCC 5.4.0 20160609]
NumPy 1.15.1
SciPy 0.19.1
gensim 3.5.0
FAST_VERSION 1
```