This repository has been archived by the owner on Mar 11, 2021. It is now read-only.
"sliding window" bigtable training mode #713
Open
amj
wants to merge
17
commits into
tensorflow:master
Choose a base branch
from
amj:bt-running-window
Commits (17 total; the diff shown below reflects 11 of them):
d40fa04  first cut at training with BT over many blocks
32fdca9  bad at math
39807ee  try it this way, concat inside get_many
3a3f2a9  syntax
145a6fc  typo
68f3e79  another typo
7389196  help i cant type
b68bc5e  move batching inside the loop
936638a  collapse from key,data to just data. make rotation always on
9091cca  fix double concat.
0cab4e9  PR comments.
82f5a8b  Merge branch 'master' into bt-running-window
fc0e3a0  extract to flags
8518517  add moar params
b29875a  fix main code branch
e90ce64  lint
cdfe391  Update flags correctly
Inline review thread on the added "def train_many(start_at=1000000, num_datasets=3):" line:
Comment: can you expose moves here also.
Reply: what do you mean? number of steps?

Diff hunk:

@@ -132,6 +132,36 @@ def after_run(self, run_context, run_values):
         self.before_weights = None
 
 
+def train_many(start_at=1000000, num_datasets=3):
+    """ Trains on a set of bt_datasets, skipping eval for now.
+    (from preprocessing.get_many_tpu_bt_input_tensors)
+    """
+    if not FLAGS.use_tpu and FLAGS.use_bt:
+        raise ValueError("Only tpu & bt mode supported")
+
+    tf.logging.set_verbosity(tf.logging.INFO)
+    estimator = dual_net.get_estimator()
+    effective_batch_size = FLAGS.train_batch_size * FLAGS.num_tpu_cores
+
+    def _input_fn(params):
+        games = bigtable_input.GameQueue(
+            FLAGS.cbt_project, FLAGS.cbt_instance, FLAGS.cbt_table)
+        games_nr = bigtable_input.GameQueue(
+            FLAGS.cbt_project, FLAGS.cbt_instance, FLAGS.cbt_table + '-nr')
+
+        return preprocessing.get_many_tpu_bt_input_tensors(
+            games, games_nr, params['batch_size'],
+            start_at=start_at, num_datasets=num_datasets)
+
+    hooks = []
+    steps = num_datasets * FLAGS.steps_to_train
+    logging.info("Training, steps = %s, batch = %s -> %s examples",
+                 steps or '?', effective_batch_size,
+                 (steps * effective_batch_size) if steps else '?')
+
+    estimator.train(_input_fn, steps=steps, hooks=hooks)
+
+
 def train(*tf_records: "Records to train on"):
     """Train on examples."""
     tf.logging.set_verbosity(tf.logging.INFO)
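The hunk calls preprocessing.get_many_tpu_bt_input_tensors, which is not part of this diff. Judging from the commit messages above ("try it this way, concat inside get_many", "fix double concat."), it presumably builds one tf.data dataset per window of games and concatenates them into one large dataset. The following is only a rough sketch of that shape, not the PR's actual code; the function name, the window_dataset(lo, hi, batch_size) helper, and the window_size parameter are hypothetical.

```python
import tensorflow as tf


def get_many_input_tensors_sketch(window_dataset, batch_size,
                                  start_at, num_datasets, window_size):
    """Sketch: concatenate `num_datasets` consecutive game windows.

    `window_dataset(lo, hi, batch_size)` is assumed to return a batched
    tf.data.Dataset of training examples for games numbered [lo, hi).
    """
    combined = None
    for i in range(num_datasets):
        lo = start_at + i * window_size
        ds = window_dataset(lo, lo + window_size, batch_size)
        # Chain the windows together. Each concatenate() adds more ops to
        # the input pipeline, which is what the discussion below is about.
        combined = ds if combined is None else combined.concatenate(ds)
    return combined
```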
Review comment:
Regarding the general approach: if the training loop does multiple scans, I would expect to create a new dataset for each pass, rather than try to create a single enormous dataset, which I imagine would be harder to debug, inspect, etc.
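A minimal sketch of that per-pass alternative, assuming a hypothetical make_window_input_fn(i) that returns the input_fn for window i; each estimator.train() call resumes from the latest checkpoint in the model directory, so the passes train the same model sequentially.

```python
def train_windows_separately(estimator, make_window_input_fn,
                             num_datasets, steps_per_window):
    """Sketch of the per-pass approach: one small dataset per train() call."""
    for i in range(num_datasets):
        input_fn = make_window_input_fn(i)  # hypothetical helper
        # Each call builds a fresh, window-sized input pipeline and
        # resumes from the checkpoint written by the previous call.
        estimator.train(input_fn, steps=steps_per_window)
```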
Reply:
Yes, but multiple calls to TPUEstimator.train will create new graphs :( I'm not sure what a good solution for lazily evaluating these Datasets would be. As it is, it takes a really long time to build the datasets before training even starts -- I suspect the concatenate is doing something bad, as things get slower and slower.
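One possible way to keep a single estimator.train() call and a single graph while avoiding a long Python-side chain of Dataset.concatenate() is to defer the per-window construction into the pipeline with flat_map. This is only a sketch under the assumption that a window's dataset can be built from a scalar tensor index (for example, by deriving a Bigtable row range from it), which may not hold for the current GameQueue API; window_dataset_from_index is a hypothetical function.

```python
import tensorflow as tf


def lazy_window_pipeline(window_dataset_from_index, num_datasets):
    """Sketch: build each window's dataset lazily inside flat_map.

    `window_dataset_from_index(i)` is assumed to map a scalar int64
    tensor to that window's tf.data.Dataset.
    """
    indices = tf.data.Dataset.range(num_datasets)
    # flat_map traces the window-building function once, so the input
    # graph stays roughly the same size no matter how many windows are
    # trained, unlike a Python-side chain of concatenate() calls.
    return indices.flat_map(window_dataset_from_index)
```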