Choosing a dataset for training a link_only model #1814

Answered by RobinL
finalgrrrl asked this question in Q&A
Good questions - you're right this is not straightforward.

In a perfect world, both the left and right datasets would be large.

For the situation you describe:

  • the u values are driven by the variety of values in the combination of the two datasets. This will be dominated by the larger (right) dataset, so they would probably be best trained on the right dataset.
  • the m values are trickier because they primarily depend on data quality. Especially if you're linking poorer-quality (left) datasets to a higher-quality (right) dataset, using the right dataset for training probably isn't a good idea. Training them on the right dataset will only actually work at all if you have duplica…
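
To make the point about u values concrete, here is a toy sketch in plain Python. It is not Splink's implementation and uses no Splink API; the function name `agreement_probability` and the surname lists are made up for illustration. It treats u as roughly the chance that two randomly chosen records agree on a column, and compares that chance for a small left dataset, a large right dataset, and the two pooled together.

```python
from collections import Counter


def agreement_probability(values):
    """Chance that two distinct records drawn at random share the same value.

    Hypothetical helper for illustration only: a rough stand-in for a u value.
    """
    n = len(values)
    counts = Counter(values)
    # Number of ordered pairs of distinct records that agree on the value
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))


# Made-up data: a small left dataset and a much larger right dataset
left = ["smith", "jones", "smith", "patel"]
right = ["smith"] * 300 + ["jones"] * 200 + ["patel"] * 100 + ["khan"] * 400

u_left = agreement_probability(left)
u_right = agreement_probability(right)
u_pooled = agreement_probability(left + right)

print(f"left only:  {u_left:.4f}")
print(f"right only: {u_right:.4f}")
print(f"pooled:     {u_pooled:.4f}")
```

In this toy data the pooled figure sits almost on top of the right-only figure, which is the sense in which the larger dataset dominates. Note that this says nothing about m values, which depend on how errors show up among true matches.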

Replies: 3 comments 1 reply

Answer selected by finalgrrrl
3 participants