Retweets commonly have the form: RT @original_tweeter Original tweet text.
Twitter appears to truncate the retweet, including the RT @original_tweeter prefix, to 144 characters.
In tweets which Twitter "recognizes" as retweets, i.e. where mytweet["retweeted_status"] is not None, the original, non-truncated tweet text is available as mytweet["retweeted_status"]["text"].
In theory, this could be used to replace the truncated copy of the original tweet text that appears in the retweet's item_text.
We should first verify that the retweet text and the original text always match: are there cases where the retweet text diverges from the original tweet? If so, replacing it could create an accuracy/integrity issue, and we might not want to overwrite it. (We would never change the raw JSON as stored; this discussion concerns only item_text.)
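One way to run that check is sketched below in Python. It is a hypothetical helper, not existing project code, and it assumes each tweet is the parsed JSON as a dict, that the retweet prefix has the form `RT @screen_name: ` (with the handle at `["retweeted_status"]["user"]["screen_name"]`), and that Twitter marks truncation with a trailing "…". It strips the prefix and ellipsis from the retweet text and tests whether what remains is a prefix of the original text.

```python
ELLIPSIS = "\u2026"  # assumed truncation marker ("…")


def retweet_matches_original(tweet):
    """Check that a retweet's visible text agrees with retweeted_status text.

    Hypothetical helper. Returns None if the tweet is not a recognized
    retweet, True if the visible fragment is a prefix of the original text,
    and False if the two diverge (or the prefix is not the assumed format).
    """
    original = tweet.get("retweeted_status")
    if original is None:
        # Covers both a missing key and an explicit None value.
        return None
    prefix = "RT @" + original["user"]["screen_name"] + ": "
    text = tweet["text"]
    if not text.startswith(prefix):
        return False  # unexpected prefix format: worth a manual look
    fragment = text[len(prefix):]
    if fragment.endswith(ELLIPSIS):
        fragment = fragment[:-len(ELLIPSIS)]
    return original["text"].startswith(fragment)
```

Running this over a sample of archived retweets and counting the False results would indicate how often, if ever, the two texts diverge before anyone decides to overwrite item_text.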
More conservative options might include:
- Adding a new field (e.g. original_text_of_retweet, or something to that effect) into which we extract ["retweeted_status"]["text"], and making it available (optionally?) in extracts. We could include one column with the original tweet text and another with our best guess at "fixing" the retweet.
- Adding a flag to the extract commands to indicate whether or not to "fix" item_text. This still carries the risk that extracts would include item_text values that don't match item_text in our database.
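The first option could look roughly like the Python sketch below. It is an illustration under assumptions, not the project's extract code: original_text_of_retweet is the name floated above, repaired_retweet_text is a hypothetical name for the "best guess" column, and the repair assumes the `RT @screen_name: ` prefix format. item_text is always copied verbatim, so the extract never disagrees with the database.

```python
import csv
import io

# original_text_of_retweet comes from the proposal above;
# repaired_retweet_text is a hypothetical name for the "best guess" column.
FIELDS = ["item_text", "original_text_of_retweet", "repaired_retweet_text"]


def extract_row(tweet):
    """Build one extract row. item_text is copied verbatim, never rewritten."""
    row = {
        "item_text": tweet["text"],
        "original_text_of_retweet": "",
        "repaired_retweet_text": "",
    }
    original = tweet.get("retweeted_status")
    if original is not None:
        row["original_text_of_retweet"] = original["text"]
        # Best guess only; assumes the "RT @name: " prefix format.
        row["repaired_retweet_text"] = (
            "RT @" + original["user"]["screen_name"] + ": " + original["text"]
        )
    return row


def write_extract(tweets, out):
    """Write tweets as CSV rows, with the extra columns, to a text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(extract_row(tweet))


# Usage with made-up tweets (illustrative data only):
buf = io.StringIO()
write_extract([
    {"text": "just a tweet"},
    {"text": "RT @carol: Short\u2026",
     "retweeted_status": {"text": "Short and sweet",
                          "user": {"screen_name": "carol"}}},
], buf)
print(buf.getvalue())
```

Leaving the new columns empty for non-retweets keeps every row the same shape, which is simpler for researchers loading the CSV than omitting the columns conditionally.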
Note also that the ["truncated"] field seems to be unreliable. For example, this retweet truncated the original tweet, but ["truncated"] is false: http://sfm.library.gwu.edu/twitter-item/7695264/
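Because the flag can't be trusted, a check could instead compare lengths directly. The Python sketch below is a hypothetical heuristic, not something the thread settled on: for a recognized retweet it rebuilds the full string, assuming the `RT @screen_name: ` prefix format, and reports truncation when that string is longer than the stored text. It never reads tweet["truncated"].

```python
def is_probably_truncated(tweet):
    """Heuristic truncation check that deliberately ignores tweet["truncated"].

    Hypothetical helper. For a recognized retweet, rebuild the full
    "RT @name: text" string (assumed prefix format) and report truncation if
    it is longer than the stored text. Non-retweets return False because
    there is no original to compare against.
    """
    original = tweet.get("retweeted_status")
    if original is None:
        return False
    full = "RT @" + original["user"]["screen_name"] + ": " + original["text"]
    return len(full) > len(tweet["text"])
```

This only covers retweets that Twitter recognizes; a manually typed retweet has no retweeted_status to compare against.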
kerchner changed the title from "For retweets, consider using original tweet text (when present) to repair truncation by Twitter" to "For retweets, consider extracting text of original tweet (when present) to provide fuller context for truncated retweets" on Jan 6, 2015
We do something like this for is_retweet, adding a column to the CSV output that uses our own logic to catch retweets that didn't use Twitter's retweet function. Researchers asked for this.
Has someone asked us to do something like this?
At most, we should add a value rather than change anything received directly from Twitter.
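The project's actual is_retweet logic isn't shown in this thread. As a rough Python illustration of the idea only, a check could treat a tweet as a retweet if Twitter populated retweeted_status, or if its text starts with a hand-typed "RT @handle". The regular expression below is a hypothetical pattern, not the project's rule.

```python
import re

# Hypothetical pattern for hand-typed retweets such as "RT @someone: ...".
MANUAL_RT = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)


def is_retweet(tweet):
    """True for native retweets or for text that looks like a manual retweet."""
    if tweet.get("retweeted_status") is not None:
        return True  # Twitter's own retweet function was used
    return MANUAL_RT.match(tweet.get("text", "")) is not None
```

Like the comment above suggests, this only derives a new value; the text received from Twitter is left untouched.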
@dchud Yes, this was requested by the student project team from the Elliott School when they noticed that the text of some retweets is truncated relative to the original tweet.
It sounds like you agree with the first bullet in the first comment: at most, we should add a new value that surfaces ["retweeted_status"]["text"] when present, and/or a new value that computes a "complete" (i.e. untruncated) retweet from ["retweeted_status"]["text"] when present.