Documenting the Now
DocNow Tweet Catalog
Description:

362,464,578 tweet ids for tweets directed at Donald Trump (@realDonaldTrump), collected with Documenting the Now’s twarc. The tweets can be “rehydrated” from these ids with twarc or with Documenting the Now’s Hydrator. For example, with twarc:

twarc hydrate to_realdonaldtrump_20210120_ids.txt > to_realdonaldtrump_20210120.jsonl
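Hydration starts from a plain list of ids, so it can be useful to sanity-check the ids file and see how it would split into request-sized batches before starting a long run. The following Python sketch assumes the ids file holds one numeric tweet id per line. The function name `iter_id_batches` and the batch size are illustrative assumptions, not something specified by this dataset or taken from twarc.

```python
def iter_id_batches(lines, batch_size=100):
    """Yield lists of tweet ids, at most batch_size ids per list."""
    batch = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id:
            continue  # skip blank lines
        if not tweet_id.isdigit():
            raise ValueError(f"not a numeric tweet id: {tweet_id!r}")
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


# Small in-memory example with made-up ids; for the real file,
# pass an open file handle instead of this list.
example_lines = ["1001\n", "1002\n", "\n", "1003\n"]
batches = list(iter_id_batches(example_lines, batch_size=2))
print(batches)
```

Because the function is a generator, it reads the input lazily, which matters for an ids file with hundreds of millions of lines.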

Collection notes:

  • For tweets dated May 7, 2017 through October 16, 2018, the dataset was collected with a combination of the Filter (Streaming) API and the Search API.
  • The Filter API failed on June 21, 2017.
  • From June 23, 2017 onward, only the Search API was used for collection.
  • Collection ran every 5 days as a cron job, and the results were periodically deduplicated.
  • There is a data gap from Tue Jul 28 13:53:50 +0000 2020 through Thu Aug 06 09:36:23 +0000 2020 due to a collection error.
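The deduplication and the data gap described in the notes above can be illustrated with a short Python sketch. It assumes each tweet is a JSON object with `id_str` and `created_at` fields (the same fields used elsewhere in this description), and that duplicates arise when successive search runs return the same tweet more than once, which is a presumption rather than something stated here. The helper names, the sample records, and the one-day threshold are hypothetical.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the collection notes,
# e.g. "Tue Jul 28 13:53:50 +0000 2020".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


def dedupe_by_id(tweets):
    """Keep the first occurrence of each id_str, preserving order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet["id_str"] not in seen:
            seen.add(tweet["id_str"])
            unique.append(tweet)
    return unique


def find_gaps(tweets, threshold):
    """Return (start, end) pairs of consecutive tweets more than threshold apart."""
    times = sorted(parse_created_at(t["created_at"]) for t in tweets)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > threshold]


# Made-up sample records; only the two gap timestamps come from the notes.
sample = [
    {"id_str": "1", "created_at": "Tue Jul 28 13:50:00 +0000 2020"},
    {"id_str": "2", "created_at": "Tue Jul 28 13:53:50 +0000 2020"},
    {"id_str": "2", "created_at": "Tue Jul 28 13:53:50 +0000 2020"},  # duplicate
    {"id_str": "3", "created_at": "Thu Aug 06 09:36:23 +0000 2020"},
]
unique = dedupe_by_id(sample)
gaps = find_gaps(unique, threshold=timedelta(days=1))
for start, end in gaps:
    print(start.isoformat(), "->", end.isoformat(), "length:", end - start)
```

For the full dataset the in-memory set and sort would need to be replaced with something that scales, but the logic is the same: drop repeated ids, then look for unusually long intervals between consecutive timestamps.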

This dataset also includes a number of derivative CSV files generated from the original JSONL. These include:

User CSV:

jq -r '[.id_str, .created_at, .user.screen_name, .retweeted_status != null] | @csv' to_realdonaldtrump_20190130.jsonl > to_realdonaldtrump_20190130_users.csv
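For readers without jq, the same four columns can be derived with Python's standard library. This is a minimal sketch rather than the method used to build the published files: the function names are hypothetical, the sample tweets are made up, and the quoting produced by the `csv` module is not byte-identical to jq's `@csv` (jq quotes every string, while `csv.writer` quotes only when needed).

```python
import csv
import io
import json


def user_row(tweet):
    """Build [id_str, created_at, screen_name, is_retweet] for one tweet."""
    user = tweet.get("user") or {}
    # Mirrors `.retweeted_status != null`: a missing key counts as null.
    is_retweet = tweet.get("retweeted_status") is not None
    return [
        tweet.get("id_str"),
        tweet.get("created_at"),
        user.get("screen_name"),
        "true" if is_retweet else "false",
    ]


def jsonl_to_user_csv(jsonl_lines, out):
    """Write one CSV row per non-blank JSONL line; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        writer.writerow(user_row(json.loads(line)))
        count += 1
    return count


# Made-up sample tweets standing in for lines of the hydrated JSONL file.
sample_lines = [
    json.dumps({"id_str": "1", "created_at": "Tue Jul 28 13:53:50 +0000 2020",
                "user": {"screen_name": "alice"}}),
    json.dumps({"id_str": "2", "created_at": "Thu Aug 06 09:36:23 +0000 2020",
                "user": {"screen_name": "bob"},
                "retweeted_status": {"id_str": "1"}}),
]
buffer = io.StringIO()
written = jsonl_to_user_csv(sample_lines, buffer)
print(buffer.getvalue(), end="")
```

To process a real file, pass an open handle to the JSONL file as `jsonl_lines` and an open output file (opened with `newline=""`) as `out`.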