Description:
362,464,578 tweet IDs for tweets directed at Donald Trump (@realDonaldTrump), collected with Documenting the Now's twarc. The tweets can be “rehydrated” with Documenting the Now’s twarc or Hydrator. For example, with twarc:
twarc hydrate to_realdonaldtrump_20210120_ids.txt > to_realdonaldtrump_20210120.jsonl
Collection notes:
- Tweets in the dataset dated from May 7, 2017 to October 16, 2018 were collected with a combination of the Filter (Streaming) API and the Search API.
- The Filter API failed on June 21, 2017.
- From June 23, 2017 forward, only the Search API was used for collection.
- Collection ran every 5 days as a cron job, and the collected tweets were periodically deduplicated.
- There is a data gap from Tue Jul 28 13:53:50 +0000 2020 through Thu Aug 06 09:36:23 +0000 2020 due to a collection error.
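The deduplication and the data gap noted above can be handled on rehydrated tweets with a short script. The following is a minimal Python sketch, not the method used to build this dataset: the function names `in_gap` and `dedupe` are hypothetical, tweets are assumed to be v1.1-style JSON objects with `id_str` and `created_at` fields (one per line), and the gap boundaries are treated as inclusive, which is an assumption.

```python
import json
from datetime import datetime

# Twitter v1.1 "created_at" format, e.g. "Tue Jul 28 13:53:50 +0000 2020".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Gap boundaries copied from the collection notes.
GAP_START = datetime.strptime("Tue Jul 28 13:53:50 +0000 2020", CREATED_AT_FORMAT)
GAP_END = datetime.strptime("Thu Aug 06 09:36:23 +0000 2020", CREATED_AT_FORMAT)


def in_gap(created_at):
    """Return True if a created_at string falls inside the documented gap.

    Assumption: both boundaries are treated as inclusive.
    """
    ts = datetime.strptime(created_at, CREATED_AT_FORMAT)
    return GAP_START <= ts <= GAP_END


def dedupe(lines):
    """Yield each non-empty JSONL line whose id_str has not been seen yet."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet_id = json.loads(line)["id_str"]
        if tweet_id not in seen:
            seen.add(tweet_id)
            yield line
```

Passing an open file handle for a rehydrated JSONL file to `dedupe` streams the unique tweets without loading the whole file, although the set of seen IDs still grows with the number of tweets.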
This dataset also includes a number of derivative CSV files created from the original JSONL that was collected. These include:
- A user csv file created with jq (see below).
- twut userInfo
- twut language
- twut times
- twut sources
- twut hashtags
- twut urls
- twut animatedGifUrls
- twut imageUrls
- twut mediaUrls
- twut videoUrls
User csv:
jq -r '[.id_str, .created_at, .user.screen_name, .retweeted_status != null] | @csv' to_realdonaldtrump_20190130.jsonl > to_realdonaldtrump_20190130_users.csv
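For readers without jq, the same four columns can be approximated in Python. This is a rough sketch under stated assumptions rather than the command used to build the dataset: `user_rows` and `write_user_csv` are hypothetical names, every tweet is assumed to carry `id_str`, `created_at`, and `user.screen_name`, and the quoting of the output differs slightly from jq's `@csv`, which quotes every string field.

```python
import csv
import json


def user_rows(lines):
    """Yield [id_str, created_at, screen_name, is_retweet] for each JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Mirrors `.retweeted_status != null` in the jq filter.
        is_retweet = tweet.get("retweeted_status") is not None
        yield [
            tweet["id_str"],
            tweet["created_at"],
            tweet["user"]["screen_name"],
            str(is_retweet).lower(),  # "true"/"false", as jq prints booleans
        ]


def write_user_csv(jsonl_path, csv_path):
    """Stream a rehydrated JSONL file into a user CSV file."""
    with open(jsonl_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(user_rows(src))
```

Calling `write_user_csv` with the path of a rehydrated JSONL file and an output path of your choosing produces one row per tweet.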