Title:

EveTAR: a large-scale multi-task test collection over Arabic tweets

Repository:

Qatar University

Repository URL:

http://qufaculty.qu.edu.qa/telsayed/evetar/

Creator(s):

bigIR research group

Subjects:

Arabic
Microblogs
Events

Dates:

12/30/2014 - 02/02/2015

Number of Tweets:

355,821,033

Description:

EveTAR is the first freely available Arabic test collection for multiple information retrieval tasks on Twitter. It supports four tasks: Event Detection (ED), Ad-hoc Search (AS), Tweet Timeline Generation (TTG), and Real-Time Summarization (RTS). EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events. For these events, about 62K tweets were judged, with substantial average inter-annotator agreement (Kappa value of 0.71). In addition to the full collection (EveTAR-F), four subsets of EveTAR are provided:
(1) EveTAR-S: a random sample of 15M tweets
(2) EveTAR-S.m: the Modern Standard Arabic (MSA) tweets of the sample
(3) EveTAR-S.d: the dialectal tweets of the sample
(4) EveTAR-Q: the judged tweets only