| Corpus |
Data(words) |
Tasks: Languages |
Annotation Task Definition |
Sources |
Phase |
Evaluations |
Availability |
| ACE-Pilot |
15K Pilot |
Entities-Pilot: English |
Entity Detection and Tracking: Phase1 v2.2 |
TDT-2, Newspaper |
ACE Pilot |
May 2000, November 2000 |
Currently ACE/TIDES only; slated for future publication
|
| ACE-1 |
180K Train
45K Eval |
Entities: English |
Entity Detection and Tracking: Phase1 v2.2 |
TDT-2, Newspaper |
ACE PHASE1 |
February 2002 |
Currently ACE/TIDES only; slated for future publication |
| ACE-2 |
180K Train (from ACE-1 Train)
45K Dev/Test (from ACE-1 Eval)
45K Eval (new) |
Entities (revised treatment), Relations: English |
EDT Annotation Guidelines V2.5
RDC Annotation Guidelines V3.6 |
TDT-2, Newspaper
|
ACE PHASE2 |
September 2002 |
LDC Publication LDC2003T11
|
ACE-2 EELD Supplement
|
30K Train (new)
20K Eval (new) |
Entities, Relations: English
|
EDT Annotation Guidelines v2.5
RDC Annotation Guidelines V3.6 |
RCK domain
|
EELD
|
September 2002
|
EELD/ACE only
|
ACE2003 Training Data
|
100K/lang Train (new)
50K/lang Eval (new) |
Entities: English, Chinese, Arabic
Relations: English, Chinese
|
EDT Annotation Guidelines v2.5
RDC Annotation Guidelines V3.6 |
TDT-4 |
ACE PHASE2 |
TIDES Extraction, September 2003 |
LDC Publication LDC2004T09 |
ACE 2004 Pilot Data |
25K English Pilot (new)
|
English:
Entities, Relations, Events
|
English Entity Guidelines V4.2.6
English Linking Guidelines V3.0
English Relations Guidelines V4.3.2
English Events Guidelines V2.0
|
|
Spring 2004 Mid-Course Correction Workshop
|
2004 Pilot Study, February 2004
|
ACE2004 Pilot Corpus: LDC2004E03
Availability: contact the LDC
|
ACE2004 Training Data
|
150K Train/lang (new)
50K Eval/lang
|
Entities: English, Chinese, Arabic
Relations: English, Chinese, Arabic |
English Entity Guidelines V4.2.6
English Linking Guidelines V3.0
English Relations Guidelines V4.3.2
Chinese Entity Guidelines V4.2.4
Chinese Linking Guidelines V2.0
Chinese Relations Guidelines V4.3
Arabic Entity Guidelines V4.2.3
Arabic Linking Guidelines V1.0
Arabic Relations Guidelines V4.3 |
TDT-4; Chinese Treebank; Arabic Treebank; Switchboard; Fisher;
|
ACE PHASE3 |
ACE Program/TIDES Extraction, September 2004
|
ACE/TIDES sites can obtain the following corpora by contacting LDC:
ACE/TIDES Extraction 2004 Training Data LDC2004E17, LDC2005T09
ACE/TIDES Extraction 2004 Training Data - Consistency Study LDC2004E39
ACE/TIDES Extraction 2004 Evaluation/Test Data LDC2004E51
ACE/TIDES Extraction 2004 Evaluation Data - Consistency Study LDC2004E40
|
| ACE 2005 |
Training Data (new):
English: 260K words
Chinese: 308K characters (205K words)
Arabic: 100K words
Evaluation Data (new):
English, Chinese and Arabic: 50K words
|
Entities, Relations, Events:
English, Chinese, Arabic |
English-Entities-Guidelines_v5.6.1.pdf
English-Values-Guidelines_v1.2.4.pdf
English-Relations-Guidelines_v5.8.3.pdf
English-Events-Guidelines_v5.4.3.pdf
English-TimestampingGuidelines_v3.pdf
English-TIMEX2-Guidelines_v0.1.pdf
Chinese-Entities-Guidelines_v5.5.pdf
Chinese-Values-Guidelines_v1.1.2.pdf
Chinese-Relations-Guidelines_v5.5.1.pdf
Chinese-Events-Guidelines_v5.5.1.pdf
Chinese-TIMEX2-Guideline-Summary_v1.2.pdf
Chinese-Timestamping-Guidelines_v2.pdf
Arabic-Entities-Guidelines_v5.3.3.pdf
Arabic-Values-Guidelines_v1.2.3.pdf
Arabic-Relations-Guidelines_v5.3.4.pdf
Arabic-Events-Guidelines_v5.4.4.pdf |
Newswire;
Broadcast News;
Broadcast Conversation;
Weblogs;
WebForums;
English Fisher Telephone Transcripts; |
ACE 2005 |
November 2005 |
ACE sites can obtain the following corpus by contacting LDC:
ACE 2005 Multilingual Training Data V6.0: LDC2005E18 |