ACE Data Overview

Project Specifications: ACE Data Overview

Corpus | Data (words) | Tasks: Languages | Annotation Task Definition | Sources | Phase | Evaluations | Availability
ACE-Pilot 15K Pilot Entities-Pilot: English Entity Detection and Tracking: Phase1 v2.2 TDT-2, Newspaper ACE Pilot May 2000

November 2000
Currently ACE/TIDES only; slated for future publication
ACE-1 180K Train

45K Eval
Entities:
English
Entity Detection and Tracking: Phase1 v2.2 TDT-2, Newspaper ACE PHASE1 February 2002 Currently ACE/TIDES only; slated for future publication
ACE-2 180K Train
(from ACE-1 Train)

 45K Dev/Test
(from ACE-1 Eval)

45K Eval (new)
Entities (revised treatment), Relations:
English
EDT Annotation Guidelines V2.5
 
RDC Annotation Guidelines V3.6
TDT-2, Newspaper
ACE PHASE2 September 2002 LDC Publication
LDC2003T11
ACE-2 EELD Supplement
30K Train (new)

20K Eval (new)
Entities, Relations:
English
EDT Annotation Guidelines v2.5

RDC Annotation Guidelines V3.6
RCK domain
EELD
September 2002
EELD/ACE only
ACE2003 Training Data
100K/lang Train (new)

50K/lang Eval (new)
Entities: English, Chinese, Arabic

Relations: English, Chinese
EDT Annotation Guidelines v2.5

RDC Annotation Guidelines V3.6
TDT-4 ACE PHASE2 TIDES Extraction, September 2003 LDC Publication
LDC2004T09




ACE 2004 Pilot Data




25K English Pilot (new)




English:

Entities, Relations, Events
English Entity Guidelines V4.2.6

English Linking Guidelines V3.0

English Relations Guidelines V4.3.2

English Events Guidelines V2.0




Spring 2004 Mid-Course Correction Workshop





2004 Pilot Study,
February, 2004



ACE2004 Pilot Corpus: LDC2004E03

Availability: contact the LDC
ACE2004 Training Data
150K Train/lang (new)


50K Eval/lang

Entities: English, Chinese, Arabic


Relations: English, Chinese, Arabic
English Entity Guidelines V4.2.6

English Linking Guidelines V3.0

English Relations Guidelines V4.3.2



Chinese Entity Guidelines V4.2.4

Chinese Linking Guidelines V2.0

Chinese Relations Guidelines V4.3



Arabic Entity Guidelines V4.2.3

Arabic Linking Guidelines V1.0

Arabic Relations Guidelines V4.3
TDT-4;

Chinese Treebank;
Arabic Treebank;
Switchboard;
Fisher;

ACE PHASE3 ACE Program/
TIDES Extraction, September 2004


ACE/TIDES sites can obtain the following corpora by contacting LDC


ACE/TIDES Extraction 2004 Training Data
LDC2004E17, LDC2005T09


ACE/TIDES Extraction 2004 Training Data - Consistency Study
LDC2004E39


ACE/TIDES Extraction 2004 Evaluation/Test Data
LDC2004E51


ACE/TIDES Extraction 2004 Evaluation Data - Consistency Study
LDC2004E40
ACE 2005 Training Data (new):

English: 260K words
Chinese: 308K characters (205K words)
Arabic: 100K words

Evaluation Data (new):

English, Chinese and Arabic: 50K words
Entities, Relations, Events:

English, Chinese, Arabic
English-Entities-Guidelines_v5.6.1.pdf
English-Values-Guidelines_v1.2.4.pdf
English-Relations-Guidelines_v5.8.3.pdf
English-Events-Guidelines_v5.4.3.pdf
English-TimestampingGuidelines_v3.pdf
English-TIMEX2-Guidelines_v0.1.pdf

Chinese-Entities-Guidelines_v5.5.pdf
Chinese-Values-Guidelines_v1.1.2.pdf
Chinese-Relations-Guidelines_v5.5.1.pdf
Chinese-Events-Guidelines_v5.5.1.pdf
Chinese-TIMEX2-Guideline-Summary_v1.2.pdf
Chinese-Timestamping-Guidelines_v2.pdf

Arabic-Entities-Guidelines_v5.3.3.pdf
Arabic-Values-Guidelines_v1.2.3.pdf
Arabic-Relations-Guidelines_v5.3.4.pdf
Arabic-Events-Guidelines_v5.4.4.pdf
Newswire;

Broadcast News;

Broadcast Conversation;

Weblogs;

WebForums;

English Fisher Telephone Transcripts;
ACE 2005
November 2005
ACE sites can obtain the following corpus by contacting LDC

ACE 2005 Multilingual Training Data V6.0: LDC2005E18
