Linguistic Data Consortium

1,701 to 1,750 of 1,819 Results

TAC KBP Chinese Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2011-2014 Nov 17, 2017 Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2017, "TAC KBP Chinese Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2011-2014", https://hdl.handle.net/11272.1/AB2/XOE0NF, Abacus Data Network, V1 TAC KBP Chinese Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2011-2014 was developed by the Linguistic Data Consortium and contains training and evaluation data produced in support of the TAC KBP Chinese Cross-lingual Entity Linking tasks in 2011, 201...
Ancient Chinese Corpus Oct 18, 2017 Chen, Xiaohe; Li, Bin; Feng, Minxuan; Xu, Chao; Xu, Runhua; Shi, Min; Yu, Lili; Xiao, Lei; Wang, Qingqing, 2017, "Ancient Chinese Corpus", https://hdl.handle.net/11272.1/AB2/4HYBFE, Abacus Data Network, V1 Ancient Chinese Corpus was developed at Nanjing Normal University. It contains word-segmented and part-of-speech tagged text from Zuozhuan, an ancient Chinese work believed to date from the Warring States Period (475-221 BC). Zuozhuan is a commentary on the Chunqiu, a history of...
MWE-Aware English Dependency Corpus 2.0 Oct 18, 2017 Kato, Akihiko; Shindo, Hiroyuki; Matsumoto, Yuji, 2017, "MWE-Aware English Dependency Corpus 2.0", https://hdl.handle.net/11272.1/AB2/GKYOY9, Abacus Data Network, V1 MWE-Aware English Dependency Corpus Version 2.0 was developed by the Nara Institute of Science and Technology Computational Linguistics Laboratory and consists of English compound function words annotated in dependency format. The data is derived from OntoNotes Release 5.0 (LDC20...
RATS Keyword Spotting Oct 18, 2017 Graff, David; Ma, Xiaoyi; Strassel, Stephanie; Walker, Kevin; Jones, Karen, 2017, "RATS Keyword Spotting", https://hdl.handle.net/11272.1/AB2/IFVKNB, Abacus Data Network, V1 RATS Keyword Spotting was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 3,100 hours of Levantine Arabic and Farsi conversational telephone speech with automatic and manual annotation of speech segments, transcripts and keywords generated from...
English Web Treebank Propbank Oct 18, 2017 O'Gorman, Tim; Conger, Katherine; Palmer, Martha, 2017, "English Web Treebank Propbank", https://hdl.handle.net/11272.1/AB2/Q8LILM, Abacus Data Network, V1 English Web Treebank Propbank, LDC Catalog Number LDC2017T15 and ISBN 1-58563-818-8, was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and provides predicate-argument structure annotation for English Web Treebank (LDC2012T...
Multi-Language Conversational Telephone Speech 2011 -- South Asian Oct 15, 2017 Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2017, "Multi-Language Conversational Telephone Speech 2011 -- South Asian", https://hdl.handle.net/11272.1/AB2/JPGPJM, Abacus Data Network, V1 Multi-Language Conversational Telephone Speech 2011 – South Asian was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 118 hours of telephone speech in five distinct language varieties of South Asia (i.e. the Indian sub-continent): Bengali, Hind...
SRI-FRTIV Sep 14, 2017 Shriberg, Elizabeth; Kathol, Andreas; Graciarena, Martin; Bratt, Harry; Kajarekar, Sachin; Jameel, Huda; Richey, Colleen; Goodman, Fred, 2017, "SRI-FRTIV", https://hdl.handle.net/11272.1/AB2/YONFH9, Abacus Data Network, V1 SRI-FRTIV (Five-way Recorded Toastmaster Intrinsic Variation) was developed by SRI International in 2007-2008 and is comprised of approximately 232 hours of English speech from thirty-four speakers who were members of Toastmaster clubs. Participants were asked to speak at three d...
2015-2016 CoNLL Shared Task Sep 14, 2017 Xue, Nianwen; Ng, Hwee Tou; Pradhan, Sameer; Rutherford, Attapol T.; Webber, Bonnie; Wang, Chuan; Wang, Hong Min; Prasad, Rashmi, 2017, "2015-2016 CoNLL Shared Task", https://hdl.handle.net/11272.1/AB2/TSNLNO, Abacus Data Network, V1 2015-2016 CoNLL Shared Task, LDC Catalog Number LDC2017T13 and ISBN 1-58563-812-9, contains the Chinese and English training, development and test data for the 2015 and 2016 CoNLL (Conference on Computational Natural Language Learning) Shared Task Evaluation which focused on shal...
GALE Phase 4 Arabic Broadcast Conversation Speech Aug 15, 2017 Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2017, "GALE Phase 4 Arabic Broadcast Conversation Speech", https://hdl.handle.net/11272.1/AB2/XFDC1A, Abacus Data Network, V1 GALE Phase 4 Arabic Broadcast Conversation Speech was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 75 hours of Arabic broadcast conversation speech collected in 2008 and 2009 by LDC, MediaNet, Tunis, Tunisia and MTC, Rabat, Morocco during Ph...
GALE Phase 4 Arabic Broadcast Conversation Transcripts Aug 15, 2017 Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2017, "GALE Phase 4 Arabic Broadcast Conversation Transcripts", https://hdl.handle.net/11272.1/AB2/WLEBLW, Abacus Data Network, V1 GALE Phase 4 Arabic Broadcast Conversation Transcripts was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 75 hours of Arabic broadcast conversation speech collected in 2008 and 2009 by LDC, MediaNet, Tunis, Tunisia and MTC, Rabat, M...
Metalogue Multi-Issue Bargaining Dialogue Jul 18, 2017 Petukhova, Volha; Malchanau, Andrei; Oualil, Youssef; Klakow, Dietrich; Stevens, Christopher; Weerd, Harmen de; Taatgen, Niels, 2017, "Metalogue Multi-Issue Bargaining Dialogue", https://hdl.handle.net/11272.1/AB2/U57KQP, Abacus Data Network, V1 Metalogue Multi-Issue Bargaining Dialogue was developed by the Metalogue Consortium under the European Community’s Seventh Framework Programme for Research and Technological Development. This release consists of approximately 2.5 hours of semantically annotated English dialogue d...
KSUEmotions Jul 18, 2017 Meftah, Ali Hamid; Alotaibi, Yousef Ajami; Selouani, Sid-Ahmed, 2017, "KSUEmotions", https://hdl.handle.net/11272.1/AB2/3HNHPQ, Abacus Data Network, V1 KSUEmotions was developed by King Saud University (KSU) and contains approximately five hours of emotional Modern Standard Arabic (MSA) speech from 23 subjects. Speakers were from three countries: Yemen, Saudi Arabia and Syria. Subjects read MSA sentences from newswire text in th...
Abstract Meaning Representation (AMR) Annotation Release 2.0 Jun 15, 2017 Knight, Kevin; Badarau, Bianca; Baranescu, Laura; Bonial, Claire; Bardocz, Madalina; Griffitt, Kira; Hermjakob, Ulf; Marcu, Daniel; Palmer, Martha; O'Gorman, Tim; Schneider, Nathan, 2017, "Abstract Meaning Representation (AMR) Annotation Release 2.0", https://hdl.handle.net/11272.1/AB2/8MN4GE, Abacus Data Network, V1 Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado’s Computational Language and Educational Research group and the Information Sciences Institute at the Universi...
Multi-Language Conversational Telephone Speech 2011 -- Turkish May 15, 2017 Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2017, "Multi-Language Conversational Telephone Speech 2011 -- Turkish", https://hdl.handle.net/11272.1/AB2/FPNZZV, Abacus Data Network, V1 Introduction Multi-Language Conversational Telephone Speech 2011 -- Turkish was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 18 hours of telephone speech in Turkish. The data were collected primarily to support research and technology evalua...
The EventStatus Corpus May 15, 2017 Huang, Ruihong; Jurafsky, Daniel; Riloff, Ellen, 2017, "The EventStatus Corpus", https://hdl.handle.net/11272.1/AB2/EGUSOP, Abacus Data Network, V1 Introduction The EventStatus Corpus was developed by researchers at Texas A&M University, Stanford University and The University of Utah. It consists of approximately 3,000 English and 1,500 Spanish news articles about civil unrest events annotated with temporal tags. This corpus...
IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a May 15, 2017 Benowitz, Daniel; Bills, Aric; Conners, Thomas; Dubinski, Eyal; Fiscus, Jonathan; Harper, Mary; Heighway, Melanie; Le, Hanh; Melot, Jennifer; Onaka, Akiko; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2017, "IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a", https://hdl.handle.net/11272.1/AB2/ME10OS, Abacus Data Network, V1 Introduction IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 207 hours of Lao conversational and scripted telephone speech collected in 2013 along...
Phrase Detectives Corpus May 15, 2017 Chamberlain, Jon; Poesio, Massimo; Kruschwitz, Udo, 2017, "Phrase Detectives Corpus", https://hdl.handle.net/11272.1/AB2/NN2QFX, Abacus Data Network, V1 Introduction Phrase Detectives Corpus was developed by the School of Computer Science and Electronic Engineering at the University of Essex and consists of approximately 19,012 words across 40 documents anaphorically-annotated by the Phrase Detectives game, an online interactive...
CHiME2 Grid Apr 17, 2017 Vincent, Emmanuel; Barker, Jon; Watanabe, Shinji; Le Roux, Jonathan; Nesta, Francesco; Matassoni, Marco, 2017, "CHiME2 Grid", https://hdl.handle.net/11272.1/AB2/ASLFRE, Abacus Data Network, V1 Introduction CHiME2 Grid was developed as part of The 2nd CHiME Speech Separation and Recognition Challenge and contains approximately 120 hours of English speech from a noisy living room environment. The CHiME Challenges focus on distant-microphone automatic speech recognition (...
BOLT Egyptian Arabic SMS/Chat and Transliteration Apr 17, 2017 Song, Zhiyi; Fore, Dana; Strassel, Stephanie; Lee, Haejoong; Wright, Jonathan, 2017, "BOLT Egyptian Arabic SMS/Chat and Transliteration", https://hdl.handle.net/11272.1/AB2/7I6ANJ, Abacus Data Network, V1 Introduction BOLT Egyptian Arabic SMS/Chat and Transliteration was developed by the Linguistic Data Consortium (LDC) and consists of naturally-occurring Short Message Service (SMS) and Chat (CHT) data collected through data donations and live collection involving native speakers...
GALE English-Chinese Parallel Aligned Treebank -- Training Mar 17, 2017 Li, Xuansong; Grimes, Stephen; Strassel, Stephanie; Ma, Xiaoyi; Xue, Nianwen; Marcus, Mitch; Taylor, Ann, 2017, "GALE English-Chinese Parallel Aligned Treebank -- Training", https://hdl.handle.net/11272.1/AB2/QROJQB, Abacus Data Network, V1 Introduction GALE English-Chinese Parallel Aligned Treebank – Training was developed by the Linguistic Data Consortium (LDC) and contains 196,123 tokens of word aligned English and Chinese parallel text with treebank annotations. This material was used as training data in the DAR...
BOLT Chinese Discussion Forum Parallel Training Data Mar 17, 2017 Song, Zhiyi; Garland, Jennifer; Walker, Christopher; Strassel, Stephanie, 2017, "BOLT Chinese Discussion Forum Parallel Training Data", https://hdl.handle.net/11272.1/AB2/EWIO27, Abacus Data Network, V1 Introduction BOLT Chinese Discussion Forum Parallel Training Data was developed by the Linguistic Data Consortium (LDC) and consists of 1,876,799 tokens of Chinese discussion forum data collected for the DARPA BOLT program along with their corresponding English translations. The...
GALE Phase 3 Arabic Broadcast News Speech Part 2 Feb 15, 2017 Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2017, "GALE Phase 3 Arabic Broadcast News Speech Part 2", https://hdl.handle.net/11272.1/AB2/SRRGAW, Abacus Data Network, V1 Introduction GALE Phase 3 Arabic Broadcast News Speech Part 2 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 128 hours of Arabic broadcast news speech collected in 2007 by the Linguistic Data Consortium (LDC), MediaNet, Tunis, Tunisia and...
IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 Jan 19, 2017 Andrus, Tony; Bills, Aric; Corris, Miriam; Dubinski, Eyal; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Le, Hanh; Ray, Jessica; Rytting, Anton; Silber, Ronnie; Shen, Wade; Tzoukermann, Evelyne, 2017, "IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7", https://hdl.handle.net/11272.1/AB2/CSHOZ8, Abacus Data Network, V1 Introduction IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 201 hours of Vietnamese conversational and scripted telephone speech collected i...
TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 Dec 15, 2016 Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2016, "TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014", https://hdl.handle.net/11272.1/AB2/HL83QO, Abacus Data Network, V1 Introduction TAC KBP Spanish Cross-Lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP Spanish Cross-lingual Entity Linking...
Bamanankan Lexicon Dec 15, 2016 Bamba, Moussa, 2016, "Bamanankan Lexicon", https://hdl.handle.net/11272.1/AB2/OOCBVZ, Abacus Data Network, V1 Introduction Bamanankan Lexicon was developed by the Linguistic Data Consortium (LDC) and contains 5,978 entries of the Bamanankan language presented as a Bamanankan-English lexicon and a Bamanankan-French lexicon. It is the third publication in an LDC project to build an electro...
GALE Phase 4 Arabic Newswire Parallel Sentences Dec 15, 2016 Song, Zhiyi; Krug, Gary; Strassel, Stephanie, 2016, "GALE Phase 4 Arabic Newswire Parallel Sentences", https://hdl.handle.net/11272.1/AB2/R1M8ZY, Abacus Data Network, V1 Introduction GALE Phase 4 Arabic Newswire Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global Autonomous Language Exploitation) Program....
IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g Dec 15, 2016 Conners, Thomas; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Jarrett, Amy; Lin, Willa; Molina, María Encarnación Pérez; Rafalko, Shawna; Ray, Jessica; Rytting, Anton; Shen, Wade; Tzoukermann, Evelyne, 2016, "IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g", https://hdl.handle.net/11272.1/AB2/IULTZX, Abacus Data Network, V1 Introduction IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 213 hours of Tagalog conversational and scripted telephone speech collected in 2012...
GALE Phase 3 and 4 Chinese Newswire Parallel Text Nov 15, 2016 Song, Zhiyi; Krug, Gary; Strassel, Stephanie, 2016, "GALE Phase 3 and 4 Chinese Newswire Parallel Text", https://hdl.handle.net/11272.1/AB2/KYZUJ0, Abacus Data Network, V1 Introduction GALE Phase 3 and 4 Chinese Newswire Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA GALE (Global Autonomous Language Exploitation)...
Multi-Language Conversational Telephone Speech 2011 -- Slavic Group Nov 15, 2016 Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2016, "Multi-Language Conversational Telephone Speech 2011 -- Slavic Group", https://hdl.handle.net/11272.1/AB2/OL5RQH, Abacus Data Network, V1 Introduction Multi-Language Conversational Telephone Speech 2011 -- Slavic Group was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 60 hours of telephone speech in each of three distinct Slavic languages: Polish, Russian and Ukrainian. The data...
IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a Nov 15, 2016 Bills, Aric; David, Anne; Dubinski, Eyal; Fiscus, Jonathan; Hammond, Simon; Gann, Ketty; Harper, Mary; Hefright, Brook; Kazi, Michael; Lam, Julie; Ray, Jessica; Richardson, Fred; Rytting, Anton; Walter, Marle, 2016, "IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a", https://hdl.handle.net/11272.1/AB2/W0TIWB, Abacus Data Network, V1 Introduction IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 190 hours of Georgian conversational and scripted telephone speech collected in 2...
IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5 Oct 19, 2016 Andresen, Jess; Bills, Aric; Dubinski, Eyal; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Jarrett, Amy; Roomi, Bergul; Ray, Jessica; Rytting, Anton; Shen, Wade; Tzoukermann, Evelyne, 2016, "IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5", https://hdl.handle.net/11272.1/AB2/GYXA1F, Abacus Data Network, V1 Introduction IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5 was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 213 hours of Turkish conversational and scripted telephone speech collected in 2012...
Richer Event Description Oct 19, 2016 O'Gorman, Tim; Palmer, Martha, 2016, "Richer Event Description", https://hdl.handle.net/11272.1/AB2/H5RQJH, Abacus Data Network, V1 Introduction Richer Event Description was developed by the University of Colorado Boulder-CLEAR (Computational Language and Education Research), Carnegie Mellon University and LDC. It consists of coreference, bridging and event-event relations (temporal, causal, subevent and repor...
IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY Sep 15, 2016 Adams, Nikki; Bills, Aric; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Jarrett, Amy; Khugyani, Kamila; Lin, Willa; Ray, Jessica; Rytting, Anton; Shen, Wade; Strahan, Tania; Tzoukermann, Evelyne, 2016, "IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY", https://hdl.handle.net/11272.1/AB2/GLFN3X, Abacus Data Network, V1 Introduction IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 214 hours of Pashto conversational and scripted telephone speech collected in 2011...
ARL Arabic Dependency Treebank Sep 15, 2016 Tratz, Stephen, 2016, "ARL Arabic Dependency Treebank", https://hdl.handle.net/11272.1/AB2/GKAG4O, Abacus Data Network, V1 Introduction ARL Arabic Dependency Treebank was developed by the US Army Research Laboratory (ARL) and was derived from four LDC resources: Arabic Treebank (ATB) Part 1 v 4.1 (LDC2010T13), Part 2 v 3.1 (LDC2011T09), Part 3 v 3.2 (LDC2010T08) and Broadcast News v 1.0 (LDC2012T07)....
IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a Aug 16, 2016 Bills, Aric; David, Anne; Dubinski, Eyal; Fiscus, Jonathan; Gillies, Breanna; Gnanadesikan, Amalia; Harper, Mary; Hammond, Simon; Jarrett, Amy; Molina, María; Ray, Jessica; Rytting, Anton; Paget, Shelly; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne; Wong, Jamie, 2016, "IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a", https://hdl.handle.net/11272.1/AB2/9JCM5S, Abacus Data Network, V1 Introduction IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 205 hours of Assamese conversational and scripted telephone speech collected in 2...
IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b Aug 16, 2016 Bills, Aric; David, Anne; Dubinski, Eyal; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Jarrett, Amy; Molina, María; Ray, Jessica; Rytting, Anton; Paget, Shelly; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne; Wong, Jamie, 2016, "IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b", https://hdl.handle.net/11272.1/AB2/WKL40N, Abacus Data Network, V1 Introduction IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 215 hours of Bengali conversational and scripted telephone speech collected in 201...
GALE Phase 3 Arabic Broadcast News Speech Part 1 Aug 15, 2016 Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2016, "GALE Phase 3 Arabic Broadcast News Speech Part 1", https://hdl.handle.net/11272.1/AB2/B0XGQD, Abacus Data Network, V1 GALE Phase 3 Arabic Broadcast News Speech Part 1 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 132 hours of Arabic broadcast news speech collected in 2007 by the Linguistic Data Consortium (LDC), MediaNet, Tunis, Tunisia and MTC, Rabat, M...
GALE Phase 3 Arabic Broadcast News Transcripts Part 1 Aug 15, 2016 Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2016, "GALE Phase 3 Arabic Broadcast News Transcripts Part 1", https://hdl.handle.net/11272.1/AB2/IQOADN, Abacus Data Network, V1 GALE Phase 3 Arabic Broadcast News Transcripts Part 1 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 132 hours of Arabic broadcast news speech collected in 2007 by the Linguistic Data Consortium (LDC), MediaNet, Tunis, Tunisia a...
IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c Jul 19, 2016 Andrus, Tony; Dubinski, Eyal; Fiscus, Jonathan G.; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Lin, Willa; Ray, Jessica; Rytting, Anton; Shen, Wade; Tzoukermann, Evelyne; Wong, Jamie, 2016, "IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c", https://hdl.handle.net/11272.1/AB2/01SD6T, Abacus Data Network, V1 IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 215 hours of Cantonese conversational and scripted telephone speech collected in 2011 along w...
GALE Phase 3 and 4 Chinese Broadcast News Parallel Text Jul 15, 2016 Song, Zhiyi; Krug, Gary; Jiang, Zixin; Strassel, Stephanie, 2016, "GALE Phase 3 and 4 Chinese Broadcast News Parallel Text", https://hdl.handle.net/11272.1/AB2/CE2DP3, Abacus Data Network, V1 Introduction GALE Phase 3 and 4 Chinese Broadcast News Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA GALE (Global Autonomous Language Exploit...
English Speed Networking Conversational Transcripts Jul 15, 2016 Muir, Kate; Joinson, Adam; Cotterill, Rachel; Dewdney, Nigel, 2016, "English Speed Networking Conversational Transcripts", https://hdl.handle.net/11272.1/AB2/LX2FQA, Abacus Data Network, V1 Introduction English Speed Networking Conversational Transcripts was developed at the University of the West of England and contains 388 transcripts of English face-to-face and instant messaging conversations about business ideas collected in 2014 and 2015 from participants (unde...
Digital Archive of Southern Speech - NLP Version Jul 15, 2016 Kretzschmar Jr., William; Bounds, Paulina; Hettel, Jacqueline; Coats, Steven; Pederson, Lee; Lena Opas-Hänninen, Lisa; Juuso, Ilkka; Seppänen, Tapio, 2016, "Digital Archive of Southern Speech - NLP Version", https://hdl.handle.net/11272.1/AB2/F4QH6S, Abacus Data Network, V1 Introduction Digital Archive of Southern Speech - NLP Version (DASS-NLP) was developed by LDC as an alternate version of Digital Archive of Southern Speech (DASS) (LDC2012S03) suitable for natural language processing and human language technology applications. Specifically, the o...
GALE Phase 4 Arabic Weblog Parallel Sentences Jun 15, 2016 Song, Zhiyi; Krug, Gary; Jiang, Zixin; Strassel, Stephanie, 2016, "GALE Phase 4 Arabic Weblog Parallel Sentences", https://hdl.handle.net/11272.1/AB2/3GAMIQ, Abacus Data Network, V1 Introduction GALE Phase 4 Arabic Weblog Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global Autonomous Language Exploitation) Program. T...
Chinese Treebank 9.0 Jun 15, 2016 Xue, Nianwen; Zhang, Xiuhong; Jiang, Zixin; Palmer, Martha; Xia, Fei; Chiou, Fu-Dong; Chang, Meiyu, 2016, "Chinese Treebank 9.0", https://hdl.handle.net/11272.1/AB2/YYY4FY, Abacus Data Network, V1 Introduction Chinese Treebank 9.0 consists of approximately two million words of annotated and parsed text from Chinese newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups, weblogs, discussion forums, chat...
GALE Phase 4 Chinese Broadcast Conversation Transcripts May 16, 2016 Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2016, "GALE Phase 4 Chinese Broadcast Conversation Transcripts", https://hdl.handle.net/11272.1/AB2/QOKU34, Abacus Data Network, V1 Introduction GALE Phase 4 Chinese Broadcast Conversation Transcripts was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 172 hours of Chinese broadcast conversation speech collected in 2008 by LDC and Hong Kong University of Science...
GALE Phase 4 Chinese Broadcast Conversation Speech May 16, 2016 Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2016, "GALE Phase 4 Chinese Broadcast Conversation Speech", https://hdl.handle.net/11272.1/AB2/Y6ZKMX, Abacus Data Network, V1 Introduction GALE Phase 4 Chinese Broadcast Conversation Speech was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 172 hours of Mandarin Chinese broadcast conversation speech collected in 2008 by LDC and Hong Kong University of Science and Tec...
GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences Apr 18, 2016 Song, Zhiyi; Krug, Gary; Strassel, Stephanie, 2016, "GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences", https://hdl.handle.net/11272.1/AB2/FGSLZN, Abacus Data Network, V1 Introduction GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global Autonomous Language Exploita...
HAVIC Pilot Transcription Apr 18, 2016 Tracey, Jennifer; Strassel, Stephanie; Morris, Amanda; Li, Xuansong; Antonishek, Brian; Fiscus, Jonathan, 2016, "HAVIC Pilot Transcription", https://hdl.handle.net/11272.1/AB2/ODUSVC, Abacus Data Network, V1 Introduction HAVIC Pilot Transcription was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 72 hours of user-generated videos with transcripts based on the English speech audio extracted from the videos. This data set was created in collaboratio...
H1 Children's Writing Apr 18, 2016 Berkling, Kay, 2016, "H1 Children's Writing", https://hdl.handle.net/11272.1/AB2/OJCHNV, Abacus Data Network, V1 Introduction H1 Children's Writing was developed by the Cooperative State University Baden-Württemberg, University of Education. It consists of 996 texts written over three months by 88 German school children age seven through eleven years. The data in this corpus was collected b...
GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text Mar 15, 2016 Song, Zhiyi; Krug, Gary; Strassel, Stephanie, 2016, "GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text", https://hdl.handle.net/11272.1/AB2/JVLMY4, Abacus Data Network, V1 Introduction GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA GALE (Global Autonomous Language...


Metalogue Multi-Issue Bargaining Dialogue

Jul 18, 2017

Petukhova, Volha; Malchanau, Andrei; Oualil, Youssef; Klakow, Dietrich; Stevens, Christopher; Weerd, Harmen de; Taatgen, Niels, 2017, "Metalogue Multi-Issue Bargaining Dialogue", https://hdl.handle.net/11272.1/AB2/U57KQP, Abacus Data Network, V1

Metalogue Multi-Issue Bargaining Dialogue was developed by the Metalogue Consortium under the European Community’s Seventh Framework Programme for Research and Technological Development. This release consists of approximately 2.5 hours of semantically annotated English dialogue d...

KSUEmotions

Jul 18, 2017

Meftah, Ali Hamid; Alotaibi, Yousef Ajami; Selouani, Sid-Ahmed, 2017, "KSUEmotions", https://hdl.handle.net/11272.1/AB2/3HNHPQ, Abacus Data Network, V1

KSUEmotions was developed by King Saud University (KSU) and contains approximately five hours of emotional Modern Standard Arabic (MSA) speech from 23 subjects. Speakers were from three countries: Yemen, Saudi Arabia and Syria. Subjects read MSA sentences from newswire text in th...

Abstract Meaning Representation (AMR) Annotation Release 2.0

Jun 15, 2017

Knight, Kevin; Badarau, Bianca; Baranescu, Laura; Bonial, Claire; Bardocz, Madalina; Griffitt, Kira; Hermjakob, Ulf; Marcu, Daniel; Palmer, Martha; O'Gorman, Tim; Schneider, Nathan, 2017, "Abstract Meaning Representation (AMR) Annotation Release 2.0", https://hdl.handle.net/11272.1/AB2/8MN4GE, Abacus Data Network, V1

Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado’s Computational Language and Educational Research group and the Information Sciences Institute at the Universi...

Multi-Language Conversational Telephone Speech 2011 -- Turkish

May 15, 2017

Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2017, "Multi-Language Conversational Telephone Speech 2011 -- Turkish", https://hdl.handle.net/11272.1/AB2/FPNZZV, Abacus Data Network, V1

Multi-Language Conversational Telephone Speech 2011 -- Turkish was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 18 hours of telephone speech in Turkish. The data were collected primarily to support research and technology evalua...

The EventStatus Corpus

May 15, 2017

Huang, Ruihong; Jurafsky, Daniel; Riloff, Ellen, 2017, "The EventStatus Corpus", https://hdl.handle.net/11272.1/AB2/EGUSOP, Abacus Data Network, V1

The EventStatus Corpus was developed by researchers at Texas A&M University, Stanford University and The University of Utah. It consists of approximately 3,000 English and 1,500 Spanish news articles about civil unrest events annotated with temporal tags. This corpus...

IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a

May 15, 2017

Benowitz, Daniel; Bills, Aric; Conners, Thomas; Dubinski, Eyal; Fiscus, Jonathan; Harper, Mary; Heighway, Melanie; Le, Hanh; Melot, Jennifer; Onaka, Akiko; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2017, "IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a", https://hdl.handle.net/11272.1/AB2/ME10OS, Abacus Data Network, V1

IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 207 hours of Lao conversational and scripted telephone speech collected in 2013 along...

Phrase Detectives Corpus

May 15, 2017

Chamberlain, Jon; Poesio, Massimo; Kruschwitz, Udo, 2017, "Phrase Detectives Corpus", https://hdl.handle.net/11272.1/AB2/NN2QFX, Abacus Data Network, V1

Phrase Detectives Corpus was developed by the School of Computer Science and Electronic Engineering at the University of Essex and consists of approximately 19,012 words across 40 documents anaphorically-annotated by the Phrase Detectives game, an online interactive...

CHiME2 Grid

Apr 17, 2017

Vincent, Emmanuel; Barker, Jon; Watanabe, Shinji; Le Roux, Jonathan; Nesta, Francesco; Matassoni, Marco, 2017, "CHiME2 Grid", https://hdl.handle.net/11272.1/AB2/ASLFRE, Abacus Data Network, V1

CHiME2 Grid was developed as part of The 2nd CHiME Speech Separation and Recognition Challenge and contains approximately 120 hours of English speech from a noisy living room environment. The CHiME Challenges focus on distant-microphone automatic speech recognition (...

BOLT Egyptian Arabic SMS/Chat and Transliteration

Apr 17, 2017

Song, Zhiyi; Fore, Dana; Strassel, Stephanie; Lee, Haejoong; Wright, Jonathan, 2017, "BOLT Egyptian Arabic SMS/Chat and Transliteration", https://hdl.handle.net/11272.1/AB2/7I6ANJ, Abacus Data Network, V1

BOLT Egyptian Arabic SMS/Chat and Transliteration was developed by the Linguistic Data Consortium (LDC) and consists of naturally-occurring Short Message Service (SMS) and Chat (CHT) data collected through data donations and live collection involving native speakers...

GALE English-Chinese Parallel Aligned Treebank -- Training

Mar 17, 2017

Li, Xuansong; Grimes, Stephen; Strassel, Stephanie; Ma, Xiaoyi; Xue, Nianwen; Marcus, Mitch; Taylor, Ann, 2017, "GALE English-Chinese Parallel Aligned Treebank -- Training", https://hdl.handle.net/11272.1/AB2/QROJQB, Abacus Data Network, V1

GALE English-Chinese Parallel Aligned Treebank – Training was developed by the Linguistic Data Consortium (LDC) and contains 196,123 tokens of word aligned English and Chinese parallel text with treebank annotations. This material was used as training data in the DAR...

BOLT Chinese Discussion Forum Parallel Training Data

Mar 17, 2017

Song, Zhiyi; Garland, Jennifer; Walker, Christopher; Strassel, Stephanie, 2017, "BOLT Chinese Discussion Forum Parallel Training Data", https://hdl.handle.net/11272.1/AB2/EWIO27, Abacus Data Network, V1

BOLT Chinese Discussion Forum Parallel Training Data was developed by the Linguistic Data Consortium (LDC) and consists of 1,876,799 tokens of Chinese discussion forum data collected for the DARPA BOLT program along with their corresponding English translations. The...

GALE Phase 3 Arabic Broadcast News Speech Part 2

Feb 15, 2017

Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2017, "GALE Phase 3 Arabic Broadcast News Speech Part 2", https://hdl.handle.net/11272.1/AB2/SRRGAW, Abacus Data Network, V1

GALE Phase 3 Arabic Broadcast News Speech Part 2 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 128 hours of Arabic broadcast news speech collected in 2007 by the Linguistic Data Consortium (LDC), MediaNet, Tunis, Tunisia and...

IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7

Jan 19, 2017

Andrus, Tony; Bills, Aric; Corris, Miriam; Dubinski, Eyal; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Le, Hanh; Ray, Jessica; Rytting, Anton; Silber, Ronnie; Shen, Wade; Tzoukermann, Evelyne, 2017, "IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7", https://hdl.handle.net/11272.1/AB2/CSHOZ8, Abacus Data Network, V1

IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 201 hours of Vietnamese conversational and scripted telephone speech collected i...

TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014

Dec 15, 2016

Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2016, "TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014", https://hdl.handle.net/11272.1/AB2/HL83QO, Abacus Data Network, V1

TAC KBP Spanish Cross-Lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP Spanish Cross-lingual Entity Linking...

Bamanankan Lexicon

Dec 15, 2016

Bamba, Moussa, 2016, "Bamanankan Lexicon", https://hdl.handle.net/11272.1/AB2/OOCBVZ, Abacus Data Network, V1

Bamanankan Lexicon was developed by the Linguistic Data Consortium (LDC) and contains 5,978 entries of the Bamanankan language presented as a Bamanankan-English lexicon and a Bamanankan-French lexicon. It is the third publication in an LDC project to build an electro...

GALE Phase 4 Arabic Newswire Parallel Sentences

Dec 15, 2016

Song, Zhiyi; Krug, Gary; Strassel, Stephanie, 2016, "GALE Phase 4 Arabic Newswire Parallel Sentences", https://hdl.handle.net/11272.1/AB2/R1M8ZY, Abacus Data Network, V1

GALE Phase 4 Arabic Newswire Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global Autonomous Language Exploitation) Program....

IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g

Dec 15, 2016

Conners, Thomas; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Jarrett, Amy; Lin, Willa; Molina, María Encarnación Pérez; Rafalko, Shawna; Ray, Jessica; Rytting, Anton; Shen, Wade; Tzoukermann, Evelyne, 2016, "IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g", https://hdl.handle.net/11272.1/AB2/IULTZX, Abacus Data Network, V1

IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 213 hours of Tagalog conversational and scripted telephone speech collected in 2012...

GALE Phase 3 and 4 Chinese Newswire Parallel Text

Nov 15, 2016

Song, Zhiyi; Krug, Gary; Strassel, Stephanie, 2016, "GALE Phase 3 and 4 Chinese Newswire Parallel Text", https://hdl.handle.net/11272.1/AB2/KYZUJ0, Abacus Data Network, V1

GALE Phase 3 and 4 Chinese Newswire Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA GALE (Global Autonomous Language Exploitation)...

Multi-Language Conversational Telephone Speech 2011 -- Slavic Group

Nov 15, 2016

Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2016, "Multi-Language Conversational Telephone Speech 2011 -- Slavic Group", https://hdl.handle.net/11272.1/AB2/OL5RQH, Abacus Data Network, V1

Multi-Language Conversational Telephone Speech 2011 -- Slavic Group was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 60 hours of telephone speech in each of three distinct Slavic languages: Polish, Russian and Ukrainian. The data...

IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a

Nov 15, 2016

Bills, Aric; David, Anne; Dubinski, Eyal; Fiscus, Jonathan; Hammond, Simon; Gann, Ketty; Harper, Mary; Hefright, Brook; Kazi, Michael; Lam, Julie; Ray, Jessica; Richardson, Fred; Rytting, Anton; Walter, Marle, 2016, "IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a", https://hdl.handle.net/11272.1/AB2/W0TIWB, Abacus Data Network, V1

IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 190 hours of Georgian conversational and scripted telephone speech collected in 2...

IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5

Oct 19, 2016

Andresen, Jess; Bills, Aric; Dubinski, Eyal; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Jarrett, Amy; Roomi, Bergul; Ray, Jessica; Rytting, Anton; Shen, Wade; Tzoukermann, Evelyne, 2016, "IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5", https://hdl.handle.net/11272.1/AB2/GYXA1F, Abacus Data Network, V1

IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5 was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 213 hours of Turkish conversational and scripted telephone speech collected in 2012...

Richer Event Description

Oct 19, 2016

O'Gorman, Tim; Palmer, Martha, 2016, "Richer Event Description", https://hdl.handle.net/11272.1/AB2/H5RQJH, Abacus Data Network, V1

Richer Event Description was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research), Carnegie Mellon University and LDC. It consists of coreference, bridging and event-event relations (temporal, causal, subevent and repor...

IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY

Sep 15, 2016

Adams, Nikki; Bills, Aric; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Jarrett, Amy; Khugyani, Kamila; Lin, Willa; Ray, Jessica; Rytting, Anton; Shen, Wade; Strahan, Tania; Tzoukermann, Evelyne, 2016, "IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY", https://hdl.handle.net/11272.1/AB2/GLFN3X, Abacus Data Network, V1

IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 214 hours of Pashto conversational and scripted telephone speech collected in 2011...

ARL Arabic Dependency Treebank

Sep 15, 2016

Tratz, Stephen, 2016, "ARL Arabic Dependency Treebank", https://hdl.handle.net/11272.1/AB2/GKAG4O, Abacus Data Network, V1

ARL Arabic Dependency Treebank was developed by the US Army Research Laboratory (ARL) and was derived from four LDC resources: Arabic Treebank (ATB) Part 1 v 4.1 (LDC2010T13), Part 2 v 3.1 (LDC2011T09), Part 3 v 3.2 (LDC2010T08) and Broadcast News v 1.0 (LDC2012T07)....

IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a

Aug 16, 2016

Bills, Aric; David, Anne; Dubinski, Eyal; Fiscus, Jonathan; Gillies, Breanna; Gnanadesikan, Amalia; Harper, Mary; Hammond, Simon; Jarrett, Amy; Molina, María; Ray, Jessica; Rytting, Anton; Paget, Shelly; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne; Wong, Jamie, 2016, "IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a", https://hdl.handle.net/11272.1/AB2/9JCM5S, Abacus Data Network, V1

IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 205 hours of Assamese conversational and scripted telephone speech collected in 2...

IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b

Aug 16, 2016

Bills, Aric; David, Anne; Dubinski, Eyal; Fiscus, Jonathan; Gillies, Breanna; Harper, Mary; Jarrett, Amy; Molina, María; Ray, Jessica; Rytting, Anton; Paget, Shelly; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne; Wong, Jamie, 2016, "IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b", https://hdl.handle.net/11272.1/AB2/WKL40N, Abacus Data Network, V1

IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 215 hours of Bengali conversational and scripted telephone speech collected in 201...

GALE Phase 3 Arabic Broadcast News Speech Part 1

Aug 15, 2016

Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2016, "GALE Phase 3 Arabic Broadcast News Speech Part 1", https://hdl.handle.net/11272.1/AB2/B0XGQD, Abacus Data Network, V1

GALE Phase 3 Arabic Broadcast News Speech Part 1 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 132 hours of Arabic broadcast news speech collected in 2007 by the Linguistic Data Consortium (LDC), MediaNet, Tunis, Tunisia and MTC, Rabat, M...

GALE Phase 3 Arabic Broadcast News Transcripts Part 1

Aug 15, 2016

Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2016, "GALE Phase 3 Arabic Broadcast News Transcripts Part 1", https://hdl.handle.net/11272.1/AB2/IQOADN, Abacus Data Network, V1

GALE Phase 3 Arabic Broadcast News Transcripts Part 1 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 132 hours of Arabic broadcast news speech collected in 2007 by the Linguistic Data Consortium (LDC), MediaNet, Tunis, Tunisia a...

IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c

Jul 19, 2016

Andrus, Tony; Dubinski, Eyal; Fiscus, Jonathan G.; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Lin, Willa; Ray, Jessica; Rytting, Anton; Shen, Wade; Tzoukermann, Evelyne; Wong, Jamie, 2016, "IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c", https://hdl.handle.net/11272.1/AB2/01SD6T, Abacus Data Network, V1

IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 215 hours of Cantonese conversational and scripted telephone speech collected in 2011 along w...

GALE Phase 3 and 4 Chinese Broadcast News Parallel Text

Jul 15, 2016

Song, Zhiyi; Krug, Gary; Jiang, Zixin; Strassel, Stephanie, 2016, "GALE Phase 3 and 4 Chinese Broadcast News Parallel Text", https://hdl.handle.net/11272.1/AB2/CE2DP3, Abacus Data Network, V1

GALE Phase 3 and 4 Chinese Broadcast News Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA GALE (Global Autonomous Language Exploit...

English Speed Networking Conversational Transcripts

Jul 15, 2016

Muir, Kate; Joinson, Adam; Cotterill, Rachel; Dewdney, Nigel, 2016, "English Speed Networking Conversational Transcripts", https://hdl.handle.net/11272.1/AB2/LX2FQA, Abacus Data Network, V1

English Speed Networking Conversational Transcripts was developed at the University of the West of England and contains 388 transcripts of English face-to-face and instant messaging conversations about business ideas collected in 2014 and 2015 from participants (unde...

Digital Archive of Southern Speech - NLP Version

Jul 15, 2016

Kretzschmar Jr., William; Bounds, Paulina; Hettel, Jacqueline; Coats, Steven; Pederson, Lee; Opas-Hänninen, Lisa Lena; Juuso, Ilkka; Seppänen, Tapio, 2016, "Digital Archive of Southern Speech - NLP Version", https://hdl.handle.net/11272.1/AB2/F4QH6S, Abacus Data Network, V1

Digital Archive of Southern Speech - NLP Version (DASS-NLP) was developed by LDC as an alternate version of Digital Archive of Southern Speech (DASS) (LDC2012S03) suitable for natural language processing and human language technology applications. Specifically, the o...

GALE Phase 4 Arabic Weblog Parallel Sentences

Jun 15, 2016

Song, Zhiyi; Krug, Gary; Jiang, Zixin; Strassel, Stephanie, 2016, "GALE Phase 4 Arabic Weblog Parallel Sentences", https://hdl.handle.net/11272.1/AB2/3GAMIQ, Abacus Data Network, V1

GALE Phase 4 Arabic Weblog Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global Autonomous Language Exploitation) Program. T...

Chinese Treebank 9.0

Jun 15, 2016

Xue, Nianwen; Zhang, Xiuhong; Jiang, Zixin; Palmer, Martha; Xia, Fei; Chiou, Fu-Dong; Chang, Meiyu, 2016, "Chinese Treebank 9.0", https://hdl.handle.net/11272.1/AB2/YYY4FY, Abacus Data Network, V1

Chinese Treebank 9.0 consists of approximately two million words of annotated and parsed text from Chinese newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups, weblogs, discussion forums, chat...

GALE Phase 4 Chinese Broadcast Conversation Transcripts

May 16, 2016

Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2016, "GALE Phase 4 Chinese Broadcast Conversation Transcripts", https://hdl.handle.net/11272.1/AB2/QOKU34, Abacus Data Network, V1

GALE Phase 4 Chinese Broadcast Conversation Transcripts was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 172 hours of Chinese broadcast conversation speech collected in 2008 by LDC and Hong Kong University of Science...

GALE Phase 4 Chinese Broadcast Conversation Speech

May 16, 2016

Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2016, "GALE Phase 4 Chinese Broadcast Conversation Speech", https://hdl.handle.net/11272.1/AB2/Y6ZKMX, Abacus Data Network, V1

GALE Phase 4 Chinese Broadcast Conversation Speech was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 172 hours of Mandarin Chinese broadcast conversation speech collected in 2008 by LDC and Hong Kong University of Science and Tec...

GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences

Apr 18, 2016

Song, Zhiyi; Krug, Gary; Strassel, Stephanie, 2016, "GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences", https://hdl.handle.net/11272.1/AB2/FGSLZN, Abacus Data Network, V1

GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global Autonomous Language Exploita...

HAVIC Pilot Transcription

Apr 18, 2016

Tracey, Jennifer; Strassel, Stephanie; Morris, Amanda; Li, Xuansong; Antonishek, Brian; Fiscus, Jonathan, 2016, "HAVIC Pilot Transcription", https://hdl.handle.net/11272.1/AB2/ODUSVC, Abacus Data Network, V1

HAVIC Pilot Transcription was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 72 hours of user-generated videos with transcripts based on the English speech audio extracted from the videos. This data set was created in collaboratio...

H1 Children's Writing

Apr 18, 2016

Berkling, Kay, 2016, "H1 Children's Writing", https://hdl.handle.net/11272.1/AB2/OJCHNV, Abacus Data Network, V1

H1 Children's Writing was developed by the Cooperative State University Baden-Württemberg, University of Education. It consists of 996 texts written over three months by 88 German school children aged seven through eleven years. The data in this corpus was collected b...

GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text

Mar 15, 2016

Song, Zhiyi; Krug, Gary; Strassel, Stephanie, 2016, "GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text", https://hdl.handle.net/11272.1/AB2/JVLMY4, Abacus Data Network, V1

GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA GALE (Global Autonomous Language...
