1 to 50 of 244 Results
Dec 2, 2021
Palmer, Martha; Hwang, Jena D.; Mansouri, Aous; Bonial, Claire; O'Gorman, Tim; Gung, James, 2021, "BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/YS81IR, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and consists of propbank annotation on Egyp...
Dec 2, 2021
Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2021, "Second DIHARD Challenge Development - Eleven Sources", https://hdl.handle.net/11272.1/AB2/CBFPZO, Abacus Data Network, V1
Abstract Introduction Second DIHARD Challenge Development - Eleven Sources was developed by LDC and contains approximately 22 hours of English and Chinese speech data along with corresponding annotations used in support of the Second DIHARD Challenge. The DIHARD Challenges are a...
Nov 18, 2021
Maamouri, Mohamed; Bies, Ann; Kulick, Seth; Krouna, Sondos; Tabassi, Dalila; Ciul, Michael, 2021, "BOLT Egyptian Arabic Treebank - SMS/Chat", https://hdl.handle.net/11272.1/AB2/1DSLOX, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic Treebank - SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of Egyptian Arabic SMS/Chat data with part-of-speech annotation, morphology, and syntactic tree annotation. The DARPA BOLT (Broad Operational Language...
Nov 18, 2021
Keating, Patricia; Kreiman, Jody; Alwan, Abeer; Chong, Adam; Lee, Yoonjeong, 2021, "UCLA Speaker Variability Database", https://hdl.handle.net/11272.1/AB2/CIIVXT, Abacus Data Network, V1
Abstract Introduction UCLA Speaker Variability Database was developed by UCLA Speech Processing and Auditory Perception Laboratory and is comprised of approximately 34 hours of English speech and orthographic transcripts. This corpus was designed to sample variability in speaking...
Oct 28, 2021
Graff, David; Ma, Xiaoyi; Strassel, Stephanie; Walker, Kevin; Jones, Karen, 2021, "RATS Speaker Identification", https://hdl.handle.net/11272.1/AB2/BZYHPS, Abacus Data Network, V1
Abstract Introduction RATS Speaker Identification was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 1,900 hours of Levantine Arabic, Farsi, Dari, Pashto and Urdu conversational telephone speech with annotations of speech segments. The audio w...
Oct 26, 2021
Godfrey, John J.; Holliman, Edward, 2021, "Switchboard-1 Release 2", https://hdl.handle.net/11272.1/AB2/VTPSCK, Abacus Data Network, V1
Abstract Introduction The Switchboard-1 Telephone Speech Corpus (LDC97S62) consists of approximately 260 hours of speech and was originally collected by Texas Instruments in 1990-1, under DARPA sponsorship. The first release of the corpus was published by NIST and distributed by...
Oct 14, 2021
Mena, Carlos Daniel Hernández; Ruiz, Iván Vladimir Meza, 2021, "Wikipedia Spanish Speech and Transcripts", https://hdl.handle.net/11272.1/AB2/L05NFF, Abacus Data Network, V1
Abstract Introduction Wikipedia Spanish Speech and Transcripts consists of approximately 25 hours of Spanish read speech and transcripts. The read text was taken from the Spanish version of WikiProject Spoken Wikipedia, referred to as Wikipedia Grabada. The transcripts were devel...
Oct 14, 2021
Tracey, Jennifer; Delgado, Dana; Chen, Song; Strassel, Stephanie, 2021, "BOLT Egyptian Arabic SMS/Chat Parallel Training Data", https://hdl.handle.net/11272.1/AB2/WXML9A, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic SMS/Chat Parallel Training Data was developed by the Linguistic Data Consortium (LDC) and consists of approximately 723,000 tokens of Egyptian Arabic SMS/Chat data collected for the DARPA BOLT program along with their corresponding Engli...
Oct 14, 2021
Alsheddi, Abeer, 2021, "Classical Arabic Dictionary", https://hdl.handle.net/11272.1/AB2/FQ7PIS, Abacus Data Network, V1
Abstract Introduction Classical Arabic Dictionary consists of approximately one hundred million words of Arabic collected from texts dating between 431 and 1104 CE, principally books and essays, along with word occurrences, source documents and related metadata. Data The dictiona...
Oct 1, 2021
Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Lim, Lynn-Li; Malyska, Nicolas; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Shen, Sinney; Smith, Rosanna, 2021, "IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b", https://hdl.handle.net/11272.1/AB2/IFBL6A, Abacus Data Network, V1
Abstract Introduction IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Halh Mongolian conversational and scripted telephone speec...
Sep 29, 2021
Andresen, Jess; Bills, Aric; Conners, Thomas; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Kozlov, Kirill; Malyska, Nicolas; Melot, Jennifer; Morrison, Michelle; Phillips, Josh; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne; Wong, Jamie, 2021, "IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d", https://hdl.handle.net/11272.1/AB2/TNSSDU, Abacus Data Network, V2
Abstract Introduction IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 350 hours of Swahili conversational and scripted telephone speech collect...
Sep 29, 2021
Tracey, Jennifer; Graff, David; Strassel, Stephanie; Arrigo, Michael; Wright, Jonathan; Bies, Ann, 2021, "LORELEI Oromo Incident Language Pack", https://hdl.handle.net/11272.1/AB2/EH7NXF, Abacus Data Network, V1
Abstract Introduction LORELEI Oromo Incident Language Pack was developed by the Linguistic Data Consortium and is comprised of approximately 3.9 million words of Oromo monolingual text, 25,000 words of English monolingual text, 135,000 words of parallel and comparable Oromo-Engli...
Sep 3, 2021
Neergaard, Karl David; Xu, Hongzhi; Huang, Chu-Ren, 2021, "Database of Word Level Statistics - Mandarin", https://hdl.handle.net/11272.1/AB2/VJDPA0, Abacus Data Network, V1
Abstract Introduction Database of Word Level Statistics - Mandarin was developed by The Hong Kong Polytechnic University. It provides lexical characteristics of a descriptive and statistical nature for words and nonwords of Mandarin Chinese. It is designed for researchers particu...
Sep 3, 2021
Knight, Kevin; Badarau, Bianca; Baranescu, Laura; Bonial, Claire; Bardocz, Madalina; Griffitt, Kira; Hermjakob, Ulf; Marcu, Daniel; Palmer, Martha; O'Gorman, Tim; Schneider, Nathan, 2021, "Abstract Meaning Representation (AMR) Annotation Release 3.0", https://hdl.handle.net/11272.1/AB2/82CVJF, Abacus Data Network, V1
Abstract Introduction Abstract Meaning Representation (AMR) Annotation Release 3.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Ins...
Sep 3, 2021
Sluyter-Gaethje, Henny; Bourgonje, Peter; Stede, Manfred, 2021, "Penn Discourse Treebank Version 2.0 - German Translation", https://hdl.handle.net/11272.1/AB2/1AXWBN, Abacus Data Network, V1
Abstract Introduction Penn Discourse Treebank Version 2.0 - German Translation was developed at the University of Potsdam's Applied Computational Linguistics group and consists of approximately one million tokens derived from Penn Discourse Treebank Version 2.0 (LDC2008T05). This...
Sep 3, 2021
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2021, "TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010", https://hdl.handle.net/11272.1/AB2/VAZOSD, Abacus Data Network, V1
Abstract Introduction TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010 was developed by the Linguistic Data Consortium and contains training and evaluation data produced in support of the 2010 TAC KBP Surprise Slot Filling track, the only y...
Sep 3, 2021
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2021, "TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014", https://hdl.handle.net/11272.1/AB2/MRZALN, Abacus Data Network, V1
Abstract Introduction TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014 was developed by the Linguistic Data Consortium and contains training and evaluation data produced in support of the 2013 and 2014 TAC KBP Sentiment Slot Filling tracks....
Sep 3, 2021
Daza, Angel; Frank, Anette, 2021, "X-SRL: Parallel Cross-lingual Semantic Role Labeling", https://hdl.handle.net/11272.1/AB2/DNOJP9, Abacus Data Network, V1
Abstract Introduction X-SRL: Parallel Cross-lingual Semantic Role Labeling was developed by Heidelberg University, Department of Computational Linguistics and the Leibniz Institute for the German Language (IDS). It consists of approximately three million words of German, French a...
Sep 3, 2021
Arase, Yuki; Tsujii, Junichi, 2021, "ESPADA", https://hdl.handle.net/11272.1/AB2/ANSK9Z, Abacus Data Network, V1
Abstract Introduction ESPADA (Extended Syntactic Phrase Alignment DAtaset) consists of annotated parse trees and alignment on English sentential paraphrases extracted from machine translation evaluation corpora. It extends SPADE (LDC2018T09) by adding new annotated data for train...
Sep 3, 2021
Tracey, Jennifer; Delgado, Dana; Chen, Song; Strassel, Stephanie, 2021, "BOLT Chinese SMS/Chat Parallel Training Data", https://hdl.handle.net/11272.1/AB2/O3JTA9, Abacus Data Network, V1
Abstract Introduction BOLT Chinese SMS/Chat Parallel Training Data was developed by the Linguistic Data Consortium and consists of approximately 1.8 million tokens of Chinese SMS/Chat data collected for the DARPA BOLT program along with their corresponding English translations. Th...
Sep 3, 2021
Li, Bin; Xiao, Liming; Liu, Yihuan; Wen, Yuan; Song, Li; Chun, Jayeol; Feng, Minxuan; Zhou, Junsheng; Qu, Weiguang; Xue, Nianwen, 2021, "Chinese Abstract Meaning Representation 2.0", https://hdl.handle.net/11272.1/AB2/LVQEZJ, Abacus Data Network, V1
Abstract Introduction Chinese Abstract Meaning Representation (CAMR) 2.0 was developed by Brandeis University and Nanjing Normal University and is comprised of semantic representations of a set of approximately 20,000 Chinese sentences from Chinese Treebank (CTB) 8.0 (LDC2013T21)...
Sep 3, 2021
Agarwal, Nitin; Francini, Michelle; Kappler, Michelle; Micciulla, Linnea; Pradhan, Sameer; Ramshaw, Lance, 2021, "BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/DXWM3B, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies and consists of co-reference annotation on Egyptian Arabic discussion forum (DF), SMS/Chat and conversational tele...
Sep 2, 2021
Mena, Carlos Daniel Hernández, 2021, "LibriVox Spanish", https://hdl.handle.net/11272.1/AB2/AHBO1C, Abacus Data Network, V1
Abstract Introduction LibriVox Spanish consists of approximately 73 hours of Spanish read speech and transcripts. The audio data was taken from Spanish audiobooks developed by LibriVox, a non-profit project that creates audiobooks from public domain works. The transcripts were de...
Sep 2, 2021
Ding, Hongwei; Liao, Sishi; Zhan, Yuqing; Yuan, Jiahong; Liberman, Mark, 2021, "Global TIMIT Mandarin Chinese", https://hdl.handle.net/11272.1/AB2/2CCXH8, Abacus Data Network, V1
Abstract Introduction Global TIMIT Mandarin Chinese was developed by the Linguistic Data Consortium and Shanghai Jiao Tong University and consists of approximately five hours of read speech and transcripts in Mandarin Chinese. The Global TIMIT project aimed to create a series of...
Sep 2, 2021
Beijing Magic Data Technology Co., 2021, "Magic Data Chinese Mandarin Conversational Speech", https://hdl.handle.net/11272.1/AB2/M4T1CO, Abacus Data Network, V1
Abstract Introduction Magic Data Chinese Mandarin Conversational Speech was developed by Beijing Magic Data Technology Co., Ltd. and consists of approximately 10 hours of Mandarin conversational speech from 60 speakers. Each conversation was recorded on multiple devices and is pr...
Sep 2, 2021
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2021, "TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017", https://hdl.handle.net/11272.1/AB2/DAW97M, Abacus Data Network, V1
Abstract Introduction TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP Entity Discovery and Linking (EDL) tasks in 2016...
Sep 2, 2021
Maamouri, Mohamed; Bies, Ann; Kulick, Seth; Krouna, Sondos; Tabassi, Dalila; Ciul, Michael, 2021, "BOLT Egyptian Arabic Treebank - Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/D9JRBV, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic Treebank - Conversational Telephone Speech was developed by the Linguistic Data Consortium (LDC) and consists of Egyptian Arabic conversational telephone speech data with part-of-speech annotation, morphology, gloss and syntactic tree an...
Sep 2, 2021
Agarwal, Nitin; Francini, Michelle; Kappler, Michelle; Micciulla, Linnea; Pradhan, Sameer; Ramshaw, Lance, 2021, "BOLT Chinese Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/LVUADW, Abacus Data Network, V1
Abstract Introduction BOLT Chinese Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies and consists of co-reference annotation on Chinese discussion forum (DF), SMS/Chat and conversational telephone speech (CT...
Sep 2, 2021
Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2021, "BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training", https://hdl.handle.net/11272.1/AB2/XACS3U, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training was developed by the Linguistic Data Consortium (LDC) and consists of 349,414 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate word relations. The DA...
Jun 14, 2021
Brandschain, Linda; Walker, Kevin; Graff, David; Cieri, Christopher; Neely, Abby; Mirghafori, Nikki; Peskin, Barbara; Godfrey, Jack; Strassel, Stephanie; Goodman, Fred; Doddington, George R.; King, Mike, 2021, "Mixer 4 and 5 Speech", https://hdl.handle.net/11272.1/AB2/LU0TQ8, Abacus Data Network, V1
Abstract Introduction Mixer 4 and 5 Speech was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 14,185 hours of audio recordings of conversational telephone speech, interviews, elicitation exercises and transcript readings involving 616 distinct...
Jun 11, 2021
Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Le, Hanh; Malyska, Nicolas; Melot, Jennifer; Phillips, Josh; Ray, Jessica; Roomi, Bergul; Rytting, Anton; Strahan, Tania E., 2019, "IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b", https://hdl.handle.net/11272.1/AB2/U1H3H7, Abacus Data Network, V1
Abstract Introduction IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Amharic conversational and scripted telephone speech collect...
Jun 9, 2021
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2020, "TAC KBP English Event Argument - Training and Evaluation Data 2014-2015", https://hdl.handle.net/11272.1/AB2/TTCGFJ, Abacus Data Network, V1
Abstract Introduction TAC KBP English Event Argument - Training and Evaluation Data 2014-2015 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the 2014 TAC KBP English Event Argument Extraction Pilot and Evalua...
Jun 9, 2021
Simpson, Heather; Strassel, Stephanie; Wright, Jonathan; Griffitt, Kira, 2020, "Machine Reading Phase 1 IC Training Data", https://hdl.handle.net/11272.1/AB2/7GZ3YJ, Abacus Data Network, V1
Abstract Introduction Machine Reading Phase 1 IC Training Data was developed by the Linguistic Data Consortium and contains 248 English source documents and 116 standoff annotation files used in the DARPA (Defense Advanced Research Projects Agency) Machine Reading program. The Ma...
Jun 9, 2021
Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2020, "BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training", https://hdl.handle.net/11272.1/AB2/ZZOGLK, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training was developed by the Linguistic Data Consortium (LDC) and consists of 153,171 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate...
Jun 9, 2021
Li, Bin; Yin, Siqi; Xu, Jie; Song, Li; Feng, Minxuan, 2020, "Chinese CogBank", https://hdl.handle.net/11272.1/AB2/XQKHRG, Abacus Data Network, V1
Abstract Introduction Chinese CogBank is a database of cognitive properties of Chinese words intended for use in metaphor understanding and generation. It consists of 232,497 "word-property" pairs, which are comprised of 83,104 words and 100,195 properties. Each "word-property" t...
Jun 9, 2021
Bies, Ann; Mott, Justin; Warner, Colin; Kulick, Seth, 2021, "BOLT English Treebank - SMS/Chat", https://hdl.handle.net/11272.1/AB2/TMECTL, Abacus Data Network, V1
Abstract Introduction BOLT English Treebank - SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of English SMS and text chat data with part-of-speech and syntactic structure annotation. The DARPA BOLT (Broad Operational Language Translation) program deve...
Jun 9, 2021
Mansour, Saab; Haider, Batool, 2021, "ATIS - Seven Languages", https://hdl.handle.net/11272.1/AB2/1TL7TE, Abacus Data Network, V1
Abstract Introduction ATIS - Seven Languages was developed by Amazon Web Services, Inc. and consists of 5,871 English utterances from ATIS (Air Travel Information Services) corpora, specifically ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26), tran...
Jun 9, 2021
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2021, "LORELEI Akan Representative Language Pack", https://hdl.handle.net/11272.1/AB2/78MZYO, Abacus Data Network, V1
Abstract Introduction LORELEI Akan Representative Language Pack consists of Akan monolingual text, Akan-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LORELEI (Lo...
Dec 17, 2019
Gallardo, Laura Fernández, 2019, "Nautilus Speaker Characterization", https://hdl.handle.net/11272.1/AB2/JR6VMZ, Abacus Data Network, V1
Nautilus Speaker Characterization was developed at the Technical University of Berlin and is comprised of approximately 155 hours of conversational speech from 300 German speakers aged 18 to 35 years (126 males and 174 females) with no marked dialect or accent, recorded in an aco...
Nov 15, 2019
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2019, "TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017", https://hdl.handle.net/11272.1/AB2/KQWRTL, Abacus Data Network, V1
TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017 was developed by the Linguistic Data Consortium (LDC) and contains Chinese, English and Spanish data produced in support of the TAC KBP Cold Start evaluation track conducted from 2012 to 2017. This includes source docum...
Nov 15, 2019
Tracey, Jennifer; Arrigo, Michael; Strassel, Stephanie, 2019, "DEFT English Committed Belief Annotation", https://hdl.handle.net/11272.1/AB2/WY5NZN, Abacus Data Network, V1
DEFT English Committed Belief Annotation was developed by the Linguistic Data Consortium (LDC) and consists of approximately 950,000 words of English discussion forum text annotated for “committed belief,” which marks the level of commitment displayed by the author to the truth o...
Nov 15, 2019
Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2019, "CALLFRIEND American English-Non-Southern Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/OBLYDI, Abacus Data Network, V1
CALLFRIEND American English-Non-Southern Dialect Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 26 hours of unscripted telephone conversations between native speakers of non-Southern dialects of American English. This second edi...
Oct 15, 2019
Szwelnik, Tomasz; Kawalec, Jacek; Gutowska, Dorota, 2019, "Polish Speech Database", https://hdl.handle.net/11272.1/AB2/GNGZEI, Abacus Data Network, V1
Polish Speech Database was developed by VoiceLab. It consists of 263,424 utterances of Polish speech data from 200 speakers, totaling approximately 280 hours, and corresponding transcripts. Data collection was performed in Poland. Speakers were asked to record themselves for at l...
Oct 15, 2019
Greenberg, Craig; Sadjadi, Omid; Kheyrkhah, Timothee; Jones, Karen; Walker, Kevin; Strassel, Stephanie; Graff, David, 2019, "2016 NIST Speaker Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/WJ2G5L, Abacus Data Network, V1
2016 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains approximately 340 hours of short segments of Tagalog, Cantonese, Cebuano and Mandarin telephone speech us...
Oct 15, 2019
Bies, Ann; Mott, Justin; Warner, Colin; Kulick, Seth, 2019, "BOLT English Treebank - Discussion Forum", https://hdl.handle.net/11272.1/AB2/9OA0DB, Abacus Data Network, V1
BOLT English Treebank - Discussion Forum was developed by the Linguistic Data Consortium (LDC) and consists of English web discussion forum data with part-of-speech and syntactic structure annotations. The DARPA BOLT (Broad Operational Language Translation) program developed mach...
Sep 16, 2019
Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2019, "BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training", https://hdl.handle.net/11272.1/AB2/TJO8RI, Abacus Data Network, V1
BOLT Chinese-English Word Alignment and Tagging – SMS/Chat Training was developed by the Linguistic Data Consortium (LDC) and consists of 388,027 words of Chinese and English parallel text enhanced with linguistic tags to indicate word relations. The DARPA BOLT (Broad Operational...
Sep 16, 2019
Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2019, "CALLFRIEND Canadian French Second Edition", https://hdl.handle.net/11272.1/AB2/PPNHVC, Abacus Data Network, V1
CALLFRIEND Canadian French Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Canadian French. This second edition updates the audio files to wav format, simp...
Sep 16, 2019
Simpson, Heather; Strassel, Stephanie; Wright, Jonathan; Griffitt, Kira, 2019, "Machine Reading Phase 1 NFL Scoring Training Data", https://hdl.handle.net/11272.1/AB2/AZSUUC, Abacus Data Network, V1
Machine Reading Phase 1 NFL Scoring Training Data was developed by the Linguistic Data Consortium (LDC) and contains 110 US NFL (National Football League) scoring source documents and 110 standoff annotation files used in the DARPA (Defense Advanced Research Projects Agency) Mach...
Aug 15, 2019
Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2019, "Multi-Language Conversational Telephone Speech 2011 -- East Asian", https://hdl.handle.net/11272.1/AB2/3MKZES, Abacus Data Network, V1
Multi-Language Conversational Telephone Speech 2011 – East Asian was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 19 hours of telephone speech in two distinct languages of East Asia: Thai and Lao. The data were collected primarily to support...
Aug 15, 2019
Adams, Nikki; Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kaiser-Schatzlein, Alice; Kazi, Michael; Malyska, Nicolas; Melot, Jennifer; Onaka, Akiko; Paget, Shelley; Ray, Jessica; Richardson, Fred; Rytting, Anton; Shen, Sinney, 2019, "IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c", https://hdl.handle.net/11272.1/AB2/39RDNJ, Abacus Data Network, V1
IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 207 hours of Igbo conversational and scripted telephone speech collected in 2014 and 2015 along wit...