Sep 2, 2021
Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2021, "BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training", https://hdl.handle.net/11272.1/AB2/XACS3U, Abacus Data Network, V1
BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training was developed by the Linguistic Data Consortium (LDC) and consists of 349,414 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate word relations. The DA... |
Jun 11, 2021
Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Le, Hanh; Malyska, Nicolas; Melot, Jennifer; Phillips, Josh; Ray, Jessica; Roomi, Bergul; Rytting, Anton; Strahan, Tania E., 2019, "IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b", https://hdl.handle.net/11272.1/AB2/U1H3H7, Abacus Data Network, V1
IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Amharic conversational and scripted telephone speech collect... |
Jun 9, 2021
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2020, "TAC KBP English Event Argument - Training and Evaluation Data 2014-2015", https://hdl.handle.net/11272.1/AB2/TTCGFJ, Abacus Data Network, V1
TAC KBP English Event Argument - Training and Evaluation Data 2014-2015 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the 2014 TAC KBP English Event Argument Extraction Pilot and Evalua... |
Jun 9, 2021
Simpson, Heather; Strassel, Stephanie; Wright, Jonathan; Griffitt, Kira, 2020, "Machine Reading Phase 1 IC Training Data", https://hdl.handle.net/11272.1/AB2/7GZ3YJ, Abacus Data Network, V1
Machine Reading Phase 1 IC Training Data was developed by the Linguistic Data Consortium and contains 248 English source documents and 116 standoff annotation files used in the DARPA (Defense Advanced Research Projects Agency) Machine Reading program. The Ma... |
Jun 9, 2021
Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2020, "BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training", https://hdl.handle.net/11272.1/AB2/ZZOGLK, Abacus Data Network, V1
BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training was developed by the Linguistic Data Consortium (LDC) and consists of 153,171 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate... |
Jun 9, 2021
Li, Bin; Yin, Siqi; Xu, Jie; Song, Li; Feng, Minxuan, 2020, "Chinese CogBank", https://hdl.handle.net/11272.1/AB2/XQKHRG, Abacus Data Network, V1
Chinese CogBank is a database of cognitive properties of Chinese words intended for use in metaphor understanding and generation. It consists of 232,497 "word-property" pairs, which are comprised of 83,104 words and 100,195 properties. Each "word-property" t... |
Jun 9, 2021
Bies, Ann; Mott, Justin; Warner, Colin; Kulick, Seth, 2021, "BOLT English Treebank - SMS/Chat", https://hdl.handle.net/11272.1/AB2/TMECTL, Abacus Data Network, V1
BOLT English Treebank - SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of English SMS and text chat data with part-of-speech and syntactic structure annotation. The DARPA BOLT (Broad Operational Language Translation) program deve... |
Jun 9, 2021
Mansour, Saab; Haider, Batool, 2021, "ATIS - Seven Languages", https://hdl.handle.net/11272.1/AB2/1TL7TE, Abacus Data Network, V1
ATIS - Seven Languages was developed by Amazon Web Services, Inc. and consists of 5,871 English utterances from ATIS (Air Travel Information Services) corpora, specifically ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26), tran... |
Jun 9, 2021
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2021, "LORELEI Akan Representative Language Pack", https://hdl.handle.net/11272.1/AB2/78MZYO, Abacus Data Network, V1
LORELEI Akan Representative Language Pack consists of Akan monolingual text, Akan-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LORELEI (Lo... |
Dec 17, 2019
Gallardo, Laura Fernández, 2019, "Nautilus Speaker Characterization", https://hdl.handle.net/11272.1/AB2/JR6VMZ, Abacus Data Network, V1
Nautilus Speaker Characterization was developed at the Technical University of Berlin and is comprised of approximately 155 hours of conversational speech from 300 German speakers aged 18 to 35 years (126 males and 174 females) with no marked dialect or accent, recorded in an aco... |
Nov 15, 2019
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2019, "TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017", https://hdl.handle.net/11272.1/AB2/KQWRTL, Abacus Data Network, V1
TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017 was developed by the Linguistic Data Consortium (LDC) and contains Chinese, English and Spanish data produced in support of the TAC KBP Cold Start evaluation track conducted from 2012 to 2017. This includes source docum... |
Nov 15, 2019
Tracey, Jennifer; Arrigo, Michael; Strassel, Stephanie, 2019, "DEFT English Committed Belief Annotation", https://hdl.handle.net/11272.1/AB2/WY5NZN, Abacus Data Network, V1
DEFT English Committed Belief Annotation was developed by the Linguistic Data Consortium (LDC) and consists of approximately 950,000 words of English discussion forum text annotated for “committed belief,” which marks the level of commitment displayed by the author to the truth o... |
Nov 15, 2019
Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2019, "CALLFRIEND American English-Non-Southern Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/OBLYDI, Abacus Data Network, V1
CALLFRIEND American English-Non-Southern Dialect Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 26 hours of unscripted telephone conversations between native speakers of non-Southern dialects of American English. This second edi... |
Oct 15, 2019
Szwelnik, Tomasz; Kawalec, Jacek; Gutowska, Dorota, 2019, "Polish Speech Database", https://hdl.handle.net/11272.1/AB2/GNGZEI, Abacus Data Network, V1
Polish Speech Database was developed by VoiceLab. It consists of 263,424 utterances of Polish speech data from 200 speakers, totaling approximately 280 hours, and corresponding transcripts. Data collection was performed in Poland. Speakers were asked to record themselves for at l... |
Oct 15, 2019
Greenberg, Craig; Sadjadi, Omid; Kheyrkhah, Timothee; Jones, Karen; Walker, Kevin; Strassel, Stephanie; Graff, David, 2019, "2016 NIST Speaker Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/WJ2G5L, Abacus Data Network, V1
2016 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains approximately 340 hours of short segments of Tagalog, Cantonese, Cebuano and Mandarin telephone speech us... |
Oct 15, 2019
Bies, Ann; Mott, Justin; Warner, Colin; Kulick, Seth, 2019, "BOLT English Treebank - Discussion Forum", https://hdl.handle.net/11272.1/AB2/9OA0DB, Abacus Data Network, V1
BOLT English Treebank - Discussion Forum was developed by the Linguistic Data Consortium (LDC) and consists of English web discussion forum data with part-of-speech and syntactic structure annotations. The DARPA BOLT (Broad Operational Language Translation) program developed mach... |
Sep 16, 2019
Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2019, "CALLFRIEND Canadian French Second Edition", https://hdl.handle.net/11272.1/AB2/PPNHVC, Abacus Data Network, V1
CALLFRIEND Canadian French Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Canadian French. This second edition updates the audio files to wav format, simp... |
Sep 16, 2019
Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2019, "BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training", https://hdl.handle.net/11272.1/AB2/TJO8RI, Abacus Data Network, V1
BOLT Chinese-English Word Alignment and Tagging – SMS/Chat Training was developed by the Linguistic Data Consortium (LDC) and consists of 388,027 words of Chinese and English parallel text enhanced with linguistic tags to indicate word relations. The DARPA BOLT (Broad Operational... |
Sep 16, 2019
Simpson, Heather; Strassel, Stephanie; Wright, Jonathan; Griffitt, Kira, 2019, "Machine Reading Phase 1 NFL Scoring Training Data", https://hdl.handle.net/11272.1/AB2/AZSUUC, Abacus Data Network, V1
Machine Reading Phase 1 NFL Scoring Training Data was developed by the Linguistic Data Consortium (LDC) and contains 110 US NFL (National Football League) scoring source documents and 110 standoff annotation files used in the DARPA (Defense Advanced Research Projects Agency) Mach... |
Aug 15, 2019
Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2019, "Multi-Language Conversational Telephone Speech 2011 -- East Asian", https://hdl.handle.net/11272.1/AB2/3MKZES, Abacus Data Network, V1
Multi-Language Conversational Telephone Speech 2011 – East Asian was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 19 hours of telephone speech in two distinct languages of East Asia: Thai and Lao. The data were collected primarily to support... |
Aug 15, 2019
Adams, Nikki; Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kaiser-Schatzlein, Alice; Kazi, Michael; Malyska, Nicolas; Melot, Jennifer; Onaka, Akiko; Paget, Shelley; Ray, Jessica; Richardson, Fred; Rytting, Anton; Shen, Sinney, 2019, "IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c", https://hdl.handle.net/11272.1/AB2/39RDNJ, Abacus Data Network, V1
IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 207 hours of Igbo conversational and scripted telephone speech collected in 2014 and 2015 along wit... |
Aug 15, 2019
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2019, "TAC KBP Evaluation Source Corpora 2016-2017", https://hdl.handle.net/11272.1/AB2/JDNLHX, Abacus Data Network, V1
TAC KBP Evaluation Source Corpora 2016-2017 was developed by the Linguistic Data Consortium (LDC) and contains the 180,003 Chinese, English and Spanish source documents used in support of all TAC KBP evaluation tracks conducted in 2016 and 2017. Text Analysis Conference (TAC) is... |
Aug 15, 2019
Mohammadi, Ariana Negar, 2019, "Corpus of Conversational Persian Transcripts", https://hdl.handle.net/11272.1/AB2/VPL800, Abacus Data Network, V1
Corpus of Conversational Persian Transcripts consists of transcripts from approximately 20 hours of naturally occurring informal conversations in the Tehrani dialect of Iranian Persian. The corresponding speech is not included in this release. Data This corpus is extracted from 1... |
Jul 19, 2019
Linguistic Data Consortium, 2019, "First DIHARD Challenge Development - Eight Sources", https://hdl.handle.net/11272.1/AB2/XA6BRY, Abacus Data Network, V1
First DIHARD Challenge Development - Eight Sources was developed by the Linguistic Data Consortium (LDC) and contains approximately 17 hours of English and Chinese speech data along with corresponding annotations used in support of the First DIHARD Challenge. The First DIHARD Cha... |
Jul 15, 2019
Linguistic Data Consortium, 2019, "First DIHARD Challenge Evaluation - Nine Sources", https://hdl.handle.net/11272.1/AB2/HGTUHY, Abacus Data Network, V1
First DIHARD Challenge Evaluation - Nine Sources was developed by the Linguistic Data Consortium (LDC) and contains approximately 18 hours of English and Chinese speech data along with corresponding annotations used in support of the First DIHARD Challenge. The First DIHARD Chall... |
Jul 15, 2019
Chamberlain, Jon; Paun, Silviu; Yu, Juntao; Kruschwitz, Udo; Poesio, Massimo, 2019, "Phrase Detectives Corpus Version 2", https://hdl.handle.net/11272.1/AB2/6GWBA8, Abacus Data Network, V1
Phrase Detectives Corpus Version 2 was developed by the School of Computer Science and Electronic Engineering at the University of Essex and consists of approximately 407,000 tokens across 537 documents anaphorically-annotated by the Phrase Detectives Game, an online interactive... |
Jul 15, 2019
Qin, Xiaoyi; Liu, Xinzhong; Cai, Zexin; Li, Ming, 2019, "The DKU-JNU-EMA Electromagnetic Articulography Database", https://hdl.handle.net/11272.1/AB2/D9PQFH, Abacus Data Network, V1
The DKU-JNU-EMA Electromagnetic Articulography Database was developed by Duke Kunshan University and Jinan University and contains approximately 10 hours of articulography and speech data in Mandarin, Cantonese, Hakka, and Teochew Chinese from two to seven native speakers for eac... |
Jul 15, 2019
Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2019, "First DIHARD Challenge Evaluation - SEEDLingS", https://hdl.handle.net/11272.1/AB2/XH4KVV, Abacus Data Network, V1
First DIHARD Challenge Evaluation - SEEDLingS was developed by Duke University and the Linguistic Data Consortium (LDC) and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the First DIHARD Challenge. Th... |
Jun 17, 2019
Tracey, Jennifer; Arrigo, Michael; Strassel, Stephanie, 2019, "DEFT Spanish Committed Belief Annotation", https://hdl.handle.net/11272.1/AB2/HWOJGE, Abacus Data Network, V1
DEFT Spanish Committed Belief Annotation was developed by the Linguistic Data Consortium (LDC) and consists of approximately 67,000 tokens of Spanish discussion forum text annotated for "committed belief," which marks the level of commitment displayed by the author to the truth o... |
Jun 17, 2019
Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2019, "First DIHARD Challenge Development - SEEDLingS", https://hdl.handle.net/11272.1/AB2/KXC76R, Abacus Data Network, V1
First DIHARD Challenge Development - SEEDLingS was developed by Duke University and the Linguistic Data Consortium (LDC) and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the First DIHARD Challenge. T... |
Jun 17, 2019
Ramabhadran, Bhuvana; Gustman, Samuel; Byrne, William; Hajič, Jan; Oard, Douglas; Olsson, J. Scott; Picheny, Michael; Psutka, Josef, 2019, "USC-SFI MALACH Interviews and Transcripts English – Speech Recognition Edition", https://hdl.handle.net/11272.1/AB2/SGOMWO, Abacus Data Network, V1
USC-SFI MALACH Interviews and Transcripts English – Speech Recognition Edition, LDC Catalog Number LDC2019S11 and ISBN 1-58563-889-7, was developed by IBM as part of the MALACH (Multilingual Access to Large Spoken ArCHives) Project. This edition augments USC-SFI MALACH Interviews... |
May 15, 2019
Mena, Carlos Daniel Hernández, 2019, "CIEMPIESS Experimentation", https://hdl.handle.net/11272.1/AB2/DUUYQV, Abacus Data Network, V1
CIEMPIESS (Corpus de Investigación en Español de México del Posgrado de Ingeniería Eléctrica y Servicio Social) Experimentation was developed by the social service program "Desarrollo de Tecnologías del Habla" of the "Facultad de Ingeniería" (FI) at the National Autonomous Univer... |
May 15, 2019
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2019, "TAC KBP Chinese Regular Slot Filling - Comprehensive Training and Evaluation Data 2014", https://hdl.handle.net/11272.1/AB2/ZZMOPP, Abacus Data Network, V1
TAC KBP Chinese Regular Slot Filling - Comprehensive Training and Evaluation Data 2014 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP Chinese Regular Slot Filling evaluation track conducted in 201... |
May 15, 2019
Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2019, "Multi-Language Conversational Telephone Speech 2011 -- English Group", https://hdl.handle.net/11272.1/AB2/ACDWDL, Abacus Data Network, V1
Multi-Language Conversational Telephone Speech 2011 – English Group was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 18 hours of telephone speech in two general varieties of English: American and South Asian. The data were collected primaril... |
Apr 15, 2019
Li, Xuansong; Peterson, Katherine; Grimes, Stephen; Strassel, Stephanie, 2019, "BOLT Egyptian-English Word Alignment -- Discussion Forum Training", https://hdl.handle.net/11272.1/AB2/AR1QCS, Abacus Data Network, V1
BOLT Egyptian-English Word Alignment – Discussion Forum Training was developed by the Linguistic Data Consortium (LDC) and consists of 400,448 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate word relations. The DARPA BOLT (Broad Operat... |
Apr 15, 2019
Li, Bin; Wen, Yuan; Song, Li; Dai, Rubing; Qu, Weiguang; Xue, Nianwen, 2019, "Chinese Abstract Meaning Representation 1.0", https://hdl.handle.net/11272.1/AB2/TT5KRI, Abacus Data Network, V1
Chinese Abstract Meaning Representation was developed by Brandeis University and Nanjing Normal University and is comprised of semantic representations of a set of Chinese sentences from Chinese Treebank 8.0 (LDC2013T21). Abstract Meaning Representation (AMR) captures "who is doi... |
Mar 15, 2019
Prasad, Rashmi; Webber, Bonnie; Lee, Alan; Joshi, Aravind, 2019, "Penn Discourse Treebank Version 3.0", https://hdl.handle.net/11272.1/AB2/SUU9CB, Abacus Data Network, V1
Penn Discourse Treebank (PDTB) Version 3.0 is the third release in the Penn Discourse Treebank project, the goal of which is to annotate the Wall Street Journal (WSJ) section of Treebank-2 (LDC95T7) with discourse relations. Penn Discourse Treebank Version 2 (LDC2008T05) contains... |
Mar 15, 2019
Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2019, "CALLFRIEND Egyptian Arabic Second Edition", https://hdl.handle.net/11272.1/AB2/4LCUFC, Abacus Data Network, V1
CALLFRIEND Egyptian Arabic Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 25 hours of unscripted telephone conversations between native speakers of Egyptian Arabic. This second edition updates the audio files to wav format, simp... |
Mar 15, 2019
Tracey, Jennifer; Strassel, Stephanie; Kuster, Neil, 2019, "VAST Chinese Speech and Transcripts", https://hdl.handle.net/11272.1/AB2/OE8XTX, Abacus Data Network, V1
VAST Chinese Speech and Transcripts was developed by the Linguistic Data Consortium (LDC) for the VAST (Video Annotation for Speech Technologies) project and is comprised of approximately 29 hours of Mandarin Chinese audio extracted from amateur video content harvested from the w... |
Feb 15, 2019
Tracey, Jennifer; Arrigo, Michael; Kuster, Neil; Strassel, Stephanie, 2019, "DEFT Chinese Committed Belief Annotation", https://hdl.handle.net/11272.1/AB2/EGZOQ9, Abacus Data Network, V1
DEFT Chinese Committed Belief Annotation was developed by the Linguistic Data Consortium (LDC) and consists of approximately 83,000 tokens of Chinese discussion forum text annotated for “committed belief,” which marks the level of commitment displayed by the author to the truth o... |
Feb 15, 2019
Upadhyay, Shyam; Hakkani-Tur, Dilek; Tur, Gokhan; Rastogi, Abhinav, 2019, "Multilingual ATIS", https://hdl.handle.net/11272.1/AB2/AGMWIU, Abacus Data Network, V1
Multilingual ATIS was developed by Google Inc. and consists of 5,871 utterances from ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26) annotated and translated into Hindi and Turkish. The ATIS (Air Travel Information Services) collection was develope... |
Feb 15, 2019
Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2019, "Multi-Language Conversational Telephone Speech 2011 -- Arabic Group", https://hdl.handle.net/11272.1/AB2/A5UT97, Abacus Data Network, V1
Multi-Language Conversational Telephone Speech 2011 – Arabic Group was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 117 hours of telephone speech in distinct dialects of colloquial Arabic: Iraqi, Levantine and Maghrebi. The data were collect... |
Jan 15, 2019
Richey, Colleen; D'Angelo, Cynthia; Alozie, Nonye; Bratt, Harry; Shriberg, Elizabeth, 2019, "SRI Speech-Based Collaborative Learning Corpus", https://hdl.handle.net/11272.1/AB2/YJWBEU, Abacus Data Network, V1
SRI Speech-Based Collaborative Learning Corpus was developed by SRI International and is comprised of approximately 120 hours of English speech from 134 US middle school students working collaboratively. The data set also contains orthographic transcriptions, manual annotation of... |
Jan 15, 2019
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2019, "TAC KBP Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014-2015", https://hdl.handle.net/11272.1/AB2/LCPM63, Abacus Data Network, V1
TAC KBP Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014-2015 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP Entity Discovery and Linking (EDL) tasks in 2014 and 2015... |
Jan 15, 2019
Song, Zhiyi; Tracey, Jennifer; Walker, Christopher; Strassel, Stephanie, 2019, "BOLT Arabic Discussion Forum Parallel Training Data", https://hdl.handle.net/11272.1/AB2/CZR6SG, Abacus Data Network, V1
BOLT Arabic Discussion Forum Parallel Training Data was developed by the Linguistic Data Consortium (LDC) and consists of 1,169,599 tokens of Egyptian Arabic discussion forum data collected for the DARPA BOLT program along with their corresponding English translations. The BOLT (... |
Dec 17, 2018
Linguistic Data Consortium, 2018, "HUB5 Mandarin Telephone Speech and Transcripts Second Edition", https://hdl.handle.net/11272.1/AB2/2JAJJE, Abacus Data Network, V1
HUB5 Mandarin Telephone Speech and Transcripts Second Edition was developed by the Linguistic Data Consortium (LDC) in support of US government projects for language recognition and Large Vocabulary Conversational Speech Recognition (LVCSR). The first edition was released by LDC... |
Dec 15, 2018
Zhong, Victor; Zhang, Yuhao; Chen, Danqi; Angeli, Gabor; Manning, Christopher, 2018, "TAC Relation Extraction Dataset", https://hdl.handle.net/11272.1/AB2/SOYGGB, Abacus Data Network, V1
TAC Relation Extraction Dataset (TACRED) was developed by The Stanford NLP Group and is a large-scale relation extraction dataset with 106,264 examples built over English newswire and web text used in the NIST TAC KBP English slot filling evaluations during the period 2009-2014.... |
Nov 15, 2018
Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Hammond, Simon; Harper, Mary; Kaiser-Schatzlein, Alice; Melot, Jennifer; Paget, Shelley; Ray, Jessica; Rytting, Anton; Shen, Sinney; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2018, "IARPA Babel Telugu Language Pack IARPA-babel303b-v1.0a", https://hdl.handle.net/11272.1/AB2/OTDPUV, Abacus Data Network, V1
IARPA Babel Telugu Language Pack IARPA-babel303b-v1.0a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 201 hours of Telugu conversational and scripted telephone speech collected in 2013... |
Nov 15, 2018
Maamouri, Mohamed; Bies, Ann; Kulick, Seth; Krouna, Sondos; Tabassi, Dalila; Ciul, Michael, 2018, "BOLT Egyptian Arabic Treebank - Discussion Forum", https://hdl.handle.net/11272.1/AB2/CAA0JW, Abacus Data Network, V1
BOLT Egyptian Arabic Treebank – Discussion Forum was developed by the Linguistic Data Consortium (LDC) and consists of Egyptian Arabic web discussion forum data with part-of-speech annotation, morphology, gloss and syntactic tree annotation. The DARPA BOLT (Broad Operational Lang... |
Nov 15, 2018
Maciel, Alexandre M. A.; Rodrigues, Rodrigo L.; Barbosa, Danilo S., 2018, "Avatar Education Portuguese", https://hdl.handle.net/11272.1/AB2/BSQ4NP, Abacus Data Network, V1
Avatar Education Portuguese was developed by the University of Pernambuco and consists of approximately 80 minutes of Brazilian Portuguese microphone speech with phonetic and orthographic transcriptions. The data was developed for Avatar Education, an animated virtual assistant d... |