151 to 200 of 403 Results
Mar 18, 2022
Jiang, Yue; Zhan, Juhong; Han, Hongjian; Xu, Zuohao; Zhou, Haiyan; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Mandarin Chinese-Guanzhong Dialect", https://hdl.handle.net/11272.1/AB2/MFTAUQ, Abacus Data Network, V1
Abstract Introduction Global TIMIT Mandarin Chinese-Guanzhong Dialect was developed by the Linguistic Data Consortium and Xi'an Jiaotong University and consists of approximately five hours of read speech and transcripts in the Guanzhong dialect of Mandarin Chinese as spoken in Sh...
Mar 18, 2022
Ding, Hongwei; Liao, Sishi; Zhan, Yuqing; Feng, Hui; He, Wenchao; Hu, Xiaoyan; Wu, Yu; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Learner Simple English", https://hdl.handle.net/11272.1/AB2/NMUWWH, Abacus Data Network, V1
Abstract Introduction Global TIMIT Learner Simple English was developed by the Linguistic Data Consortium and Shanghai Jiao Tong University and consists of approximately 12 hours of L1 and L2 English read speech and transcripts. The Global TIMIT project aimed to create a series o...
Mar 18, 2022
Luan, Huan; Wang, Yanhong; Feng, Hui; He, Wenchao; Hu, Xiaoyan; Wu, Yu; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Learner Treebank English", https://hdl.handle.net/11272.1/AB2/A2ZRDI, Abacus Data Network, V1
Abstract Introduction Global TIMIT Learner Treebank English was developed by the Linguistic Data Consortium and LAIX Inc. and consists of approximately 24 hours of L1 and L2 English read speech and transcripts. The Global TIMIT project aimed to create a series of corpora in a var...
Mar 18, 2022
Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2022, "CALLFRIEND American English-Southern Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/O0EZK5, Abacus Data Network, V1
Abstract Introduction CALLFRIEND American English-Southern Dialect Second Edition was developed by LDC and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Southern dialects of American English. This second edition updates the au...
Mar 18, 2022
Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2022, "CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/AT8NRM, Abacus Data Network, V1
Abstract Introduction CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 27 hours of unscripted telephone conversations between native speakers of the Taiwan dialect of Mandarin Chinese. Th...
Mar 18, 2022
Chen, Song; Yuan, Jiahong; Ma, Xiaoyi; Strassel, Stephanie, 2022, "Chinese Lexical Resources for Gender, Number, Animacy", https://hdl.handle.net/11272.1/AB2/2CSZDM, Abacus Data Network, V1
Abstract Introduction Chinese Lexical Resources for Gender, Number, Animacy was developed by the Linguistic Data Consortium (LDC) and consists of gender, number, and animacy lexicons produced in support of the DARPA DEFT program. Gender, number and animacy are lexical indicators...
Mar 18, 2022
Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2022, "GALE Phase 4 Chinese Broadcast News Transcripts", https://hdl.handle.net/11272.1/AB2/TVASI8, Abacus Data Network, V1
Abstract Introduction GALE Phase 4 Chinese Broadcast News Transcripts was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 134 hours of Chinese broadcast news speech collected in 2008 by LDC and Hong Kong University of Science and Technolo...
Mar 18, 2022
Hirschberg, Julia; Gravano, Agustin; Benus, Stefan; Ward, Gregory; Sneed German, Elisa, 2022, "Columbia Games Corpus", https://hdl.handle.net/11272.1/AB2/TPZYOR, Abacus Data Network, V1
Abstract Introduction Columbia Games Corpus was developed by the Spoken Language Group, Columbia University and the Department of Linguistics, Northwestern University. It consists of approximately 10 hours of spontaneous English conversation along with corresponding orthographic...
Mar 18, 2022
Mohammadi, Ariana Negar, 2022, "Corpus of Law, Academic, and News", https://hdl.handle.net/11272.1/AB2/VMWYC0, Abacus Data Network, V1
Abstract Introduction Corpus of Law, Academic, and News consists of 400 Persian documents divided into three genres: legal, academic, and news. The legal section contains texts from official publications, including the civil penal code, the criminal penal code, and the constituti...
Mar 18, 2022
Kroch, Anthony, 2022, "Penn Parsed Corpora of Historical English", https://hdl.handle.net/11272.1/AB2/NWMKHI, Abacus Data Network, V1
Abstract Introduction Penn Parsed Corpora of Historical English was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the First World War (19...
Mar 18, 2022
Jiang, Yue; Zhan, Juhong; Han, Hongjian; Xu, Zuohao; Zhou, Haiyan; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Mandarin Chinese-Guanzhong Dialect", https://hdl.handle.net/11272.1/AB2/FF5DX5, Abacus Data Network, V1
Abstract Introduction Global TIMIT Mandarin Chinese-Guanzhong Dialect was developed by the Linguistic Data Consortium and Xi'an Jiaotong University and consists of approximately five hours of read speech and transcripts in the Guanzhong dialect of Mandarin Chinese as spoken in Sh...
Mar 18, 2022
Bills, Aric; Conners, Thomas; David, Anne; Cruz, Luanne Dela; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Le, Hanh; Malyska, Nicolas; Melot, Jennifer; Ray, Jessica; Richardson, Fred; Rytting, Anton; Zwanenburg, Jacqui, 2022, "IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b", https://hdl.handle.net/11272.1/AB2/BBDKDK, Abacus Data Network, V1
Abstract Introduction IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Javanese conversational and scripted telephone speech colle...
Mar 18, 2022
Andresen, Lucy; Bills, Aric; Brugman, Claudia; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Le, Hanh; Malyska, Nicolas; Maurillo, Arlene; Melot, Jennifer; Paget, Shelley; Prebble, Jane Elizabeth; Ray, Jessica; Richardson, Fred; Rytting, Anton; Shen, Sinney, 2022, "IARPA Babel Guarani Language Pack IARPA-babel305b-v1.0c", https://hdl.handle.net/11272.1/AB2/C2XGCW, Abacus Data Network, V1
Abstract Introduction IARPA Babel Guarani Language Pack IARPA-babel305b-v1.0c was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 198 hours of Guarani conversational and scripted telephone speech collect...
Mar 18, 2022
Benowitz, Daniel; Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Hefright, Brook; Le, Hanh; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Shen, Sinney; Smith, Rosanna; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2022, "IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b", https://hdl.handle.net/11272.1/AB2/5MR7Z2, Abacus Data Network, V1
Abstract Introduction IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 210 hours of Lithuanian conversational and scripted telephone speech c...
Mar 18, 2022
Consortium, Linguistic Data, 2022, "2007 CoNLL Shared Task - Arabic & English", https://hdl.handle.net/11272.1/AB2/X7AEOJ, Abacus Data Network, V1
Abstract Introduction 2007 CoNLL Shared Task - Arabic & English consists of dependency treebanks in two languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are Arabic and English. LD...
Mar 18, 2022
Country, University of the Basque; Catalunya, Technical University of; University, Charles; University, Middle East Technical; University, Sabanci, 2022, "2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish", https://hdl.handle.net/11272.1/AB2/R8ZR6Q, Abacus Data Network, V1
Abstract Introduction 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Basq...
Mar 18, 2022
Bills, Aric; Conners, Thomas; Corris, Miriam; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Heighway, Melanie; Kozlov, Kirill; Malyska, Nicolas; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2022, "IARPA Babel Tok Pisin Language Pack IARPA-babel207b-v1.0e", https://hdl.handle.net/11272.1/AB2/CTDWII, Abacus Data Network, V1
Abstract Introduction IARPA Babel Tok Pisin Language Pack IARPA-babel207b-v1.0e was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 200 hours of Tok Pisin conversational and scripted telephone speech col...
Mar 18, 2022
Bills, Aric; Conners, Thomas; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Heighway, Melanie; Lin, Willa; Melot, Jennifer; Paget, Shelley; Ray, Jessica; Roomi, Bergul; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne; Zwanenburg, Jacqui, 2022, "IARPA Babel Kurmanji Kurdish Language Pack IARPA-babel205b-v1.0a", https://hdl.handle.net/11272.1/AB2/HRUQMM, Abacus Data Network, V1
Abstract Introduction IARPA Babel Kurmanji Kurdish Language Pack IARPA-babel205b-v1.0a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 203 hours of Kurmanji Kurdish conversational and scripted teleph...
Mar 18, 2022
Adams, Nikki; Bills, Aric; Conners, Thomas; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Lin, Willa; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne; Wong, Jamie, 2022, "IARPA Babel Zulu Language Pack IARPA-babel206b-v0.1e", https://hdl.handle.net/11272.1/AB2/SJQNLO, Abacus Data Network, V1
Abstract Introduction IARPA Babel Zulu Language Pack IARPA-babel206b-v0.1e was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 211 hours of Zulu conversational and scripted telephone speech collected in...
Mar 18, 2022
Andrus, Tony; Bills, Aric; Conners, Thomas; Crabb, Erin Smith; Dubinski, Eyal; Fiscus, Jonathan G.; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Le, Hanh; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2022, "IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b", https://hdl.handle.net/11272.1/AB2/O4K5VU, Abacus Data Network, V1
Abstract Introduction IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 203 hours of Haitian Creole conversational and scripted telephone...
Mar 18, 2022
Richie, Carolyn; Warburton, Sarah; Carter, Megan, 2022, "Audiovisual Database of Spoken American English", https://hdl.handle.net/11272.1/AB2/8KIBXB, Abacus Data Network, V1
Abstract Introduction The Audiovisual Database of Spoken American English, Linguistic Data Consortium (LDC) catalog number LDC2009V01 and ISBN 1-58563-496-4, was developed at Butler University, Indianapolis, IN in 2007 for use by a variety of researchers to evaluate speech prod...
Mar 18, 2022
Fung, Pascale; Huang, Shudong; Graff, David, 2022, "HKUST Mandarin Telephone Transcript Data, Part 1", https://hdl.handle.net/11272.1/AB2/UOHG3I, Abacus Data Network, V1
Abstract Introduction HKUST Mandarin Telephone Transcript Data Part 1 was developed by Hong Kong University of Science and Technology (HKUST) and contains transcripts for 897 telephone conversations in Mandarin Chinese. In 2004 HKUST was contracted to collect and transcribe 200 h...
Mar 18, 2022
Fung, Pascale; Huang, Shudong; Graff, David, 2022, "HKUST Mandarin Telephone Speech, Part 1", https://hdl.handle.net/11272.1/AB2/TKM8OR, Abacus Data Network, V1
Abstract Introduction HKUST Mandarin Telephone Speech, Part 1 was developed by Hong Kong University of Science and Technology (HKUST) and contains approximately 149 hours of conversational telephone speech (CTS) in Mandarin. Given that Standard Mandarin is not the native dialect...
Feb 7, 2022
Tracey, Jennifer; Graff, David; Strassel, Stephanie; Arrigo, Michael; Wright, Jonathan; Bies, Ann, 2022, "LORELEI Kinyarwanda Incident Language Pack", https://hdl.handle.net/11272.1/AB2/P1OIX0, Abacus Data Network, V1
Abstract Introduction LORELEI Kinyarwanda Incident Language Pack was developed by the Linguistic Data Consortium and is comprised of approximately 11.9 million words of Kinyarwanda monolingual text, 35,000 words of English monolingual text, 3.4 million words of parallel and compa...
Feb 7, 2022
Byers, Frederick, 2022, "2017 NIST OpenSAT Pilot - SSSF", https://hdl.handle.net/11272.1/AB2/PTU0AQ, Abacus Data Network, V1
Abstract Introduction 2017 NIST OpenSAT Pilot - SSSF was developed by NIST (National Institute of Standards and Technology) and contains approximately one hour of operational speech data, transcripts and annotation files used in the speech activity detection, automatic speech rec...
Feb 7, 2022
Bies, Ann; Mott, Justin; Warner, Colin; Kulick, Seth, 2022, "BOLT English Translation Treebank - Chinese SMS/Chat", https://hdl.handle.net/11272.1/AB2/JBOOKU, Abacus Data Network, V1
Abstract Introduction BOLT English Translation Treebank - Chinese SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of SMS and chat text data translated from Chinese to English and annotated for part-of-speech and syntactic structure. The DARPA BOLT (Bro...
Jan 24, 2022
Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2017, "GALE Phase 3 Arabic Broadcast News Transcripts Part 2", https://hdl.handle.net/11272.1/AB2/VM5MOD, Abacus Data Network, V2
Introduction GALE Phase 3 Arabic Broadcast News Transcripts Part 2 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 128 hours of Arabic broadcast news speech collected in 2007 by the Linguistic Data Consortium (LDC), MediaNet, Tun...
Dec 2, 2021
Palmer, Martha; Hwang, Jena D.; Mansouri, Aous; Bonial, Claire; O'Gorman, Tim; Gung, James, 2021, "BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/YS81IR, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and consists of propbank annotation on Egyp...
Dec 2, 2021
Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2021, "Second DIHARD Challenge Development - Eleven Sources", https://hdl.handle.net/11272.1/AB2/CBFPZO, Abacus Data Network, V1
Abstract Introduction Second DIHARD Challenge Development - Eleven Sources was developed by LDC and contains approximately 22 hours of English and Chinese speech data along with corresponding annotations used in support of the Second DIHARD Challenge. The DIHARD Challenges are a...
Nov 18, 2021
Maamouri, Mohamed; Bies, Ann; Kulick, Seth; Krouna, Sondos; Tabassi, Dalila; Ciul, Michael, 2021, "BOLT Egyptian Arabic Treebank - SMS/Chat", https://hdl.handle.net/11272.1/AB2/1DSLOX, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic Treebank - SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of Egyptian Arabic SMS/Chat data with part-of-speech annotation, morphology, and syntactic tree annotation. The DARPA BOLT (Broad Operational Language...
Nov 18, 2021
Keating, Patricia; Kreiman, Jody; Alwan, Abeer; Chong, Adam; Lee, Yoonjeong, 2021, "UCLA Speaker Variability Database", https://hdl.handle.net/11272.1/AB2/CIIVXT, Abacus Data Network, V1
Abstract Introduction UCLA Speaker Variability Database was developed by UCLA Speech Processing and Auditory Perception Laboratory and is comprised of approximately 34 hours of English speech and orthographic transcripts. This corpus was designed to sample variability in speaking...
Oct 26, 2021
Godfrey, John J.; Holliman, Edward, 2021, "Switchboard-1 Release 2", https://hdl.handle.net/11272.1/AB2/VTPSCK, Abacus Data Network, V1
Abstract Introduction The Switchboard-1 Telephone Speech Corpus (LDC97S62) consists of approximately 260 hours of speech and was originally collected by Texas Instruments in 1990-1, under DARPA sponsorship. The first release of the corpus was published by NIST and distributed by...
Oct 14, 2021
Mena, Carlos Daniel Hernández; Ruiz, Iván Vladimir Meza, 2021, "Wikipedia Spanish Speech and Transcripts", https://hdl.handle.net/11272.1/AB2/L05NFF, Abacus Data Network, V1
Abstract Introduction Wikipedia Spanish Speech and Transcripts consists of approximately 25 hours of Spanish read speech and transcripts. The read text was taken from the Spanish version of WikiProject Spoken Wikipedia, referred to as Wikipedia Grabada. The transcripts were devel...
Oct 14, 2021
Tracey, Jennifer; Delgado, Dana; Chen, Song; Strassel, Stephanie, 2021, "BOLT Egyptian Arabic SMS/Chat Parallel Training Data", https://hdl.handle.net/11272.1/AB2/WXML9A, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic SMS/Chat Parallel Training Data was developed by the Linguistic Data Consortium (LDC) and consists of approximately 723,000 tokens of Egyptian Arabic SMS/Chat data collected for the DARPA BOLT program along with their corresponding Engli...
Oct 14, 2021
Alsheddi, Abeer, 2021, "Classical Arabic Dictionary", https://hdl.handle.net/11272.1/AB2/FQ7PIS, Abacus Data Network, V1
Abstract Introduction Classical Arabic Dictionary consists of approximately one hundred million words of Arabic collected from texts dating between 431 and 1104 CE, principally books and essays, along with word occurrences, source documents and related metadata. Data The dictiona...
Oct 1, 2021
Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Lim, Lynn-Li; Malyska, Nicolas; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Shen, Sinney; Smith, Rosanna, 2021, "IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b", https://hdl.handle.net/11272.1/AB2/IFBL6A, Abacus Data Network, V1
Abstract Introduction IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Halh Mongolian conversational and scripted telephone speec...
Sep 29, 2021
Andresen, Jess; Bills, Aric; Conners, Thomas; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Kozlov, Kirill; Malyska, Nicolas; Melot, Jennifer; Morrison, Michelle; Phillips, Josh; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne; Wong, Jamie, 2021, "IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d", https://hdl.handle.net/11272.1/AB2/TNSSDU, Abacus Data Network, V2
Abstract Introduction IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 350 hours of Swahili conversational and scripted telephone speech collect...
Sep 29, 2021
Tracey, Jennifer; Graff, David; Strassel, Stephanie; Arrigo, Michael; Wright, Jonathan; Bies, Ann, 2021, "LORELEI Oromo Incident Language Pack", https://hdl.handle.net/11272.1/AB2/EH7NXF, Abacus Data Network, V1
Abstract Introduction LORELEI Oromo Incident Language Pack was developed by the Linguistic Data Consortium and is comprised of approximately 3.9 million words of Oromo monolingual text, 25,000 words of English monolingual text, 135,000 words of parallel and comparable Oromo-Engli...
Sep 3, 2021
Neergaard, Karl David; Xu, Hongzhi; Huang, Chu-Ren, 2021, "Database of Word Level Statistics - Mandarin", https://hdl.handle.net/11272.1/AB2/VJDPA0, Abacus Data Network, V1
Abstract Introduction Database of Word Level Statistics - Mandarin was developed by The Hong Kong Polytechnic University. It provides lexical characteristics of a descriptive and statistical nature for words and nonwords of Mandarin Chinese. It is designed for researchers particu...
Sep 3, 2021
Knight, Kevin; Badarau, Bianca; Baranescu, Laura; Bonial, Claire; Bardocz, Madalina; Griffitt, Kira; Hermjakob, Ulf; Marcu, Daniel; Palmer, Martha; O'Gorman, Tim; Schneider, Nathan, 2021, "Abstract Meaning Representation (AMR) Annotation Release 3.0", https://hdl.handle.net/11272.1/AB2/82CVJF, Abacus Data Network, V1
Abstract Introduction Abstract Meaning Representation (AMR) Annotation Release 3.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Ins...
Sep 3, 2021
Sluyter-Gaethje, Henny; Bourgonje, Peter; Stede, Manfred, 2021, "Penn Discourse Treebank Version 2.0 - German Translation", https://hdl.handle.net/11272.1/AB2/1AXWBN, Abacus Data Network, V1
Abstract Introduction Penn Discourse Treebank Version 2.0 - German Translation was developed at the University of Potsdam's Applied Computational Linguistics group and consists of approximately one million tokens derived from Penn Discourse Treebank Version 2.0 (LDC2008T05). This...
Sep 3, 2021
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2021, "TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010", https://hdl.handle.net/11272.1/AB2/VAZOSD, Abacus Data Network, V1
Abstract Introduction TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010 was developed by the Linguistic Data Consortium and contains training and evaluation data produced in support of the 2010 TAC KBP Surprise Slot Filling track, the only y...
Sep 3, 2021
Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2021, "TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014", https://hdl.handle.net/11272.1/AB2/MRZALN, Abacus Data Network, V1
Abstract Introduction TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014 was developed by the Linguistic Data Consortium and contains training and evaluation data produced in support of the 2013 and 2014 TAC KBP Sentiment Slot Filling tracks....
Sep 3, 2021
Daza, Angel; Frank, Anette, 2021, "X-SRL: Parallel Cross-lingual Semantic Role Labeling", https://hdl.handle.net/11272.1/AB2/DNOJP9, Abacus Data Network, V1
Abstract Introduction X-SRL: Parallel Cross-lingual Semantic Role Labeling was developed by Heidelberg University, Department of Computational Linguistics and the Leibniz Institute for the German Language (IDS). It consists of approximately three million words of German, French a...
Sep 3, 2021
Arase, Yuki; Tsujii, Junichi, 2021, "ESPADA", https://hdl.handle.net/11272.1/AB2/ANSK9Z, Abacus Data Network, V1
Abstract Introduction ESPADA (Extended Syntactic Phrase Alignment DAtaset) consists of annotated parse trees and alignment on English sentential paraphrases extracted from machine translation evaluation corpora. It extends SPADE (LDC2018T09) by adding new annotated data for train...
Sep 3, 2021
Tracey, Jennifer; Delgado, Dana; Chen, Song; Strassel, Stephanie, 2021, "BOLT Chinese SMS/Chat Parallel Training Data", https://hdl.handle.net/11272.1/AB2/O3JTA9, Abacus Data Network, V1
Abstract Introduction BOLT Chinese SMS/Chat Parallel Training Data was developed by the Linguistic Data Consortium and consists of approximately 1.8 million tokens of Chinese SMS/Chat data collected for the DARPA BOLT program along with their corresponding English translations. Th...
Sep 3, 2021
Li, Bin; Xiao, Liming; Liu, Yihuan; Wen, Yuan; Song, Li; Chun, Jayeol; Feng, Minxuan; Zhou, Junsheng; Qu, Weiguang; Xue, Nianwen, 2021, "Chinese Abstract Meaning Representation 2.0", https://hdl.handle.net/11272.1/AB2/LVQEZJ, Abacus Data Network, V1
Abstract Introduction Chinese Abstract Meaning Representation (CAMR) 2.0 was developed by Brandeis University and Nanjing Normal University and is comprised of semantic representations of a set of approximately 20,000 Chinese sentences from Chinese Treebank (CTB) 8.0 (LDC2013T21)...
Sep 3, 2021
Agarwal, Nitin; Francini, Michelle; Kappler, Michelle; Micciulla, Linnea; Pradhan, Sameer; Ramshaw, Lance, 2021, "BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/DXWM3B, Abacus Data Network, V1
Abstract Introduction BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies and consists of co-reference annotation on Egyptian Arabic discussion forum (DF), SMS/Chat and conversational tele...
Sep 2, 2021
Mena, Carlos Daniel Hernández, 2021, "LibriVox Spanish", https://hdl.handle.net/11272.1/AB2/AHBO1C, Abacus Data Network, V1
Abstract Introduction LibriVox Spanish consists of approximately 73 hours of Spanish read speech and transcripts. The audio data was taken from Spanish audiobooks developed by LibriVox, a non-profit project that creates audiobooks from public domain works. The transcripts were de...
Sep 2, 2021
Ding, Hongwei; Liao, Sishi; Zhan, Yuqing; Yuan, Jiahong; Liberman, Mark, 2021, "Global TIMIT Mandarin Chinese", https://hdl.handle.net/11272.1/AB2/2CCXH8, Abacus Data Network, V1
Abstract Introduction Global TIMIT Mandarin Chinese was developed by the Linguistic Data Consortium and Shanghai Jiao Tong University and consists of approximately five hours of read speech and transcripts in Mandarin Chinese. The Global TIMIT project aimed to create a series of...