Linguistic Data Consortium

101 to 150 of 411 Results

LORELEI Tamil Representative Language Pack Apr 26, 2023 Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2023, "LORELEI Tamil Representative Language Pack", https://hdl.handle.net/11272.1/AB2/TXXE33, Abacus Data Network, V1 Abstract Introduction LORELEI Tamil Representative Language Pack (LDC2023T03) consists of Tamil monolingual text, Tamil-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium (LDC) for the DARPA LORELEI pr...
Penn Korean Universal Dependency Treebank Apr 26, 2023 Choi, Jinho D.; Han, Na-Rae; Hwang, Jena D.; Kim, Hansaem, 2023, "Penn Korean Universal Dependency Treebank", https://hdl.handle.net/11272.1/AB2/ZW25WL, Abacus Data Network, V1 Abstract Introduction Penn Korean Universal Dependency Treebank contains 5,010 sentences and 132,041 tokens annotated in dependency format under the Universal Dependencies framework. It is a conversion of Korean Treebank Annotations Version 2.0 (LDC2006T09) which was produced in...
DEFT English Light and Rich ERE Annotation Apr 26, 2023 Chen, Song; Bies, Ann; Griffitt, Kira; Ellis, Joe; Strassel, Stephanie, 2023, "DEFT English Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/7KH7V4, Abacus Data Network, V1 Abstract Introduction DEFT English Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 1190 English discussion forum, newswire and proxy documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filt...
AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts Mar 20, 2023 Delgado, Dana; Walker, Kevin; Graff, David; Strassel, Stephanie, 2023, "AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts", https://hdl.handle.net/11272.1/AB2/CKALC2, Abacus Data Network, V1 Abstract Introduction AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 156 hours of Ukrainian conversational telephone speech (CTS) and broadcast news audio (BN) with 1.2 mi...
2019 NIST Speaker Recognition Evaluation Test Set -- Audio-Visual Mar 17, 2023 Sadjadi, Omid; Greenberg, Craig; Li, Xuansong; Strassel, Stephanie, 2023, "2019 NIST Speaker Recognition Evaluation Test Set -- Audio-Visual", https://hdl.handle.net/11272.1/AB2/RWQNK7, Abacus Data Network, V1 Abstract Introduction 2019 NIST Speaker Recognition Evaluation Test Set -- Audio-Visual was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains approximately 64 hours of English audio-visual data for development...
LORELEI Tagalog Representative Language Pack Mar 17, 2023 Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2023, "LORELEI Tagalog Representative Language Pack", https://hdl.handle.net/11272.1/AB2/IALRRN, Abacus Data Network, V1 Abstract Introduction LORELEI Tagalog Representative Language Pack consists of Tagalog monolingual text, Tagalog-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LO...
LORELEI Swahili Representative Language Pack Mar 17, 2023 Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2023, "LORELEI Swahili Representative Language Pack", https://hdl.handle.net/11272.1/AB2/RPNXXU, Abacus Data Network, V1 Abstract Introduction LORELEI Swahili Representative Language Pack consists of Swahili monolingual text, Swahili-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LO...
United Nations Proceedings Speech Feb 14, 2023 Chay, Kevin; Elizalde, Cecilia; Ziemski, Michal, 2023, "United Nations Proceedings Speech", https://hdl.handle.net/11272.1/AB2/3LTQ01, Abacus Data Network, V1 Abstract Introduction United Nations Proceedings Speech was developed by the United Nations (UN) and contains approximately 8,500 hours of recorded proceedings in the six official UN languages, Arabic, Chinese, English, French, Russian and Spanish. The data was recorded in 2009-2...
CAMIO Transcription Languages Jan 26, 2023 Arrigo, Michael; Strassel, Stephanie; Caruso, Christopher, 2023, "CAMIO Transcription Languages", https://hdl.handle.net/11272.1/AB2/IEJLCN, Abacus Data Network, V1 Abstract Introduction CAMIO Transcription Languages was developed by the Linguistic Data Consortium and contains nearly 70,000 images of machine printed text with corresponding annotations and transcripts in the following 13 languages: Arabic, Chinese, English, Farsi, Hindi, Japa...
CALLHOME Egyptian Arabic Transcripts Jan 25, 2023 Gadalla, Hassan; Kilany, Hanaa; Arram, Howaida; Yacoub, Ashraf; El-Habashi, Alaa; Shalaby, Amr; Karins, Krisjanis; Rowson, Everett; MacIntyre, Robert; Kingsbury, Paul; Graff, David; McLemore, Cynthia, 2023, "CALLHOME Egyptian Arabic Transcripts", https://hdl.handle.net/11272.1/AB2/Y03PCU, Abacus Data Network, V1 Abstract Introduction The text component of the CALLHOME Egyptian Arabic package includes transcripts and documentation files. The transcripts cover a contiguous five or ten minute segment taken from 120 unscripted telephone conversations between native speakers of Egyptian Collo...
CALLHOME Egyptian Arabic Speech Jan 25, 2023 Canavan, Alexandra; Zipperlen, George; Graff, David, 2023, "CALLHOME Egyptian Arabic Speech", https://hdl.handle.net/11272.1/AB2/J3CPAE, Abacus Data Network, V1 Abstract Introduction The CALLHOME Egyptian Arabic corpus of telephone speech consists of 120 unscripted telephone conversations between native speakers of Egyptian Colloquial Arabic (ECA), the spoken variety of Arabic found in Egypt. The dialect of ECA that this dictionary repre...
GALE Phase 2 Arabic Broadcast News Transcripts Part 1 Jan 25, 2023 Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2023, "GALE Phase 2 Arabic Broadcast News Transcripts Part 1", https://hdl.handle.net/11272.1/AB2/YPCAIR, Abacus Data Network, V1 Abstract Introduction GALE Phase 2 Arabic Broadcast News Transcripts Part 1 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 165 hours of Arabic broadcast news speech collected in 2006 and 2007 by LDC, MediaNet, Tunis, Tunisia and...
GALE Phase 2 Arabic Broadcast News Speech Part 1 Jan 25, 2023 Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2023, "GALE Phase 2 Arabic Broadcast News Speech Part 1", https://hdl.handle.net/11272.1/AB2/CXPTR7, Abacus Data Network, V1 Abstract Introduction GALE Phase 2 Arabic Broadcast News Speech Part 1 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 165 hours of Arabic broadcast news speech collected in 2006 and 2007 by LDC, MediaNet, Tunis, Tunisia and MTC, Rabat, Mor...
GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2 Jan 25, 2023 Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2023, "GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2", https://hdl.handle.net/11272.1/AB2/CS2DU6, Abacus Data Network, V1 Abstract Introduction GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 128 hours of Arabic broadcast conversation speech collected in 2007 by LDC, MediaNet, Tunis, Tuni...
GALE Phase 2 Arabic Broadcast Conversation Speech Part 2 Jan 25, 2023 Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2023, "GALE Phase 2 Arabic Broadcast Conversation Speech Part 2", https://hdl.handle.net/11272.1/AB2/AJ2CAE, Abacus Data Network, V1 Abstract Introduction GALE Phase 2 Arabic Broadcast Conversation Speech Part 2 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 128 hours of Arabic broadcast conversation speech collected in 2007 by LDC, MediaNet, Tunis, Tunisia and MTC, Rab...
Third DIHARD Challenge Evaluation Jan 24, 2023 Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2023, "Third DIHARD Challenge Evaluation", https://hdl.handle.net/11272.1/AB2/VQPCKU, Abacus Data Network, V1 Abstract Introduction Third DIHARD Challenge Evaluation was developed by the Linguistic Data Consortium (LDC) and contains approximately 33 hours of English and Chinese speech data along with corresponding annotations used in support of the Third DIHARD Challenge. The DIHARD Chal...
Global TIMIT Thai Jan 24, 2023 Liberman, Mark; Yuan, Jiahong; Cieri, Christopher; Wright, Jonathan, 2023, "Global TIMIT Thai", https://hdl.handle.net/11272.1/AB2/JY8T3N, Abacus Data Network, V1 Abstract Introduction Global TIMIT Thai was developed by the Linguistic Data Consortium and consists of approximately 12 hours of read speech and time-aligned transcripts in Standard Thai. The Global TIMIT project aimed to create a series of corpora in a variety of languages with...
Third DIHARD Challenge Development Dec 8, 2022 Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2022, "Third DIHARD Challenge Development", https://hdl.handle.net/11272.1/AB2/UY5O0X, Abacus Data Network, V1 Abstract Introduction Third DIHARD Challenge Development was developed by Linguistic Data Consortium (LDC) and contains approximately 34 hours of English and Chinese speech data along with corresponding annotations used in support of the Third DIHARD Challenge. The DIHARD Challen...
BOLT English Translation Treebank - Egyptian Arabic SMS/Chat Dec 8, 2022 Bies, Ann; Mott, Justin; Warner, Colin; Kulick, Seth, 2022, "BOLT English Translation Treebank - Egyptian Arabic SMS/Chat", https://hdl.handle.net/11272.1/AB2/SPCYLS, Abacus Data Network, V1 Abstract Introduction BOLT English Translation Treebank - Egyptian Arabic SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of SMS and chat text data translated from Egyptian Arabic to English and annotated for part-of-speech and syntactic structure. The...
Hispanic-English Database Nov 30, 2022 Byrne, William; Knodt, Eva; Bernstein, Jared; Emami, Farzhad, 2022, "Hispanic-English Database", https://hdl.handle.net/11272.1/AB2/IIJZCH, Abacus Data Network, V1 Abstract Introduction Hispanic-English Database contains approximately 30 hours of English and Spanish conversational and read speech with transcripts (24 hours) and metadata collected from 22 non-native English speakers between 1996 and 1998. The corpus was developed by Entropic...
2017 NIST Language Recognition Evaluation Training and Development Sets Nov 30, 2022 Greenberg, Craig; Sadjadi, Omid; Reynolds, Douglas; Singer, Elliot; Graff, David, 2022, "2017 NIST Language Recognition Evaluation Training and Development Sets", https://hdl.handle.net/11272.1/AB2/K7LOKJ, Abacus Data Network, V1 Abstract Introduction 2017 NIST Language Recognition Evaluation Training and Development Sets contains training and development material for the 2017 NIST Language Recognition Evaluation. It consists of approximately 2,100 hours of conversational telephone speech, broadcast conve...
LORELEI Bengali Representative Language Pack Nov 29, 2022 Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Bengali Representative Language Pack", https://hdl.handle.net/11272.1/AB2/IG4DBS, Abacus Data Network, V1 Abstract Introduction LORELEI Bengali Representative Language Pack consists of Bengali monolingual text, Bengali-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LO...
Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon Nov 29, 2022 Lau, Mingfei; Zhong, Muhan; Lau, Chaak-ming; Su, Jian; Chan, Henry; Cheung, Bing, 2022, "Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon", https://hdl.handle.net/11272.1/AB2/URBMXM, Abacus Data Network, V1 Abstract Introduction Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon was developed by the Cantonese Computational Linguistics Infrastructure Working Group. It contains approximately 130,000 Cantonese character, word, and phrase entries paired with their corresponding rom...
Gulf Arabic Conversational Telephone Speech Oct 13, 2022 Appen Pty Ltd. Sydney, Australia, 2022, "Gulf Arabic Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/SCSMSJ, Abacus Data Network, V1 Abstract Introduction Gulf Arabic Conversational Telephone Speech is a database developed by Appen Pty Ltd., Sydney, Australia and contains roughly 2,800 min of spontaneous telephone conversations in Colloquial Gulf Arabic. This corpus was collected and transcribed in 2004 by App...
Iraqi Arabic Conversational Telephone Speech Oct 13, 2022 Appen Pty Ltd. Sydney, Australia, 2022, "Iraqi Arabic Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/YBQF3Y, Abacus Data Network, V1 Abstract Introduction Iraqi Arabic Conversational Telephone Speech was developed by Appen Pty Ltd, Sydney, Australia and contains roughly 3000 mins of speech from Iraqi Arabic speakers taking part in spontaneous telephone conversations in Colloquial Iraqi Arabic. This corpus was...
Gulf Arabic Conversational Telephone Speech, Transcripts Oct 13, 2022 Appen Pty Ltd. Sydney, Australia, 2022, "Gulf Arabic Conversational Telephone Speech, Transcripts", https://hdl.handle.net/11272.1/AB2/ZLBR2M, Abacus Data Network, V1 Abstract Introduction Gulf Arabic Conversational Telephone Speech, Transcripts is a database developed by Appen Pty Ltd., Sydney, Australia and contains transcripts of roughly 2,800 min of spontaneous telephone conversations in Colloquial Gulf Arabic. A total of 976 conversation...
Iraqi Arabic Conversational Telephone Speech, Transcripts Oct 13, 2022 Appen Pty Ltd. Sydney, Australia, 2022, "Iraqi Arabic Conversational Telephone Speech, Transcripts", https://hdl.handle.net/11272.1/AB2/ELQDGO, Abacus Data Network, V1 Abstract Introduction Iraqi Arabic Conversational Telephone Speech, Transcripts was developed by Appen Pty Ltd, Sydney, Australia and contains transcripts for roughly 3000 mins of speech from Iraqi Arabic speakers taking part in spontaneous telephone conversations in Colloquial I...
GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1 Oct 13, 2022 Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2022, "GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1", https://hdl.handle.net/11272.1/AB2/MZSDMN, Abacus Data Network, V1 Abstract Introduction GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 123 hours of Arabic broadcast conversation speech collected in 2006 and 2007 by LDC, MediaNet, Tu...
King Saud University Arabic Speech Database Oct 12, 2022 Alsulaiman, Mansour; Muhammad, Ghulam; Abdelkader, Bencherif Mohamed; Mahmood, Awais; Ali, Zulfiqar, 2022, "King Saud University Arabic Speech Database", https://hdl.handle.net/11272.1/AB2/4YVL4A, Abacus Data Network, V1 Abstract Introduction King Saud University Arabic Speech Database was developed by Speech Group (SG) at King Saud University and contains 590 hours of recorded Arabic speech from 269 male and female speakers. The utterances include read and spontaneous speech. The recordings were...
GALE Phase 2 Arabic Broadcast Conversation Speech Part 1 Oct 12, 2022 Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2022, "GALE Phase 2 Arabic Broadcast Conversation Speech Part 1", https://hdl.handle.net/11272.1/AB2/GGD0CB, Abacus Data Network, V1 Abstract Introduction GALE Phase 2 Arabic Broadcast Conversation Speech Part 1 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 123 hours of Arabic broadcast conversation speech collected in 2006 and 2007 by LDC as part of the DARPA GALE (Gl...
Xi'an Guanzhong Object Naming Oct 12, 2022 Cieri, Christopher; Zhan, Juhong; Jiang, Yue; Liberman, Mark; Yuan, Jiahong; Chen, Yiya; Scharenborg, Odette, 2022, "Xi'an Guanzhong Object Naming", https://hdl.handle.net/11272.1/AB2/D2DBLV, Abacus Data Network, V1 Abstract Introduction Xi'an Guanzhong Object Naming is comprised of approximately 15 hours of audio recordings from speakers of the Guanzhong dialect of Mandarin Chinese living in or near Xi'an in Shaanxi Province (China) naming objects that appeared in colored line drawings. Th...
HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation Sep 20, 2022 Li, Xuansong; Strassel, Stephanie; Jones, Karen; Antonishek, Brian; Fiscus, Jonathan G., 2022, "HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation", https://hdl.handle.net/11272.1/AB2/GNUQ1A, Abacus Data Network, V1 Abstract Introduction HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 6,200 hours of user-generated videos with annotation and metadata. To advance multimodal event detection and rel...
American English Nickname Collection Aug 9, 2022 Carvalho, Vitor R.; Kiran, Yigit; Borthwick, Andrew, 2022, "American English Nickname Collection", https://hdl.handle.net/11272.1/AB2/JR1WG6, Abacus Data Network, V1 Abstract Introduction American English Nickname Collection was developed by Intelius, Inc. and is a compilation of American English nicknames to given name mappings based on information in US government records, public web profiles and financial and property reports. This corpus...
Qatari Corpus of Argumentative Writing Aug 9, 2022 Ahmed, Abdelhamid M.; Myhill, Debra; Abdollahzadeh, Esmaeel; McCallum, Lee; Zaghouani, Wajdi; Rezk, Lameya; Jrad, Anissa; Zhang, Xiao, 2022, "Qatari Corpus of Argumentative Writing", https://hdl.handle.net/11272.1/AB2/F2P2EY, Abacus Data Network, V1 Abstract Introduction Qatari Corpus of Argumentative Writing was developed by Qatar University, University of Exeter and Hamad Bin Khalifa University and is comprised of approximately 200,000 tokens of Arabic and English writing by undergraduate students (159 female, 36 male) alo...
Second DIHARD Challenge Evaluation - Eleven Sources Jul 7, 2022 Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2022, "Second DIHARD Challenge Evaluation - Eleven Sources", https://hdl.handle.net/11272.1/AB2/ML7KD5, Abacus Data Network, V1 Abstract Introduction Second DIHARD Challenge Evaluation - Eleven Sources was developed by the Linguistic Data Consortium (LDC) and contains approximately 20 hours of English and Chinese speech data along with corresponding annotations used in support of the Second DIHARD Challen...
NUBUC Jul 7, 2022 Lewis, Gwyneth; van Rijn, Pol; Gwilliams, Laura; Larrouy-Maestri, Pauline; Poeppel, David; Ghitza, Oded, 2022, "NUBUC", https://hdl.handle.net/11272.1/AB2/IUFKIG, Abacus Data Network, V1 Abstract Introduction NUBUC (NyU-BU contextually controlled stories Corpus) was developed by New York University, Max Planck Institute for Empirical Aesthetics and Boston University. It contains approximately three hours of English read speech from eight stories focused on lingui...
LORELEI Wolof Representative Language Pack Jun 10, 2022 Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Wolof Representative Language Pack", https://hdl.handle.net/11272.1/AB2/1M9HI6, Abacus Data Network, V1 Abstract Introduction LORELEI Wolof Representative Language Pack consists of Wolof monolingual text, Wolof-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LORELEI...
AttImam Mar 31, 2022 Alsaif, Amal; Alyahya, Tasniem; Alotibi, Madawi; Almuzaini, Huda; Alqahtani, Abeer, 2022, "AttImam", https://hdl.handle.net/11272.1/AB2/9FBCBG, Abacus Data Network, V1 Abstract Introduction AttImam was developed by Al-Imam Mohammad Ibn Saud Islamic University and consists of approximately 2,000 attribution relations applied to Arabic newswire text from Arabic Treebank: Part 1 v 4.1 (LDC2010T13). Attribution refers to the process of reporting or...
IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 Mar 18, 2022 Andrus, Tony; Bills, Aric; Corris, Miriam; Dubinski, Eyal; Fiscus, Jonathan G.; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Le, Hanh; Ray, Jessica; Rytting, Anton; Silber, Ronnie; Shen, Wade; Tzoukermann, Evelyne, 2022, "IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7", https://hdl.handle.net/11272.1/AB2/WJGWAP, Abacus Data Network, V1 Abstract Introduction IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 201 hours of Vietnamese conversational and scripted telephone speech co...
IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b Mar 18, 2022 Bills, Aric; Conners, Thomas; Corris, Miriam; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Malyska, Nicolas; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Zawaydeh, Bushra, 2022, "IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b", https://hdl.handle.net/11272.1/AB2/HSAU9N, Abacus Data Network, V1 Abstract Introduction IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Dholuo conversational and scripted telephone speech collected...
IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b Mar 18, 2022 Andresen, Lucy; Bills, Aric; Conners, Thomas; Cruz, Luanne Dela; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Le, Hanh; Maurillo, Arlene; Melot, Jennifer; Phillips, Josh; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2022, "IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b", https://hdl.handle.net/11272.1/AB2/3EYPZM, Abacus Data Network, V1 Abstract Introduction IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 191 hours of Cebuano conversational and scripted telephone speech collect...
The Child Subglottal Resonances Database Mar 18, 2022 Lulich, Steven M.; Alwan, Abeer; Sommers, Mitchell S.; Yeung, Gary, 2022, "The Child Subglottal Resonances Database", https://hdl.handle.net/11272.1/AB2/O4SRBR, Abacus Data Network, V1 Abstract Introduction The Child Subglottal Resonances Database was developed by Washington University and University of California Los Angeles and consists of 15.5 hours of simultaneous microphone and subglottal accelerometer recordings of 19 male and 9 female child speakers of A...
The SSNCE Database of Tamil Dysarthric Speech Mar 18, 2022 Vijayalakshmi, P.; Celin, T. A. Mariya; Nagarajan, T., 2022, "The SSNCE Database of Tamil Dysarthric Speech", https://hdl.handle.net/11272.1/AB2/QXP9LM, Abacus Data Network, V1 Abstract Introduction The SSNCE Database of Tamil Dysarthric Speech was developed by the Speech Lab, SSN College of Engineering, India, in collaboration with the Indian National Institute of Empowerment of Persons with Multiple Disabilities (NIEPMD) and contains approximately eig...
LORELEI Ukrainian Representative Language Pack Mar 18, 2022 Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Ma, Xiaoyi; Kulick, Seth; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Ukrainian Representative Language Pack", https://hdl.handle.net/11272.1/AB2/GUYCZL, Abacus Data Network, V1 Abstract Introduction LORELEI Ukrainian Representative Language Pack consists of Ukrainian monolingual text, Ukrainian-English parallel and comparable text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LO...
LORELEI Tigrinya Incident Language Pack Mar 18, 2022 Tracey, Jennifer; Graff, David; Strassel, Stephanie; Arrigo, Michael; Wright, Jonathan; Bies, Ann, 2022, "LORELEI Tigrinya Incident Language Pack", https://hdl.handle.net/11272.1/AB2/CTYB7Q, Abacus Data Network, V1 Abstract Introduction LORELEI Tigrinya Incident Language Pack was developed by the Linguistic Data Consortium and is comprised of approximately 4.5 million words of Tigrinya monolingual text, 25,000 words of English monolingual text, 235,000 words of parallel and comparable Tigri...
BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech Mar 18, 2022 Palmer, Martha; Hwang, Jena D.; Bonial, Claire; O'Gorman, Tim; Gung, James; Stowe, Kevin; Green, Meredith, 2022, "BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/QABG8N, Abacus Data Network, V1 Abstract Introduction BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and consists of propbank and verb sense disambiguat...
BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech Mar 18, 2022 Agarwal, Nitin; Franchini, Michelle; Kappler, Michelle; Micciulla, Linnea; Pradhan, Sameer; Ramshaw, Lance, 2022, "BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/3JEVXI, Abacus Data Network, V1 Abstract Introduction BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies and consists of co-reference annotation on English discussion forum (DF), SMS/Chat and conversational telephone speech (CT...
DEFT Chinese Light and Rich ERE Annotation Mar 18, 2022 Chen, Song; Strassel, Stephanie; Mott, Justin, 2022, "DEFT Chinese Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/MUVS7U, Abacus Data Network, V1 Abstract Introduction DEFT Chinese Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 157 Chinese discussion forum documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filtering of Text (DEFT)...
TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017 Mar 18, 2022 Ellis, Joe; Getman, Jeremy; Chen, Song; Strassel, Stephanie, 2022, "TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017", https://hdl.handle.net/11272.1/AB2/KSIXIZ, Abacus Data Network, V1 Abstract Introduction TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the 2016 TAC KBP Event Argument Linking Pilot and Evaluation...
LORELEI Vietnamese Representative Language Pack Mar 18, 2022 Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Vietnamese Representative Language Pack", https://hdl.handle.net/11272.1/AB2/JWPEIA, Abacus Data Network, V1 Abstract Introduction LORELEI Vietnamese Representative Language Pack consists of Vietnamese monolingual text, Vietnamese-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI progra...


CALLHOME Egyptian Arabic Speech

Jan 25, 2023

Canavan, Alexandra; Zipperlen, George; Graff, David, 2023, "CALLHOME Egyptian Arabic Speech", https://hdl.handle.net/11272.1/AB2/J3CPAE, Abacus Data Network, V1

Abstract Introduction The CALLHOME Egyptian Arabic corpus of telephone speech consists of 120 unscripted telephone conversations between native speakers of Egyptian Colloquial Arabic (ECA), the spoken variety of Arabic found in Egypt. The dialect of ECA that this dictionary repre...

GALE Phase 2 Arabic Broadcast News Transcripts Part 1

Jan 25, 2023

Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2023, "GALE Phase 2 Arabic Broadcast News Transcripts Part 1", https://hdl.handle.net/11272.1/AB2/YPCAIR, Abacus Data Network, V1

Abstract Introduction GALE Phase 2 Arabic Broadcast News Transcripts Part 1 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 165 hours of Arabic broadcast news speech collected in 2006 and 2007 by LDC, MediaNet, Tunis, Tunisia and...

GALE Phase 2 Arabic Broadcast News Speech Part 1

Jan 25, 2023

Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2023, "GALE Phase 2 Arabic Broadcast News Speech Part 1", https://hdl.handle.net/11272.1/AB2/CXPTR7, Abacus Data Network, V1

Abstract Introduction GALE Phase 2 Arabic Broadcast News Speech Part 1 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 165 hours of Arabic broadcast news speech collected in 2006 and 2007 by LDC, MediaNet, Tunis, Tunisia and MTC, Rabat, Mor...

GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2

Jan 25, 2023

Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2023, "GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2", https://hdl.handle.net/11272.1/AB2/CS2DU6, Abacus Data Network, V1

Abstract Introduction GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 128 hours of Arabic broadcast conversation speech collected in 2007 by LDC, MediaNet, Tunis, Tuni...

GALE Phase 2 Arabic Broadcast Conversation Speech Part 2

Jan 25, 2023

Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2023, "GALE Phase 2 Arabic Broadcast Conversation Speech Part 2", https://hdl.handle.net/11272.1/AB2/AJ2CAE, Abacus Data Network, V1

Abstract Introduction GALE Phase 2 Arabic Broadcast Conversation Speech Part 2 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 128 hours of Arabic broadcast conversation speech collected in 2007 by LDC, MediaNet, Tunis, Tunisia and MTC, Rab...

Third DIHARD Challenge Evaluation

Jan 24, 2023

Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2023, "Third DIHARD Challenge Evaluation", https://hdl.handle.net/11272.1/AB2/VQPCKU, Abacus Data Network, V1

Abstract Introduction Third DIHARD Challenge Evaluation was developed by the Linguistic Data Consortium (LDC) and contains approximately 33 hours of English and Chinese speech data along with corresponding annotations used in support of the Third DIHARD Challenge. The DIHARD Chal...

Global TIMIT Thai

Jan 24, 2023

Liberman, Mark; Yuan, Jiahong; Cieri, Christopher; Wright, Jonathan, 2023, "Global TIMIT Thai", https://hdl.handle.net/11272.1/AB2/JY8T3N, Abacus Data Network, V1

Abstract Introduction Global TIMIT Thai was developed by the Linguistic Data Consortium and consists of approximately 12 hours of read speech and time-aligned transcripts in Standard Thai. The Global TIMIT project aimed to create a series of corpora in a variety of languages with...

Third DIHARD Challenge Development

Dec 8, 2022

Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2022, "Third DIHARD Challenge Development", https://hdl.handle.net/11272.1/AB2/UY5O0X, Abacus Data Network, V1

Abstract Introduction Third DIHARD Challenge Development was developed by the Linguistic Data Consortium (LDC) and contains approximately 34 hours of English and Chinese speech data along with corresponding annotations used in support of the Third DIHARD Challenge. The DIHARD Challen...

BOLT English Translation Treebank - Egyptian Arabic SMS/Chat

Dec 8, 2022

Bies, Ann; Mott, Justin; Warner, Colin; Kulick, Seth, 2022, "BOLT English Translation Treebank - Egyptian Arabic SMS/Chat", https://hdl.handle.net/11272.1/AB2/SPCYLS, Abacus Data Network, V1

Abstract Introduction BOLT English Translation Treebank - Egyptian Arabic SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of SMS and chat text data translated from Egyptian Arabic to English and annotated for part-of-speech and syntactic structure. The...

Hispanic-English Database

Nov 30, 2022

Byrne, William; Knodt, Eva; Bernstein, Jared; Emami, Farzhad, 2022, "Hispanic-English Database", https://hdl.handle.net/11272.1/AB2/IIJZCH, Abacus Data Network, V1

Abstract Introduction Hispanic-English Database contains approximately 30 hours of English and Spanish conversational and read speech with transcripts (24 hours) and metadata collected from 22 non-native English speakers between 1996 and 1998. The corpus was developed by Entropic...

2017 NIST Language Recognition Evaluation Training and Development Sets

Nov 30, 2022

Greenberg, Craig; Sadjadi, Omid; Reynolds, Douglas; Singer, Elliot; Graff, David, 2022, "2017 NIST Language Recognition Evaluation Training and Development Sets", https://hdl.handle.net/11272.1/AB2/K7LOKJ, Abacus Data Network, V1

Abstract Introduction 2017 NIST Language Recognition Evaluation Training and Development Sets contains training and development material for the 2017 NIST Language Recognition Evaluation. It consists of approximately 2,100 hours of conversational telephone speech, broadcast conve...

LORELEI Bengali Representative Language Pack

Nov 29, 2022

Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Bengali Representative Language Pack", https://hdl.handle.net/11272.1/AB2/IG4DBS, Abacus Data Network, V1

Abstract Introduction LORELEI Bengali Representative Language Pack consists of Bengali monolingual text, Bengali-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LO...

Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon

Nov 29, 2022

Lau, Mingfei; Zhong, Muhan; Lau, Chaak-ming; Su, Jian; Chan, Henry; Cheung, Bing, 2022, "Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon", https://hdl.handle.net/11272.1/AB2/URBMXM, Abacus Data Network, V1

Abstract Introduction Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon was developed by the Cantonese Computational Linguistics Infrastructure Working Group. It contains approximately 130,000 Cantonese character, word, and phrase entries paired with their corresponding rom...

Gulf Arabic Conversational Telephone Speech

Oct 13, 2022

Appen Pty Ltd. Sydney, Australia, 2022, "Gulf Arabic Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/SCSMSJ, Abacus Data Network, V1

Abstract Introduction Gulf Arabic Conversational Telephone Speech is a database developed by Appen Pty Ltd., Sydney, Australia and contains roughly 2,800 min of spontaneous telephone conversations in Colloquial Gulf Arabic. This corpus was collected and transcribed in 2004 by App...

Iraqi Arabic Conversational Telephone Speech

Oct 13, 2022

Appen Pty Ltd. Sydney, Australia, 2022, "Iraqi Arabic Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/YBQF3Y, Abacus Data Network, V1

Abstract Introduction Iraqi Arabic Conversational Telephone Speech was developed by Appen Pty Ltd, Sydney, Australia and contains roughly 3000 mins of speech from Iraqi Arabic speakers taking part in spontaneous telephone conversations in Colloquial Iraqi Arabic. This corpus was...

Gulf Arabic Conversational Telephone Speech, Transcripts

Oct 13, 2022

Appen Pty Ltd. Sydney, Australia, 2022, "Gulf Arabic Conversational Telephone Speech, Transcripts", https://hdl.handle.net/11272.1/AB2/ZLBR2M, Abacus Data Network, V1

Abstract Introduction Gulf Arabic Conversational Telephone Speech, Transcripts is a database developed by Appen Pty Ltd., Sydney, Australia and contains transcripts of roughly 2,800 min of spontaneous telephone conversations in Colloquial Gulf Arabic. A total of 976 conversation...

Iraqi Arabic Conversational Telephone Speech, Transcripts

Oct 13, 2022

Appen Pty Ltd. Sydney, Australia, 2022, "Iraqi Arabic Conversational Telephone Speech, Transcripts", https://hdl.handle.net/11272.1/AB2/ELQDGO, Abacus Data Network, V1

Abstract Introduction Iraqi Arabic Conversational Telephone Speech, Transcripts was developed by Appen Pty Ltd, Sydney, Australia and contains transcripts for roughly 3000 mins of speech from Iraqi Arabic speakers taking part in spontaneous telephone conversations in Colloquial I...

GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1

Oct 13, 2022

Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2022, "GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1", https://hdl.handle.net/11272.1/AB2/MZSDMN, Abacus Data Network, V1

Abstract Introduction GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1 was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 123 hours of Arabic broadcast conversation speech collected in 2006 and 2007 by LDC, MediaNet, Tu...

King Saud University Arabic Speech Database

Oct 12, 2022

Alsulaiman, Mansour; Muhammad, Ghulam; Abdelkader, Bencherif Mohamed; Mahmood, Awais; Ali, Zulfiqar, 2022, "King Saud University Arabic Speech Database", https://hdl.handle.net/11272.1/AB2/4YVL4A, Abacus Data Network, V1

Abstract Introduction King Saud University Arabic Speech Database was developed by Speech Group (SG) at King Saud University and contains 590 hours of recorded Arabic speech from 269 male and female speakers. The utterances include read and spontaneous speech. The recordings were...

GALE Phase 2 Arabic Broadcast Conversation Speech Part 1

Oct 12, 2022

Walker, Kevin; Caruso, Christopher; Maeda, Kazuaki; DiPersio, Denise; Strassel, Stephanie, 2022, "GALE Phase 2 Arabic Broadcast Conversation Speech Part 1", https://hdl.handle.net/11272.1/AB2/GGD0CB, Abacus Data Network, V1

Abstract Introduction GALE Phase 2 Arabic Broadcast Conversation Speech Part 1 was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 123 hours of Arabic broadcast conversation speech collected in 2006 and 2007 by LDC as part of the DARPA GALE (Gl...

Xi'an Guanzhong Object Naming

Oct 12, 2022

Cieri, Christopher; Zhan, Juhong; Jiang, Yue; Liberman, Mark; Yuan, Jiahong; Chen, Yiya; Scharenborg, Odette, 2022, "Xi'an Guanzhong Object Naming", https://hdl.handle.net/11272.1/AB2/D2DBLV, Abacus Data Network, V1

Abstract Introduction Xi'an Guanzhong Object Naming is comprised of approximately 15 hours of audio recordings from speakers of the Guanzhong dialect of Mandarin Chinese living in or near Xi'an in Shaanxi Province (China) naming objects that appeared in colored line drawings. Th...

HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation

Sep 20, 2022

Li, Xuansong; Strassel, Stephanie; Jones, Karen; Antonishek, Brian; Fiscus, Jonathan G., 2022, "HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation", https://hdl.handle.net/11272.1/AB2/GNUQ1A, Abacus Data Network, V1

Abstract Introduction HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 6,200 hours of user-generated videos with annotation and metadata. To advance multimodal event detection and rel...

American English Nickname Collection

Aug 9, 2022

Carvalho, Vitor R.; Kiran, Yigit; Borthwick, Andrew, 2022, "American English Nickname Collection", https://hdl.handle.net/11272.1/AB2/JR1WG6, Abacus Data Network, V1

Abstract Introduction American English Nickname Collection was developed by Intelius, Inc. and is a compilation of American English nicknames to given name mappings based on information in US government records, public web profiles and financial and property reports. This corpus...

Qatari Corpus of Argumentative Writing

Aug 9, 2022

Ahmed, Abdelhamid M.; Myhill, Debra; Abdollahzadeh, Esmaeel; McCallum, Lee; Zaghouani, Wajdi; Rezk, Lameya; Jrad, Anissa; Zhang, Xiao, 2022, "Qatari Corpus of Argumentative Writing", https://hdl.handle.net/11272.1/AB2/F2P2EY, Abacus Data Network, V1

Abstract Introduction Qatari Corpus of Argumentative Writing was developed by Qatar University, University of Exeter and Hamad Bin Khalifa University and is comprised of approximately 200,000 tokens of Arabic and English writing by undergraduate students (159 female, 36 male) alo...

Second DIHARD Challenge Evaluation - Eleven Sources

Jul 7, 2022

Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2022, "Second DIHARD Challenge Evaluation - Eleven Sources", https://hdl.handle.net/11272.1/AB2/ML7KD5, Abacus Data Network, V1

Abstract Introduction Second DIHARD Challenge Evaluation - Eleven Sources was developed by the Linguistic Data Consortium (LDC) and contains approximately 20 hours of English and Chinese speech data along with corresponding annotations used in support of the Second DIHARD Challen...

NUBUC

Jul 7, 2022

Lewis, Gwyneth; van Rijn, Pol; Gwilliams, Laura; Larrouy-Maestri, Pauline; Poeppel, David; Ghitza, Oded, 2022, "NUBUC", https://hdl.handle.net/11272.1/AB2/IUFKIG, Abacus Data Network, V1

Abstract Introduction NUBUC (NyU-BU contextually controlled stories Corpus) was developed by New York University, Max Planck Institute for Empirical Aesthetics and Boston University. It contains approximately three hours of English read speech from eight stories focused on lingui...

LORELEI Wolof Representative Language Pack

Jun 10, 2022

Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Wolof Representative Language Pack", https://hdl.handle.net/11272.1/AB2/1M9HI6, Abacus Data Network, V1

Abstract Introduction LORELEI Wolof Representative Language Pack consists of Wolof monolingual text, Wolof-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LORELEI...

AttImam

Mar 31, 2022

Alsaif, Amal; Alyahya, Tasniem; Alotibi, Madawi; Almuzaini, Huda; Alqahtani, Abeer, 2022, "AttImam", https://hdl.handle.net/11272.1/AB2/9FBCBG, Abacus Data Network, V1

Abstract Introduction AttImam was developed by Al-Imam Mohammad Ibn Saud Islamic University and consists of approximately 2,000 attribution relations applied to Arabic newswire text from Arabic Treebank: Part 1 v 4.1 (LDC2010T13). Attribution refers to the process of reporting or...

IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7

Mar 18, 2022

Andrus, Tony; Bills, Aric; Corris, Miriam; Dubinski, Eyal; Fiscus, Jonathan G.; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Le, Hanh; Ray, Jessica; Rytting, Anton; Silber, Ronnie; Shen, Wade; Tzoukermann, Evelyne, 2022, "IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7", https://hdl.handle.net/11272.1/AB2/WJGWAP, Abacus Data Network, V1

Abstract Introduction IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 201 hours of Vietnamese conversational and scripted telephone speech co...

IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b

Mar 18, 2022

Bills, Aric; Conners, Thomas; Corris, Miriam; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Malyska, Nicolas; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Zawaydeh, Bushra, 2022, "IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b", https://hdl.handle.net/11272.1/AB2/HSAU9N, Abacus Data Network, V1

Abstract Introduction IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Dholuo conversational and scripted telephone speech collected...

IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b

Mar 18, 2022

Andresen, Lucy; Bills, Aric; Conners, Thomas; Cruz, Luanne Dela; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Le, Hanh; Maurillo, Arlene; Melot, Jennifer; Phillips, Josh; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2022, "IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b", https://hdl.handle.net/11272.1/AB2/3EYPZM, Abacus Data Network, V1

Abstract Introduction IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 191 hours of Cebuano conversational and scripted telephone speech collect...

The Child Subglottal Resonances Database

Mar 18, 2022

Lulich, Steven M.; Alwan, Abeer; Sommers, Mitchell S.; Yeung, Gary, 2022, "The Child Subglottal Resonances Database", https://hdl.handle.net/11272.1/AB2/O4SRBR, Abacus Data Network, V1

Abstract Introduction The Child Subglottal Resonances Database was developed by Washington University and University of California Los Angeles and consists of 15.5 hours of simultaneous microphone and subglottal accelerometer recordings of 19 male and 9 female child speakers of A...

The SSNCE Database of Tamil Dysarthric Speech

Mar 18, 2022

Vijayalakshmi, P.; Celin, T. A. Mariya; Nagarajan, T., 2022, "The SSNCE Database of Tamil Dysarthric Speech", https://hdl.handle.net/11272.1/AB2/QXP9LM, Abacus Data Network, V1

Abstract Introduction The SSNCE Database of Tamil Dysarthric Speech was developed by the Speech Lab, SSN College of Engineering, India, in collaboration with the Indian National Institute of Empowerment of Persons with Multiple Disabilities (NIEPMD) and contains approximately eig...

LORELEI Ukrainian Representative Language Pack

Mar 18, 2022

Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Ma, Xiaoyi; Kulick, Seth; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Ukrainian Representative Language Pack", https://hdl.handle.net/11272.1/AB2/GUYCZL, Abacus Data Network, V1

Abstract Introduction LORELEI Ukrainian Representative Language Pack consists of Ukrainian monolingual text, Ukrainian-English parallel and comparable text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LO...

LORELEI Tigrinya Incident Language Pack

Mar 18, 2022

Tracey, Jennifer; Graff, David; Strassel, Stephanie; Arrigo, Michael; Wright, Jonathan; Bies, Ann, 2022, "LORELEI Tigrinya Incident Language Pack", https://hdl.handle.net/11272.1/AB2/CTYB7Q, Abacus Data Network, V1

Abstract Introduction LORELEI Tigrinya Incident Language Pack was developed by the Linguistic Data Consortium and is comprised of approximately 4.5 million words of Tigrinya monolingual text, 25,000 words of English monolingual text, 235,000 words of parallel and comparable Tigri...

BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

Mar 18, 2022

Palmer, Martha; Hwang, Jena D.; Bonial, Claire; O'Gorman, Tim; Gung, James; Stowe, Kevin; Green, Meredith, 2022, "BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/QABG8N, Abacus Data Network, V1

Abstract Introduction BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and consists of propbank and verb sense disambiguat...

BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

Mar 18, 2022

Agarwal, Nitin; Franchini, Michelle; Kappler, Michelle; Micciulla, Linnea; Pradhan, Sameer; Ramshaw, Lance, 2022, "BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/3JEVXI, Abacus Data Network, V1

Abstract Introduction BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies and consists of co-reference annotation on English discussion forum (DF), SMS/Chat and conversational telephone speech (CT...

DEFT Chinese Light and Rich ERE Annotation

Mar 18, 2022

Chen, Song; Strassel, Stephanie; Mott, Justin, 2022, "DEFT Chinese Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/MUVS7U, Abacus Data Network, V1

Abstract Introduction DEFT Chinese Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 157 Chinese discussion forum documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filtering of Text (DEFT)...

TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017

Mar 18, 2022

Ellis, Joe; Getman, Jeremy; Chen, Song; Strassel, Stephanie, 2022, "TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017", https://hdl.handle.net/11272.1/AB2/KSIXIZ, Abacus Data Network, V1

Abstract Introduction TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the 2016 TAC KBP Event Argument Linking Pilot and Evaluation...

LORELEI Vietnamese Representative Language Pack

Mar 18, 2022

Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Vietnamese Representative Language Pack", https://hdl.handle.net/11272.1/AB2/JWPEIA, Abacus Data Network, V1

Abstract Introduction LORELEI Vietnamese Representative Language Pack consists of Vietnamese monolingual text, Vietnamese-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI progra...
