1 to 50 of 399 Results
Apr 3, 2025
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2025, "LORELEI Hungarian Representative Language Pack", https://hdl.handle.net/11272.1/AB2/6G8DZZ, Abacus Data Network, V1
Abstract Introduction LORELEI Hungarian Representative Language Pack consists of Hungarian monolingual text, Hungarian-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program....
Apr 3, 2025
Vanroy, Bram, 2025, "Abstract Meaning Representation 3.0 - Machine Translations", https://hdl.handle.net/11272.1/AB2/TKRDFD, Abacus Data Network, V1
Abstract Introduction Abstract Meaning Representation 3.0 - Machine Translations was developed by the Center for Computational Linguistics at KU Leuven in the HORIZON2020 project SignON. It is an automatic translation of a subset of sentences from Abstract Meaning Representation...
Apr 3, 2025
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2025, "AIDA Scenario 3 Practice Topic Source Data and Annotation", https://hdl.handle.net/11272.1/AB2/KAFV5Q, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 3 Practice Topic Source Data and Annotation was developed by the Linguistic Data Consortium (LDC) and is comprised of English, Russian and Spanish web documents (text, video, image) and annotations. The DARPA AIDA (Active Interpretation of Disp...
Apr 1, 2025
Linguistic Data Consortium; Appen Pty Ltd., 2025, "ASpIRE Development and Development Test Sets", https://hdl.handle.net/11272.1/AB2/YS9IIX, Abacus Data Network, V1
Abstract Introduction ASpIRE Development and Development Test Sets was developed for the Automatic Speech recognition In Reverberant Environments (ASpIRE) Challenge sponsored by IARPA (the Intelligence Advanced Research Projects Activity). It contains approximately 226 hours of En...
Mar 28, 2025
Asatiani, Sandro; Bills, Aric; Brunckhorst, Rachael; Chouder, Sarra; Corey, Cassian; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Kalkhitashvili, Tamar; Kazi, Michael; Tong, Audrey; Lam, Julie; Le, Hanh; Malyska, Nicolas; Marcucci, Giorgia; Marvi, Sarah; McConnell, Sara; Melot, Jennifer; Mensch, Alyssa; Morrison, Michelle; Paget, Shelley; Richardson, Frederick; Roberts, Annette; Rubino, Carl; Samushia, Lela, 2025, "MATERIAL Georgian-English Language Pack", https://hdl.handle.net/11272.1/AB2/H5DHYO, Abacus Data Network, V1
Abstract Introduction MATERIAL Georgian-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 79 hours of...
Mar 28, 2025
Bills, Aric; Chouder, Sarra; Corey, Cassian; Davoodian, Marjan; Dubinski, Eyal; Ellis, Corinna; Farnam, Reza; Gibby, Paul; Hartwig, Luke; Kalnins, Dagmara; Kazi, Michael; Lam, Julie; Le, Hanh; Malyska, Nicolas; Marvi, Sarah; McConnell, Sara; Melot, Jennifer; Mensch, Alyssa; Moore, Alex; Morrison, Michelle; Paget, Shelley; Richardson, Frederick; Roberts, Annette; Rubino, Carl; Moaddel, Marjan Sadeghi, 2025, "MATERIAL Farsi-English Language Pack", https://hdl.handle.net/11272.1/AB2/WLFTJ6, Abacus Data Network, V1
Abstract Introduction MATERIAL Farsi-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 61 hours of Fa...
Mar 28, 2025
Abdi, Zeinab; Ali, Zahra; Bills, Aric; Bishop, Judith; Boyle, Anne; Chouder, Sarra; Clair, Nathaniel; Conners, Tom; Corey, Cassian; Dubinski, Eyal; Ellis, Corinna; Fernando, Jess; Gibby, Paul; Abdi, Farah H; Hammond, Simon; Hubert, Maxime; Kaiser-Schatzlein, Alice; Kazi, Michael; Lam, Julie; Lazar, Rosie; Le, Hanh; Levot, Michael; Malyska, Nicolas; Melot, Jennifer; Mensch, Alyssa; Omar, Abdulkadir Arale; Paget, Shelley; Richardson, Frederick; Rubino, Carl; Samko, Bern; Sanders, Gregory; Soh, Stephanie; Strahan, Tania E.; Taylor, Jonathan; Thompson, Brian; Tong, Audrey; Tong, Richard; Yelle, Julie; Yu, Jennifer; Zavorin, Ilya, 2025, "MATERIAL Somali-English Language Pack", https://hdl.handle.net/11272.1/AB2/2FKSLF, Abacus Data Network, V1
Abstract Introduction MATERIAL Somali-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 80 hours of S...
Mar 28, 2025
Bills, Aric; Bishop, Judith; Boyle, Anne; Chouder, Sarra; Clair, Nathaniel; Conners, Tom; Corey, Cassian; Cronin, Kristina; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Hammond, Simon; Hidalgo, Guia; Kaiser-Schatzlein, Alice; Kalnins, Dagmara; Kazi, Michael; Lam, Julie; Lazar, Rosie; Le, Hanh; Malyska, Nicolas; Medel, Olivia; Melot, Jennifer; Mensch, Alyssa; Moore, Alex; Morrison, Michelle; Paget, Shelley; Raymer, Alston; Richardson, Fred; Ridgway, Hristina; Roberts, Annette; Rubino, Carl; Saw, Kenneth; Shen, Sinney; Soh, Stephanie; Taylor, Jonathan; Thompson, Brian; Tong, Audrey; Tong, Richard; Williams, Mariana; Yelle, Julie; Yu, Jennifer; Zavora, Yoanna; Zavorin, Ilya, 2025, "MATERIAL Bulgarian-English Language Pack", https://hdl.handle.net/11272.1/AB2/WCU3PV, Abacus Data Network, V1
Abstract Introduction MATERIAL Bulgarian-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 78 hours o...
Feb 3, 2025
Hernández Mena, Carlos Daniel; Örnólfsson, Gunnar Thor; Gudnason, Jon, 2025, "Samrómur Synthetic", https://hdl.handle.net/11272.1/AB2/DZUB82, Abacus Data Network, V1
Abstract Introduction Samrómur Synthetic was developed by the Language and Voice Lab, Reykjavik University and contains 72 hours of Icelandic synthetic speech, transcripts and metadata. Data Source sentences were extracted from the Samrómur platform, comprised of texts and transc...
Feb 3, 2025
Hernández Mena, Carlos Daniel; Simonsen, Annika; Gudnason, Jon, 2025, "Ravnursson Faroese Speech and Transcripts", https://hdl.handle.net/11272.1/AB2/OBXEAK, Abacus Data Network, V1
Abstract Introduction Ravnursson Faroese Speech and Transcripts contains 109 hours of Faroese prompted speech from 433 speakers (249 female, 184 male), corresponding transcripts and speaker metadata. It is an extract from the Basic Language Resource Kit 1.0 (BLARK 1.0) developed...
Feb 3, 2025
Alrashoudi, Norah; AlKhalifa, Hend; Alotaibi, Yousef Ajami, 2025, "L2-KSU Native and Non-Native Arabic Speech", https://hdl.handle.net/11272.1/AB2/N7YZP8, Abacus Data Network, V1
Abstract Introduction L2-KSU Native and Non-Native Arabic Speech was developed by King Saud University (KSU) and contains approximately six hours of Modern Standard Arabic read speech from 80 subjects, along with transcripts and speaker metadata. Data The speech data was collecte...
Feb 3, 2025
Maamouri, Mohamed; Graff, David, 2025, "Iraqi Arabic - English Lexical Database", https://hdl.handle.net/11272.1/AB2/EUPXQD, Abacus Data Network, V1
Abstract Introduction Iraqi Arabic - English Lexical Database was developed by the Linguistic Data Consortium (LDC). It contains six interrelated tables presenting over 67,000 Iraqi Arabic words as orthographic forms in Arabic script and pronunciation forms in International Phone...
Jan 21, 2025
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2025, "LORELEI Yoruba Representative Language Pack", https://hdl.handle.net/11272.1/AB2/ATPB58, Abacus Data Network, V1
Abstract Introduction LORELEI Yoruba Representative Language Pack (LDC2024T10) consists of Yoruba monolingual text, Yoruba-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI progr...
Jan 21, 2025
Hennig, Leonhard; Thomas, Philippe; Möller, Sebastian, 2025, "MultiTACRED", https://hdl.handle.net/11272.1/AB2/GIEQ7J, Abacus Data Network, V1
Abstract Introduction MultiTACRED was developed by the German Research Center for Artificial Intelligence (DFKI) Speech and Language Technology Lab and is a machine translation of TAC Relation Extraction Dataset (LDC2018T24) (TACRED) into twelve languages with projected entity an...
Jan 21, 2025
Das, Debopam; Egg, Markus, 2025, "RST Continuity Corpus", https://hdl.handle.net/11272.1/AB2/YSIB2J, Abacus Data Network, V1
Abstract Introduction RST Continuity Corpus was developed at Åbo Akademi University and Humboldt-Universität zu Berlin and contains annotations for continuity dimensions added to RST Discourse Treebank (LDC2002T07). RST Discourse Treebank is a collection of English news texts fro...
Oct 25, 2024
Larson, Brian N., 2024, "First-Year Law Students' Court Memoranda", https://hdl.handle.net/11272.1/AB2/CC9MT6, Abacus Data Network, V1
Abstract Introduction First-Year Law Students' Court Memoranda consists of 197 English law student writing samples of legal briefs annotated for certain characteristics along with accompanying survey responses by the student writers. The briefs were created in a law school writin...
Oct 25, 2024
Hedström, Staffan; Fong, Judy; Þórhallsdóttir, Ragnheiður; Mollberg, David; Guðmundsson, Smári Freyr; Jónsson, Ólafur Helgi; Þorsteinsdóttir, Sunneva; Magnusdottir, Eydis Huld; Gudnason, Jon, 2024, "Samrómur Queries Icelandic Speech 1.0", https://hdl.handle.net/11272.1/AB2/DGPHQR, Abacus Data Network, V1
Abstract Introduction Samrómur Queries Icelandic Speech 1.0 was developed by the Language and Voice Lab, Reykjavik University in cooperation with Almannarómur, Center for Language Technology. The corpus contains 20 hours of Icelandic prompted queries from 3,809 speakers represent...
Oct 25, 2024
Linguistic Data Consortium; ELDA, 2024, "TRAD Arabic-French Parallel Text -- Newswire", https://hdl.handle.net/11272.1/AB2/48BBWO, Abacus Data Network, V1
Abstract Introduction TRAD Arabic-French Parallel Text -- Newswire was developed by ELDA as part of the PEA-TRAD project. It contains French translations of a subset of approximately 20,000 Arabic words from NIST 2008 Open Machine Translation (OpenMT) Evaluation (LDC2010T21). The...
Oct 25, 2024
Linguistic Data Consortium; ELDA, 2024, "TRAD Chinese-French Parallel Text -- Broadcast News", https://hdl.handle.net/11272.1/AB2/IZFPYW, Abacus Data Network, V1
Abstract Introduction TRAD Chinese-French Parallel Text -- Broadcast News was developed by ELDA as part of the PEA-TRAD project. It contains French translations of a subset of approximately 30,000 Chinese characters from GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3...
Oct 25, 2024
Dipartimento di Informatica of the University of Pisa; ILC-CNR; Institute for Language and Speech Processing; Institute of Informatics at the University of Szeged; Institute of Linguistics at the Hungarian Academy of Sciences; Morphologic Ltd., 2024, "2007 CoNLL Shared Task - Greek, Hungarian & Italian", https://hdl.handle.net/11272.1/AB2/JLYA64, Abacus Data Network, V1
Abstract Introduction 2007 CoNLL Shared Task - Greek, Hungarian & Italian consists of dependency treebanks in three languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Greek, Hu...
Oct 25, 2024
Britt, Erica, 2024, "Vehicle City Voices Corpus – Part I", https://hdl.handle.net/11272.1/AB2/8XVBZS, Abacus Data Network, V1
Abstract Introduction Vehicle City Voices Corpus – Part I was developed at the University of Michigan-Flint, and is an ongoing oral history project and survey of English language variation in Flint, Michigan. It contains approximately 16 hours of speech with corresponding transcr...
Oct 25, 2024
Hernández Mena, Carlos Daniel; Herrera, Abel, 2024, "CHM150", https://hdl.handle.net/11272.1/AB2/UWURFR, Abacus Data Network, V1
Abstract Introduction CHM150 (Corpus Hecho en México 150) was developed by the Speech Processing Laboratory of the Faculty of Engineering at the National Autonomous University of Mexico (UNAM) and consists of approximately 1.63 hours of Mexican Spanish speech, associated transcri...
Oct 25, 2024
Alfaifi, Abdullah; Atwell, Eric, 2024, "Arabic Learner Corpus", https://hdl.handle.net/11272.1/AB2/DPQWPU, Abacus Data Network, V1
Abstract Introduction Arabic Learner Corpus was developed at the University of Leeds and consists of written essays and spoken recordings by Arabic learners collected in Saudi Arabia in 2012 and 2013. The corpus includes 282,732 words in 1,585 materials, produced by 942 students...
Oct 25, 2024
Slaney, Malcolm; McRoberts, Gerald; Scheirer, Jocelyn, 2024, "BabyEars Affective Vocalizations", https://hdl.handle.net/11272.1/AB2/VK52W9, Abacus Data Network, V1
Abstract Introduction BabyEars Affective Vocalizations was developed by Malcolm Slaney, Gerald McRoberts, and Jocelyn Scheirer. It contains approximately 22 minutes of spontaneous English speech by 12 adults interacting with their infant children, for a total of 509 infant-direct...
Oct 25, 2024
Kang, Okim; Hirschi, Kevin; Looney, Stephen D.; Hansen, John H. L., 2024, "Second Language University Speech Intelligibility Corpus", https://hdl.handle.net/11272.1/AB2/QHVV2O, Abacus Data Network, V1
Abstract Introduction Second Language University Speech Intelligibility Corpus was developed by Northern Arizona University, The Pennsylvania State University, and The University of Texas at Dallas. It contains 10.5 hours of English speech by 66 international faculty and universi...
Sep 17, 2024
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2024, "AIDA Scenario 2 Practice Topic Annotation", https://hdl.handle.net/11272.1/AB2/BFKQTZ, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 2 Practice Topic Annotation was developed by the Linguistic Data Consortium (LDC) and is comprised of annotations for 29 English, Russian and Spanish web documents (text, image and video) from AIDA Scenario 2 Practice Topic Source Data (LDC2024...
Sep 17, 2024
Ward, Nigel G.; Avila, Jonathan E.; Rivas, Emilia; Marco, Divette, 2024, "Dialogs Re-Enacted Across Languages", https://hdl.handle.net/11272.1/AB2/XRMWND, Abacus Data Network, V1
Abstract Introduction Dialogs Re-Enacted Across Languages was developed at the University of Texas at El Paso. It contains approximately 17 hours of conversational speech in English and Spanish by 129 unique bilingual speakers, specifically, short fragments extracted from spontan...
Sep 17, 2024
Geissler, Christopher; Babinski, Sarah; Shaw, Jason, 2024, "Diaspora Tibetan Speech", https://hdl.handle.net/11272.1/AB2/OPZ58Z, Abacus Data Network, V1
Abstract Introduction Diaspora Tibetan Speech was developed at Yale University. It contains approximately 28 hours of Tibetan elicited speech by 73 speakers from the diaspora Tibetan community in Kathmandu, Nepal, along with transcripts, elicitation materials and speaker demograp...
Sep 17, 2024
Tracey, Jennifer; Strassel, Stephanie; Arrigo, Michael; Wright, Jonathan; Graff, David; Bies, Ann, 2024, "LORELEI Uyghur Incident Language Pack", https://hdl.handle.net/11272.1/AB2/VRJN4A, Abacus Data Network, V1
Abstract Introduction LORELEI Uyghur Incident Language Pack (LDC2024T07) was developed by the Linguistic Data Consortium and consists of approximately 28 million words of Uyghur monolingual text, 500,000 words of English monolingual text, 3.3 million words of parallel and compara...
Jul 30, 2024
Jones, Karen; Walker, Kevin; Graff, David; Wright, Jonathan; Strassel, Stephanie, 2024, "Call My Net 1", https://hdl.handle.net/11272.1/AB2/RJMIEI, Abacus Data Network, V1
Abstract Introduction Call My Net 1 was developed by the Linguistic Data Consortium and contains 364 hours of conversational telephone speech in four languages (Tagalog, Cebuano, Cantonese and Mandarin) collected in 2015 from 221 native speakers located in the Philippines and Chi...
Jul 30, 2024
Cunha, Luís Filipe; Silvano, Purificação; Campos, Ricardo; Jorge, Alípio, 2024, "Automatic Content Extraction for Portuguese", https://hdl.handle.net/11272.1/AB2/5VRIQB, Abacus Data Network, V1
Abstract Introduction Automatic Content Extraction for Portuguese (LDC2024T05) was developed at INESC TEC - Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência and consists of automatic Brazilian Portuguese and European Portuguese translations of the English...
May 13, 2024
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Griffitt, Kira; Ryant, Neville; Kulick, Seth; Delgado, Dana; Arrigo, Michael, 2024, "LoReHLT Hausa Representative Language Pack", https://hdl.handle.net/11272.1/AB2/7MWKZC, Abacus Data Network, V1
Abstract Introduction LoReHLT Hausa Representative Language Pack consists of Hausa monolingual text, Hausa-English parallel text, annotations, amateur web audio recordings, supplemental resources and related software tools developed by the Linguistic Data Consortium for LoReHLT,...
May 13, 2024
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2024, "AIDA Scenario 2 Practice Topic Source Data", https://hdl.handle.net/11272.1/AB2/TXAWUL, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 2 Practice Topic Source Data was developed by the Linguistic Data Consortium (LDC) and is comprised of 1500 root documents, including text, image, and video, from English, Russian, and Spanish web sources. The DARPA AIDA (Active Interpretation...
May 13, 2024
Walker, Kevin; Graff, David; Ma, Xiaoyi; Strassel, Stephanie; Jones, Karen, 2024, "RATS Low Speech Density", https://hdl.handle.net/11272.1/AB2/CXVUXZ, Abacus Data Network, V1
Abstract Introduction RATS Low Speech Density was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 87 hours of English, Levantine Arabic, Farsi, Pashto and Urdu speech and non-speech samples. The recordings were assembled by concatenating a rand...
Mar 28, 2024
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2024, "LORELEI Farsi Representative Language Pack", https://hdl.handle.net/11272.1/AB2/UMEVGY, Abacus Data Network, V1
Abstract Introduction LORELEI Farsi Representative Language Pack consists of Farsi monolingual text, Farsi-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LORELEI...
Mar 28, 2024
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2024, "AIDA Scenario 1 Practice Topic Annotation", https://hdl.handle.net/11272.1/AB2/XPPJWR, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 1 Practice Topic Annotation was developed by the Linguistic Data Consortium (LDC) and is comprised of annotations for 212 English, Russian and Ukrainian web documents (text, image and video) from AIDA Scenario 1 Practice Topic Source Data (LDC2...
Mar 28, 2024
Delgado, Dana; Walker, Kevin; Strassel, Stephanie; Graff, David; Caruso, Christopher, 2024, "KASET - Kurmanji and Sorani Kurdish Speech and Transcripts", https://hdl.handle.net/11272.1/AB2/ODAGYC, Abacus Data Network, V1
Abstract Introduction KASET - Kurmanji and Sorani Kurdish Speech and Transcripts was developed by the Linguistic Data Consortium (LDC) and consists of approximately 147 hours of telephone conversations (289 recordings) and broadcast news (410 recordings) in two Kurdish dialects:...
Jan 11, 2024
Tracey, Jennifer; Strassel, Stephanie; Arrigo, Michael, 2024, "TAC KBP Belief and Sentiment - Comprehensive Training and Evaluation Data 2016-2017", https://hdl.handle.net/11272.1/AB2/OM2WHS, Abacus Data Network, V1
Abstract Introduction TAC KBP Belief and Sentiment - Comprehensive Training and Evaluation Data 2016-2017 (LDC2023T13) was developed by the Linguistic Data Consortium and contains training and evaluation data produced in support of the 2016 and 2017 TAC KBP Belief and Sentiment (...
Jan 11, 2024
Belhadj, Mourad; Bendellali, Ilham; Lakhdari, Elalia, 2024, "Kasdi-Merbah (University) Emotional Database in Arabic Speech", https://hdl.handle.net/11272.1/AB2/Y4LDPA, Abacus Data Network, V1
Abstract Introduction Kasdi-Merbah Emotional Database in Arabic Speech was developed by the University of Kasdi Merbah Ouargla. The corpus contains two hours of Modern Standard Arabic prompted speech from 500 speakers (254 female, 246 male) representing 5,000 utterances. Data Spe...
Dec 5, 2023
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2023, "AIDA Scenario 1 and 2 Reference Knowledge Base", https://hdl.handle.net/11272.1/AB2/YTF9AB, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 1 and 2 Reference Knowledge Base was developed by the Linguistic Data Consortium (LDC) and contains the English knowledge base (KB) used for all AIDA entity linking annotation in Scenario 1 (Russia-Ukraine Relations) and Scenario 2 (Crisis in V...
Dec 5, 2023
Graff, David; Jones, Karen; Strassel, Stephanie; Walker, Kevin, 2023, "REMIX Telephone Collection", https://hdl.handle.net/11272.1/AB2/VJPGYX, Abacus Data Network, V1
Abstract Introduction REMIX Telephone Collection was developed by the Linguistic Data Consortium (LDC) and contains 320 hours of English conversational telephone speech from 358 speakers who had completed all tasks in one of the previous LDC Mixer collections, specifically, Mixer...
Dec 5, 2023
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2023, "AIDA Scenario 1 Practice Topic Source Data", https://hdl.handle.net/11272.1/AB2/M4QWGV, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 1 Practice Topic Source Data was developed by the Linguistic Data Consortium (LDC) and is comprised of 1511 documents (text, image, and video) from English, Russian, and Ukrainian web sources. The DARPA AIDA (Active Interpretation of Disparate...
Oct 17, 2023
Miller, David; Walker, Kevin; Graff, David; Canavan, Alexandra, 2023, "CALLFRIEND Russian Text", https://hdl.handle.net/11272.1/AB2/BNFFSZ, Abacus Data Network, V1
Abstract Introduction CALLFRIEND Russian Text (LDC2023T09) was developed by the Linguistic Data Consortium and consists of transcripts for approximately 48 hours of telephone conversations (100 recordings) between native Russian speakers. The calls were recorded in 1999 as part o...
Oct 17, 2023
Delgado, Dana; Jones, Karen; Walker, Kevin; Strassel, Stephanie; Caruso, Christopher; Graff, David, 2023, "2019 OpenSAT Public Safety Communications Simulation", https://hdl.handle.net/11272.1/AB2/BOXO5O, Abacus Data Network, V1
Abstract Introduction 2019 OpenSAT Public Safety Communications Simulation was developed by the Linguistic Data Consortium (LDC) and contains approximately 141 hours of speech recordings and transcripts used in the National Institute of Standards and Technology (NIST)...
Oct 16, 2023
Miller, David; Walker, Kevin; Graff, David; Canavan, Alexandra, 2023, "CALLFRIEND Russian Speech", https://hdl.handle.net/11272.1/AB2/NGRVVO, Abacus Data Network, V1
Abstract Introduction CALLFRIEND Russian Speech (LDC2023S08) was developed by the Linguistic Data Consortium (LDC) and consists of approximately 48 hours of telephone conversations (100 recordings) between native speakers of Russian. The calls were recorded in 1999 as part of the...
Aug 29, 2023
Luqman, Hamzah; Mahmoud, Sabri; Awaida, Sameh, 2016, "KAFD: Arabic Font Database", https://hdl.handle.net/11272.1/AB2/A0JPYM, Abacus Data Network, V2
Introduction KAFD: Arabic Font Database was developed by King Fahd University of Petroleum & Minerals and Qassim University. It is comprised of approximately 2.5 million scanned Arabic printed pages in a variety of fonts, sizes and resolutions along with corresponding transcripts...
Aug 29, 2023
Abdulaziz, Azhar; Kepuska, Veton, 2017, "Noisy TIMIT Speech", https://hdl.handle.net/11272.1/AB2/FFFXT2, Abacus Data Network, V2
Introduction Noisy TIMIT Speech was developed by the Florida Institute of Technology and contains approximately 322 hours of speech from the TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) modified with different additive noise levels. Only the audio has been modified;...
Aug 29, 2023
Chen, Gang; Neubauer, Juergen; Garellek, Marc; Samlan, Robin; Gerratt, Bruce R.; Kreiman, Jody; Alwan, Abeer, 2017, "UCLA High-Speed Laryngeal Video and Audio", https://hdl.handle.net/11272.1/AB2/OWLHMG, Abacus Data Network, V2
UCLA High-Speed Laryngeal Video and Audio was developed by UCLA Speech Processing and Auditory Perception Laboratory and is comprised of high-speed laryngeal video recordings of the vocal folds and synchronized audio recordings from nine subjects collected between April 2012 and...
Aug 29, 2023
Vincent, Emmanuel; Barker, Jon; Watanabe, Shinji; Le Roux, Jonathan; Nesta, Francesco; Matassoni, Marco, 2017, "CHiME2 WSJ0", https://hdl.handle.net/11272.1/AB2/IUB8PD, Abacus Data Network, V2
CHiME2 WSJ0 was developed as part of The 2nd CHiME Speech Separation and Recognition Challenge and contains approximately 166 hours of English speech from a noisy living room environment. The CHiME Challenges focus on distant-microphone automatic speech recognition (ASR) in real-...
Aug 29, 2023
Tracey, Jennifer; Lee, Haejoong; Strassel, Stephanie, 2017, "BOLT English Discussion Forums", https://hdl.handle.net/11272.1/AB2/VDFID2, Abacus Data Network, V2
BOLT English Discussion Forums was developed by the Linguistic Data Consortium (LDC) and consists of 830,440 discussion forum threads in English harvested from the Internet using a combination of manual and automatic processes. The DARPA BOLT (Broad Operational Language Translati...