1 to 50 of 428 Results
Apr 14, 2026
Li, Bin; Minxuan, Feng; Junyang, Dai; Huidan, Xu; Xin, Lu; Xinyu, Tuo; Lezhi, Wang; Yuqin, Zhang, 2026, "Ancient Chinese WordNet", https://hdl.handle.net/11272.1/AB2/TT82RX, Abacus Data Network, V1
Abstract Introduction Ancient Chinese WordNet was developed by Nanjing Normal University and contains lexical and semantic information for Ancient Chinese vocabulary dating back to the Pre-Qin period (before 221 BCE). The WordNet comprises 38,781 word forms and 55,100 senses, eac...
Apr 14, 2026
Greenberg, Craig; Walker, Kevin; Jones, Karen; Wright, Jonathan; Strassel, Stephanie, 2026, "2022 NIST Language Recognition Evaluation Test and Development Sets", https://hdl.handle.net/11272.1/AB2/U4RCRP, Abacus Data Network, V1
Abstract Introduction 2022 NIST Language Recognition Evaluation Test and Development Sets was developed by the Linguistic Data Consortium (LDC) and the National Institute of Standards and Technology (NIST). This release contains the test and development data, metadata, answer key...
Apr 14, 2026
Canavan, Alexandra; Zipperlen, George; Wheatley, Barbara; Ryant, Neville; Ma, Danni, 2026, "CALLHOME Spanish Second Edition", https://hdl.handle.net/11272.1/AB2/N6DUXZ, Abacus Data Network, V1
Abstract Introduction CALLHOME Spanish Second Edition was developed by the Linguistic Data Consortium (LDC) and contains approximately 38 hours of speech from 120 unscripted telephone conversations between native Spanish speakers. This publication is a re-release of the original...
Apr 14, 2026
Canavan, Alexandra; Zipperlen, George; Wheatley, Barbara; Kaneko, Masayo; Kobayashi, Megumi; Ryant, Neville; Ma, Danni, 2026, "CALLHOME Japanese Second Edition", https://hdl.handle.net/11272.1/AB2/DJTYNT, Abacus Data Network, V1
Abstract Introduction CALLHOME Japanese Second Edition was developed by the Linguistic Data Consortium (LDC) and contains approximately 49 hours of speech from 120 unscripted telephone conversations between native Japanese speakers. This publication is a re-release of the origina...
Feb 20, 2026
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2026, "AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment", https://hdl.handle.net/11272.1/AB2/LAUKPU, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment was developed by the Linguistic Data Consortium (LDC) and is comprised of English, Russian, and Ukrainian web documents (text, video, image), annotations and assessments used in the AID...
Feb 19, 2026
Wiesner, Matthew; Raj, Desh; Maciejewski, Matthew; Haviland, Chloe; Cornell, Samuele; Chodroff, Eleanor; Khudanpur, Sanjeev; Godfrey, Jack, 2026, "Mixer 6 - CHiME 8 Transcribed Calls and Interviews", https://hdl.handle.net/11272.1/AB2/GFEHMX, Abacus Data Network, V1
Abstract Introduction Mixer 6 - CHiME 8 Transcribed Calls and Interviews was developed for the 7th and 8th CHiME (Computational Hearing in Multisource Environments) challenges. It contains 80 hours of English interviews and telephone speech from Mixer 6 Speech (LDC2013S03) with t...
Feb 19, 2026
Sadjadi, Omid; Greenberg, Craig; Walker, Kevin; Jones, Karen; Caruso, Christopher; Strassel, Stephanie, 2026, "2021 NIST Speaker Recognition Evaluation Development and Test Set", https://hdl.handle.net/11272.1/AB2/JRRAME, Abacus Data Network, V1
Abstract Introduction 2021 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains approximately 447 hours of Cantonese, Mandarin, and English conversational telephon...
Feb 19, 2026
Tracey, Jennifer; Graff, David; Strassel, Stephanie; Wright, Jonathan; Bies, Ann, 2026, "LORELEI Sinhala Incident Language Pack", https://hdl.handle.net/11272.1/AB2/U3SCIC, Abacus Data Network, V1
Abstract Introduction LORELEI Sinhala Incident Language Pack was developed by the Linguistic Data Consortium (LDC) and consists of approximately 8.1 million words of Sinhala monolingual text, 70,000 words of English monolingual text, 6.4 million words of parallel Sinhala-English...
Feb 19, 2026
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Arrigo, Michael; Wright, Jonathan; Bies, Ann, 2026, "LORELEI Ilocano Incident Language Pack", https://hdl.handle.net/11272.1/AB2/AQND73, Abacus Data Network, V1
Abstract Introduction LORELEI Ilocano Incident Language Pack was developed by the Linguistic Data Consortium (LDC) and consists of approximately 8.9 million words of Ilocano monolingual text, 3.3 million words of English monolingual text, 3.2 million words of parallel Ilocano-Eng...
Feb 19, 2026
Chen, Song; Bies, Ann; Caruso, Christopher; Tracey, Jennifer; Strassel, Stephanie, 2026, "KAIROS Phase 2 Quizlet", https://hdl.handle.net/11272.1/AB2/9GWNUH, Abacus Data Network, V1
Abstract Introduction KAIROS Phase 2 Quizlet was developed by the Linguistic Data Consortium (LDC). It contains English and Spanish text, video and image data and annotations used for pre-evaluation research and system development during Phase 2 of the DARPA KAIROS program. KAIRO...
Feb 19, 2026
Tracey, Jennifer; Chen, Song; Delgado, Dana; Strassel, Stephanie, 2026, "BOLT CTS CALLFRIEND CALLHOME Egyptian Arabic Transcripts and Translations", https://hdl.handle.net/11272.1/AB2/GII9MH, Abacus Data Network, V1
Abstract Introduction BOLT CTS CALLFRIEND CALLHOME Egyptian Arabic Transcripts and Translations was developed by the Linguistic Data Consortium (LDC) and consists of transcripts and their corresponding English translations for 116 hours of conversational telephone speech between...
Feb 19, 2026
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2026, "LORELEI Hindi Representative Language Pack", https://hdl.handle.net/11272.1/AB2/AB5FR4, Abacus Data Network, V1
Abstract Introduction LORELEI Hindi Representative Language Pack consists of Hindi monolingual text, Hindi-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LORELEI...
Feb 19, 2026
Chen, Song; Bies, Ann; Mott, Justin; Caruso, Christopher; Tracey, Jennifer; Strassel, Stephanie, 2026, "KAIROS Phase 1 Quizlet", https://hdl.handle.net/11272.1/AB2/YH4L2N, Abacus Data Network, V1
Abstract Introduction KAIROS Phase 1 Quizlet was developed by the Linguistic Data Consortium (LDC). It contains English and Spanish text, video and image data and annotations used for pre-evaluation research and system development during Phase 1 of the DARPA KAIROS program. KAIRO...
Feb 19, 2026
Cohen, Shay; Sennrich, Rico; Wu, Guojun, 2026, "Abstract Meaning Representation 2.0 - Machine Translations", https://hdl.handle.net/11272.1/AB2/PS3KWH, Abacus Data Network, V1
Abstract Introduction Abstract Meaning Representation 2.0 - Machine Translations was developed by researchers at the University of Edinburgh, School of Informatics and the University of Zurich, Department of Computational Linguistics. It consists of Spanish, German, Italian and M...
Feb 19, 2026
Cieri, Christopher; Fiumara, James; Walker, Kevin; Ryant, Neville; Liberman, Mark, 2026, "AnnoDIFP CTS Audio and Transcripts", https://hdl.handle.net/11272.1/AB2/CH3X95, Abacus Data Network, V1
Abstract Introduction AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) CTS (Conversational Telephone Speech) Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the Florida Institute of Technology and the University of New Haven to...
Feb 19, 2026
Brandschain, Linda; Walker, Kevin; Graff, David, 2026, "Mixer 7 English Speech", https://hdl.handle.net/11272.1/AB2/CFN2CG, Abacus Data Network, V1
Abstract Introduction Mixer 7 English Speech was developed by the Linguistic Data Consortium (LDC) and contains 12,321 hours of audio recordings of interviews, transcript readings, and conversational telephone speech involving 222 distinct English speakers. This material was coll...
Feb 18, 2026
Graff, David; Chen, Song; Strassel, Stephanie, 2026, "BOLT CTS CALLFRIEND CALLHOME Egyptian Arabic Audio", https://hdl.handle.net/11272.1/AB2/KFINIW, Abacus Data Network, V1
Abstract Introduction BOLT CTS CALLFRIEND CALLHOME Egyptian Arabic Audio was developed by the Linguistic Data Consortium (LDC) and consists of approximately 116 hours of speech from 274 unscripted telephone conversations between native speakers of the Arabic dialect spoken in Egy...
Sep 19, 2025
Chen, Song; Tracey, Jennifer; Bies, Ann; Caruso, Christopher; Strassel, Stephanie, 2025, "KAIROS Schema Learning Complex Event Annotation", https://hdl.handle.net/11272.1/AB2/Y1KPTS, Abacus Data Network, V1
Abstract Introduction KAIROS Schema Learning Complex Event Annotation was developed by the Linguistic Data Consortium (LDC) to support the DARPA KAIROS program. It contains English and Spanish text, audio, video and image data labeled for 93 real-world complex events (CEs) with e...
Aug 19, 2025
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Delgado, Dana; Arrigo, Michael, 2025, "LoReHLT Uzbek Representative Language Pack", https://hdl.handle.net/11272.1/AB2/VM5TBL, Abacus Data Network, V1
Abstract Introduction LoReHLT Uzbek Representative Language Pack consists of Uzbek monolingual text, Uzbek-English parallel text, annotations, audio recordings, supplemental resources and related software tools developed by the Linguistic Data Consortium for LoReHLT, a companion...
Aug 18, 2025
Peng, Weiming; Zhao, Min; He, Jing; Song, Yuchen; Song, Tianbao; Guo, Dongdong; Sun, Jingbo; Zhu, Shuqin; Zhang, Yinbin; Wei, Zuntian; Hu, Jiajia; Song, Jihua; Sui, Zhifang; Wang, Ning, 2025, "Chinese Sentence Pattern Structure Treebank", https://hdl.handle.net/11272.1/AB2/QZUMNU, Abacus Data Network, V1
Abstract Introduction Chinese Sentence Pattern Structure Treebank (the SPS Treebank) was developed at Beijing Normal University and Peking University. It contains 5,016 sentences and 119,627 tokens syntactically annotated following the concept of sentence constituent analysis whi...
Aug 18, 2025
Tracey, Jennifer; Chen, Song; Delgado, Dana; Strassel, Stephanie, 2025, "BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Transcripts and Translations", https://hdl.handle.net/11272.1/AB2/LGXOHL, Abacus Data Network, V1
Abstract Introduction BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Transcripts and Translations was developed by the Linguistic Data Consortium (LDC) and consists of transcripts and their corresponding English translations for 93 hours of conversational telephone speech...
Aug 14, 2025
Arrigo, Michael; Delgado, Dana; Strassel, Stephanie; Graff, David, 2025, "IWSLT 2022-2023 Shared Task Training, Development and Test Set", https://hdl.handle.net/11272.1/AB2/ONUJ54, Abacus Data Network, V1
Abstract Introduction IWSLT 2022 - 2023 Shared Task Training, Development and Test Set was developed by the Linguistic Data Consortium (LDC). It contains 210 hours of Tunisian Arabic conversational telephone speech, transcripts and their English translations covering 175 hours of...
Aug 14, 2025
Cieri, Christopher; Fiumara, James; Walker, Kevin; Liberman, Mark; Ryant, Neville, 2025, "AnnoDIFP Session Audio and Transcripts", https://hdl.handle.net/11272.1/AB2/OGBCJ9, Abacus Data Network, V1
Abstract Introduction AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the Florida Institute of Technology (FIT), and the University of New Haven (UNH) to support algorith...
Aug 14, 2025
Tracey, Jennifer; Graff, David; Chen, Song; Strassel, Stephanie, 2025, "BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio", https://hdl.handle.net/11272.1/AB2/1BGPSO, Abacus Data Network, V1
Abstract Introduction BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio was developed by the Linguistic Data Consortium (LDC) and consists of approximately 93 hours of speech from 236 unscripted telephone conversations between native speakers of the Mandarin Chinese di...
Jul 23, 2025
Kroch, Anthony; Santorini, Beatrice; Taylor, Ann; Diertani, Ariel, 2025, "Penn Parsed Corpora of Historical English Second Release", https://hdl.handle.net/11272.1/AB2/E4NMWX, Abacus Data Network, V1
Abstract Introduction Penn Parsed Corpora of Historical English Second Release was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the Firs...
Jun 9, 2025
Bekkozhanova, Gulnar; Bills, Aric; Chouder, Sarra; Jaralve, Vanessa; Corey, Cassian; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Kazi, Michael; Lam, Julie; Le, Hanh; Malyska, Nicolas; Marcucci, Giorgia; Marvi, Sarah; McConnell, Sara; Melot, Jennifer; Mensch, Alyssa; Morrison, Michelle; Paget, Shelley; Ramizo, Katerina; Richardson, Frederick; Roberts, Annette; Rubino, Carl; Sarseke, Gulnar; Taubayev, Zharas, 2025, "MATERIAL Kazakh-English Language Pack", https://hdl.handle.net/11272.1/AB2/5G61UB, Abacus Data Network, V1
Abstract Introduction MATERIAL Kazakh-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 57 hours of K...
Apr 29, 2025
Greenberg, Craig; Sadjadi, Omid; Graff, David; Walker, Kevin; Jones, Karen; Caruso, Christopher; Strassel, Stephanie; Wright, Jonathan, 2025, "2015 NIST Language Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/TPVLOA, Abacus Data Network, V1
Abstract Introduction 2015 NIST Language Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and the National Institute of Standards and Technology (NIST). It contains the evaluation test set for the 2015 NIST Language Recognition Evaluation, app...
Apr 29, 2025
Chen, Song; Mott, Justin; Strassel, Stephanie, 2025, "DEFT Spanish Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/WMSO8E, Abacus Data Network, V1
Abstract Introduction DEFT Spanish Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 158 Spanish discussion forum and newswire documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filtering of...
Apr 29, 2025
Zhang, Xiao; Zhang, Ling; Dang, Tian; Feng, Yuanzhao; Ji, Yujing; Jiang, Xiaohui; Kang, Zhewen; Lu, Yan; Nie, Wen; Ren, Hanyu; Wang, Canjun; Wang, Jiayi; Wang, Yu; Wu, Chen; Wu, Mei; Xu, Tingting; Yang, Ruhai; Zhao, Kai; Zhao, Ran; Zhou, Quanjie; Zhu, Lei, 2025, "The Xi’an Multi-Language Learner Corpus", https://hdl.handle.net/11272.1/AB2/KEPEYK, Abacus Data Network, V1
Abstract Introduction The Xi’an Multi-Language Learner Corpus was developed by Xi'an International Studies University (XISU). It is comprised of 526 argumentative essays in 15 languages by Chinese L1 university students studying second languages, along with student metadata and w...
Apr 3, 2025
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2025, "LORELEI Hungarian Representative Language Pack", https://hdl.handle.net/11272.1/AB2/6G8DZZ, Abacus Data Network, V1
Abstract Introduction LORELEI Hungarian Representative Language Pack consists of Hungarian monolingual text, Hungarian-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program....
Apr 3, 2025
Vanroy, Bram, 2025, "Abstract Meaning Representation 3.0 - Machine Translations", https://hdl.handle.net/11272.1/AB2/TKRDFD, Abacus Data Network, V1
Abstract Introduction Abstract Meaning Representation 3.0 - Machine Translations was developed by the Center for Computational Linguistics at KU Leuven in the HORIZON2020 project SignON. It is an automatic translation of a subset of sentences from Abstract Meaning Representation...
Apr 3, 2025
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2025, "AIDA Scenario 3 Practice Topic Source Data and Annotation", https://hdl.handle.net/11272.1/AB2/KAFV5Q, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 3 Practice Topic Source Data and Annotation was developed by the Linguistic Data Consortium (LDC) and is comprised of English, Russian and Spanish web documents (text, video, image) and annotations. The DARPA AIDA (Active Interpretation of Disp...
Apr 1, 2025
Linguistic Data Consortium; Appen Pty Ltd., 2025, "ASpIRE Development and Development Test Sets", https://hdl.handle.net/11272.1/AB2/YS9IIX, Abacus Data Network, V1
Abstract Introduction ASpIRE Development and Development Test Sets was developed for the Automatic Speech recognition In Reverberant Environments (ASpIRE) Challenge sponsored by IARPA (the Intelligence Advanced Research Projects Activity). It contains approximately 226 hours of En...
Mar 28, 2025
Asatiani, Sandro; Bills, Aric; Brunckhorst, Rachael; Chouder, Sarra; Corey, Cassian; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Kalkhitashvili, Tamar; Kazi, Michael; Tong, Audrey; Lam, Julie; Le, Hanh; Malyska, Nicolas; Marcucci, Giorgia; Marvi, Sarah; McConnell, Sara; Melot, Jennifer; Mensch, Alyssa; Morrison, Michelle; Paget, Shelley; Richardson, Frederick; Roberts, Annette; Rubino, Carl; Samushia, Lela, 2025, "MATERIAL Georgian-English Language Pack", https://hdl.handle.net/11272.1/AB2/H5DHYO, Abacus Data Network, V1
Abstract Introduction MATERIAL Georgian-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 79 hours of...
Mar 28, 2025
Bills, Aric; Chouder, Sarra; Corey, Cassian; Davoodian, Marjan; Dubinski, Eyal; Ellis, Corinna; Farnam, Reza; Gibby, Paul; Hartwig, Luke; Kalnins, Dagmara; Kazi, Michael; Lam, Julie; Le, Hanh; Malyska, Nicolas; Marvi, Sarah; McConnell, Sara; Melot, Jennifer; Mensch, Alyssa; Moore, Alex; Morrison, Michelle; Paget, Shelley; Richardson, Frederick; Roberts, Annette; Rubino, Carl; Moaddel, Marjan Sadeghi, 2025, "MATERIAL Farsi-English Language Pack", https://hdl.handle.net/11272.1/AB2/WLFTJ6, Abacus Data Network, V1
Abstract Introduction MATERIAL Farsi-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 61 hours of Fa...
Mar 28, 2025
Abdi, Zeinab; Ali, Zahra; Bills, Aric; Bishop, Judith; Boyle, Anne; Chouder, Sarra; Clair, Nathaniel; Conners, Tom; Corey, Cassian; Dubinski, Eyal; Ellis, Corinna; Fernando, Jess; Gibby, Paul; Abdi, Farah H; Hammond, Simon; Hubert, Maxime; Kaiser-Schatzlein, Alice; Kazi, Michael; Lam, Julie; Lazar, Rosie; Le, Hanh; Levot, Michael; Malyska, Nicolas; Melot, Jennifer; Mensch, Alyssa; Omar, Abdulkadir Arale; Paget, Shelley; Richardson, Frederick; Rubino, Carl; Samko, Bern; Sanders, Gregory; Soh, Stephanie; Strahan, Tania E.; Taylor, Jonathan; Thompson, Brian; Tong, Audrey; Tong, Richard; Yelle, Julie; Yu, Jennifer; Zavorin, Ilya, 2025, "MATERIAL Somali-English Language Pack", https://hdl.handle.net/11272.1/AB2/2FKSLF, Abacus Data Network, V1
Abstract Introduction MATERIAL Somali-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 80 hours of S...
Mar 28, 2025
Bills, Aric; Bishop, Judith; Boyle, Anne; Chouder, Sarra; Clair, Nathaniel; Conners, Tom; Corey, Cassian; Cronin, Kristina; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Hammond, Simon; Hidalgo, Guia; Kaiser-Schatzlein, Alice; Kalnins, Dagmara; Kazi, Michael; Lam, Julie; Lazar, Rosie; Le, Hanh; Malyska, Nicolas; Medel, Olivia; Melot, Jennifer; Mensch, Alyssa; Moore, Alex; Morrison, Michelle; Paget, Shelley; Raymer, Alston; Richardson, Fred; Ridgway, Hristina; Roberts, Annette; Rubino, Carl; Saw, Kenneth; Shen, Sinney; Soh, Stephanie; Taylor, Jonathan; Thompson, Brian; Tong, Audrey; Tong, Richard; Williams, Mariana; Yelle, Julie; Yu, Jennifer; Zavora, Yoanna; Zavorin, Ilya, 2025, "MATERIAL Bulgarian-English Language Pack", https://hdl.handle.net/11272.1/AB2/WCU3PV, Abacus Data Network, V1
Abstract Introduction MATERIAL Bulgarian-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 78 hours o...
Feb 3, 2025
Hernández Mena, Carlos Daniel; Örnólfsson, Gunnar Thor; Gudnason, Jon, 2025, "Samrómur Synthetic", https://hdl.handle.net/11272.1/AB2/DZUB82, Abacus Data Network, V1
Abstract Introduction Samrómur Synthetic was developed by the Language and Voice Lab, Reykjavik University and contains 72 hours of Icelandic synthetic speech, transcripts and metadata. Data Source sentences were extracted from the Samrómur platform, comprised of texts and transc...
Feb 3, 2025
Hernández Mena, Carlos Daniel; Simonsen, Annika; Gudnason, Jon, 2025, "Ravnursson Faroese Speech and Transcripts", https://hdl.handle.net/11272.1/AB2/OBXEAK, Abacus Data Network, V1
Abstract Introduction Ravnursson Faroese Speech and Transcripts contains 109 hours of Faroese prompted speech from 433 speakers (249 female, 184 male), corresponding transcripts and speaker metadata. It is an extract from the Basic Language Resource Kit 1.0 (BLARK 1.0) developed...
Feb 3, 2025
Alrashoudi, Norah; AlKhalifa, Hend; Alotaibi, Yousef Ajami, 2025, "L2-KSU Native and Non-Native Arabic Speech", https://hdl.handle.net/11272.1/AB2/N7YZP8, Abacus Data Network, V1
Abstract Introduction L2-KSU Native and Non-Native Arabic Speech was developed by King Saud University (KSU) and contains approximately six hours of Modern Standard Arabic read speech from 80 subjects, along with transcripts and speaker metadata. Data The speech data was collecte...
Feb 3, 2025
Maamouri, Mohamed; Graff, David, 2025, "Iraqi Arabic - English Lexical Database", https://hdl.handle.net/11272.1/AB2/EUPXQD, Abacus Data Network, V1
Abstract Introduction Iraqi Arabic - English Lexical Database was developed by the Linguistic Data Consortium (LDC). It contains six interrelated tables presenting over 67,000 Iraqi Arabic words as orthographic forms in Arabic script and pronunciation forms in International Phone...
Jan 21, 2025
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2025, "LORELEI Yoruba Representative Language Pack", https://hdl.handle.net/11272.1/AB2/ATPB58, Abacus Data Network, V1
Abstract Introduction LORELEI Yoruba Representative Language Pack (LDC2024T10) consists of Yoruba monolingual text, Yoruba-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI progr...
Jan 21, 2025
Hennig, Leonhard; Thomas, Philippe; Möller, Sebastian, 2025, "MultiTACRED", https://hdl.handle.net/11272.1/AB2/GIEQ7J, Abacus Data Network, V1
Abstract Introduction MultiTACRED was developed by the German Research Center for Artificial Intelligence (DFKI) Speech and Language Technology Lab and is a machine translation of TAC Relation Extraction Dataset (LDC2018T24) (TACRED) into twelve languages with projected entity an...
Jan 21, 2025
Das, Debopam; Egg, Markus, 2025, "RST Continuity Corpus", https://hdl.handle.net/11272.1/AB2/YSIB2J, Abacus Data Network, V1
Abstract Introduction RST Continuity Corpus was developed at Åbo Akademi University and Humboldt-Universität zu Berlin and contains annotations for continuity dimensions added to RST Discourse Treebank (LDC2002T07). RST Discourse Treebank is a collection of English news texts fro...
Oct 25, 2024
Larson, Brian N., 2024, "First-Year Law Students' Court Memoranda", https://hdl.handle.net/11272.1/AB2/CC9MT6, Abacus Data Network, V1
Abstract Introduction First-Year Law Students' Court Memoranda consists of 197 English law student writing samples of legal briefs annotated for certain characteristics along with accompanying survey responses by the student writers. The briefs were created in a law school writin...
Oct 25, 2024
Hedström, Staffan; Fong, Judy; Þórhallsdóttir, Ragnheiður; Mollberg, David; Guðmundsson, Smári Freyr; Jónsson, Ólafur Helgi; Þorsteinsdóttir, Sunneva; Magnusdottir, Eydis Huld; Gudnason, Jon, 2024, "Samrómur Queries Icelandic Speech 1.0", https://hdl.handle.net/11272.1/AB2/DGPHQR, Abacus Data Network, V1
Abstract Introduction Samrómur Queries Icelandic Speech 1.0 was developed by the Language and Voice Lab, Reykjavik University in cooperation with Almannarómur, Center for Language Technology. The corpus contains 20 hours of Icelandic prompted queries from 3,809 speakers represent...
Oct 25, 2024
Linguistic Data Consortium; ELDA, 2024, "TRAD Arabic-French Parallel Text -- Newswire", https://hdl.handle.net/11272.1/AB2/48BBWO, Abacus Data Network, V1
Abstract Introduction TRAD Arabic-French Parallel Text -- Newswire was developed by ELDA as part of the PEA-TRAD project. It contains French translations of a subset of approximately 20,000 Arabic words from NIST 2008 Open Machine Translation (OpenMT) Evaluation (LDC2010T21). The...
Oct 25, 2024
Linguistic Data Consortium; ELDA, 2024, "TRAD Chinese-French Parallel Text -- Broadcast News", https://hdl.handle.net/11272.1/AB2/IZFPYW, Abacus Data Network, V1
Abstract Introduction TRAD Chinese-French Parallel Text -- Broadcast News was developed by ELDA as part of the PEA-TRAD project. It contains French translations of a subset of approximately 30,000 Chinese characters from GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3...
Oct 25, 2024
Dipartimento di Informatica of the University of Pisa; ILC-CNR; Institute for Language and Speech Processing; Institute of Informatics at the University of Szeged; Institute of Linguistics at the Hungarian Academy of Sciences; Morphologic Ltd., 2024, "2007 CoNLL Shared Task - Greek, Hungarian & Italian", https://hdl.handle.net/11272.1/AB2/JLYA64, Abacus Data Network, V1
Abstract Introduction 2007 CoNLL Shared Task - Greek, Hungarian & Italian consists of dependency treebanks in three languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Greek, Hu...
Oct 25, 2024
Britt, Erica, 2024, "Vehicle City Voices Corpus – Part I", https://hdl.handle.net/11272.1/AB2/8XVBZS, Abacus Data Network, V1
Abstract Introduction Vehicle City Voices Corpus – Part I was developed at the University of Michigan-Flint, and is an ongoing oral history project and survey of English language variation in Flint, Michigan. It contains approximately 16 hours of speech with corresponding transcr...