1 to 50 of 1,837 Results
Aug 14, 2025
Arrigo, Michael; Delgado, Dana; Strassel, Stephanie; Graff, David, 2025, "IWSLT 2022-2023 Shared Task Training, Development and Test Set", https://hdl.handle.net/11272.1/AB2/ONUJ54, Abacus Data Network, V1
Abstract Introduction IWSLT 2022-2023 Shared Task Training, Development and Test Set was developed by the Linguistic Data Consortium (LDC). It contains 210 hours of Tunisian Arabic conversational telephone speech, transcripts and their English translations covering 175 hours of...
Plain Text - 218.8 KB - MD5: 0d75f120af3b88f1180df5f2dfe6346c
Documentation
File manifest for disc 1
Optical Disc Image - 4.3 GB - MD5: ad7314c945f61b989c11b9ac6697b6ac
Data
ISO disc image containing all documentation and data: disc 1
Plain Text - 204.8 KB - MD5: 4feb59006445543997ee5a3450f62a62
Documentation
File manifest for disc 2
Optical Disc Image - 3.3 GB - MD5: 823f98962b928ebcff977225a0f33fc0
Data
ISO disc image containing all documentation and data: disc 2
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Aug 14, 2025
Cieri, Christopher; Fiumara, James; Walker, Kevin; Liberman, Mark; Ryant, Neville, 2025, "AnnoDIFP Session Audio and Transcripts", https://hdl.handle.net/11272.1/AB2/OGBCJ9, Abacus Data Network, V1
Abstract Introduction AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the Florida Institute of Technology (FIT), and the University of New Haven (UNH) to support algorith...
Plain Text - 159.7 KB - MD5: a0fb25c92d6550897ecf4973c9d2eabb
Documentation
File manifest
Markdown Text - 3.1 KB - MD5: 891064c78a8e46a2f9922b793aafa160
Documentation
Instructions on how to access LDC data via UBC's Teamshare service (Markdown / ASCII text)
Adobe PDF - 31.2 KB - MD5: 2a043207829f9ab259df770590941165
Documentation
Instructions on how to access LDC data via UBC's Teamshare service
Aug 14, 2025
Tracey, Jennifer; Graff, David; Chen, Song; Strassel, Stephanie, 2025, "BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio", https://hdl.handle.net/11272.1/AB2/1BGPSO, Abacus Data Network, V1
Abstract Introduction BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio was developed by the Linguistic Data Consortium (LDC) and consists of approximately 93 hours of speech from 236 unscripted telephone conversations between native speakers of the Mandarin Chinese di...
Plain Text - 9.6 KB - MD5: e0fa130b05b8ef250a2acd001a272d26
Documentation
File manifest
Optical Disc Image - 3.9 GB - MD5: 6504896b88df7c7b8d1eaa09c8761f24
Data
ISO disc image containing all documentation and data
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Jul 23, 2025
Kroch, Anthony; Santorini, Beatrice; Taylor, Ann; Diertani, Ariel, 2025, "Penn Parsed Corpora of Historical English Second Release", https://hdl.handle.net/11272.1/AB2/E4NMWX, Abacus Data Network, V1
Abstract Introduction Penn Parsed Corpora of Historical English Second Release was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the Firs...
Plain Text - 124.3 KB - MD5: df9c3a39a9ea7706a70e8bdacc7874ea
Documentation
File manifest
Optical Disc Image - 232.2 MB - MD5: 73478f7463591442b50fab50e4d79cc6
Data
ISO disc image containing all documentation and data
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Jun 9, 2025
Bekkozhanova, Gulnar; Bills, Aric; Chouder, Sarra; Jaralve, Vanessa; Corey, Cassian; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Kazi, Michael; Lam, Julie; Le, Hanh; Malyska, Nicolas; Marcucci, Giorgia; Marvi, Sarah; McConnell, Sara; Melot, Jennifer; Mensch, Alyssa; Morrison, Michelle; Paget, Shelley; Ramizo, Katerina; Richardson, Frederick; Roberts, Annette; Rubino, Carl; Sarseke, Gulnar; Taubayev, Zharas, 2025, "MATERIAL Kazakh-English Language Pack", https://hdl.handle.net/11272.1/AB2/5G61UB, Abacus Data Network, V1
Abstract Introduction MATERIAL Kazakh-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 57 hours of K...
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Optical Disc Image - 9.0 GB - MD5: 368db6e6280771ca15d57d25f32b7c35
Data
ISO disc image containing all documentation and data
Plain Text - 225.1 KB - MD5: 3b3588fc37a241f870756de3dcc14bcc
Documentation
File manifest
Apr 29, 2025
Greenberg, Craig; Sadjadi, Omid; Graff, David; Walker, Kevin; Jones, Karen; Caruso, Christopher; Strassel, Stephanie; Wright, Jonathan, 2025, "2015 NIST Language Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/TPVLOA, Abacus Data Network, V1
Abstract Introduction 2015 NIST Language Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and the National Institute of Standards and Technology (NIST). It contains the evaluation test set for the 2015 NIST Language Recognition Evaluation, app...
Markdown Text - 3.1 KB - MD5: 891064c78a8e46a2f9922b793aafa160
Documentation
Instructions on how to access LDC data via UBC's Teamshare service (Markdown / ASCII text)
Adobe PDF - 31.2 KB - MD5: 2a043207829f9ab259df770590941165
Documentation
Instructions on how to access LDC data via UBC's Teamshare service
Apr 29, 2025
Chen, Song; Mott, Justin; Strassel, Stephanie, 2025, "DEFT Spanish Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/WMSO8E, Abacus Data Network, V1
Abstract Introduction DEFT Spanish Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 158 Spanish discussion forum and newswire documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filtering of...
Optical Disc Image - 23.2 MB - MD5: 06ea6b3331938ae5191eb765a0a133e1
Data
ISO disc image containing all documentation and data
Plain Text - 26.8 KB - MD5: da4eb003789c09c742dde08c99ac5c28
Documentation
File manifest
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Apr 29, 2025
Zhang, Xiao; Zhang, Ling; Dang, Tian; Feng, Yuanzhao; Ji, Yujing; Jiang, Xiaohui; Kang, Zhewen; Lu, Yan; Nie, Wen; Ren, Hanyu; Wang, Canjun; Wang, Jiayi; Wang, Yu; Wu, Chen; Wu, Mei; Xu, Tingting; Yang, Ruhai; Zhao, Kai; Zhao, Ran; Zhou, Quanjie; Zhu, Lei, 2025, "The Xi’an Multi-Language Learner Corpus", https://hdl.handle.net/11272.1/AB2/KEPEYK, Abacus Data Network, V1
Abstract Introduction The Xi’an Multi-Language Learner Corpus was developed by Xi'an International Studies University (XISU). It comprises 526 argumentative essays in 15 languages by Chinese L1 university students studying second languages, along with student metadata and w...
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Optical Disc Image - 4.0 MB - MD5: 1a62577f66c1a9312e4d3f0bd98dc9e2
Data
ISO disc image containing all documentation and data
Plain Text - 26.1 KB - MD5: d11477f4a0506d9b0434d12ff92e1669
Documentation
File manifest
Apr 3, 2025
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2025, "LORELEI Hungarian Representative Language Pack", https://hdl.handle.net/11272.1/AB2/6G8DZZ, Abacus Data Network, V1
Abstract Introduction LORELEI Hungarian Representative Language Pack consists of Hungarian monolingual text, Hungarian-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program....
Optical Disc Image - 3.3 GB - MD5: 1430ccce8b8fe03ee716ff6dbd5d0d9a
Data
ISO disc image containing all documentation and data: disc 1
Optical Disc Image - 3.9 GB - MD5: b1a1c1e87b1ad8e4cc7e57bc05ccd10b
Data
ISO disc image containing all documentation and data: disc 3
Optical Disc Image - 3.3 GB - MD5: a53f5725ab5bf1f8e69cfd499e53ee9a
Data
ISO disc image containing all documentation and data: disc 4
Optical Disc Image - 4.2 GB - MD5: 557498e2f274d6f53853f434eb9018fe
Data
ISO disc image containing all documentation and data: disc 2
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Plain Text - 9.1 KB - MD5: 5a70d0b26b2ee9a4bf5358a49aab9618
Documentation
File manifest for disc 4
Plain Text - 9.1 KB - MD5: 5a70d0b26b2ee9a4bf5358a49aab9618
Documentation
File manifest for disc 1
Plain Text - 2.7 KB - MD5: ca6e249ba8435f664c237c7de1202f95
Documentation
File manifest for disc 3
Plain Text - 931 B - MD5: 69ed20efabf74ddaa3a60c67b23aa9c2
Documentation
File manifest for disc 2
Apr 3, 2025
Vanroy, Bram, 2025, "Abstract Meaning Representation 3.0 - Machine Translations", https://hdl.handle.net/11272.1/AB2/TKRDFD, Abacus Data Network, V1
Abstract Introduction Abstract Meaning Representation 3.0 - Machine Translations was developed by the Center for Computational Linguistics at KU Leuven in the HORIZON2020 project SignON. It is an automatic translation of a subset of sentences from Abstract Meaning Representation...
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Optical Disc Image - 138.1 MB - MD5: 808c0ea3fd032deddc567b7b5db8ce48
Data
ISO disc image containing all documentation and data
Plain Text - 6.1 KB - MD5: 60655c1a9b133e7fe1e1a6ec07dc3116
Documentation
File manifest
Apr 3, 2025
Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher, 2025, "AIDA Scenario 3 Practice Topic Source Data and Annotation", https://hdl.handle.net/11272.1/AB2/KAFV5Q, Abacus Data Network, V1
Abstract Introduction AIDA Scenario 3 Practice Topic Source Data and Annotation was developed by the Linguistic Data Consortium (LDC) and comprises English, Russian and Spanish web documents (text, video, image) and annotations. The DARPA AIDA (Active Interpretation of Disp...
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Optical Disc Image - 1.4 GB - MD5: f03f3a4db3433e4b7021abe6121eeeee
Data
ISO disc image containing all documentation and data
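Every file in the listings above carries an MD5 checksum. The following is a minimal Python sketch, not a tool supplied by Abacus Data Network or LDC, for confirming that a downloaded file matches its listed checksum. It assumes the file is already on local disk; the function names `file_md5` and `md5_matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a local file.

    The file is read in fixed-size chunks (1 MiB by default), so
    multi-gigabyte ISO images never need to fit in memory at once.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected: str) -> bool:
    """Compare a file's MD5 against a checksum copied from a listing.

    The expected value is stripped and lower-cased so that stray
    whitespace or upper-case hex from copy/paste does not cause a
    false mismatch.
    """
    return file_md5(path) == expected.strip().lower()
```

For example, `md5_matches("disc1.iso", "ad7314c945f61b989c11b9ac6697b6ac")` would check disc 1 of the IWSLT 2022-2023 set, where `disc1.iso` is a hypothetical local filename. A mismatch usually indicates an incomplete or corrupted download.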