1 to 50 of 1,855 Results
Aug 19, 2025
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Delgado, Dana; Arrigo, Michael, 2025, "LoReHLT Uzbek Representative Language Pack", https://hdl.handle.net/11272.1/AB2/VM5TBL, Abacus Data Network, V1
Abstract: LoReHLT Uzbek Representative Language Pack consists of Uzbek monolingual text, Uzbek-English parallel text, annotations, audio recordings, supplemental resources and related software tools developed by the Linguistic Data Consortium for LoReHLT, a companion...
Plain Text - 33.9 KB - MD5: 22cae8add1bcb9e81daeba53162d3e2e
Documentation
File manifest for disc 1
Optical Disc Image - 686.4 MB - MD5: dd4b50034c9cc92a62846ee8d8e6d008
Data
ISO disc image containing all documentation and data: disc 1
Plain Text - 2.0 MB - MD5: 083e7f5a9ee8632a5e811d96885af8e3
Documentation
File manifest for disc 2
Optical Disc Image - 681.0 MB - MD5: 56ba066b8af764c241a1efee1cb6443a
Data
ISO disc image containing all documentation and data: disc 2
Plain Text - 119.4 KB - MD5: e0cbcb1a6e63cfddd938244b902e73ae
Documentation
File manifest for disc 3
Optical Disc Image - 555.7 MB - MD5: 24899b86bd1140a1951970e3c5b53034
Data
ISO disc image containing all documentation and data: disc 3
Plain Text - 255 B - MD5: 827743ee6dc05bb7b5755b9c1487f13c
Documentation
File manifest for disc 4
Optical Disc Image - 639.5 MB - MD5: fb1d2f5b04b311f74fa46b634766ce6e
Data
ISO disc image containing all documentation and data: disc 4
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Aug 18, 2025
Peng, Weiming; Zhao, Min; He, Jing; Song, Yuchen; Song, Tianbao; Guo, Dongdong; Sun, Jingbo; Zhu, Shuqin; Zhang, Yinbin; Wei, Zuntian; Hu, Jiajia; Song, Jihua; Sui, Zhifang; Wang, Ning, 2025, "Chinese Sentence Pattern Structure Treebank", https://hdl.handle.net/11272.1/AB2/QZUMNU, Abacus Data Network, V1
Abstract: Chinese Sentence Pattern Structure Treebank (the SPS Treebank) was developed at Beijing Normal University and Peking University. It contains 5,016 sentences and 119,627 tokens syntactically annotated following the concept of sentence constituent analysis whi...
Plain Text - 1.6 KB - MD5: 34fde916fb009bc0080f31950cc6a1ab
Documentation
File manifest
Optical Disc Image - 10.7 MB - MD5: 75ff54d8b4da1d95c5c34ebd93608f85
Data
ISO disc image containing all documentation and data
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Aug 18, 2025
Tracey, Jennifer; Chen, Song; Delgado, Dana; Strassel, Stephanie, 2025, "BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Transcripts and Translations", https://hdl.handle.net/11272.1/AB2/LGXOHL, Abacus Data Network, V1
Abstract: BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Transcripts and Translations was developed by the Linguistic Data Consortium (LDC) and consists of transcripts and their corresponding English translations for 93 hours of conversational telephone speech...
Plain Text - 33.2 KB - MD5: dd94244699a05159b3c99f59b75cfd5f
Documentation
File manifest
Optical Disc Image - 41.4 MB - MD5: e6c63e89b5f7d71916fa325bb65c554d
Data
ISO disc image containing all documentation and data
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Aug 14, 2025
Arrigo, Michael; Delgado, Dana; Strassel, Stephanie; Graff, David, 2025, "IWSLT 2022-2023 Shared Task Training, Development and Test Set", https://hdl.handle.net/11272.1/AB2/ONUJ54, Abacus Data Network, V1
Abstract: IWSLT 2022-2023 Shared Task Training, Development and Test Set was developed by the Linguistic Data Consortium (LDC). It contains 210 hours of Tunisian Arabic conversational telephone speech, transcripts and their English translations covering 175 hours of...
Plain Text - 218.8 KB - MD5: 0d75f120af3b88f1180df5f2dfe6346c
Documentation
File manifest for disc 1
Optical Disc Image - 4.3 GB - MD5: ad7314c945f61b989c11b9ac6697b6ac
Data
ISO disc image containing all documentation and data: disc 1
Plain Text - 204.8 KB - MD5: 4feb59006445543997ee5a3450f62a62
Documentation
File manifest for disc 2
Optical Disc Image - 3.3 GB - MD5: 823f98962b928ebcff977225a0f33fc0
Data
ISO disc image containing all documentation and data: disc 2
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Aug 14, 2025
Cieri, Christopher; Fiumara, James; Walker, Kevin; Liberman, Mark; Ryant, Neville, 2025, "AnnoDIFP Session Audio and Transcripts", https://hdl.handle.net/11272.1/AB2/OGBCJ9, Abacus Data Network, V1
Abstract: AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the Florida Institute of Technology (FIT), and the University of New Haven (UNH) to support algorith...
Plain Text - 159.7 KB - MD5: a0fb25c92d6550897ecf4973c9d2eabb
Documentation
File manifest
Markdown Text - 3.1 KB - MD5: 891064c78a8e46a2f9922b793aafa160
Documentation
Instructions on how to access LDC data via UBC's Teamshare service (Markdown / ASCII text)
Adobe PDF - 31.2 KB - MD5: 2a043207829f9ab259df770590941165
Documentation
Instructions on how to access LDC data via UBC's Teamshare service
Aug 14, 2025
Tracey, Jennifer; Graff, David; Chen, Song; Strassel, Stephanie, 2025, "BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio", https://hdl.handle.net/11272.1/AB2/1BGPSO, Abacus Data Network, V1
Abstract: BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio was developed by the Linguistic Data Consortium (LDC) and consists of approximately 93 hours of speech from 236 unscripted telephone conversations between native speakers of the Mandarin Chinese di...
Plain Text - 9.6 KB - MD5: e0fa130b05b8ef250a2acd001a272d26
Documentation
File manifest
Optical Disc Image - 3.9 GB - MD5: 6504896b88df7c7b8d1eaa09c8761f24
Data
ISO disc image containing all documentation and data
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Jul 23, 2025
Kroch, Anthony; Santorini, Beatrice; Taylor, Ann; Diertani, Ariel, 2025, "Penn Parsed Corpora of Historical English Second Release", https://hdl.handle.net/11272.1/AB2/E4NMWX, Abacus Data Network, V1
Abstract: Penn Parsed Corpora of Historical English Second Release was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the Firs...
Plain Text - 124.3 KB - MD5: df9c3a39a9ea7706a70e8bdacc7874ea
Documentation
File manifest
Optical Disc Image - 232.2 MB - MD5: 73478f7463591442b50fab50e4d79cc6
Data
ISO disc image containing all documentation and data
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Jun 9, 2025
Bekkozhanova, Gulnar; Bills, Aric; Chouder, Sarra; Jaralve, Vanessa; Corey, Cassian; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Kazi, Michael; Lam, Julie; Le, Hanh; Malyska, Nicolas; Marcucci, Giorgia; Marvi, Sarah; McConnell, Sara; Melot, Jennifer; Mensch, Alyssa; Morrison, Michelle; Paget, Shelley; Ramizo, Katerina; Richardson, Frederick; Roberts, Annette; Rubino, Carl; Sarseke, Gulnar; Taubayev, Zharas, 2025, "MATERIAL Kazakh-English Language Pack", https://hdl.handle.net/11272.1/AB2/5G61UB, Abacus Data Network, V1
Abstract: MATERIAL Kazakh-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 57 hours of K...
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Optical Disc Image - 9.0 GB - MD5: 368db6e6280771ca15d57d25f32b7c35
Data
ISO disc image containing all documentation and data
Plain Text - 225.1 KB - MD5: 3b3588fc37a241f870756de3dcc14bcc
Documentation
File manifest
Apr 29, 2025
Greenberg, Craig; Sadjadi, Omid; Graff, David; Walker, Kevin; Jones, Karen; Caruso, Christopher; Strassel, Stephanie; Wright, Jonathan, 2025, "2015 NIST Language Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/TPVLOA, Abacus Data Network, V1
Abstract: 2015 NIST Language Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and the National Institute of Standards and Technology (NIST). It contains the evaluation test set for the 2015 NIST Language Recognition Evaluation, app...
Markdown Text - 3.1 KB - MD5: 891064c78a8e46a2f9922b793aafa160
Documentation
Instructions on how to access LDC data via UBC's Teamshare service (Markdown / ASCII text)
Adobe PDF - 31.2 KB - MD5: 2a043207829f9ab259df770590941165
Documentation
Instructions on how to access LDC data via UBC's Teamshare service
Apr 29, 2025
Chen, Song; Mott, Justin; Strassel, Stephanie, 2025, "DEFT Spanish Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/WMSO8E, Abacus Data Network, V1
Abstract: DEFT Spanish Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 158 Spanish discussion forum and newswire documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filtering of...
Optical Disc Image - 23.2 MB - MD5: 06ea6b3331938ae5191eb765a0a133e1
Data
ISO disc image containing all documentation and data
Plain Text - 26.8 KB - MD5: da4eb003789c09c742dde08c99ac5c28
Documentation
File manifest
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Apr 29, 2025
Zhang, Xiao; Zhang, Ling; Dang, Tian; Feng, Yuanzhao; Ji, Yujing; Jiang, Xiaohui; Kang, Zhewen; Lu, Yan; Nie, Wen; Ren, Hanyu; Wang, Canjun; Wang, Jiayi; Wang, Yu; Wu, Chen; Wu, Mei; Xu, Tingting; Yang, Ruhai; Zhao, Kai; Zhao, Ran; Zhou, Quanjie; Zhu, Lei, 2025, "The Xi’an Multi-Language Learner Corpus", https://hdl.handle.net/11272.1/AB2/KEPEYK, Abacus Data Network, V1
Abstract: The Xi’an Multi-Language Learner Corpus was developed by Xi’an International Studies University (XISU). It is comprised of 526 argumentative essays in 15 languages by Chinese L1 university students studying second languages, along with student metadata and w...
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images
Optical Disc Image - 4.0 MB - MD5: 1a62577f66c1a9312e4d3f0bd98dc9e2
Data
ISO disc image containing all documentation and data