Aug 19, 2025
Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Delgado, Dana; Arrigo, Michael, 2025, "LoReHLT Uzbek Representative Language Pack", https://hdl.handle.net/11272.1/AB2/VM5TBL, Abacus Data Network, V1
Abstract: LoReHLT Uzbek Representative Language Pack consists of Uzbek monolingual text, Uzbek-English parallel text, annotations, audio recordings, supplemental resources and related software tools developed by the Linguistic Data Consortium for LoReHLT, a companion... |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Plain Text - 33.9 KB -
MD5: 22cae8add1bcb9e81daeba53162d3e2e
File manifest for disc 1 |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Optical Disc Image - 686.4 MB -
MD5: dd4b50034c9cc92a62846ee8d8e6d008
ISO disc image containing all documentation and data: disc 1 |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Plain Text - 2.0 MB -
MD5: 083e7f5a9ee8632a5e811d96885af8e3
File manifest for disc 2 |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Optical Disc Image - 681.0 MB -
MD5: 56ba066b8af764c241a1efee1cb6443a
ISO disc image containing all documentation and data: disc 2 |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Plain Text - 119.4 KB -
MD5: e0cbcb1a6e63cfddd938244b902e73ae
File manifest for disc 3 |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Optical Disc Image - 555.7 MB -
MD5: 24899b86bd1140a1951970e3c5b53034
ISO disc image containing all documentation and data: disc 3 |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Plain Text - 255 B -
MD5: 827743ee6dc05bb7b5755b9c1487f13c
File manifest for disc 4 |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Optical Disc Image - 639.5 MB -
MD5: fb1d2f5b04b311f74fa46b634766ce6e
ISO disc image containing all documentation and data: disc 4 |
Aug 19, 2025 -
LoReHLT Uzbek Representative Language Pack
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Aug 18, 2025
Peng, Weiming; Zhao, Min; He, Jing; Song, Yuchen; Song, Tianbao; Guo, Dongdong; Sun, Jingbo; Zhu, Shuqin; Zhang, Yinbin; Wei, Zuntian; Hu, Jiajia; Song, Jihua; Sui, Zhifang; Wang, Ning, 2025, "Chinese Sentence Pattern Structure Treebank", https://hdl.handle.net/11272.1/AB2/QZUMNU, Abacus Data Network, V1
Abstract: Chinese Sentence Pattern Structure Treebank (the SPS Treebank) was developed at Beijing Normal University and Peking University. It contains 5,016 sentences and 119,627 tokens syntactically annotated following the concept of sentence constituent analysis whi... |
Aug 18, 2025 -
Chinese Sentence Pattern Structure Treebank
Plain Text - 1.6 KB -
MD5: 34fde916fb009bc0080f31950cc6a1ab
File manifest |
Aug 18, 2025 -
Chinese Sentence Pattern Structure Treebank
Optical Disc Image - 10.7 MB -
MD5: 75ff54d8b4da1d95c5c34ebd93608f85
ISO disc image containing all documentation and data |
Aug 18, 2025 -
Chinese Sentence Pattern Structure Treebank
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Aug 18, 2025
Tracey, Jennifer; Chen, Song; Delgado, Dana; Strassel, Stephanie, 2025, "BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Transcripts and Translations", https://hdl.handle.net/11272.1/AB2/LGXOHL, Abacus Data Network, V1
Abstract: BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Transcripts and Translations was developed by the Linguistic Data Consortium (LDC) and consists of transcripts and their corresponding English translations for 93 hours of conversational telephone speech... |
Plain Text - 33.2 KB -
MD5: dd94244699a05159b3c99f59b75cfd5f
File manifest |
Optical Disc Image - 41.4 MB -
MD5: e6c63e89b5f7d71916fa325bb65c554d
ISO disc image containing all documentation and data |
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Aug 14, 2025
Arrigo, Michael; Delgado, Dana; Strassel, Stephanie; Graff, David, 2025, "IWSLT 2022-2023 Shared Task Training, Development and Test Set", https://hdl.handle.net/11272.1/AB2/ONUJ54, Abacus Data Network, V1
Abstract: IWSLT 2022 - 2023 Shared Task Training, Development and Test Set was developed by the Linguistic Data Consortium (LDC). It contains 210 hours of Tunisian Arabic conversational telephone speech, transcripts and their English translations covering 175 hours of... |
Plain Text - 218.8 KB -
MD5: 0d75f120af3b88f1180df5f2dfe6346c
File manifest for disc 1 |
Optical Disc Image - 4.3 GB -
MD5: ad7314c945f61b989c11b9ac6697b6ac
ISO disc image containing all documentation and data: disc 1 |
Plain Text - 204.8 KB -
MD5: 4feb59006445543997ee5a3450f62a62
File manifest for disc 2 |
Optical Disc Image - 3.3 GB -
MD5: 823f98962b928ebcff977225a0f33fc0
ISO disc image containing all documentation and data: disc 2 |
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Aug 14, 2025
Cieri, Christopher; Fiumara, James; Walker, Kevin; Liberman, Mark; Ryant, Neville, 2025, "AnnoDIFP Session Audio and Transcripts", https://hdl.handle.net/11272.1/AB2/OGBCJ9, Abacus Data Network, V1
Abstract: AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the Florida Institute of Technology (FIT), and the University of New Haven (UNH) to support algorith... |
Aug 14, 2025 -
AnnoDIFP Session Audio and Transcripts
Plain Text - 159.7 KB -
MD5: a0fb25c92d6550897ecf4973c9d2eabb
File manifest |
Aug 14, 2025 -
AnnoDIFP Session Audio and Transcripts
Markdown Text - 3.1 KB -
MD5: 891064c78a8e46a2f9922b793aafa160
Instructions on how to access LDC data via UBC's Teamshare service (Markdown / ASCII text) |
Aug 14, 2025 -
AnnoDIFP Session Audio and Transcripts
Adobe PDF - 31.2 KB -
MD5: 2a043207829f9ab259df770590941165
Instructions on how to access LDC data via UBC's Teamshare service |
Aug 14, 2025
Tracey, Jennifer; Graff, David; Chen, Song; Strassel, Stephanie, 2025, "BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio", https://hdl.handle.net/11272.1/AB2/1BGPSO, Abacus Data Network, V1
Abstract: BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio was developed by the Linguistic Data Consortium (LDC) and consists of approximately 93 hours of speech from 236 unscripted telephone conversations between native speakers of the Mandarin Chinese di... |
Aug 14, 2025 -
BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio
Plain Text - 9.6 KB -
MD5: e0fa130b05b8ef250a2acd001a272d26
File manifest |
Aug 14, 2025 -
BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio
Optical Disc Image - 3.9 GB -
MD5: 6504896b88df7c7b8d1eaa09c8761f24
ISO disc image containing all documentation and data |
Aug 14, 2025 -
BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Jul 23, 2025
Kroch, Anthony; Santorini, Beatrice; Taylor, Ann; Diertani, Ariel, 2025, "Penn Parsed Corpora of Historical English Second Release", https://hdl.handle.net/11272.1/AB2/E4NMWX, Abacus Data Network, V1
Abstract: Penn Parsed Corpora of Historical English Second Release was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the Firs... |
Jul 23, 2025 -
Penn Parsed Corpora of Historical English Second Release
Plain Text - 124.3 KB -
MD5: df9c3a39a9ea7706a70e8bdacc7874ea
File manifest |
Jul 23, 2025 -
Penn Parsed Corpora of Historical English Second Release
Optical Disc Image - 232.2 MB -
MD5: 73478f7463591442b50fab50e4d79cc6
ISO disc image containing all documentation and data |
Jul 23, 2025 -
Penn Parsed Corpora of Historical English Second Release
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Jun 9, 2025
Bekkozhanova, Gulnar; Bills, Aric; Chouder, Sarra; Jaralve, Vanessa; Corey, Cassian; Dubinski, Eyal; Ellis, Corinna; Gibby, Paul; Kazi, Michael; Lam, Julie; Le, Hanh; Malyska, Nicolas; Marcucci, Giorgia; Marvi, Sarah; McConnell, Sara; Melot, Jennifer; Mensch, Alyssa; Morrison, Michelle; Paget, Shelley; Ramizo, Katerina; Richardson, Frederick; Roberts, Annette; Rubino, Carl; Sarseke, Gulnar; Taubayev, Zharas, 2025, "MATERIAL Kazakh-English Language Pack", https://hdl.handle.net/11272.1/AB2/5G61UB, Abacus Data Network, V1
Abstract: MATERIAL Kazakh-English Language Pack was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 57 hours of K... |
Jun 9, 2025 -
MATERIAL Kazakh-English Language Pack
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Jun 9, 2025 -
MATERIAL Kazakh-English Language Pack
Optical Disc Image - 9.0 GB -
MD5: 368db6e6280771ca15d57d25f32b7c35
ISO disc image containing all documentation and data |
Jun 9, 2025 -
MATERIAL Kazakh-English Language Pack
Plain Text - 225.1 KB -
MD5: 3b3588fc37a241f870756de3dcc14bcc
File manifest |
Apr 29, 2025
Greenberg, Craig; Sadjadi, Omid; Graff, David; Walker, Kevin; Jones, Karen; Caruso, Christopher; Strassel, Stephanie; Wright, Jonathan, 2025, "2015 NIST Language Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/TPVLOA, Abacus Data Network, V1
Abstract: 2015 NIST Language Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and the National Institute of Standards and Technology (NIST). It contains the evaluation test set for the 2015 NIST Language Recognition Evaluation, app... |
Apr 29, 2025 -
2015 NIST Language Recognition Evaluation Test Set
Markdown Text - 3.1 KB -
MD5: 891064c78a8e46a2f9922b793aafa160
Instructions on how to access LDC data via UBC's Teamshare service (Markdown / ASCII text) |
Apr 29, 2025 -
2015 NIST Language Recognition Evaluation Test Set
Adobe PDF - 31.2 KB -
MD5: 2a043207829f9ab259df770590941165
Instructions on how to access LDC data via UBC's Teamshare service |
Apr 29, 2025
Chen, Song; Mott, Justin; Strassel, Stephanie, 2025, "DEFT Spanish Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/WMSO8E, Abacus Data Network, V1
Abstract: DEFT Spanish Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 158 Spanish discussion forum and newswire documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filtering of... |
Apr 29, 2025 -
DEFT Spanish Light and Rich ERE Annotation
Optical Disc Image - 23.2 MB -
MD5: 06ea6b3331938ae5191eb765a0a133e1
ISO disc image containing all documentation and data |
Apr 29, 2025 -
DEFT Spanish Light and Rich ERE Annotation
Plain Text - 26.8 KB -
MD5: da4eb003789c09c742dde08c99ac5c28
File manifest |
Apr 29, 2025 -
DEFT Spanish Light and Rich ERE Annotation
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Apr 29, 2025
Zhang, Xiao; Zhang, Ling; Dang, Tian; Feng, Yuanzhao; Ji, Yujing; Jiang, Xiaohui; Kang, Zhewen; Lu, Yan; Nie, Wen; Ren, Hanyu; Wang, Canjun; Wang, Jiayi; Wang, Yu; Wu, Chen; Wu, Mei; Xu, Tingting; Yang, Ruhai; Zhao, Kai; Zhao, Ran; Zhou, Quanjie; Zhu, Lei, 2025, "The Xi’an Multi-Language Learner Corpus", https://hdl.handle.net/11272.1/AB2/KEPEYK, Abacus Data Network, V1
Abstract: The Xi’an Multi-Language Learner Corpus was developed by Xi'an International Studies University (XISU). It comprises 526 argumentative essays in 15 languages by Chinese L1 university students studying second languages, along with student metadata and w... |
Apr 29, 2025 -
The Xi’an Multi-Language Learner Corpus
Plain Text - 1.3 KB -
MD5: 4d4231d07ac669e105f71e602457efea
Working with ISO disc images |
Apr 29, 2025 -
The Xi’an Multi-Language Learner Corpus
Optical Disc Image - 4.0 MB -
MD5: 1a62577f66c1a9312e4d3f0bd98dc9e2
ISO disc image containing all documentation and data |
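Each file entry above lists an MD5 value, which can be used to confirm that a download arrived intact. The following is a minimal sketch in Python, not part of the listing itself: the helper names `md5_of_file` and `matches_listing` and the example file name are hypothetical, and it assumes the listed MD5 is the hex digest of the file's raw bytes.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte disc images do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a file's digest with the MD5 string copied from the listing.
    Surrounding whitespace and letter case in the copied value are ignored."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example call (hypothetical local file name; MD5 copied from the entry above):
# matches_listing("xian_learner_corpus.iso", "1a62577f66c1a9312e4d3f0bd98dc9e2")
```

A mismatch usually points to an incomplete or corrupted download, so fetching the file again is the first thing to try. MD5 is adequate for detecting accidental corruption but is not a safeguard against deliberate tampering.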