Linguistic Data Consortium

1,601 to 1,650 of 1,819 Results

LDC2015T20_File_Manifest.txt
Aug 29, 2020 - ACE 2007 Spanish DevTest - Pilot Evaluation
Plain Text - 190.7 KB - MD5: 4304c24a9a70b2dbca199dcb85f138db
Documentation
File manifest

LDC2017T10.iso
Aug 29, 2020 - Abstract Meaning Representation (AMR) Annotation Release 2.0
Optical Disc Image - 150.5 MB - MD5: 265110a665aaa921442dc26016e2c56b
Data
ISO disc image including all documentation and data

Working_with_ISO_Images.txt
Aug 29, 2020 - Abstract Meaning Representation (AMR) Annotation Release 2.0
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
How to work with ISO disc images

LDC2017T10_File_Manifest.txt
Aug 29, 2020 - Abstract Meaning Representation (AMR) Annotation Release 2.0
Plain Text - 252.7 KB - MD5: e144e332e6dcbb5b806491a5a11a1464
Documentation
File manifest

LDC2019S20_d2.iso
Aug 29, 2020 - 2016 NIST Speaker Recognition Evaluation Test Set
Optical Disc Image - 2.9 GB - MD5: 68882d5a34b2c486a832b93ae5039b53
Data
ISO disc image including all documentation and data: disc 2

LDC2019S20_d3.iso
Aug 29, 2020 - 2016 NIST Speaker Recognition Evaluation Test Set
Optical Disc Image - 2.2 GB - MD5: e38c441c11e6e8f1e49b8be2954470cc
Data
ISO disc image including all documentation and data: disc 3

LDC2019S20_d1.iso
Aug 29, 2020 - 2016 NIST Speaker Recognition Evaluation Test Set
Optical Disc Image - 4.1 GB - MD5: 29caa769b9e1d0e0c42f37e837e60bda
Data
ISO disc image including all documentation and data: disc 1

LDC2019S20_d3_File_Manifest.txt
Aug 29, 2020 - 2016 NIST Speaker Recognition Evaluation Test Set
Plain Text - 265.1 KB - MD5: 18f3a3e8572f651af5b2816d2cc1f5be
Documentation
File manifest for disc 3

LDC2019S20_d1_File_Manifest.txt
Aug 29, 2020 - 2016 NIST Speaker Recognition Evaluation Test Set
Plain Text - 348.8 KB - MD5: f43ce2999843eb839b647665e52c34d5
Documentation
File manifest for disc 1

LDC2019S20_d2_File_Manifest.txt
Aug 29, 2020 - 2016 NIST Speaker Recognition Evaluation Test Set
Plain Text - 290.7 KB - MD5: 162aa0bee328b26fe12936b688f3b670
Documentation
File manifest for disc 2

Working_with_ISO_Images.txt
Aug 29, 2020 - 2016 NIST Speaker Recognition Evaluation Test Set
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images

Working_with_ISO_Images.txt
Aug 29, 2020 - 2015-2016 CoNLL Shared Task
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images

LDC2017T13.iso
Aug 29, 2020 - 2015-2016 CoNLL Shared Task
Optical Disc Image - 498.4 MB - MD5: 5197a26f203ddcc6c729d5315823927b
Data
ISO disc image including all documentation and data

LDC2017T13_File_Manifest.txt
Aug 29, 2020 - 2015-2016 CoNLL Shared Task
Plain Text - 266.8 KB - MD5: eb0be5127a074c62e5403abb91cf5dc7
Documentation
File manifest

LDC2018S06_d2.iso
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Optical Disc Image - 4.1 GB - MD5: da528d4d8f4222f85e57048afe1cde81
Data
Disk 2 - ISO disc image including all documentation and data

LDC2018S06_d1.iso
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Optical Disc Image - 4.1 GB - MD5: 314d97660ac51a3cea23f3503d9b577f
Data
Disk 1 - ISO disc image including all documentation and data

LDC2018S06_d4.iso
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Optical Disc Image - 3.3 GB - MD5: 28a73d2dccbcee5921e56c76c8db00d6
Data
Disk 4 - ISO disc image including all documentation and data

LDC2018S06_d3.iso
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Optical Disc Image - 3.7 GB - MD5: c84481ebed387968fae9af35e2d6cead
Data
Disk 3 - ISO disc image including all documentation and data

Working_with_ISO_Images.txt
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
How to work with ISO disc images

LDC2018S06_d3_File_Manifest.txt
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Plain Text - 674.1 KB - MD5: a8790ce86318c28cdd1c72aba2653341
Documentation
File manifest for disk 3

LDC2018S06_d1_File_Manifest.txt
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Plain Text - 69.2 KB - MD5: eb5b5ea83f35c177566569efbedb435d
Documentation
File manifest for disk 1

LDC2018S06_d4_File_Manifest.txt
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Plain Text - 654.6 KB - MD5: 12ca4d05e0a3fe52a1fcd77954b6ac28
Documentation
File manifest for disk 4

LDC2018S06_d2_File_Manifest.txt
Aug 29, 2020 - 2011 NIST Language Recognition Evaluation Test Set
Plain Text - 10.8 KB - MD5: db05735329fdf1b9d3fe0dc21fe04293
Documentation
File manifest for disk 2

Working_with_ISO_Images.txt
Aug 29, 2020 - 2006 CoNLL Shared Task - Ten Languages
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images

LDC2015T11.iso
Aug 29, 2020 - 2006 CoNLL Shared Task - Ten Languages
Optical Disc Image - 85.6 MB - MD5: 688b69a249912bea886dca37ddd4130a
Data
ISO disc image including all documentation and data

LDC2015T11_File_Manifest.txt
Aug 29, 2020 - 2006 CoNLL Shared Task - Ten Languages
Plain Text - 10.5 KB - MD5: fd3b07224d744a067beb23e2c2f1cdf3
Documentation
File manifest

Working_with_ISO_Images.txt
Aug 29, 2020 - 2006 CoNLL Shared Task - Arabic & Czech
Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea
Documentation
Working with ISO disc images

LDC2015T12.iso
Aug 29, 2020 - 2006 CoNLL Shared Task - Arabic & Czech
Optical Disc Image - 61.8 MB - MD5: 7c44f59b456c8ae015663552228f8186
Data
ISO disc image including all documentation and data

LCD2015T12_File_Manifest.txt
Aug 29, 2020 - 2006 CoNLL Shared Task - Arabic & Czech
Plain Text - 1.8 KB - MD5: 52153ee079c833748806472fd9b01df7
Documentation
File manifest
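
Each file entry above carries an MD5 checksum, which can be compared against a downloaded copy to check that the transfer was complete. The following is a minimal sketch of such a check, not a tool provided by the Linguistic Data Consortium or Dataverse; the function names `md5_of_file` and `matches_listing` and the local file path are assumptions made for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so that
    multi-gigabyte ISO images do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a local file against the MD5 value shown in the listing.
    The expected value is normalised (whitespace trimmed, lower-cased)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, assuming LDC2015T12.iso has been downloaded to the current directory, `matches_listing("LDC2015T12.iso", "7c44f59b456c8ae015663552228f8186")` would be expected to return True for an intact copy. MD5 is suitable for detecting accidental corruption only; it is not a safeguard against deliberate tampering.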

Nautilus Speaker Characterization

Dec 17, 2019

Gallardo, Laura Fernández, 2019, "Nautilus Speaker Characterization", https://hdl.handle.net/11272.1/AB2/JR6VMZ, Abacus Data Network, V1

Nautilus Speaker Characterization was developed at the Technical University of Berlin and is comprised of approximately 155 hours of conversational speech from 300 German speakers aged 18 to 35 years (126 males and 174 females) with no marked dialect or accent, recorded in an aco...

TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017

Nov 15, 2019

Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2019, "TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017", https://hdl.handle.net/11272.1/AB2/KQWRTL, Abacus Data Network, V1

TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017 was developed by the Linguistic Data Consortium (LDC) and contains Chinese, English and Spanish data produced in support of the TAC KBP Cold Start evaluation track conducted from 2012 to 2017. This includes source docum...

DEFT English Committed Belief Annotation

Nov 15, 2019

Tracey, Jennifer; Arrigo, Michael; Strassel, Stephanie, 2019, "DEFT English Committed Belief Annotation", https://hdl.handle.net/11272.1/AB2/WY5NZN, Abacus Data Network, V1

DEFT English Committed Belief Annotation was developed by the Linguistic Data Consortium (LDC) and consists of approximately 950,000 words of English discussion forum text annotated for “committed belief,” which marks the level of commitment displayed by the author to the truth o...

CALLFRIEND American English-Non-Southern Dialect Second Edition

Nov 15, 2019

Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2019, "CALLFRIEND American English-Non-Southern Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/OBLYDI, Abacus Data Network, V1

CALLFRIEND American English-Non-Southern Dialect Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 26 hours of unscripted telephone conversations between native speakers of non-Southern dialects of American English. This second edi...

Polish Speech Database

Oct 15, 2019

Szwelnik, Tomasz; Kawalec, Jacek; Gutowska, Dorota, 2019, "Polish Speech Database", https://hdl.handle.net/11272.1/AB2/GNGZEI, Abacus Data Network, V1

Polish Speech Database was developed by VoiceLab. It consists of 263,424 utterances of Polish speech data from 200 speakers, totaling approximately 280 hours, and corresponding transcripts. Data collection was performed in Poland. Speakers were asked to record themselves for at l...

2016 NIST Speaker Recognition Evaluation Test Set

Oct 15, 2019

Greenberg, Craig; Sadjadi, Omid; Kheyrkhah, Timothee; Jones, Karen; Walker, Kevin; Strassel, Stephanie; Graff, David, 2019, "2016 NIST Speaker Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/WJ2G5L, Abacus Data Network, V1

2016 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains approximately 340 hours of short segments of Tagalog, Cantonese, Cebuano and Mandarin telephone speech us...

BOLT English Treebank - Discussion Forum

Oct 15, 2019

Bies, Ann; Mott, Justin; Warner, Colin; Kulick, Seth, 2019, "BOLT English Treebank - Discussion Forum", https://hdl.handle.net/11272.1/AB2/9OA0DB, Abacus Data Network, V1

BOLT English Treebank - Discussion Forum was developed by the Linguistic Data Consortium (LDC) and consists of English web discussion forum data with part-of-speech and syntactic structure annotations. The DARPA BOLT (Broad Operational Language Translation) program developed mach...

CALLFRIEND Canadian French Second Edition

Sep 16, 2019

Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2019, "CALLFRIEND Canadian French Second Edition", https://hdl.handle.net/11272.1/AB2/PPNHVC, Abacus Data Network, V1

CALLFRIEND Canadian French Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Canadian French. This second edition updates the audio files to wav format, simp...

BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training

Sep 16, 2019

Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2019, "BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training", https://hdl.handle.net/11272.1/AB2/TJO8RI, Abacus Data Network, V1

BOLT Chinese-English Word Alignment and Tagging – SMS/Chat Training was developed by the Linguistic Data Consortium (LDC) and consists of 388,027 words of Chinese and English parallel text enhanced with linguistic tags to indicate word relations. The DARPA BOLT (Broad Operational...

Machine Reading Phase 1 NFL Scoring Training Data

Sep 16, 2019

Simpson, Heather; Strassel, Stephanie; Wright, Jonathan; Griffitt, Kira, 2019, "Machine Reading Phase 1 NFL Scoring Training Data", https://hdl.handle.net/11272.1/AB2/AZSUUC, Abacus Data Network, V1

Machine Reading Phase 1 NFL Scoring Training Data was developed by the Linguistic Data Consortium (LDC) and contains 110 US NFL (National Football League) scoring source documents and 110 standoff annotation files used in the DARPA (Defense Advanced Research Projects Agency) Mach...

Multi-Language Conversational Telephone Speech 2011 -- East Asian

Aug 15, 2019

Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie, 2019, "Multi-Language Conversational Telephone Speech 2011 -- East Asian", https://hdl.handle.net/11272.1/AB2/3MKZES, Abacus Data Network, V1

Multi-Language Conversational Telephone Speech 2011 – East Asian was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 19 hours of telephone speech in two distinct languages of East Asia: Thai and Lao. The data were collected primarily to support...

IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c

Aug 15, 2019

Adams, Nikki; Bills, Aric; Conners, Thomas; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kaiser-Schatzlein, Alice; Kazi, Michael; Malyska, Nicolas; Melot, Jennifer; Onaka, Akiko; Paget, Shelley; Ray, Jessica; Richardson, Fred; Rytting, Anton; Shen, Sinney, 2019, "IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c", https://hdl.handle.net/11272.1/AB2/39RDNJ, Abacus Data Network, V1

IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 207 hours of Igbo conversational and scripted telephone speech collected in 2014 and 2015 along wit...

TAC KBP Evaluation Source Corpora 2016-2017

Aug 15, 2019

Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2019, "TAC KBP Evaluation Source Corpora 2016-2017", https://hdl.handle.net/11272.1/AB2/JDNLHX, Abacus Data Network, V1

TAC KBP Evaluation Source Corpora 2016-2017 was developed by the Linguistic Data Consortium (LDC) and contains the 180,003 Chinese, English and Spanish source documents used in support of all TAC KBP evaluation tracks conducted in 2016 and 2017. Text Analysis Conference (TAC) is...

Corpus of Conversational Persian Transcripts

Aug 15, 2019

Mohammadi, Ariana Negar, 2019, "Corpus of Conversational Persian Transcripts", https://hdl.handle.net/11272.1/AB2/VPL800, Abacus Data Network, V1

Corpus of Conversational Persian Transcripts consists of transcripts from approximately 20 hours of naturally occurring informal conversations in the Tehrani dialect of Iranian Persian. The corresponding speech is not included in this release. Data This corpus is extracted from 1...

First DIHARD Challenge Development - Eight Sources

Jul 19, 2019

Linguistic Data Consortium, 2019, "First DIHARD Challenge Development - Eight Sources", https://hdl.handle.net/11272.1/AB2/XA6BRY, Abacus Data Network, V1

First DIHARD Challenge Development - Eight Sources was developed by the Linguistic Data Consortium (LDC) and contains approximately 17 hours of English and Chinese speech data along with corresponding annotations used in support of the First DIHARD Challenge. The First DIHARD Cha...

First DIHARD Challenge Evaluation - Nine Sources

Jul 15, 2019

Linguistic Data Consortium, 2019, "First DIHARD Challenge Evaluation - Nine Sources", https://hdl.handle.net/11272.1/AB2/HGTUHY, Abacus Data Network, V1

First DIHARD Challenge Evaluation - Nine Sources was developed by the Linguistic Data Consortium (LDC) and contains approximately 18 hours of English and Chinese speech data along with corresponding annotations used in support of the First DIHARD Challenge. The First DIHARD Chall...

Phrase Detectives Corpus Version 2

Jul 15, 2019

Chamberlain, Jon; Paun, Silviu; Yu, Juntao; Kruschwitz, Udo; Poesio, Massimo, 2019, "Phrase Detectives Corpus Version 2", https://hdl.handle.net/11272.1/AB2/6GWBA8, Abacus Data Network, V1

Phrase Detectives Corpus Version 2 was developed by the School of Computer Science and Electronic Engineering at the University of Essex and consists of approximately 407,000 tokens across 537 documents anaphorically-annotated by the Phrase Detectives Game, an online interactive...

The DKU-JNU-EMA Electromagnetic Articulography Database

Jul 15, 2019

Qin, Xiaoyi; Liu, Xinzhong; Cai, Zexin; Li, Ming, 2019, "The DKU-JNU-EMA Electromagnetic Articulography Database", https://hdl.handle.net/11272.1/AB2/D9PQFH, Abacus Data Network, V1

The DKU-JNU-EMA Electromagnetic Articulography Database was developed by Duke Kunshan University and Jinan University and contains approximately 10 hours of articulography and speech data in Mandarin, Cantonese, Hakka, and Teochew Chinese from two to seven native speakers for eac...

First DIHARD Challenge Evaluation - SEEDLingS

Jul 15, 2019

Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2019, "First DIHARD Challenge Evaluation - SEEDLingS", https://hdl.handle.net/11272.1/AB2/XH4KVV, Abacus Data Network, V1

First DIHARD Challenge Evaluation - SEEDLingS was developed by Duke University and the Linguistic Data Consortium (LDC) and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the First DIHARD Challenge. Th...

DEFT Spanish Committed Belief Annotation

Jun 17, 2019

Tracey, Jennifer; Arrigo, Michael; Strassel, Stephanie, 2019, "DEFT Spanish Committed Belief Annotation", https://hdl.handle.net/11272.1/AB2/HWOJGE, Abacus Data Network, V1

DEFT Spanish Committed Belief Annotation was developed by the Linguistic Data Consortium (LDC) and consists of approximately 67,000 tokens of Spanish discussion forum text annotated for "committed belief," which marks the level of commitment displayed by the author to the truth o...

First DIHARD Challenge Development - SEEDLingS

Jun 17, 2019

Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2019, "First DIHARD Challenge Development - SEEDLingS", https://hdl.handle.net/11272.1/AB2/KXC76R, Abacus Data Network, V1

First DIHARD Challenge Development - SEEDLingS was developed by Duke University and the Linguistic Data Consortium (LDC) and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the First DIHARD Challenge. T...
