Linguistic Data Consortium

201 to 250 of 1,819 Results

Working_with_ISO_Images.txt Oct 17, 2023 - CALLFRIEND Russian Text Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea Documentation Working with ISO disc images
LDC2023T09.iso Oct 17, 2023 - CALLFRIEND Russian Text Optical Disc Image - 5.0 MB - MD5: b21d44ff5c5033d97ef6bea10bde1fee Data ISO disc image containing all documentation and data
LDC2023T09_File_Manifest.txt Oct 17, 2023 - CALLFRIEND Russian Text Plain Text - 4.2 KB - MD5: 49b4f9795a65804ce4359346a341f00a Documentation File manifest
2019 OpenSAT Public Safety Communications Simulation Oct 17, 2023 Delgado, Dana; Jones, Karen; Walker, Kevin; Strassel, Stephanie; Caruso, Christopher; Graff, David, 2023, "2019 OpenSAT Public Safety Communications Simulation", https://hdl.handle.net/11272.1/AB2/BOXO5O, Abacus Data Network, V1 Abstract Introduction 2019 OpenSAT Public Safety Communications Simulation was developed by the Linguistic Data Consortium (LDC) and contains approximately 141 hours of speech recordings and transcripts used in the National Institute of Standards and Technology (NIST)...
LDCTeamshare.md Oct 17, 2023 - 2019 OpenSAT Public Safety Communications Simulation Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Oct 17, 2023 - 2019 OpenSAT Public Safety Communications Simulation Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service
LDC2023S06_File_Manifest.txt Oct 17, 2023 - 2019 OpenSAT Public Safety Communications Simulation Plain Text - 66.2 KB - MD5: d873c909d2fc85b317f0628e073a5d36 Documentation File manifest
CALLFRIEND Russian Speech Oct 16, 2023 Miller, David; Walker, Kevin; Graff, David; Canavan, Alexandra, 2023, "CALLFRIEND Russian Speech", https://hdl.handle.net/11272.1/AB2/NGRVVO, Abacus Data Network, V1 Abstract Introduction CALLFRIEND Russian Speech (LDC2023S08) was developed by the Linguistic Data Consortium (LDC) and consists of approximately 48 hours of telephone conversations (100 recordings) between native speakers of Russian. The calls were recorded in 1999 as part of the...
Working_with_ISO_Images.txt Oct 16, 2023 - CALLFRIEND Russian Speech Plain Text - 1.3 KB - MD5: 4d4231d07ac669e105f71e602457efea Documentation Working with ISO disc images
LDC2023S08.iso Oct 16, 2023 - CALLFRIEND Russian Speech Optical Disc Image - 2.2 GB - MD5: f2f1d3efb4da5b636930ab6c60bb9644 Data ISO disc image containing all documentation and data
LDC2023S08_File_Manifest.txt Oct 16, 2023 - CALLFRIEND Russian Speech Plain Text - 4.1 KB - MD5: a1275e46f9f823c2ba27996ee26cc83f Documentation File manifest
KAFD: Arabic Font Database Aug 29, 2023 Luqman, Hamzah; Mahmoud, Sabri; Awaida, Sameh, 2016, "KAFD: Arabic Font Database", https://hdl.handle.net/11272.1/AB2/A0JPYM, Abacus Data Network, V2 Introduction KAFD: Arabic Font Database was developed by King Fahd University of Petroleum & Minerals and Qassim University. It is comprised of approximately 2.5 million scanned Arabic printed pages in a variety of fonts, sizes and resolutions along with corresponding transcripts...
LDCTeamshare.md Aug 29, 2023 - KAFD: Arabic Font Database Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
Noisy TIMIT Speech Aug 29, 2023 Abdulaziz, Azhar; Kepuska, Veton, 2017, "Noisy TIMIT Speech", https://hdl.handle.net/11272.1/AB2/FFFXT2, Abacus Data Network, V2 Introduction Noisy TIMIT Speech was developed by the Florida Institute of Technology and contains approximately 322 hours of speech from the TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) modified with different additive noise levels. Only the audio has been modified;...
LDCTeamshare.md Aug 29, 2023 - Noisy TIMIT Speech Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
UCLA High-Speed Laryngeal Video and Audio Aug 29, 2023 Chen, Gang; Neubauer, Juergen; Garellek, Marc; Samlan, Robin; Gerratt, Bruce R.; Kreiman, Jody; Alwan, Abeer, 2017, "UCLA High-Speed Laryngeal Video and Audio", https://hdl.handle.net/11272.1/AB2/OWLHMG, Abacus Data Network, V2 UCLA High-Speed Laryngeal Video and Audio was developed by UCLA Speech Processing and Auditory Perception Laboratory and is comprised of high-speed laryngeal video recordings of the vocal folds and synchronized audio recordings from nine subjects collected between April 2012 and...
LDCTeamshare.md Aug 29, 2023 - UCLA High-Speed Laryngeal Video and Audio Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - UCLA High-Speed Laryngeal Video and Audio Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
CHiME2 WSJ0 Aug 29, 2023 Vincent, Emmanuel; Barker, Jon; Watanabe, Shinji; Le Roux, Jonathan; Nesta, Francesco; Matassoni, Marco, 2017, "CHiME2 WSJ0", https://hdl.handle.net/11272.1/AB2/IUB8PD, Abacus Data Network, V2 CHiME2 WSJ0 was developed as part of The 2nd CHiME Speech Separation and Recognition Challenge and contains approximately 166 hours of English speech from a noisy living room environment. The CHiME Challenges focus on distant-microphone automatic speech recognition (ASR) in real-...
LDCTeamshare.md Aug 29, 2023 - CHiME2 WSJ0 Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - CHiME2 WSJ0 Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
BOLT English Discussion Forums Aug 29, 2023 Tracey, Jennifer; Lee, Haejoong; Strassel, Stephanie, 2017, "BOLT English Discussion Forums", https://hdl.handle.net/11272.1/AB2/VDFID2, Abacus Data Network, V2 BOLT English Discussion Forums was developed by the Linguistic Data Consortium (LDC) and consists of 830,440 discussion forum threads in English harvested from the Internet using a combination of manual and automatic processes. The DARPA BOLT (Broad Operational Language Translati...
LDCTeamshare.md Aug 29, 2023 - BOLT English Discussion Forums Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - BOLT English Discussion Forums Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
BOLT Arabic Discussion Forums Aug 29, 2023 Tracey, Jennifer; Lee, Haejoong; Strassel, Stephanie; Ismael, Safa, 2018, "BOLT Arabic Discussion Forums", https://hdl.handle.net/11272.1/AB2/DP4INP, Abacus Data Network, V2 BOLT Arabic Discussion Forums was developed by the Linguistic Data Consortium (LDC) and consists of 813,080 discussion forum threads in Egyptian Arabic harvested from the Internet using a combination of manual and automatic processes. The DARPA BOLT (Broad Operational Language Tr...
LDCTeamshare.md Aug 29, 2023 - BOLT Arabic Discussion Forums Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - BOLT Arabic Discussion Forums Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
Concretely Annotated New York Times Aug 29, 2023 Ferraro, Francis; Thomas, Max; Wolfe, Travis; R. Gormley, Matthew; Harman, Craig; Van Durme, Benjamin, 2018, "Concretely Annotated New York Times", https://hdl.handle.net/11272.1/AB2/VA98GM, Abacus Data Network, V2 Introduction Concretely Annotated New York Times was developed by Johns Hopkins University’s Human Language Technology Center of Excellence. It adds multiple kinds and instances of automatically-generated syntactic, semantic and coreference annotations to The New York Times Annot...
LDCTeamshare.md Aug 29, 2023 - Concretely Annotated New York Times Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
Concretely Annotated English Gigaword Aug 29, 2023 Ferraro, Francis; Thomas, Max; Gormley, Matthew R.; Wolfe, Travis; Harman, Craig; Van Durme, Benjamin, 2018, "Concretely Annotated English Gigaword", https://hdl.handle.net/11272.1/AB2/NQCDFR, Abacus Data Network, V2 Concretely Annotated English Gigaword was developed by Johns Hopkins University’s Human Language Technology Center of Excellence (JHU). It adds multiple kinds and instances of automatically-generated syntactic, semantic and coreference annotations to English Gigaword Fifth Editio...
LDCTeamshare.md Aug 29, 2023 - Concretely Annotated English Gigaword Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - Concretely Annotated English Gigaword Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
HAVIC MED Progress Test -- Videos, Metadata and Annotation Aug 29, 2023 Morris, Amanda; Strassel, Stephanie; Li, Xuansong; Antonishek, Brian; Fiscus, Jonathan G., 2019, "HAVIC MED Progress Test -- Videos, Metadata and Annotation", https://hdl.handle.net/11272.1/AB2/QYTBMD, Abacus Data Network, V2 HAVIC MED Progress Test – Videos, Metadata and Annotation was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 3,650 hours of user-generated videos with annotation and metadata. To advance multimodal event detection and related technologies, LDC...
LDCTeamshare.md Aug 29, 2023 - HAVIC MED Progress Test -- Videos, Metadata and Annotation Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - HAVIC MED Progress Test -- Videos, Metadata and Annotation Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
2010 NIST Speaker Recognition Evaluation Test Set Aug 29, 2023 Greenberg, Craig; Martin, Alvin; Graff, David; Brandschain, Linda; Walker, Kevin, 2017, "2010 NIST Speaker Recognition Evaluation Test Set", https://hdl.handle.net/11272.1/AB2/2CPM3O, Abacus Data Network, V2 Introduction 2010 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains 2,255 hours of American English telephone speech and speech recorded over a microphone chann...
LDCTeamshare.md Aug 29, 2023 - 2010 NIST Speaker Recognition Evaluation Test Set Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - 2010 NIST Speaker Recognition Evaluation Test Set Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
CHiME3 Aug 29, 2023 Barker, Jon; Marxer, Ricard; Vincent, Emmanuel; Watanabe, Shinji, 2017, "CHiME3", https://hdl.handle.net/11272.1/AB2/HGHM4U, Abacus Data Network, V2 Introduction CHiME3 was developed as part of The 3rd CHiME Speech Separation and Recognition Challenge and contains approximately 342 hours of English speech and transcripts from noisy environments and 50 hours of noisy environment audio. The CHiME Challenges focus on distant-mic...
LDCTeamshare.md Aug 29, 2023 - CHiME3 Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - CHiME3 Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
AISHELL-1 Aug 29, 2023 Bu, Hui, 2018, "AISHELL-1", https://hdl.handle.net/11272.1/AB2/2WMDTT, Abacus Data Network, V2 AISHELL-1 was developed by Beijing Shell Shell Technology Co., Ltd. It contains approximately 520 hours of Chinese Mandarin speech from 400 speakers recorded simultaneously on three different devices with associated transcripts. The goal of the collection was to support speech re...
LDCTeamshare.md Aug 29, 2023 - AISHELL-1 Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - AISHELL-1 Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
Mixer 4 and 5 Speech Aug 29, 2023 Brandschain, Linda; Walker, Kevin; Graff, David; Cieri, Christopher; Neely, Abby; Mirghafori, Nikki; Peskin, Barbara; Godfrey, Jack; Strassel, Stephanie; Goodman, Fred; Doddington, George R.; King, Mike, 2021, "Mixer 4 and 5 Speech", https://hdl.handle.net/11272.1/AB2/LU0TQ8, Abacus Data Network, V2 Abstract Introduction Mixer 4 and 5 Speech was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 14,185 hours of audio recordings of conversational telephone speech, interviews, elicitation exercises and transcript readings involving 616 distinct...
LDCTeamshare.pdf Aug 29, 2023 - Mixer 4 and 5 Speech Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
RATS Speaker Identification Aug 29, 2023 Graff, David; Ma, Xiaoyi; Strassel, Stephanie; Walker, Kevin; Jones, Karen, 2021, "RATS Speaker Identification", https://hdl.handle.net/11272.1/AB2/BZYHPS, Abacus Data Network, V2 Abstract Introduction RATS Speaker Identification was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 1,900 hours of Levantine Arabic, Farsi, Dari, Pashto and Urdu conversational telephone speech with annotations of speech segments. The audio w...
LDCTeamshare.md Aug 29, 2023 - RATS Speaker Identification Plain Text - 3.1 KB - MD5: 1b8a8741370964dcfff1eeec66e4b151 Documentation Instructions on how to access LDC data via UBC's Teamshare service (Markdown/ASCII text)
LDCTeamshare.pdf Aug 29, 2023 - RATS Speaker Identification Adobe PDF - 31.2 KB - MD5: 100c549ff1bb48ed76f05d01f6342eb3 Documentation Instructions on how to access LDC data via UBC's Teamshare service (PDF)
HAVIC MED Training Data -- Videos, Metadata and Annotation Aug 29, 2023 Morris, Amanda; Strassel, Stephanie; Li, Xuansong; Antonishek, Brian; Fiscus, Jonathan G., 2022, "HAVIC MED Training Data -- Videos, Metadata and Annotation", https://hdl.handle.net/11272.1/AB2/TQLGAR, Abacus Data Network, V2 Abstract Introduction HAVIC MED Training Data -- Videos, Metadata and Annotation was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 2,100 hours of user-generated videos with annotation and metadata. To advance multimodal event detection and re...
