Linguistic Data Consortium

1,801 to 1,819 of 1,819 Results

The New York Times Annotated Corpus

Oct 17, 2008

Sandhaus, Evan, 2008, "The New York Times Annotated Corpus", https://hdl.handle.net/11272.1/AB2/GZC6PL, Abacus Data Network, V1

The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online productio...

STC-TIMIT 1.0

Mar 19, 2008

Morales, Nicholas, 2008, "STC-TIMIT 1.0", https://hdl.handle.net/11272.1/AB2/XWIVLE, Abacus Data Network, V1

This file contains documentation for STC-TIMIT 1.0, Linguistic Data Consortium (LDC) catalog number LDC2008S03 and ISBN 1-58563-468-9. STC-TIMIT 1.0 is a telephone version of the TIMIT Acoustic Phonetic Continuous Speech Corpus, LDC93S1 (TIMIT). TIMIT contains broadband recordings of...

Mandarin Affective Speech

Jul 17, 2007

Yang, Yingchun; Wu, Zhaohui; Wu, Tian; Li, Dongdong, 2007, "Mandarin Affective Speech", https://hdl.handle.net/11272.1/AB2/USGIFG, Abacus Data Network, V1

Mandarin Affective Speech is a database of emotional speech consisting of audio recordings and corresponding transcripts collected in 2005 at the Advance Computing and System Laboratory, College of Computer Science and Technology, Zhejiang University, Hangzhou, People's Republic...

ISI Arabic-English Automatically Extracted Parallel Text

Feb 20, 2007

Munteanu, Dragos Stefan; Marcu, Daniel, 2007, "ISI Arabic-English Automatically Extracted Parallel Text", https://hdl.handle.net/11272.1/AB2/QOOTEO, Abacus Data Network, V1

This distribution contains a corpus of Arabic-English parallel sentences, which were extracted automatically from two monolingual corpora: Arabic Gigaword Second Edition (LDC2006T02) and English Gigaword Second Edition (LDC2005T12). The data was extracted from news articles publi...

Fisher English Training Speech Part 1 Transcripts

Dec 15, 2004

Cieri, Christopher; Graff, David; Kimball, Owen; Miller, Dave; Walker, Kevin, 2004, "Fisher English Training Speech Part 1 Transcripts", https://hdl.handle.net/11272.1/AB2/2NDQPL, Abacus Data Network, V1

Fisher English Training Speech Part 1 Transcripts represents the first half of a collection of conversational telephone speech (CTS) that was created at LDC in 2003. It contains time-aligned transcript data for 5,850 complete conversations, each lasting up to 10 minutes. In addit...

Fisher English Training Speech Part 1 Speech

Dec 15, 2004

Cieri, Christopher; Graff, David; Kimball, Owen; Miller, Dave; Walker, Kevin, 2004, "Fisher English Training Speech Part 1 Speech", https://hdl.handle.net/11272.1/AB2/KST6JM, Abacus Data Network, V1

Fisher English Training Speech Part 1 Speech represents the first half of a collection of conversational telephone speech (CTS) that was created at the LDC during 2003. It contains 5,850 audio files, each one containing a full conversation of up to 10 minutes. Additional informat...

Arabic English Parallel News Part 1

Oct 24, 2004

Several (sic), 2004, "Arabic English Parallel News Part 1", https://hdl.handle.net/11272.1/AB2/AWOGQE, Abacus Data Network, V1

This corpus contains Arabic news stories and their English translations, which LDC collected via Ummah Press Service from January 2001 to September 2004. It totals 8,439 story pairs, 68,685 sentence pairs, 2M Arabic words and 2.5M English words. The corpus is aligned at sentence level....

Arabic News Translation Text Part 1

Sep 23, 2004

Ma, Xiaoyi; Zakhary, Dalal; Bamba, Moussa, 2004, "Arabic News Translation Text Part 1", https://hdl.handle.net/11272.1/AB2/OXMNRV, Abacus Data Network, V1

Arabic News Translation Text Part 1 was produced by the Linguistic Data Consortium (LDC), catalog number LDC2004T17 and ISBN 1-58563-307-0. To support the development of automatic machine translation systems, the LDC was sponsored to solicit English translations for a single set of Ar...

Korean Telephone Conversations Transcripts

May 16, 2003

Ko, Eon-Suk; Han, Na-Rae; Strassel, Stephanie; Martey, Nii, 2003, "Korean Telephone Conversations Transcripts", https://hdl.handle.net/11272.1/AB2/NLHMOC, Abacus Data Network, V1

Korean Telephone Conversations Transcripts was produced by the Linguistic Data Consortium (LDC), catalog number LDC2003T08 and ISBN 1-58563-264-3. The telephone conversations on which these transcripts are based were originally recorded as part of the CALLFRIEND project. The CALLFRIEN...

RST Discourse Treebank

Feb 21, 2002

Carlson, Lynn; Marcu, Daniel; Okurowski, Mary Ellen, 2002, "RST Discourse Treebank", https://hdl.handle.net/11272.1/AB2/T4O5YK, Abacus Data Network, V1

Rhetorical Structure Theory (RST) Discourse Treebank was developed by researchers at the Information Sciences Institute (University of Southern California), the US Department of Defense and the Linguistic Data Consortium (LDC). It consists of 385 Wall Street Journal articles from...

HTIMIT

Jan 1, 1998

Reynolds, Douglas, 1998, "HTIMIT", https://hdl.handle.net/11272.1/AB2/HO3TZV, Abacus Data Network, V1

The HTIMIT corpus is a re-recording of a subset of the TIMIT corpus through different telephone handsets. The aim was to create a corpus for the study of telephone transducer effects on speech which minimized confounding factors, such as variable telephone channels and background...

FFMTIMIT

Jan 1, 1996

Garofolo, John; Lamel, Lori; Fisher, William; Fiscus, Jonathan; Pallett, David; Dahlgren, Nancy; Zue, Victor, 1996, "FFMTIMIT", https://hdl.handle.net/11272.1/AB2/MJ60CA, Abacus Data Network, V1

The FFMTIMIT corpus contains the previously unreleased secondary microphone waveforms for the TIMIT Acoustic-Phonetic Continuous Speech corpus. The primary microphone waveforms, which were recorded using a close-talking noise-cancelling head-mounted Sennheiser microphone (model H...

JEIDA/JCSD-Channel 1 Control Words

Jan 1, 1996

Hamaker, Jonathan; Duncan, Richard; Picone, Joe; Itahashi, Shuichi, 1996, "JEIDA/JCSD-Channel 1 Control Words", https://hdl.handle.net/11272.1/AB2/L4QJD1, Abacus Data Network, V1

The Japanese Electronic Industry Development Association's (JEIDA) Common Speech Data Corpus (JCSD) was prepared by Jonathan Hamaker, Richard J. Duncan and Joe Picone of the Institute for Signal and Information Processing at Mississippi State University.

CALLHOME Spanish Lexicon

Jan 1, 1996

Garrett, Susan; Morton, Tom; McLemore, Cynthia, 1996, "CALLHOME Spanish Lexicon", https://hdl.handle.net/11272.1/AB2/YRJRSK, Abacus Data Network, V1

The CALLHOME Spanish collection includes a lexical component. The CALLHOME Spanish Lexicon consists of 45,582 words and contains separate information fields with phonological, morphological and frequency information for each word. The token coverage by the LDC Spanish lexicon of...

CELEX2

Jan 1, 1996

Baayen, R; Piepenbrock, R; Gulikers, L, 1996, "CELEX2", https://hdl.handle.net/11272.1/AB2/WLSRWH, Abacus Data Network, V1

This corpus contains ASCII versions of the CELEX lexical databases of English (Version 2.5), Dutch (Version 3.1) and German (Version 2.0). CELEX was developed as a joint enterprise of the University of Nijmegen, the Institute for Dutch Lexicology in Leiden, the Max Planck Institu...

CTIMIT

Jan 1, 1996

George, E. Bryan; Brown, Kathy; Birnbaum, Martha; Macon, Michael, 1996, "CTIMIT", https://hdl.handle.net/11272.1/AB2/DPIQCD, Abacus Data Network, V1

The CTIMIT corpus is a cellular-bandwidth adjunct to the TIMIT Acoustic Phonetic Continuous Speech Corpus (NIST Speech Disc CD1-1.1/NTIS Pb91-505065, October 1990). The corpus was contributed by Lockheed-Martin Sanders to the LDC for distribution on CD-ROM media. The CTIMIT read...

NTIMIT

Jan 1, 1993

Fisher, William; Doddington, George; Goudie-Marshall, Kathleen; Jankowski, Charles; Kalyanswamy, Ashok; Basson, Sara; Spitz, Judith, 1993, "NTIMIT", https://hdl.handle.net/11272.1/AB2/AXQJUZ, Abacus Data Network, V1

The NTIMIT corpus was developed by the NYNEX Science and Technology Speech Communication Group to provide a telephone bandwidth adjunct to TIMIT. NTIMIT was collected by transmitting all 6,300 original TIMIT recordings through a telephone handset and over various channels in the...

TIMIT Acoustic-Phonetic Continuous Speech (MS-WAV version)

Jan 1, 1993

Garofolo, John; Lamel, Lori; Fisher, William; Fiscus, Jonathan; Pallett, David; Dahlgren, Nancy; Zue, Victor, 1993, "TIMIT Acoustic-Phonetic Continuous Speech (MS-WAV version)", https://hdl.handle.net/11272.1/AB2/BU0KGP, Abacus Data Network, V1

This version of the TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) has all the waveform files formatted with MS-WAV/RIFF headers to make the corpus accessible to a wider audience. The TIMIT corpus of read speech is designed to provide speech data for acoustic-...

TIMIT Acoustic-Phonetic Continuous Speech Corpus

Jan 1, 1993

Garofolo, John; Lamel, Lori; Fisher, William; Fiscus, Jonathan; Pallett, David; Dahlgren, Nancy; Zue, Victor, 1993, "TIMIT Acoustic-Phonetic Continuous Speech Corpus", https://hdl.handle.net/11272.1/AB2/SWVENO, Abacus Data Network, V1

The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each r...
