1,801 to 1,819 of 1,819 Results
Oct 17, 2008
Sandhaus, Evan, 2008, "The New York Times Annotated Corpus", https://hdl.handle.net/11272.1/AB2/GZC6PL, Abacus Data Network, V1
The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online productio...
Mar 19, 2008
Morales, Nicholas, 2008, "STC-TIMIT 1.0", https://hdl.handle.net/11272.1/AB2/XWIVLE, Abacus Data Network, V1
This file contains documentation for STC-TIMIT 1.0, Linguistic Data Consortium (LDC) catalog number LDC2008S03 and ISBN 1-58563-468-9. STC-TIMIT 1.0 is a telephone version of TIMIT Acoustic Phonetic Continuous Speech Corpus, LDC93S1 (TIMIT). TIMIT contains broadband recordings of...
Jul 17, 2007
Yang, Yingchun; Wu, Zhaohui; Wu, Tian; Li, Dongdong, 2007, "Mandarin Affective Speech", https://hdl.handle.net/11272.1/AB2/USGIFG, Abacus Data Network, V1
Mandarin Affective Speech is a database of emotional speech consisting of audio recordings and corresponding transcripts collected in 2005 at the Advance Computing and System Laboratory, College of Computer Science and Technology, Zhejiang University, Hangzhou, People's Republic...
Feb 20, 2007
Munteanu, Dragos Stefan; Marcu, Daniel, 2007, "ISI Arabic-English Automatically Extracted Parallel Text", https://hdl.handle.net/11272.1/AB2/QOOTEO, Abacus Data Network, V1
This distribution contains a corpus of Arabic-English parallel sentences, which were extracted automatically from two monolingual corpora: Arabic Gigaword Second Edition (LDC2006T02) and English Gigaword Second Edition (LDC2005T12). The data was extracted from news articles publi...
Dec 15, 2004
Cieri, Christopher; Graff, David; Kimball, Owen; Miller, Dave; Walker, Kevin, 2004, "Fisher English Training Speech Part 1 Transcripts", https://hdl.handle.net/11272.1/AB2/2NDQPL, Abacus Data Network, V1
Fisher English Training Speech Part 1 Transcripts represents the first half of a collection of conversational telephone speech (CTS) that was created at LDC in 2003. It contains time-aligned transcript data for 5,850 complete conversations, each lasting up to 10 minutes. In addit...
Dec 15, 2004
Cieri, Christopher; Graff, David; Kimball, Owen; Miller, Dave; Walker, Kevin, 2004, "Fisher English Training Speech Part 1 Speech", https://hdl.handle.net/11272.1/AB2/KST6JM, Abacus Data Network, V1
Fisher English Training Speech Part 1 Speech represents the first half of a collection of conversational telephone speech (CTS) that was created at the LDC during 2003. It contains 5,850 audio files, each one containing a full conversation of up to 10 minutes. Additional informat...
Oct 24, 2004
Several (sic), 2004, "Arabic English Parallel News Part 1", https://hdl.handle.net/11272.1/AB2/AWOGQE, Abacus Data Network, V1
This corpus contains Arabic news stories and their English translations LDC collected via Ummah Press Service from January 2001 to September 2004. It totals 8,439 story pairs, 68,685 sentence pairs, 2M Arabic words and 2.5M English words. The corpus is aligned at sentence level....
Sep 23, 2004
Ma, Xiaoyi; Zakhary, Dalal; Bamba, Moussa, 2004, "Arabic News Translation Text Part 1", https://hdl.handle.net/11272.1/AB2/OXMNRV, Abacus Data Network, V1
Arabic News Translation Text Part 1 was produced by Linguistic Data Consortium (LDC) catalog number LDC2004T17 and ISBN 1-58563-307-0. To support the development of automatic machine translation systems, the LDC was sponsored to solicit English translations for a single set of Ar...
May 16, 2003
Ko, Eon-Suk; Han, Na-Rae; Strassel, Stephanie; Martey, Nii, 2003, "Korean Telephone Conversations Transcripts", https://hdl.handle.net/11272.1/AB2/NLHMOC, Abacus Data Network, V1
Korean Telephone Conversations Transcripts was produced by Linguistic Data Consortium (LDC) catalog number LDC2003T08 and ISBN 1-58563-264-3. The telephone conversations on which these transcripts are based were originally recorded as part of the CALLFRIEND project. The CALLFRIEN...
Feb 21, 2002
Carlson, Lynn; Marcu, Daniel; Okurowski, Mary Ellen, 2002, "RST Discourse Treebank", https://hdl.handle.net/11272.1/AB2/T4O5YK, Abacus Data Network, V1
Rhetorical Structure Theory (RST) Discourse Treebank was developed by researchers at the Information Sciences Institute (University of Southern California), the US Department of Defense and the Linguistic Data Consortium (LDC). It consists of 385 Wall Street Journal articles from...
Jan 1, 1998
Reynolds, Douglas, 1998, "HTIMIT", https://hdl.handle.net/11272.1/AB2/HO3TZV, Abacus Data Network, V1
The HTIMIT corpus is a re-recording of a subset of the TIMIT corpus through different telephone handsets. The aim was to create a corpus for the study of telephone transducer effects on speech which minimized confounding factors, such as variable telephone channels and background...
Jan 1, 1996
Garofolo, John; Lamel, Lori; Fisher, William; Fiscus, Jonathan; Pallett, David; Dahlgren, Nancy; Zue, Victor, 1996, "FFMTIMIT", https://hdl.handle.net/11272.1/AB2/MJ60CA, Abacus Data Network, V1
The FFMTIMIT corpus contains the previously unreleased secondary microphone waveforms for the TIMIT Acoustic-Phonetic Continuous Speech corpus. The primary microphone waveforms, which were recorded using a close-talking noise-cancelling head-mounted Sennheiser microphone (model H...
Jan 1, 1996
Hamaker, Jonathan; Duncan, Richard; Picone, Joe; Itahashi, Shuichi, 1996, "JEIDA/JCSD-Channel 1 Control Words", https://hdl.handle.net/11272.1/AB2/L4QJD1, Abacus Data Network, V1
The Japanese Electronic Industry Development Association's (JEIDA) Common Speech Data Corpus (JCSD) was prepared by Jonathan Hamaker, Richard J. Duncan and Joe Picone of the Institute for Signal and Information Processing at Mississippi State University.
Jan 1, 1996
Garrett, Susan; Morton, Tom; McLemore, Cynthia, 1996, "CALLHOME Spanish Lexicon", https://hdl.handle.net/11272.1/AB2/YRJRSK, Abacus Data Network, V1
The CALLHOME Spanish collection includes a lexical component. The CALLHOME Spanish Lexicon consists of 45,582 words and contains separate information fields with phonological, morphological and frequency information for each word. The token coverage by the LDC Spanish lexicon of...
Jan 1, 1996
Baayen, R; Piepenbrock, R; Gulikers, L, 1996, "CELEX2", https://hdl.handle.net/11272.1/AB2/WLSRWH, Abacus Data Network, V1
This corpus contains ASCII versions of the CELEX lexical databases of English (Version 2.5), Dutch (Version 3.1) and German (Version 2.0). CELEX was developed as a joint enterprise of the University of Nijmegen, the Institute for Dutch Lexicology in Leiden, the Max Planck Institu...
Jan 1, 1996
George, E. Bryan; Brown, Kathy; Birnbaum, Martha; Macon, Michael, 1996, "CTIMIT", https://hdl.handle.net/11272.1/AB2/DPIQCD, Abacus Data Network, V1
The CTIMIT corpus is a cellular-bandwidth adjunct to the TIMIT Acoustic Phonetic Continuous Speech Corpus (NIST Speech Disc CD1-1.1/NTIS Pb91-505065, October 1990). The corpus was contributed by Lockheed-Martin Sanders to the LDC for distribution on CD-ROM media. The CTIMIT read...
Jan 1, 1993
Fisher, William; Doddington, George; Goudie-Marshall, Kathleen; Jankowski, Charles; Kalyanswamy, Ashok; Basson, Sara; Spitz, Judith, 1993, "NTIMIT", https://hdl.handle.net/11272.1/AB2/AXQJUZ, Abacus Data Network, V1
The NTIMIT corpus was developed by the NYNEX Science and Technology Speech Communication Group to provide a telephone bandwidth adjunct to TIMIT. NTIMIT was collected by transmitting all 6,300 original TIMIT recordings through a telephone handset and over various channels in the...
Jan 1, 1993
Garofolo, John; Lamel, Lori; Fisher, William; Fiscus, Jonathan; Pallett, David; Dahlgren, Nancy; Zue, Victor, 1993, "TIMIT Acoustic-Phonetic Continuous Speech (MS-WAV version)", https://hdl.handle.net/11272.1/AB2/BU0KGP, Abacus Data Network, V1
This version of the TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) has all the waveform files formatted with ms-wav / RIFF headers, to make the corpus more accessible to a wider audience. The TIMIT corpus of read speech is designed to provide speech data for acoustic-...
Jan 1, 1993
Garofolo, John; Lamel, Lori; Fisher, William; Fiscus, Jonathan; Pallett, David; Dahlgren, Nancy; Zue, Victor, 1993, "TIMIT Acoustic-Phonetic Continuous Speech Corpus", https://hdl.handle.net/11272.1/AB2/SWVENO, Abacus Data Network, V1
The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each r...