Switchboard-1 Release 2

Version 1.0

Godfrey, John J.; Holliman, Edward, 2021, "Switchboard-1 Release 2", https://hdl.handle.net/11272.1/AB2/VTPSCK, Abacus Data Network, V1



Dataset Metrics

48 Downloads

Description

Introduction

The Switchboard-1 Telephone Speech Corpus (LDC97S62) consists of approximately 260 hours of speech and was originally collected by Texas Instruments in 1990-1991, under DARPA sponsorship. The first release of the corpus was published by NIST and distributed by the LDC in 1992-1993. Since that release, a number of corrections have been made to the data files as presented on the original CD-ROM set, and all copies of the first pressing have been distributed.

Switchboard is a collection of about 2,400 two-sided telephone conversations among 543 speakers (302 male, 241 female) from all areas of the United States. A computer-driven robot operator system handled the calls: it gave the caller appropriate recorded prompts, selected and dialed another person (the callee) to take part in a conversation, introduced a topic for discussion, and recorded the speech of the two subjects on separate channels until the conversation was finished. About 70 topics were provided, of which about 50 were used frequently. Selection of topics and callees was constrained so that (1) no two speakers would converse together more than once and (2) no one spoke more than once on a given topic.

Data

In this release, assembled and published by the LDC, all known errors affecting the original publication of speech files were corrected. In addition, the contents of the NIST Sphere headers of all speech files have been modified to identify each file as part of the new release and to make the use of the sample_count header field consistent with standard Sphere usage. In particular, the sample_count field should reflect the number of samples on each channel in the file. In the initial release, this field was improperly set to the total number of samples in both channels of the file; this has been corrected in the new release.
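The sample_count correction described above matters whenever durations are derived from the header. The following is a minimal Python sketch, assuming the standard NIST Sphere layout (a "NIST_1A" line, a line giving the header size in bytes, then "name -type value" fields up to "end_head"). It reads the header fields and computes the per-channel duration as sample_count / sample_rate. The function names and the `legacy_total_count` switch are hypothetical and exist only for illustration. For headers that follow the original release's convention, where sample_count held the total across both channels, the sketch divides the count by channel_count first.

```python
from typing import Dict, Union

HeaderValue = Union[int, float, str]


def parse_sphere_header(data: bytes) -> Dict[str, HeaderValue]:
    """Parse NIST Sphere header fields from the leading bytes of a file."""
    lines = data.split(b"\n")
    if lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST Sphere file")
    header_size = int(lines[1].strip())  # total header length in bytes
    fields: Dict[str, HeaderValue] = {}
    for raw in data[:header_size].split(b"\n")[2:]:
        line = raw.decode("ascii", errors="replace").strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        name, kind, value = line.split(None, 2)
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return fields


def read_sphere_header(path: str) -> Dict[str, HeaderValue]:
    """Read only the header of a Sphere file (1024 bytes is typical)."""
    with open(path, "rb") as f:
        head = f.read(1024)
        size = int(head.split(b"\n")[1].strip())
        if size > len(head):
            head += f.read(size - len(head))
    return parse_sphere_header(head)


def duration_seconds(fields: Dict[str, HeaderValue],
                     legacy_total_count: bool = False) -> float:
    """Per-channel duration in seconds: sample_count / sample_rate.

    Release 2 headers store samples per channel. Pass legacy_total_count=True
    for headers that store the total over all channels (the original
    release's convention) so the count is divided by channel_count first.
    """
    samples = int(fields["sample_count"])
    if legacy_total_count:
        samples //= int(fields.get("channel_count", 1))
    return samples / float(fields["sample_rate"])


# Synthetic two-channel header for demonstration; the values are illustrative.
_body = "sample_count -i 80000\nsample_rate -i 8000\nchannel_count -i 2\n"
demo = ("NIST_1A\n   1024\n" + _body + "end_head\n").encode("ascii").ljust(1024)

fields = parse_sphere_header(demo)
print(duration_seconds(fields))                           # 10.0
print(duration_seconds(fields, legacy_total_count=True))  # 5.0
```

Only the header is read, so the sketch does not depend on how the audio samples themselves are encoded; decoding the audio would need a separate Sphere-aware tool.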
Since the 1997 release, the Switchboard transcripts have been carefully revised at the Institute for Signal and Information Processing (ISIP), and additional problems have been discovered and patched. Three speech files that were part of the original release were inadvertently left off the 1997 revision. After corpus users noted some problems in the original speaker attribution table, the LDC audited the problem calls and corrected the attributions. The latest version of the ISIP transcriptions, the ISIP update of the ICSI phonetic transcriptions, and corrected word alignments are all available from ISIP. The LDC makes the transcript summaries available in the online docs folder.

Researchers have used SWB-1 data for various annotation projects, including discourse annotation/speech acts, part-of-speech tagging and parsing, up-to-date orthographic transcriptions, and phonetic transcriptions. A summary document records which files have been used for each of these annotations. In addition to this index of file characteristics, there is also a table detailing speaker attributes. (1997)
Subject	Other
Keyword	Linguistics
Notes	DCMI Type(s): Sound; Application(s): Speaker identification, speech recognition; Language(s): English; Language ID(s): eng. Metadata automatically created from https://catalog.ldc.upenn.edu/LDC97S62 [26 Oct 2021]


Files (9)

File	Type	Size	Downloads	MD5	Description
Working_with_ISO_Images.txt	Documentation (plain text)	1.3 KB	2	4d4231d07ac669e105f71e602457efea	How to work with ISO disc images
LDC97S62_d4_File_Manifest.txt	Documentation (plain text)	9.3 KB	3	8d051fb7e30fc20ca59dd976661ad2be	File manifest for disc 4
LDC97S62_d1_File_Manifest.txt	Documentation (plain text)	12.5 KB	3	0207c919387a0b1cf21d53604df00081	File manifest for disc 1
LDC97S62_d2_File_Manifest.txt	Documentation (plain text)	13.2 KB	3	cab39c34bac11ac6b636cd4e31e9a285	File manifest for disc 2
LDC97S62_d3_File_Manifest.txt	Documentation (plain text)	15.7 KB	3	2cca27707fcf97d9a44083d7dfdae787	File manifest for disc 3
LDC97S62_d4.iso	Data (optical disc image)	2.0 GB	6	72f28f71755ef73d2151a584ffce73ed	ISO disc image including all documentation and data: disc 4
LDC97S62_d2.iso	Data (optical disc image)	4.0 GB	6	05a70c56b7380fc2da9860ea7e6ab823	ISO disc image including all documentation and data: disc 2
LDC97S62_d1.iso	Data (optical disc image)	4.0 GB	16	9244159cd247c31a86f15b7ebac9a8b4	ISO disc image including all documentation and data: disc 1
LDC97S62_d3.iso	Data (optical disc image)	4.0 GB	6	71936549c675ea85aa9a9974a3338f8c	ISO disc image including all documentation and data: disc 3

All files were published on Oct 26, 2021. All files are restricted, and users may not request access to them.
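Each file in the table is listed with an MD5 checksum. For anyone holding authorized copies of these files, here is a minimal Python sketch for checking a local copy against the listed values. The function names are illustrative, and files are matched by base name, which assumes the copies keep their original file names. Reading in 1 MiB chunks keeps memory use flat even for the 4.0 GB ISO images.

```python
import hashlib
import os
from typing import Iterable, Iterator

# MD5 checksums as listed in this dataset's file table, keyed by file name.
EXPECTED_MD5 = {
    "Working_with_ISO_Images.txt": "4d4231d07ac669e105f71e602457efea",
    "LDC97S62_d1_File_Manifest.txt": "0207c919387a0b1cf21d53604df00081",
    "LDC97S62_d2_File_Manifest.txt": "cab39c34bac11ac6b636cd4e31e9a285",
    "LDC97S62_d3_File_Manifest.txt": "2cca27707fcf97d9a44083d7dfdae787",
    "LDC97S62_d4_File_Manifest.txt": "8d051fb7e30fc20ca59dd976661ad2be",
    "LDC97S62_d1.iso": "9244159cd247c31a86f15b7ebac9a8b4",
    "LDC97S62_d2.iso": "05a70c56b7380fc2da9860ea7e6ab823",
    "LDC97S62_d3.iso": "71936549c675ea85aa9a9974a3338f8c",
    "LDC97S62_d4.iso": "72f28f71755ef73d2151a584ffce73ed",
}


def file_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's bytes in chunks so large ISOs are never fully loaded."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            yield chunk


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of the concatenation of the given chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Return True if the file's MD5 matches the entry for its base name."""
    name = os.path.basename(path)
    if name not in EXPECTED_MD5:
        raise KeyError(f"{name} is not listed in the file table")
    return md5_hex(file_chunks(path)) == EXPECTED_MD5[name]
```

For example, `verify("LDC97S62_d1.iso")` returns True only if the local disc 1 image matches the published checksum. MD5 is used here simply because it is the checksum the table provides; it detects accidental corruption but is not a defense against deliberate tampering.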

Citation Metadata

Dataset Persistent ID	hdl:11272.1/AB2/VTPSCK
Publication Date	2021-10-26
Title	Switchboard-1 Release 2
Other ID	Linguistic Data Consortium: LDC97S62; ISBN: 1-58563-121-3; ISLRN: 988-076-156-109-5; DOI: https://doi.org/10.35111/sw3h-rw02
Author	Godfrey, John J.; Holliman, Edward
Contact	Abacus Support
Keyword	Linguistics (Linguistic Data Consortium)
Producer	Linguistic Data Consortium https://www.ldc.upenn.edu/
Distribution Date	1997
Deposit Date	1997
Series	LDC: Linguistic Data Consortium
Data Sources	Telephone conversations

Waiver

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation above, generated by the Dataverse.

No waiver has been selected for this dataset.

Linguistic Data Consortium Data Use Agreement

A. Except to the extent prohibited by any user agreement, the user shall have the right to:

incorporate portions of the LDC (Linguistic Data Consortium) data into its own work products for internal, non-commercial use and not for redistribution,
incorporate small excerpts of text or audio data from the LDC data for display or publication in a scientific or technical context, but only for the purpose of describing the research and related issues, and
publish statistics and other summaries of the LDC data.

B. License

Except as otherwise provided herein, the user shall have no right to copy, redistribute, transmit, publish, sell, transfer, or otherwise use the LDC data for any purpose. The user shall give appropriate attribution to the LDC data in all scholarly or similar publications for which the LDC data or portions thereof have been used.

C. Access to Individual Users

Only individuals who are then-current faculty, students or staff members of LDC Member institutions or consultants or individuals providing services or doing research for Member institutions shall have access to the LDC data.

D. Copyright

The LDC data is protected by copyright as a collective work or compilation under the laws of the United States and other countries. All content, material, and other elements comprising LDC data are also copyrighted works. Users must abide by all additional copyright notices or restrictions contained in the LDC data license agreement supplements.

Guestbook

No guestbook is assigned to this dataset; you will not be prompted to provide any information when downloading files.
