Digital Archive of Southern Speech - NLP Version

Version 1.0

Kretzschmar Jr., William; Bounds, Paulina; Hettel, Jacqueline; Coats, Steven; Pederson, Lee; Lena Opas-Hänninen, Lisa; Juuso, Ilkka; Seppänen, Tapio, 2016, "Digital Archive of Southern Speech - NLP Version", https://hdl.handle.net/11272.1/AB2/F4QH6S, Abacus Data Network, V1

Description

Introduction

Digital Archive of Southern Speech - NLP Version (DASS-NLP) was developed by LDC as an alternate version of Digital Archive of Southern Speech (DASS) (LDC2012S03) suitable for natural language processing and human language technology applications. Specifically, the original audio files have been converted to 16 kHz, 16-bit FLAC-compressed WAV, and file names have been normalized to facilitate automatic processing.

DASS was developed by the University of Georgia. It is a subset of the Linguistic Atlas of the Gulf States (LAGS), which is in turn part of the Linguistic Atlas Project (LAP). DASS-NLP contains approximately 366 hours of English speech data from 30 female speakers and 34 male speakers in FLAC-compressed WAV format, along with associated metadata about the speakers and the recordings, and maps in .jpeg format relating to the recording locations.

LAP consists of a set of survey research projects about the words and pronunciation of everyday American English, the largest project of its kind in the United States. Interviews with thousands of native speakers across the country have been carried out since 1929. LAGS surveyed the everyday speech of Georgia, Tennessee, Florida, Alabama, Mississippi, Arkansas, Louisiana, and Texas in a series of 914 audio-taped interviews conducted from 1968 to 1983. Interviews average approximately six hours in length; the systematic LAGS tape archive amounts to 5500 hours of sound recordings. DASS is a collection of 64 interviews from LAGS selected to cover a range of speech across the region and to represent multiple education levels and ethnic backgrounds.

Data

The DASS-NLP speakers' average age is 61 years; there are 30 women and 34 men from the Gulf States region represented in this release. The interviews cover common topics such as family, the weather, household articles and activities, agriculture, and social connections.

The interviews were originally recorded in the field on reel-to-reel audio tape. A digital version of every reel of tape was then made, one .wav file per reel, usually about one hour of sound. Each interview thus consists of a set of 3 to 13 reels, or roughly 3 to 13 interview hours. Personally identifying or sensitive information in the files was replaced with a tone to protect the speakers' privacy and to ensure their ethical treatment.
Subject	Other
Keyword	Linguistics

Files (3)

File	Type	Size	Published	MD5	Description
LCD2016S05_File_Manifest.txt	Documentation/Plain Text	17.7 KB	Aug 30, 2020	ce70855d9aaa0072ca3ae4b5ca437108	File manifest
LDC2016S05.iso	Data/Optical Disc Image	21.9 GB	Aug 30, 2020	2707164d6ff10537aaea318e500f833c	ISO disc image including all documentation and data
Working_with_ISO_Images.txt	Documentation/Plain Text	1.3 KB	Aug 30, 2020	4d4231d07ac669e105f71e602457efea	Working with ISO disc images

Access to all three files is restricted. Users may not request access to files.
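
The MD5 values listed for the three files can be used to check a download for corruption. The following is a minimal sketch, assuming the files have already been obtained under the licence terms and saved under the names shown in this record; the helper names `md5_of_file` and `verify_download` are hypothetical and are not part of any LDC or Abacus tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "LCD2016S05_File_Manifest.txt": "ce70855d9aaa0072ca3ae4b5ca437108",
    "LDC2016S05.iso": "2707164d6ff10537aaea318e500f833c",
    "Working_with_ISO_Images.txt": "4d4231d07ac669e105f71e602457efea",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in 1 MiB chunks so that a large file such as the
    21.9 GB ISO image never has to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory):
    """Compare each listed file in `directory` against its expected MD5.

    Returns a dict mapping file name to True (checksum matches),
    False (checksum differs), or None (file not present).
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = (md5_of_file(path) == expected)
        else:
            results[name] = None
    return results
```

Calling `verify_download("downloads")` on a hypothetical local folder returns one entry per listed file. A `False` entry suggests a truncated or corrupted transfer, and a `None` entry means that file has not been downloaded yet.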

Citation Metadata

Dataset Persistent ID	hdl:11272.1/AB2/F4QH6S
Publication Date	2016-07-15
Citation Date
Title	Digital Archive of Southern Speech - NLP Version
Other ID	Linguistic Data Consortium: LDC2016S05; ISBN: 1-58563-761-0; ISLRN: 920-059-271-034-1
Author	Kretzschmar Jr., William; Bounds, Paulina; Hettel, Jacqueline; Coats, Steven; Pederson, Lee; Lena Opas-Hänninen, Lisa; Juuso, Ilkka; Seppänen, Tapio
Contact	Abacus support
Subject	Other
Keyword	Linguistics (ACV)
Producer	Linguistic Data Consortium (University of Pennsylvania) (LDC) https://www.ldc.upenn.edu/
Production Date	2016-07-15
Production Place	Philadelphia
Distributor	Linguistic Data Consortium (University of Pennsylvania) (LDC) https://www.ldc.upenn.edu/
Deposit Date	2016-09-22
Kind of Data	Linguistic data
Series	LDC: Linguistic Data Consortium
Data Sources	Field recordings

Geospatial Metadata

Geographic Coverage	United States

Social Science and Humanities Metadata

Study Level Error Notes

DCMI type: Sound
Sample type: pcm
Sample rate: 16000
Application: Discourse analysis
Application: Sociolinguistics
Language: English
Language ID: eng

Sponsorship: The Atlas Data contained herein comprises information collected in the period spanning from the 1930s to 2010 and has been compiled from diverse sources, by, and under the direction of, Dr. William A. Kretzschmar, Harry and Jane Wilson Professor in Humanities at the Department of English of The University of Georgia. Compilation and digitalization of this work was funded, in part, by the US National Science Foundation and by the US National Endowment for the Humanities. Additional information about the Atlas Project can be obtained at http://www.lap.uga.edu/.

Previous handle https://hdl.handle.net/11272/NUCKF has been deprecated.

Waiver

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation above, generated by the Dataverse.

No waiver has been selected for this dataset.

Linguistic Data Consortium Data Use Agreement

A. Except as to the extent prohibited by any user agreement, the user shall have the right to

incorporate portions of the LDC (Linguistic Data Consortium) data into its own work products for internal, non-commercial use and not for redistribution,
incorporate small excerpts of text or audio data from the LDC data for display or publication in a scientific or technical context, but only for the purpose of describing the research and related issues, and
publish statistics and other summaries of the LDC data.

B. License

Except as otherwise provided herein, the user shall have no right to copy, redistribute, transmit, publish, sell, transfer, or otherwise use the LDC data for any purpose. The user shall give appropriate attribution to the LDC data in all scholarly or similar publications for which the LDC data or portions thereof have been used.

C. Access to Individual Users

Only individuals who are then-current faculty, students or staff members of LDC Member institutions or consultants or individuals providing services or doing research for Member institutions shall have access to the LDC data.

D. Copyright

The LDC data is protected by copyright as a collective work or compilation under the laws of the United States and other countries. All content, material, and other elements comprising LDC data are also copyrighted works. Users must abide by all additional copyright notices or restrictions contained in the LDC data license agreement supplements.

User License Agreement for Digital Archive of Southern Speech - NLP Version (LDC2016S05)

____________("User"), an organization or person engaging in language education, research and technology development, agrees to use the data designated as Digital Archive of Southern Speech - NLP Version (LDC2016S05) (the "Data") and distributed by the LDC subject to the following understandings, terms and conditions.

Permitted and Prohibited Uses

1.1. The Data may only be used for linguistic education, research and technology development.

1.2. The user shall not publish, retransmit, display, redistribute or reproduce the Data in any form, except that all fair use rights under U.S. copyright law are reserved and User may include limited excerpts from the Data in articles, reports and other documents describing the results of the User's linguistic education, research and technology development.

1.3. User may include analyses based on the Data in their products, including commercial products, except that the outright inclusion of the Data itself, or portions thereof, shall be preceded by execution of a commercial license between, on one side, the American Dialect Society ("ADS") and the University of Georgia Research Foundation ("UGARF") and, on the other, the User intending to make such commercial use.

Copyright Notice and Disclaimer

2.1. The Data is the property of ADS, UGARF and the Board of Regents of the University System of Georgia by and on behalf of the University of Georgia and is protected by all applicable copyright law. In no event shall User publish, retransmit, display, redistribute, or otherwise reproduce any or all of the Data in any format to anyone, except as allowed in Section 1 of this agreement.

2.2. User acknowledges and agrees that Digital Archive of Southern Speech - NLP Version (LDC2016S05) is provided on an "as-is" basis and that LDC and its host institution the University of Pennsylvania make no representations or warranties of merchantability, fitness for a particular purpose, or conformity with whatever documentation is provided. In no event shall LDC or its host institution be liable for special, direct, indirect, consequential, punitive, incidental or other damages, losses, costs, charges, claims, demands, fees or expenses of any nature or kind arising in any way from the furnishing of, or User's use of, Digital Archive of Southern Speech - NLP Version (LDC2016S05).

Terms of Use

Introduction
1. The "Service" means, collectively, all aspects of the Abacus / NESSTAR and associated services and websites.
2. The term "Content" means the data, text, graphics, photos, sounds, music, videos, audiovisual combinations, interactive features, software, scripts, and any other electronic materials you may view on or access through the Service.
Your Acceptance of this Agreement
1. By clicking you agree to the terms and conditions of this Agreement, which supplement the policies, rules and requirements of your institution.
2. If you do not agree to these Terms of Use you must not log in, access, browse or otherwise use the Service. If you have questions or concerns, please contact abacus-support@lists.ubc.ca.
Use of the Service and Content
Use of the Service and Your Content. You may access and use the Content uploaded on the Service strictly in compliance with the copyright terms identified on or associated with such Content.
General Conditions of Use
1. Without limiting the foregoing and the prohibited uses set out in Policy #104, Acceptable Use and Security of UBC Electronic Information and Systems, which is hereby incorporated by reference, the following is not permitted:
  1. using any automated system, including without limitation, "robots," "spiders," or "offline readers," to harvest or scrape information from the Service or any part(s) thereof, or to send more request messages in a given period of time than a human can reasonably produce in the same period by using a conventional on-line web browser; or
  2. in any way intentionally placing undue burden on the technical systems or networks connected to the Service.
2. UBC may suspend your account, or access to the Service, if it learns or is credibly notified (as determined by UBC) that your conduct is in violation of these Terms of Use.
Liability and Indemnity
1. The Service and the Content are provided to you AS IS. You understand that UBC does not endorse any Content submitted to the Service by any user, or any opinion, recommendation, or advice expressed therein, and UBC expressly disclaims any and all liability in connection with Content, including without limitation all direct, indirect, special, incidental or consequential damage or any other damages whatsoever and howsoever caused, arising out of or in connection with the use of the Service or any Content, or in reliance on the Service or the Content.
2. In addition, the Service may contain links to third party websites. UBC has no control over, and assumes no responsibility for, the content, privacy policies, or practices of any third party websites.
3. You agree to indemnify and hold harmless UBC, its Board of Governors, agents, contractors, licensors, and licensees against any and all claims arising from or in any way relating to your use of the Service.
Trademarks
Certain words, phrases, names, designs or logos used on the Site may constitute trademarks, service marks or trade names of UBC or other entities. The display of any such marks or names on the Site does not imply that UBC or other entities have granted a license or authorization of any kind to use such marks or names. You may not use any of UBC's trademarks, service marks or trade names without UBC's prior written permission.
Choice of Law
The laws of the Province of British Columbia and the laws of Canada applicable therein shall govern as to the interpretation, validity and effect of this document, notwithstanding any conflict of laws provisions of your domicile, residence or physical location. You hereby consent and submit to the exclusive jurisdiction of the courts of the Province of British Columbia in any action or proceeding instituted under or related to your use of the Service.

Guestbook

No guestbook is assigned to this dataset; you will not be prompted to provide any information on file download.
