Abacus Data Network

Publication Year: 2022

51 to 100 of 122 Results

Ethiopia 1:50,000 Scale Topographic Maps Aug 24, 2022 - UBC Library licensed data EastView Geospatial Inc., 2022, "Ethiopia 1:50,000 Scale Topographic Maps", https://hdl.handle.net/11272.1/AB2/MQR7AY, Abacus Data Network, V2 Georeferenced topographic maps of selected regions of Ethiopia. Maps are in GeoTIFF raster format.
Survey on Health Care Workers' Experiences During the Pandemic (SHCWEP), 2021 Aug 23, 2022 - Statistics Canada Open License Statistics Canada, 2022, "Survey on Health Care Workers' Experiences During the Pandemic (SHCWEP), 2021", https://hdl.handle.net/11272.1/AB2/IFL0RX, Abacus Data Network, V1, UNF:6:rjyrygwtTrjPvrtEJf6pYw== [fileUNF] This public use microdata file includes information on the impacts of COVID-19 on Canadian health care workers, with particular focus on job type and setting, personal protective equipment (PPE) and infection prevention and control (IPC) practices and protocols, and the impacts o...
American English Nickname Collection Aug 9, 2022 - Linguistic Data Consortium Carvalho, Vitor R.; Kiran, Yigit; Borthwick, Andrew, 2022, "American English Nickname Collection", https://hdl.handle.net/11272.1/AB2/JR1WG6, Abacus Data Network, V1 Abstract Introduction American English Nickname Collection was developed by Intelius, Inc. and is a compilation of American English nicknames to given name mappings based on information in US government records, public web profiles and financial and property reports. This corpus...
Qatari Corpus of Argumentative Writing Aug 9, 2022 - Linguistic Data Consortium Ahmed, Abdelhamid M.; Myhill, Debra; Abdollahzadeh, Esmaeel; McCallum, Lee; Zaghouani, Wajdi; Rezk, Lameya; Jrad, Anissa; Zhang, Xiao, 2022, "Qatari Corpus of Argumentative Writing", https://hdl.handle.net/11272.1/AB2/F2P2EY, Abacus Data Network, V1 Abstract Introduction Qatari Corpus of Argumentative Writing was developed by Qatar University, University of Exeter and Hamad Bin Khalifa University and is comprised of approximately 200,000 tokens of Arabic and English writing by undergraduate students (159 female, 36 male) alo...
Survey on Early Learning and Child Care Arrangements, 2019 Jul 15, 2022 - Statistics Canada Open License Statistics Canada, 2022, "Survey on Early Learning and Child Care Arrangements, 2019", https://hdl.handle.net/11272.1/AB2/YAGGCP, Abacus Data Network, V1, UNF:6:C82FnsE4W7hBbvDkSZeb2w== [fileUNF] Statistics Canada is gathering information from families who use child care as well as those who do not. The survey, which addresses child care in Canada for children younger than 6 years old, asks about the different types of early learning and child care arrangements that famil...
[Custom tabulation: Sex and selected socio-demographic characteristics of population 15 years and over in private household for British Columbia, Chilliwack CSD and the custom area, 2001 and 2016 Censuses of Canada, 25% Sample Data] Jul 13, 2022 - Statistics Canada Open License Statistics Canada, 2022, "[Custom tabulation: Sex and selected socio-demographic characteristics of population 15 years and over in private household for British Columbia, Chilliwack CSD and the custom area, 2001 and 2016 Censuses of Canada, 25% Sample Data]", https://hdl.handle.net/11272.1/AB2/JMIQZV, Abacus Data Network, V1 These tables include information on sex and selected socio-demographic characteristics of the population 15 years and over in private households for British Columbia, Chilliwack CSD and the custom area.
Second DIHARD Challenge Evaluation - Eleven Sources Jul 7, 2022 - Linguistic Data Consortium Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2022, "Second DIHARD Challenge Evaluation - Eleven Sources", https://hdl.handle.net/11272.1/AB2/ML7KD5, Abacus Data Network, V1 Abstract Introduction Second DIHARD Challenge Evaluation - Eleven Sources was developed by the Linguistic Data Consortium (LDC) and contains approximately 20 hours of English and Chinese speech data along with corresponding annotations used in support of the Second DIHARD Challen...
NUBUC Jul 7, 2022 - Linguistic Data Consortium Lewis, Gwyneth; van Rijn, Pol; Gwilliams, Laura; Larrouy-Maestri, Pauline; Poeppel, David; Ghitza, Oded, 2022, "NUBUC", https://hdl.handle.net/11272.1/AB2/IUFKIG, Abacus Data Network, V1 Abstract Introduction NUBUC (NyU-BU contextually controlled stories Corpus) was developed by New York University, Max Planck Institute for Empirical Aesthetics and Boston University. It contains approximately three hours of English read speech from eight stories focused on lingui...
Canadian Alcohol and Drugs Survey, 2019 Jul 6, 2022 - Statistics Canada Open License Statistics Canada, 2022, "Canadian Alcohol and Drugs Survey, 2019", https://hdl.handle.net/11272.1/AB2/7KS8TV, Abacus Data Network, V2, UNF:6:c62qJ6h/bvWispQtQVUHZw== [fileUNF] The main objective of this survey is to collect information on Canadians’ use of alcohol and drugs. Health Canada and other organizations will use the information to monitor changes in alcohol and drug use. The other objectives of the Canadian Alcohol and Drugs Survey (CADS) are...
Postal Codes by Federal Ridings File (PCFRF) 2013 Representation Order, June 2022 Postal Codes, 2022 Jun 30, 2022 - Statistics Canada - DLI Statistics Canada, 2022, "Postal Codes by Federal Ridings File (PCFRF) 2013 Representation Order, June 2022 Postal Codes, 2022", https://hdl.handle.net/11272.1/AB2/JOVEPQ, Abacus Data Network, V1 The Postal Code Project is responsible for linking the approximately 900,000 single postal codes in Canada to Statistics Canada’s Census dissemination geography, (presently 2021 Census geography). This process is performed by using data provided by Canada Post Corporation and lin...
Postal Code Conversion File, June 2022 Postal Codes, 2022 Jun 30, 2022 - Statistics Canada - DLI Statistics Canada, 2022, "Postal Code Conversion File, June 2022 Postal Codes, 2022", https://hdl.handle.net/11272.1/AB2/7DCHD8, Abacus Data Network, V1 The Postal Code Project is responsible for linking the approximately 900,000 single postal codes in Canada to Statistics Canada’s Census dissemination geography, (presently 2021 Census geography). This process is performed by using data provided by Canada Post Corporation and lin...
Postal Code Conversion File Plus (PCCF+) Version 7E, November 2021 Postal Codes Jun 24, 2022 - Statistics Canada - DLI Statistics Canada, 2022, "Postal Code Conversion File Plus (PCCF+) Version 7E, November 2021 Postal Codes", https://hdl.handle.net/11272.1/AB2/D1AO5H, Abacus Data Network, V1 The Postal Code Conversion File Plus (PCCF+) is a SAS control program and set of associated datasets derived from the Postal Code Conversion File (PCCF), a 2016 postal code population weight file, the Geographic Attribute File, Health Region boundary files, and other supplementar...
Canadian Legal Problems Survey, 2021 Jun 16, 2022 - Statistics Canada Open License Statistics Canada, 2022, "Canadian Legal Problems Survey, 2021", https://hdl.handle.net/11272.1/AB2/BZZXO3, Abacus Data Network, V2, UNF:6:4MsCj17XMOXfCtv4Ma3y3w== [fileUNF] The purpose of the Canadian Legal Problems Survey (CLPS) is to identify the kinds of serious problems people face, how they attempt to resolve them, and how these experiences may impact their lives. The information collected will be used to better understand the various methods p...
LORELEI Wolof Representative Language Pack Jun 10, 2022 - Linguistic Data Consortium Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Wolof Representative Language Pack", https://hdl.handle.net/11272.1/AB2/1M9HI6, Abacus Data Network, V1 Abstract Introduction LORELEI Wolof Representative Language Pack consists of Wolof monolingual text, Wolof-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LORELEI...
Survey on Early Learning and Child Care Arrangements, 2020 May 19, 2022 - Statistics Canada Open License Statistics Canada, 2022, "Survey on Early Learning and Child Care Arrangements, 2020", https://hdl.handle.net/11272.1/AB2/RRSNDB, Abacus Data Network, V1, UNF:6:G16e9NSu5tgMCWHuMHfoMg== [fileUNF] This survey is designed to collect information on the use (or non-use) of early learning and child care in Canada for children younger than 6 years old. It asks whether or not a particular child under the age of 6 in the family is enrolled in early learning and child care, and co...
[Homicide Survey 1960 - 2020. Custom Tabulation] May 18, 2022 - Statistics Canada Open License Statistics Canada, 2022, "[Homicide Survey 1960 - 2020. Custom Tabulation]", https://hdl.handle.net/11272.1/AB2/CWQQJF, Abacus Data Network, V1 Statistics Canada custom tabulation from the Homicide Survey, 1960-2020. This survey collects detailed data on homicide in Canada. The survey has collected police-reported data on the characteristics of all murder incidents, victims and accused persons / chargeable suspects since...
Postal Codes by Federal Ridings File (PCFRF) 2013 Representation Order, March 2022 Postal Codes, 2022 May 5, 2022 - Statistics Canada - DLI Statistics Canada, 2022, "Postal Codes by Federal Ridings File (PCFRF) 2013 Representation Order, March 2022 Postal Codes, 2022", https://hdl.handle.net/11272.1/AB2/ZICIEN, Abacus Data Network, V2 The Postal Code Project is responsible for linking the approximately 900,000 single postal codes in Canada to Statistics Canada’s Census dissemination geography, (presently 2021 Census geography). This process is performed by using data provided by Canada Post Corporation and lin...
[Census Families in private households by selected sociodemographic characteristics and Census Family structure for Montréal and Québec Census Metropolitan Areas (CMA) and their Census Tracts (CT): 2016, 2011 and 2006] Apr 13, 2022 - Statistics Canada Open License Statistics Canada, 2022, "[Census Families in private households by selected sociodemographic characteristics and Census Family structure for Montréal and Québec Census Metropolitan Areas (CMA) and their Census Tracts (CT): 2016, 2011 and 2006]", https://hdl.handle.net/11272.1/AB2/Z8V2YC, Abacus Data Network, V1 Custom tabulation requested by the Université de Montréal, released 13 April 2022. Families that immigrated with children 0-17 years old Parents without diploma or certificate among parents of children 0-17 years old Unemployment rate of parents with children 0-17 years old Famil...
AttImam Mar 31, 2022 - Linguistic Data Consortium Alsaif, Amal; Alyahya, Tasniem; Alotibi, Madawi; Almuzaini, Huda; Alqahtani, Abeer, 2022, "AttImam", https://hdl.handle.net/11272.1/AB2/9FBCBG, Abacus Data Network, V1 Abstract Introduction AttImam was developed by Al-Imam Mohammad Ibn Saud Islamic University and consists of approximately 2,000 attribution relations applied to Arabic newswire text from Arabic Treebank: Part 1 v 4.1 (LDC2010T13). Attribution refers to the process of reporting or...
IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 Mar 18, 2022 - Linguistic Data Consortium Andrus, Tony; Bills, Aric; Corris, Miriam; Dubinski, Eyal; Fiscus, Jonathan G.; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Le, Hanh; Ray, Jessica; Rytting, Anton; Silber, Ronnie; Shen, Wade; Tzoukermann, Evelyne, 2022, "IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7", https://hdl.handle.net/11272.1/AB2/WJGWAP, Abacus Data Network, V1 Abstract Introduction IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 201 hours of Vietnamese conversational and scripted telephone speech co...
IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b Mar 18, 2022 - Linguistic Data Consortium Bills, Aric; Conners, Thomas; Corris, Miriam; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Malyska, Nicolas; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Zawaydeh, Bushra, 2022, "IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b", https://hdl.handle.net/11272.1/AB2/HSAU9N, Abacus Data Network, V1 Abstract Introduction IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Dholuo conversational and scripted telephone speech collected...
IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b Mar 18, 2022 - Linguistic Data Consortium Andresen, Lucy; Bills, Aric; Conners, Thomas; Cruz, Luanne Dela; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Le, Hanh; Maurillo, Arlene; Melot, Jennifer; Phillips, Josh; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2022, "IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b", https://hdl.handle.net/11272.1/AB2/3EYPZM, Abacus Data Network, V1 Abstract Introduction IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 191 hours of Cebuano conversational and scripted telephone speech collect...
The Child Subglottal Resonances Database Mar 18, 2022 - Linguistic Data Consortium Lulich, Steven M.; Alwan, Abeer; Sommers, Mitchell S.; Yeung, Gary, 2022, "The Child Subglottal Resonances Database", https://hdl.handle.net/11272.1/AB2/O4SRBR, Abacus Data Network, V1 Abstract Introduction The Child Subglottal Resonances Database was developed by Washington University and University of California Los Angeles and consists of 15.5 hours of simultaneous microphone and subglottal accelerometer recordings of 19 male and 9 female child speakers of A...
The SSNCE Database of Tamil Dysarthric Speech Mar 18, 2022 - Linguistic Data Consortium Vijayalakshmi, P.; Celin, T. A. Mariya; Nagarajan, T., 2022, "The SSNCE Database of Tamil Dysarthric Speech", https://hdl.handle.net/11272.1/AB2/QXP9LM, Abacus Data Network, V1 Abstract Introduction The SSNCE Database of Tamil Dysarthric Speech was developed by the Speech Lab, SSN College of Engineering, India, in collaboration with the Indian National Institute of Empowerment of Persons with Multiple Disabilities (NIEPMD) and contains approximately eig...
LORELEI Ukrainian Representative Language Pack Mar 18, 2022 - Linguistic Data Consortium Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Ma, Xiaoyi; Kulick, Seth; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Ukrainian Representative Language Pack", https://hdl.handle.net/11272.1/AB2/GUYCZL, Abacus Data Network, V1 Abstract Introduction LORELEI Ukrainian Representative Language Pack consists of Ukrainian monolingual text, Ukrainian-English parallel and comparable text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LO...
LORELEI Tigrinya Incident Language Pack Mar 18, 2022 - Linguistic Data Consortium Tracey, Jennifer; Graff, David; Strassel, Stephanie; Arrigo, Michael; Wright, Jonathan; Bies, Ann, 2022, "LORELEI Tigrinya Incident Language Pack", https://hdl.handle.net/11272.1/AB2/CTYB7Q, Abacus Data Network, V1 Abstract Introduction LORELEI Tigrinya Incident Language Pack was developed by the Linguistic Data Consortium and is comprised of approximately 4.5 million words of Tigrinya monolingual text, 25,000 words of English monolingual text, 235,000 words of parallel and comparable Tigri...
BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech Mar 18, 2022 - Linguistic Data Consortium Palmer, Martha; Hwang, Jena D.; Bonial, Claire; O'Gorman, Tim; Gung, James; Stowe, Kevin; Green, Meredith, 2022, "BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/QABG8N, Abacus Data Network, V1 Abstract Introduction BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and consists of propbank and verb sense disambiguat...
BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech Mar 18, 2022 - Linguistic Data Consortium Agarwal, Nitin; Franchini, Michelle; Kappler, Michelle; Micciulla, Linnea; Pradhan, Sameer; Ramshaw, Lance, 2022, "BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/3JEVXI, Abacus Data Network, V1 Abstract Introduction BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies and consists of co-reference annotation on English discussion forum (DF), SMS/Chat and conversational telephone speech (CT...
DEFT Chinese Light and Rich ERE Annotation Mar 18, 2022 - Linguistic Data Consortium Chen, Song; Strassel, Stephanie; Mott, Justin, 2022, "DEFT Chinese Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/MUVS7U, Abacus Data Network, V1 Abstract Introduction DEFT Chinese Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 157 Chinese discussion forum documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filtering of Text (DEFT)...
TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017 Mar 18, 2022 - Linguistic Data Consortium Ellis, Joe; Getman, Jeremy; Chen, Song; Strassel, Stephanie, 2022, "TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017", https://hdl.handle.net/11272.1/AB2/KSIXIZ, Abacus Data Network, V1 Abstract Introduction TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the 2016 TAC KBP Event Argument Linking Pilot and Evaluation...
LORELEI Vietnamese Representative Language Pack Mar 18, 2022 - Linguistic Data Consortium Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Vietnamese Representative Language Pack", https://hdl.handle.net/11272.1/AB2/JWPEIA, Abacus Data Network, V1 Abstract Introduction LORELEI Vietnamese Representative Language Pack consists of Vietnamese monolingual text, Vietnamese-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI progra...
BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training Mar 18, 2022 - Linguistic Data Consortium Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2022, "BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training", https://hdl.handle.net/11272.1/AB2/N2DIGA, Abacus Data Network, V1 Abstract Introduction BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training was developed by the Linguistic Data Consortium (LDC) and consists of 158,651 words of Chinese and English parallel text enhanced with linguistic tags to indicate wor...
Speech Sentiment Annotations Mar 18, 2022 - Linguistic Data Consortium Chen, Eric Y.; Lu, Zhiyun; Xu, Hao; Cao, Liangliang; Zhang, Yu; Fan, James, 2022, "Speech Sentiment Annotations", https://hdl.handle.net/11272.1/AB2/HD3CEY, Abacus Data Network, V1 Abstract Introduction Speech Sentiment Annotations was developed by Google Inc. It consists of sentiment labels (positive, negative, neutral) for approximately 49,500 utterances covering 140 hours of audio from Switchboard-1 Release 2 (LDC97S62). Switchboard-1 Release 2 consists...
TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015 Mar 18, 2022 - Linguistic Data Consortium Bies, Ann; Ellis, Joe; Getman, Jeremy; Chen, Song; Strassel, Stephanie, 2022, "TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015", https://hdl.handle.net/11272.1/AB2/UHLXHR, Abacus Data Network, V1 Abstract Introduction TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP English Event Nug...
SemTransCNC Mar 18, 2022 - Linguistic Data Consortium Wang, Shichang; Huang, Chu-Ren; Yao, Yao; Chan, Angel, 2022, "SemTransCNC", https://hdl.handle.net/11272.1/AB2/TV07UB, Abacus Data Network, V1 Abstract Introduction SemTransCNC was developed by The Hong Kong Polytechnic University. It is comprised of a semantic transparency dataset of Chinese nominal compounds built using a series of crowd-based experiments. Nominal compounds were selected from the Sinica Corpus and a m...
TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013 Mar 18, 2022 - Linguistic Data Consortium Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2022, "TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013", https://hdl.handle.net/11272.1/AB2/TVJSBF, Abacus Data Network, V1 Abstract Introduction TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP English Temporal Slot Filling...
Abstract Meaning Representation 2.0 - Four Translations Mar 18, 2022 - Linguistic Data Consortium Damonte, Marco; Cohen, Shay, 2022, "Abstract Meaning Representation 2.0 - Four Translations", https://hdl.handle.net/11272.1/AB2/5OU0AQ, Abacus Data Network, V1 Abstract Introduction Abstract Meaning Representation 2.0 - Four Translations was developed by researchers at the University of Edinburgh, School of Informatics and consists of Spanish, German, Italian and Chinese Mandarin translations of a subset of sentences from Abstract Meani...
EVALution Mar 18, 2022 - Linguistic Data Consortium Santus, Enrico; Liu, Hongchao; Huang, Chu-Ren, 2022, "EVALution", https://hdl.handle.net/11272.1/AB2/JQ231B, Abacus Data Network, V1 Abstract Introduction EVALution was developed by The Hong Kong Polytechnic University. It is comprised of English and Mandarin Chinese data sets -- EVALution 1.0 and EVALution-Man, respectively -- that contain semantic relations and metadata for training and evaluating distributi...
Phonemes of Arabic Mar 18, 2022 - Linguistic Data Consortium Alshaari, Mohamed; ElHarati, Hussien; Kepuska, Veton, 2022, "Phonemes of Arabic", https://hdl.handle.net/11272.1/AB2/WSRL3A, Abacus Data Network, V1 Abstract Introduction Phonemes of Arabic was developed at the Florida Institute of Technology. It consists of approximately one hour of speech from native Arabic speakers that includes all Arabic sounds (consonants and vowels) and 24 words with specific consonant-vowel patterns....
Global TIMIT Mandarin Chinese-Guanzhong Dialect Mar 18, 2022 - Linguistic Data Consortium Jiang, Yue; Zhan, Juhong; Han, Hongjian; Xu, Zuohao; Zhou, Haiyan; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Mandarin Chinese-Guanzhong Dialect", https://hdl.handle.net/11272.1/AB2/MFTAUQ, Abacus Data Network, V1 Abstract Introduction Global TIMIT Mandarin Chinese-Guanzhong Dialect was developed by the Linguistic Data Consortium and Xi'an Jiaotong University and consists of approximately five hours of read speech and transcripts in the Guanzhong dialect of Mandarin Chinese as spoken in Sh...
Global TIMIT Learner Simple English Mar 18, 2022 - Linguistic Data Consortium Ding, Hongwei; Liao, Sishi; Zhan, Yuqing; Feng, Hui; He, Wenchao; Hu, Xiaoyan; Wu, Yu; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Learner Simple English", https://hdl.handle.net/11272.1/AB2/NMUWWH, Abacus Data Network, V1 Abstract Introduction Global TIMIT Learner Simple English was developed by the Linguistic Data Consortium and Shanghai Jiao Tong University and consists of approximately 12 hours of L1 and L2 English read speech and transcripts. The Global TIMIT project aimed to create a series o...
Global TIMIT Learner Treebank English Mar 18, 2022 - Linguistic Data Consortium Luan, Huan; Wang, Yanhong; Feng, Hui; He, Wenchao; Hu, Xiaoyan; Wu, Yu; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Learner Treebank English", https://hdl.handle.net/11272.1/AB2/A2ZRDI, Abacus Data Network, V1 Abstract Introduction Global TIMIT Learner Treebank English was developed by the Linguistic Data Consortium and LAIX Inc. and consists of approximately 24 hours of L1 and L2 English read speech and transcripts. The Global TIMIT project aimed to create a series of corpora in a var...
CALLFRIEND American English-Southern Dialect Second Edition Mar 18, 2022 - Linguistic Data Consortium Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2022, "CALLFRIEND American English-Southern Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/O0EZK5, Abacus Data Network, V1 Abstract Introduction CALLFRIEND American English-Southern Dialect Second Edition was developed by LDC and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Southern dialects of American English. This second edition updates the au...
CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition Mar 18, 2022 - Linguistic Data Consortium Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2022, "CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/AT8NRM, Abacus Data Network, V1 Abstract Introduction CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 27 hours of unscripted telephone conversations between native speakers of the Taiwan dialect of Mandarin Chinese. Th...
Chinese Lexical Resources for Gender, Number, Animacy Mar 18, 2022 - Linguistic Data Consortium Chen, Song; Yuan, Jiahong; Ma, Xiaoyi; Strassel, Stephanie, 2022, "Chinese Lexical Resources for Gender, Number, Animacy", https://hdl.handle.net/11272.1/AB2/2CSZDM, Abacus Data Network, V1 Abstract Introduction Chinese Lexical Resources for Gender, Number, Animacy was developed by the Linguistic Data Consortium (LDC) and consists of gender, number, and animacy lexicons produced in support of the DARPA DEFT program. Gender, number and animacy are lexical indicators...
GALE Phase 4 Chinese Broadcast News Transcripts Mar 18, 2022 - Linguistic Data Consortium Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2022, "GALE Phase 4 Chinese Broadcast News Transcripts", https://hdl.handle.net/11272.1/AB2/TVASI8, Abacus Data Network, V1 Abstract Introduction GALE Phase 4 Chinese Broadcast News Transcripts was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 134 hours of Chinese broadcast news speech collected in 2008 by LDC and Hong Kong University of Science and Technolo...
Columbia Games Corpus Mar 18, 2022 - Linguistic Data Consortium Hirschberg, Julia; Gravano, Agustin; Benus, Stefan; Ward, Gregory; Sneed German, Elisa, 2022, "Columbia Games Corpus", https://hdl.handle.net/11272.1/AB2/TPZYOR, Abacus Data Network, V1 Abstract Introduction Columbia Games Corpus was developed by the Spoken Language Group, Columbia University and the Department of Linguistics, Northwestern University. It consists of approximately 10 hours of spontaneous English conversation along with corresponding orthographic...
Corpus of Law, Academic, and News Mar 18, 2022 - Linguistic Data Consortium Mohammadi, Ariana Negar, 2022, "Corpus of Law, Academic, and News", https://hdl.handle.net/11272.1/AB2/VMWYC0, Abacus Data Network, V1 Abstract Introduction Corpus of Law, Academic, and News consists of 400 Persian documents divided into three genres: legal, academic, and news. The legal section contains texts from official publications, including the civil penal code, the criminal penal code, and the constituti...
Penn Parsed Corpora of Historical English Mar 18, 2022 - Linguistic Data Consortium Kroch, Anthony, 2022, "Penn Parsed Corpora of Historical English", https://hdl.handle.net/11272.1/AB2/NWMKHI, Abacus Data Network, V1 Abstract Introduction Penn Parsed Corpora of Historical English was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the First World War (19...
Global TIMIT Mandarin Chinese-Guanzhong Dialect Mar 18, 2022 - Linguistic Data Consortium Jiang, Yue; Zhan, Juhong; Han, Hongjian; Xu, Zuohao; Zhou, Haiyan; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Mandarin Chinese-Guanzhong Dialect", https://hdl.handle.net/11272.1/AB2/FF5DX5, Abacus Data Network, V1 Abstract Introduction Global TIMIT Mandarin Chinese-Guanzhong Dialect was developed by the Linguistic Data Consortium and Xi'an Jiaotong University and consists of approximately five hours of read speech and transcripts in the Guanzhong dialect of Mandarin Chinese as spoken in Sh...

Ethiopia 1:50,000 Scale Topographic Maps

Aug 24, 2022 - UBC Library licensed data

EastView Geospatial Inc., 2022, "Ethiopia 1:50,000 Scale Topographic Maps", https://hdl.handle.net/11272.1/AB2/MQR7AY, Abacus Data Network, V2

Georeferenced topographic maps of selected regions of Ethiopia. Maps are in GeoTIFF raster format.

Survey on Health Care Workers' Experiences During the Pandemic (SHCWEP), 2021

Aug 23, 2022 - Statistics Canada Open License

Statistics Canada, 2022, "Survey on Health Care Workers' Experiences During the Pandemic (SHCWEP), 2021", https://hdl.handle.net/11272.1/AB2/IFL0RX, Abacus Data Network, V1, UNF:6:rjyrygwtTrjPvrtEJf6pYw== [fileUNF]

This public use microdata file includes information on the impacts of COVID-19 on Canadian health care workers, with particular focus on job type and setting, personal protective equipment (PPE) and infection prevention and control (IPC) practices and protocols, and the impacts o...

American English Nickname Collection

Aug 9, 2022 - Linguistic Data Consortium

Carvalho, Vitor R.; Kiran, Yigit; Borthwick, Andrew, 2022, "American English Nickname Collection", https://hdl.handle.net/11272.1/AB2/JR1WG6, Abacus Data Network, V1

Abstract Introduction American English Nickname Collection was developed by Intelius, Inc. and is a compilation of American English nicknames to given name mappings based on information in US government records, public web profiles and financial and property reports. This corpus...

Qatari Corpus of Argumentative Writing

Aug 9, 2022 - Linguistic Data Consortium

Ahmed, Abdelhamid M.; Myhill, Debra; Abdollahzadeh, Esmaeel; McCallum, Lee; Zaghouani, Wajdi; Rezk, Lameya; Jrad, Anissa; Zhang, Xiao, 2022, "Qatari Corpus of Argumentative Writing", https://hdl.handle.net/11272.1/AB2/F2P2EY, Abacus Data Network, V1

Abstract Introduction Qatari Corpus of Argumentative Writing was developed by Qatar University, University of Exeter and Hamad Bin Khalifa University and is comprised of approximately 200,000 tokens of Arabic and English writing by undergraduate students (159 female, 36 male) alo...

Survey on Early Learning and Child Care Arrangements, 2019

Jul 15, 2022 - Statistics Canada Open License

Statistics Canada, 2022, "Survey on Early Learning and Child Care Arrangements, 2019", https://hdl.handle.net/11272.1/AB2/YAGGCP, Abacus Data Network, V1, UNF:6:C82FnsE4W7hBbvDkSZeb2w== [fileUNF]

Statistics Canada is gathering information from families who use child care as well as those who do not. The survey, which addresses child care in Canada for children younger than 6 years old, asks about the different types of early learning and child care arrangements that famil...

[Custom tabulation: Sex and selected socio-demographic characteristics of population 15 years and over in private household for British Columbia, Chilliwack CSD and the custom area, 2001 and 2016 Censuses of Canada, 25% Sample Data]

Jul 13, 2022 - Statistics Canada Open License

Statistics Canada, 2022, "[Custom tabulation: Sex and selected socio-demographic characteristics of population 15 years and over in private household for British Columbia, Chilliwack CSD and the custom area, 2001 and 2016 Censuses of Canada, 25% Sample Data]", https://hdl.handle.net/11272.1/AB2/JMIQZV, Abacus Data Network, V1

These tables include information on sex and selected socio-demographic characteristics of the population 15 years and over in private households for British Columbia, Chilliwack CSD and the custom area.

Second DIHARD Challenge Evaluation - Eleven Sources

Jul 7, 2022 - Linguistic Data Consortium

Ryant, Neville; Liberman, Mark; Fiumara, James; Cieri, Christopher, 2022, "Second DIHARD Challenge Evaluation - Eleven Sources", https://hdl.handle.net/11272.1/AB2/ML7KD5, Abacus Data Network, V1

Abstract Introduction Second DIHARD Challenge Evaluation - Eleven Sources was developed by the Linguistic Data Consortium (LDC) and contains approximately 20 hours of English and Chinese speech data along with corresponding annotations used in support of the Second DIHARD Challen...

NUBUC

Jul 7, 2022 - Linguistic Data Consortium

Lewis, Gwyneth; van Rijn, Pol; Gwilliams, Laura; Larrouy-Maestri, Pauline; Poeppel, David; Ghitza, Oded, 2022, "NUBUC", https://hdl.handle.net/11272.1/AB2/IUFKIG, Abacus Data Network, V1

Abstract Introduction NUBUC (NyU-BU contextually controlled stories Corpus) was developed by New York University, Max Planck Institute for Empirical Aesthetics and Boston University. It contains approximately three hours of English read speech from eight stories focused on lingui...

Canadian Alcohol and Drugs Survey, 2019

Jul 6, 2022 - Statistics Canada Open License

Statistics Canada, 2022, "Canadian Alcohol and Drugs Survey, 2019", https://hdl.handle.net/11272.1/AB2/7KS8TV, Abacus Data Network, V2, UNF:6:c62qJ6h/bvWispQtQVUHZw== [fileUNF]

The main objective of this survey is to collect information on Canadians’ use of alcohol and drugs. Health Canada and other organizations will use the information to monitor changes in alcohol and drug use. The other objectives of the Canadian Alcohol and Drugs Survey (CADS) are...

Postal Codes by Federal Ridings File (PCFRF) 2013 Representation Order, June 2022 Postal Codes, 2022

Jun 30, 2022 - Statistics Canada - DLI

Statistics Canada, 2022, "Postal Codes by Federal Ridings File (PCFRF) 2013 Representation Order, June 2022 Postal Codes, 2022", https://hdl.handle.net/11272.1/AB2/JOVEPQ, Abacus Data Network, V1

The Postal Code Project is responsible for linking the approximately 900,000 single postal codes in Canada to Statistics Canada’s Census dissemination geography (presently 2021 Census geography). This process is performed by using data provided by Canada Post Corporation and lin...

Postal Code Conversion File, June 2022 Postal Codes, 2022

Jun 30, 2022 - Statistics Canada - DLI

Statistics Canada, 2022, "Postal Code Conversion File, June 2022 Postal Codes, 2022", https://hdl.handle.net/11272.1/AB2/7DCHD8, Abacus Data Network, V1

Postal Code Conversion File Plus (PCCF+) Version 7E, November 2021 Postal Codes

Jun 24, 2022 - Statistics Canada - DLI

Statistics Canada, 2022, "Postal Code Conversion File Plus (PCCF+) Version 7E, November 2021 Postal Codes", https://hdl.handle.net/11272.1/AB2/D1AO5H, Abacus Data Network, V1

The Postal Code Conversion File Plus (PCCF+) is a SAS control program and set of associated datasets derived from the Postal Code Conversion File (PCCF), a 2016 postal code population weight file, the Geographic Attribute File, Health Region boundary files, and other supplementar...

Canadian Legal Problems Survey, 2021

Jun 16, 2022 - Statistics Canada Open License

Statistics Canada, 2022, "Canadian Legal Problems Survey, 2021", https://hdl.handle.net/11272.1/AB2/BZZXO3, Abacus Data Network, V2, UNF:6:4MsCj17XMOXfCtv4Ma3y3w== [fileUNF]

The purpose of the Canadian Legal Problems Survey (CLPS) is to identify the kinds of serious problems people face, how they attempt to resolve them, and how these experiences may impact their lives. The information collected will be used to better understand the various methods p...

LORELEI Wolof Representative Language Pack

Jun 10, 2022 - Linguistic Data Consortium

Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Wolof Representative Language Pack", https://hdl.handle.net/11272.1/AB2/1M9HI6, Abacus Data Network, V1

Abstract Introduction LORELEI Wolof Representative Language Pack consists of Wolof monolingual text, Wolof-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI program. The LORELEI...

Survey on Early Learning and Child Care Arrangements, 2020

May 19, 2022 - Statistics Canada Open License

Statistics Canada, 2022, "Survey on Early Learning and Child Care Arrangements, 2020", https://hdl.handle.net/11272.1/AB2/RRSNDB, Abacus Data Network, V1, UNF:6:G16e9NSu5tgMCWHuMHfoMg== [fileUNF]

This survey is designed to collect information on the use (or non-use) of early learning and child care in Canada for children younger than 6 years old. It asks whether or not a particular child under the age of 6 in the family is enrolled in early learning and child care, and co...

[Homicide Survey 1960 - 2020. Custom Tabulation]

May 18, 2022 - Statistics Canada Open License

Statistics Canada, 2022, "[Homicide Survey 1960 - 2020. Custom Tabulation]", https://hdl.handle.net/11272.1/AB2/CWQQJF, Abacus Data Network, V1

Statistics Canada custom tabulation from the Homicide Survey, 1960-2020. This survey collects detailed data on homicide in Canada. The survey has collected police-reported data on the characteristics of all murder incidents, victims and accused persons / chargeable suspects since...

Postal Codes by Federal Ridings File (PCFRF) 2013 Representation Order, March 2022 Postal Codes, 2022

May 5, 2022 - Statistics Canada - DLI

Statistics Canada, 2022, "Postal Codes by Federal Ridings File (PCFRF) 2013 Representation Order, March 2022 Postal Codes, 2022", https://hdl.handle.net/11272.1/AB2/ZICIEN, Abacus Data Network, V2

[Census Families in private households by selected sociodemographic characteristics and Census Family structure for Montréal and Québec Census Metropolitan Areas (CMA) and their Census Tracts (CT): 2016, 2011 and 2006]

Apr 13, 2022 - Statistics Canada Open License

Statistics Canada, 2022, "[Census Families in private households by selected sociodemographic characteristics and Census Family structure for Montréal and Québec Census Metropolitan Areas (CMA) and their Census Tracts (CT): 2016, 2011 and 2006]", https://hdl.handle.net/11272.1/AB2/Z8V2YC, Abacus Data Network, V1

Custom tabulation requested by the Université de Montréal, released 13 April 2022. Families that immigrated with children 0-17 years old; Parents without diploma or certificate among parents of children 0-17 years old; Unemployment rate of parents with children 0-17 years old; Famil...

AttImam

Mar 31, 2022 - Linguistic Data Consortium

Alsaif, Amal; Alyahya, Tasniem; Alotibi, Madawi; Almuzaini, Huda; Alqahtani, Abeer, 2022, "AttImam", https://hdl.handle.net/11272.1/AB2/9FBCBG, Abacus Data Network, V1

Abstract Introduction AttImam was developed by Al-Imam Mohammad Ibn Saud Islamic University and consists of approximately 2,000 attribution relations applied to Arabic newswire text from Arabic Treebank: Part 1 v 4.1 (LDC2010T13). Attribution refers to the process of reporting or...

IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7

Mar 18, 2022 - Linguistic Data Consortium

Andrus, Tony; Bills, Aric; Corris, Miriam; Dubinski, Eyal; Fiscus, Jonathan G.; Gillies, Breanna; Harper, Mary; Hazen, T. J.; Hefright, Brook; Jarrett, Amy; Le, Hanh; Ray, Jessica; Rytting, Anton; Silber, Ronnie; Shen, Wade; Tzoukermann, Evelyne, 2022, "IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7", https://hdl.handle.net/11272.1/AB2/WJGWAP, Abacus Data Network, V1

Abstract Introduction IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 201 hours of Vietnamese conversational and scripted telephone speech co...

IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b

Mar 18, 2022 - Linguistic Data Consortium

Bills, Aric; Conners, Thomas; Corris, Miriam; David, Anne; Dubinski, Eyal; Fiscus, Jonathan G.; Gann, Ketty; Harper, Mary; Kazi, Michael; Malyska, Nicolas; Melot, Jennifer; Ray, Jessica; Rytting, Anton; Zawaydeh, Bushra, 2022, "IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b", https://hdl.handle.net/11272.1/AB2/HSAU9N, Abacus Data Network, V1

Abstract Introduction IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Dholuo conversational and scripted telephone speech collected...

IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b

Mar 18, 2022 - Linguistic Data Consortium

Andresen, Lucy; Bills, Aric; Conners, Thomas; Cruz, Luanne Dela; Dubinski, Eyal; Fiscus, Jonathan G.; Harper, Mary; Le, Hanh; Maurillo, Arlene; Melot, Jennifer; Phillips, Josh; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne, 2022, "IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b", https://hdl.handle.net/11272.1/AB2/3EYPZM, Abacus Data Network, V1

Abstract Introduction IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 191 hours of Cebuano conversational and scripted telephone speech collect...

The Child Subglottal Resonances Database

Mar 18, 2022 - Linguistic Data Consortium

Lulich, Steven M.; Alwan, Abeer; Sommers, Mitchell S.; Yeung, Gary, 2022, "The Child Subglottal Resonances Database", https://hdl.handle.net/11272.1/AB2/O4SRBR, Abacus Data Network, V1

Abstract Introduction The Child Subglottal Resonances Database was developed by Washington University and University of California Los Angeles and consists of 15.5 hours of simultaneous microphone and subglottal accelerometer recordings of 19 male and 9 female child speakers of A...

The SSNCE Database of Tamil Dysarthric Speech

Mar 18, 2022 - Linguistic Data Consortium

Vijayalakshmi, P.; Celin, T. A. Mariya; Nagarajan, T., 2022, "The SSNCE Database of Tamil Dysarthric Speech", https://hdl.handle.net/11272.1/AB2/QXP9LM, Abacus Data Network, V1

Abstract Introduction The SSNCE Database of Tamil Dysarthric Speech was developed by the Speech Lab, SSN College of Engineering, India, in collaboration with the Indian National Institute of Empowerment of Persons with Multiple Disabilities (NIEPMD) and contains approximately eig...

LORELEI Ukrainian Representative Language Pack

Mar 18, 2022 - Linguistic Data Consortium

Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Ma, Xiaoyi; Kulick, Seth; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Ukrainian Representative Language Pack", https://hdl.handle.net/11272.1/AB2/GUYCZL, Abacus Data Network, V1

Abstract Introduction LORELEI Ukrainian Representative Language Pack consists of Ukrainian monolingual text, Ukrainian-English parallel and comparable text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LO...

LORELEI Tigrinya Incident Language Pack

Mar 18, 2022 - Linguistic Data Consortium

Tracey, Jennifer; Graff, David; Strassel, Stephanie; Arrigo, Michael; Wright, Jonathan; Bies, Ann, 2022, "LORELEI Tigrinya Incident Language Pack", https://hdl.handle.net/11272.1/AB2/CTYB7Q, Abacus Data Network, V1

Abstract Introduction LORELEI Tigrinya Incident Language Pack was developed by the Linguistic Data Consortium and is comprised of approximately 4.5 million words of Tigrinya monolingual text, 25,000 words of English monolingual text, 235,000 words of parallel and comparable Tigri...

BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

Mar 18, 2022 - Linguistic Data Consortium

Palmer, Martha; Hwang, Jena D.; Bonial, Claire; O'Gorman, Tim; Gung, James; Stowe, Kevin; Green, Meredith, 2022, "BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/QABG8N, Abacus Data Network, V1

Abstract Introduction BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and consists of propbank and verb sense disambiguat...

BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

Mar 18, 2022 - Linguistic Data Consortium

Agarwal, Nitin; Franchini, Michelle; Kappler, Michelle; Micciulla, Linnea; Pradhan, Sameer; Ramshaw, Lance, 2022, "BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech", https://hdl.handle.net/11272.1/AB2/3JEVXI, Abacus Data Network, V1

Abstract Introduction BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies and consists of co-reference annotation on English discussion forum (DF), SMS/Chat and conversational telephone speech (CT...

DEFT Chinese Light and Rich ERE Annotation

Mar 18, 2022 - Linguistic Data Consortium

Chen, Song; Strassel, Stephanie; Mott, Justin, 2022, "DEFT Chinese Light and Rich ERE Annotation", https://hdl.handle.net/11272.1/AB2/MUVS7U, Abacus Data Network, V1

Abstract Introduction DEFT Chinese Light and Rich ERE Annotation was developed by the Linguistic Data Consortium (LDC) and consists of 157 Chinese discussion forum documents annotated for entities, relations and events (ERE). DARPA's Deep Exploration and Filtering of Text (DEFT)...

TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017

Mar 18, 2022 - Linguistic Data Consortium

Ellis, Joe; Getman, Jeremy; Chen, Song; Strassel, Stephanie, 2022, "TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017", https://hdl.handle.net/11272.1/AB2/KSIXIZ, Abacus Data Network, V1

Abstract Introduction TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the 2016 TAC KBP Event Argument Linking Pilot and Evaluation...

LORELEI Vietnamese Representative Language Pack

Mar 18, 2022 - Linguistic Data Consortium

Tracey, Jennifer; Strassel, Stephanie; Graff, David; Wright, Jonathan; Chen, Song; Ryant, Neville; Kulick, Seth; Griffitt, Kira; Delgado, Dana; Arrigo, Michael, 2022, "LORELEI Vietnamese Representative Language Pack", https://hdl.handle.net/11272.1/AB2/JWPEIA, Abacus Data Network, V1

Abstract Introduction LORELEI Vietnamese Representative Language Pack consists of Vietnamese monolingual text, Vietnamese-English parallel text, annotations, supplemental resources and related software tools developed by the Linguistic Data Consortium for the DARPA LORELEI progra...

BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training

Mar 18, 2022 - Linguistic Data Consortium

Li, Xuansong; Grimes, Stephen; Strassel, Stephanie, 2022, "BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training", https://hdl.handle.net/11272.1/AB2/N2DIGA, Abacus Data Network, V1

Abstract Introduction BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training was developed by the Linguistic Data Consortium (LDC) and consists of 158,651 words of Chinese and English parallel text enhanced with linguistic tags to indicate wor...

Speech Sentiment Annotations

Mar 18, 2022 - Linguistic Data Consortium

Chen, Eric Y.; Lu, Zhiyun; Xu, Hao; Cao, Liangliang; Zhang, Yu; Fan, James, 2022, "Speech Sentiment Annotations", https://hdl.handle.net/11272.1/AB2/HD3CEY, Abacus Data Network, V1

Abstract Introduction Speech Sentiment Annotations was developed by Google Inc. It consists of sentiment labels (positive, negative, neutral) for approximately 49,500 utterances covering 140 hours of audio from Switchboard-1 Release 2 (LDC97S62). Switchboard-1 Release 2 consists...

TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015

Mar 18, 2022 - Linguistic Data Consortium

Bies, Ann; Ellis, Joe; Getman, Jeremy; Chen, Song; Strassel, Stephanie, 2022, "TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015", https://hdl.handle.net/11272.1/AB2/UHLXHR, Abacus Data Network, V1

Abstract Introduction TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP English Event Nug...

SemTransCNC

Mar 18, 2022 - Linguistic Data Consortium

Wang, Shichang; Huang, Chu-Ren; Yao, Yao; Chan, Angel, 2022, "SemTransCNC", https://hdl.handle.net/11272.1/AB2/TV07UB, Abacus Data Network, V1

Abstract Introduction SemTransCNC was developed by The Hong Kong Polytechnic University. It is comprised of a semantic transparency dataset of Chinese nominal compounds built using a series of crowd-based experiments. Nominal compounds were selected from the Sinica Corpus and a m...

TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013

Mar 18, 2022 - Linguistic Data Consortium

Ellis, Joe; Getman, Jeremy; Strassel, Stephanie, 2022, "TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013", https://hdl.handle.net/11272.1/AB2/TVJSBF, Abacus Data Network, V1

Abstract Introduction TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013 was developed by the Linguistic Data Consortium (LDC) and contains training and evaluation data produced in support of the TAC KBP English Temporal Slot Filling...

Abstract Meaning Representation 2.0 - Four Translations

Mar 18, 2022 - Linguistic Data Consortium

Damonte, Marco; Cohen, Shay, 2022, "Abstract Meaning Representation 2.0 - Four Translations", https://hdl.handle.net/11272.1/AB2/5OU0AQ, Abacus Data Network, V1

Abstract Introduction Abstract Meaning Representation 2.0 - Four Translations was developed by researchers at the University of Edinburgh, School of Informatics and consists of Spanish, German, Italian and Chinese Mandarin translations of a subset of sentences from Abstract Meani...

EVALution

Mar 18, 2022 - Linguistic Data Consortium

Santus, Enrico; Liu, Hongchao; Huang, Chu-Ren, 2022, "EVALution", https://hdl.handle.net/11272.1/AB2/JQ231B, Abacus Data Network, V1

Abstract Introduction EVALution was developed by The Hong Kong Polytechnic University. It is comprised of English and Mandarin Chinese data sets -- EVALution 1.0 and EVALution-Man, respectively -- that contain semantic relations and metadata for training and evaluating distributi...

Phonemes of Arabic

Mar 18, 2022 - Linguistic Data Consortium

Alshaari, Mohamed; ElHarati, Hussien; Kepuska, Veton, 2022, "Phonemes of Arabic", https://hdl.handle.net/11272.1/AB2/WSRL3A, Abacus Data Network, V1

Abstract Introduction Phonemes of Arabic was developed at the Florida Institute of Technology. It consists of approximately one hour of speech from native Arabic speakers that includes all Arabic sounds (consonants and vowels) and 24 words with specific consonant-vowel patterns....

Global TIMIT Mandarin Chinese-Guanzhong Dialect

Mar 18, 2022 - Linguistic Data Consortium

Jiang, Yue; Zhan, Juhong; Han, Hongjian; Xu, Zuohao; Zhou, Haiyan; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Mandarin Chinese-Guanzhong Dialect", https://hdl.handle.net/11272.1/AB2/MFTAUQ, Abacus Data Network, V1

Abstract Introduction Global TIMIT Mandarin Chinese-Guanzhong Dialect was developed by the Linguistic Data Consortium and Xi'an Jiaotong University and consists of approximately five hours of read speech and transcripts in the Guanzhong dialect of Mandarin Chinese as spoken in Sh...

Global TIMIT Learner Simple English

Mar 18, 2022 - Linguistic Data Consortium

Ding, Hongwei; Liao, Sishi; Zhan, Yuqing; Feng, Hui; He, Wenchao; Hu, Xiaoyan; Wu, Yu; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Learner Simple English", https://hdl.handle.net/11272.1/AB2/NMUWWH, Abacus Data Network, V1

Abstract Introduction Global TIMIT Learner Simple English was developed by the Linguistic Data Consortium and Shanghai Jiao Tong University and consists of approximately 12 hours of L1 and L2 English read speech and transcripts. The Global TIMIT project aimed to create a series o...

Global TIMIT Learner Treebank English

Mar 18, 2022 - Linguistic Data Consortium

Luan, Huan; Wang, Yanhong; Feng, Hui; He, Wenchao; Hu, Xiaoyan; Wu, Yu; Yuan, Jiahong; Liberman, Mark, 2022, "Global TIMIT Learner Treebank English", https://hdl.handle.net/11272.1/AB2/A2ZRDI, Abacus Data Network, V1

Abstract Introduction Global TIMIT Learner Treebank English was developed by the Linguistic Data Consortium and LAIX Inc. and consists of approximately 24 hours of L1 and L2 English read speech and transcripts. The Global TIMIT project aimed to create a series of corpora in a var...

CALLFRIEND American English-Southern Dialect Second Edition

Mar 18, 2022 - Linguistic Data Consortium

Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2022, "CALLFRIEND American English-Southern Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/O0EZK5, Abacus Data Network, V1

Abstract Introduction CALLFRIEND American English-Southern Dialect Second Edition was developed by LDC and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Southern dialects of American English. This second edition updates the au...

CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition

Mar 18, 2022 - Linguistic Data Consortium

Canavan, Alexandra; Zipperlen, George; Bartlett, John, 2022, "CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition", https://hdl.handle.net/11272.1/AB2/AT8NRM, Abacus Data Network, V1

Abstract Introduction CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 27 hours of unscripted telephone conversations between native speakers of the Taiwan dialect of Mandarin Chinese. Th...

Chinese Lexical Resources for Gender, Number, Animacy

Mar 18, 2022 - Linguistic Data Consortium

Chen, Song; Yuan, Jiahong; Ma, Xiaoyi; Strassel, Stephanie, 2022, "Chinese Lexical Resources for Gender, Number, Animacy", https://hdl.handle.net/11272.1/AB2/2CSZDM, Abacus Data Network, V1

Abstract Introduction Chinese Lexical Resources for Gender, Number, Animacy was developed by the Linguistic Data Consortium (LDC) and consists of gender, number, and animacy lexicons produced in support of the DARPA DEFT program. Gender, number and animacy are lexical indicators...

GALE Phase 4 Chinese Broadcast News Transcripts

Mar 18, 2022 - Linguistic Data Consortium

Glenn, Meghan; Lee, Haejoong; Strassel, Stephanie; Maeda, Kazuaki, 2022, "GALE Phase 4 Chinese Broadcast News Transcripts", https://hdl.handle.net/11272.1/AB2/TVASI8, Abacus Data Network, V1

Abstract Introduction GALE Phase 4 Chinese Broadcast News Transcripts was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 134 hours of Chinese broadcast news speech collected in 2008 by LDC and Hong Kong University of Science and Technolo...

Columbia Games Corpus

Mar 18, 2022 - Linguistic Data Consortium

Hirschberg, Julia; Gravano, Agustin; Benus, Stefan; Ward, Gregory; Sneed German, Elisa, 2022, "Columbia Games Corpus", https://hdl.handle.net/11272.1/AB2/TPZYOR, Abacus Data Network, V1

Abstract Introduction Columbia Games Corpus was developed by the Spoken Language Group, Columbia University and the Department of Linguistics, Northwestern University. It consists of approximately 10 hours of spontaneous English conversation along with corresponding orthographic...

Corpus of Law, Academic, and News

Mar 18, 2022 - Linguistic Data Consortium

Mohammadi, Ariana Negar, 2022, "Corpus of Law, Academic, and News", https://hdl.handle.net/11272.1/AB2/VMWYC0, Abacus Data Network, V1

Abstract Introduction Corpus of Law, Academic, and News consists of 400 Persian documents divided into three genres: legal, academic, and news. The legal section contains texts from official publications, including the civil penal code, the criminal penal code, and the constituti...

Penn Parsed Corpora of Historical English

Mar 18, 2022 - Linguistic Data Consortium

Kroch, Anthony, 2022, "Penn Parsed Corpora of Historical English", https://hdl.handle.net/11272.1/AB2/NWMKHI, Abacus Data Network, V1

Abstract Introduction Penn Parsed Corpora of Historical English was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the First World War (19...
