RDA and Linguistics


09 May 2023

Composed by: Dr Helene Andreassen (RDA/EOSC Future Ambassador for Linguistics), Andrea Berez-Kroeker, Lindsay Ferrara
Contributors: List TBA 
Comments requested: Please note that this is a new Discipline page, and it is open for comments from the RDA Community. To add your input please use the comments section below. 

Downloadable disciplinary info sheet: Linguistics 


Overview of data-related practices in Linguistics

“Data, in many forms and from many sources, underlie the discipline of linguistics. [...] From descriptive to theoretical work, from corpus-based to introspection-based inquiry, from quantitative to qualitative analysis, linguists rely on data every day. [... D]ata must be understandable, discoverable, reusable, shareable, remixable, and transformable.” 

(Berez-Kroeker, A.L., McDonnell, B., Collister, L.B., Koller, E. 2022. Data, Data Management, and Reproducible Research in Linguistics: On the Need for The Open Handbook of Linguistic Data Management. In Berez-Kroeker, A.L., McDonnell, B., Collister, L.B., Koller, E. (eds.), The Open Handbook of Linguistic Data Management, p. 3. Cambridge, MA: MIT Press Open. https://doi.org/10.7551/mitpress/12200.003.0005)

Linguistics has a history of developing data practices in relative isolation by subfield, lab, and researcher, which means that broad disciplinary discussions about the role of data in our research are needed. The value of data to our field is under-recognized, and language data carries an added ethical dimension: handling the words and languages of historically marginalized peoples requires particular care. 

The Linguistics Data Interest Group (LDIG) of the RDA endeavors to broaden the conversation around research data and to increase the competence of practitioners in our field in methods for data handling. Our outputs include:

The interest group is currently working on a needs analysis to determine which educational efforts are needed to train linguists broadly in the methods of open science. This work is supported by the RDA/EOSC Future Domain Ambassador #2 (2022-2023) project. LDIG members are also involved in SSHOC, a project responsible for developing the social sciences and humanities area of EOSC.

What kinds of data are used in linguistics?

Documentary linguistic data (e.g., text, audio, video), grammaticality judgements, instrumental data (e.g., eye tracking, EEG measurements, spectrograms), experimental data, derived data (e.g., transcriptions, annotations, syntactic treebanks), metadata, lexical data (e.g., dictionaries), language catalogs, computational data, interview data

Where is linguistics data shared?

Domain-specific repositories for language and linguistics, institutional repositories, national repositories, Open Science Framework, personal websites, article supplementary files

How is linguistics data shared (e.g. standards, guidelines, trusted examples)?

  • Metadata requirements in repository guidelines, e.g. domain-neutral standards such as Dublin Core, or more discipline-specific ones such as the Data Documentation Initiative and the International Standards for Language Engineering

  • File format requirements in repository guidelines

  • CC licenses, CLARIN licenses (https://www.clarin.eu/content/licenses-and-clarin-categories)

  • Citation guidelines: recommendations in journal author guidelines (e.g. IASSIST's Quick Guide to Data Citation or DataCite), the Tromsø Recommendations for Citation of Research Data in Linguistics (published in late 2019), and the (auto-generated) citation format shown on the dataset landing page in repositories. 
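
As an illustration of the domain-neutral metadata mentioned above, the following is a minimal sketch of a flat Dublin Core record built with Python's standard library. The dataset it describes is entirely hypothetical, the `dublin_core_record` helper and the `metadata` wrapper element are names chosen for this example, and real repositories will typically require additional fields and their own container format.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat record from (element, value) pairs.

    The wrapper element name "metadata" is an assumption made for
    this sketch; repositories define their own container formats.
    """
    root = ET.Element("metadata")
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Hypothetical dataset description: every value is a placeholder.
record = dublin_core_record([
    ("title", "Example corpus of elicited narratives"),
    ("creator", "Doe, Jane"),
    ("date", "2023"),
    ("language", "nob"),      # ISO 639-3 code for Norwegian Bokmål
    ("type", "Text"),         # a term from the DCMI Type Vocabulary
    ("format", "text/plain"),
    ("rights", "CC BY 4.0"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record like this covers discovery-level description only; discipline-specific schemas add information such as speaker roles or recording conditions that Dublin Core does not model.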

What are typical file formats for linguistics data?

Audio: .wav, .mp3, .flac

Video: .mpeg, .mp4, and others

Text: .txt, .pdf, .docx, .eaf

Image: .tiff, .jpg, .png

Tabular data: .csv, .xlsx, .txt, .tsv, .json

Programming: .r, .py, .ipynb

Structured attribute-value data: .xml and derivatives (.lmf, .tbx, .tmx, .tei, .cmdi, and others), .json
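
When preparing a deposit, it can help to inventory files by the categories listed above. The following is a minimal sketch, assuming the extension lists as given on this page (with the Excel workbook extension written as .xlsx); the `FORMAT_CATEGORIES` table and the `categories_for` helper are hypothetical names introduced here. Because some extensions appear under more than one heading, the lookup returns a list rather than a single category.

```python
from pathlib import Path

# Extension-to-category table following the list on this page.
# The "and others" entries are not represented here.
FORMAT_CATEGORIES = {
    "Audio": {".wav", ".mp3", ".flac"},
    "Video": {".mpeg", ".mp4"},
    "Text": {".txt", ".pdf", ".docx", ".eaf"},
    "Image": {".tiff", ".jpg", ".png"},
    "Tabular data": {".csv", ".xlsx", ".txt", ".tsv", ".json"},
    "Programming": {".r", ".py", ".ipynb"},
    "Structured attribute-value data": {
        ".xml", ".lmf", ".tbx", ".tmx", ".tei", ".cmdi", ".json",
    },
}


def categories_for(filename):
    """Return every category whose extension list matches the file."""
    suffix = Path(filename).suffix.lower()
    return [name for name, exts in FORMAT_CATEGORIES.items() if suffix in exts]


# Hypothetical file names, used only to exercise the lookup.
for name in ["session01.EAF", "notes.txt", "README"]:
    print(name, categories_for(name))
```

An extension alone cannot distinguish, say, a plain-text transcript from a tab-delimited table saved as .txt, so the result is best treated as a first pass to be checked by hand.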

Which disciplines collaborate or interface with linguistics?

(Social) Psychology, Gesture studies, Anthropology, Semiotics, Cognitive Science, Education, Applied Linguistics, Health Sciences

RDA Groups active in this discipline

RDA Groups in this discipline that are no longer active

Highlighted RDA outputs

Disciplinary_info_sheet-Linguistics.pdf (165.08 KB)