Automated Content Analysis: special training session

15 Jan 2018

Workshop

To coincide with the release of the National Child Development Study’s Age 11 essays, CLS hosted a special tutorial on automated content analysis to help researchers make the most of these new data. The session covered the fundamentals of using the Differential Language Analysis Toolkit (DLATK) and was led by H. Andrew Schwartz (Stony Brook University).

Event details

Date	15 January 2018
Time	10:30 - 16:00
Price	Free

About the event

Qualitative data, such as essays and free response questions in surveys, are rich sources of psychological, social and behavioural information. Yet such information has traditionally been impossible to use at a large scale. Recent advances in computational linguistics and machine learning have produced automatic content analysis tools, which can now be applied in a wide range of settings, including the open responses collected longitudinally within a large national birth cohort study.

In a new project funded by the Economic and Social Research Council, we are applying such tools to newly transcribed essays written in 1969 by cohort members of the National Child Development Study (NCDS) when they were aged 11, in response to the prompt “Imagine you are now 25 years old…”. The responses provide a largely untapped source of psychological and behavioural information that can be linked longitudinally to outcomes for the same individuals.

A new dataset containing the fully transcribed text for 10,500 of these essays will be released by the UK Data Service in mid-February 2018, and will be available for researchers worldwide to download and analyse.

To enable researchers to make the most of this new data release, the Centre for Longitudinal Studies offered the opportunity to attend a specialised tutorial on automated content analysis, given by H. Andrew Schwartz, a faculty member of the Computer Science Department and the Center for Computational Social Science at Stony Brook University, New York.

The Differential Language Analysis ToolKit

DLATK (Differential Language Analysis ToolKit) is an end-to-end language analysis package, particularly suited to social media and social scientific research applications. It has been used for research published in over 40 peer-reviewed papers across psychology, computer science, public health, medicine, and political science. Although the heart of DLATK is a Python library, it is typically used through a versatile command-line interface that requires no programming.

This tutorial covered the fundamentals of automated content analysis using DLATK:

The ingredients of automatic content analysis
Differential language analysis (linguistic insights into psychosocial phenomena)
Predictive analytics (machine and statistical learning using text data)
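
To give a flavour of the differential language analysis topic listed above, the sketch below correlates each word's relative frequency in a text with a numeric outcome for its author. It is a minimal illustration in plain Python and does not use DLATK or reproduce its interface. The essays, the outcome scores, and the function names (relative_frequencies, pearson, word_outcome_correlations) are all invented for this example.

```python
from collections import Counter
from math import sqrt

# Toy data, invented for illustration only (not taken from the NCDS essays).
essays = [
    "i am a teacher and i like my school",
    "i play football every day with my friends",
    "i am a nurse and i help people at the hospital",
    "my friends and i play football on saturday",
]
# Hypothetical numeric outcome for each essay writer (e.g. a later survey score).
outcome = [3.0, 1.0, 4.0, 1.5]


def relative_frequencies(text):
    """Map each word to its share of all words in one document."""
    words = text.split()
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def word_outcome_correlations(docs, scores):
    """Correlate each word's relative frequency with the outcome."""
    freqs = [relative_frequencies(doc) for doc in docs]
    vocab = sorted(set().union(*freqs))
    return {
        word: pearson([f.get(word, 0.0) for f in freqs], scores)
        for word in vocab
    }


correlations = word_outcome_correlations(essays, outcome)
# Words ordered from most positively to most negatively associated.
ranked = sorted(correlations.items(), key=lambda kv: kv[1], reverse=True)
```

Printing the first and last few entries of ranked shows which words are most positively and most negatively associated with the outcome in this toy sample. A real analysis would need far more text, correction for multiple comparisons, and controls for covariates.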

Recommended pre-reading

Differential Language Analysis ToolKit: http://dlatk.wwbp.org/

Papers:

Schwartz, H. A., Giorgi, S., Sap, M., Crutchley, P., Ungar, L., & Eichstaedt, J. (2017). DLATK: Differential Language Analysis ToolKit. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 55-60).

Kern, M. L., Park, G., Eichstaedt, J. C., Schwartz, H. A., Sap, M., Smith, L. K., & Ungar, L. H. (2016). Gaining insights from social media language: Methodologies and challenges. Psychological Methods, 21(4), 507-525.

Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297.

Schwartz, H. A., & Ungar, L. H. (2015). Data-driven content analysis of social media: a systematic overview of automated methods. The ANNALS of the American Academy of Political and Social Science, 659(1), 78-94.

Event enquiries

Richard Steele
Events and Marketing Officer

Phone: 020 7911 5320
Email: ioe.clsevents@ucl.ac.uk
