Tampere University of Technology

TUTCRIS Research Portal

ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. / Räsänen, Okko; Seshadri, Shreyas; Lavechin, Marvin; Cristia, Alejandrina; Casillas, Marisa.

In: BEHAVIOR RESEARCH METHODS, 2020.

Author

Räsänen, Okko; Seshadri, Shreyas; Lavechin, Marvin; Cristia, Alejandrina; Casillas, Marisa. / ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. In: BEHAVIOR RESEARCH METHODS. 2020.

BibTeX

@article{c080872702d24bd4b2c0037b53199726,
title = "ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings",
abstract = "Recordings captured by wearable microphones are a standard method for investigating young children’s language environments. A key measure to quantify from such data is the amount of speech present in children’s home environments. To this end, the LENA recorder and software—a popular system for measuring linguistic input—estimates the number of adult words that children may hear over the course of a recording. However, word count estimation is challenging to do in a language-independent manner; the relationship between observable acoustic patterns and language-specific lexical entities is far from uniform across human languages. In this paper, we ask whether some alternative linguistic units, namely phone(me)s or syllables, could be measured instead of, or in parallel with, words in order to achieve improved cross-linguistic applicability and comparability of an automated system for measuring child language input. We discuss the advantages and disadvantages of measuring different units from theoretical and technical points of view. We also investigate the practical applicability of measuring such units using a novel system called Automatic LInguistic unit Count Estimator (ALICE) together with audio from seven child-centered daylong audio corpora from diverse cultural and linguistic environments. We show that language-independent measurement of phoneme counts is somewhat more accurate than syllables or words, but all three are highly correlated with human annotations on the same data. We share an open-source implementation of ALICE for use by the language research community, enabling automatic phoneme, syllable, and word count estimation from child-centered audio recordings.",
keywords = "Child-centered audio, Language development, LENA, Speaker diarization, Speech processing, Word count estimation",
author = "Okko R{\"a}s{\"a}nen and Shreyas Seshadri and Marvin Lavechin and Alejandrina Cristia and Marisa Casillas",
year = "2020",
doi = "10.3758/s13428-020-01460-x",
language = "English",
journal = "BEHAVIOR RESEARCH METHODS",
issn = "1554-351X",
publisher = "Springer Verlag",

}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - ALICE

T2 - An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings

AU - Räsänen, Okko

AU - Seshadri, Shreyas

AU - Lavechin, Marvin

AU - Cristia, Alejandrina

AU - Casillas, Marisa

PY - 2020

Y1 - 2020

N2 - Recordings captured by wearable microphones are a standard method for investigating young children’s language environments. A key measure to quantify from such data is the amount of speech present in children’s home environments. To this end, the LENA recorder and software—a popular system for measuring linguistic input—estimates the number of adult words that children may hear over the course of a recording. However, word count estimation is challenging to do in a language-independent manner; the relationship between observable acoustic patterns and language-specific lexical entities is far from uniform across human languages. In this paper, we ask whether some alternative linguistic units, namely phone(me)s or syllables, could be measured instead of, or in parallel with, words in order to achieve improved cross-linguistic applicability and comparability of an automated system for measuring child language input. We discuss the advantages and disadvantages of measuring different units from theoretical and technical points of view. We also investigate the practical applicability of measuring such units using a novel system called Automatic LInguistic unit Count Estimator (ALICE) together with audio from seven child-centered daylong audio corpora from diverse cultural and linguistic environments. We show that language-independent measurement of phoneme counts is somewhat more accurate than syllables or words, but all three are highly correlated with human annotations on the same data. We share an open-source implementation of ALICE for use by the language research community, enabling automatic phoneme, syllable, and word count estimation from child-centered audio recordings.

KW - Child-centered audio

KW - Language development

KW - LENA

KW - Speaker diarization

KW - Speech processing

KW - Word count estimation

U2 - 10.3758/s13428-020-01460-x

DO - 10.3758/s13428-020-01460-x

M3 - Article

JO - BEHAVIOR RESEARCH METHODS

JF - BEHAVIOR RESEARCH METHODS

SN - 1554-351X

ER -