Abstract

In this study, we identified child- and family-level characteristics most strongly associated with clinical identification of language disorder for preschool-aged children. We used machine learning to identify variables that best classified children receiving therapy for language disorder among a sample of 483 3- to 5-year-old children (54% affected). Using a dichotomous outcome based on receipt of language therapy, we applied the Least Absolute Shrinkage and Selection Operator classification approach to a range of background data available on the children, including teacher and caregiver ratings of communication and social skills. The sample was randomly split into a training (67% of children) and test sample (33% of children) to examine out-of-sample classification accuracy. The full model had excellent classification accuracy based on area under the curve (AUC) of .87 and .85 on the training and test sets, respectively, when utilizing all available background data. Variables most strongly contributing to accurate classification of language-therapy receipt were cognitive impairment, age, gender, and teacher- and parent-reported communication, social, and literacy skills. Use of machine learning approaches to classify children receiving language services in school settings may provide a valuable approach for identifying those factors that best differentiate children with and without language disorders from a clinical perspective.