


Natural Language Processing and Automated Speech Analysis to Identify Older Adults with Cognitive Impairment


April 15, 2020 - March 31, 2025

Funding Source:

National Institute on Aging (NIA)

Grant Number:



A rapidly aging population is increasing the number of people with cognitive impairment (CI) in the United States. Most are undiagnosed, and the methods that clinicians, health systems, and researchers use to identify them are inefficient and have advanced little in recent decades. Standard practice involves screening for CI with clinician-administered tools such as the Mini-Mental State Exam or the Montreal Cognitive Assessment. But screening competes with the many other tasks clinicians are expected to perform and is rarely done, especially in primary care. Instead, health systems and researchers often rely on diagnostic codes, which have very low sensitivity, to identify patients with CI. As a result, most patients with CI remain undetected.


We propose to develop and validate state-of-the-art machine learning (ML) algorithms to identify patients with CI in primary care using structured and unstructured data from the electronic health record (EHR) and automated speech analysis (ASA) of audio-recorded patient-physician encounters.

By building ML classifiers from EHR data and from audio recordings of patients during clinical encounters, we aim to provide an efficient and scalable strategy to identify people with CI.

The Specific Aims are:

1.  Develop and validate an ML algorithm using structured and unstructured features extracted from the EHR by NLP and deep learning to identify patients with CI. Hypothesis 1: The algorithm will identify CI with sensitivity and specificity >95%.

2.  Develop and validate an ML algorithm using features extracted from audio recordings of patient-provider encounters during routine primary care visits to identify patients with CI.  Hypothesis 2: The algorithm will identify CI with sensitivity and specificity >95%.

3.  Develop and validate ensemble algorithms to integrate predictors based on both EHR- and ASA-extracted features to create a global CI diagnostic algorithm. Hypothesis 3: The integrated diagnostic algorithm will be more accurate than either predictor alone.
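The hypotheses above are stated in terms of sensitivity and specificity against a reference standard, and Aim 3 combines two predictors. As a rough illustration only, the sketch below computes both metrics and combines two probability scores by simple averaging. The function names, the 0.5 threshold, the averaging rule, and the toy numbers are all assumptions made for this example; the project description does not specify which ensemble method will be used.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    y_true: reference standard (1 = CI, 0 = no CI).
    y_pred: classifier output using the same coding.
    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


def average_ensemble(
    p_ehr: Sequence[float], p_asa: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Hypothetical ensemble: average two probability scores per patient,
    then label the patient as CI when the mean reaches the threshold."""
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in zip(p_ehr, p_asa)]


if __name__ == "__main__":
    # Toy, made-up values for six patients (not study data).
    reference = [1, 1, 1, 0, 0, 0]
    p_ehr = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1]
    p_asa = [0.8, 0.7, 0.2, 0.3, 0.3, 0.2]

    combined = average_ensemble(p_ehr, p_asa)
    sens, spec = sensitivity_specificity(reference, combined)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In practice the threshold would be chosen on the training sample and the metrics reported on the independent validation sample, so the evaluation is not biased by tuning.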

The ML algorithms using EHR- and/or ASA-extracted features will be trained against data from neurocognitive assessments (the reference standard) on 800 primary care patients in NYC and validated in an independent sample of 200 patients in Chicago. This project will be the most rigorous development and validation of algorithms for CI based on EHR- and ASA-extracted features yet performed, the first ASA-based classifier for primary care settings, and the first to test these two strategies in combination. Success with this proof-of-concept study would position us for subsequent research on implementation in health care settings.

identifier:



• Principal Investigators: Juan Wisnivesky, MD; Alex Federman, MD; Michael S. Wolf, PhD, MPH
• Project Lead: Guisselle Wismer, MPH