How Mount Sinai is using AI to unlock social determinant data in the EHR

A natural language processing algorithm has been helping the New York health system gain new insights from its unstructured notes, boosting its value-based care and pop health efforts.
By Mike Miliard
03:41 PM

By now everyone knows how critical social determinants are to population health and chronic care management. But what's not so obvious is how to make use of that valuable information, especially given most health systems' reliance on structured electronic health record data for their analytics projects.

Indeed, the challenge of documenting social data in the EHR – and then extracting it for analytics – is one of the main impediments to more progress on that front, said Dr. Jacob Reider, former chief medical officer and deputy national coordinator at ONC – and now CEO of Troy, New York-based Alliance for Better Health, where he's devoting himself to removing those and other barriers.

"In the medical universe we had ICD, and SNOMED-CT that we would use to capture information," he explained in a recent Q&A with Healthcare IT News. "We had a reasonably good way of representing the fact that a patient had diabetes, using a certain coding system."

On the other hand, "we don't have a very good consistent, predictable, repeatable mechanism for expressing that somebody has food insecurity needs," he explained. "Or is a domestic violence victim. Or has a behavior health challenge that makes it difficult to leave the house. Or that they have transportation challenges. We don't have good ways of representing those things."

Until structured clinical terminologies for SDOH data are decided upon and widely deployed, hospitals can do a lot with the unstructured narrative descriptions they already have in their EHRs, one data scientist showed this past week at the HIMSS Machine Learning and AI for Healthcare event in Boston.

Varun Gupta, IT director, advanced analytics and data management, at New York's Mount Sinai Health System – which earlier this spring launched a new Institute for Digital Health to explore innovations in AI and ML applications – showed how clinicians and case managers there can use natural language processing algorithms to gain access to valuable social data captured in progress notes, procedure and consultation notes, discharge summaries and more.

"It's all there in unstructured format," Gupta explained. "And there's a huge opportunity to use that data to find out insights. That's where natural language processing comes in."

Beyond the obvious benefits to patient health and wellness, such opportunities have "big implications for the revenue side and reimbursement," he said.

For instance, even just the written note that a given patient "used to be a smoker" is more valuable than a structured data field saying that they smoke or don't smoke, he explained.

"When a patient is quitting smoking there are big health changes that are happening – the patient is taking care of his health," said Gupta. "That's where the value-based care comes in the picture."

He added that "there are a lot of innovations where we've seen natural language processing coming up in a big way," and the ability to unlock and make use of unstructured social data within the EHR is one of the biggest opportunities.
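To make the smoking example concrete, here is a minimal Python sketch (not Mount Sinai's method) that reads a note as describing a former, current or never smoker using a handful of regular expressions. The patterns and the smoking_status function are hypothetical. A yes/no structured field would simply record a former smoker as "no," dropping the quit history Gupta points to.

```python
import re

# Hypothetical patterns; a real lexicon would be built with clinicians.
FORMER = [r"\bused to (?:be a )?smoker?\b", r"\bformer smoker\b", r"\bquit smoking\b", r"\bex-smoker\b"]
NEVER = [r"\bnever smok(?:ed|er)\b", r"\bnon-?smoker\b"]
CURRENT = [r"\bcurrent(?:ly)? smok(?:er|es|ing)\b", r"\bsmokes\b", r"\bpacks? per day\b"]


def smoking_status(note: str) -> str:
    """Classify a free-text note as former, never, current or unknown smoker."""
    text = note.lower()
    # "former" is checked first so "used to be a smoker" is not read as current use.
    for label, patterns in (("former", FORMER), ("never", NEVER), ("current", CURRENT)):
        if any(re.search(p, text) for p in patterns):
            return label
    return "unknown"


print(smoking_status("Pt used to be a smoker, quit 2 yrs ago"))  # prints: former
```

Checking the "former" patterns first is a deliberate ordering choice: phrases like "quit smoking" contain the word "smoking," and a naive current-use check would misread them.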

"We knew there were problems waiting to be uncovered"

As its technology base for this initiative, Mount Sinai first looked toward its Oracle-based clinical data warehouse, Gupta explained.

It wasn't long before he and others on the IT side "knew we needed to do something to advance the technology and move in the right direction," he said.

So the health system deployed a Microsoft Azure-based cloud infrastructure and set up a big data lake to serve as a proving ground for some of these projects.

"It started as a lab project with a few use cases, but slowly and gradually over the years we grew it in a way that most of our data scientists and analysts come into that space and try out their advance algorithms," said Gupta.

Importantly, the team also implemented a FHIR-based API layer on top of its clinical datasets, which "increased our usability and reduced our time to market," he said. "We can complete projects in weeks that used to take months."
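The article doesn't say which FHIR resources Mount Sinai exposes. As one assumption-laden sketch, an NLP-derived social finding could be packaged as a FHIR R4 Observation in the standard "social-history" observation category, with the triggering text kept as a note. The function name, the IDs and the text-only code below are hypothetical; a real deployment would likely attach a standard terminology code as well.

```python
import json


def sdoh_observation(patient_id: str, encounter_id: str, finding: str, evidence: str) -> dict:
    """Package an NLP-derived social finding as a minimal FHIR R4 Observation (sketch)."""
    return {
        "resourceType": "Observation",
        # "preliminary" because the finding is machine-derived and not yet confirmed.
        "status": "preliminary",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "social-history",
            }]
        }],
        # Text-only CodeableConcept; no specific SDOH code is assumed here.
        "code": {"text": finding},
        "subject": {"reference": f"Patient/{patient_id}"},
        "encounter": {"reference": f"Encounter/{encounter_id}"},
        # Keep the source phrase so care managers can see why the flag was raised.
        "note": [{"text": evidence}],
    }


obs = sdoh_observation("example-patient", "example-encounter", "Housing instability", "sleeping on the couch")
print(json.dumps(obs, indent=2))
```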

The next challenge, Gupta explained, was that Mount Sinai researchers "had to decide which social determinants we wanted to focus on first."

The first group comprised the factors deemed highest-impact – in other words, those able to deliver the most immediate ROI. These, he said, centered on economic factors (food security, housing, nutrition); education; health system factors (insurance, language barriers); and physical environment (transportation, home safety).

In Phase 2, a bit later, researchers looked at factors such as behavioral health, legal issues, social support and physical activity. In Phase 3, they turned to more narrowly focused safety issues such as fall safety and special health needs.
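One way to picture the phased rollout is as a configuration that enables more domains as each phase goes live. This is a hypothetical sketch: the SDOH_PHASES structure and the active_domains helper are illustrative names, and only the factor lists come from the article.

```python
# Hypothetical encoding of the three phases; factor lists follow the article.
SDOH_PHASES = {
    1: {
        "economic": ["food security", "housing", "nutrition"],
        "education": ["education"],
        "health_system": ["insurance", "language barriers"],
        "physical_environment": ["transportation", "home safety"],
    },
    2: {
        "behavioral_health": ["behavioral health"],
        "legal": ["legal issues"],
        "social_support": ["social support"],
        "physical_activity": ["physical activity"],
    },
    3: {
        "safety": ["fall safety", "special health needs"],
    },
}


def active_domains(phase: int) -> dict[str, list[str]]:
    """Return every domain enabled in the given phase or any earlier one."""
    enabled: dict[str, list[str]] = {}
    for p in sorted(SDOH_PHASES):
        if p <= phase:
            enabled.update(SDOH_PHASES[p])
    return enabled


print(sorted(active_domains(2)))
```

Keeping later phases additive means the high-ROI domains from Phase 1 stay switched on as the scope widens.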

With some 200 million clinical notes from the EHR added to the data lake, the next step was the sometimes arduous process of information management, said Gupta: extraction, deduplication, collation, corpus creation, garbage filtering – and, crucially, spellcheck.

"It seems like a simple step but it's a really important part of doing this," he said.

A clean data set also depended on thorough synonym replacement, he added, as well as lemmatization – whereby inflectional endings are removed from terms, "reducing the words to their basic grammatical form," the lemma.
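The cleanup and normalization steps Gupta describes can be approximated with Python's standard library. This sketch assumes exact-duplicate detection by hashing normalized text, a crude length-and-letter-ratio garbage filter, spelling correction with difflib against a small vocabulary, and dictionary-based synonym replacement and lemmatization. Every table and function name is hypothetical, and the article doesn't say which tools Mount Sinai used. A production pipeline would more likely use a library lemmatizer, such as spaCy's or NLTK's WordNetLemmatizer, than a lookup table.

```python
import difflib
import hashlib
import re

# All tables below are hypothetical, tiny stand-ins for clinician-built resources.
VOCAB = ["patient", "homeless", "housing", "shelter", "food"]
SYNONYMS = {"no fixed address": "homeless", "undomiciled": "homeless"}
LEMMAS = {"is": "be", "was": "be", "were": "be", "sleeping": "sleep", "slept": "sleep", "couches": "couch"}


def normalize(note: str) -> str:
    """Lowercase, turn punctuation into spaces and collapse whitespace."""
    note = re.sub(r"[^a-z0-9\s]", " ", note.lower())
    return re.sub(r"\s+", " ", note).strip()


def is_garbage(note: str, min_words: int = 3) -> bool:
    """Treat very short or mostly non-alphabetic notes as noise."""
    if len(note.split()) < min_words:
        return True
    return sum(c.isalpha() for c in note) / len(note) < 0.5


def spellcheck(note: str, cutoff: float = 0.8) -> str:
    """Swap each word for its closest vocabulary entry when one is close enough."""
    words = []
    for word in note.split():
        match = difflib.get_close_matches(word, VOCAB, n=1, cutoff=cutoff)
        words.append(match[0] if match else word)
    return " ".join(words)


def replace_synonyms(note: str) -> str:
    # Longest variants first, so multiword phrases win over their parts.
    for variant in sorted(SYNONYMS, key=len, reverse=True):
        note = re.sub(rf"\b{re.escape(variant)}\b", SYNONYMS[variant], note)
    return note


def lemmatize(note: str) -> list[str]:
    # Dictionary lookup; unknown tokens pass through unchanged.
    return [LEMMAS.get(token, token) for token in note.split()]


def build_corpus(raw_notes: list[str]) -> list[list[str]]:
    seen: set[str] = set()
    corpus = []
    for raw in raw_notes:
        note = normalize(raw)
        digest = hashlib.sha256(note.encode("utf-8")).hexdigest()
        if digest in seen or is_garbage(note):  # drop exact duplicates and noise
            continue
        seen.add(digest)
        corpus.append(lemmatize(replace_synonyms(spellcheck(note))))
    return corpus


raw = [
    "Patient is homless, sleeping on couches",
    "PATIENT is homless; sleeping on couches",  # duplicate once normalized
    "Pt undomiciled, was in a sheltr",
    "## 12 ##",  # garbage
]
print(build_corpus(raw))
```

Hashing only catches exact duplicates after normalization; near-duplicate notes (copied forward with small edits) would need a fuzzier comparison.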

The "most challenging part," said Gupta, was arriving at the ultimate "bag of words" that would be consulted by the NLP program.

"It was easy from a technical background, but it was more challenging from a clinical standpoint," he explained. The work required clinicians and IT staff to collaborate closely to make sure the phrases that might connote homelessness in an unstructured EHR note – for instance, "doesn't have a house" or "sleeping on the couch" – were included.

Once that list was finalized, the machine learning algorithm could use the bag of words, along with the lemmatized notes, to assign SDOH data to specific patients or encounters, flagging social needs and tracking whether things improved for the patient over time, said Gupta.
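The matching and tracking steps can be sketched as follows, under stated assumptions: a lexicon maps each domain to phrases that suggest it (only the two housing phrases are from the article), a note is flagged for every domain with a matching phrase, and each patient's flags are compared across encounters over time. The LEXICON, match_sdoh and track_needs names are hypothetical, and plain phrase matching ignores negation, so a note reading "denies being homeless" would still be flagged.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical lexicon; only the two housing phrases come from the article.
LEXICON = {
    "housing": ["doesn't have a house", "sleeping on the couch", "homeless", "shelter"],
    "food": ["food insecurity", "skipping meals", "food pantry"],
    "transportation": ["no ride", "no transportation"],
}


def match_sdoh(note: str) -> set[str]:
    """Return the SDOH domains whose lexicon phrases appear in the note."""
    text = note.lower()
    return {
        domain
        for domain, phrases in LEXICON.items()
        if any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)
    }


def track_needs(notes):
    """notes: iterable of (patient_id, encounter_date, note_text)."""
    by_patient = defaultdict(list)
    for patient_id, day, text in notes:
        by_patient[patient_id].append((day, match_sdoh(text)))

    summary = {}
    for patient_id, rows in by_patient.items():
        rows.sort(key=lambda row: row[0])  # oldest encounter first
        ever = set().union(*(domains for _, domains in rows))
        latest = rows[-1][1]
        summary[patient_id] = {
            "ever_flagged": sorted(ever),
            "still_flagged": sorted(latest),
            # Absence from the latest note is only a weak signal of improvement.
            "no_longer_flagged": sorted(ever - latest),
        }
    return summary


history = [
    ("p1", date(2018, 1, 5), "Doesn't have a house; sleeping on the couch. Uses food pantry."),
    ("p1", date(2018, 6, 1), "Now in stable apartment. Still relies on food pantry."),
    ("p2", date(2018, 3, 2), "No social concerns documented."),
]
print(track_needs(history))
```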

"We were happy with the results but really not surprised," said Gupta. "We knew there were problems waiting to be uncovered."

Indeed, between August 2016 and October 2018, across some 7.2 million encounters with more than 226,000 Medicaid patients, Mount Sinai found that 31 percent of patients had at least one social determinant factor that could be impacting their health.

Prior to the deployment of AI to mine that free text data, "this was a population that, through structured data, we didn't have any insights," said Gupta.
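As reported, the 31 percent figure counts patients rather than encounters: a patient is included once if at least one social determinant factor was found for them, so the denominator is the more than 226,000 Medicaid patients, not the 7.2 million encounters. Applied to 226,000 patients, 31 percent is roughly 70,000 people. The hypothetical patient_prevalence function below shows that patient-level calculation.

```python
def patient_prevalence(encounter_flags) -> float:
    """Share of patients with at least one SDOH flag on any encounter.

    encounter_flags: iterable of (patient_id, has_any_flag), one item per encounter.
    """
    patients: set[str] = set()
    flagged: set[str] = set()
    for patient_id, has_flag in encounter_flags:
        patients.add(patient_id)
        if has_flag:
            flagged.add(patient_id)
    return len(flagged) / len(patients) if patients else 0.0


# Patient "a" has three encounters but is counted once.
sample = [("a", False), ("a", True), ("a", True), ("b", False), ("c", False), ("d", False)]
print(patient_prevalence(sample))  # 0.25
```

Counting at the encounter level instead would give 2 of 6, or about 33 percent, in this toy sample, which is why the unit of analysis matters when comparing figures.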

Challenges, best practices

The initiative hasn't been without its hurdles, of course. For one, it was a challenge to arrive at an ontology that reflected how physicians write their notes, which can be highly specific and unique to Mount Sinai, yet was universal enough to capture SDOH effectively.

Testing the algorithm's output was also labor-intensive, Gupta said. "It takes some time, but once you're there you can use them in many different ways."

So was integrating the findings into clinical workflows and ensuring they were used by care managers – something Mount Sinai solved by making smart use of APIs and dashboards.

Integrating those newly generated findings back into individual patients' Epic charts was also a challenge, one for which the hospital leaned on third-party vendors such as Tableau.
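The article doesn't describe how Mount Sinai tested its algorithm's output. A common approach, assumed here, is to have reviewers label a sample of notes by hand and compute precision (how many flags were right) and recall (how many true needs were caught) against those labels. The precision_recall function and the note IDs are hypothetical.

```python
def precision_recall(predicted: dict[str, bool], reviewed: dict[str, bool]) -> tuple[float, float]:
    """Score NLP flags against hand-reviewed labels; only reviewed notes count."""
    tp = fp = fn = 0
    for note_id, truth in reviewed.items():
        guess = predicted.get(note_id, False)
        if guess and truth:
            tp += 1  # flagged and confirmed by the reviewer
        elif guess and not truth:
            fp += 1  # flagged, but the reviewer found no social need
        elif truth and not guess:
            fn += 1  # missed a need the reviewer found
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


predicted = {"n1": True, "n2": True, "n3": False, "n4": True}
reviewed = {"n1": True, "n2": False, "n3": True, "n4": True}
print(precision_recall(predicted, reviewed))  # both 2/3
```

Scoring only the reviewed notes keeps unreviewed predictions from being counted as errors, which matters when the hand-labeled sample is a small slice of millions of notes.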

For health systems looking to try similar projects, Gupta offered a list of best practices.

First, analyze your data to take stock of what's available. Second, identify the opportunities that fit best within existing processes and will be most immediately beneficial.

It's OK to start small, he said: "Maybe one or two (social determinants) is more sensible than boiling the ocean."

But after starting simple, it's key to take a comprehensive approach to tackling those social factors. And toward that goal, it's important to develop a roadmap, he said.

At Mount Sinai, the benefits gained by mining that previously underused data have been considerable so far, from both a clinical and financial perspective.

"AI is always challenging, but I think we're getting there," said Gupta.

Twitter: @MikeMiliardHITN
Email the writer: mike.miliard@himssmedia.com

Healthcare IT News is a publication of HIMSS Media.
