Whenever a patient undergoes a CT scan at the University of Texas Medical Branch (UTMB), the resulting images are automatically sent to the cardiology department, analyzed by AI, and assigned a cardiac risk score.
In just a few months, thanks to a simple algorithm, AI has flagged multiple patients at high cardiovascular risk. The CT scan does not need to be heart-related, and the patient does not need to have heart problems. Every scan automatically triggers an evaluation.
This is straightforward preventive care enabled by AI, allowing the medical facility to finally begin taking advantage of its vast amount of data.
“The data is just sitting there,” Peter McCaffrey, head of AI at UTMB, told VentureBeat. “What I like about this is that AI doesn't have to do superhuman things. It's doing low-intelligence tasks, but at huge volume, and that still offers a lot of value.”
“We know we miss things,” he said. “We just didn't have the tools to go back and find them.”
How AI helps UTMB determine cardiovascular risk
Like many healthcare facilities, UTMB applies AI in many areas. One of its first use cases is cardiac risk screening. The model is trained to scan for incidental coronary artery calcification (ICAC), a strong predictor of cardiovascular risk. The goal is to identify patients susceptible to heart disease who might otherwise be overlooked because they show no obvious symptoms, McCaffrey explained.
Through the screening program, every CT scan completed at the facility is automatically analyzed using AI to detect coronary calcification. The scan need not have anything to do with the heart; it may have been ordered because of a spinal fracture or an abnormal lung nodule.
The scan is fed into an image-based convolutional neural network (CNN) that calculates an Agatston score, which represents the accumulation of plaque in the patient's arteries. McCaffrey explained that this score is usually calculated by a human radiologist.
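For readers unfamiliar with the score, the sketch below shows the conventional manual Agatston calculation: each calcified lesion's area is multiplied by a weight derived from its peak attenuation, and the products are summed. It is not UTMB's CNN, whose internals the article does not describe. The `Lesion` type and function names are illustrative assumptions, and real scoring also depends on slice thickness, minimum lesion size, and scan protocol, which are omitted here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Lesion:
    """One calcified lesion on one axial slice (hypothetical input format)."""
    area_mm2: float  # calcified area in square millimetres
    peak_hu: float   # maximum attenuation in Hounsfield units


def density_weight(peak_hu: float) -> int:
    """Standard Agatston density factor; below 130 HU is not counted."""
    if peak_hu < 130:
        return 0
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions: Iterable[Lesion]) -> float:
    """Sum of area times density weight over all lesions."""
    return sum(l.area_mm2 * density_weight(l.peak_hu) for l in lesions)
```

A radiologist's manual score and a model's predicted score can then be compared on the same scale, which is what makes the validation described later in the article possible.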
From there, the AI assigns patients with an ICAC score of 100 or more to one of three “risk tiers” based on additional information, such as whether they are taking a statin or have ever visited a cardiologist. McCaffrey explained that this assignment is rule-based and can draw on discrete values in the electronic health record (EHR), or the AI can determine the values by using GPT-4o to process free text such as clinical visit notes.
Patients flagged with a score of 100 or more and no known history of cardiology visits or treatment are automatically sent a digital message. The system also sends a note to their primary doctor. Patients identified as having a more serious ICAC score of 300 or higher also receive a phone call.
McCaffrey explained that almost everything is automated except for the phone call, and the facility is actively piloting tools in the hope of automating voice calls as well. The only place where humans are in the loop is in confirming the AI-derived calcium score and risk tier before automated notification proceeds.
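The outreach rules described above are simple enough to sketch in a few lines. The version below is an illustration only: the thresholds of 100 and 300 come from the article, but the `Patient` fields, the action names, and the reading of “no known history” as “not on a statin and never seen by a cardiologist” are assumptions, as is applying the phone-call rule only to patients who already qualify for a message. The article does not define the three tiers precisely, so they are not modeled.

```python
from dataclasses import dataclass
from typing import List

FLAG_THRESHOLD = 100  # score at which a patient is flagged (from the article)
CALL_THRESHOLD = 300  # score at which a phone call is added (from the article)


@dataclass
class Patient:
    """Hypothetical record; the fields would come from the EHR or note parsing."""
    icac_score: float
    on_statin: bool
    seen_cardiologist: bool


def plan_outreach(patient: Patient, human_confirmed: bool) -> List[str]:
    """Return the outreach actions for one patient, or an empty list."""
    # Nothing is sent until a person has confirmed the score and tier.
    if not human_confirmed:
        return []
    if patient.icac_score < FLAG_THRESHOLD:
        return []
    # Assumed meaning of "known history of cardiology visits or treatment".
    if patient.on_statin or patient.seen_cardiologist:
        return []
    actions = ["digital_message_to_patient", "note_to_primary_doctor"]
    if patient.icac_score >= CALL_THRESHOLD:
        actions.append("phone_call")
    return actions
```

Keeping the human confirmation as an explicit argument mirrors the one manual checkpoint McCaffrey describes.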
Since launching the program in late 2024, the facility has evaluated approximately 450 scans per month, with five to 10 of those cases identified each month as high-risk and requiring intervention, McCaffrey reported.
“The point here is that no one suspects you have this disease, and no one needs to order a study of this disease,” he pointed out.
Another important use case for AI is the detection of stroke and pulmonary embolism. UTMB uses specialized algorithms trained to spot specific findings and alert care teams within seconds of imaging to accelerate treatment.
As with the ICAC scoring tool, CNNs trained for stroke and for pulmonary embolism each receive CT scans automatically and look for indicators such as obstructed blood flow or abrupt vessel cut-off.
“Human radiologists can detect these visual properties, but here the detection is automated and occurs in just a few seconds,” McCaffrey said.
Any CT ordered “under suspicion” of stroke or pulmonary embolism is automatically sent to the AI. For example, an ER clinician may notice facial droop and issue a “CT stroke” order, which triggers the algorithm.
Both algorithms include a messaging application that notifies the entire care team as soon as a finding is made. The message includes a screenshot of the image with a crosshair over the location of the lesion.
“These are specific emergency use cases where how quickly you start treatment matters,” McCaffrey said. “We've seen cases where we can gain a few minutes of intervention because we got a quicker heads-up from the AI.”
Reducing hallucinations, anchoring bias
To ensure that its models perform as well as possible, UTMB profiles them for sensitivity, specificity, F-1 score, bias, and other factors, both before deployment and repeatedly after deployment.
For example, the ICAC algorithm is validated before deployment by running the model on a balanced set of CT scans while radiologists score the same scans manually; the two sets of scores are then compared. In post-deployment review, radiologists are given a random subset of AI-scored CT scans and perform a full ICAC measurement blinded to the AI score. McCaffrey explained that this allows the team to calculate model error on a recurring basis and to detect potential bias, which would appear as a shift in the magnitude and/or direction of the error.
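As a rough illustration of that comparison, the sketch below takes paired AI and radiologist scores for the same scans and reports sensitivity, specificity, and F-1 at a flagging threshold, plus the mean signed and mean absolute error. The function name, the dictionary keys, and the use of 100 as the threshold for “positive” are assumptions; the article does not say how UTMB computes these figures.

```python
def _ratio(numerator, denominator):
    # Return None rather than raising when a group is empty.
    return numerator / denominator if denominator else None


def compare_scores(ai_scores, radiologist_scores, threshold=100):
    """Compare AI scores against blinded radiologist scores for the same scans."""
    pairs = list(zip(ai_scores, radiologist_scores))
    # Treat the radiologist's score as the reference standard.
    tp = sum(1 for a, r in pairs if a >= threshold and r >= threshold)
    fp = sum(1 for a, r in pairs if a >= threshold and r < threshold)
    fn = sum(1 for a, r in pairs if a < threshold and r >= threshold)
    tn = sum(1 for a, r in pairs if a < threshold and r < threshold)
    errors = [a - r for a, r in pairs]
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        # The sign shows direction (over- or under-scoring); tracking it
        # over time is one way to notice a shift in bias.
        "mean_signed_error": _ratio(sum(errors), len(errors)),
        "mean_absolute_error": _ratio(sum(abs(e) for e in errors), len(errors)),
    }
```

Running the same function on each periodic review sample would give a series of error figures whose drift in size or sign corresponds to the kind of bias McCaffrey describes.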
To help prevent anchoring bias, in which AI and humans rely too heavily on the first piece of information they encounter and therefore miss important details when making a decision, UTMB employs a “peer learning” technique. A random subset of radiology exams is selected, shuffled, anonymized, and distributed to different radiologists, and their responses are compared.
Not only does this help assess the performance of individual radiologists, it also helps detect whether the rate of missed findings is higher in studies where AI was used to specifically highlight certain abnormalities (thus leading to anchoring bias).
For example, if AI were used to identify and flag fractures on an X-ray, the team would look at whether the studies with flagged fractures also showed a higher miss rate for other findings, such as joint space narrowing (common in arthritis).
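The check described here amounts to comparing two miss rates. A minimal sketch follows, assuming each reviewed study has been reduced to two booleans: whether the AI flagged something in it, and whether the reading radiologist missed some other finding. Both the input format and the function name are hypothetical, and a real analysis would also need a statistical test before concluding that the two rates differ.

```python
def miss_rates_by_flag(studies):
    """Return (miss rate in AI-flagged studies, miss rate in unflagged studies).

    `studies` is an iterable of (ai_flagged, missed_other_finding) booleans.
    A rate is None when its group contains no studies.
    """
    flagged = [missed for ai_flagged, missed in studies if ai_flagged]
    unflagged = [missed for ai_flagged, missed in studies if not ai_flagged]

    def rate(group):
        return sum(group) / len(group) if group else None

    return rate(flagged), rate(unflagged)
```

A noticeably higher rate in the flagged group would suggest that the AI highlight is drawing attention away from other findings.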
McCaffrey and his team have found that successive model versions, both within a class (various versions of GPT-4o) and across classes (GPT-4.5 vs. 3.5), tend to have lower hallucination rates. “But this is non-zero and non-deterministic, so while that's nice, you can't ignore the possibility and impact of hallucinations,” he said.
They therefore tend to gravitate toward generative AI tools that do a good job of citing their sources. One example is a model that summarizes a patient's medical course while also surfacing the clinical notes that served as the basis for its output.
“This allows providers to function efficiently as a safeguard against hallucinations,” McCaffrey said.
Flagging the “basics” to enhance healthcare
UTMB also uses AI in several other areas, including an automated system that helps determine whether inpatient admissions are justified. The system acts as a co-pilot, automatically extracting all patient notes from the EHR and using Claude, GPT, and Gemini to summarize and examine them before presenting assessments to staff.
“This allows us to handle the entire patient population and filter/triage patients,” McCaffrey explained. The tool also assists personnel in drafting documentation to support admission or observation.
In other areas, AI is used to review reports such as echocardiology interpretations and clinical notes to identify gaps in care. In many cases, “it's simply flagging the basics,” McCaffrey said.
Healthcare is complex, with data feeds coming in from everywhere (images, doctors' notes, lab results), yet much of that data has never been computed simply because there has not been enough human manpower.
This has led to what he described as “a massive, large intellectual bottleneck.” A lot of data is simply never computed, despite the great potential to be proactive and find things earlier.
“That's not an indictment of any particular place,” McCaffrey emphasized. “It's the state of healthcare generally.” Without AI, “we can't deploy intelligence, scrutiny and thought work on the scale necessary to catch everything.”