By Dave DeFusco
A Katz School study accepted to CVPR 2026, the prestigious computer vision conference taking place in June, tackles a problem that has long limited the usefulness of artificial intelligence in medicine: not just seeing what is in a medical image, but explaining where it is in a way doctors can trust.
The paper, “CG-Reasoner: Centroid-Guided Positional Reasoning Segmentation for Medical Imaging with a Robust Visual-Text Consistency Metric,” introduces a system designed to do two things at once. First, it identifies areas of concern in medical images, such as tumors or lesions. Second, it explains their location in clear, human-like language. This development marks an important shift in how AI can support healthcare.
Today’s AI systems are already very good at analyzing images like X-rays, MRIs and CT scans. They can outline suspicious areas with impressive accuracy; however, these systems usually stop there. They highlight pixels but do not explain their reasoning in a way that matches how doctors think and communicate.
“In clinical practice, doctors don’t just point to a spot. They describe where it is, how it relates to nearby structures and why it matters,” said Lakshmikar Polamreddy, lead author of the study and a student in the Department of Graduate Computer Science and Engineering. “Most AI models ignore that kind of spatial reasoning. Our goal was to bridge that gap.”
The new system, called CG-Reasoner, is designed to combine visual understanding with language. It uses a type of AI known as a multimodal model, meaning it can process both images and text together. This allows it not only to detect a lesion, but also to describe where it is, noting, for example, that a tumor sits in the upper left region of a lung.
A key innovation in the system is the “Text2Centroid” module, a component that connects words to precise locations. When the AI generates a description, it also predicts a central point, like a set of coordinates, that anchors the explanation to the actual image.
“This helps ensure the explanation isn’t just fluent, but accurate,” said Polamreddy. “The text and the image are tied together through geometry, so the reasoning reflects the real position of the lesion.”
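To make the idea of a central point concrete, here is a minimal sketch of how a centroid can be computed from an outlined region and turned into a coarse location phrase. It is an illustration only, not the Text2Centroid module: the function names `mask_centroid` and `coarse_position`, the 3x3 grid and the toy mask are all assumptions made for this example, and "left" here means the left side of the image as displayed.

```python
from typing import List, Tuple

Mask = List[List[int]]  # rows of 0/1 pixels; 1 marks the outlined region


def mask_centroid(mask: Mask) -> Tuple[float, float]:
    """Return the (row, col) mean position of all pixels set to 1."""
    row_sum = col_sum = count = 0
    for r, line in enumerate(mask):
        for c, value in enumerate(line):
            if value:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("mask contains no foreground pixels")
    return row_sum / count, col_sum / count


def coarse_position(mask: Mask) -> str:
    """Map the centroid to a phrase on a hypothetical 3x3 grid."""
    height, width = len(mask), len(mask[0])
    r, c = mask_centroid(mask)
    # Pixel centers sit at index + 0.5; scale to one of three bands.
    v = min(int((r + 0.5) / height * 3), 2)
    h = min(int((c + 0.5) / width * 3), 2)
    vertical = ("upper", "middle", "lower")[v]
    horizontal = ("left", "center", "right")[h]
    return f"{vertical} {horizontal}"


# Toy 6x6 "scan" with a small region in the top-left corner.
example = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(mask_centroid(example))    # (0.5, 0.5)
print(coarse_position(example))  # upper left
```

In this toy version the phrase is derived from the coordinates after the fact. The system described in the article goes the other way, predicting the point alongside the generated text so that the two can be checked against each other.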
The researchers also introduced a new way to measure how well the system performs. Traditional metrics focus only on how closely an AI’s outlined region matches the true area in an image, but they do not evaluate whether the explanation is correct.
To solve this, the team created PRScore, or Positional-Reasoning Score. This metric checks both the visual accuracy of the result and the quality of the explanation, essentially asking: does the AI not only find the right spot, but also describe it correctly?
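For contrast, the overlap-only kind of measurement that PRScore is meant to go beyond can be sketched in a few lines. Intersection over union (IoU) is one widely used overlap measure. The article does not say which traditional metrics the paper compares against, so IoU is an illustrative assumption here, and nothing below reproduces PRScore itself.

```python
from typing import List

Mask = List[List[int]]  # rows of 0/1 pixels; 1 marks the outlined region


def iou(predicted: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    intersection = union = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks are treated as perfect agreement (a convention
    # chosen for this sketch).
    return intersection / union if union else 1.0


# Toy masks: the prediction is shifted one column to the right.
truth = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
predicted = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
print(iou(predicted, truth))  # 2 shared pixels out of 6 covered: one third
```

A score like this captures only how well two outlines overlap. It says nothing about whether an accompanying sentence such as "upper left region" is true, which is the gap the researchers describe.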
In tests across six different types of medical imaging, including X-rays, MRIs and ultrasounds, the system achieved state-of-the-art results. It was not only highly accurate in identifying problem areas, but also more consistent in aligning its explanations with those areas.
Assistant Professor Ming Ma, the paper’s corresponding author, said the work addresses a critical barrier to the adoption of AI in healthcare.
“Accuracy alone is not enough,” said Ma. “Doctors need systems they can understand and trust. By combining segmentation with clear, spatially grounded reasoning, we are making AI outputs more interpretable and clinically meaningful.”
Another advantage of the system is efficiency. Despite its advanced capabilities, CG-Reasoner uses a relatively lightweight design, meaning it can run without the massive computing resources often required by cutting-edge AI models.
The implications could be significant. In the future, systems like CG-Reasoner could help automate parts of medical reporting, assist doctors in making faster decisions and reduce the risk of misinterpretation.
“As AI becomes more integrated into healthcare, it has to communicate in ways that align with how clinicians think,” said Polamreddy. “This is about making AI not just powerful, but useful in real clinical settings.”