
Algorithmic bias management is one of the key focus areas for AI governance groups at academic medical centers, and new research from the Icahn School of Medicine at Mount Sinai in New York is a timely reminder of why that focus is so important. Researchers found that generative AI models can recommend different treatments for the same medical condition based solely on a patient's socioeconomic and demographic background.
Their findings are detailed in the April 7, 2025 online edition of Nature Medicine, in an article titled "Sociodemographic biases in medical decision making by large language models: a large-scale multi-model analysis."
As part of their research, the researchers stress-tested nine large language models (LLMs) on 1,000 emergency department cases, each replicated with 32 different patient backgrounds, generating more than 1.7 million AI-generated medical recommendations. Despite identical clinical details, the AI models occasionally altered their decisions based on a patient's socioeconomic and demographic profile, affecting key areas such as triage priority, diagnostic testing, treatment approach and mental health evaluation.
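To make that replication design concrete, the sketch below shows one way such a stress test could be wired up in Python. The model list, patient profiles, case text and the query_model stub are placeholders invented for illustration, not the study's actual materials or code.

```python
# Minimal sketch (not the authors' code) of the study's replication design:
# an identical clinical vignette is paired with each sociodemographic profile,
# and every model's recommendation is recorded for later comparison.
from itertools import product

MODELS = ["model_a", "model_b"]           # stand-ins for the nine LLMs tested
PROFILES = [                              # stand-ins for the 32 patient backgrounds
    "a 45-year-old Black man experiencing homelessness",
    "a 45-year-old white man with high income",
    "a 45-year-old woman who identifies as LGBTQIA+",
]
CASES = ["Chest pain radiating to the left arm for 2 hours; BP 150/90."]

def query_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., a vendor API or a local model)."""
    return "triage: urgent; workup: ECG, troponin"

records = []
for model, case, profile in product(MODELS, CASES, PROFILES):
    prompt = (
        f"Patient: {profile}. Presentation: {case} "
        "Recommend triage level, diagnostic workup, treatment, "
        "and whether a mental health evaluation is needed."
    )
    records.append({
        "model": model,
        "case": case,
        "profile": profile,
        "recommendation": query_model(model, prompt),
    })

# In the study itself, 9 models x 1,000 cases x 32 profiles (plus controls)
# yielded more than 1.7 million recommendations for analysis.
print(len(records), "recommendations collected")
```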
The study abstract states that, compared with a physician-derived baseline and each model's control case without sociodemographic identifiers, cases labeled as Black or unhoused or identified as LGBTQIA+ were more frequently directed toward urgent care, invasive interventions or mental health evaluations. For example, certain cases labeled as belonging to LGBTQIA+ subgroups were recommended mental health evaluations approximately six to seven times more often than clinically indicated.
By contrast, cases labeled as high-income received significantly more recommendations for advanced imaging, such as computed tomography and magnetic resonance imaging, while cases labeled as low- or middle-income were often limited to basic or no additional testing. These key differences persisted after applying multiple-hypothesis corrections, and their magnitude was not supported by clinical reasoning or guidelines, suggesting that they may reflect model-driven bias, which could eventually lead to health disparities rather than acceptable clinical variation, according to the abstract.
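The kind of comparison the abstract describes, subgroup versus control recommendation rates followed by a multiple-testing correction, can be illustrated with a short sketch. The counts below are invented, and the choice of Fisher's exact test with a Benjamini-Hochberg correction is an assumption about one reasonable way to run such an analysis, not the paper's exact statistical method.

```python
# Illustrative sketch (made-up counts, not study data): compare how often a
# flagged subgroup vs. the unlabeled control receives a recommendation,
# then correct the p-values for multiple comparisons.
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

# (recommended, not recommended) counts for control and subgroup, per outcome
comparisons = {
    "mental health evaluation": {"control": (30, 970), "subgroup": (200, 800)},
    "advanced imaging (CT/MRI)": {"control": (150, 850), "subgroup": (260, 740)},
    "invasive intervention":     {"control": (80, 920), "subgroup": (130, 870)},
}

pvals, results = [], []
for outcome, counts in comparisons.items():
    table = [list(counts["subgroup"]), list(counts["control"])]
    oddsratio, p = fisher_exact(table)
    results.append((outcome, oddsratio))
    pvals.append(p)

# Benjamini-Hochberg correction across all outcome comparisons
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

for (outcome, oddsratio), p, significant in zip(results, p_adj, reject):
    print(f"{outcome}: OR={oddsratio:.2f}, adjusted p={p:.4f}, significant={significant}")
```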
In a statement, co-senior author Eyal Klang, MD, chief of generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai, explained the significance of the study: "Our research provides a framework for AI assurance, helping developers and healthcare institutions design fair and reliable AI tools."
The researchers caution that the study represents only a snapshot of AI behavior. Future research will include assurance testing to evaluate how AI models perform in real-world clinical settings and whether different prompting techniques can reduce bias. The team also aims to work with other healthcare institutions to refine AI tools, ensuring they uphold the highest ethical standards and treat all patients fairly.
Next, the researchers plan to expand their work by simulating multistep clinical conversations and piloting AI models in hospital settings to measure their real-world impact. They hope their findings will guide the development of policies and best practices for AI assurance in healthcare, fostering trust in these powerful new tools.