Below is the AI-generated summary of the March 8 seminar, with minor edits made by the AI Workgroup Leadership.

Quick recap


Raj discussed the potential impact of AI on medical diagnosis, its current limitations, and the need for further research. He highlighted racial bias in health algorithms, including the possibility of removing race as a variable when estimating glomerular filtration rate (GFR). He also emphasized the value of case studies in medical education and reviewed the performance of the GPT-4 model on challenging medical cases.




AI's Impact on Medical Diagnosis and Race Bias in Health Algorithms

Raj led a discussion on the potential impact of AI on medical diagnosis, highlighting the complexities of the current approach and possible alternatives. He emphasized racial bias in health algorithms, particularly in the estimation of glomerular filtration rate (GFR). Including race in the estimate could affect clinical decisions and potentially widen the gap in chronic kidney disease prevalence between racial groups. Raj noted that race can be removed as a variable in GFR estimation, and this part of the discussion ended with a national task force's recommendation to immediately implement a new equation that does so. The team's focus on using AI in medical diagnosis, its collaboration with clinicians, and its podcast on career journeys into AI were also discussed.

AI and Case Studies in Medical Diagnosis

Raj discussed the importance of case studies in medical education, using the example of the clinicopathological conferences published by Massachusetts General Hospital and the New England Journal of Medicine. He highlighted the case of a 49-year-old man with recurrent hypoglycemia that was eventually diagnosed as an insulinoma, a type of pancreatic tumor. Raj then demonstrated a scenario in which an AI model was asked to provide a diagnosis for the patient's symptoms, and he confirmed the accuracy of the model's response by cross-checking the case details on the New England Journal of Medicine website. He also emphasized the value of interactive models in working through a patient's symptoms and outlined the steps used to confirm the diagnosis: fasting the patient while measuring blood glucose levels, followed by imaging studies.

GPT-4 Model's Performance on Medical Cases Discussed

Raj discussed the performance of the GPT-4 model on challenging medical cases, noting that it achieved the top score in many instances, even across a relatively large sample of 70 cases. He also described the model's multimodal reasoning abilities: when its accuracy was compared with that of human respondents, the model was more accurate across difficulty levels. Raj noted that performance improved when both text and images were provided but dropped when additional images were added, and he hypothesized that the informative text could distract the model. Finally, he mentioned the model's strong performance on multiple-choice questions and licensing exams, noting that such results are now almost taken for granted.

AI in Healthcare: Progress and Potential

Raj discussed the progress and potential of AI models in healthcare, particularly in diagnostics and patient communication. He highlighted a study in which chatbot responses were rated as more empathetic and of higher quality than human responses. He also shared the story of a boy whose chronic pain condition was identified with the help of a chatbot, underscoring the potential for these models to assist with complex diagnoses. Raj emphasized the importance of human oversight in such processes, noting that the diagnosis the chatbot suggested in the story was later confirmed by a physician. He concluded that the capabilities of these models have advanced beyond what was expected just a few years ago.

AI Models' Limitations and Risks Discussed

Raj discussed the potential drawbacks and limitations of AI models such as ChatGPT and GPT-4. He noted that these models, while impressive, can fabricate information and make mistakes that human professionals likely would not. He explained how the models are trained: they first learn to predict the next word from internet-scale text and are then fine-tuned using human responses. Despite this process, they can still produce harmful content. Raj also raised several open questions about these models, including how representative their training data are, how their performance changes over time, and what values are embedded in them.

Model Failures and Uncertainty in AI

Raj emphasized the importance of understanding model failures and the need for more research in this area. He acknowledged the limitations of current models, particularly their inability to perform basic arithmetic and write complex essays. He also discussed the issue of "leakage," that is, uncertainty about whether test cases appear in the model's training data, and the resulting need for more experiments using data not available on the internet. Raj concluded that while there are concerns about overfitting, the impact may be less significant than previously believed. The discussion also touched on the role of values in shaping both human interactions and the models themselves, with Raj highlighting the need to get to know the models in order to elicit the desired output.

AI Diagnostic Shortcomings and ChatGPT 3.5 Usage

Kathy shared her insights on AI's shortcomings, particularly in recognizing anatomical features in chest images, and how these could lead to diagnostic errors. She expressed interest in understanding why AI struggles with such tasks. Kathy also mentioned using ChatGPT 3.5 to generate podcast topics and suggest experts on AI's role in diagnosis. Tinglong concluded the session, thanking everyone and announcing next month's session featuring Dr. Suchi Saria.