CLTX
1
Latent profiles of deceased organ donation registrants and non-registrants in the United States
Editors
Machine learning has a guaranteed place in the future of clinical medicine.
But for now, these authors have failed to offer a compelling place for it in this specific instance.
A simple logistic regression of “registration for organ donation” on all the factors they studied would help ground their work in methods that are familiar and transparent to us. Comparing the predictive accuracy of this traditional, transparent approach with that of the “unsupervised,” opaque one would be the simplest way to motivate their innovation.
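The comparison suggested above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the cohort is simulated (the predictors, coefficients, and outcome are hypothetical, not the manuscript's variables), the logistic model is fitted with a hand-written Newton-Raphson loop rather than a statistics package, and plain k-means clustering stands in for latent profile analysis, which is usually fitted as a Gaussian mixture model. The helper names (`fit_logistic`, `kmeans`, `assign`) are illustrative only. Each cluster is labeled with the majority outcome among its training members, so both approaches yield held-out accuracies on the same test rows.

```python
import numpy as np

# Hypothetical simulated cohort: standardized predictors and a binary
# "registered as donor" outcome. Nothing here comes from the manuscript.
rng = np.random.default_rng(42)
n, p = 1000, 3
X = rng.normal(size=(n, p))
X1 = np.column_stack([np.ones(n), X])          # prepend intercept column
true_beta = np.array([-0.2, 1.0, -0.8, 0.5])   # assumed coefficients
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X1 @ true_beta)))

train = np.arange(n) < 700                      # first 700 rows for fitting
test = ~train                                   # last 300 rows held out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X1, y, n_iter=25):
    """Logistic regression fitted by Newton-Raphson (IRLS)."""
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X1 @ beta)
        grad = X1.T @ (y - mu)                           # score vector
        hess = X1.T @ (X1 * (mu * (1 - mu))[:, None])    # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def assign(X, centers):
    """Index of the nearest center for each row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k=2, n_iter=50, seed=0):
    """Plain k-means: a crude, hypothetical stand-in for latent profiles."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers


# Supervised, transparent model: held-out accuracy of logistic regression.
beta_hat = fit_logistic(X1[train], y[train])
acc_logistic = np.mean((sigmoid(X1[test] @ beta_hat) >= 0.5) == y[test])

# Unsupervised model: cluster without the outcome, then label each profile
# by the majority outcome among its training members.
centers = kmeans(X[train], k=2)
train_prof = assign(X[train], centers)
majority = int(y[train].mean() >= 0.5)
prof_label = np.array([
    int(y[train][train_prof == j].mean() >= 0.5) if np.any(train_prof == j) else majority
    for j in range(len(centers))
])
acc_profiles = np.mean(prof_label[assign(X[test], centers)] == y[test])

print(f"logistic regression accuracy: {acc_logistic:.3f}")
print(f"profile-based accuracy:       {acc_profiles:.3f}")
print("coefficients:", np.round(beta_hat, 2))
```

Using the same held-out rows and the same accuracy metric for both models keeps the comparison fair; with real data, the authors' actual profile model would replace the k-means stand-in.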
Moreover, the paper is NOT written in the style a clinical audience expects and deserves. Nothing short of a complete redesign of this effort would make the paper appropriate for the journal.
2
ChatGPT Solving Complex Kidney Transplant Cases: A Comparative Study with Human Respondents
Editors
import matplotlib.pyplot as plt
import numpy as np
# Clock settings; f(t) random disturbances making "paradise lost"
clock_face_radius = 1.0
number_of_ticks = 8
tick_labels = [
    "Vision", "Auditory", "Chemistry", "Language",
    "Barometry", "Pain", "Temperature", "Gravity"
]
# Calculate the angles for each tick (in radians)
angles = np.linspace(0, 2 * np.pi, number_of_ticks, endpoint=False)
# Reverse the order so the labels run clockwise around the face
angles = angles[::-1]
# Create figure and axis
fig, ax = plt.subplots(figsize=(8, 8))
ax.set_xlim(-1.2, 1.2)
ax.set_ylim(-1.2, 1.2)
ax.set_aspect('equal')
# Draw the clock face
clock_face = plt.Circle((0, 0), clock_face_radius, color='lightgrey', fill=True)
ax.add_patch(clock_face)
# Tolerance so values such as cos(pi/2) ~ 6e-17 are treated as zero
eps = 1e-9
# Draw the ticks and labels
for angle, label in zip(angles, tick_labels):
    x = clock_face_radius * np.cos(angle)
    y = clock_face_radius * np.sin(angle)
    # Draw the tick as a spoke from the center to the rim
    ax.plot([0, x], [0, y], color='black')
    # Position the label slightly outside the clock face
    label_x = 1.1 * clock_face_radius * np.cos(angle)
    label_y = 1.1 * clock_face_radius * np.sin(angle)
    # Align each label away from the center according to its position
    ha = 'center'
    va = 'center'
    if np.cos(angle) > eps:
        ha = 'left'
    elif np.cos(angle) < -eps:
        ha = 'right'
    if np.sin(angle) > eps:
        va = 'bottom'
    elif np.sin(angle) < -eps:
        va = 'top'
    ax.text(label_x, label_y, label, horizontalalignment=ha, verticalalignment=va, fontsize=10)
# Remove axes
ax.axis('off')
# Show the plot
plt.show()
1. f(t)
      \
2. S(t) -> 4. y: h'(t) = 0; (X'X)^{-1} X'Y -> 5. b -> 6. SV'
      /
3. h(t)
\(f(t)\) Each mode has its own data properties and probability density function (pdf)
\(S(t)\) Cumulative inputs (i.e., more data, more modes) are where the edge lies
\(h(t)\) Quicker processing of a lot of information
\((X'X)^{-1} X'Y\) Fitting: the least-squares estimate obtained by setting the derivative to zero
\(\beta\) Diagnosis as a single word is a tidy token that GPTs were trained to predict
\(SV'\) But “attention” to the context of the key words yields an unexamined edge 38
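The fitting step in the legend corresponds to the ordinary least-squares estimator \(\hat\beta = (X'X)^{-1}X'Y\), obtained by setting the derivative of the squared-error loss to zero (the normal equations \(X'X\hat\beta = X'Y\)). A minimal NumPy sketch on simulated data shows the closed form alongside a numerically safer equivalent; the coefficients, noise level, and sample size are hypothetical assumptions for illustration only.

```python
import numpy as np

# Hypothetical simulated data; X includes an intercept column.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])          # assumed for illustration
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Literal closed form: (X'X)^{-1} X'Y
beta_formula = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# Solving the normal equations X'X b = X'Y avoids forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(beta_hat, 3))
print(np.allclose(beta_hat, beta_lstsq))   # True
```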
This is a very simple and clearly written analysis of a topical issue. I think it's worth publishing after the authors address the concerns raised by the reviewer, as well as the issues I’ve presented.
Authors
The authors assessed the accuracy of different versions of ChatGPT (3.5, 4, and 4V) in responding to kidney transplant cases from historical quizzes of the American Society of Nephrology (2015, 2014, and 2013). There were two cases for each year, each consisting of a case summary, laboratory results, an image (e.g., a head CT), and multiple-choice answers; for instance: “Which of the following is the most likely cause of her chest pain and CXR findings? [A] Lung cancer, [B] Recurrent breast cancer, [C] Pneumonia, [D] Sarcoidosis.”
Performance was benchmarked against nephrology fellows, transplant program directors, and an audience. Figure 1 offers a powerful and accessible summary of the findings: GPT-4V almost matched the transplant program directors’ overall performance across the three years studied.
However, visual inspection reveals heterogeneity in the rankings across years (e.g., the fellows outperformed GPT-4V on the 2013 quiz but fell short in the other years). Are the differences in performance observed over just a handful of questionnaire items statistically significant? Are they clinically meaningful?
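Because GPT-4V and the human comparators answered the same items, a paired test is the natural first check. A minimal sketch, assuming each item can be scored simply as correct or incorrect for each respondent group (e.g., by the group's majority answer, a simplification the manuscript may not use): the exact McNemar test looks only at discordant items, those one respondent got right and the other got wrong. The counts below are hypothetical, not taken from the paper, and `mcnemar_exact` is an illustrative helper name. The sketch also shows that at least six discordant items, all favoring the same respondent, are needed before the two-sided p-value falls below 0.05; with two cases per year over three years, that threshold is only within reach if each case contributes several questions or the split is a clean sweep.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test for paired binary outcomes.

    b: items respondent A answered correctly and respondent B did not.
    c: items respondent B answered correctly and respondent A did not.
    Under H0 the b + c discordant items split 50/50 (a binomial test).
    """
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical split: GPT-4V right on 3 items the comparator missed, and
# the comparator never right where GPT-4V was wrong.
print(mcnemar_exact(3, 0))   # 0.25

# Smallest number of discordant items for which even a clean sweep
# (all n favoring one respondent) reaches p < 0.05.
min_items = next(n for n in range(1, 50) if mcnemar_exact(n, 0) < 0.05)
print(min_items)             # 6
```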
First, can the authors comment on the multimodal chatbots that are widely available right now? GPT-4o, for instance, is more recent than GPT-4V and can process not only images but also audio, video, and uploaded .PDF or .CSV files, extending its utility beyond text and static images. These newer chatbots can process far more clinical detail by “reading” the entire patient record on file in a matter of seconds or minutes, including all available images and perhaps even audio recordings from the physicians. Isn’t this study design underselling the potential edge these chatbots may actually have? And if these approaches are limited by HIPAA, can the authors address these constraints in the revised Discussion?51
Second, to avoid appearing biased toward one specific commercial product, can the authors comment on other chatbots on the market, including claude.ai, meta.ai, perplexity.ai, and Gemini? How would the authors advise a clinician to approach these options? ChatGPT has been an industry leader, but this is a dynamic, fast-growing area of competition, and it would be helpful to place ChatGPT in that broader context.