1 AI Fundamentals for Clinical Researchers

What Is Artificial Intelligence? A Clinician's Primer
Supervised, Unsupervised, and Self-Supervised Learning
Common AI Architectures in Medicine
Foundation Models and Large Language Models in Clinical Research
Matching the Problem to the Approach
What AI Cannot Do: Limitations and Common Misconceptions

2 Data: The Foundation of Medical AI

Why Data Quality Matters More Than Model Complexity
How Much Data Do You Need?
Data Annotation and Labeling for Clinical AI
Class Imbalance: When Rare Conditions Dominate Your Research Question
Data Preprocessing and Augmentation
Multi-Site and Multi-Device Data: The Generalization Challenge
De-Identification, Privacy, and Regulatory Compliance for Research Data
Synthetic Data and Data Simulation

3 Developing AI Models

The AI Development Lifecycle: From Hypothesis to Deployment
Training, Validation, and Test Sets: Why You Need All Three
Transfer Learning: Standing on the Shoulders of Large Models
Hyperparameter Tuning and Model Selection
Overfitting, Underfitting, and Regularization
Working with an Engineering Team: The Clinician's Role
Tools and Infrastructure: What You Need to Know

4 Evaluating Model Performance

5 Clinical Validation and Study Design

Technical Validation vs. Clinical Validation: Understanding the Difference
Designing Retrospective Validation Studies
Designing Prospective Clinical Validation Studies
Multi-Site External Validation
Reader Studies: Measuring AI's Impact on Clinical Decision-Making
Defining Clinically Meaningful Endpoints
Reference Standards and Ground Truth in Medical AI

6 Regulatory Pathways for Medical AI

7 Bias, Fairness, and Responsible AI

Sources of Bias in Medical AI
Demographic Fairness: Performance Across Populations
Generalization Across Devices, Sites, and Geographies
Explainability and Interpretability in Clinical AI
Transparency and Reproducibility
Ethical Frameworks for AI in Healthcare

8 Implementation and Clinical Workflow Integration

From Model to Product: The Implementation Gap
Clinical Workflow Analysis: Where Does AI Fit?
Interoperability: DICOM, HL7 FHIR, and Integration Standards
Human-AI Interaction and User Interface Design
Clinician Trust, Adoption, and Change Management
Monitoring AI Performance in Production

9 Publishing and Reporting AI Research

Reporting Standards: CONSORT-AI, SPIRIT-AI, TRIPOD+AI, and STARD
Writing the Methods Section for an AI Study
Common Methodological Pitfalls in Medical AI Papers
Sharing Code, Data, and Models
Peer Review of AI Manuscripts: What Reviewers Look For

A Appendices

Glossary of AI and Machine Learning Terms for Clinicians
Recommended Reading and Courses
Regulatory Standards Quick Reference
Checklist: Is Your AI Study Ready for Publication?

Evaluating Model Performance

Technical metrics for assessing how well an AI model performs its intended task. This section gives clinical researchers the vocabulary to critically evaluate published results and their own models.

4.1

Classification Metrics: Sensitivity, Specificity, and Beyond

Accuracy, sensitivity (recall), specificity, positive predictive value, negative predictive value, and why accuracy alone is almost always misleading in medical applications.
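As a rough illustration of why accuracy alone misleads, the sketch below computes these metrics from confusion-matrix counts. The cohort numbers are invented for the example, and the helper names `safe_div` and `classification_metrics` are hypothetical, not taken from any particular library.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts."""
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),  # recall: diseased cases detected
        "specificity": safe_div(tn, tn + fp),  # healthy cases correctly cleared
        "ppv": safe_div(tp, tp + fp),          # positive predictive value
        "npv": safe_div(tn, tn + fn),          # negative predictive value
    }


# Invented cohort: 1,000 patients, 10 with the disease (1% prevalence).
# The model flags 25 patients, and only 5 of them are truly diseased.
metrics = classification_metrics(tp=5, fp=20, tn=970, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this invented cohort the accuracy is 97.5%, yet the model misses half of the diseased patients (sensitivity 50%) and four out of five positive calls are false alarms (PPV 20%). A model that labels every patient as healthy would score 99% accuracy on the same cohort.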

4.2

The ROC Curve and AUC: What They Tell You and What They Hide

How to read and interpret receiver operating characteristic curves, what AUC actually measures, and its important limitations, including why a high AUC does not guarantee clinical utility.
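One way to see what AUC measures is its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below implements that definition directly. The function name `auc_pairwise` and the toy scores are illustrative assumptions, and the quadratic loop is meant for clarity rather than for large datasets.

```python
def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: six cases, three positive and three negative.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
auc = auc_pairwise(y_true, scores)

# Cubing the scores preserves their order, so the AUC is unchanged even
# though the scores no longer resemble the original probabilities.
squashed = [s ** 3 for s in scores]
auc_squashed = auc_pairwise(y_true, squashed)

print(f"AUC: {auc:.3f}, AUC after distorting scores: {auc_squashed:.3f}")
```

Because AUC depends only on ranking, it says nothing about whether the scores are calibrated or about how the model behaves at the single operating threshold a clinic would actually use.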

4.3

Segmentation Metrics: Dice, IoU, and Volumetric Measures

Evaluating models that outline or delineate structures (tumors, fluid, anatomical regions). Dice similarity coefficient, intersection over union, Hausdorff distance, and when each is appropriate.
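The overlap and distance metrics named above can be sketched on masks represented as sets of (row, column) pixel coordinates. This is a minimal illustration under stated assumptions: both masks are non-empty, the function names are hypothetical, and the Hausdorff distance is computed over all mask pixels rather than over boundary pixels only, which is a simplification of how it is often applied in practice.

```python
import math


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))


def iou(a, b):
    """Intersection over union (Jaccard index): |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets, in pixels."""
    def directed(x, y):
        # Largest distance from any point in x to its nearest point in y.
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


# Toy masks: a 4x4 reference square and a prediction shifted right by one pixel.
truth = {(r, c) for r in range(4) for c in range(0, 4)}
pred = {(r, c) for r in range(4) for c in range(1, 5)}

print(f"Dice: {dice(truth, pred):.2f}")
print(f"IoU: {iou(truth, pred):.2f}")
print(f"Hausdorff: {hausdorff(truth, pred):.2f} px")
```

Dice and IoU summarize overall overlap and are related by Dice = 2·IoU / (1 + IoU), so they always rank models the same way. The Hausdorff distance instead reports the single worst disagreement, which matters when one stray region would be clinically significant.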

4.4

Calibration: Does a 90% Confidence Score Really Mean 90%?

Why model confidence scores are often poorly calibrated, why this matters for clinical decision-making, and how to measure and improve calibration.
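One common way to measure calibration is the expected calibration error (ECE): predictions are grouped into probability bins, and the gap between the mean predicted probability and the observed event rate is averaged across bins, weighted by bin size. The sketch below assumes binary outcomes, equal-width bins, and a hypothetical function name. Other binning schemes exist and can give different values.

```python
def expected_calibration_error(y_true, probs, n_bins=10):
    """ECE for binary outcomes using equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(p for _, p in members) / len(members)
        observed_rate = sum(y for y, _ in members) / len(members)
        ece += (len(members) / n) * abs(mean_confidence - observed_rate)
    return ece


# Toy example: ten cases all scored 0.9, but only six are truly positive.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9] * 10
ece = expected_calibration_error(y_true, probs)
print(f"ECE: {ece:.2f}")
```

Here the model claims 90% confidence while only 60% of those cases are positive, giving an ECE of 0.30. A well-calibrated model would have an ECE close to zero.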

4.5

Statistical Significance, Confidence Intervals, and Sample Size for AI Studies

Applying rigorous statistical methodology to AI evaluation. Confidence intervals for AUC, bootstrapping, multiple comparisons, and sample size calculations for performance studies.
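As one example of these methods, a percentile bootstrap can attach a confidence interval to almost any performance metric by resampling cases with replacement. The sketch below is a minimal version under stated assumptions: each row is an independent patient, the data are invented, and the names `bootstrap_ci` and `sensitivity` are hypothetical. The same pattern applies to AUC or any other metric function.

```python
import random


def sensitivity(y_true, y_pred):
    """Fraction of positive cases predicted positive; None if no positives."""
    hits = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(hits) / len(hits) if hits else None


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a performance metric."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value is not None:
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper


# Invented test set: 40 diseased patients (34 detected) and 160 healthy ones.
y_true = [1] * 40 + [0] * 160
y_pred = [1] * 34 + [0] * 6 + [0] * 150 + [1] * 10

point = sensitivity(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, sensitivity)
print(f"Sensitivity: {point:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With only 40 positive cases the interval is wide, which is the practical point: a headline sensitivity from a small test set can hide considerable uncertainty. If a patient contributes several images, resampling should be done per patient rather than per image.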

4.6

Cross-Validation Strategies for Medical Data

K-fold, stratified, grouped (by patient, by site), and leave-one-site-out cross-validation. Choosing the right strategy to avoid inflated performance estimates from data leakage.
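The idea behind grouped cross-validation can be sketched in a few lines: whole groups (patients or sites) are assigned to folds, so no group contributes data to both the training and the test side of a split. The function below is a simplified, hypothetical illustration that assigns groups round-robin without shuffling or stratification. In practice, scikit-learn's `GroupKFold` and `LeaveOneGroupOut` provide maintained implementations.

```python
def grouped_k_fold(groups, k):
    """Yield (train_indices, test_indices) with each group kept in one fold."""
    unique_groups = sorted(set(groups))
    # Round-robin assignment of groups to folds (no shuffling: a simplification).
    fold_of = {g: i % k for i, g in enumerate(unique_groups)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# Toy dataset: eight images from five patients (some patients have several).
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]

for train, test in grouped_k_fold(patient_ids, k=3):
    train_patients = {patient_ids[i] for i in train}
    test_patients = {patient_ids[i] for i in test}
    # No patient may appear on both sides of the split.
    assert train_patients.isdisjoint(test_patients)
    print(f"test patients: {sorted(test_patients)}")

# Leave-one-site-out is the same idea with sites as groups and k = number of sites.
site_ids = ["siteA", "siteA", "siteB", "siteB", "siteC", "siteC"]
site_splits = list(grouped_k_fold(site_ids, k=len(set(site_ids))))
```

A plain k-fold split of the same eight images could place two images from patient p3 in different folds, letting the model be tested on a patient it has effectively already seen.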

4.7

Comparing Models: When Is One Model Truly Better Than Another?

Statistical tests for comparing classifier performance (McNemar's test, DeLong's test for AUC comparison), and avoiding the pitfall of selecting models based on noise in the test set.
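To make the comparison concrete, the sketch below applies an exact (binomial) form of McNemar's test to two models evaluated on the same cases. Only the discordant cases, where exactly one model is correct, carry information about which model is better. The function name `mcnemar_exact` and the data are invented for illustration. Statistical packages also offer a chi-squared approximation for larger samples.

```python
import math


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts cases only model A got right
    and c counts cases only model B got right.
    """
    b = sum(1 for y, a, m in zip(y_true, pred_a, pred_b) if a == y and m != y)
    c = sum(1 for y, a, m in zip(y_true, pred_a, pred_b) if a != y and m == y)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    # Under the null hypothesis, discordant cases split 50/50 between models.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Invented test set of 100 positive cases:
# 70 both correct, 15 only A correct, 5 only B correct, 10 both wrong.
y_true = [1] * 100
pred_a = [1] * 85 + [0] * 15
pred_b = [1] * 70 + [0] * 15 + [1] * 5 + [0] * 10

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(f"Only A correct: {b}, only B correct: {c}, p = {p_value:.3f}")
```

Model A is 10 percentage points more accurate here (85% versus 75%), yet the evidence rests on just 20 discordant cases and the p-value is about 0.04. Such a result can easily disappear if several candidate models were compared on the same test set without correction.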