Design History File and Quality Management System Essentials

The documentation infrastructure regulators expect: design inputs, design outputs, verification, validation, risk management, and why a QMS must be in place before your first submission.

There is a moment that, if you’re not already having occasional anxiety about it, you should probably start. A reviewer opens your Design History File (DHF), turns to a seemingly random page, and asks: “You retrained your model in July. Walk me through exactly how you selected the training data for version 3.2.” They want to know what the inclusion criteria were, who applied them, and how you know 100% of the images met those criteria.

They’re obviously not expecting you to know the answer off the top of your head; rather, they want to know whether your documentation is sufficient to answer the question. If you start reconstructing from memory or piecing together random notes, your credibility takes a hit.

The DHF is the regulatory foundation of your device. Its requirements come from 21 CFR 820.30 (Design Controls). For AI/ML devices, the DHF is where regulators verify that you actually built what you said you’d build, tested it the way you said you would test it, and made decisions based on evidence rather than guessing.

What Goes Into the Design History File

The DHF is a compilation of records from across your development process. For an MLMD (machine learning-enabled medical device), it must include the categories listed below. For each category, I’ve included a bunch of questions that you should be able to answer by referring to that section of the DHF.

Data Management Records

Data source documentation: where did each piece of training or validation data come from? (hospital name, date range, inclusion/exclusion criteria)
Data selection methodology: how did you decide which data to use? Did you take consecutive cases, random sample, stratified by outcome?
Data preprocessing logs: what transformations did you apply? (normalization, augmentation, balancing)
Annotation records: who annotated the data? What was the annotation protocol? Did you have inter-rater agreement checks?
Ground truth methodology: how did you establish “truth”? Clinical consensus, histopathology, long-term follow-up?
Dataset versioning: every version of your training set, validation set, and test set is tracked and named (dataset_training_v2.3, etc.)
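One way to make dataset versioning concrete is to freeze each version with a manifest that records provenance and a hash for every file. The following is a minimal sketch, not a prescribed format: the function names, the manifest fields, and the idea of flagging byte-identical files as duplicates are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, version: str, source: str, criteria: str) -> dict:
    """Describe one frozen dataset version (hypothetical field names)."""
    files = {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    # Files that share a hash are byte-identical and worth a manual review.
    by_hash = {}
    for name, file_hash in files.items():
        by_hash.setdefault(file_hash, []).append(name)
    duplicates = [names for names in by_hash.values() if len(names) > 1]
    return {
        "dataset_version": version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "inclusion_exclusion_criteria": criteria,
        "file_count": len(files),
        "files": files,
        "duplicate_groups": duplicates,
    }


def write_manifest(manifest: dict, out_path: Path) -> None:
    """Store the manifest outside the data directory so it is not hashed itself."""
    out_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

If any file in a later copy of the dataset hashes differently, you know the data is no longer the version you documented.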

Model Development Records

This is everything related to building the actual model.

Architecture rationale: why did you choose this model architecture? Why this number of layers, this activation function? If you tried multiple architectures, what did you learn from each?
Training logs: every training run. Not just the final model, but the unsuccessful attempts too. What data, what hyperparameters, what was the loss function, what was the training stopping criterion? This is called the “model development history”.
Baseline comparisons: what did you compare against? Other published models? Clinician performance? A simpler classical ML model? Document the results of these comparisons.
Version control: every version of the model is tracked. Assign a consistent naming scheme to models so you can easily refer to them in other parts of the DHF: model_v1.0, model_v1.1, etc. You can tell when each was created and what changed from the prior version.
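A training log that captures every run, including the failed ones, can be as simple as an append-only file with one record per run. This is a sketch under assumptions: the `TrainingRun` fields, the JSON Lines format, and the `code_commit` field are illustrative choices, not a required schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class TrainingRun:
    """One entry in the model development history (hypothetical schema)."""
    run_id: int
    model_version: str
    dataset_version: str
    code_commit: str
    hyperparameters: dict
    stopping_criterion: str
    outcome: str  # e.g. "released" or "abandoned: overfit"
    recorded_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_run(log_path: Path, run: TrainingRun) -> None:
    """Append one run as a JSON line; earlier entries are never rewritten."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(run), sort_keys=True) + "\n")


def load_runs(log_path: Path) -> list:
    """Read the whole history back as a list of dictionaries."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Recording the timestamp when the entry is written, rather than typing it in later, is what makes the record contemporaneous.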

Verification and Validation Records

Verification: link back to project foundation documents such as the software requirements specification (SRS) and architecture model. Does the model match your design spec? Is the architecture correct? Did you train on the right data?
Validation: does the model solve the clinical problem? What are the sensitivity, specificity, NPV, and PPV? Did it perform well in the intended use population?
Test set results: on your held-out test set, what were the metrics? Break them down by subgroup (did performance differ significantly in women vs. men? Older vs. younger patients?)
Clinical comparison: if available, how did the model compare to clinician interpretation? Did it outperform experienced radiologists? Match them?
Edge case testing: how did the model handle difficult cases, unusual image quality, demographic outliers?
Subgroup analysis: performance stratified by age, sex, disease severity, body habitus, anything clinically relevant
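The four metrics named above all come from the same confusion-matrix counts, so a subgroup breakdown is mostly bookkeeping. Here is a minimal sketch, assuming binary labels and a list of per-case records; the record keys (`truth`, `prediction`, and the subgroup field) are made up for the example.

```python
from collections import defaultdict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # None signals "not computable" (e.g. no positives in this subgroup).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def metrics_by_subgroup(records: list, group_key: str) -> dict:
    """Tally a confusion matrix per subgroup, then compute metrics for each."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for record in records:
        truth, predicted = record["truth"], record["prediction"]
        if truth and predicted:
            cell = "tp"
        elif predicted:
            cell = "fp"
        elif truth:
            cell = "fn"
        else:
            cell = "tn"
        counts[record[group_key]][cell] += 1
    return {group: diagnostic_metrics(**c) for group, c in counts.items()}
```

Point estimates like these say nothing about whether a difference between subgroups is statistically meaningful; small subgroups need confidence intervals before you draw conclusions.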

Risk Management Records

According to ISO 14971 (Medical Device Risk Management), your DHF must include:

Risk identification: what could go wrong? (false negatives, false positives, model drift, adversarial examples, algorithmic bias)
Risk analysis: for each hazard, what’s the severity and probability?
Risk control measures: how will you mitigate these risks? (training on diverse data, monitoring performance in the field, user alerts for uncertain predictions). You can link your mitigation to sections in other documents that address this risk, and list this risk within those documents for traceability.
Residual risk evaluation: after mitigation, what risk remains? Is it acceptable?
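The severity-and-probability analysis is often recorded as a scored risk matrix. The sketch below assumes a 5x5 scale and acceptability thresholds that are entirely hypothetical; ISO 14971 leaves the scales and acceptability criteria for you to define in your risk management plan. It also assumes, for simplicity, that a control lowers probability and leaves severity unchanged.

```python
from dataclasses import dataclass

# Hypothetical 5-point scales; define your own in the risk management plan.
SEVERITY = {"negligible": 1, "minor": 2, "serious": 3, "critical": 4, "catastrophic": 5}
PROBABILITY = {"improbable": 1, "remote": 2, "occasional": 3, "probable": 4, "frequent": 5}


def risk_level(severity: str, probability: str) -> str:
    """Map severity x probability to a level using assumed thresholds."""
    score = SEVERITY[severity] * PROBABILITY[probability]
    if score >= 15:
        return "unacceptable"
    if score >= 6:
        return "review"
    return "acceptable"


@dataclass
class Hazard:
    hazard_id: str
    description: str
    severity: str
    initial_probability: str
    control: str
    residual_probability: str

    def summary(self) -> dict:
        """Risk level before and after the control measure is applied."""
        return {
            "hazard_id": self.hazard_id,
            "initial": risk_level(self.severity, self.initial_probability),
            "residual": risk_level(self.severity, self.residual_probability),
        }
```

A residual level of "review" is a prompt for a documented benefit-risk decision, not an automatic pass.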

Design Review Records

At key milestones (after design spec, after model training, before final validation, before release), you hold design reviews. Document who attended, what was discussed, what decisions were made, and what the rationale was. These meetings are where your team catches problems early.

Requirements Traceability

This is how you link everything back to the original requirements. Create a traceability matrix that shows:

Each requirement from your SRS
The design element(s) that address it
The test case(s) that verify it
The validation evidence that shows it works
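A traceability matrix is easy to check mechanically once it is stored as data. This is a minimal sketch assuming the matrix is a dictionary keyed by requirement ID; the column names and the IDs in the usage example are invented for illustration.

```python
# Columns every requirement is expected to link to (assumed names).
REQUIRED_LINKS = ("design_elements", "verification_tests", "validation_evidence")


def find_trace_gaps(matrix: dict) -> dict:
    """Return, per requirement ID, the link columns that are missing or empty."""
    gaps = {}
    for requirement_id, links in matrix.items():
        missing = [column for column in REQUIRED_LINKS if not links.get(column)]
        if missing:
            gaps[requirement_id] = missing
    return gaps


matrix = {
    "SRS-001": {
        "design_elements": ["ARCH-3.1"],
        "verification_tests": ["TC-014"],
        "validation_evidence": ["VAL-REPORT-2"],
    },
    "SRS-002": {
        "design_elements": ["ARCH-3.4"],
        "verification_tests": [],
    },
}
print(find_trace_gaps(matrix))
```

Running a check like this before each design review turns "is everything traced?" into a question with a concrete answer.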

A Pretty Common Mistake: Retrospective Documentation

Researchers rarely wake up in the morning excited about the latest entry to the risk matrix they’ll make that day. Documentation is usually an afterthought in dev teams. Development happens (somewhat messily, as development always does), and then, when it’s time to file for regulatory approval, they go back and write the DHF. This is immediately obvious to experienced FDA reviewers, and it destroys credibility. Retrospective documentation has telltale signs. If you end up doing this, at least avoid these obvious tells:

Records are vague (“we used representative images from our institution”)
Dates are suspiciously aligned with submission dates
Training logs are missing for “preliminary runs”
Design decisions lack rationale (“we chose this architecture”) or have post-hoc rationale (“this architecture is standard in the field”)
Risk assessments read like checklists rather than actual decision records (“all risks considered and mitigated”)

Contemporaneous documentation, in contrast, is specific and timestamped:

“Dataset dataset_training_v2.2 created 2025-03-15. Inclusion: all consecutive chest X-rays from Hospital A, Jan-Dec 2024, age 18+, English language reports. Exclusion: images with motion artifact, non-standard positioning (documented in exclusion log). 5,247 images screened, 4,823 met criteria. Annotated by Dr. Smith and Dr. Jones; inter-rater agreement 96% on presence of target finding.”
“Training run 47: model_v2.0 trained on dataset_training_v2.2 with hyperparameters [list]. Validation loss plateaued at epoch 150 (training stopped per protocol). Test set performance: sensitivity 94.2%, specificity 93.8%, AUC 0.962. Comparison to model_v1.9: +1.3% sensitivity, +0.5% specificity.”

How do you ensure contemporaneous documentation? Build your DHF as you build your model. You can do this manually with binders and paper, use a Google doc or similar shared folder, or use eQMS software to manage documents. Be prepared to harangue your data scientists relentlessly.

(Or you can explore this site a bit more: this is exactly the problem Datamint is solving.)

Quality Management System (QMS) Overview

The DHF exists within the broader context of a Quality Management System. The QMS is your organizational framework for ensuring that devices are designed, built, tested, and released safely.

The regulatory standards that govern QMS for medical devices are:

ISO 13485 (Quality Management Systems for Medical Device Manufacturers). This is an international standard that addresses design controls, document control, supplier management, production controls, labeling, complaint handling, etc.
21 CFR Part 820 (FDA’s Quality System Regulation, or QSR). This is the US regulatory requirement. FDA has amended it to align with ISO 13485: the Quality Management System Regulation (QMSR), which incorporates ISO 13485:2016 by reference, takes effect on February 2, 2026.
MDSAP (Medical Device Single Audit Program). This is a single-audit program recognized by several regulators, including Health Canada and FDA. If you’re selling in Canada, Health Canada requires MDSAP certification for Class II, III, and IV devices; the audit is based on ISO 13485 plus country-specific requirements.

Your QMS must be in place before your first submission to FDA.

Key components of a QMS include:

Document control: how you manage design specifications, procedures, training records (version control, approval workflows, retention)
Design control: the process for designing new devices or modifications (design reviews, traceability, change control)
Risk management: identifying and mitigating risks per ISO 14971
Supplier management: if you buy third-party tools or components, how you verify they’re suitable
Production and process controls: how you manufacture or compile the final device
Change control: when something changes (new version of a library, new training data), how you assess the impact and document the change
Complaint handling: how you receive, investigate, and respond to reports of device problems
Post-market surveillance: how you monitor device performance after deployment and report safety issues to regulators
Training and competency: ensuring your team understands their responsibilities
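Records management: how long you keep DHF records, validation records, complaint records. Retention periods depend on the regulation and jurisdiction; under 21 CFR 820.180, for example, records must be kept for the design and expected life of the device, and no less than 2 years from its release for commercial distribution.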

Integrating DHF and QMS for AI/ML Devices

For traditional medical devices, this is all well-established. For AI/ML devices, there are some AI-specific considerations:

Training Data is a Controlled Input

In traditional medical devices, manufacturing inputs (materials, components) are tightly controlled. For AI, the training data is an equivalent input. Your QMS must have procedures for:

Data source validation (confirming the data came from where you think it did)
Data quality checks (screening for corrupted files, nonsensical labels, duplicates)
Data selection review (ensuring inclusion/exclusion criteria were applied consistently)
Annotation protocol and quality (inter-rater agreement, adjudication of discrepancies)
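Inter-rater agreement is usually reported as raw percent agreement, but that figure does not account for agreement that would happen by chance. Cohen’s kappa corrects for it. A minimal sketch for two raters follows; the function name and the choice to return 1.0 in the degenerate single-label case are assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> dict:
    """Percent agreement and Cohen's kappa for two raters on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both raters used one identical label
        # throughout; this sketch reports 1.0 (an assumption).
        kappa = 1.0
    else:
        kappa = (observed - expected) / (1 - expected)
    return {"percent_agreement": observed, "kappa": kappa}
```

For example, two raters who agree on 8 of 10 cases, each marking 6 as positive, have 80% agreement but a kappa of about 0.58.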

Model Versioning and Configuration Management

Your QMS must have procedures for tracking model versions and the relationships between model versions, code versions, and data versions. This is where your configuration management (discussed in the IEC 62304 article) becomes a QMS process.

Validation is Ongoing

For traditional devices, validation is largely a one-time event at the end of development, repeated when the design changes. For AI, validation should be ongoing. Your QMS should include procedures for:

Real-world performance monitoring (does the model perform as expected in actual clinical use?)
Drift detection (is performance declining over time?)
Retraining and re-validation (per your PCCP, if applicable)
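Performance monitoring can start as a comparison of each reporting period against the validated baseline. This sketch flags periods where sensitivity falls more than a tolerance below baseline; the tolerance, the minimum number of confirmed positives, and the monthly windowing are all assumptions you would set in your own procedure.

```python
def flag_sensitivity_drift(baseline, windows, tolerance=0.03, min_positives=30):
    """Compare per-period sensitivity against a validated baseline.

    windows: iterable of (label, true_positives, false_negatives) tuples.
    Returns (label, sensitivity or None, status) for each period.
    """
    findings = []
    for label, tp, fn in windows:
        positives = tp + fn
        if positives < min_positives:
            # Too few confirmed positives to estimate sensitivity reliably.
            findings.append((label, None, "insufficient data"))
            continue
        sensitivity = tp / positives
        status = "investigate" if sensitivity < baseline - tolerance else "ok"
        findings.append((label, round(sensitivity, 3), status))
    return findings


monthly = [("2025-07", 47, 3), ("2025-08", 44, 6), ("2025-09", 4, 1)]
for finding in flag_sensitivity_drift(0.942, monthly):
    print(finding)
```

An "investigate" flag should open a documented review in your QMS, with the outcome recorded whether or not you retrain.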

Risk Management for AI-Specific Hazards

Your risk management plan (ISO 14971) must address AI-specific risks that traditional devices don’t have:

Data drift: the characteristics of real-world data diverge from training data
Model drift: performance degrades as the population changes
Algorithmic bias: the model performs differently for different demographic groups
Adversarial examples: carefully crafted inputs that fool the model
Concept drift: the clinical meaning of features or labels changes over time
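Data drift can be watched without ground-truth labels by comparing the distribution of an input feature in the field against the training set. One common statistic for this is the Population Stability Index (PSI). The sketch below assumes you have already binned a feature (patient age, say) into the same bins for both datasets; the function name and the small floor used to avoid dividing by zero are illustrative choices.

```python
import math


def population_stability_index(reference_counts, current_counts, floor=1e-6):
    """PSI between two histograms that share the same bin edges."""
    if len(reference_counts) != len(current_counts):
        raise ValueError("Histograms must have the same number of bins")
    ref_total = sum(reference_counts)
    cur_total = sum(current_counts)
    psi = 0.0
    for ref, cur in zip(reference_counts, current_counts):
        # Floor the shares so an empty bin does not produce log(0).
        ref_share = max(ref / ref_total, floor)
        cur_share = max(cur / cur_total, floor)
        psi += (cur_share - ref_share) * math.log(cur_share / ref_share)
    return psi
```

Identical distributions give a PSI of 0, and the value grows as they diverge. The threshold at which you act is a judgment call to justify in your monitoring procedure, not something this statistic decides for you.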

Building Your DHF from Day One

The product sold by your humble hosts, Datamint, can be used to create all this documentation more or less effortlessly while also providing the tools you need for your research, like segmentation and model training. But you don’t need software for this; you can do it manually, and for small projects that might be sufficient. Here’s an example starting point:

Set up a DHF folder structure:

DHF/
├── 1_Planning/
│   ├── Development_Plan.pdf
│   └── Risk_Management_Plan.pdf
├── 2_Requirements/
│   └── Software_Requirements_Spec.pdf
├── 3_Design/
│   ├── Architectural_Design.pdf
│   ├── Model_Architecture_Diagram.pdf
│   └── Design_Review_Minutes/
├── 4_Implementation/
│   ├── Training_Logs/
│   ├── Dataset_Records/
│   └── Model_Version_History.csv
├── 5_Verification/
│   └── Training_Verification_Report.pdf
├── 6_Validation/
│   ├── Test_Results.xlsx
│   ├── Subgroup_Analysis.xlsx
│   └── Clinical_Comparison.pdf
├── 7_Risk_Management/
│   ├── FMEA.pdf
│   └── Risk_Mitigation_Records/
└── 8_Design_Reviews/
    └── Meeting_Minutes/
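If you want the folders in that tree created the same way for every project, a few lines of Python will do it. This is only a convenience sketch: it creates the directories from the example structure and leaves the documents themselves to you.

```python
from pathlib import Path

# Directories from the example structure; files are added as work proceeds.
DHF_FOLDERS = [
    "1_Planning",
    "2_Requirements",
    "3_Design/Design_Review_Minutes",
    "4_Implementation/Training_Logs",
    "4_Implementation/Dataset_Records",
    "5_Verification",
    "6_Validation",
    "7_Risk_Management/Risk_Mitigation_Records",
    "8_Design_Reviews/Meeting_Minutes",
]


def scaffold_dhf(root: Path) -> list:
    """Create the DHF folder skeleton under root; safe to run repeatedly."""
    created = []
    for relative in DHF_FOLDERS:
        folder = root / "DHF" / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `scaffold_dhf(Path("."))` creates the skeleton in the current directory and does nothing harmful if it already exists.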

Assign ownership:

Someone on your team needs to own DHF maintenance. This person doesn’t write all the documentation, but they ensure it gets written, organized, and retained. Often, this is a quality or regulatory person.

Establish procedures:

Write down how you’ll document training runs, dataset changes, design decisions, and reviews. Make these procedures part of your QMS so everyone knows what’s expected.

Start now:

Even if you’re still in research mode, start documenting. You can segregate research work from the eventual submission-ready work, but contemporaneous documentation is irreplaceable.

Key Takeaways

The Design History File (DHF) is the central regulatory artifact. It’s evidence that you designed and built your device systematically and with oversight.
Your DHF must include data management records, model development records, verification and validation results, risk assessments, and design review records.
The most common mistake is retrospective documentation. Build your DHF contemporaneously during development, not after.
A Quality Management System (ISO 13485, 21 CFR Part 820, or equivalent) must be in place before your first regulatory submission.
For AI/ML devices, your QMS must address AI-specific needs: controlled training data, model versioning, ongoing validation, and AI-specific risk management.
Start building your DHF from day one of development. It’s less work to document continuously than to reconstruct retrospectively.

What to Read Next

6.9 Software Lifecycle Requirements: IEC 62304: the software lifecycle standard behind much of what goes into your DHF
6.8 Predetermined Change Control Plans: planning for post-clearance model updates
6.1 When Does Your AI Algorithm Become a Medical Device?: start here if you’re not sure you need a QMS yet
2.3 Data Annotation and Labeling for Clinical AI: the annotation records your DHF must contain

This article is part of the AI in Clinical Research Knowledge Base. Regulatory references: 21 CFR Part 820 (Quality System Regulation), ISO 13485 (QMS for Medical Devices), IEC 62304 (Software Lifecycle), ISO 14971 (Risk Management), Health Canada MDSAP. Consult FDA’s Design Control guidance and Design History File guidance for additional detail.