Software Lifecycle Requirements: IEC 62304
The international standard for medical device software development. Safety classifications, required documentation, and what this means for how your engineering team must work.
IEC 62304 is a published international standard documenting the software development lifecycle specifically for medical device software. While it’s not legally mandatory in most jurisdictions, FDA expects compliance with it. Health Canada, the EU, and most other regulators either directly require it or use it as a reference standard.
This standard applies to any software embedded in or supporting a medical device, not just AI models. But for machine learning models, it creates some unique challenges because developing an AI model isn’t the same as developing traditional software. Applying IEC 62304 to AI development means thinking about documentation way earlier than you might think, during the actual research process.
What IEC 62304 Actually Is
IEC 62304 (Medical device software: Software life cycle processes) is published by the International Electrotechnical Commission. FDA recognizes it as a consensus standard, and it is effectively the international gold standard for medical device software development. Experienced software people will immediately notice the strong smell of 1970s-style waterfall development. While you are free to make your own judgement about the value of that style of development, you won’t change the FDA’s mind.
Legally speaking you don’t need to follow this standard in your software practices, unless your jurisdiction explicitly requires it. But if you’re seeking FDA approval for a software-based medical device, FDA reviewers will ask questions that assume adherence to the standard: “Where’s your design history file? Where are your verification and validation records? How did you manage configuration?” IEC 62304 describes how to produce the answers to all of these questions in the format the FDA expects.
The standard is detailed and prescriptive. And trash-talk about waterfall processes aside, it’s also reasonable in an environment where regulators want assurances that the software sausage factory is relatively sanitary. It basically says: plan your development, document your decisions, verify that your code does what you intend, validate that it solves the clinical problem, and keep detailed records of all of this.
Software Safety Classifications
IEC 62304 categorizes software into three classes based on the risk if it fails. This determines how much documentation and rigor you need.
Class A: No Injury Possible
This class is largely administrative or clerical software. Examples are a patient data management system or a front-end for reading reports that a different system generated. If the software fails completely, the worst outcome is inconvenience or an annoyed doctor, but no harm to patients. For this class of software, requirements are minimal. You need a documented plan, a basic requirements document, a lightweight testing process, and records of your march through the process.
Class B: Non-Serious Injury Possible
Class B is where most AI algorithms will fall: diagnostic or decision-support software where an error could cause transient harm or non-serious injury. An example is a diagnostic AI that misclassifies a lesion. The patient might be referred for unnecessary follow-up (harm from delay or false alarm) or might miss early detection (delayed diagnosis). Both are concerning but not immediately life-threatening.
This kind of application requires full lifecycle documentation: requirements, design, implementation, unit testing, integration testing, system testing, and release procedures. You need traceability at multiple levels (every requirement must map to a design element, and every design element to test cases) and you need to think about links to your non-software specific documents such as your risk assessment matrix. Your design history file needs to be comprehensive.
This is a trap researchers should be aware of. Most scientists don’t set up formal documentation like this when they’re just doing research. The unfortunate result is that if FDA clearance is sought later on, a lot of the work needs to be redone from scratch with a controlled process.
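The requirement-to-design-to-test traceability described for Class B can be checked mechanically. Here is a minimal Python sketch of that idea. The IDs (SRS-001, DES-010, TEST-100) and the TraceMatrix class are made up for illustration; the standard does not prescribe any particular format, and a spreadsheet can do the same job.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TraceMatrix:
    """Links requirements -> design elements -> test cases (all IDs hypothetical)."""

    req_to_design: dict[str, set[str]] = field(default_factory=dict)
    design_to_tests: dict[str, set[str]] = field(default_factory=dict)

    def link_design(self, req_id: str, design_id: str) -> None:
        self.req_to_design.setdefault(req_id, set()).add(design_id)
        # Register the design element even if it has no tests yet.
        self.design_to_tests.setdefault(design_id, set())

    def link_test(self, design_id: str, test_id: str) -> None:
        self.design_to_tests.setdefault(design_id, set()).add(test_id)

    def gaps(self, all_requirements: list[str]) -> dict[str, list[str]]:
        """Report requirements with no design element and design elements with no test."""
        untraced = [r for r in all_requirements if not self.req_to_design.get(r)]
        untested = sorted(d for d, tests in self.design_to_tests.items() if not tests)
        return {
            "requirements_without_design": untraced,
            "design_without_tests": untested,
        }


matrix = TraceMatrix()
matrix.link_design("SRS-001", "DES-010")
matrix.link_design("SRS-002", "DES-011")
matrix.link_test("DES-010", "TEST-100")

# SRS-003 has no design element yet, and DES-011 has no test case yet.
report = matrix.gaps(["SRS-001", "SRS-002", "SRS-003"])
print(report)
```

Running a check like this on every change is one way to catch a broken trace early, rather than during a review.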
Class C: Serious Injury or Death Possible
Class C is for software where failure could directly cause serious harm or death. A real-time monitoring system that alerts to arrhythmia is Class C. An AI for autonomous robotic surgery is Class C. An incorrect output or delayed alert could kill someone. Class C requires everything Class B requires, plus additional rigor: formal requirements reviews, multiple test phases, systematic coverage analysis, and more intense design reviews.
If your AI model falls into this category you shouldn’t try to DIY your regulatory compliance. Get a full-time regulatory expert onto your team and hire an experienced consultant to help you get set up.
The Eight Core Lifecycle Processes
On to the actual development lifecycle that the standard describes. IEC 62304 specifies eight processes that span the entire development lifecycle.
1. Software Development Plan
This document is a kind of meta-plan that describes your overall software development process. IEC 62304 describes all the documentation you need, but it doesn’t prescribe the details, so there are many ways to set up your processes that still satisfy the spec. This section is one way to do it. If you include these sections, you’re covered.
- Software scope (what you’re building, what you’re not)
- Development tools and processes
- Coding standards, including any linters you use
- Testing strategy
- Configuration management approach
- Risk management process
- Documentation standards
This is all pretty relevant for traditional software, and you’ll probably have to work a bit to fit your AI training processes into this framework. For an AI model you’ll also want a few extra sections:
- How you split training and test data
- What validation approach you’ll use
- How you’re handling imbalanced classes
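To make the "how you split training and test data" item concrete, here is one possible approach, sketched in Python: assign each patient to a split by hashing the patient ID, so the split is reproducible and all images from one patient land on the same side. The function names, the salt value, and the record layout are assumptions for this example, not something the standard specifies.

```python
import hashlib


def assign_split(patient_id: str, test_fraction: float = 0.2, salt: str = "split-v1") -> str:
    """Deterministically assign a patient to 'train' or 'test'.

    The salt is a hypothetical version label: changing it produces a new split,
    so it should be recorded alongside the dataset version.
    """
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_records(records, test_fraction: float = 0.2):
    """Group records by split, keyed on patient so no patient spans both sets."""
    splits = {"train": [], "test": []}
    for rec in records:
        splits[assign_split(rec["patient_id"], test_fraction)].append(rec)
    return splits


# Made-up records: two images for P001, one each for the others.
records = [
    {"patient_id": "P001", "image": "p001_a.dcm"},
    {"patient_id": "P001", "image": "p001_b.dcm"},
    {"patient_id": "P002", "image": "p002_a.dcm"},
    {"patient_id": "P003", "image": "p003_a.dcm"},
    {"patient_id": "P004", "image": "p004_a.dcm"},
]
splits = split_records(records)
print({name: len(items) for name, items in splits.items()})
```

Because the assignment depends only on the patient ID and the salt, you can describe the whole procedure in one sentence of your development plan and reproduce it later.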
2. Software Requirements Specification (SRS)
This document answers the question: what is the software supposed to do? For an AI model, your SRS defines:
- Input specifications (image dimensions, file formats, acquisition parameters)
- Output specifications (what classes can the model predict, what confidence scores does it return)
- Performance requirements (sensitivity and specificity targets)
- Environmental requirements (does it need to run on GPU, what OS)
- Usability and safety requirements (for instance, if the model is uncertain, does it have to alert the user rather than defaulting to a classification)
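The safety requirement in the last bullet (alert the user when the model is uncertain) might translate into code along these lines. This is only a sketch: the 0.8 threshold, the function name, and the "refer_to_clinician" action are placeholders that your own SRS would define.

```python
def classify_with_abstain(probabilities: dict, min_confidence: float = 0.8) -> dict:
    """Return the top class, or abstain when confidence is below the threshold.

    `probabilities` maps class name -> model confidence. The threshold value
    is a hypothetical requirement, not a recommended number.
    """
    if not probabilities:
        raise ValueError("model returned no class probabilities")
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence < min_confidence:
        # Do not default to a classification; flag the case for a human.
        return {"label": None, "confidence": confidence, "action": "refer_to_clinician"}
    return {"label": label, "confidence": confidence, "action": "report"}


print(classify_with_abstain({"benign": 0.55, "malignant": 0.45}))
print(classify_with_abstain({"benign": 0.05, "malignant": 0.95}))
```

Writing the behavior down this explicitly also gives you something testable to trace back to the requirement.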
The SRS is written before development. All the other artifacts in the development process “trace” back to elements of the SRS.
It’s obvious to anybody who’s written a line of code that you won’t know everything at the beginning. For example, you might know that the MRIs you currently have are 512x512, but you don’t know if you’ll end up downsampling them, or if you’ll encounter MRI machines that take 1024x1024 images. That’s OK: the SRS can change over time as long as you version the document and track why you’re changing it. This is an example of the “traceability” we talk about all over this knowledge base.
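Here is a small Python sketch of what "version the document and track why you’re changing it" could look like for a single requirement, using the image-size example above. The requirement ID, dates, and class names are invented for illustration; in practice this might just as well live in a document management system.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Revision:
    version: str
    changed_on: date
    text: str
    rationale: str


@dataclass
class Requirement:
    req_id: str
    history: list[Revision] = field(default_factory=list)

    def revise(self, version: str, changed_on: date, text: str, rationale: str) -> None:
        # Refuse changes that do not say why they were made.
        if not rationale.strip():
            raise ValueError("every SRS change needs a recorded rationale")
        self.history.append(Revision(version, changed_on, text, rationale))

    @property
    def current(self) -> Revision:
        return self.history[-1]


req = Requirement("SRS-INPUT-001")  # hypothetical requirement ID
req.revise("1.0", date(2025, 1, 10),
           "Input images are 512x512 pixels.",
           "Matches the MRIs currently available.")
req.revise("1.1", date(2025, 4, 2),
           "Input images are 512x512 or 1024x1024 pixels.",
           "Newly encountered MRI machines produce 1024x1024 images.")

print(req.current.version, "-", req.current.text)
for rev in req.history:
    print(rev.version, rev.changed_on.isoformat(), rev.rationale)
```

The old text is never overwritten, so you can always show what the requirement said at the time a given model was trained.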
3. Architectural Design
The architectural design is the high-level map of your software: how you organize the code, which components are trainable (the neural network), which are fixed (preprocessing), which are auxiliary (logging, monitoring), and how all these components interact. For an AI model, this is where you document the model architecture, the training pipeline, the inference pipeline, how the model gets deployed, and how updates are delivered.
4. Detailed Design
The architectural design says “we have a neural network.” The detailed design says what that network actually looks like: number of layers, activation functions, input shape, how the loss function is defined. In old-school software, this would probably be pseudocode. For AI, it’s the layer-by-layer description of your model, plus the full training procedure. Obviously this will change over time as you experiment with the model.
5. Unit Implementation and Verification
In normal software this is the actual code repo and the unit/integration tests. In an AI context, it’s training the model and verifying that the trained artifact matches the design spec. These are your Jupyter notebooks or PyTorch code, plus output metrics such as sensitivity and specificity.
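As a worked example of checking output metrics against the spec, the sketch below computes sensitivity and specificity from binary labels and compares them to targets. The labels and the target values (0.70 and 0.90) are made up; real targets come from your SRS.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


def check_against_targets(metrics, targets):
    """Return pass/fail per metric; `targets` holds the minimum acceptable values."""
    return {name: metrics[name] >= minimum for name, minimum in targets.items()}


# Toy data: 4 positive cases and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = sensitivity_specificity(y_true, y_pred)
verdict = check_against_targets(metrics, {"sensitivity": 0.70, "specificity": 0.90})
print(metrics)
print(verdict)
```

In this toy run, sensitivity is 3/4 and specificity is 5/6, so the sensitivity target passes and the specificity target fails. Saving both the metrics and the verdict with the model version gives you a verification record.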
6. Integration and Integration Testing
If your device has multiple components (a preprocessing step, the AI model, a postprocessing step that applies clinical rules or other heuristics, a user interface, a PDF generation step for the report), you test that they work together correctly.
7. System Testing
End-to-end testing. You put a real clinical image through the whole system and verify that the output is correct. You test edge cases, error handling, and performance on diverse patient populations. For an AI model this might involve writing a demo application that takes an MRI in and spits out a clinical report, or a sample integration with an EMR system, and showing tests that the system gives expected results (i.e. results that line up with the SRS).
8. Release
A document that describes how you package, document, and deploy the software. It should describe what records you keep, what version numbers you assign, how you track what version is deployed where.
Training Data as Part of the Software Specification
In traditional software, you write code and test it. The code is the artifact: configuration files might change, but the source code is the core.
In machine learning, the analogous artifact is the training data. The training code (a Python script, Jupyter notebook, whatever) exists, but is pretty boilerplate from one project to another, whereas the training data is what differentiates one approach from all the other approaches. A model trained on 1,000 images from healthy patients will behave completely differently than the same model architecture trained on 1,000 images from a diseased population.
Applied to machine learning, IEC 62304 effectively makes training data a regulatory artifact. This means:
- Data selection criteria must be documented. You should include why you selected data from a particular location (for example, a given hospital), as well as any diversity criteria you used. For example, you may have leaned into a female-heavy dataset because you know as a clinician that women are 80% of sufferers of a particular disease.
- Annotation methodology must be documented. This is where you document who came up with the labelling protocol and why they chose the labels they did, how the annotators were chosen, what inter-rater agreement was required and how disputes were resolved, etc.
- Data preprocessing must be documented (normalization, augmentation, handling of missing values)
- Training procedure must be documented (learning rate, batch size, how many epochs, early stopping criteria)
- Data versioning must be tracked (training set version 2.3 was used for model 4.1)
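For the inter-rater agreement mentioned in the annotation bullet, one common statistic is Cohen’s kappa, which corrects raw agreement for the agreement you’d expect by chance. This article doesn’t say which statistic you should use, so treat the sketch below as one option among several; the labels are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items where the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Toy annotations: 1 = finding present, 0 = absent. The raters disagree on 2 of 10.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

Here raw agreement is 80% but chance agreement is 50%, so kappa comes out at about 0.6. Whichever statistic you pick, record the threshold you required and the value you got.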
When FDA audits your files and asks “How was model version 2.3 trained?”, you must be able to answer with specificity. What data went in? Who selected it? How was it annotated? What preprocessing was applied? What were the training hyperparameters?
It’s pretty common for all this information to go unrecorded, because most AI development is done in a research context, not a high-process satisfy-the-FDA environment. The unfortunate discovery is often that a bunch of this work has to be redone after the fact with documentation processes in place. Look for software that can help you with this (yes, this is the shameless plug section, explore the website you’re on).
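If you’re not using a dedicated tool, even a small structured record per training run goes a long way toward answering the audit questions above. The sketch below is one hypothetical layout: it fingerprints the dataset from per-file hashes and stores that next to the preprocessing steps and hyperparameters. All field names, file names, and values are illustrative, and stand-in bytes take the place of real image files.

```python
import hashlib
import json


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def manifest_digest(file_hashes: dict) -> str:
    """One fingerprint for the whole dataset, independent of file ordering."""
    canonical = json.dumps(file_hashes, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def build_run_record(model_version, dataset_version, file_hashes,
                     preprocessing, hyperparameters, selected_by, annotated_by):
    """Collect what went into a training run in one serializable record."""
    return {
        "model_version": model_version,
        "dataset_version": dataset_version,
        "dataset_digest": manifest_digest(file_hashes),
        "n_records": len(file_hashes),
        "preprocessing": preprocessing,
        "hyperparameters": hyperparameters,
        "selected_by": selected_by,
        "annotated_by": annotated_by,
    }


# Stand-in bytes; in practice you would hash the actual files on disk.
file_hashes = {
    "img_0001.dcm": sha256_bytes(b"fake pixels 1"),
    "img_0002.dcm": sha256_bytes(b"fake pixels 2"),
}

record = build_run_record(
    model_version="2.3",
    dataset_version="dataset_v2.2",
    file_hashes=file_hashes,
    preprocessing=["resize to 512x512", "intensity normalization"],
    hyperparameters={"learning_rate": 1e-4, "batch_size": 16, "epochs": 40},
    selected_by="study coordinator",
    annotated_by=["radiologist 1", "radiologist 2"],
)
print(json.dumps(record, indent=2, sort_keys=True))
```

The digest changes if any file is added, removed, or modified, so two runs with the same digest verifiably used the same data.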
Configuration Management (IEC 62304 Section 8)
Once you have a deployed model, you need to track every version of everything and be able to reconstruct the relationships.
Configuration management means:
- Every model version is tracked, with creation date and what changed
- Every training dataset version is tracked, with a detailed list of which images/records are in it. In practice this probably means keeping a manifest for each dataset version, so you can say with confidence what was in that dataset
- Every version of the code that created the model (feature extraction code, training script) is tracked
- The relationships are clear: model version 2.0 was trained on dataset version 3.2 using training code version 1.4
You want to be able to answer questions like “What data trained model 2.3?” or “What differences are there between model 1.9 and 2.0?”.
Many teams implement this with Git for code and a spreadsheet or database for datasets and models. Tools like Datamint (your humble hosts) or others like Weights & Biases can help. Tools make this easier, more efficient, and more reliable, but they’re not necessary as long as you are religious about tagging versions and recording the relationships.
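If you go the no-special-tools route, the bookkeeping can be as small as the following Python sketch: a registry that ties each model version to a dataset version and a code version, plus two lookups for the audit questions above. The 2.0 entry mirrors the example in the list; the 1.9 entry and the notes are invented.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelRelease:
    model_version: str
    dataset_version: str
    code_version: str
    notes: str


# In real use this would live in a database or a version-controlled file.
REGISTRY = {
    "1.9": ModelRelease("1.9", "3.1", "1.4", "baseline"),
    "2.0": ModelRelease("2.0", "3.2", "1.4", "retrained on expanded dataset"),
}


def lineage(model_version: str) -> ModelRelease:
    """Answer: what data and code produced this model?"""
    return REGISTRY[model_version]


def diff(old: str, new: str) -> dict:
    """Answer: what differs between two model versions?"""
    a, b = asdict(REGISTRY[old]), asdict(REGISTRY[new])
    return {
        key: (a[key], b[key])
        for key in a
        if key != "model_version" and a[key] != b[key]
    }


print(lineage("2.0"))
print(diff("1.9", "2.0"))
```

The frozen dataclass is a deliberate choice: once a release is recorded, nothing should quietly edit it. Corrections belong in a new entry.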
The Design History File
IEC 62304 requires a Design History File (DHF), which is a compilation of all the records from the lifecycle processes above. This is the actual file you’ll use as part of your submission, and it rolls up all the information we’ve described earlier in this section. For an AI model, your DHF includes:
- Design specifications and reviews
- Data management records (where data came from, how it was selected, how it was processed)
- Model training records (every training run, not just the final version)
- Verification records (did the trained model match the design spec?)
- Validation records (did the model achieve the performance requirements?)
- Risk management records (ISO 14971 assessment)
- Change control records (what changed between versions and why)
The DHF provides the evidence that you built the device according to plan and that it does what you say it does. FDA reviewers spend hours in DHF records. How complete and thorough your DHF is will directly influence the number of questions you get from reviewers.
Because most teams start thinking about regulation late in the process, they often build the DHF retrospectively after development is done. A reviewer can usually tell! Retrospective records are vague (“we selected images that were representative of our target population”), while contemporaneous records are specific (“on 2025-03-15, we curated dataset_v2.2 by including all consecutive images from Hospital A from 2024-Q4 that met inclusion criteria X, Y, Z; exclusion criteria listed separately; 347 images selected; annotation completed by Dr. Smith and Dr. Jones with 94% agreement on presence of target finding”).
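One low-tech way to get contemporaneous records is an append-only log that timestamps each entry at the moment it’s written. The sketch below uses a JSON Lines file and reuses the details from the example record above; the file name, event names, and field names are assumptions, and a temporary directory keeps the example self-contained.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_record(log_path, event, details, now=None):
    """Append one timestamped entry; existing lines are never rewritten."""
    entry = {
        "recorded_at": (now or datetime.now(timezone.utc)).isoformat(),
        "event": event,
        "details": details,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def read_records(log_path):
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "dhf_log.jsonl"  # hypothetical file name
    append_record(log_file, "dataset_curated", {
        "dataset_version": "dataset_v2.2",
        "source": "Hospital A, consecutive images, 2024-Q4",
        "n_images": 347,
    })
    append_record(log_file, "annotation_completed", {
        "dataset_version": "dataset_v2.2",
        "annotators": ["Dr. Smith", "Dr. Jones"],
        "agreement": 0.94,
    })
    entries = read_records(log_file)

for e in entries:
    print(e["recorded_at"], e["event"])
```

A plain file like this won’t prove on its own that entries weren’t edited later, so in practice you’d keep it under version control or in a system with an audit trail.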
Practical Integration with Your Engineering Workflow
IEC 62304 is a big deal if you think you might want to eventually commercialize the AI model you’re working on. It’s difficult to construct the documentation post-hoc, and many companies have discovered that they need to re-do large swaths of work if they are to satisfy FDA regulatory requirements. It is worth incorporating at least some process overhead even if it means annoying your PhD students.
Here’s how it translates to your actual engineering:
- Before you start, write down what you’re building and how (development plan)
- Before you build, write down what it should do (requirements spec)
- As you build, document your design decisions with rationale
- As you train, log your training runs, datasets, and hyperparameters
- As you test, document what you tested and what the results were
- Before you deploy, write a release plan
- After you deploy, keep monitoring records
Finally, I’ve tried to make this entire knowledge base as agnostic to tools as possible, but Datamint (my company) has built its entire product around incorporating this recordkeeping as a side effect of doing normal research. Our goal is to make production of these artifacts and proof points as painless and invisible as possible. Have a look.
Key Takeaways
- IEC 62304 is the international standard for medical device software development. FDA expects compliance even though it’s not legally mandatory in the US.
- Software is classified A, B, or C based on risk. Most diagnostic AI is Class B or C, requiring comprehensive lifecycle documentation.
- The eight core processes are: planning, requirements, architectural design, detailed design, unit implementation/verification, integration, system testing, and release.
- Training data is treated as part of the software specification. Data selection, annotation, preprocessing, and training procedure must all be documented.
- Configuration management requires precise tracking of model versions, dataset versions, and code versions, with clear relationships between them.
- The Design History File (DHF) is the central regulatory artifact. It should be built contemporaneously during development, not reconstructed afterward.
- Tools exist to make this whole process much easier, more reliable, and lower effort. They may be worth investing in if you hope for FDA clearance at some point.
What to Read Next
- 6.10 Design History File and QMS Essentials: the documentation IEC 62304 requires you to produce
- 6.8 Predetermined Change Control Plans: how planned model changes interact with the software lifecycle
- 3.7 Tools and Infrastructure: the practical tooling for version control and experiment tracking
- 2.3 Data Annotation and Labeling for Clinical AI: annotation protocols are regulatory documents under IEC 62304
This article is part of the AI in Clinical Research Knowledge Base. For the full standard, consult IEC 62304:2015 or the 2023 amendment. FDA’s Software Validation guidance and Health Canada’s 2025 MLMD guidance both reference IEC 62304 extensively.