Performance Evaluation Methods for Improvements at Post-market of Artificial Intelligence/Machine Learning-based Computer-aided Detection/Diagnosis/Triage in the United States

Mitsuru Yuba, Kiyotaka Iwasaki

Abstract
Computer-aided detection (CADe), computer-aided diagnosis (CADx), and computer-aided simple triage (CAST), which incorporate artificial intelligence (AI) and machine learning (ML), are continually undergoing post-market improvement. Therefore, understanding the evaluation and approval process of improved products is important. This study aimed to conduct a comprehensive survey of AI/ML-based CAD products approved by the U.S. Food and Drug Administration (FDA) that had been improved post-market to gain insights into the efficacy and safety required for market approval. A survey of the product code database published by the FDA identified eight products that were improved post-market. The methods used to evaluate the performance of the improvements were analysed, and post-market improvements were found to be approved with retrospective data. Reader study testing (RT) or software standalone testing (SA) procedures were conducted retrospectively. Six RT procedures were conducted because of modifications to the intended use. An average of 17.3 readers (minimum 14, maximum 24) participated, and the area under the curve (AUC) was considered the primary endpoint. The addition of study learning data that did not change the intended use and changes in the analysis algorithm were evaluated by SA. The average sensitivity, specificity, and AUC were 93% (minimum 91.1%, maximum 97%), 89.6% (minimum 85.9%, maximum 96%), and 0.96 (minimum 0.96, maximum 0.97), respectively. The average interval between applications was 348 days (minimum –18, maximum 975), which showed that the improvements were implemented within approximately one year. This is the first comprehensive study on AI/ML-based CAD products that have been improved post-market to elucidate evaluation points for post-market improvements. The findings will be informative for the industry and academia in developing and improving AI/ML-based CAD.

Introduction
Computer-aided detection (CADe), computer-aided diagnosis (CADx), and computer-aided simple triage (CAST) incorporating artificial intelligence (AI) and machine learning (ML) have attracted considerable attention for improving diagnostic accuracy and making clinical practice more efficient [1,2]. However, because AI/ML-based CAD is a novel medical technology, regulatory authorities such as the U.S. Food and Drug Administration (FDA), the Ministry of Health, Labour and Welfare and the Pharmaceuticals and Medical Devices Agency (PMDA) in Japan, and the European Medicines Agency (EMA) in Europe are investigating regulatory systems suited to the features of AI.

Materials and methods
Guidelines for performance evaluation
The guidelines on ‘Digital Health’ published by the FDA were obtained from the FDA website (accessed June 20, 2022). In total, 23 guidance documents issued since 2005 [4,8–13,15–30], including draft versions, were identified.

Data sources of AI/ML-based medical devices
AI/ML-based CAD data were obtained from the FDA product code database (Fig 1). As of June 1, 2022 (the date from which devices were selected), 6749 product codes had been listed. Using search keywords such as AI, ML, and deep learning, 19 product codes were identified (eight for AI, ten for ML, and one for deep learning). Among the 19 product codes, seven duplicates were removed, and five others were excluded after screening because they did not correspond to triage, notification, detection, or diagnosis. The final seven product codes encompassed 69 devices in total. Of these, four were granted De Novo clearance and 65 were granted 510(k) clearance (none received pre-market approval). Finally, eight products that had been resubmitted under the same product name with post-market improvements were included.
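The selection flow described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts stated in the text; the function and variable names are hypothetical and are not part of the authors' method.

```python
# Tally of the product-code screening, using only counts stated in the text.
# All names here are illustrative assumptions, not the authors' tooling.

def tally_screening():
    # Product codes found per search keyword
    hits_by_keyword = {"AI": 8, "ML": 10, "deep learning": 1}
    identified = sum(hits_by_keyword.values())

    # Codes dropped during screening
    duplicates_removed = 7
    excluded_after_screening = 5
    final_codes = identified - duplicates_removed - excluded_after_screening

    # Devices under the final product codes, by regulatory pathway
    devices_by_pathway = {"De Novo": 4, "510(k)": 65, "pre-market approval": 0}
    total_devices = sum(devices_by_pathway.values())

    return {
        "identified": identified,
        "final_codes": final_codes,
        "total_devices": total_devices,
    }


if __name__ == "__main__":
    for step, count in tally_screening().items():
        print(f"{step}: {count}")
```

The tally reproduces the reported figures: 19 product codes identified, seven retained after screening, and 69 devices in total.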

Results
Furthermore, the FDA recommends the multiple-reader-multiple-case (MRMC) protocol, in which data obtained from multiple patients are read by multiple readers. However, although a study conducted under the MRMC protocol is statistically credible, the FDA does not consider the protocol applicable in situations where it is difficult to have multiple readers read a single patient’s data, for example, when conducting a prospective study.
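To make the MRMC idea concrete, the following minimal sketch assumes a fully crossed design in which every reader scores every case, computes an empirical AUC per reader, and averages across readers. The function names, data layout, and toy scores are hypothetical; this is not the FDA's or the authors' analysis, and a formal MRMC analysis would also model reader and case variability, which this sketch omits.

```python
from itertools import product
from statistics import mean


def empirical_auc(pos_scores, neg_scores):
    """Probability that a diseased case is scored higher than a
    non-diseased case, counting ties as one half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one diseased and one non-diseased case")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def reader_averaged_auc(ratings, truth):
    """ratings: {reader: {case_id: score}}; truth: {case_id: bool}.
    Returns per-reader AUCs and their unweighted mean."""
    case_ids = set(truth)
    per_reader = {}
    for reader, scores in ratings.items():
        # Fully crossed design: each reader must have read every case
        if set(scores) != case_ids:
            raise ValueError(f"{reader} did not read every case")
        pos = [scores[c] for c in case_ids if truth[c]]
        neg = [scores[c] for c in case_ids if not truth[c]]
        per_reader[reader] = empirical_auc(pos, neg)
    return per_reader, mean(per_reader.values())


if __name__ == "__main__":
    # Toy data (assumed): two readers, four cases, higher score = more suspicious
    truth = {"c1": True, "c2": True, "c3": False, "c4": False}
    ratings = {
        "reader_a": {"c1": 5, "c2": 4, "c3": 2, "c4": 1},
        "reader_b": {"c1": 4, "c2": 2, "c3": 3, "c4": 1},
    }
    per_reader, averaged = reader_averaged_auc(ratings, truth)
    print(per_reader, averaged)
```

The check that each reader has read every case reflects the constraint noted above: when a single patient's data cannot be read by multiple readers, as in many prospective settings, this design cannot be assembled.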

Discussion

This study elucidated that (1) post-market improvements are approved with retrospective data, (2) post-market improvements that do not change the intended use are approved on the basis of evaluation by SA, and (3) products are being developed for triage.

Conclusion
This is the first comprehensive study of AI/ML-based CAD products improved post-market that elucidates the evaluation points for post-market improvements.

It was revealed that (1) post-market improvements are approved with retrospective data, (2) post-market improvements that do not change the intended use are approved by evaluation using SA, and (3) products are being developed for triage. Industry, regulatory bodies, and academia should continuously discuss the implementation of regulations that exploit the characteristics of AI/ML-based medical devices. These regulations should be developed on the premise of post-market improvement to reduce the burden on healthcare professionals and ensure patient safety.

The findings of this study will help promote post-market improvement, grounded in an understanding of the regulations for AI/ML-based CAD, that ensures the efficacy, safety, and quality of these products.

Citation: Yuba M, Iwasaki K (2023) Performance evaluation methods for improvements at post-market of artificial intelligence/machine learning-based computer-aided detection/diagnosis/triage in the United States. PLOS Digit Health 2(3): e0000209. https://doi.org/10.1371/journal.pdig.0000209

Editor: Danilo Pani, University of Cagliari: Universita degli Studi Di Cagliari, ITALY

Received: September 23, 2022; Accepted: February 7, 2023; Published: March 8, 2023

Copyright: © 2023 Yuba, Iwasaki. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The authors declare that all the data included in this study are available within the paper.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.
