Some FDA-approved AI medical devices are not "adequately" evaluated, Stanford study says

Some AI-powered medical devices approved by the U.S. Food and Drug Administration (FDA) are vulnerable to data shifts and bias against underrepresented patients. That’s according to a Stanford study published in Nature Medicine last week, which found that even as AI becomes embedded in more medical devices — the FDA approved over 65 AI devices last year — the accuracy of these algorithms isn’t necessarily being rigorously studied.

Although the academic community has begun developing guidelines for AI clinical trials, there aren’t established practices for evaluating commercial algorithms. In the U.S., the FDA is responsible for approving AI-powered medical devices, and the agency regularly releases information on these devices including performance data.

The coauthors of the Stanford research created a database of FDA-approved medical AI devices and analyzed how each was tested before it gained approval. Almost all of the AI-powered devices — 126 out of 130 — approved by the FDA between January 2015 and December 2020 underwent only retrospective studies at their submission, according to the researchers. And none of the 54 approved high-risk devices were evaluated by prospective studies, meaning test data was collected before the devices were approved rather than concurrently with their deployment.

The coauthors argue that prospective studies are necessary, particularly for AI medical devices, because in-the-field usage can deviate from the intended use. For example, most computer-aided diagnostic devices are designed to be decision-support tools rather than primary diagnostic tools. A prospective study might reveal that clinicians are misusing a device for diagnosis, leading to outcomes that differ from what would be expected.

There’s evidence to suggest that these deviations can lead to errors. Tracking by the Pennsylvania Patient Safety Authority in Harrisburg found that from January 2016 to December 2017, EHR systems were responsible for 775 problems during laboratory testing in the state, with human-computer interactions responsible for 54.7% of events and the remaining 45.3% caused by a computer. Furthermore, a draft U.S. government report issued in 2018 found that clinicians not uncommonly miss alerts — some AI-informed — ranging from minor issues about drug interactions to those that pose considerable risks.

The Stanford researchers also found a lack of patient diversity in the tests conducted on FDA-approved devices. Among the 130 devices, 93 didn’t undergo a multisite assessment, while 4 were tested at only one site and 8 at only two sites. And the reports for 59 devices didn’t mention the sample size of the studies. Of the 71 device studies that had this information, the median size was 300, and just 17 device studies considered how the algorithm might perform on different patient groups.
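To make the subgroup idea concrete (the kind of analysis only 17 of the device studies reported), here is a minimal sketch of how a reviewer might compare a diagnostic model's sensitivity across patient groups. The labels, predictions, and demographic tags below are hypothetical, not data from any actual FDA submission.

```python
from collections import defaultdict

def sensitivity_by_group(y_true, y_pred, groups):
    """Per-group sensitivity (true positive rate) for a binary diagnostic model."""
    tp = defaultdict(int)  # true positives per group
    fn = defaultdict(int)  # false negatives per group
    for label, pred, group in zip(y_true, y_pred, groups):
        if label == 1:  # only positive cases count toward sensitivity
            if pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(groups) if (tp[g] + fn[g]) > 0}

# Hypothetical data: group "B" has notably lower sensitivity than group "A",
# the kind of disparity a subgroup analysis is meant to surface.
y_true = [1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(sensitivity_by_group(y_true, y_pred, groups))
# {'A': 0.666..., 'B': 0.333...}
```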

Partly due to a reticence to release code, datasets, and techniques, much of the data used today to train AI algorithms for diagnosing diseases might perpetuate inequalities, previous studies have shown. A team of U.K. scientists found that nearly all eye disease datasets come from patients in North America, Europe, and China, meaning that eye disease-diagnosing algorithms may not work well for racial groups from underrepresented countries. In another study, researchers from the University of Toronto, the Vector Institute, and MIT showed that widely used chest X-ray datasets encode racial, gender, and socioeconomic bias.

Beyond basic dataset challenges, models lacking sufficient peer review can encounter unforeseen roadblocks when deployed in the real world. Scientists at Harvard found that algorithms trained to recognize and classify CT scans could become biased toward scan formats from certain CT machine manufacturers. Meanwhile, a Google-published whitepaper revealed challenges in implementing an eye disease-predicting system in Thailand hospitals, including issues with scan accuracy. And studies conducted by companies like Babylon Health, a well-funded telemedicine startup that claims to be able to triage a range of diseases from text messages, have been repeatedly called into question.

The coauthors of the Stanford study argue that information about the number of sites in an evaluation must be “consistently reported” in order for clinicians, researchers, and patients to make informed judgments about the reliability of a given AI medical device. Multisite evaluations are important for understanding algorithmic bias and reliability, they say, and can help in accounting for variations in equipment, technician standards, image storage formats, demographic makeup, and disease prevalence.

“Evaluating the performance of AI devices in multiple clinical sites is important for ensuring that the algorithms perform well across representative populations,” the coauthors wrote. “Encouraging prospective studies with comparison to standard of care reduces the risk of harmful overfitting and more accurately captures true clinical outcomes. Postmarket surveillance of AI devices is also needed for understanding and measurement of unintended outcomes and biases that are not detected in prospective, multicenter trial.”
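The postmarket surveillance the coauthors describe amounts, in practice, to continuously comparing a deployed model's field performance against its premarket baseline. Below is a minimal sketch of that idea; the batch structure, baseline figure, and tolerance threshold are illustrative assumptions rather than anything prescribed by the study.

```python
def flag_performance_drift(batches, baseline_accuracy, tolerance=0.05):
    """Flag post-deployment batches whose accuracy falls more than
    `tolerance` below the accuracy reported in premarket testing.

    batches: iterable of (batch_id, y_true, y_pred) tuples collected in the field.
    Returns a list of (batch_id, accuracy) pairs needing review.
    """
    flagged = []
    for batch_id, y_true, y_pred in batches:
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
        accuracy = correct / len(y_true)
        if accuracy < baseline_accuracy - tolerance:
            flagged.append((batch_id, accuracy))
    return flagged

# Hypothetical usage: only the second week's results drop far enough
# below the 0.90 premarket baseline to be flagged for review.
batches = [
    ("week_1", [1, 0, 1, 1], [1, 0, 1, 1]),  # accuracy 1.00
    ("week_2", [1, 1, 0, 1], [0, 1, 0, 0]),  # accuracy 0.50
]
print(flag_performance_drift(batches, baseline_accuracy=0.90))
# [('week_2', 0.5)]
```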

Source: https://venturebeat.com/2021/04/12/some-fda-approved-ai-medical-devices-are-not-adequately-evaluated-stanford-study-says/
