Some FDA-approved AI medical devices are not adequately evaluated, Stanford study says

Some AI-powered medical devices approved by the U.S. Food and Drug Administration (FDA) are vulnerable to data shifts and bias against underrepresented patients. That’s according to a Stanford study published in Nature Medicine last week, which found that even as AI becomes embedded in more medical devices — the FDA approved over 65 AI devices last year — the accuracy of these algorithms isn’t necessarily being rigorously studied.

Although the academic community has begun developing guidelines for AI clinical trials, there aren’t established practices for evaluating commercial algorithms. In the U.S., the FDA is responsible for approving AI-powered medical devices, and the agency regularly releases information on these devices including performance data.

The coauthors of the Stanford research created a database of FDA-approved medical AI devices and analyzed how each was tested before it gained approval. Almost all of the AI-powered devices — 126 out of 130 — approved by the FDA between January 2015 and December 2020 underwent only retrospective studies at their submission, according to the researchers. And none of the 54 approved high-risk devices were evaluated by prospective studies, meaning test data was collected before the devices were approved rather than concurrent with their deployment.
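To make the retrospective-versus-prospective tally concrete, here is a minimal sketch, not the authors’ actual pipeline, of how such a device database could be summarized. The file name and the columns "device_id", "risk_class", and "study_type" are hypothetical stand-ins for whatever schema the researchers used:

```python
# Minimal sketch (assumed schema, not the Stanford authors' code): tally
# study types in a hypothetical CSV of FDA-approved AI devices with columns
# "device_id", "risk_class", and "study_type" ("retrospective"/"prospective").
import pandas as pd

devices = pd.read_csv("fda_ai_devices.csv")  # hypothetical file name

# Retrospective vs. prospective counts across all approved devices
# (the study reports 126 of 130 were retrospective-only at submission).
print(devices["study_type"].value_counts())

# How many high-risk devices had a prospective evaluation (the study: zero).
high_risk = devices[devices["risk_class"] == "high"]
print((high_risk["study_type"] == "prospective").sum())
```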

The coauthors argue that prospective studies are necessary, particularly for AI medical devices, because in-the-field usage can deviate from the intended use. For example, most computer-aided diagnostic devices are designed to be decision-support tools rather than primary diagnostic tools. A prospective study might reveal that clinicians are misusing a device for diagnosis, leading to outcomes that differ from what would be expected.

There’s evidence to suggest that these deviations can lead to errors. Tracking by the Pennsylvania Patient Safety Authority in Harrisburg found that from January 2016 to December 2017, EHR systems were responsible for 775 problems during laboratory testing in the state, with human-computer interactions responsible for 54.7% of events and the remaining 45.3% caused by a computer. Furthermore, a draft U.S. government report issued in 2018 found that clinicians not uncommonly miss alerts — some AI-informed — ranging from minor issues about drug interactions to those that pose considerable risks.

The Stanford researchers also found a lack of patient diversity in the tests conducted on FDA-approved devices. Among the 130 devices, 93 didn’t undergo a multisite assessment, while 4 were tested at only one site and 8 at only two sites. And the reports for 59 devices didn’t mention the sample size of the studies. Of the 71 device studies that had this information, the median size was 300, and just 17 device studies considered how the algorithm might perform on different patient groups.
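For illustration, a subgroup performance check of the kind the study found in only 17 evaluations might look like the sketch below. The labels, scores, and demographic groups are synthetic placeholders, not data from any real device:

```python
# Minimal sketch of a per-subgroup performance check, using synthetic data
# in place of a real device's outputs.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 300  # the median study size reported by the Stanford authors
y_true = rng.integers(0, 2, n)          # synthetic ground-truth labels
y_score = rng.random(n)                 # synthetic model scores
group = rng.choice(["A", "B", "C"], n)  # synthetic demographic groups

# Report AUC separately for each patient group instead of one pooled number,
# so underperformance on an underrepresented group isn't averaged away.
for g in np.unique(group):
    mask = group == g
    print(g, round(roc_auc_score(y_true[mask], y_score[mask]), 3))
```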

Partly due to a reticence to release code, datasets, and techniques, much of the data used today to train AI algorithms for diagnosing diseases might perpetuate inequalities, previous studies have shown. A team of U.K. scientists found that almost all eye disease datasets come from patients in North America, Europe, and China, meaning eye disease-diagnosing algorithms are less certain to work well for racial groups from underrepresented countries. In another study, researchers from the University of Toronto, the Vector Institute, and MIT showed that widely used chest X-ray datasets encode racial, gender, and socioeconomic bias.

Beyond basic dataset challenges, models lacking sufficient peer review can encounter unforeseen roadblocks when deployed in the real world. Scientists at Harvard found that algorithms trained to recognize and classify CT scans could become biased toward scan formats from certain CT machine manufacturers. Meanwhile, a Google-published whitepaper revealed challenges in implementing an eye disease-predicting system in Thailand hospitals, including issues with scan accuracy. And studies conducted by companies like Babylon Health, a well-funded telemedicine startup that claims to be able to triage a range of diseases from text messages, have been repeatedly called into question.

The coauthors of the Stanford study argue that information about the number of sites in an evaluation must be “consistently reported” in order for clinicians, researchers, and patients to make informed judgments about the reliability of a given AI medical device. Multisite evaluations are important for understanding algorithmic bias and reliability, they say, and can help in accounting for variations in equipment, technician standards, image storage formats, demographic makeup, and disease prevalence.
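A hedged sketch of what such a multisite report could look like follows, with hypothetical per-site data standing in for real clinical evaluations; per-site numbers expose shifts in equipment, technician standards, or demographics that a single pooled metric would hide:

```python
# Minimal sketch of a multisite evaluation report. The sites, labels, and
# scores are synthetic placeholders, not real clinical data.
import numpy as np

def sens_spec(y_true, y_score, threshold=0.5):
    """Sensitivity and specificity at a fixed operating threshold."""
    pred = y_score >= threshold
    tp = np.sum(pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    sens = tp / max(np.sum(y_true == 1), 1)
    spec = tn / max(np.sum(y_true == 0), 1)
    return round(sens, 3), round(spec, 3)

rng = np.random.default_rng(1)
sites = {name: (rng.integers(0, 2, 200), rng.random(200))
         for name in ["site_1", "site_2", "site_3"]}

# Report each clinical site separately rather than pooling all patients.
for site, (labels, scores) in sites.items():
    print(site, sens_spec(labels, scores))
```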

“Evaluating the performance of AI devices in multiple clinical sites is important for ensuring that the algorithms perform well across representative populations,” the coauthors wrote. “Encouraging prospective studies with comparison to standard of care reduces the risk of harmful overfitting and more accurately captures true clinical outcomes. Postmarket surveillance of AI devices is also needed for understanding and measurement of unintended outcomes and biases that are not detected in prospective, multicenter trial.”


Source: https://venturebeat.com/2021/04/12/some-fda-approved-ai-medical-devices-are-not-adequately-evaluated-stanford-study-says/
