The Shaming of Watson

IBM’s AI tool for healthcare is solving problems, and getting better every year. So why is everyone acting like it’s a failure?

--

From Hero to Has-Been in Just 4 Years

If you’re at all interested in technology and healthcare, by now you’ve probably heard about IBM Watson, the artificial intelligence technology that went from winning on Jeopardy in 2011 to being marketed to healthcare organizations for a variety of purposes.

Watson on Jeopardy (Photo: IBM)

One of the earliest implementations was at MD Anderson Cancer Center (MDA) in Houston, where Watson was to help oncologists solve a big problem: too much data. From a press release in October 2013 entitled “MD Anderson Taps IBM Watson to Power ‘Moon Shots’ Mission”:

MD Anderson has accumulated an unprecedented breadth and depth of clinical oncology data and knowledge… Watson’s cognitive capability has been shown to be a powerful tool to extract valuable insights from such complex data and MD Anderson’s Oncology Expert Advisor capability can generate a more comprehensive profile of each cancer patient… MD Anderson’s Oncology Expert Advisor can provide evidence-based treatment and management options that are personalized to that patient, to aid the physician’s treatment and care decisions.

Pretty ambitious. Fast forward just 4 years to 2017, though, and the picture has changed.

So in only 4 years, MD Anderson went from christening the project to . . . shutting it down completely. That’s a shockingly short period of time to even get a project running, much less to be able to evaluate whether it’s working. Makes you think that Watson must have been a complete disaster!

Well, not so much. In fact, the program was closed down for contracting irregularities, according to an audit done by the University of Texas (the parent university of MD Anderson). Contracts were made without proper signatures and approval, money earmarked for the Watson program was spent elsewhere, and on and on.

The only thing that wasn’t a problem, according to that audit: Watson.

In fact, the auditors noted that “Medical oncology staff also told us that internal pilot testing of [Watson’s work with lung cancer treatment] achieved an accuracy of prediction near 90 percent, but advised that significant updating is needed before [Watson] can be tested further.”

The medical staff also told the auditors that Watson was not in any way integrated with the hospital’s electronic medical record (EMR) system — not surprising, since one of the main characteristics of EMR systems nationwide is the difficulty in getting them to connect to other systems.

The point is that there is no indication that anyone on the medical staff at MD Anderson felt that Watson itself was a problem, or overhyped, or failing to perform up to expectations.

Meanwhile, Progress

As a counterpoint to the MD Anderson collaboration, one can look to IBM’s work with its more than 230 partnering healthcare organizations worldwide.

One example is Memorial Sloan Kettering Cancer Center (MSKCC) in New York City, where medical staff have been working with IBM since 2012, using the AI technology in a variety of ways.

These systems are in wide use and have been found to be highly concordant with physician recommendations in studies in Korea, Thailand, Mexico, Arkansas, North Carolina, and elsewhere. UNC provides a particularly promising example:

In a study UNC conducted with 1,000 actual patient cases to compare Watson’s genomic analysis with the analysis of the center’s tumor board, the investigators found that Watson identified the same potential therapies as the tumor board 99% of the time. But what was more extraordinary, in about 300 patients, Watson found clinically actionable information that the tumor board had not identified.

For a variety of reasons, the systems recommending treatments are unlikely to achieve full concordance, particularly in international settings: the systems are trained on US data, for example, and US treatment protocols can differ significantly from those in other countries. Even so, the results are undeniably promising.
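To make numbers like “99% concordance” and “300 patients with additional findings” concrete, here is a minimal sketch of how such figures might be tallied from per-patient recommendation sets. This is purely illustrative: the function, variable names, and data below are hypothetical and are not drawn from the UNC study or from Watson itself.

```python
# Illustrative sketch only: a toy concordance calculation with made-up data,
# not code from the UNC study or from the Watson system.

def concordance_summary(ai_recs, board_recs):
    """Compare per-patient therapy recommendations from an AI system and a tumor board.

    Each argument maps a patient ID to the set of therapies recommended for that patient.
    Returns (share of patients where the AI covered the board's options,
             number of patients where the AI surfaced options the board had not listed).
    """
    matched = 0
    extra_findings = 0
    for patient_id, board in board_recs.items():
        ai = ai_recs.get(patient_id, set())
        if board <= ai:        # AI covered everything the board recommended
            matched += 1
        if ai - board:         # AI surfaced therapies the board did not list
            extra_findings += 1
    return matched / len(board_recs), extra_findings

# Toy usage with hypothetical data for three patients
ai    = {"p1": {"cisplatin"}, "p2": {"erlotinib", "clinical-trial-X"}, "p3": {"pembrolizumab"}}
board = {"p1": {"cisplatin"}, "p2": {"erlotinib"}, "p3": {"pembrolizumab"}}
rate, extra = concordance_summary(ai, board)
print(f"concordance: {rate:.0%}, patients with additional AI-identified options: {extra}")
```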

Still, the narrative has shifted from favorable to failure.

Online articles mentioning “IBM”, “Watson”, “Health”, and “Fail” (or “Failure”)

Watson is Bad

Leading the drumbeat of bad news on Watson has been STAT News, an online journal “about life sciences and the fast-moving business of making medicines”. In 2017 and 2018, it published a series of unflattering articles about Watson, with the most damning (“IBM pitched its Watson supercomputer as a revolution in cancer care. It’s nowhere close”) coming out in September 2017.

Some of the criticisms strike me as frankly silly. For example, STAT notes that “the actual capabilities of Watson for Oncology are not well-understood by the public…”, but I’m not quite sure why the public would be expected to have any in-depth understanding of an oncology data system.

STAT also says that Watson “is still struggling with the basic step of learning about different forms of cancer,” which should surprise no one. Cancer AI isn’t like self-driving cars — where at some point the systems may be good enough that the AI won’t need further training, because the system will know everything it needs to know. In medicine, and particularly in oncology, we do not know — and do not expect to ever know — everything we need to know.

Like they say, it’s a journey, not a destination.

Even When It’s Good

But the most “underwhelming” aspect of Watson, per the STAT authors, was that it agreed with the doctors’ treatment ideas:

On a recent morning, the results for a 73-year-old lung cancer patient were underwhelming: Watson recommended a chemotherapy regimen the oncologists had already flagged… [One of the oncologists] said later that the background information Watson provided, including medical journal articles, was helpful, giving him more confidence that using a specific chemotherapy was a sound idea. But the system did not directly help him make that decision, nor did it tell him anything he didn’t already know.

So we’re supposed to be disappointed because a computer sitting on a desk provided a treatment recommendation for a particular patient, taking into account that patient’s history, labs, type of cancer, etc . . . and it was the same as the one picked by the medical specialist who had trained for more than a decade to do the same thing?

Yes, says STAT: “… showing that Watson agrees with the doctors proves only that it is competent in applying existing methods of care, not that it can improve them.” Ho-hum.

Don’t Believe the Hype

It seems to me that the one truly valid criticism of the Watson system is that IBM hyped it relentlessly (a process you might have cottoned on to once you noticed them hawking Watson on Jeopardy). Guilty as charged: IBM certainly has worked to build expectations. But looking past the hype, there is a there there: Watson for Oncology is a widely used system that most of its user-doctors seem to find useful, with no evidence at all of widespread opposition or objection in that same population of providers.

Does it need more refinement, and more data, and especially more clinical validation and more peer-reviewed reporting in the medical literature? Yes, yes, yes, and yes. But let’s not overlook the fact that even today Watson is an electronic system that can more often than not look at the patient data and give us the same treatment recommendations as a highly trained oncologist with years of experience, which is—make no mistake—a goddamn miracle of technology.

This story was originally published at FutureHealth.

--

TED/Davos/keynote speaker, Lemelson/WSJ winner, Magpi co-founder, Ebola doc, Airbnb’er. AI, big data, digital health, global health.