Data Maven: An Interview with Caroline Chung, MD

Abstract

Introduction

Caroline Chung, MD, is a distinguished clinician scientist, currently Vice President and Chief Data & Analytics Officer and Director of Data Science Development and Implementation of the Institute for Data Science in Oncology at the MD Anderson Cancer Center in Houston, Texas. She is also a professor in Radiation Oncology and Diagnostic Imaging with a computational imaging lab. Dr. Chung was educated in Canada at the University of British Columbia in Vancouver and spent eight years at the Princess Margaret Cancer Centre in Toronto before moving to Texas in 2016.

In this interview with EIC Doug Flora, which was recorded for the journal’s second annual summit, The State of AI in Precision Medicine,1 Dr. Chung discusses a number of key issues including the impact of artificial intelligence (AI), data security, mentorship, and more.

This interview has been lightly edited for length and clarity.

Douglas Flora: Caroline, how did you first get into the field of AI in oncology?

Caroline Chung: I went into radiation oncology because I’ve always enjoyed physics and math alongside medicine. As I went through residency, I started to think through the molecular horizon of oncology. How could we start to pursue true precision medicine? I started to think, if we can’t measure what’s happening with a tumor quantitatively, how are we going to get to precision medicine? It starts with precision measurement. This is where my pursuit around quantitative imaging began. I started this journey working in the preclinical space studying brain tumors. When you work with the preclinical MR [magnetic resonance], you basically need to sit alongside an imaging physicist, so I learned all about MR pulse sequence programming and how to analyze the images myself by coding in MATLAB at the time. So that’s how I dove into the deep end.

I then worked to translate back some of the discoveries in terms of quantitative imaging biomarkers into the clinical space by integrating these biomarkers into the design of a phase 1 clinical trial. As I started to explore the use of quantitative imaging biomarkers in the clinical space, I realized just how heterogeneous the data was, even when looking at just the imaging data. Then you start looking at all the other data in the [electronic health record] and in the healthcare system, and you realize, there’s a lot of work to be done here if we’re really going to leverage the technology that is emerging on the horizon. And here we are. Everyone’s very enthusiastic about AI, and we have not necessarily tackled all the data problems.

Flora: Many of us look to you and your team’s leadership around data, and here you are, Chief Data & Analytics Officer for the world’s leading cancer center. Where do you start in terms of data governance and data security in Houston?

Chung: Ironically, the data conversation does not necessarily start with the data. It starts with the people because the people are the ones generating the data, directly and indirectly. We started by acknowledging that everybody across the organization is a data steward. For example, whether you’re the person greeting the patient, entering their name, and confirming their address and postal code, or providing care for the patient, or helping take a medical image, we reminded them: you are contributing to cancer research and regulatory activities and continued improvement in care because we use those demographics in our research, in our tumor registry reports, and in our operations.

Building out this organizational culture that allows everyone to take pride in being a data steward across the board starts to raise the awareness and importance of protecting data like protected health information and ensuring high data quality. We have started to have people from across the organization asking us clarifying questions so that we can all collectively serve as good data stewards. I think that conversation is a huge step in itself.

The second important piece is that with everyone being a data steward, it provides a level of comfort and trust that our patients can feel. Everyone who is going to be touching my data along the way has my best interest in mind.

Flora: MD Anderson has worked with a number of partners requiring careful supervision around data access and contracts with outside partners. What have you learned that might help our readers as we build our own partnerships moving forward?

Chung: We’re starting to orient companies on how to better engage with healthcare organizations in this new era, and it’s an evolving journey for that relationship. Each relationship is going to be unique. At the same time, we’re no longer in an era where we are obliged to “throw data over the fence.” Now that we have the technology to equip access to data with better controls, we want to provide access for the intended use of that data and not anything more, again to serve as strong data stewards.

Flora: What have you rolled out in terms of AI assisting or augmenting the radiologists in your organization to help detect cancers?

Chung: We’ve been looking to adopt tools that show promise in helping us and can bring value. Although there has been a growing number of US Food and Drug Administration (FDA)-approved algorithms, just because an AI model is FDA approved does not necessarily mean that tool works in your institution with your data.

I’m a car fan so let me use a car analogy. If you take a Ferrari, you need to put pristine gas into that car or it’s not going to run very well. In contrast, other cars may run on lower quality fuel without issue. What kind of fuel do you have? What kind of quality images do you have? How consistent are your images? Are you getting all sorts of images with different quality from different institutions? And you’re planning to be running this tool with all of that kind of data? All these considerations need to be put in place as you evaluate the kind of model you may consider wanting to implement in your institution. It’s not ultimately one tool that rules them all. It may be a collection of tools that complement each other. All of those pieces need to be put in place as you evaluate whether you’re going to move forward. And these are just the AI models that are already commercially available, let alone models people may be developing themselves.

The second piece that needs to be considered is, even if it worked well on day one and you’re getting 99.9% accuracy, it may not stay that way long term. So you have to have mechanisms to monitor the performance of these models long term. We have a director whose sole responsibility is model implementation and lifecycle management. That includes engaging with the team, to say, what is that readiness?

Having a human in the loop is a great concept. But having a well-trained human in the loop is critical to achieving the ultimate goal of responsible oversight. Thinking through all of those pieces of implementation is the critical yet challenging aspect; the technical deployment of a model is not necessarily a hard task. But if you plug it in and it spews out results, how will you know whether the results are reliable or useful? For this reason, a true implementation process requires a lot more due diligence.

Flora: Let’s discuss a concept you just mentioned, monitoring model performance after implementation.

Chung: Whether the models may evolve over time or not, our data will likely change over time. For example, if we’re talking about imaging data, you may get a scanner upgrade or a software upgrade that results in the images being processed in a different way. And so feeding different data into the same model may start to produce different kinds of results, resulting in inferior model performance compared to when the model was initially evaluated. Now that’s considering a model that’s not changing. There are models that will continue to evolve over time because of the interactions, and that is a much bigger challenge. We have not solved how we’re going to manage this or who takes responsibility around the performance of the model over time. But by continually monitoring and practicing verification, validation and uncertainty quantification, we can detect concerning changes over time and respond.
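
The kind of continual monitoring described here can be sketched in a few lines. This is only an illustrative sketch, not MD Anderson's process: the class name `PerformanceMonitor`, the rolling window size, and the tolerance are hypothetical choices, and it assumes that confirmed outcomes eventually become available for comparison with the model's predictions.

```python
from collections import deque


class PerformanceMonitor:
    """Track rolling accuracy of a deployed model against its baseline.

    Hypothetical sketch: baseline_accuracy is the accuracy measured when
    the model was first evaluated; tolerance is how far below that
    baseline the rolling accuracy may fall before a review is flagged.
    """

    def __init__(self, baseline_accuracy, tolerance=0.05, window=100):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, confirmed_label):
        """Store whether one prediction matched the confirmed outcome."""
        self.outcomes.append(prediction == confirmed_label)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """Flag a review once a full window falls below baseline - tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance
```

A scanner or software upgrade that degrades results would show up here as a falling rolling accuracy, prompting human review rather than any automatic action.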

Verification is making sure the model is doing what you think it’s doing just from a software perspective, validation is making sure that the outputs are making sense and that the precision is still there, and then the uncertainty quantification is measuring the uncertainty in the quantity of interest being produced by the model. Many of the models present results in a binary way, for instance, “Is or isn’t there cancer present?” But there’s a probability behind that, and perhaps by presenting the uncertainty around the model output, we can help inform decisions better. If the error bars are big versus small, your confidence in the model output could be very different and this would affect your decision making.
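
To illustrate the difference between a binary output and one that carries error bars, here is a minimal sketch. It assumes, hypothetically, that several models (or repeated runs) each return a probability for the same case; the function name `summarize_prediction` and the use of the mean plus or minus two standard deviations as a rough error bar are illustrative assumptions, not a calibrated interval and not any specific vendor's method.

```python
import statistics


def summarize_prediction(probabilities, threshold=0.5):
    """Summarize several probability estimates for one case.

    Returns the binary call alongside a rough error bar (mean +/- 2
    standard deviations, clipped to [0, 1]) so a reader can see how
    much the estimates disagree.
    """
    mean_p = statistics.mean(probabilities)
    spread = statistics.stdev(probabilities) if len(probabilities) > 1 else 0.0
    low = max(0.0, mean_p - 2 * spread)
    high = min(1.0, mean_p + 2 * spread)
    return {
        "binary_call": mean_p >= threshold,
        "mean_probability": mean_p,
        "interval": (low, high),
        # True when the error bar crosses the decision threshold.
        "straddles_threshold": low < threshold < high,
    }


confident = summarize_prediction([0.90, 0.92, 0.88, 0.91])
uncertain = summarize_prediction([0.35, 0.80, 0.55, 0.70])
```

Both cases produce the same binary call, but only the second has an interval that crosses the threshold, which is the situation where showing the uncertainty could change the decision.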

We make these kinds of decisions based on weather reports regularly. If a hurricane is projected to be coming your way, do you hunker down or go? It may depend on the projected likelihood of it affecting your particular neighborhood and the expected severity. I learned all about this when I moved to Houston! When it comes to daily weather reports, the optimist will probably not carry the umbrella even with a high chance of rain, whereas the person who wants to be ultra prepared will have an umbrella with them even with a very low chance of rain.

Flora: Let’s talk about the possibilities of bias in these data sets and what that can mean for the things that you say are pushed out.

Chung: The efforts being made today are scratching at the surface of the potential biases in these data sets. For instance, addressing the demographic representation within a population of patients in an AI model training set is one aspect of addressing bias but does not address many other aspects of bias in the data and resulting models.

Because beyond the race and ethnicity of the patient, we need to consider where are they geolocated? What kinds of technology do they have available to them? Do they have digital access to care? Are there differences in medical practice where they received care? Because medical practice is not uniform across all clinics, all centers. Building models utilizing the available data without considering these biases runs the risk of reinforcing the recommendation that reflects biases in care that were captured in the data, a self-perpetuating cycle, rather than recommending the treatment that may result in the best outcome for that patient.

This is where the interdisciplinary nature is so critical. Data scientists can’t just be thrown over the data and be asked to build a model. They have to start to work hand in hand with the clinicians who are raising the question because these are the nuances that they need to appreciate, and they can’t just take everything at face value. But the current default is that data scientists receive data and are asked to build a model, so the data scientists take all the data at face value. You’re going to find associations, but those associations may not make sense clinically and many associations may have embedded biases unless you have a collective conversation around the data, the clinical environment and the clinical question at hand. We need to work together to ask questions like: How do we address this problem of interest and how can we start to build in the right pieces to elevate the care for people who may not have been receiving that care? How do we make sure that people do get opportunities and access to things that were not necessarily in the data that was being fed into the algorithm?

Flora: How do we foster those collaborations? You’re in a place with 22,000 cancer fighters. 80% of the patients probably are going to be cared for in a community setting like mine. Where do you start with that? And who should be at that table?

Chung: I would say first, patients. I think patients will provide the most unique perspectives around their journey and their experience. And I’m sure even all of us as doctors, when we’re sitting on the other side of the table as patients, also have our unique journeys, despite the fact that we’re well connected. If I’m having this much difficulty as a medical expert who has people I can call and ask for help, how does the average patient navigate through the system? So thinking that through and making sure that the patient is at the table.

Beyond patients, I would certainly say clinicians need to be at the table along with all people who are generating data from different layers within the organization. And this is how we at MD Anderson started to build up a collective group during the pandemic to build our data management system, called Context Engine, because we recognized just how important and useful context is: both the context of data generation (where was it generated, and why, and how?) and the context of data use, and making sure that those two things are matched up. And we have people from all different disciplines (clinical, research, data, IT) at one table to foster the valuable cross conversation.

That’s also the motivation of our Institute for Data Science in Oncology: to drive to maximal impact in cancer leveraging data science. Obviously, this means pursuing the blue skies goals of how we can apply data science to oncology, but also the very practical pieces of translating tools into operations and workflows. There are many articles that are being published around the art of the possible, but what fraction of those can we translate into clinic? And can we be more strategic about what we’re pursuing in the blue skies so that we can translate faster and really learn from real world impact as we continue to push forward and upward?

Flora: Industry is a couple of years ahead of us and they’re moving more quickly to keep pace with these technologies than some of our larger academic institutions, because there’s less bureaucracy there. And frankly, there’s more money to play. I want to talk about the community around you and me. I’m sure you’re approached by industry partners looking to build relationships that can help patients. How do you evaluate that process in your own research and your own center?

Chung: One of the first things we start with is, are we aligned in our goals in our engagement and with how we want to approach our data and our data governance? Going back to the data governance question, if the idea of the interaction is that they want to buy a bunch of our data, we do not sell our data, so that is certainly misaligned. In contrast, if there is a specific shared goal in mind, we can work collaboratively toward this goal, leveraging the data, resources and talents collectively in a way that works for both of us.

That’s one of the bigger pieces that we really think about as we embark on the relationship. It has certainly been a learning journey: how to mature that new wave of data collaborations to achieve the greatest impact while respecting the governance needs. We can move past some of the traditional mechanisms of “sharing data” by just pulling all the data into one giant repository where everything is de-identified. With this traditional approach, you lose a lot of the context because the de-identification process requires it, and while these large repositories can be useful for some kinds of data science research and development, the depth of some of the questions that we want to ask as clinicians and scientists cannot be answered with these kinds of data sets. We have to think of new ways for us to collaborate around data with context to allow us to ask deeper and more complex questions. For instance, there are many drugs that have been around for a long time and some may say you can only do so much with these drugs. But is that really true? Can we deliver them in different ways where the outcomes are very different for specific patients, improving response while minimizing toxicity? These are the kinds of questions we could start to interrogate if we had more context around our data.

Flora: I’m interested in the applications of these things to minimize burdens on physicians. You are in a field that is high tech. We’re starting to see the advent of some of these tools helping you as a radiation oncologist as well, not just a data scientist. As we start to talk about augmenting you with things like adaptive radiotherapy, where are you seeing this today? And maybe two years from now, as a radiation oncologist who’s got to spend 2–3 days on a treatment plan or longer?

Chung: There are a number of different companies as well as academic sites that have built out automated contouring tools and automated treatment planning systems to assist radiation oncologists. Laurence Court at MD Anderson has developed what we call the Radiation Planning Assistant to help Third World countries that often lack the experts needed to leverage the full power of the devices they have managed to acquire at their clinics. The Radiation Planning Assistant allows these clinics to upload the CT simulation images of the patient; auto segmentation is performed and a radiation treatment plan is generated for review by the local physician. This can fill the gaps in people on the ground, such as dosimetrists, who may not be available to generate these plans. In this way, such technology can help us start to bridge the gap in terms of what kinds of care people can get in all parts of the world.

Adaptive radiotherapy goes back to the question of what impact this has and what data we are going to generate to support the value. One of the things that we can do is evaluate the need and the potential technology that can leverage a certain kind of data to motivate generation of the relevant data. This may affect what data we generate and how we generate it. In this regard, it is still unclear how AI-scribed notes will affect further development of the technology because if the model continues to generate the data and you keep feeding model-generated data into the model, this could potentially collapse the model. These are the kinds of things that we need to anticipate, consider and adapt to moving forward. And perhaps, can we pretest so we can know how best to generate data, to allow us to leverage this technology better and evolve it further?

In terms of personalization, I think that there’s a lot we can do in terms of imaging-pathology correlation. We still do not have a clear understanding of what is actually occurring biologically when we see radiological patterns and changes. As oncologists in general, we’ve been following this dogma that where the contrast stops is the margin of the tumor, yet we know from multimodal imaging that this is not true. We have to move past this. Can we leverage imaging in a more meaningful way? Can we start to interrogate it more deeply, leveraging what we have with all the multimodal data? Can we almost get that in vivo biopsy in the imaging data?

As a radiation oncologist, knowing where to target biologically active tumor is key. We need to know exactly where we’re going to target the radiation because the technology has brought us so far that we can very precisely deliver radiation. Today, it’s possible to very precisely miss the target if we do not account for the imaging interpretation uncertainty.

We’re now at that turnkey time point. If we can define the biological target correctly, we can drive dramatic impact. This is where we need to invest and consider what kind of data we need and how we are looking at that data moving forward.

Flora: Where would you start if you’re a community oncologist or a regular doctor who doesn’t live in data all day long?

Chung: This is the reason why the American Society of Clinical Oncology officially kicked off an AI Community of Practice in 2024 that I have the pleasure of co-chairing with Ravi Parikh. There’s been a growing community of people from a range of backgrounds, and one of the topics that really kept coming forward is just where to start when it comes to AI. And we’re in the process of creating webinars while leveraging the AI Community of Practice to inform people and convene the dialogue.

All are welcome to join the community and take part and contribute. One of the things that we have heard as a demand is, where’s a good place where I can start reading? I don’t want to just go wander the internet and try to take anything. What are the good courses that I could potentially take in small pockets of time? What are the good resources? And so we’re collecting all of that up and growing the community to allow for collective learning. Getting connected and starting to get informed is a great first step.

As a new emerging technology comes about, some people may be a bit scared, others may feel excited, but you also need to really appreciate the limitations so you can approach it with a balanced view and you can set up expectations where you will likely experience success. A lot of the challenges that centers have seen are that they’re asking too much of the technology or not necessarily asking the right kinds of questions. What are the capabilities of the technology today? I think in the community setting, you want to say, what tools are out there that I could leverage today? And consider what are the capabilities as well as the limitations of those tools so that you make sure that you put the right safeguards in place.

Flora: If you had unlimited resources right now in 2024 to invest in AI for oncology, where would you spend your money?

Chung: Not a hard question! I would invest it in the data itself. Working to manage and generate data more effectively is such a key investment. Current data generation and management is incredibly heterogeneous, and we need to start to work toward building a stronger foundation.

If we do that well, we will accelerate the ability to leverage the technology that’s emerging by leaps and bounds. And this is not only in medicine, it’s true across the board. Trying to curate data at the nth mile is certainly not the best place to start. We had to start at this place to pragmatically utilize what data we already had. But knowing what we know now, what should we do differently moving forward? What should we invest in to make the next five years different? Taking a first mile rather than nth mile strategy is a focus at MD Anderson: thinking through and improving the way that data is generated, how it flows and how it is managed and governed.

Technology will continue to change and evolve so data will not remain completely uniform over time. So how do you build transparency around what has changed and how? By establishing standard approaches to how data and metadata, the descriptors around the data, are generated and collected so that we can better understand and can cross calibrate over different practices today but also over time. The world doesn’t work with one currency, but it works because we know how to convert across those currencies. The world doesn’t work on one electrical system. That’s why we have adapters. But why are we able to use adapters? It’s because it is standardized… Similarly, we need to start to think about how we can standardize data management enough that we can translate across the different settings and consider the context when data is used.

Flora: Where are you most excited to see these technologies applied in terms of improving patient care? Can we make patient care more human using these tools, as ironic as that sounds?

Chung: There are many different ways that it can touch the patient. One is around convenience and ease of access. It can bring capabilities to navigate and schedule things more conveniently and efficiently, and reduce the time spent on paperwork, which is an unfortunate burden. This would allow patients and caregivers to clear their minds just to focus on their clinical care and getting better.

A second key area of opportunity is to bring the relevant, important, and accurate information in front of the physician and the patient together in that room to have the conversation that allows the clinical decisions to be made more effectively.

Tackling these major areas will allow the focus of effort to be in the human-to-human connection, which embodies the heart and art of medicine.

Flora: That’s awesome. You’ve just described the Holy Grail!

Footnotes

1. The State of AI in Precision Oncology virtual summit; December 12, 2024.