Can ChatGPT Support Biomedical Research? with Catherine Gao, MD and Yuan Luo, PhD
Northwestern scientists Yuan Luo, PhD, and Catherine Gao, MD, discuss a study they conducted using the artificial intelligence chatbot ChatGPT. The results showcase the online tool's ability to produce convincing medical research abstracts. They also discuss the tool’s potential to help with writing-intensive tasks in healthcare and medical research.
“Like any kind of technology, it can be used in a positive way or in a negative way. I was just really impressed with how articulate ChatGPT was and how smooth the text was. I think there's great potential for technology like this to help relieve some of the burden of writing that scientists have, to really enable them to focus more on the science.” — Catherine Gao, MD
- Instructor of Medicine in the division of Pulmonary and Critical Care
“ChatGPT, it's already born. It cannot be ‘unborn,’ right? So banning the technology won't work. And we'll all need to learn to use it and evolve with it.” — Yuan Luo, PhD
- Director of the Center for Collaborative AI in Healthcare
- Associate Professor of Preventive Medicine (Health and Biomedical Informatics), McCormick School of Engineering and Pediatrics
- Intrigued by seeing examples of ChatGPT-generated text on social media, ranging from poems and sonnets to faulty scientific abstracts, Gao decided to explore the online tool's ability to produce convincing medical research abstracts in a formal, quantifiable way.
- The study, published as a preprint on bioRxiv, looked at how well ChatGPT can generate medical research abstracts and how the AI-written documents compare to abstracts written by humans.
- The study found that ChatGPT-generated text did not raise any plagiarism alarms, but it could often be distinguished from human-written text when run through online AI output detectors; the generated text has certain signatures these detectors can pick up.
- Also, as part of the study, members of biomedical sciences labs were recruited as blinded human reviewers and given a mix of real and AI-generated abstracts to analyze.
- They could only spot ChatGPT-generated abstracts 68% of the time, and they incorrectly flagged 14% of real abstracts as AI-generated, Gao said.
- The results can be seen as concerning: although ChatGPT generates text that is fluent and very convincing, that text may not always be accurate or true. At the same time, Gao says, the study shows how technology like this has great potential to help relieve some of the burden of writing that scientists face. With such a tool, they could spend less time writing and more time engaged in their research. They could also publish negative results of research projects, adding knowledge to their field.
- Luo believes that banning ChatGPT is not a solution because the technology is already in use, and people should learn to use it and evolve with it.
- He gives the analogy of automobiles and how generations ago some people may have been scared to drive but learned to use them safely. Laws were put in place along with other measures to improve automobile safety despite the risk that comes with driving. Similarly, Luo hopes safety measures can be put in place for ChatGPT-type technologies so they can be used as productive tools in research, business and daily life.
- Browse the latest research from Feinberg’s Institute for Artificial Intelligence in Medicine
- Read an editorial published in Cureus Journal of Medical Science “Artificial Hallucinations in ChatGPT: Implications in Scientific Writing”
- The World Association of Medical Editors’ recommendations on ChatGPT and Chatbots in relation to scholarly publications
Recorded on February 10, 2023.
Erin Spain, MS [00:00:10] This is Breakthroughs. A podcast from Northwestern University Feinberg School of Medicine. I'm Erin Spain, host of the show. A new Northwestern Medicine study is making some headline news because it shows how well the new online tool ChatGPT can write a medical research abstract. Here with details and thoughts on the future of such tools in medicine are study authors Dr. Cathy Gao, an instructor of medicine in the division of Pulmonary and Critical Care Medicine at Feinberg, and Dr. Yuan Luo, director of the Center for Collaborative AI in Healthcare in the Institute for Augmented Intelligence in Medicine at Feinberg. Welcome.
Catherine Gao, MD [00:00:56] Thank you so much for having us.
Yuan Luo, PhD [00:00:58] Thank you for having us.
Erin Spain, MS [00:00:59] So many of us have heard about ChatGPT recently; there have been numerous news articles and stories all over social media. Could both of you tell me, in your own words, what is ChatGPT? Dr. Gao, do you want to start?
Catherine Gao, MD [00:01:14] So ChatGPT is what we call a large language model. Basically, it's a very specialized model in the natural language processing realm that's trained on huge amounts of text to predict the best next token in a series of words. And with this power, it can really generate text that is very fluent and very convincing on a wide range of topics.
Yuan Luo, PhD [00:01:39] I think one of the really remarkable capabilities of ChatGPT is that, given a sentence prompt, it can write a really fluent paragraph or several paragraphs in response. These responses are quite on topic, and a lot of the time ChatGPT writes with very much the confidence of a well-respected researcher on certain topics. Whether it's correct or not is another thing, though.
Erin Spain, MS [00:02:10] Tell me about that. Technology like this has existed in different shapes and forms in recent years, but this is different and there are some exciting aspects about this and some concerning aspects.
Yuan Luo, PhD [00:02:22] One of the critiques of ChatGPT is that although it sounds really confident, a lot of the content can be inaccurate or sometimes even false. I have many colleagues who ask it about a research topic, and ChatGPT claims someone wrote this paper or that paper that is completely nonexistent. This kind of made-up content really hurts the credibility of ChatGPT. But people are using it, and they are using it to a greater extent in their lives. You can see how fast it grew: a million users in five days, right? So I think this really can be problematic.
Erin Spain, MS [00:03:03] Dr. Gao, you have an interest in AI-assisted medicine and you wanted to explore ChatGPT a little further as soon as it came out.
Catherine Gao, MD [00:03:14] You know, I think one thing that's really unique about ChatGPT, that sets it apart from previous generations of large language models, is the fact that OpenAI made it open and free for anyone to use. So it's really gone mainstream in a way that previous models haven't. I was intrigued when I started to see people using it on Twitter: hey, write an essay about this; write a sonnet about that. And I wanted to explore how it could write in my area of research, how it could write scientific abstracts. There was a lot of interest online in what the generated text would look like, and a lot of people were posting examples here and there. So what we really aimed to do was explore it in a more formal, quantifiable way.
Erin Spain, MS [00:03:57] Describe the study to me. What was it and what were the results?
Catherine Gao, MD [00:04:01] As a disclaimer, this is still a preprint study and it's under peer review right now. We looked at a couple of different things. As our control section, we took scientific abstracts from high-impact-factor journals, and then we used the title and the journal as a prompt to ChatGPT to have it generate its own version of that abstract. Then we compared those two groups. Because ChatGPT is generating this text de novo, when it's fed into plagiarism detectors (we used Plagiarism.Detector.Net, but something like Turnitin might be similar), it doesn't set off any alarms at all. The majority of the abstracts were rated 100% original, because it really is writing from scratch. However, generated text does have certain signatures that can set it apart from text a human writes, and there exist online AI output detectors you can use to see if a text has some of these features. When we fed in the original abstracts, they all scored really, really low; they looked very much like human-written text. When we put in the generated abstracts, they all scored very high; they had a lot of features that looked like they were written by an AI. And finally, we were really impressed by how good these abstracts were, and we wondered if a person might be able to tell the difference. So our study team members, who are all members of our biomedical sciences labs (people used to reading and writing science; only four people), were given a mixture of real abstracts and generated abstracts and asked if they could tell which was which. They said it was surprisingly hard at times, but they were able to pick out about two thirds of the generated abstracts as being fake. But they were really suspicious. You know, they really went in there determined to find every single generated abstract, to the point where they even misclassified 14% of real abstracts as being generated. So they weren't super good at telling them apart, which was really kind of surprising to us.
Erin Spain, MS [00:06:06] So are there hallmarks we could train ourselves to look for, the way these online tools tell the difference, or are we not able to detect those with our human brains?
Yuan Luo, PhD [00:06:17] I think this might be difficult, because from Cathy's study we found that the generated abstracts look so much like human writing that humans themselves have a hard time differentiating them. But machines can. Exactly what patterns the machine is using could be quite interesting and complex to find out. And even if we found that pattern, whether humans could use it to detect machine-generated text could still be hard. Let me put it this way. If I ask a radiologist to tell from a chest X-ray whether a patient has a certain disease, he or she might be able to do so with confidence. But if I ask him or her to tell whether the patient has this or that kind of genetic mutation from the chest X-ray, that could be a really difficult task; yet to a certain extent, a machine can still do that. I think this illustrates the level of difficulty when it comes to detecting machine-generated text using machines as tools versus relying on humans alone. And it raises some philosophical questions, too: when machine-generated text is so human-like that humans themselves cannot differentiate it, do we still want to use a machine to detect it? And what does that mean?
Erin Spain, MS [00:07:58] Is this a question that you want to investigate further in your center?
Yuan Luo, PhD [00:08:02] I think it's a quite interesting question. On the one hand, we could get into an arms race between AI tools that generate text and AI tools that detect whether text is human- or machine-generated. On the other hand, I think we really want to take a step back and think about what we want to use those tools for, right? Nowadays some advertisements and news articles are already written by natural language processing tools. Do people really care who wrote them, as long as they get the information? So I think we really need to think about what we want to use those tools for, and perhaps, for different use cases, whether we want to impose different levels of authenticity on the text or content that's generated.
Erin Spain, MS [00:08:57] There's also this question that's been out there about authorship. Some journals have said they won't accept anything that has ChatGPT as an author. What do you think?
Yuan Luo, PhD [00:09:06] Well, on the one hand, I do sympathize with a lot of the journals and their rationale for why ChatGPT shouldn't be an author. In most cases, I think we want to use it as a tool. We could potentially modify the text or use ChatGPT to augment our text, but then that gets into the question of to what degree we can use it without violating the integrity and authenticity of the text that's generated. That kind of threshold might be hard to set, because on the one hand we need to assume people are innocent until proven guilty of violating those rules. But on the other hand, do we know the exact threshold at which we want to claim that someone has used ChatGPT in a way that violates authenticity and integrity? So I think it can be hard, especially when people tweak ChatGPT's text. We need to do further research to better understand this, and also try to gauge people's perspectives from different disciplines. In addition to science, there are a lot of other disciplines that involve writing in their daily activities, right? Business, politics, law. We need to include their perspectives as well.
Erin Spain, MS [00:10:37] Dr. Gao, you have some ideas for ways this could be used for good, especially when it comes to clinical care, or to help make physicians' jobs a little bit easier. Tell me about some of the ideas you've been thinking of.
Catherine Gao, MD [00:10:50] Like any kind of technology, it can be used in a positive way or in a negative way. I was just really impressed with how articulate ChatGPT was and how smooth the text was. I think there's great potential for technology like this to help relieve some of the burden of writing that scientists have, to really enable them to focus more on the science. An interesting idea I was talking about with one of the postdocs in our lab, Thomas Stoger, is that a lot of the time people don't write up negative results, because it takes a lot of effort to write them. But if writing becomes a lot easier using technology like this, maybe science will become more balanced, for example. Another thing it could potentially help with is improving equity, especially for scientists who are publishing in a language that is not their native language; that's a step that's holding them back in disseminating their brilliant scientific ideas. Could using technology like this be helpful and really speed up science?
Erin Spain, MS [00:11:52] I want to talk about how this is being used in education as well. Many universities already have guidelines or rules in the syllabi for certain classes about how to use ChatGPT, or how not to use it. And at your center, the Center for Collaborative AI in Healthcare at Northwestern, there's also an educational component to using ChatGPT. Tell me about that and how you think it can be used in education for good.
Yuan Luo, PhD [00:12:18] I think for education purposes, at least for me, if we can use ChatGPT to enhance people's writing and enable them to write as fast as they think, that would be good. In a sense, when they write, they don't need to worry about the choice of words or sentences, or about grammar or style. They can just let their minds flow and let their ideas come through, and the rest can be taken care of by ChatGPT, at least for a draft. Then they can refine it. So that might be something attractive for education purposes, and in cases like clinical documentation, it could be quite helpful as well, as Cathy pointed out.
Erin Spain, MS [00:13:09] You know, the collaboration between the two of you: a clinician who's also a physician-scientist, and the leader of a center rooted in AI here at Northwestern. Tell me about that intentional collaboration and how we might see more of this, and more studies to come.
Yuan Luo, PhD [00:13:27] I think the idea to start the Center for Collaborative AI in Healthcare really stems from the observation that clinicians and AI scientists sometimes don't work together as well as we expect or hope. You hear AI proponents sometimes saying that AI is going to replace radiologists, that AI is going to replace pathologists. Of course, none of that is true. The pandemic also taught people a lesson: some of the hype about AI in health care didn't materialize, and that has caused suspicion, and sometimes cynicism, from clinicians toward AI scientists. For these two communities to really work together, they need to start the work from the beginning and invite each other's perspectives, so that each can add a dimension to their own and have this continued collaboration. We want to create a fertile ground so that they can cross-pollinate. Imagine physicians and AI scientists during their early careers, an analogy to children who are brought up together when young: they form a natural bond, and that bond matures and strengthens as they advance in their careers, as they grow up. That kind of rapport really would last long and be very beneficial for making AI work in health care. That is the motivation for creating this Collaborative AI in Healthcare initiative.
Catherine Gao, MD [00:15:09] I think that was really well said, and I think we share that dream of using these technologies to really augment clinician capabilities. ChatGPT is just one example, but because everyone is using it, it's bringing a lot of new people to the conversation who might not have been as involved earlier. So I think it's really great that it's gotten so many new people interested whom we can engage and collaborate with going forward.
Erin Spain, MS [00:15:35] And this is just the beginning. The version of ChatGPT that's out is still considered sort of a test version, isn't that right?
Catherine Gao, MD [00:15:43] Yeah. I'm really impressed to see how good it is already, and I think it's only going to get better. And I think there are incredible opportunities across health care as these large language models continue to improve and get applied in the clinical setting.
Erin Spain, MS [00:15:58] What would you both like to say to some folks out there? In the media, the study has even been quoted and talked about in kind of a negative way: this is scary, this is concerning. Those are the types of words that have been used quite a bit. What would you like to say to address some of those concerns?
Yuan Luo, PhD [00:16:14] One perspective is that even for something as terrible as the SARS-CoV-2 virus, we could not ban it. And ChatGPT is already born; it cannot be unborn, right? So banning the technology won't work, and we'll all need to learn to use it and evolve with it. I would also use the analogy of when the automobile was invented. Just ask our grandparents or great-grandparents: were they afraid of driving? Of course there is some risk associated with driving an automobile, but as we got used to this new tool, we learned to put in speed limits, we invented the safety belt. So I think we can take a similar attitude with ChatGPT: invent the safety belt for ChatGPT, put a speed limit on ChatGPT, and let it serve our needs, perhaps as a collaborator in our research, our daily lives and our business.
Catherine Gao, MD [00:17:27] I think that was so well said, and a great example. It's just such an interesting time to see this new technology, and to see how different people are using and exploring it, as well as engaging in this discussion across the academic and scientific community about what the boundaries are and what the optimal uses are going forward. It's a rapidly evolving discussion that's really interesting and exciting to be a part of.
Erin Spain, MS [00:17:51] Why do you think Northwestern is in a great position to take the lead on some of these studies and really use AI and ChatGPT and test those limits?
Yuan Luo, PhD [00:18:02] I think Northwestern is in a really unique position because we have very tight collaborations among faculty from the medical school, the engineering school and the arts and sciences school. To talk about the Feinberg School of Medicine in particular: we have cohorts of faculty from AI backgrounds, from engineering backgrounds, from basic science backgrounds, and, more importantly, we have a lot of physicians who are interested in these techniques. They come to the engineering and science faculty all the time, wanting to use those tools to change how they practice medicine and how they could improve patient care. I think this is really a powerhouse for this kind of development, leveraging emerging techniques to inform basic science and accelerate translation. That, from my view, is what's unique about Northwestern. On the other hand, Northwestern is also a welcoming place for people from all different kinds of backgrounds. When I joined Northwestern, I got collaboration invitations extended from people from all kinds of backgrounds, and that really impressed me. I think that continues to be a unique feature of the university and the medical school.
Catherine Gao, MD [00:19:34] I can say, as a pulmonary and critical care fellow here, it's just been an amazing place to train and learn and meet all different people really at the forefront, the cutting edge, of new technologies. It's been a fantastic place, and I'm really excited to see future developments going forward.
Erin Spain, MS [00:19:53] Thank you both so much for coming on the show and explaining the study and all the exciting things happening at Northwestern in the area of AI and medicine. Thanks for listening, and be sure to subscribe to this show on Apple Podcasts or wherever you listen to podcasts, and rate and review us. Also, for medical professionals, this episode of Breakthroughs is available for CME credit. Go to our website, Feinberg.Northwestern.edu, and search CME.
Continuing Medical Education Credit
Physicians who listen to this podcast may claim continuing medical education credit after listening to an episode of this program.
Academic/Research, Multiple specialties
At the conclusion of this activity, participants will be able to:
- Identify the research interests and initiatives of Feinberg faculty.
- Discuss new updates in clinical and translational research.
The Northwestern University Feinberg School of Medicine is accredited by the Accreditation Council for Continuing Medical Education (ACCME) to provide continuing medical education for physicians.
Credit Designation Statement
The Northwestern University Feinberg School of Medicine designates this Enduring Material for a maximum of 0.25 AMA PRA Category 1 Credit(s)™. Physicians should claim only the credit commensurate with the extent of their participation in the activity.
Yuan Luo, PhD, has received consulting fees from Walmart and CBio-X Holdings, Inc. Catherine Gao, MD, has nothing to disclose. Content reviewer Theresa Walunas, PhD, has received grant or research support from Gilead Sciences. Course director, Robert Rosa, MD, has nothing to disclose. Planning committee member, Erin Spain, has nothing to disclose. Feinberg School of Medicine's CME Leadership and Staff have nothing to disclose: Clara J. Schroedl, MD, Medical Director of CME, Sheryl Corey, Manager of CME, Allison McCollum, Senior Program Coordinator, Katie Daley, Senior Program Coordinator, Michael John Rooney, Senior RSS Coordinator, and Rhea Alexis Banks, Administrative Assistant 2. All the relevant financial relationships for these individuals have been mitigated.