Ciaccio EJ. Detecting ChatGPT in Scientific Text: A Historical Perspective on Chatbot Catchphrases. Journal of Biomedical Informatics and AI. 2026;1(1):1. DOI: https://doi.org/10.5281/zenodo.18983792
Detecting ChatGPT in Scientific Text: A Historical Perspective on Chatbot Catchphrases
Edward J. Ciaccio, PhD, Department of Medicine, Columbia University College of Physicians and Surgeons, New York, NY, USA
Abstract. Text generated by large language models (LLMs), going beyond grammar and spelling improvements, may be present in scientific documents without attribution. An accurate manual screening paradigm would therefore help to detect whether LLM (chatbot) content is included. This study provides examples of how peculiar catchphrases found in scientific text can be checked for chatbot origin. Catchphrases were first selected manually, and Google Search was then used to find articles containing each catchphrase up to 2024, to provide a historical perspective. The GPTZero detector was then used to determine whether paragraphs containing the catchphrase were of chatbot origin. The number of articles published with each catchphrase in recent literature, the citations per article, the Impact Factor of the publishing journal, and the document section in which the chatbot phrase appeared were compiled to characterize chatbot phrasing. Most of the suspected peculiar phrasings were indeed chatbot-associated. Based on a statistical analysis, the number of published biomedical and bioengineering articles with chatbot phrasings has increased substantially in recent years, particularly after the introduction of ChatGPT. Moreover, most of the chatbot-containing articles studied were published in journals with a substantial Impact Factor. Chatbot-generated text was most commonly found in the Abstract and Introduction sections, but also appeared in the Methods, Results and Discussion, Limitations, and Conclusions. It is therefore inferred that chatbot content has peculiar phrasing that can be detected manually, and that it appears ubiquitously in scientific documents in the peer-reviewed literature. Chatbot-generated content, beyond simple grammar and spelling correction, is present and has been increasing even in top-ranked journals.
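As an illustration only, the first screening step described above (finding paragraphs that contain a candidate catchphrase) might be sketched in Python as follows. This is not the procedure used in the study, which relied on manual selection and Google Search. The function names and the phrase list are hypothetical placeholders, and the phrases shown are not the catchphrases examined in the paper. The subsequent detector check (GPTZero in this study) is not shown.

```python
import re
from collections import Counter

# Hypothetical placeholder phrases, NOT the catchphrases studied in the paper.
CANDIDATE_PHRASES = ["placeholder phrase one", "placeholder phrase two"]


def count_phrases(text, phrases=CANDIDATE_PHRASES):
    """Return a Counter of case-insensitive, whole-phrase matches in text."""
    counts = Counter()
    # Collapse runs of whitespace so phrases split across line breaks still match.
    normalized = re.sub(r"\s+", " ", text)
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, normalized, flags=re.IGNORECASE))
    return counts


def flag_paragraphs(paragraphs, phrases=CANDIDATE_PHRASES):
    """Return (paragraph index, phrase) pairs for paragraphs containing a phrase."""
    flagged = []
    for i, para in enumerate(paragraphs):
        for phrase, n in count_phrases(para, phrases).items():
            if n > 0:
                flagged.append((i, phrase))
    return flagged
```

Paragraphs flagged in this way would still need to be reviewed and checked with a detector before any conclusion about chatbot origin is drawn; a phrase match alone is only a prompt for closer inspection.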