Generative AI models – unveiling the strengths and limitations of ChatGPT and its peers

Kunal Chandawar MD, DM (Clinical Immunology and Rheumatology)
Assistant Professor, Department of Clinical Immunology and Rheumatology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh

Imagine your brain as a sponge that soaks up all the information it comes across right from the time you’re born. Over the years, this sponge gets filled with words, sentences, stories, facts, and more. Now, whenever you’re asked a question, your brain goes through this vast store of information, picks up the relevant bits, and gives you an answer. 

Large Language Models (LLMs), like ChatGPT from OpenAI, Bard from Google, and Claude from Anthropic, are a bit like that sponge. Instead of soaking up experiences, these models “learn” from vast amounts of text data on the internet. They read books, articles, websites, and every piece of text they can get their digital hands on. Once trained, they can generate human-like text based on the patterns they’ve observed. LLMs are built on neural networks, mirroring, in some ways, the complexities of the human brain. These networks process and remember patterns from the data they’re given. When you ask an LLM a question, it doesn’t “think” like we do. Instead, it predicts the most likely response based on the patterns it has seen during its training.
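The pattern-prediction idea can be illustrated with a deliberately simplified toy sketch (a hypothetical example for intuition only; real LLMs use neural networks over billions of subword tokens, not simple word counts):

```python
from collections import Counter, defaultdict

# Toy "language model": count which word tends to follow which,
# then predict the most frequent next word. Real LLMs learn far
# richer patterns, but the core task is the same prediction.
def train_bigrams(text):
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    # Return the word seen most often after `word` in the training text
    return candidates.most_common(1)[0][0]

corpus = ("the patient reported chest pain . "
          "the patient reported fever . "
          "the doctor examined the patient .")
model = train_bigrams(corpus)
print(predict_next(model, "patient"))  # prints "reported"
```

Because "reported" followed "patient" more often than anything else in the toy corpus, it becomes the prediction; scale this idea up enormously and you get the fluent, pattern-driven answers described above.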

What is the role of Large Language Models (LLMs) like ChatGPT in healthcare?

New chatbots like ChatGPT have the potential to transform healthcare. They can help medical students grasp tough concepts, help doctors stay up to date on research, and take over routine paperwork.

Chatbots can explain hard topics clearly. Some can answer patient questions or schedule appointments. They can summarise lengthy articles so professionals can learn efficiently. Many can check work for spelling and grammar mistakes, analyse healthcare data, and even generate content ranging from text to images.

What are the inherent limitations of LLMs, including ChatGPT?

LLMs, despite their vast knowledge, exhibit only surface-level comprehension and lack deep understanding. They may also perpetuate biases: any LLM is only as good as its training data, and a biased dataset will give biased results, posing real risks in medical applications. There is also a lack of transparency, and they may provide plausible but incorrect answers (a failure known as hallucination). Over-reliance on such algorithms can erode the intuitive aspect of patient care honed through years of clinical practice.

How can the integration of LLMs be improved in healthcare?

Clinicians must view LLMs as tools, not replacements, preserving the human element in clinical judgment and patient care. Developers should ensure unbiased, comprehensive, and up-to-date training data for LLMs in healthcare, with regular audits, algorithm transparency, and open feedback channels.

What major LLMs are currently available, and what are their unique features and limitations?

Several LLMs, including GPT-4 (OpenAI), Claude 2 (Anthropic), Llama 2 (Meta), and Bard (Google), are discussed. Each has unique features, such as speed, improved reasoning, an open-source licence, or integration with Google Cloud. Each also has limitations, such as token limits, cost considerations, and access restrictions.

What cautionary measures are essential as LLMs integrate into healthcare?

It is crucial to tread with discernment as LLMs integrate into healthcare. Clinicians should remember that LLMs are tools, not substitutes, and developers must ensure unbiased training data and transparency in algorithms. The foundational tenets of medicine, including empathy and ethical considerations, should not be overshadowed by technological brilliance.

How do token limits impact the functionality of LLMs?

Token limits determine how much context or information an LLM can handle at one time. GPT-4, for example, launched with a context window of about 8,000 tokens, with a 32,000-token variant also available, and this ceiling influences a model's ability to provide accurate and coherent responses over long inputs. The token limit is therefore a crucial factor in understanding the scope and performance of an LLM.
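The practical consequence is that an application must check text length before sending it to a model. A minimal sketch of such a check follows (a rough heuristic only: the ~4-characters-per-token ratio is a common rule of thumb for English, and the 8,192 default is the original GPT-4 window; production code should use the model's actual tokenizer, e.g. OpenAI's tiktoken library):

```python
# Rough token estimate: English text averages about 4 characters
# per token. This is a heuristic, not an exact tokenizer.
def estimate_tokens(text):
    return max(1, round(len(text) / 4))

def fits_in_context(text, limit=8192):
    """Check whether `text` likely fits within a model's token limit
    (8,192 is the original GPT-4 window; limits vary by model)."""
    return estimate_tokens(text) <= limit

note = "The patient reported chest pain."
print(estimate_tokens(note))   # 32 characters -> prints 8
print(fits_in_context(note))   # prints True
```

A clinical summarisation tool, for instance, would run a check like this on a lengthy discharge note and split or truncate it before submission, since text beyond the token limit is simply never seen by the model.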

What are the advantages and disadvantages of major LLMs such as GPT-4, Claude 2, and BARD?

ChatGPT seems incredibly knowledgeable thanks to its access to Bing’s up-to-date data. Claude aims to give ethical, harmless answers. Bard leverages Google’s vast store of information. Grok goes for fun, casual dialogue. The more we use them, the smarter they get: over time, the bots understand our questions better and improve their replies. What are the upsides? They keep getting better, cost less than human help, and hold vast knowledge on almost every topic. What are the downsides? They sometimes make up bogus facts, they can have serious limitations we don’t yet fully grasp, and you may not have access depending on your location or language.

Further Reading

  1. Lee P, Bubeck S, Petro J. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N Engl J Med. 2023;388(13):1233-1239.
  2. Shah NH, Entwistle D, Pfeffer MA. Creation and Adoption of Large Language Models in Medicine. JAMA. 2023;330(9):866-869.
  3. Cutler DM. What Artificial Intelligence Means for Health Care. JAMA Health Forum. 2023;4(7):e232652.
  4. Egli A. ChatGPT, GPT-4, and Other Large Language Models: The Next Revolution for Clinical Microbiology? Clin Infect Dis. 2023;77(9):1322-1328.