Artificial Intelligence in Laryngeal Cancer Management: Enhancing Guidelines or Redefining Standards?
ABSTRACT Objective: To evaluate the accuracy of artificial intelligence (AI) in establishing clinical decision‐making in the treatment of advanced laryngeal cancer. Methods: A structured question was elaborated for each of the seven recommendations chosen. Each Large Language Model (LLM) platform answered the questions. The Claude platform identified the differences between the guidelines and the responses generated and three specialists evaluated the impact of such differences. Results: Of the 28 analyzed responses, 22 (78.6%) demonstrated content similarity with existing guidelines. Two responses showed that guidelines contained significantly more comprehensive content, three responses from LLMs provided additional content not demonstrated in the guidelines, and one response showed direct disagreement with established guidelines. Conclusion: There was a 78.6% overlap in responses between guideline recommendations and LLMs. Therefore, while AI holds promise for transforming guideline creation, its integration into clinical practice must be carefully evaluated to ensure that it complements, rather than replaces, established expert‐driven protocols. Level of Evidence: 4.
Citation
@online{rogério_aparecido2025,
author = {Dedivitis, Rogério Aparecido and de Castro, Mario Augusto
Ferrari and Matos, Leandro Luongo and Ribeiro, Daniel Araki and
Duarte, Bruno Pelison and Kowalski, Luiz Paulo},
title = {Artificial Intelligence in Laryngeal Cancer Management:
Enhancing Guidelines or Redefining Standards?},
volume = {10},
number = {6},
date = {2025-12-01},
doi = {10.1002/lio2.70296},
langid = {en},
abstract = {Objective: To evaluate the accuracy of artificial
intelligence (AI) in establishing clinical decision‐making in the
treatment of advanced laryngeal cancer. Methods: A structured
question was elaborated for each of the seven recommendations
chosen. Each Large Language Model (LLM) platform answered the
questions. The Claude platform identified the differences between
the guidelines and the responses generated and three specialists
evaluated the impact of such differences. Results: Of the 28
analyzed responses, 22 (78.6\%) demonstrated content similarity with
existing guidelines. Two responses showed that guidelines contained
significantly more comprehensive content, three responses from LLMs
provided additional content not demonstrated in the guidelines, and
one response showed direct disagreement with established guidelines.
Conclusion: There was a 78.6\% overlap in responses between
guideline recommendations and LLMs. Therefore, while AI holds
promise for transforming guideline creation, its integration into
clinical practice must be carefully evaluated to ensure that it
complements, rather than replaces, established expert‐driven
protocols. Level of Evidence: 4.}
}