Domain-Specific Biomedical Ontologies, RALM, and Generative Medical Expert Systems
As described previously, Controlled Natural Language (CNL) renderings of SNOMED CT definitions of interest (both logical and textual) can be used as the context for questions answered by a Large Language Model (LLM). This process, an instance of In-Context Retrieval-Augmented Language Modeling (RALM), seems compelling.
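To make the mechanics concrete, here is a minimal sketch in Python. The CNL phrases are hand-written stand-ins for definitions generated from SNOMED CT, the token-overlap retriever is a placeholder for a real one, and the final LLM call is left out entirely; none of the names below belong to an actual SNOMED CT or RALM API.

```python
# Minimal sketch of In-Context RALM over CNL renderings of SNOMED CT
# definitions. Phrases and retriever are illustrative stand-ins.

# Hand-written CNL renderings of logical/textual definitions.
CNL_DEFINITIONS = [
    "Myocardial infarction is a disorder of the myocardium whose "
    "associated morphology is an infarct.",
    "Bacterial pneumonia is a pneumonia caused by a bacterium.",
]

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank definitions by naive token overlap with the question."""
    q_tokens = set(question.lower().split())
    return sorted(corpus,
                  key=lambda d: len(q_tokens & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Prepend the retrieved CNL definitions as in-context grounding."""
    context = "\n".join(retrieve(question, CNL_DEFINITIONS))
    return (f"Context (SNOMED CT definitions):\n{context}\n\n"
            f"Question: {question}\nAnswer:")

# The assembled prompt would then go to any LLM completion endpoint.
print(build_prompt("What morphology is associated with myocardial infarction?"))
```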

It opens up several informatics opportunities and could be an effective use of biomedical ontologies, which, at the height of my prior experience with them, delivered less immediately realizable general value than promised. A tool that lets us converse naturally with a domain ontology may bypass what previous systems could only facilitate through application development or logical reasoning.
Recent research has demonstrated that generative AI systems benefit from being connected to knowledge graphs to improve question-answering accuracy over structured data. I was not surprised when I read this, given my experience in a massive effort to connect Cyc [1] to a large heart and vascular patient registry managed as an RDF dataset. Cyc is
> a long-term artificial intelligence project that aims to assemble a comprehensive ontology and knowledge base that spans the basic concepts and rules about how the world works [...] to capture common sense knowledge
It is significantly more sophisticated than the production systems of the era of artificial intelligence research from which it emerged, and it exceeds the capabilities of the expert systems conceived during that period.
These earlier, primordial expert systems acquired and managed knowledge as if-then rules, or so-called productions [2]. Even then, one of the central maxims of these production systems was that “to achieve good performance, AI systems must have large amounts of knowledge.” Unlike traditional software, AI-based expert systems derive intelligent behavior from a declarative and explicit knowledge base [3].

Even back then, declarative knowledge representation was not, by itself, the system's most essential virtue; the amount of knowledge and how well it was semantically organized to reflect the domain mattered more.
For this reason, even during the height of Semantic Web and Linked Data system development, I always felt it was not enough for an information ecosystem merely to have large amounts of linked, SPARQL-queryable RDF content. To derive intelligent capabilities, the semantics of the content needed to be well specified and imbued with the meaning that intelligent agents and applications require, rather than expecting the value of well-connected, queryable linked data to emerge serendipitously. The domain specification via an ontology was the secret sauce.

The experience of connecting Cyc to our large Heart and Vascular RDF dataset and SPARQL endpoint reinforced this intuition.
By this same rationale, the current interest in using knowledge graphs with generative AI systems realizes only the first step; the next, more transformative step will be the use of well-specified reference domain ontologies and their textual representations. Many of the primordial expert systems presented descriptive text in their interfaces that appeared generative but, given the limited natural-language capabilities of the era, consisted of hand-constructed phrases in the vocabulary of the knowledge domain [3]. Generating CNL phrases from ontologies is a similar mechanism, but one meant to endow generative AI systems with a rich, domain-specific vocabulary.
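As an illustration of that mechanism, the sketch below renders a logical definition as a CNL sentence. The tuple encoding of the EL-style axiom and the templates are simplifications assumed for exposition; a real pipeline would read definitions from the SNOMED CT release files or through an OWL API.

```python
# Simplified illustration of rendering an ontology's logical definition
# as a CNL phrase. The tuple encoding below is a stand-in for a real
# OWL/SNOMED CT representation of a fully defined concept.

# (defined concept, parent concept, [(attribute, filler), ...])
LOGICAL_DEFINITION = (
    "Myocardial infarction",
    "Myocardial disease",
    [("Associated morphology", "Infarct"),
     ("Finding site", "Myocardium structure")],
)

def render_cnl(definition) -> str:
    """Render a fully defined concept as a single CNL sentence."""
    name, parent, restrictions = definition
    clauses = [f"whose {attr.lower()} is some {filler.lower()}"
               for attr, filler in restrictions]
    return f"{name} is a {parent.lower()} " + " and ".join(clauses) + "."

print(render_cnl(LOGICAL_DEFINITION))
# Myocardial infarction is a myocardial disease whose associated
# morphology is some infarct and whose finding site is some
# myocardium structure.
```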
Contemporary research on the architectural archetypes emerging from systems built on generative AI investigates the natural analogy between production systems and language models. It highlights a disadvantage of LLMs: consisting of billions of uninterpretable parameters, they are inherently opaque, in contrast with production systems, which are defined by discrete, human-legible productions [4]. On the other hand, the scale of LLMs and the ability to pre-train them provide massive advantages over traditional production systems.

The article on connecting LLMs with knowledge graphs suggests an alternative approach that uses a vector database. However, using CNL phrases from domain-specific ontologies as the corpus for RALM, embedded in a vector database, can strengthen this connection further. We can do this without abandoning the value of a knowledge graph or ontology: linking to definitions in a machine-interpretable knowledge base addresses the inherent opaqueness of LLMs, while the vector database affords syntactic (text-similarity) querying against that knowledge base. This is useful to a generative AI system beyond producing queries for execution against knowledge graphs, as described in that article. The resulting systems can be considered Generative Expert Systems.
They can be used to produce conversational interfaces that employ domain-specific vocabulary, an ability that can be enhanced by using textual representations of definitions from reference domain ontologies, either with In-Context RALM or as a corpus for training and fine-tuning. Connecting these systems directly to the concepts in the ontologies, and to the logical reasoning and querying capabilities the ontologies support, can help circumvent some of their opaqueness and help train the underlying neural networks to fit models that facilitate intelligence rather than parrot-like replication of how humans communicate.
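A sketch of how the two can coexist: each CNL phrase indexed in the vector store carries the identifier of the concept it was generated from, so whatever similarity search retrieves stays linked back to the machine-interpretable knowledge base. The bag-of-words embedding below is a toy stand-in for a real embedding model, and the concept IDs are illustrative.

```python
import math
from collections import Counter

# Toy vector store over CNL phrases, each linked to the SNOMED CT-style
# concept ID it was generated from. The bag-of-words embedding stands
# in for a real embedding model.
INDEX = [
    ("22298006", "Myocardial infarction is a disorder of the myocardium "
                 "whose associated morphology is an infarct."),
    ("53084003", "Bacterial pneumonia is a pneumonia caused by a "
                 "bacterium."),
]

def embed(text: str) -> Counter:
    """Toy embedding: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def query(text: str, k: int = 1):
    """Return (concept_id, phrase) pairs; the ID grounds each retrieved
    phrase in the ontology, where logical definitions, reasoning, and
    structured querying remain available."""
    q = embed(text)
    return sorted(INDEX, key=lambda item: cosine(q, embed(item[1])),
                  reverse=True)[:k]

print(query("Which disorder has an infarct as its morphology?"))
```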
Medical science was the domain in which many of the earlier expert systems (such as MYCIN) were conceived. Given this and my background, it is not surprising that my primary interest is building generative medical expert systems. However, there are very few medical LLMs.
Earlier this year, work was done to compile fine-tuned language models, called MedAlpaca, for biomedical tasks. They were evaluated on the United States Medical Licensing Examination (USMLE), a standardized assessment for US medical students [5]. Most of the training data consisted of flash cards used by medical students to cover the entirety of the medical school curriculum, Stack Exchange question-answer pairs, and medical question-answer pairs extracted from WikiDoc, a collaborative platform where medical professionals share and contribute current medical knowledge.

Later this year, a group released MEDITRON, a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain [6]. They were trained on a new, extensive dataset drawn from various healthcare-related sources: rigorously researched clinical guideline articles designed to help healthcare practitioners and patients make evidence-based decisions, along with abstracts and full-text papers from PubMed and PubMed Central.
I plan to test the use of domain ontologies with In-Context RALM on these models. In theory, they may benefit from a different, more definitional corpus. I will also learn how to fine-tune them and how to augment the training process to take advantage of the logical entailment capabilities of the source domain ontologies. It might be a dead end, but it seems like an intellectually stimulating path worth taking.
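To give a flavor of what such augmentation might look like, the sketch below turns subsumptions entailed by an ontology into question-answer pairs suitable for instruction fine-tuning. The hierarchy and phrasing are illustrative assumptions, not a recipe actually used by MedAlpaca or MEDITRON.

```python
# Illustrative only: deriving instruction-tuning pairs from
# subsumptions entailed by an ontology (e.g., computed by an EL
# reasoner over SNOMED CT). Concept names here are examples.
ENTAILED_SUBSUMPTIONS = [
    ("Myocardial infarction", "Ischemic heart disease"),
    ("Bacterial pneumonia", "Infectious pneumonia"),
]

def to_training_pair(child: str, parent: str) -> dict:
    """Phrase one entailed subsumption as an instruction/output pair."""
    return {
        "instruction": f"Is {child.lower()} a kind of {parent.lower()}?",
        "output": (f"Yes. In the source ontology, {child} is entailed "
                   f"to be a subtype of {parent}."),
    }

dataset = [to_training_pair(c, p) for c, p in ENTAILED_SUBSUMPTIONS]
for pair in dataset:
    print(pair)
```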
- Pierce, C. D., Booth, D., Ogbuji, C., Deaton, C., Blackstone, E., & Lenat, D. (2012). SemanticDB: A Semantic Web infrastructure for clinical research and quality reporting. Current Bioinformatics, 7(3), 267–277.
- Doorenbos, R. (1995). Production matching for large learning systems (Doctoral dissertation, Carnegie Mellon University).
- Lachman, R. (1989). Expert systems: A cognitive science perspective. Behavior Research Methods, Instruments, & Computers, 21(2), 195–204.
- Sumers, T., Yao, S., Narasimhan, K., & Griffiths, T. L. (2023). Cognitive architectures for language agents. arXiv preprint arXiv:2309.02427.
- Han, T., et al. (2023). MedAlpaca — An open-source collection of medical conversational AI models and training data. arXiv preprint arXiv:2304.08247.
- Chen, Z., et al. (2023). MEDITRON-70B: Scaling medical pretraining for large language models. arXiv preprint arXiv:2311.16079.