Making COVID Terminology Comprehensible

Research Team: Yehoshua Perl, James Geller and the members of SABOC

The medical community's open-source Coronavirus Infectious Disease Ontology (CIDO) was released in January 2019 by Oliver He, Associate Professor at the University of Michigan. It has been continuously extended and now stores extensive conceptual knowledge about COVID-19 and other coronavirus infections. CIDO quickly grew to over 6,000 concepts and 113 relationship types, and it continues to grow.

Because of its size and complexity, a large ontology such as CIDO is hard to understand and hard to learn.

Figure 1. Layout of CIDO's hierarchy. This represents the overwhelming complexity of the coronavirus ontology. Every white dot is a medical concept.

Figure 1 illustrates this complexity. Every white dot represents a medical term/concept. The colored lines are generalization links. When a link connects a white dot at a lower level to a white dot at a higher level, the lower concept is a specialization of the higher concept. For example, a Corona Virus Infection is a specialization of a Virus Infection, so the former appears below the latter and a line connects the two.
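The generalization links described above form an is-a hierarchy. As a minimal sketch, assuming each concept simply maps to its set of direct parents, one concept is a specialization of another if the second can be reached by following links upward. Only the Corona Virus Infection to Virus Infection link comes from the text; the root concept `Infection` and the function name `is_specialization_of` are hypothetical.

```python
# Minimal is-a hierarchy: concept -> set of direct parents (generalizations).
# Only the first link is from the text; "Infection" is a hypothetical root.
PARENTS: dict[str, set[str]] = {
    "Corona Virus Infection": {"Virus Infection"},
    "Virus Infection": {"Infection"},
    "Infection": set(),
}


def is_specialization_of(concept: str, ancestor: str) -> bool:
    """True if `ancestor` is reachable from `concept` by following links upward."""
    stack, seen = list(PARENTS.get(concept, ())), set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return False


print(is_specialization_of("Corona Virus Infection", "Infection"))       # True
print(is_specialization_of("Virus Infection", "Corona Virus Infection"))  # False
```

Specialization is transitive, which is why the check walks the whole chain of parents rather than only the direct link.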

To improve the visualization, all lines emanating from concepts at the same level have the same color. Zooming in does not help comprehension, because the other ends of the lines emanating from a focus concept are pushed out of view, which strains the user's short-term memory.

Yehoshua Perl, James Geller and their students in the Structural Analysis of Biomedical Ontologies Center (SABOC) applied the Ontology Abstraction Framework (OAF), originally developed from 2015 to 2017 by their postdoctoral student Christopher Ochs, to simplify and visualize the complexity of CIDO.

OAF is based on a theoretical framework that has been created and optimized over two decades of research by Perl, Geller, and their students. Here is a brief summary.

A partial-area taxonomy is a summarization network that gives an easily comprehensible view of an ontology, compact enough to display on a screen. However, for a large ontology even this summarization network may be too large. For example, the Fall version of CIDO consisted of 5,138 concepts, and its partial-area taxonomy had 519 partial-areas, far too many to display as boxes on one screen.
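For readers new to the framework: in the SABOC literature, an area is the set of all concepts that have exactly the same set of relationship types, a root of an area is a concept none of whose parents belong to that area, and a partial-area is a root together with all of its descendants inside the area. The Python sketch below derives partial-areas from a tiny hypothetical ontology under those definitions; the concept names, relationship names, and the function `partial_area_taxonomy` are illustrative only and are not taken from CIDO or from the OAF code base.

```python
from collections import defaultdict

# Hypothetical toy ontology: concept -> (direct parents, relationship types).
# Names are illustrative only; they are not taken from CIDO.
ONTOLOGY: dict[str, tuple[set[str], frozenset[str]]] = {
    "process": (set(), frozenset()),
    "disease process": ({"process"}, frozenset({"has_participant"})),
    "infectious disease process": ({"disease process"}, frozenset({"has_participant"})),
    "diagnostic process": ({"process"}, frozenset({"has_participant", "has_output"})),
    "serological diagnostic process": (
        {"diagnostic process"},
        frozenset({"has_participant", "has_output"}),
    ),
}


def partial_area_taxonomy(ontology):
    """Map each area root to its partial-area (the root plus its in-area descendants)."""
    # Step 1: an area groups all concepts with exactly the same relationship types.
    areas = defaultdict(set)
    for concept, (_, rels) in ontology.items():
        areas[rels].add(concept)

    # Inverse of the parent links, so descendants can be walked downward.
    children = defaultdict(set)
    for concept, (parents, _) in ontology.items():
        for parent in parents:
            children[parent].add(concept)

    partial_areas = {}
    for members in areas.values():
        # Step 2: a root is a concept with no parent inside the same area.
        roots = [c for c in members if not (ontology[c][0] & members)]
        # Step 3: a partial-area is a root plus all its descendants within the area.
        for root in roots:
            partial_area, stack = set(), [root]
            while stack:
                concept = stack.pop()
                if concept not in partial_area:
                    partial_area.add(concept)
                    stack.extend(children[concept] & members)
            partial_areas[root] = partial_area
    return partial_areas


for root, concepts in sorted(partial_area_taxonomy(ONTOLOGY).items()):
    print(f"{root}: {sorted(concepts)}")
```

Here three distinct relationship-type sets give three areas, each with a single root, so the taxonomy has three partial-areas. A concept reachable from two roots of the same area would belong to both of their partial-areas.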

To obtain an even more compact summary of an ontology, we defined the weighted aggregate taxonomy (WAT). The idea is to use a cutoff value to distinguish major partial-areas, which summarize many concepts, from minor ones, which summarize just a few. In a WAT, only partial-areas above the cutoff are displayed as boxes. Each such node (box) summarizes a major subject in the topic modeled by the ontology.
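The cutoff step itself is simple. A minimal sketch, assuming each partial-area is known only by its size (the number of concepts it summarizes); the labels and sizes are invented, and `wat_boxes` is a hypothetical name, not part of the OAF tool.

```python
# Hypothetical partial-area sizes: label -> number of concepts summarized.
SIZES: dict[str, int] = {"A": 40, "B": 25, "C": 7, "D": 3, "E": 18}


def wat_boxes(sizes: dict[str, int], cutoff: int) -> list[str]:
    """Return the partial-areas drawn as boxes (size above cutoff), largest first."""
    major = [label for label, count in sizes.items() if count > cutoff]
    return sorted(major, key=lambda label: sizes[label], reverse=True)


print(wat_boxes(SIZES, cutoff=10))  # ['A', 'B', 'E']
print(wat_boxes(SIZES, cutoff=5))   # ['A', 'B', 'E', 'C']
```

Raising the cutoff shrinks the diagram and lowering it shows more detail; the cutoff actually used for CIDO is not stated here.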

The WAT worked well for many ontologies, but for CIDO it produced a long, narrow diagram that did not make good use of a computer screen. The newly invented "child-of-based layout" was shown to overcome this problem, generating a balanced layout of the CIDO summarization network that is well suited for visualization (Figure 2). Examples of major subjects of CIDO include process (summarizing 301 concepts), viral vaccine (58 concepts), and viral protein (43 concepts).

Figure 2. A big-picture summarization network of major subjects of CIDO

However, the partial-areas below the cutoff value are not deleted; they are only hidden. Their concepts are aggregated into the closest parent or ancestor partial-area that lies above the cutoff. Clicking on a major subject node makes the OAF software tool expand it back to show the hidden details.
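The hide-and-expand behavior can be sketched as follows. This is a simplified illustration, not the OAF implementation: it assumes each partial-area has at most one parent partial-area (real taxonomies can have several), invents all labels and sizes, and keeps a minor partial-area as its own box if no ancestor is above the cutoff. Each visible box's weight is its own size plus the sizes of the minor partial-areas aggregated into it, and the hypothetical `expand` function returns what a click would reveal.

```python
from typing import Optional

# Hypothetical partial-area hierarchy: each partial-area -> its parent
# partial-area (None at the top). Simplifying assumption: one parent each.
PARENT: dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B", "D": "C", "E": "A"}
# Hypothetical sizes: number of concepts each partial-area summarizes.
SIZE: dict[str, int] = {"A": 12, "B": 20, "C": 4, "D": 3, "E": 2}


def nearest_large_ancestor(pa: str, cutoff: int) -> Optional[str]:
    """Walk up the parent chain to the first partial-area above the cutoff."""
    current = PARENT[pa]
    while current is not None and SIZE[current] <= cutoff:
        current = PARENT[current]
    return current


def weighted_aggregate(cutoff: int) -> tuple[dict[str, int], dict[str, list[str]]]:
    """Return each visible box's weight and the minor partial-areas it hides."""
    weights = {pa: n for pa, n in SIZE.items() if n > cutoff}
    hidden: dict[str, list[str]] = {pa: [] for pa in weights}
    for pa, n in SIZE.items():
        if n > cutoff:
            continue  # major partial-area: already a visible box
        target = nearest_large_ancestor(pa, cutoff)
        if target is None:
            # No major ancestor: keep it visible rather than drop it.
            weights[pa], hidden[pa] = n, []
        else:
            # Hidden, not deleted: its concepts count toward the ancestor box.
            weights[target] += n
            hidden[target].append(pa)
    return weights, hidden


def expand(box: str, cutoff: int) -> list[tuple[str, int]]:
    """What clicking a box would reveal: its hidden partial-areas and their sizes."""
    _, hidden = weighted_aggregate(cutoff)
    return [(pa, SIZE[pa]) for pa in hidden.get(box, [])]


weights, hidden = weighted_aggregate(cutoff=10)
print(weights)          # {'A': 14, 'B': 27}
print(expand("B", 10))  # [('C', 4), ('D', 3)]
```

With a cutoff of 10, C and D roll up into B (weight 27) and E into A (weight 14); expanding B lists C and D again, in the spirit of the expansion of process described in the next paragraph.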

For example, clicking on the major subject process makes the OAF software tool generate the network of secondary subjects of process shown in Figure 3. Among these secondary subjects are Coronavirus infectious disease process (summarizing 8 concepts), COVID-19 diagnostic process (7 concepts), and its child COVID-19 diagnostic process by serological assay (3 concepts).

Figure 3. Expansion of the process subject

The general public's interest in COVID may wane once vaccines become widely available, as is happening now with the Moderna and Pfizer vaccines in the US. But CIDO and the software that makes sense of it will remain relevant for years to come.

Even when a vaccine is available, it will take time to vaccinate the whole world population. Furthermore, medicine will still have to deal with the aftermath of the pandemic: the lingering symptoms and problems that people experience after they have recovered from the infection. CIDO will also be very helpful when the next pandemic hits. It will need to be adapted, but its framework and many of its concepts will still be applicable.