Did You Know What You Really Need is an Ontology?
Mark Hall
October 7, 2019
Many of us, particularly those of us in technology-facing roles, can recite by heart the familiar sales pitches offering pre-packaged solutions to enterprise data issues. But hidden deep within the avalanche of hyperbolic, AI-centric sales and marketing ‘solutions’ to contemporary problems of data and information management is a gold nugget.
As you sit through yet another pitch for a data-lake solution, playing imaginary AI bingo (innovation, insight, automation and, of course, analytics) and listening to anecdotes about projects that took days, not months, you might well tell yourself that this is exactly what everyone else is doing. Surely there’s safety in numbers, and surely the latest industry benchmark says exactly that? But consider the real price of most packaged solutions: handing over huge amounts of your data, committing thousands of hours of subject matter expertise to annotate and tokenize data that the vendor won’t understand, and leaving a crater in your tech budget. Against that, the price of going it alone might be lower than you think.
There is another way.
When I first heard the word ontology, I immediately wondered what Plato’s ontological dualism could have to do with data modeling, and why it would be so important in tackling many of today’s challenges in managing information. The subject is vast and deserves more explanation than I can cram into a paragraph, but the most important thing to know is that an Ontology’s primary purpose is simply to represent knowledge within a domain. An Ontology is essentially a representative semantic model. It is intended to convey a shared understanding of the domain, and it is often implemented as a knowledge graph stored in a triple store. An Ontology emphasizes the relationships between concepts (or ‘classes’, in Ontology parlance) within your business, and the meaning of those relationships. While traditional data modeling has concerned itself primarily with the capture and retrieval of data, an Ontology concerns itself with a shared understanding of what that data means.
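The following is a minimal sketch of a tiny ontology and its data stored as (subject, predicate, object) triples. It uses plain Python rather than an RDF library. The `ex:` names are hypothetical, and only two RDF Schema rules (`rdfs:domain` and `rdfs:range`) are modeled. Because the schema records what `worksFor` means, the sketch can derive facts that nobody stated directly.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple. The "ex:" names are
# hypothetical; "rdf:type", "rdfs:domain" and "rdfs:range" mirror RDF Schema
# terms, but this is plain Python, not an RDF library.
Triple = tuple[str, str, str]

ontology: set[Triple] = {
    # Schema triples: what the worksFor relationship means.
    ("ex:worksFor", "rdfs:domain", "ex:Person"),
    ("ex:worksFor", "rdfs:range", "ex:Organization"),
}

data: set[Triple] = {
    # Instance data: a single asserted fact.
    ("ex:Sally", "ex:worksFor", "ex:InnovationInc"),
}


def infer_types(graph: set[Triple]) -> set[Triple]:
    """Apply the RDFS domain and range rules once and return only new triples.

    If (s p o) holds and p has domain C, then s is a C;
    if p has range D, then o is a D.
    """
    domains: dict[str, set[str]] = defaultdict(set)
    ranges: dict[str, set[str]] = defaultdict(set)
    for s, p, o in graph:
        if p == "rdfs:domain":
            domains[s].add(o)
        elif p == "rdfs:range":
            ranges[s].add(o)

    inferred: set[Triple] = set()
    for s, p, o in graph:
        for cls in domains.get(p, ()):
            inferred.add((s, "rdf:type", cls))
        for cls in ranges.get(p, ()):
            inferred.add((o, "rdf:type", cls))
    return inferred - graph


new_facts = infer_types(ontology | data)
for triple in sorted(new_facts):
    print(triple)
# ('ex:InnovationInc', 'rdf:type', 'ex:Organization')
# ('ex:Sally', 'rdf:type', 'ex:Person')
```

A production system would normally rely on a triple store with a built-in reasoner. The point here is only that the meaning lives in the model itself, not in application code.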
Before embarking on the AI journey, particularly the application of natural language processing and machine learning to unstructured data, it’s critical to understand and document your domain. Committing to an enterprise Ontology forces organizations to do precisely this. Relational data modeling requires conceptual, logical and physical models, along with a deep data dictionary and glossary to facilitate business understanding. A semantic data model, by contrast, integrates all of those into the Ontology itself. Whether you decide to stick fastidiously to OWL, venture into SHACL or tinker with SKOS, the flexibility and power offered by linked data, open standards and a handful of (often open-source) tools is immense.
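OWL axioms are typically used by reasoners to infer new facts. SHACL, by contrast, validates data against constraints called shapes. The toy check below is loosely modeled on a SHACL property shape using `sh:minCount` and `sh:maxCount`. The `PropertyShape` class, the messages, and the `ex:` names are hypothetical, and a real project would use a SHACL processor instead.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class PropertyShape:
    """Hypothetical, much-simplified stand-in for a SHACL property shape."""
    target_class: str                # plays the role of sh:targetClass
    path: str                        # plays the role of sh:path
    min_count: int = 0               # plays the role of sh:minCount
    max_count: Optional[int] = None  # plays the role of sh:maxCount (None = no limit)


def validate(graph: set[Triple], shape: PropertyShape) -> list[str]:
    """Return a human-readable message for every node that violates the shape."""
    focus_nodes = sorted(
        s for s, p, o in graph if p == "rdf:type" and o == shape.target_class
    )
    violations = []
    for node in focus_nodes:
        count = sum(1 for s, p, _ in graph if s == node and p == shape.path)
        if count < shape.min_count:
            violations.append(
                f"{node}: {shape.path} has {count} value(s), expected at least {shape.min_count}"
            )
        if shape.max_count is not None and count > shape.max_count:
            violations.append(
                f"{node}: {shape.path} has {count} value(s), expected at most {shape.max_count}"
            )
    return violations


graph: set[Triple] = {
    ("ex:Sally", "rdf:type", "ex:Person"),
    ("ex:Sally", "ex:worksFor", "ex:InnovationInc"),
    ("ex:Bob", "rdf:type", "ex:Person"),  # no employer recorded
}
shape = PropertyShape("ex:Person", "ex:worksFor", min_count=1, max_count=1)
for message in validate(graph, shape):
    print(message)
# ex:Bob: ex:worksFor has 0 value(s), expected at least 1
```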
However, making the leap from the relational world of data management and governance to semantic modeling and linked data can be surprisingly contentious. You may meet resistance within your technology organization from people who have spent decades honing their skills on Oracle, DB2 or any number of other well-established platforms. Dogma and intransigent positions on how data should be captured and represented are common in large organizations. Even once you’ve piqued their intellectual curiosity, navigating skepticism among executive management is not for the faint of heart. This is especially true in a space crammed with shiny objects and dazzling buzzwords, all competing for investment dollars. But there are plenty of reasons to persevere. One good reason is that OWL is an open W3C standard rather than a proprietary format. Another is that big-box packaged solutions may not adequately address problems that are unique to your organization.
The advantages of semantic data models and triple stores (or indeed other types of graphs) are well documented, but the two most obvious are linked data and inference (or reasoning). For a simple example: if you know that Sally works for Innovation Inc., and that Innovation Inc. is located in New York, then you can infer that Sally works in New York. If we then add that Sally works remotely and lives in Vermont, a more specific rule lets us infer instead that she actually works in Vermont, and therefore that Innovation Inc. has employees in Vermont.
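The Sally example can be sketched as three hand-written rules. The predicates (`ex:worksFor`, `ex:locatedIn`, `ex:worksRemotely`, `ex:livesIn`, `ex:worksIn`, `ex:hasEmployeesIn`) are hypothetical. The remote-work rule acts as an exception ("unless she works remotely"), which relies on a closed-world check. A standard OWL reasoner is monotonic, meaning that adding facts never retracts an earlier conclusion. One common approach is therefore to express exceptions like this one with rules or queries rather than plain OWL axioms.

```python
Triple = tuple[str, str, str]


def infer(graph: set[Triple]) -> set[Triple]:
    """Apply three hypothetical rules and return only the newly derived triples.

    Rule 1: worksFor(p, org) and locatedIn(org, place), unless p works
            remotely, gives worksIn(p, place).
    Rule 2: worksFor(p, org), worksRemotely(p) and livesIn(p, place)
            gives worksIn(p, place).
    Rule 3: worksFor(p, org) and worksIn(p, place) gives hasEmployeesIn(org, place).
    """
    def objects(subject: str, predicate: str) -> set[str]:
        return {o for s, p, o in graph if s == subject and p == predicate}

    inferred: set[Triple] = set()
    for person, predicate, org in graph:
        if predicate != "ex:worksFor":
            continue
        # The "unless" is negation as failure, i.e. a closed-world check.
        if (person, "ex:worksRemotely", "true") in graph:
            places = objects(person, "ex:livesIn")  # Rule 2
        else:
            places = objects(org, "ex:locatedIn")   # Rule 1
        for place in places:
            inferred.add((person, "ex:worksIn", place))
            inferred.add((org, "ex:hasEmployeesIn", place))  # Rule 3
    return inferred - graph


facts: set[Triple] = {
    ("ex:Sally", "ex:worksFor", "ex:InnovationInc"),
    ("ex:InnovationInc", "ex:locatedIn", "ex:NewYork"),
}
print(sorted(infer(facts)))
# [('ex:InnovationInc', 'ex:hasEmployeesIn', 'ex:NewYork'), ('ex:Sally', 'ex:worksIn', 'ex:NewYork')]

facts |= {
    ("ex:Sally", "ex:worksRemotely", "true"),
    ("ex:Sally", "ex:livesIn", "ex:Vermont"),
}
print(sorted(infer(facts)))
# [('ex:InnovationInc', 'ex:hasEmployeesIn', 'ex:Vermont'), ('ex:Sally', 'ex:worksIn', 'ex:Vermont')]
```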
While this might seem simple, such reasoning is quite difficult to perform in relational databases and can be computationally expensive. And all this is before considering the difference between the open-world and closed-world assumptions, and exactly how we distinguish “I don’t know” from “it’s not in the database”.
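The difference between the two assumptions shows up as soon as you ask a question the data cannot answer. In this minimal sketch, an explicit set of negative facts stands in, very loosely, for OWL 2 negative property assertions. The `Answer` enum and the function names are hypothetical.

```python
from enum import Enum

Triple = tuple[str, str, str]


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def ask_closed_world(facts: set[Triple], query: Triple) -> Answer:
    """Closed world: whatever is not stored is taken to be false."""
    return Answer.TRUE if query in facts else Answer.FALSE


def ask_open_world(facts: set[Triple], negated: set[Triple], query: Triple) -> Answer:
    """Open world: a missing fact is unknown; only an explicit negative fact makes it false."""
    if query in facts:
        return Answer.TRUE
    if query in negated:
        return Answer.FALSE
    return Answer.UNKNOWN


facts: set[Triple] = {("ex:Sally", "ex:livesIn", "ex:Vermont")}
negated: set[Triple] = {("ex:Sally", "ex:livesIn", "ex:Ohio")}  # stated as NOT true

for place in ("ex:Vermont", "ex:Ohio", "ex:Texas"):
    query = ("ex:Sally", "ex:livesIn", place)
    print(place, ask_closed_world(facts, query).value, ask_open_world(facts, negated, query).value)
# ex:Vermont true true
# ex:Ohio false false
# ex:Texas false unknown
```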
Ontologies excel at this kind of reasoning. It is then relatively easy to develop algorithms that explore your graph for new relationships you hadn’t explicitly defined in the model, adding to the knowledge graph. Further, because semantic modeling is concerned with meaning, these new relationships provide insights within the context of your domain. This logical reasoning contrasts with the statistical approach often taken with data lakes or traditional NLP. That approach may be good at finding similar concepts, but it can miss concepts that look different yet are relevant or actually share meaning.
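One simple family of such algorithms is forward chaining: apply rules repeatedly until nothing new can be derived (a fixpoint). The sketch below assumes two rules. The first is transitivity for a hypothetical set of predicates. The second is type inheritance through `rdfs:subClassOf`. The loop re-checks every pair on each round, which is fine for illustration. Production engines typically use indexing and incremental techniques such as semi-naive evaluation instead.

```python
Triple = tuple[str, str, str]

# Hypothetical choice of which predicates are transitive.
TRANSITIVE = {"ex:partOf", "rdfs:subClassOf"}


def forward_chain(graph: set[Triple]) -> set[Triple]:
    """Apply two rules repeatedly until no new triples appear (a fixpoint)."""
    closure = set(graph)
    while True:
        new: set[Triple] = set()
        for a, p, b in closure:
            # Rule 1, transitivity: (a p b) and (b p c) gives (a p c).
            if p in TRANSITIVE:
                for b2, p2, c in closure:
                    if b2 == b and p2 == p:
                        new.add((a, p, c))
            # Rule 2, type inheritance: (x rdf:type C) and (C rdfs:subClassOf D)
            # gives (x rdf:type D).
            if p == "rdf:type":
                for c2, p2, d in closure:
                    if c2 == b and p2 == "rdfs:subClassOf":
                        new.add((a, "rdf:type", d))
        new -= closure
        if not new:
            return closure
        closure |= new


graph: set[Triple] = {
    ("ex:Payments", "ex:partOf", "ex:RetailBanking"),
    ("ex:RetailBanking", "ex:partOf", "ex:Bank"),
    ("ex:Sally", "rdf:type", "ex:Analyst"),
    ("ex:Analyst", "rdfs:subClassOf", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
}
for triple in sorted(forward_chain(graph) - graph):
    print(triple)
# ('ex:Analyst', 'rdfs:subClassOf', 'ex:Person')
# ('ex:Payments', 'ex:partOf', 'ex:Bank')
# ('ex:Sally', 'rdf:type', 'ex:Employee')
# ('ex:Sally', 'rdf:type', 'ex:Person')
```

None of the four printed relationships was stated in the input. Each follows from the model's rules, which is the sense in which the graph grows its own knowledge.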
Can I Add This to My Shopping Cart?
Let’s talk about a widely used (and misused) word: innovation. The idea that a commodified solution sold to everyone could be innovative is something of an oxymoron. Consider the struggle between Apple and Adobe over Flash about a decade ago. To recap, Flash was a proprietary plugin, used on many websites, that supported everything from embedded video to games and early web applications. While it was free to the end user, the tools required to create Flash-based content were sold by Adobe. Flash was ubiquitous on the web, and it was incredibly lucrative for Adobe.
There were technical difficulties in getting Flash to run on phones, which had much less processing power than computers. At the time, Apple's stance was that if Adobe could get Flash to run on the iPhone, it would be supported. However, after the App Store opened in 2008, Apple quickly realized a problem. If Flash were available on the iPhone, developers would build apps with cross-platform Flash tools, and Adobe, not Apple, would then set the standard for app development on Apple’s own device.
This represented an existential threat to the perception of Apple’s products as innovative. If Apple wanted to implement some novel capability within the device, such as Siri, augmented reality, or health monitoring, those innovations would not find their way into applications until Adobe implemented them in its tooling. And Adobe would only implement those capabilities once their applicability became widely apparent. That would only happen when other phone makers (Apple’s competitors) included them in their own devices. In other words, Apple’s innovation would not be rewarded until everyone else had implemented the idea, and by then the advantage would be lost. This cycle removes the motivation to innovate and to create differentiated products. We all know how this turned out: Flash is dead, and the Apple App Store is considered the gold standard for application variety and quality. Innovation won this round.
The point I want to make is that it’s hard to innovate if another organization is allowed to decide when your best ideas can be executed. For Apple, innovation was driven by the developers and their tools. For you, it’s driven by enterprise data: where it sits and, perhaps most importantly, how it’s modeled. Committing to an enterprise Ontology, linked data, and open standards allows you to control where and how you invest people and capital. In the longer term, an enterprise ontology gives you the freedom to implement your own ideas as you think of them, rather than waiting for them to become commoditized and packaged by big-box vendors.
An enterprise knowledge graph has broad applicability, supporting everything from search to analytics. It can provide your machine learning algorithms with a deep bench of linked training data. The very nature of an ontology, especially if you establish a strong core, also means that adding new domains of information won’t require remodeling from scratch. Data in standards-compliant triple stores is highly portable, because it can be exported in standard RDF serializations and loaded into another store. Should your favorite vendor decide to change its licensing model, you have the flexibility to move elsewhere with minimal effort.
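Portability rests on standard serializations. N-Triples, a W3C format, writes one `<subject> <predicate> <object> .` statement per line, so a graph exported from one store can be imported into another. The minimal sketch below handles only IRI terms (no literals, blank nodes, or escapes), and the `http://example.org/` IRIs are hypothetical. A real system would use a library such as rdflib or the store's own export tools.

```python
import re

Triple = tuple[str, str, str]

# Matches one N-Triples line whose three terms are all IRIs, e.g.
# <http://example.org/Sally> <http://example.org/worksFor> <http://example.org/InnovationInc> .
LINE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def to_ntriples(triples: set[Triple]) -> str:
    """Serialize IRI-only triples, one statement per line, in a stable order."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in sorted(triples))


def from_ntriples(text: str) -> set[Triple]:
    """Parse the IRI-only subset written by to_ntriples; skip blanks and comments."""
    triples: set[Triple] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        triples.add(match.groups())
    return triples


EX = "http://example.org/"
graph: set[Triple] = {
    (EX + "Sally", EX + "worksFor", EX + "InnovationInc"),
    (EX + "InnovationInc", EX + "locatedIn", EX + "NewYork"),
}
dump = to_ntriples(graph)
print(dump, end="")
# <http://example.org/InnovationInc> <http://example.org/locatedIn> <http://example.org/NewYork> .
# <http://example.org/Sally> <http://example.org/worksFor> <http://example.org/InnovationInc> .
```

The round trip `from_ntriples(to_ntriples(graph)) == graph` is the property that matters. Any store that speaks the standard format can take over the data.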
So why isn’t everyone doing this?
Graph databases are hardly a new idea, and semantic data models are certainly not the only way to deal with unstructured and semi-structured data within the enterprise. However, developing an enterprise ontology from scratch can be daunting. It requires subject matter experts and some of your smartest technologists to commit to a book of work that might not have obvious or immediate benefits. In that situation, it’s difficult to compete with senior sales executives playing golf with your C-suite, or inviting decision makers to baseball games to show off their latest wares. Who wants to sit through a PowerPoint presentation explaining the finer points of lexical expressiveness, with a timeline measured in years, when they can be fine dining on someone else’s expense account while being pitched the next big win?
Nevertheless, you should persist, because the rewards are great. No one knows your organization better than the people within it. If you’re looking for insight and innovation within your enterprise, there’s no better strategy than asking your most knowledgeable people. You can’t buy insight from outside, and you really can’t innovate using pre-packaged approaches.
Building your own enterprise knowledge base, supported by a robust representational data model, will give you the building blocks to be self-determining. It will give you an insightful view into your data, and it will free you from being beholden to someone else’s ability to innovate. What you need is an Ontology.
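In summary, he argues that organizations must first build an ontology if they want to pursue AI and data innovation.
Limitations of packaged AI solutions
Many companies adopt commercial solutions such as ‘data lakes’ or ‘AI analytics platforms’. These tie them to the vendor’s way of working and make it hard to control their own data and knowledge structures.
Over the long term, costs are high and the pace of innovation is limited by the vendor.
Why an ontology is needed
An ontology is a knowledge representation model that explicitly defines the concepts and relationships within a domain.
Conventional data modeling focuses on ‘storing and retrieving data’. An ontology focuses on sharing and understanding ‘the meaning of data and its relationships’.
It is usually implemented as a knowledge graph, using standards such as OWL, SHACL, and SKOS.
Advantages of an ontology
Reasoning: e.g., "Sally works for Innovation Inc." + "Innovation Inc. is located in New York" → Sally works in New York.
Linked data: semantically connects multiple data sources.
Extensibility: adding a new domain does not require rebuilding the entire model.
Minimal vendor lock-in: a standards-based triple store makes data highly portable.
Organizational challenges
Resistance from technical staff accustomed to RDBMS-centric thinking (Oracle, DB2, etc.).
Difficulty persuading executives, because there is no visible short-term ROI.
Having to compete with the flashy marketing and sales efforts of outside vendors.
Strategic recommendations
Assign internal domain experts and technical staff to build an ontology unique to the organization.
Use it as the foundation for expanding into services such as AI, search, analytics, recommendation, and automation.
Over the long term, lay a foundation that lets the organization control its own pace of innovation.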
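| Item | Description | Practical tip |
|---|---|---|
| Domain analysis | Define the organization’s core concepts, relationships, and rules | Run workshops with SMEs (subject matter experts) from each department |
| Standards selection | OWL, RDF, SKOS, SHACL, etc. | Use tools such as Protégé (open source) or commercial triple stores such as GraphDB and Stardog |
| Knowledge graph construction | Load the ontology and data into a triple store | Query with SPARQL and apply an inference engine |
| AI integration | Supply semantically grounded data to NLP and ML models | Reduce data preprocessing and labeling costs |
| Change management | Ease technical and cultural resistance | Start with a PoC (proof of concept) and expand from small wins |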
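The ‘query with SPARQL’ tip can be illustrated with the core of SPARQL evaluation: matching a basic graph pattern. In such a pattern, terms starting with `?` are variables that must bind consistently across patterns. This is a toy matcher for illustration, not a SPARQL engine, and the data is hypothetical. The comment in the code shows roughly equivalent SPARQL.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            result[p_term] = t_term
        elif p_term != t_term:
            return None
    return result


def query(graph: set[Triple], patterns: list[Triple],
          binding: Optional[Binding] = None) -> list[Binding]:
    """Return every binding that satisfies all patterns (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results: list[Binding] = []
    for triple in sorted(graph):
        extended = match(first, triple, binding)
        if extended is not None:
            results.extend(query(graph, rest, extended))
    return results


# Roughly equivalent SPARQL (prefix declaration omitted):
#   SELECT ?person ?place WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn ?place . }
graph: set[Triple] = {
    ("ex:Sally", "ex:worksFor", "ex:InnovationInc"),
    ("ex:Raj", "ex:worksFor", "ex:Acme"),
    ("ex:InnovationInc", "ex:locatedIn", "ex:NewYork"),
}
rows = query(graph, [("?person", "ex:worksFor", "?org"), ("?org", "ex:locatedIn", "?place")])
for row in rows:
    print(row["?person"], row["?place"])
# ex:Sally ex:NewYork
```

Raj drops out because no location is recorded for `ex:Acme`. The shared `?org` variable is what joins the two patterns.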
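Domain ontology design → define concepts and relationships
Data mapping → transform data from existing databases, documents, and APIs to fit the ontology structure
Triple store loading → store the data in RDF/OWL form
Inference engine → automatically generate new relationships and knowledge
AI model training → strengthen NLP, recommendation, and prediction models with semantically grounded data
Service deployment → search, chatbots, analytics dashboards, automation systems, etc.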
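The data-mapping step can be sketched for a relational source using Python's built-in `sqlite3` module. The `employee` table, the column-to-predicate mapping, and the `ex:` identifiers are hypothetical. W3C's R2RML standard defines a declarative way to express mappings like this, but this sketch does not implement it. A NULL column is skipped rather than turned into a negative fact, in keeping with the open-world assumption.

```python
import sqlite3

Triple = tuple[str, str, str]

# Hypothetical mapping from table columns to ontology predicates.
COLUMN_TO_PREDICATE = {
    "employer": "ex:worksFor",
    "home_state": "ex:livesIn",
}


def rows_to_triples(conn: sqlite3.Connection) -> set[Triple]:
    """Turn each employee row into triples about one ex:employee/<id> node."""
    triples: set[Triple] = set()
    cursor = conn.execute("SELECT id, name, employer, home_state FROM employee")
    for emp_id, name, employer, home_state in cursor:
        subject = f"ex:employee/{emp_id}"  # identifier minted from the primary key
        triples.add((subject, "rdf:type", "ex:Person"))
        triples.add((subject, "rdfs:label", name))  # a literal, kept as a plain string here
        for column, value in (("employer", employer), ("home_state", home_state)):
            if value is not None:  # NULL means "unknown", so no triple is emitted
                triples.add((subject, COLUMN_TO_PREDICATE[column], f"ex:{value}"))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, employer TEXT, home_state TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Sally", "InnovationInc", "Vermont"), (2, "Raj", "InnovationInc", None)],
)
triples = rows_to_triples(conn)
for triple in sorted(triples):
    print(triple)
```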
Mark Hall’s argument is that “the real foundation of AI innovation is an ontology.”