AI-900 Study Notes - Document Intelligence, Knowledge Mining
- Document intelligence extends OCR by extracting, understanding, and organizing text data.
- Automates document processing (e.g., receipts, forms), reducing manual work and errors.
- Example: Scans a receipt, extracts merchant info, total, and tax, and maps them into a database.
- Azure AI Document Intelligence provides prebuilt and custom models for document analysis.
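The receipt example above can be sketched without calling any service. This is a minimal illustration, assuming the analysis step has already returned a dictionary of fields. The field names (`MerchantName`, `Total`, `TotalTax`), the sample values, and the `receipts` table are assumptions made for this note, not output from a real model.

```python
import sqlite3

# Hypothetical result of analyzing one receipt. No Azure call is made here;
# the field names and values are made up for illustration.
extracted = {
    "MerchantName": "Contoso Coffee",
    "Total": 12.50,
    "TotalTax": 1.14,
}


def save_receipt(conn, fields):
    """Map extracted receipt fields onto columns of a receipts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS receipts "
        "(merchant TEXT, total REAL, tax REAL)"
    )
    conn.execute(
        "INSERT INTO receipts (merchant, total, tax) VALUES (?, ?, ?)",
        (fields.get("MerchantName"), fields.get("Total"), fields.get("TotalTax")),
    )
    conn.commit()


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
save_receipt(conn, extracted)
rows = conn.execute("SELECT merchant, total, tax FROM receipts").fetchall()
print(rows)
```

In a real workflow the dictionary would come from a prebuilt or custom model, and missing fields would need handling before the insert.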
Azure AI Search
Azure AI Search Features
- Built on Apache Lucene: Programmable search engine with a 99.9% uptime SLA that can index both cloud and on-premises assets.
- Data from any source: Accepts JSON data; auto-crawling supported for selected Azure sources.
- Multiple search options: Supports vector search, full-text search, and hybrid search.
- AI enrichment: Built-in image and text analysis capabilities using Azure AI.
- Linguistic analysis: Supports 56 languages with phonetic matching and language-specific linguistics.
- Configurable user experience: Offers vector queries, text search, hybrid queries, fuzzy search, autocomplete, geo-search, and more.
- Azure scale, security, and integration: Works across data, machine learning, Azure AI services, and Azure OpenAI.
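Hybrid search has to merge a full-text ranking and a vector ranking into one list. Azure AI Search documents Reciprocal Rank Fusion (RRF) for this; the sketch below is a toy version of that idea, not the service's implementation. The constant `k=60` and the document IDs are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


text_hits = ["doc1", "doc2", "doc3"]    # hypothetical full-text ranking
vector_hits = ["doc3", "doc1", "doc4"]  # hypothetical vector ranking
fused = reciprocal_rank_fusion([text_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists (`doc1`, `doc3`) end up ahead of documents found by only one method.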
Azure AI Search Data Flow
- Start with a data source
  - Original data artifacts like PDFs, videos, images, or database text (e.g., Azure Storage, Azure SQL Database, Azure Cosmos DB).
- Indexer
  - Automates data movement from source through document cracking, enrichment, and into indexing.
  - Converts original file types to JSON (called JSON serialization).
- Document cracking
  - The indexer opens files and extracts content for processing.
- Enrichment
  - Applies AI skills (built-in or custom) to enrich extracted data.
  - Skillsets define operations like OCR, text translation, captioning images, or evaluating text sentiment.
  - Enriched content can be saved in a knowledge store (tables/blobs in Azure Storage).
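A skillset is declared as a JSON document listing the skills to run. The sketch below builds one as a Python dictionary with an OCR skill and a sentiment skill. The skillset name, the `targetName` values, and the source paths are assumptions for illustration; check the skill reference for the exact inputs and outputs before using anything like this.

```python
import json

# Sketch of a skillset definition. Names and paths are illustrative
# assumptions, not taken from a working deployment.
skillset = {
    "name": "demo-skillset",  # hypothetical name
    "description": "OCR embedded images and score sentiment of document text",
    "skills": [
        {
            # Reads text out of images found during document cracking.
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "inputs": [
                {"name": "image", "source": "/document/normalized_images/*"}
            ],
            "outputs": [{"name": "text", "targetName": "ocrText"}],
        },
        {
            # Labels the document text as positive, neutral, or negative.
            "@odata.type": "#Microsoft.Skills.Text.V3.SentimentSkill",
            "context": "/document",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "sentiment", "targetName": "sentiment"}],
        },
    ],
}

skillset_json = json.dumps(skillset, indent=2)
print(skillset_json)
```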
- Push to index
  - Serialized JSON is used to populate the search index.
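The flow from data source to populated index can be imitated in plain Python to make the order of the stages concrete. This is a toy model only: `crack`, `enrich`, and `run_indexer` are hypothetical stand-ins, the "AI skill" is just a word count, and the index is an ordinary dictionary.

```python
import json


def crack(raw):
    """Stand-in for document cracking: pull the text out of a source item."""
    return {"id": raw["id"], "content": raw["body"].strip()}


def enrich(doc):
    """Stand-in for an AI skill: add a word count as an enriched field."""
    enriched = dict(doc)
    enriched["wordCount"] = len(doc["content"].split())
    return enriched


def run_indexer(source):
    """Crack, enrich, serialize to JSON, then push each item into the index."""
    index = {}
    for raw in source:
        serialized = json.dumps(enrich(crack(raw)))  # JSON serialization
        doc = json.loads(serialized)                 # what the index receives
        index[doc["id"]] = doc
    return index


source = [
    {"id": "1", "body": "  Fresh coffee beans roasted daily  "},
    {"id": "2", "body": "Tea and pastries"},
]
index = run_indexer(source)
print(index["1"])
```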
- Querying the index
  - Users search (e.g., "coffee") against the search index.
  - The index has a schema (like a table) with fields, data types (e.g., string), and attributes for filtering, sorting, and searching.
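An index schema can be pictured as a list of fields, each with a data type and a set of attributes. The sketch below uses a hypothetical `products` index; the field names are made up, and the helper `fields_with` is only there to show which fields each attribute enables.

```python
# Hypothetical index definition; the index and field names are illustrative.
index_schema = {
    "name": "products",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "description", "type": "Edm.String", "searchable": True},
        {"name": "category", "type": "Edm.String",
         "filterable": True, "facetable": True},
        {"name": "price", "type": "Edm.Double",
         "filterable": True, "sortable": True},
    ],
}


def fields_with(schema, attribute):
    """Return the names of fields that have the given attribute enabled."""
    return [f["name"] for f in schema["fields"] if f.get(attribute)]


searchable = fields_with(index_schema, "searchable")
filterable = fields_with(index_schema, "filterable")
sortable = fields_with(index_schema, "sortable")
print(searchable, filterable, sortable)
```

In this sketch a full-text query for "coffee" would only match against `description`, while `category` and `price` could appear in filter expressions and `price` in a sort order.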
How to use
- The Azure portal's Import data wizard
- The REST API
- An Azure SDK
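As a rough idea of the REST route, the sketch below builds (but does not send) a search request with the standard library. The service name, index name, key placeholder, and the `api_version` default are assumptions for illustration; confirm the current API version and authentication options in the REST reference.

```python
import json
import urllib.request


def build_search_request(service, index, api_key, text,
                         api_version="2023-11-01"):
    """Build a POST request that searches one index for the given text."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={api_version}"
    )
    body = json.dumps({"search": text, "top": 5}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Hypothetical service and index names; "<query-key>" is a placeholder.
req = build_search_request("my-service", "products", "<query-key>", "coffee")
print(req.get_method(), req.full_url)
# Sending is left out on purpose: urllib.request.urlopen(req) would need a
# real service and key.
```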