DMC, Inc.

Azure AI: Knowledge Mining Solutions

Knowledge mining is a field of artificial intelligence focused on extracting key insights from unorganized data. Where data mining finds patterns and correlations, knowledge mining goes a step further by also contextualizing knowledge across a wide range of data formats. Azure AI Search lets us apply this to existing knowledge bases to help organize and enhance your business.

At DMC, we pride ourselves on our vast technology skillset. AI solutions are just a part of our application development services. From IoT solutions to custom software from scratch, we are ready to tackle any project that comes our way. In this post, we will introduce the fundamentals of Azure AI Search, AI Enrichment, and Multimodal Search. We will also talk about what an AI Search solution might look like from DMC. 

knowledge mining illustration
Credit: Microsoft

What is Azure AI Search? 

Azure AI Search is a Microsoft-owned and managed, cloud-based service that gives applications enterprise-grade information retrieval. The service has the traditional characteristics of a search service, such as indexing, keyword search, and knowledge stores. However, Azure AI Search can also enhance your search capabilities using AI Enrichment, Multimodal Search, and more. All these features live in the Azure ecosystem, making it easy to scale and apply a secure solution that fits your needs.

What Can Azure AI Search Do? 

Azure AI Search offers several standard features of a search service, with the addition of AI capabilities. In this article, we will take a deeper look at AI Enrichment and Multimodal Search. Other features include:

  • Indexing 
  • Vector and hybrid search 
  • Full-text search 
  • Full Lucene query syntax search 
  • Relevance scoring 
  • Semantic ranking 
  • Knowledge stores 

AI Enrichment

AI Enrichment in the Azure AI Search service takes raw, unstructured data and transforms it into structured, searchable content using other Azure AI Services such as Computer Vision and Language Processing. What this means is that we can take inputs, such as PDFs and text files, and extract meaningful information from them so they can be accurately retrieved in response to a search query. 

How AI Enrichment Works

Diagram of how AI enrichment works
Credit: Microsoft

The process for AI Enrichment can be broken down into three phases: 

  1. Data Importing 
  2. Enrichment and Indexing 
  3. Output Exploration

Data Importing is the step where an indexer connects to a data source containing unstructured documents and pulls them into the search service.

Enrichment and Indexing is the largest and most complex step of the process. Enrichment starts with the indexer opening files and extracting key data from them, such as dates, keywords, or any other customizable entities. During this process, an “enriched” version of the document is created. This enriched document can either be temporary or stored for future reuse. The indexer then applies field mappings, which are paths between the source data and the search index. Output field mappings create a similar path between the enriched data and the search index.

In the final step, Output Exploration, we start to see the end goal of setting up this service, which is to be able to navigate a previously unsearchable data source. This can take the form of a simple search bar, or the results can be passed on to a chatbot that dynamically retrieves the enriched data and presents it in a user-friendly, conversational format.
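To make the three phases concrete, here is a minimal sketch in plain Python. It does not call Azure AI Search; the sample documents, the regular-expression "skills", and every function name are hypothetical stand-ins chosen for illustration. In a real deployment, the indexer and its skillset perform these steps inside the service.

```python
import re
from collections import defaultdict

# Hypothetical raw documents standing in for files pulled from a data source.
RAW_DOCS = {
    "memo-1": "Purchase order approved on 2024-03-15 by Contoso finance.",
    "memo-2": "Safety audit scheduled for 2024-04-02 at the Fabrikam plant.",
}

DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
STOPWORDS = {"on", "by", "for", "at", "the"}


def import_documents(source):
    """Phase 1 (Data Importing): pull raw documents out of the data source."""
    return [{"id": doc_id, "content": text} for doc_id, text in source.items()]


def enrich(doc):
    """Phase 2a (Enrichment): build an enriched copy with dates and keywords."""
    words = re.findall(r"[a-z]+", doc["content"].lower())
    enriched = dict(doc)
    enriched["dates"] = DATE_PATTERN.findall(doc["content"])
    enriched["keywords"] = sorted({w for w in words if w not in STOPWORDS})
    return enriched


def build_index(enriched_docs):
    """Phase 2b (Indexing): map enriched fields to a term -> document ids index."""
    index = defaultdict(set)
    for doc in enriched_docs:
        for term in doc["keywords"] + doc["dates"]:
            index[term].add(doc["id"])
    return index


def search(index, query):
    """Phase 3 (Output Exploration): look up a single query term."""
    return sorted(index.get(query.lower(), set()))


docs = [enrich(d) for d in import_documents(RAW_DOCS)]
index = build_index(docs)
print(search(index, "audit"))
print(search(index, "2024-03-15"))
```

The enriched copy keeps the original content next to the extracted fields, which mirrors the idea that an enriched document can be stored and reused rather than recomputed.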

Use Cases for AI Enrichment

  • Knowledge Management: Extract entities and key phrases from internal documents to support advanced search
  • Customer Support: Enrich support tickets with metadata to improve query routing
  • Research Analysis: Process academic papers or reports to extract insights for searchable archives

Multimodal Search

Multimodal search is the ability to ingest, understand, and retrieve information across multiple content types, including text, images, video, and audio. This enables us to search using more diverse methods such as similarity search and hybrid queries, which we will talk about more in this section.

How Multimodal Search Works

Multimodal Search uses AI to vectorize and index non-text content, enabling similarity search and hybrid queries. The process involves the following steps: 

  1. Content Ingestion 
  2. Vectorization 
  3. Index Storage 
  4. Query Processing 

Content Ingestion is where we extract data from text, images, PDFs, and more via indexers.  

Vectorization uses AI models to convert unstructured content into vectors, which are lists of numbers that capture the semantic or visual features of the data. Vectors are essential for returning accurate results when querying for non-text elements, such as visual features.

The process of storing the vectors and their associated metadata for fast retrieval is the Index Storage step. 

Finally, Query Processing is where a user’s search input is matched against relevant vectors so that semantically similar results can be returned.  

So, what is the end goal of multimodal search? While the overall goal is to expand your organization’s knowledge sharing and storage capabilities, the two primary added features are similarity searches and hybrid queries.

Similarity Search

Similarity search leverages vector embeddings to find content that is conceptually or visually similar to a query, rather than relying solely on exact keyword matches. For example, a user uploading an image of a red dress can retrieve similar dresses based on visual features like color and style, even if the textual descriptions differ. Similarity search enables semantic and visual matching across diverse data types.
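The core of a similarity search can be sketched in a few lines of Python. The example below assumes the content has already been vectorized. The three-dimensional vectors and file names are made up for illustration, whereas a real embedding model produces vectors with hundreds or thousands of dimensions. Cosine similarity is used here as one common way to compare vectors.

```python
import math

# Hypothetical stored vectors (Index Storage). The values are invented;
# a real embedding model would produce them during Vectorization.
VECTOR_INDEX = {
    "red-dress.jpg": [0.9, 0.1, 0.0],
    "crimson-gown.jpg": [0.8, 0.2, 0.1],
    "blue-sneakers.jpg": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_search(query_vector, index, top_k=2):
    """Query Processing: rank stored items by similarity to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in index.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical vector for the uploaded query image.
query = [1.0, 0.0, 0.0]
print(similarity_search(query, VECTOR_INDEX))
```

Neither of the top two results needs to share a single keyword with the query; they match because their vectors point in nearly the same direction.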

Hybrid Queries

Multimodal Search supports hybrid queries that combine similarity search with traditional keyword search for more comprehensive results. By merging results using Reciprocal Rank Fusion, Azure AI Search ensures that both semantic relevance and exact matches are considered. For instance, a query like “blue sneakers” can retrieve results based on both the text description and vectorized images of sneakers, providing a balanced and highly relevant output. 
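Reciprocal Rank Fusion itself is simple enough to sketch. Each document earns 1 / (k + rank) from every result list it appears in, and the contributions are summed. The function below is an illustrative implementation, not the service's internal code. The constant k = 60 is a commonly cited default and is an assumption here, and the result lists and document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores 1 / (k + rank) per list it appears in, where
    rank starts at 1. k=60 is an assumed default for this sketch.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "blue sneakers".
keyword_results = ["sku-12", "sku-07", "sku-31"]  # full-text ranking
vector_results = ["sku-07", "sku-44", "sku-12"]   # vector ranking

fused = reciprocal_rank_fusion([keyword_results, vector_results])
print(fused)
```

Because the method uses only rank positions, the keyword and vector scores never have to be put on a common scale before they are merged.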

With these two query processing features, an application can answer questions like “What is the process to approve a purchase order?” even when the only description of the process lives inside an embedded diagram in a PDF file. 

diagram of query processing

Use Cases for Multimodal Search 

  • Conversational AI: Enable chatbots to answer queries using text, images, or PDFs from a knowledge base 
  • E-Commerce: Allow users to upload product images to retrieve descriptions or reviews
  • Healthcare: Query medical images and patient records for diagnostics
  • Media Analysis: Search video or audio archives alongside text for comprehensive insights

To demonstrate the power of Azure AI Search, let’s explore a high-level example of how DMC could leverage this technology to solve a knowledge organization problem for a client. 

The Challenge

A multinational consulting firm needs a search solution to help consultants quickly access industry trends, regulations, and internal best practices. Their knowledge base includes thousands of PDFs, images, and internal reports across multiple languages, but traditional search tools deliver irrelevant results and struggle with non-text data. This leads to countless hours spent manually searching documents, impacting productivity and client response times. 

The Solution

DMC proposes that the firm implement Azure AI Search with AI Enrichment and Multimodal Search to create an intelligent search platform. Enriching and vectorizing the knowledge base enables consultants to retrieve accurate, contextually relevant content using text or image queries, streamlining research and improving client outcomes. This solution would include:

Complete Knowledge Base Ingestion: Use Azure’s pre-built indexers to vectorize and index the existing knowledge base, making it easier to navigate.

Enhanced Data Retrieval: With AI Enrichment and Multimodal Search, the search service can quickly and accurately retrieve information based on a query.

Implementation

To build this solution, DMC provisions an Azure AI Search resource within the client’s Azure subscription. The development process includes: 

  • Data Ingestion 
    • Store PDFs, images, and reports in Azure Blob Storage, ingested via Azure AI Search indexers
  • AI Enrichment Pipeline 
    • Apply Optical Character Recognition (OCR) to extract text from scanned PDFs and images (e.g., charts, infographics) 
    • Use entity recognition to tag regulations, companies, and dates in documents
    • Employ translation skills to index multilingual content for global teams
  • Multimodal Search Setup 
    • Vectorize images and text using Azure OpenAI and Computer Vision models
    • Configure hybrid search to combine keyword and vector queries for comprehensive retrieval
  • Deployment 
    • Integrate the search solution into the firm’s internal portal using Azure SDKs, supporting text and image-based queries
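As a rough illustration of the AI Enrichment Pipeline step, the sketch below builds a skillset and indexer definition as Python dictionaries and prints them as JSON. The resource names, index field names, and target names are hypothetical. The property names and skill types follow the shape of the Azure AI Search REST definitions as we understand them, so verify them against the API version you target before use. Tagging regulations would likely need a custom skill and is not shown, and neither is the vectorization step.

```python
import json

# All resource and field names below are hypothetical placeholders.
skillset = {
    "name": "consulting-kb-skillset",
    "skills": [
        {   # OCR: extract text from scanned pages and images
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "ocr_text"}],
        },
        {   # Entity recognition: tag companies and dates
            "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
            "context": "/document",
            "categories": ["Organization", "DateTime"],
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [
                {"name": "organizations", "targetName": "organizations"},
                {"name": "dateTimes", "targetName": "dates"},
            ],
        },
        {   # Translation: normalize multilingual content to English
            "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
            "context": "/document",
            "defaultToLanguageCode": "en",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "translatedText", "targetName": "translated_text"}],
        },
    ],
}

indexer = {
    "name": "consulting-kb-indexer",
    "dataSourceName": "consulting-kb-blobs",   # Blob Storage data source
    "targetIndexName": "consulting-kb-index",
    "skillsetName": skillset["name"],
    # Image extraction must be enabled for the OCR skill to receive images.
    "parameters": {"configuration": {"imageAction": "generateNormalizedImages"}},
    # Field mappings: source data -> search index
    "fieldMappings": [
        {"sourceFieldName": "metadata_storage_name", "targetFieldName": "file_name"},
    ],
    # Output field mappings: enriched data -> search index
    "outputFieldMappings": [
        {"sourceFieldName": "/document/organizations", "targetFieldName": "organizations"},
        {"sourceFieldName": "/document/dates", "targetFieldName": "dates"},
        {"sourceFieldName": "/document/translated_text", "targetFieldName": "content_en"},
    ],
}

print(json.dumps({"skillset": skillset, "indexer": indexer}, indent=2))
```

Keeping these definitions as data makes them easy to review and version before they are submitted to the service through the REST API or an Azure SDK.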

Benefits

  • Improved Relevance: Semantic ranking and vector search increase result accuracy
  • Time Savings: Intuitive searches reduce the time spent on research
  • Multimodal Flexibility: Image-based queries (e.g., uploading a chart) can now retrieve related documents, enhancing usability
  • Global Accessibility: Multilingual indexing supports seamless search for international teams
  • Scalability: Using Azure as the foundation, the service can scale easily with the firm’s needs and goals

Responsible AI

AI Enrichment and Multimodal Search rely on Azure AI models (e.g., Computer Vision, Language Service, Azure OpenAI) that may process personal or sensitive information.  

To uphold privacy and fairness, it is essential to consider the following: 

  • Ensure input data complies with applicable privacy laws and regulations. Avoid processing sensitive personal data without explicit consent or legal basis. 
  • AI models may inadvertently reflect biases in training data, potentially affecting entity recognition or image analysis. Regularly evaluate outputs for fairness and adjust configurations (e.g., custom skills or filters) to minimize biased results. 
  • Clearly communicate to users when AI-generated outputs, such as extracted entities or vectorized content, are used in search results, ensuring they understand the role of AI. 

More information on Microsoft’s guidelines for the responsible use of AI Search can be found in Microsoft’s documentation.

Why DMC? 

At DMC, we blend our years of expertise in the Azure ecosystem with a client-centric approach to deliver transformative cloud-based solutions. Our deep knowledge base enables us to craft tailored platforms that unlock actionable insights and streamline operations. As a Microsoft Solutions Partner, we are committed to responsible AI practices. We ensure ethical, secure, and scalable solutions that align with your business goals, empowering you to stay ahead and succeed.

Let’s Recap

Azure AI Search is a powerful platform for intelligent search that uses AI Enrichment and Multimodal Search. AI Enrichment processes unstructured data into searchable, structured content, enabling applications like knowledge management and multilingual search. Multimodal Search extends this to images and audio, supporting cross-modal retrieval with similarity search and hybrid queries. Responsible AI practices ensure ethical use by addressing privacy, bias, and accuracy concerns. In our real-world example, a consulting firm could leverage these features to build an efficient search solution, boosting productivity and client satisfaction. For organizations seeking to unify diverse data and enhance discovery, Azure AI Search delivers expert results. 

At DMC, we’re passionate about harnessing these tools to solve complex challenges and deliver measurable results. From automation to agriculture, our team is equipped to build custom AI solutions across industries that meet your unique needs. Ready to explore the potential of Azure AI Search? Contact us to start a partnership today!