SharePoint Optical Character Recognition (OCR) Solution for Image Only PDFs

Summary

DMC's consulting services team implemented our SharePoint OCR Solution to convert Image Only PDF documents to searchable text for an established law firm based in Chicago, Illinois. The solution automatically scanned every document stored in the SharePoint Document Management System, identified Image Only PDF files, added a text layer to those files via optical character recognition (OCR), and re-saved them to the SharePoint Document Management System, where SharePoint's Enterprise Search engine could index them.

Customer Benefits

  • PDF files can now be indexed by SharePoint Enterprise Search and instantly searched from SharePoint, allowing the law firm's staff to quickly locate documents with a simple keyword search
  • Automation of the OCR process saved at least 4,000 hours of staff time that would have been required to convert each PDF file individually
  • At least $150,000 was saved by implementing a custom solution rather than a packaged OCR software application; such applications are typically priced at $1+ per OCR'd page
  • Achieved a 96% success rate of adding a searchable text layer to image-only PDF files

Technologies

  • Microsoft SharePoint 2010
  • Microsoft Office SharePoint Server (MOSS) 2007
  • Microsoft .NET 3.5 Framework
  • Microsoft SQL Server 2008 R2
  • Microsoft Windows Server 2008 R2
  • Aquaforest OCR SDK

Solution

Approximately 60% of the law firm's files are PDF files, and 1/3 of these PDFs are Image Only. The content of PDF files which contain only images cannot be searched.

The law firm asked DMC for help scanning the 700,000+ files in its existing SharePoint document repository and converting Image Only PDF documents to searchable documents using optical character recognition (OCR).
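To give a sense of scale, the figures quoted above can be combined into a rough estimate of how many files needed OCR. This is only a back-of-the-envelope sketch: it assumes the approximate 60% and 1/3 ratios apply to the rounded 700,000-file count, and the variable names are illustrative.

```python
# Rough estimate built from the rounded figures quoted in the text.
TOTAL_FILES = 700_000       # "700,000+" files in the repository
PDF_PERCENT = 60            # "approximately 60%" of files are PDFs
IMAGE_ONLY_DIVISOR = 3      # "1/3" of those PDFs are Image Only

# Integer arithmetic keeps the estimate free of floating-point noise.
pdf_files = TOTAL_FILES * PDF_PERCENT // 100
image_only_pdfs = pdf_files // IMAGE_ONLY_DIVISOR

print(f"PDF files:       ~{pdf_files:,}")
print(f"Image Only PDFs: ~{image_only_pdfs:,}")
```

Under these assumptions, roughly 420,000 of the files are PDFs and roughly 140,000 of those are Image Only.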

To help the law firm's staff quickly locate key documents, DMC built an application that first scanned all existing documents in SharePoint to determine which were Image Only PDFs. These documents were then processed by an OCR module built on the Aquaforest OCR SDK to make their textual content searchable via SharePoint. The law firm's SharePoint document repository of 700,000+ files was scanned and converted in approximately 45 days, with a 96% success rate of adding a searchable text layer to Image Only PDF files.
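The two-stage flow described above (detect Image Only PDFs, then OCR them) can be sketched as follows. This is a minimal illustration, not DMC's implementation: the original application was built on .NET and the Aquaforest OCR SDK, while this sketch uses Python, a hypothetical `run_ocr` callback in place of the SDK, and a crude detection heuristic (a PDF that declares no `/Font` resource has nothing to draw text with). The heuristic assumes font dictionaries are visible in the raw bytes; PDFs that store objects in compressed object streams would need a real PDF parser.

```python
from typing import Callable, Dict, List


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Crude heuristic (an assumption of this sketch): no /Font means no text layer."""
    return pdf_bytes.startswith(b"%PDF-") and b"/Font" not in pdf_bytes


def process_repository(
    documents: Dict[str, bytes],
    run_ocr: Callable[[bytes], bytes],
) -> Dict[str, List[str]]:
    """OCR every Image Only PDF in place and report what happened to each file.

    `documents` maps a file name to its content; `run_ocr` is a hypothetical
    stand-in for the OCR engine and returns the PDF with a text layer added.
    """
    report: Dict[str, List[str]] = {"converted": [], "failed": [], "skipped": []}
    # Iterate over a snapshot so that re-saving a document is always safe.
    for name, content in list(documents.items()):
        if not name.lower().endswith(".pdf") or not looks_image_only(content):
            report["skipped"].append(name)
            continue
        try:
            documents[name] = run_ocr(content)  # re-save the searchable version
            report["converted"].append(name)
        except Exception:
            # A failed OCR pass leaves the original document untouched.
            report["failed"].append(name)
    return report


def fake_ocr(content: bytes) -> bytes:
    """Demo stand-in for an OCR engine: marks the file as having a text layer."""
    return content + b"\n% /Font text-layer-added"


repository = {
    "contract.pdf": b"%PDF-1.4 /XObject /Image",  # scanned pages only
    "memo.pdf": b"%PDF-1.4 /Font /F1",            # already has text
    "notes.docx": b"PK...",                       # not a PDF
}
report = process_repository(repository, fake_ocr)
print(report)
```

Keeping the detection step separate from the OCR step means files that already contain text are never re-processed, and tracking failures separately is what makes a figure such as the 96% success rate measurable.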

A simple SharePoint keyword search now instantly retrieves a list of all files containing the specified keyword(s), providing quick access to the information in all of the client's document files, saving vital time for their employees and customers.

Since implementing the original SharePoint OCR application, DMC has upgraded the application for compatibility with SharePoint 2010, 2013, 2016, and Office 365 SharePoint Online. Features have also been added to identify newly uploaded PDF files and OCR them multiple times daily, as well as the ability to re-scan specific sites and libraries.
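One way to picture the scheduled and targeted scans mentioned above is a single selection step with two optional filters. The sketch below is an assumption-laden illustration, not the product's actual logic: `DocumentRecord`, `select_for_ocr`, and the example URLs are hypothetical, and a real deployment would read this metadata from SharePoint.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DocumentRecord:
    """Hypothetical view of a stored file: its URL and last-modified time."""
    url: str
    modified: datetime


def select_for_ocr(
    records: Iterable[DocumentRecord],
    since: Optional[datetime] = None,
    scope_prefix: Optional[str] = None,
) -> List[DocumentRecord]:
    """Pick the PDFs to examine in this run.

    since:        only files modified after this time (scheduled runs).
    scope_prefix: only files under this site or library URL (targeted re-scan).
    """
    selected = []
    for record in records:
        if not record.url.lower().endswith(".pdf"):
            continue
        if since is not None and record.modified <= since:
            continue
        if scope_prefix is not None and not record.url.startswith(scope_prefix):
            continue
        selected.append(record)
    return selected


records = [
    DocumentRecord("/sites/legal/cases/brief.pdf", datetime(2020, 1, 2, 9, 0)),
    DocumentRecord("/sites/legal/cases/old.pdf", datetime(2019, 6, 1, 9, 0)),
    DocumentRecord("/sites/hr/policy.pdf", datetime(2020, 1, 2, 10, 0)),
    DocumentRecord("/sites/hr/roster.xlsx", datetime(2020, 1, 2, 11, 0)),
]
last_run = datetime(2020, 1, 1, 12, 0)

# Scheduled run: PDFs uploaded or changed since the previous run.
new_uploads = select_for_ocr(records, since=last_run)
# Targeted re-scan: every PDF in one library, regardless of age.
rescan = select_for_ocr(records, scope_prefix="/sites/legal/cases/")
print([r.url for r in new_uploads])
print([r.url for r in rescan])
```

Running the scheduled selection several times a day only requires remembering the time of the previous run, while a re-scan simply omits the time filter and narrows the scope instead.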

For more information on our SharePoint OCR Solution, please Contact Us.

Posted in Custom Applications, Document Management, Enterprise Search, Microsoft Consulting Services, OCR Solution, SharePoint, SharePoint / Office 365 Packaged Solutions

https://www.dmcinfo.com • sales@dmcinfo.com • (888) DMC-4400