The enterprise advantages of automated data collection

Many organizations still rely on manual data entry that wastes time and results in low-quality data. Here are the latest automated data collection techniques and their benefits.

Process automation has a wide range of applications, but the common goal is typically to replace human activities with technology in order to reduce cost and streamline repetitive processes.

For decades, manufacturing companies have used industrial robots to perform activities traditionally done by humans, and business process automation shares the same goal: to replace business functions performed by humans with software applications. Work activities that are repetitive and require little analysis or decision-making to complete are prime candidates for process automation.

The continued proliferation of paper

We know that a paperless office provides enormous benefits to both businesses and the environment. There are dozens of studies showing that digital documents reduce costs and improve quality and efficiency when compared to their paper counterparts. But the business community continues to generate an enormous amount of paper.

An IDC whitepaper titled "The Migration from Paper to Digital: Why Digitization Remains Elusive" states that:

  • Paper documents comprise 30% of the documents used each day.
  • Knowledge workers continue to create more paper documents than electronic.
  • Organizations continue to use paper documents for key business processes that include customer, employee and patient onboarding, purchase orders, expense reporting, and content review and approval.

A common next step is to transfer the information from paper documents to a computerized system. That transfer can be manual or automated. In an era when a core strategy for most organizations is to automate as many business processes as possible, manually transcribing information from paper documents into computer systems remains commonplace. It is an easy assumption to make, as well as a gross understatement, that manual data entry is time-consuming, costly and error-prone.

Automated data collection tech

Automated data collection has advanced far beyond the simple scanning of documents as a single image for storage and archival. Modern automatic data capture products can intelligently identify and extract individual pieces of text, checked boxes, filled circles and handwriting from paper documents, then transform them into structured data elements that can be used by computerized systems.

Optical character recognition (OCR). OCR has been available for decades and is the most common technology for transforming paper documents into electronic files.

Intelligent character recognition (ICR). ICR enhances OCR capabilities by using additional technologies to recognize different fonts and handwriting styles. ICR applications often use machine learning algorithms and artificial intelligence to improve the system's recognition capabilities and increase the accuracy of paper-to-digital transformation.

Intelligent document recognition (IDR). IDR is primarily a marketing term used by automated data collection software vendors to describe the combination of technologies they use to improve their products' recognition and extraction capabilities. In the early stages of IDR, vendors used predefined templates that allowed their software to extract values from specific locations on a document.

An easy way to visualize a template is as a computerized overlay placed on top of a document. The software extracts the data from the overlay's predefined locations and inserts the values into their corresponding fields. Templates continue to be a popular method of extracting data from paper documents.
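The overlay idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `Box` and `Word` types, the `extract_with_template` function, the coordinates and the invoice field names are all hypothetical, and the `words` list stands in for the positioned text an OCR engine would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical units)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized word plus the place it was found on the page."""
    text: str
    box: Box


def extract_with_template(words, template):
    """Return {field_name: text} from words whose center lies in each region."""
    result = {}
    for field_name, region in template.items():
        hits = []
        for word in words:
            center_x = (word.box.left + word.box.right) / 2
            center_y = (word.box.top + word.box.bottom) / 2
            if region.contains(center_x, center_y):
                hits.append(word)
        # Reading order: top to bottom, then left to right.
        hits.sort(key=lambda w: (w.box.top, w.box.left))
        result[field_name] = " ".join(w.text for w in hits)
    return result


# Hypothetical template: the "overlay" for one specific invoice layout.
INVOICE_TEMPLATE = {
    "invoice_number": Box(400, 40, 580, 70),
    "total": Box(400, 700, 580, 730),
}

# Stand-in for OCR output (made-up values for illustration).
words = [
    Word("INV-1042", Box(410, 45, 480, 62)),
    Word("ACME", Box(40, 45, 90, 62)),
    Word("Corp", Box(95, 45, 130, 62)),
    Word("$1,250.00", Box(420, 705, 500, 722)),
]

fields = extract_with_template(words, INVOICE_TEMPLATE)
print(fields)
```

The sketch also shows the main weakness of templates: if a supplier moves the total to a different spot on the page, the region no longer matches and the field comes back empty.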

As in any highly competitive market, automated data capture software vendors understand that constant innovation, and the integration of new features that differentiate their products from other offerings, is essential.

Many competitors are now using advanced software that combines human logic with artificial intelligence and machine learning to identify and extract data from paper documents and store it as structured information. Their goal is to use various technologies to either augment or entirely replace document templates.

Automated data collection marketplace

Whenever there is a perceived need to automate a business process, you'll find an enterprising set of vendors that provide solutions. From industry heavyweights like Oracle and IBM to vendors that focus specifically on automated data collection, there is a wide and ever-growing array of offerings available. As a result, organizations interested in automated data capture products have a robust, and somewhat bewildering, set of products, technologies and features to evaluate. Here are several popular products that illustrate the different strategies, technologies and features the vendors use:

Ephesoft. A highly competitive product in this space is Ephesoft. The vendor is well known and often referred to as an industry leader by both customers and competitors. The product uses artificial intelligence and its own patented machine learning algorithms to extract data from paper documents. What sets this vendor apart from its competitors is the product's robust feature set, numerous APIs, ability to extract data from a diverse set of document sources and comprehensive reporting capabilities.

Amazon Textract. Where there's a marketplace to compete in, there's a good chance you'll find an Amazon offering. Amazon Textract combines OCR with advanced technologies to intelligently extract text and handwritten data from scanned documents. In addition, Amazon provides users with the ability to create custom workflows to allow human reviews of data at any stage during the extraction process.
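As a rough illustration of what the extracted output looks like to a developer, the sketch below reads lines of text out of a Textract-style response. It assumes the documented response shape of the text detection operation, a `Blocks` list whose entries carry `BlockType`, `Text` and a `Confidence` value between 0 and 100. The `lines_from_textract` helper and the hand-written `sample_response` are hypothetical; a real response would come from a call such as `boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})`, which requires AWS credentials and is therefore shown only in a comment.

```python
def lines_from_textract(response, min_confidence=0.0):
    """Collect the text of LINE blocks at or above a confidence cutoff."""
    lines = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks
        if block.get("Confidence", 0.0) >= min_confidence:
            lines.append(block["Text"])
    return lines


# Hand-written, heavily trimmed stand-in for a real response, which would
# come from something like:
#   boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Purchase Order 7731", "Confidence": 99.1},
        {"BlockType": "WORD", "Text": "Purchase", "Confidence": 99.4},
        {"BlockType": "LINE", "Text": "Qty 12", "Confidence": 61.3},
    ]
}

all_lines = lines_from_textract(sample_response)
confident_lines = lines_from_textract(sample_response, min_confidence=90.0)
print(all_lines)
print(confident_lines)
```

Lines that fall below the cutoff are natural candidates for the human review step described above.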

Rossum. Rossum is a strong competitor in this space and its product receives good reviews. Like most automated data collection vendors, Rossum employs machine learning and self-learning artificial intelligence to extract data without having to rely on predefined templates. Rossum uses a two-stage process that separates the activities into data extraction and validation. During the validation stage, the software assigns confidence scores to the extracted data. The product automatically prompts users to inspect empty fields and manually review data elements with low scores.
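The validation stage described above can be sketched as a simple routing rule. This is a generic illustration, not Rossum's API: the `ExtractedField` type, the `fields_needing_review` function, the sample values and the 0.85 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_THRESHOLD = 0.85  # assumed cutoff; a real deployment would tune this


@dataclass
class ExtractedField:
    name: str
    value: Optional[str]
    confidence: float  # 0.0 to 1.0, as assigned by the extraction stage


def fields_needing_review(fields, threshold=REVIEW_THRESHOLD):
    """Return (field_name, reason) pairs for fields a person should inspect."""
    flagged = []
    for field in fields:
        if field.value is None or field.value.strip() == "":
            flagged.append((field.name, "empty"))
        elif field.confidence < threshold:
            flagged.append((field.name, "low confidence"))
    return flagged


# Made-up extraction results for illustration.
extracted = [
    ExtractedField("vendor", "ACME Corp", 0.98),
    ExtractedField("invoice_date", "2021-03-1?", 0.52),
    ExtractedField("po_number", "", 0.0),
]

review_queue = fields_needing_review(extracted)
print(review_queue)
```

Separating extraction from validation in this way means that high-confidence fields pass straight through, while human attention is spent only on the values most likely to be wrong.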

IBM Datacap. The industry heavyweight's Datacap offering is both a standalone product and a key component of its IBM Cloud Pak for Business Automation product suite. IBM states that the product "Uses machine learning to automate the processing of complex or unknown formats and highly variable documents difficult to capture with traditional systems." The product integrates with IBM's Robotic Process Automation software and allows users to define processing rules, create workflows and export data to other systems. Datacap Mobile enables users to capture information at the point of contact.
