Document Classification For The Real World: An Executive Interview With ReadSoft President Bob Fresneda
Written by Brian Sherman
Document automation has been through a rather quiet revolution over the last few years as innovative technologies have displaced straight structured forms processing in many organizations. Capturing semi-structured and unstructured documents and integrating them into an automated process is now possible, with real results for companies employing this software. The labor saved by reducing human intervention in forms processing, even if it only eliminates one computer keystroke or supervisor's review, could quickly repay the cost of the implementation.

Several vertical markets, including banking, finance, and healthcare, have been quick to adopt semi-structured and unstructured document processing. How have end users and resellers benefited from what software supplier ReadSoft has dubbed "Document Automation"? Bob Fresneda, president of the New Orleans-based company, provided some clear examples and insight into the benefits of software integrations with ERP (enterprise resource planning) applications. He also provided an update on how the organization and its employees coped with Hurricane Katrina and how proper planning helped ReadSoft overcome some serious business continuity obstacles.
Bob, how has document classification developed and what are some specific issues end users face with documents and internal processes?
As the hot topic changed from structured to semi-structured documents and then to unstructured documents, end users were able to assess the applications that made the most sense for their particular business. VARs (value-added resellers) explored technologies they could develop and implement to solve their customers' specific needs and still make a reasonable profit.
That's how the accounts payable market evolved, with the technology transitioning from structured documents to semi-structured documents. Customers using semi-structured forms, such as bills of lading or other accounts payable-type forms, recognized that classifying these documents early in the document automation process would minimize manual sorting. For example, in an accounts payable department, an invoice with a PO (purchase order) is handled differently than an invoice without one. Different employees may process different types of invoices, and an invoice that has a purchase order assigned to it may have to be validated in an ERP system by someone else. In SAP systems, a materials management module stores all the purchase orders, and employees may have to validate whether the invoice matches all of the items issued on the original purchase order. Other companies apply document classification technology by categorizing an invoice with key word logic (based on the text inside the invoice), which may determine whether or not an invoice is PO-based.
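The key word logic described above can be pictured with a short Python sketch. It is only an illustration, not ReadSoft's implementation: the patterns, the function name is_po_based, and the sample invoice text are all hypothetical, and real invoices would need far more robust rules.

```python
import re

# Hypothetical key word patterns; real invoice layouts vary widely,
# so these are illustrative rather than production rules.
PO_PATTERNS = [
    # "PO 4500012345", "P.O. No: 12345", "PO Number - 98765", etc.
    re.compile(r"\bP\.?O\.?\s*(?:number|no\.?|#)?\s*[:\-]?\s*\d{4,}", re.IGNORECASE),
    # The phrase "purchase order" anywhere in the text.
    re.compile(r"\bpurchase\s+order\b", re.IGNORECASE),
]


def is_po_based(invoice_text: str) -> bool:
    """Return True if any key word pattern suggests the invoice references a PO."""
    return any(pattern.search(invoice_text) for pattern in PO_PATTERNS)


if __name__ == "__main__":
    with_po = "Invoice 1001\nPO Number: 4500012345\nTotal: $250.00"
    without_po = "Invoice 1002\nUtilities for March\nTotal: $80.00"
    print(is_po_based(with_po), is_po_based(without_po))
```

A rule set this simple will misjudge some invoices, which is one reason a person may still need to confirm the result before the invoice is matched against the purchase order in the ERP system.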
How do the classification engines work and how does ReadSoft's technology compare to its competitors?
The classification programming is accomplished through the use of meta-tagging and key word searches on the document itself. This programming not only captures the data, but also classifies and routes that particular invoice to the right person. We have our own character recognition technologies and our own document recognition technologies, as well as our own forms classification products, unlike other classification vendors.
Where is document automation being employed within end user companies?
One use is within accounts payable departments. Companies used to manually sort the PO-based invoices from the non-PO-based invoices and scan them in, and a program would then send them to the appropriate clerk who handled that particular type of invoice. Now all invoices can be input in one batch without any manual sorting. The system automatically sorts and routes each invoice to the appropriate data entry operator in the accounts payable department.
What are some other real-world examples of document automation?
Processing claims in an insurance department, predominantly at the point where they are received, is another area where classification is being rapidly adopted. For example, when processing an insurance claim, document classification technology can identify the type of each document that arrives with it. Is it an actual claim, an associated document, or a file with appendices or pictures attached to it? You can classify each type of form and create a folder based on all the information in that specific claim. Many large insurance companies are paying out a significant amount of compensation for the damage caused by the numerous hurricanes that hit the United States in the past year. Document automation streamlines this process, which provides quicker processing, reduced labor costs, and improved customer satisfaction.
How does classification relate to the routing of each document?
Classification is the mechanism that feeds the routing of each document. The classification engine identifies and categorizes the document, and notifies the workflow module what type of data is on that particular document (e.g., indexing). The workflow module then routes the file based on that information. The steps inside a workflow environment are automated in this manner, and the end result should satisfy the needs of end users.
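As a rough sketch of that handoff, the Python below passes a classification result (a document type plus index fields) to a routing step that picks a destination. Every name here is an assumption made for illustration: the Classification record, the queue names, and the route function are hypothetical and do not describe any vendor's workflow module.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """What a classification engine might hand to the workflow step (hypothetical)."""
    doc_type: str
    index_fields: dict = field(default_factory=dict)


# Hypothetical routing table: document type -> work queue.
ROUTES = {
    "po_invoice": "ap_po_matching_queue",
    "non_po_invoice": "ap_coding_queue",
    "insurance_claim": "claims_intake_queue",
}

# Anything the table does not recognize goes to a person for review.
DEFAULT_ROUTE = "manual_review_queue"


def route(result: Classification) -> str:
    """Pick a destination queue based only on the classified document type."""
    return ROUTES.get(result.doc_type, DEFAULT_ROUTE)


if __name__ == "__main__":
    invoice = Classification("po_invoice", {"po_number": "4500012345"})
    unknown = Classification("handwritten_letter")
    print(route(invoice), route(unknown))
```

Keeping the routing table separate from the classifier mirrors the division described above: the classifier only reports what the document is, and the workflow step decides where it goes.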
Some large insurance companies are using the document content analysis tools inside our core products to classify documents based on logotypes, text, and terminology (bag of words). If there are particular words or phrases in a document, the program has the ability to automatically label or categorize that form. There's a combination of different things you can do to create your own classification engine. Customers can even work with a reseller to develop and customize rules and policies, build them into the product, and perfect them.
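To make the bag-of-words idea concrete, here is a minimal Python sketch that labels a document by counting how many terms from each category's word list appear in its text. The vocabularies, the threshold, and the function names are all assumptions for illustration, and the sketch covers only text terms, not logotype recognition.

```python
import re
from collections import Counter

# Hypothetical term lists per category; a reseller would tune these per customer.
VOCABULARY = {
    "claim": {"claim", "policy", "insured", "loss", "damage"},
    "invoice": {"invoice", "amount", "due", "remit", "total"},
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text: str, min_hits: int = 2) -> str:
    """Return the category whose terms occur most often, or 'unclassified'."""
    counts = Counter(tokenize(text))
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in VOCABULARY.items()
    }
    best = max(scores, key=scores.get)
    # Require a minimum number of matching terms before trusting the label.
    return best if scores[best] >= min_hits else "unclassified"


if __name__ == "__main__":
    print(classify("Claim form: the insured reports wind damage and loss of roof"))
    print(classify("Meeting notes from Tuesday"))
```

Documents that fall below the threshold come back as "unclassified", which is where a person would step in to check the file.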
One important point I want to stress is that document classification is paying off for customers, as numerous examples demonstrate. Unstructured documents still require more human intervention than the structured forms. I don't like to use the phrase manual intervention because it still involves an electronic document. Unlike with semi-structured documents, someone in the organization must still look at the file and verify that the classification is correct.
Certifications for integrations seem to be a hot topic for the reseller community (and end users buying them). How does ReadSoft approach these solutions?
The more of the document that you can automate and integrate into an ERP system, the better your ROI is for both the capture technology and the ERP system, as well as any services needed to combine them. When a customer goes to a reseller's Web site and sees that it is certified in Oracle, SAP, or Microsoft Great Plains, that distinction offers assurances that the risks and installation time associated with implementing the technology will be minimal. We can provide certifications for all three applications to our reseller partners. End users should feel more comfortable with an SAP- or Oracle-certified VAR implementing this integration (and capturing associated documents) in their facilities.
A company can start out having structured forms and PO technology implemented, then add semi-structured documents and begin classifying unstructured documents. Electronically feeding additional types of documents into an ERP system not only creates an efficient use of technology and licenses, it also generates a whole document automation marketplace. Forms processing and data extraction have become core components in content management, and the value of the repository or ERP system increases as new document types are incorporated. Document lifecycles can be completely automated by converting forms electronically and feeding them into an ERP with a certified solution. This also allows your company to export data and images to systems outside the organization. By routing a document with our SAP or Oracle workflow routines, you can automate its flow from a supplier (such as ReadSoft) to a customer (Allstate, for example), and to other destinations outside of the ERP infrastructure.
How does this compare with what other software vendors are offering to these markets?
We believe we have an advantage over our competitors with Microsoft Great Plains, Oracle, and SAP applications, with at least a year's head start. Within the technology itself, I think it's an open game. At ReadSoft, we build our own technologies in-house, which we have been able to do with the success of our structured and semi-structured products, and reinvestment of revenue from those products. We generate more than 50% of our license revenues with new technologies through semi-structured applications with our DOCUMENTS for Invoices product. Many software companies have to make acquisitions to fill technology gaps, which means the products may not be as easy for VARs to integrate and their customers to employ.
None of our competitors are generating the same percentage of their revenue from IDR (intelligent document recognition) as we are. The template forms-processing business is not disappearing, but there is no growth, and it appears to be turning into more of a commodity. The VAR with the lowest-price solution gets the sale with the older technologies, since most of the products are perceived as equivalent. Resellers can differentiate themselves from others in a commodity-driven market by becoming trained, certified, and able to offer integration to ERP systems. With structured applications, ERP technologies increase the value of the application. This also applies to semi-structured and unstructured document automation, where VARs can show their customers significant cost savings by implementing ERP integrations, regardless of a higher purchase price.
Companies looking to implement ERP integration technology will have more confidence in certified solutions and resellers that are qualified to sell them. The investment that ReadSoft and our partners make in training and development lowers the risk associated with related acquisitions and adding new document forms for end users. Oracle, SAP, and Microsoft offer ERP systems that are projected to develop and flourish in the long-term, and that is the precise reason we developed integrations for each.
Are there certain markets that your resellers are developing these products for?
Vertically specialized partners bring great value to the market. While many vendors try to reach every potential imaging partner available, others select certain verticals and develop products and support to tackle the specific needs of those markets. Our goal is to be a leading provider, with our reseller partners, of full end-to-end solutions for the Oracle, SAP, and Great Plains ERP market. When the business targets are more horizontal in nature, we develop solutions with companies like Hyland Software with its OnBase product, or Perceptive Software and the ImageNow line, or EMC with its ApplicationXtender. We assist VARs of those companies in the implementation of each product in combination with ours.
That is how ReadSoft fits in a horizontal market, but in the vertical space there is potential for partners that implement solutions for companies with Great Plains systems in use (or looking to add them). VARs may concentrate on organizations that process fewer than 80,000 invoices each year using Great Plains. That's actually a big market, and we have been very active with resellers developing solutions within that context. There are larger invoice processing opportunities with Oracle and SAP systems, but there are not as many end users that need those integration applications. There may be 33,000 Microsoft Great Plains installations around the world, with 75% to 80% of those in the United States. In comparison, Oracle has more than 20,000 customers using its Financials program, with approximately 18,000 in North America, while SAP has thousands of U.S. clients, many of which are Fortune 500. ReadSoft has signed many strategic partnerships with vertically specialized partners to sell and implement our products in these particular markets.
On a more personal note, how did your organization and employees fare in the aftermath of Hurricane Katrina?
We feel blessed that we were in the suburbs of New Orleans (Metairie, LA), which experienced heavy wind damage but not the substantial flooding that devastated the other parts of the city. When we moved the company here from San Diego (in December 2002), one of our first goals was to develop a disaster plan. Over the last three years we assembled and refined a hurricane evacuation plan, with the assistance of our parent company in Sweden. Since we are a global company, we have the ability to remotely access servers in two different cities in Sweden, as well as servers in London, Paris, and Chicago. Instead of connecting to our servers in New Orleans, the plan involves shutting them down prior to a major hurricane and routing through our VPN (virtual private network) tunnel to one of the other locations.
In the three years we have been here, we only had to initiate this plan one other time. The weekend before Katrina's landfall, as the storm approached, we followed through on the plan and routed our servers to Sweden before we left the building. We all left with the assumption that we would resume operations here in a couple of days, but obviously that didn't happen.
As part of our plan, ReadSoft employees were set up to read e-mails at a minimum, so our electronic communication lines wouldn't be down at any time. Our toll-free telephone line was rerouted to Chicago for continuity of all our support, telemarketing, and accounting services. Our Chicago office has 4,000 square feet and three employees, with the flexibility to house another 10 employees. Due to the loss of some large tenants, the building had considerable vacancies, so we were able to rent some prime real estate (Michigan Avenue) at a reasonable cost. We moved the accounting and support departments there and also offered a Chicago corporate apartment to employees with logistics problems. Several people who were having trouble finding a good place to live relocated there to have e-mail and cell phone access. Other employees work from their homes or have virtual offices outside of Louisiana. The people who remained in the New Orleans area were working out of a variety of locations, including with their families and friends.
If your company has a plan and employees are aware of it (and everyone is on board with it), everything goes pretty smoothly. We are lucky that we are not a brick-and-mortar business, so we can do business from unconventional offices, at least temporarily.
How severe was the damage to your employees' homes and your office?
Katrina's 130 to 140 MPH winds went right through here, causing extensive glass and structural damage in the area, but our building made out pretty well. Two of ReadSoft's employees had their homes destroyed during Katrina and the flooding afterward, while two others experienced major water damage. Overall, our families and company were luckier than many others in New Orleans. We hope that by sharing our experience with the development and implementation of a disaster plan, other companies will see the need to prepare for similar unforeseen events.
2005 was a great year for ReadSoft despite Katrina, as our sales were up more than 40% from the previous year. We are back in our office, with all of our employees returning after the storm. Our company has been fortunate to recruit four new employees who may not have left other jobs before Katrina, but had to find other employment when their offices were destroyed.
Brian Sherman is chief editor of ECM Connection and Data Storage Connection