Pre-process the documents - HxGN SDx - Update 63 - Administration & Configuration

Administration and Configuration of HxGN SDx

Language: English
Product: HxGN SDx
Category: Administration & Configuration
SmartPlant Foundation / SDx Version: 10

You can use the Data Capture Pre-Processor Utilities module to prepare files for import into Data Capture. Instead of running the data extraction step in the Data Capture Readers, you can use the module to extract data from the content files before you run the Content Discovery Task (CDT). Unlike the Data Capture Readers, the module performs all data processing on the client machine, so the applications that it uses to extract data from the files must be installed on that machine.

Using the module for data extraction and processing has several advantages:

  • Processing of large volumes of files can be distributed across multiple client machines.

  • You can select more than one template for processing, depending on file information, whereas the Data Capture Readers allow only one template at a time. You can also:

    • Select different templates for different directories that have files with the same file extension.

    • Select the templates depending on border size so that title block information can be extracted.

  • Instead of SmartPlant Markup Plus on the server, Microsoft Office applications on the client machine can be used to extract data from files.

  • Content extraction completed as a preliminary exercise before the final run does not need to be repeated unless corrections are required.

  • When you extract data from searchable PDF files, you can apply advanced rules that target specific information.

  • The module can recognize tags that span multiple lines (such as instrument data in a PDF), whereas the document reader can recognize only tags on a single line.
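The template-selection advantages listed above (several templates, chosen per directory for files with the same extension, or by border size) can be pictured as an ordered rule table. The following is a minimal Python sketch under stated assumptions only: `TemplateRule`, `select_template`, the directory names, and the template names are all hypothetical and do not represent the module's actual configuration format or API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class TemplateRule:
    """One hypothetical selection rule; the first matching rule wins."""
    directory: str              # directory the rule applies to (including subdirectories)
    extension: str              # lowercase file extension, e.g. ".pdf"
    border_size: Optional[str]  # drawing border size, e.g. "A1"; None matches any size
    template: str               # hypothetical template name


def select_template(path: str, border_size: Optional[str],
                    rules: list[TemplateRule]) -> Optional[str]:
    """Return the template of the first rule matching the file, or None."""
    p = PurePosixPath(path)
    for rule in rules:
        if p.suffix.lower() != rule.extension:
            continue
        if not p.is_relative_to(rule.directory):  # requires Python 3.9+
            continue
        if rule.border_size is not None and rule.border_size != border_size:
            continue
        return rule.template
    return None


# Same extension (.pdf), different templates per directory and border size.
RULES = [
    TemplateRule("drawings/piping", ".pdf", "A1", "piping_title_block_A1"),
    TemplateRule("drawings/piping", ".pdf", None, "piping_default"),
    TemplateRule("drawings/electrical", ".pdf", None, "electrical_default"),
]

print(select_template("drawings/piping/P-101.pdf", "A1", RULES))     # piping_title_block_A1
print(select_template("drawings/piping/P-102.PDF", "A3", RULES))     # piping_default
print(select_template("drawings/electrical/E-7.pdf", "A1", RULES))   # electrical_default
print(select_template("drawings/civil/C-3.pdf", "A1", RULES))        # None
```

Because the rules are evaluated in order and the first match wins, more specific rules (such as a border-size-specific title block template) should be listed before the general fallback for the same directory.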
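The difference between single-line and multi-line tag recognition described in the last item above can be illustrated with regular expressions. This page does not describe how the module recognizes tags internally, so the following is a hypothetical Python sketch: the tag format (2 to 4 uppercase letters, a hyphen, 3 to 5 digits, an optional suffix letter) and the names `SINGLE_LINE`, `MULTI_LINE`, and `find_tags` are assumptions made for illustration.

```python
import re

# Hypothetical instrument-tag format (an assumption, not the Data Capture rule syntax):
# 2-4 uppercase letters, a hyphen, 3-5 digits and an optional suffix letter,
# e.g. "FT-101" or "PIC-2001A".

# Single-line matching: only spaces/tabs may separate the tag parts.
SINGLE_LINE = re.compile(r"\b([A-Z]{2,4})[ \t]*-[ \t]*(\d{3,5}[A-Z]?)\b")

# Multi-line matching: \s also matches line breaks, so a tag split across
# lines of extracted PDF text is still found.
MULTI_LINE = re.compile(r"\b([A-Z]{2,4})\s*-\s*(\d{3,5}[A-Z]?)\b")


def find_tags(text: str, pattern: re.Pattern) -> list[str]:
    """Return matched tags, normalized to PREFIX-NUMBER with line breaks removed."""
    return [f"{prefix}-{number}" for prefix, number in pattern.findall(text)]


# Text as it might come out of a PDF where table cells wrap onto new lines.
SAMPLE = "Flow transmitter FT-\n101 feeds\nPIC-2001A and TT\n-305."

print(find_tags(SAMPLE, SINGLE_LINE))  # ['PIC-2001A']
print(find_tags(SAMPLE, MULTI_LINE))   # ['FT-101', 'PIC-2001A', 'TT-305']
```

The single-line pattern misses the two tags whose parts wrap onto a new line, while the multi-line pattern finds all three and rejoins them into a normalized form.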