Extract data from a document - SmartPlant Foundation - IM Update 48 - Help

This functionality was modified in an update. For more information, see Extract data from a document (modified in an update).

You can extract data from a document, thereby updating all relationships between the tags and the document.

How can I configure DCOM to allow content extraction from Microsoft Office files?

You must enable DCOM permissions before HxGN SDx can access Microsoft Office applications.

This is mandatory if you want to extract content from:

a Microsoft Excel file using the Data Capture Datasheet Reader Pre-Processor.
any Microsoft Office file (97-2003) using the Data Capture Office Reader Pre-Processor.

To set the DCOM configuration for the application that corresponds to the file type, complete the following steps:

Click Start > Administrative Tools > Component Services.
In the tree view, expand Component Services > Computers > My Computer > DCOM Config.
Depending on the Microsoft Office file type, locate and right-click the corresponding DCOM Config component:
- Microsoft Excel Application
- Microsoft Word 97-2003
- Microsoft PowerPoint Slide (97-2003)
On the shortcut menu, click Properties.
In the General tab, set the Authentication Level to None.
In the Identity tab, select The Launching User option.
In the Security tab, set the Launch and Activation Permissions to Customize, and click Edit.
1. Add the Administrators created by Server Manager.
2. Select the Allow check box for the following items:
  - Local Launch
  - Remote Launch
  - Local Activation
  - Remote Activation
  - Read
  - Special permissions
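As a planning aid, the minimal Python sketch below restates which DCOM Config component from the steps above applies to a given file. The extension list and the names `DCOM_COMPONENTS` and `dcom_component_for` are hypothetical and for illustration only; they are not part of HxGN SDx.

```python
import os
from typing import Optional

# Hypothetical lookup table: file extension -> DCOM Config component to configure
# before content extraction. The extensions are assumptions; the component names
# are the ones listed in the DCOM steps.
DCOM_COMPONENTS = {
    ".xls": "Microsoft Excel Application",
    ".xlsx": "Microsoft Excel Application",
    ".doc": "Microsoft Word 97-2003",
    ".ppt": "Microsoft PowerPoint Slide (97-2003)",
}


def dcom_component_for(filename: str) -> Optional[str]:
    """Return the DCOM Config component for a file, or None if its extension is not in the table."""
    extension = os.path.splitext(filename)[1].lower()
    return DCOM_COMPONENTS.get(extension)


print(dcom_component_for("P-101 Datasheet.xlsx"))  # Microsoft Excel Application
```

The extension comparison is case-insensitive, so `P-101.XLSX` and `p-101.xlsx` resolve to the same component.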

What is the purpose of a default template group?

When you use the Data Capture Content Discovery Task in the Desktop Client, or Extract Content in the Web Client, to extract content from multiple documents, the software automatically applies the templates and rules defined for the default template group. The default template group is considered only when a PDF file or a drawing file is attached to the document. To extract content successfully, ensure that templates and rules are configured for the template group. If you have not set any template group as the default, the software uses DefaultDrawingTemplateGroup, a template group that is delivered with the software.

When you extract content from a single document in the Web Client, the software automatically applies the templates and rules defined for the automatically selected default template group. As with multiple documents, the default template group is considered only when a PDF file or a drawing file is attached to the document. However, you can select and apply a different template group instead of the default one. For more information, see Extract data from a document.
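The rules for choosing a template group can be summarized as a small decision function. This is a minimal Python sketch under stated assumptions: the set of drawing-file extensions and the function and parameter names are hypothetical, and `manual_choice` models the option to pick a different group when extracting from a single document in the Web Client.

```python
import os
from typing import Optional

# Template group delivered with the software, used when no group is set as default.
FALLBACK_GROUP = "DefaultDrawingTemplateGroup"

# Assumption: PDF and drawing files are recognized by extension. Adjust as needed.
PDF_OR_DRAWING_EXTENSIONS = {".pdf", ".dwg", ".dgn"}


def resolve_template_group(
    attached_file: str,
    default_group: Optional[str],
    manual_choice: Optional[str] = None,
) -> Optional[str]:
    """Return the template group applied to a document, or None if no default applies.

    attached_file: name of the file attached to the document.
    default_group: the group set as default, or None if no group was set as default.
    manual_choice: a group the user selected instead (single-document Web Client).
    """
    if manual_choice:
        # An explicitly selected group replaces the default template group.
        return manual_choice
    extension = os.path.splitext(attached_file)[1].lower()
    if extension not in PDF_OR_DRAWING_EXTENSIONS:
        # The default template group is considered only for PDF and drawing files.
        return None
    return default_group or FALLBACK_GROUP
```

For example, `resolve_template_group("P-101.pdf", None)` returns `"DefaultDrawingTemplateGroup"`, while a datasheet such as `list.xlsx` gets no default group.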

For the automatically selected default template group, the Match Tag Patterns option is pre-selected.
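Conceptually, Match Tag Patterns keeps only the text that fits a tag pattern. The Python sketch below illustrates that idea with a regular expression; the pattern (for tags such as `10-P-1001A`) is a hypothetical example, and the patterns you define in the Tag Discovery Patterns module may use a different syntax and matching behavior.

```python
import re
from typing import List

# Hypothetical tag pattern: two-digit unit, 1-3 letter type, 3-4 digit number,
# optional letter suffix (for example 10-P-1001A). Real patterns come from the
# Tag Discovery Patterns module.
TAG_PATTERNS = [re.compile(r"\b\d{2}-[A-Z]{1,3}-\d{3,4}[A-Z]?\b")]


def match_tags(text: str) -> List[str]:
    """Return the unique tags found in text, in order of first appearance."""
    found: List[str] = []
    for pattern in TAG_PATTERNS:
        for match in pattern.finditer(text):
            tag = match.group(0)
            if tag not in found:
                found.append(tag)
    return found


print(match_tags("Pump 10-P-1001A feeds 10-E-205; see 10-P-1001A."))  # ['10-P-1001A', '10-E-205']
```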

To extract content from a document, perform the following steps in the Web Client:

To extract content from 3D model files (.zvf and .mdb2), we recommend that you install Microsoft Access Database Engine 2010 (64-bit) on the SmartPlant Foundation application server.

Click Documents > All Documents.
To extract data, select a document from the All Documents list, and click Actions > Extract Content.
In the Extract Content window, do the following:
1. Select any file attached to the document from the Select File list.
2. Select a template group from the TemplateGroup/Template list to apply the processing rules using the corresponding Preprocessor Reader.
From Update 14 onwards, for PDF and drawing files, the template group set as the default in the Data Capture Drawing Reader Pre-Processor or PDF Reader Pre-Processor is automatically selected and applied for extracting content. However, you can select and apply any template group from the TemplateGroup/Template list to process the content. For more information, see Manage drawing reader pre-processor templates and template groups and Manage PDF reader pre-processor templates.
Click OK.

Tip: To process the file using preprocessed content files, or to set other processing options, click Show more and apply the options as follows:

Click this		To do this
Use Existing PreProcessed Content Files		Process the file using the available preprocessed content XML file. We recommend that you attach ContentFile.xml to the document along with the corresponding file. If the file type supports graphical navigation, you must also attach GraphicsMapFile.xml.
Reader Pre-Processor		Select the appropriate Preprocessor Reader for processing the datasheet file.
OleDB Provider box (for Hexagon 3D models), and type the connection string		Connect to the Microsoft Access database.
Match Tag Patterns check box (for Hexagon 3D models)		Extract the tags based on the tag patterns defined in the Tag Discovery Patterns module.
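Before choosing Use Existing PreProcessed Content Files, you might check that the document carries the expected XML files. The Python sketch below encodes the rule from the table: ContentFile.xml is recommended, and GraphicsMapFile.xml is also needed when the file type supports graphical navigation. The function name and the case-insensitive file name comparison are assumptions.

```python
from typing import Iterable, List


def missing_preprocessed_files(attached: Iterable[str], supports_graphical_navigation: bool) -> List[str]:
    """Return the preprocessed content files that are not yet attached to the document.

    attached: plain file names attached to the document.
    supports_graphical_navigation: whether the file type supports graphical navigation.
    """
    # Assumption: file names are compared case-insensitively, as on Windows.
    names = {name.lower() for name in attached}
    expected = ["ContentFile.xml"]
    if supports_graphical_navigation:
        expected.append("GraphicsMapFile.xml")
    return [name for name in expected if name.lower() not in names]
```

For example, a PDF document with only `ContentFile.xml` attached and graphical navigation enabled would report `["GraphicsMapFile.xml"]`.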

Before working with large amounts of data, we recommend that you configure the LicenseTimeoutSeconds property, based on the size of the data, under the Site Settings node in SmartPlant Foundation Server Manager. This setting prevents the license token from timing out, so that the session is retained.
To view the status of content extraction for a selected document:
- On the Actions menu, click Show the detail form > Extract Content.

  For more information about the status of a document processed using Data Capture, see Data Capture Document Status.
FDW tags are created without applying the ENS definition.
The master tag and the FDW tag are identified with the same icon. The alias tag is identified with a different icon.
In the Desktop Client, the FDW tag is identified with the same icon as the master tag extracted using the Data Capture Content Discovery Task module.
You can select Match Tag Patterns to extract the tags based on the tag patterns defined in the Tag Discovery Patterns module. For some file types, Match Tag Patterns is pre-selected.
Except for datasheet files, the Reader Pre-Processor is automatically selected based on the attached file type. For datasheet files, you can select one of the following options as the base reader:
- Datasheet Reader
- PDF Reader
You can view the base reader set for different file types in the Data Capture Central Settings module in the Desktop Client. For more information, see Manage file types and prioritize them for content extraction.
By default, the PDF reader is selected as the base reader for PDF files and Microsoft Office files in the Data Capture Central Settings module in the Desktop Client. If the PDF reader is set as the base reader for a file type other than PDF, the PDF reader generates Markup renditions when extracting content from that file type, and the software uses these renditions to retrieve the tag details. For more information, see Manage file types and prioritize them for content extraction.
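The reader selection rules above can be sketched as follows. The `BASE_READERS` table is a hypothetical snapshot of settings from the Data Capture Central Settings module (its extensions are assumptions), and the function names are illustrative only, not an SDx API.

```python
import os
from typing import Dict, Optional

# Hypothetical snapshot of base-reader settings. Per the notes, PDF files and
# Microsoft Office files default to the PDF reader; these extensions are assumptions.
BASE_READERS: Dict[str, str] = {
    ".pdf": "PDF Reader",
    ".doc": "PDF Reader",
    ".docx": "PDF Reader",
    ".xls": "PDF Reader",
    ".xlsx": "PDF Reader",
    ".ppt": "PDF Reader",
    ".pptx": "PDF Reader",
}

# Base readers you can choose for a datasheet file.
DATASHEET_READERS = ("Datasheet Reader", "PDF Reader")


def select_reader(filename: str, is_datasheet: bool, choice: Optional[str] = None) -> Optional[str]:
    """Return the base reader for a file, or None if the table has no entry for it."""
    if is_datasheet:
        # Datasheet files: the user selects one of the allowed base readers.
        if choice not in DATASHEET_READERS:
            raise ValueError(f"Select one of: {', '.join(DATASHEET_READERS)}")
        return choice
    # All other files: the reader follows automatically from the file type.
    extension = os.path.splitext(filename)[1].lower()
    return BASE_READERS.get(extension)


def generates_markup_renditions(filename: str, reader: Optional[str]) -> bool:
    """True when the PDF reader processes a non-PDF file, which produces Markup renditions."""
    extension = os.path.splitext(filename)[1].lower()
    return reader == "PDF Reader" and extension != ".pdf"
```

Keeping the datasheet case separate mirrors the notes: only datasheet files need an explicit reader choice, and every other file type is resolved from the settings table.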
After extracting data from the document, you can navigate to the document and its tags in the Web Client. For more information, see View and manage Data Capture data using the Web Client.