Create a new content discovery task - HxGN SDx - Update 64 - Administration & Configuration

Administration and Configuration of HxGN SDx

This functionality was modified in an update. For more information, see Create a new content discovery task (modified in an update).

When and how is the SPFHotSpotter.ini file updated?

  • The SPFHotSpotter.ini file is updated whenever new tag discovery patterns are created or existing tag discovery patterns are updated. The content discovery task uses the updated SPFHotSpotter.ini file to extract tags.

  • After the tag discovery patterns are updated, Data Capture updates the SPFHotSpotter.ini file by first checking for the SPFHotSpotter.ini file attached to the SmartConverter Control object. The SPFHotSpotter.ini file and the SmartConverter Control object must belong to the same configuration type on an HxGN SDx application server.

  • If no SmartConverter Control object is related to the same configuration item, the SmartConverter Control object related to the parent configuration item is used. If none is related to the parent configuration item, the default SmartConverter Control object related to the ConfigurationTop is used. If none is related to the ConfigurationTop, the SPFHotSpotter.ini file available at [drive]:\Program Files (x86)\SDX\SPFSmartConverter is updated.
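The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not SDx code: the ConfigItem class, the resolve_ini_target function, and the control object names are hypothetical, and the drive in the disk path is left unspecified as in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Disk location named in the text; the drive letter is not specified there.
DISK_FALLBACK = r"[drive]:\Program Files (x86)\SDX\SPFSmartConverter\SPFHotSpotter.ini"


@dataclass
class ConfigItem:
    """Hypothetical stand-in for a configuration item (not an SDx API type)."""
    name: str
    parent: Optional["ConfigItem"] = None
    control_object: Optional[str] = None  # related SmartConverter Control object, if any


def resolve_ini_target(item: ConfigItem, top: ConfigItem) -> Tuple[str, str]:
    """Return which SPFHotSpotter.ini gets updated, following the documented order."""
    # 1. Control object related to the same configuration item.
    if item.control_object:
        return ("control_object", item.control_object)
    # 2. Control object related to the parent configuration item.
    if item.parent is not None and item.parent.control_object:
        return ("control_object", item.parent.control_object)
    # 3. Default control object related to the ConfigurationTop.
    if top.control_object:
        return ("control_object", top.control_object)
    # 4. No control object found: the file on disk is updated.
    return ("disk", DISK_FALLBACK)


top = ConfigItem("ConfigurationTop")
plant = ConfigItem("PlantA", parent=top, control_object="PlantAControl")
project = ConfigItem("Project1", parent=plant)
print(resolve_ini_target(project, top))  # ('control_object', 'PlantAControl')
```

The sketch checks only one parent level because the text mentions only the parent configuration item before falling back to the ConfigurationTop.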

What is the purpose of a default template group?

When you use the Data Capture Content Discovery Task in the Desktop Client or Extract Content in the Web Client to extract content from multiple documents, the software automatically uses the templates and rules defined for the default template group. The default template group is considered only when a PDF file or a drawing file is attached to the document. To extract content successfully, ensure that the templates and rules are configured for the template group. If you have not chosen a default template group, the software automatically uses the DefaultDrawingTemplateGroup template group to extract the content. This default template group is provided with the software.

In the Web Client, when you extract content from a single document, the software automatically uses the templates and rules defined for the auto-selected default template group. The default template group is considered only when a PDF file or a drawing file is attached to the document. However, you can select and apply any other template group instead of the default template group. For more information, see Extract data from a document.

For the auto-selected default template group, the Match Tag Patterns option is pre-selected.
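The template group selection described above can be summarized as a small decision function. This is a minimal sketch under stated assumptions: pick_template_group and its parameters are hypothetical names, and treating an explicit user selection as taking precedence over the attachment check is an assumption, since the text does not say how the two interact.

```python
from typing import Optional

# Name of the template group provided with the software, as given in the text.
BUILT_IN_GROUP = "DefaultDrawingTemplateGroup"


def pick_template_group(
    has_pdf_or_drawing: bool,
    configured_default: Optional[str],
    user_choice: Optional[str] = None,
) -> Optional[str]:
    """Return the template group to apply, or None if no default group is considered."""
    # Assumption: an explicit selection (Web Client, single document) wins.
    if user_choice:
        return user_choice
    # The default group is considered only when a PDF or drawing file is attached.
    if not has_pdf_or_drawing:
        return None
    # Fall back to the built-in group when no default has been chosen.
    return configured_default or BUILT_IN_GROUP


print(pick_template_group(True, None))             # DefaultDrawingTemplateGroup
print(pick_template_group(True, "PlantTemplates")) # PlantTemplates
```

"PlantTemplates" is a made-up group name used only for the example.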

Before you work with a large amount of data, it is recommended that you configure the LicenseTimeoutSeconds property under the Site Settings node in HxGN SDx Server Manager, based on the size of the data. This setting prevents the license token from timing out, so that the session is retained. If you are using SDx deployed in Smart Cloud, this must be done for you by the Smart Cloud Team.

  1. On the Content Discovery Task page, click Create Content Discovery Tasks.

  2. Select the Document Criteria filter and Document Reader filter to process the documents that match the selected criteria.

  3. Select one or more file types for the selected document reader filter.

  4. Type search text in the Document Name Pattern box to process the documents that match the selected criteria.

    Tip: If you want to schedule the content discovery task to run at a later date, click Tasks Start Date.

  5. Click OK to view the list of documents that will be processed. You can filter the documents for processing in this window.

  • By default, the Is Data Capture Rel property is set to True on the document-to-tag relationships SPFNDocRevMasterTag, SPFNDocRevAliasTag, FDWDocRevTag, and SPFNFDWDocRevChildTag for Data Capture tags.

  • By default, the tags extracted by the content discovery task are associated with an Unknown tag classification and an Unclassified security code.

  • FDW tags are created without applying the ENS definition.

  • To extract tags from drawing and PDF files, the software applies the templates and rules from the template group that is set as default. For more information, see Manage drawing reader pre-processor templates and template groups and Manage PDF reader pre-processor templates.

  • After the content discovery task processes documents whose reader is a base reader, the reader changes to the Image or Document reader. To process these documents again with a content discovery task, you must specify the reader as Image or Document without specifying the actual file type.

  • When a content discovery task fails, large file sets are re-processed in smaller and smaller batches to find the problem. For example, documents are re-processed in batches of 100 then 10, drawings in batches of 20 then 2. For each batch, a child content discovery task is created under the master content discovery task.

  • Data Capture creates the relationship object SPFNCDTFailedCDT between a master content discovery task and a child content discovery task.

  • To check the status of a content discovery task for failed documents in the Desktop Client:

    1. Click Find > Data Capture Items > Content Discovery Tasks.

    2. Right-click a content discovery task, and click Show CDT for Failed Docs in the shortcut menu.

      Tip: To check the status of a master content discovery task, right-click a content discovery task, and click Show Root CDT in the shortcut menu.

  • You can select a content discovery task and click Rerun Content Discovery Task to rerun a content discovery task and process all the documents attached to it.

  • You can select a content discovery task and click Rerun Content Discovery Task for selected documents to process the selected document.

  • If the Content File and the GraphicsMap file are available in both the \\PreProcessedAlternateRenditions\PrepProcessedContentFiles folder and the \\PreProcessedContentFiles folder, the content discovery task looks for the Content Files in the \\PreProcessedAlternateRenditions\PrepProcessedContentFiles folder.

  • While extracting content from documents of drawing files and 3D models, a drawing representation object is created for each tag based on the Graphic OID property value in GraphicsMapFile.xml. The GraphicsMapFile.xml has the information for graphical navigation such as the corresponding InterfaceDefs, as well as all tag UIDs and Graphic OIDs for the document. The drawing representation object is related to the respective document and tag.

  • The drawing representation objects are specifically used for graphical navigation in the Web Client.

  • You cannot process transferred documents and FDW documents using a content discovery task.

  • When multiple units with the same name are related to multiple areas, the tag is related to a unit after content extraction based on the tag's relationship to the area.

    How is a tag related to a unit based on its relationship with the area?

    A tag is related to the unit that is related to the area corresponding to the tag. However, when multiple units have the same name and each unit is related to a different area, the unit to relate to the tag is determined based on the ISPFNResolveDuplicateObjects interface. The following examples illustrate this.

    Suppose the tag part Unit "SC" is assigned to multiple areas, such as 10, 20, 30, and 40:

    • If Area=10 is defined in the tag discovery pattern or tag attribute, the tag is related to the unit "SC" that is related to Area=10.

    • If the SPFNTagObjArea relationship is defined as 30 in the content file, the tag is related to the unit "SC" that is related to Area=30.

    • If the document pattern is defined with Area=20 and the tag discovery pattern is defined with a tag attribute (Area=40) and a document attribute (Area=40), the tag is related to the unit "SC" that is related to Area=20.

    • If the document pattern is defined with Area=20 and the tag discovery pattern is defined with a document attribute (Area=40), the tag is related to the unit "SC" that is related to Area=40.

    • If the extracted tag pattern is "29-SC-60", the tag is not related to Unit SC, because no Unit SC is related to Area 60.
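The final step in each example above, choosing among same-named units once the tag's area is known, can be sketched as a simple lookup. This is an illustration only: resolve_unit and the (unit name, area) pairs are hypothetical, and the sketch deliberately leaves out how the area is derived from the document and tag discovery patterns.

```python
from typing import List, Optional, Tuple

# (unit name, area) pairs standing in for unit-to-area relationships.
Unit = Tuple[str, str]


def resolve_unit(units: List[Unit], unit_name: str, tag_area: str) -> Optional[Unit]:
    """Return the unit with this name that is related to the tag's area, if any."""
    for name, area in units:
        if name == unit_name and area == tag_area:
            return (name, area)
    # No same-named unit is related to this area, so the tag is not related to a unit.
    return None


units = [("SC", "10"), ("SC", "20"), ("SC", "30"), ("SC", "40")]
print(resolve_unit(units, "SC", "30"))  # ('SC', '30')
print(resolve_unit(units, "SC", "60"))  # None
```

The second call mirrors the "29-SC-60" example: no unit "SC" is related to Area 60, so no relationship is created.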

    How can I create a relationship from tag to asset or model if there are duplicate assets or models?

    If multiple assets or models with the same name are related to different areas, or to any other business objects that are related to the tag, you must load an XML file with the following information before you load and process documents, so that the tag is related to the appropriate asset or model:

    <SPFNResolveDuplicateObjects>
      <IObject UID="SPFNSDAAreaAssetRelDef" Name="SPFNSDAAreaAssetRelDef" Description="FDWAssetAreaRelDef"/>
      <ISPFNResolveDuplicateObjects SPFNClassDefUIDOfDuplicateObject="FDWAsset" SPFNRelDefUIDOfBusinessObject="FDWTagArea_12" SPFNRelDefUIDOfDuplicateObject="FDWAssetArea_21"/>
      <ISPFNResolveDuplicateObjectsPI/>
    </SPFNResolveDuplicateObjects>

    To relate a tag to the appropriate model when there are duplicate models, change the XML code above as follows: set the SPFNClassDefUIDOfDuplicateObject property to the ClassDef UID of the model (FDWModel), and set the SPFNRelDefUIDOfDuplicateObject property to the RelDef UID of the area and model relationship (FDWModelArea_21).
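Returning to the failure handling described earlier in these notes, the way a failed content discovery task is re-processed in smaller and smaller batches can be sketched as follows. The batch sizes (100 then 10 for documents, 20 then 2 for drawings) come from the text; the function names, the process callback, and the item names are hypothetical, and what happens after the smallest batch fails is not described in the text, so the sketch simply returns the items in those batches.

```python
from typing import Callable, List, Sequence

# Batch sizes stated in the text: documents 100 then 10, drawings 20 then 2.
BATCH_SIZES = {"document": (100, 10), "drawing": (20, 2)}


def chunks(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def isolate_failures(
    items: Sequence[str],
    kind: str,
    process: Callable[[List[str]], bool],
) -> List[str]:
    """Narrow a failed file set down to the smallest failing batches.

    `process` stands in for running one child content discovery task on a
    batch and returns True when the batch succeeds.
    """
    suspects = [list(items)]
    for size in BATCH_SIZES[kind]:
        failing = []
        for group in suspects:
            for batch in chunks(group, size):
                if not process(batch):
                    failing.append(batch)
        suspects = failing
    return [item for batch in suspects for item in batch]


docs = [f"doc{i}" for i in range(250)]
bad = isolate_failures(docs, "document", lambda batch: "doc137" not in batch)
print(bad[0], bad[-1], len(bad))  # doc130 doc139 10
```

In this example a single problem document among 250 is narrowed first to a batch of 100 and then to a batch of 10, which corresponds to the child content discovery tasks created under the master task.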

Process .sha files

  1. In the Central Data Capture Settings module, map the .sha file type to Image Reader in the File Type page.

  2. In the Data Capture Pre-Processor Utilities module, process the .sha file using the Drawing Reader Pre-Processor, and generate the content file.

  3. In the Data Capture Task Manager module, process the content file with Content Discovery Task.

View document relations

On the Progress tab, click View Document Relations to view the relationship details of the document in the Relationship Details tab, as shown in the following example:

[Figure: Document work bench]

  1. The central node represents the master object on which the view related items service is based.

  2. The Columnset Properties pane displays the properties based on the column set configured for the master object.

  3. The terminal nodes represent the related objects.

  4. Represents the number of related objects. You can click the hyperlink to view the properties of the object, or click View Related Items to view the items related to the object.

  5. The Additional Properties pane displays the properties configured in the Property Lists module.

  • The terminal nodes displayed are based on the EdgeDefs configured on the view related items client API method. The EdgeDefs can be configured as a parameter (Arg1) for the view related items client API method in the Desktop Client.

  • The View Related Items client API method must be related to the interfaces realized by the selected document. If this method is not related to any of the selected document interfaces, then the terminal nodes in the diagram represent objects expanded from the relationship and user-defined edge definitions related to the master object.

  • If a property created in either the Tag Naming System or in the Property Lists module has a relationship configured against it, that relationship is created during the content discovery task.

  • You can click the View error log hyperlink of the content discovery task in the Summary tab to view the Error Log in the Content Discovery Task.

  • You can click the View Information Log hyperlink of the content discovery task in the Summary tab to view the log information of files in the Content Discovery Task module.