SG-1481 Data Lake Upload Setup - Duplicate data is uploaded to Datalake when a new schedule is created - HxGN EAM - Version 12.0 - Hexagon

HxGN EAM Resolved Issues for 2022


SG-1481 Data Lake Upload Setup - Duplicate data is uploaded to Datalake when a new schedule is created

Description

The customer needs to move data lake table records from one schedule (old) to another schedule (new) so that all common tables are grouped into one schedule. After creating the new schedule, they noticed that duplicate data was being transferred to the Data Lake.

Steps:

a. Go to Data Lake Upload Setup and make the old data lake schedule inactive (for example, schedule 20).

b. Extract the data of the data lake tables on the old schedule 20 through reports, so that the Last Update value of each existing table in the old schedule is captured.

c. Create a new data lake schedule (for example, schedule 21) in EAM web > Data Lake Upload Setup.

d. Delete all table records from the old data lake schedule using a script (delete from datalake_table where scheduleid = 20).

e. Add the table records to the new schedule 21 using the import utility, populating the Last Update value from the old schedule with the extract from step b.

f. Set the status of the tables on the new schedule to Active using a script (update datalake_table set status = 'ACTIVE' where scheduleid = 21).

g. Import the connection points and dataflows, and make the dataflow active (new schedule).

h. Activate the new schedule (scheduleid 21).
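The table-level part of these steps (b, d, e and f) can be sketched against an in-memory SQLite database. This is only an illustration under stated assumptions, not the actual EAM schema or import utility: the table name `datalake_table` and the columns `scheduleid` and `status` come from the scripts above, while the column names `tablename` and `lastupdate`, the placeholder table names `TABLE_A` and `TABLE_B`, and the initial `INACTIVE` status are hypothetical.

```python
import sqlite3

# Hypothetical, simplified stand-in for the real table. Only scheduleid and
# status appear in the scripts above; tablename and lastupdate are assumed names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datalake_table ("
    "scheduleid INTEGER, tablename TEXT, lastupdate TEXT, status TEXT)"
)
# Placeholder rows for the old schedule 20 (table names are made up).
conn.executemany(
    "INSERT INTO datalake_table VALUES (20, ?, ?, 'INACTIVE')",
    [("TABLE_A", "2022-05-26 12:30:40"), ("TABLE_B", "2022-05-25 08:00:00")],
)

# Step b: capture the Last Update value of every table on the old schedule.
extract = conn.execute(
    "SELECT tablename, lastupdate FROM datalake_table WHERE scheduleid = 20"
).fetchall()

# Step d: delete the table records from the old schedule.
conn.execute("DELETE FROM datalake_table WHERE scheduleid = 20")

# Step e: add the records to the new schedule 21, reusing the captured values.
conn.executemany(
    "INSERT INTO datalake_table VALUES (21, ?, ?, 'INACTIVE')", extract
)

# Step f: set the tables on the new schedule to Active.
conn.execute("UPDATE datalake_table SET status = 'ACTIVE' WHERE scheduleid = 21")

migrated = conn.execute(
    "SELECT tablename, lastupdate, status FROM datalake_table "
    "WHERE scheduleid = 21 ORDER BY tablename"
).fetchall()
old_count = conn.execute(
    "SELECT COUNT(*) FROM datalake_table WHERE scheduleid = 20"
).fetchone()[0]
```

After these statements, `migrated` holds the same Last Update values that schedule 20 had, which is the state the customer expected to prevent re-sending.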

Actual Results: Records that had already been sent to the data lake were sent again after the new schedule was created and activated, even though the Last Update value on each data lake table in the new schedule was set to the same value as in the old schedule. This results in duplicate data.

Expected Results: Records that have already been transferred should not be sent again.

(For example, if a record was last saved on 26 May 2022 at 12:30:40 and sent to the data lake, and the new schedule is activated on 31 May 2022, the record last saved on 26 May 2022 at 12:30:40 is sent to the data lake again, even though Last Update on the new schedule was set to 26 May 2022 12:30:40.)
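The expected behavior in this example can be expressed as a simple watermark check. The sketch below is an assumption about how such a filter might work, not a description of the EAM implementation: the function `records_to_upload`, the record layout, and the strictly-later comparison are all hypothetical, and record 2 is invented to show a record that should still be sent.

```python
from datetime import datetime


def records_to_upload(records, last_update):
    """Return only the records saved strictly after the Last Update value.

    Assumption: a record whose last-saved time equals the watermark has
    already been sent and must not be sent again.
    """
    return [r for r in records if r["last_saved"] > last_update]


records = [
    # Already sent under the old schedule (timestamp from the example above).
    {"id": 1, "last_saved": datetime(2022, 5, 26, 12, 30, 40)},
    # Hypothetical record saved later, before the new schedule was activated.
    {"id": 2, "last_saved": datetime(2022, 5, 30, 9, 15, 0)},
]

# Last Update carried over from the old schedule to the new schedule.
last_update = datetime(2022, 5, 26, 12, 30, 40)

pending_ids = [r["id"] for r in records_to_upload(records, last_update)]
```

Under this assumption only record 2 is pending. The reported defect corresponds to record 1 being selected as well.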

Note: This occurs for almost all tables added to the new schedule.