Moving from Teradata to BigQuery? Automate the process

 






Moving off Teradata: a technical challenge

 

Teradata is an “appliance” chosen by countless organizations. Its specialization in data warehousing and analytics has made it possible to build solutions with exceptional computing capacity and strong scalability.

But most of Teradata's historical users are now in the process of switching to the Cloud.


One of our clients decided to move its Teradata assets to Google Cloud Platform (BigQuery), and to migrate a number of data visualization technologies (SAP BO, Power BI) along with them.

 

We share with you the methodology implemented as part of this migration. 

 

Some key indicators concerning the platform to be migrated 



Data assets:

  • 400,000 data “containers”;
  • 270,000 tables;
  • 11,000 files.


Active scripts:

  • 122,000 views;
  • 1,200 macros;
  • 500 BTEQ ingestion processes;
  • 600 BO universes;
  • 100,000 web reports;
  • 30,000 data manipulation processes distributed across 450 Stambia ETL projects.


Usage statistics*:

  • 30% of data used; 
  • 30% of tables/views/files used;  
  • 50% of transformations used.

*i.e., those that ultimately lead to a decision-making report or to application use

Processing:

  • 1,500,000 queries per day;
  • 880,000 insert/update/merge operations per day;
  • etc.



Three major challenges were identified for this migration to succeed

 

  • Continuously determine what exists on the source platform, with all its dependencies;
  • Maintain a permanent inventory of migration progress: what has already been migrated, and what remains to be migrated;
  • Share the migration process with everyone, to avoid misunderstandings.

 

Automating these tasks was essential. 

 

Step 1

Master the source platform by mapping it





{openAudit} made it possible to take control of the internal processes via physical, field-level data lineage, covering BTEQ as well as the ETL/ELT jobs, views, macros, and other scripts involved in the feeding flows.
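
openAudit's lineage engine is proprietary, but the underlying principle can be sketched with the open-source sqlglot parser: parse a script, then walk the syntax tree to list the physical tables it reads from. The view body below is a hypothetical example; real BTEQ scripts and macros require many more cases to be handled.

# Minimal sketch of table-level lineage extraction with sqlglot
# (illustrative only -- this is not openAudit's engine).
import sqlglot
from sqlglot import exp

# Hypothetical body of a Teradata view feeding a reporting layer.
sql = """
SELECT s.sale_id, s.amount, c.customer_name
FROM stg.sales AS s
JOIN ref.customers AS c ON c.customer_id = s.customer_id
WHERE s.sale_dt >= DATE '2023-01-01'
"""

tree = sqlglot.parse_one(sql, read="teradata")

# Every exp.Table node in the tree is a physical source the view depends on.
sources = sorted({f"{t.db}.{t.name}" for t in tree.find_all(exp.Table)})
print(sources)  # ['ref.customers', 'stg.sales']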

 



{openAudit} helped identify how the information was actually used, by analyzing the audit database logs for both data consumption and data loading.
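
As an illustration of the principle (a sketch, not openAudit), the same kind of usage counting can be done on an export of Teradata's query log; the file name and the QueryText column below are assumptions about the export format.

# Sketch: count table usage from an exported Teradata query log.
# The CSV file name and the "QueryText" column are assumptions.
import csv
from collections import Counter

import sqlglot
from sqlglot import exp

usage = Counter()

with open("dbql_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        try:
            tree = sqlglot.parse_one(row["QueryText"], read="teradata")
        except sqlglot.errors.ParseError:
            continue  # skip statements the parser cannot handle
        for t in tree.find_all(exp.Table):
            usage[f"{t.db}.{t.name}".lower()] += 1

# Tables that never appear are candidates for exclusion from the migration scope.
for table, hits in usage.most_common(20):
    print(f"{hits:>8}  {table}")

This kind of cross-referencing is what makes usage figures like the 30% rates quoted above possible.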

 



{openAudit} analyzed task scheduling and linked it to the data lineage and to data usage.

 



{openAudit} highlighted the impacts in the data visualization tools connected to Teradata (e.g. Power BI, SAP BO), to expose the related complexity (business rules) and to deliver data lineage that truly runs from end to end.

 



Step 2

Automate migration 

Through a series of mechanisms, {openAudit} reproduced most of the processing in BigQuery: parsing the source code, then generating enriched standard SQL.
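
openAudit's conversion engine is not public, but the parse-then-regenerate approach can be illustrated with the open-source sqlglot transpiler; the query below is a hypothetical example, and real BTEQ scripts and macros need far more than a one-shot transpilation.

# Sketch of Teradata-to-BigQuery SQL transpilation with sqlglot
# (illustrative only -- not openAudit's conversion engine).
import sqlglot

teradata_sql = """
SELECT cust_id,
       amount,
       CAST(order_dt AS DATE) AS order_dt
FROM "SALES"."ORDERS"
QUALIFY ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY order_dt DESC) = 1
"""

bigquery_sql = sqlglot.transpile(
    teradata_sql, read="teradata", write="bigquery", pretty=True
)[0]
print(bigquery_sql)  # identifiers re-quoted with backticks, GoogleSQL syntax

The value of automation lies in applying this kind of transformation consistently across the 122,000 views and the hundreds of load scripts listed above, rather than rewriting them by hand.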

Note that certain encapsulations (shell scripts, among others) are likely to degrade the quality of the output.

Also note that any ETL/ELT tool in the source system requires its processing to be transposed. For some of these tools, {openAudit} helps speed up the project.

Step 3

Master the deployment in GCP by mapping it




{openAudit} performed dynamic analysis of BigQuery: scheduled queries, view scripts, and JSON and CSV load files, to enable the intelligent construction of flows.
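
As a sketch of what this introspection covers (again, not openAudit itself), Google's Python clients expose view definitions and scheduled queries directly; the project and location below are placeholders.

# Sketch: inventory BigQuery views and scheduled queries with Google's clients.
from google.cloud import bigquery, bigquery_datatransfer

PROJECT = "my-gcp-project"  # placeholder

bq = bigquery.Client(project=PROJECT)

# View definitions: the SQL text is the raw material for target-side lineage.
for dataset in bq.list_datasets():
    for table in bq.list_tables(dataset.dataset_id):
        if table.table_type == "VIEW":
            view = bq.get_table(table.reference)
            print(view.full_table_id, "->", len(view.view_query or ""), "chars of SQL")

# Scheduled queries are exposed through the Data Transfer Service.
dts = bigquery_datatransfer.DataTransferServiceClient()
for cfg in dts.list_transfer_configs(parent=f"projects/{PROJECT}/locations/eu"):
    if cfg.data_source_id == "scheduled_query":
        print(cfg.display_name, "|", cfg.schedule)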



{openAudit} analyzed the logs in Google Cloud Operations (formerly Stackdriver) to immediately understand how the information is used.
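
For illustration (a simplification compared with raw Cloud Logging entries), comparable usage information can be pulled from BigQuery's INFORMATION_SCHEMA.JOBS views; the project and region below are placeholders.

# Sketch: which tables are actually read in BigQuery, and by how many users.
from google.cloud import bigquery

bq = bigquery.Client(project="my-gcp-project")  # placeholder

sql = """
SELECT
  ref.project_id || '.' || ref.dataset_id || '.' || ref.table_id AS table_name,
  COUNT(*) AS reads_last_30_days,
  COUNT(DISTINCT user_email) AS distinct_users
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(referenced_tables) AS ref
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY table_name
ORDER BY reads_last_30_days DESC
LIMIT 20
"""

for row in bq.query(sql, location="EU").result():  # location is a placeholder
    print(row.table_name, row.reads_last_30_days, row.distinct_users)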



{openAudit} mapped task scheduling, to link it to the data lineage and to data usage.



{openAudit} introspected certain “target” data visualization technologies that rely on GCP (Looker, Data Studio, BO Cloud, Power BI, etc.), in order to reconstruct the embedded logic by comparing their outputs.

Furthermore, the existing connectors could be pointed at BigQuery (in the case of connectors going through Datometry's Hyper-Q middleware, with some loss of performance).

 

 


 

Conclusion

We do not believe that a migration of this ambition can be driven by "kick-offs" and "deadlines" alone. It takes an intelligent process built on real mastery of both the source and the target platforms, through continuous technical introspection of processes and usage, together with a graphical representation of the information systems that everyone can understand and exploit.

 

In this context, automating the migration brings undeniable added value.

 

