Moving from Teradata to BigQuery? Automate the process

 






Moving off Teradata: a technical challenge

 

Teradata is an “appliance” chosen by countless organizations. Its specialization in data warehousing and analytics has made it possible to build solutions with exceptional computing capacity and strong scalability.

But most of Teradata's historical users are now in the process of switching to the Cloud.


One of our clients decided to move its Teradata assets to Google Cloud Platform (BigQuery), and to migrate a number of data visualization technologies (SAP BO, Power BI) along with them.

 

We share with you the methodology implemented as part of this migration. 

 

Some key indicators concerning the platform to be migrated 



Data assets:

  • 400,000 data “containers”;
  • 270,000 tables;
  • 11,000 files.


Active scripts:

  • 122,000 views;
  • 1,200 macros;
  • 500 BTEQ ingestion processes;
  • 600 BO universes;
  • 100,000 web reports;
  • 30,000 data manipulation processes distributed across 450 Stambia ETL projects.


Usage statistics*:

  • 30% of data used; 
  • 30% of tables/views/files used;  
  • 50% of transformations used.

*i.e., those that ultimately lead to a decision-making report or to application use

Processing:

  • 1,500,000 queries per day;
  • 880,000 insert/update/merge operations per day;
  • etc.



Three major challenges were identified for this migration to succeed

 

  • Continuously determine what exists on the source platform, with all its dependencies;
  • Maintain a permanent inventory of migration progress: what has already been migrated, and what remains to be migrated;
  • Share the migration process with everyone, to avoid misunderstandings.

 

Automating these tasks was essential. 

 

Step 1

Master the source platform by mapping it





{openAudit} made it possible to take control of the internal processes via physical, field-level data lineage, covering BTEQ as well as the ETL/ELT jobs, views, macros, and other scripts involved in the feeding flows.
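
openAudit's lineage engine is proprietary, but the underlying principle can be sketched with the open-source sqlglot parser: parse a script, then walk the syntax tree to list the physical tables it reads from. The view body below is a hypothetical example; real BTEQ scripts and macros require many more cases to be handled.

# Minimal sketch of table-level lineage extraction with sqlglot
# (illustrative only -- this is not openAudit's engine).
import sqlglot
from sqlglot import exp

# Hypothetical body of a Teradata view feeding a reporting layer.
sql = """
SELECT s.sale_id, s.amount, c.customer_name
FROM stg.sales AS s
JOIN ref.customers AS c ON c.customer_id = s.customer_id
WHERE s.sale_dt >= DATE '2023-01-01'
"""

tree = sqlglot.parse_one(sql, read="teradata")

# Every exp.Table node in the tree is a physical source the view depends on.
sources = sorted({f"{t.db}.{t.name}" for t in tree.find_all(exp.Table)})
print(sources)  # ['ref.customers', 'stg.sales']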

 



{openAudit} helped identify how the information was actually used, by analyzing the audit database logs for both data consumption and data loading.
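
As an illustration of the principle (a sketch, not openAudit), the same kind of usage counting can be done on an export of Teradata's query log; the file name and the QueryText column below are assumptions about the export format.

# Sketch: count table usage from an exported Teradata query log.
# The CSV file name and the "QueryText" column are assumptions.
import csv
from collections import Counter

import sqlglot
from sqlglot import exp

usage = Counter()

with open("dbql_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        try:
            tree = sqlglot.parse_one(row["QueryText"], read="teradata")
        except sqlglot.errors.ParseError:
            continue  # skip statements the parser cannot handle
        for t in tree.find_all(exp.Table):
            usage[f"{t.db}.{t.name}".lower()] += 1

# Tables that never appear are candidates for exclusion from the migration scope.
for table, hits in usage.most_common(20):
    print(f"{hits:>8}  {table}")

This kind of cross-referencing is what makes usage figures like the 30% rates quoted above possible.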

 



{openAudit} analyzed task scheduling and linked it to the data lineage and to data usage.

 



{openAudit} highlighted the impacts in the data visualization tools connected to Teradata (e.g. Power BI, SAP BO), to expose the related complexity (business rules) and to deliver data lineage that truly runs from end to end.

 



Step 2

Automate migration 

Through a series of mechanisms, {openAudit} reproduced most of the processing in BigQuery: parsing the source code, then generating enriched standard SQL.
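
openAudit's conversion engine is not public, but the parse-then-regenerate approach can be illustrated with the open-source sqlglot transpiler; the query below is a hypothetical example, and real BTEQ scripts and macros need far more than a one-shot transpilation.

# Sketch of Teradata-to-BigQuery SQL transpilation with sqlglot
# (illustrative only -- not openAudit's conversion engine).
import sqlglot

teradata_sql = """
SELECT cust_id,
       amount,
       CAST(order_dt AS DATE) AS order_dt
FROM "SALES"."ORDERS"
QUALIFY ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY order_dt DESC) = 1
"""

bigquery_sql = sqlglot.transpile(
    teradata_sql, read="teradata", write="bigquery", pretty=True
)[0]
print(bigquery_sql)  # identifiers re-quoted with backticks, GoogleSQL syntax

The value of automation lies in applying this kind of transformation consistently across the 122,000 views and the hundreds of load scripts listed above, rather than rewriting them by hand.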

Note that certain encapsulations (shell scripts, among others) are likely to degrade the quality of the output.

Also note that any ETL/ELT tool in the source system requires its processing to be transposed. For some of these tools, {openAudit} helps speed up the project.

Step 3

Master the deployment in GCP by mapping it




{openAudit} performed dynamic analysis of BigQuery: scheduled queries, view scripts, and JSON and CSV load files, to enable the intelligent construction of flows.
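
As a sketch of what this introspection covers (again, not openAudit itself), Google's Python clients expose view definitions and scheduled queries directly; the project and location below are placeholders.

# Sketch: inventory BigQuery views and scheduled queries with Google's clients.
from google.cloud import bigquery, bigquery_datatransfer

PROJECT = "my-gcp-project"  # placeholder

bq = bigquery.Client(project=PROJECT)

# View definitions: the SQL text is the raw material for target-side lineage.
for dataset in bq.list_datasets():
    for table in bq.list_tables(dataset.dataset_id):
        if table.table_type == "VIEW":
            view = bq.get_table(table.reference)
            print(view.full_table_id, "->", len(view.view_query or ""), "chars of SQL")

# Scheduled queries are exposed through the Data Transfer Service.
dts = bigquery_datatransfer.DataTransferServiceClient()
for cfg in dts.list_transfer_configs(parent=f"projects/{PROJECT}/locations/eu"):
    if cfg.data_source_id == "scheduled_query":
        print(cfg.display_name, "|", cfg.schedule)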



{openAudit} analyzed the logs in Google Cloud Operations (formerly Stackdriver) to immediately understand how the information is used.
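
For illustration (a simplification compared with raw Cloud Logging entries), comparable usage information can be pulled from BigQuery's INFORMATION_SCHEMA.JOBS views; the project and region below are placeholders.

# Sketch: which tables are actually read in BigQuery, and by how many users.
from google.cloud import bigquery

bq = bigquery.Client(project="my-gcp-project")  # placeholder

sql = """
SELECT
  ref.project_id || '.' || ref.dataset_id || '.' || ref.table_id AS table_name,
  COUNT(*) AS reads_last_30_days,
  COUNT(DISTINCT user_email) AS distinct_users
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(referenced_tables) AS ref
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY table_name
ORDER BY reads_last_30_days DESC
LIMIT 20
"""

for row in bq.query(sql, location="EU").result():  # location is a placeholder
    print(row.table_name, row.reads_last_30_days, row.distinct_users)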



{openAudit} mapped task scheduling, to link it to the data lineage and to data usage.



{openAudit} introspected certain “target” data visualization technologies that rely on GCP (Looker, Data Studio, BO Cloud, Power BI, etc.), in order to reconstruct the embedded logic by comparing their outputs.

Furthermore, the existing connectors could be pointed at BigQuery (in the case of connectors going through Datometry's Hyper-Q middleware, with some loss of performance).

 

 


 

Conclusion

We do not believe that a migration of this ambition can be driven by "kick-offs" and "deadlines" alone. It takes an intelligent process built on real mastery of both the source and the target platforms, through continuous technical introspection of processes and usage, together with a graphical representation of the information systems that everyone can understand and exploit.

 

In this context, automating the migration brings undeniable added value.

 

