Exit Oracle Data Integrator (or other ELT): migrate and durably break down the complexity of your flows.


ODI (Oracle Data Integrator) and other "legacy" ELT solutions automatically generate SQL queries that run directly in the database. Nothing new there 😉.


As the system evolves, the ELT engine generates ever more nested subqueries and joins. Transformations end up buried in gigantic SQL blocks, and teams eventually struggle to understand them at all.

 

In addition, legacy ELTs are strongly challenged by IS modernization projects whose cornerstone is often a move to the Cloud.
Cloud-native ELTs that follow the same logic as their "ancestors" are booming: Matillion, dbt, Dataflow (Google), Glue (Amazon)... Paired with powerful Cloud databases such as Redshift, BigQuery or Snowflake, the combo is impressively efficient!

 

In a migration context, the goal is to move transformations from source to target while setting up upstream mechanisms that create simplicity and observability.
Then, two approaches compete for how the flows will be executed on the target:

  • Continue to run transformations in the database (ELT mode), to leverage the power of the target Data Warehouse.
  • Or offload job execution to containers (ETL mode), for maximum control.

 

We'll walk you through both.

 

Essentialize, migrate, and simplify data pipelines

👉  Only essential transformations

 

  • Jobs and sub-jobs from ODI (or another legacy ELT) will be analyzed to identify sources, targets, filters and the various transformation logics.
  • Cross-referencing actual data usage with the technical data lineage will make it possible to eliminate unnecessary transformations upstream of the migration, sometimes more than 50% of the total (see the sketch after this list).
  • This way, only genuinely useful transformations will be carried over to the target.

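To make that cross-referencing concrete, here is a minimal SQL sketch. It assumes two hypothetical tables, lineage_targets (target tables extracted from the ODI repository) and query_audit (a log of actual reads by reports and downstream jobs); neither is a standard ODI structure.

```sql
-- Hypothetical illustration: flag target tables that no consumer has queried
-- in the last 12 months; the transformations feeding them are candidates for removal.
-- Table and column names are assumptions; date arithmetic varies by SQL dialect.
SELECT l.job_name,
       l.target_table
FROM   lineage_targets AS l
LEFT JOIN query_audit AS q
       ON  q.table_name  = l.target_table
       AND q.query_date >= CURRENT_DATE - INTERVAL '12' MONTH
WHERE  q.table_name IS NULL        -- never read over the period
ORDER  BY l.job_name;
```
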
👉  Conversion to SQL

 

  • Once the key transformations are isolated, they will be converted into SQL queries, step by step (see the sketch below).
  • Each query will correspond to a distinct phase of the source processing.

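As an illustration, here is a minimal sketch of the step-by-step conversion, with hypothetical table names: instead of one nested query, each phase becomes its own statement.

```sql
-- Hypothetical example: three explicit phases instead of one nested query.
-- Syntax (CREATE TABLE AS, DATE_TRUNC) may need adjusting to the target dialect.

-- Step 1: filter the source (extraction scope)
CREATE TABLE stg_orders_filtered AS
SELECT order_id, customer_id, amount, order_date
FROM   src_orders
WHERE  order_date >= DATE '2024-01-01';

-- Step 2: enrich with reference data
CREATE TABLE stg_orders_enriched AS
SELECT o.order_id, o.customer_id, o.amount, o.order_date, c.segment
FROM   stg_orders_filtered AS o
JOIN   ref_customers       AS c ON c.customer_id = o.customer_id;

-- Step 3: aggregate into the target structure
CREATE TABLE tgt_sales_by_segment AS
SELECT segment,
       DATE_TRUNC('month', order_date) AS sales_month,
       SUM(amount)                     AS total_amount
FROM   stg_orders_enriched
GROUP  BY segment, DATE_TRUNC('month', order_date);
```
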
👉  Breaking the complexity

 

  • This step-by-step SQL approach will structure processing into multiple levels and avoid the complex monolithic queries found in the source.
  • Nested SQL will be flattened back out, which is crucial for the maintainability of the flows.
  • Materialized or volatile tables can also be used to store intermediate results, making the data path easier to understand.
  • Explicit checkpoints will be created at each SQL transformation step, something that does not exist in "legacy" ELTs such as ODI (see the sketch below).
 
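As a sketch of what such a checkpoint could look like (the etl_checkpoints table is hypothetical and the step name reuses the example above), each step simply records a row count that can later be compared against the source and the target:

```sql
-- Hypothetical checkpoint pattern: after each transformation step, record a
-- row count in an audit table so the data path can be observed and replayed.
CREATE TABLE IF NOT EXISTS etl_checkpoints (
    step_name  VARCHAR(100),
    row_count  BIGINT,
    loaded_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO etl_checkpoints (step_name, row_count)
SELECT 'stg_orders_enriched', COUNT(*)
FROM   stg_orders_enriched;
```
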

Option #1: Execute transformations in the target DB, in ELT mode

👉  Adaptation to the target

 

  • The SQL transformations will be integrated directly into Cloud-native ELT jobs (see the sketch below).
  • The SQL produced during conversion can be adjusted to fit the target dialect.
  • Optionally: declare the jobs in the project configuration, define dependencies between jobs, etc.
  • Jobs will be tested to verify that they run correctly, and the data can also be tested to ensure its integrity.

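For instance, in a tool like dbt, each step of the earlier sketch could become its own model, with dependencies declared through ref(); the model names below are hypothetical and assume the upstream tables are themselves managed as models.

```sql
-- models/stg_orders_enriched.sql  (hypothetical dbt model)
-- dbt materializes this query as a table and infers the dependencies on the
-- upstream models through ref().
{{ config(materialized='table') }}

SELECT o.order_id,
       o.customer_id,
       o.amount,
       o.order_date,
       c.segment
FROM   {{ ref('stg_orders_filtered') }} AS o
JOIN   {{ ref('ref_customers') }}       AS c
       ON c.customer_id = o.customer_id
```
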
👉  Benefits

 

  • Reduced latency, since transformations run inside the database.
  • Native use of the compute capacity of the target Data Warehouse.
 

Option #2: Run transformations in containers, in ETL mode

👉  Containers

 

  • The transformations will be executed in an external container, detached from the target database and the target ELT. It will act as a preconfigured application gateway: simple to spin up, easy to duplicate, and able to parallelize certain pipelines if necessary.
  • Temporary containers will serve as intermediate storage areas where data is loaded before transformation.
  • The container volume holding the buffer database will be externalized so that developers can access it.
  • Jobs will be tested to verify that they run correctly (and possibly the data, to ensure its integrity).

👉  Benefits

 

  • Reduced load on the target database: jobs will be executed in parallel on a "multi-threaded" architecture.
  • Continuity of service, even during intensive processing (heavy transformations or batch operations).
  • Increased flexibility for workflow execution and orchestration.
 
