SQL & SQLOps at the service of digital transformation

SQL procedural code and its derivatives were long marginalized in favor of ETL/ELT tools in “on premise” infrastructures. ETL/ELT tools have proliferated in companies over the last 30 years, primarily to make the deployment of “data pipelines” accessible to less technical profiles. They also promised easier impact analysis, versioning, flow monitoring, job replication, and so on. With the Cloud, SQL is making a triumphant comeback, for many very good reasons, and SQLOps-oriented solutions are appearing that “augment” SQL by encapsulating it in tools offering additional guarantees. But how do you make a clean transition from DataStage, Talend, BODS, SSIS, etc., to “plain” SQL or to SQLOps-oriented tools?

Our experience shows that developers often continued to model flows in SQL before building them with the ETL tools in place.

- Developers often have a strong relationship with the SQL language: it provides a natural way to describe and understand data relationships and patterns.
- SQL queries make it possible to test data transformations quickly, directly in the database.
- SQL lets applications manage data without worrying about the underlying storage, which simplifies deployment across different cloud platforms.

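The point about rapid testing can be made concrete: because a transformation is just a query, it can be prototyped against throwaway data directly in the database, with no pipeline tooling involved. A minimal sketch using SQLite as a stand-in database (the table and figures are invented for the example):

```python
import sqlite3

# Throwaway in-memory database with invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'acme', 120.0),
        (2, 'acme', 80.0),
        (3, 'globex', 50.0);
""")

# The transformation under test: revenue per customer.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer
    ORDER BY customer
""").fetchall()

print(rows)  # [('acme', 200.0), ('globex', 50.0)]
```

The same query, once validated, is essentially what ends up in the production warehouse, whatever the cloud dialect.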
In short, SQL means simplicity, efficiency, and scalability. “The world of tomorrow” is largely focused on SQL, as shown by the massive return of SQL and its “dialects” for data processing among the hyperscalers: Redshift SQL at AWS, BigQuery at GCP, T-SQL in Azure. As is often the case, we tend to reinvent what we had somewhat forgotten 😊. But SQL still has the limits it had before:
- You can end up with an abundance of code that is difficult to maintain.
- The larger the system, the more expertise is needed, which runs counter to domain-oriented architectures (“data mesh”) that aim to give business teams intensive and extensive use of data.
- ...
And what about SQLOps-oriented solutions?

SQLOps-oriented solutions encapsulate SQL. They allow data analysts to write SQL code to define transformations, test them, document them, and execute them in a reproducible manner. In short, they combine the simplicity of SQL with a real control tool. The best known of these solutions are dbt, dbForge Studio, and Dataform from Google.

The main attributes of these tools:

- They promote a modular approach to data development, allowing the creation of reusable packages, which simplifies maintaining, updating, and managing data transformations.
- Documentation is generated automatically from SQL query metadata.
- They integrate automated testing functionalities.

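To make those attributes tangible, here is roughly what a transformation looks like in dbt: the logic is a plain SQL `SELECT` stored as a model, and tests plus documentation are declared alongside it in YAML. The model, column names, and figures below are invented for the example:

```sql
-- models/revenue_per_customer.sql : the transformation is just a SELECT;
-- dbt materializes it and resolves ref() to the upstream model.
SELECT
    customer,
    SUM(amount) AS revenue
FROM {{ ref('stg_orders') }}
GROUP BY customer
```

```yaml
# models/schema.yml : documentation and automated tests, declared next to the model
version: 2
models:
  - name: revenue_per_customer
    description: "Revenue aggregated per customer."
    columns:
      - name: customer
        description: "Business key, one row per customer."
        tests:
          - not_null
          - unique
```

Because the model is versioned as a file, it also gets the impact analysis, replication, and monitoring features that ETL tools used to monopolize.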
Whether you want to migrate to plain SQL, to a SQL “dialect”, or to a dbt-type solution, you will need to “deconstruct” everything that your ETL/ELT effectively embeds and transpose it into the target technology. That can be very complicated, very long, and very expensive.

We offer a two-step method:

1. Simplify the source system: the granular data lineage of {openAudit}, our software, coupled with information about actual usage, makes it possible to identify and retire “dead branches”, i.e. data pipelines that are continually replayed but whose output data is never consulted. This often represents more than 50% of a system.

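The “dead branch” idea can be illustrated with a small sketch (this is not {openAudit}'s implementation, just the principle): given lineage edges from upstream to downstream datasets, and the set of tables users actually consult, any node from which no consulted table is reachable is a candidate for retirement. All names below are invented:

```python
def dead_branches(edges, consulted):
    """Return the datasets from which no consulted dataset is reachable."""
    nodes = set(consulted)
    children = {}
    for src, dst in edges:
        children.setdefault(src, set()).add(dst)
        nodes.update((src, dst))

    # A node is "alive" if it is consulted, or feeds an alive node.
    # Iterate to a fixed point (lineage graphs are DAGs in practice).
    alive = set(consulted)
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n not in alive and any(c in alive for c in children.get(n, ())):
                alive.add(n)
                changed = True
    return nodes - alive

# Invented lineage: one branch feeds a consulted mart, the other a report nobody reads.
lineage = [("raw_orders", "stg_orders"), ("stg_orders", "mart_revenue"),
           ("raw_legacy", "stg_legacy"), ("stg_legacy", "rpt_unused")]
print(sorted(dead_branches(lineage, {"mart_revenue"})))
# ['raw_legacy', 'rpt_unused', 'stg_legacy']
```

Everything in the `raw_legacy` branch is replayed for nothing and can be ruled out before any transposition work starts.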
2. Transpose the rest: the very strong introspection capacity of {openAudit}, in particular with regard to the ETL/ELT present in legacy systems (DataStage, BODS, SSIS, Talend, etc.), makes it possible to build an exhaustive and dynamic inventory of all the actions that act on the data.
- {openAudit} “translates” these statements into plain SQL;
- this SQL is enriched with an executable so that it adheres as closely as possible to the target technology (BigQuery, Redshift, Azure SQL, etc.);
- for a migration to dbt (for example), {openAudit} encapsulates this SQL in dbt jobs.

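The last bullet can be pictured with a small sketch (a hypothetical helper, not {openAudit}'s actual code): once a job has been translated into plain SQL, encapsulating it as a dbt model is essentially writing that SQL into `models/<name>.sql` with a dbt config header.

```python
import os
import tempfile

def to_dbt_model(models_dir, name, sql, materialization="table"):
    """Write translated SQL as a dbt model file and return its path."""
    path = os.path.join(models_dir, name + ".sql")
    with open(path, "w") as f:
        # dbt reads the config() block to decide how to materialize the model.
        f.write("{{ config(materialized='" + materialization + "') }}\n\n")
        f.write(sql.strip() + "\n")
    return path

# Demo with a throwaway directory and an invented translated query.
with tempfile.TemporaryDirectory() as d:
    p = to_dbt_model(d, "mart_revenue",
                     "SELECT customer, SUM(amount) AS revenue\n"
                     "FROM {{ ref('stg_orders') }}\n"
                     "GROUP BY customer")
    print(open(p).read())
```

In a real migration the interesting work is upstream, in the translation of each ETL stage into that SQL; the dbt wrapping itself is mechanical, which is what makes it automatable.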
All of these migration mechanisms are automated. Furthermore, views, views of views, dynamic procedures, and flows managed with procedural code are reproduced in the target according to how the source is articulated.

SQL plays a central role in the Cloud environment, both at the database level and in the evolution of ETL methods towards more flexible approaches suited to the Cloud. Judicious use of the SQL dialects specific to cloud databases, together with the adoption of SQLOps-oriented tools, can optimize data management in these dynamic environments. Many organizations would like to move in this direction, but the migration is not simple, unless the process is automated.