Google has launched into the deep end of Data Management with Dataplex! Dataplex is essentially a central data catalog for BigQuery, Google's managed data warehouse. The range of technologies it covers continues to grow, with support for other GCP databases, in particular Cloud SQL, Bigtable and Spanner, as well as Looker, GCP's flagship data visualization solution. Dataplex is also starting to automate the collection of metadata from third-party sources: MySQL, Snowflake, Databricks, etc. With this, Dataplex customers get a complete view of their data, with its descriptions and context, in a single unified catalog.

Google is now looking to round out Dataplex to make it comprehensive and competitive in the world of Data Management, a market dominated by Informatica, Collibra and others. To that end, Google has integrated a “Data Lineage” feature into Dataplex. Data Lineage makes it possible to follow the path of data through an Information System: its origin, its successive transformations and its final impacts.

What is “Data Lineage” used for?

- To be certain that data comes from an authoritative source.
- To carry out impact analysis in the event of modification or deletion of a table.
- To ensure sensitive data is used correctly across the business and ensure compliance with regulatory requirements.
- To track errors in a data flow to their root causes.
- To prepare for a migration by mapping a system in detail.
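
To make two of these uses concrete, here is a minimal sketch of how lineage can be queried once it is stored as a graph of "source feeds target" links. It does not use the Dataplex API; the table names, the `EDGES` list and the `downstream` / `upstream` helpers are all hypothetical and only illustrate the idea. Walking the graph forward gives an impact analysis, walking it backward gives the candidate root causes of an error.

```python
from collections import defaultdict, deque

# Hypothetical lineage links: (source table, table derived from it).
EDGES = [
    ("raw.orders", "staging.orders_clean"),
    ("raw.customers", "staging.customers_clean"),
    ("staging.orders_clean", "mart.revenue_by_customer"),
    ("staging.customers_clean", "mart.revenue_by_customer"),
    ("mart.revenue_by_customer", "dashboard.revenue"),
]


def build_graph(edges, reverse=False):
    """Index the links by source (or by target when reverse=True)."""
    graph = defaultdict(set)
    for src, dst in edges:
        if reverse:
            graph[dst].add(src)
        else:
            graph[src].add(dst)
    return graph


def reachable(graph, start):
    """Breadth-first walk returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def downstream(edges, table):
    """Impact analysis: everything that depends on `table`."""
    return reachable(build_graph(edges), table)


def upstream(edges, table):
    """Root-cause search: everything `table` is built from."""
    return reachable(build_graph(edges, reverse=True), table)


if __name__ == "__main__":
    print(sorted(downstream(EDGES, "raw.orders")))
    print(sorted(upstream(EDGES, "dashboard.revenue")))
```

In this sketch, modifying or deleting `raw.orders` would affect every table returned by `downstream`, while an anomaly seen in `dashboard.revenue` would be traced through the tables returned by `upstream`.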
Obviously, “Data Lineage” is a crucial component of any Data Management solution. However, Dataplex's Data Lineage functionality does not currently have the characteristics that we believe are essential for Data Lineage to deliver on all of these promises. We believe that Dataplex can be usefully combined with {openAudit} to form a complete Data Management solution.