Google has taken the plunge into Data Management with Dataplex! At its core, Dataplex is a central data catalog for BigQuery, Google's managed data warehouse. The range of technologies it covers keeps growing. It now supports other GCP databases, notably Cloud SQL, Bigtable and Spanner, as well as Looker, GCP's flagship data visualization solution. Dataplex is also beginning to automate metadata collection from third-party sources such as MySQL, Snowflake and Databricks. The aim is to give Dataplex customers a complete view of their data, with its descriptions and context, in a single unified catalog.

Google now wants to make Dataplex comprehensive enough to compete in a data management market dominated by Informatica, Collibra and others. To that end, it has integrated a "Data Lineage" feature into Dataplex. Data Lineage tracks how data moves through an information system: where it originates, the successive transformations it undergoes, and where it ultimately ends up.

What is Data Lineage for?
- To be sure that data comes from an authoritative source.
- To perform impact analysis in case of modification or deletion of a table.
- To ensure that sensitive data is used correctly within the company and to ensure compliance with regulatory requirements.
- To track errors in a data stream to their root causes.
- To prepare for a migration by mapping a system in detail.
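To make the impact-analysis and root-cause uses above concrete, here is a minimal Python sketch of a lineage graph. It is a generic illustration, not Dataplex's API or data model: the table names and the `LINEAGE` dictionary are hypothetical. Each edge points from a source table to the tables built from it. Walking the edges forward answers "what breaks if I change this table?", and walking them backward answers "where does this data come from?".

```python
from collections import deque

# Hypothetical lineage edges: each key is a table, each value lists the tables built from it.
LINEAGE = {
    "crm.customers": ["staging.customers_clean"],
    "erp.orders": ["staging.orders_clean"],
    "staging.customers_clean": ["mart.sales_by_customer"],
    "staging.orders_clean": ["mart.sales_by_customer", "mart.daily_revenue"],
    "mart.sales_by_customer": ["looker.sales_dashboard"],
    "mart.daily_revenue": [],
    "looker.sales_dashboard": [],
}


def downstream(graph, table):
    """Impact analysis: every asset derived, directly or indirectly, from `table`."""
    seen, queue = set(), deque([table])
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:  # the `seen` set also guards against cycles
                seen.add(child)
                queue.append(child)
    return seen


def upstream(graph, table):
    """Origin / root-cause tracing: every asset that `table` is derived from."""
    # Invert the edges so we can walk from a target back to its sources.
    reverse = {}
    for src, targets in graph.items():
        for tgt in targets:
            reverse.setdefault(tgt, []).append(src)
    return downstream(reverse, table)


if __name__ == "__main__":
    # Which assets are affected if erp.orders is modified or deleted?
    print(sorted(downstream(LINEAGE, "erp.orders")))
    # Which sources feed the dashboard, e.g. when tracing an error to its root cause?
    print(sorted(upstream(LINEAGE, "looker.sales_dashboard")))
```

A breadth-first walk is enough here because lineage questions are about reachability, not path length; a production catalog would add column-level edges, process metadata and timestamps on top of this basic idea.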
Data Lineage is clearly a key component of any Data Management solution. However, Dataplex's Data Lineage currently lacks features that we consider essential to deliver on all of Data Lineage's promises. We believe that Dataplex can be judiciously combined with {openAudit} to form a complete Data Management solution.