Mastering processes in the dataviz layer

 


According to a 2023 Gartner report, there is a marked trend towards moving data transformation processes from databases to dataviz tools.

 

This is hardly surprising: recent dataviz tools are easy to use, feature-rich and offer a strong user experience. Business teams increasingly rely on them for "data prep".

Data governance, and in particular data traceability, is profoundly complicated by this decentralization of environments.

Furthermore, the logic embedded in reports is neither shared, shareable, nor centralized (because it is specific to each dashboard), which makes the business rules used to consolidate the data opaque.

In this context, technical data lineage in the data visualization layer is a key asset. 

 

Tracing data in the dataviz layer: a challenge

 

The complexity of transformations

Data in the dataviz layer often goes through many complex transformations: calculations, aggregations, filters, expressions, etc.

Each transformation changes the nature and structure of the data. For true data lineage, there is no shortcut: every transformation must be covered.

 

A lack of standardization

Dataviz solutions such as Power BI, Qlik, Looker and Tableau use different languages and approaches to define and manipulate data: DAX for Power BI, Expression for SAP BO, or LookML for Looker, for example. This lack of standardization complicates the integration of the different transformation steps into a single end-to-end data lineage flow. On top of that, users often create custom expressions or calculations in dashboards without documenting their logic.
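One common way to cope with this heterogeneity is to write one small extractor per language and normalize everything into a shared reference format before building the lineage graph. The sketch below is a deliberately simplified illustration, assuming only table-qualified DAX column references (`Table[Column]` or `'Table Name'[Column]`) and LookML `${...}` substitutions; real DAX and LookML parsing is far richer, and `extract_refs` is a hypothetical helper, not part of any product.

```python
import re

# Simplified patterns (assumption): real DAX and LookML grammars are much richer.
# DAX: table-qualified column references such as Sales[Quantity] or 'Product Price'[Unit Price].
DAX_COLUMN = re.compile(r"(?:'([^']+)'|(\w+))\[([^\]]+)\]")
# LookML: ${field} or ${TABLE}.column, where ${TABLE} stands for the view's table.
LOOKML_REF = re.compile(r"\$\{([^}]+)\}(?:\.(\w+))?")


def extract_refs(tool, expression):
    """Return the references used by `expression` as normalized 'a.b' strings."""
    if tool == "dax":
        return {
            f"{quoted or plain}.{column}"
            for quoted, plain, column in DAX_COLUMN.findall(expression)
        }
    if tool == "lookml":
        return {
            f"{name}.{column}" if column else name
            for name, column in LOOKML_REF.findall(expression)
        }
    raise ValueError(f"unsupported tool: {tool}")


print(extract_refs("dax", "SUMX(Sales, Sales[Quantity] * 'Product Price'[Unit Price])"))
print(extract_refs("lookml", "${TABLE}.amount - ${discount}"))
```

Once every tool's expressions are reduced to the same kind of reference, they can feed a single lineage graph regardless of the language they came from.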

 

An answer?

 

Fine-grained introspection to deconstruct the complexity of dashboards

Our {openAudit} solution continuously analyzes the internal structure of dashboards, inspects metadata and reveals the underlying intelligence, including data sources, applied transformations and business logic.

 

Specificity: an automatic representation of all flows 

In one of the {openAudit} interfaces, the sources are positioned on the left and the dashboard cells on the right.

Between the two, all the transformations are detailed (variables, expressions, etc.) to unearth all the complexity and to bring it to the attention of as many people as possible.  
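To make the left-to-right layout concrete, here is a minimal Python sketch, assuming lineage is stored as a directed acyclic graph of (upstream, downstream) edges. The node names and the `layer_columns` helper are hypothetical, not the {openAudit} data model. The sketch assigns each node a column by longest-path layering, then pushes every sink (a dashboard cell) to the rightmost column.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges, read as "upstream feeds downstream".
# The node names are illustrative and do not come from a real dashboard.
EDGES = [
    ("db.sales.amount", "expr:net_amount"),
    ("db.sales.discount", "expr:net_amount"),
    ("expr:net_amount", "var:total_net"),
    ("var:total_net", "cell:revenue_kpi"),
    ("db.regions.name", "cell:region_label"),
]


def layer_columns(edges):
    """Assign each node a display column: sources in column 0, every other node
    one column right of its deepest parent, and all sinks (dashboard cells)
    pushed to the rightmost column. Assumes the graph has no cycles."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for up, down in edges:
        children[up].append(down)
        indegree[down] += 1
        nodes.update((up, down))
    column = {node: 0 for node in nodes}
    # Kahn's algorithm: a node's column is final once all its parents are done.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    while queue:
        node = queue.popleft()
        for child in children[node]:
            column[child] = max(column[child], column[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Align every sink on the right, like the cells in the interface described above.
    rightmost = max(column.values(), default=0)
    for node in nodes:
        if not children[node]:
            column[node] = rightmost
    return column


for node, col in sorted(layer_columns(EDGES).items(), key=lambda kv: (kv[1], kv[0])):
    print(col, node)
```

With the sample edges, the three database fields land in column 0, `expr:net_amount` in column 1, `var:total_net` in column 2 and both cells in column 3. Longest-path layering is only one possible layout choice; its advantage is that every edge points strictly from left to right.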

 

Option 1: create data lineage from a multi-technology list of dashboards

data lineage in the dataviz layer

1. Clicking on a dashboard title displays its data lineage with all the flows. Details of variables, expressions, etc. are available on hover, and the dashboard cells appear on the right. This data lineage can be extended to the underlying databases.

2. You can "zoom in" from any datapoint on the dashboard...

3. ...and only the flow in question remains on screen.
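The "zoom" on a single datapoint amounts to keeping only what feeds it. A minimal sketch, assuming a hypothetical edge-list representation with invented node names: a breadth-first walk over reversed edges returns just the upstream flow of one cell.

```python
from collections import deque

# Hypothetical lineage edges (upstream -> downstream); names are illustrative.
EDGES = [
    ("db.sales.amount", "expr:net_amount"),
    ("db.sales.discount", "expr:net_amount"),
    ("expr:net_amount", "cell:revenue_kpi"),
    ("db.regions.name", "cell:region_label"),
    ("db.sales.amount", "cell:gross_kpi"),
]


def upstream_flow(edges, datapoint):
    """Return only the edges that feed `datapoint`, directly or indirectly."""
    parents = {}
    for up, down in edges:
        parents.setdefault(down, []).append(up)
    kept, seen = [], {datapoint}
    queue = deque([datapoint])
    while queue:
        node = queue.popleft()
        for up in parents.get(node, []):
            kept.append((up, node))
            if up not in seen:  # visit each upstream node only once
                seen.add(up)
                queue.append(up)
    return kept


print(upstream_flow(EDGES, "cell:revenue_kpi"))
```

Note that the edge from `db.sales.amount` to `cell:gross_kpi` is dropped even though it shares a source with the selected cell: only the paths that actually lead to the chosen datapoint stay on screen.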

 

Option 2: data lineage directly from the graphical representation of a dashboard

1. Activate the {openAudit} extension from a dashboard.

2. Select a cell within the dashboard.

3. The sourcing of the selected cell is displayed, and it can be traced back to the sources.

 

Specificity: data lineage based on exhaustive analysis

For Power BI, for example, data lineage covers Analysis Services components such as Power Query (M code), SQL queries, expressions calculated in DAX, and MDX queries. For SAP BO, it covers Universes, the SQL queries generated by the Data Providers of Webi reports, and variables and formulas, regardless of their nesting level.
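Resolving variables "regardless of the nesting level" is essentially a recursive walk over their definitions. The sketch below uses a simplified, hypothetical `[Name]` reference syntax loosely inspired by Webi formulas; it is not a real Webi parser, and `VARIABLES` and `base_objects` are illustrative names. Anything that is not itself a variable is treated as a base universe object, and circular definitions are reported instead of looping forever.

```python
import re

# Hypothetical report variables in a simplified syntax: [Name] refers either to
# another variable or to a base (universe) object. Not real Webi parsing.
VARIABLES = {
    "Margin": "[Net Revenue] - [Cost]",
    "Net Revenue": "[Revenue] - [Discount]",
    "Margin Rate": "[Margin] / [Net Revenue]",
}

REF = re.compile(r"\[([^\]]+)\]")


def base_objects(name, variables, _stack=()):
    """Resolve `name` through any number of nested variables and return the
    set of base objects it ultimately depends on."""
    if name in _stack:
        raise ValueError("circular definition: " + " -> ".join(_stack + (name,)))
    if name not in variables:
        return {name}  # not a variable: treat it as a base object
    found = set()
    for ref in REF.findall(variables[name]):
        found |= base_objects(ref, variables, _stack + (name,))
    return found


print(sorted(base_objects("Margin Rate", VARIABLES)))
```

Here `Margin Rate` is three levels deep, yet it resolves to the same three base objects (`Cost`, `Discount`, `Revenue`) that a lineage view would link back to the Universe.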

 

Specificity: impact analysis from data feeds through to usage

We provide a view to perform instant impact analysis from any datapoint in the sources.

From a field or table, you can visualize its impact on every dashboard concerned, technology by technology. Ad hoc queries, transfers to other systems, and so on are also included.
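Impact analysis is the mirror image of lineage: instead of walking upstream from a cell, you walk downstream from a field. A minimal sketch, assuming a hypothetical edge list in which usage nodes (dashboards, ad hoc queries) are tagged with their technology; the node names, the `TECHNOLOGY` mapping and `impacts_by_technology` are all illustrative.

```python
from collections import defaultdict, deque

# Hypothetical lineage: edges point downstream; names are illustrative only.
EDGES = [
    ("dwh.sales.amount", "view.v_sales"),
    ("view.v_sales", "pbi:Sales Overview"),
    ("view.v_sales", "webi:Monthly Sales"),
    ("dwh.sales.amount", "adhoc:q_1042"),
    ("dwh.customers.id", "looker:Customers"),
]
# Usage nodes tagged with their technology (assumed metadata).
TECHNOLOGY = {
    "pbi:Sales Overview": "Power BI",
    "webi:Monthly Sales": "SAP BO",
    "adhoc:q_1042": "Ad hoc query",
    "looker:Customers": "Looker",
}


def impacts_by_technology(edges, technology, start):
    """Walk downstream from `start` and group every reachable usage node
    (a node with a known technology) by technology."""
    children = defaultdict(list)
    for up, down in edges:
        children[up].append(down)
    seen, queue = {start}, deque([start])
    grouped = defaultdict(set)
    while queue:
        node = queue.popleft()
        if node in technology:
            grouped[technology[node]].add(node)
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return dict(grouped)


print(impacts_by_technology(EDGES, TECHNOLOGY, "dwh.sales.amount"))
```

Starting from `dwh.sales.amount`, the walk reaches the Power BI report and the Webi report through `view.v_sales`, plus the ad hoc query directly, while `looker:Customers`, which depends only on `dwh.customers.id`, is left out.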

 

Perform end-to-end impact analysis from a field or table

1. Starting from a data item in the source databases, all impacts are displayed instantly. The size of each "point" indicates the size of the query and its cost (in the cloud).

 

2. All impacted technologies are listed. They can be selected or excluded from the analysis.

3. Each "impact" (that is, each usage) can be analyzed: user(s), last execution, number of executions, query size and cost.
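Steps 2 and 3 boil down to filtering the impacted usages by technology and ranking them by their metrics. The sketch below assumes a hypothetical `Usage` record whose fields mirror the metrics listed in step 3; the sample values are made up for illustration, and sorting by cost is just one possible ranking.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Usage:
    """One impacted usage. Hypothetical structure whose fields follow step 3."""
    name: str
    technology: str
    users: tuple[str, ...]
    last_execution: date
    executions: int
    query_size_mb: float
    cost: float  # unit left open: it depends on the cloud provider's billing


# Made-up sample values, for illustration only.
USAGES = [
    Usage("Sales Overview", "Power BI", ("alice", "bob"), date(2024, 5, 2), 340, 120.0, 4.8),
    Usage("Monthly Sales", "SAP BO", ("carol",), date(2024, 4, 30), 12, 850.0, 9.1),
    Usage("q_1042", "Ad hoc query", ("dave",), date(2024, 3, 1), 1, 2300.0, 15.2),
]


def analyze(usages, include=None, exclude=()):
    """Keep the selected technologies (all by default), drop excluded ones,
    and rank the remaining usages by cost, highest first."""
    selected = [
        u for u in usages
        if (include is None or u.technology in include)
        and u.technology not in exclude
    ]
    return sorted(selected, key=lambda u: u.cost, reverse=True)


for u in analyze(USAGES, exclude={"Ad hoc query"}):
    print(u.name, u.technology, len(u.users), u.last_execution, u.executions, u.cost)
```

Keeping inclusion and exclusion as separate parameters mirrors the selection in step 2: by default every technology is analyzed, and any of them can be switched off.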

 

Conclusion 

The increasing complexity of transformation processes in dataviz tools poses significant challenges for data traceability and general understanding of systems.

Granular data lineage within dataviz tools, connected to data feed analysis, makes it possible to instantly surface all business rules so they can be harmonized across complex platforms. This fine-grained introspection also reveals nuances between dashboards that are often barely perceptible but can have serious consequences. For that, we also have answers 😉.

 

Technologies addressed (others in development):
