Data Observability, Buzzword or necessity?




 


 

The quality of data used in daily operations plays a key role for businesses. 

 

A 2022 Gartner study suggests that “bad data” costs organizations approximately $12.9 million per year. 

The UK government's Data Quality Hub estimates that organizations spend between 10% and 30% of their revenue handling data quality issues, which can represent hundreds of millions of dollars for leading businesses. 

As early as 2016, IBM estimated that poor data quality cost American businesses $3.1 trillion per year.

...

Everyone wants to be able to rely on data integrity and avoid basic errors caused by broken data feeds, poorly organized data, uncontrolled replication, and so on. But how? 

 

“Data observability” techniques are one option. 

 

 


What is data observability?

 

Simply put, data observability refers to “monitoring the lifecycle of a data flow from operational sources to exposure (i.e. consumption)”, across the entire transport and processing chain.

 

With data observability, it becomes possible to identify the gaps and potential errors underlying the data before it reaches the consumption phase.
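
To make this concrete, here is a minimal, hypothetical completeness check: it compares a feed's latest row count against its recent history before the data reaches consumers. The counts and the threshold are assumptions for illustration, not part of any particular product.

```python
# Hypothetical completeness check: flag a daily load whose row count
# deviates sharply from its own recent history. All figures are invented.
from statistics import mean, pstdev

daily_row_counts = [10_250, 10_310, 10_180, 10_295, 4_120]  # last = today

history, today = daily_row_counts[:-1], daily_row_counts[-1]
mu, sigma = mean(history), pstdev(history)

# Flag the load if it deviates by more than 3 standard deviations.
if sigma and abs(today - mu) > 3 * sigma:
    print(f"Anomaly: {today} rows vs expected ~{mu:.0f} (±{sigma:.0f})")
```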

 



A huge business need for data observability

 

We live in a highly competitive economy, one that relies a little more on data every day; we now speak of “data-driven” companies.

 

Data is no longer used merely to dissect past economic activity, but to project it into the future. Meeting this requirement calls for real-time insights based on high-quality data.

 

This is the main difference between data observability and simple data monitoring.

When you monitor data, you know that something is wrong, but no deeper analysis comes with that signal. 

When you use observability, you can determine the cause of the problem. You can go further and ensure that these root causes are corrected, so that they are no longer at work in the future.
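
As a toy illustration of that difference, assume a lineage graph is available (built here with networkx; the table names are invented). Monitoring flags the symptom; observability walks the graph upstream to the candidate root causes.

```python
# Monitoring tells you "dashboard.revenue is wrong"; observability asks
# which upstream objects could explain it. Names are hypothetical.
import networkx as nx

lineage = nx.DiGraph()
lineage.add_edges_from([
    ("erp.orders", "staging.orders"),
    ("staging.orders", "dw.fact_sales"),
    ("referential.products", "dw.fact_sales"),
    ("dw.fact_sales", "dashboard.revenue"),
])

symptom = "dashboard.revenue"
print(sorted(nx.ancestors(lineage, symptom)))
# ['dw.fact_sales', 'erp.orders', 'referential.products', 'staging.orders']
```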

 

Two options:  

Many cloud platforms in particular include built-in features that could be thought of as data observability capabilities.

  • If your company uses a single database, a few data visualization tools, and serves a small user base, this built-in approach is probably preferable.
  • However, if your data management is complex, with many sources, many storage technologies, data visualization tools, different schedulers, ESBs, etc., integrating dedicated data observability tools becomes essential. “In-house” software solutions require significant, recurring engineering effort, which makes them unappealing for most businesses.


 



And the ROI of an observability solution?

 

When assessing the true impact of a new technology, it is especially important to consider the return on investment (ROI) it will deliver.

But because data observability is a large and complex topic, it has been difficult to determine exactly what its ROI is.

There are, however, two positive financial impacts that should be taken into consideration:

 

  • Operational financial gains: the operational component covers efficiencies gained by improving the use of data and eliminating time spent on manual processes that can now be handled automatically: for example, eliminating downtime caused by inaccurate or incomplete data, or the time saved by automatic rather than manual collection (a worked example follows this list).

 

  • Economic: when managers have strong confidence in the quality of the information they use to make decisions, their boldness increases dramatically, serving the company's performance.
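
To make the operational component tangible, here is a back-of-the-envelope ROI calculation. Every figure below is a hypothetical assumption for illustration, not a benchmark from the studies cited above.

```python
# Illustrative ROI arithmetic only; all inputs are invented assumptions.
incidents_per_year = 24              # data incidents reaching production
hours_to_resolve = 8                 # mean time to find a root cause manually
hourly_cost = 90                     # loaded cost of a data engineer (USD)
downtime_cost_per_incident = 5_000   # business impact while data is wrong

baseline = incidents_per_year * (hours_to_resolve * hourly_cost
                                 + downtime_cost_per_incident)

# Assume observability cuts resolution time by 75% and halves the impact.
observed = incidents_per_year * (hours_to_resolve * 0.25 * hourly_cost
                                 + downtime_cost_per_incident * 0.5)

tool_cost = 40_000                   # hypothetical annual license
savings = baseline - observed
print(f"Gross savings: ${savings:,.0f}; "
      f"net ROI: {100 * (savings - tool_cost) / tool_cost:.0f}%")
# Gross savings: $72,960; net ROI: 82%
```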

 



Some essential attributes of a data observability solution

 

  1. Does my solution let me verify that the data is up to date? Is there a scheduler-monitoring component?
  2. Does my solution help me make my system simpler, more readable, and more efficient?
  3. Does my solution let me validate that my data is complete and calculated correctly?
  4. Is it possible to move from a “high level” view of my data flows down to the technical details of their implementation?
  5. Does my solution allow information flows to be traced from the data visualization layer back to operational sources?
  6. Does my solution have data lineage functionality?
  7. Does my solution translate technical terminology into business terminology, so that knowledge is truly shared with everyone?
  8. Does my solution run automatically and dynamically, so that its answers stay continuously up to date?

 

Non-exhaustive list 😊   

 

 



Our answer: mapping based on data lineage

 

At Ellipsys, we publish data lineage software, {openAudit}, which addresses these observability issues in a completely automated way by delivering an exhaustive, automatically generated map of the information system. Some high-value characteristics of this “mapping”:



A business and IT vision for the entire system: the business terms used to define data are stored in what are called semantic layers (or similar). {openAudit} propagates this terminology throughout the system as far as possible, so that business users can also investigate systems “with their eyes open”. 



Monitoring of the scheduling of data flows: {openAudit} detects broken or slowed flows by continuously analyzing scheduler jobs.

Starting from failed jobs, {openAudit}'s granular data lineage makes it possible to act on the processing chains by providing access to the underlying code. 
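
As an illustration only (this is not {openAudit}'s actual engine), this kind of scheduler check can be sketched as follows, assuming job-run history is available as simple records with invented names:

```python
# Sketch of scheduler monitoring: flag failed jobs and jobs running far
# slower than their own history. Records and thresholds are hypothetical.
from statistics import mean

runs = [
    {"job": "load_sales", "status": "success", "minutes": 12},
    {"job": "load_sales", "status": "success", "minutes": 11},
    {"job": "load_sales", "status": "success", "minutes": 31},  # slowed flow
    {"job": "refresh_kpis", "status": "failed", "minutes": 2},  # broken flow
]

def check(runs, slowdown_factor=2.0):
    alerts, by_job = [], {}
    for r in runs:
        by_job.setdefault(r["job"], []).append(r)
    for job, history in by_job.items():
        last = history[-1]
        if last["status"] != "success":
            alerts.append(f"{job}: last run failed")
            continue
        prior = [r["minutes"] for r in history[:-1]]
        if prior and last["minutes"] > slowdown_factor * mean(prior):
            alerts.append(f"{job}: {last['minutes']} min vs "
                          f"~{mean(prior):.0f} min baseline")
    return alerts

print(check(runs))
# ['load_sales: 31 min vs ~12 min baseline', 'refresh_kpis: last run failed']
```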

 




An end-to-end impact analysis in cartographic form: this {openAudit} map allows you to simulate the impact of any change in the information system. Adding a column to a table can literally undermine the quality of crucial business metrics; having impact analysis tools is not optional.
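
As a hypothetical sketch of the idea (this is not {openAudit}'s API), impact analysis reduces to a downstream traversal of the lineage graph; the node names below are invented.

```python
# Everything downstream of a changed object is potentially impacted.
import networkx as nx

lineage = nx.DiGraph()
lineage.add_edges_from([
    ("crm.customers", "staging.customers"),
    ("staging.customers", "dw.dim_customer"),
    ("dw.dim_customer", "dashboard.churn_kpi"),
    ("dw.dim_customer", "dashboard.revenue_kpi"),
])

changed = "staging.customers"
print(sorted(nx.descendants(lineage, changed)))
# ['dashboard.churn_kpi', 'dashboard.revenue_kpi', 'dw.dim_customer']
```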





Data lineage in the dataviz layer: today, much of the intelligence is built within data visualization solutions. {openAudit}'s data lineage also investigates dataviz solutions to shed light on incorrect formulas, obsolete management rules, etc., all in three mouse clicks!

 




Improve observability by decommissioning the “dead branches” of the system: information systems carry an enormous amount of information, largely for nothing. A recent Flexera study indicates that 70% of the data in systems is useless.

{openAudit} allows upstream simplification of the system by identifying unused data and its sources, i.e., the “dead branches” of the system. This makes it possible to significantly improve the intelligibility of a system and therefore greatly increase its observability.
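
A hypothetical sketch of the idea (not {openAudit}'s actual algorithm): a “dead branch” is any object from which no consumption endpoint, such as a dashboard or an export, can be reached in the lineage graph.

```python
# Objects with no path to any consumption endpoint are "dead branches".
# Graph and names are invented for illustration.
import networkx as nx

lineage = nx.DiGraph()
lineage.add_edges_from([
    ("src.a", "staging.a"), ("staging.a", "dashboard.kpi"),
    ("src.b", "staging.b"), ("staging.b", "dw.old_table"),  # feeds nothing used
])
endpoints = {"dashboard.kpi"}

dead = [n for n in lineage.nodes
        if n not in endpoints
        and endpoints.isdisjoint(nx.descendants(lineage, n))]
print(sorted(dead))  # ['dw.old_table', 'src.b', 'staging.b']
```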

 

Reduce IT debt 

 

 

Conclusion

 

Having absolute confidence in your data is imperative in the era of “data-driven” companies.

To achieve this, it is necessary to implement data observability mechanisms. This can be done “with paper and pencil”, but the rapid evolution of systems and the proliferation of management rules will very quickly make any manual inventory obsolete.

 

We believe that automating system analysis, i.e. reverse engineering the data transformation and transport processes throughout the system, with answers delivered in cartographic form, is the simplest and most shareable approach there is. It is also the most durable, since the analysis is replayed every day. 

 

Read also: 

 

Lower Cloud Costs
