Cloud Costs - how to do better?




Companies have migrated massively to the Cloud, even if in Europe we are still more or less mid-transition. The pricing models are obviously very different from those that applied on premises.

The promise is cost control, yet unpleasant surprises keep piling up. Bills often soar simply because the pricing mechanics are poorly understood: 50% of companies spend more than $1.2 million per year on Cloud services, with a growth rate of 20% (Editor's Choice / Gartner). We will look at these mechanics below. But there are some reasons to rejoice!

- The first piece of good news is that the AWS, Azure and GCP storage offerings now have largely harmonized pricing models. Once familiar with these models, it becomes easier to evaluate the added value of each. The examples below use Azure.

- The second piece of good news is that systems are clogged with data that is unused or no longer needed: cleaning it out is a lever both on direct costs and on the astronomical sums spent on maintenance rather than on development that actually serves the business.


Cloud storage cost items 

_______________________________________


- Storage: these costs are generally expressed per GB or TB, per month or per year. Each provider offers storage tiers that trade cost against performance and availability.

- Storage operations: generally not significant, except of course when objects are numerous and accessed by many workloads.

- Download: these fees, billed per GB read, vary with the storage tier and with network activity when data is downloaded (a cost-model sketch follows this list).
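
To make the decomposition concrete, here is a minimal sketch of a monthly bill built on these three items. The unit prices are purely illustrative placeholders, not actual AWS, Azure or GCP rates.

```python
# Minimal sketch of a monthly Cloud storage bill, split into the three cost items.
# All unit prices are illustrative placeholders, not real provider rates.

def monthly_storage_bill(stored_gb, operations, egress_gb,
                         price_per_gb=0.02,          # storage, $/GB/month (illustrative)
                         price_per_10k_ops=0.05,     # operations, $/10,000 ops (illustrative)
                         price_per_egress_gb=0.08):  # download, $/GB (illustrative)
    storage = stored_gb * price_per_gb
    ops = operations / 10_000 * price_per_10k_ops
    egress = egress_gb * price_per_egress_gb
    return {"storage": storage, "operations": ops,
            "download": egress, "total": storage + ops + egress}

# Example: 50 TB stored, 20 million operations, 2 TB downloaded in the month.
print(monthly_storage_bill(stored_gb=50_000, operations=20_000_000, egress_gb=2_000))
```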


Storage:

Volume discount tiers are commonly combined with performance tiers; below, the Azure model.
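
As a rough illustration of how such volume discounts work, the price per GB typically drops once consumption passes successive thresholds. The brackets and prices below are hypothetical, not Azure's published rates.

```python
# Illustrative progressive pricing: a cheaper $/GB once usage passes each bracket.
# Brackets and prices are hypothetical, for illustration only.
BRACKETS = [
    (50_000, 0.020),        # first 50 TB/month at $0.020/GB
    (450_000, 0.018),       # next 450 TB/month at $0.018/GB
    (float("inf"), 0.016),  # everything beyond at $0.016/GB
]

def tiered_storage_cost(stored_gb):
    cost, remaining = 0.0, stored_gb
    for bracket_size, price in BRACKETS:
        billed = min(remaining, bracket_size)
        cost += billed * price
        remaining -= billed
        if remaining <= 0:
            break
    return cost

print(tiered_storage_cost(600_000))  # 600 TB spread across the three brackets
```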




Storage operations: 

Operating on data is not necessarily expensive: reads, tiering changes, retrieval of properties, and so on. The bill can nevertheless climb quickly, particularly on certain offers.


We can see that the tiers with the lowest storage costs have the highest operation costs, since those tiers are not designed… for activity, QED!

Modeling storage costs can be tricky, because you need to understand what an application actually does with the data.

With smaller but more numerous files (IoT, emails, etc.), which are the real driver of system inflation, operations multiply and so do costs. Assume that each file generates on the order of 4 write operations and 2 read operations!
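
A minimal sketch of that effect, reusing the 4-writes-plus-2-reads assumption above, with purely illustrative per-10,000-operation prices:

```python
# How many small files translate into operation costs (4 writes + 2 reads per file).
# The per-operation prices are illustrative placeholders, not real provider rates.
WRITE_PRICE_PER_10K = 0.10   # $/10,000 write operations (illustrative)
READ_PRICE_PER_10K = 0.01    # $/10,000 read operations (illustrative)

def operation_cost(file_count, writes_per_file=4, reads_per_file=2):
    writes = file_count * writes_per_file
    reads = file_count * reads_per_file
    return (writes / 10_000) * WRITE_PRICE_PER_10K + (reads / 10_000) * READ_PRICE_PER_10K

# 100 million IoT messages or emails per month: the files are tiny,
# but the operations alone already add up.
print(f"${operation_cost(100_000_000):,.0f} per month in operations")
```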




Billing can soar!


Download:

Data retrieval is billed as part of the storage cost, while downloading outside the Cloud region incurs a "transfer cost" linked to bandwidth.

Download in the Cloud region: 

Retrieval transfer costs are rarely large sums. On Azure, for example, accessing any amount of data in the Hot tier carries no transfer cost; accessing 5 TB would cost around $50 if retrieved from Cool, or $100 if accessed from Archive.
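
Those orders of magnitude imply roughly $0.01 per GB retrieved from Cool and $0.02 per GB from Archive: 5 TB is about 5,120 GB, so 5,120 GB × $0.01/GB ≈ $51 and 5,120 GB × $0.02/GB ≈ $102.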

Downloading outside the Cloud region: 

This is not the same story... 





As noted, download costs can become enormous: 

- when creating instances in a separate Cloud region;
- if the whole company starts doing data science on local data, which is more and more common ;-) (a rough estimate follows)
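
As a rough illustration of the second scenario, the bill scales linearly with the volume pulled out of the region; the per-GB egress price below is a hypothetical placeholder, not a published rate.

```python
# Rough estimate of monthly cross-region download ("egress") costs.
# The per-GB price is a hypothetical placeholder, not a published provider rate.
EGRESS_PRICE_PER_GB = 0.08  # $/GB transferred out of the Cloud region (illustrative)

def monthly_egress_cost(analysts, gb_pulled_per_analyst):
    return analysts * gb_pulled_per_analyst * EGRESS_PRICE_PER_GB

# 40 data scientists each pulling 500 GB per month out of the Cloud region.
print(f"${monthly_egress_cost(40, 500):,.0f} per month")  # -> $1,600 per month
```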


Systems that continually grow, with a strong impact! 

______________________________________



As we have seen, Cloud costs are based on storage, storage operations, and download.

Given that a very large proportion of data serves no purpose, continuous cleaning is well worth doing.

Forrester estimates that between 60% and 73% of a company's data is never used for analytics. And data centers weigh heavily on the environment: servers already consume 1.5% of the world's electricity... IDC predicts a massive increase in stored volume, to 175 zettabytes, a 300% increase over today. How far will we go?


Detect and set aside what is not useful in the systems.

_______________________________________

  • {openAudit} retrieves the relevant logs from the audit databases, which indicate whether a piece of data is used or not: does it appear in a dashboard, is it queried by this or that tool? This information can be found in Google Cloud Platform's Stackdriver, in AWS CloudWatch, or in Azure Monitor.

  • Starting from the data that is actually used, {openAudit} deconstructs the flows back down to the operational sources (= data lineage), and thus identifies the system's "living branches" and, by deduction, its dead branches. All that remains is to decommission the latter continuously to keep the system at its right size (see the sketch below).
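
The principle can be sketched as follows. This is not openAudit's actual implementation, only a hedged illustration: assume the access records have already been extracted from the provider's audit logs into (dataset, last read) pairs, and flag anything untouched beyond a chosen idle window as a decommissioning candidate.

```python
# Hedged sketch (not openAudit's actual code): flag datasets with no recent reads.
# Assumes access records were already extracted from the audit logs
# (Stackdriver / CloudWatch / Azure Monitor) into (dataset, last_read) pairs.
from datetime import datetime, timedelta

def decommission_candidates(last_read_by_dataset, max_idle_days=180, now=None):
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(name for name, last_read in last_read_by_dataset.items()
                  if last_read < cutoff)

usage = {
    "sales.orders_2019_backup": datetime(2023, 1, 10),
    "sales.orders": datetime(2024, 6, 2),
    "marketing.campaign_tmp": datetime(2022, 11, 5),
}
print(decommission_candidates(usage, now=datetime(2024, 6, 30)))
# -> ['marketing.campaign_tmp', 'sales.orders_2019_backup']
```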


 Conclusion

_______________________________________

Cloud pricing remains nebulous, but it is tending to harmonize.

Each need will find its answer, with its strengths and limitations. What is certain is that costs are exploding for the simple reason that volumes keep growing.

We believe that one of the essential challenges will be to contain the size of these systems, or even to shrink them, by continuously detecting and removing dead weight.

{openAudit} can enable this virtuous cycle.
