Even when they comply with strict criteria and certifications, such as the Uptime Institute's Tier ratings, cloud data centers are not immune to failures or outages. These can be technical, organizational or human. Yet very few companies factor this risk into their cloud projects...
OVH customers will remember the end of 2017. In November, the hosting provider suffered a major outage affecting its infrastructure. As a result, more than 3 million websites, including big names in media, e-commerce and banking, were impacted!
More recently, in June, Visa's European payment system failed. Although 91% of UK cardholder transactions were processed as normal, according to the company's European manager, some retailers were forced to abandon their card terminals and accept cash only.
These two examples show that even the data centers and infrastructure of major corporations are not immune to failure. And yet, six out of ten companies migrate their data and part of their information system to the cloud without fully assessing the cost of a service interruption.
Veritas' "The Truth in Cloud" report
This is what we discover when we read the report entitled "The Truth in Cloud" by Veritas.
What is most surprising is the contrast between the eagerness to take the plunge into the cloud and the lack of awareness of the risks involved. On the one hand, 99% of IT managers say their company will migrate to the cloud within the next 12 to 24 months.
On the other hand, a majority of French respondents admit that they have not assessed what a service interruption would cost their company. And yet the threat is real. In a survey of 1,000 data center operators by the Uptime Institute (the body that issues Tier certifications), between 25% and 46% of respondents had experienced an outage with business impact.
Admittedly, that survey dates back to 2014, but its findings are partly confirmed by the Veritas report: 36% of respondents (41% in France) estimate the total duration of service interruptions at less than 15 minutes per month, while a third (20% in France) say interruptions can reach 30 minutes or more!
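To put these figures in perspective, monthly downtime translates directly into an availability percentage. Here is a quick back-of-the-envelope sketch (assuming a 30-day month for simplicity):

```python
# Convert monthly downtime into an availability percentage (30-day month assumed).
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

def availability(downtime_minutes: float) -> float:
    """Return uptime as a percentage of a 30-day month."""
    return 100.0 * (1 - downtime_minutes / MINUTES_PER_MONTH)

print(f"{availability(15):.3f}%")  # 15 min/month -> 99.965%
print(f"{availability(30):.3f}%")  # 30 min/month -> 99.931%
```

Even 30 minutes a month still looks like "three nines" on paper, yet a half-hour outage at the wrong moment can cost real customers, as the next section shows.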
Human error: 70% of cloud failures
Can you imagine a consumer coming back half an hour later to shop on an e-commerce site that was unreachable? Every study shows that Internet users are rarely loyal to a site; above all, they are looking for the best deal!
The primary function of a data center is to provide constant uptime for the critical applications it houses. Unexpected failures can nevertheless occur, and operators need to be proactive in preventing them.
Four causes account for the majority of breakdowns.
Human error
Whether they are involved in design, installation or maintenance, people are often the cause of failure or malfunction. The Uptime Institute reports that almost 70% of failures can be attributed to human error.
It has to be said that many factors invite error: illogical operating sequences, poor or missing labeling, neglected maintenance and inadequate training.
Cooling failure
Overheating can bring down a data center or a company's server room. Case in point: a trainee who, during the lunch break, plays administrator and fiddles with the settings, putting the air conditioning on standby.

When equipment gets too hot, it shuts down to protect itself, resulting in an outage. In our trainee's case, the company made two mistakes. First, it left the server room open during the lunch break (beyond the risk of an air-conditioning mishap, there is also a risk of data theft or loss). Second, it left the trainee alone in a critical location.
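A basic safeguard against this scenario is continuous temperature monitoring that raises an alert well before servers reach their thermal shutdown point. Below is a minimal sketch; the thresholds are illustrative, and `read_inlet_temp()` and `alert()` are hypothetical placeholders rather than a real API:

```python
import time

WARN_C = 27.0      # illustrative: upper bound of a typical recommended inlet range
CRITICAL_C = 32.0  # illustrative: alert hard before servers shut down to protect themselves

def read_inlet_temp() -> float:
    """Placeholder: in practice, poll a rack sensor (e.g. via SNMP or IPMI)."""
    raise NotImplementedError

def alert(message: str) -> None:
    """Placeholder: page the on-call operator (email, SMS, webhook...)."""
    print(f"ALERT: {message}")

def monitor(poll_seconds: int = 60) -> None:
    """Check the inlet air temperature at a fixed interval and escalate."""
    while True:
        temp = read_inlet_temp()
        if temp >= CRITICAL_C:
            alert(f"Inlet air at {temp:.1f} °C: check the cooling system NOW")
        elif temp >= WARN_C:
            alert(f"Inlet air at {temp:.1f} °C: above the recommended range")
        time.sleep(poll_seconds)
```

The point of the two-level threshold is to buy time: the warning fires while there is still margin to intervene, instead of discovering the problem when the hardware has already powered itself off.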
Wiring problems
Cabling is at the heart of all data center activity, and it must be efficient and fault-free. If the cabling system fails, the whole data center can be put at risk.
Digital threats
Cyberattacks are a growing cause of data center outages. Here too, the origin may be internal, such as an employee falling for a phishing attack, or external, such as an intrusion attempt on the network or a DDoS attack.
And then there are plain equipment failures. In OVH's case, both EDF power feeds and both backup generators failed, leaving the routing room without power.
Whatever the risk, it is essential to strengthen contingency plans. It is also important not to be too quick to blame technicians: through lack of training or an excessive workload, they may unwittingly cause a failure. Stepping up training, particularly on IT security, is therefore essential.
Nor should these sessions exempt IT security managers from tightening their access and password management policies...
Investing in high-performance cabling and periodically checking cooling equipment are also wise precautions.
All these measures must be applied by data center operators, but their customers should also verify that they are properly applied. The long-term future of their business depends on it!