Convinced of the benefits of the cloud, companies are deciding to migrate part or all of their information systems. But they can run into a major obstacle: how do you transfer ever-increasing volumes of data? Physical transfer remains the only practical solution. But it is not without its drawbacks...
Every day, companies are moving more and more workloads to the cloud. Last March, the renowned British Broadcasting Corporation (BBC) migrated over a petabyte (corresponding to the files of more than 23,000 employees) to the cloud.
In early May, Google won a major contract: Twitter decided to transfer over 300 petabytes (PB) of data, along with its Hadoop clusters, to Google Cloud Platform.
Whatever the reasons (greater flexibility, greater security, greater availability...), companies face the problem of transferring huge volumes of data. We're not talking here about migrations of a handful of terabytes, but of petabytes (one petabyte is 1,024 terabytes, roughly a million billion bytes).
1# Rack and plane
Conventional transfer solutions are unsuitable because they take too long. As we saw in our article on "Object storage", NAS protocols are not suited to petabyte-scale volumes either.
The alternative is physical transfer. Many companies offer such services, particularly the IT giants. These include racks capable of storing up to a petabyte of data, which is what Google offers with its GTA data-transfer appliance. But you need to be patient: according to Google, 1 PB can be transferred in 25 days. And the bill can climb quickly: almost 1,800 euros for its 480 TB transfer solution (to which shipping costs must be added).
For its part, IBM supplies storage devices for mass data migration (120 TB each), which are then shipped by UPS. In September 2017, Microsoft announced Azure Data Box, a physical appliance that holds around 100 TB of data and can be accessed via SMB/CIFS. Here too, the devices are shipped by various specialized carriers (DHL, UPS, FedEx...).
2# Semi-trailer
Amazon has been offering various solutions since 2009. These include Snowball, a suitcase-sized rack that can store up to 80 TB. But above all, there's the Snowmobile: a storage server installed in a 14-meter-long container, transported to the company's headquarters by semi-trailer.
This 100 PB storage unit is connected to the customer's network via a broadband line. According to Amazon, the Snowmobile's full storage capacity can be filled in around ten days. The server is then driven to the nearest Amazon datacenter, and the final step is to transfer the data to the cloud.
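A quick back-of-envelope calculation shows what Amazon's "full capacity in around ten days" claim implies about the link between the Snowmobile and the customer's network (assuming decimal units, i.e. 1 PB = 10^15 bytes, and an ideal sustained transfer):

```python
# What sustained ingest rate does filling 100 PB in ~10 days imply?
# Assumptions: decimal petabytes (1 PB = 1e15 bytes), no overhead.

CAPACITY_BYTES = 100 * 10**15   # Snowmobile capacity: 100 PB
FILL_SECONDS = 10 * 24 * 3600   # ~10 days, in seconds

rate_bits_per_s = CAPACITY_BYTES * 8 / FILL_SECONDS
rate_gbps = rate_bits_per_s / 10**9
print(f"Implied sustained ingest rate: {rate_gbps:.0f} Gbit/s")
# Roughly 900+ Gbit/s: far beyond a single ordinary network link,
# hence the need for a dedicated on-site connection.
```

In other words, the "broadband line" in question must sustain close to a terabit per second, which explains why this is not something an ordinary corporate WAN link could replace.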
3# Business at a standstill?
To prevent this highly sensitive transfer (the container can store confidential information) from falling victim to a modern-day stagecoach robbery, Amazon specifies that it relies on "dedicated security personnel, GPS tracking, 24/7 video surveillance...". As with its competitors, all data is encrypted.
This solution seems ideal. But a closer look at the Terms & Conditions reveals that the metadata of transferred files is modified; the only metadata left untouched are the file name and size. All of which complicates synchronization and information management.
Furthermore, the very long transfer times (from the company's servers to the AWS racks or truck, then from the provider's datacenter to the cloud) can force companies to "freeze" certain activities. If your company has a very fast connection (10 Gbps, for example), a petabyte will take around ten days to migrate directly from your datacenter to a provider's cloud platform, making a purely network-based transfer an option well worth considering.
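The "around ten days" figure for a direct network transfer can be checked with a short calculation (assuming an ideal, fully saturated 10 Gbps link with no protocol overhead, and 1 PB = 10^15 bytes):

```python
# Time to move 1 PB over a dedicated 10 Gbit/s connection.
# Assumptions: decimal petabyte (1e15 bytes), link fully saturated,
# no protocol overhead or retransmissions.

DATA_BYTES = 10**15              # 1 PB
LINK_BITS_PER_S = 10 * 10**9     # 10 Gbit/s

seconds = DATA_BYTES * 8 / LINK_BITS_PER_S
days = seconds / 86400
print(f"Transfer time: {days:.1f} days")
# A little over nine days, consistent with the "around ten days" estimate.
```

In practice, overhead and contention on the link would push the real figure somewhat higher, which is why ten days is a reasonable planning estimate.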
The migration of petabytes is therefore a highly complex undertaking. It requires lengthy preparatory work to study all possible solutions and anticipate malfunctions.