Azure File Sync

Up to now, many hybrid cloud scenarios have been acceptable at best. With Azure File Sync, Microsoft delivers a genuine milestone and paves the way for File Services into the world of hybrid clouds. Reading time: 12 minutes.


With File Sync, cloud storage is not just another storage component bolted onto a hybrid architecture. Instead, File Sync allows you to seamlessly integrate cloud resources into your existing infrastructure: without disrupting business operations, without incurring high initial costs and without exposing users to a completely new ecosystem.

We had the opportunity to work with File Sync as part of a proof of concept and test it together with our customer. Of course, we gained a lot of experience, which we would like to share here.

Since the first few paragraphs describe the current situation of File Services, they might be less interesting for experienced readers. If you are only interested in File Sync itself, you may want to skip ahead to the paragraph “Here comes Azure File Sync”. There you will find the hard facts.

Where hybrid solutions fail

The development of File Services

One of the most common tasks of a local data center is File Services. Historically, this means providing an infrastructure to store data in an organized way. However, this definition only covers the “bare minimum” of what file services have to perform today.

The requirements have continuously expanded over time. For example, everyday business life in a large corporation demands that employees be given very different and granular permissions on the digital data library. Incidentally, that is the point at which OneDrive and SharePoint fail as valid options: complex authorization structures cannot be handled or imported there.

A file service must also enable users to exchange and process data together. And one component that is becoming more and more important is the protection of files from disaster.

There’s a problem looming ahead

So far so good. However, maintaining data on-premises is causing increasing problems. Even small companies already have huge storage requirements, especially when unused files cannot simply be deleted but have to be archived for several years for legal reasons. And not only are companies’ current needs great, the inflow of new data is vast as well. As a result, IT managers have to make crude estimates of future data volumes, which requires prophetic skills in high-growth industries. Overprovisioning or shortages are inevitable.

If you store your data in a local environment, you must also maintain your hardware in-house. This means that either trained specialists must be employed or external help must be brought in. Neither alternative is free of charge, and both tie up resources that might be put to more sensible use elsewhere, for example in the actual business purpose.

No one deletes data

If you transfer your employees’ data to SharePoint or OneDrive, you will be surprised at how rapidly it grows. No user deletes their files; they copy them instead. And that is a very natural reaction, and not an ill-considered one either. If you look at the SLAs of OneDrive or SharePoint Online, you can read that deleted data is automatically and permanently removed from the servers after 90 days. However, this natural caution of users makes it difficult for the company to achieve the big goal of hardware savings. On the contrary: the company may end up paying twice.

A step in the right direction

Many of you will hear the bells ringing: Cloud, cloud, cloud! And this impulse is right. However, it turns out that the cloud cannot solve all the problems. It even brings new ones to the table.

Many managers are uncomfortable with putting all of their company’s data into an IT landscape over which they have no control. In addition, there are legal requirements that raise objections. In other words, only a few companies will consume nothing but infrastructure as a service from the cloud; most will continue to operate data centers on their own responsibility to host some of their data and applications. Which brings us to the hybrid cloud.

What capabilities must a good solution offer?

The hybrid cloud combines the advantages of an on-premises infrastructure with the advantages of the cloud. Confidential and highly frequented data is kept on site. Unused or archived files or those that need to be made available to a broader audience, such as employees in other offices, are migrated to the cloud.

The scenario in question will enrapture IT managers. But those who deal with the technical implementation are often confronted with insurmountable obstacles. If an existing data center is extended with cloud storage, the existing access permissions must be transferred into the hybrid solution. Easier said than done.

Another thing that hampers many hybrid solutions is the integration of the cloud into the on-premises infrastructure. It is often desirable to integrate cloud storage harmoniously into the existing system as a storage expansion. In reality, however, data created in the cloud must first be retrieved to the local data center and rehydrated in order to make it usable (see StorSimple).

Furthermore, a good transition cannot be disruptive. The user should have the same graphical interface with the same look and feel as before. This saves time and money because no workshops or introductory phase are necessary.

Here comes Azure File Sync

The basic idea of Azure File Sync is to enable multiple Windows file servers (server endpoints in File Sync nomenclature) to synchronize with Azure file shares (cloud endpoints in File Sync nomenclature). The entirety of server and cloud endpoints is called a Sync Group.
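To make these terms a little more tangible, here is a minimal PowerShell sketch of how the objects relate to each other. It uses the Az.StorageSync module, which only appeared after the preview period described in this article, and all resource names and the region are placeholders:

```powershell
# Minimal sketch with the Az.StorageSync module; resource group, names and region are placeholders.
Connect-AzAccount

# The Storage Sync Service is the Azure resource that hosts one or more Sync Groups.
$syncService = New-AzStorageSyncService -ResourceGroupName "rg-filesync" `
    -Name "contoso-filesync" -Location "westeurope"

# A Sync Group ties one cloud endpoint (an Azure file share) to its server endpoints.
$syncGroup = New-AzStorageSyncGroup -ParentObject $syncService -Name "corp-data"

# The cloud endpoint points at an existing Azure file share.
$storageAccount = Get-AzStorageAccount -ResourceGroupName "rg-filesync" -Name "contosofiles"
New-AzStorageSyncCloudEndpoint -ParentObject $syncGroup -Name "corp-data-cloud" `
    -StorageAccountResourceId $storageAccount.Id -AzureFileShareName "corp-data"
```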

But how is the synchronization implemented in detail? Which data sets are kept locally, which ones are displaced to the file share? What are the administration and automation options? Are the NTFS permissions on the local servers preserved? We will focus on answering these questions and further details in the following paragraphs.

But first of all, a short remark on the installation of the sync service: File Sync is deployed to Windows servers in the form of an agent. The agent connects and authenticates the server with Azure. The File Sync agent runs on Windows Server 2012 R2 and 2016, and in the future it will also be possible to equip Linux servers with the new software.
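Once the agent is installed, registering the server is essentially a one-liner. Again a hedged sketch: the cmdlet below ships with the Az.StorageSync tooling, is run on the file server itself, and the names refer to the placeholder resources from the previous snippet.

```powershell
# Run on the file server after installing the Azure File Sync agent:
# registers this machine with the Storage Sync Service so it can join Sync Groups.
Register-AzStorageSyncServer -ResourceGroupName "rg-filesync" `
    -StorageSyncServiceName "contoso-filesync"
```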

The main feat: synchronization

Generally, synchronization can take place between several server endpoints (limited to 50 in the Public Preview; this limit will be increased with General Availability in the spring of 2018) and several cloud endpoints (currently only one; this, too, will be increased in the future). For simplicity’s sake, we will initially limit ourselves to one server endpoint and one cloud endpoint and then leverage this knowledge to tackle more complex multi-sync scenarios.

If a user wants to synchronize a local Windows file server with a file share, the user must first select those volumes and folders that are to become part of the file sync topology. Afterwards, these volumes are replicated to the file share. Sensitive data that you do not want to have in the cloud can thus be excluded from the architecture.

In the next step, you can define the minimum storage space to be kept free on the local storage media. Once free space falls below this limit, File Sync starts tiering off the files that have gone unused for the longest time. These files are then stored only in the file share.

In short: the file share contains all data records selected for synchronization; locally, only the currently used files within the user-defined limitations are stored. Microsoft calls the process of caching “hot” files locally and displacing “cold” files cloud tiering.
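In the Az.StorageSync module, this policy is expressed through the cloud-tiering switches of the server endpoint. A minimal sketch, under the same assumptions as the earlier snippets; the local path and the 20 % threshold are invented for illustration:

```powershell
# Assumes exactly one server has been registered with the sync service.
$registeredServer = Get-AzStorageSyncServer -ResourceGroupName "rg-filesync" `
    -StorageSyncServiceName "contoso-filesync"

$syncGroup = Get-AzStorageSyncGroup -ResourceGroupName "rg-filesync" `
    -StorageSyncServiceName "contoso-filesync" -Name "corp-data"

# Sync D:\Shares\Corp into the Sync Group, enable cloud tiering and keep
# at least 20 % of the volume free; tiered files are recalled on demand.
New-AzStorageSyncServerEndpoint -ParentObject $syncGroup -Name "cologne-fs01" `
    -ServerResourceId $registeredServer.ResourceId `
    -ServerLocalPath "D:\Shares\Corp" `
    -CloudTiering `
    -VolumeFreeSpacePercent 20
```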

However, the scenario described above is only a foretaste. In the future, further tiering features will be added to the skillset. Users will have the possibility to pin folders: pinned folders are permanently stored locally. This can be used to meet the requirements of applications that cannot tolerate high latencies.

Finally, it will become possible to specify criteria (e.g. folder affiliation, file type and others) according to which data is tiered off after user-defined intervals. Log files that are rarely viewed but still need to be kept to meet regulatory requirements can thus leave the local storage immediately.

Now, let’s progress to a more involved situation and add more server endpoints to our model. While the original file server is located in your primary data center in Cologne, you want to be able to exchange data and collaborate with a US branch office. So you add a server in Seattle as a second server endpoint of the Sync Group. If your colleagues in Seattle modify data, the changes are passed on to the file share and finally replicated to Cologne. This way, an entire synchronization network can be set up in which all data is consistent (apart from short latency-induced delays).

Since multiple cloud endpoints allow for disaster recovery capabilities, we discuss this scenario with more than one cloud endpoint under Disaster Recovery built-in.

Two File Sync agents synchronized with File Share

All about access

NTFS permissions

Anyone who knows Azure well enough knows that Azure file shares are not capable of handling NTFS permissions. Access is instead granted via shared access signatures (SAS). These are URIs that consist of the URI of the storage resource and an SAS token. The latter defines which permissions the holder possesses and carries out authentication.
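For illustration, this is roughly what such direct access looks like in practice: a SAS token is generated for the share and appended to the resource URI. A sketch using the Az.Storage module; the account, share name and validity period are placeholders:

```powershell
# Sketch: a read/list SAS for an Azure file share, valid for eight hours.
$ctx = New-AzStorageContext -StorageAccountName "contosofiles" `
    -StorageAccountKey "<storage-account-key>"

$sasToken = New-AzStorageShareSASToken -ShareName "corp-data" -Context $ctx `
    -Permission "rl" -ExpiryTime (Get-Date).AddHours(8)

# Resource URI + SAS token = the URI a client actually uses.
"https://contosofiles.file.core.windows.net/corp-data$sasToken"
```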

This scheme is of course not practicable for the complex authorization structures of modern companies. It is a considerable shortcoming that can stand in the way of cloud projects. With File Sync, Microsoft had a change of heart: File Sync transfers existing NTFS permissions to the file share during migration. If users access files in the share via the agent, they are subject to the same permissions that apply locally. This is a perfect scenario for integrating cloud components into an existing IT infrastructure.

It should be emphasized once again that access must happen via File Sync. For direct “cloud access”, SAS or Azure admin rights are still required. Since many companies would like to give their employees direct access to file shares without the detour via File Sync, Microsoft is developing in this direction: cloud-native access should be possible by the middle of next year (2018).

Partial upload and download

But how do I access data that is no longer stored on my local server but only exists in the cloud? It couldn’t be simpler: These files are still displayed (greyed out) in the local namespace.

If I use data that is no longer available locally, only the chunks necessary for processing are downloaded. For example, if I watch a video, File Sync merely streams the parts I actually watch and does not download the entire file.

The same applies to the data upload: If I am working on an Excel file, but only change one entry, File Sync will only transfer the modification of this entry to the file share and all other connected file servers.

Conflict resolution

If we talk about File Sync, we talk about collaboration. While you are editing an Excel document at the Cologne headquarters (to return to the previous example), an employee in Seattle can access and change the same file. Conflicts are inevitable.

The conflict resolution mechanism currently in place is known from OneNote: if a file is edited by several users at the same time, File Sync selects and stores a “winner” file. The colliding documents are saved as alternative versions in the same location. Users then have to decide manually what happens to the competing versions.

However, this is not going to be a permanent solution. A team is currently working on implementing file locking: if a file is in use, a second party must wait until the file is released again.

High availability

Disaster Recovery built-in

A File Sync architecture offers recovery options both on the data center side and on the Azure side. Let’s look at both:
If a local server fails, the agent can simply be installed on a working replacement server. Initially, this server provisions itself with metadata only, so that it can resume operation as a file service as quickly as possible. Consequently, data already appears in the namespace and requested files are streamed immediately. After that, the remaining files designated for offline use are downloaded successively.

But what if your entire data center is unavailable? Use the cloud! Currently, this means deploying an Azure VM. You can either map the file share as a network drive via SMB or place the file sync agent on the VM. Et voilà!
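A quick sketch of the SMB variant on such a replacement VM. The storage account, share name and key are placeholders, and port 445 must be reachable from the VM:

```powershell
# Store the storage account credentials on the VM, then map the share as Z: over SMB.
cmdkey /add:contosofiles.file.core.windows.net /user:AZURE\contosofiles `
    /pass:"<storage-account-key>"

New-PSDrive -Name Z -PSProvider FileSystem `
    -Root "\\contosofiles.file.core.windows.net\corp-data" -Persist
```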

However, the developers want to eliminate this workaround in the long run. In the (hopefully not too distant) future, users will be able to access file shares directly in the cloud: not via SAS, but via standard NTFS permissions. This opens up a cloud-native scenario: if your company has branch offices that need to get along without local servers, employees will be able to use a file share (including NTFS permissions) without the detour via a local server.

Earlier, we briefly mentioned that cloud-to-cloud sync, i.e. multiple cloud endpoints, is a future scenario. This has several advantages. To illustrate them more vividly, we return to your Cologne headquarters and the American branch office.

If you operate the File Share in Germany Central, your American colleagues will (maybe) complain about the “long line”. Well, you can just create another file share in US West that synchronizes with the existing cloud endpoint in Germany Central. Via the cloud-to-cloud-sync, this extends to your data center in Cologne. Now you have a consistent Sync Group that includes two server endpoints (Cologne, Seattle) and two cloud endpoints (Germany Central, US West).

Suppose a fire breaks out in the European data center. The file share can no longer be used – the underlying hardware is burnt up. The File Sync agent on the Cologne file server detects the connectivity issues and simply connects to the nearest cloud endpoint containing the relevant data instead. In our example, for lack of alternatives, that is US West. However, if there were another cloud endpoint in, say, US South, the Cologne server would prefer it due to its geographical (and the associated network) proximity.

Of course, it is important in all this to ensure that the given laws and guidelines are adhered to. It may not be possible to simply synchronize data across national borders.

Failover (Disaster Recovery) of branch office

 

Backup

Azure File Sync also eliminates the need for a local backup solution. The local server no longer needs to be slowed down at regular intervals because a backup has to be created and moved to a secondary location. Instead, Azure Backup Vault can be used to take incremental snapshots of the file share that are stored miles away, even on another continent. And all this without disrupting local operations.

In all of this, you can define the frequency and retention, and restore individual files or an entire file share to any point in time.
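For illustration, a snapshot can also be taken by hand; the backup service simply automates this on a schedule. A hedged sketch with the Az.Storage module (the .CloudFileShare.Snapshot() call depends on the module version, and the names are placeholders):

```powershell
# Take a manual, incremental point-in-time snapshot of the file share.
$ctx = New-AzStorageContext -StorageAccountName "contosofiles" `
    -StorageAccountKey "<storage-account-key>"

$share = Get-AzStorageShare -Name "corp-data" -Context $ctx
$snapshot = $share.CloudFileShare.Snapshot()   # read-only snapshot of the share
$snapshot.SnapshotTime                         # timestamp referenced later for restores
```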

Encryption

If this much data is shipped around without the user reviewing every individual transfer, security questions naturally arise. During transmission, files are protected with 256-bit SSL encryption. The file share itself is secured with Microsoft-managed keys. Here, too, a look at the future: user-managed keys will come.
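If you want to be strict about encrypted transport, the storage account behind the file share can be restricted to secure connections only. A small sketch; the resource names are placeholders:

```powershell
# Reject unencrypted connections to the storage account that backs the file share.
Set-AzStorageAccount -ResourceGroupName "rg-filesync" -Name "contosofiles" `
    -EnableHttpsTrafficOnly $true
```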

The process of migration

Although Azure File Sync can be used without an on-premises infrastructure (as described above), the primary use case will be a hybrid scenario. If your data center is not in its infancy, you will most likely not get by without a data migration.

Given the huge amounts of data that are common today, in some cases even an MPLS connection with an Azure data center will not be sufficient to quickly migrate all files into the cloud. Microsoft provides an import service for this purpose. You send your media to an Azure data center, where the data is copied to cloud storage.

A list of hardware requirements and further information can be found here.
Of course, this migration does not happen immediately and users will have changed some data in the meantime. How is this delta bridged?

This problem was one of the biggest hurdles we faced in our project, because the data on the Windows server that we wanted to include in the sync topology is constantly in use and being changed.

Microsoft is currently working on a solution to this problem: when the file share is initially synced with the local server, File Sync detects where changes have occurred. The uploaded copies that have since been superseded receive a tag that identifies them as outdated, while the up-to-date files changed locally by users in the meantime are also migrated to the file share. The user can then determine what happens to the obsolete records, for example with the help of a PowerShell script. The tags make this much easier, since they allow all outdated files to be addressed at once.
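Since the tagging mechanism was not final at the time of writing, the following is a purely hypothetical sketch: it assumes the superseded copies can be recognized by a naming convention (the suffix used here is invented) and simply lists them so you can decide what to do with them, for example pipe them into Remove-AzStorageFile.

```powershell
# Purely hypothetical: assumes superseded copies carry a recognizable suffix.
# The real tag or attribute that File Sync will use was not final at the time of writing.
$ctx = New-AzStorageContext -StorageAccountName "contosofiles" `
    -StorageAccountKey "<storage-account-key>"

# Lists the outdated copies in the share root; pipe to Remove-AzStorageFile to delete them.
Get-AzStorageFile -ShareName "corp-data" -Context $ctx |
    Where-Object { $_.Name -like "*.outdated" }
```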

As part of our PoC, we were fortunate enough to find a legacy Windows file server in a remote location of a company otherwise populated by NetApp appliances. Apart from the fact that the File Sync agent is only compatible with Windows Server (with Linux support to come, as mentioned earlier), the protocols of some storage solutions make the synchronization process more difficult. For example, a NAS device does not notify the File Sync agent when files are changed, so the agent has to scan for changes regularly on its own, which weakens the overall performance of the storage medium.

Also interesting ...

More on Cloud Computing? Here are some reading recommendations:

Here you learn more about your roadmap into the Cloud.

Who are the big players in the cloud business? Read more.

At a certain stage everybody has to talk about money. Here is more on Cloud Budget.


Interested in the new file service by Azure?
Feel free to contact us!