This is part four of the Cloud Reference Architecture (CRA) blog series, and here I am going to explain the concept of the Virtual Data Center in Azure and, more broadly, in any other cloud.
- Check out part 1 – Cloud Reference Architecture or CRA – Foundation
- Check out part 2 – Cloud Financial Governance
- Check out part 3 – Enterprise Architecture
In fact, everything discussed in this blog series only scratches the surface of what I cover in the Cloud Migration Handbook. The book covers other major topics for architects and security professionals, such as:
- Chapter 1: Practical Foundations for Cloud Computing.
- Chapter 2: Types of cloud migration.
- Chapter 3: Cloud Governance.
- Chapter 4: Cloud Reference Architecture (CRA).
- Chapter 5: Security in the Cloud.
Previous Talk: Cloud Reference Architecture (CRA)
In a previous blog post, we introduced the concept of the cloud reference architecture (CRA) as defined in the ISO/IEC 17789 standard, and why you should consider having one. The goal of a cloud reference architecture is to strike a balance between cloud agility on one side and security and governance on the other.
Simply put, the Cloud Reference Architecture (CRA) helps organizations address the need for detailed, modular and current architecture guidance for building solutions in the cloud.
We’ve also covered that the Cloud Reference Architecture (CRA) serves as a collection of design guidance and design patterns that support a structured approach to deploying services and applications in the cloud. This means that every workload is deployed with security, governance, and compliance in mind from day one.
The need for the Virtual Data Center Model
Over the years, I have seen many customers deploy workloads in Azure without forward thinking or without adopting a framework (a.k.a. a reference architecture). Usually, they are at the initial stage of adopting the cloud, or in a “let’s see how this thing works first” mode. From my experience talking to many customers, the following pattern happens most frequently.
“Let’s try to do this in the cloud,” someone shouts, thinking that it is time to evolve, start exploring the cloud, and see how it feels. They might have some Azure credit in their EA agreement, or they consult their CSP partner to get an Azure subscription and are willing to spend some dollars on it. Perhaps they have a couple of people certified in Azure who want to get their hands dirty.
Usually, these attempts happen without proper sizing or cost optimization in mind, and the goal is just to put some workloads out there. Sometimes it is a new, small, noncritical project, and they want to give the cloud a chance. “We created a subscription in Azure,” they shout. Of course, the networking and security folks get excited, as they want to be part of this new world of cloud computing, or perhaps they want to preserve their jobs, who knows.
Everyone is a global admin from the first day, of course, or else things will break (or so they think). The security team puts on the cloud hat and starts deploying a virtual machine with their favorite firewall engine (Palo Alto, I assume), and of course we need two of them for high availability. Well done!
The networking team spends the whole weekend trying to establish a site-to-site VPN between that subscription (VNET) and their on-premises network, and when it finally works, they celebrate big time. You can’t blame them: it is a big deal when you try it for the first time. We have all celebrated that moment, the happiness of being able to PING a machine in Azure from your laptop at the office!
All eyes are now on the infrastructure team. Those who spent a lifetime deploying big virtualization clusters on-premises are now tasked with creating a couple of Azure virtual machines to host some applications. Without a full understanding of the different Azure virtual machine types or any of the cost optimization best practices (auto-shutdown, Azure Hybrid Benefit, or reserved instances), they just pick a VM SKU that looks right at the moment, and mission accomplished.
Now it is time for the application team to deploy their applications on those newly created virtual machines, assuming that the disks are optimized for the right IOPS and throughput, while the billing clock keeps ticking. In the end, they get their application running in Microsoft Azure, and it is a day of victory for everyone.
Weeks later, the head of IT is walking around the floor and gets a call. “We have a new application that we need to deploy next week,” someone says. With full confidence, he replies, “Let’s create a new Azure subscription and repeat the success story.” So a new subscription is born, and everyone gets busy deploying firewalls, VPN connectivity, and a couple of VMs. Oh, wait: everyone is a subscription admin, or else things will break, or worse, everyone is a global admin.
You can see where this is going, right? They might acquire a new company or need to deploy a new application, and of course a new subscription is created and the whole story repeats. Perhaps this time the security team was not involved, and they end up with a subscription hosting production workloads without any security element (firewall).
Over time, managing all those subscriptions, maintaining all those firewalls, monitoring VPN tunnel health, and tracking changes becomes a challenge. What started as an exploration of the Azure lands turns into islands of workloads deployed here and there. Governance is lost, and security is a nightmare.
But what if there is a better way to do things? What if we paused for a minute and adopted a reference architecture that works for today’s and tomorrow’s needs? A reference architecture that helps the company deploy different workloads in Azure, or any cloud, with governance and security in mind from day one. An architecture that works for any organization, regardless of its size or needs.
I hope you are still reading and excited about what is coming next. Let’s try this: what if we introduced a new subscription that acts as a security and governance guardian? A subscription that helps solve all those problems by introducing a new abstraction layer in the middle. Into this new subscription (later called the Shared Services subscription), we move all the firewalls from the other subscriptions and deploy only one set of firewall devices. Moreover, instead of having a VPN from each subscription to your on-premises infrastructure, we initiate just one VPN tunnel from the shared services subscription to on-premises. Then, from each of the other subscriptions, we create a VNET peering in Azure only to the shared services subscription. A new hub and spoke model starts to emerge to organize everything in the Azure land.
From now on, we will call this the Shared Services Model. We deploy our security, management, and connectivity infrastructure in one HUB subscription and offer these components as a service to our line of business (LOB) applications running in their respective subscriptions/VNETs. If we think about it, this model has a lot of advantages over the previous one.
First, we establish what the security world calls segregation of duties. This means that the network and security teams manage the shared services subscription and are the only people allowed to modify firewall settings or alter connectivity components (hybrid connectivity), while application developers or DevOps get full access to their respective LOB subscriptions. The DevOps team can have subscription owner rights on their own subscriptions to innovate and do crazy stuff in their own space, but they can’t alter (and compromise) the firewall and networking components, as those are hosted in a separate subscription (the shared services subscription).
Second, we eliminate the need to have connectivity and security components (firewalls) in each of the LOB subscriptions, which minimizes cost and management overhead. Instead, we have a well-defined security policy that is maintained at the shared services level.
DevOps teams are now empowered to do whatever they want in their own subscriptions (ideally, the security team defines Azure policies that prevent developers from accidentally creating public IPs in their subscriptions and exposing workloads directly to the internet), while the security and networking teams only need to manage things at the HUB level.
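To make the public IP guardrail above more concrete, here is a minimal sketch in Python. `DENY_PUBLIC_IP_RULE` uses the standard Azure Policy rule shape (an `if` block with `field`/`equals` and a `then` block with the `deny` effect) to block resources of type `Microsoft.Network/publicIPAddresses`. The `evaluate` helper and the resource names are hypothetical, written only to show which resources such a rule would block; this is not how Azure itself evaluates policies.

```python
import json

# Azure Policy "policyRule" that denies any resource whose type is a public IP address.
# Assigned at the scope of a LOB subscription, it would block public IP creation there.
DENY_PUBLIC_IP_RULE = {
    "if": {
        "field": "type",
        "equals": "Microsoft.Network/publicIPAddresses",
    },
    "then": {"effect": "deny"},
}


def evaluate(rule: dict, resource: dict) -> str:
    """Hypothetical, simplified evaluator for a single field/equals condition.

    Returns the rule's effect when the condition matches, otherwise "allow".
    It exists only to illustrate the rule; Azure evaluates policies itself.
    """
    condition = rule["if"]
    value = resource.get(condition["field"], "")
    # This sketch compares resource type names case-insensitively.
    if value.lower() == condition["equals"].lower():
        return rule["then"]["effect"]
    return "allow"


if __name__ == "__main__":
    print(json.dumps(DENY_PUBLIC_IP_RULE, indent=2))
    public_ip = {"type": "Microsoft.Network/publicIPAddresses", "name": "hr-app-pip"}
    vm = {"type": "Microsoft.Compute/virtualMachines", "name": "hr-app-vm"}
    print(evaluate(DENY_PUBLIC_IP_RULE, public_ip))  # deny
    print(evaluate(DENY_PUBLIC_IP_RULE, vm))         # allow
```

Keeping the rule as plain data means the security team can review and version it like any other code, which fits the idea of security being defined centrally rather than in each LOB subscription.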
The idea I am trying to convey is that with a good reference architecture in place, IT and security teams can extend their trust to the Azure land using a proven and widely adopted hub and spoke architecture, while carefully considering the unique nature of the cloud. This includes software-defined networking (SDN), where networking is defined by code rather than by wires, and identity and access management, which, if designed right, enables proper segregation of duties between different stakeholders.
Security becomes a shared responsibility in the cloud, where the nature of the risks is different, but the role of IT security stays the same as we start to leverage more security as a service and threat intelligence, and learn how each Azure component should be securely configured. Compliance and monitoring are hugely impacted in the cloud, and without the proper mindset, things might fall apart.
One of the most underestimated elements when considering the cloud, and the source of one of the most common arguments I hear from customers, is a misunderstanding of the SDN component of any cloud platform. In the cloud, everything is done in code. Even if you are clicking around a fancy web management portal, API calls are being made to serve every request. A full networking stack can be defined in a JSON template and unleashed to create a full data center network. RIP broadcast and multicast, as they don’t exist in the cloud. No more VLANs and switches. Routing tables are not defined inside firewalls and switches (to some extent) but are applied to VNETs and subnets. L4 firewalls are now coded within subnet boundaries or around a group of machines with the same tag (search the internet for the terms Micro-Segmentation or Application Security Groups in Azure).
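As an illustration of a network defined in a JSON template, the sketch below builds a minimal Azure Resource Manager (ARM) template for a hub VNET using Python’s standard `json` module. The template structure (`$schema`, `contentVersion`, and a `Microsoft.Network/virtualNetworks` resource with `addressSpace` and `subnets`) follows the ARM format, while the VNET name, region, address ranges, and the `apiVersion` value are assumptions. `GatewaySubnet` is the subnet name Azure requires for a VPN gateway; the other subnet names are hypothetical.

```python
import json


def vnet_template(name: str, location: str, address_prefix: str,
                  subnets: dict[str, str]) -> dict:
    """Build a minimal ARM deployment template that declares one VNET.

    `subnets` maps each subnet name to its address prefix (CIDR).
    """
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Network/virtualNetworks",
                "apiVersion": "2020-11-01",  # assumption: pick an API version your tooling supports
                "name": name,
                "location": location,
                "properties": {
                    "addressSpace": {"addressPrefixes": [address_prefix]},
                    "subnets": [
                        {"name": subnet_name, "properties": {"addressPrefix": prefix}}
                        for subnet_name, prefix in subnets.items()
                    ],
                },
            }
        ],
    }


if __name__ == "__main__":
    hub = vnet_template(
        "hub-vnet", "westeurope", "10.0.0.0/16",
        {"GatewaySubnet": "10.0.0.0/27", "firewall": "10.0.1.0/24", "shared-services": "10.0.2.0/24"},
    )
    print(json.dumps(hub, indent=2))  # the JSON document you would hand to a deployment
```

The resulting JSON could then be deployed with the Azure CLI or PowerShell; the point is that the whole network becomes a text document you can version, review, and redeploy.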
Even the way you interact with cloud services is now different, as most of the time these services are exposed over the internet (think of a piece of code trying to access a storage account, and don’t be surprised to find out this is happening over the internet and not inside your lovely Azure network). Of course, some technologies exist to address this (like VNET service endpoints), but they require special configuration.
An approach to isolation, security, and trust in the cloud
[It is not purely a networking topic, and it is not something you buy in Azure; it is a philosophy.]
I am not going to teach you here how Azure works, but I want to give you an example of how the whole hub and spoke model works. While VLANs are the heroes of the on-premises world, in Azure things are virtual, so it is no surprise that Azure has a construct called a Virtual Network, or VNET. A VNET is an isolation boundary in Azure with no default ingress endpoint. We shouldn’t think of a VNET the same way we think of a subnet in the on-premises world: a VNET can have one or more subnets, and you define the address space of a VNET in advance, with subnets later consuming portions of that VNET-level address space.
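Because subnets must consume portions of the VNET address space, a quick pre-deployment check can catch layout mistakes. This is a minimal sketch using Python’s `ipaddress` module; `validate_subnets` and the address ranges are hypothetical, and Azure performs its own validation at deployment time (it also reserves a few addresses in every subnet, which this sketch ignores).

```python
import ipaddress
from itertools import combinations


def validate_subnets(vnet_prefix: str, subnet_prefixes: list[str]) -> list[str]:
    """Check that subnets fit inside the VNET address space and do not overlap.

    Returns a list of human-readable problems; an empty list means the layout is valid.
    """
    vnet = ipaddress.ip_network(vnet_prefix)
    subnets = [ipaddress.ip_network(p) for p in subnet_prefixes]
    problems = []
    for subnet in subnets:
        if not subnet.subnet_of(vnet):
            problems.append(f"{subnet} is outside the VNET address space {vnet}")
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")
    return problems


if __name__ == "__main__":
    print(validate_subnets("10.1.0.0/16", ["10.1.1.0/24", "10.1.2.0/24"]))  # []
    print(validate_subnets("10.1.0.0/16", ["10.1.1.0/24", "10.2.0.0/24"]))
```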
It is an isolation boundary in the sense that workloads in different VNETs can’t talk to each other by default, while workloads in subnets of the same VNET can talk to each other and even get out-of-the-box name resolution. Azure takes care of routing between subnets in the same VNET, so machines in different subnets within a VNET can reach each other. However, you can control or override this default behavior by configuring User Defined Routes (UDRs). This is powerful, because we can now force traffic exiting those subnets to go to a hub subnet where we maintain a firewall, so traffic between subnets gets inspected. This is also how we can enable our virtual machines in Azure to route traffic to our on-premises infrastructure.
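The effect of a UDR can be sketched as a route lookup. Azure picks the route with the longest matching prefix and, when several routes have the same address prefix, prefers a user-defined route over a BGP route over a system route. The Python below models only that selection logic; the `Route` class, the route table, and the firewall address `10.0.1.4` are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# When routes share the same address prefix, a user-defined route (UDR)
# beats a BGP route, which beats a system (default) route.
SOURCE_PRIORITY = {"User": 0, "BGP": 1, "Default": 2}


@dataclass(frozen=True)
class Route:
    prefix: str                        # destination CIDR
    next_hop_type: str                 # e.g. "VnetLocal", "Internet", "VirtualAppliance"
    next_hop_ip: Optional[str] = None  # only used for "VirtualAppliance"
    source: str = "Default"            # "Default" (system), "BGP", or "User" (UDR)


def select_route(routes: list[Route], destination: str) -> Optional[Route]:
    """Pick the effective route: longest prefix match, then route source priority."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, SOURCE_PRIORITY[r.source]),
    )


# Hypothetical route table for subnet 10.1.1.0/24 in spoke VNET 10.1.0.0/16.
# 10.0.1.4 stands in for the firewall in the hub VNET.
SPOKE_ROUTES = [
    Route("10.1.0.0/16", "VnetLocal"),                                    # system route
    Route("0.0.0.0/0", "Internet"),                                       # system route
    Route("10.1.2.0/24", "VirtualAppliance", "10.0.1.4", source="User"),  # UDR: other subnet via firewall
    Route("0.0.0.0/0", "VirtualAppliance", "10.0.1.4", source="User"),    # UDR: everything else via firewall
]

if __name__ == "__main__":
    for ip in ("10.1.1.20", "10.1.2.10", "192.168.10.5"):
        route = select_route(SPOKE_ROUTES, ip)
        print(ip, "->", route.next_hop_type, route.next_hop_ip)
```

In this table, traffic to VNET addresses without a UDR (such as `10.1.1.20`) stays on the system `VnetLocal` route, while traffic to the `10.1.2.0/24` subnet and to everything else is sent to the firewall. Inspecting traffic in both directions between two subnets would need a matching UDR on the other subnet as well.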
You might ask, then: if I have one big VNET with multiple subnets where I can deploy all my workloads, why do I need multiple VNETs? Well, a VNET is hosted inside a subscription like any and every Azure resource. In fact, every resource in Azure is a child of a subscription object, and this is how Microsoft reports the cost of resources per subscription. A single VNET can’t span multiple subscriptions, so if you want to create more than one subscription (there are many reasons to do that; check part 3 about Enterprise Architecture and Subscription Design Models), then you need to create a VNET for each subscription.
For example, if you want to deploy the HR app and the Marketing app in Azure, each in its own separate subscription, then you need to create a VNET in each subscription to host your virtual machines. You can’t create one VNET and stretch it across the two subscriptions. Going back to the hub and spoke model and the idea of having a shared services HUB in its own subscription, you end up with three VNETs, one per subscription. By default, VNETs have no ingress endpoint, as they are an isolation boundary, so you can’t expect a VNET to receive traffic from the outside world or from another nearby VNET. The way to make different VNETs talk to each other is to define a relationship between them called VNET peering, which plays a role similar to a VPN tunnel in the on-premises world. So, in the hub and spoke model we talked about, the HR App VNET will have a VNET peering with the shared services VNET (the hub), and so will the Marketing App VNET. From the other side, since peering is a one-way relationship, the shared services VNET will also have a VNET peering with both the HR and Marketing VNETs.
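Because each peering is a one-way relationship, a pair of VNETs only communicates once a peering exists in both directions. The minimal Python model below, with hypothetical VNET names, builds the peerings for the HR and Marketing example and checks reachability. It also shows that peering is not transitive: HR and Marketing both peer with the hub, yet they are not peered with each other.

```python
def hub_and_spoke(hub: str, spokes: list[str]) -> set[tuple[str, str]]:
    """Create both one-way peerings (hub->spoke and spoke->hub) for every spoke."""
    peerings = set()
    for spoke in spokes:
        peerings.add((hub, spoke))
        peerings.add((spoke, hub))
    return peerings


def connected(peerings: set[tuple[str, str]], a: str, b: str) -> bool:
    """Two VNETs exchange traffic directly only if a peering exists in each direction."""
    return (a, b) in peerings and (b, a) in peerings


if __name__ == "__main__":
    links = hub_and_spoke("shared-services-vnet", ["hr-vnet", "marketing-vnet"])
    print(len(links))                                           # 4
    print(connected(links, "hr-vnet", "shared-services-vnet"))  # True
    print(connected(links, "hr-vnet", "marketing-vnet"))        # False: peering is not transitive
```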
It is also worth mentioning that traffic between resources in peered virtual networks is completely private: it stays on the Microsoft backbone and does not go through the public internet. This makes peering a secure way of connecting workloads in different VNETs (and different subscriptions), even if the other VNET is in another Azure region (although there is a cost for such traffic).
Now, if we deploy a VPN gateway in the shared services VNET and establish a VPN tunnel from there to our on-premises network, we can enable VMs and resources in both the HR and Marketing VNETs to use that VPN gateway (in the hub) to reach the on-premises world. This is the whole idea of the hub and spoke model: we need only one VPN gateway in the HUB to serve the hybrid connectivity needs of the spoke VNETs, which eliminates the need to deploy VPN gateways and hybrid connectivity in each spoke VNET and reduces both cost and management overhead.
The mechanism that lets the VPN gateway in the hub offer this service to the spoke VNETs is called Gateway Transit. Gateway Transit is a VNET peering property that enables one virtual network to use the VPN gateway in the peered virtual network for cross-premises connectivity. This also works if one of the VNETs is deployed in a different Azure region.
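On the peering resources, gateway transit is controlled by two settings: `allowGatewayTransit` on the hub’s peering to the spoke and `useRemoteGateways` on the spoke’s peering to the hub, and a spoke that uses remote gateways cannot have a gateway of its own. The sketch below encodes those conditions as a simple Python check; the `Peering` class and `can_use_hub_gateway` function are hypothetical helpers, not an Azure SDK API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peering:
    source: str
    target: str
    allow_gateway_transit: bool = False  # corresponds to allowGatewayTransit
    use_remote_gateways: bool = False    # corresponds to useRemoteGateways


def can_use_hub_gateway(peerings: list[Peering], hub: str, spoke: str,
                        hub_has_gateway: bool = True,
                        spoke_has_gateway: bool = False) -> bool:
    """Sketch of the conditions for a spoke to reach on-premises via the hub's VPN gateway."""
    by_pair = {(p.source, p.target): p for p in peerings}
    hub_side = by_pair.get((hub, spoke))
    spoke_side = by_pair.get((spoke, hub))
    if hub_side is None or spoke_side is None:
        return False  # the peering must exist in both directions
    return (
        hub_has_gateway
        and not spoke_has_gateway           # a spoke using remote gateways can't have its own
        and hub_side.allow_gateway_transit  # the hub offers its gateway
        and spoke_side.use_remote_gateways  # the spoke opts in to use it
    )


if __name__ == "__main__":
    peerings = [
        Peering("hub-vnet", "hr-vnet", allow_gateway_transit=True),
        Peering("hr-vnet", "hub-vnet", use_remote_gateways=True),
    ]
    print(can_use_hub_gateway(peerings, "hub-vnet", "hr-vnet"))  # True
```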
This opens the door to many design opportunities, and with these capabilities we start to shape an architecture that works for our current and future needs.
Now, without such forward thinking, if you have three VNETs, each with its own VPN gateway, and two on-premises locations (a Europe office and a US office), you end up with three VPN tunnels per on-premises location, one to each cloud VPN gateway (six in total). Each VNET will have a separate firewall to inspect traffic going to and from that VNET and your on-premises land. Since VNETs by default can’t talk to each other, you will end up creating a mesh of VNET peerings between the three VNETs (two peering relationships per VNET, six in total), and as you add more VNETs, the number of VNET peerings you need to manage grows quadratically (in fact, there is a limit on the number of VNET peerings a single VNET can have).
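The arithmetic behind this is easy to sketch. In a full mesh of n VNETs, every VNET needs a peering to each of the other n - 1 VNETs, so the number of one-way peerings is n(n - 1), which grows quadratically; in a hub and spoke layout with n spokes it is 2n. The helper functions below are hypothetical and simply encode these counts and the tunnel count from the example above.

```python
def mesh_peerings(vnets: int) -> int:
    """Full mesh: every VNET peers with every other VNET, one peering per direction."""
    return vnets * (vnets - 1)


def hub_spoke_peerings(spokes: int) -> int:
    """Hub and spoke: each spoke has one peering to the hub and the hub one back."""
    return 2 * spokes


def vpn_tunnels(vpn_gateways: int, onprem_sites: int) -> int:
    """One tunnel from every on-premises site to every cloud VPN gateway."""
    return vpn_gateways * onprem_sites


if __name__ == "__main__":
    # The scenario above: three VNETs, each with its own gateway, and two offices.
    print(vpn_tunnels(3, 2), mesh_peerings(3))  # 6 6
    # With a single gateway in the hub, the two offices need only two tunnels.
    print(vpn_tunnels(1, 2))                    # 2
    for n in (3, 10, 50):
        print(n, mesh_peerings(n), hub_spoke_peerings(n))
```

For 10 workload VNETs, a mesh needs 90 one-way peerings against 20 in a hub and spoke layout; at 50 VNETs the gap is 2,450 against 100. At three VNETs the peering counts are equal, so the early benefit of the hub comes mostly from the single gateway and the central firewall rather than from fewer peerings.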
You can see the mess we have here if each VNET in each subscription maintains its own firewall and hybrid connectivity to both on-premises locations. Security is also hard to manage, as the Azure land can be reached through many entry points and your security team needs to keep an eye on each of them. There is an obvious need for a better way to govern how traffic and security are built in the cloud.
If you don’t know it already, the Cloud Security Alliance (CSA) has a solution for us.
The Cloud Security Alliance is a non-profit organization with a mission to promote the use of best practices for providing security assurance within cloud computing.
They address this problem by recommending a preferred and flexible architecture for hybrid cloud connectivity using a “bastion virtual network”.
In this architecture, it is possible to connect multiple cloud networks to your on-premises datacenter via one hybrid cloud connection.
You build a dedicated virtual network for the hybrid connection and then peer any other networks through the designated bastion network. You can also deploy firewall rule sets to protect traffic flowing in and out of the hybrid connection.
Now, why is this important? The Cloud Security Alliance warns that any on-premises threat can propagate to the Azure land, and a compromised on-premises machine can be used to scan your whole cloud network. Therefore, it is important to govern this hybrid connectivity, as it might become the weakest link.
In this architecture, there is a need to minimize the number of tunnels between on-premises and the cloud, and those connections should terminate in a bastion VNET (shared services VNET) where security controls are in place to inspect traffic passing between the on-premises land and Azure land.
Let’s say that one of the application servers in the LOB 1 VNET wants to access a server on-premises. Traffic goes to the bastion VNET (shared services) thanks to the VNET peering, then to the VPN gateway in the bastion VNET, across the internet inside the VPN tunnel, down to the on-premises VPN gateway, and finally to that server. Along the way, traffic is inspected by the central firewall deployed in the bastion VNET.
The hub and spoke architecture also reduces complexity when it comes to connecting your VNETs inside Azure. If you have four VNETs and you want all of them to talk to each other, a VNET peering is required between every pair of VNETs (six pairs, or twelve one-way peerings). This requires a lot of management to maintain those VNET peerings, and it doesn’t scale well as the number of VNETs increases. There is also a limit on the number of VNET peerings a single VNET can have.
Moreover, security becomes a major issue. How can you regulate and inspect traffic going between VNETs? What if you have a VNET for production workloads and a VNET for dev/test, and you don’t want any traffic to be allowed from the dev/test VNET to your production VNET? You can’t assume that everything in the Azure land is trustworthy, as different teams might be responsible for managing different portions of your Azure network/subscriptions.
Of course, you can use Network Security Groups, but they only give you L4 traffic filtering, and managing those network security groups becomes difficult. Moreover, if your DevOps team manages the LOB subscription, they can easily change those network security group configurations.
The Hub and Spoke architecture solves all those problems. Each LOB VNET maintains a VNET peering only to the HUB VNET, and we deploy centralized firewall servers in the Hub VNET. This architecture scales well as the number of VNETs increases, since we only need one set of VNET peerings for each LOB VNET.
Any connectivity between two VNETs must go through the HUB VNET (shared services), where centralized firewall nodes inspect and regulate traffic. Only the security team can change the configuration of the central firewalls, so DevOps teams must request any access they need from the security team.
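A minimal sketch of how the central firewall in the hub could regulate traffic between spokes, including the dev/test to production case above. It uses a generic first-match rule list over source and destination CIDRs with a default deny; the `FirewallRule` class, the rule names, and the address plan are hypothetical, and real firewall products (Palo Alto, Azure Firewall, and others) add ports, protocols, application awareness, and their own rule ordering.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    name: str
    source: str       # source CIDR
    destination: str  # destination CIDR
    action: str       # "allow" or "deny"


def decide(rules: list[FirewallRule], src_ip: str, dst_ip: str, default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default action."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:  # rules are evaluated in list order; first match wins
        if src in ipaddress.ip_network(rule.source) and dst in ipaddress.ip_network(rule.destination):
            return rule.action
    return default


# Hypothetical address plan: hub 10.0.0.0/16 (shared services in 10.0.2.0/24),
# production spoke 10.1.0.0/16, dev/test spoke 10.2.0.0/16, on-premises 192.168.0.0/16.
HUB_RULES = [
    FirewallRule("deny-devtest-to-prod", "10.2.0.0/16", "10.1.0.0/16", "deny"),
    FirewallRule("allow-spokes-to-shared-services", "10.0.0.0/8", "10.0.2.0/24", "allow"),
    FirewallRule("allow-prod-to-onprem", "10.1.0.0/16", "192.168.0.0/16", "allow"),
]

if __name__ == "__main__":
    print(decide(HUB_RULES, "10.2.0.4", "10.1.0.4"))      # deny: dev/test to production
    print(decide(HUB_RULES, "10.2.0.4", "10.0.2.10"))     # allow: spoke to shared services
    print(decide(HUB_RULES, "10.1.0.4", "192.168.1.20"))  # allow: production to on-premises
    print(decide(HUB_RULES, "10.1.0.4", "10.2.0.4"))      # deny: no rule matches
```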
Now that we have a HUB VNET (a.k.a. bastion VNET or shared services VNET) and all LOB VNETs connect to it, another optimization opportunity appears. We can move infrastructure services common to all LOB VNETs into the HUB VNET. Traditionally, you might have a domain controller, a CA server, and a file server in each LOB VNET. With two LOB VNETs, this means deploying six servers, three in each LOB VNET.
Instead, we can deploy only one set of those servers in the HUB VNET and offer these common services to all LOB applications, reducing the number of servers from six to three. If a server in one of the LOB VNETs wants to access a domain controller, it uses the VNET peering to the HUB VNET to reach the domain controller there. This also makes onboarding a new LOB easy, since common infrastructure services are already deployed and offered as a service by the HUB VNET.
Jumpboxes are machines we connect through to reach other workloads in the cloud. Instead of allowing RDP or SSH access from everywhere to your valuable assets in the LOB VNETs, we configure our firewalls to allow these connections only from the jumpbox server. A natural place for the jumpbox is the HUB VNET, as it already has a peering connection to all other VNETs. In some cases, people deploy a jumpbox per VNET, but it’s up to you.
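Restricting RDP and SSH to the jumpbox can be sketched as NSG-style inbound rules evaluated by priority, where the lowest number is checked first and the first matching rule decides, whether this is enforced by the central firewall or by an NSG on the LOB subnets. This is a simplified Python model with a hypothetical jumpbox address `10.0.3.4`; a real NSG also matches on protocol, destination, and port ranges. Note that Azure’s default inbound rules include `AllowVnetInBound`, which permits traffic from peered VNETs, so an explicit deny like the priority 200 rule is what actually blocks RDP/SSH from other sources; the sketch’s final fallback to deny is a simplification.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    priority: int          # lower number is evaluated first, as in Azure NSGs
    source: str            # source CIDR, or "*" for any source
    ports: frozenset[int]  # destination ports the rule applies to
    access: str            # "Allow" or "Deny"


def evaluate_inbound(rules: list[InboundRule], source_ip: str, port: int) -> str:
    """Return the access of the highest-priority matching rule (first match wins)."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        source_matches = rule.source == "*" or ip in ipaddress.ip_network(rule.source)
        if source_matches and port in rule.ports:
            return rule.access
    return "Deny"  # simplified catch-all; ignores Azure defaults such as AllowVnetInBound


JUMPBOX = "10.0.3.4"                      # hypothetical jumpbox address in the HUB VNET
MANAGEMENT_PORTS = frozenset({22, 3389})  # SSH and RDP

LOB_INBOUND_RULES = [
    InboundRule(100, f"{JUMPBOX}/32", MANAGEMENT_PORTS, "Allow"),
    InboundRule(200, "*", MANAGEMENT_PORTS, "Deny"),
]

if __name__ == "__main__":
    print(evaluate_inbound(LOB_INBOUND_RULES, "10.0.3.4", 3389))  # Allow: from the jumpbox
    print(evaluate_inbound(LOB_INBOUND_RULES, "10.2.0.7", 22))    # Deny: from anywhere else
```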
Now let’s connect all the dots. In the cloud, the first thing we do is create the HUB VNET (shared services VNET). In that VNET, we place our VPN gateway, centralized firewall, jumpboxes, and shared services infrastructure (domain controllers, file servers, CA servers, etc.).
Connectivity between Azure land and on-premises land is terminated at the HUB VNET so our centralized firewall in that VNET can inspect traffic passing through that hybrid connectivity. This enforces our governance and security rules and is the foundation of all other security controls.
For each LOB application, we create a dedicated VNET (spoke) and connect that VNET only to the HUB VNET. Traffic passing between VNETs must go through the centralized firewall deployed in the HUB VNET. Any LOB VNET can access the shared services infrastructure (domain controllers, etc.) through its VNET peering connection to the HUB.
Only the security and network engineers can manage the HUB VNET (subscription); DevOps teams are not allowed to manage that shared services subscription. On the other hand, DevOps teams are given the freedom to manage their LOB VNETs (subscriptions). This strikes a balance between the agility and speed of delivery that the cloud promises on one hand, and security and governance on the other.
With such a design, we have a loosely coupled architecture. Think about it: each LOB is deployed in its own VNET (and/or subscription), while shared services, the firewall, and hybrid connectivity are all deployed in a separate HUB VNET (and/or subscription). If we no longer need a LOB application in the cloud, we can easily delete its subscription/VNET without affecting anything else. There are no dependencies between workloads deployed in LOB1 and LOB2.
Segregation of Duties
Segregation of duties is easy to implement in this architecture. SecOps and NetOps manage the HUB subscription, where hybrid connectivity and the central firewalls are deployed. DevOps teams don’t have access to that subscription, so they can’t change the firewall settings to make their lives easier. They need to comply with the central security policy.
On the other hand, since each LOB application is deployed in its own VNET (and/or subscription), DevOps teams can be given a lot of control over that subscription to create resources and deploy faster. The DevOps engineers or developers working on one LOB might not be the same people working on another LOB application. With this architecture, we can give each group access to its own LOB subscription only, which enforces the principle of segregation of duties.
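Segregation of duties ultimately comes down to role assignments. The sketch below flattens Azure RBAC to one built-in role (`Owner`, `Contributor`, `Network Contributor`, or `Reader`) per team per subscription and checks who can change what. The team names, subscription names, and helper functions are hypothetical, and real Azure RBAC also involves scope inheritance (management groups, resource groups) and multiple assignments per principal.

```python
# Hypothetical role assignments: (team, subscription) -> Azure built-in role name.
ROLE_ASSIGNMENTS = {
    ("secops", "shared-services-sub"): "Owner",
    ("netops", "shared-services-sub"): "Network Contributor",
    ("hr-devops", "hr-lob-sub"): "Owner",
    ("marketing-devops", "marketing-lob-sub"): "Owner",
    ("hr-devops", "logging-sub"): "Reader",
}

# Built-in roles in this sketch that can change resources (Reader cannot).
WRITE_ROLES = {"Owner", "Contributor", "Network Contributor"}


def can_modify(team: str, subscription: str) -> bool:
    """True if the team holds a write-capable role on the subscription."""
    return ROLE_ASSIGNMENTS.get((team, subscription)) in WRITE_ROLES


def sod_violations(protected_subscription: str, allowed_teams: set[str]) -> list[str]:
    """Teams with write access to the protected subscription that should not have it."""
    return sorted(
        team
        for (team, subscription), role in ROLE_ASSIGNMENTS.items()
        if subscription == protected_subscription
        and role in WRITE_ROLES
        and team not in allowed_teams
    )


if __name__ == "__main__":
    print(can_modify("hr-devops", "hr-lob-sub"))           # True: their own LOB
    print(can_modify("hr-devops", "shared-services-sub"))  # False: the HUB is off limits
    print(sod_violations("shared-services-sub", {"secops", "netops"}))  # []
```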
Auditing and Logging
Another important security design pattern I want to share here is rethinking how to store, access, and retain logs, monitoring data, playbooks, and other security and management resources.
Most cloud resources that you deploy in Azure have a setting to configure auditing or monitoring, and usually you need to store this data somewhere (a storage account or a log analytics workspace). Let’s say you have a separate subscription for your LOB called the HR LOB subscription. You have a VNET there with all your virtual machines and other Azure resources. When you deploy resources inside the HR LOB subscription, you want to store the audit logs somewhere (a storage account), so you create a storage account in the HR LOB subscription, and everything works fine.
This works, but there are a few things to consider. First, audit logs and security logs are critical data that should be secured and not altered. The integrity of such information should be maintained, and if your developer team owns the HR LOB subscription, they can do something bad and delete that audit data, since they have access to the storage account deployed in the same subscription.
Instead, you want to preserve the integrity of such logs by storing them in a separate, isolated place that developers can’t change. They might have read access to the logs to troubleshoot problems, but they should not be able to change or delete that data. One option is to create a separate subscription (an Auditing and Logging Subscription) where only the security team has full access and everyone else has read access.
When you then configure audit logging or any type of logging on your applications, you create a storage account or log analytics workspace in the Auditing and Logging Subscription and configure your application or Azure resource to use that.
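To catch a diagnostic setting that writes logs into the workload’s own subscription, you can compare the subscription IDs embedded in Azure resource IDs, which follow the `/subscriptions/<id>/resourceGroups/<rg>/providers/<namespace>/<type>/<name>` format. The subscription IDs, resource names, and helper functions below are hypothetical; this is a sketch of a review check rather than an Azure feature.

```python
def subscription_of(resource_id: str) -> str:
    """Extract the subscription ID from an Azure resource ID.

    Resource IDs look like:
    /subscriptions/<id>/resourceGroups/<rg>/providers/<namespace>/<type>/<name>
    """
    parts = resource_id.strip("/").split("/")
    if len(parts) < 2 or parts[0].lower() != "subscriptions":
        raise ValueError(f"not a subscription-scoped resource ID: {resource_id}")
    return parts[1].lower()


def log_destination_ok(workload_id: str, destination_id: str, logging_subscription: str) -> bool:
    """True if logs go to the logging subscription, not the workload's own subscription."""
    destination_sub = subscription_of(destination_id)
    return (destination_sub == logging_subscription.lower()
            and destination_sub != subscription_of(workload_id))


# Hypothetical subscription IDs and resource names.
HR_SUB = "11111111-1111-1111-1111-111111111111"
LOGGING_SUB = "99999999-9999-9999-9999-999999999999"
HR_VM = f"/subscriptions/{HR_SUB}/resourceGroups/rg-hr/providers/Microsoft.Compute/virtualMachines/hr-app-vm"
HR_STORAGE = f"/subscriptions/{HR_SUB}/resourceGroups/rg-hr/providers/Microsoft.Storage/storageAccounts/hrlogs"
SECURITY_WORKSPACE = (f"/subscriptions/{LOGGING_SUB}/resourceGroups/rg-logging"
                      "/providers/Microsoft.OperationalInsights/workspaces/law-security")

if __name__ == "__main__":
    print(log_destination_ok(HR_VM, SECURITY_WORKSPACE, LOGGING_SUB))  # True
    print(log_destination_ok(HR_VM, HR_STORAGE, LOGGING_SUB))          # False: same subscription
```

Logs kept in the logging subscription also survive if the HR subscription is later deleted, which is the retention concern raised next.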
Another thought: what if you stored logs and auditing data for the HR LOB applications in the same HR LOB subscription, and then, for whatever reason, you no longer need that HR application and want to delete the subscription? Deleting the subscription means losing all the audit and log data. Sometimes, for compliance reasons, you need to retain logs for a long period of time. Storing the logs in a separate subscription solves this problem: you can delete the HR subscription while retaining the log data for as long as you need.
Get the PowerPoint Slides
You can access the slides from SlideShare by following this link.
You can also download my presentation in PowerPoint that contains all slides, animation and each slide contains a notes section that explains each concept.
Get my latest book about Cloud Migration
This book covers a practical approach to adopting and migrating on-premises systems and applications to the Public Cloud. Based on a clear migration master plan, it helps companies and enterprises prepare for Cloud computing: what to migrate and how to successfully migrate or deploy systems in the Cloud, how to prepare your IT organization with a sound Cloud Governance model, Security in the Cloud, and how to reap the benefits of Cloud computing through automation and by optimizing your cost and workloads.
Subscribe to my YouTube Channel
On my YouTube channel, I post videos about cloud security and Microsoft MVP stories to help people understand cloud and cybersecurity in a simplified and professional way.