25 ways to innovate with hyperconverged infrastructure | Windows Server Summit 2019

>>Hi, I’m Cosmos from
the Core OS storage team. Let’s talk about
hyperconverged infrastructure. HCI is rapidly becoming the way that organizations modernize
on-premises infrastructure. By consolidating from proprietary dedicated storage
arrays and network appliances to simple industry-standard
servers that bring their own software-defined
storage and networking, organizations around
the world are lowering costs, increasing performance, and
simplifying their operations. In fact, according to research by
the Enterprise Strategy Group, 54 percent of organizations with a mature datacenter
modernization strategy expect that deploying hyperconverged
infrastructure will be among their most significant IT projects
for the next 12 to 18 months. That’s not just aspirational talk,
it’s already happening. According to data from IDC, hyperconverged infrastructure
investment grew 57 percent last year. It’s now nearly two billion
dollars per quarter. That’s a lot of spending, and it means a lot of
opportunity for you in IT. It’s an exciting time for on-premises infrastructure and especially hyperconverged
infrastructure. So how do you get
from the Hyper-V you already have to hyperconverged
infrastructure? The key is Windows Server 2019. The latest version of
Windows Server includes everything you need to
deploy and manage HCI, including Hyper-V, the foundational hypervisor
of the Microsoft Cloud; software-defined storage, specifically Storage Spaces Direct;
Software Defined Networking, and Windows Admin Center; the future of Windows
Server management, which has no additional cost
beyond your Windows licenses. What’s more, together
with our partners, we recently launched the
Azure Stack HCI Solutions program. An Azure Stack HCI solution
combines all of these with prevalidated hardware
that’s designed and tuned especially for
hyperconverged infrastructure, so you get up and running
quickly and smoothly. Another way to look at it, it’s the same hyperconverged
compute, storage, and networking with
the same hardware testing and validation criteria
as Azure Stack. But instead of being
geared toward running IaaS and PaaS within an
Azure-consistent portal, it’s a more familiar way to run virtualized apps on-premises
with increased efficiency. Between Windows Server 2019, Windows Admin Center,
and Azure Stack HCI, there is a lot to cover. To help us unpack it, I’d like to introduce
my colleague Greg, from the Core OS
networking team. Hi Greg.>>Hi Cosmos.>>Here’s what we’re going
to do, a lightning round. We’re going to cover 25
things you need to know about HCI in just 25 minutes. It’s going to go fast, so we’re also publishing a blog
with accompanying details and links to documentation so you can learn more about
anything we cover. All right. Greg, you’re ready?>>I’m ready.>>Let’s do this.>>All right. [MUSIC].>>First, we have to start with
the Azure Stack HCI solutions. Launched earlier this year, these are a big deal
because they’ve really unlocked all the infrastructure
capabilities of Windows Server 2019 for broad deployment and on a broader range of compatible
hardware than ever before. In fact, there are already over
70 solutions available from our 15 partners
covering all corners of the world and available
for purchase right away. Now, you can browse them directly on Microsoft.com in a completely new
Azure Stack HCI Catalog. In your browser, navigate
to Microsoft.com/HCI. That’ll take you to
a marketing page with a big blue button labeled
catalog. Click the button. Here, you’ll see
a rich store style experience where you can browse and filter
all the available solutions. For example, maybe you’re looking for an HPE solution that’s all flash, with iWARP networking
and available in Europe. Well, there you have it. You can even click the one you’re interested in to link directly to the right page on
the partner website so you can learn more and engage
with their sales team. [MUSIC].>>Now, Greg, I noticed
the HCI Catalog no longer makes a distinction
between HCI standard, and HCI premium which used to
mean SDN. So what’s changed?>>Well, Cosmos. Now,
all HCI solutions include what is required for SDN. Now, it doesn’t mean
you have to devote the entire infrastructure to SDN. You can have networking with
your VLAN configurations co-exist side-by-side with
SDN on the same hardware. Let’s take a look. You can go
into Windows Admin Center, select Network or Virtual Network, and your machine will get
whatever settings it needs. [MUSIC].>>Now, that’s great. But
there’s still the small matter of actually deploying
SDN. How do I do that?>>Well, that’s easier than ever. With SDN Express, you
can go to GitHub, download a set of scripts
called SDNExpress, and then you run SDN
Express and you’ll get a very helpful wizard
that will walk you through all the steps you need
to get SDN up and running in probably about
30 minutes or less. [MUSIC].>>Now, we can’t talk
about what’s new for hyperconverged infrastructure
without talking about the most visible
and obvious thing, and that’s Windows Admin Center, the future of Windows
Server management and, certainly, the future of hyperconverged
infrastructure management. In fact, in Windows Admin Center, there’s a rich set of dedicated
screens that are especially for managing Storage Spaces Direct and Software Defined Networking. You can easily do things
like provision volumes, monitor Storage Spaces jobs, get a view of your
virtual machines across the whole cluster including
their resource consumption, and dive into troubleshooting
your hardware with rich information about
servers and drives.>>With SDN, you can configure
your virtual networks, you can configure
access control lists, and set up the gateways
that your applications need in order to get outside of
their virtual networks as well. [MUSIC].>>When Storage Spaces Direct first launched in Windows Server 2016, by far the top feedback was a request for
a better user interface, and that’s what we’ve delivered
with Windows Admin Center. The next most requested feature
was deduplication and compression for the
Resilient File System (ReFS), Microsoft’s recommended file system for hyperconverged infrastructure. Deduplication and compression is a technology that saves you space by identifying duplicate portions of files and then only
storing them once. The savings you can expect
depend on what you’re storing, but they can range from about 30
percent for videos and music, all the way up to about 90 percent with highly repetitive workloads, like ISO files, VHD files, and especially, backups
of those files. To make that clear,
90 percent savings means you get up to 10 times more usable storage capacity for free. It’s easier than ever to
turn on deduplication: it’s just a single click of a rocker switch in
Windows Admin Center. Sometimes, even with features
like data deduplication, you just need a lot of
raw storage capacity. This is especially true for use
cases like backup and archival. In Windows Server 2016, the maximum storage capacity in
a single cluster was one petabyte. In Windows Server 2019, that has increased by a factor
of four, to four petabytes. To put that in perspective, that’s enough space to store all
of Wikipedia in every language, with complete edit history,
uncompressed, 50 times. That’s not just a theoretical number. In fact, at Microsoft
Ignite last fall, we partnered with our friends
at QCT to build such a system. With eight of their biggest
4U rackmount servers, we built what we believe is the largest Storage Spaces Direct cluster ever outside of a public cloud, at very nearly four petabytes. [MUSIC].>>With Azure Stack HCI, you can deploy anywhere from 2 to 16 server nodes in
a single cluster. You can always start small with
say four, and then add a fifth, sixth, seventh and so on to scale with the needs
of your organization. But what if you want to deploy hundreds or thousands
of server nodes? Well, with Windows Server
2019, now you can. Suppose we start with
these eight servers in a cluster. What’s new in Windows
Server 2019 is we can encapsulate this cluster in
something called a cluster set, and you guessed it, we can add additional clusters
into the same cluster set. What’s important is this cluster set will present a unified
storage namespace, which means a virtual
machine running on one cluster can seamlessly live migrate to a host in a different cluster and
continue to access its storage, even though its
storage stayed behind. [MUSIC].>>In 2016, we shipped SDN, and we heard a lot of
feedback from customers that they want to
have faster gateways. So we’ve worked to
improve gateway performance in Windows Server 2019. In many cases, we’ve improved it
by over three times. If you have enough connections, you can go from four gigabits
per second up to 18 gigabits per second
through a single SDN gateway. This is for GRE tunneling, and one of the really good
use cases for GRE tunneling is in connecting two
network controllers that are running in different sites, so they can connect their virtual networks and have
the workloads running in each talk to each other as
if they’re one network. [MUSIC]>>With each release, Windows
Server gets more scalable, the numbers get bigger. It’s not just about capacity, it’s also about performance. Windows Server is on the leading
edge of x86 hardware innovation, consistently one of the first
operating systems and hypervisors to support new hardware technology like the latest Intel
Xeon Scalable processors, Remote Direct Memory Access (RDMA) networking, NVMe drives, and now Intel Optane, including Intel Optane DC
persistent memory. This is 3D XPoint-based persistent storage
that’s DDR4 pin-compatible, meaning it goes into a memory socket. Last fall at Microsoft Ignite, we teamed up with Intel and built a 12-node cluster packed with
Intel Optane DC persistent memory. We used it to set the HCI
industry record with over 13.5 million IOPS
from a single cluster.>>So it’s not just about
hardware enablement, but we’ve also been making a lot of improvements to
the networking stack as well that benefit either the host or
the guest and in some cases both. We made an assortment
of feature improvements to TCP/IP and UDP, nearly doubling performance. We implemented receive segment coalescing (RSC) in the virtual
switch, which gives you a great improvement in throughput while reducing the amount of
CPU utilization at the same time. We changed the default TCP congestion provider to
CUBIC, which gives you higher performance across high-
bandwidth but high-latency links. Finally, for the guest, we implemented the Data Plane
Development Kit, or DPDK, for Windows, which gives applications like
video processing the ability to get really fast access to the packets bypassing
the host networking stack.>>Now, it’s not
just the Windows networking team that’s been focused on optimizations. We have been on the storage team as well. One example is with
mirror-accelerated parity, a technology that allows you to
create a volume that partly uses mirror resiliency and partly uses parity, or
erasure coding, resiliency. This lets you get the best of both: fast writes into the
mirror portion and then maximized capacity through
the efficiency of parity encoding. If we take Windows Server
2016 as the baseline, the performance of mirror-accelerated parity has more than doubled in
Windows Server 2019. With all of that capacity
and performance, you’re going to want to
be able to see it and with Windows Server 2019, you can. Hyperconverged infrastructure now has built in performance history, so you can easily get
data from an hour ago, yesterday or last week. There are over 50 key performance counters spanning
processor usage, memory, networking, storage latency, and much more that are automatically
aggregated and stored. There’s nothing you need to set up, install or configure, it just works. You can access it in Windows Admin
Center where you’ll notice that the charts have a time range picker allowing
you to go back in time, and for more advanced scenarios, you can query using PowerShell. I can’t believe we’re
more than halfway through and we haven’t even
talked about core Hyper-V yet.>>I know, that’s right. We’ve
made improvements to things like Shielded Virtual Machines where we’ve improved it so that even if you don’t have network access to your VM, you can still connect
to it either through the console or through
PowerShell Direct. We’ve also added the ability
to run Linux inside your Shielded VMs with distributions such as Ubuntu, Red Hat or SUSE.>>Now, regardless of whether
your organization has adopted Shielded Virtual
Machines yet or not, it’s important to protect
your hypervisor host. That’s never been more true than in the last year where
vulnerabilities like Spectre and Meltdown have really shined a bright light on
side-channel attacks. In Windows Server 2016, Hyper-V used something called the
Classic Scheduler, which provides fair-share, preemptive,
round-robin scheduling for virtual processors,
essentially at random. In Windows Server 2019, there’s a new Hyper-V
scheduler type called the Core Scheduler that
is the new default. This constrains
virtual processors to physical core boundaries,
further isolating VMs. It’s the default in
Windows Server 2019, but you can actually
use the Core Scheduler on Windows Server 2016 as well. Microsoft backported it last fall
in a cumulative update. In Windows Admin Center,
under Hyper-V settings, you’ll see a new radio button
called hypervisor scheduler type which you can
switch from classic to core.>>So we’ve also been working
to improve web traffic that comes and goes from
a Windows Server Machine. We’ve done this through HTTP/2, a technology that we first
shipped in Windows Server 2016, but we’ve made it better
in Windows Server 2019, by implementing things
such as connection coalescing which allows websites with a common second level domain to share the same
certificate and as a result, the same TCP connection. This gives you fewer
round trips to the server and better overall performance
for your web applications. At the same time, we’ve also improved the Cipher Suite selection process
which reduces the number of connection failures while
still enforcing the blacklist of ciphers
that are no longer secure.>>In Windows Server 2019, the core failover clustering
technology gets more secure as well. In particular, failover
clustering will now use exclusively Kerberos or certificate based authentication for all cluster and
storage traffic between nodes. This means the dependency on the NT LAN Manager or NTLM protocol
is completely removed. There’s no change required by users or deployment tools
to make this work, it’s just the out-of-box behavior. Speaking of clustering. An important part of
the operation of any cluster is keeping your Windows servers fully patched with
the latest updates, and it’s never been easier than with Cluster Aware Updating for HCI. Cluster Aware Updating
is a technology that orchestrates the roll-out of updates across
clustered server nodes. Essentially, it takes
the pause, drain, install, restart, and resume workflow and repeats that across all nodes
in the cluster for you. Now in Windows Server
2019, it’s even better. It has special integration
with storage spaces to wait after each node restarts
for storage resync to complete, and it more deeply integrates with Windows Update to check
if an update really, truly requires a restart and it
only pauses and drains nodes if the update does require it, minimizing disruption
to virtual machines. With Windows Admin Center, you can easily check for updates and kick off an updating run
with just a single click, using an all-new cluster-aware
updating tool that looks almost exactly like the Windows
Update tool for a single machine.>>Let’s talk about quorum. When you deploy a cluster in your core data center with
like six or eight or 12 nodes, you don’t really have
to think about quorum. But our telemetry shows
us that with HCI, you’re more often deploying at
the Edge in branch offices, remote sites or field installations
taking advantage of HCI’s minimum footprint of just
two servers with four drives each. You don’t even need a switch. You can just wire them back-to-back
with a crossover cable. In these kinds of two node clusters, thinking about quorum is essential. In Windows Server 2016, there were two ways that
you could use a witness to provide quorum to
a two-node cluster. You could use a file share from
another on-premises server or you could connect to
the Azure Cloud for a Cloud witness. But what about deployments
that maybe don’t have any other on-premises infrastructure and don’t have a reliable
connection to the Internet? Windows Server 2019 introduces
a third option; the USB witness. Literally, just plug a USB key into a compatible router or switch and the cluster will use
that for quorum. [MUSIC].>>Whenever you deploy a cluster, even a small one, you’re doing it for high
availability for fault tolerance. Yet, just last year, there was no HCI solution
available from any vendor where the storage
could survive multiple, simultaneous failures
with just two nodes. The reason is that with
a two-node cluster, storage resiliency is provided
using two-way mirroring; essentially, keeping one copy
of data in each server. This means you can survive a drive failure or you can
survive a node failure, but if both happened
at the same time, your virtual machines go down because they lose access to their storage. This wasn’t great. So our engineering
team took inspiration from an old technique called
RAID 51 or RAID 5+1. The idea is to do parity
resiliency within one server, and then mirror that across to the other server, giving you
parity on the other side as well. This is what’s often called
a nested RAID level. In Windows Server 2019, Storage Spaces has
a new resiliency type: it can now do nested resiliency. This means you can survive multiple simultaneous
storage failures even with just a two-node cluster. That includes a drive
failure in each server at the same time or a drive failure
and the other server going down, both are totally fine. [MUSIC].>>While storage
resiliency is important, it doesn’t eliminate
the need for backups. But for smaller sites
and branch offices, it doesn’t make sense to have costly backup
infrastructure on-premises. For that, we have
Azure Site Recovery. Now, Azure Site Recovery is integrated into Windows
Admin Center with a one-click experience that lets you back up your VMs to Azure
where they’re safely stored. [MUSIC].>>Now typically, you don’t
just have one branch office. That’s why you call
them branch offices. Small branch offices
may not have dedicated IT personnel to respond to problems, so you need to monitor the HCI you deploy to all your
branches centrally. The Health Service is
the component in Windows Server that provides the alerts you see on the Windows Admin
Center dashboard. Now, it integrates
with Azure Monitor. Simply install
the Azure Monitor agent on each server in the cluster, and then when something goes
wrong in any branch office, say a server goes down
or you’re running out of capacity or perhaps a drive fails, Azure Monitor will send you
an e-mail or SMS notification, showing you all the details
of what’s happened so you can dispatch someone from
headquarters to respond. [MUSIC].>>Now, as you move
more workloads into Azure, the need to connect to Azure
becomes even more important. But when you have many branch offices each one with
their own infrastructure, it becomes difficult to deploy site-to-site or ExpressRoute connections to
every one of these sites. So for that, we built
Azure Network Adapter. This is an integration into
Windows Admin Center that makes it very easy to
connect a single server running pretty much anywhere to an Azure virtual gateway
so you can get access from that server into
your Azure Files or your other Azure
VMs running in Azure. To find Azure Network Adapter, go to the networking settings of any server in Windows
Admin Center and just click on the Azure
Network Adapter button. [MUSIC].>>Now, when you have a lot of
remote offices or smaller offices, they may not always have a really
fast connection to the Internet. So for these offices, you want to make sure that
any background traffic that’s going between them or your core
data-center or to Azure, gets a lower priority. For that, we have
a really good technology in Windows Server called LEDBAT, which is another congestion provider that will back off these lower-priority network flows in order to let the higher-priority
traffic take over. When that higher priority
traffic slows down, then the low-priority traffic
will pick back up again, usually, within a second or two. This is easily enabled either through PowerShell or through SCCM for distributing updates just by going to your distribution point settings if they’re running
Windows Server 2019. [MUSIC].>>We’re almost out of time. Speaking of time, for those of you that are in regulated industries, where you need to have
really accurate clocks, sometimes down to
microsecond accuracy, we made a lot of improvements
to get you there by implementing features such
as precision time protocol, software time-stamping, even additional granularity on
the clock to get it more accurate. We’ve implemented traceability, which gives you the ability to go
in and see the logs where your clocks were set so
that you can go back and prove that your clock is as
accurate as it needs to be. Finally, we added
leap-second support. Cosmos, do you know that every few years a second is
added or removed from the clock?>>So, literally,
just some minute will randomly have an extra second?>>Yeah, that’s right.
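As a rough illustration of what a leap second looks like to software (a Python sketch, not how Windows Server implements its support): the most recent leap second was inserted at 23:59:60 UTC on December 31, 2016, so that minute had 61 seconds. Python's `time.strptime` accepts a seconds value of 60, while the `datetime` type rejects it. The helper names below are made up for this example.

```python
import time
from datetime import datetime

# The most recent leap second, written as a UTC wall-clock timestamp.
LEAP_SECOND_STAMP = "2016-12-31 23:59:60"


def parse_with_time_module(stamp: str) -> time.struct_time:
    """Parse a timestamp with time.strptime, whose %S allows leap seconds."""
    return time.strptime(stamp, "%Y-%m-%d %H:%M:%S")


def datetime_accepts_second(second: int) -> bool:
    """Report whether datetime can represent the given seconds value."""
    try:
        datetime(2016, 12, 31, 23, 59, second)
    except ValueError:
        return False
    return True


parsed = parse_with_time_module(LEAP_SECOND_STAMP)
print("time.strptime parsed seconds field:", parsed.tm_sec)
print("datetime accepts :59 ->", datetime_accepts_second(59))
print("datetime accepts :60 ->", datetime_accepts_second(60))
```

The takeaway is that a clock reading of :60 is valid during a leap second, and software that assumes every minute has exactly 60 seconds may not be able to represent it.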
Take a look at the screen. You’ll see it in
action right here.>>Wooh.>>There it is, 60 seconds.>>Weird.>>You don’t see that every day. [MUSIC].>>Finally, number 25. This one’s not a new feature but
it’s an important milestone. Last year, around this time, we shared that 10,000 clusters around the world were running
Storage Spaces Direct, far exceeding our wildest
expectations for how quickly, you, the Windows Server community
would roll out this technology. Well, one year later, I’m humbled to share that over 25,000 clusters worldwide are now
running Storage Spaces Direct. This is an astonishing rate
of growth since last year. The momentum is just amazing. On behalf of the Windows
Server Engineering team, we want to thank all of you. The community on Twitter, on Slack, on TechNet, our wonderful partners and, of course, our customers. Thank you for your trust and
your business. All right. That is it for our
lightning round. We did it. Twenty-five things in 25 minutes. Was this a good format? Should we do more sessions like
this one? Let us know. Tweet us with the hashtag below
and tell us what you think. As you can see, it’s
an incredibly exciting time for on-premises infrastructure and especially hyperconverged
infrastructure. Whether you’re deploying
a tiny two-node cluster or at petabyte scale, whether you need the best
performance or the best security or you just want that gorgeous new dashboard in
Windows Admin Center, HCI gets better for everyone
with Windows Server 2019. To get started, find solutions from your preferred hardware vendor
at Microsoft.com/HCI. Install Windows Server 2019
from aka.MS/WindowsServer. Manage with Windows Admin Center, which you can download from
aka.MS/WindowsAdminCenter, and optionally, connect to helpful Azure services
that make On-Prem better. Be sure to watch the rest of
the Windows Server summit, including Haley’s session,
where she’ll tell you more about that. Thank you very much.
