Managing a global fleet of Windows desktops, laptops, and servers for Google’s internal teams can be challenging, with a constant stream of new tools, high expectations, and stringent organizational needs for secure, code-based, scalable administration. Add in a globally distributed business and extended work-from-home requirements, and you have a recipe for potential trouble.
Today we’d like to walk you through some of the tools that the Windows Operations (WinOps) team uses at Google, and why we made (and open-sourced) them. Our team is constantly working to improve the way we manage our client fleet of laptops and desktops, and we’ve spent the past several years building open source, infrastructure-as-code tools to do just that.
Now that we’re all working from home, these choices have enabled us to keep operating at scale remotely. Let’s dig into a few common Windows administrative challenges and how our open tools can help.
Challenges with scale
When you manage Windows in a large, globally distributed enterprise environment, problems of scalability are front and center. Many popular administrative tools are GUI-based, which makes them easy to learn but difficult to scale and integrate. An administrator is often limited to the functionality built into the product by its vendor. Core management suites often lack qualities that we would consider essential in a reliable production environment, including the ability to:
- Peer review edits and roll changes back and forward on demand
- Implement platform testing, with support for automation pipelines
- Integrate seamlessly with tooling that also manages our other major platforms
Because they rely on explicit network-level access, many of these products also depend heavily on a well-defined corporate network, with clear distinctions between inside and outside.
At Google, we’ve been rethinking the way we manage Windows to address these limitations. We have built several tools that have helped us scale our environment globally and enabled us to consistently support Google employees, even when major unexpected events happen.
Open source products are increasingly key to our success. With the right knowledge and investment, open source tools can be extended and tailored to our environment in ways other applications simply can’t. Our designs also focus heavily on configuration as code, rather than user interfaces: code-based infrastructure provides optimal integration with other internal systems, and enables us to manage our fleet in ways that are audited, peer reviewed, and thoroughly tested. Finally, the principles of the BeyondCorp model dictate that our management layer operates from anywhere in the world, rather than only inside the company’s private network.
Let’s dig into some of these tools, organized by what they help us get done.
Prepping Windows devices
Glazier, a tool for imaging, marked our team’s first foray into open source. This Python-based tool is at the core of our Windows device preparation process. It focuses on text-based configuration, which we can manage using a version control system. Because the configuration is treated much like code, we can write automated tests for our configuration files and trivially roll our deployments back and forward. File distribution is based around HTTPS, making it globally scalable and easy to proxy. Glazier supports modular actions (such as installing host certificates or gathering installation metrics), making it simple to extend with new capabilities over time as our environment changes.
Secure, modular imaging with Glazier helps prepare devices
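To make the idea of testing text-based configuration concrete, here is a minimal Python sketch of the kind of check that could run before a config change is merged. It is not Glazier’s real schema or API: the step layout, the action names in `KNOWN_ACTIONS`, and the `validate_config` helper are all hypothetical, chosen only to illustrate the approach.

```python
# Hypothetical action names for illustration; not Glazier's real action list.
KNOWN_ACTIONS = {"install_certificate", "fetch_file", "report_metrics"}


def validate_config(steps):
    """Return a list of human-readable problems found in a list of steps.

    Each step is assumed (hypothetically) to be a dict with an "action" key
    and an optional "sources" list of download URLs.
    """
    problems = []
    for index, step in enumerate(steps):
        action = step.get("action")
        if action not in KNOWN_ACTIONS:
            problems.append(f"step {index}: unknown action {action!r}")
        # File distribution is HTTPS-based, so flag any other scheme.
        for url in step.get("sources", []):
            if not url.startswith("https://"):
                problems.append(f"step {index}: non-HTTPS source {url!r}")
    return problems


good_config = [
    {"action": "install_certificate"},
    {"action": "fetch_file", "sources": ["https://example.com/drivers.zip"]},
]
bad_config = [
    {"action": "format_disk"},
    {"action": "fetch_file", "sources": ["http://example.com/drivers.zip"]},
]
```

A check like this can sit in a presubmit test, so a malformed or insecure configuration is rejected during review rather than discovered on a device mid-install.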
Traditional imaging tends to rely heavily on network trust and presence inside a secure perimeter. Systems like PXE, Active Directory, Group Policy, and System Center Configuration Manager require you to either set up a device on a trusted network segment or have sensitive infrastructure exposed to the open internet. The Fresnel project addressed these limitations by making it possible to deliver boot media securely to our employees, anywhere in the world. We then integrated it with Glazier, enabling our imaging process to obtain critical files required to bootstrap an image from any network. The result was an imaging process that could be started and completed securely from anywhere, on any network, which aligns with our broader BeyondCorp security model.
Fresnel enables imaging from any network in the world
The remote imaging and provisioning process included several other network trust dependencies that we had to resolve. Puppet provides the basis of our configuration management stack, while software delivery now leverages GooGet, an open source repository platform for Windows. GooGet’s open package format lends itself well to automation, while its simple, APT-like distribution mechanism is able to scale our package deployments globally. For both Puppet and GooGet, the underlying use of HTTPS provides security and accessibility from any network. We also make use of OSQuery as a means of collecting distributed host state and inventory.
GooGet helps us automate package distribution and deployment
Our infrastructure still has dependencies on classic Active Directory (AD), and the domain join process was a particularly difficult challenge for hosts that don’t bootstrap from a trusted network. This led to the Splice project, which uses the Windows offline domain join API and Google Cloud services to enable domain joining from any network. Splice enables us to apply flexible business logic to the traditionally rigid domain join process. With the ability to implement custom authentication and authorization models, host inventory checks, and naming rules not typically available in AD environments, this project has given us the flexibility to extend our domain well beyond the classic network perimeter.
Splice helps us join new devices onto our Active Directory domain from anywhere
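The kind of business logic described above (authentication, inventory checks, and naming rules applied before a join is allowed) can be sketched in a few lines of Python. This does not reflect Splice’s actual code or interfaces: the `evaluate_join_request` function, the hostname pattern, and the inventory set are all hypothetical and exist only to show the shape of such a policy.

```python
import re

# Hypothetical naming rule: three-letter site code, device type, six-digit asset number.
HOSTNAME_PATTERN = re.compile(r"[A-Z]{3}-(LT|DT|SV)-\d{6}")


def evaluate_join_request(hostname, requester_authenticated, inventory):
    """Decide whether a domain join request should proceed.

    Returns an (allowed, reason) tuple. Checks run from cheapest to most
    specific so the first failure explains the rejection.
    """
    if not requester_authenticated:
        return False, "requester not authenticated"
    if not HOSTNAME_PATTERN.fullmatch(hostname):
        return False, "hostname violates naming rule"
    if hostname not in inventory:
        return False, "host not found in inventory"
    return True, "ok"


# Hypothetical inventory of hosts that are expected to join.
known_hosts = {"NYC-LT-000123", "ZRH-SV-004242"}
```

Only a request that passes every check would go on to the actual offline domain join step, which is outside the scope of this sketch.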
Maintaining our fleet
Deployment is just the beginning of the device lifecycle; we also need to be able to manage our active fleet and keep it secure.
The built-in Windows update mechanism is generally sufficient to keep the operating system patched, but we also wanted to be able to exercise some control over updates hitting our fleet. Specifically, we need the ability to rapidly deploy a critical update, or to postpone installing a problematic one. Enter Cabbie, a Windows service that builds upon Windows APIs to provide an additional management layer for patching. Cabbie gives us centralized control over the update agent on each machine in our fleet using our existing configuration management stack.
Centralized patch control using configuration management
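As a rough illustration of what centrally controlled patching means in practice, the Python sketch below sorts a machine’s available updates into those to install immediately, those to leave for the normal schedule, and those to hold back. It is not Cabbie’s implementation or configuration format: `PatchPolicy`, `plan_updates`, and the update IDs are hypothetical, and the rule that a deferral overrides an expedite is an assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PatchPolicy:
    """Hypothetical policy that configuration management pushes to each host."""
    expedite: set = field(default_factory=set)  # install as soon as possible
    defer: set = field(default_factory=set)     # hold back until cleared


def plan_updates(available, policy):
    """Split available update IDs into install-now, install-later, and skipped.

    Assumption for this sketch: a deferral always wins over an expedite, so a
    problematic update is never force-installed by mistake.
    """
    plan = {"now": [], "later": [], "skipped": []}
    for update_id in available:
        if update_id in policy.defer:
            plan["skipped"].append(update_id)
        elif update_id in policy.expedite:
            plan["now"].append(update_id)
        else:
            plan["later"].append(update_id)
    return plan


# Placeholder update IDs, not real Windows update identifiers.
policy = PatchPolicy(expedite={"UPDATE-A", "UPDATE-B"}, defer={"UPDATE-B"})
plan = plan_updates(["UPDATE-A", "UPDATE-B", "UPDATE-C"], policy)
```

Because the policy is plain data, it can be reviewed, versioned, and rolled out through the same configuration management pipeline as any other change.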
We also have Windows servers to manage, and these hosts present unique challenges, distinct from those we face with our client fleet. One such challenge is how to schedule routine maintenance in a way that’s easily configurable, automated, and able to integrate with our various agents, like Cabbie. This led to Aukera, a simple yet flexible service for defining recurring maintenance windows, establishing periods where a device can safely perform one or more automated actions that might otherwise be disruptive.
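A recurring maintenance window can be modeled as a weekday, a start time, and a duration, with an agent asking whether the window is currently open before doing anything disruptive. The Python sketch below shows one way to express that check. It is not Aukera’s API or configuration format: the `Window` class and its fields are hypothetical, and it assumes naive local datetimes and windows shorter than one week.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class Window:
    """Hypothetical weekly maintenance window."""
    weekday: int         # 0 = Monday ... 6 = Sunday (datetime.weekday convention)
    start: time          # local start time on that weekday
    duration: timedelta  # assumed to be shorter than seven days

    def is_open(self, moment):
        """Return True if `moment` falls inside the most recent occurrence."""
        # Find the latest occurrence that started at or before `moment`,
        # so windows that run past midnight are handled correctly.
        days_back = (moment.weekday() - self.weekday) % 7
        opened = datetime.combine(moment.date() - timedelta(days=days_back), self.start)
        if opened > moment:
            opened -= timedelta(days=7)
        return opened <= moment < opened + self.duration


# Saturday 23:00 for three hours, so the window closes Sunday at 02:00.
weekend_patching = Window(weekday=5, start=time(23, 0), duration=timedelta(hours=3))
```

An update agent could call `weekend_patching.is_open(datetime.now())` before rebooting a server and simply wait for the next window when the answer is no.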
Building for the future
Our team was fortunate to have started many of these projects well before the spring of 2020, when many of us had to abruptly leave our offices behind. This was due, in part, to embracing the idea of building a Windows fleet for the future: one where every network is part of our company network. Whether our users are working at a business office, from home, or on a virtual machine in a cloud data center, our tools must be flexible, scalable, reliable, and manageable to meet their needs.
Most of the challenges we’ve discussed here are not unique to Google. Companies of all shapes and sizes can benefit from increasing the security, scalability, and flexibility of their networks. Our goal in opening up these projects, and sharing the principles behind them, is to help our peers in the Windows community build stronger solutions for their own businesses.
To learn more about our wider fleet management strategy and operations, read our “Fleet Management at Scale” white paper.