WatchGuard Endpoint Release Process
Recent news of a global IT disruption caused by a security vendor’s content update has driven important conversations about quality assurance processes for endpoint products and content updates. At WatchGuard, with more than 30 years of experience in this industry, we know how sensitive the update process is, and we want to take this opportunity to highlight the processes we have in place to protect our valued partner community and customers from the impacts of a flawed update rollout.
Endpoint security products are closely intertwined with the operating system (OS) and therefore require stronger quality processes. The unique nature of these products and their privileged OS access requirements make development and quality assurance (QA) harder than for other types of software. Because this type of software runs on tens to hundreds of millions of endpoints in extremely diverse settings, we cannot test every environment in which the product will end up running. To address that problem, the technical teams at WatchGuard have implemented a process that keeps the release cadence while limiting the possibility of disrupting normal operations.
WatchGuard’s Endpoint Product Update Process

Phase 1 – Friends and Family Preview:
The process, which is explained in this tech article, starts after the Quality team completes all the internal alpha and beta testing procedures on a new release. Once the software is certified, we begin a stage we internally call Friends & Family testing, so named because it started as exactly that: friends and family testing the new version in production. Nearly 10 years ago, we began by upgrading our own systems, both corporate machines and personal devices at home. Our internal WatchGuard systems are still among the first to be included in the Friends & Family deployment, and that covers production servers as well as personal computers.

Over time, this environment has become much more diverse, with hundreds of accounts and thousands of endpoints. Some of our most strategic partners and some customer administrators wanted to adopt our new versions early, which adds their environments to Friends & Family testing.

All systems included in this stage are closely monitored. We added extended telemetry to verify that the new version does not behave differently from the one it replaces. This extended data covers crashes and errors as well as health data, such as memory consumed and average CPU usage. Depending on the changes, we stay in this stage long enough to verify both that the initial deployment succeeded and that the new version keeps working properly over a sustained period.

Our Support team is also very active during this stage to identify any potential new issues. In fact, they provide the main input for deciding whether we can move to the next step of the release process. When our Support team is satisfied with the version and all the metrics are within the defined parameters, we proceed.
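As an illustration only, the Phase 1 exit decision described above (health metrics within defined parameters, plus Support sign-off) could be sketched as follows. The metric names, the tolerance values, and the functions `regressions` and `can_promote` are hypothetical and chosen for this example; they are not WatchGuard's actual thresholds or tooling.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    """Aggregated telemetry for one product version (hypothetical fields)."""
    crash_rate: float        # crashes per 1,000 endpoints per day
    avg_memory_mb: float     # average memory consumed by the agent
    avg_cpu_percent: float   # average CPU usage of the agent


# Assumed tolerances: how much worse than the baseline a metric may be.
# 0.10 means "up to 10% higher than the previous version is acceptable".
MAX_RELATIVE_INCREASE = {
    "crash_rate": 0.0,
    "avg_memory_mb": 0.10,
    "avg_cpu_percent": 0.10,
}


def regressions(baseline: HealthSnapshot, candidate: HealthSnapshot) -> list[str]:
    """Return the names of metrics where the candidate exceeds its tolerance."""
    failed = []
    for metric, tolerance in MAX_RELATIVE_INCREASE.items():
        old = getattr(baseline, metric)
        new = getattr(candidate, metric)
        if new > old * (1 + tolerance):
            failed.append(metric)
    return failed


def can_promote(baseline: HealthSnapshot, candidate: HealthSnapshot,
                support_approved: bool) -> bool:
    """Promote only if Support signs off and no metric has regressed."""
    return support_approved and not regressions(baseline, candidate)


if __name__ == "__main__":
    previous = HealthSnapshot(crash_rate=0.5, avg_memory_mb=200.0, avg_cpu_percent=3.0)
    new_build = HealthSnapshot(crash_rate=0.6, avg_memory_mb=230.0, avg_cpu_percent=3.0)
    print(regressions(previous, new_build))
    print(can_promote(previous, new_build, support_approved=True))
```

The point of the sketch is that the gate compares the new version against the version it replaces, not against fixed absolute limits, and that a human sign-off is required in addition to the metrics.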
Phase 2 – Controlled Preview:
Next, we announce in the Cloud console that a new version is available. The intention is to let our partners and customers know that they can start deploying the new release. As described in this tech article, you can upgrade your systems in a controlled manner. We stay at this stage for several weeks, monitoring the number of devices being upgraded. As in the Friends & Family stage, our Support teams are very active in identifying anomalous behaviors that might be related to the new version.

Phase 3 – Automatic Upgrade Process:
Once we are comfortable with the new release, we start the automatic upgrade phases. Again, the number of phases depends on the changes, but the rollout is typically divided into three to four stages, and we push the upgrade to the customers in each stage in turn.

WatchGuard’s Content Update Process:
The process of delivering content updates is similar. In this case, we have a staging environment similar to our Friends & Family, with hundreds of accounts and thousands of devices. Once the content is certified, we first publish the update to this environment. As with the Friends & Family stage, this environment is closely monitored for health data. Any deviation from the previous performance baseline is evaluated and retested in this environment. Only when that process is complete, with no new issues reported, do we push the update to our partners and customers.
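To make the idea of a staged automatic upgrade (Phase 3) more concrete, here is a minimal sketch of one common way to split accounts into a fixed number of rollout stages. It is an assumption for illustration, not a description of how WatchGuard assigns customers to stages: the hash-based bucketing, the function names `rollout_stage` and `accounts_for_stage`, and the account identifiers are all hypothetical.

```python
import hashlib


def rollout_stage(account_id: str, num_stages: int = 4) -> int:
    """Map an account to a stage in [0, num_stages) deterministically.

    Hashing the account ID keeps the assignment stable across runs and
    spreads accounts roughly evenly across stages.
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_stages


def accounts_for_stage(account_ids: list[str], active_stage: int,
                       num_stages: int = 4) -> list[str]:
    """Accounts eligible for the upgrade once `active_stage` is live.

    Assumes stages are cumulative: opening stage 2 also covers stages 0 and 1.
    """
    return [a for a in account_ids
            if rollout_stage(a, num_stages) <= active_stage]


if __name__ == "__main__":
    accounts = [f"account-{n}" for n in range(20)]  # hypothetical IDs
    for stage in range(4):
        eligible = accounts_for_stage(accounts, stage)
        print(f"stage {stage}: {len(eligible)} of {len(accounts)} accounts")
```

Because each stage covers only part of the customer base, a problem that slipped past the earlier phases would affect a limited set of accounts, and the rollout can be paused before the next stage opens.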
I want to take this opportunity to acknowledge the effort and resulting value of the work our internal Development, Quality Assurance, DevOps, and Support teams do daily to ensure we keep the solution up to date against new threats and avoid problems for our partners and customers. I also want to reiterate WatchGuard’s ongoing commitment to revisit, revise, and evolve these processes as appropriate to continue earning your trust.
Finally, having been in the industry for more than twenty-five years, I know the pain caused by a failed rollout, and I don’t like seeing it happen to other companies. When it does, our impulse is to double down on verifying that we keep our partners and customers safe from real threats, and to stay vigilant in constantly reassessing our internal processes.