Multicloud
— Written 2023 —
Context
At the time, all of our systems were hosted on a single cloud computing platform, which exposed us to provider failures and service cancellations. To mitigate this risk, we decided to spread our solutions across at least two cloud computing platforms. Doing so reduced our exposure to external factors beyond our control, which posed a significant risk to the company.
My Role
As the Product Manager, I oversaw product discovery and delivery, working closely with a team of three DevOps engineers and two full-stack engineers. Daily stand-ups and retrospectives, supported by the Kanban framework, helped us execute tasks smoothly. I was also responsible for managing the product backlog and roadmap, and I played a quasi-QA role in testing. To ensure the success of the product, I engaged extensively with all stakeholders, including the marketing and legal teams and our C-level executives.
We had two main paths for the solution:
- Active / Active
  - Istio Service Mesh
  - Cloudflare Argo Tunnel + Kubernetes
  - HashiCorp Consul
  - External DNS
- Active / Standby
  - Having at least two clusters switched via DNS
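To make the Active / Standby path more concrete, here is a minimal Python sketch of the decision behind a DNS switch between two clusters. It is an illustration under stated assumptions, not the logic we actually ran: the names `State`, `step`, and `dns_target`, the failure threshold of 3, the automatic fail-back, and the two addresses (taken from documentation-only IP ranges) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical addresses from documentation-only IP ranges (assumption).
ADDRESSES = {"primary": "192.0.2.10", "standby": "198.51.100.20"}


@dataclass(frozen=True)
class State:
    active: str = "primary"  # cluster the DNS record currently points to
    failures: int = 0        # consecutive failed probes of the primary


def step(state: State, primary_ok: bool, standby_ok: bool, threshold: int = 3) -> State:
    """Return the next state after one round of health probes."""
    if state.active == "primary":
        if primary_ok:
            return State("primary", 0)
        failures = state.failures + 1
        # Switch only after repeated failures, and only if the standby is usable.
        if failures >= threshold and standby_ok:
            return State("standby", 0)
        return State("primary", failures)
    # On standby: fail back as soon as the primary is healthy (assumed policy).
    if primary_ok:
        return State("primary", 0)
    return State("standby", 0)


def dns_target(state: State) -> str:
    """Address the DNS record should resolve to for the given state."""
    return ADDRESSES[state.active]


if __name__ == "__main__":
    state = State()
    for primary_ok in [True, False, False, False, True]:
        state = step(state, primary_ok, standby_ok=True)
        print(state.active, dns_target(state))
```

Requiring several consecutive failures before switching is one way to avoid flapping on a single failed probe. A real setup would also have to account for DNS TTLs, which delay how quickly clients follow the switch.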
Solution
After research and negotiation, we found the right fit in the Cluster Autoscaler by Exoscale. The team reached a consensus to run the clusters in an on-premises failover setup for V1 (Active/Standby).
Results
The goal of the project was to deliver a multi-cloud active/standby system, so we have yet to see real-world results. So far we have only run simulated load tests, which performed flawlessly.
Solutions employed:
- Nginx Ingress Controller
- Longhorn
- cert-manager
- Cluster Autoscaler by Exoscale
- Prometheus, Grafana, kube-state-metrics/metrics-server, and Node Exporter
Key Takeaways
- CI/CD for Kubernetes is essential.
- Maintaining a homogeneous environment across different providers, each with its own specificities, is challenging.
- Using Longhorn reduces dependence on cloud providers' storage systems.
- cert-manager is useful for certificate issuance.
- Nginx can serve as the ingress controller instead of relying on each cloud provider's load balancer controller.
- Only a few providers meet our goal of being an impregnable fortress.
- Because we wanted to avoid relying on the same SaaS solutions, we had to learn to implement solutions that were more challenging.
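As an illustration of the Nginx and certificate takeaways above, the following Python sketch builds one provider-neutral Ingress manifest that could be applied unchanged to each cluster. It is a sketch under assumptions, not our actual configuration: the function `build_ingress`, the host `app.example.com`, the service name, and the `letsencrypt-prod` ClusterIssuer are hypothetical. The `ingressClassName` field and the `cert-manager.io/cluster-issuer` annotation are the standard ways to select an ingress controller and to request a certificate from cert-manager.

```python
import json


def build_ingress(name: str, host: str, service: str, port: int,
                  issuer: str = "letsencrypt-prod") -> dict:
    """Build a networking.k8s.io/v1 Ingress that avoids provider-specific fields."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            # Asks cert-manager to issue a certificate via the named ClusterIssuer.
            "annotations": {"cert-manager.io/cluster-issuer": issuer},
        },
        "spec": {
            # Routes through the Nginx ingress controller on every provider.
            "ingressClassName": "nginx",
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {
                        "service": {"name": service, "port": {"number": port}}
                    },
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(build_ingress("web", "app.example.com", "web", 80), indent=2))
```

Generating the same manifest for every cluster is one way to keep environments homogeneous; anything that does differ per provider would be passed in as a parameter rather than edited by hand.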