
Multicloud

— Written 2023 —

Context

At the time, our systems were hosted on a single cloud computing platform, which exposed us to provider failures and service cancellations. To mitigate this risk, we decided to spread our solutions across at least two cloud computing platforms. Doing so reduced our exposure to external factors beyond our control, which posed a significant risk to the company.

My Role

As the Product Manager, I oversaw product discovery and delivery, working closely with a team of three DevOps engineers and two full-stack engineers. Daily stand-up meetings and retrospectives, run within the Kanban framework, helped us execute tasks smoothly. I also managed the product backlog and roadmap and played a quasi-QA role in testing. To ensure the success of the product, I engaged extensively with all stakeholders, including the marketing and legal teams and our C-level executives.

We considered two main paths for the solution:

  • Active / Active

    • Istio Service Mesh

    • Cloudflare Argo Tunnel + Kubernetes

    • HashiCorp Consul

    • External DNS

  • Active / Standby

    • Having at least two clusters switched via DNS.
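
The Active / Standby path can be illustrated with a minimal sketch of the decision logic behind a DNS switch. Everything here is an assumption made for illustration: the cluster labels `primary` and `standby`, the failure threshold of three checks, and the automatic fail-back are hypothetical and do not describe the setup we actually deployed. The sketch only decides which cluster DNS should point at; updating the record itself would depend on the DNS provider.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    """Which cluster DNS should point at (labels are hypothetical)."""
    active: str = "primary"
    consecutive_failures: int = 0


def next_state(state: FailoverState, primary_healthy: bool,
               standby_healthy: bool, threshold: int = 3) -> FailoverState:
    """Return the state after one round of health checks."""
    if state.active == "primary":
        if primary_healthy:
            # A healthy check resets the failure counter.
            return FailoverState("primary", 0)
        failures = state.consecutive_failures + 1
        # Switch only after repeated failures, and only if the standby is usable.
        if failures >= threshold and standby_healthy:
            return FailoverState("standby", 0)
        return FailoverState("primary", failures)
    # Currently on standby: fail back as soon as the primary recovers (assumed policy).
    if primary_healthy:
        return FailoverState("primary", 0)
    return FailoverState("standby", 0)


# Usage: three failed checks in a row move traffic to the standby cluster.
state = FailoverState()
for healthy in [True, False, False, False]:
    state = next_state(state, primary_healthy=healthy, standby_healthy=True)
print(state.active)  # standby
```

Requiring several consecutive failures before switching is one way to avoid flapping on a single missed check; a real setup would also need to account for DNS TTLs, which delay how quickly clients follow the switch.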

Solution

After researching and negotiating, we chose the Cluster Autoscaler by Exoscale as the best fit. For V1, the team agreed to run the clusters in an on-premises failover setup (Active/Standby).

Results

The goal of the project was to deliver a multi-cloud active/standby system, so we have yet to see real-world results beyond the simulated load tests, which ran without issues.

Solutions employed:

  • Nginx Ingress Controller

  • Longhorn

  • cert-manager

  • Cluster Autoscaler by Exoscale

  • Prometheus, Grafana, kube-state-metrics/metrics-server, and Node Exporter

Key Takeaways

  • CI/CD for Kubernetes is essential.

  • It's challenging to maintain a homogeneous environment with different providers and their specificities.

  • Using Longhorn reduces dependence on storage systems from cloud providers.

  • cert-manager is useful for certificate issuance.

  • Nginx can be used as an ingress controller instead of relying on the Load Balancer controller of cloud providers.

  • Only a few providers meet our goal of being an impregnable fortress.

  • Because we wanted to avoid relying on the same SaaS solutions, we had to learn to implement more challenging alternatives ourselves.
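
The takeaway about keeping environments homogeneous can be made concrete with a small drift check. This is a sketch under stated assumptions, not something we ran in the project: the cluster names and version strings below are invented, and in practice the version data would have to be collected from each cluster by some other means.

```python
def find_drift(clusters: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return the components whose versions differ between clusters.

    `clusters` maps a cluster name to a {component: version} mapping.
    A component absent from a cluster is reported as "missing".
    """
    components: set[str] = set()
    for versions in clusters.values():
        components.update(versions)

    drift: dict[str, dict[str, str]] = {}
    for component in sorted(components):
        seen = {name: versions.get(component, "missing")
                for name, versions in clusters.items()}
        # More than one distinct value means the clusters disagree.
        if len(set(seen.values())) > 1:
            drift[component] = seen
    return drift


# Hypothetical inventory: names and versions are placeholders.
inventory = {
    "provider-a": {"ingress-nginx": "1.0.0", "longhorn": "1.0.0", "cert-manager": "1.0.0"},
    "provider-b": {"ingress-nginx": "1.0.0", "longhorn": "1.1.0"},
}
for component, versions in find_drift(inventory).items():
    print(component, versions)
```

A check like this could run as a step in a CI/CD pipeline so that differences between providers surface before a failover rather than during one.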


Felipe Faraone​  |  Product
