Cloudy-with-a-Chance-of-Computing: Issue #2 is Here!
Your weekly dose of cloud computing insights
Hello and welcome back to Cloudy-with-a-Chance-of-Computing, your weekly newsletter for cloud computing enthusiasts! Each week, I'll be sharing some of the most thought-provoking, insightful, and practical resources that I've come across in the cloud computing industry.
In this issue, I've curated content that I believe will pique your interest, challenge your thinking, and help you stay ahead of the curve in this ever-evolving industry. First up is a surprising story about how and why Prime Video re-architected its audio/video quality inspection solution from a distributed microservices architecture into a monolith. This move is particularly interesting since AWS is a prominent proponent of serverless and service-oriented architecture. Next, we have an article comparing the networking services offered by GCP, Azure, and AWS; it's fascinating to see how each provider tackles the same problems with different solutions. Finally, we take a look at how OpenAI scaled a single Kubernetes cluster to a whopping 7,500 nodes to run large models. The sheer scale of that cluster is mind-boggling! Let's dive in!
Case study: Amazon Prime Video’s move from a distributed microservices architecture to a monolith application - Prime Video | Tech
Amazon Prime Video's audio/video quality inspection solution was originally built on a distributed microservices architecture. However, the team found that this architecture was difficult to scale and manage, so they moved to a monolithic architecture, which allowed them to reduce costs and improve performance. The team made a number of changes along the way, including consolidating business logic into a single application process, moving away from S3 for temporary storage, and reducing the number of state transitions in Step Functions. As a result, they cut the cost of the service by over 90%, improved its performance, and made it easier to manage.
“…We designed our initial solution as a distributed system using serverless components (for example, AWS Step Functions or AWS Lambda), which was a good choice for building the service quickly. In theory, this would allow us to scale each service component independently. However, the way we used some components caused us to hit a hard scaling limit at around 5% of the expected load. Also, the overall cost of all the building blocks was too high to accept the solution at a large scale.” - Prime Video | Tech
This case study highlights the importance of carefully evaluating the specific requirements of your application when choosing an architecture. Although microservices and serverless components are often effective tools for scaling and managing complex systems, they may not always be the best approach. It is incorrect to assume that microservices and serverless are always superior to monolithic architectures, as each has its own advantages and disadvantages depending on the use case.
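To make the consolidation idea concrete, here is a minimal, hypothetical sketch (the function and detector names are invented for illustration; the real Prime Video service is far more involved). Instead of orchestrating each quality check as a separate serverless step with intermediate results parked in S3, all detectors run inside one process and share data in memory:

```python
# Hypothetical sketch of consolidating a multi-step inspection pipeline
# into a single application process. Detector names are toy stand-ins
# for real audio/video defect checks.

def inspect_stream(frames, detectors):
    """Run every detector over every frame in one process,
    keeping intermediate data in memory instead of in S3."""
    findings = []
    for index, frame in enumerate(frames):
        for name, detect in detectors.items():
            if detect(frame):
                findings.append((index, name))
    return findings

# Toy detectors standing in for real quality checks.
detectors = {
    "black_frame": lambda frame: frame.get("luma", 255) == 0,
    "silence": lambda frame: frame.get("audio_level", 1.0) == 0.0,
}

frames = [
    {"luma": 120, "audio_level": 0.8},
    {"luma": 0, "audio_level": 0.0},  # black and silent
]

print(inspect_stream(frames, detectors))  # [(1, 'black_frame'), (1, 'silence')]
```

Each loop iteration here would otherwise have been a Step Functions state transition plus an S3 write/read, which is exactly the per-step overhead the team eliminated.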
Tutorial: Networking services compared: AWS vs Azure vs Google Cloud by A Cloud Guru
Life is too short to become an expert in AWS, GCP, and Azure, but in today's multicloud landscape, it's essential to have a basic understanding of their respective virtual network services. As more organizations adopt a multicloud approach, with one primary cloud provider and one or more secondary ones, the need for seamless cloud-to-cloud connectivity becomes increasingly important. You never know when you might need to hop between clouds, so it pays to be prepared!
Virtual Private Clouds (VPCs) are a fundamental component of cloud networking, allowing users to create logically isolated virtual networks within their cloud environments. In AWS, a VPC is a regionally scoped resource: it is confined to a single region, its subnets are each tied to a single Availability Zone within that region, and each subnet can be configured as either private or public. Azure's equivalent, the virtual network (VNet), is likewise regional, with subnets limited to the region in which the VNet is created. Google Cloud Platform takes a slightly different approach: VPCs are global resources that can span multiple regions, while subnets are regional resources created within a VPC to enable fine-grained control over networking. For a clearer picture, refer to the diagrams in the article, which also delves into the differences in gateway/peering and load-balancing implementations among the three providers.
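As a quick mnemonic, the scoping rules above can be captured in a tiny reference snippet (my own summary of the article's comparison, not an official mapping from any provider):

```python
# Summary of how each provider scopes its virtual-network primitives.
# Note that AWS subnets additionally live in a single Availability Zone.
network_scoping = {
    "AWS":   {"network": "VPC (regional)",  "subnet": "zonal (one AZ)"},
    "Azure": {"network": "VNet (regional)", "subnet": "regional"},
    "GCP":   {"network": "VPC (global)",    "subnet": "regional"},
}

# Only GCP lets a single VPC span multiple regions out of the box.
global_networks = [p for p, s in network_scoping.items() if "global" in s["network"]]
print(global_networks)  # ['GCP']
```

This asymmetry is why multi-region designs look so different across the three clouds: on GCP, one VPC with regional subnets often suffices, while on AWS and Azure you typically peer one network per region.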
Article: Scaling Kubernetes to 7,500 nodes - OpenAI
OpenAI has achieved a remarkable feat of scaling Kubernetes clusters to 7,500 nodes, enabling them to run large and powerful models like GPT-3, CLIP, and DALL·E. This is not an easy task, as they faced many challenges and learned many lessons along the way. In this article, they share their insights and best practices on how to overcome the bottlenecks and limitations of Kubernetes at this scale. If you are interested in learning more about how they built a scalable infrastructure for cutting-edge machine learning research, you should definitely check out this article.
And that brings us to the end of this week's issue of Cloudy-with-a-Chance-of-Computing. I hope you found the resources useful and thought-provoking. If you did, please consider forwarding this issue to your colleagues, friends, or anyone else who might be interested in cloud computing; your support helps the newsletter reach a wider audience and keep delivering valuable insights and resources.
Thank you for your support, and don't forget to subscribe (if you have not yet) to receive the next issue directly in your inbox!
Happy clouding!
Zilhaz from Cloudy-with-a-Chance-of-Computing