Kubernetes Use Case
The Previous State
A large retail company ran several services supporting its point-of-sale and inventory control systems. The services were deployed to a standard JEE server as a single deployment. The system could scale horizontally only by creating multiple instances of the same JEE server and deploying all services together on each one. Any change to a single service required a full regression test and redeployment of all services. Standard bug fixes took 4 weeks to reach production after code completion, and new features often took as long as 8 weeks.
The Proposal and Solution
SUM Global was asked to create an architecture in which each service could scale and deploy independently. After careful review of the existing services, we determined that the best approach was a service mesh deployed into a Kubernetes cluster, with Istio managing the mesh. Each service was broken out into its own project, build pipeline and deployment. Based on the initial load estimates and service deployment, the Kubernetes cluster was created with 5 nodes. The customer faced several challenges in creating proper RBAC for the services, setting rules for scaling and defining proper ingress and egress points for the cluster. Most of these challenges arose because the customer had never had to answer these questions before. The solution involved creating Continuous Integration and Delivery (CI/CD) pipelines for each service. The automated testing rigor that CI/CD requires was also a challenge. SUM Global worked with the new service teams to design unit tests and automated regression test suites appropriate for automated deployment. Finally, SUM Global added Helm charts to further automate deployment of the service mesh and Terraform definitions for creating the Kubernetes cluster.
Results
The final solution allowed the customer to fully automate the creation of the Kubernetes cluster and the deployment of Kubernetes tools such as Istio into any new cluster, all within minutes. Existing and new services were deployed to development and testing clusters less than a minute after a developer committed a code change, with full regression and unit testing completed. Time to production dropped from several weeks to as little as 5 minutes from code completion to final production deployment. All services scaled up automatically under load and returned to minimal deployments as load fell. Security improved substantially: access to services within the mesh was limited to the external endpoints that required them. The increase in productivity and code quality has allowed the customer to proactively address market changes and demands. The team is dramatically more responsive to end-user issues and can now address problems rapidly.
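To make the scale-up and scale-down behavior concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler. The case study does not say which autoscaler or thresholds the customer used, so the function name, the replica bounds and the sample metric values below are illustrative assumptions, not the project's actual configuration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by the documented HPA rule.

    desired = ceil(current_replicas * current_metric / target_metric)

    The bounds and metric values are hypothetical; the 0.1 tolerance
    mirrors the autoscaler's documented default.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below the minimal deployment or above the configured ceiling.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Load rises: average CPU is at 90% against a 60% target.
    print(desired_replicas(2, 90, 60))
    # Load falls: average CPU is at 15% against a 60% target.
    print(desired_replicas(4, 15, 60))
```

With these assumed values, a service running 2 replicas at 90% utilization against a 60% target grows to 3, and a service running 4 replicas at 15% utilization shrinks back to the minimum of 1.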