What does Spring Cloud do and why should we care about it?

We all love Production! We are so happy when we go to Production. It is absolutely the happiest place on earth, better than Disneyland. So let’s go to Production as quickly and frequently as possible. Let’s get the travel details as simple as possible. Let’s optimize our deployment process so we can continuously deliver our software safely to Production.

The cold reality

The reality isn’t that warm and fuzzy. Oftentimes management says to the technical folks, “We need to get things going. We need to release this software faster and more frequently so we don’t miss marketing opportunities and can meet our customers’ needs.” The technical folks would probably answer something like, “We want that as well, but it’s so hard for us to set up a new service, we have to address a lot of non-functional requirements, and our current process literally has 500 steps to go to Production.” That has been the struggle of most IT shops for at least the last ten years.

As developers, we have our reasons. We have a large code base. Developers have to cooperate with each other and synchronize their code changes to make sure they don’t break the build. Of course we do Agile. We have daily stand-up meetings and we rotate the scrum master. But once we’re satisfied with the code, we still throw it over the wall to the QA testers, network administrators and middleware administrators. The overall process is therefore still slow, and what we have really become is Water-Scrum-Fall, where Scrum sits in the middle of a waterfall process.

Folks started to realize, “We need a smaller code base. This large code base is not working.” So we started to break our monolithic applications into lightweight microservices, where each microservice can be created, evolved, deployed, and managed individually. But we still had to throw each service over to the other side of the wall. That’s when we realized that we need DevOps, where one team owns the application end to end and does what’s best for it. Moreover, we needed an automated platform that can help us continuously deliver the software to Production. At Pivotal, we believe that Cloud Foundry is the most comprehensive platform for this, one that takes care of everything other than your code.

On the coding side, we want to be able to produce code quickly as well. With traditional J2EE or the plain Spring Framework, it would still take a lot of time to set up a new service. That’s what Spring Boot is all about. It allows developers to quickly build a production-ready, Spring-powered application or microservice without too much code.

But, there’s always a but. Breaking a big monolithic application into small microservices introduces the problems of distributed computing, which the monolith avoided. Instead of dealing with one big application, we are now managing 20 microservices. How are they going to find each other? How can they be easily configured and managed? What if one microservice becomes unavailable? All these questions can give us real headaches.

Introducing Spring Cloud

A few years ago, Netflix pioneered solutions to the distributed computing problems that came with their 500 microservices, and they open sourced a lot of those capabilities. We packaged them up and standardized them in a toolkit that we call Spring Cloud. The goal is still to support moving fast.

Spring Cloud Config

The first problem Spring Cloud solves is configuration. We could use -D flags in the Java options, or environment variables, or application properties, but these approaches all require a bounce (a restart) for a change to take effect. What if I want my service to keep running while I change the configuration settings? What if I want to keep my configuration in one place (centralized configuration management) so that I don’t have to duplicate it among multiple instances? What if I have sensitive information in my configuration, such as passwords? They shouldn’t be human-readable unless decrypted. And what about auditing? I want to be able to know who made each config change. Spring Cloud Config addresses these use cases.

Centralized configuration allows you to put all the configuration files behind a running config server. You enable the config server in one application; in each client, you simply set the Spring application name and the URL of the config server, and let Spring Cloud Config do its magical work. If your config server is backed by a Git repository, for example one hosted on GitHub, a new commit can be picked up by refreshing the configuration, without restarting the application. There are three common ways to trigger that refresh: an Actuator endpoint, JMX, and Spring Cloud Bus. Spring Cloud Bus is another Spring Boot starter, and it can be backed by RabbitMQ or Kafka. All the microservices connected to the bus refresh their configuration when a refresh message comes through. This is very powerful.
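To make the "refresh without restart" idea concrete, here is a minimal, framework-free sketch in plain Java. It is not the Spring Cloud Config API: the class name RefreshableConfig and the Supplier that stands in for a call to the config server are assumptions made up for this illustration. The point is only that the application reads from a snapshot, and a refresh signal swaps that snapshot while the process keeps running.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Holds the current configuration snapshot and swaps it atomically on refresh. */
class RefreshableConfig {
    // Hypothetical stand-in for "fetch my properties from the central config server".
    private final Supplier<Map<String, String>> source;
    private final AtomicReference<Map<String, String>> current = new AtomicReference<>();

    RefreshableConfig(Supplier<Map<String, String>> source) {
        this.source = source;
        refresh(); // load the initial snapshot at startup
    }

    /** Readers always see one consistent snapshot. */
    String get(String key, String defaultValue) {
        return current.get().getOrDefault(key, defaultValue);
    }

    /** Called when a refresh signal arrives (an HTTP endpoint, JMX, or a bus message). */
    void refresh() {
        // Copy the values so later changes at the source are invisible until the next refresh.
        current.set(Collections.unmodifiableMap(new HashMap<>(source.get())));
    }
}
```

In this sketch, a change at the source stays invisible until refresh() is called, which mirrors the behavior described above: the commit happens first, and the running service picks it up when it is told to refresh.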

Spring Cloud Service Discovery

Discovering microservices seems like a DNS problem on the surface. After all, DNS is all about discovering a site or a service, right? But if you think deeper, DNS has many limitations. First of all, DNS relies on caching. A service that is still listed in DNS may no longer be available, so you might make a call and it will just hang there. Unless you have a very aggressive timeout mechanism, this is going to slow your application down. Another limitation is that DNS only does round robin. Many use cases require load balancing to be application specific. For example, what if I want my microservices to connect only to other microservices in the same availability zone? Or I want to segregate instance nodes to handle different ranges of requests? Or I want a stateful operation where subsequent requests go to the same node, because I’m downloading a video and I can’t lose the state? In other words, I need more intelligence in my load balancer: a software-based load balancer that can meet all these requirements. Moreover, if you use DNS, you have to go all the way out of the cloud and then come back in, which can be a big performance bottleneck. For these reasons, we have a service discovery pattern that allows the microservices to register themselves and to intelligently discover and consume each other.

The implementation of the Service Discovery pattern can be Netflix Eureka, Consul by HashiCorp, or Apache ZooKeeper, and you can write your own implementation as well. Essentially, there are two parts. The first is service registration: a service needs to let the world know of its existence. This is a write operation. The other part is service discovery: a service client needs to find and consume what the service has to provide. That’s a read operation.
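The write side and the read side described above can be sketched as a toy in-memory registry. This is only an illustration in plain Java, not how Eureka, Consul, or ZooKeeper are implemented; the class name ServiceRegistry, its method names, and the address strings are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** A toy in-memory registry: services write their location, clients read it. */
class ServiceRegistry {
    // Service name -> addresses of the instances currently registered under it.
    private final Map<String, CopyOnWriteArrayList<String>> instances = new ConcurrentHashMap<>();

    /** Registration, the write side: an instance announces itself, typically at startup. */
    void register(String serviceName, String address) {
        instances.computeIfAbsent(serviceName, name -> new CopyOnWriteArrayList<>())
                 .addIfAbsent(address); // registering the same address twice is a no-op
    }

    /** Removal: an instance leaves, for example at shutdown. */
    void deregister(String serviceName, String address) {
        List<String> addresses = instances.get(serviceName);
        if (addresses != null) {
            addresses.remove(address);
        }
    }

    /** Discovery, the read side: a client asks where a service currently lives. */
    List<String> discover(String serviceName) {
        List<String> addresses = instances.get(serviceName);
        // Return a copy so callers cannot modify the registry's own list.
        return addresses == null ? new ArrayList<>() : new ArrayList<>(addresses);
    }
}
```

A real registry also has to notice instances that die without saying goodbye, which is why production implementations add health checks or heartbeats on top of this basic read/write contract.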

The dynamic nature of Spring Cloud Service Discovery gives Service Discovery Clients a lot of flexibility. You can use a Service Discovery Client as an edge service that behaves like a gatekeeper. It connects to the middle-tier or backend services and does data transformations between the incoming requests and those services. It can perform client-side load balancing by using Ribbon. Instead of relying on server-side round-robin load balancing, the edge service is smart enough to route different traffic to different service instances based on business needs. All the use cases above can be addressed with an edge service. You can also define different edge services for different end-user devices. For example, you may have one edge service to communicate with Android devices, and another edge service to handle all incoming requests from iPhones or from HTML5 clients. The possibilities are endless.
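As an illustration of what "smart" client-side load balancing can mean, the sketch below prefers instances in the caller’s own availability zone and falls back to plain round robin across all instances when the local zone has none. It is a plain-Java sketch under stated assumptions, not Ribbon’s API: the Instance and ZonePreferringBalancer classes and the zone names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** One discovered instance; the zone field is hypothetical metadata for this example. */
class Instance {
    final String address;
    final String zone;

    Instance(String address, String zone) {
        this.address = address;
        this.zone = zone;
    }
}

/** Chooses an instance on the client side, preferring the caller's own zone. */
class ZonePreferringBalancer {
    private final String localZone;
    private final AtomicInteger counter = new AtomicInteger();

    ZonePreferringBalancer(String localZone) {
        this.localZone = localZone;
    }

    Instance choose(List<Instance> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no instances available");
        }
        List<Instance> sameZone = new ArrayList<>();
        for (Instance candidate : candidates) {
            if (candidate.zone.equals(localZone)) {
                sameZone.add(candidate);
            }
        }
        // Fall back to every instance when the local zone has none.
        List<Instance> pool = sameZone.isEmpty() ? candidates : sameZone;
        // Round robin within the pool; floorMod keeps the index valid after int overflow.
        int index = Math.floorMod(counter.getAndIncrement(), pool.size());
        return pool.get(index);
    }
}
```

Because the decision runs inside the client, the rule can be swapped for any other business-specific policy, such as sticky routing for a stateful download, without touching a central load balancer.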

Spring Cloud Circuit Breaker

Spring Cloud also helps keep microservices highly available in the cloud. We all know that no service can be 100% available all the time. A service can easily become unavailable due to a power outage, a virus, or a new release that cannot handle the production load. So instead of trying to optimize the availability of one node, let’s optimize the time it takes the system to repair itself. If it takes zero seconds for the system to come back in the event of a failure, you can confidently claim that your service is 100% available. This changes how we think about developing our software. Instead of focusing on the high availability of an application, let’s focus on how quickly the overall system can come back when something goes wrong. In a happy-path scenario, your downstream service has 10 instances running, so if 1 instance is down, you can still connect to the other 9. But what if there are zero downstream instances? In that case we want to build resiliency into our code so the software is more fault tolerant. Pioneering software companies do that all the time. If you look at Google or Netflix, they almost never go down. Netflix uses a tool called Chaos Monkey, which randomly takes a server down in production to see how the system behaves. They even went to the extreme of Chaos Kong (sort of like King Kong), which takes out an entire data center or an entire AWS region. Even though it’s not likely that an AWS region goes dead, it does happen occasionally. In Netflix’s case, within a split second, they were able to re-route all the traffic to a different data center or a different region and be back in business. Effectively they were 100% available.

So how do we implement this in our code when a downstream service is down? Let’s back up a bit and think about how a circuit breaker works in a house. When too much current goes through the electrical system in our house, the circuit breaker trips automatically, which means we lose some lights. But at least our house is safe. I would rather have a few lights go out than have my house catch on fire. In a similar way, if a downstream microservice is not available, we trip the circuit breaker and go to a fallback mode where a degraded solution is offered. For example, your website has a search field that calls the “search” microservice. If that search microservice is not available, you want a fallback mode that gives customers recommendations instead. It’s not ideal, but it’s better than displaying an error message on the screen. Eventually, when the downstream service comes back up, the circuit breaker resets and the search engine starts to work again. All of this can be easily set up in your code with Spring Cloud, and you can monitor the circuit breaker status in the Hystrix dashboard.
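The fallback behavior can be sketched in a few lines of plain Java. This is a simplified illustration of the pattern, not Hystrix or the Spring Cloud API: the class SimpleCircuitBreaker, its thresholds, and the injectable clock are assumptions made for the example. After a number of consecutive failures the breaker "opens" (trips) and answers from the fallback without calling the downstream service at all; once a cool-down period has passed, it lets a call through again, and a success resets it.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** A minimal circuit breaker: opens after repeated failures, then answers from a fallback. */
class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before the breaker opens
    private final long openMillis;      // how long to stay open before trying the service again
    private final LongSupplier clock;   // injectable time source, so the sketch is testable
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < openMillis;
    }

    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // open: skip the downstream call entirely
        }
        try {
            T result = primary.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get(); // degraded answer instead of an error
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1; // the service is healthy again: close the breaker
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clock.getAsLong(); // open (or re-open) the breaker
        }
    }
}
```

In the search example, primary would call the search microservice and fallback would return recommendations, for instance breaker.call(() -> search(query), () -> recommendations()), where search and recommendations are hypothetical methods of your own application.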

Last but not Least

We have discussed various cloud native patterns, including Spring Cloud Config, Spring Cloud Service Discovery, and Spring Cloud Circuit Breaker. The goal is to make microservices communicate with each other easily and to make them more resilient and fault tolerant. One more thing that is really worth mentioning is that, to be able to do all this, you need a cloud native platform. You don’t want to deploy your microservices to WebSphere. Think about how difficult it is to deploy your monolithic application to WebSphere, and imagine setting it up with all these cloud native patterns. It would be a horrible and painful experience. Furthermore, you want the agility to rapidly build things out and scale horizontally. Without a platform, that is practically impossible.