Steering the evolution of a “legacy” software system is a difficult problem to solve. The size of the codebase, the high degree of coupling between components, and the technical debt accumulated over the years can be overwhelming. Often, the lack of automated tests makes every technical change risky and potentially disruptive to the business. Think about customers not being able to place orders because of a library update.
How do teams approach this uncomfortable situation? When they are tasked with a major refactoring or migration exercise, how do they design a safe transition plan? How do they get started and measure progress?
In this post, we present a case study to show that a data-driven process, based on the Goal-Question-Metric method, can give clarity and confidence to the team. After introducing the approach, we describe some of the tools used to collect the data throughout the process (before and during the migration).
We recently helped a team that develops a B2B product with a typical multi-tiered architecture. Several web apps interact with a REST API, which is connected to business and data access services. The software is about 7 years old. While we have worked on much larger, older, and more complex systems, this application is already of a decent size, with tens of endpoints.
After an initial code review, we identified problems common to legacy systems: unmanaged and outdated dependencies, an eroded package and file structure, a lack of automated testing, and no build automation.
We were asked to propose a technical migration strategy and to work with the team to implement it. The main business drivers for the migration were the cost of maintenance and the security risks associated with outdated libraries and frameworks.
Our recommendation was not to split the monolithic application into a federation of microservices. Distributed microservices would have added complexity without any real benefit. Instead, we advised refactoring the legacy monolith into a more manageable and robust monolith. To implement this strategy, we designed an incremental process:
Create the skeleton of the new monolith.
Identify bounded contexts and extract them one by one.
During the migration, route users to a mix of legacy and “fresh” services.
After the migration, eliminate deprecated endpoints and dead code.
To implement this process safely, we wrote automated tests (applying BDD techniques) describing the current behavior of each bounded context. These tests allow us to validate that the behavior of the old and new systems is the same.
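The core of such a parity test is simple: send the same request to both backends and compare the responses. Here is a minimal, hypothetical sketch in Python; `call_legacy` and `call_new` are stand-ins for real HTTP clients pointed at the two systems, and the invoice payload is purely illustrative.

```python
# Hypothetical parity check: run the same request against the legacy and the
# new backend and fail if they disagree. In a real test suite, call_legacy
# and call_new would issue HTTP requests to the two deployments.

def assert_same_behavior(call_legacy, call_new, request):
    """Fail if the two backends disagree on status code or payload."""
    legacy = call_legacy(request)
    fresh = call_new(request)
    assert legacy["status"] == fresh["status"], "status codes differ"
    assert legacy["body"] == fresh["body"], "payloads differ"
    return fresh

# Stubbed backends standing in for HTTP calls (illustrative data only):
def call_legacy(request):
    return {"status": 200, "body": {"invoice": request["id"], "total": 42}}

def call_new(request):
    return {"status": 200, "body": {"invoice": request["id"], "total": 42}}

result = assert_same_behavior(call_legacy, call_new, {"id": "INV-7"})
```

Written once against the legacy system, these checks double as a regression suite for every extraction step.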
In the above diagram, we see:
The legacy monolith (1). Some of the endpoints and some of the code were no longer used (but we didn’t know how much).
Different clients (2) that interact with the backend system.
The new monolith (3), built on a clean structure, with managed dependencies. Initially, this new backend does not contain any endpoint. Over time, endpoints and supporting services are migrated over to this fresh codebase.
The API Gateway (4), which we introduced to enable the incremental migration process. The gateway routes HTTP requests to the appropriate monolith. At the beginning of the migration, all requests are routed to the legacy system. At the end, all requests are sent to the new system.
The BDD test harness (5), which is a suite of automated tests that describe the intended behavior of the backend application. The tests are written at the beginning of the process to describe the behavior of the legacy monolith. During the migration, as the endpoints and the traffic are moved over to the new system, the test harness is used to check that the behavior of the system has not been altered.
A set of end-to-end tests (6) is also created to validate the system through the UI layer. Whereas the tests at the API layer need to be comprehensive, only a smaller number of end-to-end tests are implemented (typically focusing on the “happy path”).
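The gateway’s routing rule (item 4 above) amounts to a lookup that grows as bounded contexts are extracted. A minimal sketch, assuming a simple prefix-based routing table; the backend URLs and endpoint names are illustrative:

```python
# Hypothetical routing rule for the API gateway: requests for endpoints that
# have already been migrated go to the new monolith, everything else to the
# legacy one. Backend URLs and endpoint prefixes are illustrative.

LEGACY_BACKEND = "http://legacy.internal:8080"
NEW_BACKEND = "http://fresh.internal:8080"

# Empty at the start of the migration; grows with each extracted context.
MIGRATED_ENDPOINTS = {"/api/auth", "/api/customers"}

def route(path: str) -> str:
    """Return the backend that should serve this request path."""
    # Match the longest migrated prefix so /api/customers/93
    # follows /api/customers.
    for prefix in sorted(MIGRATED_ENDPOINTS, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return NEW_BACKEND
    return LEGACY_BACKEND
```

Keeping this table as the single source of truth makes the migration state visible: the percentage of entries pointing at the new backend is itself a progress metric.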
How Do We Get Started?
Describing the migration process leaves a number of open questions. How much time will the team need to implement it? And during the process, how will the team be able to measure progress and revise the expected completion date?
Answering these questions is very important to provide visibility to management and clarity to the team. Without it, anyone would feel uncomfortable starting the exercise. This is something we have observed many times: when teams don’t know the size of the elephant, it is very hard for them to take the first bite.
Framing the Problem with Goal-Question-Metric
To help the team, we applied the Goal-Question-Metric method. In this particular case, we had two goals:
When assessing the situation before the migration, our goal was to estimate the overall effort. We came up with 5 questions, and for each question we identified a list of metrics.
Goal Estimate the effort to do the major refactoring
Question How "big" is the product?
Metric Number of end-to-end scenarios
Metric Number of UI components (apps, pages, components)
Metric Number of REST endpoints
Metric Number of persistent entities
Metric Number of DB queries
Metric Number of source files
Metric Number of commits in git history
Metric Age distribution of source files
Question What are the most important parts of the app?
Metric Ranking of REST endpoints by usage (from logs)
Metric Estimated business impact of scenarios
Question How "significant" is the refactoring?
Metric Number of libraries / frameworks to replace
Metric Number of libraries to upgrade
Metric Delta between current and target versions of libs
Metric Number of source files that need to be modified
Question How "safely" can we do the refactoring?
Metric % of end-to-end scenarios with automated tests
Metric % of REST endpoints with automated tests
Metric Code coverage
Metric Time required to manually test the application
Metric Developer sentiment
Question How much of the code is still used?
Metric % of REST endpoints still used in usage scenarios
Metric % of code covered when going through scenarios
Metric Developer estimation
During the migration, our goal was to track the progress and update the remaining effort. We came up with 2 questions, once again linked to specific metrics.
Goal Track progress of the refactoring
Question How much did we improve the "safety" net?
Metric % of end-to-end scenarios with automated tests + delta
Metric % of REST endpoints with automated tests + delta
Metric Code coverage + delta
Metric Developer sentiment
Question How many "services" did we extract and migrate?
Metric % of REST endpoints (with aggregates) migrated
Metric Number of source files removed from version control
Metric Time spent on each test automation task
Metric Time spent on each extraction
With the GQM structure in place, we had to find a way to collect the metrics in an automated way. We have tools to extract code metrics, but we will not discuss them in this post. Rather, we will focus on the following 2 higher-level metrics, which are harder to collect:
the percentage of REST endpoints still used in usage scenarios
the percentage of code covered when going through scenarios
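Once usage data is available, both metrics reduce to simple set arithmetic. A minimal sketch, with purely illustrative endpoint names and counts:

```python
# Illustrative computation of the two metrics. The endpoint inventory comes
# from the codebase; the "used" set comes from the recorded usage scenarios.
all_endpoints = {
    "/api/auth",
    "/api/customers",
    "/api/tasks/sendInvoice",
    "/api/legacyReport",   # hypothetical endpoint never seen in any scenario
}
used_endpoints = {"/api/auth", "/api/customers", "/api/tasks/sendInvoice"}

pct_endpoints_used = 100 * len(used_endpoints & all_endpoints) / len(all_endpoints)

# Same idea for code coverage: methods executed during scenarios vs. all methods.
total_methods, covered_methods = 480, 312  # illustrative numbers
pct_code_covered = 100 * covered_methods / total_methods
```

The hard part is not the arithmetic but producing the underlying data, which is what the recording system described next is for.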
To collect the data, we designed a system for recording usage scenarios. The system generates log files that link every scenario to a list of endpoints, and to a list of executed methods. For instance, we can use the system to record the “Create invoice” scenario. The system generates metadata that links the scenario to 3 REST endpoints and 12 methods.
To build the system, we integrated 3 main components:
We used an open source Chrome extension designed to facilitate exploratory testing. It provides an easy way to record the “start” and “end” of a scenario. The product owner can do a complete walk-through of the application, signal the start and end of each individual scenario, and give each one a descriptive name. At the end of the recording session, the Chrome extension gives us a first event log, with the temporal demarcation between scenarios.
We used a simple API Gateway in front of the monolith to capture HTTP requests sent to the REST API. This proxy gives us a second event log, where we have a timestamp for every endpoint invocation.
Finally, we used OpenClover to instrument the application code and generate code coverage metrics. This produces a third output of timestamped metadata.
Here is a simplified view of the three generated log files (OpenClover stores data in a database and the process to record sessions is a bit more involved):
* Event log 1 (chrome extension)
12:02:00 start - invoice customer
12:03:12 end - invoice customer
12:04:10 start - print monthly report
12:05:00 end - print monthly report
* Event log 2 (API gateway)
12:02:04 POST /api/auth
12:02:30 GET /api/customers/93
12:02:49 POST /api/tasks/sendInvoice
* Event log 3 (OpenClover)
12:02:04 method com.acme.controllers.AuthController.login
12:02:04 method com.acme.services.AuthService.authenticate
It is then fairly easy to process these files. The first one is used to identify temporal boundaries between scenarios. The others are then used to extract the endpoints and method invocations that occur within these boundaries. One way to make this data easy to use is to generate a CSV file, and then to use a data visualization tool like Tableau.
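This correlation can be sketched in a few lines of Python, using the sample log excerpts above. This is an illustrative sketch, not the actual processing script: timestamps are kept as "HH:MM:SS" strings (lexicographic comparison works for same-day logs), whereas real logs would carry full dates.

```python
# Illustrative correlation of the three event logs: scenario boundaries from
# the Chrome extension, endpoint invocations from the API gateway. The same
# correlate() call works for the OpenClover method log.
from collections import defaultdict

scenario_log = [
    ("12:02:00", "start", "invoice customer"),
    ("12:03:12", "end", "invoice customer"),
    ("12:04:10", "start", "print monthly report"),
    ("12:05:00", "end", "print monthly report"),
]
gateway_log = [
    ("12:02:04", "POST /api/auth"),
    ("12:02:30", "GET /api/customers/93"),
    ("12:02:49", "POST /api/tasks/sendInvoice"),
]

def scenario_windows(events):
    """Pair each 'start' marker with its matching 'end' marker."""
    windows, open_at = [], {}
    for ts, kind, name in events:
        if kind == "start":
            open_at[name] = ts
        else:
            windows.append((name, open_at.pop(name), ts))
    return windows

def correlate(windows, events):
    """Attach each timestamped event to the scenario whose window contains it."""
    usage = defaultdict(list)
    for name, start, end in windows:
        for ts, item in events:
            if start <= ts <= end:  # string compare is safe for HH:MM:SS
                usage[name].append(item)
    return usage

usage = correlate(scenario_windows(scenario_log), gateway_log)
# usage["invoice customer"] now lists the three endpoints above
```

From the `usage` mapping, emitting one CSV row per (scenario, endpoint) pair for a tool like Tableau is straightforward.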
Of course, setting up a system like that and recording every usage scenario (building the inventory of features) takes quite a bit of time.
But once this is done, the team has something tangible and quantifiable to work with: a concrete, quantitative way to track the progress of the migration. Moreover, developers quickly get a sense of the number of deprecated endpoints and the amount of dead code.
As we stated earlier, this is often what teams need to start working on a challenging and overwhelming task. Whether it is a complex refactoring, an initiative to pay back technical debt, or a campaign to introduce automated testing practices, the same high-level approach can be applied.