One of our clients was developing a new product. It was already live for multiple clients and their clients were happy. Each client had their own DB instance and as the product was in development using a very “organic” approach, this lead to the situation where every client not only had their own instance of the DB (perfectly fine and reasonable), but also their own schema which made adding each feature or new client progressively more and more difficult and time consuming.
At a high level, their architecture was:
- A SQL server instance
- A python REST API
- A series of Azure ADF workflows to import data from multiple different sources.
Once the data had been ingested and transformed into the standard model, it was then further exposed via a large number of complex views which contained a lot of business logic, especially around edge cases. These edge cases didn’t affect every client which added to the complexity of standardisation.
They came to us to help them deliver faster and more reliably.
To do this, it was necessary to standardise the DB schema whilst at the same time continuing to roll out the product to new customers (each with their own desired customisations). In order to achieve this, we took the following steps:
- A process change – all DB changes in prod now had to be done in code with an update script (no need for downgrade scripts), this stopped us from getting any worse.
- Introducing comparative tests as the API level (i.e. 1 API-DB pairing vs. a 2nd). This allowed us to see the impact of our changes on the DB.
- Refactoring the DB to move the case-when pairing to custom DB functions so that they could both be reused consistently and be tested
- Introducing integration tests as the API / DB level
The comparative tests were fairly standard and weren’t particularly special so we don’t go into much depth about them here, but briefly, working on the assumption that what’s already live is correct, then we’re generally interested in the question ‘what has changed?’.
The integration tests covered any custom DB functions and any views where we wanted to ensure that the business logic in the underlying DB was correct. Note that these view level integration tests were reserved for the base level views which fed into everything else; this was a matter of pragmatism (limited time / budgets) as well as need – with the less complex views, we were already fairly confident that they were doing the right thing and as such, comparative tests could be used.
To get to where we wanted to get to, it was necessary to take a series of steps. These steps were typically:
- Pick the view to be processed
- Work out which client’s view was the ‘most correct’
- For this view:
- Create a comparative test
- Rewrite the view as necessary, using DB functions to replace the large number of repeated case-when blocks
- Create integration tests if appropriate
- Create an update script
- Per client:
- Create a clone of the DB instance
- apply update script
- launch the API pointing to the new DB
- run comparative tests
If the comparative tests passed, then everything was good and the code could be committed. However, if there were differences, we then had to analyse the differences to work out which was correct. Then, depending on the answer, decisions could be taken:
- The new version was correct => all good, let the client know that their numbers were changing and why
- The new version didn’t handle some of the client specific edge cases => update the view / custom functions so that it did handle them and then repeat the above
- It was more complicated => park it, standardise what we could and then come back to the problem when it would be much smaller and clearer as to the right thing to do.
Note that this was complicated by the fact that the reasons behind each of the client level customisations had been lost to history and the original developer had moved on. So a large portion of the work was trying to understand the thought process behind the change rather than blindly reimplementing.
Once we had this, we then had a new baseline for what was considered good. Developers could then run the tests locally (pointing to a local instance and a prod [clone] for comparison). This meant that they were able to see the impact of their change before submitting it.
The CI process was also updated so that it ran the newly created tests once per client on each check-in. The per client process was:
- Clone the current production DB
- Apply any DB update scripts
- Launch a copy of the python API pointing to the new DB
- Run the actual tests (each test was given access to a DB connection string and the python API / security token)
- Push the results to their Conical instance
We then subsequently grouped the results together in a Conical evidence set so that we could see a high level overview of the impact of the change and subsequently decide what to do. This was very helpful as it allowed us to catch cases where we had missed client level customisations or, more frequently, where we discovered customisations which made zero sense).
Note that where we still needed customisations, this was still possible (update scripts can have IFs in them after all), but this was minimised as much as possible.
With this infrastructure in place, it was then possible to very rapidly make DB level changes and be confident that we hadn’t broken lots of other parts of the system.
As usual, any questions, please ask.
Happy Testing