Improving testability of complex DBs

One of our clients was developing a new product. It was already live for multiple clients of theirs and those clients were happy. Each client had their own DB instance and, as the product had been developed using a very “organic” approach, this led to a situation where every client not only had their own instance of the DB (perfectly fine and reasonable), but also their own schema, which made adding each new feature or client progressively more difficult and time consuming.

At a high level, their architecture was:

  • A SQL Server instance
  • A Python REST API
  • A series of Azure Data Factory (ADF) workflows to import data from multiple different sources.

Once the data had been ingested and transformed into the standard model, it was then further exposed via a large number of complex views which contained a lot of business logic, especially around edge cases. These edge cases didn’t affect every client which added to the complexity of standardisation.

They came to us to help them deliver faster and more reliably.

To do this, it was necessary to standardise the DB schema whilst at the same time continuing to roll out the product to new customers (each with their own desired customisations). In order to achieve this, we took the following steps:

  1. A process change – all DB changes in prod now had to be made in code via an update script (no need for downgrade scripts); this stopped things from getting any worse.
  2. Introducing comparative tests at the API level (i.e. 1 API-DB pairing vs. a 2nd). This allowed us to see the impact of our changes on the DB.
  3. Refactoring the DB to move the repeated case-when logic into custom DB functions so that it could both be reused consistently and be tested.
  4. Introducing integration tests at the API / DB level.

The comparative tests were fairly standard and weren’t particularly special, so we won’t go into much depth about them here. Briefly, working on the assumption that what’s already live is correct, we’re generally interested in the question ‘what has changed?’.
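
As a minimal sketch (the base URLs and end point are purely illustrative), a comparative test fetches the same end point from the live API-DB pairing and from the updated pairing and reports whether anything has changed, rather than asserting against hand-crafted expectations:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class ComparativeCheck
    {
        public static async Task<bool> EndPointMatchesAsync(string relativeUrl)
        {
            using var httpClient = new HttpClient();

            // Hypothetical base URLs - one API pointing at the current prod DB clone, one at the updated clone
            var baseline = await httpClient.GetStringAsync("https://api-baseline.example.com" + relativeUrl);
            var candidate = await httpClient.GetStringAsync("https://api-candidate.example.com" + relativeUrl);

            // In the real tests the payloads were deserialised and compared property by property;
            // a raw string comparison is enough to show the 'what has changed?' idea
            if (baseline == candidate)
                return true;

            Console.WriteLine($"{relativeUrl}: payloads differ - review required");
            return false;
        }
    }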

The integration tests covered any custom DB functions and any views where we wanted to ensure that the business logic in the underlying DB was correct. Note that these view level integration tests were reserved for the base level views which fed into everything else; this was a matter of pragmatism (limited time / budgets) as well as need – for the less complex views, we were already fairly confident that they were doing the right thing and, as such, comparative tests were sufficient.
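
As an example of the shape of these DB level integration tests, here’s a minimal sketch (xUnit is assumed, dbo.fn_StandardisedStatus is a hypothetical stand-in for one of the extracted custom functions and the expected mappings are invented for illustration):

    using Microsoft.Data.SqlClient;
    using Xunit;

    public class DbFunctionIntegrationTests
    {
        [Theory]
        [InlineData("ACTIVE", "Active")]
        [InlineData("A", "Active")]   // legacy edge case
        [InlineData(null, "Unknown")]
        public void StandardisedStatus_HandlesEdgeCases(string rawStatus, string expected)
        {
            var connectionString = System.Environment.GetEnvironmentVariable("TEST_DB_CONNECTIONSTRING");

            using var connection = new SqlConnection(connectionString);
            connection.Open();

            // Call the extracted custom DB function directly with a known input
            using var command = new SqlCommand("SELECT dbo.fn_StandardisedStatus(@rawStatus)", connection);
            command.Parameters.AddWithValue("@rawStatus", (object)rawStatus ?? System.DBNull.Value);

            Assert.Equal(expected, (string)command.ExecuteScalar());
        }
    }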

To get to where we wanted to be, we needed to take a series of steps. These steps were typically:

  1. Pick the view to be processed
  2. Work out which client’s view was the ‘most correct’
  3. For this view:
    • Create a comparative test
    • Rewrite the view as necessary, using DB functions to replace the large number of repeated case-when blocks
    • Create integration tests if appropriate
  4. Create an update script
  5. Per client:
    • Create a clone of the DB instance
    • Apply the update script
    • Launch the API pointing to the new DB
    • Run the comparative tests

If the comparative tests passed, then everything was good and the code could be committed. However, if there were differences, we then had to analyse the differences to work out which was correct. Then, depending on the answer, decisions could be taken:

  • The new version was correct => all good, let the client know that their numbers were changing and why
  • The new version didn’t handle some of the client specific edge cases => update the view / custom functions so that it did handle them and then repeat the above
  • It was more complicated => park it, standardise what we could and come back to the problem later, when it would be much smaller and the right thing to do would be clearer.

Note that this was complicated by the fact that the reasons behind each of the client level customisations had been lost to history and the original developer had moved on. So a large portion of the work was trying to understand the thought process behind the change rather than blindly reimplementing.

Once we had this, we then had a new baseline for what was considered good. Developers could then run the tests locally (pointing to a local instance and a clone of prod for comparison). This meant that they were able to see the impact of their change before submitting it.

The CI process was also updated so that it ran the newly created tests once per client on each check-in. The per client process was:

  1. Clone the current production DB
  2. Apply any DB update scripts
  3. Launch a copy of the Python API pointing to the new DB
  4. Run the actual tests (each test was given access to a DB connection string and the Python API / security token – see the sketch after this list)
  5. Push the results to their Conical instance
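
The wiring for step 4 was nothing special – a sketch of how each test picked up its configuration (the environment variable names here are purely illustrative) is:

    public static class TestEnvironment
    {
        // Populated by the CI pipeline for each per-client run
        public static string DbConnectionString =>
            System.Environment.GetEnvironmentVariable("TEST_DB_CONNECTIONSTRING");

        public static string ApiBaseUrl =>
            System.Environment.GetEnvironmentVariable("TEST_API_BASEURL");

        public static string ApiSecurityToken =>
            System.Environment.GetEnvironmentVariable("TEST_API_TOKEN");
    }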

We then grouped the results together into a Conical evidence set so that we could see a high level overview of the impact of the change and decide what to do. This was very helpful as it allowed us to catch cases where we had missed client level customisations or, more frequently, where we discovered customisations which made zero sense.

Note that where we still needed customisations, this was still possible (update scripts can have IFs in them after all), but this was minimised as much as possible.

With this infrastructure in place, it was then possible to very rapidly make DB level changes and be confident that we hadn’t broken lots of other parts of the system.

As usual, any questions, please ask.

Happy Testing

Testing collections with non-unique “unique” keys

We recently had a client who had an API which was live and being actively used and they wanted to improve both it and the testing of it. The API contained a series of end points which the client thought provided a set of rows / objects, where there was a single row per set of unique keys (‘Contract ID’ and ‘Month’). However, the reality was that the API was returning multiple rows per expected set of unique keys.

We were expecting the ‘Contract ID’-‘Month’ tuple to be unique, however the API had other ideas:

Expected:

ContractID: 1
Month: 2025-06
Payment: 375

ContractID: 1
Month: 2025-07
Payment: 375

ContractID: 2
Month: 2025-06
Payment: 57

Actual:

ContractID: 1
Month: 2025-06
Payment: 375

ContractID: 1
Month: 2025-07
Payment: 375

ContractID: 2
Month: 2025-06
Payment: 35

ContractID: 2
Month: 2025-06
Payment: 22

Obviously the long term desired outcome was to update the API so that it behaved as expected; however, their front end had been coded in such a way as to tolerate this duplication and they had more pressing needs for their product than being architecturally pure.

We wanted to put in a series of comparative tests. These are where we take 2 versions of the API (differing in software, configuration or anything else) and compare their outputs. Unlike classic integration tests, these aren’t intended to be a pure pass-fail, but rather to let us know what the impact of releasing the new version will be.
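
As a rough sketch of the mechanics (reusing the same ObjectFlattener / ObjectComparer types which appear in the helper further down; their configuration is omitted and the method name is ours), comparing one payload against another boils down to flattening both and diffing the two flattened sets:

        static void ReportDifferences(
            BorsukSoftware.ObjectFlattener.ObjectFlattener objectFlattener,
            BorsukSoftware.Testing.Comparison.ObjectComparer objectComparer,
            object expectedPayload,
            object actualPayload)
        {
            // Flatten both payloads to path / value pairs and diff the two sets
            var differences = objectComparer.CompareValues(
                    objectFlattener.FlattenObject(null, expectedPayload),
                    objectFlattener.FlattenObject(null, actualPayload))
                .ToList();

            // Comparative tests report differences for review rather than failing outright
            System.Console.WriteLine(differences.Count == 0 ?
                "No differences - releasing this version changes nothing here" :
                $"{differences.Count} differences to review");
        }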

One option here would have been to ignore this whole end point during the testing process until the API behaved as expected. This was swiftly ruled out as the end point relied on some of the most complex logic in the platform (heavily SQL based, so unit tests were somewhat scarcer) and we were rewriting that logic for them.

This left us with a few options:

  1. Do a summation in the test code – i.e. group all of the rows together and then test the resulting summed rows.
  2. Use the usual collections comparison functionality where we could and then compare the “non-unique unique” row sets separately.

Option #1 was ruled out as there was no guarantee that the summation would be correct, especially if future properties were added to the returned data model, running the risk of false negatives. Note that because we use a code generation tool to generate the code level data model, the dynamically generated data models get updated fairly regularly anyway, so it’s unlikely that the model gets out of sync with the actual API being tested.
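
For illustration only, the summation in option #1 would have looked something like the sketch below (PaymentRow stands in for the generated data model). Note that any property added to the generated model later would silently be left out of the summation, hence the risk of false negatives:

    using System.Collections.Generic;
    using System.Linq;

    // Stand-in for the generated data model
    public record PaymentRow(int ContractID, string Month, decimal Payment);

    public static class RowSummation
    {
        // Collapse the duplicated rows down to one row per 'unique' key pairing
        public static IReadOnlyList<PaymentRow> SumByUniqueKeys(IEnumerable<PaymentRow> rows) =>
            rows.GroupBy(row => new { row.ContractID, row.Month })
                .Select(group => new PaymentRow(group.Key.ContractID, group.Key.Month, group.Sum(r => r.Payment)))
                .ToList();
    }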

This left option #2. For this, we used the standard BorsukSoftware.Testing.Comparison.Extensions.Collections (nuget) functionality. The return type here contains:

  • Matching keys
  • Additional keys
  • Missing keys
  • Non-matching keys
  • Incomparable keys

For the incomparable keys, we get a set of:

  • the keys which were expected to be unique, but weren’t (in this example, ContractID and Month)
  • the expected rows which matched these keys
  • the actual rows which matched these keys

From here, we then needed to come up with a way to compare these collections. Because we weren’t interested in the returned order, the simplest thing to do here was to:

  1. pick an ordering method (payment in our case)
  2. flatten down the rows using the array plugin
  3. compare these flattened values

The upsides of this approach:

  1. We were aware of the impact of our changes on this very important end point
  2. We didn’t have a permanent false positive in our tests. These cause developers to simply ignore the given test, and therefore they’d miss an actual unexpected change in this space.
  3. It was quick to deliver

The downsides of this approach:

  1. It’s a sticky plaster – we still didn’t have a pure API
  2. When the API is fixed so that the number of rows returned drops, we’ll see very noisy results for that test run. Note that when the API is fixed, the test could also be updated during the development process to do the summation, thus proving that the totals haven’t changed. After that confirmation, the test code could be updated (in a subsequent PR most likely) to remove the summation code, for the reasons mentioned above, so that everything is clean.

The upsides outweighed the downsides and the long term fix was added to the backlog.

We did this via a helper function:

        public static (IReadOnlyCollection<IReadOnlyDictionary<string, object>> matching, IReadOnlyCollection<(IReadOnlyDictionary<string, object> Keys, IReadOnlyList<KeyValuePair<string, BorsukSoftware.Testing.Comparison.ComparisonResults>> Differences)> multipleRowSetsDifferences)
            CompareIncomparableItems<T>(
                IComparativeTestContext context,
                BorsukSoftware.ObjectFlattener.ObjectFlattener objectFlattener,
                BorsukSoftware.Testing.Comparison.ObjectComparer objectComparer,
                BorsukSoftware.Testing.Comparison.Extensions.Collections.ObjectSetComparerStandard.ComparisonResults<T> comparisonResults,
                Func<IEnumerable<T>, IOrderedEnumerable<T>> sortingFunc)
        {
            var multipleRowSetsMatching = new List<IReadOnlyDictionary<string, object>>();
            var multipleRowSetsDifferences = new List<(IReadOnlyDictionary<string, object> Keys, IReadOnlyList<KeyValuePair<string, BorsukSoftware.Testing.Comparison.ComparisonResults>> Differences)>();
            if (comparisonResults.IncomparableKeys.Count > 0)
            {
                context.LogMessage("");
                context.LogMessage(" => Comparing non-unique collections by index");

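                // Each incomparable entry is a set of 'unique' keys which matched multiple rows on one or both sides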
                foreach (var grouping in comparisonResults.IncomparableKeys)
                {
                    var expectedRows = grouping.Value.ExpectedObjects ?? Array.Empty<T>();
                    var actualRows = grouping.Value.ActualObjects ?? Array.Empty<T>();

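                    // Sort both sides consistently, flatten to path / value pairs and diff the two flattened sets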
                    var differences = objectComparer.CompareValues(
                        objectFlattener.FlattenObject(null, sortingFunc(expectedRows)),
                        objectFlattener.FlattenObject(null, sortingFunc(actualRows)))
                        .ToList();

                    if (differences.Count == 0)
                        multipleRowSetsMatching.Add(grouping.Key);
                    else
                        multipleRowSetsDifferences.Add((grouping.Key, differences));
                }

                context.LogMessage("Summary:");
                context.LogMessage($" matching - {multipleRowSetsMatching.Count}");
                context.LogMessage($" differences - {multipleRowSetsDifferences.Count}");
            }

            return (multipleRowSetsMatching, multipleRowSetsDifferences);
        }
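
For completeness, calling the helper looked roughly like this (here comparisonResults is the output of the collections comparison described above and PaymentRow stands in for the generated data model – both purely illustrative):

        var (matching, differences) = CompareIncomparableItems(
            context,
            objectFlattener,
            objectComparer,
            comparisonResults,
            rows => rows.OrderBy(row => row.Payment));

        // Log a line per 'unique' key set whose grouped rows still differ after sorting / flattening
        foreach (var (keys, rowDifferences) in differences)
            context.LogMessage($" - {string.Join(", ", keys.Select(pair => $"{pair.Key}={pair.Value}"))}: {rowDifferences.Count} differences");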

The client was happy and the devs were happy as they could see the impact of the fairly chunky changes that they were making.

As usual, any questions, please ask.

Happy Testing!