Tag: testing

blog

Release of BorsukSoftware.Testing.BDD

Post author By Steven Ford
Post date 18 September 2025

We’re pleased to announce the first release of a new NuGet package which makes it simpler to use BDD and Cucumber to create integration tests.

This was put together at the request of a client who was looking to increase their use of BDD to improve the reliability of their releases.

The library contains a set of tools that make it easy to define the code behind the user-chosen steps in a Cucumber document, as well as a factory that makes the library straightforward to use.
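To illustrate the general idea of "code behind a step", here is a minimal C# sketch of binding Cucumber step text to handlers via regular expressions. This is not the BorsukSoftware.Testing.BDD API (the package README documents that). `StepRegistry`, `StepRegistryFactory` and the regex-matching approach are hypothetical, and the step text is assumed to have already had its Given/When/Then keyword stripped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch only: NOT the BorsukSoftware.Testing.BDD API.
// Binds Cucumber step text (keyword already stripped) to code via regex patterns.
public sealed class StepRegistry
{
    private readonly List<(Regex Pattern, Action<string[]> Handler)> _steps = new();

    // Registers a handler; the pattern must match the whole step text and its
    // capture groups are passed to the handler as string arguments.
    public void Register(string pattern, Action<string[]> handler) =>
        _steps.Add((new Regex("^" + pattern + "$"), handler));

    // Runs the first handler whose pattern matches; returns false if no step matched.
    public bool TryExecute(string stepText)
    {
        foreach (var (pattern, handler) in _steps)
        {
            var match = pattern.Match(stepText);
            if (!match.Success)
                continue;

            var args = new string[match.Groups.Count - 1];
            for (var i = 1; i < match.Groups.Count; i++)
                args[i - 1] = match.Groups[i].Value;

            handler(args);
            return true;
        }
        return false;
    }
}

// Hypothetical factory: builds a registry populated with a project's step definitions.
public static class StepRegistryFactory
{
    public static StepRegistry Create(Action<StepRegistry> configure)
    {
        var registry = new StepRegistry();
        configure(registry);
        return registry;
    }
}
```

In a real framework the handlers would call out to the APIs, DBs or Selenium instances under test; the factory keeps that wiring in one place.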

This is intended to be used with custom integration testing frameworks which need to be able to call out to external services such as:

APIs
DBs
Selenium instances etc.

There’s a full README in the package detailing how to use it. We’ll also shortly be uploading a template project to GitHub showing in full how the tool can be used to create a suite of integration tests defined using Cucumber documents.

The results of these integration tests can be uploaded straight to Conical, although Conical is not required to use the new library.

Internally, we use the new library to define a new range of

As usual, any questions, please do get in touch.

Happy testing.

Tags announcement, BDD, BorsukSoftware.Testing.BDD, Cucumber, testing

blog

Improving testability of complex DBs

Post author By Steven Ford
Post date 20 August 2025

One of our clients was developing a new product. It was already live for multiple clients, and those clients were happy. Each client had their own DB instance, and because the product had been developed using a very “organic” approach, every client had not only their own instance of the DB (perfectly fine and reasonable) but also their own schema. This made adding each new feature or new client progressively more difficult and time-consuming.

At a high level, their architecture was:

A SQL server instance
A Python REST API
A series of Azure ADF workflows to import data from multiple different sources.

Once the data had been ingested and transformed into the standard model, it was then further exposed via a large number of complex views which contained a lot of business logic, especially around edge cases. These edge cases didn’t affect every client which added to the complexity of standardisation.

They came to us to help them deliver faster and more reliably.

To do this, it was necessary to standardise the DB schema whilst at the same time continuing to roll out the product to new customers (each with their own desired customisations). In order to achieve this, we took the following steps:

A process change: all DB changes in prod now had to be made in code via an update script (downgrade scripts were not required). This stopped things from getting any worse.
Introducing comparative tests at the API level (i.e. one API-DB pairing vs. a second). This allowed us to see the impact of our changes on the DB.
Refactoring the DB to move the repeated case-when blocks into custom DB functions so that they could be both reused consistently and tested.
Introducing integration tests at the API / DB level.

The comparative tests were fairly standard, so we won’t go into much depth about them here. Briefly: working on the assumption that what’s already live is correct, the question we’re generally interested in is ‘what has changed?’.
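As a rough illustration of the ‘what has changed?’ question, here is a minimal C# sketch. It assumes both API responses have already been flattened into "path → value" dictionaries; `ComparativeCheck` and `WhatChanged` are hypothetical names, not part of any library mentioned in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: the live (baseline) output is assumed correct, so we only
// report what differs in the candidate output. Inputs are assumed pre-flattened.
public static class ComparativeCheck
{
    public static IReadOnlyList<string> WhatChanged(
        IReadOnlyDictionary<string, object> baseline,
        IReadOnlyDictionary<string, object> candidate)
    {
        var changes = new List<string>();

        // Walk every path seen on either side, in a stable order.
        foreach (var key in baseline.Keys.Union(candidate.Keys).OrderBy(k => k, StringComparer.Ordinal))
        {
            var inBaseline = baseline.TryGetValue(key, out var expected);
            var inCandidate = candidate.TryGetValue(key, out var actual);

            if (!inCandidate)
                changes.Add($"missing: {key}");
            else if (!inBaseline)
                changes.Add($"additional: {key}");
            else if (!Equals(expected, actual))
                changes.Add($"changed: {key}: {expected} vs. {actual}");
        }

        return changes;
    }
}
```

An empty result means the candidate behaves like what is already live; anything else is a change to be explained rather than an automatic failure.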

The integration tests covered any custom DB functions and any views where we wanted to ensure that the business logic in the underlying DB was correct. Note that these view-level integration tests were reserved for the base views which fed into everything else. This was a matter of pragmatism (limited time / budgets) as well as need: we were already fairly confident that the less complex views were doing the right thing, so comparative tests were sufficient for them.

Getting to where we wanted to be took a series of steps. These were typically:

Pick the view to be processed
Work out which client’s view was the ‘most correct’
For this view:
- Create a comparative test
- Rewrite the view as necessary, using DB functions to replace the large number of repeated case-when blocks
- Create integration tests if appropriate
Create an update script
Per client:
- Create a clone of the DB instance
- Apply the update script
- Launch the API pointing to the new DB
- Run the comparative tests

If the comparative tests passed, then everything was good and the code could be committed. However, if there were differences, we then had to analyse the differences to work out which was correct. Then, depending on the answer, decisions could be taken:

The new version was correct => all good, let the client know that their numbers were changing and why
The new version didn’t handle some of the client specific edge cases => update the view / custom functions so that it did handle them and then repeat the above
It was more complicated => park it, standardise what we could and come back to the problem later, when it would be much smaller and the right thing to do would be clearer.

Note that this was complicated by the fact that the reasons behind each of the client level customisations had been lost to history and the original developer had moved on. So a large portion of the work was trying to understand the thought process behind the change rather than blindly reimplementing.

Once we had this, we then had a new baseline for what was considered good. Developers could then run the tests locally (pointing to a local instance and a prod [clone] for comparison). This meant that they were able to see the impact of their change before submitting it.

The CI process was also updated so that it ran the newly created tests once per client on each check-in. The per client process was:

Clone the current production DB
Apply any DB update scripts
Launch a copy of the python API pointing to the new DB
Run the actual tests (each test was given access to a DB connection string and the Python API / security token)
Push the results to their Conical instance
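The per-client CI loop above could be sketched in C# as follows. This is a hypothetical outline: `ClientPipeline` and `CiRunner` are illustrative names, and each step is injected as a delegate because the real implementations (cloning the DB, running update scripts, launching the Python API, uploading to Conical) are environment-specific and not described in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hooks for each CI step; real implementations are environment-specific.
public sealed record ClientPipeline(
    Func<string, string> CloneProductionDb,                       // client -> connection string of the clone
    Action<string> ApplyUpdateScripts,                            // connection string
    Func<string, (string Url, string Token)> LaunchApi,           // connection string -> API URL + security token
    Func<string, string, string, IReadOnlyList<string>> RunTests, // (connection string, URL, token) -> results
    Action<string, IReadOnlyList<string>> PublishResults);        // (client, results)

public static class CiRunner
{
    public static void RunForAllClients(IEnumerable<string> clients, ClientPipeline p)
    {
        foreach (var client in clients)
        {
            // 1. Clone the current production DB for this client.
            var connectionString = p.CloneProductionDb(client);
            // 2. Apply any DB update scripts to the clone.
            p.ApplyUpdateScripts(connectionString);
            // 3. Launch a copy of the API pointing at the cloned DB.
            var (url, token) = p.LaunchApi(connectionString);
            // 4. Run the tests with access to both the DB and the API.
            var results = p.RunTests(connectionString, url, token);
            // 5. Push the results (e.g. to a Conical instance).
            p.PublishResults(client, results);
        }
    }
}
```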

We then grouped the results together in a Conical evidence set so that we could see a high-level overview of the impact of the change and decide what to do. This was very helpful as it allowed us to catch cases where we had missed client-level customisations or, more frequently, where we discovered customisations which made zero sense.

Note that where we still needed customisations, this was still possible (update scripts can have IFs in them after all), but this was minimised as much as possible.

With this infrastructure in place, it was then possible to very rapidly make DB level changes and be confident that we hadn’t broken lots of other parts of the system.

As usual, any questions, please ask.

Happy Testing

Tags comparativetests, databases, testing

blog

Testing collections with non-unique “unique” keys

Post author By Steven Ford
Post date 20 August 2025

We recently had a client who had an API which was live and being actively used and they wanted to improve both it and the testing of it. The API contained a series of end points which the client thought provided a set of rows / objects, where there was a single row per set of unique keys (‘Contract ID’ and ‘Month’). However, the reality was that the API was returning multiple rows per expected set of unique keys.

We were expecting the ‘Contract ID’-‘Month’ tuple to be unique, however the API had other ideas:

Expected:

ContractID: 1
Month: 2025-06
Payment: 375

ContractID: 1
Month: 2025-07
Payment: 375

ContractID: 2
Month: 2025-06
Payment: 57

Actual:

ContractID: 1
Month: 2025-06
Payment: 375

ContractID: 1
Month: 2025-07
Payment: 375

ContractID: 2
Month: 2025-06
Payment: 35

ContractID: 2
Month: 2025-06
Payment: 22
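Spotting this kind of duplication is a simple grouping exercise. The following minimal C# sketch uses an illustrative `PaymentRow` record (an assumption, not the client’s generated data model) and reports every (ContractID, Month) key that appears more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative row shape only; not the client's generated data model.
public sealed record PaymentRow(int ContractId, string Month, decimal Payment);

public static class KeyChecks
{
    // Returns each (ContractId, Month) key that appears on more than one row.
    public static IReadOnlyList<(int ContractId, string Month)> FindDuplicateKeys(IEnumerable<PaymentRow> rows) =>
        rows.GroupBy(r => (r.ContractId, r.Month))
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
}
```

Run against the actual rows above, this reports a single duplicated key: contract 2 for 2025-06.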

Obviously the long-term desired outcome was to update the API so that it behaved as expected. However, their front end had been coded to tolerate this duplication and they had more pressing needs for their product than architectural purity.

We wanted to put in a series of comparative tests. These compare the outputs of two versions of the API (differing in software, configuration or anything else). Unlike classic integration tests, they are intended less as a pure pass / fail and more to tell us what the impact of releasing the new version will be.

One option would have been to ignore this whole end point during testing until the API behaved as expected. This was swiftly ruled out as the end point relied on some of the most complex logic in the platform (heavily SQL-based, so unit tests were somewhat scarcer) and we were rewriting that logic for them.

This left us with a few options:

Do a summation in the test code – i.e. grouping all of the rows together and then testing the resulting summed rows.
Do the usual collections comparison functionality where we could and then compare the “non-unique unique” row sets.

Option #1 was ruled out as there was no guarantee that the summation would be correct, especially if properties were added to the returned data model in future, which ran the risk of false negatives. Note that because we use a code generation tool to generate the code-level data model, the generated models get updated fairly regularly anyway, so they’re unlikely to get out of sync with the actual API being tested.
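For reference, a minimal C# sketch of the rejected option #1 is shown below, again using an illustrative `PaymentRow` record (an assumption). Summing the duplicated rows makes the actual data match the expected rows exactly (35 + 22 = 57), which hides the duplication, and every property added to the model would need its own aggregation rule. It is, however, the kind of temporary check described later for when the API is fixed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative row shape only; not the client's generated data model.
public sealed record PaymentRow(int ContractId, string Month, decimal Payment);

public static class Summation
{
    // Collapses rows sharing a (ContractId, Month) key into one row by summing Payment.
    // Any other property would need its own (possibly wrong) aggregation rule.
    public static IReadOnlyList<PaymentRow> SumByKey(IEnumerable<PaymentRow> rows) =>
        rows.GroupBy(r => (r.ContractId, r.Month))
            .Select(g => new PaymentRow(g.Key.ContractId, g.Key.Month, g.Sum(r => r.Payment)))
            .OrderBy(r => r.ContractId)
            .ThenBy(r => r.Month, StringComparer.Ordinal)
            .ToList();
}
```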

This left option #2. For this, we used the standard BorsukSoftware.Testing.Comparison.Extensions.Collections (NuGet) functionality. The return type here contains:

Matching keys
Additional keys
Missing keys
Non-matching keys
Incomparable keys
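To make the five categories concrete, here is a plain C# sketch of how keyed rows could be classified. It is a simplified illustration and not the BorsukSoftware.Testing.Comparison.Extensions.Collections implementation or API; `KeyStatus`, `KeyedComparison` and `Classify` are hypothetical names, and rows are compared with their default equality.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical categories mirroring the list above; not the library's types.
public enum KeyStatus { Matching, Additional, Missing, NonMatching, Incomparable }

public static class KeyedComparison
{
    public static IReadOnlyDictionary<TKey, KeyStatus> Classify<T, TKey>(
        IEnumerable<T> expected, IEnumerable<T> actual, Func<T, TKey> keySelector)
        where TKey : notnull
    {
        var expectedByKey = expected.GroupBy(keySelector).ToDictionary(g => g.Key, g => g.ToList());
        var actualByKey = actual.GroupBy(keySelector).ToDictionary(g => g.Key, g => g.ToList());
        var result = new Dictionary<TKey, KeyStatus>();

        foreach (var key in expectedByKey.Keys.Union(actualByKey.Keys))
        {
            var hasExpected = expectedByKey.TryGetValue(key, out var e);
            var hasActual = actualByKey.TryGetValue(key, out var a);

            if (!hasExpected)
                result[key] = KeyStatus.Additional;
            else if (!hasActual)
                result[key] = KeyStatus.Missing;
            else if (e.Count > 1 || a.Count > 1)
                result[key] = KeyStatus.Incomparable; // the "unique" key was not unique
            else
                result[key] = EqualityComparer<T>.Default.Equals(e[0], a[0])
                    ? KeyStatus.Matching
                    : KeyStatus.NonMatching;
        }

        return result;
    }
}
```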

For the incomparable keys, we get a set of:

the keys which were expected to be unique but weren’t (in this example, contract ID and month)
the expected rows which matched these keys
the actual rows which matched these keys

From here, we then needed to come up with a way to compare these collections. Because we weren’t interested in the returned order, the simplest thing to do here was to:

pick an ordering method (payment in our case)
flatten down the rows using the array plugin
compare these flattened values
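The three steps above can be sketched in plain C# as follows. The `[index].Property` path format and the `NonUniqueComparison` helpers are illustrative assumptions; they are not the ObjectFlattener array plugin’s actual output format, and the `PaymentRow` record is likewise only an example shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative row shape only; not the client's generated data model.
public sealed record PaymentRow(int ContractId, string Month, decimal Payment);

public static class NonUniqueComparison
{
    // Flattens ordered rows into "[index].Property" -> value pairs (illustrative format).
    public static Dictionary<string, object> Flatten<T>(
        IEnumerable<T> orderedRows, IReadOnlyDictionary<string, Func<T, object>> properties)
    {
        var flattened = new Dictionary<string, object>();
        var index = 0;
        foreach (var row in orderedRows)
        {
            foreach (var property in properties)
                flattened[$"[{index}].{property.Key}"] = property.Value(row);
            index++;
        }
        return flattened;
    }

    // Sorts both sides identically, flattens them and returns the paths that differ.
    public static List<string> Compare<T>(
        IEnumerable<T> expected, IEnumerable<T> actual,
        Func<IEnumerable<T>, IOrderedEnumerable<T>> sortingFunc,
        IReadOnlyDictionary<string, Func<T, object>> properties)
    {
        var e = Flatten(sortingFunc(expected), properties);
        var a = Flatten(sortingFunc(actual), properties);
        var differingPaths = new List<string>();

        foreach (var path in e.Keys.Union(a.Keys).OrderBy(p => p, StringComparer.Ordinal))
        {
            var inExpected = e.TryGetValue(path, out var expectedValue);
            var inActual = a.TryGetValue(path, out var actualValue);
            // A path on only one side (different row counts) also counts as a difference.
            if (inExpected != inActual || !Equals(expectedValue, actualValue))
                differingPaths.Add(path);
        }

        return differingPaths;
    }
}
```

Because both sides are sorted the same way before flattening, the order in which the API returns the duplicated rows no longer matters.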

The upside of this approach:

We were aware of the impact of our changes on this very important end point
We didn’t have a permanent false positive in our tests. These cause developers to simply ignore the given test and therefore they’d miss if there was an actual unexpected change in this space.
It was quick to deliver

The downside of this approach:

It’s a sticky plaster, we still didn’t have a pure API
When the API is fixed so that the number of rows returned drops, we’ll see very noisy results for that test run. Note that when the API is fixed, the test could temporarily be updated during development to do the summation, thus proving that the totals hadn’t changed. After that confirmation, the test code could be updated (most likely in a subsequent PR) to remove the summation code, for the reasons mentioned above, so that everything is clean.

The upsides outweighed the downsides and the long term fix was added to the backlog.

We did this via a helper function:

        // Compares the groups of rows whose "unique" keys turned out not to be unique.
        // Each group is sorted with sortingFunc on both sides, flattened and then compared.
        public static (IReadOnlyCollection<IReadOnlyDictionary<string, object>> matching, IReadOnlyCollection<(IReadOnlyDictionary<string, object> Keys, IReadOnlyList<KeyValuePair<string, BorsukSoftware.Testing.Comparison.ComparisonResults>> Differences)> multipleRowSetsDifferences)
            CompareIncomparableItems<T>(
                IComparativeTestContext context,
                BorsukSoftware.ObjectFlattener.ObjectFlattener objectFlattener,
                BorsukSoftware.Testing.Comparison.ObjectComparer objectComparer,
                BorsukSoftware.Testing.Comparison.Extensions.Collections.ObjectSetComparerStandard.ComparisonResults<T> comparisonResults,
                Func<IEnumerable<T>, IOrderedEnumerable<T>> sortingFunc)
        {
            var multipleRowSetsMatching = new List<IReadOnlyDictionary<string, object>>();
            var multipleRowSetsDifferences = new List<(IReadOnlyDictionary<string, object> Keys, IReadOnlyList<KeyValuePair<string, BorsukSoftware.Testing.Comparison.ComparisonResults>> Differences)>();
            if (comparisonResults.IncomparableKeys.Count > 0)
            {
                context.LogMessage("");
                context.LogMessage(" => Comparing non-unique collections by index");

                foreach (var grouping in comparisonResults.IncomparableKeys)
                {
                    // Treat a missing side as an empty set of rows
                    var expectedRows = grouping.Value.ExpectedObjects ?? Array.Empty<T>();
                    var actualRows = grouping.Value.ActualObjects ?? Array.Empty<T>();

                    // Sort both sides identically so rows are compared by position (index)
                    var differences = objectComparer.CompareValues(
                        objectFlattener.FlattenObject(null, sortingFunc(expectedRows)),
                        objectFlattener.FlattenObject(null, sortingFunc(actualRows)))
                        .ToList();

                    if (differences.Count == 0)
                        multipleRowSetsMatching.Add(grouping.Key);
                    else
                        multipleRowSetsDifferences.Add((grouping.Key, differences));
                }

                context.LogMessage("Summary:");
                context.LogMessage($" matching - {multipleRowSetsMatching.Count}");
                context.LogMessage($" differences - {multipleRowSetsDifferences.Count}");
            }

            return (multipleRowSetsMatching, multipleRowSetsDifferences);
        }

The client was happy and the devs were happy as they could see the impact of the fairly chunky changes that they were making.

As usual, any questions, please ask.

Happy Testing!

Tags comparativetests, testing

blog

Testing Complex XML

In conversation with a prospect recently, they mentioned that they had a use-case where they were using XML as their communication mechanism but the comparison of the relevant documents wasn’t a simple case of checking each node-attribute etc.

Instead, certain nodes had to be interpreted in a well-defined way: a unit of comparison was not just a node, but was addressed by the node name plus several of its attributes, with the value taken from another attribute. For example:

Node 1.
<fxvega ccyPair="GBPUSD" ccy="USD" expiry="23-08-2022" value="234.6" />

Node 2.
<fxvega ccyPair="GBPUSD" ccy="USD" expiry="23-07-2022" value="157.1" />

Additionally:

There could be multiple different result nodes of the same type per trade (e.g. different expiries)
There could be multiple different result types, e.g. vega, delta and gamma etc.
Ordering of these was unimportant

To that end, the standard XML flattening plugin that we have wasn’t suitable for their use case. Instead, there was a requirement to write a custom plugin to do this data normalisation and then the rest of the comparison stack could be used as per usual.
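As a rough idea of what such a normalisation step might do, here is a minimal C# sketch using System.Xml.Linq. It is a hypothetical stand-in, not the client’s plugin or an ObjectFlattener plugin interface. It turns each result node into a single `risks.<type>-<ccyPair>-<expiry>-<ccy>` key, mirroring the key format in the sample output, so that node order no longer matters. The attribute names follow the `fxvega` example and the `dd-MM-yyyy` expiry format is assumed from it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical normaliser: one "risks.<type>-<ccyPair>-<expiry>-<ccy>" -> value pair per node.
// Attribute names and the dd-MM-yyyy expiry format are assumed from the fxvega example.
public static class RiskNormaliser
{
    public static IReadOnlyDictionary<string, double> Normalise(XElement trade)
    {
        var result = new Dictionary<string, double>();
        foreach (var node in trade.Elements())
        {
            // Build the comparison key from the node name plus its identifying attributes.
            var parts = new List<string> { node.Name.LocalName };

            var ccyPair = (string)node.Attribute("ccyPair");
            if (ccyPair != null)
                parts.Add(ccyPair);

            var expiry = (string)node.Attribute("expiry");
            if (expiry != null)
                parts.Add(DateTime.ParseExact(expiry, "dd-MM-yyyy", CultureInfo.InvariantCulture)
                    .ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));

            var ccy = (string)node.Attribute("ccy");
            if (ccy != null)
                parts.Add(ccy);

            // Add (rather than overwrite) so that a duplicated key fails loudly.
            var value = double.Parse((string)node.Attribute("value"), CultureInfo.InvariantCulture);
            result.Add("risks." + string.Join("-", parts), value);
        }
        return result;
    }
}
```

The resulting flat dictionary can then be handed to a standard comparison stack, which is the point of normalising first.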

Using this hybrid approach, we were able to handle their use case and create output payloads which are suitable for uploading to Conical and easy for humans to consume.

In their specific example, they wanted to be able to see not just the differences between their items, but also all of the surrounding information so they could import their requests into their existing analysis tools. This meant that the example payload contains quite a bit more information than just the differences.

This might seem to make it harder to see just the differences within Conical. However, by taking advantage of Conical’s ability to have XSLT transforms defined on a per-product basis, they’re able to supply different XSLTs that give them custom, interactive views within the tool.

Specifically, they can have multiple XSLTs defined for their product. One of these outputs HTML and renders a table of differences on a per-trade basis, using embedded JavaScript to provide a button that shows the surrounding information for ease of importing into their tools.

The full code for this use case is available on GitHub. The sample output looks something like:

Matching items - 2
Additional items - 1
 Item:
 - id = Vanilla-Put-EURGBP-6M-ATM
Missing items - 0
Differences - 1
 Item:
 - id = Vanilla-Put-EURGBP-1M-ATM
 Diffs:
 - risks.fxdelta-EUR: 1342.2 vs. 1342.3
 - risks.fxvega-EURGBP-2022-06-19-GBP: 234 vs. 234.2

Tags testing, xml
