A reconciliation test is a test that demonstrates that no data has been lost and that all data comes from an approved source.

Reconciliation tests have a number of different characteristics than other tests. Foremost amongst these are that most tests are run only in development and test environments, while the reconciliation tests are run regularly on the production system.

DBFit is a suitable tool for running reconciliation tests. However we may find on-premise situations where the reconciliation tests become part of the system under test rather than part of the testing framework in order for production to be reconciled. In other words, we will use tools other than DBFit for reconciliation testing when the customer does not want to give DBFit access to the production data.

  1. Start by creating a DBFit acceptance test
  2. Generally, a reconciliation test does not arrange.  Not arranging is a code smell, which makes the test brittle, among other problems.  Basically, the reconciliation test identifies that there is a problem, but provides little help in pin-pointing that problem.  As far as the author knows these characteristics are unavoidable in a reconciliation test.
  3. The act portion of the reconciliation test calls the process that loads data.
  4. The assert portion of a reconciliation test can be written in several ways:
    1. The gold standard is to compare the shared columns between the two sets.  If set A has columns (X,Y,Z) and set B has columns (W,X,Y) then we check that set A and B have the same (X,Y) pairs.  This can be done in two ways:
      1. The DBFit way is to store two queries and compare them. Sort both sets to improve the performance:
        !|Store Query|SELECT customerkey FROM [DimCustomer] ORDER BY customerKey |ArrivalsTable|
        !|Store Query|SELECT customerkey FROM [hub].[h_dv_config_dbo_DimCustomer] ORDER BY customerKey |FoundationTable|
        !|Compare Stored Queries Hide Matching Rows|ArrivalsTable|FoundationTable|
        |customerkey                                       |
      2. A better performing way which we prefer is to let the database do the work:
        !|Query Stats                                        |
        |query                        | row count? |is empty?|
        |SELECT customerkey from [DimCustomer] EXCEPT SELECT customerkey FROM [hub].[h_dv_config_dbo_DimCustomer] | 0          |true     |
    2. Traditionally, on-premise data warehouse operations teams have believed that they do not have the compute power to perform the set comparison.  This is partially a true assessment of the often embarrassingly weak hardware and partially a shared-memory of the distant past.  In any case, these teams perform reconciliation by comparing aggregates. When comparing aggregates compare at least two for each set.
      !|Store Query|SELECT count(*) c, sum(customerkey) total FROM [DimCustomer]|ArrivalsTotal|
      !|Store Query|SELECT count(*) c, sum(customerkey) total FROM [hub].[h_dv_config_dbo_DimCustomer]|FoundationTotal|
      !|Compare Stored Queries|ArrivalsTotal|FoundationTotal|
      |c?  |total? |
  5. Either way, once the assert is written the reconciliation test is complete.

As a general note, when loading is done continuously or in parallel then the reconciliation test needs to pick a point at which to reconcile. Generally this can be done by getting the most recent record from the target, and then filtering both the source and the target to exclude records newer than that record which was most recent when the test began.