We’ve traditionally developed data warehouse code on our local machines, using a customer’s version control system to manage change. This works fine if bespoke Extract-Transform-Load (ETL) code is what we’re trying to deliver.

The whole point of the Optimal Data Engine (ODE) is to automate ETL code generation using a series of switches we call “config”. We define what we want ODE to do using simple tables of instructions, and let it do the hard work of generating the code to move data around.
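
To make that idea concrete, here is a toy sketch of config-driven generation. The table and column names are invented for illustration and are not ODE's actual config schema: a handful of config rows drive a tiny generator that emits ETL statements.

```python
# Hypothetical config rows: each row tells the engine what to load and how.
# Names are illustrative only, not ODE's real schema.
CONFIG = [
    {"source_table": "crm.customer", "target_table": "stage.customer", "load_type": "full"},
    {"source_table": "crm.orders",   "target_table": "stage.orders",   "load_type": "delta"},
]

def generate_etl(config):
    """Turn config rows into (very simplified) ETL statements."""
    for row in config:
        if row["load_type"] == "full":
            yield f"TRUNCATE TABLE {row['target_table']};"
        yield f"INSERT INTO {row['target_table']} SELECT * FROM {row['source_table']};"

if __name__ == "__main__":
    for statement in generate_etl(CONFIG):
        print(statement)
```

Change a row in the config and the generated ETL changes with it, which is exactly what makes the impact of a config change hard to eyeball.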

This design principle introduces a new problem, though: how do we know the impact of a change to the config on the generated ETL code? Well, we have to regenerate the ETL!

We wouldn’t achieve our goal of automation if this impact testing were a manual process, so we have automated it according to the principles of continuous integration (CI). That means steps 4 and 5 below happen without human intervention:

  1. A test is written that will show whether our goal has been achieved
  2. The developer writes code to achieve the goal and unit-tests it themselves
  3. The developer commits the code to the relevant Git branch
  4. Our CI environment notices the change, grabs the code, builds a data warehouse, and runs the test (sketched below)
  5. The CI environment tells the developer what happened
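
The test in step 4 can be as simple as connecting to the freshly built warehouse and checking that the generated ETL loaded what we expect. The sketch below assumes a SQL Server test instance reachable from the CI box; the server, database, table, and expected row count are all invented for illustration.

```python
# Minimal sketch of a post-build check the CI job could run.
# Connection details, table names, and the expected count are hypothetical.
import sys
import pyodbc

EXPECTED_CUSTOMER_ROWS = 42  # hypothetical: known row count in the test source data

def main():
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=ci-sql-server;DATABASE=ode_test_warehouse;Trusted_Connection=yes;"
    )
    actual = conn.cursor().execute("SELECT COUNT(*) FROM stage.customer").fetchval()
    if actual != EXPECTED_CUSTOMER_ROWS:
        print(f"FAIL: expected {EXPECTED_CUSTOMER_ROWS} rows in stage.customer, got {actual}")
        return 1
    print("PASS: generated ETL loaded stage.customer as expected")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Jenkins treats a non-zero exit code from a build step as a failed build, which is what triggers the notification in step 5.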

Here are the user stories we completed to deliver this hand-wave:

  • Complete documentation of install process
  • Integrate Jenkins with Git
  • Job created in Jenkins
  • Trivial build script in a Git repository (see the sketch below)
  • Notifications
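
The “trivial build script” might look something like this: Jenkins checks out the repository and runs the script, which replays the SQL files in the repo against a disposable test database before the check above runs. The folder layout and connection details are, again, invented for illustration.

```python
# Hypothetical trivial build script, run by the Jenkins job after checkout.
# Applies every .sql file in the repo's build folder to a disposable test database.
import glob
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=ci-sql-server;DATABASE=ode_test_warehouse;Trusted_Connection=yes;",
    autocommit=True,
)
cursor = conn.cursor()

# Apply scripts in name order (assumes each file is a single batch with no GO separators).
for path in sorted(glob.glob("build/*.sql")):
    print(f"Applying {path}")
    with open(path, encoding="utf-8") as f:
        cursor.execute(f.read())

print("Build complete")
```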