In ODE version prior to 5 you could only load the data into Data Vault from the source database on the same server where the configuration database is implemented. ODE version 5 could get the data into the Data Vault from another server via SSIS. ODE version 5 includes a feature that could execute an SSIS package to load the data into a stage table as a part of the table load. At the moment ODE would be able to execute only SSIS packages generated in a specific way based on the Configuration for that table with BimlExpress, a free tool from Varigence. Generated package could work with MS SQL Server Change Data Capture (CDC). At the moment only supported data sources are MS SQL Server and Oracle databases.
ODE could not load data straight from the remote server into the vault, it lands data into stage area on the server where ODE is installed, then loads this data into the Data Vault. So you need to create stage database and configure it. Follow the steps from this article to make sure you have configured everything required for ensemble modelling.
Configure an ensemble using this helper script. This script derives a satellite structure from the stage table. So, before you execute this script, provision a stage table structure based on the source table. The easiest way to do it, right-click on the source table in SQL Management Studio, script up a table creation statement in the new window. Switch the query window connection to the server where ODE is installed, and execute against the stage database. This will create an empty table of the source table structure and this is enough for ODE to get the metadata from. You can edit the table creation script before execution on stage database to exclude unwanted columns, they will be ignored by SSIS load process. Note that column names and type should stay the same as they are in source. Once source table and ensemble metadata are successfully configured in ODE, an SSIS package could be generated. Set StageLoadType to “MSSQLcdc” if your source database is MS SQL Server and you have MS CDC enabled on the instance; otherwise set to “Full” to load data in full snapshot mode.
The following steps describe how to generate packages in BimlStudio. Steps for BimlExpress are different due to the tool difference. Create a new project in BimlStudio. In the project settings check your SQL Server and SSIS versions. ODE target platform is SQL Server 2016, but we have found that packages generated for 2014 version work correctly. Add all four BIML files to this project, you can find them in the folder “BIMLScripts” in the ODE project folder you have downloaded fro GitHub.
Edit the first BIML file called “_Include_10.biml”.
- Put the source system name you have configured for this source. You have created a source system with this name on the preparatory stage and have already used in the ensemble creation script.
- Put in a table name for the next parameter or leave it set to “ALL”. BimlStudio project will generate an SSIS package per all the source tables configured for this source, or re-generate it if it already exists. In case if SSIS packages are required to be generated for a specific list of tables, e.g. for a freshly configured only, provide a comma separated list of tables in this parameter.
- Put in the connection to ODE Config database. This is where BimlStudio will get the packages metadata from.
- You can edit the project file name prefix if you want, the default is “DV_”.
Save changes. Right-click on the next BIML file “10_Environments.biml”. Wait until the BimlStudio generates the code. Once it’s finished, execute the next file “20_Build.biml”. This will generate BIML packages for all the configured and included source tables.
The only step left if to generate the actual SSIS packages. Check if the project name matches the “package_project” variable value you have provided when you configured the connection. In our example it was “DV_AdventureWorks”. Build the project. Generated SSIS packages could be found in the output folder.
The next step is only required due to our limited understanding of the Varigence product. Generated SSIS project has a package deployment model. This should be changed to the project deployment model. To set it up correctly, open the SSIS project in Visual Studio. Right-click on the project in the Solution Explorer and choose “Convert to Project Deployment Model”. It will open a sequence of dialog boxes. Choose “DontSaveSensitive” for the protection level (probably not the safest option, we have a task in our backlog to improve that). Otherwise all default will do. Save the project. If you know how to set this up on the BimlStudio project side, let us know in comments. We will work it out eventually, but your help is much appreciated.
Now everything is ready for the SSIS package deployment. In the Integration Services catalog on your server create a folder with th exactly the same name you have provided for the “package_folder” variable for the connection. In our example it was “Vault_External_Source”. From the Visual Studio, deploy the project into this folder. To do that, right-click on the project in the Solution explorer, choose “Deploy”. Provide a server name and select the folder.
The only part left is the execution. A few SQL Agent jobs should be created to enable the execution. Create a job called “ODE – Scheduled Task Execution”. Add a new step to it to execute the following code against the Config database:
Schedule it to run every 10 seconds. This is kind of execution thread. You can create a few of these jobs to execute multiple table loads in parallel if your server have enough resources for that. Separate the execution start time by 1 second between jobs, so that they start every next second instead of starting all at once. These thread will pick up task from the queue, so multiple table loads could run in parallel.
Each generated SSIS package have two load types implemented, full and CDC. Full mode loads snapshots of the source data i.e. load a current state of the data. If same record has been changed multiple times since the last ODE load executions, only the current state of the record is captured. CDC mode uses the MS SQL CDC functionality and captures all the changes for each record. If record has been changed multiple times since the last ODE load execution, all the changes will be sequentially added to the Data Vault satellite. This makes your raw vault a persistent storage of all the record changes.
Execute the stored procedure for the schedule:
EXECUTE [dv_scheduler].[dv_process_schedule] 'Full_Load'
This will execute all the tables in this list, bring data from the source server via SSIS and load the data into the ensembles.