Today I’d like to take a look at some design principles and extraction techniques when working with SAP as data source for your Data Warehouse.
In every book about Data Warehouse design you’ll find some sort of layer architecture. So, this shouldn’t be that wrong at all, right? Right! There are plenty of reasons to build your Data Warehouse on at least 2 Data Layers. Again, you may want to take a look at Kimball Group for detailed information about fundamental DWH Architecture.
With years of experience ORAYLIS has developped its own Data Integration Guidelines based on Kimballs recommendation. Depending on the complexity we opt for an architecture that ranges from 2 layers in small environments (Replication Layer, Business Layer) up to 4 layers in larger environments (Replication Layer, Technology Layer, Business Layer, Data Marts). You can read more about Inmon vs. Kimball and Data Vault as one new modeling technique in an article from our Experts:
SAP Data Extraction
Is there any difference to be aware of, regarding SAP as data source? Yes and No! It is still best practice to extract source data almost without modification into your basic layer (Replication Layer). Main reasons to do so are:
- Efficiency. Only one controlled and optimized access to the source system.
- Scheduling. Extract Jobs can be scheduled in times with low concurrency on SAP environment.
- Decoupling. Serveral ETL processes can access the original data independently without influencing SAP environment.
- Stability. When changing ETL transformation procedures, no need to reload source data.
But with SAP it is not that simple to get access to unmodified low level business information. Sure, you could try to select data from the underlying DBMS, may it be Microsoft SQL Server or another product. Not speaking of license issues and navigating through ten thousands of tables, this is most likely not allowed – ask your SAP Admins!
Most popular method is to use the same data extraction interface as SAP BW actually does. And that’s where Theobald Software with their products come in. With Xtract IS you get smooth integration into Microsofts ETL tool Integration Services (SSIS) and a secure and proven extraction method.
Table Extract or DeltaQ ?
Now, Xtract IS offers serval possibilities to get data out of your SAP System, like Table Extract or DeltaQ Extract. Whitch one to take?
- Table Extract. Offers 1:1 extract of SAP database tables. You may need to check license issues with SAP! We recommend this method only for some reference tables, e.g. domain tables like key->text value references, where there is no DeltaQ extractor available.
- DeltaQ Extract. DeltaQ is a method offered by SAP to get only the information that as changed since the last extract (incremental load). It uses SAP Extractors that combine data of one or more database table to a new information unit. You can also get a full extract, e.g. when no Delta Extraction is defined or when initializing the extraction.
Of course, incremental loads are the best choice for large transaction tables like orders, invoices, etc. But be aware of the initialization process for every data source: In large SAP installations this process can be very time consuming and you’ll have to stop all transaction processing in your SAP System during initial load to get consistent data. And if you loose some information on DWH side, you’ll have to start this process again. Just another reason for the above mentioned Replication Layer!
In our project we decided to mix Full DeltaQ Extracts for master data tables like customer, material or sales org with incremental loads for transactional data and to do table extracts only for a few reference tables. With the use of standard SAP Extractors you’re also save, if SAP changes certain table structures in the next release because these Extractors will then be changed accordingly and you don’t have to care.
Effects of SAP customization
One last thing to be aware of with regards to data extraction from heavily customized SAP Systems: Z-Fields and -Tables are customized objects that are created for your/your customers special environment. This often causes more effort because they are not part of standard extraction mechanisms and you may have to have someone to develop individual Extractors or extend an existing one.
In the end your basic architecture and ETL strategy could look like this:
Hope, you find this usefull. More is coming soon.