Michael Mukovskiy

 

When you have complex production cubes with substantial data volumes you need a flexible processing infrastructure. It must be also easy for the production team to control and handle it.

Here we show a generic solution for the automation of the cube management which uses a control table and simple XML operations to:

  • create cube partitions
  • process AS objects
  • run any other XMLA commands 

We believe this approach to be:

  • greatly flexible and open
  • simple to control
  • production oriented

Only the basic understanding of XSL transformations is required, which can be directly obtained from provided samples.

The described approach is implemented in the BIDS solution for the AdventureWorks Cube and DW (MSAS 2008 R2, see notes in “Sample solution” below).

General idea

Given the XML nature of MSAS definitions and commands we make a straightforward selection of XSL transformation (XSLT) as a tool for cube management. Moreover having the part of the logic in XSLT files makes the infrastructure more data driven which definitely pays off in production. The template-based structure of XSLT fits real-life production needs.

The control table holds the list of cube objects with the actions that must be executed for them.

The basic idea is simple: the XSLT converts the information from the control table into the XMLA commands:

image

Alternatively we can produce several XMLA commands to be executed separately:

image

Actually XSLT converts one XML document into another. So we need to bring the control table in form of XML first. Lets do it using “FOR XML” clause:

 image

So here is what happens in the simplest SSIS package:

image 

Simple and generic enough? This is where the beauty for the production team lies: the SSIS package can have about zero logic for particular case of cube management in it because:

  • the package does not know anything about the structure and particular data of the control table (even SQL FOR XML can be stored outside of the package)
  • the portion of control table used for the job can be parameterized in several ways (you might want several management cases)
  • XMLA lies outside of the package (the simplest way: in files)

In our sample solution we have implemented a more real life scenario: we want to run the following tasks separately:

  1. The creation of cube partitions
  2. Process for the dimensions (Update)
  3. Process for the facts

image

Lets take a closer look on how particular solution elements can be implemented.

Control table

The table holds the AS objects and the actions they should undergo. For example:

  • the dimensions to be updated
  • the measure groups to be processed
  • the partitions to be created and/or processed
  • the role memberships to be added/deleted

Here is the sample data:

image

In general it is up to you to decide in what form to define the control information. The major point is that the information should find its interpretation in SQL FOR XML and in XSLT.

In our example we have following fields:

JobType Management case. For example: “Daily”, “Full”, “Weekend”.
ObjectType Type of AS object. This will trigger different templates in XSLT.
ActionType For direct use in Type tag of <Process> in XMLA (for example “ProcessFull”)
NeedCreate For partitions (0 or 1).
DatabaseID Identifies the AS object.
DimensionID Identifies the AS object.
CubeID Identifies the AS object.
MeasureGroupID Identifies the AS object.
PartitionID Identifies the AS object.
Parameter_1 A partitioning parameter (for example month).

Feel free to tune the table to your needs. For instance you can have more than one “Parameter_X” to implement a more complex partitioning. The fields “ActionType” and NeedCreate can have other meanings for example for the role management. The template nature of XSLT makes it easy to handle such situations.

SQL FOR XML

We use here the following query to transform the table to the input for XSLT: 

SELECT DISTINCT
      [ObjectType] ‚@ObjectType‘
     ,[NeedCreate] ‚@NeedCreate‘
     ,[DatabaseID]
     ,[DimensionID]
     ,[CubeID]
     ,[MeasureGroupID]
     ,[PartitionID]
     ,[Parameter_1]
     ,[ActionType]
FROM [dbo].[OLAP_ToDo]
WHERE JobType = ‘Regular’
FOR XML PATH(‚ASObject‘), ROOT(‚ASObjects‘)

As already noticed the query can be stored outside of the package and have the parameterization you need (first to replace JobType = ‘Regular’).

We hope you understand the architecture “cut” between control information and XSLT here: the package (or such a container in a complex package) can be fed with the control information in XML form instead of relational.

So here is the sample output:

<ASObjects>
  <ASObject ObjectType="Dimension" NeedCreate="0">
    <DatabaseID>Adventure Works DW 2008R2</DatabaseID>
    <DimensionID>Dim Product</DimensionID>
    <ActionType>ProcessUpdate</ActionType>
  </ASObject>
  <ASObject ObjectType="MeasureGroup" NeedCreate="0">
    <DatabaseID>Adventure Works DW 2008R2</DatabaseID>
    <CubeID>Adventure Works</CubeID>
    <MeasureGroupID>Fact Currency Rate</MeasureGroupID>
    <ActionType>ProcessFull</ActionType>
  </ASObject>
  <ASObject ObjectType="Partition" NeedCreate="1">
    <DatabaseID>Adventure Works DW 2008R2</DatabaseID>
    <CubeID>Adventure Works</CubeID>
    <MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
    <PartitionID>Internet_Sales_</PartitionID>
    <Parameter_1>2009</Parameter_1>
    <ActionType>ProcessFull</ActionType>
  </ASObject>
</ASObjects>

Please note that the XML tags here have no direct meaning for MSAS. On this stage it is our “internal” XML structure that should find its interpretation in XSLTs.

XSL transformations

The XSLT transforms the control information into XSLT commands (PROCESS, CREATE, ALTER etc.). In our solution we keep XSLTs as three separate transformations for:

  • Dimension processing
  • Partition creation
  • Fact processing (of any level)

Once again, the template nature of XSLT gives you a great flexibility at this point. For example you can keep the processing of dimensions and facts in one XSLT even if you use it to produce separate XMLAs (filtering the input of XSLTs).

Here is the ProcessDimension.xslt (please find other XSLTs in our sample solution):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:template match="ASObjects">
    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Parallel>
        <xsl:apply-templates/>
      </Parallel>
    </Batch>
  </xsl:template>

<xsl:template match="ASObject[@ObjectType=’Dimension‘]">
  <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
           xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
           xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
           xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200"
           xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
       <Object>
        <DatabaseID><xsl:value-of select="DatabaseID"/></DatabaseID>
        <DimensionID><xsl:value-of select="DimensionID"/></DimensionID>
       </Object>
      <Type><xsl:value-of select="ActionType"/></Type>
      <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
    </Process>
  </xsl:template>
  <xsl:template match="ASObject">
  </xsl:template>
</xsl:stylesheet>

Please note that:

  • only dimensions are taken: match="ASObject[@ObjectType=’Dimension‘]"
  • the AS object IDs are directly used in XMLA: <xsl:value-of select="DimensionID"/>
  • the ActionType is directly used: <xsl:value-of select="ActionType"/>

Here is the sample output:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
             xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
             xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
             xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200"
             xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">

      <Object>
          <DatabaseID>Adventure Works DW 2008R2</DatabaseID>
          <DimensionID>Dim Product</DimensionID>
      </Object>

      <Type>ProcessUpdate</Type>
      <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
    </Process>
  </Parallel>
</Batch>

The Process-XSLT for facts looks alike but with one difference: it has dedicated templates for the fact levels: partition, measure group, cube, database.

The XSLTs for processing have no information about particular cubes. Unlike, the partition creation commands have to “know” about specific SQL commands, aggregation designs and slice expressions. In order to keep this reference solution so simple as possible we keep this specific information in templates for particular measure groups (that need to be partitioned) in Create-XSLT. Thus we have a part of cube metadata in the cube management solution! If this is a No-Go for you and you want to have a template partitions directly in cube project, you may think about automatically getting the XMLA definition of the deployed cube first (with DISCOVER) and then using the part of the cube definition as part of the input for your XSLTs.

Sample solution

The sample BIDS solution uses the AdventureWorks Cube (Enterprise version, MSAS 2008 R2).

The control table (see definition and sample data in solution) can be deployed directly in AdventureWorks DW (which you need to process the cube).     

You can go as following:

  1. Deploy AdventureWorks cube and DW and make sure that cube can be manually processed
  2. Deploy the control table in AdventureWorks DW (OLAP_ToDo.sql)
  3. Check the package connections: “MSAS_Connection” for cube and “SQL_Connection” for the control table
  4. Run the package

 

Considerations and discussion

Here we discuss some points about the architecture of the solution in general and particular implementation details.

  1. The control table, its initial transformation to XML-form and its interpretation in XSLT can be adjusted for your certain infrastructure logic.
  2. The SQL FOR XML can be parameterized (WHERE…) and placed outside of the package (for example as SP, table-valued function or view).
  3. One can consider some other structure of XSLTs and execution logic of XMLA in SSIS packages. For example you can have one XSLT for all process commands (dimension, facts etc.).
  4. The control table can be kept on the granularity of AS objects, but can be a projection from the higher abstraction level. For example you can have you cube ToDos first on the level of the DWH tables being updated – so one DWH table can trigger updates in several AS objects.
  5. Feel free to add other parameters for XMLA: Parallel/Sequential, MaxParallel, etc.
  6. You can use files instead of variables for XMLA in the package (to be better debugged in production).
  7. The control information can come not as a table but already in XML form in your infrastructure. You have to adjust the XSLTs to interpret it correctly.
  8. You can add the XML validations in the package just before running the XSLTs.
  9. It is possible to use this technique for other cube operations, for example security role management or incremental processing.
  10. As already mentioned, the metadata of the deployed cubes can be use as a part of the input for XSLTs giving additional information of what actions and how exactly to execute. You may need this for example it you hold your partition templates in the cube.
  11. On systems without SQL FOR XML (for example first version of PDW) you may need other way to bring the control information into XML form (for example using Script Task).
  12. If the DatabaseID does not come with the control table you may need other way to bring it into XML (MERGE option of XML Tasks or as a parameter for SP holding your SQL FOR XML).

Please feel free to provide us your feedback which we will consider to take into this discussion.