Hilmar Buchta

PDW v1 | PDW 2012 | APS

CTAS (Create Table As Select) is a common way on the Parallel Data Warehouse (PDW) to transform one table into another table for example for calculations or for ELT (Extract Load Transform) processes.

The general syntax is quite simple and explained in the PDW help file:

CREATE TABLE [ database_name . [ dbo ] . | dbo. ] table_name
        [ ( { column_name } [ ,…n ] ) ]
    WITH (
        DISTRIBUTION = { HASH( distribution_column_name ) | REPLICATE }
            [ , <CTAS_table_option> [ ,…n ] ] 
    )
    AS <select_statement>
[;]

<CTAS_table_option> ::=
    LOCATION = USER_DB
    | CLUSTERED COLUMNSTORE INDEX
    | CLUSTERED INDEX ( { index_column_name [ ASC | DESC ] } [ ,…n ] )
    | PARTITION( partition_column_name RANGE [ LEFT | RIGHT
        FOR VALUES ( [ boundary_value [,…n] ] ) ) ]

<select_statement> ::=
    [ WITH <common_table_expression> [ ,…n ] ]
    SELECT <select_criteria>

 

For example, a simple CTAS statement to copy the contents of one table (FactSales) to another table (FactSalesTmp) may look like this:

create table FactSalesTmp
with (distribution = hash(DateKey))
as select *  from FactSales

As you can see from the CTAS syntax definition above it’s not possible specify column properties like NULL/NOT NULL or constraints. In the help file there is a note about this:

You cannot specify NULL | NOT NULL for the columns in the CTAS statement; the nullability property is derived from the columns and expressions in the SELECT results.

However, if you’re using CTAS to fill an intermediate stage table used for partition switching operations the switch out table has to have exactly the same definition as the target table and this also means that the NULL/NOT NULL setting on the column has to be identical. So how is the nullability property derived from the expressions in the select results? To show this behavior, I’m using a simple fact table with the following definition:

create table FactSales (
      DateKey int not null
    , ProductKey int not null
    , StoreKey int not null
    , Quantity int not null
    , Amount decimal (13,4) not null
    , Costs decimal(13,4)
) with (distribution = hash(DateKey))

Using the simple CTAS statement from above (the one copying FactSales to FactSalesTmp) preserves the nullability of all columns. As long as you refer to existing table columns, the nullability is preserved from that columns.

But what about calculations? Let’s try the following CTAS statement:

create table FactSalesTmp
with (distribution = hash(DateKey))
as select *
, Amount*0.8 AS StandardCosts
from FactSales

I simply added another column here using a simple calculation. You can check the resulting table structure by choosing the context menu ‘View Code’ in Data Tools or by running the following query:

select name, is_nullable from sys.columns where object_id=object_id(‚FactSalesTmp‘)

Unbenannt

What you see, is that the calculation (StandardCosts) is understood to be nullable by the PDW. How can we mark this column as not null the CTAS statement?

Rewriting the CTAS from above using coalesce doesn’t solve the problem:

create table FactSalesTmp
with (distribution = hash(DateKey))
as select *
, coalesce(Amount*0.8,0) AS StandardCosts
from FactSales

 

However, using isnull does the trick:

create table FactSalesTmp
with (distribution = hash(DateKey))
as select *
, isnull(Amount*0.8,0) AS StandardCosts
from FactSales

 

Unbenann1

The reason for this behavior is the different handling of the data type for coalesce and isnull. So, if you want to have an expression being marked as ‘not null’ in a CTAS statement, use the isnull-function.