Stephan Köppen

During processing of our AS Cubes we found ourselves with a strange behavior we couldn’t explain. In order to get better performance with split the fact partition into 8 partitions. The first division is done over contract ID with modulo. The second division takes place by the column „partition_id“ which is part of a dimension.

In order to process all the partitions we need 8 queries, whereby i only show the part which caused the problem:


  • Below you find the working query. Interesting is the bottom part where the NULL Handling takes place. Sine the join is a left outer we have to make sure we have a value. in case we don’t, take a 1 instead.
SELECT *
FROM  fact1
………
left outer join dim1 on fact1.random_id = dim1.random_id
WHERE  fact1.Month_ID = 201407
AND ISNULL(dim1.partition_id,1) = 1
  • Now we show you the query which is causing problems. The only difference between the query is the comparison value for the NULL Handling. Instead of 1 it is 2.
SELECT *
FROM  fact1
………
left outer join dim1 on fact1.random_id = dim1.random_id
WHERE  fact1.Month_ID = 201407
AND ISNULL(dim1.partition_id,1) = 2
Reading the execution plans there was no difference at all. A simple ETL Package shows the difference in performance. Even though the second query returns way less rows then the first one, its still magnificantly slower.
Partition_ID 1 finished in about 5 Minutes, whereby Partition_ID 2 is stopped afar 10 Minutes with only 20k rows read.


After changing the query to the following the performance was the same.


  • Even though the first one was working fine, we changed it in order to be the same and homogeneous.
SELECT *
FROM  fact1
………
left outer join dim1 on fact1.random_id = dim1.random_id
WHERE  fact1.Month_ID = 201407
 AND NOT(dim1.Partition_ID != 1 AND NOT(1 = 2 AND dim1.Partition_ID is null))
  • After changing the second query it is fast as expected.
SELECT *
FROM  fact1
………
left outer join dim1 on fact1.random_id = dim1.random_id
WHERE  fact1.Month_ID = 201407
AND NOT(dim1.Partition_ID != 2 AND NOT(2 = 2 AND dim1.Partition_ID is null))
Since this behavior isn’t correct (at least in my opinion) a ticket at Microsoft has been opened. Till then, this might help other people facing the same situation.