Hans Klüser

On normal SMP-SQL Servers there are several ways to perform string aggregations via XML or Pivot and dynamic SQL, some more or less efficient.

On APS/PDW you have to do something different.
We want to have a rapid solution so we like “JOIN” and simple additions and we hate anything like “CASE”, or “ISNULL”.

Let’s start with a simple table:

We have several items with their rownumbers in one table. In order to get them side-by-side we create a matrix/table which has an ID, an empty string at its diagonal und a NULL-string in all other fields. Next we perform a join and get a table with the items in its diagonal because addition with the NULL-value results in a NULL value and addition of empty string and item results in the item.

One max-aggregation and one concat-operation later we have the result: one row and all items concatinated.

Click to enlarge:

Sample simple

 

 

Now we modify this attempt to perform an aggregation of items in multiple lines.

The matrix gets a space-character instead of the empty string from row 2 on (or any other delimeter). Next we insert a column with a partitioned rownumber and join the matrix by this. By keeping the line information we can group by this linenumber. Next the concat and we have our little christmas-song.

Click to enlarge:

Thanks to Miriam Funke who implemented and tested it on the PDW.

We used this concept on a 4-node-pdw to aggregate receipt-items for side-by-side analysis. The matrix allows up to 200 items per receipt and we reduced about 60 mio lines to 5 mio lines in about 2 minutes even though we needed to perform a dense_rank and a row_number at once to eliminate same items of the receipt in one step.

Merry Christmas!

Here is a query to test the concept on standard smp including creation of sample data (remove „–“ in last line to execute query):

DECLARE @i INT
DECLARE @i_max INT = 100 –max aggregate items

DECLARE @query NVARCHAR(max)

/***** 1 create matrix for join *****/

SET @query =

CREATE TABLE #T1
(
rowid INT

SET @i = 1
WHILE @i <= @i_max
BEGIN
SET @query +=

,T’+CAST(@i AS NVARCHAR(10)) + ‚ NVARCHAR(1) ‚

SET @i += 1
END

SET @query +=

)

SET @i = 1
WHILE @i <= @i_max
BEGIN
SET @query +=

INSERT INTO #T1 (rowid, T’+CAST(@i AS NVARCHAR(10))+‘) SELECT ‚+CAST(@i AS NVARCHAR(10)) + ‚,““

SET @i+= 1
END
/***** CREATE SAMPLE DATA *****/

SET @query +=

CREATE TABLE #T2
(
ID INT IDENTITY(1,1),
VBELN INT,
POSNR NVARCHAR(10)
)

DECLARE @i INT = 1
DECLARE @imax INT = 1000000

WHILE @i <= @imax
BEGIN
INSERT INTO #T2
SELECT ROUND(RAND(@i) * 30000,0), NULL
SET @i+= 1
END
UPDATE T2 SET POSNR = POSNR_SOLL
FROM #T2 AS T2
JOIN
(
SELECT  ID, POSNR_SOLL = CAST(ROW_NUMBER() OVER (PARTITION BY VBELN ORDER BY VBELN)  AS nvarchar(10))
FROM #T2 AS T2
) AS SOLL
ON
SOLL.ID = T2.ID

/*************** MAIN TASK ***********/

SET @query +=

SELECT
SINGULAR.VBELN,
C_POSNR = CONCAT(

SET @i = 1
WHILE @i <= @i_max
BEGIN
IF @i = 1 SET @query += ‚MAX(T’+CAST(@i AS NVARCHAR(10))+’+POSNR)‘
ELSE SET @query += ‚,MAX(+“,“+T’+CAST(@i AS NVARCHAR(10))+’+POSNR)‘
SET @i += 1
END
SET @query += ‚
)

FROM
(
SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY VBELN ORDER BY POSNR) FROM #T2
) AS SING
JOIN #T1 AS PIV
ON
PIV.ROWID = SING.RN
GROUP BY
SINGULAR.VBELN
ORDER BY
SINGULAR.VBELN

SET @query +=

DROP TABLE #T1
DROP TABLE #T2

SELECT @query
EXEC (@query)