André Kienitz

In this article I want to show  some features about the Group By clause and the Row Number window function that you can use in SQL statements.

There are many situations where you want a unique list of items. But in the data source the items are not unique.

Let’s take an example of the AdventureWorks2012. If you want a list of Job titles from the Employee table, then the Job titles are not unique:

SELECT [JobTitle]
FROM [AdventureWorks2012].[HumanResources].[Employee]
WHERE [JobTitle] IN (N'Design Engineer', N'Research and Development Manager')

dataset01

You can see that Design Engineer appears three times and Research and Development Manager appears twice in the data set.

In order to get a unique list of job titles you can use the group by clause:

SELECT [JobTitle]
FROM [AdventureWorks2012].[HumanResources].[Employee]
WHERE [JobTitle] IN (N'Design Engineer', N'Research and Development Manager')
GROUP BY [JobTitle]

dataset02

Now the list of job titles is unique. You can enhance the SQL statement for the following questions:

1)   How many Design Engineers and Research and Development Manager do we have?

2)   What is the smallest birth and hire date per job title?

For question 1 we use the count function and for question 2 the minimum function.

SELECT [JobTitle]
, COUNT(*) AS count
, MIN([BirthDate]) AS MinBirthDate
, MIN([HireDate]) AS MinHireDate
FROM [AdventureWorks2012].[HumanResources].[Employee]
WHERE [JobTitle] IN (N'Design Engineer', N'Research and Development Manager')
GROUP BY [JobTitle]

dataset03

It is not guaranteed that the smallest birth and hire date are from the same employee. The following statement shows the data from all Research and Development Managers:

SELECT [NationalIDNumber], [JobTitle], [BirthDate], [HireDate]
FROM [AdventureWorks2012].[HumanResources].[Employee]
WHERE [JobTitle] = N'Research and Development Manager'

dataset04

In this case line 1 gives you the smallest HireDate and line 2 the smallest BirthDate. Thus smallest BirthDate and HireDate belongs to different employees.

You can use the row number window function in order to get the HireDate from the employee with the smallest BirthDate per JobTitle.

WITH BASIS AS (
SELECT [NationalIDNumber]
, [JobTitle]
, [BirthDate]
, [HireDate]
, ROW_NUMBER() OVER (PARTITION BY [JobTitle]
ORDER BY [BirthDate] ASC) AS rn
FROM [AdventureWorks2012].[HumanResources].[Employee]
WHERE [JobTitle] IN (N'Design Engineer', N'Research and Development Manager')
)
SELECT [NationalIDNumber]
, [JobTitle]
, [BirthDate]
, [HireDate]
FROM BASIS
WHERE rn = 1

dataset05

The row number function numbers each row starting at one for each JobTitle which is included in the partition by section of the row number function. The numbering is ordered by BirthDate which is inlcuded in the order by section of the row number function. Row number 1 contains all data with the smallest BirthDate per JobTitle. You use an common table expression, because you cannot filter directly the result of the row number function. Below you find the result of the row number function:

SELECT [NationalIDNumber]
, [JobTitle]
, [BirthDate]
, [HireDate]
, ROW_NUMBER() OVER (PARTITION BY [JobTitle]
ORDER BY [BirthDate] ASC) AS rn
FROM [AdventureWorks2012].[HumanResources].[Employee]
WHERE [JobTitle] IN (N'Design Engineer', N'Research and Development Manager')

dataset06