MERGE - A Neat Way to Do Type 2 Load

In this post we'll introduce a way to load data into a Type 2 table in a neat and beautiful way. If you don't know what a Type 2 table (or Slow Changed Dimension Type 2) is, please read the book 'Data Warehouse Toolkit' by Ralph Kimball for details.

Firstly, we use the below script to generate two temp tables and populate data into them.

CREATE TABLE #Target

(

[ID] int

,[Name] varchar(16)

,[IsEnabled] bit

)

INSERT INTO #Target

VALUES (1,'Pepper',1)

,(2,'Chilli',1)

,(3,'Paprika',1)

CREATE TABLE #Source

(

[ID] int

,[Name] varchar(16)

,[IsEnabled] bit

)

INSERT INTO #Source

VALUES (1,'Pepper',1)

,(2,'Capsicum',1)

,(4,'Coriander',1)

select * from #Target

select * from #Source

We get the below datasets

To load data from #Source to #Target, we make the below rules:

Any row in the #Source table with an ID value that’s not in the #Target table will be inserted into the #Target table directly.
If a row in the #Source table has the same ID value as one in the #Target table but with a different Name value, the row from the #Source table will be inserted into the #Target table; meanwhile, the row in the #Target table should be updated so that IsEnabled equals 0.
If a row in the #Source table has the same ID and Name values as the one in the #Target table, do nothing.

The above rules defines a Type 2 load. We can easily achieve our goal using the MERGE statement. Here’s the code

INSERT INTO #Target

SELECT [ID],[Name],1

FROM

(

MERGE INTO #Target Tar

USING #Source Src

ON Tar.[ID] = Src.[ID]

WHEN NOT MATCHED THEN

INSERT ([ID],[Name],[IsEnabled])

VALUES ([ID],[Name],1)

WHEN MATCHED AND Tar.[Name] <> Src.[Name]

THEN UPDATE

SET Tar.[IsEnabled] = 0

OUTPUT $action

,Src.[ID]

,Src.[Name]

) AS CTE (Action,[ID],[Name])

WHERE CTE.Action = 'UPDATE';

After we execute the above code and checks the data in both #Target and #Source respectively, we get the below results

As we can see, our script has done the job neatly and accurately. Here’s the explanation how the above powerful script works:

The ‘MERGE INTO <Target Table> USING <Source Table or Statement> ON <condition>’ sets the basis – it tells us that we’re planning to ‘merge’ data from the #Source table to the #Target table, and we are referencing the ID field in each table as our initial condition.
The ‘WHEN NOT MATCHED THEN’ statement is short for ‘WHEN NOT MATCHED BY TARGET THEN’, which means that for every row in the #Source table that’s not in the #Target table according to the merge condition (Tar.[ID] = Src.[ID]), insert it into the #Target table. The INSERT ([ID],[Name],[IsEnabled]) indicates the insertion fields in the #Source table, and the VALUES ([ID],[Name],1) decides the fields or fixed values to fetch from the #Source table. Please be noticed here that I fetched [ID] and [Name] fields from the #Source table, and then use 1 as a fixed value.
The WHEN MATCHED AND Tar.[Name] <> Src.[Name] statement has an additional condition ‘AND Tar.[Name] <> Src.[Name]’. It indicates that all rows in the #Target table that match the merge condition along with the additional condition, are either updated or deleted as specified. In our case, we use the ‘THEN UPDATE…’ to set the [IsEnabled] values to 0 for those matching records in the #Target table.
The ‘OUTPUT’ statement indicates that the matching rows in the #Source table will be returned, as indicated by the $Action value (UPDATE).
The above returned rows are then inserted into the #Target table using the ‘INSERT INTO #Target’ statement.

For more details and examples about MERGE statement, please refer to the Microsoft online document https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15.

Search This Blog

The Data Thing

MERGE - A Neat Way to Do Type 2 Load

Comments

Post a Comment

Popular posts from this blog

SQL Server - Using SQL Agent JOBs to Schedule and Execute Multiple Stored Procedures in Parallel

sp_executesql – Let's learn How to Use It