MERGE - A Neat Way to Do Type 2 Load
In this post we'll introduce a way to load data into a Type 2 table in a neat and beautiful way. If you don't know what a Type 2 table (or Slow Changed Dimension Type 2) is, please read the book 'Data Warehouse Toolkit' by Ralph Kimball for details.
We get the below datasets
For more details and examples about MERGE
statement, please refer to the Microsoft online document https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15.
Firstly, we use the below script to generate two temp tables
and populate data into them.
CREATE TABLE #Target
(
[ID]
int
,[Name] varchar(16)
,[IsEnabled] bit
)
INSERT INTO #Target
VALUES (1,'Pepper',1)
,(2,'Chilli',1)
,(3,'Paprika',1)
CREATE TABLE #Source
(
[ID]
int
,[Name] varchar(16)
,[IsEnabled] bit
)
INSERT INTO #Source
VALUES (1,'Pepper',1)
,(2,'Capsicum',1)
,(4,'Coriander',1)
select * from #Target
select * from #Source
To load data from #Source to #Target, we make the below
rules:
- Any row in the #Source table with an ID value that’s not in the #Target table will be inserted into the #Target table directly.
- If a row in the #Source table has the same ID value as one in the #Target table but with a different Name value, the row from the #Source table will be inserted into the #Target table; meanwhile, the row in the #Target table should be updated so that IsEnabled equals 0.
- If a row in the #Source table has the same ID and Name values as the one in the #Target table, do nothing.
The above rules defines a Type 2 load. We can easily achieve
our goal using the MERGE statement. Here’s the code
INSERT INTO #Target
SELECT [ID],[Name],1
FROM
(
MERGE INTO #Target Tar
USING #Source Src
ON Tar.[ID] = Src.[ID]
WHEN NOT MATCHED THEN
INSERT ([ID],[Name],[IsEnabled])
VALUES ([ID],[Name],1)
WHEN MATCHED AND Tar.[Name] <> Src.[Name]
THEN UPDATE
SET Tar.[IsEnabled] = 0
OUTPUT $action
,Src.[ID]
,Src.[Name]
) AS CTE (Action,[ID],[Name])
WHERE CTE.Action = 'UPDATE';
After we execute the above code and checks the
data in both #Target and #Source respectively, we get the below results
As we can see, our script has done the job neatly and
accurately. Here’s the explanation how the above powerful script works:
- The ‘MERGE INTO <Target Table> USING <Source Table or Statement> ON <condition>’ sets the basis – it tells us that we’re planning to ‘merge’ data from the #Source table to the #Target table, and we are referencing the ID field in each table as our initial condition.
- The ‘WHEN NOT MATCHED THEN’ statement is short for ‘WHEN NOT MATCHED BY TARGET THEN’, which means that for every row in the #Source table that’s not in the #Target table according to the merge condition (Tar.[ID] = Src.[ID]), insert it into the #Target table. The INSERT ([ID],[Name],[IsEnabled]) indicates the insertion fields in the #Source table, and the VALUES ([ID],[Name],1) decides the fields or fixed values to fetch from the #Source table. Please be noticed here that I fetched [ID] and [Name] fields from the #Source table, and then use 1 as a fixed value.
- The WHEN MATCHED AND Tar.[Name] <> Src.[Name] statement has an additional condition ‘AND Tar.[Name] <> Src.[Name]’. It indicates that all rows in the #Target table that match the merge condition along with the additional condition, are either updated or deleted as specified. In our case, we use the ‘THEN UPDATE…’ to set the [IsEnabled] values to 0 for those matching records in the #Target table.
- The ‘OUTPUT’ statement indicates that the matching rows in the #Source table will be returned, as indicated by the $Action value (UPDATE).
- The above returned rows are then inserted into the #Target table using the ‘INSERT INTO #Target’ statement.
Comments
Post a Comment