Home

Sunday, December 7, 2025

Incremental Load in ETL — CDC, Checksum & Timestamp Explained with Interview Questions

Incremental Load in ETL — CDC, Checksum & Timestamp Explained with Interview Questions

📌 Introduction

Running a full load for large datasets becomes slow, expensive, and inefficient. That’s why Incremental Load is used — it loads only the data that has changed since the last load.

Three widely used Incremental Load techniques:

  • Change Data Capture (CDC)
  • Checksum / Hash Comparison
  • Timestamp / Watermark

🔹 What is Incremental Load?

Incremental load means loading only new and updated records.

💡 Why Incremental Load is Important?

  • Avoids full table scan
  • Reduces time and cost
  • Improves ETL efficiency
  • Less impact on source systems

1️⃣ Change Data Capture (CDC)

CDC captures Inserts, Updates & Deletes from the transaction log and stores it in change tables.

Advantages:

  • Tracks deletes automatically
  • High performance
  • Complete change history

Limitations:

  • Must be supported on source DB
  • Extra storage required

2️⃣ Checksum / Hash Based Method

Generates a hash value from important columns. If hash changes → row updated.

Advantages:

  • Works without timestamp or CDC
  • Detects updated rows effectively

Limitations:

  • Hash calculation overhead
  • Delete detection needs key comparison

3️⃣ Timestamp / Watermark

Loads rows where ModifiedDate > LastWatermark.

Advantages:

  • Very simple and common approach
  • Best for small to medium datasets

Limitations:

  • Does not detect deletes
  • Late-arriving data issue

🔄 Quick Comparison

FeatureCDCChecksumTimestamp
Detect Deletes✔ Yes⚠ Partial❌ No
Implementation EffortMediumMediumEasy
Source DependencyCDC RequiredNoTimestamp Required

💬 Interview Questions

Q1: Why do we prefer Incremental Load?
A: Less data movement → better performance & lower cost.

Q2: Which method captures deletes?
A: CDC.

Q3: Timestamp method disadvantages?
A: No delete tracking & late-arriving data problems.

Q4: What if timestamp is not updated during changes?
A: Use Checksum or CDC.

📌 Summary

  • CDC → Best for full accuracy and delete tracking
  • Checksum → Best when timestamp not reliable
  • Timestamp → Best for simplicity

👉 Comment which method your project uses!

Sunday, August 31, 2025

Using FetchXML in SSIS to Load Filtered Records from Dynamics 365

Integrating Dynamics 365 (Dataverse) data with SQL Server or Data Warehouses is a common requirement in real-world projects. While OData can be used, FetchXML provides a much more powerful and flexible way to query Dynamics 365 data. In this article, we’ll explore how to use FetchXML in SSIS to load filtered records from Dynamics 365, along with practical examples, paging strategies, and a demo package.


🔹 What is FetchXML?

FetchXML is Microsoft’s proprietary XML-based query language used in Dynamics 365 and Dataverse. It allows developers to:

  • Query specific records and attributes

  • Apply complex filters (AND/OR conditions, date-based filters, null checks)

  • Perform joins between related entities

  • Control sorting and paging

Unlike standard SQL, FetchXML is optimized for the Dynamics 365 API and is supported directly in SSIS adapters such as KingswaySoft and CozyRoc.


🔹 FetchXML Structure

A simple FetchXML query looks like this:

<fetch version="1.0" output-format="xml-platform" mapping="logical" distinct="false"> <entity name="account"> <attribute name="name" /> <attribute name="accountnumber" /> <order attribute="createdon" descending="true" /> </entity> </fetch>

Key components:

  • <fetch> – defines query options (version, paging, count)

  • <entity> – defines the target table (entity)

  • <attribute> – specifies fields to retrieve

  • <order> – defines sorting


🔹 Real-World Filter Examples

1. Date Range & Active Records

<filter type="and"> <condition attribute="createdon" operator="on-or-after" value="2025-01-01" /> <condition attribute="createdon" operator="on-or-before" value="2025-06-30" /> <condition attribute="statecode" operator="eq" value="0" /> </filter>

✅ Loads only active accounts created in the first half of 2025.


2. Relative Date & IN Operator

<filter type="and"> <condition attribute="modifiedon" operator="last-x-days" value="90" /> <condition attribute="industrycode" operator="in"> <value>1001</value> <value>1002</value> <value>1003</value> </condition> </filter>

✅ Retrieves accounts updated in the last 90 days belonging to specific industries.


3. Null Check with OR Condition

<filter type="or"> <condition attribute="telephone1" operator="not-null" /> <condition attribute="emailaddress1" operator="not-null" /> </filter>

✅ Retrieves contacts with at least one communication method available.


🔹 Joins with Link-Entities

FetchXML allows joins between entities using <link-entity>.

Example: Accounts with Primary Contact

<link-entity name="contact" from="contactid" to="primarycontactid" alias="c"> <attribute name="firstname" /> <attribute name="lastname" /> </link-entity>

Example: Accounts with High-Value Opportunities

<link-entity name="opportunity" from="parentaccountid" to="accountid" alias="o"> <attribute name="name" alias="OppName" /> <filter type="and"> <condition attribute="estimatedvalue" operator="gt" value="10000" /> </filter> </link-entity>

🔹 Paging in FetchXML

Dynamics 365 limits results to 5000 records per request. To fetch more, you need paging.

Example: First Page

<fetch page="1" count="5000"> <entity name="account"> <attribute name="name" /> </entity> </fetch>

Example: Second Page with Cookie

<fetch page="2" count="5000" paging-cookie="{&quot;cookie&quot;:…}"> <entity name="account"> <attribute name="name" /> </entity> </fetch>

In SSIS with KingswaySoft or CozyRoc, you can simply enable Auto-Paging to handle this automatically.


🔹 SSIS Implementation

Steps to implement FetchXML in SSIS:

  1. Create Data Flow Task

  2. Add Dynamics 365 Source and configure connection (OAuth/Client Secret)

  3. Paste FetchXML query in the FetchXML tab

  4. Enable Preview to validate results

  5. Enable Auto-Paging (recommended)

  6. Add Derived Column / Lookup transformations as needed

  7. Connect to an OLE DB Destination (SQL Server or Azure)

  8. (Optional) Use Variables + Expressions for dynamic FetchXML queries


🔹 Best Practices

  • Use aliases to avoid column name collisions

  • Always test FetchXML in Preview before running packages

  • Keep FetchXML queries in variables for reusability

  • Adjust page size based on API throttling

  • Use filters to minimize unnecessary data load


🎥 YouTube Tutorial

👉 Watch the full video tutorial here: https://youtu.be/YQ_JZNx0PvU

This video includes live SSIS demo steps, sample FetchXML queries, and real-world integration best practices.


🔹 Conclusion

FetchXML is an essential skill for Dynamics 365 + SSIS developers. It allows you to retrieve exactly the records you need, apply complex filters, join multiple entities, and handle large data volumes with paging. When combined with SSIS, it enables scalable and automated ETL pipelines for Dynamics 365 data.

Friday, August 8, 2025

🚀 New Video Tutorial Released!  🎯 Mastering Upsert in Dynamics 365 via KingswaySoft (SSIS Integration Explained) 💡 Learn how to avoid duplicates and ensure clean, accurate data loads using Alternate Keys or GUIDs in your real-time D365 CE projects. 🔍 In this hands-on session, I cover: ✅ The difference between Insert vs Update (Upsert) ✅ When and how to use Alternate Keys or GUIDs ✅ Dynamics 365 setup (Key configuration, Entity setup) ✅ SSIS package walkthrough using KingswaySoft ✅ Real-time data demo using flat file input ✅ Best practices, tips, and real-world use cases 🎓 Perfect for: Data Engineers working with D365 SSIS Developers and Tech Leads Consultants handling CRM/ERP integrations Anyone preparing for interviews in data & integration roles 📺 Watch the full tutorial now: 👉 https://youtu.be/F-XnsMF2PcI 💬 I’d love to hear from you! Have you used Alternate Keys or GUIDs in your D365 projects? What challenges have you faced? #SSIS #KingswaySoft #Dynamics365 #DataIntegration #CRM #Upsert #Dataverse #ETL #MicrosoftDynamics #DataEngineering #YouTubeLearning #SQLServer #PowerPlatform #CloudBIAcademy