Skip to main content

Change Data Capture Engine and Kusto Query

The Change Data Capture (CDC) Engine provides scalable detection and processing of data changes that trigger lifecycle events in EmpowerID's identity lifecycle automation. Rather than manually coding change detection logic for each workflow event, the CDC Engine leverages SQL Server's native Change Data Capture capabilities combined with Azure Log Analytics and Kusto Query Language to automatically track, analyze, and respond to identity attribute changes.

This architecture enables EmpowerID to detect complex lifecycle events—such as mover events triggered by multiple attribute changes within specific timeframes—without requiring custom code for each scenario.

The Challenge of Capturing Data Changes

EmpowerID's event engine enables users to create and configure various flow events for joiners, movers, new mailbox discoveries, and security breach events without coding. After defining these events, the engine evaluates them against established policies to determine appropriate workflows to execute in response. During these workflows, data changes occur throughout the Identity Warehouse, and capturing exactly what changed presents significant challenges:

Complexity of Event Detection

  • Some events cannot be mapped through simple rule-based processes
  • A "mover" event might occur when two out of three attributes (manager, department, or job title) change within 24 hours
  • Different organizations define lifecycle events differently based on their business requirements
  • Writing custom code for each event type is neither scalable nor maintainable

Scale and Performance Requirements

  • EmpowerID manages thousands of identity changes across large organizations
  • Real-time or near-real-time detection is required for security and compliance
  • Change tracking must not impact transactional performance of the Identity Warehouse
  • Historical change data must be retained for audit and compliance purposes

The CDC Engine Solution

EmpowerID addresses these challenges by implementing a scalable Change Data Capture solution that:

  1. Applies CDC selectively to specific tables and attributes in the Identity Warehouse
  2. Generates comprehensive changelogs of all modifications automatically
  3. Transmits change data to Azure Log Analytics for advanced querying
  4. Leverages Kusto Query Language for flexible event detection logic
  5. Triggers workflows when specific change patterns are identified

This approach eliminates the need for manual coding of change detection logic while providing the flexibility to define complex event criteria through queries rather than code.

Architecture Components

The CDC Engine consists of several integrated Azure and SQL Server components working together to detect and process identity changes:

EmpowerID Identity Warehouse

The Identity Warehouse is hosted in Azure SQL Database and serves as the central repository for identity data. When CDC is enabled on specific tables, SQL Server automatically creates:

  • CDC History Tables - Named with "_CT" (Change Tracking) suffix, these tables store all insert, update, and delete operations with metadata columns tracking change type, time, and sequence
  • Deleted Records Tables - Store records of deleted entries for audit purposes
  • System Functions - Provide queryable interfaces to retrieve changes within specific time ranges or sequence numbers

CDC operates at the database level and requires minimal configuration once enabled. The feature is available by default in Azure SQL Database.

PowerShell Scripts

PowerShell scripts deployed in Azure Kubernetes Cluster as containers perform the data transformation and transfer functions:

Master Script

  • Runs an infinite loop with 5-minute intervals
  • Triggers the change streaming script on schedule
  • Manages container lifecycle and error recovery

Change Streaming Script (change_streaming_script.ps1)

  • Loads required PowerShell modules (SQLServer, Az.Accounts, Az.OperationalInsights)
  • Retrieves configuration from Container Secrets (credentials, workspace IDs)
  • Verifies CDC is enabled at database level
  • Reads configuration from config.json to identify tables and columns to monitor
  • Executes the ChangeStreaming stored procedure to retrieve and transform CDC data
  • Pushes transformed JSON data to Azure Log Analytics Workspace
  • Executes Kusto queries against Log Analytics data
  • Inserts qualifying results into BusinessRequestFlowEventInbox table

Configuration File (config.json)

  • Specifies tables and columns for CDC monitoring
  • Allows developers to extend CDC coverage by adding entries
  • Provides flexibility to customize change detection scope without code changes

Azure Log Analytics Workspace

Log Analytics serves as the intermediate storage and querying layer for change data:

  • Data Ingestion - Receives JSON-formatted change data from PowerShell scripts every 5 minutes
  • Automatic Schema Creation - Converts JSON data into queryable tables automatically
  • Retention Policies - Maintains change history according to configured retention periods
  • Query Performance - Optimized for time-series and log data queries

Kusto Query Language

Kusto Query Language (KQL) provides powerful data analysis capabilities similar to SQL but optimized for log and time-series data:

  • Complex Event Detection - Identify patterns across multiple attribute changes and time windows
  • Aggregation and Filtering - Group changes by person, attribute type, or time period
  • Real-time Analysis - Execute queries against streaming change data
  • Flexible Criteria - Define event detection logic through queries rather than code

Example Kusto queries can identify scenarios such as:

  • Department changes within 24 hours of manager changes (mover event)
  • Multiple failed login attempts followed by successful authentication (security event)
  • New accounts created without associated Person objects (joiner event requiring resolution)

BusinessRequestFlowEventInbox Table

The BusinessRequestFlowEventInbox table serves as the integration point between change detection and workflow execution:

  • Query results from Kusto are inserted as records with specific event types
  • Each record includes the PersonID affected and event-specific data
  • The Initiator is set to 'EmpowerIDSystem' to indicate automated detection
  • EmpowerID jobs monitor this table and trigger configured workflows based on event type
  • Workflows execute business logic, notifications, and provisioning actions

CDC Engine Architecture Flow

Detailed Process Flow

  1. Data Modification by EmpowerID Workflows

    • EmpowerID workflows or flow events execute Insert, Delete, or Update (DML) statements against the Identity Warehouse
    • These operations modify tables configured for Change Data Capture
  2. Change Data Capture in SQL Database

    • SQL Server CDC automatically records changes in dedicated history tables
    • Metadata columns track operation type (insert/update/delete), timestamp, and sequence number
    • Changes are recorded transactionally with zero impact on application performance
  3. PowerShell Script Execution

    • Master script triggers change streaming script every 5 minutes
    • Script authenticates to Azure using credentials from Container Secrets
    • Verifies CDC is enabled at database level before proceeding
  4. Configuration Reading

    • Script reads config.json to determine which tables and columns to process
    • Configuration allows selective CDC monitoring without code changes
  5. Stored Procedure Execution

    • ChangeStreaming stored procedure is called with table name and transfer mode parameters
    • Procedure dynamically generates queries to retrieve changes from CDC tables
    • Data is transformed into user-readable JSON format including before/after values
  6. Data Transfer to Log Analytics

    • JSON change data is pushed to Azure Log Analytics Workspace via REST API
    • Log Analytics automatically converts JSON to tabular format for querying
    • Sequence numbers are tracked to prevent data loss or duplication during interruptions
  7. Kusto Query Execution

    • Predefined Kusto queries execute against Log Analytics data
    • Queries identify changes matching specific lifecycle event criteria
    • Results include PersonID and relevant change details
  8. Event Inbox Population

    • Query results are inserted into BusinessRequestFlowEventInbox table
    • Records are typed according to event category (e.g., 'Person Mover')
    • Initiator is set to 'EmpowerIDSystem' to indicate automated detection
  9. Workflow Triggering

    • EmpowerID jobs continuously monitor the BusinessRequestFlowEventInbox table
    • When new records are detected, corresponding workflows are triggered
    • Workflows execute business logic, approvals, notifications, and provisioning actions

Data Transfer Modes

The CDC Engine supports three transfer modes to manage data flow from SQL Server to Log Analytics, balancing performance, reliability, and flexibility:

Configuration: Set runmode parameter to 1

Behavior:

  • System identifies the last sequence number successfully transferred to Log Analytics
  • Only processes changes generated since that sequence number
  • Prevents redundant data processing and duplicate transfers
  • Automatically resumes from interruption point if script or container restarts

Use Case: Normal production operation with continuous change processing

Transfer Mode 2: Viewing

Configuration: Set runmode parameter to 2

Behavior:

  • Retrieves and displays change data without transferring to Log Analytics
  • Used for inspection, testing, and validation purposes
  • Does not update sequence tracking or modify any data

Use Case: Development, testing, and troubleshooting of CDC queries

Transfer Mode 3: Custom Range

Configuration: Set runmode parameter to 3

Behavior:

  • Allows manual specification of starting and ending sequence numbers
  • Transfers a specific range of changes to Log Analytics
  • Useful for backfilling data or reprocessing specific time periods

Use Case: Historical data transfer, error recovery, or selective reprocessing

The sequence-based tracking ensures no data is lost during interruptions and prevents duplicate processing across container restarts.

Example: Mover Event Detection

The following example illustrates how the CDC Engine detects and processes a mover event:

Scenario

A Person's Department attribute changes in the Identity Warehouse, indicating a potential organizational transfer:

  1. Attribute Change Occurs

    • HR system updates employee department from "Sales" to "Marketing"
    • EmpowerID Attribute Flow synchronizes change to Person table
    • SQL CDC records the change in Person_CT table
  2. Change Detection

    • PowerShell script executes ChangeStreaming procedure every 5 minutes
    • Procedure queries Person_CT table for recent changes
    • Transforms CDC data showing Department change from "Sales" to "Marketing"
  3. Log Analytics Processing

    • Change data pushed to Log Analytics as JSON
    • Kusto query executes to identify mover patterns:
    ChangeData_CL
    | where TableName == "Person"
    | where ColumnName == "Department"
    | where PreviousValue != NewValue
    | where PreviousValue != "" // Must have had department before
    | where TimeGenerated > ago(5m)
    | project PersonID, PreviousValue, NewValue, TimeGenerated
  4. Event Creation

    • Query result inserted into BusinessRequestFlowEventInbox
    • EventType set to "Person Mover"
    • Initiator set to "EmpowerIDSystem"
    • EventJsonData includes department change details
  5. Workflow Execution

    • Mover workflow triggered for the Person
    • Business Role and Location Recompiler recalculates role assignments
    • Provisioning policies adjust access based on new department
    • Manager and IT team receive notifications
    • Access to previous department resources reviewed for revocation

Advanced Mover Detection

More complex mover scenarios can be detected through sophisticated Kusto queries:

Multiple Attribute Changes in Time Window:

ChangeData_CL
| where TableName == "Person"
| where ColumnName in ("Department", "Manager", "JobTitle")
| where TimeGenerated > ago(24h)
| summarize ChangeCount = count(), Changes = make_set(ColumnName) by PersonID
| where ChangeCount >= 2
| project PersonID, ChangeCount, Changes

This query identifies Persons with two or more significant attribute changes within 24 hours, triggering comprehensive mover workflows rather than simple attribute updates.

Implementation Considerations

Performance Optimization

CDC Impact on Database:

  • SQL Server CDC operates asynchronously with minimal transaction impact
  • CDC history tables consume additional storage proportional to change volume
  • Consider retention policies and cleanup jobs for CDC tables in high-change environments

PowerShell Script Execution:

  • 5-minute execution interval balances timeliness with resource utilization
  • Can be adjusted based on organizational requirements for change detection latency
  • Container resource allocation should account for peak change volumes

Log Analytics Costs:

  • Data ingestion and retention costs scale with change volume
  • Consider data retention policies aligned with audit requirements
  • Use Kusto query optimization to minimize compute costs

Extensibility

Adding Tables to CDC:

  • Update config.json to include additional tables and columns
  • Enable CDC on the table at database level
  • Customize ChangeStreaming stored procedure if special transformations needed
  • Define new Kusto queries for event detection patterns

Custom Event Types:

  • Define new event types beyond joiner/mover/leaver patterns
  • Create corresponding Kusto queries for detection logic
  • Configure workflows to respond to new event types
  • Update BusinessRequestFlowEventInbox processing to handle new types

Monitoring and Troubleshooting

Verification Queries:

Check recent mover events detected by CDC:

SELECT * FROM dbo.BusinessRequestFlowEventInbox
WHERE EventJsonData LIKE '%TimeGenerated%'
ORDER BY CreatedDate DESC

Verify CDC is enabled on database:

SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'EmpowerID'

Check CDC tables for specific table:

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Person(@from_lsn, @to_lsn, 'all')

Common Issues:

  • CDC not enabled: Verify CDC is activated at database level before table level
  • Script failures: Check Container Secret values for authentication credentials
  • Missing changes: Verify config.json includes desired tables and columns
  • Duplicate events: Confirm sequence tracking is functioning correctly in Automatic mode

Integration with Identity Lifecycle

The CDC Engine integrates with identity lifecycle management at multiple points:

Joiner Processes

  • Detection of new Person objects created via Account Inbox
  • Identification of initial role assignments requiring approval
  • Monitoring of onboarding workflow completion status

Mover Processes

  • Detection of attribute changes indicating role or location changes
  • Identification of manager changes requiring workflow initiation
  • Monitoring of access changes requiring review or approval

Leaver Processes

  • Detection of termination date population in Person records
  • Identification of account disabling events
  • Monitoring of cleanup workflow execution status

Compliance and Audit

  • Complete change history for identity attributes
  • Audit trail of automated workflow triggers
  • Compliance evidence for policy-based provisioning

For additional context on components referenced in the CDC Engine architecture:

For workflow configuration using detected events: