check_schema_drift

Public callable

Compare a current dataframe schema against a baseline schema snapshot.

Parameters:

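| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | Any | Dataframe whose current schema is captured via build_schema_snapshot. | required |
| dataset_name | str | Dataset name recorded in the snapshot and echoed in the result. | required |
| table_name | str | Table name recorded in the snapshot and echoed in the result. | required |
| baseline_snapshot | dict or None | Baseline schema snapshot to compare against. When None, no comparison runs and the result has status "no_baseline". | None |
| policy | dict or None | Drift policy passed to compare_schema_snapshots. A None or empty value falls back to default_schema_drift_policy(). | None |
| engine | str | Engine name forwarded to build_schema_snapshot. | 'spark' |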

Returns:

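| Type | Description |
| --- | --- |
| dict | Result with the keys dataset_name, table_name, status, can_continue, current_snapshot, baseline_snapshot, comparison, and message. Without a baseline, status is "no_baseline", can_continue is True, and comparison is None. Otherwise status and can_continue are taken from the comparison, defaulting to "passed" and True. |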
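The returned dict is meant to be branched on by the caller. The following is a minimal sketch of one way to do that, not part of fabricops_kit: `next_action` is a hypothetical helper, the sample dicts are hand-written to mimic the documented keys, and the status value "failed" is an assumption (the source only shows "no_baseline" and the default "passed").

```python
from typing import Any


def next_action(result: dict[str, Any]) -> str:
    """Map a drift-check result dict to a pipeline action (hypothetical helper)."""
    if result["status"] == "no_baseline":
        # First observation: there was nothing to compare against.
        return "store_baseline"
    if not result["can_continue"]:
        # The comparison reported drift that the policy treats as blocking.
        return "halt"
    return "proceed"


# Hand-written results that mimic the documented shape.
# "failed" is an assumed status value, used only for illustration.
first_run = {"status": "no_baseline", "can_continue": True, "comparison": None}
blocked = {"status": "failed", "can_continue": False, "comparison": {}}

print(next_action(first_run))
print(next_action(blocked))
```

Checking can_continue rather than matching on specific status strings keeps the caller independent of whatever status vocabulary compare_schema_snapshots uses.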

Source code in src/fabricops_kit/drift.py
def check_schema_drift(df, dataset_name: str, table_name: str, baseline_snapshot: dict | None = None, policy: dict | None = None, engine: str = "spark") -> dict:
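    """Compare a current dataframe schema against a baseline schema snapshot.

    Parameters
    ----------
    df : Any
        Dataframe whose current schema is captured via build_schema_snapshot.
    dataset_name : str
        Dataset name recorded in the snapshot and echoed in the result.
    table_name : str
        Table name recorded in the snapshot and echoed in the result.
    baseline_snapshot : dict or None, optional
        Baseline schema snapshot. When None, no comparison is performed.
    policy : dict or None, optional
        Drift policy; a None or empty value uses default_schema_drift_policy().
    engine : str, optional
        Engine name forwarded to build_schema_snapshot. Defaults to "spark".

    Returns
    -------
    dict
        Keys: dataset_name, table_name, status, can_continue, current_snapshot,
        baseline_snapshot, comparison, message.
    """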
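    # Always snapshot the current schema, whether or not a baseline exists.
    current_snapshot = build_schema_snapshot(df, dataset_name=dataset_name, table_name=table_name, engine=engine)
    # Without a baseline there is nothing to compare: report the first observation.
    if baseline_snapshot is None: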
        return {
            "dataset_name": dataset_name,
            "table_name": table_name,
            "status": "no_baseline",
            "can_continue": True,
            "current_snapshot": current_snapshot,
            "baseline_snapshot": None,
            "comparison": None,
            "message": "No baseline schema snapshot found; current snapshot captured as first observation.",
        }

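    # A None or empty policy falls back to the default drift policy.
    comparison = compare_schema_snapshots(baseline_snapshot, current_snapshot, policy=policy or default_schema_drift_policy())
    # A comparison without a "status" key is treated as passed.
    status = str(comparison.get("status", "passed"))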
    return {
        "dataset_name": dataset_name,
        "table_name": table_name,
        "status": status,
        "can_continue": bool(comparison.get("can_continue", True)),
        "current_snapshot": current_snapshot,
        "baseline_snapshot": baseline_snapshot,
        "comparison": comparison,
        "message": "Schema drift check completed.",
    }
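
As listed above, the function returns the current snapshot on the "no_baseline" path but does not persist it, so storing the first observation appears to be left to the caller. The sketch below shows one possible wrapper under that assumption. `check_and_bootstrap`, `SnapshotStore`, and `fake_check` are hypothetical names introduced for illustration; `fake_check` is a stand-in for check_schema_drift so the example runs without Spark, and the in-memory dict stands in for whatever storage a real pipeline would use.

```python
from typing import Any, Callable

# Hypothetical in-memory store keyed by (dataset_name, table_name).
SnapshotStore = dict[tuple[str, str], dict[str, Any]]


def check_and_bootstrap(
    check: Callable[..., dict],
    df: Any,
    dataset_name: str,
    table_name: str,
    store: SnapshotStore,
) -> dict:
    """Run a drift check and save the first snapshot as the baseline."""
    key = (dataset_name, table_name)
    # store.get returns None on the first run, which triggers "no_baseline".
    result = check(df, dataset_name, table_name, baseline_snapshot=store.get(key))
    if result["status"] == "no_baseline":
        store[key] = result["current_snapshot"]
    return result


def fake_check(df, dataset_name, table_name, baseline_snapshot=None, policy=None, engine="spark"):
    # Stand-in for check_schema_drift: the "schema" is just a list of column names.
    current = {"columns": list(df)}
    status = "no_baseline" if baseline_snapshot is None else "passed"
    return {
        "status": status,
        "can_continue": True,
        "current_snapshot": current,
        "baseline_snapshot": baseline_snapshot,
    }


store: SnapshotStore = {}
first = check_and_bootstrap(fake_check, ["id", "name"], "sales", "orders", store)
second = check_and_bootstrap(fake_check, ["id", "name"], "sales", "orders", store)
print(first["status"], second["status"])
```

Passing the check function in as an argument keeps the wrapper testable; in real use the first argument would presumably be check_schema_drift itself.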