
warehouse_read

Public callable

Read a table from a Microsoft Fabric warehouse.

This uses Fabric Spark's synapsesql connector to read from a warehouse configured in the framework CONFIG mapping. In Source → Unified → Product workflows, this is commonly used when curated inputs are stored in Fabric Warehouse instead of Lakehouse tables.

Parameters:

- env (str, required): Environment name in the config mapping, for example "Sandbox" or "DE".
- target (str, required): Warehouse target name under the selected environment, for example "Warehouse" or "wh_Bronze".
- schema (str, required): Warehouse schema name, for example "dbo".
- table (str, required): Warehouse table name.
- config (dict, default None): Config mapping from the config notebook. Expected shape: config[environment][target] = Housepath(...).
- spark_session (object, default None): Spark session to use. If omitted, the helper uses the notebook global spark.
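The expected config shape can be illustrated with a minimal sketch. `Housepath` here is a stand-in (its real constructor may differ); the field names `workspace_id`, `house_id`, and `house_name` are the ones the function actually reads from the config entry.

```python
from collections import namedtuple

# Hypothetical stand-in for the framework's Housepath object; only the field
# names used by warehouse_read are assumed here.
Housepath = namedtuple("Housepath", ["workspace_id", "house_id", "house_name"])

# Shape: config[environment][target] = Housepath(...)
CONFIG = {
    "Sandbox": {
        "wh_Bronze": Housepath(
            workspace_id="00000000-0000-0000-0000-000000000000",
            house_id="11111111-1111-1111-1111-111111111111",
            house_name="wh_Bronze",
        ),
    },
}

# warehouse_read resolves the entry via env and target keys:
path = CONFIG["Sandbox"]["wh_Bronze"]
```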

Returns:

- pyspark.sql.DataFrame: Spark DataFrame loaded from the Fabric warehouse table.

Raises:

- RuntimeError: If the Microsoft Fabric Spark connector is unavailable.
- ValueError: If the selected environment or target is missing from the config.
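The ValueError path comes from the config lookup. A minimal sketch of that validation (the real `get_path` helper may differ; `lookup_target` is a hypothetical name):

```python
def lookup_target(env, target, config):
    """Sketch of the config validation behind the ValueError above:
    reject an unknown environment or an unknown target under it."""
    if config is None or env not in config:
        raise ValueError(f"Unknown environment: {env!r}")
    if target not in config[env]:
        raise ValueError(f"Unknown target {target!r} under environment {env!r}")
    return config[env][target]
```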

Examples:

>>> df = warehouse_read(
...     env="EDLH",
...     target="wh_Bronze",
...     schema="dbo",
...     table="Customer",
...     config=CONFIG,
... )
Source code in src/fabricops_kit/fabric_io.py
def warehouse_read(env, target, schema, table, config=None, spark_session=None):
    """Read a table from a Microsoft Fabric warehouse.

    This uses Fabric Spark's `synapsesql` connector to read from a warehouse
    configured in the framework `CONFIG` mapping. In Source → Unified →
    Product workflows, this is commonly used when curated inputs are stored in
    Fabric Warehouse instead of Lakehouse tables.

    Parameters
    ----------
    env : str
        Environment name in the config mapping, for example `"Sandbox"` or `"DE"`.
    target : str
        Warehouse target name under the selected environment, for example
        `"Warehouse"` or `"wh_Bronze"`.
    schema : str
        Warehouse schema name, for example `"dbo"`.
    table : str
        Warehouse table name.
    config : dict, optional
        Config mapping from the config notebook. Expected shape:
        `config[environment][target] = Housepath(...)`.
    spark_session : object, optional
        Spark session to use. If omitted, the helper uses the notebook global
        `spark`.

    Returns
    -------
    pyspark.sql.DataFrame
        Spark DataFrame loaded from the Fabric warehouse table.

    Raises
    ------
    RuntimeError
        If the Microsoft Fabric Spark connector is unavailable.
    ValueError
        If the selected environment or target is missing from the config.

    Examples
    --------
    >>> df = warehouse_read(
    ...     env="EDLH",
    ...     target="wh_Bronze",
    ...     schema="dbo",
    ...     table="Customer",
    ...     config=CONFIG,
    ... )
    """
    spark_obj = _get_spark(spark_session)
    p = get_path(env, target, config=config)

    try:
        import com.microsoft.spark.fabric
        from com.microsoft.spark.fabric.Constants import Constants
    except Exception as exc:
        raise RuntimeError(
            "This function must run inside Microsoft Fabric Spark with "
            "com.microsoft.spark.fabric available."
        ) from exc

    return (
        spark_obj.read.option(Constants.WorkspaceId, p.workspace_id)
        .option(Constants.DatawarehouseId, p.house_id)
        .synapsesql(f"{p.house_name}.{schema}.{table}")
    )