lakehouse_csv_read

Public callable

Read a CSV file from a Fabric lakehouse Files path.

This reads from the lakehouse Files/ area using the ABFSS root stored in a Housepath. In the Source step, use it for raw file ingestion before standardisation or conversion to Delta tables.
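The full ABFSS read path is produced by joining the Housepath root and the relative path with exactly one `/` between them. A minimal sketch of that joining logic, using a hypothetical workspace and lakehouse name:

```python
# Hypothetical ABFSS root, as stored on a Housepath object.
root = "abfss://Sandbox@onelake.dfs.fabric.microsoft.com/Source.Lakehouse"
relative_path = "/Files/raw/orders.csv"

# Strip the seam slashes so a leading "/" on relative_path (or a trailing
# "/" on root) does not produce a double slash in the final path.
path = f"{root.rstrip('/')}/{relative_path.lstrip('/')}"
print(path)
# → abfss://Sandbox@onelake.dfs.fabric.microsoft.com/Source.Lakehouse/Files/raw/orders.csv
```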

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `lh` | `Housepath` | Lakehouse path object returned by `get_path`. | *required* |
| `relative_path` | `str` | Path to the CSV file or folder under the lakehouse root, for example `"Files/raw/orders.csv"` or `"Files/raw/orders/"`. | *required* |
| `spark_session` | `object` | Spark session to use. If omitted, the helper uses the notebook global `spark`. | `None` |
| `header` | `bool` | Whether the first row of the CSV file contains column names. | `True` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | Spark DataFrame loaded from the CSV path. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `lh.root` or `relative_path` is missing. |
| `RuntimeError` | If no Spark session is available. |

Examples:

>>> lh_source = get_path("Sandbox", "Source", config=CONFIG)
>>> df = lakehouse_csv_read(lh_source, "Files/raw/orders.csv")
Source code in src/fabricops_kit/fabric_io.py
def lakehouse_csv_read(lh, relative_path, spark_session=None, header=True):
    """Read a CSV file from a Fabric lakehouse Files path.

    This reads from the lakehouse `Files/` area using the ABFSS root stored in
    a `Housepath`. In the Source step, use it for raw file ingestion before
    standardisation or conversion to Delta tables.

    Parameters
    ----------
    lh : Housepath
        Lakehouse path object returned by `get_path`.
    relative_path : str
        Path to the CSV file or folder under the lakehouse root, for example
        `"Files/raw/orders.csv"` or `"Files/raw/orders/"`.
    spark_session : object, optional
        Spark session to use. If omitted, the helper uses the notebook global
        `spark`.
    header : bool, default True
        Whether the first row of the CSV file contains column names.

    Returns
    -------
    pyspark.sql.DataFrame
        Spark DataFrame loaded from the CSV path.

    Raises
    ------
    ValueError
        If `lh.root` or `relative_path` is missing.
    RuntimeError
        If no Spark session is available.

    Examples
    --------
    >>> lh_source = get_path("Sandbox", "Source", config=CONFIG)
    >>> df = lakehouse_csv_read(lh_source, "Files/raw/orders.csv")
    """
    if not getattr(lh, "root", None):
        raise ValueError("lh.root is required.")
    if not relative_path:
        raise ValueError("relative_path is required.")

    spark_obj = _get_spark(spark_session)
    path = f"{lh.root.rstrip('/')}/{relative_path.lstrip('/')}"
    return spark_obj.read.option("header", header).csv(path)