lakehouse_parquet_read_as_spark

Public callable

Read a Parquet file from a Fabric lakehouse Files path.

This reads from the lakehouse Files/ area using Spark. If Spark cannot read the original Parquet file because of timestamp precision issues, the helper tries a fallback _tsus path. If that fallback file does not exist, it converts the single local Parquet file from nanosecond to microsecond timestamps and retries the fallback path.
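
The fallback path is derived by suffixing the Parquet file's immediate parent folder with _tsus, as the source below does:

>>> parts = "raw/orders/orders_2026.parquet".split("/")
>>> "/".join(parts[:-2] + [parts[-2] + "_tsus"] + [parts[-1]])
'raw/orders_tsus/orders_2026.parquet'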

Parameters:

lh : Housepath
    Lakehouse path object returned by get_path. Required.
relative_path : str
    Path to the Parquet file under the lakehouse Files/ folder, without the leading "Files/". For example: "raw/orders/orders_2026.parquet". Required.
verbose : bool, default True
    Whether to print read and fallback progress.
spark_session : object, default None
    Spark session to use. If omitted, the helper uses the notebook global spark.

Returns:

pyspark.sql.DataFrame
    Spark DataFrame loaded from the original or converted Parquet path.

Raises:

ValueError
    If relative_path is not a nested file path.
RuntimeError
    If neither the original path nor the converted fallback path can be read successfully.

Examples:

>>> lh_source = get_path("Sandbox", "Source", config=CONFIG)
>>> df = lakehouse_parquet_read_as_spark(
...     lh_source,
...     "raw/orders/orders_2026.parquet",
... )
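
An explicit Spark session can be passed, and progress output silenced:

>>> df = lakehouse_parquet_read_as_spark(
...     lh_source,
...     "raw/orders/orders_2026.parquet",
...     verbose=False,
...     spark_session=spark,
... )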
Notes:

Assumes Fabric notebook runtime filesystem conventions for local fallback conversion paths (/lakehouse/default/Files/...).
Source code in src/fabricops_kit/fabric_io.py
def lakehouse_parquet_read_as_spark(lh, relative_path, verbose=True, spark_session=None):
    """Read a Parquet file from a Fabric lakehouse Files path.

    This reads from the lakehouse `Files/` area using Spark. If Spark cannot
    read the original Parquet file because of timestamp precision issues, the
    helper tries a fallback `_tsus` path. If that fallback file does not exist,
    it converts the single local Parquet file from nanosecond to microsecond
    timestamps and retries the fallback path.

    Parameters
    ----------
    lh : Housepath
        Lakehouse path object returned by `get_path`.
    relative_path : str
        Path to the Parquet file under the lakehouse `Files/` folder, without
        the leading `"Files/"`. For example:
        `"raw/orders/orders_2026.parquet"`.
    verbose : bool, default True
        Whether to print read and fallback progress.
    spark_session : object, optional
        Spark session to use. If omitted, the helper uses the notebook global
        `spark`.

    Returns
    -------
    pyspark.sql.DataFrame
        Spark DataFrame loaded from the original or converted Parquet path.

    Raises
    ------
    ValueError
        If `relative_path` is not a nested file path.
    RuntimeError
        If neither the original path nor the converted fallback path can be
        read successfully.

    Examples
    --------
    >>> lh_source = get_path("Sandbox", "Source", config=CONFIG)
    >>> df = lakehouse_parquet_read_as_spark(
    ...     lh_source,
    ...     "raw/orders/orders_2026.parquet",
    ... )

    Notes
    -----
    Assumes Fabric notebook runtime filesystem conventions for local fallback
    conversion paths (``/lakehouse/default/Files/...``).
    """
    if not getattr(lh, "root", None):
        raise ValueError("lh.root is required.")
    if not relative_path:
        raise ValueError("relative_path is required.")

    spark_obj = _get_spark(spark_session)

    relative_path = relative_path.lstrip("/")
    if relative_path.startswith("Files/"):
        relative_path = relative_path[len("Files/") :]

    orig_spark_path = f"Files/{relative_path}"
    lakehouse_prefix = "/lakehouse/default/"
    parts = relative_path.split("/")

    if len(parts) < 2:
        raise ValueError("relative_path should look like folder/file.parquet or folder/subfolder/file.parquet.")

    tsus_dir = parts[:-2] + [parts[-2] + "_tsus"]
    tsus_relative_path = "/".join(tsus_dir + [parts[-1]])
    tsus_spark_path = f"Files/{tsus_relative_path}"

    orig_local_path = f"{lakehouse_prefix}{orig_spark_path}"
    tsus_local_path = f"{lakehouse_prefix}{tsus_spark_path}"

    if verbose:
        print(f"Try Spark read: {orig_spark_path}")

    try:
        df = spark_obj.read.parquet(orig_spark_path)
        _ = df.limit(1).collect()
        if verbose:
            print("SUCCESS: Spark read original path.")
        return df
    except Exception as exc:
        if verbose:
            print(f"Original Parquet read failed. Will try fallback path. Exception: {exc}")

    for try_convert in range(2):
        tag = " after single-file convert" if try_convert else ""

        if verbose:
            print(f"Try Spark read: {tsus_spark_path}{tag}")

        try:
            df = spark_obj.read.parquet(tsus_spark_path)
            _ = df.limit(1).collect()
            if verbose:
                print("SUCCESS: Spark read _tsus path.")
            return df

        except Exception as exc:
            msg = str(exc)
            path_not_found = (
                "[PATH_NOT_FOUND]" in msg
                or "Path does not exist" in msg
                or "No such file or directory" in msg
            )

            if try_convert == 0 and path_not_found:
                if verbose:
                    print("PATH NOT FOUND for _tsus parquet. Will convert one file and retry.")

                try:
                    # Ensure the fallback folder exists under Files/ before converting
                    # (mssparkutils is the Fabric notebook runtime utility, available
                    # as a builtin global in Fabric notebooks).
                    mssparkutils.fs.mkdirs("Files/" + "/".join(tsus_dir))
                except Exception:
                    pass

                _convert_single_parquet_ns_to_us(
                    local_in_path=orig_local_path,
                    local_out_path=tsus_local_path,
                    verbose=verbose,
                )
            else:
                if verbose:
                    print(f"FAILED: Spark read _tsus path. Exception: {exc}")
                break

    raise RuntimeError("Failed to read from both original and _tsus Parquet paths.")
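
The private helpers _get_spark and _convert_single_parquet_ns_to_us are referenced above but not shown. The sketch below illustrates one plausible shape for them, assuming an active-session lookup for the former and a pyarrow-based single-file rewrite for the latter; the actual implementations in fabric_io.py may differ.

import os

from pyarrow import parquet as pq
from pyspark.sql import SparkSession


def _get_spark(spark_session=None):
    # Hypothetical sketch: prefer an explicit session, otherwise fall back to
    # the active session that the Fabric notebook exposes as the global `spark`.
    if spark_session is not None:
        return spark_session
    active = SparkSession.getActiveSession()
    if active is None:
        raise RuntimeError("No active Spark session; pass spark_session explicitly.")
    return active


def _convert_single_parquet_ns_to_us(local_in_path, local_out_path, verbose=True):
    # Hypothetical sketch: rewrite one Parquet file through the local
    # /lakehouse/default/ mount, coercing timestamp[ns] columns to timestamp[us].
    os.makedirs(os.path.dirname(local_out_path), exist_ok=True)
    table = pq.read_table(local_in_path)
    pq.write_table(
        table,
        local_out_path,
        coerce_timestamps="us",  # downcast nanosecond timestamps to microseconds
        allow_truncated_timestamps=True,  # drop sub-microsecond precision without error
    )
    if verbose:
        print(f"Converted {local_in_path} -> {local_out_path}")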