_convert_single_parquet_ns_to_us

Internal helper
This page documents an internal implementation helper, not a primary public API.

Convert one Parquet file from nanosecond to microsecond timestamps.

Spark can fail to read some Parquet files that contain nanosecond timestamp precision. This helper reads one local Parquet file with PyArrow, rewrites it with microsecond timestamp precision, and saves it to a fallback path.

This is an internal helper used by lakehouse_parquet_read_as_spark.

Parameters:

Name            Type  Description                                        Default
local_in_path   str   Local input path to the original Parquet file.     required
local_out_path  str   Local output path for the converted Parquet file.  required
verbose         bool  Whether to print conversion progress.              True

Returns:

Type  Description
None  The converted Parquet file is written to local_out_path.

Examples:

>>> _convert_single_parquet_ns_to_us(
...     "/lakehouse/default/Files/raw/orders.parquet",
...     "/lakehouse/default/Files/raw_tsus/orders.parquet",
... )
Source code in src/fabricops_kit/fabric_io.py
def _convert_single_parquet_ns_to_us(local_in_path, local_out_path, verbose=True):
    """Convert one Parquet file from nanosecond to microsecond timestamps.

    Spark can fail to read some Parquet files that contain nanosecond timestamp
    precision. This helper reads one local Parquet file with PyArrow, rewrites
    it with microsecond timestamp precision, and saves it to a fallback path.

    This is an internal helper used by `lakehouse_parquet_read_as_spark`.

    Parameters
    ----------
    local_in_path : str
        Local input path to the original Parquet file.
    local_out_path : str
        Local output path for the converted Parquet file.
    verbose : bool, default True
        Whether to print conversion progress.

    Returns
    -------
    None
        The converted Parquet file is written to `local_out_path`.

    Examples
    --------
    >>> _convert_single_parquet_ns_to_us(
    ...     "/lakehouse/default/Files/raw/orders.parquet",
    ...     "/lakehouse/default/Files/raw_tsus/orders.parquet",
    ... )
    """
    import pandas as pd  # needed for pd.read_parquet below
    import pyarrow as pa
    import pyarrow.parquet as pq

    try:
        if verbose:
            print(f"Reading with pyarrow: {local_in_path}")
            print(f"Writing us timestamps to: {local_out_path}")

        pdf = pd.read_parquet(local_in_path, engine="pyarrow")
        table = pa.Table.from_pandas(pdf, preserve_index=False)

        pq.write_table(
            table,
            local_out_path,
            coerce_timestamps="us",
            allow_truncated_timestamps=True,
        )

        if verbose:
            print(f"done: {local_out_path}")

    except Exception as exc:
        print(f"FAILED converting ns to us for file {local_in_path}: {exc}")