Introduction

Microsoft Fabric’s lakehouse architecture provides a powerful foundation for storing and processing data at scale. One of its key capabilities is accessing files stored in the lakehouse directly from Fabric notebooks using both pandas and PySpark. This lets data professionals work seamlessly with various file formats while tapping into Spark’s distributed computing power when needed.

In this guide, we’ll walk through the process of accessing Excel files stored in a Fabric lakehouse using a notebook, demonstrating how to read the data with pandas and convert it to a Spark DataFrame for further processing.

Prerequisites

  • Access to a Microsoft Fabric workspace
  • A lakehouse with files stored in the Files section
  • Basic knowledge of Python and pandas
  • Understanding of Spark DataFrames

For the latest information, refer to the official Microsoft Fabric documentation.

Step 1: Navigate to Your Lakehouse

[Screenshot: Fabric workspace showing the lakehouse in the workspace items list]

Start by navigating to the Microsoft Fabric workspace where your lakehouse is listed among the other workspace items. In this example, the “LH_REDSHIFT” lakehouse is highlighted in the workspace.

Step 2: Access Your Files in the Lakehouse

[Screenshot: Lakehouse Files section displaying the stored files, including the Excel file, with its ABFSS path visible in the Properties panel]

Once inside your lakehouse, navigate to the Files section, where you can see all your uploaded files. To view a file’s properties, click the three dots next to the file and select “Properties”. The Properties panel shows the file’s ABFSS path (ending in “your_file.xlsx” in this example), which we’ll use in our notebook code.

Step 3: Define Your File Location and Path

In your Fabric notebook, start by defining the lakehouse path and the specific file you want to access:

import pandas as pd

# Note: Fabric notebooks provide a preconfigured "spark" session,
# so no SparkSession import is needed

# Define lakehouse path and file location
lakehouse_abfss = "abfss://[workspace-id]@onelake.dfs.fabric.microsoft.com/[lakehouse-id]"
file_location = "Files/your_file.xlsx"

# Combine to create full path
excel_path = f"{lakehouse_abfss}/{file_location}"
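
As a side note, when this lakehouse is attached as the notebook’s default lakehouse, Fabric also mounts it into the notebook’s local file system, so pandas can read it through a plain file path. A minimal sketch, assuming the default-lakehouse mount is available:

# Alternative: Fabric mounts the notebook's default lakehouse locally,
# so a regular file path also works for pandas (assumes this lakehouse
# is attached as the notebook's default lakehouse)
local_excel_path = "/lakehouse/default/Files/your_file.xlsx"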

Step 4: Read the File with Pandas

Use pandas to read the Excel file directly from the lakehouse:

# Read Excel file with pandas
pandas_df = pd.read_excel(excel_path)

This approach is particularly useful for smaller files or when you need pandas-specific functionality for data manipulation.
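
pd.read_excel also accepts options that are often needed with real-world workbooks; the sheet name below is a hypothetical example:

# Common pd.read_excel options (the sheet name here is hypothetical)
pandas_df = pd.read_excel(
    excel_path,
    sheet_name="Sheet1",  # defaults to the first sheet if omitted
    header=0,             # row index to use for column names
)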

Step 5: Convert to Spark DataFrame

For larger datasets or when you need distributed processing capabilities, convert the pandas DataFrame to a Spark DataFrame:

# Convert to Spark DataFrame
spark_df = spark.createDataFrame(pandas_df)

# Display the result
spark_df.show()
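
Spark’s type inference can be fragile when a pandas column mixes types or contains NaN values, so passing an explicit schema is often safer. A minimal sketch with hypothetical column names; adjust the fields to match your file:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical columns for illustration; replace with your file's actual columns
schema = StructType([
    StructField("product", StringType(), True),
    StructField("amount", DoubleType(), True),
])

spark_df = spark.createDataFrame(pandas_df, schema=schema)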

Step 6: Verify Your Results

The show() method displays the first 20 rows of your DataFrame by default, letting you verify that the data was loaded correctly from the lakehouse.
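
Beyond eyeballing the rows, it is worth confirming that the column types survived the pandas-to-Spark conversion; a short sketch:

# Inspect the inferred schema and a few untruncated rows
spark_df.printSchema()
spark_df.show(5, truncate=False)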

Alternative Approaches

For Excel files specifically, the pandas approach is the most straightforward since Spark doesn’t have native Excel support. However, you can also work with other file formats directly with Spark:

# For CSV files
spark_df = spark.read.option("header", "true").csv(f"{lakehouse_abfss}/Files/your_file.csv")

# For Parquet files
spark_df = spark.read.parquet(f"{lakehouse_abfss}/Files/your_file.parquet")

# For JSON files
spark_df = spark.read.json(f"{lakehouse_abfss}/Files/your_file.json")

Note: Spark has no native Excel reader, so for .xlsx files the pandas route shown in the main example, followed by conversion to a Spark DataFrame, remains the recommended approach.
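
If you do need Spark to read Excel directly, for example when the workbook is too large to load comfortably with pandas, third-party connectors exist. A hedged sketch, assuming the com.crealytics spark-excel package has been installed on your Spark environment:

# Requires the third-party spark-excel package (com.crealytics) to be
# installed on the environment; it is not part of the Fabric runtime by default
spark_df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")       # first row contains column names
    .option("inferSchema", "true")  # let the connector infer column types
    .load(excel_path)
)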

Complete Code Example

Here’s the complete code block that you can copy and paste into your Fabric notebook:

import pandas as pd

# Note: Fabric notebooks provide a preconfigured "spark" session,
# so no SparkSession import is needed

# Define lakehouse path and file location
lakehouse_abfss = "abfss://[workspace-id]@onelake.dfs.fabric.microsoft.com/[lakehouse-id]"
file_location = "Files/your_file.xlsx"

# Combine to create full path
excel_path = f"{lakehouse_abfss}/{file_location}"

# Read Excel file with pandas
pandas_df = pd.read_excel(excel_path)

# Convert to Spark DataFrame
spark_df = spark.createDataFrame(pandas_df)

# Display the result
spark_df.show()

# Optional: Display basic information about the DataFrame
print(f"Number of rows: {spark_df.count()}")
print(f"Number of columns: {len(spark_df.columns)}")
print("Column names:", spark_df.columns)

Remember to:

  • Replace [workspace-id] and [lakehouse-id] with the IDs from your lakehouse’s actual ABFSS path
  • Update the file_location to match your Excel file’s path in the lakehouse
  • Ensure your Excel file is uploaded to the Files section of your lakehouse (you can verify this from the notebook, as sketched below)
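
A quick way to verify the file is in place, assuming a default lakehouse is attached to the notebook (mssparkutils ships with the Fabric runtime):

# List the Files folder of the attached default lakehouse
from notebookutils import mssparkutils

for f in mssparkutils.fs.ls("Files/"):
    print(f.name, f.size)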

Conclusion

Accessing files in Microsoft Fabric lakehouse using notebooks provides a flexible and powerful way to work with your data. By combining the ABFSS path with standard pandas and PySpark operations, you can seamlessly read various file formats and leverage both local and distributed computing capabilities.

This approach is particularly valuable when working with mixed workloads where you need the convenience of pandas for data exploration and the scalability of Spark for larger processing tasks.

The integration between Fabric lakehouse and notebooks demonstrates the platform’s commitment to providing a unified analytics experience that scales from small data exploration to enterprise-level data processing.

I’m Snorre

I’m a Data Engineer and Business Analyst with a background in Economics. I love diving into data and turning it into something meaningful using tools like Microsoft Fabric, Power BI, and Azure Data Factory. Over the years, I’ve worked on everything from data migrations to building reports and dashboards that help organizations really make the most of their data.

I’ve had the chance to work in a variety of environments, and I’m all about creating solid, scalable solutions that drive digital transformation. I also enjoy sharing what I know through training sessions and webinars, always looking to help others get more comfortable and skilled with data.

When I’m not working on data projects or teaching, you’ll probably find me outdoors—whether I’m working in the forest, traveling, or just hanging out with friends and family. I also like getting involved in testing out new Microsoft features, giving feedback to make sure they’re as useful as possible before they’re released.

Let’s connect