Source: stackoverflow.com, python 3.x - Reading Excel file from Azure Databricks

azure-data-lake-gen2 azure-databricks excel python-3.x

python 3.x - Reading Excel file from Azure Databricks

Posted 2021-01-10 03:08:57

I am trying to read an Excel file (.xlsx) from Azure Databricks. The file is stored in ADLS Gen 2.

Example:

srcPathforParquet = "wasbs://hyxxxx@xxxxdatalakedev.blob.core.windows.net//1_Raw//abc.parquet"
srcPathforExcel = "wasbs://hyxxxx@xxxxdatalakedev.blob.core.windows.net//1_Raw//src.xlsx"

Reading the Parquet file from the path works fine.

srcparquetDF = spark.read.parquet(srcPathforParquet )

Reading the Excel file from the path throws an error: No such file or directory

srcexcelDF = pd.read_excel(srcPathforExcel , keep_default_na=False, na_values=[''])

Asked by Sreedhar


Jim Xu 2020-09-08 10:05

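The method pandas.read_excel does not support accessing a file through a URL that uses the wasbs or abfss scheme. For more details, see the pandas.read_excel documentation.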
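As a rough illustration of the point above, a small guard can flag such paths before they reach pandas. This is only a sketch: the helper name `uses_unsupported_scheme` is hypothetical, and the set contains just the two schemes named in this answer.

```python
from urllib.parse import urlparse

# Schemes that, per the answer above, pandas.read_excel cannot open directly.
UNSUPPORTED_SCHEMES = {"wasbs", "abfss"}


def uses_unsupported_scheme(path: str) -> bool:
    """Return True if the path uses a wasbs:// or abfss:// URL scheme."""
    # Hypothetical helper: it only inspects the scheme, nothing is opened.
    return urlparse(path).scheme.lower() in UNSUPPORTED_SCHEMES


excel_path = "wasbs://hyxxxx@xxxxdatalakedev.blob.core.windows.net//1_Raw//src.xlsx"
print(uses_unsupported_scheme(excel_path))
print(uses_unsupported_scheme("/dbfs/mnt/raw/src.xlsx"))
```

A check like this lets a notebook fail early with a clear message instead of the less obvious "No such file or directory" error.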

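So, if you want to access the file with pandas, I suggest you either create a SAS token and use the https scheme with the SAS token to access the file, or download the file as a stream and then read it with pandas. Alternatively, you can mount the storage account as a filesystem and then access the file as @CHEEKATLAPRADEEP-MSFT described.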

For example

Access with a SAS token

Create a SAS token via the Azure portal
Code

import pandas as pd

pdf=pd.read_excel('https://<account name>.dfs.core.windows.net/<file system>/<path>?<sas token>')
print(pdf)
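The URL in the snippet above can be assembled from its parts. The following is a minimal sketch under stated assumptions: `build_sas_url` is a hypothetical helper, and the account, file system, path, and token values are placeholders rather than real credentials.

```python
from urllib.parse import quote


def build_sas_url(account_name: str, file_system: str, path: str, sas_token: str) -> str:
    """Compose an https URL with the SAS token appended as the query string."""
    # Percent-encode the path but keep "/" as the separator.
    encoded_path = quote(path.lstrip("/"))
    # A copied SAS token may or may not start with "?"; normalize it.
    token = sas_token.lstrip("?")
    return f"https://{account_name}.dfs.core.windows.net/{file_system}/{encoded_path}?{token}"


# Placeholder values only; not a real account or token.
url = build_sas_url("myaccount", "test", "data/my sample.xlsx", "?sv=placeholder&sig=placeholder")
print(url)
```

The resulting string is what would be passed to `pd.read_excel`. Encoding the path separately keeps spaces in file names from breaking the URL.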

Download the file as a stream and read it

Install the packages azure-storage-file-datalake and xlrd in Databricks with pip
Code

import io

import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient

# Authenticate against the ADLS Gen2 (dfs) endpoint with the account key
service_client = DataLakeServiceClient(account_url='https://<account name>.dfs.core.windows.net/', credential='<account key>')

file_client = service_client.get_file_client(file_system='test', file_path='data/sample.xlsx')
with io.BytesIO() as f:
  # Download the whole file into the in-memory buffer
  downloader = file_client.download_file()
  b = downloader.readinto(f)
  print(b)  # number of bytes downloaded
  # Rewind the buffer so pandas reads from the start
  f.seek(0)
  df = pd.read_excel(f)
  print(df)
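The `file_system` and `file_path` arguments used above can be derived from an abfss URL instead of being typed by hand. This is a sketch only: `split_adls_url` is a hypothetical helper, it assumes the URL already points at the dfs endpoint, and it does not convert a blob endpoint to a dfs one.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_adls_url(url: str) -> Tuple[str, str, str]:
    """Split '<scheme>://<file system>@<host>/<path>' into its parts."""
    parsed = urlparse(url)
    if parsed.username is None or parsed.hostname is None:
        raise ValueError(f"Expected <scheme>://<file system>@<host>/<path>, got: {url}")
    # The part before "@" is the file system (container) name.
    file_system = parsed.username
    # The host becomes the https account URL for the service client.
    account_url = f"https://{parsed.hostname}"
    # Drop leading slashes so the path is relative to the file system.
    file_path = parsed.path.lstrip("/")
    return account_url, file_system, file_path


parts = split_adls_url("abfss://test@testadls05.dfs.core.windows.net/data/sample.xlsx")
print(parts)
```

The three values map onto `account_url`, `file_system`, and `file_path` in the download code above.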

In addition, we can use PySpark to read the Excel file, but we need to add the com.crealytics:spark-excel jar to our environment. For more details, see here and here

For example

Add the package com.crealytics:spark-excel_2.12:0.13.1 via Maven. Note that if you use Scala 2.11, you should add the package com.crealytics:spark-excel_2.11:0.13.1 instead
Code

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.<account name>.dfs.core.windows.net",'<account key>')

print("use spark")
df=sqlContext.read.format("com.crealytics.spark.excel") \
        .option("header", "true") \
        .load('abfss://test@testadls05.dfs.core.windows.net/data/sample.xlsx')

df.show()
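For the mount-based option mentioned earlier, pandas needs a local-style path rather than a `dbfs:/` URI. On Databricks, DBFS is typically exposed to local file APIs under `/dbfs`; the sketch below assumes that, and both the helper `to_local_dbfs_path` and the mount point `/mnt/raw` are hypothetical.

```python
def to_local_dbfs_path(dbfs_uri: str) -> str:
    """Map 'dbfs:/mnt/...' to the '/dbfs/mnt/...' path seen by local file APIs."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError(f"Expected a dbfs:/ URI, got: {dbfs_uri}")
    # Strip the scheme and re-root the path under the /dbfs mount.
    return "/dbfs/" + dbfs_uri[len(prefix):].lstrip("/")


# Hypothetical mount point; on a cluster this path could be given to pd.read_excel.
local_path = to_local_dbfs_path("dbfs:/mnt/raw/src.xlsx")
print(local_path)
```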
