Upload multiple pandas dataframe as single excel file with multiple sheets to Google Cloud Storage

Published on 2020-05-08 14:15:18

I am new to Google Cloud Storage. In my Python code, I have a couple of DataFrames and I want to store them in a GCS bucket as a single Excel file with multiple sheets. In a local directory, I am able to do that using ExcelWriter. Here is the code for that:

import pandas as pd

writer = pd.ExcelWriter(filename)
dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
dataframe2.to_excel(writer, sheet_name='sheet2', index=False)
writer.save()

I don't want to save a temp file in a local directory and then upload it to GCS.

Questioner: Nishant Igave
Answered by Sarath Gadde on 2020-11-30 17:19:23

You can instantiate your ExcelWriter() with engine='xlsxwriter', write the workbook to an in-memory bytes buffer, and use fs-gcsfs to write those bytes to an Excel file in your GCS bucket.

In your case you can do the following:

import io
import pandas as pd
from fs_gcsfs import GCSFS

# Set a different root_path if you wish to upload files to different locations in the bucket.
gcsfs = GCSFS(bucket_name='name_of_your_bucket',
              root_path='path/to/excel',
              strict=False)
gcsfs.fix_storage()

output = io.BytesIO()
writer = pd.ExcelWriter(output, engine='xlsxwriter')

dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
dataframe2.to_excel(writer, sheet_name='sheet2', index=False)

writer.save()
xlsx_data = output.getvalue()

with gcsfs.open('./excel_file.xlsx', 'wb') as f:
    f.write(xlsx_data)

PS: I had to use strict=False because fs-gcsfs wasn't able to locate the root path (do check the limitations section in the fs-gcsfs documentation).

Source: https://xlsxwriter.readthedocs.io/working_with_pandas.html#saving-the-dataframe-output-to-a-string
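
For reference, the same in-memory approach also works without fs-gcsfs, using the official google-cloud-storage client. The following is a minimal sketch, assuming google-cloud-storage and xlsxwriter are installed, default credentials are configured, and the bucket name and object path are placeholders; dataframe1 and dataframe2 are the DataFrames from the question.

import io

import pandas as pd
from google.cloud import storage

# Build the workbook entirely in memory.
output = io.BytesIO()
with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
    dataframe2.to_excel(writer, sheet_name='sheet2', index=False)

# Upload the buffer contents directly to the bucket; no temp file is needed.
client = storage.Client()
bucket = client.bucket('name_of_your_bucket')        # placeholder bucket name
blob = bucket.blob('path/to/excel/excel_file.xlsx')  # placeholder object path
blob.upload_from_string(
    output.getvalue(),
    content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')

Either way, nothing is written to the local filesystem; the workbook only ever exists in the BytesIO buffer before being uploaded.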