
Load a pandas table to DynamoDB

Published on 2020-11-10 17:14:58

I am trying to load a big Pandas table into DynamoDB.

I have tried the for-loop method as follows:

# One put_item request per row -- each call is a separate network round trip
for k in range(1000):
    trans = {}
    trans['Director'] = DL_dt['director_name'][k]
    trans['Language'] = DL_dt['original_language'][k]
    print("add:", DL_dt['director_name'][k], DL_dt['original_language'][k])
    table.put_item(Item=trans)

It works, but it's very time-consuming. Is there a faster way to load it? (An equivalent of `to_sql` for SQL databases.)

I've found the BatchWriteItem operation, but I am not sure it works and I don't know exactly how to use it.

Thanks a lot.

Asked by jpetot

Answered by Leon Moya on 2020-12-03 10:06:36

You can iterate over the DataFrame rows, transform each row to JSON, and then convert it back to a dict using json.loads. This also avoids NumPy data type errors, since DynamoDB does not accept raw NumPy numeric types.

You can try this:

import json
from decimal import Decimal

# Rename the DataFrame columns to match the DynamoDB attribute names.
DL_dt = DL_dt.rename(columns={
    'director_name': 'Director',
    'original_language': 'Language'
})

# batch_writer() buffers the puts and sends them as BatchWriteItem
# requests behind the scenes, which is much faster than calling
# put_item once per row.
with table.batch_writer() as batch:
    for index, row in DL_dt.iterrows():
        batch.put_item(Item=json.loads(row.to_json(), parse_float=Decimal))
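
For context, here is a minimal, self-contained sketch of the same approach. The table name 'Movies', the region, and the sample DataFrame are assumptions for illustration only; adjust them to your own setup. The batch_writer() context manager groups the puts into BatchWriteItem requests of up to 25 items and retries any unprocessed items for you.

import json
from decimal import Decimal

import boto3
import pandas as pd

# Hypothetical table name and region -- adjust to your own setup.
# The table must already exist with a suitable key schema.
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Movies')

# Small stand-in for the real DL_dt DataFrame from the question.
DL_dt = pd.DataFrame({
    'director_name': ['James Cameron', 'Bong Joon-ho'],
    'original_language': ['en', 'ko'],
})

DL_dt = DL_dt.rename(columns={
    'director_name': 'Director',
    'original_language': 'Language',
})

with table.batch_writer() as batch:
    for _, row in DL_dt.iterrows():
        # dropna() skips empty cells, and parse_float=Decimal converts
        # floats to the Decimal type that DynamoDB expects for numbers.
        item = json.loads(row.dropna().to_json(), parse_float=Decimal)
        batch.put_item(Item=item)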