python-How to upload my training data into google for Tensorflow cloud training

Borislav Stoilov 2020-12-19 00:21:57

Turns out it was a problem with my GCP configuration Here are the steps I made to make it work:

Create an s3 bucket and make all files inside it public so the train job can access them
Include these two in the requirements fsspec and gcsfs
remove the 'engine' parameter from panda.readCsv like so

dataset = pandas.read_csv('gs:///USDJPY.fx5.csv', usecols=[2, 3, 4, 5])

Since you are uploading the python file to GCP a good way to organize your code it to put all of the training logic into a method and then called it conditionally on the cloud train flag:

if tfc.remote():
    train()

Here is the whole working code if someone is interested

import pandas
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from sklearn.preprocessing import MinMaxScaler
import tensorflow_cloud as tfc
import os

os.environ["PATH"] = os.environ["PATH"] + ":<path to google-cloud-sdk/bin"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "<path to google credentials json (you can generate this through their UI"


def create_dataset(data):
    dataX = data[0:len(data) - 1]
    dataY = data[1:]
    return numpy.array(dataX), numpy.array(dataY)

def train():
    dataset = pandas.read_csv('gs://<bucket>/USDJPY.fx5.csv', usecols=[2, 3, 4, 5])

    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(dataset)

    dataset = scaler.transform(dataset)

    # split into train and test sets
    train_size = int(len(dataset) * 0.67)
    train, test = dataset[0:train_size], dataset[train_size:len(dataset)]

    trainX, trainY = create_dataset(train)

    trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))

    model = Sequential()
    model.add(LSTM(4, input_shape=(1, 4)))
    model.add(Dropout(0.2))
    model.add(Dense(4))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(trainX, trainY, epochs=1000, verbose=1)


job_labels = {"job": "forex-usdjpy", "team": "zver", "user": "zver1"}
tfc.run(requirements_txt="./requirements.txt",
        job_labels=job_labels,
        stream_logs=True
        )

if tfc.remote():
    train()

NOTE: This is probably not an optimal LSTM config, take it with a grain of salt

How to upload my training data into google for Tensorflow cloud training

热门帖子

热门github