This article is reproduced from stackoverflow.com.
amazon-cloudformation amazon-web-services aws-glue jupyter-notebook

AWS: CloudFormation to create a Glue Jupyter notebook and dev endpoint

Published 2020-05-01 17:40:41

Looking through the CloudFormation documentation, I can't see a way to spin up a Glue DevEndpoint and a Jupyter notebook, and have the notebook use the newly created DevEndpoint.

Can someone help?

Asked by smiron.

Answer by smiron (2020-02-21 23:59):

I've found the solution: the key is the CloudFormation resource AWS::SageMaker::NotebookInstanceLifecycleConfig, which lets you hook scripts into the notebook instance's OnCreate and OnStart events.
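A minimal sketch of that wiring, written as a shell script that generates a CloudFormation template with a Glue dev endpoint, a lifecycle config and a notebook instance. Assumptions: the logical IDs, the file name `glue-notebook.yaml`, the two role parameters and the `ml.t3.medium` instance type are made up for illustration; the hook bodies are placeholders where the Glue lifecycle script from this answer would go; and the notebook role is assumed to allow `glue:GetDevEndpoint`. Because the hooks use `Fn::Sub`, the script's own shell variables (such as `${ASSETS}`) must be written as `${!ASSETS}` so CloudFormation leaves them alone.

```shell
#!/usr/bin/env bash
# Sketch: write a CloudFormation template that links a Glue dev endpoint to a
# SageMaker notebook through a lifecycle config, then optionally deploy it.
set -euo pipefail

TEMPLATE=glue-notebook.yaml   # hypothetical file name

# Quoted 'EOF' so the shell does not expand ${...}; CloudFormation's Fn::Sub does.
cat > "$TEMPLATE" <<'EOF'
AWSTemplateFormatVersion: "2010-09-09"
Description: Glue dev endpoint and a SageMaker notebook linked to it (sketch)
Parameters:
  GlueRoleArn:
    Type: String
    Description: IAM role the dev endpoint assumes
  NotebookRoleArn:
    Type: String
    Description: IAM role the notebook assumes (needs glue:GetDevEndpoint)
Resources:
  GlueDevEndpoint:
    Type: AWS::Glue::DevEndpoint
    Properties:
      RoleArn: !Ref GlueRoleArn
  NotebookLifecycleConfig:
    Type: AWS::SageMaker::NotebookInstanceLifecycleConfig
    Properties:
      OnCreate:
        - Content:
            Fn::Base64: !Sub |
              #!/bin/bash
              set -ex
              # Placeholder: paste the Glue lifecycle script here, writing its
              # shell variables as ${!VAR} (e.g. ${!ASSETS}). The hard-coded
              # endpoint check then becomes:
              aws glue get-dev-endpoint --endpoint-name ${GlueDevEndpoint} --endpoint-url https://glue.${AWS::Region}.amazonaws.com
      OnStart:
        - Content:
            Fn::Base64: !Sub |
              #!/bin/bash
              set -ex
              # Placeholder: Glue uses the same script for OnStart and OnCreate.
              aws glue get-dev-endpoint --endpoint-name ${GlueDevEndpoint} --endpoint-url https://glue.${AWS::Region}.amazonaws.com
  GlueNotebook:
    Type: AWS::SageMaker::NotebookInstance
    Properties:
      InstanceType: ml.t3.medium
      RoleArn: !Ref NotebookRoleArn
      LifecycleConfigName: !GetAtt NotebookLifecycleConfig.NotebookInstanceLifecycleConfigName
EOF

# Deploy only when given: stack name, Glue role ARN, notebook role ARN.
if [ $# -eq 3 ]; then
  aws cloudformation deploy \
    --stack-name "$1" \
    --template-file "$TEMPLATE" \
    --parameter-overrides GlueRoleArn="$2" NotebookRoleArn="$3"
fi
```

Run with no arguments, the script only writes the template; with a stack name and the two role ARNs it also calls `aws cloudformation deploy`. Referencing the endpoint as `${GlueDevEndpoint}` (a `Ref`, which as I understand the CloudFormation docs returns the dev endpoint's name) removes the hard-coded endpoint name and also makes CloudFormation create the endpoint before the lifecycle config.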

Notebooks created from the Glue section of the console also show up in SageMaker, where you can see the lifecycle configuration associated with each one and its code.
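The same code can be read from the AWS CLI instead of the console. A sketch, assuming AWS credentials are configured: `describe-notebook-instance` returns the notebook's lifecycle config name, and `describe-notebook-instance-lifecycle-config` returns the OnCreate/OnStart hooks, whose `Content` is base64-encoded. The notebook name in the usage comment is a hypothetical placeholder for whatever the SageMaker console lists.

```shell
#!/usr/bin/env bash
# Sketch: print the OnCreate or OnStart script attached to a SageMaker notebook
# instance (for example one created from the Glue console).
set -euo pipefail

# Lifecycle hook Content comes back base64-encoded; decode it to plain text.
decode_hook() {
  printf '%s' "$1" | base64 --decode
}

# $1 = notebook instance name, $2 = OnCreate or OnStart. Needs AWS credentials.
show_hook() {
  local notebook="$1" hook="$2" config content
  config=$(aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$notebook" \
    --query 'NotebookInstanceLifecycleConfigName' --output text)
  content=$(aws sagemaker describe-notebook-instance-lifecycle-config \
    --notebook-instance-lifecycle-config-name "$config" \
    --query "${hook}[0].Content" --output text)
  decode_hook "$content"
}

# Runs only when called with two arguments, e.g.:
#   ./show-hook.sh my-glue-notebook OnStart   (hypothetical script and notebook name)
if [ $# -eq 2 ]; then
  show_hook "$1" "$2"
fi
```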

For completeness, below is the script Glue currently uses for both OnStart and OnCreate when you create a Jupyter notebook from the Glue console.

Note that a notebook created this way has exactly the same functionality as one created through the Glue console, but it is only visible in the SageMaker section of the console.

```shell
#!/bin/bash
set -ex
# Exit early if a previous run (OnCreate or OnStart) already completed the setup
[ -e /home/ec2-user/glue_ready ] && exit 0

mkdir -p /home/ec2-user/glue
cd /home/ec2-user/glue

# Write the Glue service endpoint (region eu-west-2) to a file used by the daemon scripts
glue_endpoint_file="/home/ec2-user/glue/glue_endpoint.txt"

if [ -f "$glue_endpoint_file" ] ; then
    rm "$glue_endpoint_file"
fi
echo "https://glue.eu-west-2.amazonaws.com" >> "$glue_endpoint_file"

# Download the assets used below: Miniconda installer, autossh source and config,
# sparkmagic config.json and the sample notebooks
ASSETS=s3://aws-glue-jes-prod-eu-west-2-assets/sagemaker/assets/

aws s3 cp "${ASSETS}" . --recursive

# Install Miniconda in batch mode and activate it for the rest of the setup
bash "/home/ec2-user/glue/Miniconda2-4.5.12-Linux-x86_64.sh" -b -u -p "/home/ec2-user/glue/miniconda"

source "/home/ec2-user/glue/miniconda/bin/activate"

# Build and install autossh, then copy its config to /etc/init
tar -xf autossh-1.4e.tgz
cd autossh-1.4e
./configure
make
sudo make install
sudo cp /home/ec2-user/glue/autossh.conf /etc/init/

# Install the sparkmagic configuration
mkdir -p /home/ec2-user/.sparkmagic
cp /home/ec2-user/glue/config.json /home/ec2-user/.sparkmagic/config.json

# Move the sample notebooks into the SageMaker working directory
mkdir -p /home/ec2-user/SageMaker/Glue\ Examples
mv /home/ec2-user/glue/notebook-samples/* /home/ec2-user/SageMaker/Glue\ Examples/

# Ensure the SageMaker notebook has permission for the dev endpoint (with set -e,
# the script stops here if it does not). somiron-dfe-poc-GlueDevEndpoint is the
# dev endpoint name in this setup; replace it with yours.
aws glue get-dev-endpoint --endpoint-name somiron-dfe-poc-GlueDevEndpoint --endpoint-url https://glue.eu-west-2.amazonaws.com

# Run the daemons as cron jobs every minute; flock -n makes sure each daemon is
# started only if it is not already running
(crontab -l; echo "* * * * * /usr/bin/flock -n /tmp/lifecycle-config-v2-dev-endpoint-daemon.lock /usr/bin/sudo /bin/sh /home/ec2-user/glue/lifecycle-config-v2-dev-endpoint-daemon.sh") | crontab -

(crontab -l; echo "* * * * * /usr/bin/flock -n /tmp/lifecycle-config-reconnect-dev-endpoint-daemon.lock /usr/bin/sudo /bin/sh /home/ec2-user/glue/lifecycle-config-reconnect-dev-endpoint-daemon.sh") | crontab -

source "/home/ec2-user/glue/miniconda/bin/deactivate"

# Remove the installer and mark the setup as done
rm -rf "/home/ec2-user/glue/Miniconda2-4.5.12-Linux-x86_64.sh"

sudo touch /home/ec2-user/glue_ready
```
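The script above relies on two small guards: the `glue_ready` marker file (checked first, created last) so the setup runs only once even though the same script is registered for both OnCreate and OnStart, and `flock -n` in the cron entries so that a daemon launched every minute is skipped while the previous copy still holds its lock. A local sketch of both, assuming `flock(1)` from util-linux is installed; the paths are hypothetical stand-ins under a temporary directory rather than the real `/home/ec2-user` and `/tmp` locations.

```shell
#!/usr/bin/env bash
# Sketch of the two guards used by the Glue lifecycle script, with hypothetical
# paths in a temp dir instead of /home/ec2-user/glue_ready and /tmp/*.lock.
set -euo pipefail

WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT
MARKER="$WORKDIR/glue_ready"   # stands in for /home/ec2-user/glue_ready
LOCK="$WORKDIR/daemon.lock"    # stands in for /tmp/lifecycle-config-*.lock

# Guard 1: skip the whole setup if a previous run already finished it.
setup_once() {
  [ -e "$MARKER" ] && { echo "skipped"; return 0; }
  echo "setup ran"
  touch "$MARKER"              # created last, like `sudo touch .../glue_ready`
}

# Guard 2: run a command only if nobody else holds the lock; flock -n exits
# with status 1 immediately instead of waiting, as in the cron entries.
run_if_unlocked() {
  flock -n "$LOCK" "$@"
}

setup_once                     # prints "setup ran"
setup_once                     # prints "skipped"

exec 9>"$LOCK"                 # simulate a daemon that is still running
flock -n 9                     # by holding the lock on file descriptor 9
if run_if_unlocked echo "second daemon started"; then
  echo "unexpected: lock was free"
else
  echo "second daemon skipped" # printed: the lock is held
fi
exec 9>&-                      # the "daemon" exits and the lock is released
run_if_unlocked echo "daemon started"   # prints "daemon started"
```

Holding the lock on an open file descriptor mirrors what happens in cron: the lock lasts exactly as long as the daemon process that took it, so a crashed daemon is restarted on the next minute without any stale-lock cleanup.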