Integrate Teradata Jupyter extensions with SageMaker notebook instance
Author: Hailing Jiang
Last updated: September 27th, 2022
This how-to shows you how to add Teradata Extensions to a Jupyter Notebooks environment. A hosted version of Jupyter Notebooks integrated with Teradata Extensions and analytics tools is available for functional testing for free at https://clearscape.teradata.com. |
Overview
Teradata Jupyter extensions provide Teradata SQL kernel and several UI extensions to allow users to easily access and navigate Teradata database from Jupyter envioronment. This article describes how to integate our Jupyter extensions with SageMaker notebook instance.
Prerequisites
-
Access to a Teradata Vantage instance
If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com. -
AWS account
-
AWS S3 bucket to store lifecycle configuration scripts and Teradata Jupyter extension package
Integration
SageMaker supports customization of notebook instances using lifecycle configuration scripts. Below we will demo how to use lifecycle configuration scripts to install our Jupyter kernel and extensions in a notebook instance.
Steps to integrate with notebook instance
-
Download Teradata Jupyter extensions package
Download Linux version from https://downloads.teradata.com/download/tools/vantage-modules-for-jupyter and upload it to an S3 bucket. This zipped package contains Teradata Jupyter kernel and extensions. Each extension has 2 files, the one with "_prebuilt" in the name is prebuilt extension which can be installed using PIP, the other one is source extension that needs to be installed using "jupyter labextension". It is recommended to use prebuilt extensions.
-
Create a lifecycle configuration for notebook instance
Here are sample scripts that fetches the Teradata package from S3 bucket and installs Jupyter kernel and extensions. Note that on-create.sh creates a custom conda env that persists on notebook instance’s EBS volume so that the installation will not get lost after notebook restarts. on-start.sh installs Teradata kernel and extensions to the custom conda env.
on-create.sh
#!/bin/bash set -e # This script installs a custom, persistent installation of conda on the Notebook Instance's EBS volume, and ensures # that these custom environments are available as kernels in Jupyter. sudo -u ec2-user -i <<'EOF' unset SUDO_UID # Install a separate conda installation via Miniconda WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda mkdir -p "$WORKING_DIR" wget https://repo.anaconda.com/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh -O "$WORKING_DIR/miniconda.sh" bash "$WORKING_DIR/miniconda.sh" -b -u -p "$WORKING_DIR/miniconda" rm -rf "$WORKING_DIR/miniconda.sh" # Create a custom conda environment source "$WORKING_DIR/miniconda/bin/activate" KERNEL_NAME="teradatasql" PYTHON="3.8" conda create --yes --name "$KERNEL_NAME" python="$PYTHON" conda activate "$KERNEL_NAME" pip install --quiet ipykernel EOF
on-start.sh
#!/bin/bash set -e # This script installs Teradata Jupyter kernel and extensions. sudo -u ec2-user -i <<'EOF' unset SUDO_UID WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda source "$WORKING_DIR/miniconda/bin/activate" teradatasql # fetch Teradata Jupyter extensions package from S3 and unzip it mkdir -p "$WORKING_DIR/teradata" aws s3 cp s3://sagemaker-teradata-bucket/teradatasqllinux_3.3.0-ec06172022.zip "$WORKING_DIR/teradata" cd "$WORKING_DIR/teradata" unzip -o teradatasqllinux_3.3.0-ec06172022.zip # install Teradata kernel cp teradatakernel /home/ec2-user/anaconda3/condabin jupyter kernelspec install --user ./teradatasql # install Teradata Jupyter extensions source /home/ec2-user/anaconda3/bin/activate JupyterSystemEnv pip install teradata_connection_manager_prebuilt-3.3.0.tar.gz pip install teradata_database_explorer_prebuilt-3.3.0.tar.gz pip install teradata_preferences_prebuilt-3.3.0.tar.gz pip install teradata_resultset_renderer_prebuilt-3.3.0.tar.gz pip install teradata_sqlhighlighter_prebuilt-3.3.0.tar.gz conda deactivate EOF
-
Create a notebook instance. Please select 'Amazon Linux 2, Jupyter Lab3' for Platform identifier and select the lifecycle configuration created in step 2 for Lifecycle configuration.
You might also need to add vpc, subnet and security group in 'Network' section to gain access to Teradata databases.
-
Wait until notebook instance Status turns 'InService', click 'Open JupyterLab' to open the notebook.
Access the demo notebooks to get usage tips
+
Further reading
If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members. |