Use Vantage from a Jupyter notebook

Author: Adam Tworkiewicz
Last updated: November 10th, 2022

This how-to shows you how to add Teradata Extensions to a Jupyter Notebooks environment. A hosted version of Jupyter Notebooks integrated with Teradata Extensions and analytics tools is available for functional testing for free at https://clearscape.teradata.com.

Overview

In this how-to we will go through the steps for connecting to Teradata Vantage from a Jupyter notebook.

If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com.

Options

There are a couple of ways to connect to Vantage from a Jupyter Notebook:

Use python or R libraries in a regular Python/R kernel notebook - this option works well when you are in a restricted environment that doesn’t allow you to spawn your own Docker images. Also, it’s useful in traditional datascience scenarios when you have to mix SQL and Python/R in a notebook. If you are proficient with Jupyter and have your own set of preferred libraries and extensions, start with this option.
Use the Teradata Jupyter Docker image - the Teradata Jupyter Docker image bundles the Teradata SQL kernel (more on this later), teradataml and tdplyr libraries, python and R drivers. It also contains Jupyter extensions that allow you to manage Teradata connections, explore objects in Vantage database. It’s convenient when you work a lot with SQL or would find a visual Navigator helpful. If you are new to Jupyter or if you prefer to get a currated assembly of libraries and extensions, start with this option.

Teradata libraries

This option uses a regular Jupyter Lab notebook. We will see how to load the Teradata Python driver and use it from Python code. We will also examine ipython-sql extension that adds support for SQL-only cells.

We start with a plain Jupyter Lab notebook. Here, I’m using docker but any method of starting a notebook, including Jupyter Hub, Google Cloud AI Platform Notebooks, AWS SageMaker Notebooks, Azure ML Notebooks will do.
```
docker run --rm -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes \
  -v "${PWD}":/home/jovyan/work jupyter/datascience-notebook
```

Docker logs will display the url that you need to go to:

Entered start.sh with args: jupyter lab
Executing the command: jupyter lab
....
To access the server, open this file in a browser:
    file:///home/jovyan/.local/share/jupyter/runtime/jpserver-7-open.html
Or copy and paste one of these URLs:
    http://d5c2323ae5db:8888/lab?token=5fb43e674367c6895e8c2404188aa550b5c7bdf96f5b4a3a
  or http://127.0.0.1:8888/lab?token=5fb43e674367c6895e8c2404188aa550b5c7bdf96f5b4a3a

We will open a new notebook and create a cell to install the required libraries:

I’ve published a notebook with all the cells described below on GitHub: https://github.com/Teradata/quickstarts/blob/main/modules/ROOT/attachments/vantage-with-python-libraries.ipynb
```
import sys
!{sys.executable} -m pip install teradatasqlalchemy
```

Now, we will import Pandas and define the connection string to connect to Teradata. Since I’m running my notebook in Docker on my local machine and I want to connect to a local Vantage Express VM, I’m using host.docker.internal DNS name provided by Docker to reference the IP of my machine.

import pandas as pd
# Define the db connection string. Pandas uses SQLAlchemy connection strings.
# For Teradata Vantage, it's teradatasql://username:password@host/database_name .
# See https://pypi.org/project/teradatasqlalchemy/ for details.
db_connection_string = "teradatasql://dbc:dbc@host.docker.internal/dbc"

I can now call Pandas to query Vantage and move the result to a Pandas dataframe:
```
pd.read_sql("SELECT * FROM dbc.dbcinfo", con = db_connection_string)
```
The syntax above is concise but it can get tedious if all you need is to explore data in Vantage. We will use ipython-sql and its %%sql magic to create SQL-only cells. We start with importing the required libraries.
```
import sys
!{sys.executable} -m pip install ipython-sql teradatasqlalchemy
```

We load ipython-sql and define the db connection string:

%load_ext sql
# Define the db connection string. The sql magic uses SQLAlchemy connection strings.
# For Teradata Vantage, it's teradatasql://username:password@host/database_name .
# See https://pypi.org/project/teradatasqlalchemy/ for details.
%sql teradatasql://dbc:dbc@host.docker.internal/dbc

We can now use %sql and %%sql magic. Let’s say we want to explore data in a table. We can create a cell that says:
```
%%sql
SELECT * FROM dbc.dbcinfo
```

If we want to move the data to a Pandas frame, we can say:

result = %sql SELECT * FROM dbc.dbcinfo
result.DataFrame()

There are many other features that ipython-sql provides, including variable substitution, plotting with matplotlib, writting results to a local csv file or back to the database. See the demo notebook for examples and ipython-sql github repo for a complete reference.

Teradata Jupyter Docker image

The Teradata Jupyter Docker image builds on jupyter/datascience-notebook Docker image. It adds the Teradata SQL kernel, Teradata Python and R libraries, Jupyter extensions to make you productive while interacting with Teradata Vantage. The image also contains sample notebooks that demonstrate how to use the SQL kernel and Teradata libraries.

The SQL kernel and Teradata Jupyter extensions are useful for people that spend a lot of time with the SQL interface. Think about it as a notebook experience that, in many cases, is more convenient than using Teradata Studio. The Teradata Jupyter Docker image doesn’t try to replace Teradata Studio. It doesn’t have all the features. It’s designed for people who need a lightweight, web-based interface and enjoy the notebook UI.

The Teradata Jupyter Docker image can be used when you want to run Jupyter locally or you have a place where you can run custom Jupyter docker images. The steps below demonstrate how to use the image locally.

Run the image:

By passing -e "accept_license=Y you accept the license agreement for Teradata Jupyter Extensions.

docker volume create notebooks
docker run -e "accept_license=Y" -p :8888:8888 \
  -v notebooks:/home/jovyan/JupyterLabRoot \
  teradata/jupyterlab-extensions

Docker logs will display the url that you need to go to. For example, this is what I’ve got:

Starting JupyterLab ...
Docker Build ID = 3.2.0-ec02012022
Using unencrypted HTTP

Enter this URL in your browser:  http://localhost:8888?token=96a3ab874a03779c400966bf492fe270c2221cdcc74b61ed

* Or enter this token when prompted by Jupyter: 96a3ab874a03779c400966bf492fe270c2221cdcc74b61ed
* If you used a different port to run your Docker, replace 8888 with your port number

Open up the URL and use the file explorer to open the following notebook: jupyterextensions → notebooks → sql → GettingStartedDemo.ipynb.
Go through the demo of the Teradata SQL Kernel:

Summary

This quick start covered different options to connect to Teradata Vantage from a Jupyter Notebook. We learned about the Teradata Jupyter Docker image that bundles multiple Teradata Python and R libraries. It also provides an SQL kernel, database object explorer and connection management. These features are useful when you spend a lot of time with the SQL interface. For more traditional data science scenarios, we explored the standalone Teradata Python driver and integration through the ipython sql extension.