Train ML models in Vantage
Author: Adam Tworkiewicz
Last updated: September 12th, 2021
Overview
There are situations when you want to quickly validate a machine learning model idea. You have a model type in mind. You don’t want to operationalize with an ML pipeline just yet. You just want to test out if the relationship you had in mind exists. Also, sometimes even your production deployment doesn’t require constant relearning with MLops. In such cases, you can use Vantage Analytics Library (VAL) and multiple ML model types it supports.
Prerequisites
You need access to a Teradata Vantage instance.
If you need a new instance of Vantage, you can install a free version called Vantage Express in the cloud on Google Cloud, Azure, and AWS. You can also run Vantage Express on your local machine using VMware, VirtualBox, or UTM. |
Install Vantage Analytics Library
Support for ML in Vantage requires Vantage Analytics Library (VAL). In this section, we will install VAL and load some sample data.
-
VAL is distributed as an rpm file. Go to Teradata Downloads and download the VAL rpm to your local machine.
-
Upload the file to your Vantage install. If you are running Vantage Express locally, you have many ways to do it:
-
If you installed Vantage Express on VirtualBox, you should be able to drag & drop the file to the VM’s desktop. You can also use
scp
by connecting to port4422
, e.g.:scp -P 4422 ~/Downloads/VAL-2.0.0.3-1.x86_64.rpm root@localhost:/root/Desktop
-
If you use VMware and you have enabled drag & drop, you should be able to drag and drop the file to the VM’s desktop.
-
If you have SSH access to your Vantage nodes, you can use
scp
to upload the binary, e.g.:scp ~/Downloads/VAL-2.0.0.3-1.x86_64.rpm root@vantage.server.name:/tmp/
-
-
We will now create a new database where VAL functions and procedures will be installed. You could install VAL in a global location such as
SYSLIB
, but installing VAL in a specific database will make it easier to start over if things go wrong. Let’s create a database calledval
and grant appropriate permissions to our user. Please edit to match your database name and user id:CREATE DATABASE val AS PERMANENT = 60e6, -- 60MB SPOOL = 120e6; -- 120MB GRANT CREATE FUNCTION ON val to dbc; GRANT ALTER FUNCTION ON val to dbc; GRANT EXECUTE PROCEDURE on SQLJ.REMOVE_JAR to dbc; GRANT EXECUTE PROCEDURE on SQLJ.INSTALL_JAR to dbc; GRANT EXECUTE PROCEDURE on SQLJ.REPLACE_JAR to dbc; GRANT CREATE EXTERNAL PROCEDURE ON val to dbc;
-
Open Gnome Terminal in the VM and start the installation process. Adjust the rpm path as necessary:
rpm -Uvh --nodeps ~/Desktop/VAL-2.0.0.3-1.x86_64.rpm
-
The install wizard will ask you for the hostname, user id, and password. If you are running the install on your Vantage Express VM, the values are:
-
Hostname:
localhost
-
Userid:
dbc
-
Password:
dbc
-
Account string: leave empty, press ENTER
-
BTEQ or FASTLOAD command: leave empty, press ENTER
-
-
The wizard will ask you to choose which part of VAL you want to install. We want to start with installing
td_analyze
procedure, i.e. option 1. Once you select option 1, the script will ask for the database name wheretd_analyze
will be installed. Enterval
and press ENTER. -
While still in the wizard, install option 5, i.e.
Tutorial Tables
. These are sample tables with data that we are going to use to build a sample model.
Sample data
Now, that we have VAL and sample tables loaded, let’s explore the data. It’s a simplistic, fictitious dataset of banking customers (1K-ish rows), Accounts (10K-ish rows) and Transactions (100K-ish rows). They are related to each other in the following ways:
In later parts of this how-to we are going to explore if we can build a model that predicts average monthly balance that a banking customer has on their credit card based on all non-credit card related variables in the tables.
Create a linear regression model
Let’s start by creating a wide table (Analytic Data Set, or ADS) that joins the three tables above.
You must have CREATE TABLE permissions on the Database where the Vantage Analytic Library is installed. |
-- Switch to val database.
DATABASE val;
-- Create the ADS.
CREATE TABLE VAL_ADS AS (
SELECT
T1.cust_id AS cust_id
,MIN(T1.income) AS tot_income
,MIN(T1.age) AS tot_age
,MIN(T1.years_with_bank) AS tot_cust_years
,MIN(T1.nbr_children) AS tot_children
,CASE WHEN MIN(T1.marital_status) = 1 THEN 1 ELSE 0 END AS single_ind
,CASE WHEN MIN(T1.gender) = 'F' THEN 1 ELSE 0 END AS female_ind
,CASE WHEN MIN(T1.marital_status) = 2 THEN 1 ELSE 0 END AS married_ind
,CASE WHEN MIN(T1.marital_status) = 3 THEN 1 ELSE 0 END AS separated_ind
,MAX(CASE WHEN T1.state_code = 'CA' THEN 1 ELSE 0 END) AS ca_resident_ind
,MAX(CASE WHEN T1.state_code = 'NY' THEN 1 ELSE 0 END) AS ny_resident_ind
,MAX(CASE WHEN T1.state_code = 'TX' THEN 1 ELSE 0 END) AS tx_resident_ind
,MAX(CASE WHEN T1.state_code = 'IL' THEN 1 ELSE 0 END) AS il_resident_ind
,MAX(CASE WHEN T1.state_code = 'AZ' THEN 1 ELSE 0 END) AS az_resident_ind
,MAX(CASE WHEN T1.state_code = 'OH' THEN 1 ELSE 0 END) AS oh_resident_ind
,MAX(CASE WHEN T2.acct_type = 'CK' THEN 1 ELSE 0 END) AS ck_acct_ind
,MAX(CASE WHEN T2.acct_type = 'SV' THEN 1 ELSE 0 END) AS sv_acct_ind
,MAX(CASE WHEN T2.acct_type = 'CC' THEN 1 ELSE 0 END) AS cc_acct_ind
,AVG(CASE WHEN T2.acct_type = 'CK' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS ck_avg_bal
,AVG(CASE WHEN T2.acct_type = 'SV' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS sv_avg_bal
,AVG(CASE WHEN T2.acct_type = 'CC' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS cc_avg_bal
,AVG(CASE WHEN T2.acct_type = 'CK' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS ck_avg_tran_amt
,AVG(CASE WHEN T2.acct_type = 'SV' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS sv_avg_tran_amt
,AVG(CASE WHEN T2.acct_type = 'CC' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS cc_avg_tran_amt
,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 1 THEN T3.tran_id ELSE NULL END) AS q1_trans_cnt
,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 2 THEN T3.tran_id ELSE NULL END) AS q2_trans_cnt
,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 3 THEN T3.tran_id ELSE NULL END) AS q3_trans_cnt
,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 4 THEN T3.tran_id ELSE NULL END) AS q4_trans_cnt
FROM Customer AS T1
LEFT OUTER JOIN Accounts AS T2
ON T1.cust_id = T2.cust_id
LEFT OUTER JOIN Transactions AS T3
ON T2.acct_nbr = T3.acct_nbr
GROUP BY T1.cust_id) WITH DATA UNIQUE PRIMARY INDEX (cust_id);
We will now build a linear regression model that takes parameters from the dataset and tries to predict the monthly credit card balance.
We call td_analyze
and tell it we want a linear model. The input is in table VAL_ADS
and consists of multiple columns. The dependent variable is cc_avg_bal
. We want the model to be written to val
database in table called LINEAR_REGRESSION_DEMO
:
call td_analyze('linear',
'database=val;
tablename=VAL_ADS;
columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ck_acct_ind,sv_acct_ind,sv_avg_bal,ck_avg_bal,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind;
dependent=cc_avg_bal;
outputdatabase=val;
outputtablename=linear_regression_demo');
The procedure creates several output tables. For now, we don’t have to analyze what is in the tables. Let’s see how we can use the newly created model to perform scoring.
Scoring
Let’s use the model to perform predictions and evaluate the scores. To do this, we call td_analyze
with linearscore
parameter. We point to the input table (VAL_ADS
), the model tables (prefix linear_regression_demo
) and define the target table (linear_regression_score
) in val
database:
call td_analyze('linearscore',
'database=val;
tablename=VAL_ADS;
modeldatabase=val;
modeltablename=linear_regression_demo;
outputdatabase=val;
outputtablename=linear_regression_score;
predicted=estimate;
retain=cc_avg_bal;
scoringmethod=scoreandevaluate;');
As a result, we get linear_regression_score
table that contains the real balance, the predicted balance and the difference between these two. Let’s have a look at a sample:
SELECT * FROM linear_regression_score SAMPLE 10;
You will see results similar to:
cust_id|cc_avg_bal |estimate |Residual | -------+------------------+------------------+-------------------+ 1362498| 0.0| 284.7057772484358| -284.7057772484358| 1362828| 1184.35|463.74177458594215| 720.6082254140578| 1362839| 2933.135802469136| 982.9240031182255| 1950.2117993509103| 1362986| 500.9148148148148| 881.4116539412856| -380.4968391264708| 1362511|235.85941489361701|294.35369563202846|-58.494280738411426| 1363134| 0.0|430.27950420065997|-430.27950420065997| 1363481| 0.0| 411.2359958542745| -411.2359958542745| 1362644| 209.3304347826087|279.75770904482033| -70.42727426221163| 1363141| 0.0| 550.1681921045503| -550.1681921045503| 1363290| 0.0|120.35348558871233|-120.35348558871233|
Summary
In this quick start we have learned how to create ML models in SQL. The method used Vantage Analytics Library (VAL). We were able to build a linear regression model and run predictions using the model. We have done that using SQL without any coding.
Further reading
If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members. |