Using Teradata with FEAST
Author: Mohammmad Taha Wahab , Mohammad Harris Mansur and Will Fleury
Last updated: January 5th, 2023
Introduction
Feast’s connector for Teradata is a complete implementation with support for all features and uses teradata as an online and offline store.
Prerequisites
You need access to a Teradata Vantage instance.
If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com. |
Overview
This how-to assumes you know Feast terminology. If you need a refresher check out the official FEAST documentation
This document demonstrates how developers can integrate Teradata’s offline and online store
with Feast. Teradata’s offline stores allow users to use any underlying data store as their offline feature store. Features can be retrieved from the offline store for model training and can be materialized into the online feature store for use during model inference.
On the other hand, online stores are used to serve features at low latency. The materialize
command can be used to load feature values from the data sources (or offline stores) into the online store
The feast-teradata
library adds support for Teradata as
-
OfflineStore
-
OnlineStore
Additionally, using Teradata as the registry (catalog) is already supported via the registry_type: sql
and included in our examples. This means that everything is located in Teradata. However, depending on the requirements, installation, etc, this can be mixed and matched with other systems as appropriate.
Getting Started
To get started, install the feast-teradata
library
pip install feast-teradata
Let’s create a simple feast setup with Teradata using the standard drivers' dataset. Note that you cannot use feast init
as this command only works for templates that are part of the core feast library. We intend on getting this library merged into feast core eventually but for now, you will need to use the following cli command for this specific task. All other feast
cli commands work as expected.
feast-td init-repo
This will then prompt you for the required information for the Teradata system and upload the example dataset. Let’s assume you used the repo name demo
when running the above command. You can find the repository files along with a file called test_workflow.py
. Running this test_workflow.py
will execute a complete workflow for the feast with Teradata as the Registry, OfflineStore, and OnlineStore.
demo/
feature_repo/
driver_repo.py
feature_store.yml
test_workflow.py
From within the demo/feature_repo
directory, execute the following feast command to apply (import/update) the repo definition into the registry. You will be able to see the registry metadata tables in the teradata database after running this command.
feast apply
To see the registry information in the feast UI, run the following command. Note the --registry_ttl_sec is important as by default it polls every 5 seconds.
feast ui --registry_ttl_sec=120
Offline Store Config
project: <name of project>
registry: <registry>
provider: local
offline_store:
type: feast_teradata.offline.teradata.TeradataOfflineStore
host: <db host>
database: <db name>
user: <username>
password: <password>
log_mech: <connection mechanism>
Repo Definition
Below is an example of definition.py which elaborates how to set the entity, source connector, and feature view.
Now to explain the different components:
-
TeradataSource:
Data Source for features stored in Teradata (Enterprise or Lake) or accessible via a Foreign Table from Teradata (NOS, QueryGrid) -
Entity:
A collection of semantically related features -
Feature View:
A feature view is a group of feature data from a specific data source. Feature views allow you to consistently define features and their data sources, enabling the reuse of feature groups across a project
driver = Entity(name="driver", join_keys=["driver_id"])
project_name = yaml.safe_load(open("feature_store.yaml"))["project"]
driver_stats_source = TeradataSource(
database=yaml.safe_load(open("feature_store.yaml"))["offline_store"]["database"],
table=f"{project_name}_feast_driver_hourly_stats",
timestamp_field="event_timestamp",
created_timestamp_column="created",
)
driver_stats_fv = FeatureView(
name="driver_hourly_stats",
entities=[driver],
ttl=timedelta(weeks=52 * 10),
schema=[
Field(name="driver_id", dtype=Int64),
Field(name="conv_rate", dtype=Float32),
Field(name="acc_rate", dtype=Float32),
Field(name="avg_daily_trips", dtype=Int64),
],
source=driver_stats_source,
tags={"team": "driver_performance"},
)
Offline Store Usage
There are two different ways to test your offline store as explained below. But first, there are a few mandatory steps to follow:
Now, let’s batch-read some features for training, using only entities (population) for which we have seen an event in the last 60
days. The predicates (filter) used can be on anything relevant for the entity (population) selection for the given training dataset. The event_timestamp
is only for example purposes.
from feast import FeatureStore
store = FeatureStore(repo_path="feature_repo")
training_df = store.get_historical_features(
entity_df=f"""
SELECT
driver_id,
event_timestamp
FROM demo_feast_driver_hourly_stats
WHERE event_timestamp BETWEEN (CURRENT_TIMESTAMP - INTERVAL '60' DAY) AND CURRENT_TIMESTAMP
""",
features=[
"driver_hourly_stats:conv_rate",
"driver_hourly_stats:acc_rate",
"driver_hourly_stats:avg_daily_trips"
],
).to_df()
print(training_df.head())
The feast-teradata
library allows you to use the complete set of feast APIs and functionality. Please refer to the official feast quickstart for more details on the various things you can do.
Online Store
Feast materializes data to online stores for low-latency lookup at model inference time. Typically, key-value stores are used for online stores, however, relational databases can be used for this purpose as well.
Users can develop their own online stores by creating a class that implements the contract in the OnlineStore class.
Online Store Config
project: <name of project>
registry: <registry>
provider: local
offline_store:
type: feast_teradata.offline.teradata.TeradataOfflineStore
host: <db host>
database: <db name>
user: <username>
password: <password>
log_mech: <connection mechanism>
Online Store Usage
There are a few mandatory steps to follow before we can test the online store:
The command materialize_incremental
is used to incrementally materialize features in the online store. If there are no new features to be added, this command will essentially not be doing anything. With feast materialize_incremental
, the start time is either now — ttl (the ttl that we defined in our feature views) or the time of the most recent materialization. If you’ve materialized features at least once, then subsequent materializations will only fetch features that weren’t present in the store at the time of the previous materializations.
CURRENT_TIME=$(date +'%Y-%m-%dT%H:%M:%S')
feast materialize-incremental $CURRENT_TIME
Next, while fetching the online features, we have two parameters features
and entity_rows
. The features
parameter is a list and can take any number of features that are present in the df_feature_view
. The example above shows all 4 features present but these can be less than 4 as well. Secondly, the entity_rows
parameter is also a list and takes a dictionary of the form {feature_identifier_column: value_to_be_fetched}
. In our case, the column driver_id is used to uniquely identify the different rows of the entity driver. We are currently fetching values of the features where driver_id is equal to 5. We can also fetch multiple such rows using the format: [{driver_id: val_1}, {driver_id: val_2}, .., {driver_id: val_n}] [{driver_id: val_1}, {driver_id: val_2}, .., {driver_id: val_n}]
entity_rows = [
{
"driver_id": 1001,
},
{
"driver_id": 1002,
},
]
features_to_fetch = [
"driver_hourly_stats:acc_rate",
"driver_hourly_stats:conv_rate",
"driver_hourly_stats:avg_daily_trips"
]
returned_features = store.get_online_features(
features=features_to_fetch,
entity_rows=entity_rows,
).to_dict()
for key, value in sorted(returned_features.items()):
print(key, " : ", value)
How to set SQL Registry
Another important thing is the SQL Registry. We first make a path variable that uses the username, password, database name, etc. to make a connection string which it then uses to establish a connection to Teradata’s Database.
path = 'teradatasql://'+ teradata_user +':' + teradata_password + '@'+host + '/?database=' + teradata_database + '&LOGMECH=' + teradata_log_mech
It will create the following table in your database:
-
Entities (entity_name,project_id,last_updated_timestamp,entity_proto)
-
Data_sources (data_source_name,project_id,last_updated_timestamp,data_source_proto)
-
Feature_views (feature_view_name,project_id,last_updated_timestamp,materialized_intervals,feature_view_proto,user_metadata)
-
Request_feature_views (feature_view_name,project_id,last_updated_timestamp,feature_view_proto,user_metadata)
-
Stream_feature_views (feature_view_name,project_id,last_updated_timestamp,feature_view_proto,user_metadata)
-
managed_infra (infra_name, project_id, last_updated_timestamp, infra_proto)
-
validation_references (validation_reference_name, project_id, last_updated_timestamp, validation_reference_proto)
-
saved_datasets (saved_dataset_name, project_id, last_updated_timestamp, saved_dataset_proto)
-
feature_services (feature_service_name, project_id, last_updated_timestamp, feature_service_proto)
-
on_demand_feature_views (feature_view_name, project_id, last_updated_timestamp, feature_view_proto, user_metadata)
Additionally, if you want to see a complete (but not real-world), end-to-end example workflow example, see the demo/test_workflow.py
script. This is used for testing the complete feast functionality.
An Enterprise Feature Store accelerates the value-gaining process in crucial stages of data analysis. It enhances productivity and reduces the time taken to introduce products in the market. By integrating Teradata with Feast, it enables the use of Teradata’s highly efficient parallel processing within a Feature Store, thereby enhancing performance.
Further reading
If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members. |