Register and call remote AI models in AlloyDB overview

This page describes key concepts that you must know before registering an AI model endpoint and invoking predictions with Model endpoint management.

To register remote model endpoints with AlloyDB Omni, see Register and call remote AI models in AlloyDB Omni.

Overview

Model endpoint management is an AlloyDB AI feature that includes functions and operators that help you register and manage AI model metadata. You can register a model endpoint, manage model endpoint metadata in your database cluster, and make calls to the remote model endpoints using SQL queries.

Model endpoint management provides the google_ml_integration extension that includes functions that let you register the metadata related to AI models with AlloyDB. This registered metadata is used to generate vector embeddings or invoke predictions.

AlloyDB AI query engine is a suite of functions that builds on model endpoint management (Preview) and adds support for AI operators that let you combine natural language phrases with SQL queries, such as ai.if() for filters and joins, ai.rank() for ordering, and ai.generate() for generating summaries of your data. It also adds support for Vertex AI multimodal and ranking models.

Some of the example model types that you can register using model endpoint management are as follows:

  • Vertex AI text embedding and generic models
  • Vertex AI Multimodal model (Preview)
  • Vertex AI ranking models (Preview)
  • Embedding models provided by third-party providers, such as Hugging Face or OpenAI
  • Custom-hosted text embedding models, including self-hosted models or models available through private endpoints
  • Generic models with a JSON-based API, such as the facebook/bart-large-mnli model hosted on Hugging Face, the gemini-pro model from the Vertex AI Model Garden, or Claude models by Anthropic

Use cases

You can call the registered model endpoints to interact with existing data in your database to generate embeddings or predictions. Some application use cases are as follows:

  • Real-time inference in transactional applications: provide real-time recommendations based on a user's current browsing history and cart contents.
  • Sentiment analysis and summarization: for a database of customer reviews, generate a summary or identify the key sentiment of each review.
  • Intelligent search and retrieval systems: build search systems over an internal knowledge base, and query it using natural language in SQL operators instead of keywords.
  • Personalized user experiences: optimize a content platform to dynamically personalize the content displayed to each user based on their past interactions.

For more information about AlloyDB AI use cases, see AlloyDB AI use cases.

How it works

You can use model endpoint management to register a model endpoint that meets the following requirements:

  • The model's input and output support the JSON format.
  • The model can be called using the REST protocol.

When you register a model endpoint with model endpoint management, each endpoint is registered with a unique model ID that you provide. You use this model ID as a reference to the model in your queries.
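For illustration, registering an endpoint might look like the following sketch. The model ID, URL, and provider value are placeholders, and the exact `google_ml.create_model()` parameters depend on your model and provider:

```sql
-- Sketch: register a remote text embedding endpoint under a unique
-- model ID. All values shown are placeholders.
CALL google_ml.create_model(
  model_id => 'my-embedding-model',          -- unique ID you choose
  model_request_url => 'https://example.com/v1/embeddings',
  model_provider => 'custom',
  model_type => 'text-embedding'
);
```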

You can use the model ID to query models to do the following:

  • Generate embeddings to translate text prompts into numerical vectors. You can store generated embeddings as vector data when the vector extension is enabled in the database. For more information, see Query and index embeddings with pgvector.

  • Generate multimodal embeddings to translate multimodal data such as text, images, and videos to embeddings. (Preview)

  • Rank or score a list of items in a query based on a criteria stated using natural language. (Preview)

  • Invoke predictions using SQL.
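For example, generating an embedding for a text prompt with a registered model might look like the following sketch, assuming the endpoint was registered as a text-embedding model; the model ID `my-embedding-model` is a placeholder:

```sql
-- Sketch: generate an embedding with a registered model ID and cast
-- the result to the pgvector type for storage or similarity search.
SELECT embedding('my-embedding-model',
                 'AlloyDB is a PostgreSQL-compatible database')::vector;
```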

Key concepts

Before you start using model endpoint management, understand the concepts required to connect to and use the models.

Schemas

Your applications can access model endpoint management using the google_ml_integration extension. The google_ml_integration extension includes functions in the public, google_ml, and ai schemas. All of the functions are included in the google_ml schema, and certain functions are also available in the public and ai schemas.

For more information about schemas, see Schemas.

Model provider

Model provider indicates which supported model hosting provider hosts the endpoint. Setting the model provider is optional, but it helps model endpoint management identify the provider and automatically format request headers for supported models.

For more information about model provider, see Model provider.

Model type

Model type indicates the type of the AI model. The extension supports text embedding models as well as any generic model type. The supported model types that you can set when registering a model endpoint are text-embedding and generic.

Setting the model type is optional when registering generic model endpoints, because generic is the default model type.

For more information about model type, see Model type.

Authentication

Auth types indicate the authentication method that you can use to access a model endpoint through the google_ml_integration extension. Setting authentication is optional; it's required only if you must authenticate to access your model.
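As a hedged sketch, storing an API key in Secret Manager and referencing it at registration time might look like the following; the secret ID, secret path, and model values are placeholders, and the exact parameter names depend on your extension version:

```sql
-- Sketch: associate a Secret Manager secret with a secret ID, then
-- reference it when registering the model endpoint.
CALL google_ml.create_sm_secret(
  secret_id => 'my-api-key',                 -- ID you choose
  secret_path => 'projects/my-project/secrets/my-key/versions/latest'
);

CALL google_ml.create_model(
  model_id => 'my-secured-model',
  model_request_url => 'https://example.com/v1/embeddings',
  model_type => 'text-embedding',
  model_auth_type => 'secret_manager',
  model_auth_id => 'my-api-key'
);
```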

For more information about authentication, see Authentication.

Prediction functions

Prediction functions are SQL functions that let you interact with AI models from within your AlloyDB database. These functions let you use standard SQL queries to send data to a model endpoint and generate embeddings or predictions.
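For example, invoking a prediction on a registered generic model might look like this sketch. The model ID and request body are placeholders, and the request body must match the JSON format that the remote model expects:

```sql
-- Sketch: send a JSON request to a registered generic model and get
-- the raw JSON response back.
SELECT google_ml.predict_row(
  model_id => 'my-generic-model',
  request_body => '{"prompt": "Summarize this customer review."}'::json
);
```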

For more information about prediction functions, see Prediction functions.

Operator functions

The google_ml_integration extension includes operator functions, such as ai.if(), ai.rank(), and ai.generate(), which use a default Gemini model so that you can use natural language in SQL operators.
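As an illustrative sketch, a natural-language filter over a hypothetical `reviews` table might look like the following, assuming `ai.if()` accepts a prompt argument and evaluates to a boolean:

```sql
-- Sketch: keep only rows that the model judges to be positive reviews.
-- The reviews table and its review_text column are hypothetical.
SELECT *
FROM reviews
WHERE ai.if(prompt => 'Is this review positive? ' || review_text);
```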

For more information about operator functions, see Operator functions.

Transform functions

Transform functions convert the input to a format that the model understands, and convert the model's response to the format that the prediction function expects. Transform functions are used when you register a text-embedding model endpoint without built-in support. The signatures of the transform functions depend on the input that the model expects.
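A transform pair might look like the following sketch. The function names, the `inputs` request field, and the `embedding` response field are all placeholders; real signatures depend on the model you register:

```sql
-- Sketch of an input transform: wrap the text prompt in the JSON body
-- that the hypothetical remote model expects.
CREATE OR REPLACE FUNCTION my_input_transform(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN json_build_object('inputs', input_text);
END;
$$;

-- Sketch of an output transform: extract the embedding array from the
-- hypothetical model's JSON response.
CREATE OR REPLACE FUNCTION my_output_transform(model_id VARCHAR(100), response_json JSON)
RETURNS REAL[]
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN ARRAY(
    SELECT json_array_elements_text(response_json -> 'embedding')
  )::REAL[];
END;
$$;
```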

For more information about transform functions, see Transform functions.

HTTP header generation function

The HTTP header generation function generates output as JSON key-value pairs that are used as HTTP headers. The signature of the prediction function defines the signature of the header generation function.
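A header generation function might look like the following sketch. The function name and the hard-coded bearer token are placeholders; a real function would typically retrieve credentials securely rather than embedding them:

```sql
-- Sketch: return HTTP headers as JSON key-value pairs.
CREATE OR REPLACE FUNCTION my_header_gen(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN json_build_object(
    'Content-Type', 'application/json',
    'Authorization', 'Bearer my-placeholder-token'
  );
END;
$$;
```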

For more information about HTTP header generation function, see HTTP header generation function.

What's next