How LiveRamp’s remote execution feature in Identity Engine helps clients build accurate first-party data graphs securely within their virtual private cloud environments.
As privacy regulations become a greater priority in the US, EU, and other global markets, organizations need better ways to responsibly connect first-party data and reach more consumers. Through the power of data collaboration, organizations can get more value from their first-party data in a responsible way.
LiveRamp enables data collaboration with responsible use of data and has developed solutions that let organizations analyze sensitive identifiers, such as government IDs, healthcare data, or payment data, within their own data systems. This data is essential for building an accurate first-party data graph that helps organizations fully understand their customers and deliver better experiences and business results, but security must be a priority.
LiveRamp has developed the Remote Execution feature in Identity Engine so customers can securely build a first-party data graph directly within their own virtual private cloud (VPC). This post provides a brief overview of the architecture, security measures, and processes LiveRamp has implemented to support this customer experience.
Building a first-party data graph in a remote VPC
With the Remote Execution feature in LiveRamp's Identity Engine, customers can build a first-party graph directly within their VPC. All data processing and storage remain entirely within the customer's environment; no data is transferred to or processed on LiveRamp servers.
Although the graph build is orchestrated and managed through LiveRamp's platform, the data itself is never handled within LiveRamp's infrastructure. This setup lets LiveRamp facilitate the graph build while data usage stays secure and responsible within the customer's VPC.
Virtual private cloud architecture overview
Below is a high-level architecture diagram illustrating the separation between the control plane and the data plane:
- Control plane: Managed by LiveRamp, this layer hosts the application and orchestrates the graph build process, integrating and managing tenant workflows.
- Data plane: Resides within the customer's environment, where all data processing occurs inside the customer's VPC; no data is stored or processed on LiveRamp's servers.
A view of the data plane infrastructure architecture:
Application
The multi-tenant application manages configurations for each tenant, including cloud-specific details for the data plane, such as the cloud project ID, subnetwork, compute region and zone, input data location, and data warehouse information for storing the built graph (or knowledge base).
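As a rough illustration of what a per-tenant configuration record might hold, the sketch below models the fields listed above as Python dataclasses. The class names, field names, and the in-memory TenantRegistry are hypothetical stand-ins, not LiveRamp's actual schema or storage. The zone check assumes Google Cloud's naming convention, where a zone such as us-central1-a belongs to the region us-central1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPlaneConfig:
    """Cloud-specific settings for one tenant's data plane (field names are hypothetical)."""
    project_id: str         # customer's cloud project ID
    subnetwork: str         # subnetwork the compute resources attach to
    region: str             # compute region, e.g. "us-central1"
    zone: str               # compute zone inside that region, e.g. "us-central1-a"
    input_data_uri: str     # where the input data lives
    warehouse_dataset: str  # where the built graph (knowledge base) is written


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    data_plane: DataPlaneConfig


class TenantRegistry:
    """In-memory stand-in for the application's multi-tenant configuration store."""

    def __init__(self) -> None:
        self._tenants: dict[str, TenantConfig] = {}

    def register(self, config: TenantConfig) -> None:
        dp = config.data_plane
        # Google Cloud zone names are "<region>-<letter>", so a zone must start with its region.
        if not dp.zone.startswith(dp.region + "-"):
            raise ValueError(f"zone {dp.zone!r} is not in region {dp.region!r}")
        self._tenants[config.tenant_id] = config

    def get(self, tenant_id: str) -> TenantConfig:
        try:
            return self._tenants[tenant_id]
        except KeyError:
            raise LookupError(f"unknown tenant: {tenant_id}") from None
```

Freezing the records means a workflow run works from one consistent snapshot of a tenant's settings.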
The application also includes a workflow builder that allows users to create and manage graph build workflows, which are structured as directed acyclic graphs (DAGs).
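A workflow builder of this kind has to reject any graph that contains a cycle before it can run. The sketch below is a hypothetical model, not LiveRamp's builder: it records each step's dependencies and uses the standard library's graphlib.TopologicalSorter (Python 3.9+) to derive an execution order, turning a cycle into an error. The step names used with it are made up.

```python
from graphlib import CycleError, TopologicalSorter


class Workflow:
    """A graph build workflow as a DAG of named steps (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._deps: dict[str, set[str]] = {}  # step -> steps it depends on

    def add_step(self, step: str, depends_on: tuple[str, ...] = ()) -> "Workflow":
        self._deps.setdefault(step, set()).update(depends_on)
        for dep in depends_on:
            self._deps.setdefault(dep, set())  # make sure every dependency is a node too
        return self  # allow chaining

    def execution_order(self) -> list[str]:
        """Return one valid order of steps; raise ValueError if the graph is not acyclic."""
        try:
            return list(TopologicalSorter(self._deps).static_order())
        except CycleError as exc:
            # graphlib stores the detected cycle as the second element of exc.args.
            raise ValueError(f"workflow {self.name!r} has a cycle: {exc.args[1]}") from exc
```

For example, a chain built with add_step("ingest"), then add_step("normalize", depends_on=("ingest",)), then add_step("match", depends_on=("normalize",)) has exactly one valid order: ingest, normalize, match.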
Temporal
LiveRamp's engineers have implemented a hybrid orchestration layer. One part uses an orchestrator built in house to manage dynamically built DAGs (collections of processes), while the other orchestrates pre-built processes. These pre-built processes are LiveRamp-managed artifacts, also structured as directed acyclic graphs. The processes are orchestrated using the Temporal service, which meets the system's heavy load and scalability requirements and allows multiple tenants to orchestrate many individual workflows simultaneously.
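The core scheduling idea (run every step whose dependencies have finished, while many tenants' workflows proceed at the same time) can be sketched with asyncio and graphlib. This is explicitly not Temporal's API and omits what Temporal adds, such as durable workflow state, retries, timeouts, and worker fleets. run_step is a hypothetical placeholder for real work such as submitting a Spark job.

```python
import asyncio
from graphlib import TopologicalSorter

Dag = dict[str, set[str]]  # step -> the steps it depends on


async def run_step(tenant: str, step: str, log: list[str]) -> str:
    # Hypothetical placeholder for real work; it only yields to the event loop and records completion.
    await asyncio.sleep(0)
    log.append(f"{tenant}:{step}")
    return step


async def run_dag(tenant: str, dag: Dag, log: list[str]) -> None:
    """Run all currently ready steps concurrently, one wave at a time, until the DAG is done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the DAG has a cycle
    while sorter.is_active():
        ready = sorter.get_ready()
        finished = await asyncio.gather(*(run_step(tenant, s, log) for s in ready))
        sorter.done(*finished)  # unlocks successors for the next wave


async def run_tenants(workflows: dict[str, Dag]) -> list[str]:
    """Drive every tenant's DAG concurrently on one event loop; return the completion log."""
    log: list[str] = []
    await asyncio.gather(*(run_dag(t, dag, log) for t, dag in workflows.items()))
    return log
```

Because each wave waits for all of its steps, one slow step delays unrelated successors; a production scheduler would start each successor as soon as its own dependencies finish.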
Execution
A graph build workflow can be started in several ways. Each one calls the backend API, which triggers the following sequence of actions:
When a request to trigger a workflow is received, the backend API first inspects the request payload to determine which tenant sent it. The request must also include a valid bearer token for authentication, ensuring that only authorized clients can submit it. In addition, IP allowlisting ensures that only requests originating from trusted sources are processed. Together, these measures verify that the request is both legitimate and from an authorized network.
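The three checks can be combined into one gate. In this minimal sketch, the tenant_id payload field, the TenantAuth record, and the static shared token are hypothetical; a real service would more likely store token hashes or validate OAuth/JWT tokens. It uses hmac.compare_digest for a constant-time token comparison and the ipaddress module for the allowlist.

```python
import hmac
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantAuth:
    token: str                         # expected bearer token (hypothetical storage scheme)
    allowed_networks: tuple[str, ...]  # CIDR ranges on this tenant's IP allowlist


class Unauthorized(Exception):
    """Raised when a trigger request fails any check."""


def authorize(headers: dict[str, str], payload: dict, client_ip: str,
              tenants: dict[str, TenantAuth]) -> str:
    """Identify the tenant, then require a valid bearer token and an allowlisted source IP."""
    tenant_id = payload.get("tenant_id")  # hypothetical payload field
    auth = tenants.get(tenant_id) if isinstance(tenant_id, str) else None
    if auth is None:
        raise Unauthorized("unknown tenant")

    scheme, _, token = headers.get("Authorization", "").partition(" ")
    # compare_digest takes time independent of how many leading characters match.
    if scheme.lower() != "bearer" or not hmac.compare_digest(token.encode(), auth.token.encode()):
        raise Unauthorized("invalid bearer token")

    try:
        ip = ipaddress.ip_address(client_ip)
    except ValueError:
        raise Unauthorized("malformed source IP") from None
    if not any(ip in ipaddress.ip_network(net) for net in auth.allowed_networks):
        raise Unauthorized("source IP not allowlisted")
    return tenant_id
```

Resolving the tenant first matters here: the token and the allowlist are both looked up per tenant, so one tenant's credentials cannot authorize a request for another.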
Once the tenant is identified, the backend retrieves the pair of service accounts associated with that tenant from the tenant configuration. Each tenant is assigned its own unique pair of service accounts, which keeps execution isolated and secure.
These service accounts play distinct roles in the execution flow:
- Control plane service account: This account is tied to orchestration within the control plane, enabling higher-level management and coordination of tasks.
- Data plane service account: This account is linked to the data plane and provides access to resources directly related to job execution.
The control plane service account holds impersonation permissions over the data plane service account, granted during infrastructure provisioning when the tenant is onboarded. Using these permissions, the control plane service account acts on behalf of the data plane service account, which enables the orchestration process to start the necessary infrastructure, such as a Spark cluster, and trigger the job.
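The isolation property can be modeled in a few lines. In this hypothetical sketch, an impersonation grant is recorded only between the two accounts of the same tenant at provisioning time, so a control plane account can act as its own tenant's data plane account and no other. On Google Cloud, such a grant is typically made by giving the caller the Service Account Token Creator role (roles/iam.serviceAccountTokenCreator) on the target account, after which client code can use google.auth.impersonated_credentials.Credentials. Which roles LiveRamp actually grants is not stated here, and the account names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAccountPair:
    control_plane: str  # orchestrating identity (hypothetical name)
    data_plane: str     # identity that owns job resources in the customer's project (hypothetical name)


class ImpersonationPolicy:
    """Models which principal may act as which data plane account."""

    def __init__(self) -> None:
        self._grants: set[tuple[str, str]] = set()

    def grant(self, pair: ServiceAccountPair) -> None:
        # Done once, when the tenant's infrastructure is provisioned.
        self._grants.add((pair.control_plane, pair.data_plane))

    def impersonate(self, caller: str, target: str) -> str:
        if (caller, target) not in self._grants:
            raise PermissionError(f"{caller} may not act as {target}")
        return target  # the identity subsequent calls would run under


def start_job(tenant_id: str, pairs: dict[str, ServiceAccountPair],
              policy: ImpersonationPolicy) -> str:
    """Look up the tenant's pair and act as its data plane account to start the job."""
    pair = pairs[tenant_id]
    acting_as = policy.impersonate(pair.control_plane, pair.data_plane)
    return f"cluster for {tenant_id} created as {acting_as}"
```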
What makes this approach secure?
This approach is secure because of the separation between tenants. Each tenant has its own distinct pair of service accounts, ensuring that resources and executions are completely isolated from one another. This means that no single tenant can interfere with the resources or services of another tenant.
Once execution is complete, the control plane service account can also use its impersonation privileges to clean up resources, including deleting the Spark cluster and turning off unnecessary systems for greater cost efficiency.
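A common way to guarantee cleanup even when a job fails is to tie the cluster's lifetime to a scope. In this sketch, create and delete are hypothetical callables; in the setup described here they would wrap the cloud provider's cluster API calls made under the impersonated data plane identity.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_cluster(name: str, create: Callable[[str], None],
                      delete: Callable[[str], None]) -> Iterator[str]:
    """Create a cluster, yield its name to the caller, and always delete it afterwards."""
    create(name)
    try:
        yield name
    finally:
        # Runs whether the job succeeded or raised, so no cluster is left running.
        delete(name)


def run_graph_build(job: Callable[[str], None], create: Callable[[str], None],
                    delete: Callable[[str], None], cluster_name: str = "graph-build-tmp") -> None:
    """Run one job on a cluster that exists only for the duration of the job."""
    with ephemeral_cluster(cluster_name, create, delete) as cluster:
        job(cluster)
```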
Infrastructure deployment
To standardize and automate infrastructure setup, LiveRamp's Engineering Team developed a public Terraform module, published on the HashiCorp Terraform Registry. The module follows Infrastructure as Code (IaC) best practices and lets customers provision and manage the required infrastructure in an automated, repeatable way, ensuring consistency across environments such as development, staging, and production.
Support and maintenance: Routing data plane logs to the control plane
Because LiveRamp has no direct access to the customer's environment, remote execution makes support more complex, especially when a graph build requires a data engineer's assistance or insight into the execution process.
To address this challenge, LiveRamp implemented a custom Log4j appender that allows our team to forward execution logs from the customer’s environment. Specifically, we can capture Spark job logs and send them directly to the control plane’s Google Cloud Logging API, providing us with the ability to inspect logs and troubleshoot issues without needing direct access to the customer’s environment.
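LiveRamp's appender is a Java Log4j component whose code is not shown here. To keep the examples in one language, the sketch below shows the same pattern with Python's standard logging.Handler: each record is formatted, scrubbed, and handed to a send callable that stands in for a Google Cloud Logging client. The email-redaction regex is a deliberately crude, hypothetical placeholder for whatever controls keep PII out of forwarded logs; it is not a sufficient PII filter on its own.

```python
import logging
import re
from typing import Callable

# Hypothetical, crude pattern: redacts things that look like email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class ControlPlaneHandler(logging.Handler):
    """Forwards scrubbed log entries to a sink standing in for the control plane's logging API."""

    def __init__(self, tenant_id: str, send: Callable[[dict], None]) -> None:
        super().__init__(level=logging.INFO)  # DEBUG records are never forwarded
        self.tenant_id = tenant_id
        self.send = send

    def emit(self, record: logging.LogRecord) -> None:
        try:
            message = EMAIL.sub("[REDACTED]", self.format(record))
            self.send({
                "tenant": self.tenant_id,
                "severity": record.levelname,
                "logger": record.name,
                "message": message,
            })
        except Exception:
            self.handleError(record)  # never let logging failures crash the job
```

Attaching the handler with logger.addHandler(ControlPlaneHandler("acme", sink)) is enough; tagging every entry with the tenant ID lets the control plane filter one tenant's logs when troubleshooting.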
This high-level architectural diagram shows how the control plane prepares a dedicated, ephemeral log4j.xml configuration file for each tenant and loads it, along with the required implementation dependencies, onto the Dataproc classpath and file paths. At runtime, the Dataproc/Spark logging framework uses this configuration to send non-PII logs from the data plane to the control plane.
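Generating the per-tenant file is mostly templating. The sketch below renders a configuration in the Log4j 1.x log4j.xml layout (an appender with param elements, plus a root logger that references it) using Python's xml.etree.ElementTree. The appender class com.example.logging.ControlPlaneAppender and its tenantId and logName parameters are hypothetical; Log4j 1.x hands each param to the appender's matching setter, so the real names depend on the appender's implementation. Staging the file on Dataproc is not shown.

```python
import xml.etree.ElementTree as ET

LOG4J_NS = "http://jakarta.apache.org/log4j/"
ET.register_namespace("log4j", LOG4J_NS)  # serialize the root as <log4j:configuration>


def render_log4j_xml(tenant_id: str, log_name: str,
                     appender_class: str = "com.example.logging.ControlPlaneAppender",
                     level: str = "INFO") -> str:
    """Render a tenant-specific log4j.xml that routes root-logger output to one custom appender."""
    config = ET.Element(f"{{{LOG4J_NS}}}configuration")

    appender = ET.SubElement(config, "appender", {"name": "controlPlane", "class": appender_class})
    # Each <param> would be passed to the appender's setter (setTenantId, setLogName).
    for name, value in (("tenantId", tenant_id), ("logName", log_name)):
        ET.SubElement(appender, "param", {"name": name, "value": value})

    # The DTD expects <root> after all <appender> elements.
    root_logger = ET.SubElement(config, "root")
    ET.SubElement(root_logger, "level", {"value": level})
    ET.SubElement(root_logger, "appender-ref", {"ref": "controlPlane"})

    return ET.tostring(config, encoding="unicode")
```

Because the file is rendered per tenant and discarded after the run, tenant-specific values never need to live in a shared, long-lived configuration.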
Currently, LiveRamp supports Google Cloud Platform as the data plane. However, the team plans to add support for other major cloud providers, with Amazon Web Services as the next immediate addition.
Take the next step in secure data collaboration
By seamlessly integrating with cloud platforms like Google Cloud, and soon AWS, LiveRamp’s solutions enable businesses to unite and activate first-party data while maintaining security and efficiency. LiveRamp’s Identity Engine Remote Execution feature empowers organizations to securely build first-party data graphs within their own VPC, making scalable, responsible data collaboration possible. As privacy regulations evolve, organizations must prioritize consumer trust for richer marketing insights and better business results – LiveRamp provides the foundation to do just that.
If you’re ready to get started, reach out to a LiveRamp expert.