Data Cloud Integration Patterns

Overview

This guide covers practical patterns for integrating with Salesforce Data Cloud using the official Ingestion and Query APIs, along with design guidance that accounts for identity resolution.

Consensus Best Practices

Data Cloud Model Primer

Ingestion Patterns

Pattern 1: Streaming Upsert API

Use the Ingestion API when you need near-real-time writes into Data Cloud objects.
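
A minimal sketch of a streaming upsert call. The endpoint shape follows the general form of the Data Cloud Ingestion API, but the tenant host, connector name, and object name below are illustrative assumptions; verify the exact path against the official reference for your org.

```python
import json

def build_streaming_upsert(tenant_host, connector, obj, records):
    """Build the URL and JSON body for one streaming upsert request.

    tenant_host, connector, and obj are hypothetical example values;
    the path shape is modeled on the Data Cloud Ingestion API.
    """
    url = f"https://{tenant_host}/api/v1/ingest/sources/{connector}/{obj}"
    body = json.dumps({"data": records})  # records are upserted by primary key
    return url, body

url, body = build_streaming_upsert(
    "my-tenant.example.salesforce.com", "CRM_Connector", "contact",
    [{"id": "003XX0000001", "email": "a@example.com"}],
)
```

Keeping request construction separate from sending makes the payload easy to log and replay when an upsert fails.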

Pattern 2: Bulk Ingestion Jobs

Use bulk ingestion for large backfills and scheduled high-volume loads.

Recommended flow:

  1. Create ingestion job.
  2. Upload CSV data in parts.
  3. Close the job by transitioning it to the upload-complete state.
  4. Start processing.
  5. Poll status and retrieve failed-record results.
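
The flow above can be sketched as a single driver function. The HTTP client is injected so the lifecycle logic stays testable; the paths and field names are assumptions modeled on the bulk Ingestion API, not verbatim from the official reference.

```python
import time

def run_bulk_job(http, source, obj, csv_parts, poll_interval=0.0):
    """Drive the bulk ingestion lifecycle via an injected
    http(method, path, body) callable. Paths/fields are assumptions."""
    # 1. Create the ingestion job.
    job = http("POST", "/api/v1/ingest/jobs",
               {"object": obj, "sourceName": source, "operation": "upsert"})
    job_id = job["id"]
    # 2. Upload CSV data in parts.
    for part in csv_parts:
        http("PUT", f"/api/v1/ingest/jobs/{job_id}/batches", part)
    # 3-4. Mark the upload complete so processing starts.
    http("PATCH", f"/api/v1/ingest/jobs/{job_id}", {"state": "UploadComplete"})
    # 5. Poll until the job reaches a terminal state; the caller can then
    #    retrieve failed-record results for this job_id.
    while True:
        status = http("GET", f"/api/v1/ingest/jobs/{job_id}", None)
        if status.get("state") in ("JobComplete", "Failed"):
            return status
        time.sleep(poll_interval)
```

Injecting `http` also gives you one place to add auth headers, retries, and request logging.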

Best fit:

  - Large historical backfills.
  - Scheduled, high-volume loads where per-record latency is not a concern.

Query Patterns

Pattern 3: Async Query API (v2-style)

Use async query execution for analytics and downstream extracts.

Typical flow:

  1. Submit SQL query request.
  2. Receive query/job ID.
  3. Poll query status endpoint.
  4. Fetch result rows with offset/limit pagination.
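
The submit/poll/page flow above can be sketched as one function. The endpoint paths and response keys here are assumptions modeled on a v2-style async query API; check them against the official reference before relying on them.

```python
import time

def run_async_query(http, sql, page_size=1000, poll_interval=0.0):
    """Submit SQL, poll until done, then page through rows via an
    injected http(method, path, body) callable. Paths are assumptions."""
    # 1-2. Submit the query and capture its ID.
    query_id = http("POST", "/api/v2/query", {"sql": sql})["queryId"]
    # 3. Poll the status endpoint until a terminal status.
    while True:
        status = http("GET", f"/api/v2/query/{query_id}", None)["status"]
        if status in ("Finished", "Failed"):
            break
        time.sleep(poll_interval)
    if status == "Failed":
        raise RuntimeError(f"query {query_id} failed")
    # 4. Fetch rows with offset/limit until a short (final) page.
    rows, offset = [], 0
    while True:
        page = http("GET",
                    f"/api/v2/query/{query_id}/rows"
                    f"?offset={offset}&limit={page_size}", None)["rows"]
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size
```

Persisting `query_id` outside this function is what makes retries safe: a crashed consumer can resume polling or paging without resubmitting the SQL.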

Design guidance:

  - Persist query/job IDs so interrupted jobs can resume polling or paging without resubmitting.
  - Record failures alongside the query ID so jobs can be retried safely.
  - Page through results deterministically with fixed offset/limit steps.

Identity Resolution Patterns

Pattern 4: Unified Profile-Aware Joins

When identity resolution is enabled, records from multiple sources can be mapped to a unified profile.

Implementation guidance:

  - Carry both source profile identifiers and unified individual identifiers through downstream systems.
  - Treat unified mappings as mutable: reconcile caches and derived tables after identity ruleset changes.
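
A sketch of a profile-aware rollup join. The object and field names (IndividualIdentityLink__dlm, UnifiedIndividual__dlm, the ssot__* fields, and the Purchase__dlm example) follow common Data Cloud naming conventions but are assumptions; verify them against your org's actual data model before use.

```python
def unified_spend_query(source_dmo="Purchase__dlm"):
    """SQL that rolls source-level rows up to unified individuals.

    All DMO and field names are illustrative assumptions: the pattern is
    source DMO -> identity link table -> unified profile table.
    """
    return f"""
        SELECT u.ssot__Id__c AS unified_id,
               SUM(p.amount__c) AS total_spend
        FROM {source_dmo} p
        JOIN IndividualIdentityLink__dlm l
          ON l.ssot__SourceRecordId__c = p.individual_id__c
        JOIN UnifiedIndividual__dlm u
          ON u.ssot__Id__c = l.ssot__UnifiedRecordId__c
        GROUP BY u.ssot__Id__c
    """
```

Joining through the link table rather than denormalizing unified IDs into source rows keeps the query correct after identity ruleset changes remap profiles.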

Data Cloud for RAG Pipelines

Retrieval Metadata Suggestions
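
Per the Q&A below, a safe starting point is harmonized DMO fields plus minimal identity metadata and strict provenance. A minimal sketch of that chunk metadata, with illustrative key names (not a Data Cloud schema):

```python
def chunk_metadata(dmo, record_id, unified_id, fields):
    """Minimal retrieval metadata for a Data Cloud-sourced RAG chunk.

    Key names are illustrative assumptions: keep provenance (source DMO +
    record ID) and a person-level filter key, and nothing more to start.
    """
    return {
        "source_dmo": dmo,                    # which harmonized object
        "source_record_id": record_id,        # strict provenance back to the row
        "unified_individual_id": unified_id,  # person-level filtering/aggregation
        "fields": list(fields),               # harmonized DMO fields in the chunk
    }

meta = chunk_metadata("Case__dlm", "500XX000001", "u-123",
                      ["subject__c", "description__c"])
```

Expand this dictionary only after retrieval metrics show that additional fields actually improve results.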

Edge Cases and Limitations

Q&A

Q: When should I use the Data Cloud Ingestion API vs bulk ingestion jobs?

A: Use the Ingestion API for frequent, smaller updates and near-real-time processing. Use bulk ingestion jobs for large backfills and high-volume scheduled loads.

Q: How should I query Data Cloud data reliably in production?

A: Use async query patterns: submit query, poll status, then page through rows. Persist query IDs and failures so jobs can be retried safely.

Q: What identity key should downstream systems store?

A: Store both source profile identifiers and unified individual identifiers. Unified IDs support person-level aggregation; source IDs preserve traceability.
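
The answer above suggests a small record type for downstream storage. A sketch, with illustrative field names:

```python
from dataclasses import dataclass, field

@dataclass
class ProfileRef:
    """Identity keys worth persisting downstream (names are illustrative).

    unified_id supports person-level aggregation; source_ids preserve
    traceability back to each contributing system.
    """
    unified_id: str
    source_ids: dict = field(default_factory=dict)  # {source system: source profile id}

ref = ProfileRef(unified_id="u-123", source_ids={"crm": "003XX0000001"})
```

Storing both key sets means a remapped unified ID can be re-derived from the stable source IDs during reconciliation.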

Q: How do identity resolution changes impact integrations?

A: Unified mappings can change after ruleset updates, so downstream caches and derived tables must be reconciled after those changes.

Q: What is the safest starting point for RAG chunks from Data Cloud?

A: Start with harmonized DMO fields plus minimal identity metadata and strict provenance, then expand only after retrieval metrics justify more fields.
