Data Cloud Integration Patterns

Overview

This guide covers practical patterns for integrating with Salesforce Data Cloud using official ingestion and query APIs, plus identity-resolution-aware design.

Consensus Best Practices

Model around Data Cloud entities first (DLOs, DMOs, and the unified profile), then map them to downstream schemas.
Choose ingestion mode by freshness and volume (streaming upsert vs bulk jobs).
Treat Query API workloads as asynchronous jobs and design explicit polling/retry logic.
Keep identity resolution explicit in downstream joins (source profile vs unified profile).
Store source-system provenance in every extracted chunk for auditability and troubleshooting.

Data Cloud Model Primer

Data Lake Objects (DLOs): raw/landing representations of ingested source data.
Data Model Objects (DMOs): harmonized objects aligned to Salesforce's Cloud Information Model (CIM).
Unified Individual: resolved identity layer used to unify person-level records across sources.

Ingestion Patterns

Pattern 1: Streaming Upsert API

Use the Ingestion API in streaming mode when you need near-real-time writes into Data Cloud objects.

Endpoint pattern: /api/v1/ingest/sources/{sourceApiName}/{objectApiName}
Supports create/update semantics based on configured keys.
Good for transactional pipelines and frequent small-to-medium payloads.
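A minimal sketch of building one streaming upsert request with Python's standard library. It assumes bearer-token authentication against a tenant-specific host and a JSON body that wraps records in a top-level "data" array; verify both against the Ingestion API reference. The host, token, connector name (orders_connector), object name (Order), and the max_records batch size are hypothetical placeholders, not documented values or limits. The request is built but not sent, so it can be inspected or logged first.

```python
import json
import urllib.request


def build_streaming_upsert_request(
    tenant_endpoint: str,
    access_token: str,
    source_api_name: str,
    object_api_name: str,
    records: list,
) -> urllib.request.Request:
    """Build (but do not send) one streaming upsert request.

    Assumptions: bearer-token auth and a JSON body whose records sit in a
    top-level "data" array. Verify both against the Ingestion API reference.
    """
    # Endpoint pattern: /api/v1/ingest/sources/{sourceApiName}/{objectApiName}
    url = (
        f"https://{tenant_endpoint}/api/v1/ingest/sources/"
        f"{source_api_name}/{object_api_name}"
    )
    body = json.dumps({"data": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def batch_records(records: list, max_records: int) -> list:
    """Split records into consecutive batches of at most max_records each.

    max_records is a hypothetical tuning knob, not a documented API limit.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    return [records[i : i + max_records] for i in range(0, len(records), max_records)]
```

Sending is left to the caller (for example, urllib.request.urlopen(req)), which keeps retry and backoff policy separate from payload construction.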

Pattern 2: Bulk Ingestion Jobs

Use bulk ingestion for large backfills and scheduled high-volume loads.

Recommended flow:

Create ingestion job.
Upload CSV data in parts.
Close the job (state UploadComplete) so Data Cloud can start processing it.
Poll status and retrieve failed-record results.
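The flow above can be sketched as a small orchestration function over a client interface. BulkIngestionClient, its method names, the InMemoryBulkClient test double, and the terminal state names are assumptions for illustration; map each method onto the real bulk job endpoints and confirm the state values in the API reference. Repeating the CSV header in every part is also an assumption.

```python
import time
from typing import Protocol


class BulkIngestionClient(Protocol):
    """Hypothetical client; map each method onto the real bulk job endpoints."""

    def create_job(self, source_api_name: str, object_api_name: str) -> str: ...
    def upload_part(self, job_id: str, csv_part: str) -> None: ...
    def close_job(self, job_id: str) -> None: ...
    def get_job_state(self, job_id: str) -> str: ...


# Assumed terminal states; confirm the exact values in the API reference.
TERMINAL_STATES = {"JobComplete", "Failed", "Aborted"}


def split_csv(csv_text: str, rows_per_part: int) -> list[str]:
    """Split CSV text into parts, repeating the header row in every part.

    Splits on line boundaries, so quoted fields containing newlines are not
    handled. Repeating the header per part is an assumption.
    """
    header, *rows = csv_text.strip().splitlines()
    return [
        "\n".join([header, *rows[i : i + rows_per_part]]) + "\n"
        for i in range(0, len(rows), rows_per_part)
    ]


def run_bulk_load(
    client: BulkIngestionClient,
    source_api_name: str,
    object_api_name: str,
    csv_text: str,
    rows_per_part: int = 1000,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> tuple[str, str]:
    """Create a job, upload parts, close it, then poll to a terminal state."""
    job_id = client.create_job(source_api_name, object_api_name)
    for part in split_csv(csv_text, rows_per_part):
        client.upload_part(job_id, part)
    # This sketch treats close_job as the close step; add a separate call
    # here if your API version requires an explicit start.
    client.close_job(job_id)
    for _ in range(max_polls):
        state = client.get_job_state(job_id)
        if state in TERMINAL_STATES:
            return job_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


class InMemoryBulkClient:
    """Test double: records calls and replays scripted job states."""

    def __init__(self, states: list[str]):
        self.parts: list[str] = []
        self.closed = False
        self._states = list(states)

    def create_job(self, source_api_name: str, object_api_name: str) -> str:
        return "job-1"

    def upload_part(self, job_id: str, csv_part: str) -> None:
        self.parts.append(csv_part)

    def close_job(self, job_id: str) -> None:
        self.closed = True

    def get_job_state(self, job_id: str) -> str:
        # Pop scripted states in order, then keep returning the last one.
        return self._states.pop(0) if len(self._states) > 1 else self._states[0]
```

A Failed or Aborted result should still trigger retrieval and review of failed records rather than being treated as a completed load.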

Best fit:

Initial historical backfills.
Rebuilds after mapping or identity-rule changes.
Large nightly/weekly syncs.

Query Patterns

Pattern 3: Async Query API (v2-style)

Use async query execution for analytics and downstream extracts.

Typical flow:

Submit SQL query request.
Receive query/job ID.
Poll query status endpoint.
Fetch result rows with offset/limit pagination.
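The four steps above map onto a generator that submits, polls, and then pages. QueryClient, its method names, the InMemoryQueryClient test double, and the "FINISHED"/"FAILED" status strings are hypothetical; so is the rule that a page shorter than page_size means no rows remain (the real API may report a row count or a done flag instead). This sketch deliberately has no timeout or cancellation.

```python
import time
from typing import Iterator, Protocol


class QueryClient(Protocol):
    """Hypothetical transport; method names and status strings are assumptions."""

    def submit(self, sql: str) -> str: ...
    def status(self, query_id: str) -> str: ...
    def fetch_rows(self, query_id: str, offset: int, limit: int) -> list[dict]: ...


def run_async_query(
    client: QueryClient, sql: str, page_size: int = 500, poll_seconds: float = 2.0
) -> Iterator[dict]:
    """Submit SQL, poll until finished, then yield rows page by page."""
    query_id = client.submit(sql)  # steps 1-2: submit and receive an ID
    while True:  # step 3: poll (no timeout or cancellation in this sketch)
        state = client.status(query_id)
        if state == "FINISHED":
            break
        if state == "FAILED":
            raise RuntimeError(f"query {query_id} failed")
        time.sleep(poll_seconds)
    offset = 0  # step 4: offset/limit pagination
    while True:
        page = client.fetch_rows(query_id, offset, page_size)
        yield from page
        if len(page) < page_size:  # assumed: a short page means no more rows
            break
        offset += len(page)


class InMemoryQueryClient:
    """Test double serving rows from a list and replaying scripted statuses."""

    def __init__(self, rows: list[dict], statuses: list[str]):
        self.rows = rows
        self.statuses = list(statuses)
        self.fetch_calls = 0

    def submit(self, sql: str) -> str:
        return "q-1"

    def status(self, query_id: str) -> str:
        return self.statuses.pop(0) if len(self.statuses) > 1 else self.statuses[0]

    def fetch_rows(self, query_id: str, offset: int, limit: int) -> list[dict]:
        self.fetch_calls += 1
        return self.rows[offset : offset + limit]
```

Because the function is a generator, consumers can stream rows into chunking or storage without holding the full result set in memory.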

Design guidance:

Build idempotent pollers.
Add query timeout and cancellation policies.
Persist query IDs and status transitions for observability.
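One way to combine these three guidelines is a poller that works against a persisted record of the query. QueryRecord, poll_until_done, and the status values are hypothetical names for illustration; saving the record to a durable store after each call is left to the caller. The clock and sleep functions are injectable so timeouts can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

TERMINAL = {"FINISHED", "FAILED", "CANCELLED"}  # hypothetical status values


@dataclass
class QueryRecord:
    """Persisted per-query state (hypothetical schema): ID plus status history."""

    query_id: str
    transitions: list[tuple[float, str]] = field(default_factory=list)

    @property
    def last_state(self) -> Optional[str]:
        return self.transitions[-1][1] if self.transitions else None


def poll_until_done(
    record: QueryRecord,
    get_status: Callable[[str], str],
    cancel: Callable[[str], None],
    timeout_seconds: float,
    poll_seconds: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a submitted query with a deadline, cancelling it on timeout."""
    if record.last_state in TERMINAL:
        # Idempotent: re-running against a finished record does no new work.
        return record.last_state
    deadline = clock() + timeout_seconds
    while True:
        state = get_status(record.query_id)
        if state != record.last_state:
            # Record transitions only, not every poll, to keep the log small.
            record.transitions.append((time.time(), state))
        if state in TERMINAL:
            return state
        if clock() >= deadline:
            cancel(record.query_id)
            record.transitions.append((time.time(), "CANCELLED"))
            return "CANCELLED"
        sleep(poll_seconds)
```

Keying retries on the stored query ID, rather than resubmitting SQL, avoids launching duplicate queries after a client restart.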

Identity Resolution Patterns

Pattern 4: Unified Profile-Aware Joins

When identity resolution is enabled, records from multiple sources can be mapped to a unified profile.

Implementation guidance:

Keep both source profile IDs and unified IDs in downstream models.
Use unified IDs for person-level aggregation.
Keep source-level lineage for explainability and replay.
Re-run downstream reconciliation after identity ruleset changes.
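A minimal sketch of these rules, assuming downstream rows carry source_system, source_profile_id, and amount fields and that a source-to-unified ID mapping has already been extracted. All field and function names are hypothetical. Aggregation happens on unified IDs, lineage keeps the contributing source IDs, unresolved rows are surfaced instead of dropped, and changed_mappings identifies what to reconcile after a ruleset change.

```python
from collections import defaultdict


def aggregate_by_unified_id(rows: list[dict], source_to_unified: dict[str, str]):
    """Sum "amount" per unified individual, keeping source IDs for lineage.

    Rows whose source profile has no unified mapping are returned separately
    rather than dropped silently.
    """
    totals: dict[str, float] = defaultdict(float)
    lineage: dict[str, set] = defaultdict(set)
    unresolved: list[str] = []
    for row in rows:
        unified_id = source_to_unified.get(row["source_profile_id"])
        if unified_id is None:
            unresolved.append(row["source_profile_id"])
            continue
        totals[unified_id] += row["amount"]
        lineage[unified_id].add((row["source_system"], row["source_profile_id"]))
    return dict(totals), {uid: sorted(ids) for uid, ids in lineage.items()}, unresolved


def changed_mappings(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Source profile IDs whose unified ID changed, appeared, or vanished."""
    return {sid for sid in old.keys() | new.keys() if old.get(sid) != new.get(sid)}
```

After a ruleset update, recomputing only the unified IDs touched by changed_mappings keeps reconciliation incremental instead of rebuilding every derived table.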

Data Cloud for RAG Pipelines

Retrieval Metadata Suggestions

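data_cloud_subject_area: Cloud Information Model subject area of the source object (for example, Party)
dmo_api_name: API name of the DMO the chunk was built from
source_system: originating system of the underlying records
identity_scope (source, unified): whether the chunk is keyed to a source profile or a unified individual
ingestion_timestamp: when the underlying data was ingested into Data Cloud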
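The metadata fields listed above can be enforced with a small frozen dataclass so that no chunk is indexed without provenance. ChunkMetadata and to_index_fields are hypothetical names, and the example values (including the DMO API name) are illustrative only. Since typing.Literal is not checked at runtime, identity_scope is validated explicitly.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Literal

IdentityScope = Literal["source", "unified"]


@dataclass(frozen=True)
class ChunkMetadata:
    """Retrieval metadata attached to every chunk (hypothetical container)."""

    data_cloud_subject_area: str
    dmo_api_name: str
    source_system: str
    identity_scope: IdentityScope
    ingestion_timestamp: datetime

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so validate explicitly.
        if self.identity_scope not in ("source", "unified"):
            raise ValueError(f"unknown identity_scope: {self.identity_scope!r}")
        if self.ingestion_timestamp.tzinfo is None:
            raise ValueError("ingestion_timestamp must be timezone-aware")

    def to_index_fields(self) -> dict[str, str]:
        """Flatten to string fields for a vector-store metadata filter."""
        fields = asdict(self)
        fields["ingestion_timestamp"] = self.ingestion_timestamp.isoformat()
        return fields


example = ChunkMetadata(
    data_cloud_subject_area="Party",
    dmo_api_name="ssot__Individual__dlm",  # illustrative DMO API name
    source_system="crm",
    identity_scope="unified",
    ingestion_timestamp=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```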

Edge Cases and Limitations

Identity resolution changes can alter unified mappings and downstream joins.
An async query can complete successfully on the server while a client that polls incorrectly (for example, treating a pending status as an empty result) retrieves rows late or not at all.
Bulk ingestion error files must be parsed and triaged; silent failures create data drift.
Harmonization mismatches (DLO to DMO) can reduce retrieval relevance if mappings are incomplete.
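To avoid silent bulk ingestion failures, an error file can be parsed into counts per error message and turned into a hard failure. This sketch assumes the error file is CSV with one row per failed record and a column named error_message; that column name is a hypothetical placeholder, so adjust it to the actual error file layout.

```python
import csv
import io
from collections import Counter


def triage_error_file(csv_text: str, error_column: str = "error_message") -> Counter:
    """Count failed records by error message (column name is hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or error_column not in reader.fieldnames:
        raise ValueError(f"error file has no {error_column!r} column")
    return Counter((row[error_column] or "").strip() or "<blank>" for row in reader)


def assert_no_failures(counts: Counter, tolerance: int = 0) -> None:
    """Raise instead of silently accepting more failures than tolerated."""
    total = sum(counts.values())
    if total > tolerance:
        top = ", ".join(f"{msg} x{n}" for msg, n in counts.most_common(3))
        raise RuntimeError(f"{total} failed records (top errors: {top})")
```

Grouping by message keeps triage focused on a handful of root causes (for example, one bad mapping) rather than thousands of individual rows.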

Related Guides

Salesforce → LLM Data Pipelines: end-to-end extraction and chunking design
ETL vs API vs Events: pattern selection framework
Salesforce LLM Data Governance: security and governance controls

Q&A

Q: When should I use streaming ingestion vs bulk ingestion in Data Cloud?

A: Use streaming ingestion for frequent, smaller updates that need near-real-time processing. Use bulk ingestion jobs for large backfills and high-volume scheduled loads.

Q: How should I query Data Cloud data reliably in production?

A: Use async query patterns: submit query, poll status, then page through rows. Persist query IDs and failures so jobs can be retried safely.

Q: What identity key should downstream systems store?

A: Store both source profile identifiers and unified individual identifiers. Unified IDs support person-level aggregation; source IDs preserve traceability.

Q: How do identity resolution changes impact integrations?

A: Ruleset updates can change which source profiles map to which unified individual, so reconcile downstream caches and derived tables after each change.

Q: What is the safest starting point for RAG chunks from Data Cloud?

A: Start with harmonized DMO fields plus minimal identity metadata and strict provenance, then expand only after retrieval metrics justify more fields.

Sources Used

Data Cloud Ingestion API: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-ingestion-api.html
Create Ingestion Job (Bulk): https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-create-a-job.html
Upload Data in Parts (Bulk): https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-upload-part-data.html
Close or Abort Ingestion Job: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-close-or-abort-a-job.html
Query Services Overview: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-queryservices-overview.html
Query API v2 Reference: https://developer.salesforce.com/docs/data/data-cloud-query-guide/references/data-cloud-query-api-reference/c360a-api-query-v2.html
Get Query Status (v2): https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-get-query-v2.html
Unified Individual API Overview: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-unified-individual-api-overview.html
Data Cloud Profile Explorer (Identity Resolution): https://developer.salesforce.com/docs/data/data-cloud-dev/guide/dc-profile-explorer.html
Cloud Information Model Subject Areas: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360dm-cloud-information-model.html