Data Cloud Integration Patterns
Overview
This guide covers practical patterns for integrating with Salesforce Data Cloud using official ingestion and query APIs, plus identity-resolution-aware design.
Consensus Best Practices
- Model around Data Cloud entities first (DLO/DMO/unified profile), then map to downstream schemas.
- Choose ingestion mode by freshness and volume (streaming upsert vs bulk jobs).
- Treat Query API workloads as asynchronous jobs and design explicit polling/retry logic.
- Keep identity resolution explicit in downstream joins (source profile vs unified profile).
- Store source-system provenance in every extracted chunk for auditability and troubleshooting.
Data Cloud Model Primer
- Data Lake Objects (DLOs): raw/landing representations of ingested source data.
- Data Model Objects (DMOs): harmonized objects aligned to the Salesforce Cloud Information Model (CIM).
- Unified Individual: resolved identity layer used to unify person-level records across sources.
Ingestion Patterns
Pattern 1: Streaming Upsert API
Use the Ingestion API when you need near-real-time writes into Data Cloud objects.
- Endpoint pattern: /api/v1/ingest/sources/{sourceApiName}/{objectApiName}
- Supports create/update semantics based on configured keys.
- Good for transactional pipelines and frequent small-to-medium payloads.
Pattern 2: Bulk Ingestion Jobs
Use bulk ingestion for large backfills and scheduled high-volume loads.
Recommended flow:
- Create ingestion job.
- Upload CSV data in parts.
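- Close the job by setting its state to UploadComplete (or Aborted to discard it).
- Let Data Cloud process the uploaded data; closing the job is what queues it for processing.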
- Poll status and retrieve failed-record results.
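The bulk flow can be planned as a sequence of REST calls. This sketch models each call as a plain (method, path, content_type, body) tuple so the sequencing is testable without a network. The paths (/api/v1/ingest/jobs and its /batches sub-resource), the create-job fields (object, sourceName, operation), and the UploadComplete state reflect our reading of the linked bulk ingestion docs and should be verified against the current reference. Splitting by rows_per_part and repeating the CSV header in every part are assumptions of this sketch.

```python
import csv
import io
import json
from typing import NamedTuple


class PlannedCall(NamedTuple):
    method: str
    path: str
    content_type: str
    body: str


JOBS_PATH = "/api/v1/ingest/jobs"  # assumed from the bulk ingestion docs


def split_csv(csv_text: str, rows_per_part: int) -> list[str]:
    """Split a CSV into parts, repeating the header row in each part (assumption)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    parts = []
    for start in range(0, len(data), rows_per_part):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(data[start:start + rows_per_part])
        parts.append(buf.getvalue())
    return parts


def create_job_call(source_name: str, object_name: str) -> PlannedCall:
    """Create an upsert job for one object of one ingestion source."""
    body = {"object": object_name, "sourceName": source_name, "operation": "upsert"}
    return PlannedCall("POST", JOBS_PATH, "application/json", json.dumps(body))


def upload_and_close_calls(job_id: str, csv_parts: list[str]) -> list[PlannedCall]:
    """Upload each CSV part, then close the job so its data is queued for processing."""
    calls = [PlannedCall("PUT", f"{JOBS_PATH}/{job_id}/batches", "text/csv", part)
             for part in csv_parts]
    calls.append(PlannedCall("PATCH", f"{JOBS_PATH}/{job_id}", "application/json",
                             json.dumps({"state": "UploadComplete"})))
    return calls


def status_call(job_id: str) -> PlannedCall:
    """Poll job status; repeat until the job reports a terminal state."""
    return PlannedCall("GET", f"{JOBS_PATH}/{job_id}", "", "")
```

The job ID comes from the response to the create call; the remaining calls are then issued in order, with status_call repeated on a backoff until the job finishes or fails.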
Best fit:
- Initial historical backfills.
- Rebuilds after mapping or identity-rule changes.
- Large nightly/weekly syncs.
Query Patterns
Pattern 3: Async Query API (v2-style)
Use async query execution for analytics and downstream extracts.
Typical flow:
- Submit SQL query request.
- Receive query/job ID.
- Poll query status endpoint.
- Fetch result rows with offset/limit pagination.
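A transport-agnostic sketch of the submit, poll, and page loop. The three callables (submit, get_status, fetch_rows) stand in for whatever HTTP calls your Query API version exposes, and the status strings FINISHED, FAILED, and CANCELLED are hypothetical placeholders to be mapped to the values the API actually returns. Treating a short page as the end of the result set is also an assumption.

```python
import time
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical terminal status values; replace with the API's real ones.
SUCCESS = "FINISHED"
FAILURE = {"FAILED", "CANCELLED"}


def run_query(
    sql: str,
    submit: Callable[[str], str],
    get_status: Callable[[str], str],
    fetch_rows: Callable[[str, int, int], list[Row]],
    page_size: int = 1000,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[Row]:
    """Submit a query, poll until it finishes, then page through all rows."""
    query_id = submit(sql)

    delay = initial_delay
    while True:
        status = get_status(query_id)
        if status == SUCCESS:
            break
        if status in FAILURE:
            raise RuntimeError(f"query {query_id} ended with status {status}")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped

    rows: list[Row] = []
    offset = 0
    while True:
        page = fetch_rows(query_id, offset, page_size)
        rows.extend(page)
        if len(page) < page_size:  # a short page is taken to mean the end
            return rows
        offset += len(page)
```

Injecting sleep keeps the loop testable without real delays; in production the default time.sleep applies.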
Design guidance:
- Build idempotent pollers.
- Add query timeout and cancellation policies.
- Persist query IDs and status transitions for observability.
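One way to combine these three points: persist each query ID and its status transitions under a caller-chosen job key, so a restarted worker resumes polling the existing query instead of resubmitting it, and cancel the query once a wall-clock budget measured from submission is exhausted. The in-memory QueryStore, the job_key idea, the cancel callable, and the status strings are all hypothetical; in production the store would be a durable table.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class QueryRecord:
    query_id: str
    submitted_at: float
    transitions: list[tuple[float, str]] = field(default_factory=list)

    @property
    def last_status(self) -> Optional[str]:
        return self.transitions[-1][1] if self.transitions else None


class QueryStore:
    """Hypothetical in-memory stand-in for a durable table of query jobs."""

    def __init__(self) -> None:
        self._records: dict[str, QueryRecord] = {}

    def get(self, job_key: str) -> Optional[QueryRecord]:
        return self._records.get(job_key)

    def put(self, job_key: str, record: QueryRecord) -> None:
        self._records[job_key] = record


TERMINAL = {"FINISHED", "FAILED", "CANCELLED", "TIMED_OUT"}  # hypothetical values


def poll_idempotently(
    job_key: str,
    sql: str,
    store: QueryStore,
    submit: Callable[[str], str],
    get_status: Callable[[str], str],
    cancel: Callable[[str], None],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> QueryRecord:
    record = store.get(job_key)
    if record is None:  # submit only when no query exists for this job key
        record = QueryRecord(query_id=submit(sql), submitted_at=clock())
        store.put(job_key, record)

    deadline = record.submitted_at + timeout_s  # budget survives worker restarts
    while record.last_status not in TERMINAL:
        status = get_status(record.query_id)
        if status != record.last_status:  # record transitions, not every poll
            record.transitions.append((clock(), status))
        if status in TERMINAL:
            break
        if clock() >= deadline:
            cancel(record.query_id)
            record.transitions.append((clock(), "TIMED_OUT"))
            break
        sleep(interval_s)
    return record
```

Storing only status changes keeps the log small while still showing how long a query spent in each state.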
Identity Resolution Patterns
Pattern 4: Unified Profile-Aware Joins
When identity resolution is enabled, records from multiple sources can be mapped to a unified profile.
Implementation guidance:
- Keep both source profile IDs and unified IDs in downstream models.
- Use unified IDs for person-level aggregation.
- Keep source-level lineage for explainability and replay.
- Re-run downstream reconciliation after identity ruleset changes.
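A minimal sketch of this guidance using plain Python structures. It assumes you have exported a source-to-unified link mapping (source_profile_id to unified_individual_id); the field names and sample records are hypothetical. Aggregation keys on the unified ID while each total keeps the source IDs that contributed to it, unresolved profiles are returned rather than dropped, and diff_links lists the source profiles whose unified mapping changed between two ruleset runs.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PersonTotal:
    unified_individual_id: str
    total_amount: float = 0.0
    source_profile_ids: set[str] = field(default_factory=set)  # lineage


def aggregate_by_unified(
    orders: list[dict[str, Any]], links: dict[str, str]
) -> tuple[dict[str, PersonTotal], list[dict[str, Any]]]:
    """Sum amounts per unified individual; return unresolved orders separately."""
    totals: dict[str, PersonTotal] = {}
    unresolved: list[dict[str, Any]] = []
    for order in orders:
        source_id = order["source_profile_id"]
        unified_id = links.get(source_id)
        if unified_id is None:
            unresolved.append(order)  # surface, don't silently drop
            continue
        person = totals.setdefault(unified_id, PersonTotal(unified_id))
        person.total_amount += order["amount"]
        person.source_profile_ids.add(source_id)
    return totals, unresolved


def diff_links(
    old: dict[str, str], new: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Source IDs whose unified mapping changed between two ruleset runs."""
    return {
        source_id: (old.get(source_id), new.get(source_id))
        for source_id in old.keys() | new.keys()
        if old.get(source_id) != new.get(source_id)
    }
```

Only the unified IDs touched by diff_links (old and new values) need their derived rows recomputed, which keeps reconciliation after a ruleset change incremental.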
Data Cloud for RAG Pipelines
Recommended Document Construction
- Build retrieval chunks from harmonized DMO fields instead of raw source payloads.
- Include identity context metadata (source_profile_id, unified_individual_id) when relevant.
- Keep PII-safe defaults; redact fields not needed for retrieval quality.
- Version chunk generation logic with Data Cloud mapping version and ruleset version.
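A sketch of these construction rules, assuming records arrive as plain dictionaries keyed by DMO field name. The DMO name and field names in the test data are hypothetical. Redaction is implemented here as an allow-list (only explicitly chosen fields reach the chunk text), which is one conservative way to get PII-safe defaults, not a Data Cloud feature.

```python
import hashlib
import json
from typing import Any, Optional


def build_chunk(
    dmo_api_name: str,
    record: dict[str, Any],
    text_fields: list[str],
    source_system: str,
    mapping_version: str,
    ruleset_version: str,
    source_profile_id: Optional[str] = None,
    unified_individual_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build one retrieval chunk from a harmonized DMO record."""
    # Allow-list: only fields named in text_fields reach the chunk text, so any
    # PII column not explicitly chosen (an email field, say) is excluded by default.
    text = "\n".join(
        f"{name}: {record[name]}" for name in text_fields if record.get(name) not in (None, "")
    )
    metadata: dict[str, Any] = {
        "dmo_api_name": dmo_api_name,
        "source_system": source_system,
        "mapping_version": mapping_version,
        "ruleset_version": ruleset_version,
    }
    # Identity context is optional: attach only the IDs that are relevant.
    if source_profile_id is not None:
        metadata["source_profile_id"] = source_profile_id
    if unified_individual_id is not None:
        metadata["unified_individual_id"] = unified_individual_id

    # Deterministic ID: identical inputs and versions give the same ID, so re-runs
    # overwrite instead of duplicating; a mapping or ruleset bump yields new IDs.
    fingerprint = json.dumps({"metadata": metadata, "text": text}, sort_keys=True)
    chunk_id = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()
    return {"id": chunk_id, "text": text, "metadata": metadata}
```

Because the ID also covers the text, an edited record produces a new chunk ID; stale chunks for that record then need a separate cleanup keyed on the source record.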
Retrieval Metadata Suggestions
- data_cloud_subject_area
- dmo_api_name
- source_system
- identity_scope (source, unified)
- ingestion_timestamp
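The suggested metadata fields can be pinned down as a small typed record so every chunk carries the same keys. A sketch, assuming timezone-aware timestamps serialized as ISO-8601 UTC and the two identity_scope values source and unified; the class name and validation rules are illustrative.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Literal

IdentityScope = Literal["source", "unified"]


@dataclass(frozen=True)
class RetrievalMetadata:
    data_cloud_subject_area: str
    dmo_api_name: str
    source_system: str
    identity_scope: IdentityScope
    ingestion_timestamp: datetime

    def __post_init__(self) -> None:
        # Literal is only a type hint, so enforce the allowed values at runtime.
        if self.identity_scope not in ("source", "unified"):
            raise ValueError(
                f"identity_scope must be 'source' or 'unified', got {self.identity_scope!r}"
            )
        if self.ingestion_timestamp.tzinfo is None:
            raise ValueError("ingestion_timestamp must be timezone-aware")

    def to_filterable_dict(self) -> dict[str, str]:
        """Flatten to plain strings for use as retrieval filter metadata."""
        data = asdict(self)
        data["ingestion_timestamp"] = (
            self.ingestion_timestamp.astimezone(timezone.utc).isoformat()
        )
        return data
```

Validating at construction time means a chunk with an unknown identity scope or a naive timestamp fails during indexing rather than producing metadata that filters silently miss.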
Edge Cases and Limitations
- Identity resolution changes can alter unified mappings and downstream joins.
- Async query jobs can report success while their rows are still unread; clients that stop polling early or mishandle pagination end up with delayed or incomplete results.
- Bulk ingestion error files must be parsed and triaged; silent failures create data drift.
- Harmonization mismatches (DLO to DMO) can reduce retrieval relevance if mappings are incomplete.
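A cheap guard against the harmonization gap described in the last item: before generating chunks, check that every DMO field the chunker reads is mapped from some DLO field, and measure how often mapped fields are actually populated in a sample. The mapping dictionary shape (DMO field to DLO field, None when unmapped) is a hypothetical export format, not a Data Cloud API response.

```python
from typing import Any, Optional


class MappingGapError(RuntimeError):
    """Raised when the chunker depends on DMO fields with no DLO mapping."""


def check_mapping_coverage(required_fields: list[str],
                           dmo_to_dlo: dict[str, Optional[str]]) -> None:
    """Fail loudly if any required DMO field is missing or unmapped."""
    missing = [f for f in required_fields if not dmo_to_dlo.get(f)]
    if missing:
        raise MappingGapError(f"unmapped DMO fields used by chunker: {missing}")


def fill_rates(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Share of sample records with a non-empty value per field.

    A field can be mapped yet mostly empty, which hurts retrieval just as much.
    """
    if not records:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in fields
    }
```

Running both checks as a pre-flight step in the chunking job turns a silent relevance drop into an explicit failure or a reviewable report.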
Related Patterns
- Salesforce → LLM Data Pipelines - End-to-end extraction and chunking design
- ETL vs API vs Events - Pattern selection framework
- Salesforce LLM Data Governance - Security and governance controls
Q&A
Q: When should I use Data Cloud ingestion API vs bulk ingestion?
A: Use the streaming Ingestion API for frequent, smaller updates and near-real-time processing. Use bulk ingestion jobs for large backfills and high-volume scheduled loads.
Q: How should I query Data Cloud data reliably in production?
A: Use async query patterns: submit query, poll status, then page through rows. Persist query IDs and failures so jobs can be retried safely.
Q: What identity key should downstream systems store?
A: Store both source profile identifiers and unified individual identifiers. Unified IDs support person-level aggregation; source IDs preserve traceability.
Q: How do identity resolution changes impact integrations?
A: Unified mappings can change after ruleset updates, so downstream caches and derived tables must be reconciled after those changes.
Q: What is the safest starting point for RAG chunks from Data Cloud?
A: Start with harmonized DMO fields plus minimal identity metadata and strict provenance, then expand only after retrieval metrics justify more fields.
Sources Used
- Data Cloud Ingestion API: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-ingestion-api.html
- Create Ingestion Job (Bulk): https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-create-a-job.html
- Upload Data in Parts (Bulk): https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-upload-part-data.html
- Close or Abort Ingestion Job: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-close-or-abort-a-job.html
- Query Services Overview: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-queryservices-overview.html
- Query API v2 Reference: https://developer.salesforce.com/docs/data/data-cloud-query-guide/references/data-cloud-query-api-reference/c360a-api-query-v2.html
- Get Query Status (v2): https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-get-query-v2.html
- Unified Individual API Overview: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360a-api-unified-individual-api-overview.html
- Data Cloud Profile Explorer (Identity Resolution): https://developer.salesforce.com/docs/data/data-cloud-dev/guide/dc-profile-explorer.html
- Cloud Information Model Subject Areas: https://developer.salesforce.com/docs/data/data-cloud-ref/guide/c360dm-cloud-information-model.html