Monitoring and Alerting for Salesforce

Overview

Monitoring and alerting patterns enable proactive detection of issues, performance problems, and system health degradation. This guide covers Platform Events monitoring, API health monitoring, async job failure detection, and log aggregation patterns.

Core Principle: Monitor system health proactively, alert on issues before they impact users, and aggregate logs for centralized analysis. Visibility enables rapid issue detection and resolution.

Prerequisites

Required Knowledge:

Understanding of Salesforce platform capabilities
Familiarity with Platform Events and async processing
Knowledge of API patterns and callouts
Understanding of logging and error handling
Knowledge of monitoring tools and dashboards

Recommended Reading:

Performance Tuning - Performance optimization and monitoring
Change Data Capture Patterns - CDC monitoring patterns
Event-Driven Architecture - Platform Events patterns
Error Handling and Logging - Logging patterns

When to Use Monitoring and Alerting

Use Monitoring and Alerting When

Production systems: Need visibility into production system health
Critical integrations: Integrations require monitoring
Async job processing: Async jobs need failure detection
Performance issues: Need to detect performance degradation
Compliance requirements: Regulatory requirements mandate monitoring
Proactive issue detection: Need to detect issues before user impact

Avoid Monitoring and Alerting When

Development environments: Basic logging is usually sufficient
Non-critical systems: System failures have minimal impact
Limited resources: The team lacks the capacity to set up and maintain monitoring
Simple use cases: Basic error handling is sufficient

Platform Events Monitoring

Pattern 1: Event Publication Monitoring

Purpose: Monitor Platform Event publication success and failure rates.

Implementation:

Custom Logging Object: Log event publication attempts
Success/Failure Tracking: Track publication success and failures
Metrics Dashboard: Dashboard showing event publication metrics
Alerting: Alert on high failure rates

Example:

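public class EventPublicationLogger {
    // Records one publication attempt in the custom Event_Publication_Log__c object.
    // Each call performs one DML insert, so avoid calling it inside loops where
    // governor limits on DML statements could be reached.
    public static void logEventPublication(String eventType, Boolean success, String errorMessage) {
        Event_Publication_Log__c log = new Event_Publication_Log__c(
            Event_Type__c = eventType,
            Success__c = success,
            Error_Message__c = errorMessage,
            Timestamp__c = Datetime.now()
        );
        insert log;
    }
}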

Best Practices:

Log all event publications
Track success/failure rates
Monitor event volume
Alert on failures
Create dashboards
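One way to connect the logger above to an actual publish call is sketched below. `MonitoredEventPublisher` and `publishAndLog` are hypothetical names introduced for illustration; `EventBus.publish` and `Database.SaveResult` are standard Apex, and `EventPublicationLogger` is the class from the example above. Depending on how the event is configured, a successful result may only mean that the event was accepted for publishing, so treat this as a sketch rather than proof of delivery.

```apex
public class MonitoredEventPublisher {
    // Hypothetical wrapper: publishes one platform event and records the
    // outcome through EventPublicationLogger from the example above.
    public static Boolean publishAndLog(SObject eventRecord) {
        String eventType = eventRecord.getSObjectType().getDescribe().getName();
        Database.SaveResult result = EventBus.publish(eventRecord);

        String errorMessage = null;
        if (!result.isSuccess()) {
            // Collect every error returned for this publish attempt.
            List<String> messages = new List<String>();
            for (Database.Error err : result.getErrors()) {
                messages.add(err.getStatusCode() + ': ' + err.getMessage());
            }
            errorMessage = String.join(messages, '; ');
        }

        EventPublicationLogger.logEventPublication(eventType, result.isSuccess(), errorMessage);
        return result.isSuccess();
    }
}
```

Callers pass any platform event record, so a single wrapper can cover every event type that needs publication tracking.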

Pattern 2: Event Processing Monitoring

Purpose: Monitor event processing by subscribers.

Implementation:

Subscriber Logging: Log event processing by subscribers
Processing Time: Track event processing duration
Failure Tracking: Track processing failures
Lag Monitoring: Monitor event processing lag

Best Practices:

Log event processing
Track processing time
Monitor processing lag
Alert on failures
Create processing dashboards
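Processing lag can be approximated as the time between an event's creation and the moment a subscriber handles it. The sketch below keeps the calculation separate from any trigger so it can be reused; `EventLagCalculator` and the `maxLagMillis` threshold are assumptions made for illustration. In a platform event trigger, the first argument would typically be the event's standard `CreatedDate` field.

```apex
public class EventLagCalculator {
    // Lag in milliseconds between event creation and processing.
    // Clamped at zero to tolerate small clock differences.
    public static Long lagMillis(Datetime eventCreated, Datetime processedAt) {
        Long lag = processedAt.getTime() - eventCreated.getTime();
        return Math.max(lag, 0L);
    }

    // True when the lag exceeds the allowed maximum (an assumed threshold).
    public static Boolean isLagging(Datetime eventCreated, Datetime processedAt, Long maxLagMillis) {
        return lagMillis(eventCreated, processedAt) > maxLagMillis;
    }
}
```

A subscriber could call `isLagging(evt.CreatedDate, Datetime.now(), 60000L)` and log or alert when it returns true.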

API Health Monitoring

Pattern 1: Callout Health Monitoring

Purpose: Monitor API callout health and performance.

Implementation:

Callout Metrics: Track callout duration, status codes, response sizes
Failure Rate Monitoring: Monitor callout failure rates
Circuit Breaker Monitoring: Monitor circuit breaker state
Health Dashboards: Dashboards showing API health

Example:

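public class CalloutHealthMonitor {
    // Alert when more than this many server errors occur within one hour.
    private static final Integer MAX_FAILURES_PER_HOUR = 10;

    // Records one callout result in the custom Callout_Metric__c object.
    public static void trackCallout(String integrationName, String endpoint, 
                                   Integer statusCode, Long duration) {
        Callout_Metric__c metric = new Callout_Metric__c(
            Integration_Name__c = integrationName,
            Endpoint__c = endpoint,
            Status_Code__c = statusCode,
            Duration_ms__c = duration,
            Timestamp__c = Datetime.now()
        );
        insert metric;
    }
    
    // Counts 5xx responses from the last hour and alerts when there are too many.
    public static void checkHealth(String integrationName) {
        Datetime oneHourAgo = Datetime.now().addHours(-1);
        Integer failureCount = [
            SELECT COUNT() 
            FROM Callout_Metric__c 
            WHERE Integration_Name__c = :integrationName
            AND Status_Code__c >= 500
            AND Timestamp__c >= :oneHourAgo
        ];
        
        if (failureCount > MAX_FAILURES_PER_HOUR) {
            sendAlert('High failure count (' + failureCount + ' in the last hour): ' + integrationName);
        }
    }

    // Placeholder alert channel: replace the debug statement with email,
    // a Platform Event, or another notification mechanism.
    private static void sendAlert(String message) {
        System.debug(LoggingLevel.ERROR, message);
    }
}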

Best Practices:

Track all callouts
Monitor failure rates
Alert on high failure rates
Monitor response times
Create health dashboards
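A raw failure count can mislead when call volume varies, so a failure rate is often a better alert input. The following is a minimal sketch of that calculation; `CalloutFailureRate`, `minSample`, and `maxPercent` are hypothetical names and tuning parameters, not part of any Salesforce API.

```apex
public class CalloutFailureRate {
    // Percentage of failed callouts, rounded to one decimal place.
    // Returns null when the sample is too small to be meaningful.
    public static Decimal failurePercent(Integer failures, Integer total, Integer minSample) {
        if (total <= 0 || total < minSample) {
            return null;
        }
        Decimal failed = failures;
        return (failed * 100 / total).setScale(1);
    }

    // True when enough calls were made and the failure rate is above maxPercent.
    public static Boolean shouldAlert(Integer failures, Integer total,
                                      Integer minSample, Decimal maxPercent) {
        Decimal pct = failurePercent(failures, total, minSample);
        return pct != null && pct > maxPercent;
    }
}
```

The minimum sample size keeps a single failed call out of five from paging someone at a quiet hour.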

Pattern 2: API Rate Limit Monitoring

Purpose: Monitor API rate limit usage and approaching limits.

Implementation:

Rate Limit Tracking: Track API call counts
Limit Monitoring: Monitor approaching rate limits
Alerting: Alert when approaching limits
Throttling: Implement throttling when approaching limits

Best Practices:

Track API usage
Monitor rate limits
Alert on high usage
Implement throttling
Create usage dashboards
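For the org's own daily API request allocation, the standard `OrgLimits` class exposes current usage and the maximum. The sketch below classifies usage against assumed thresholds of 80% and 95%; `ApiLimitMonitor` and the threshold values are illustrative choices. Limits imposed by external APIs are not visible this way and would need their own counters.

```apex
public class ApiLimitMonitor {
    // Assumed thresholds; tune them for your org.
    private static final Decimal WARNING_PERCENT = 80;
    private static final Decimal CRITICAL_PERCENT = 95;

    // Classifies usage as OK, WARNING, CRITICAL, or UNKNOWN.
    public static String classify(Integer used, Integer maxAllowed) {
        if (used == null || maxAllowed == null || maxAllowed <= 0) {
            return 'UNKNOWN';
        }
        Decimal percentUsed = used;
        percentUsed = percentUsed * 100 / maxAllowed;
        if (percentUsed >= CRITICAL_PERCENT) {
            return 'CRITICAL';
        }
        if (percentUsed >= WARNING_PERCENT) {
            return 'WARNING';
        }
        return 'OK';
    }

    // Reads daily API request usage from the standard OrgLimits class.
    public static String checkDailyApiRequests() {
        System.OrgLimit apiLimit = System.OrgLimits.getMap().get('DailyApiRequests');
        return classify(apiLimit.getValue(), apiLimit.getLimit());
    }
}
```

A scheduled job could call `checkDailyApiRequests()` and defer non-urgent integration work whenever the result is `WARNING` or `CRITICAL`.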

Async Job Failure Detection

Pattern 1: Batch Job Monitoring

Purpose: Monitor Batch Apex job failures and performance.

Implementation:

Job Status Monitoring: Query AsyncApexJob for job status
Failure Detection: Detect failed batch jobs
Error Logging: Log batch job errors
Alerting: Alert on batch job failures

Example:

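public class BatchJobMonitor {
    // Finds batch jobs from the last 24 hours that failed outright or that
    // completed with one or more failed batches (NumberOfErrors > 0).
    public static void monitorBatchJobs() {
        Datetime since = Datetime.now().addHours(-24);
        List<AsyncApexJob> failedJobs = [
            SELECT Id, ApexClass.Name, Status, NumberOfErrors, ExtendedStatus,
                   JobItemsProcessed, TotalJobItems
            FROM AsyncApexJob
            WHERE JobType = 'BatchApex'
            AND (Status = 'Failed' OR NumberOfErrors > 0)
            AND CreatedDate >= :since
        ];
        
        if (!failedJobs.isEmpty()) {
            sendAlert('Failed batch jobs detected: ' + failedJobs.size());
        }
    }

    // Placeholder alert channel: replace the debug statement with email,
    // a Platform Event, or another notification mechanism.
    private static void sendAlert(String message) {
        System.debug(LoggingLevel.ERROR, message);
    }
}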

Best Practices:

Monitor job status
Detect failures
Log errors
Alert on failures
Create job dashboards

Pattern 2: Queueable Job Monitoring

Purpose: Monitor Queueable Apex job failures.

Implementation:

Job Status Tracking: Track Queueable job status
Failure Detection: Detect failed Queueable jobs
Chain Monitoring: Monitor Queueable job chains
Alerting: Alert on Queueable failures

Best Practices:

Track job status
Detect failures
Monitor job chains
Alert on failures
Create monitoring dashboards
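One option for detecting Queueable failures is a transaction finalizer, which runs after the job ends whether or not it threw an unhandled exception. The sketch below uses the standard `Finalizer` interface and `System.attachFinalizer`; the class names `MonitoredQueueable` and `FailureFinalizer` are hypothetical, and the debug statement stands in for a real alert channel.

```apex
public class MonitoredQueueable implements Queueable {
    public void execute(QueueableContext context) {
        // Attach the finalizer first so it still runs if the work below throws.
        System.attachFinalizer(new FailureFinalizer());

        // Job-specific work would go here.
    }

    // Runs after the Queueable finishes, on success and on failure.
    public class FailureFinalizer implements Finalizer {
        public void execute(FinalizerContext ctx) {
            if (ctx.getResult() == ParentJobResult.UNHANDLED_EXCEPTION) {
                String message = 'Queueable job ' + ctx.getAsyncApexJobId()
                    + ' failed: ' + ctx.getException().getMessage();
                // Placeholder alert channel; swap in email or a Platform Event.
                System.debug(LoggingLevel.ERROR, message);
            }
        }
    }
}
```

In a job chain, each link can attach its own finalizer so a break in the chain is reported by the link that failed.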

Pattern 3: Scheduled Job Monitoring

Purpose: Monitor Scheduled Apex job execution and failures.

Implementation:

Execution Tracking: Track scheduled job execution
Failure Detection: Detect failed scheduled jobs
Execution Time Monitoring: Monitor job execution time
Alerting: Alert on scheduled job failures

Best Practices:

Track execution
Detect failures
Monitor execution time
Alert on failures
Create execution dashboards
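Scheduled Apex jobs appear in the standard `CronTrigger` object, which makes it possible to look for jobs in an unhealthy state or jobs whose next run is overdue. The following is a sketch under the assumption that error and blocked states are the ones of interest; `ScheduledJobMonitor` and the grace period are hypothetical. Job type `'7'` identifies Scheduled Apex in `CronJobDetail`.

```apex
public class ScheduledJobMonitor {
    // Returns Scheduled Apex jobs that are in an error or blocked state.
    public static List<CronTrigger> findUnhealthyJobs() {
        return [
            SELECT Id, CronJobDetail.Name, State, PreviousFireTime, NextFireTime
            FROM CronTrigger
            WHERE CronJobDetail.JobType = '7'
            AND State IN ('ERROR', 'BLOCKED', 'PAUSED_BLOCKED')
        ];
    }

    // True when a job's next fire time passed more than graceMinutes ago,
    // which suggests the job did not run when expected.
    public static Boolean isOverdue(Datetime nextFireTime, Datetime now, Integer graceMinutes) {
        return nextFireTime != null && nextFireTime.addMinutes(graceMinutes) < now;
    }
}
```

Execution time itself is not stored on `CronTrigger`, so duration tracking would still need a custom log written by the job.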

Log Aggregation Patterns

Pattern 1: Centralized Logging

Purpose: Aggregate logs from multiple sources for centralized analysis.

Implementation:

Custom Logging Object: Centralized logging object
Log Collection: Collect logs from multiple sources
Log Analysis: Analyze logs for patterns
Log Retention: Implement log retention policies

Best Practices:

Centralize logs
Collect from all sources
Analyze for patterns
Implement retention
Create log dashboards
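A centralized logger usually buffers entries in memory and writes them in one DML statement, so that logging does not consume a DML statement per message. The sketch below assumes a hypothetical custom object `Application_Log__c` with text fields `Source__c`, `Level__c`, and `Message__c`; none of these names come from the source, and the 255-character truncation is an assumed field length.

```apex
public class LogBuffer {
    // Entries collected during the current transaction.
    private static List<Application_Log__c> pending = new List<Application_Log__c>();

    // Queues a log entry in memory; no DML happens here.
    public static void add(String source, String level, String message) {
        pending.add(new Application_Log__c(
            Source__c = source,
            Level__c = level,
            Message__c = message?.left(255)
        ));
    }

    // Writes all queued entries in a single DML statement.
    // allOrNone = false so one bad entry does not discard the rest.
    public static void flush() {
        if (pending.isEmpty()) {
            return;
        }
        Database.insert(pending, false);
        pending.clear();
    }
}
```

Callers would typically invoke `flush()` once at the end of a trigger handler, batch `execute`, or Queueable. Entries written this way are rolled back with the transaction if it fails, which is one reason some teams publish logs as Platform Events instead.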

Pattern 2: Error Log Aggregation

Purpose: Aggregate error logs for analysis and alerting.

Implementation:

Error Logging: Log all errors to custom object
Error Aggregation: Aggregate errors by type, source, frequency
Error Analysis: Analyze error patterns
Alerting: Alert on error spikes

Best Practices:

Log all errors
Aggregate by type
Analyze patterns
Alert on spikes
Create error dashboards
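Aggregating by type and comparing against a baseline is enough to catch simple error spikes. The sketch below works on plain collections so it does not depend on any particular log object; `ErrorAggregator` and `spikeFactor` are hypothetical names chosen for illustration.

```apex
public class ErrorAggregator {
    // Counts occurrences of each error type.
    public static Map<String, Integer> countByType(List<String> errorTypes) {
        Map<String, Integer> counts = new Map<String, Integer>();
        for (String errorType : errorTypes) {
            Integer current = counts.get(errorType);
            counts.put(errorType, current == null ? 1 : current + 1);
        }
        return counts;
    }

    // Returns error types whose current count is at least spikeFactor
    // times the baseline count for the same type.
    public static Set<String> findSpikes(Map<String, Integer> current,
                                         Map<String, Integer> baseline,
                                         Integer spikeFactor) {
        Set<String> spikes = new Set<String>();
        for (String errorType : current.keySet()) {
            Integer base = baseline.containsKey(errorType) ? baseline.get(errorType) : 0;
            // A missing baseline is treated as 1 so new error types can still spike.
            if (current.get(errorType) >= Math.max(base, 1) * spikeFactor) {
                spikes.add(errorType);
            }
        }
        return spikes;
    }
}
```

The current and baseline maps could come from the last hour and the same hour on the previous day, for example.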

Pattern 3: Performance Log Aggregation

Purpose: Aggregate performance logs for analysis.

Implementation:

Performance Logging: Log performance metrics
Performance Aggregation: Aggregate performance data
Performance Analysis: Analyze performance trends
Alerting: Alert on performance degradation

Best Practices:

Log performance metrics
Aggregate data
Analyze trends
Alert on degradation
Create performance dashboards
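Averages tend to hide slow outliers, so performance data is often summarized with percentiles. The following sketch computes a nearest-rank percentile over a list of durations; `PerformanceStats` is a hypothetical helper, and nearest-rank is only one of several percentile definitions.

```apex
public class PerformanceStats {
    // Nearest-rank percentile of the given durations (in milliseconds).
    // Returns null for an empty or null list.
    public static Long percentile(List<Long> durations, Integer pct) {
        if (durations == null || durations.isEmpty()) {
            return null;
        }
        // Sort a copy so the caller's list is left unchanged.
        List<Long> sorted = durations.clone();
        sorted.sort();

        // Integer form of ceil(pct / 100 * n), clamped to the valid range.
        Integer rank = (pct * sorted.size() + 99) / 100;
        rank = Math.min(Math.max(rank, 1), sorted.size());
        return sorted[rank - 1];
    }
}
```

Tracking the 95th percentile per day gives a trend line that reacts to degradation affecting a minority of requests.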

Alerting Patterns

Pattern 1: Threshold-Based Alerting

Purpose: Alert when metrics exceed thresholds.

Implementation:

Define Thresholds: Set thresholds for metrics
Monitor Metrics: Monitor metrics continuously
Alert on Threshold: Alert when threshold exceeded
Escalation: Escalate if alert not acknowledged

Best Practices:

Set appropriate thresholds
Monitor continuously
Alert promptly
Implement escalation
Review thresholds regularly
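The threshold and escalation steps above can be sketched as two small checks. `ThresholdAlert`, the severity labels, and the escalation window are assumptions made for illustration rather than a prescribed design.

```apex
public class ThresholdAlert {
    // Maps a metric value to a severity using two thresholds.
    public static String severity(Decimal value, Decimal warningThreshold,
                                  Decimal criticalThreshold) {
        if (value >= criticalThreshold) {
            return 'CRITICAL';
        }
        if (value >= warningThreshold) {
            return 'WARNING';
        }
        return 'OK';
    }

    // True when an alert has stayed unacknowledged for at least
    // escalateAfterMinutes since it was raised.
    public static Boolean shouldEscalate(Datetime raisedAt, Boolean acknowledged,
                                         Datetime now, Integer escalateAfterMinutes) {
        return !acknowledged && raisedAt.addMinutes(escalateAfterMinutes) <= now;
    }
}
```

Keeping the thresholds as parameters, for example loaded from custom metadata, makes the regular threshold review a configuration change instead of a deployment.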

Pattern 2: Anomaly Detection

Purpose: Detect anomalies in metrics (unusual patterns).

Implementation:

Baseline Metrics: Establish baseline metrics
Anomaly Detection: Detect deviations from baseline
Alert on Anomalies: Alert when anomalies detected
Pattern Analysis: Analyze anomaly patterns

Best Practices:

Establish baselines
Detect anomalies
Alert on anomalies
Analyze patterns
Refine detection
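A simple way to detect deviations from a baseline is to measure how many standard deviations a new value lies from the baseline mean. The sketch below uses that approach; `AnomalyDetector` and `maxDeviations` are hypothetical, and this is one basic technique among many, not a recommendation over more robust methods.

```apex
public class AnomalyDetector {
    // True when value lies more than maxDeviations standard deviations
    // from the mean of the baseline samples.
    public static Boolean isAnomaly(List<Decimal> baseline, Decimal value,
                                    Decimal maxDeviations) {
        // Too few samples to form a baseline.
        if (baseline == null || baseline.size() < 2) {
            return false;
        }
        Decimal sum = 0;
        for (Decimal sample : baseline) {
            sum += sample;
        }
        Decimal mean = sum / baseline.size();

        Decimal squaredDiffs = 0;
        for (Decimal sample : baseline) {
            squaredDiffs += (sample - mean) * (sample - mean);
        }
        Decimal stdDev = Math.sqrt(squaredDiffs / baseline.size());

        // A perfectly flat baseline: any different value is unusual.
        if (stdDev == 0) {
            return value != mean;
        }
        return Math.abs(value - mean) / stdDev > maxDeviations;
    }
}
```

A value of 3 for `maxDeviations` is a common starting point, to be refined as false positives and missed anomalies are reviewed.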

Pattern 3: Composite Alerting

Purpose: Alert based on multiple conditions.

Implementation:

Multiple Conditions: Define multiple alert conditions
Composite Logic: Combine conditions (AND, OR)
Alert on Composite: Alert when composite condition met
Context: Include context in alerts

Best Practices:

Define conditions
Combine logically
Alert appropriately
Include context
Review conditions
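Composite conditions can be expressed with small AND/OR helpers, plus a list of which conditions fired to use as alert context. `CompositeAlert` and its method names are hypothetical, introduced only to illustrate the idea.

```apex
public class CompositeAlert {
    // True when the list is non-empty and every condition is true (AND).
    public static Boolean allOf(List<Boolean> conditions) {
        for (Boolean condition : conditions) {
            if (condition != true) {
                return false;
            }
        }
        return !conditions.isEmpty();
    }

    // True when at least one condition is true (OR).
    public static Boolean anyOf(List<Boolean> conditions) {
        for (Boolean condition : conditions) {
            if (condition == true) {
                return true;
            }
        }
        return false;
    }

    // Names of the conditions that are currently true, sorted for stable
    // output; useful as context in the alert message.
    public static List<String> triggered(Map<String, Boolean> namedConditions) {
        List<String> names = new List<String>();
        for (String name : namedConditions.keySet()) {
            if (namedConditions.get(name) == true) {
                names.add(name);
            }
        }
        names.sort();
        return names;
    }
}
```

For example, "high failure rate AND (slow responses OR queue backlog)" becomes `allOf(new List<Boolean>{ highFailureRate, anyOf(new List<Boolean>{ slowResponses, queueBacklog }) })`.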

Related Patterns

Performance Tuning - Performance optimization and monitoring
Change Data Capture Patterns - CDC monitoring patterns
Event-Driven Architecture - Platform Events patterns
Error Handling and Logging - Logging patterns
Callout Best Practices - API callout monitoring

Q&A

Q: What are monitoring and alerting patterns for Salesforce?

A: Monitoring and alerting patterns enable: (1) Proactive issue detection (detect issues before user impact), (2) System health visibility (dashboards showing system health), (3) Performance monitoring (track performance metrics), (4) Error tracking (track and analyze errors), (5) Automated alerting (alert on issues automatically). Monitoring and alerting provide visibility into system health and enable rapid issue resolution.

Q: How do I monitor Platform Events?

A: Monitor by: (1) Log event publications (log all event publication attempts), (2) Track success/failure rates (monitor publication success), (3) Monitor event volume (track event counts), (4) Alert on failures (alert when failures detected), (5) Create dashboards (visualize event metrics). Platform Events monitoring ensures event-driven integrations are working correctly.

Q: How do I monitor API callout health?

A: Monitor by: (1) Track callout metrics (duration, status codes, response sizes), (2) Monitor failure rates (track callout failures), (3) Circuit breaker monitoring (monitor circuit breaker state), (4) Alert on failures (alert on high failure rates), (5) Create health dashboards (visualize API health). API health monitoring ensures integrations are functioning correctly.

Q: How do I detect async job failures?

A: Detect by: (1) Query AsyncApexJob (query job status), (2) Monitor job status (track job execution), (3) Detect failures (identify failed jobs), (4) Log errors (log job errors), (5) Alert on failures (alert when jobs fail). Async job failure detection ensures background processing is working correctly.

Q: How do I aggregate logs for analysis?

A: Aggregate by: (1) Centralized logging (log to custom object), (2) Collect from all sources (collect logs from all components), (3) Aggregate by type (group logs by type), (4) Analyze patterns (identify patterns in logs), (5) Create dashboards (visualize log data). Log aggregation enables centralized analysis and pattern detection.

Q: What are best practices for alerting?

A: Best practices: (1) Set appropriate thresholds (not too sensitive, not too lenient), (2) Monitor continuously (real-time or near-real-time), (3) Alert promptly (alert quickly on issues), (4) Include context (provide context in alerts), (5) Implement escalation (escalate if not acknowledged), (6) Review regularly (review and adjust thresholds). Effective alerting enables rapid issue response.

Q: How do I create monitoring dashboards?

A: Create by: (1) Identify key metrics (determine what to monitor), (2) Create custom objects (store metrics in custom objects), (3) Build reports (create reports on metrics), (4) Create dashboards (build dashboards from reports), (5) Share dashboards (share with stakeholders). Monitoring dashboards provide visibility into system health.
