How Push Notification Delivery Works Internally: APNs and FCM Deep Dive
We share an in-depth look at how Apple Push Notification service (APNs) and Firebase Cloud Messaging (FCM) handle push notifications.

Introduction
Push notifications enable mobile apps to deliver timely messages and updates to users without requiring the app to be open. From a developer’s perspective, sending a push notification involves making an API call to a platform-specific service, such as Apple Push Notification service (APNs) for iOS or Firebase Cloud Messaging (FCM) for Android. But what happens under the hood after that API call? At Clix, we’ve spent a lot of time working with push notifications and have gained deep insights into how they work internally. In this post, we’ll explore the end-to-end journey of a push notification: from the moment your backend sends a request, through authentication and routing, all the way to the device waking up and displaying the notification to the user.
We’ll cover the internal workflows of APNs and FCM in depth, including authentication mechanisms, message queuing, network routing, and how devices handle incoming notifications. Along the way, we’ll highlight important constraints and best practices for reliable delivery.
Push Notification Delivery Workflow Overview
At a high level, push notifications follow a similar trajectory on both iOS and Android. The process involves a few key players: your app’s backend server (or cloud function), the push notification service (APNs or FCM), and the user’s device (running the OS and your app). Here’s an overview of the workflow from sending to delivery:
- Backend Sends a Push Request: Your server or cloud function composes a notification payload (typically a JSON structure) and sends it to the appropriate push service’s API endpoint (APNs for Apple devices, FCM for Android/Chrome devices). This request includes an authentication token or key to prove your app’s identity, and a device address (device token for APNs, registration token for FCM).
- Push Service Authenticates and Queues Message: The push service receives the request and verifies the credentials. If they are valid, it acknowledges the request (e.g., APNs returns an HTTP 200 status with a message ID, and FCM returns a message ID in the response), indicating that the message has been accepted. The message is then queued for delivery. (A success response only means the message was accepted for delivery, not that the device has received it.)
- Routing the Notification: The push service determines the target device and routes the notification through its network. Both APNs and FCM maintain persistent, secure connections to devices. APNs knows which Apple device (identified by the device token) should get the notification and will push it out over that device’s persistent connection. FCM does the same for Android devices via the Android transport layer, and for iOS apps that use FCM it forwards the notification to APNs (since Apple devices can only be reached via APNs).
- Device Receives Notification: The user’s device, which is silently listening on the persistent connection, receives the incoming notification data. The device’s operating system wakes up the appropriate service or app to handle the message. On iOS, the system receives the APNs message and prepares to display it or deliver it to the app. On Android, Google Play Services (which includes the FCM client) wakes up to handle the message and in turn notifies the target app.
- Notification Display or Processing: Finally, the notification is presented to the user or processed by the app, depending on the app’s state and the notification’s content. If the app is in the background, the OS typically shows a notification in the notification center/tray. If the app is in the foreground, it may receive the message data directly via callback to handle as needed. Some notifications can be “silent” (no alert to the user) but trigger background processing in the app (e.g., content update notifications). In all cases, the device may play a sound, show an alert, or badge the app icon as specified by the notification payload.
We’ll dive separately into how APNs and FCM handle these steps under the hood.
Apple Push Notification Service (APNs) – Inside the Delivery Pipeline
APNs is Apple’s cloud service for delivering remote notifications to iOS, iPadOS, macOS, watchOS, and tvOS devices. Here’s a deep look at how APNs processes a notification from send to display:
1. Authenticating with APNs: To send a notification via APNs, your server must authenticate itself to Apple. There are two methods: the newer token-based authentication (using a JWT signed with an APNs key) or the older certificate-based authentication. In both cases, your request must specify the correct topic (usually your app’s bundle identifier) that you’re authorized to send to.
2. Sending the Notification Request: APNs exposes an HTTP/2 API endpoint. Your server opens a connection to APNs and sends an HTTP/2 POST request containing the device token in the URL path and the JSON payload in the body. It’s efficient to keep this connection open for multiple requests – opening and closing connections for each message is discouraged and can be treated as a denial-of-service attempt by APNs. Apple’s best practices recommend reusing connections and even opening multiple concurrent connections if you send high volume, to achieve higher throughput. Each HTTP/2 request can use its own stream, and APNs allows a large but unspecified number of concurrent streams per connection (managed by the service based on load).
3. APNs Validation and Response: Upon receiving your request, APNs immediately validates the authentication and basic message structure. If anything is wrong (for example, invalid token, invalid device token format, payload too large, unauthorized topic, etc.), APNs will respond with an HTTP error status and a reason. An important thing to note is that a success response from APNs only means APNs has accepted the message for delivery. It does not guarantee the notification has reached the device yet. If the device is offline or other delays occur, APNs will store and forward the notification when possible.
4. APNs Message Queuing and Routing: Once accepted, the APNs backend will enqueue the notification for delivery to the target device identified by the device token. Each Apple device maintains a persistent encrypted connection to APNs when it has network connectivity. This persistent connection is idle most of the time (using almost no power) and is only actively used when APNs delivers a notification. Because the connection is already open, APNs can push the message to the device without the device having to “check in” or poll. The next time the device is reachable, APNs will deliver any queued messages over this channel. APNs also supports coalescing of multiple notifications: if you send several notifications with the same apns-collapse-id, the user’s device will only display the most recent one, as APNs will replace older ones with the latest. However, APNs may drop messages that have expired or those with lower priority if the device remains unreachable for too long.
5. Device Wake-Up and iOS Notification Display: When the notification arrives on the device, the iOS system takes over. If the app is not running or in the background, iOS will typically present the notification to the user – showing an alert/banner, playing a sound, and badging the app icon as specified in the payload (the aps dictionary). If the app is running in the foreground, iOS won’t automatically show a notification banner; instead, it will invoke the app’s delegate callback (giving the app a chance to handle the notification data). In the case of a silent notification, iOS will wake the app in the background for a short period so it can fetch updates or perform tasks – these are background pushes that do not alert the user. Notably, Apple requires that silent pushes be sent with a lower priority to conserve battery. High-priority notifications are delivered immediately and are expected to trigger user-visible alerts. When a high-priority notification arrives, the device’s radio may be woken up promptly to deliver it, whereas low-priority ones might be batched or delayed slightly to optimize battery life. The device’s OS ensures the notification is presented or processed according to the app’s state and Apple’s guidelines, waking the app if needed.
6. APNs Constraints: Apple imposes some limits and provides feedback to help developers ensure notifications are delivered efficiently:
- Payload Size: The payload JSON must not exceed 4 KB (4096 bytes) for standard remote notifications. This limit means you should keep your notification content lean. If you send a payload that’s too large, APNs will reject it with an HTTP 413 error (payload too large).
- Rate Limits: While Apple doesn’t publish strict rate limits for APNs, it does enforce some limits to prevent abuse. For example, APNs may respond with HTTP 429 Too Many Requests if you send an excessive number of notifications to the same device token in a short time. This is a signal to back off. In general, APNs can handle very high throughput – to scale, use multiple parallel connections as needed, but avoid bombarding APNs or a single device with a sudden flood.
- Error Handling and Retries: APNs responses include a status code and a descriptive reason for each error. Common APNs errors include 400 Bad Request (malformed payload JSON), 403 Forbidden (authentication issue), 410 Gone (the device token is no longer valid for the app), and 413 Payload Too Large (exceeded size limit). Treat 500 Internal Server Error and 503 Service Unavailable as temporary server issues and retry with backoff.
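The status codes listed above can be folded into a small decision table on the sending side. The sketch below is one possible mapping, not an official client: the `Action` names and the `classify_apns_response` helper are hypothetical, and only the status codes mentioned in this section are covered explicitly.

```python
from enum import Enum


class Action(Enum):
    ACCEPTED = "accepted"          # APNs took the message; delivery is still pending
    FIX_REQUEST = "fix_request"    # payload or headers are wrong; resending as-is won't help
    REFRESH_AUTH = "refresh_auth"  # credential or topic problem
    DROP_TOKEN = "drop_token"      # device token is no longer valid for this app
    BACK_OFF = "back_off"          # slow down before sending again
    RETRY = "retry"                # transient server-side failure


# Mapping based on the status codes discussed in this section.
_STATUS_TO_ACTION = {
    200: Action.ACCEPTED,
    400: Action.FIX_REQUEST,
    403: Action.REFRESH_AUTH,
    410: Action.DROP_TOKEN,
    413: Action.FIX_REQUEST,
    429: Action.BACK_OFF,
    500: Action.RETRY,
    503: Action.RETRY,
}


def classify_apns_response(status: int) -> Action:
    """Return what the sender should do next for a given APNs HTTP status."""
    if status in _STATUS_TO_ACTION:
        return _STATUS_TO_ACTION[status]
    # Assumption: unknown 5xx codes are treated as transient,
    # anything else as a problem with the request itself.
    return Action.RETRY if 500 <= status < 600 else Action.FIX_REQUEST
```

Keeping the mapping in one place makes it easy to route `DROP_TOKEN` results to token cleanup and `RETRY` or `BACK_OFF` results to a retry queue.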
By following the APNs protocol properly, a developer can reliably deliver notifications to iOS users with minimal latency and optimal battery usage – Apple’s push network takes care of efficiently waking the device.
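As a concrete illustration of steps 2, 5, and 6 above, here is a minimal sketch of how a server might assemble one APNs request. It only builds the URL, headers, and body and sends nothing. The helper name `build_apns_request` and its parameters are hypothetical, `provider_jwt` stands in for a token you have already signed with your APNs key, and the payload is deliberately reduced to a basic alert or a silent push.

```python
import json

APNS_HOST = "https://api.push.apple.com"  # production endpoint
MAX_PAYLOAD_BYTES = 4096                  # limit for standard remote notifications


def build_apns_request(device_token, topic, provider_jwt, *,
                       alert=None, background=False, collapse_id=None):
    """Return (url, headers, body) for one APNs HTTP/2 POST. Nothing is sent."""
    if background:
        # Silent push: no alert, wakes the app briefly, sent at the lower priority.
        aps = {"content-available": 1}
        push_type, priority = "background", "5"
    else:
        if alert is None:
            raise ValueError("an alert notification needs an 'alert' dictionary")
        aps = {"alert": alert, "sound": "default"}
        push_type, priority = "alert", "10"

    body = json.dumps({"aps": aps}, separators=(",", ":")).encode("utf-8")
    if len(body) > MAX_PAYLOAD_BYTES:
        # APNs would answer 413 anyway; failing early saves a round trip.
        raise ValueError(f"payload is {len(body)} bytes, limit is {MAX_PAYLOAD_BYTES}")

    headers = {
        "authorization": f"bearer {provider_jwt}",
        "apns-topic": topic,  # usually the app's bundle identifier
        "apns-push-type": push_type,
        "apns-priority": priority,
    }
    if collapse_id is not None:
        headers["apns-collapse-id"] = collapse_id

    # The device token goes in the URL path, the JSON payload in the body.
    url = f"{APNS_HOST}/3/device/{device_token}"
    return url, headers, body
```

Actually sending the request needs an HTTP/2 client, which the Python standard library does not include. Whatever client you choose, keep its connection open and reuse it across requests, as recommended above.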
Firebase Cloud Messaging (FCM) – Inside the Delivery Pipeline
Firebase Cloud Messaging (FCM) is Google’s cross-platform messaging solution, and it’s the primary way to send push notifications to Android devices. FCM also integrates with iOS, effectively acting as a wrapper around APNs. Here’s how FCM delivers a message from your server to an Android app:
1. Sending a Message to FCM: Your server (or cloud function) sends a message to FCM specifying a target (which can be a single device registration token, a topic name for pub/sub style messaging, or a user group) and a payload. You can use the Firebase Admin SDK or call the FCM HTTP endpoint directly. Your request must be authorized and include valid parameters (like a known registration token). FCM will verify the sender (project credentials) and that you are allowed to message that target.
2. FCM Backend Processing and Fan-out: Once FCM accepts the message, it enters Google’s cloud infrastructure for processing. FCM stamps the message with a unique ID and metadata (such as a timestamp). If the message targets multiple devices (for example, a topic or a multicast to a list of tokens), the FCM backend fans out the message, effectively cloning it and enqueueing one message per target device. FCM is built to scale to massive fan-outs (for example, a notification to millions of devices subscribed to a topic). Under the hood, the FCM dispatch system determines which devices (and which device connection servers) need to receive the message and prepares it for delivery. If your message targets an iOS device through FCM, this is the stage at which FCM hands the message off to APNs. In fact, FCM acts as a broker for iOS notifications: you upload your APNs credentials to Firebase, and FCM uses them to send the notification through Apple’s network on your behalf. For Android devices, FCM uses its own Android transport layer, a background service in Google Play Services on the device that maintains a connection to FCM.
3. Persistent Connection to Android Devices: Android devices running Google Play Services maintain a single persistent TCP/IP connection to FCM servers. The device opens this connection when it boots or when an app registers for FCM, and keeps it alive in the background. Importantly, the connection is shared across all apps on the device that use FCM: a central Android system service handles messaging for all of them. This means that whether the user has one FCM-enabled app or fifty, there is still just one socket open to the FCM servers. This design is crucial for battery efficiency: a single idle socket has negligible power impact, and the device’s radio is only activated when a message is incoming. A persistent connection does not mean the device is constantly using CPU or radio; it simply means the channel is kept available for when data needs to be pushed. In fact, Android uses various optimizations to keep the connection open in a low-power state and avoid waking the device unnecessarily. The takeaway is that FCM can deliver a message to an Android device instantly (push) without the app itself having to be running or the device having to poll; the “always-on” connection is the conduit.
4. Message Delivery to the Device: When the FCM backend has a message for a device, it packages the data and sends it through the persistent socket to that device’s FCM client (Google Play Services). Several scenarios can happen at this stage depending on device state and message priority:
- Device Online and Active: If the device is connected and not in a deep sleep mode, FCM will deliver the message as soon as possible. High-priority messages are sent immediately. For normal priority messages, if the device is currently active or the screen is on, they’ll also be delivered promptly.
- Device in Doze Mode or App Standby: Modern Android devices may enter Doze mode when the screen is off and the device is idle. In Doze, normal-priority FCM messages will not wake the device immediately. Instead, FCM will hold onto the message in the cloud, and deliver it when the device periodically exits Doze for maintenance or when the user wakes the device. However, if a message is marked as high priority, FCM will attempt to deliver it by waking the radio even in Doze (this is intended for urgent notifications). Android will still impose limits – e.g., an app that abuses high-priority messages to wake the device too often may get throttled. But generally, high priority is used for time-sensitive alerts (chat messages, etc.), while normal priority is for things that can wait.
- Device Offline: If the device has no network connection (e.g., in airplane mode or completely off), FCM cannot deliver the message immediately. Instead, FCM will store the message on its servers until the device comes online. The message will be delivered as soon as the device reconnects and establishes contact with FCM.
- Collapsible Messages: Similar to APNs collapse identifiers, FCM has a concept of collapse keys. If you send multiple messages with the same collapse_key to a device that is offline or in Doze, FCM will only keep the most recent one and discard the older ones.
- Delivery Receipts: FCM does not send your app server a delivery confirmation for individual messages. If a message cannot be delivered within its TTL, FCM drops it. On Android, when FCM has deleted pending messages for a device (for example, because too many accumulated or the device stayed offline for too long), the app receives a callback (onDeletedMessages() in its Firebase messaging service) once it reconnects. Treat this as a signal to sync with your server for anything that was missed.
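The store-and-forward behavior described in these scenarios (hold messages while the device is unreachable, keep only the newest message per collapse key, drop anything past its TTL) can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not how FCM is implemented internally. `PendingMessage` and `DeviceQueue` are hypothetical names, and time is passed in explicitly as plain seconds to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PendingMessage:
    data: dict
    expires_at: float                   # absolute time after which the message is dropped
    collapse_key: Optional[str] = None  # None means the message is not collapsible


@dataclass
class DeviceQueue:
    """Toy model of the messages a push service holds for one unreachable device."""
    messages: list = field(default_factory=list)

    def enqueue(self, data, now, ttl_seconds, collapse_key=None):
        if collapse_key is not None:
            # A newer message with the same collapse key replaces the older one.
            self.messages = [m for m in self.messages
                             if m.collapse_key != collapse_key]
        self.messages.append(PendingMessage(data, now + ttl_seconds, collapse_key))

    def flush(self, now):
        """Device reconnected: return unexpired messages in order and clear the queue."""
        deliverable = [m.data for m in self.messages if m.expires_at > now]
        self.messages = []
        return deliverable


# Usage: two score updates share a collapse key, and a chat message has a short TTL.
queue = DeviceQueue()
queue.enqueue({"score": "1-0"}, now=0, ttl_seconds=3600, collapse_key="match")
queue.enqueue({"msg": "hello"}, now=10, ttl_seconds=60)
queue.enqueue({"score": "2-0"}, now=20, ttl_seconds=3600, collapse_key="match")
delivered = queue.flush(now=100)  # only the latest score survives
```

In this run the first score is replaced by the second, and the chat message expires before the device reconnects, so only the latest score is delivered.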
5. Device Handling and Notification Display (Android): On the device, Google Play Services (the FCM client) receives the message and will take one of two actions: either display the notification on behalf of the app, or deliver the message data to the app’s code. The behavior depends on how the message was sent:
- Notification Message: If you sent a notification message, the FCM SDK will usually handle presenting it to the user. If the app is in the background or not running, the system creates a notification in the Android status bar using the provided title, text, and icon. This happens without the app needing to run, which is convenient for simple use cases. If the app is in the foreground, the notification is not shown automatically by default; instead, the message is delivered to the app’s callback so the app can decide how to present it (for example, as a custom in-app alert).
- Data Message: If you sent a data-only message, or if a notification message arrives while the app is in the foreground, the FCM client invokes the app’s Firebase messaging service. This wakes up your app so it can handle the message. For example, you might parse the data and create a custom notification, or trigger a background sync. Note that if the app has been swiped closed, Android will still start the FCM service to deliver a high-priority message, but background execution limits apply. If messages are normal priority and the app is not running, delivery might wait until the app is opened or the system schedules a background job for the app.
- Device Wake-up: The FCM service on Android has special privileges to wake the device from sleep when a high-priority message arrives. This is done judiciously to avoid battery drain. If too many wake-ups occur, Android may start postponing them.
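The rules above for who handles an incoming message on Android reduce to a two-input decision: whether the message has a notification part, and whether the app is in the foreground. The function below is a simplified model of that decision as described in this post. The name `android_handling` and its return labels are hypothetical, and messages are represented as plain dictionaries with optional "notification" and "data" keys.

```python
def android_handling(message: dict, app_in_foreground: bool) -> str:
    """Return who handles an incoming FCM message, per the rules described above.

    "system_tray"  : the notification is displayed on the app's behalf
    "app_callback" : the app's Firebase messaging service receives the message
    """
    has_notification = "notification" in message
    if has_notification and not app_in_foreground:
        # Background or not running: shown in the status bar without running app code.
        return "system_tray"
    # Data-only message, or any message while the app is in the foreground.
    return "app_callback"


promo = {"notification": {"title": "Sale", "body": "50% off today"}}
sync = {"data": {"action": "refresh_inbox"}}

print(android_handling(promo, app_in_foreground=False))
print(android_handling(promo, app_in_foreground=True))
print(android_handling(sync, app_in_foreground=False))
```

One practical consequence: if you always need your own code to run when a message arrives, a data-only message is the case that routes to the app regardless of foreground state.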
6. FCM Constraints and Reliability Features: Google’s FCM comes with its own set of limits and behaviors developers should be aware of:
- Payload Size: FCM allows a payload up to 4 KB (4096 bytes) for both notification and data messages. If you send a message that exceeds this, FCM will reject it (with an error like “Message too big”). Note that when sending to topic subscriptions, the maximum data payload is 2048 bytes (2 KB) because of additional overhead in fan-out. Always keep payloads small; if you have more data, consider fetching it in-app after the notification is received.
- Rate Limits and Throttling: FCM can handle extremely high volumes, but to protect the system and devices, it enforces certain rate limits. There are per-minute, per-project limits (for instance, as of the HTTP v1 API, the default limit is 600,000 messages per minute for a single Firebase project). If you exceed this, FCM responds with HTTP 429 (quota exceeded) errors. There are also device-level throttles: if you target a single device with too many messages too quickly, FCM may drop or delay some of them. In general, FCM’s goal is to deliver every message, but when a flood of messages would hurt the user experience or battery, FCM will throttle or collapse them. As a developer, you should design your push usage to avoid spamming users and use collapse keys to avoid wasteful duplication.
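- Error Handling and Response Codes: When you send a message via FCM, pay attention to the response. With the HTTP v1 API, common error codes include UNREGISTERED (the token is no longer valid, so remove it), INVALID_ARGUMENT (the request is malformed, so fix it rather than retrying), QUOTA_EXCEEDED (you are sending too fast, so back off), and UNAVAILABLE or INTERNAL (a transient server problem, so retry with backoff). Your application server should handle each case gracefully.
- Storage and TTL: FCM stores undelivered messages for up to 4 weeks by default; you can set a shorter time-to-live (TTL) per message. If a device hasn’t connected by the time the TTL runs out, FCM discards the message.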
- Battery and Priority Considerations: Google is very mindful of battery impact. FCM’s normal vs high priority feature is a direct parallel to APNs priority 5 vs 10. Use normal priority for background updates that can wait; this allows the system to batch these messages and deliver them during natural wake cycles, preserving battery. Use high priority for user-visible or urgent communication – this tells Android to wake up immediately for delivery. However, if an app misuses high priority, the system may start to throttle that app’s messages, effectively treating them as normal priority or even suppressing some.
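To make the priority, collapse key, TTL, and payload-size points concrete, here is a minimal sketch that assembles the JSON body for the FCM HTTP v1 `messages:send` endpoint. The helper `build_fcm_message` is hypothetical. The size check applies the limits quoted above to the serialized notification and data fields, which is an approximation of how FCM counts bytes. Authentication (an OAuth 2.0 access token in the `Authorization` header) is left out.

```python
import json

# {project_id} is a placeholder for your Firebase project ID.
FCM_SEND_URL = "https://fcm.googleapis.com/v1/projects/{project_id}/messages:send"


def build_fcm_message(*, token=None, topic=None, notification=None, data=None,
                      high_priority=False, collapse_key=None, ttl_seconds=None):
    """Return the JSON-serializable request body for one FCM v1 send call."""
    if (token is None) == (topic is None):
        raise ValueError("provide exactly one of token or topic")

    message = {"token": token} if token is not None else {"topic": topic}
    if notification:
        message["notification"] = notification
    if data:
        # FCM data payload values must be strings.
        message["data"] = {key: str(value) for key, value in data.items()}

    # Approximate size check against the limits quoted above.
    payload = {k: message[k] for k in ("notification", "data") if k in message}
    size = len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    limit = 2048 if topic is not None else 4096
    if size > limit:
        raise ValueError(f"payload is about {size} bytes, limit is {limit}")

    android = {"priority": "HIGH" if high_priority else "NORMAL"}
    if collapse_key is not None:
        android["collapse_key"] = collapse_key
    if ttl_seconds is not None:
        android["ttl"] = f"{int(ttl_seconds)}s"  # duration string, e.g. "3600s"
    message["android"] = android
    return {"message": message}


# Usage: an urgent chat alert that is only worth delivering within the hour.
body = build_fcm_message(
    token="device-registration-token",
    notification={"title": "New message", "body": "Are you free at 5?"},
    data={"thread_id": 42},
    high_priority=True,
    collapse_key="thread-42",
    ttl_seconds=3600,
)
```

A background sync would use the same helper with `high_priority=False` and no `notification`, which lets the system batch delivery into a natural wake cycle.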
Best Practices
Understanding the internals of push notification delivery helps you design better messaging in your app:
- Keep payloads concise (under 4 KB).
- Regularly remove invalid tokens.
- Use priorities wisely (high priority for urgent notifications, normal priority for background tasks).
- Implement retries with exponential backoff.
- Monitor delivery through analytics and backend logging.
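The retry advice can be sketched as a small wrapper around whatever function performs the send. This is one common way to do exponential backoff with jitter, not a prescribed implementation. `send_with_backoff` is a hypothetical helper, `send` is assumed to return an HTTP status code, and the default retryable codes are the ones this post names as transient or rate-limited.

```python
import random
import time


def send_with_backoff(send, *, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      retryable=(429, 500, 503),
                      sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns (last_status, attempts_made).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    status = None
    for attempt in range(max_attempts):
        status = send()
        if status not in retryable:
            return status, attempt + 1
        if attempt == max_attempts - 1:
            break  # out of attempts, so do not sleep again
        # Exponential growth capped at max_delay, scaled by a random factor
        # so that many senders do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay * rng())
    return status, max_attempts


# Usage with a fake sender that fails twice before succeeding.
# sleep and rng are replaced so the example runs instantly and deterministically.
responses = iter([503, 429, 200])
waits = []
result = send_with_backoff(lambda: next(responses),
                           sleep=waits.append, rng=lambda: 1.0)
```

Injecting `sleep` and `rng` keeps the wrapper easy to test; in production the defaults apply real delays with random jitter.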
How Clix Can Help
While APNs and FCM abstract away the heavy lifting of messaging infrastructure, it’s valuable for developers to know what’s happening behind the scenes. At Clix, we understand the intricacies of push notifications and have deep expertise helping developers send notifications effectively. Our tools simplify integration, provide robust analytics, and improve notification reliability, making push notifications easier for your team to manage.