[source-mongodb-v2] CDC unviable for a low-share, high-write collection on a busy shared oplog — change-stream COLLSCAN + shutdown deadlock. What are people doing at scale? #79656
Hi Henry Bogardus (@usbogie), thank you for this incredibly thorough write-up, including the detail on the oplog COLLSCAN behavior and the shutdown deadlock. We've escalated this to our engineering team for investigation: airbytehq/oncall#12848. A few clarifying questions that would help the team dig in:
Please ensure you mask or remove any sensitive information (API keys, passwords, tokens, connection strings) before sharing logs.
Interim workaround notes
While the team investigates, a couple of things to consider:
We'll provide updates on the oncall issue as the investigation progresses. In the meantime, feel free to share any additional diagnostics there.
Need more help? Join Airbyte Community Slack for peer support, or if you're a Cloud customer, open a support ticket referencing this URL.
Summary
We're trying to replicate a single large, high-write MongoDB collection to Snowflake via `source-mongodb-v2` CDC, and we cannot make incremental CDC keep up. Catch-up runs at ~50 events/min and falls further behind in real time, and the connector also hits a shutdown deadlock that turns otherwise-fine syncs into failures. We believe we've root-caused it to the change stream's oplog COLLSCAN, but we'd love confirmation and to hear how others run CDC on collections like this at scale.
Environment
`airbyte/source-mongodb-v2:2.0.7` (Debezium 2.6.2.Final, mongodb-driver-core 4.11.0)
[...] delete-and-reinsert pattern (re-posting deletes old docs and inserts new ones with new `_id`s).
`update_capture_mode = Lookup`, `initial_waiting_seconds = 1200`, oplog retention ~1 week.
[...] resume-token coupling described in "[source-mongodb-v2] Update resume token to latest oplog position even when no new records exist" #48435).
What we observe
1. Catch-up throughput is ~50 events/min and never converges.
After the initial snapshot (which is fast), CDC resumes from the snapshot-start token and is hours behind. It then advances only ~1 second of cluster-time per ~90 minutes of wall-clock, i.e. it loses ground in real time. Representative steady-state log:
The queue is always size=0 (the Snowflake side is not the bottleneck), and the source EC2 host is ~80% idle CPU with ~0 iowait — so the connector is blocked waiting on MongoDB, not on local compute or the destination.
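To put the observed rate in perspective, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration using only the figures quoted above (~1 second of cluster-time per ~90 minutes of wall-clock); the function names are hypothetical and not part of the connector.

```python
def replay_rate(cluster_s_advanced: float, wall_s_elapsed: float) -> float:
    """Cluster-time seconds consumed per wall-clock second."""
    return cluster_s_advanced / wall_s_elapsed


def lag_growth_per_hour_s(rate: float) -> float:
    """Seconds of additional lag accumulated per wall-clock hour.

    The cluster produces 1 second of oplog time per wall-clock second, so the
    reader converges only when rate > 1; below that, lag grows by (1 - rate).
    """
    return 3600.0 * (1.0 - rate)


# Figures from the report: ~1 s of cluster-time per ~90 min of wall-clock.
rate = replay_rate(1.0, 90 * 60)
print(f"replay rate: {rate:.6f} cluster-s per wall-s")
print(f"lag grows by about {lag_growth_per_hour_s(rate):.1f} s per hour")
```

Any rate below 1.0 means the backlog never drains; at the reported rate the stream falls behind by nearly a full hour for every hour it runs.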
2. The bottleneck appears to be the change-stream oplog scan. This matches discussion #42393, where the `$changeStream` aggregate does a COLLSCAN of the oplog (reported there: 1.3M oplog docs scanned to return 465 events, ~1m44s, under a global read lock). Our heartbeat `inc` tracks the returned event `ord`, consistent with scanning past large numbers of other-collection oplog entries to find ours. A bigger Atlas tier wouldn't help because the cost scales with total cluster write volume, not with our collection or hardware.
3. Shutdown deadlock → exit code 2 (matches #38705). With default socket timeout, the change-stream fetcher blocks in a `getMore` that can't be interrupted on engine close, so the CDK force-exits non-zero after the orphaned-thread grace period:
Each attempt commits ~101 records + state, then exits 2 → Airbyte logs a partial failure and retries; 20 partial failures hit the limit and the job fails.
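One way to check the "low-share" premise behind point 2 is to count oplog entries per namespace over a short window. The sketch below only builds the aggregation pipeline and computes the share. It assumes the standard `ts` and `ns` fields of `local.oplog.rs`; the function names, example namespaces, and counts are hypothetical. Running the pipeline itself (for example through pymongo's `aggregate`, with `since_ts` as a `bson.Timestamp`) reads the oplog too, so a short window on a secondary is probably the safer choice. Writes made inside multi-document transactions are recorded as `applyOps` entries under `admin.$cmd`, so this grouping would not attribute them to their collection.

```python
def oplog_ns_count_pipeline(since_ts):
    """Aggregation over local.oplog.rs: entries per namespace since `since_ts`.

    `since_ts` is expected to be a bson.Timestamp when sent to the server.
    """
    return [
        {"$match": {"ts": {"$gte": since_ts}}},
        {"$group": {"_id": "$ns", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


def namespace_share(rows, namespace):
    """Fraction of counted oplog entries that belong to `namespace`."""
    total = sum(row["count"] for row in rows)
    ours = sum(row["count"] for row in rows if row["_id"] == namespace)
    return ours / total if total else 0.0


# Hypothetical result rows, shaped like the pipeline's output.
rows = [
    {"_id": "app.events", "count": 9_900},
    {"_id": "app.listings", "count": 100},
]
print(f"{namespace_share(rows, 'app.listings'):.2%}")
```

A small share combined with a high absolute write rate is the situation described here: most scanned oplog entries belong to other collections.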
4. socketTimeoutMS can't thread the needle. We added `socketTimeoutMS=60000` to the connection string to make the blocked socket read unwind before the force-kill. That fixed the exit-2 hang, but legitimate change-stream `getMore`s then exceed 60s and throw:
Because the connector runs Debezium with 0 retries, a single slow scan fails the whole sync. There appears to be no socketTimeoutMS value that works: scans routinely exceed ~2 minutes, but the orphaned-thread force-kill fires at ~2 minutes, so "long enough for the scan" and "short enough to unwind on shutdown" don't overlap.
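The dilemma in point 4 can be stated as an interval check. This is a minimal sketch, assuming the socket timeout must be longer than the slowest legitimate scan and shorter than the force-kill grace period. The function names, the example URI, and the 130 s scan duration are illustrative assumptions, not connector settings. The URI helper only shows how `socketTimeoutMS` would be appended to an existing connection string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_socket_timeout(uri: str, timeout_ms: int) -> str:
    """Return `uri` with socketTimeoutMS set, keeping the other options."""
    parts = urlsplit(uri)
    options = dict(parse_qsl(parts.query))
    options["socketTimeoutMS"] = str(timeout_ms)
    return urlunsplit(parts._replace(query=urlencode(options)))


def timeout_window_s(slowest_scan_s: float, force_kill_grace_s: float):
    """(low, high) bounds for a workable timeout, or None if none exists."""
    if slowest_scan_s < force_kill_grace_s:
        return (slowest_scan_s, force_kill_grace_s)
    return None


# Hypothetical URI; 60000 ms is the value tried in the report.
print(with_socket_timeout("mongodb://db.example.com:27017/app?replicaSet=rs0", 60000))

# Scans above ~2 min against a ~2 min force-kill leave no usable window.
print(timeout_window_s(slowest_scan_s=130, force_kill_grace_s=120))
```

With the reported numbers the window is empty, which is why tuning the timeout alone cannot resolve both failure modes.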
What we've ruled out
`initial_waiting_seconds`: already at the max (1200).
[...] `updateLookup`, but not the oplog COLLSCAN, and only applies to events written after enabling it (doesn't help the backlog).
Questions for the community
[...] `source-mongodb-v2` CDC on a low-share, high-write collection in a busy shared oplog at hundreds of millions of docs? What throughput do you actually get, and how is your cluster/connector configured?
[...] `updatedAt`) incremental for mongodb-v2?
Happy to share full logs, query profiles, or `explain` output. Thanks!