Offloading Heavy Translation Workloads with Redis BullMQ and Google Cloud Run

Introduction
A client came to us with a problem: their translation API — built in Node.js — was frequently timing out with HTTP 524 errors. The API was responsible for translating restaurant content like menus, categories, and dishes into multiple languages using third-party services such as DeepL and Google Translate.
The timeout issue was tied to the number of items being translated and the number of target languages — processing could take anywhere from one to twenty minutes, depending on the request size. They needed a solution that was both scalable and resource-conscious.
Our Solution
We approached this problem in two phases: first by optimizing the existing translation API, and then by introducing a background job processing architecture using Redis BullMQ and Google Cloud Run.
Phase 1: API-Level Optimization
Improved MongoDB Write Efficiency
The original logic used $pull and $push operations to modify nested arrays, which led to performance issues. We switched to a pipeline-style update ($set with filtered arrays and positional updates) to reduce I/O overhead and improve speed.
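Here is a minimal sketch of the new write path. The collection name, document shape (each restaurant embedding an items array, each item embedding a translations array), and helper name are hypothetical, since the real schema isn't part of this write-up:

```ts
import { MongoClient, ObjectId } from "mongodb";

// Hypothetical shape: each restaurant document embeds an "items" array,
// and each item embeds a "translations" array of { lang, text } entries.
async function setItemTranslation(
  client: MongoClient,
  restaurantId: string,
  itemId: string,
  lang: string,
  text: string
): Promise<void> {
  // One targeted $set with arrayFilters replaces the old $pull + $push pair,
  // so the nested entry is updated in place instead of the arrays being rewritten.
  await client
    .db("demo")
    .collection("restaurants")
    .updateOne(
      { _id: new ObjectId(restaurantId) },
      { $set: { "items.$[item].translations.$[tr].text": text } },
      {
        arrayFilters: [
          { "item._id": new ObjectId(itemId) },
          { "tr.lang": lang },
        ],
      }
    );
}
```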
Converted Synchronous HTTP Calls to Asynchronous
Instead of waiting for each translation request to finish sequentially, we parallelized API calls using async patterns, significantly reducing total processing time.
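The pattern looks roughly like this, with translateText standing in as a hypothetical wrapper around the DeepL / Google Translate SDK call:

```ts
// Hypothetical wrapper standing in for the DeepL / Google Translate SDK call.
declare function translateText(text: string, targetLang: string): Promise<string>;

async function translateItems(
  items: { id: string; text: string }[],
  targetLangs: string[]
): Promise<{ id: string; lang: string; text: string }[]> {
  // One promise per (item, language) pair: total time is now bounded by the
  // slowest call instead of the sum of all calls.
  const tasks = items.flatMap((item) =>
    targetLangs.map(async (lang) => ({
      id: item.id,
      lang,
      text: await translateText(item.text, lang),
    }))
  );
  return Promise.all(tasks);
}
```

An unbounded fan-out like this can trip the provider's rate limits, which is exactly what the queue-based design in Phase 2 addresses.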
Result:
Translating one restaurant’s 160 items into four languages (English, French, German, Spanish) used to take ~10 minutes. After this optimization, the same task now takes under 40 seconds.
Phase 2: Redis BullMQ + Google Cloud Run Architecture
To ensure long-term scalability and resilience, we designed a queue-based architecture:
Queue Jobs with Redis BullMQ
Each translation request is pushed to a queue. This enables retry logic, preserves job history, and ensures no jobs are lost.
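A minimal enqueue sketch: the queue name, job payload, Redis connection details, and retry numbers below are illustrative, not the production values.

```ts
import { Queue } from "bullmq";

const translationQueue = new Queue("translations", {
  connection: { host: "127.0.0.1", port: 6379 }, // assumed local Redis
});

export async function enqueueTranslation(restaurantId: string, langs: string[]) {
  await translationQueue.add(
    "translate-restaurant",
    { restaurantId, langs },
    {
      attempts: 3,                                   // retry failed jobs automatically
      backoff: { type: "exponential", delay: 5000 }, // wait 5s, 10s, 20s between retries
      removeOnComplete: false,                       // keep finished jobs as history
    }
  );
}
```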
Run Jobs in Google Cloud Run (Serverless)
Cloud Run functions process each job independently, isolating heavy translation logic from the API server and scaling only when needed.
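One way to wire up the worker process, assuming the queue from the sketch above; the Redis host and the translateRestaurant helper are placeholders:

```ts
import http from "node:http";
import { Worker } from "bullmq";

// Hypothetical helper containing the actual translation logic.
declare function translateRestaurant(restaurantId: string, langs: string[]): Promise<void>;

const worker = new Worker(
  "translations",
  async (job) => {
    const { restaurantId, langs } = job.data;
    await translateRestaurant(restaurantId, langs); // heavy work happens here, not in the API
  },
  { connection: { host: "127.0.0.1", port: 6379 } }
);

worker.on("failed", (job, err) => {
  console.error(`Job ${job?.id} failed: ${err.message}`); // BullMQ retries per "attempts"
});

// Cloud Run requires the container to listen on $PORT; this tiny server
// doubles as a health-check endpoint while the worker consumes jobs.
http
  .createServer((_req, res) => res.end("ok"))
  .listen(Number(process.env.PORT ?? 8080));
```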
Non-Blocking API Response
The main API immediately responds to users after queuing the request, improving responsiveness and user experience.
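For example, an endpoint that queues and returns immediately might look like this (Express is assumed here; enqueueTranslation is the hypothetical helper from the queue sketch above):

```ts
import express from "express";
import { enqueueTranslation } from "./queue"; // hypothetical module from the sketch above

const app = express();
app.use(express.json());

app.post("/restaurants/:id/translate", async (req, res) => {
  // Queue the work and reply right away; the Cloud Run worker does the rest.
  await enqueueTranslation(req.params.id, req.body.langs);
  res.status(202).json({ status: "queued" }); // 202 Accepted: processing continues in background
});

app.listen(3000);
```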
Lightweight and Cost-Efficient
Since the client runs on a low-spec server, offloading heavy computation avoids memory and CPU spikes — without the need to scale up or introduce dedicated infrastructure.
Rate-Limit-Aware Execution
Instead of parallel workers, we process one job at a time, respecting the third-party API’s rate limits while maintaining consistent throughput.
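In BullMQ this is a matter of worker configuration. The sketch below shows the relevant options; the limiter numbers are illustrative and should be tuned to the provider's actual quota:

```ts
import { Worker, Job } from "bullmq";

// Hypothetical job processor (see the worker sketch above).
declare function processTranslationJob(job: Job): Promise<void>;

const worker = new Worker("translations", processTranslationJob, {
  connection: { host: "127.0.0.1", port: 6379 },
  concurrency: 1,                      // strictly one job at a time
  limiter: { max: 1, duration: 1000 }, // at most one job started per second
});
```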
Outcome
By introducing job queues and serverless workers, we delivered a system that is:
- Reliable — no job loss and automatic retry on failure
- Responsive — immediate API replies with background processing
- Resource-Conscious — offloads heavy work from the main server
- Cost-Efficient — leverages serverless pricing model
- Scalable — supports multiple users and job types
- Optimized — aligns job throughput with third-party API limits
This upgrade has not only made the translation system more stable but also elevated the overall experience for end users and the client’s team.
Technologies Used
- Redis BullMQ for job queuing
- Google Cloud Run for scalable, on-demand translation workers
- MongoDB with optimized update queries
- Node.js for async job orchestration
- DeepL / Google Translate APIs for multilingual support