Offloading Heavy Translation Workloads with Redis BullMQ and Google Cloud Run

Introduction
A client came to us with a problem: their translation API — built in Node.js — was frequently timing out with HTTP 524 errors. The API was responsible for translating restaurant content like menus, categories, and dishes into multiple languages using third-party services such as DeepL and Google Translate.
The timeout issue was tied to the number of items being translated and the number of target languages — processing could take anywhere from one to twenty minutes, depending on the request size. They needed a solution that was both scalable and resource-conscious.
Our Solution
We approached this problem in two phases: first by optimizing the existing translation API, and then by introducing a background job processing architecture using Redis BullMQ and Google Cloud Run.
Phase 1: API-Level Optimization
Improved MongoDB Write Efficiency
The original logic used $pull and $push operations to modify nested arrays, which led to performance issues. We switched to a pipeline-style update ($set with filtered arrays and positional updates) to reduce I/O overhead and improve speed.
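Here is a minimal sketch of the new write path. The collection name, document shape (each restaurant embedding an items array, each item embedding a translations array), and helper name are hypothetical, since the real schema isn't part of this write-up:

```ts
import { MongoClient, ObjectId } from "mongodb";

// Hypothetical shape: each restaurant document embeds an "items" array,
// and each item embeds a "translations" array of { lang, text } entries.
async function setItemTranslation(
  client: MongoClient,
  restaurantId: string,
  itemId: string,
  lang: string,
  text: string
): Promise<void> {
  // One targeted $set with arrayFilters replaces the old $pull + $push pair,
  // so the nested entry is updated in place instead of the arrays being rewritten.
  await client
    .db("demo")
    .collection("restaurants")
    .updateOne(
      { _id: new ObjectId(restaurantId) },
      { $set: { "items.$[item].translations.$[tr].text": text } },
      {
        arrayFilters: [
          { "item._id": new ObjectId(itemId) },
          { "tr.lang": lang },
        ],
      }
    );
}
```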
Converted Synchronous HTTP Calls to Asynchronous
Instead of waiting for each translation request to finish sequentially, we parallelized API calls using async patterns, significantly reducing total processing time.
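The pattern looks roughly like this, with translateText standing in as a hypothetical wrapper around the DeepL / Google Translate SDK call:

```ts
// Hypothetical wrapper standing in for the DeepL / Google Translate SDK call.
declare function translateText(text: string, targetLang: string): Promise<string>;

async function translateItems(
  items: { id: string; text: string }[],
  targetLangs: string[]
): Promise<{ id: string; lang: string; text: string }[]> {
  // One promise per (item, language) pair: total time is now bounded by the
  // slowest call instead of the sum of all calls.
  const tasks = items.flatMap((item) =>
    targetLangs.map(async (lang) => ({
      id: item.id,
      lang,
      text: await translateText(item.text, lang),
    }))
  );
  return Promise.all(tasks);
}
```

An unbounded fan-out like this can trip the provider's rate limits, which is exactly what the queue-based design in Phase 2 addresses.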
Result:
Translating one restaurant’s 160 items into four languages (English, French, German, Spanish) used to take ~10 minutes. After this optimization, the same task now takes under 40 seconds.
Phase 2: Redis BullMQ + Google Cloud Run Architecture
To ensure long-term scalability and resilience, we designed a queue-based architecture:
Queue Jobs with Redis BullMQ
Each translation request is pushed to a queue. This enables retry logic, preserves job history, and ensures no jobs are lost.
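A minimal enqueue sketch: the queue name, job payload, Redis connection details, and retry numbers below are illustrative, not the production values.

```ts
import { Queue } from "bullmq";

const translationQueue = new Queue("translations", {
  connection: { host: "127.0.0.1", port: 6379 }, // assumed local Redis
});

export async function enqueueTranslation(restaurantId: string, langs: string[]) {
  await translationQueue.add(
    "translate-restaurant",
    { restaurantId, langs },
    {
      attempts: 3,                                   // retry failed jobs automatically
      backoff: { type: "exponential", delay: 5000 }, // wait 5s, 10s, 20s between retries
      removeOnComplete: false,                       // keep finished jobs as history
    }
  );
}
```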
Run Jobs in Google Cloud Run (Serverless)
Cloud Run functions process each job independently, isolating heavy translation logic from the API server and scaling only when needed.
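One way to wire up the worker process, assuming the queue from the sketch above; the Redis host and the translateRestaurant helper are placeholders:

```ts
import http from "node:http";
import { Worker } from "bullmq";

// Hypothetical helper containing the actual translation logic.
declare function translateRestaurant(restaurantId: string, langs: string[]): Promise<void>;

const worker = new Worker(
  "translations",
  async (job) => {
    const { restaurantId, langs } = job.data;
    await translateRestaurant(restaurantId, langs); // heavy work happens here, not in the API
  },
  { connection: { host: "127.0.0.1", port: 6379 } }
);

worker.on("failed", (job, err) => {
  console.error(`Job ${job?.id} failed: ${err.message}`); // BullMQ retries per "attempts"
});

// Cloud Run requires the container to listen on $PORT; this tiny server
// doubles as a health-check endpoint while the worker consumes jobs.
http
  .createServer((_req, res) => res.end("ok"))
  .listen(Number(process.env.PORT ?? 8080));
```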
Non-Blocking API Response
The main API immediately responds to users after queuing the request, improving responsiveness and user experience.
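For example, an endpoint that queues and returns immediately might look like this (Express is assumed here; enqueueTranslation is the hypothetical helper from the queue sketch above):

```ts
import express from "express";
import { enqueueTranslation } from "./queue"; // hypothetical module from the sketch above

const app = express();
app.use(express.json());

app.post("/restaurants/:id/translate", async (req, res) => {
  // Queue the work and reply right away; the Cloud Run worker does the rest.
  await enqueueTranslation(req.params.id, req.body.langs);
  res.status(202).json({ status: "queued" }); // 202 Accepted: processing continues in background
});

app.listen(3000);
```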
Lightweight and Cost-Efficient
Since the client runs on a low-spec server, offloading heavy computation avoids memory and CPU spikes — without the need to scale up or introduce dedicated infrastructure.
Rate-Limit-Aware Execution
Instead of parallel workers, we process one job at a time, respecting the third-party API’s rate limits while maintaining consistent throughput.
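In BullMQ this is a matter of worker configuration. The sketch below shows the relevant options; the limiter numbers are illustrative and should be tuned to the provider's actual quota:

```ts
import { Worker, Job } from "bullmq";

// Hypothetical job processor (see the worker sketch above).
declare function processTranslationJob(job: Job): Promise<void>;

const worker = new Worker("translations", processTranslationJob, {
  connection: { host: "127.0.0.1", port: 6379 },
  concurrency: 1,                      // strictly one job at a time
  limiter: { max: 1, duration: 1000 }, // at most one job started per second
});
```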
Outcome
By introducing job queues and serverless workers, we delivered a system that is:
- Reliable — no job loss and automatic retry on failure
- Responsive — immediate API replies with background processing
- Resource-Conscious — offloads heavy work from the main server
- Cost-Efficient — leverages serverless pricing model
- Scalable — supports multiple users and job types
- Optimized — aligns job throughput with third-party API limits
This upgrade has not only made the translation system more stable but also elevated the overall experience for end users and the client’s team.
Technologies Used
- Redis BullMQ for job queuing
- Google Cloud Run for scalable, on-demand translation workers
- MongoDB with optimized update queries
- Node.js for async job orchestration
- DeepL / Google Translate APIs for multilingual support