How can you make AI apps respond faster in real time (without delays)?
Multiple issues can be the cause. Some possible fixes:
- Semantic caching
- Since it’s a real-time application, are you using pub/sub? Check the region and put it as close as possible to your deployments.
- Use asynchronous tool calls where possible, especially if you have multiple agents.
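To illustrate the asynchronous tool call point, here is a minimal sketch using Python's asyncio. The `call_tool` function and the tool names are hypothetical stand-ins for whatever your agents actually call; the sleep only simulates network or model latency. The idea is that independent calls run concurrently, so the total wait is roughly the slowest call rather than the sum of all of them.

```python
import asyncio
import time


async def call_tool(name: str, delay: float) -> str:
    # Hypothetical stand-in for a real tool or agent call
    # (HTTP request, model call, database query, ...).
    await asyncio.sleep(delay)  # simulated latency
    return f"{name}: done"


async def run_tools_concurrently() -> list[str]:
    # gather() schedules the independent calls concurrently and
    # returns their results in the order the calls were passed in.
    return await asyncio.gather(
        call_tool("search", 0.2),
        call_tool("database", 0.2),
        call_tool("summarizer", 0.2),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_tools_concurrently())
    elapsed = time.perf_counter() - start
    # Three 0.2 s calls should take about 0.2 s in total, not 0.6 s.
    print(results, f"{elapsed:.2f}s")
```

This only helps when the calls do not depend on each other's output. If one tool needs another tool's result, those two still have to run in sequence.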
Really, an audit of your application is required. The question is vague.
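For the semantic caching suggestion, a rough sketch of the idea: before calling the model, check whether a sufficiently similar question has already been answered and reuse that answer. Everything here is an assumption for illustration. The `embed` function is a toy word-count embedding (a real setup would use an embedding model and a vector store), and `SemanticCache`, `answer`, and the 0.8 threshold are hypothetical names and values.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding (assumption): lowercase word counts.
    # A real system would call an embedding model here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # minimum similarity for a cache hit
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str):
        # Return the answer of the most similar stored query,
        # or None if nothing is similar enough.
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, stored_answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, stored_answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, result: str) -> None:
        self.entries.append((embed(query), result))


def answer(query: str, cache: SemanticCache, call_model) -> str:
    # Check the cache first and only call the (slow) model on a miss.
    cached = cache.get(query)
    if cached is not None:
        return cached
    result = call_model(query)
    cache.put(query, result)
    return result
```

The threshold is the main thing to tune: too low and users get answers to a different question, too high and the cache rarely hits.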
We are dealing with similar issues at my company. We need near-realtime audio processing - transcribe and then analyze the text in various ways. A delay of 1 or 2 minutes is fine for us but the pipeline we need to run is quite long.
Since we are using Gemini, we are currently experimenting with provisioned throughput to see if it at least helps stabilize the response times.
For some parts of the pipeline we are actually moving back to smaller local models, not necessarily generative.