I have a Cosmos DB Gremlin API account set up with 400 RU/s. If I have to run a query that needs 800 RUs, does that mean the query takes 2 seconds to execute? If I increase the throughput to 1600 RU/s, does the query execute in half a second? I am not seeing any significant change in query performance when I adjust the RUs.
As I explained in a different, but somewhat related, answer here, Request Units are allocated on a per-second basis. Here's what happens when a given query costs more than the number of Request Units available in that one-second window:
Let's say you had 400 RU/sec, and you executed a query that cost 800 RU. It would complete, but you'd have consumed two seconds' worth of budget (800 RU ÷ 400 RU/sec), so further requests would be throttled for roughly one more second. After that window passes, you wouldn't be throttled anymore.
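To make the back-of-envelope math concrete, here's a tiny sketch (my own illustrative model, not an official formula - the service's actual accounting is more nuanced) of how many seconds of budget a single query consumes, and therefore roughly how long you'd be throttled afterward:

```python
def throttle_debt_seconds(query_cost_ru, provisioned_ru_per_sec):
    """Estimate how long follow-up requests would be throttled.

    A query consumes (cost / provisioned) seconds' worth of budget;
    anything beyond the first second is roughly the throttle window.
    Illustrative only - real accounting in the service is more nuanced.
    """
    seconds_of_budget = query_cost_ru / provisioned_ru_per_sec
    return max(0.0, seconds_of_budget - 1.0)

# 800 RU query against 400 RU/s: two seconds' worth of budget,
# so roughly one extra second of throttling.
print(throttle_debt_seconds(800, 400))   # 1.0

# The same query at 1600 RU/s fits inside one second's budget: no debt.
print(throttle_debt_seconds(800, 1600))  # 0.0
```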
The speed at which a query executes does not depend on the number of RUs allocated. Whether you have 1,000 RU/sec or 100,000 RU/sec, a query runs in the same amount of time (aside from any throttling that delays it from starting). So, aside from throttling, your 800 RU query would run in a consistent amount of time, regardless of the RU count.
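When you are throttled, Cosmos DB returns a 429 ("request rate too large") with a suggested wait time, and the SDKs typically retry for you. The sketch below shows the general retry pattern with a hypothetical exception class standing in for whatever your Gremlin client actually raises - the names here are assumptions, not a real SDK API:

```python
import time

class RequestRateTooLargeError(Exception):
    """Hypothetical stand-in for the 429 error a Cosmos DB client raises."""
    def __init__(self, retry_after_ms):
        super().__init__("429: request rate too large")
        self.retry_after_ms = retry_after_ms

def execute_with_retries(operation, max_retries=5):
    """Run operation(), honoring the server-suggested back-off on 429s."""
    for _ in range(max_retries):
        try:
            return operation()
        except RequestRateTooLargeError as err:
            # Wait as long as the server suggested before retrying.
            time.sleep(err.retry_after_ms / 1000.0)
    return operation()  # final attempt; let any error propagate
```

In practice you'd rarely write this yourself - check your SDK's built-in retry policy first - but it shows why a throttled query appears "slow": the extra time is the back-off, not the execution.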
Makes sense, thank you. So if I have batch jobs to run (which need more RUs), would it be a good idea to run them during off-peak hours, to make sure customers are not throttled during regular business hours? In other words, if I am OK with some throttling during off-peak hours, can I keep my throughput at the minimum and run the expensive jobs off-peak?
@MichaelScott - honestly, how you distribute traffic is up to you. However, if I were in your position, I'd likely increase my RU capacity during peak hours and decrease it off-peak. You have complete flexibility over RU allocation - you can adjust it at any time. Just consider the cost of a few hundred extra RU: it's fairly negligible, even more so if you only raise the RU for part of each day.
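One way to implement that suggestion is a simple time-of-day schedule that a scheduled job consults before updating the container's throughput (via the portal, CLI, or SDK - whichever you use). The hours and RU values below are assumptions; tune them to your own traffic pattern:

```python
# Assumed business hours and RU targets - adjust to your actual traffic.
PEAK_HOURS = range(8, 18)   # 08:00-17:59 local time
PEAK_RU = 1600
OFF_PEAK_RU = 400

def target_throughput(hour):
    """Return the RU/s target for a given hour of day (0-23)."""
    return PEAK_RU if hour in PEAK_HOURS else OFF_PEAK_RU
```

A timer-triggered job (a cron task, an Azure Function, etc.) could call this each hour and apply the result with your management tool of choice; the scheduling logic itself is the only part sketched here.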