It is well known that resource pooling (or, equivalently, the use of flexible resources that can serve multiple types of requests) significantly improves the performance of service systems. On the other hand, complete resource pooling often results in higher infrastructure (communication and coordination) costs. This leads us to explore the benefits that can be derived from a limited amount of resource pooling, and the question of whether a small amount of pooled resources can deliver most of the benefits of complete resource pooling.
More concretely, we propose and analyze a multi-server model that captures this performance tradeoff. In our model, a fraction p of the available service resources is deployed as a flexible, pooled resource (e.g., a central server that always serves a most-loaded station), while the remaining fraction 1−p is allocated to local servers, each of which can serve only requests addressed specifically to its own station. Using a fluid-model approach, we demonstrate a surprising phase transition in the steady-state delay as p varies: in the limit of a large number of stations, and even when p is small, the average steady-state queue length scales (as a function of the traffic intensity) at a much slower, exponentially smaller, rate than it does without pooling.
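The dynamics just described can be sketched as a small event-driven simulation. Everything here is an illustrative assumption based on the description above (Poisson arrivals of rate lam per station, local servers of rate 1−p, a pooled server of rate p·n that serves a most-loaded station); the parameter names and defaults are hypothetical, and this is a qualitative sketch rather than the analysis in the talk.

```python
import random

def simulate(n=10, lam=0.9, p=0.1, horizon=5000, seed=0):
    """Continuous-time (Gillespie-style) simulation of the partial-pooling model:
    n stations with Poisson(lam) arrivals each, a local server of rate (1 - p)
    per station, and one pooled server of rate p * n that always serves a
    most-loaded nonempty station. Returns the time-averaged total queue length."""
    rng = random.Random(seed)
    q = [0] * n          # queue length at each station
    t = 0.0
    area = 0.0           # integral of total queue length over time
    while t < horizon:
        busy = [i for i in range(n) if q[i] > 0]
        # Total event rate: all arrivals, busy local servers, pooled server if any work.
        total_rate = n * lam + (1 - p) * len(busy) + (p * n if busy else 0.0)
        dt = rng.expovariate(total_rate)
        area += sum(q) * dt
        t += dt
        u = rng.random() * total_rate
        if u < n * lam:
            # Arrival at a uniformly chosen station.
            q[rng.randrange(n)] += 1
        elif u < n * lam + (1 - p) * len(busy):
            # Local service completion at a uniformly chosen busy station.
            q[rng.choice(busy)] -= 1
        else:
            # Pooled server completes a job at a most-loaded station.
            q[max(busy, key=lambda i: q[i])] -= 1
    return area / t
```

Comparing `simulate(p=0.0)` with `simulate(p=0.1)` at a traffic intensity near 1 gives a rough numerical feel for how even a small pooled fraction shortens queues, though the phase transition itself is a large-n, heavy-traffic asymptotic statement.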
We also discuss an alternative model of limited flexibility in which there are n arrival streams and each of n servers is capable of serving only a small fraction of these streams.
Joint work with Kuang Xu.