Abstract:
Deploying modern applications in geo-distributed systems leads to performance fluctuation, a consequence of long-tail latency. To deliver high-quality services, these applications must continually adapt to changing conditions, and an appropriate replica selection strategy is one efficient way to achieve this. Several replica selection strategies have been developed, but none is efficient enough at reducing tail latency and adapting to the dynamic environment of geo-distributed systems. To develop a more efficient replica selection strategy, we first conduct extensive experiments to analyze and evaluate two popular state-of-the-art replica selection strategies for key-value store systems: Cassandra dynamic snitch and C3. Based on the experimental results, we develop a prediction-based replica selection strategy for a locally distributed system and implement it on Cassandra 3.0. For evaluation, we test Cassandra dynamic snitch, C3, and our proposed strategy on a locally distributed 15-node Cassandra cluster. Experimental results show that our proposed algorithm outperforms both C3 and dynamic snitch. We then deploy a geo-distributed 15-node Cassandra cluster on Amazon EC2 and conduct experiments to analyze how our strategy performs in geo-distributed systems. Although our strategy outperforms both C3 and dynamic snitch in locally distributed systems, its performance is less promising in geo-distributed systems. Finally, guided by these experimental outcomes, we extend the prediction model of the strategy designed for locally distributed systems and design a prediction-based replica selection strategy for geo-distributed systems. In this paper, we present the design and implementation of this prediction-based replica selection strategy for reducing tail latency in geo-distributed systems.
We have carefully designed the proposed strategy to adapt to the dynamic behavior of distributed systems. To evaluate its effectiveness in reducing tail latency and improving overall throughput, we perform extensive experiments on a 15-node Cassandra cluster deployed on Amazon EC2. To generate test datasets and workloads, we use the industry-standard Yahoo Cloud Serving Benchmark (YCSB). Our experimental results show that the proposed strategy not only reduces tail latency but also increases the overall throughput of geo-distributed systems. In addition, the proposed strategy improves the performance of the request reissue and request duplication strategies.