Cassandra datastax driver ResultSet sharing in multiple threads for fast reading

I've huge tables in cassandra, more than 2 billions rows and increasing. The rows have a date field and it is following date bucket pattern so as to limit each row.

Even then, I've more than a million entries for a particular date.

I want to read and process rows for each day as fast as possible. What I am doing is that getting instance of com.datastax.driver.core.ResultSet and obtain iterator from it and share that iterator across multiple threads.

So, essentially I want to increase the read throughput. Is this the correct way? If not, please suggest a better way.


Unfortunately you cannot do this as is. The reason why is that a ResultSet provides an internal paging state that is used to retrieve rows 1 page at a time.

You do have options however. Since I imagine you are doing range queries (queries across multiple partitions), you can use a strategy where you submit multiple queries across token ranges at a time using the token directive. A good example of this is documented in Paging through unordered partitioner results.

java-driver 2.0.10 and 2.1.5 each provide a mechanism for retrieving token ranges from Hosts and splitting them. There is an example of how to do this in the java-driver's integration tests in

    PreparedStatement rangeStmt = session.prepare("SELECT i FROM foo WHERE token(i) > ? and token(i) <= ?");

    TokenRange foundRange = null;
    for (TokenRange range : metadata.getTokenRanges()) {
        List<Row> rows = rangeQuery(rangeStmt, range);
        for (Row row : rows) {
            if (row.getInt("i") == testKey) {
                // We should find our test key exactly once
                    .describedAs("found the same key in two ranges: " + foundRange + " and " + range)
                foundRange = range;
                // That range should be managed by the replica
                assertThat(metadata.getReplicas("test", range)).contains(replica);
private List<Row> rangeQuery(PreparedStatement rangeStmt, TokenRange range) {
    List<Row> rows = Lists.newArrayList();
    for (TokenRange subRange : range.unwrap()) {
        Statement statement = rangeStmt.bind(subRange.getStart(), subRange.getEnd());
    return rows;

You could basically generate your statements and submit them in async fashion, the example above just iterates through the statements one at a time.

Another option is to use the spark-cassandra-connector, which essentially does this under the covers and in a very efficient way. I find it very easy to use and you don't even need to set up a spark cluster to use it. See this document for how to use the Java API.

Need Your Help

current average for each row of data

mysql average

I have a table output with as-Date&nbsp;&nbsp;&nbsp;Output

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.