Performance improvement: How to read only the last 10 lines of 100,000 files within 30 minutes

I have a question about which collection I should use. We have discussed it a lot, but I would like more input.

I have a source system that sends hundreds of thousands of trade files to my application roughly every 30 minutes. Each file has many lines (say 1,000). My application should store and process only the last 10 lines of trade details from each file.

If I read a file line by line with a BufferedReader, I have to keep adding each line to some collection and then, once I reach the last line, somehow remove everything except the last 10 lines. Keeping all 1,000 lines in a collection when I do not need them is a performance issue. Is there a collection or an approach that improves this?

Answers


You can use a CircularFifoBuffer (from Apache Commons Collections):

CircularFifoBuffer is a first in first out buffer with a fixed size that replaces its oldest element if full.

Usage for keeping in memory only the last 10 lines:

CircularFifoBuffer buffer = new CircularFifoBuffer(10);
// read lines and add them to the buffer

Once all lines have been read, the buffer contains only the last 10 lines.
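
If adding a library is not an option, a similar bounded buffer can be sketched with the JDK alone. The example below swaps CircularFifoBuffer for java.util.ArrayDeque and evicts the oldest line by hand; the class name LastLinesDemo, the method lastLines, and the generated "trade N" input are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LastLinesDemo {

    // Returns the last n lines from the reader, oldest first.
    // At most n lines are held in memory at any time.
    static List<String> lastLines(BufferedReader reader, int n) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Deque<String> buffer = new ArrayDeque<>(n);
        String line;
        while ((line = reader.readLine()) != null) {
            if (buffer.size() == n) {
                buffer.removeFirst(); // buffer is full: discard the oldest line
            }
            buffer.addLast(line);
        }
        return new ArrayList<>(buffer);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: 1000 generated lines standing in for a trade file.
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= 1000; i++) {
            text.append("trade ").append(i).append('\n');
        }
        try (BufferedReader reader = new BufferedReader(new StringReader(text.toString()))) {
            List<String> tail = lastLines(reader, 10);
            System.out.println(tail.size()); // prints 10
            System.out.println(tail.get(0)); // prints trade 991
        }
    }
}
```

This still reads every line of the file; it only limits how many lines are kept in memory.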


Use a RandomAccessFile and read from the end of the file, retrying with ever larger buffers until enough lines have been found. I wrote a tail function that takes a line-length hint to make the first guess. Be aware that whether or not the file ends with a newline may make a difference in the result. The code can also be improved upon (power-of-two block sizes and so on).

        File textFile = new File("...");
        String[] lines = tail(textFile, "UTF-8", 10, 160);
        System.out.println("#Lines: " + lines.length);
        for (String line : lines) {
            System.out.println(line);
        }


String[] tail(File textFile, String charSet, int lines, int lineLengthHint)
        throws IOException {
    if (lineLengthHint < 80) {
        lineLengthHint = 80;
    }
    RandomAccessFile in = new RandomAccessFile(textFile, "r");
    try {
        long fileSize = in.length();
        int bytesCount = lines * lineLengthHint;
        // Loop allocating a byte array hopefully sufficiently large.
        for (;;) {
            if (fileSize < bytesCount) {
                bytesCount = (int)fileSize;
            }
            byte[] bytes = new byte[bytesCount];
            in.seek(fileSize - bytesCount);
            in.readFully(bytes);

            int startIndex = bytes.length; // Position of last '\n'.
            int lineEndsFromStart = 0;
            boolean bytesCountSufficient = true;
            while (lineEndsFromStart - 1 < lines) {
                int pos = startIndex - 1;
                while (pos >= 0 && bytes[pos] != '\n') {
                    --pos;
                }
                startIndex = pos; // -1 will do fine.
                ++lineEndsFromStart;
                if (pos < 0) {
                    bytesCountSufficient = false;
                    break;
                }
            }
            if (bytesCountSufficient || fileSize == bytesCount) {
                String text = new String(bytes, startIndex + 1,
                    bytes.length - (startIndex + 1), charSet);
                return text.split("\r?\n");
            }
            // Not bytesCountSufficient:
            //lineLengthHint += 10; // Average line length was larger.
            bytesCount += lineLengthHint * 4; // Try with more.
        }
    } finally {
        in.close();
    }
}

You could easily fashion a discarding queue which keeps only the last 10 lines. A LinkedList would be a good start for such an implementation. See this previous question on the topic.

This won't solve the problem of reading in the whole file, but getting around that means quite a bit more coding. You'd need a RandomAccessFile and would have to search for the 10th newline from the end. Whether this solution is appropriate depends on how big the files are.
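
The newline search described above could look roughly like the following sketch. It is one possible reading, not a definitive implementation: it scans backwards in fixed 4096-byte blocks, treats a trailing '\n' as the end of the last line rather than the start of a new one, and assumes UTF-8 text whose last n lines fit comfortably in memory. The class name TailByNewlineScan, the method lastLines, and the temporary test file are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class TailByNewlineScan {

    // Returns the last n lines of the file without reading the whole file.
    static String[] lastLines(File file, int n) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            long size = in.length();
            if (n <= 0 || size == 0) {
                return new String[0];
            }
            byte[] block = new byte[4096];
            long pos = size;   // file offset where the scanned region begins
            long start = 0;    // default: the file has n lines or fewer
            int newlines = 0;
            scan:
            while (pos > 0) {
                int len = (int) Math.min(block.length, pos);
                pos -= len;
                in.seek(pos);
                in.readFully(block, 0, len);
                for (int i = len - 1; i >= 0; i--) {
                    if (block[i] != '\n') {
                        continue;
                    }
                    if (pos + i == size - 1) {
                        continue; // trailing newline only terminates the last line
                    }
                    if (++newlines == n) {
                        start = pos + i + 1; // first byte after the n-th newline from the end
                        break scan;
                    }
                }
            }
            // Assumption: the tail is small enough for a single byte array.
            byte[] tail = new byte[(int) (size - start)];
            in.seek(start);
            in.readFully(tail);
            return new String(tail, StandardCharsets.UTF_8).split("\r?\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: a temporary file with 1000 generated lines.
        File file = File.createTempFile("trades", ".txt");
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= 1000; i++) {
            text.append("trade ").append(i).append('\n');
        }
        Files.write(file.toPath(), text.toString().getBytes(StandardCharsets.UTF_8));
        String[] lines = lastLines(file, 10);
        System.out.println(lines.length); // prints 10
        System.out.println(lines[0]);     // prints trade 991
        file.delete();
    }
}
```

Scanning raw bytes for '\n' is safe in UTF-8 because that byte value never occurs inside a multi-byte character; other encodings would need a different check.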


You could use a String array of size 10 as a ring buffer, so that it always holds just the last 10 lines:

BufferedReader in = ...
String[] buffer = new String[10];
int bufferStartIndex = 0;
for (String line; (line = in.readLine()) != null;) {
    buffer[bufferStartIndex++ % buffer.length] = line;
}

At the end of the for-loop, bufferStartIndex % buffer.length is the index of the first (oldest) of the last 10 lines of the file. However, if the file contains fewer than 10 lines, the first line is at index 0, so you should reset bufferStartIndex to 0 in that case.
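
To make the read-out step concrete, here is a minimal sketch of the same ring-buffer idea that also copies the kept lines out in their original order, including the case of a file shorter than the buffer. The class name RingBufferTail, the method lastLines, and the generated input are hypothetical; the sketch assumes the line count fits in an int.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

class RingBufferTail {

    // Returns the last n lines from the reader, oldest first.
    static String[] lastLines(BufferedReader in, int n) throws IOException {
        if (n <= 0) {
            return new String[0];
        }
        String[] buffer = new String[n];
        int count = 0; // total number of lines read so far
        for (String line; (line = in.readLine()) != null; ) {
            buffer[count % n] = line; // overwrite the oldest slot
            count++;
        }
        int kept = Math.min(count, n);
        // Short input: slots were filled from index 0 and never wrapped around.
        int start = count < n ? 0 : count % n;
        String[] result = new String[kept];
        for (int i = 0; i < kept; i++) {
            result[i] = buffer[(start + i) % n];
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: 1000 generated lines standing in for a trade file.
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= 1000; i++) {
            text.append("trade ").append(i).append('\n');
        }
        try (BufferedReader in = new BufferedReader(new StringReader(text.toString()))) {
            // prints [trade 998, trade 999, trade 1000]
            System.out.println(Arrays.toString(lastLines(in, 3)));
        }
    }
}
```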


import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.LinkedList;
import java.util.Queue;

public class Test {
    private static Queue<String> bottom=new LinkedList<String>();
    private static int count=0;

    public static void main(String[] args) throws IOException{
        func(3);
    }

    // Prints the total line count and the bottom n lines of abc.txt (assumes n >= 1).
    private static void func(int n) throws IOException {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("abc.txt")))) {
            String strLine;
            // Read the file line by line
            while ((strLine = br.readLine()) != null) {
                count++;
                if (count > n) {
                    // the queue already holds n lines: drop the oldest one
                    bottom.remove();
                }
                bottom.add(strLine);
            }
        }
        System.out.println(count);
        System.out.println(bottom.toString());
    }
}

I have used a Queue to keep the bottom n lines. For further details you can visit: http://blog.everestkc.com.np

