Using Java to read and process a text file with custom column and row separators

I have a text file which contains content scraped from webpages. The text file is structured like this:

|NEWTAB|lkfalskdjlskjdflsj|NEWTAB|lkjsldkjslkdjf|NEWTAB|sdlfkjsldkjf|NEWLINE|lksjlkjsdl|NEWTAB|lkjlkjlkj|NEWTAB|sdkjlkjsld

|NEWLINE| indicates the start of a new line (i.e., a new row in the data).

|NEWTAB| indicates the start of a new field within a line (i.e., a new column in the data).

I need to split the text file into fields and lines and store them in an array or some other data structure. Content between |NEWLINE| strings may contain actual new lines (i.e., \n), but these don't indicate an actual new row in the data.

I started by reading each character in one by one and looking at sets of 8 consecutive characters to see if they contained |NEWTAB|. My method proved to be unreliable and ugly. I am looking for the best practice on this. Would the best method be to read the whole text file in as a single string, and then use a string split on "|NEWLINE|" and then string splits on the resulting strings using "|NEWTAB|"?

Many thanks!

Answers


I think that the other answers will work too, but my solution is as follows:

import java.io.FileReader;
import java.io.IOException;

// Read the whole file into memory, one character at a time.
StringBuilder builder = new StringBuilder();

// try-with-resources closes the reader even if read() throws.
try (FileReader reader = new FileReader(args[0])) {
    int c;
    while ((c = reader.read()) != -1) {
        builder.append((char) c);
    }
}

String myString = builder.toString();

// The pipes are escaped because split() takes a regular expression.
String[] rows = myString.split("\\|NEWLINE\\|");

for (String row : rows) {
    String[] cols = row.split("\\|NEWTAB\\|");

    /* do something with cols - e.g., store */
}
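
For what it's worth, the same split-based idea can be wrapped in a small helper. This is only a sketch: the class name DelimitedParser and the method parse are made up for illustration, and it uses Pattern.quote instead of hand-escaping the pipes. It also passes a limit of -1 to split so that empty fields at the end of a row are kept, which may or may not be what you want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class DelimitedParser {
    // Pattern.quote makes split() treat the separators as literal text.
    private static final String ROW_SEP = Pattern.quote("|NEWLINE|");
    private static final String COL_SEP = Pattern.quote("|NEWTAB|");

    /** Splits the raw text into rows, and each row into its fields. */
    public static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        for (String row : text.split(ROW_SEP)) {
            // A limit of -1 keeps trailing empty fields instead of dropping them.
            rows.add(new ArrayList<>(Arrays.asList(row.split(COL_SEP, -1))));
        }
        return rows;
    }

    public static void main(String[] args) {
        // The \n inside the third field stays part of that field.
        String sample = "|NEWTAB|a|NEWTAB|b\nc|NEWLINE|d|NEWTAB|e";
        for (List<String> row : parse(sample)) {
            System.out.println(row.size() + " fields: " + row);
        }
    }
}
```

Note that a row starting with |NEWTAB|, as in the question's sample, produces an empty first field. To feed it a real file, something like new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8) should work, assuming the file is UTF-8.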

You could do something like this:

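import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

Scanner scanner = new Scanner(new File("myFile.txt"));

List<List<String>> rows = new ArrayList<List<String>>();
List<String> column = new ArrayList<String>();

// Note: a line break inside a field ends that element here.
while (scanner.hasNextLine()) {
    for (String elem : scanner.nextLine().split("\\|")) {
        if (elem.equals("NEWTAB") || elem.equals("")) {
            continue;                         // separator or empty piece
        } else if (elem.equals("NEWLINE")) {
            rows.add(column);                 // row finished, start a new one
            column = new ArrayList<String>();
        } else {
            column.add(elem);
        }
    }
}
scanner.close();

// The last row has no trailing |NEWLINE|, so add it here.
if (!column.isEmpty()) {
    rows.add(column);
}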

Took me a while to write it up, since I don't have IntelliJ or Eclipse on this computer and had to use Emacs.

EDIT: This is a bit more verbose than I like, but it works with |s that are part of the text:

Scanner scanner = new Scanner(new File("myFile.txt"));
List<List<String>> rows = new ArrayList<List<String>>();
List<String> lines = new ArrayList<String>();
String delimiter = "|NEWLINE|";
String line = "";

while (scanner.hasNextLine()) {
    // nextLine() drops the line terminator, so line breaks inside a
    // field are not preserved; the pieces are simply joined.
    line += scanner.nextLine();
    int index;
    while ((index = line.indexOf(delimiter)) >= 0) {
        lines.add(line.substring(0, index));
        line = line.substring(index + delimiter.length());
    }
}
scanner.close();

// Whatever follows the last |NEWLINE| is the final row.
if (!line.equals(""))
    lines.add(line);

for (String l : lines) {
    List<String> columns = new ArrayList<String>();
    for (String column : l.split("\\|NEWTAB\\|"))
        if (!column.equals(""))
            columns.add(column);
    rows.add(columns);
}
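
One possible variation, offered as a sketch rather than a tested drop-in: since Scanner.nextLine() discards line terminators, you could instead tell the Scanner to use |NEWLINE| itself as the delimiter, so each next() call returns one whole record with any embedded line breaks intact. The class name RecordScanner and the method read are invented for this example. Like the code above, it skips empty fields.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class RecordScanner {
    /** Reads one record per |NEWLINE|-separated token from a character source. */
    public static List<List<String>> read(Readable source) {
        List<List<String>> rows = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            // Tokens are now separated by the literal text |NEWLINE|, not whitespace.
            scanner.useDelimiter(Pattern.quote("|NEWLINE|"));
            while (scanner.hasNext()) {
                String record = scanner.next();
                List<String> fields = new ArrayList<>();
                for (String field : record.split(Pattern.quote("|NEWTAB|"))) {
                    if (!field.isEmpty()) {
                        fields.add(field);   // skip empty fields, as above
                    }
                }
                rows.add(fields);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the file; a FileReader would work the same way.
        String sample = "|NEWTAB|a|NEWTAB|b\nc|NEWLINE|d|NEWTAB|e";
        for (List<String> row : read(new StringReader(sample))) {
            System.out.println(row.size() + " fields: " + row);
        }
    }
}
```

This should also avoid building the whole file up in one String, since the Scanner only needs to buffer up to the next delimiter, though I have not measured it on a large file.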
