Get text between `<pre>` and `</pre>` into an ArrayList

I need to extract some specific text from a string and put it into an ArrayList, but I have no idea where to start. The string looks like this:

String exampleString = "some text I don't know <pre>the text I want to get</pre><pre>Some more text I want to get</pre> some text I don't know";

The problem is that I don't know how many <pre> text </pre> sections there are; it's even possible that there aren't any at all.

So could anyone tell me how to get the text between each <pre> and </pre>, and how to put those pieces into an ArrayList?

Thank you so much!

UPDATE: What I do know about the parts I labelled "some text I don't know" is that they don't contain <pre> or </pre>.

Answers


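import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> result = new ArrayList<String>();
// The reluctant quantifier (+?) stops at the first </pre>, so each pair is matched separately
Pattern pattern = Pattern.compile("<pre>(.+?)</pre>");
Matcher matcher = pattern.matcher(yourText);

while (matcher.find()) {
    // group(1) is the text between the tags; group() would also include the tags themselves
    result.add(matcher.group(1));
}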

edited: corrected regex syntax
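
For completeness, here is a minimal runnable sketch of the regex approach that actually fills a list. The class name PreRegexDemo and the method name extract are made up for illustration. Two things differ from the pattern above and are assumptions on my part: Pattern.DOTALL, so that '.' also matches line breaks inside a <pre> block, and *? instead of +?, so that an empty <pre></pre> pair yields an empty string rather than being skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PreRegexDemo {
    // Assumption: DOTALL and *? are choices made for this sketch, not part of the answer above.
    private static final Pattern PRE = Pattern.compile("<pre>(.*?)</pre>", Pattern.DOTALL);

    // Hypothetical helper: returns the text of every <pre>...</pre> pair, in order.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PRE.matcher(text);
        while (matcher.find()) {
            // group(1) is the captured text between the tags
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String exampleString = "some text I don't know <pre>the text I want to get</pre>"
                + "<pre>Some more text I want to get</pre> some text I don't know";
        // prints [the text I want to get, Some more text I want to get]
        System.out.println(extract(exampleString));
    }
}
```

If the string has no <pre> sections at all, the loop never runs and you simply get an empty list back.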


Assuming there are no nested tags, you can do something like this:

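private List<String> getText(String text) {
    List<String> result = new ArrayList<String>();

    // Everything that follows a "<pre>" becomes its own section
    String[] sections = text.split("<pre>");
    for (String s : sections) {
        // Keep only the part before the closing tag; sections without one are skipped
        int i = s.indexOf("</pre>");
        if (i >= 0) {
            result.add(s.substring(0, i));
        }
    }
    return result;
}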

Here is how the code runs on an example. Say:

text = "test text here <pre> item one </pre> and then another item <pre> item 2 </pre> and then some stuff.";

So the first thing to explain is:

String[] sections = text.split("<pre>");

This declares a new array of strings and assigns it the result of calling the split method on text.

This method breaks the string into sections delimited by "<pre>", so you get (note that the surrounding spaces are kept):

sections[0] = "test text here "
sections[1] = " item one </pre> and then another item "
sections[2] = " item 2 </pre> and then some stuff."

As you can see, all we need to do now is remove everything from "</pre>" onward, which is where the next bit comes in:

for (String s : sections)

This is the start of a "for each" loop that assigns each element of the sections array to the String s in turn.

So for each of the 3 strings above we do this:

int i = s.indexOf("</pre>");
if (i >= 0)
    result.add(s.substring(0, i));

So if the string contains </pre>, we take the substring from the beginning up to the "</pre>" and add it to our result list. Since sections[1] and sections[2] do contain it, their text ends up in the result; sections[0] does not, so it is skipped.
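
If you want to check the walk-through yourself, here is a small self-contained sketch that runs the same split logic on the example string. The class name PreSplitDemo is made up, and the method is static only so it can be called from main. Note that the extracted items keep their leading and trailing spaces; calling trim() on each one is an optional extra if you don't want them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreSplitDemo {
    // Same logic as getText above, made static for the demo.
    static List<String> getText(String text) {
        List<String> result = new ArrayList<>();
        for (String s : text.split("<pre>")) {
            int i = s.indexOf("</pre>");
            if (i >= 0) {
                // Keep only what comes before the closing tag
                result.add(s.substring(0, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "test text here <pre> item one </pre> and then another item "
                + "<pre> item 2 </pre> and then some stuff.";
        // Shows the three sections produced by the split
        System.out.println(Arrays.toString(text.split("<pre>")));
        // Shows the two extracted items, spaces included
        System.out.println(getText(text));
    }
}
```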

I hope this helps?


Here's how I'd implement JavaJuggler's solution to avoid using while (true):

private List<String> getText(String text){
    List<String> result = new ArrayList<String>();

    int indexStart = text.indexOf("<pre>");
    int indexEnd = text.indexOf("</pre>");
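    while (indexStart >= 0 && indexEnd > indexStart) {
        // 5 is "<pre>".length(): skip past the opening tag
        result.add(text.substring(indexStart + 5, indexEnd));
        // 6 is "</pre>".length(): keep only what follows the closing tag
        text = text.substring(indexEnd + 6);
        indexStart = text.indexOf("<pre>");
        indexEnd = text.indexOf("</pre>");
    }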

    return result;
}
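
A possible variation, sketched here with made-up names (PreIndexDemo, OPEN, CLOSE): instead of cutting the string down on every pass, you can keep the original string and use the two-argument indexOf(String, int) to continue searching from a given position. This is my own variant rather than the loop above, and it behaves slightly differently in one case: it always looks for the closing tag after the opening tag it just found.

```java
import java.util.ArrayList;
import java.util.List;

class PreIndexDemo {
    private static final String OPEN = "<pre>";
    private static final String CLOSE = "</pre>";

    // Hypothetical variant: scans the original string without creating shortened copies of it.
    static List<String> getText(String text) {
        List<String> result = new ArrayList<>();
        int start = text.indexOf(OPEN);
        while (start >= 0) {
            int contentStart = start + OPEN.length();
            int end = text.indexOf(CLOSE, contentStart);
            if (end < 0) {
                break; // opening tag without a closing tag: stop
            }
            result.add(text.substring(contentStart, end));
            // Continue the search right after the closing tag
            start = text.indexOf(OPEN, end + CLOSE.length());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [a, b]
        System.out.println(getText("x<pre>a</pre><pre>b</pre>y"));
    }
}
```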
