Get information from HTML by identifying tags

All, I have never done this sort of thing before and am very confused. I have spent hours on Google looking for example code or hints, but whenever I try to work with the available code I get errors and end up more confused than when I started. So enough self-pity and excuses. Let's get to the question.

I have an app that "screen scrapes" a website, right now Google. I have got it to the point where I can display the HTML in a text view. My problem now is getting information out of it. All I want is the content of the title tag ("Google").

I have read about HTML parsing but was very confused by the information on TagSoup and similar libraries. Do I need one of those, or is there some plain Java code I can write to pull out the title tag, make it into a string, and then display it? If I do need an HTML parser, can someone give me some example code? I could not find any on their website. Here is the code I have so far:

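import java.io.IOException;

import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

import android.app.Activity;
import android.os.AsyncTask;
import android.os.Bundle;
import android.view.Menu;
import android.widget.TextView;

public class MainActivity extends Activity {
    // Raw HTML of the fetched page; stays null if the request fails.
    String page;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        // Network access must happen off the UI thread, so fetch in an AsyncTask.
        new AddStringTask().execute();
    }

    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        getMenuInflater().inflate(R.menu.activity_main, menu);
        return true;
    }

    class AddStringTask extends AsyncTask<Void, String, Void> {
        @Override
        protected Void doInBackground(Void... unused) {
            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet("http://www.google.com");
            // BasicResponseHandler returns the response body as a String.
            ResponseHandler<String> resHandler = new BasicResponseHandler();
            try {
                page = httpClient.execute(httpGet, resHandler);
            } catch (ClientProtocolException e) {
                // HTTP protocol error; page is left unset.
                e.printStackTrace();
            } catch (IOException e) {
                // Connection problem or aborted request; page is left unset.
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onProgressUpdate(String... item) {
        }

        @Override
        protected void onPostExecute(Void unused) {
            // Runs on the UI thread once doInBackground has finished.
            TextView google = (TextView) findViewById(R.id.google);
            google.setText(page);
        }
    }
}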

Any sample code would be great because I need to see some examples. Thanks a lot for your time.

Answers


jsoup, a Java HTML parser, should do the trick:

Document doc = Jsoup.connect("http://google.com/").get();
String docTitle = doc.title();

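Then pass docTitle to google.setText(). On Android, make the Jsoup.connect(...) call inside doInBackground, since network access is not allowed on the main thread, and call setText() from onPostExecute.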

You can find more examples on the jsoup website.
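On the question of whether plain Java alone can pull out the title: for this one narrow case it probably can. Below is a minimal sketch, assuming the HTML is already in a String (like the page field in the question). It uses a regular expression instead of a parser, so it is a heuristic and will likely break on unusual markup; TitleExtractor and extractTitle are hypothetical names made up for illustration, not part of jsoup or the Android SDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of jsoup or the Android SDK.
final class TitleExtractor {

    // Matches the first <title ...>...</title> pair, ignoring case and
    // letting '.' span line breaks. A heuristic, not a real HTML parser.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the title text, or null if the input is null or has no title tag.
    static String extractTitle(String html) {
        if (html == null) {
            return null;
        }
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Trim and collapse whitespace so multi-line titles read cleanly.
        return m.group(1).trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Google</title></head><body></body></html>";
        System.out.println(extractTitle(page)); // prints "Google"
    }
}
```

In the question's code this could be called as TitleExtractor.extractTitle(page) in onPostExecute before setText(). For anything beyond a single tag, a real parser such as jsoup is the safer choice.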

