Machine learning example: determine if a website is business or personal

I have a machine learning problem. I am given a long list of domains and have to figure out which are ecommerce websites and which are personal websites. The difficulty is that I have no labeled training data to work with. I have come up with a couple of ideas:

  1. Go through a couple hundred of these websites manually, label each one as business or personal, and build a training set that way (long and boring!).

  2. Crawl these websites and search for keywords such as "Buy Now", "Price", and "Credit Card".
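A minimal sketch of idea 2, assuming the page text has already been fetched by a crawler. The function names and the threshold of 2 are hypothetical choices for illustration; only the three keywords come from the list above.

```python
import re

# Seed phrases taken from idea 2; a real list would likely be longer.
ECOMMERCE_KEYWORDS = ["buy now", "price", "credit card"]


def keyword_score(page_text, keywords=ECOMMERCE_KEYWORDS):
    """Count how many distinct keywords occur in the page (case-insensitive)."""
    text = page_text.lower()
    return sum(
        1
        for kw in keywords
        # Word boundaries keep "price" from matching inside "priceless".
        if re.search(r"\b" + re.escape(kw) + r"\b", text)
    )


def label_page(page_text, threshold=2):
    """Label a page 'ecommerce' if enough keywords appear (threshold is an assumption)."""
    return "ecommerce" if keyword_score(page_text) >= threshold else "personal"
```

Labels produced this way are noisy, but they could serve as a rough starting point for the manual labeling in idea 1.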

Does anybody have any other approaches?

Thanks

Answers


You could adaptively modify your keyword set: as you crawl, any word that correlates highly with the existing keywords can be added to the list.

Peter

P.S. I would add this as a comment, but I don't have enough reputation points...
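One way to read the adaptive keyword idea, as a rough sketch: treat a word as "correlated" if most of the pages containing it also contain at least one existing keyword. The function names, the `min_pages` and `min_ratio` parameters, and the restriction to single-word keywords are all assumptions made for illustration, not part of the answer.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase alphabetic words on a page."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand_keywords(pages, keywords, min_pages=2, min_ratio=0.8):
    """Add words that mostly appear on pages that already contain a keyword.

    pages: iterable of page texts; keywords: single-word seed keywords.
    min_pages and min_ratio are hypothetical tuning knobs.
    """
    keywords = set(keywords)
    page_count = Counter()  # pages containing the word
    co_count = Counter()    # pages containing the word AND some keyword
    for text in pages:
        words = tokenize(text)
        has_keyword = bool(words & keywords)
        for word in words - keywords:
            page_count[word] += 1
            if has_keyword:
                co_count[word] += 1
    new_words = {
        word
        for word, n in page_count.items()
        if n >= min_pages and co_count[word] / n >= min_ratio
    }
    return keywords | new_words
```

Running this repeatedly over newly crawled pages grows the list step by step. Very common words would need to be filtered out in practice, and a mistaken addition can pull in further unrelated words, so the thresholds should stay conservative.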

