How to combine different NLP features for machine learning?

I'm trying to do some KNN learning using different NLP features. For example, I want to use bag-of-words and local POS tags.

For a single feature on its own, I have some idea of how to calculate similarity: for example, cosine similarity over word counts for the bag-of-words vectors, or perhaps Hamming distance for the POS tags.
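The two single-feature measures mentioned above can be sketched in plain Python. This assumes the bag-of-words input is already tokenized into a list of words and that the "local POS tags" are equal-length tag sequences (e.g. a fixed window around the target word); the function names are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the bag-of-words count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only words present in both lists contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)

def hamming_distance(tags_a, tags_b):
    """Number of positions at which two equal-length POS tag sequences differ."""
    if len(tags_a) != len(tags_b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(tags_a, tags_b))

print(cosine_similarity("the cat sat".split(), "the cat ran".split()))
print(hamming_distance(["DT", "NN", "VBD"], ["DT", "NN", "VBZ"]))
```

Note that the two measures point in opposite directions: higher cosine similarity means "closer", while higher Hamming distance means "further apart", which is one reason why combining them directly is awkward.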

However, I don't know how to combine the two. How do people in this area normally do this? Could anyone help me with that?

Thanks in advance.


I would use a simple linear combination of the two measures. Compare the bag-of-words vectors with cosine similarity and the POS tags with Hamming distance separately, rank the candidate labels under each measure, and then combine the two rankings by adding (equivalently, averaging) each label's rank scores. For example, suppose cosine similarity and Hamming distance produce the following ranks:

rank score    cosine    Hamming
1             red       blue
2             blue      yellow
3             yellow    orange
4             orange    red

Then, using the rank scores above (which you can of course change, e.g., to an exponential scale if you want to put more emphasis on the higher-ranked labels), summing each label's two scores gives the following final ranking, where a lower score is better:

label    total score
blue     3
red      5
yellow   5
orange   7
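The rank-sum step above can be reproduced with a short sketch. The rankings are copied from the example table (listed best to worst); the helper names are hypothetical, and ties are broken by first-seen order because Python's `sorted` is stable.

```python
# Example rankings from the table above, ordered best (rank 1) to worst.
cosine_ranking = ["red", "blue", "yellow", "orange"]
hamming_ranking = ["blue", "yellow", "orange", "red"]

def rank_scores(ranking):
    """Map each label to its 1-based rank score."""
    return {label: position for position, label in enumerate(ranking, start=1)}

def combine_rankings(*rankings):
    """Sum each label's rank scores across all rankings (lower total is better)."""
    totals = {}
    for ranking in rankings:
        for label, score in rank_scores(ranking).items():
            totals[label] = totals.get(label, 0) + score
    # Stable sort: labels with equal totals keep their first-seen order.
    return sorted(totals.items(), key=lambda item: item[1])

for label, total in combine_rankings(cosine_ranking, hamming_ranking):
    print(label, total)
# blue 3
# red 5
# yellow 5
# orange 7
```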

So the output label would be blue. In this case the linear combination puts 50% weight on the cosine similarity output and 50% weight on the Hamming distance output. You can experiment with different weights (e.g., 70% cosine, 30% Hamming) to find the best balance between the two measures.
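A weighted version of the same idea can be sketched as follows, assuming each measure's output is a ranking of labels from best to worst and the weights are relative shares such as (0.7, 0.3). The function name `weighted_rank_fusion` is hypothetical.

```python
def weighted_rank_fusion(rankings, weights):
    """Return the label with the lowest weighted sum of rank scores.

    Each ranking lists labels from best (rank 1) to worst.
    """
    if len(rankings) != len(weights):
        raise ValueError("need one weight per ranking")
    totals = {}
    for ranking, weight in zip(rankings, weights):
        for rank, label in enumerate(ranking, start=1):
            totals[label] = totals.get(label, 0.0) + weight * rank
    return min(totals, key=totals.get)  # lower weighted score is better

# Example rankings from the answer above.
cosine_ranking = ["red", "blue", "yellow", "orange"]
hamming_ranking = ["blue", "yellow", "orange", "red"]

for w_cos in (0.5, 0.7, 0.9):
    best = weighted_rank_fusion([cosine_ranking, hamming_ranking], [w_cos, 1 - w_cos])
    print(f"cosine weight {w_cos:.1f}: {best}")
```

With these example rankings, blue has weighted score 1 + w and red has 4 - 3w (for cosine weight w), so blue stays the winner until the cosine weight exceeds 0.75, after which red wins. This shows how sensitive the final label can be to the choice of weights.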
