Kolmogorov-Smirnov or a Chi-Square test for a distribution?

I fitted a negative binomial distribution to my discrete data. As a final step, it looks like I need to perform a Kolmogorov-Smirnov test to determine whether the model fits the data well. All the references I could find talk about using the test for normally distributed continuous data. Can someone tell me whether this can be done in R for data that are discrete and not normally distributed? (I'm guessing even a chi-square test would do, but please correct me if I'm wrong.)

UPDATE:

So I found that the vcd package contains a function goodfit that can be used for this purpose in the following way:

library(vcd)

# Define the data
data <- c(67, 81, 93, 65, 18, 44, 31, 103, 64, 19, 27, 57, 63, 25, 22, 150,
          31, 58, 93, 6, 86, 43, 17, 9, 78, 23, 75, 28, 37, 23, 108, 14, 137,
          69, 58, 81, 62, 25, 54, 57, 65, 72, 17, 22, 170, 95, 38, 33, 34, 68,
          38, 117, 28, 17, 19, 25, 24, 15, 103, 31, 33, 77, 38, 8, 48, 32, 48,
          26, 63, 16, 70, 87, 31, 36, 31, 38, 91, 117, 16, 40, 7, 26, 15, 89,
          67, 7, 39, 33, 58)

gf <- goodfit(data, type = "nbinomial", method = "MinChisq") 
plot(gf)

But after the gf <- ... step, R complains saying:

Warning messages:
1: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
2: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced
3: In pnbinom(q, size, prob, lower.tail, log.p) : NaNs produced

and when I call plot(gf) it complains:

Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' is a list, but does not have components 'x' and 'y'

I am not sure what is happening because if I set data to be the following:

data <- rnbinom(200, size = 1.5, prob = 0.8)

everything works fine. Any suggestions?

Answers


A KS test is for continuous variables only, and you have to fully specify the distribution you are testing against. If you still wanted to do it, it would look like this:

ks.test(data, pnbinom, size=100, prob=0.8)

It compares the empirical cumulative distribution function of data against the specified one (whether that makes sense probably depends on your data). You would have to choose the size and prob parameters based on theoretical considerations; the test is not valid if you estimate those parameters from the data.
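To make the comparison concrete, here is a minimal sketch of the largest gap between the empirical CDF and a fully specified negative binomial CDF, evaluated on the integer support. The simulated sample and the parameter values are assumptions for illustration only, and the number it yields may differ from the statistic ks.test() reports, since ks.test() treats the reference distribution as continuous.

```r
set.seed(1)

# Simulated stand-in data (assumption): counts drawn from a known negative binomial
x <- rnbinom(200, size = 1.5, prob = 0.8)

# Empirical CDF of the sample, returned as a step function
Fn <- ecdf(x)

# Both CDFs are step functions that only jump at non-negative integers,
# so the largest absolute gap can be found on the grid 0, 1, ..., max(x)
k <- 0:max(x)
D <- max(abs(Fn(k) - pnbinom(k, size = 1.5, prob = 0.8)))
D
```

Because size and prob are fixed in advance here rather than estimated from x, this matches the "fully specified" setting described above.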

Your problem with goodfit() might have to do with your data. Are you sure these are counts? barplot(table(data)) does not look like it approximately follows a negative binomial distribution. Compare it, for example, with barplot(table(rnbinom(200, size = 1.5, prob = 0.8))).
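If you want to try a chi-square check without goodfit(), one possible sketch is to fit the negative binomial by maximum likelihood with MASS::fitdistr() and compare binned observed counts with the expected ones. The simulated data, the choice of six classes, and the degrees-of-freedom correction are assumptions for illustration. The p-value is only approximate because the parameters are estimated from the same data.

```r
library(MASS)  # recommended package shipped with R; provides fitdistr()

set.seed(42)
# Simulated stand-in for the counts in the question (assumption, not the real data)
x <- rnbinom(300, size = 2, mu = 50)

# Maximum-likelihood fit, parametrised by size and mean mu
fit <- suppressWarnings(fitdistr(x, "negative binomial"))
size_hat <- fit$estimate[["size"]]
mu_hat   <- fit$estimate[["mu"]]

# Six classes with roughly equal probability under the fitted model
inner  <- unique(qnbinom((1:5) / 6, size = size_hat, mu = mu_hat))
breaks <- c(-1, inner, Inf)

# Observed counts per class, intervals of the form (lower, upper]
observed <- as.vector(table(cut(x, breaks)))

# Expected class probabilities from the fitted CDF (0 below the support, 1 at infinity)
probs    <- diff(c(0, pnbinom(inner, size = size_hat, mu = mu_hat), 1))
expected <- length(x) * probs

# Pearson statistic; subtract 1 plus the two estimated parameters from the class count
chisq   <- sum((observed - expected)^2 / expected)
df      <- length(observed) - 1 - 2
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

c(chisq = chisq, df = df, p_value = p_value)
```

Replacing x with your own vector lets you see whether the binned counts are far from what the fitted model expects. With long-tailed data it is worth checking that no class has a very small expected count.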

Finally, I'm not sure if the approach of doing a null-hypothesis test after fitting is appropriate. You may want to look into quantitative fit measures beyond / based on $\chi^2$ of which there are many (RMSEA, SRMR, ...).

