Novice SQL query question for a movie ratings database

I have a database with one table, like so:

UserID (int), MovieID (int), Rating (real)

The UserIDs and MovieIDs are large numbers, but my database only has a sample of the many possible values (4000 unique users and 3000 unique movies).

I am going to do a matrix SVD (singular value decomposition) on it, so I want to return this database as an ordered array. Basically, I want to return each user in order and, for each user, each movie in order, along with the rating for that user/movie pair, or null if that user did not rate that particular movie. For example:

USERID | MOVIEID | RATING
-------+---------+-------
99835  | 8847874 | 4
99835  | 8994385 | 3
99835  | 9001934 | null
99835  | 3235524 | 2
...    | ...     | ...
109834 | 8847874 | null
109834 | 8994385 | 1
109834 | 9001934 | null

etc.

This way, I can simply read these results into a two-dimensional array suitable for my SVD algorithm. (Any other suggestions for getting data from a database into a simple two-dimensional array of floats would be appreciated.)
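As a sketch of the "read these results into a two-dimensional array" step: if the ordered result contains exactly one row per (user, movie) pair, then every consecutive run of n_movies rows is one user's row of the matrix. The Python below assumes the rows arrive as (UserID, MovieID, Rating) tuples already sorted by UserID, MovieID, with SQL nulls as None; the function name rows_to_matrix is made up for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, int, Optional[float]]


def rows_to_matrix(rows: Iterable[Row], n_movies: int) -> List[List[Optional[float]]]:
    """Hypothetical helper: group a stream sorted by (UserID, MovieID) that
    holds one row per user/movie pair into one list of n_movies ratings per user."""
    matrix: List[List[Optional[float]]] = []
    current: List[Optional[float]] = []
    for _user_id, _movie_id, rating in rows:
        current.append(rating)
        if len(current) == n_movies:  # finished one user's row
            matrix.append(current)
            current = []
    if current:
        # A partial row means the stream was not a complete user x movie grid.
        raise ValueError("row count is not a multiple of n_movies")
    return matrix
```

For example, feeding it [(1, 10, 4.0), (1, 20, None), (2, 10, None), (2, 20, 1.0)] with n_movies=2 gives [[4.0, None], [None, 1.0]].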

It is important that the results come back in order, so that once I have my two-dimensional array I can map the values back to their respective users and movies for my analysis.
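Because the IDs are large, sparse numbers, an ID cannot be used directly as an array index. One way to keep the remapping possible (a sketch; build_index is a hypothetical helper, not part of any library) is to keep the sorted list of distinct IDs for each axis: row i of the matrix is user_ids[i], column j is movie_ids[j], and a dict gives the reverse lookup from ID to position. Python's sort on integers gives the same ascending order as ORDER BY on an integer column, so the positions line up with the query's ordering.

```python
from typing import Dict, Iterable, List, Tuple


def build_index(ids: Iterable[int]) -> Tuple[List[int], Dict[int, int]]:
    """Hypothetical helper: return (sorted distinct IDs, ID -> array position)
    for one axis of the matrix."""
    ordered = sorted(set(ids))
    return ordered, {id_: pos for pos, id_ in enumerate(ordered)}


# Usage sketch: position -> ID via the list, ID -> position via the dict.
user_ids, user_pos = build_index([109834, 99835, 99835])
```

Here user_ids is [99835, 109834] and user_pos[109834] is 1, so matrix row 1 belongs to user 109834.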

Answers


SELECT m.UserID, m.MovieID, r.Rating
    FROM (SELECT a.UserID, b.MovieID
              FROM (SELECT DISTINCT UserID FROM Ratings) AS a
                   CROSS JOIN
                   (SELECT DISTINCT MovieID FROM Ratings) AS b
         ) AS m
         LEFT OUTER JOIN Ratings AS r
         ON (m.MovieID = r.MovieID AND m.UserID = r.UserID)
    ORDER BY m.UserID, m.MovieID;

Now tested and it seems to work!

The idea is to create the Cartesian product of the list of UserID values in the Ratings table with the list of MovieID values in the Ratings table (ouch!), and then do an outer join of that complete matrix with the Ratings table (again) to collect the rating values. Pairs with no matching rating survive the LEFT OUTER JOIN with a NULL Rating.
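A quick way to see the query's behaviour, assuming Python's built-in sqlite3 module and a tiny made-up sample (two users, three movies, four ratings) in a table named Ratings as the answer assumes. The cross product is spelled as an explicit CROSS JOIN, which is equivalent to the comma-separated FROM list. Every user/movie pair comes back, and the unrated ones carry None (SQL NULL) as the rating.

```python
import sqlite3

# The answer's query: cross product of distinct users and movies,
# LEFT OUTER JOINed back to Ratings so missing pairs get a NULL rating.
DENSE_QUERY = """
SELECT m.UserID, m.MovieID, r.Rating
    FROM (SELECT a.UserID, b.MovieID
              FROM (SELECT DISTINCT UserID FROM Ratings) AS a
                   CROSS JOIN
                   (SELECT DISTINCT MovieID FROM Ratings) AS b
         ) AS m
         LEFT OUTER JOIN Ratings AS r
         ON (m.MovieID = r.MovieID AND m.UserID = r.UserID)
    ORDER BY m.UserID, m.MovieID
"""


def dense_rows(conn: sqlite3.Connection):
    """Return every (UserID, MovieID, Rating) pair, sorted, with None where unrated."""
    return conn.execute(DENSE_QUERY).fetchall()


# Made-up sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ratings (UserID INTEGER, MovieID INTEGER, Rating REAL)")
conn.executemany(
    "INSERT INTO Ratings VALUES (?, ?, ?)",
    [(99835, 8847874, 4.0), (99835, 8994385, 3.0),
     (109834, 8994385, 1.0), (109834, 9001934, 5.0)],
)
for row in dense_rows(conn):
    print(row)
# (99835, 8847874, 4.0)
# (99835, 8994385, 3.0)
# (99835, 9001934, None)
# (109834, 8847874, None)
# (109834, 8994385, 1.0)
# (109834, 9001934, 5.0)
```

Six rows come back for only four ratings; with 4000 users and 3000 movies the same query returns 12 million rows.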

This is NOT efficient.

It might be effective.

You might do better, though, to just run a plain SELECT of the data and populate the arrays as the data arrives. With many thousands of users and movies, the query above returns many millions of rows (4000 × 3000 = 12 million for your data), and most of them are going to contain nulls. Treat the incoming data as a description of a sparse matrix instead: first set the matrix in your program to all zeroes (or some other default value), then read the stream from the database and set just the cells that are actually present.

That query is basically trivial:

SELECT UserID, MovieID, Rating
    FROM Ratings
    ORDER BY UserID, MovieID;
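The sparse approach might look like this in Python (again a sketch against sqlite3 with made-up data; load_matrix and the default of 0.0 are assumptions, matching the "all zeroes or other default value" suggestion). Two DISTINCT queries fix the row and column order, then the plain query only touches cells that actually have a rating. Because each cell is located through the ID-to-position dicts, the ORDER BY on the plain query is not strictly needed here.

```python
import sqlite3
from typing import List, Tuple


def load_matrix(conn: sqlite3.Connection,
                default: float = 0.0) -> Tuple[List[int], List[int], List[List[float]]]:
    """Hypothetical helper: fill a dense users x movies matrix from the sparse rows."""
    # Sorted distinct IDs define which row/column belongs to which user/movie.
    user_ids = [r[0] for r in conn.execute(
        "SELECT DISTINCT UserID FROM Ratings ORDER BY UserID")]
    movie_ids = [r[0] for r in conn.execute(
        "SELECT DISTINCT MovieID FROM Ratings ORDER BY MovieID")]
    user_pos = {uid: i for i, uid in enumerate(user_ids)}
    movie_pos = {mid: j for j, mid in enumerate(movie_ids)}

    # Start with every cell at the default value...
    matrix = [[default] * len(movie_ids) for _ in user_ids]
    # ...then overwrite only the cells that actually have a rating.
    for uid, mid, rating in conn.execute(
            "SELECT UserID, MovieID, Rating FROM Ratings ORDER BY UserID, MovieID"):
        matrix[user_pos[uid]][movie_pos[mid]] = rating
    return user_ids, movie_ids, matrix


# Made-up sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ratings (UserID INTEGER, MovieID INTEGER, Rating REAL)")
conn.executemany(
    "INSERT INTO Ratings VALUES (?, ?, ?)",
    [(99835, 8847874, 4.0), (99835, 8994385, 3.0),
     (109834, 8994385, 1.0), (109834, 9001934, 5.0)],
)
user_ids, movie_ids, matrix = load_matrix(conn)
# matrix == [[4.0, 3.0, 0.0], [0.0, 1.0, 5.0]]
```

Only the ratings that exist are transferred (plus the two distinct-ID lists), instead of one row per user/movie pair.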
