Is there any gain from storing comments and replies on separate tables instead of a unary relation on a comment table?

Choice 1: comments {commentid,replyto,comment} //replyto will be null on many posts

Choice 2: comments {commentid,comment} replies {replyid, replyto, reply}

It looks like a matter of choice rather than linear benefit analysis at the moment.


In fact this is not 'only' matter of choice, but aware decision. Relational databases are not good at solving problems of hierarchical nature. There were tons of discussions, articles, and even books about that, so lets narrow the problem to your case.

The second choice would work fine ONLY if you were to allow replies to comments, and not to replies itself, thus this would be a tree with maximum 2 levels. That might be ok, but if you were to do that better solution would be to place everything with COMMENTS table, and add two columns: THREAD_ID (all the comments with the same THREAD_ID would belong to same thread), SEQ_NUM (or simply DATE would tell us which comment was first). Similar way of organising comments is implemented here on SO.

The first choice is quite simple and generic - but implements recurention with all its cons. Lets stop a bit and think... note that we are actually NOT building a tree, but a 'forest'. We will have many commen threads and every single thread will be a separate tree - relatively small amount of data to organise. In that case I would add a THREAD_ID column to COMMENTS table and use only that table (it would be also good to set an composite index on COMMENTS table containing THREAD_ID and COMMENTID columns - in exactly that order).

So upon above I would choose "choice 1".

Next decision should be about where to do the processing and comment tree construction? I would just get all the comments from the table an organise them on a controller (MVC) side, i.e. JAVA or C++. Traversing the list of comments and building the tree in Main Memory (using objects and pointers or hash tables) would be an easy thing. It is a good option also because small amount of nodes (comments and replies within one thread).

The first option looks like a simple one, but the problem is that you're building a tree-structure in SQL. and SQL does not support hierarchical data.

Not recommended - ever

TABLE comment
id unsigned integer auto_increment primary key,
reply_to unsigned integer, 
comment text,
foreign key FK_comment_reply_to(reply_to) references 

Recommended - if you want a tree 2 levels deep

If you build it using 2 tables

TABLE main_post
id unsigned integer auto_increment primary key,
body text,

TABLE reply
id unsigned integer auto_increment primary key,
reply_to unsigned integer, 
body text,
foreign key FK_reply_reply_to(reply_to) references 

Then you are building a much simpler structure that can be easily queried in SQL because the tree is only 1 level deep. For this reason I'd recommend choice number 2.

Alternatives for deeper trees

If you want a hierarchical structure I'd look at nested sets insteads, see:

Need Your Help

Complexity Classes for a List

c++ arrays list vector

Why is it that for removing something from an array or vector is O(n) complex but when removing something from a list it is O(1) complexity?

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.