Storing Searching Records Based on Multiple Conditions
I have a table of people quite standard stuff e.g. :
CREATE TABLE [Contact]( [ContactID] [bigint] IDENTITY(1,1) NOT NULL, [ContactType] [nvarchar](50) NULL, [Forename] [nvarchar](60) NULL, [Surname] [nvarchar](60) NULL, [Company] [nvarchar](60) NULL } Example Data : 01, "Student", "Bob", "Smith", Blank 02, "Staff", "Robert", "Smithe", "Roberts And Sons" Etc
This table contains all the fields common to all contacts. However i have some "types" of contact that may or may not have a field which is specic only to that type of contact. For example if the record has "ContactType='student'" i want to store an extra field called "studentid". There are many different types of contact each with slightly different field requirements. To add to this situation at somepoint in the future each contact type might have extra fields added.
If i add each field to the contacts table i would end up with lots of fields which are not required for 99% of the records. So i was planning on creating a second table like this:
CREATE TABLE [ContactMetaData]( [ContactID] [bigint] NOT NULL, [PropName] [nvarchar](200) NOT NULL, [PropData] [nvarchar](200) NULL ) Example Data: 01, "StudentID", "0123456" 01, "CourseName", "IT" 01, "Average", "10" 02, "Ranking", "22" 02, "ProductTypes", "IT Equipment" ETC
For each extra field i just add a record into this table with the name and value for the field. I can use code to pull this information up etc.
My question is
Is this the best approach as i'm stumped of another way other than a huge table with every single field. Given this approach is it possible to do complex querys across many of the property fields and if so how? e.g. how would i list all the students on the "IT" course with an "Average" of 10 whos "forename" starts with "D"?
Yes, if you really have many many fields that are not used, go for this approach (called an EAV data structure). You can do all queries on this structure that you can with a normal model, just use appropriate joins to pivot them.
Your proposed property-based approach is reasonable, especially if there is a need to create new contact types and properties at run-time without changing the schema.
If the set of contact types and properties is fixed at run-time, then you might alternatively consider creating a subsidiary table for each type. That is, create tables like StudentInfo, StaffInfo, etc. If you did this, you would no longer need the ContactType field as it would be implicit in the presence of a row in the corresponding subsidiary table. This model would also handle awkward situations where someone falls into two categories, e.g. a student who is also staff.
Ignoring all that, however, let's look at your second question about querying the property tables. You can perform such queries. Your example would be answered by this query:
SELECT * FROM Contact AS c INNER JOIN ContactMetaData AS crs ON crs.ContactId = c.ContactId AND crs.PropName = 'CourseName' AND crs.PropData = 'IT' INNER JOIN ContactMetaData AS av ON av.ContactId = c.ContactId AND av.PropName = 'Average' AND av.PropData = 10 WHERE c.Forename LIKE 'D%'
The trick is to join to the ContactMetaData table multiple times, once for each custom property you wish to test.
A problem with this style of query is that it will almost certainly have to be generated at run-time -- after all, the set of properties is dynamic at run-time. You can avoid this by taking a different approach to expressing the query:
DECLARE @propertyCriteria TABLE ( PropName NVARCHAR(200) NOT NULL, PropData NVARCHAR(200) NULL ) INSERT INTO @propertyCriteria VALUES ('CourseName', 'IT') INSERT INTO @propertyCriteria VALUES ('Average', '10') SELECT * FROM Contact AS c WHERE c.Forename LIKE 'D%' AND NOT EXISTS ( SELECT * FROM @propertyCriteria AS crit LEFT JOIN ContactMetaData AS meta ON meta.ContactId = c.ContactId AND meta.PropName = crit.PropName AND meta.PropData = crit.PropData WHERE meta.ContactId IS NULL )
This has the advantage that the query is now fixed at run-time as the dynamic property criteria are determined by what is inserted into the temporary table @propertyCriteria. The disadvantages are that there is now a need for the temporary table, and that the query will not perform as well as it did with inner joins (although you probably won't notice the difference if your database only has 50,000 records in it). Also note that this second approach only works for criteria which are ANDed together. If you want ORs, the solution becomes even more complex.
Incidentally, if you are going to allow NULLs in both the property values and the criteria, don't forget to account for the fact that comparison operators involving NULLs always return false (i.e. NULL = NULL is false, NULL <> NULL is false, etc).