Efficiently querying MSSQL database

I have been given the task of getting some data from a MSSQL database. I am not the DB owner and I do not have the ability to make any changes or add any indices or anything. I have to work with what I have. (I think the DB designer was on drugs.)

The DB is accessed via a python script, but I will show pseudo code here as it's the SQL that important.

For this there are 5 items of data, let's call them A, B, C, D, and RecipeInstance. In the database, A, B, C, and D are concatenated and stored in a single column as A@B@C@D. There is a one to many relationship between 'A@B@C@D' and RecipeInstance.

My 2 tasks are:

1) Given A, B, C, and D get all the recipes

This is easy enough conceptually, but my query is very slow. Here's my query for this:

SELECT PDEName as recipe
FROM RecipeInstance
WHERE PdeInstanceId
IN (SELECT DISTINCT PdeInstanceId FROM RecipeTableValue WHERE CellValue
IN (SELECT DISTINCT PDEName FROM RunInstance WHERE PdeInstanceId
IN (SELECT PdeInstanceId FROM RunTableValue WHERE CellValue = 'A@B@C@D')))

This query takes 16 seconds. I really need to make it faster. I tried breaking it down into 4 seperate queries, but together they still took 16 seconds. There are no useful indices on these tables, and I cannot create any. Can anyone think of anyway to make this faster?

2) Given A, B, C, and Recipe get D

This is more complicaed, since there's no relationship back from RecipeInstance to TargetInstance where D is. Here is what I came up with:

select PdeName as TargetPdeName
FROM TargetInstance
WHERE PdeName like 'A@B@C@%'

# this query returns between 20,000 and 40,000 rows

foreach TargetPdeName returned from the above query
    SELECT PDEName as RecipePdeName
    FROM RecipeInstance
    WHERE PdeInstanceId
    IN (SELECT DISTINCT PdeInstanceId FROM RecipeTableValue WHERE CellValue
    IN (SELECT DISTINCT PDEName FROM RunInstance WHERE PdeInstanceId
    IN (SELECT PdeInstanceId FROM RunTableValue WHERE CellValue = TargetPdeName)))

    if RecipePdeName == Recipe:
        # this is the one we want
        (a, b, c, d) = TargetPdeName.split('@')
        return d

So the problem here is obviously that I have to run tens of thousands of queries, each one taking 16 seconds. Can anyone see how I can traverse this relationship backwards in an efficient manner?

Answers


Below are JOIN and EXISTS queries. Try both and let us know how they run.

1)

JOIN version

SELECT DISTINCT reci.PDEName as recipe
FROM RecipeInstance reci
JOIN RecipeTableValue rectv ON reci.PdeInstanceId = rectv.PdeInstanceId
JOIN RunInstance runi ON rectv.CellValue = runi.PDEName
JOIN RunTableValue runtv ON runi.PdeInstanceId = runtv.PdeInstanceId 
WHERE runtv.CellValue = 'A@B@C@D'

EXISTS version

SELECT PDEName as recipe
FROM RecipeInstance reci
WHERE EXISTS (
    SELECT * FROM RecipeTableValue rectv 
    WHERE rectv.PdeInstanceId = reci.PdeInstanceId
    AND EXISTS (
        SELECT * FROM RunInstance runi 
        WHERE runi.PDEName = rectv.CellValue
        AND EXISTS (
            SELECT * FROM RunTableValue runtv 
            WHERE runi.PdeInstanceId = runtv.PdeInstanceId
            AND CellValue = 'A@B@C@D'            
        )
    )
)

2) EDIT: To split ti.PdeName by @ and extract the last value you'll need to define your own function. See How do I split a string so I can access item x

JOIN version

SELECT DISTINCT ti.PdeName
FROM RecipeInstance reci
JOIN RecipeTableValue rectv ON reci.PdeInstanceId = rectv.PdeInstanceId
JOIN RunInstance runi ON rectv.CellValue = runi.PDEName
JOIN RunTableValue runtv ON runi.PdeInstanceId = runtv.PdeInstanceId 
JOIN TargetInstance ti ON runtv.CellValue = ti.PdeName
WHERE reci.PDEName = "MyRecipe"

EXISTS version

SELECT ti.PdeName
FROM TargetInstance ti
WHERE EXISTS (
    SELECT * FROM RunTableValue runtv
    WHERE runtv.CellValue = ti.PdeName
    AND EXISTS (
        SELECT * FROM RunInstance runi
        WHERE runi.PdeInstanceId = runtv.PdeInstanceId 
        AND EXISTS (
            SELECT * FROM RecipeTableValue rectv
            WHERE rectv.CellValue = runi.PDEName
            AND EXISTS (
                SELECT * FROM RecipeInstance reci
                WHERE reci.PdeInstanceId = rectv.PdeInstanceId
                AND reci.PDEName = "MyRecipe"
            )
        )
    )
)

Need Your Help

np.average not working when data is missing when using pandas groupby

python numpy pandas weighted-average

I'm getting stuck at computing weighted averages with pandas groupby and numpy's np.average. The problem seems to be the missings in the data (that is missings; in the data, not in the weigths). I ...

Copying data from one sqlite db to another

database sqlite sqlite3

I have 2 sqlite databases with common data but with different purposes and I wanted to avoid reinserting data, so I was wondering if it was possible to copy a whole table from one DB to another. I ...

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.