# Efficiently querying MSSQL database

I have been given the task of getting some data from a MSSQL database. I am not the DB owner and I do not have the ability to make any changes or add any indices or anything. I have to work with what I have. (I think the DB designer was on drugs.)

The DB is accessed via a python script, but I will show pseudo code here as it's the SQL that important.

For this there are 5 items of data, let's call them A, B, C, D, and RecipeInstance. In the database, A, B, C, and D are concatenated and stored in a single column as A@B@C@D. There is a one to many relationship between 'A@B@C@D' and RecipeInstance.

My 2 tasks are:

1) Given A, B, C, and D get all the recipes

This is easy enough conceptually, but my query is very slow. Here's my query for this:

SELECT PDEName as recipe FROM RecipeInstance WHERE PdeInstanceId IN (SELECT DISTINCT PdeInstanceId FROM RecipeTableValue WHERE CellValue IN (SELECT DISTINCT PDEName FROM RunInstance WHERE PdeInstanceId IN (SELECT PdeInstanceId FROM RunTableValue WHERE CellValue = 'A@B@C@D')))

This query takes 16 seconds. I really need to make it faster. I tried breaking it down into 4 seperate queries, but together they still took 16 seconds. There are no useful indices on these tables, and I cannot create any. Can anyone think of anyway to make this faster?

2) Given A, B, C, and Recipe get D

This is more complicaed, since there's no relationship back from RecipeInstance to TargetInstance where D is. Here is what I came up with:

select PdeName as TargetPdeName FROM TargetInstance WHERE PdeName like 'A@B@C@%' # this query returns between 20,000 and 40,000 rows foreach TargetPdeName returned from the above query SELECT PDEName as RecipePdeName FROM RecipeInstance WHERE PdeInstanceId IN (SELECT DISTINCT PdeInstanceId FROM RecipeTableValue WHERE CellValue IN (SELECT DISTINCT PDEName FROM RunInstance WHERE PdeInstanceId IN (SELECT PdeInstanceId FROM RunTableValue WHERE CellValue = TargetPdeName))) if RecipePdeName == Recipe: # this is the one we want (a, b, c, d) = TargetPdeName.split('@') return d

So the problem here is obviously that I have to run tens of thousands of queries, each one taking 16 seconds. Can anyone see how I can traverse this relationship backwards in an efficient manner?

## Answers

Below are JOIN and EXISTS queries. Try both and let us know how they run.

1)

JOIN version

SELECT DISTINCT reci.PDEName as recipe FROM RecipeInstance reci JOIN RecipeTableValue rectv ON reci.PdeInstanceId = rectv.PdeInstanceId JOIN RunInstance runi ON rectv.CellValue = runi.PDEName JOIN RunTableValue runtv ON runi.PdeInstanceId = runtv.PdeInstanceId WHERE runtv.CellValue = 'A@B@C@D'

EXISTS version

SELECT PDEName as recipe FROM RecipeInstance reci WHERE EXISTS ( SELECT * FROM RecipeTableValue rectv WHERE rectv.PdeInstanceId = reci.PdeInstanceId AND EXISTS ( SELECT * FROM RunInstance runi WHERE runi.PDEName = rectv.CellValue AND EXISTS ( SELECT * FROM RunTableValue runtv WHERE runi.PdeInstanceId = runtv.PdeInstanceId AND CellValue = 'A@B@C@D' ) ) )

2) **EDIT:** To split ti.PdeName by @ and extract the last value you'll need to define your own function. See How do I split a string so I can access item x

JOIN version

SELECT DISTINCT ti.PdeName FROM RecipeInstance reci JOIN RecipeTableValue rectv ON reci.PdeInstanceId = rectv.PdeInstanceId JOIN RunInstance runi ON rectv.CellValue = runi.PDEName JOIN RunTableValue runtv ON runi.PdeInstanceId = runtv.PdeInstanceId JOIN TargetInstance ti ON runtv.CellValue = ti.PdeName WHERE reci.PDEName = "MyRecipe"

EXISTS version

SELECT ti.PdeName FROM TargetInstance ti WHERE EXISTS ( SELECT * FROM RunTableValue runtv WHERE runtv.CellValue = ti.PdeName AND EXISTS ( SELECT * FROM RunInstance runi WHERE runi.PdeInstanceId = runtv.PdeInstanceId AND EXISTS ( SELECT * FROM RecipeTableValue rectv WHERE rectv.CellValue = runi.PDEName AND EXISTS ( SELECT * FROM RecipeInstance reci WHERE reci.PdeInstanceId = rectv.PdeInstanceId AND reci.PDEName = "MyRecipe" ) ) ) )