Topological order for a dependency subgraph

I'm looking for a variation of the standard topological sort algorithm which operates on a subset of nodes.

Consider a graph of labeled nodes with three types of directed edges: "depends on", "before", and "after".

The function I want accepts a subset of nodes and returns a linear ordering. The linear ordering obeys the "before" and "after" constraints, as well as treats "depends on" as "before" constraints. The nodes in the linear ordering should be a superset of the input nodes, such that dependencies are included.

Example graph:

A depends on B
B depends on C
D before C
E after C

X after Y can trivially be rewritten to Y before X

Test Cases:

f({A})     -> [C B A]
f({A D})   -> [D C B A]
f({B D E}) -> [D C B E] or [D C E B]

bonus points: Algorithm can also be configured to force a first and last node in the ordering.


Take the topological sort of the entire graph intersected with the union of the nodes of interest along with their dependencies. In pseudocode:

λ N = A ∩ (N ∪ D)

Where A is the ordered set of the topologically sorted graph, N is the subset of nodes you care about, and D is the dependencies of N. Note that the intersection operator must respect the ordering of A.

Or in Haskell (using numbers for nodes instead of letters, as in your example):

import Data.List (intersect, union)
import Data.Graph (buildG, reachable, topSort)

graph = buildG (0, 4) [(3,2), (2,4), (2,1), (1,0)]

dependencies = buildG (0, 4) [(0, 1), (1, 2)]

ordering = topSort graph

f nodes = ordering `intersect` (nodes `union` deps)
  where deps = concatMap (reachable dependencies) nodes

This assumes that you can specify all of the edges in your graph. Note that you only need to compute the total ordering once, so it should be performant over subsequent calls.

The above code will output:

> f [0]

> f [0, 3]

> f [1, 3, 4]

which matches your above test cases.

If for some reason you can't specify every edge in the graph, but rather just relative constraints, calculate (N ∪ D) as above and apply constraint satisfaction. The naive way to do this would be to try every permutation of those nodes until you find one that satisfies all constraints. Obviously, you can do it much more efficiently than that with even a simple depth-first & back-tracking approaching.

Edit: Depth-first code

Pretty simple. We create a tree of all the permutations of the nodes we care about, and then walk / prune that tree until we find a permutation that satisfies all of our constraints (Note that we append dependencies to constraints, because dependencies are constraints too). All constraints are specified in the form (A, B) which means "A must come after B".

Since we generate the permutations as a tree, rather than a list, we can easily prune large chunks of the search space as soon as a given path prefix violates a constraint.

import Data.Maybe (fromMaybe, isJust)
import Data.List (union, nub, elemIndex, find)
import Data.Tree (unfoldTree, Tree (Node))
import Control.Applicative (liftA2)

dependencies = [(0, 1), (1, 2)]

constraints = [(2, 3), (4, 2)] ++ dependencies

f nodes = search $ permutationsTree $ (deps `union` nodes)
  where deps = nub $ concatMap dependenciesOf nodes

search (Node path children)
  | satisfies && null children = Just path
  | satisfies = fromMaybe Nothing $ find isJust $ map search children
  | otherwise = Nothing
  where satisfies   = all (isSatisfied path) constraints
        constraints = constraintsFor path

constraintsFor xs = filter isApplicable constraints
  where isApplicable (a, b) = (a `elem` xs) && (b `elem` xs)

isSatisfied path (a, b) = fromMaybe False $ liftA2 (>) i1 i2
  where i1 = a `elemIndex` path
        i2 = b `elemIndex` path 

permutationsTree xs = unfoldTree next ([], xs)
  where next (path, xs)      = (path, map (appendTo path) (select xs))
        appendTo path (a, b) = (path ++ [a], b)

select [] = []
select (x:xs) = (x, xs) : map (fmap (x:)) (select xs)

dependenciesOf x = nub $ x : concatMap dependenciesOf deps
  where deps = map snd $ filter (\(a, b) -> a == x) dependencies

Most of the code is fairly straight forward, but here are a couple notes off the top of my head.

  • Computationally, this is vastly more expensive than the previously posted algorithm. Even with a more sophisticated constraint solver, you're unlikely to do better (as there isn't really any pre-computation you can do with constraints like this... at least none that are immediately obvious to me).

  • The 'f' function returns a Maybe, because there may not be a path that meets all of the constraints specified.

  • constraintsFor is responsible for about 43% of the total computation time. It's quite naive. We could do a few things to speed it up:

    1) Once a path satisfies a constraint, appending nodes to it can't make it violate that constraint, but we don't leverage this fact. Instead we just keep re-testing all relevant constraints to a given path, even if the constraint is known to have previously passed.

    2) We do a linear search over the constraints to find which ones are applicable. If, instead, we indexed them to the nodes they apply to, we could speed this up.

    3) Reducing the number of constraints to test would obviously also reduce the isSatisfied calls, which account for about 25% of the computation time.

  • If you were to implement code like this in an strictly executed environment, the code structure would have to be modified a bit. As is, this code relies heavily on the permutations tree being lazy, which allows us to not have to intertwine the searching code with the tree generating code because the search code simply won't go down a path it has deemed unfit.

Finally, if you want find all solutions, rather than only the first one, just change the search body to:

| satisfies && null children = [path]
| satisfies = concatMap search children
| otherwise = []

I didn't spend any time optimizing this code or anything like that, simply because the original algorithm is clearly superior assuming you're able to specify the complete graph (which, I believe, you can).

Need Your Help

Is explicitly clearing/zeroing sensitive variables after use sensible?

c linux security memory-management

I have noticed some programs explicitly zero sensitive memory allocations after use. For example, OpenSSL has a method to clear the memory occupied by an RSA key:

Linq: How can I get a result in a grouping query which is value agnostic and has dynamic columns?

c# linq

I have a table called Foo and I have two columns: Lorem and Ipsum, so the scheme of the table is: