Problem Statement

There exists two very large binary trees, with one tree, `T1` relatively much bigger than the other `T2`. How can we efficiently determine if `T2` is a sub tree of `T1`?

Note: A tree `T2` is a sub tree of `T1` if and only if there is a node in `T1` which when cut off from `T1`, the tree rooted at that node is exactly identical to `T2` in terms of data and structure.

Example

``````T1 =                     T2 =                T3 =
5                 3                   3
/   \              / \                 / \
3     6            2   4               1   4
/ \                /                     \
2   4              1                       2
/
1

T2 is a sub tree of T1 but T3 is not.
``````

Explanation

Before we determine if `T2` is a sub tree of `T1`, let’s ask ourselves if we know how can we determine if a string `S2` is a sub string of a string `S1`? There is a very efficient pattern matching algorithm that exist, it is also one that I am yet to get a good grasp on. It is the Knuth-Morris-Pratt algorithm.

I will cover the `KMP` algorithm at some other time, but for now we can leverage the in built sub string function available in most modern languages.

Okay, now can we reduce the sub tree problem to a sub string problem? Yes! If we traverse `T1` and store it’s result as a string, then all we have to do is traverse `T2`, store it’s result as a string and check if `T2.string` is a sub string of `T1.string`. But how?

Traversals

Recall, there are 3 ways to traverse a tree. In-order, Pre-order and Post-order. Let’s see how these traversal techniques might help us.

In-order

In-order traversal follows the `left-root-right` approach. So for `T1`, the in-order string would be `123456`, for `T2` it would be `1234` and `T3` would be `1234`.

The problem with this is that if the tree is a `Binary Search Tree`, the in-order traversal will always result in the same string, that is in ascending order. Hence, if we used this traversal, we would assert that `T3` is a sub tree of `T1` but by definition it is not!

Pre-order

Pre-order traversal follows the `root-left-right` approach. So for T1, we get `532146`, T2 is `3214` and T3 is `3124`. This works! Or does it? 🤔

According to the definition above, there must be a node in `T1` which when cut off from `T1`, the tree rooted at this node is identical to `T2`. So for the algorithm to proceed, it should find a node in `T1` whose value is equal to the value of the root in `T2`. Which means the root needs to be processed before any other node and that’s exactly what pre-order traversal does!

``````    1       1
/         \
2           2
``````

For the two examples above, the pre-order traversal will yield the same result, i.e, `12`, but by definition they’re not the same.

What if we noted the position of the `NULL` pointer and represented it by `X`? What is the pre-order result now? `12X` and `1X2`.

Using this approach, the pre-order traversals for `T1`, `T2` and `T3` now become `5321X4XX6XX`, `321X4XX` and `31X24XX` respectively. Now we can use our sub string approach and be absolutely sure that we’re getting the correct result.

Time Complexity

Let’s assume `T1` has `N` elements and `T2` has `M` elements. We visit each element in `T1` one time making it a `O(N)` process and we visit each element in `T2` one time making it a `O(M)` process making the traversal a `O(N + M)` process. The KMP algorithm is linear in terms of the size of the string and the pattern to match as well.

Space Complexity

Since we are building and storing strings for the traversal, the space complexity is `O(N + M)` as well.

But T1 and T2 are **very large** trees! Can we afford this extra space? Can we use recursion? Yes!

Recursive Approach

We can solve this problem recursively as well. All we would have to do is start `DFS` on `T1` and find the node that is equivalent to the root of `T2` and then start another recursion to match the two trees starting at that point.

The code to do this as follows:

``````func matchTrees(t1: Node?, t2: Node?) -> Bool {
if t1 == nil && t2 == nil {
return true
}

if t1 == nil || t2 == nil {
return false
}

if t1?.value != t2?.value {
return false
}

return matchTrees(t1: t1?.left, t2: t2?.left) && matchTrees(t1: t1?.right, t2: t2?.right)
}

func findRoot(t1: Node?, t2: Node?) -> Bool {
if t1 == nil {
return false
}

if t1?.value == t2?.value {
return matchTrees(t1: t1, t2: t2)
}

return findRoot(t1: t1?.left, t2: t2) || findRoot(t1: t1?.right, t2: t2)
}

func checkSubTree(t1: Node?, t2: Node?) -> Bool {
if t2 == nil {
return true
}

return findRoot(t1: t1, t2: t2)
}
``````
Time Complexity

With the recursive approach, in the worst case, we may have to visit each and every node of `T1` once to find the root of `T2` in `T1`, which is `O(N)` time. Once the root is found, we traverse `T2` as long as the data in the corresponding pointers match making it `O(N + kM)` where `k` is the number of times the pointers match.

Space Complexity

While we don’t maintain an additinoal data structure, we definitely do build up a recursive stack frame which is definitely smaller than `O(N+M)`.

Solution is hosted on Github.