Problem Statement
There exists two very large binary trees, with one tree, T1
relatively much bigger than the other T2
. How can we efficiently determine if T2
is a sub tree of T1
?
Note: A tree T2
is a sub tree of T1
if and only if there is a node in T1
which when cut off from T1
, the tree rooted at that node is exactly identical to T2
in terms of data and structure.
Example
T1 = T2 = T3 =
5 3 3
/ \ / \ / \
3 6 2 4 1 4
/ \ / \
2 4 1 2
/
1
T2 is a sub tree of T1 but T3 is not.
Explanation
Before we determine if T2
is a sub tree of T1
, let’s ask ourselves if we know how can we determine if a string S2
is a sub string of a string S1
? There is a very efficient pattern matching algorithm that exist, it is also one that I am yet to get a good grasp on. It is the Knuth-Morris-Pratt algorithm.
I will cover the KMP
algorithm at some other time, but for now we can leverage the in built sub string function available in most modern languages.
Okay, now can we reduce the sub tree problem to a sub string problem? Yes! If we traverse T1
and store it’s result as a string, then all we have to do is traverse T2
, store it’s result as a string and check if T2.string
is a sub string of T1.string
. But how?
Traversals
Recall, there are 3 ways to traverse a tree. In-order, Pre-order and Post-order. Let’s see how these traversal techniques might help us.
In-order
In-order traversal follows the left-root-right
approach. So for T1
, the in-order string would be 123456
, for T2
it would be 1234
and T3
would be 1234
.
The problem with this is that if the tree is a Binary Search Tree
, the in-order traversal will always result in the same string, that is in ascending order. Hence, if we used this traversal, we would assert that T3
is a sub tree of T1
but by definition it is not!
Pre-order
Pre-order traversal follows the root-left-right
approach. So for T1, we get 532146
, T2 is 3214
and T3 is 3124
. This works! Or does it? 🤔
According to the definition above, there must be a node in T1
which when cut off from T1
, the tree rooted at this node is identical to T2
. So for the algorithm to proceed, it should find a node in T1
whose value is equal to the value of the root in T2
. Which means the root needs to be processed before any other node and that’s exactly what pre-order traversal does!
1 1
/ \
2 2
For the two examples above, the pre-order traversal will yield the same result, i.e, 12
, but by definition they’re not the same.
What if we noted the position of the NULL
pointer and represented it by X
? What is the pre-order result now? 12X
and 1X2
.
Using this approach, the pre-order traversals for T1
, T2
and T3
now become 5321X4XX6XX
, 321X4XX
and 31X24XX
respectively. Now we can use our sub string approach and be absolutely sure that we’re getting the correct result.
Time Complexity
Let’s assume T1
has N
elements and T2
has M
elements. We visit each element in T1
one time making it a O(N)
process and we visit each element in T2
one time making it a O(M)
process making the traversal a O(N + M)
process. The KMP algorithm is linear in terms of the size of the string and the pattern to match as well.
Space Complexity
Since we are building and storing strings for the traversal, the space complexity is O(N + M)
as well.
But T1 and T2 are **very large** trees! Can we afford this extra space? Can we use recursion? Yes!
Recursive Approach
We can solve this problem recursively as well. All we would have to do is start DFS
on T1
and find the node that is equivalent to the root of T2
and then start another recursion to match the two trees starting at that point.
The code to do this as follows:
func matchTrees(t1: Node?, t2: Node?) -> Bool {
if t1 == nil && t2 == nil {
return true
}
if t1 == nil || t2 == nil {
return false
}
if t1?.value != t2?.value {
return false
}
return matchTrees(t1: t1?.left, t2: t2?.left) && matchTrees(t1: t1?.right, t2: t2?.right)
}
func findRoot(t1: Node?, t2: Node?) -> Bool {
if t1 == nil {
return false
}
if t1?.value == t2?.value {
return matchTrees(t1: t1, t2: t2)
}
return findRoot(t1: t1?.left, t2: t2) || findRoot(t1: t1?.right, t2: t2)
}
func checkSubTree(t1: Node?, t2: Node?) -> Bool {
if t2 == nil {
return true
}
return findRoot(t1: t1, t2: t2)
}
Time Complexity
With the recursive approach, in the worst case, we may have to visit each and every node of T1
once to find the root of T2
in T1
, which is O(N)
time. Once the root is found, we traverse T2
as long as the data in the corresponding pointers match making it O(N + kM)
where k
is the number of times the pointers match.
Space Complexity
While we don’t maintain an additinoal data structure, we definitely do build up a recursive stack frame which is definitely smaller than O(N+M)
.
Solution is hosted on Github.