A subsequence of a given sequence is the given sequence with
some elements (possibly none) left out. Formally, given a sequence
X = (x1, x2, ...,
xm), a sequence Z = (z1,
z2, ..., zk), is a
subsequence of X if there is a strictly increasing
sequence (i1, i2, ...,
ik) of indices of X such that
xij = zj.
For example, a sequence Z = (B, C, D, B) is a subsequence of
X = (A, B, C, B, D, A, B), with corresponding index
sequence (2, 3, 5, 7).
Given two sequences X and Y, a sequence Z is a
common subsequence of X and Y if Z is a
subsequence of both X and Y. A longest common
subsequence (LCS) of X and Y is a subsequence of
X and Y of the maximum length. The
longest-common-subsequence-length problem is defined as follows.
Problem 1 (Longest Common Subseqence Length) [20 points]
Input: two sequences X = (x1,
x2, ..., xm) and
Y = (y1, y2, ...,
yn)
Output: the length of a longest common subsequence of X
and Y
For example, let X = (A, B, C, B, D, A, B) and Y =
(B, D, C, A, B, A). The sequence (B, C, A) is a common subsequence of
X and Y. The sequence (B, C, B, A) is an LCS of X
and Y. The sequence (B, D, A, B) is another LCS. The length of
an LCS of X and Y is 4.
Sample input.
ABCBDAB BDCABAPrompt and read the input strings from standard input. I will test your programs with inputs with no spaces between the characters.
The length of a longest common subsequence of ABCBDAB and BDCABA is 4.Note. You may assume that input sequences are always character sequences. As the above example shows, there are many instances where two given sequences have more than one LCS. However, by the definition, the length of an LCS is clearly unique.
Target running time.
For given sequences X of length m and Y of length
n, your program must run in O(mn) time.
Hints.
First, we define a notation for a prefix of a sequence. For an
arbitrary sequence Z = (z1,
z2, ..., zk) and an index
i <= k, let Zi denote the
sequence (z1, ..., zi)
consisting of the first i elements of Z.
If the last elements of X and Y match (that is,
xm = yn), then an
LCS of X and Y is simply an LCS of
Xm-1 and Yn-1 with
the last matching element attached at the end. Otherwise, the last
elements of X and Y will not both be in the solution
together, and an LCS of X and Y is the longer
of the following two sequences: an LCS of X and
Yn-1 and an LCS of
Xm-1 and Y.
A solution for the method described above involves establishing a recursive procedure. We will use dynamic programming with a two-dimensional array c, where each entry c[i][j] will store the length of an LCS of the subsequences Xi and Yj. First, we note the base case: If either i = 0 or j = 0, one of the sequences has length 0, and the LCS of Xi and Yj thus has length 0. Otherwise, we have the following recursive formulas.
Be careful! The LCS algorithm we've looked at numbers the characters
from 1 to the length of the sequence. This is one off from Java
string indexing! Think through how you will make your indexing scheme
consistent for both your known-values array and your strings. Be sure to
think about the extreme cases of the complete strings and one or both arguments
being the empty string.
Problem 2 (Longest Common Subseqence) [5 points]
Input: two sequences X = (x1,
x2, ..., xm) and
Y = (y1, y2, ...,
yn)
Output: a longest common subsequence of X and Y
HINT: For this problem, it suffices to find one LCS, even in the
case when more than one such sequence exist (such as the example given
above). Build the table as you did for part 1. Look at entry c[m][n].
Given the value and the values in nearby entries along with the associated
input values, can you figure out what values went into computing c[m][n]?
Can you use that information to determine when to add a character to the LCS?
Note that if you work backwards, you will produce the LCS in reverse order;
what do you do to get it printed into forward order?
Target timing.
Your program must run within the same timing bound as before.