Longest Increasing Subsequence in q

Chapter 6.2 of “Algorithms” introduces the Longest Increasing Subsequence problem. I implemented the solution in q as follows (lis_q).

First, we start with the same sequence of numbers used in the book:

q)N:5 2 8 6 3 6 9 7

Next, we find the “edges” of the DAG. This can be done either using a list:

q)f:{x,enlist(y;where y>x[;0])}
q)show ()f/N
5 `int$()
2 `int$()
8 0 1
6 0 1
3 ,1
6 0 1 4
9 0 1 2 3 4 5
7 0 1 3 4 5

or a table:

q)t:([]N:();E:())
q)f:{:x,:(y;where y>x`N)}
q)show t f/N
N E
-------------
5 `int$()
2 `int$()
8 0 1
6 0 1
3 ,1
6 0 1 4
9 0 1 2 3 4 5
7 0 1 3 4 5

I prefer the table as we will be addressing the columns often. Columns in tables are just dictionary entries and access is fast. Also columns can be named, e.g. N for numbers, E for edges, so it makes the code more readable.

Having identified the edges, we can compute the longest increasing subsequence length for each node using dynamic programming. We introduces a new column L for the lengths:

q)t:([]N:();E:();L:())
q)f:{:x,:(y;e;1+max 0,(x`L)e:where y>x`N)}
q)show t f/N
N E           L
---------------
5 `int$()     1
2 `int$()     1
8 0 1         2
6 0 1         2
3 ,1          2
6 0 1 4       3
9 0 1 2 3 4 5 4
7 0 1 3 4 5   4

As we can see, there are two longest increasing subsequences of length 4. To reconstruct the actual subsequences, we need to keep track of the previous node at each node. We introduce yet another column P for the previous nodes:

q)t:([]N:();E:();P:();L:())
q)f:{:x,:(y;e;e v?m;1+m:max 0,v:(x`L)e:where y>x`N)}
q)show t f/N
N E           P L
-----------------
5 `int$()       1
2 `int$()       1
8 0 1         0 2
6 0 1         0 2
3 ,1          1 2
6 0 1 4       4 3
9 0 1 2 3 4 5 5 4
7 0 1 3 4 5   5 4

Observe that we don’t need column E after P and L are computed. So we get rid of it to reduce memory consumption:

q)t:([]N:();P:();L:())
q)f:{:x,:(y;e v?m;1+m:max 0,v:(x`L)e:where y>x`N)}
q)show t f/N
N P L
-----
5   1
2   1
8 0 2
6 0 2
3 1 2
6 4 3
9 5 4
7 5 4

To get the subsequences, we still need to traverse P. This is achieved by:

q)g:{flip reverse -1_(x`P)scan where(x`L)=max x`L}
q)show g t f/N
1 4 5 6
1 4 5 7
q)N g t f/N
2 3 6 9
2 3 6 7

Putting everything together, we see that the Longest Increasing Subsequences problem can be solved using:

lis:{
 t:([]N:();P:();L:());                              / define empty table
 f:{:x,:(y;e v?m;1+m:max 0,v:(x`L)e:where y>x`N)};  / fill up table
 g:{flip reverse -1_(x`P)scan where(x`L)=max x`L};  / traverse nodes
 g t f/x}

The complexity of this algorithm is proportional to the square of the problem size as shown below:

q)\t lis 5000?2000
281
q)\t lis 10000?2000
1042
q)\t lis 20000?2000
4080

Because q is a vector-based language, the lis function obtains several longest increasing subsequences with distinct end-points (if they exist) without additional coding efforts. However it returns the same subset of subsequences at each call because e v?m only returns one prior node.

Finding subsequences randomly

What if we want some other subsequences? To return some other subsequences, we can modify e v?m to e(1?where v=m)0. This returns a random selection of one node from all prior nodes. The code is:

lis_rand:{
 t:([]N:();P:();L:());                                       / empty table
 f:{:x,:(y;e(1?where v=m)0;1+m:max 0,v:(x`L)e:where y>x`N)}; / fill up table
 g:{flip reverse -1_(x`P)scan where(x`L)=max x`L};           / traverse nodes
 g t f/x}

Finding all subsequences

Can we find all subsequences? To do that, the P column must now hold all prior nodes rather than just one, i.e. we use e where v=m. Furthermore, the node traversal function g must be modified to handle list of prior nodes. The code is:

lis_all:{
 t:([]N:();P:();L:());                                       / empty table
 f:{:x,:(y;e where v=m;1+m:max 0,v:(x`L)e:where y>x`N)};     / fill up table
 g:{(m-1){raze y,/:'(x`P)(last')y}[x]/where(x`L)=m:max x`L}; / get all lis
 reverse each g t f/x}

To see all 3 functions in action, we create a random sequence and apply them to it:

q)N:30?20
q)N
14 13 3 8 10 8 13 0 14 13 16 6 19 7 14 9 0 10 18 5 3 5 0 7 6 8 6 16 10 18
q)lis_all N
2  3  4  6  8  10 12
2  3  4  6  8  10 18
2  3  4  6  8  10 29
2  3  4  6  8  27 29
2  3  4  6  14 27 29
2  3  4  9  14 27 29
2  11 13 15 17 27 29
7  11 13 15 17 27 29
7  20 21 23 25 27 29
16 20 21 23 25 27 29
7  20 21 24 25 27 29
16 20 21 24 25 27 29
7  20 21 23 25 28 29
16 20 21 23 25 28 29
7  20 21 24 25 28 29
16 20 21 24 25 28 29
q)lis N
2 3 4 6 8 10 12
2 3 4 6 8 10 18
2 3 4 6 8 10 29
q)lis_rand N
2 3  4  6  8  10 12
2 3  4  6  8  10 18
7 20 21 23 25 28 29
q)lis_rand N
2 3 4 6 8  10 12
2 3 4 6 8  10 18
2 3 4 6 14 27 29

In imperative languages, the coding effort to create lis_all would probably be more complex than lis and lis_rand. In q, the “thinking” effort dominated the solution. Once I understood the consequences of ./:' it was relatively easy to code lis_all and let q handle the complexity of finding all the subsequences.

Footnote:

To understand ./:' try:

q)("abc";"de";"f"),/:'("12";"345";"6789")
("abc1";"abc2")
("de3";"de4";"de5")
("f6";"f7";"f8";"f9")

Due to the possibility of exponential growth, lis_all should be used with care on long sequences. Calling lis_random multiple times followed by distinct may make more sense for long sequences. Lastly, the above algorithm has complexity O(n^2). It is possible to do O(n lg n) – see this Wikipedia entry.

Advertisements
Posted in k&q. Tags: . Leave a Comment »

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: