By default, q’s string search-and-replace function ssr does not handle regular expressions. However, regular-expression support can be added by writing a q extension that uses the PCRE library.
Here is my implementation: re_q-20080714.bz2. To see it at work:
q)sub:`re 2:(`sub;3)
q)s:"quick brown fox"
q)sub[s;"brown";"BROWN"]
"quick BROWN fox"
q)sub[s;"";"|"]
"|q|u|i|c|k| |b|r|o|w|n| |f|o|x|"
q)sub[s;"\\b";"*"]
"*quick* *brown* *fox*"
q)sub[s;"(\\w)(\\w+)";"\\2\\1"]
"uickq rownb oxf"
It can also run faster than ssr. Timing one million calls of each (\t reports milliseconds):
q)\t do[1000000;sub[s;"brown";"BROWN"]]
7141
q)\t do[1000000;ssr[s;"brown";"BROWN"]]
12236
I used PCRE’s C++ interface, pcrecpp, because its GlobalReplace() function already does most of the work.
20080715: Attila pointed out that the speed difference between sub and ssr is not so surprising, since ssr is not a built-in primitive of k; rather, it is defined in q.k as:
ssr:{,/@[x;1+2*!_.5*#x:(0,/(0,+/~+\(>':"["=y)-<':("]"=y))+/:x ss y)_x;$[100>@z;:[;z];z]]}
Related sites
- http://www.pcre.org: The website of the PCRE library
- http://q.o.potam.us: An earlier q extension that uses PCRE. I’m not the first to think of using PCRE in q! :-)
- http://en.wikipedia.org/wiki/Regular_expression: Wikipedia’s entry on regular expressions
- http://docs.python.org/lib/module-re.html: Documentation for Python’s re module
