- matching is defined by recursion over the structure of the regexp (i.e. there is no need to convert the regexp to an automata); and
- it uses the "overloaded strings" extension to allow treating string literals as regexps.
Matching is implemented using continuations that specify what to do with the suffix left-over after matching the string; this leads to a very elegant and self-documented solution.
There is, however, one drawback: the handling of star-closure is inefficient due to the test cs'/=cs to check that a prefix of the string was consumed. The changes needed to implement a more efficient version are left as an exercise for the reader.
There is, however, one drawback: the handling of star-closure is inefficient due to the test cs'/=cs to check that a prefix of the string was consumed. The changes needed to implement a more efficient version are left as an exercise for the reader.
{- ------------------------------------------------------------------ Regular expression matching in Haskell This is a classical example of functional programming with continuations. Adapted from the SML version by Olivier Danvy (from "Defunctionalization at Work", BRICS RS-01-23, 2001). This version uses the "overloaded strings" GHC extension for mixing character literals in regexps. Some examples to try on the GHCi interpreter; (Note: use the command ":set -XOverloadedStrings" to enable overloaded strings for the GHCi session.) > :set -XOverloadedStrings > match (star "ab") "" = True > match (star "ab") "aba" = False > match (star "ab") "abab" = True > match (star "a" <+> star "b") "aaaa" = True > match (star "a" <+> star "b") "aabb" = False > match (star ("a" <+> "b")) "aabb" = True Pedro Vasconcelos, 2012 ------------------------------------------------------------------- -} -- enable the overloaded strings language extension in GHC {-# LANGUAGE OverloadedStrings #-} module RE where -- import the interface for overloading strings import GHC.Exts ( IsString(..) ) -- Abstract syntax of regular expressions; -- literal characters, empty word, empty language -- concatenation, union and Kleene star closure data Regexp = Lit Char | Epsilon | Empty | Concat Regexp Regexp | Union Regexp Regexp | Star Regexp deriving (Show, Read, Eq) -- "Smart" constructors that perform some simplifications -- Infix binary operators for concation and union infixl 7 <> infixl 6 <+> (<>) :: Regexp -> Regexp -> Regexp Empty <> _ = Empty _ <> Empty = Empty Epsilon <> r = r r <> Epsilon = r r <> r' = Concat r r' (<+>) :: Regexp -> Regexp -> Regexp Empty <+> r = r r <+> Empty = r r <+> r' = Union r r' -- Kleene star closure star :: Regexp -> Regexp star Empty = Epsilon star Epsilon = Epsilon star (Star r) = Star r star r = Star r -- coercion of strings to regular expressions -- e.g. fromString "abc" = Lit 'a' <> Lit 'b' <> Lit 'c' <> Epsilon instance IsString Regexp where fromString cs = foldr ((<>) . Lit) Epsilon cs -- Higher-order, continuation-based matcher for regular expressions -- worker function to match a regexp with string -- 3rd argument is the continuation for the non-matched suffix accept :: Regexp -> String -> (String -> Bool) -> Bool accept Empty cs k = False accept Epsilon cs k = k cs accept (Lit c) (c':cs) k = c==c' && k cs accept (Lit c) [] k = False accept (Concat r1 r2) cs k = accept r1 cs (\cs' -> accept r2 cs' k) accept (Union r1 r2) cs k = accept r1 cs k || accept r2 cs k accept (Star r) cs k = accept_star r cs k accept_star r cs k = k cs || accept r cs (\cs' -> cs'/=cs && accept_star r cs' k) -- top-level matcher; initial continuation tests for the empty string match :: Regexp -> String -> Bool match r cs = accept r cs null -- end of file --
No comments:
Post a Comment