


Page history last edited by PBworks 17 years, 5 months ago

Micmatch http://martin.jambon.free.fr/micmatch.html is a library which extends OCaml with a convenient syntax for text manipulation using regexps.


This is a public discussion board about the use and the development of Micmatch.


Micmatch 0.697 is out!

See changes at http://martin.jambon.free.fr/micmatch-changes.txt




Brief history: the Micmatch project was started in 2004. It's been two years now, and no major bug has been found. Several features have been added since then, and the library now seems quite mature.


Plans: release a 1.0 version that supports the upcoming version of Camlp4. There should be no incompatible changes with the previous releases, but some features may become deprecated. The main obstacle to wider adoption of the library is that it relies on Camlp4, which provides too much freedom and not enough guarantees of compatibility between various extensions and between different versions of Camlp4. Hopefully this will be remedied, and eventually a whole lot of Camlp4 extensions will fall into a standard "macro" category that guarantees intercompatibility of all syntax extensions of that class. Besides these syntax and standardization issues, the micmatch_pcre package provides good functionality and supports most of the features of PCRE.


How to make things easier for everyone:

  • keep official support only for PCRE, and change the name to micmatch instead of micmatch_pcre. That means far fewer worries for the developers and for the packagers.
  • guarantee that the extension will be 100% compatible with other syntax extensions. That means a real integration of macros in the OCaml language and everyone using Camlp4 extensions without a fear of the unknown. Some insights from the OCaml development team would be highly appreciated here.



The 1.0 release is planned to include the following features:

  • compatibility with the previous versions
  • a list of deprecated features (which are kept for compatibility purposes)
  • the implementation of the micmatch_pcre macros (SEARCH, MAP, COLLECT, ...) in micmatch_str, possibly with warnings concerning undocumented features of Str, which are unlikely to change in the future. This is not a priority, since support for Str is likely to disappear in future versions unless someone takes care of it.
  • alternate syntax. It would be nice to use a unified syntax for macros, that other syntax extensions use too and which guarantees intercompatibility with any other macro. However, there is no such standard yet.


The idea is to implement these features and then release a 0.99 version first, see if people like it and after a month or so make the 1.0 release. But this is unlikely to happen until the new Camlp4 is out.



Possible developments & experimental features


General-purpose system for extending pattern-matching


Applications would include:

  • matching against lazy values (thus evaluating only what is necessary)
  • regular expressions over arrays or lists
  • views (see below)




Views as described below are now implemented in micmatch 0.697 as an experimental feature. This is open for comments. Add yours at the bottom of this section.


Views are a pattern-matching technique in which data is matched against patterns that do not follow the concrete/physical representation of the data. A simple application of views in OCaml is to provide pattern-matching over object methods or lazy values.


Something along the lines of http://hackage.haskell.org/trac/haskell-prime/wiki/ViewPatterns could be implemented relatively easily using micmatch's code base.


Proposal through examples:


Pattern-matching over object methods


let view XY = fun obj -> try Some (obj#x, obj#y) with _ -> None

(* Test if a list of objects starts with coordinates x=0 and y=0 *)
let test_origin = function
    %XY (0, 0) :: _ -> true
  | _ -> false
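
For readers without Camlp4 at hand, the example above can be approximated in plain OCaml. This is only a sketch: it assumes the view is an ordinary function named view_XY (the naming scheme this page describes for view definitions), and the hand-written match stands in for whatever code micmatch actually generates, which may differ. The objects origin and elsewhere are hypothetical samples.

```ocaml
(* Plain-OCaml approximation of the XY view: an ordinary function
   returning an option. *)
let view_XY obj = try Some (obj#x, obj#y) with _ -> None

(* Hand-written stand-in for the pattern  %XY (0, 0) :: _
   (an assumption, not micmatch's real expansion). *)
let test_origin = function
  | obj :: _ ->
      (match view_XY obj with
       | Some (0, 0) -> true
       | _ -> false)
  | [] -> false

(* Hypothetical sample objects, for illustration only. *)
let origin = object method x = 0 method y = 0 end
let elsewhere = object method x = 3 method y = 4 end
```

With these definitions, test_origin [origin; elsewhere] is true and test_origin [elsewhere; origin] is false.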


Pattern-matching over ranges of numbers


Here we use the previous example and show two things:

  • View constructors without an argument are defined as functions that return a bool, instead of an option type.
  • The main interest of views is that they behave like any pattern. You can compose them to build more complex patterns.


let view Positive = fun x -> x >= 0
let view Negative = fun x -> x <= 0

(* Take an object with methods x and y, and test whether the returned values are positive. *)
let test_positive_coords = function
   %XY (%Positive, %Positive) -> true
 | _ -> false

(* Handle the results of the standard "compare" function, which returns ints: *)
let view Greater = fun x -> x > 0
let view Lower = fun x -> x < 0
let view Equal = fun x -> x = 0
let view GreaterEqual = fun x -> x >= 0
let view LowerEqual = fun x -> x <= 0

(* It is slower than using a simple "if then else", but it's less error-prone since you don't
   need to remember whether -1 means lower or greater, once the views are defined *)
match compare x y with
   %Lower -> ...
 | _ -> ...
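
A rough plain-OCaml equivalent of the compare example, written by hand: argument-less views are treated as predicates returning bool and are applied through "when" guards. The view_ prefix follows the naming scheme described on this page; the function describe and its result strings are hypothetical, and the guards are an assumption about how such a match could be expressed, not the code micmatch emits.

```ocaml
(* Argument-less views are plain predicates returning bool. *)
let view_Lower x = x < 0
let view_Equal x = x = 0
let view_Greater x = x > 0

(* Hand-written stand-in for  match compare x y with %Lower -> ... *)
let describe x y =
  match compare x y with
  | c when view_Lower c -> "lower"
  | c when view_Equal c -> "equal"
  | c when view_Greater c -> "greater"
  | _ -> assert false (* unreachable: an int is <0, =0 or >0 *)
```

For instance, describe 1 2 evaluates to "lower" and describe 2 2 to "equal". The final catch-all case is needed because the compiler cannot see that the three guards cover every int.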


Pattern-matching over lazy values


let view Lazy = fun x -> Some (Lazy.force x)

(* x and y are computed only if necessary. If x <> 1 and y <> 2 then only one of them
   will actually be computed: *)
match x, y with
   (%Lazy 1, %Lazy 2) -> ...
 | _ -> ...
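
The laziness claim can be checked with a small plain-OCaml sketch. It assumes that sub-patterns are tried left to right, so y is forced only once x has matched; this ordering is an assumption of the sketch, not something this page states. The counter forced and the helpers delayed and is_one_two are hypothetical names introduced for illustration.

```ocaml
let view_Lazy x = Some (Lazy.force x)

(* Counts how many suspensions were forced, to make laziness observable. *)
let forced = ref 0
let delayed v = lazy (incr forced; v)

(* Hand-written stand-in for  match x, y with (%Lazy 1, %Lazy 2) -> ...
   assuming the left component is examined first. *)
let is_one_two x y =
  match view_Lazy x with
  | Some 1 ->
      (match view_Lazy y with
       | Some 2 -> true
       | _ -> false)
  | _ -> false
```

Calling is_one_two (delayed 5) (delayed 2) returns false and increments forced only once, since the second suspension is never forced.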

(* Lazy lists (where only the tail is lazy) *)

type 'a lazy_list = Empty | Cons of ('a * 'a lazy_list lazy_t)

let view Empty = fun l -> Lazy.force l = Empty
let view Cons = fun l -> match Lazy.force l with Cons x -> Some x | Empty -> None

match ll with
    %Empty -> ...
  | %Cons (x, %Empty) -> ...
  | %Cons (x1, %Cons (x2, %Empty)) -> ...
  | _ -> ...


instead of:


match Lazy.force ll with
    Empty -> ...
  | Cons (x1, tl) -> 
      match Lazy.force tl with
          Empty -> ...
        | Cons (x2, tl2) ->
            match Lazy.force tl2 with
                Empty -> ...
              | _ -> ...


This is the general mechanism proposed for views in OCaml. If you want a really nice syntax for lazy lists, you would have to write a camlp4 syntax extension that understands %[x1;x2] as %Cons (x1, %Cons (x2, _)).




let view X = f

is translated into:

let view_X = f


Similarly, we have local views:

let view X = f in ...


Given the nature of camlp4, this is the simplest solution that allows us to make views available to other modules, since they are just functions with a standard name. When a view X is encountered in a pattern, it uses the view_X function. The compiler will complain if view_X doesn't have the right type, but the preprocessor will not.
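
To make the naming scheme concrete, here is a plain-OCaml sketch of the lazy-list views as ordinary view_Empty and view_Cons functions, used by hand the way a pattern might use them. The function classify and its result strings are hypothetical, and the nested matches are an assumption about the shape of the expansion rather than the code micmatch really produces.

```ocaml
type 'a lazy_list = Empty | Cons of ('a * 'a lazy_list Lazy.t)

(* What  let view Empty = ...  and  let view Cons = ...  would define
   under the view_X naming scheme. *)
let view_Empty l = Lazy.force l = Empty
let view_Cons l = match Lazy.force l with Cons x -> Some x | Empty -> None

(* Hand-written stand-in for a match on
   %Empty, %Cons (x, %Empty), %Cons (x1, %Cons (x2, %Empty)), _ *)
let classify ll =
  match view_Cons ll with
  | None -> "empty"
  | Some (_, tl) ->
      (match view_Cons tl with
       | None -> "one element"
       | Some (_, tl2) ->
           if view_Empty tl2 then "two elements" else "longer")
```

Because view_Empty and view_Cons are ordinary values, another module can use them like any other function, and a view_X of the wrong type is reported by the type checker at the point of use.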


About inline views: since views are simple functions, we could insert functions directly in patterns. I believe it would make the pattern really difficult to read, especially since views are expected to be most useful in already complex patterns.


Efficiency: using views is not going to make a program any faster, but should be handy in complex situations.


About completeness checking: our definition of views doesn't allow the compiler to warn against incomplete or redundant pattern matching. We have the same situation with regexps. What we define here are incomplete or overlapping views, which have a broader spectrum of applications than views which are defined as sum types. See the original proposal of views for Haskell by Simon Peyton Jones.


Remarks and suggestions


This reference might be useful: http://citeseer.ist.psu.edu/okasaki98view.html


Please put your suggestions here.
