
Micmatch

Saved by PBworks
on January 22, 2007 at 3:47:40 pm
 

Micmatch (http://martin.jambon.free.fr/micmatch.html) is a library that extends OCaml with a convenient syntax for text manipulation using regexps.

 

This is a public discussion board about the use and the development of Micmatch.

 

Upcoming release

 

The latest public release is 0.696. The following things are implemented in the development version:

  • (bug) installation of executables now correctly follows $BINDIR or $PREFIX/bin
  • (+ui) new FILTER macro which returns true or false
  • (+ui) changed grammar entry level of macros (now "expr1" instead of "top"). This allows for fewer parentheses.
  • (pkg) added dependency on the Unix library
  • (+ui) added filename globbing in the Micmatch library

 

For older changes see http://martin.jambon.free.fr/micmatch-changes.txt

 

Roadmap

 

Brief history: the Micmatch project was started in 2004. It's been two years now, and no major bug has been found. Several features have been added since then, and the library now seems quite mature.

 

Plans: release a 1.0 version that supports the upcoming version of Camlp4. There should be no incompatible changes with the previous releases, but some features may become deprecated. The main obstacle to wider adoption of the library is that it relies on Camlp4, which provides too much freedom and not enough guarantees of compatibility between various extensions and between different versions of Camlp4. Hopefully a remedy will be found, and eventually a whole lot of Camlp4 extensions will fall into a standard "macro" category that guarantees intercompatibility of all syntax extensions of that class. Besides these syntax and standardization issues, the micmatch_pcre package provides good functionality and supports most of the features of PCRE.

 

How to make things easier for everyone:

  • keep official support only for PCRE, and change the name to micmatch instead of micmatch_pcre. That means far fewer worries for the developers and for the packagers.
  • guarantee that the extension will be 100% compatible with other syntax extensions. That means a real integration of macros in the OCaml language, and everyone using Camlp4 extensions without fear of the unknown. Some insights from the OCaml development team would be highly appreciated here.

 

 

The 1.0 release is planned to include the following features:

  • compatibility with the previous versions
  • a list of deprecated features (which are kept for compatibility purposes)
  • the implementation of the micmatch_pcre macros (SEARCH, MAP, COLLECT, ...) in micmatch_str. Possible warnings concerning undocumented features of Str which are unlikely to change in the future. This is not a priority, since support for Str is likely to disappear in future versions unless someone takes care of it.
  • alternate syntax. It would be nice to use a unified syntax for macros, one that other syntax extensions also use and that guarantees intercompatibility with any other macro. However, there is no such standard yet.

 

The idea is to implement these features and then release a 0.99 version first, see if people like it and after a month or so make the 1.0 release. But this is unlikely to happen until the new Camlp4 is out.

 

 

Possible developments

 

General-purpose system for extending pattern-matching

 

Applications would include:

  • matching against lazy values (thus evaluating only what is necessary)
  • regular expressions over arrays or lists
  • views (see below)

 

Views

 

Views are a pattern-matching technique in which you match data against patterns that do not correspond to the concrete/physical representation of the data. A simple application of views in OCaml is to provide pattern-matching over object methods or lazy values.

 

Something along the lines of http://hackage.haskell.org/trac/haskell-prime/wiki/ViewPatterns could be implemented relatively easily using micmatch's code base.

 

Proposal through examples:

 

Pattern-matching over object methods

 

view XY = fun obj -> try Some (obj#x, obj#y) with _ -> None

(* Test if a list of objects starts with coordinates x=0 and y=0 *)
let test_origin = function
    view XY (0, 0) :: _ -> true
  | _ -> false
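
For readers without the proposed extension, the example above can be approximated in plain OCaml. This is only a sketch of one possible hand-written expansion, not the output of any existing preprocessor: the name view_XY follows the naming scheme described under Implementation, and the point constructor is a hypothetical helper introduced only for the demonstration.

```ocaml
(* Hand-written counterpart of "view XY = ...": an ordinary function
   returning an option. *)
let view_XY obj = try Some (obj#x, obj#y) with _ -> None

(* Hand-written counterpart of the pattern "view XY (0, 0) :: _". *)
let test_origin = function
  | obj :: _ ->
      (match view_XY obj with
       | Some (0, 0) -> true
       | _ -> false)
  | [] -> false

(* Hypothetical object constructor, used only for this demonstration. *)
let point x y = object
  method x = x
  method y = y
end

let () =
  assert (test_origin [point 0 0; point 3 4]);
  assert (not (test_origin [point 1 0]))
```

The view function is applied only when the list is non-empty, which is the behaviour one would expect from the pattern in the proposal.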

 

Pattern-matching over ranges of numbers

 

Here we use the previous example and show two things:

  • View constructors without an argument are defined as functions that return a bool, instead of an option type.
  • The main interest of views is that they behave like any pattern. You can compose them to build more complex patterns.

 

view Positive = fun x -> x >= 0
view Negative = fun x -> x <= 0

(* Take an object with methods x and y, and test whether the returned values are positive. *)
let test_positive_coords = function
   view XY (view Positive, view Positive) -> true
 | _ -> false

(* Handle the results of the standard "compare" function, which returns ints: *)
view Greater = fun x -> x > 0
view Lower = fun x -> x < 0
view Equal = fun x -> x = 0
view GreaterEqual = fun x -> x >= 0
view LowerEqual = fun x -> x <= 0

(* It is slower than using a simple "if then else", but it's less error-prone since you don't
   need to remember whether -1 means lower or greater, once the views are defined *)
match compare x y with
   view Lower -> ...
 | _ -> ...
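
As a rough illustration of what these definitions could mean without the extension, argument-less views can be written as plain predicates and used in guards, while nested views become a test on the components extracted by the outer view. The guard-based expansion is an assumption made for this sketch, and the describe function is a hypothetical name; the proposal does not specify the generated code.

```ocaml
(* Argument-less views are plain predicates returning bool. *)
let view_Lower x = x < 0
let view_Equal x = x = 0
let view_Positive x = x >= 0

(* Counterpart of:
   match compare x y with view Lower -> ... | view Equal -> ... | _ -> ... *)
let describe x y =
  match compare x y with
  | c when view_Lower c -> "lower"
  | c when view_Equal c -> "equal"
  | _ -> "greater"

(* View with an argument, as in the object example. *)
let view_XY obj = try Some (obj#x, obj#y) with _ -> None

(* Counterpart of "view XY (view Positive, view Positive)": the inner
   views are applied to the values returned by the outer view. *)
let test_positive_coords obj =
  match view_XY obj with
  | Some (x, y) when view_Positive x && view_Positive y -> true
  | _ -> false

let () =
  assert (describe 1 2 = "lower");
  assert (test_positive_coords (object method x = 1 method y = 0 end))
```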

 

Pattern-matching over lazy values

 

view Lazy = fun x -> Some (Lazy.force x)

(* x and y are computed only if necessary. If x <> 1 and y <> 2 then only one of them
   will be actually computed: *)
match x, y with
   (view Lazy 1, view Lazy 2) -> ...
 | _ -> ...

(* Lazy lists (where only the tail is lazy) *)

type 'a lazy_list = Empty | Cons of ('a * 'a lazy_list lazy_t)

view Empty = fun l -> Lazy.force l = Empty
view Cons = fun l -> match Lazy.force l with Cons x -> Some x | Empty -> None

match ll with
    view Empty -> ...
  | view Cons (x, view Empty) -> ...
  | view Cons (x1, view Cons (x2, view Empty)) -> ...
  | _ -> ...

 

instead of:

 

match Lazy.force ll with
    Empty -> ...
  | Cons (x1, tl) -> 
      match Lazy.force tl with
          Empty -> ...
        | Cons (x2, tl2) ->
            match Lazy.force tl2 with
                Empty -> ...
              | _ -> ...

 

This is the general mechanism proposed for views in OCaml. If you want a really nice syntax for lazy lists, you would have to write a camlp4 syntax extension that understands view [x1;x2] as view Cons (x1, view Cons (x2, _)).
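
The lazy-list views can be tried out today as ordinary functions. The sketch below expands the view patterns by hand and counts how many cells are forced; forced, cell and classify are hypothetical names introduced for the demonstration, Lazy.t is used in place of lazy_t, and view_Empty is written with a match rather than with structural equality.

```ocaml
type 'a lazy_list = Empty | Cons of ('a * 'a lazy_list Lazy.t)

(* Plain-function counterparts of "view Empty" and "view Cons". *)
let view_Empty l = match Lazy.force l with Empty -> true | Cons _ -> false
let view_Cons l = match Lazy.force l with Cons x -> Some x | Empty -> None

(* Instrumentation: counts how many cells are actually forced. *)
let forced = ref 0
let cell v = lazy (incr forced; v)

(* Hand expansion of the four-case match over view patterns. *)
let classify ll =
  if view_Empty ll then "empty"
  else
    match view_Cons ll with
    | Some (_, tl) when view_Empty tl -> "one"
    | Some (_, tl) ->
        (match view_Cons tl with
         | Some (_, tl2) when view_Empty tl2 -> "two"
         | _ -> "more")
    | None -> "more"

(* A four-element list; nothing is forced until classify runs. *)
let ll =
  cell (Cons (1, cell (Cons (2, cell (Cons (3, cell (Cons (4, cell Empty))))))))

let () =
  assert (classify ll = "more");
  (* Only the first three cells were needed to reach a decision. *)
  assert (!forced = 3)
```

Because Lazy.force memoizes its result, applying several view functions to the same cell does not recompute it.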

 

Implementation

 

view X = f

would be translated into:

let view_X = f

 

Similarly, we could have local views:

view X = f in ...

 

Given the nature of camlp4, this is the simplest solution that allows us to make views available to other modules, since they are just functions with a standard name. When a view X is encountered in a pattern, it uses the view_X function. If view_X doesn't have the right type, the compiler will complain, but the preprocessor will not.
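
To make the cross-module point concrete, here is a minimal sketch in plain OCaml, under the assumption that a view defined in a module is reached through the usual qualified name. The module Cplx, its functions and the magnitude function are all hypothetical; only the view_ prefix comes from the scheme described above.

```ocaml
module Cplx : sig
  type t                                   (* representation is hidden *)
  val make : float -> float -> t           (* cartesian constructor *)
  val view_Polar : t -> (float * float) option
  val view_Zero : t -> bool
end = struct
  type t = { re : float; im : float }
  let make re im = { re; im }
  (* A view with an argument: exposes (modulus, angle), which is not
     the physical representation of the value. *)
  let view_Polar c =
    Some (sqrt (c.re *. c.re +. c.im *. c.im), atan2 c.im c.re)
  (* An argument-less view: returns a bool. *)
  let view_Zero c = c.re = 0. && c.im = 0.
end

(* Hand expansion of a match using "view Zero" and "view Polar (r, _)"
   from another module. *)
let magnitude c =
  if Cplx.view_Zero c then 0.
  else
    match Cplx.view_Polar c with
    | Some (r, _) -> r
    | None -> nan   (* not reached here: view_Polar always succeeds *)

let () = assert (magnitude (Cplx.make 3. 4.) = 5.)
```

Since the views are ordinary values in the module signature, a type error in their use is reported by the compiler in the usual way.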

 

About inline views: since views are simple functions, we could insert functions directly in patterns. I believe it would make the pattern really difficult to read, especially since views are expected to be most useful in already complex patterns.

 

Efficiency: using views is not going to make a program any faster, but should be handy in complex situations.

 

About completeness checking: our definition of views doesn't allow the compiler to warn against incomplete or redundant pattern-matching. We have the same situation with regexps. What we define here are incomplete or overlapping views, which have a broader spectrum of applications than views which are defined as sum types. See the original proposal of views for Haskell by Simon Peyton Jones.
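
The overlap issue can be seen with the Positive and Negative views defined earlier, which both accept 0. The sketch below assumes a guard-based hand expansion, which is not necessarily what an implementation would generate; sign_a and sign_b are hypothetical names.

```ocaml
let view_Positive x = x >= 0
let view_Negative x = x <= 0

(* Positive is tried first, so 0 is reported as positive. *)
let sign_a x =
  match x with
  | n when view_Positive n -> "positive"
  | n when view_Negative n -> "negative"
  | _ -> "neither"   (* the compiler cannot tell this case is dead *)

(* Same views, opposite order: 0 is now reported as negative. *)
let sign_b x =
  match x with
  | n when view_Negative n -> "negative"
  | n when view_Positive n -> "positive"
  | _ -> "neither"

let () =
  assert (sign_a 0 = "positive");
  assert (sign_b 0 = "negative")
```

The two views cover every integer, yet a final catch-all case is still needed to avoid a non-exhaustiveness warning, and the result for 0 depends on the order of the cases.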

 

Remarks and suggestions

 

Please put your suggestions here.
