This whole entry could be summarized as 'use M-x re-builder' to build your
regular expressions. But let's see if I can stretch that wisdom over a couple
of lines…
For searching and replacing, regular expressions ('regexps') are a very useful tool. For example, see the entry about getting your IP number. I am not going to explain regexps here; there are plenty of good references about them. Of course, Emacs supports regexps, but using them is not always as easy as in, e.g., Perl. I am only providing some trivial examples here; please see Steve Yegge's post on the regexp tricks possible with the then-new Emacs 22 (I can't remember ever needing that kind of regexp-pr0n in real life though…)
Back to regexps: one of the issues with regexps in Elisp is that they need
extra quoting, that is, lots of \-escape characters; regexps can be hard to
comprehend, and this does not help… Why the extra quoting? Let's look at a
simple example. Suppose we want to search for the word cat, and not category
or concatenate. The regular expression would then be \bcat\b.
In Perl you could write this as /\bcat\b/ (in Perl you specify regexps by
putting them between /-characters).
Not so in Emacs-Lisp. On the Lisp level, there are no regexps; there are only
strings, and only the regexp functions understand their true nature. But
before the strings ever get to those functions, the Lisp interpreter does what
it does best: interpreting. And when it sees \b, it interprets it as the
backspace character.
To make it not do that, you'll need to pay the 'slash-tax' and write something like:
(re-search-forward "\\bcat\\b")
Things can get ugly quickly from there. Think of when you need to search for something that itself contains backslashes, like our regex \bcat\b: every literal backslash has to be escaped once for the regexp engine and once more for the Lisp interpreter, so you'd need to do:
(re-search-forward "\\\\bcat\\\\b")
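The slash-tax can be checked without a buffer. The following is only a sketch: it uses string-match instead of re-search-forward (an assumption made to keep the example self-contained; both functions take the same kind of regexp string), and the sample texts are made up.

```elisp
;; The Lisp interpreter turns \b inside a string into a backspace (char 8)...
(string-to-char "\b")                              ; => 8
;; ...so the untaxed string has 5 characters, the slash-taxed one 9.
(length "\bcat\b")                                 ; => 5
(length "\\bcat\\b")                               ; => 9

;; With the tax paid, \b reaches the regexp functions as a word boundary.
(string-match "\\bcat\\b" "the cat sat")           ; => 4
(string-match "\\bcat\\b" "concatenate")           ; => nil

;; Matching the literal text \bcat\b needs four backslashes for each one.
(string-match "\\\\bcat\\\\b" "regex: \\bcat\\b")  ; => 7
```

You can evaluate each form with C-x C-e in the *scratch* buffer to see the result in the echo area.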
slash tax break
To make things even more interesting, in different contexts, different rules apply. The above is all about regexps in strings in Emacs-Lisp. However, things are different when you provide a string interactively.
Suppose you search through your buffer (with M-x isearch-forward-regexp or
C-M-s). Now, your input is not interpreted by the Lisp interpreter (after
all, it's just user input). So, you're exempt from the slash tax, and you can
type \bcat\b as-is to match the word cat.
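To make the two contexts concrete, here is a small sketch that goes from a regexp as you would type it at the prompt to the string literal you would paste into Lisp code. The helper name my/regexp-as-lisp-literal is invented for this example; prin1-to-string is the standard function doing the actual work.

```elisp
;; Hypothetical helper (the name is made up for this sketch): given a regexp
;; as typed interactively, return the string literal for Emacs-Lisp source.
(defun my/regexp-as-lisp-literal (typed)
  "Return TYPED printed as a Lisp string literal, with backslashes doubled."
  (prin1-to-string typed))

;; The argument is itself Lisp source, so it is already slash-taxed; the
;; string it denotes is the 9 characters you would type at the C-M-s prompt.
(my/regexp-as-lisp-literal "\\bcat\\b")
```

Inserting the result into a buffer, for instance with (insert (my/regexp-as-lisp-literal "\\bcat\\b")), gives the slash-taxed form including its surrounding quotes.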
re-builder
So, regexps can be hard, and Emacs-Lisp makes it somewhat harder. A natural
way to come up with the regular expression you need is to use
trial-and-error, and this is exactly what isearch-forward-regexp and
friends do. But what about the slash-taxed regexps that you need in your Lisp
code?
The answer is M-x re-builder. I am sure many people are already using it,
but even if there were only one person that finds out about this through
this blog-post, it'd be worth it! And this is the whole trick here: whenever
you need a regexp in your code, put the kind of string it should match in
a buffer, and enter M-x re-builder.
re-builder will open a small *RE-Builder* window containing a pair of quotes.
You type your regexp there, and it will show the matches in the buffer as you
type. It even supports different regex-syntaxes. By default, re-builder will
help you with the strings-in-Emacs-Lisp kind of regexps; this is called the
read-syntax. But you can switch to the user-input regexps with
C-c TAB string RET (yes, these are called string here). There are some other
possible syntaxes as well.
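If you find yourself switching syntaxes every time, the default can be set in your init file. This is a minimal sketch, assuming you prefer the user-input flavour; reb-re-syntax is the user option from re-builder.el, and C-c TAB runs the command reb-change-syntax.

```elisp
;; Assumption for this sketch: you want the interactive ("string") syntax by
;; default. reb-re-syntax ships with the value `read'.
(require 're-builder)
(setq reb-re-syntax 'string)
```

With this in place, M-x re-builder starts without the quotes, and you type the regexp exactly as you would at the C-M-s prompt.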
One final trick for re-builder is the subexpression mode, which you
activate with C-c C-e (and leave with q). You can then see what the
subexpressions match. For example, we can match cat, cut, cot etc. with
\\bc\\(.\\)t\\b, and the subexpression would then contain the middle
letter. re-builder automatically converts between the syntaxes it supports,
so you could use 'string-mode' as well: \bc\(.\)t\b.
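Once re-builder has helped you find the regexp, using the subexpression in code could look like the sketch below. The function name my/middle-letters and the sample text are made up for this example; string-match, match-string and match-end are the standard functions.

```elisp
;; Hypothetical helper (name invented for this sketch): collect what the
;; first subexpression matches for every word of the form c?t in TEXT.
(defun my/middle-letters (text)
  "Return a list of the middle letters of all c?t words in TEXT."
  (let ((start 0)
        (letters '()))
    (while (string-match "\\bc\\(.\\)t\\b" text start)
      ;; Group 1 is the \\(.\\) part, i.e. the middle letter.
      (push (match-string 1 text) letters)
      ;; Continue searching after the end of the current match.
      (setq start (match-end 0)))
    (nreverse letters)))

(my/middle-letters "cat cut cot category")   ; => ("a" "u" "o")
```

Note that category contributes nothing: the closing \\b requires a word boundary right after the t.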
11 comments:
"even if there were only one person that finds out about this through this blog-post, it'd be worth it!"
score.
One person that found re-builder from this blog post says thank you.
Thanks a lot! Are there any flavors of this that build PCRE regexps?
Thanks a bunch! I'd adore a more introductory tutorial to regex syntax in Emacs, but this is fantastic nonetheless.
For nicer regexp syntax, there's also rx:
(require 'rx)
(rx word-boundary (or "cat" "dog") word-boundary)
rx rocks.
In the fourth paragraph it is said:
"In Perl you could write this as /\bcat\b\/ (...)"
I think the correct expression would be /\bcat\b/, without the last backslash.
BTW, very nice article. Also the first time that I read about re-builder.
@alberto: you are very right -- you see how all those \-chars get confusing quickly :-)
won't update the post now, as it would show up as a new article in planet.emacsen as well...
oh this tip made my day. i've been futzing around with backslash after backslash without a good trial-and-error system until i ran into this. THANKS!!
See also this. WYSIWYG search that highlights regexp groups can be a great way to test out a regexp.
http://emacswiki.org/emacs/RegularExpressionHelp#toc3
re-builder! Awesomeness! Thank you!