Regular expression searches are used extensively in GNU Emacs. The two
functions, forward-sentence and forward-paragraph, illustrate
these searches well.
Regular expression searches are described in section `Regular Expression Search' in The GNU Emacs Manual, as well as in
section `Regular Expressions' in The GNU Emacs Lisp Reference Manual. In writing this chapter, I am presuming that you have at
least a mild acquaintance with them. The major point to remember is
that regular expressions permit you to search for patterns as well as
for literal strings of characters. For example, the code in
forward-sentence searches for the pattern of possible
characters that could mark the end of a sentence, and moves point to
that spot.
Before looking at the code for the
forward-sentence function, it
is worth considering what the pattern that marks the end of a sentence
must be. The pattern is discussed in the next section; following that
is a description of the regular expression search function,
re-search-forward. The forward-sentence function
is described in the section following. Finally, the
forward-paragraph function is described in the last section of
this chapter. forward-paragraph is a complex function that
introduces several new features.
sentence-end is bound to the pattern that marks the
end of a sentence. What should this regular expression be?
Clearly, a sentence may be ended by a period, a question mark, or an exclamation mark. Indeed, only clauses that end with one of those three characters should be considered the end of a sentence. This means that the pattern should include the character set:
    [.?!]
However, we do not want
forward-sentence merely to jump to a
period, a question mark, or an exclamation mark, because such a character
might be used in the middle of a sentence. A period, for example, is
used after abbreviations. So other information is needed.
According to convention, you type two spaces after every sentence, but only one space after a period, a question mark, or an exclamation mark in the body of a sentence. So a period, a question mark, or an exclamation mark followed by two spaces is a good indicator of an end of sentence. However, in a file, the two spaces may instead be a tab or the end of a line. This means that the regular expression should include these three items as alternatives. This group of alternatives will look like this:
    \\($\\| \\|  \\)
           ^   ^^
          TAB  SPC
Here, `$' indicates the end of the line, and I have pointed out where the tab and two spaces are inserted in the expression. Both are inserted by putting the actual characters into the expression.
Two backslashes, `\\', are required before the parentheses and vertical bars: the first backslash to quote the following backslash in Emacs; and the second to indicate that the following character, the parenthesis or the vertical bar, is special.
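The doubling of backslashes is easy to check for yourself. The following is only a sketch; the function name my-alternation-demo and the words `cat' and `dog' are my own inventions for illustration and appear nowhere in the Emacs sources. It shows that the Lisp string written as "\\|" holds just two characters, a backslash and a vertical bar, and that a grouped alternative written this way behaves as expected.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-alternation-demo (text)
  "Return the index in TEXT where `cat' or `dog' first matches, or nil.
The backslashes are doubled because the regexp is a Lisp string."
  (string-match "\\(cat\\|dog\\)" text))

;; After the Lisp reader has processed it, the string "\\|"
;; contains two characters: one backslash and one vertical bar.
(length "\\|")

;; `string-match' counts from zero, so this is the index of the `d'.
(my-alternation-demo "hot dog")
```

Evaluating the last expression with text that contains neither word returns nil.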
Also, a sentence may be followed by one or more carriage returns, like this:
    [
    ]*
Like tabs and spaces, a carriage return is inserted into a regular expression by inserting it literally. The asterisk indicates that the RET is repeated zero or more times.
But a sentence end does not consist only of a period, a question mark or an exclamation mark followed by appropriate space: a closing quotation mark or a closing brace of some kind may precede the space. Indeed more than one such mark or brace may precede the space. These require an expression that looks like this:
    []\"')}]*
In this expression, the first `]' is the first character in the expression; the second character is `"', which is preceded by a `\' to tell Emacs the `"' is not special. The last three characters are `'', `)', and `}'.
All this suggests what the regular expression pattern for matching the
end of a sentence should be; and, indeed, if we evaluate
sentence-end we find that it returns the following value:
sentence-end => "[.?!][]\"')}]*\\($\\| \\|  \\)[ ]*"
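To watch such a pattern at work, here is a minimal sketch. It assumes a variable of my own, my-sentence-end-regexp, rather than the value of sentence-end in your Emacs, which may differ; and it writes the tab and the newline as \t and \n so that they are visible on the page. The helper my-sentence-end-position is likewise hypothetical.

```elisp
;; Hypothetical variable: the pattern described above, with the tab
;; and the newline written as \t and \n instead of literal characters.
(defvar my-sentence-end-regexp
  "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
  "A sentence-end pattern modeled on the one described in the text.")

(defun my-sentence-end-position (text)
  "Return the position just after the first sentence end in TEXT.
Return nil if TEXT contains no sentence end."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward my-sentence-end-regexp nil t)
      (point))))

;; The period is followed by two spaces, so the pattern matches.
(my-sentence-end-position "First one.  Second one.")

;; Only one space follows the period, so the pattern does not match.
(my-sentence-end-position "Dr. Smith")
```

The second call returns nil, which is the point of insisting on two spaces: the period after an abbreviation is not mistaken for the end of a sentence.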
The re-search-forward function is very like the
search-forward function. (See the section on search-forward.)
re-search-forward searches for a regular expression. If the
search is successful, it leaves point immediately after the last
character in the target. If the search is backwards, it leaves point
just before the first character in the target. You may tell
re-search-forward to return
t for true. (Moving point
is therefore a `side effect'.)
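Like search-forward, the re-search-forward function takes four arguments:
1. The first argument is the regular expression that the function searches for. The regular expression will be a string between quotation marks.
2. The optional second argument limits how far the function will search; it is a bound, which is specified as a position in the buffer.
3. The optional third argument specifies how the function responds to failure: nil as the third argument causes the function to signal an error (and print a message) when the search fails; any other value causes it to return nil if the search fails and t if the search succeeds.
4. The optional fourth argument is the repeat count. A negative repeat count causes re-search-forward to search backwards.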
The template for
re-search-forward looks like this:
    (re-search-forward "regular-expression"
                       limit-of-search
                       what-to-do-if-search-fails
                       repeat-count)
The second, third, and fourth arguments are optional. However, if you want to pass a value to either or both of the last two arguments, you must also pass a value to all the preceding arguments. Otherwise, the Lisp interpreter will mistake which argument you are passing the value to.
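Here is a small sketch of the arguments in use. The function my-search-demo is hypothetical; it works in a temporary buffer so that it does not disturb whatever you are editing. It passes t as the third argument, so a failed search returns nil rather than signaling an error. (In current versions of Emacs a successful search returns the new value of point, which counts as true.)

```elisp
;; Hypothetical helper, for illustration only.
(defun my-search-demo (regexp text &optional bound)
  "Search for REGEXP in TEXT from the start, but not past BOUND.
Return the position of point after a successful search, or nil.
Because the third argument to `re-search-forward' is t, a failed
search returns nil instead of signaling an error."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (if (re-search-forward regexp bound t)
        (point)
      nil)))

;; Success: point is left just after the run of b's.
(my-search-demo "b+" "aabbbcc")

;; Failure: the first `c' lies beyond the bound at position 3.
(my-search-demo "c" "aabbbcc" 3)
```

Note that the bound had to be supplied in order to supply the third argument; this is the rule about preceding arguments described above.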
In the forward-sentence function, the regular expression will be
the value of the variable sentence-end, namely:
"[.?!][]\"')}]*\\($\\| \\|  \\)[ ]*"
The limit of the search will be the end of the paragraph (since a
sentence cannot go beyond a paragraph). If the search fails, the
function will return
nil; and the repeat count will be provided
by the argument to the forward-sentence function.
The command to move the cursor forward a sentence is a straightforward illustration of how to use regular expression searches in Emacs Lisp. Indeed, the function looks longer and more complicated than it is; this is because the function is designed to go backwards as well as forwards; and, optionally, over more than one sentence. The function is usually bound to the key command M-e.
Here is the code for forward-sentence:
    (defun forward-sentence (&optional arg)
      "Move forward to next sentence-end.  With argument, repeat.
    With negative argument, move backward repeatedly to sentence-beginning.
    Sentence ends are identified by the value of sentence-end
    treated as a regular expression.  Also, every paragraph boundary
    terminates sentences as well."
      (interactive "p")
      (or arg (setq arg 1))
      (while (< arg 0)
        (let ((par-beg
               (save-excursion (start-of-paragraph-text) (point))))
          (if (re-search-backward
               (concat sentence-end "[^ \t\n]") par-beg t)
              (goto-char (1- (match-end 0)))
            (goto-char par-beg)))
        (setq arg (1+ arg)))
      (while (> arg 0)
        (let ((par-end
               (save-excursion (end-of-paragraph-text) (point))))
          (if (re-search-forward sentence-end par-end t)
              (skip-chars-backward " \t\n")
            (goto-char par-end)))
        (setq arg (1- arg))))
The function looks long at first sight and it is best to look at its skeleton first, and then its muscle. The way to see the skeleton is to look at the expressions that start in the left-most columns:
    (defun forward-sentence (&optional arg)
      "documentation..."
      (interactive "p")
      (or arg (setq arg 1))
      (while (< arg 0)
        body-of-while-loop
      (while (> arg 0)
        body-of-while-loop
This looks much simpler! The function definition consists of
documentation, an interactive expression, an or expression, and
while loops.
Let's look at each of these parts in turn.
We note that the documentation is thorough and understandable.
The function has an
interactive "p" declaration. This means
that the processed prefix argument, if any, is passed to the
function as its argument. (This will be a number.) If the function
is not passed an argument (it is optional) then the argument
arg will be bound to 1. When
forward-sentence is called
non-interactively without an argument,
arg is bound to nil.
The or expression handles the prefix argument. What it does is
either leave the value of arg as it is, but only if arg
is bound to a value; or it sets the value of arg to 1, in the
case when arg is bound to nil.
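The or idiom can be tried on its own. This is a sketch only; my-repeat-count is a hypothetical function that does nothing but return the value that arg ends up with.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-repeat-count (&optional arg)
  "Return ARG, or 1 if ARG is nil or was not supplied."
  ;; If ARG is non-nil, `or' stops there and the `setq' is never
  ;; evaluated; otherwise the `setq' gives ARG the value 1.
  (or arg (setq arg 1))
  arg)

(my-repeat-count)       ; called without an argument
(my-repeat-count 3)     ; called with an argument
```

Called without an argument the function returns 1; called with 3 it returns 3.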
The while loops follow the
or expression. The first
while has a true-or-false-test that tests true if the prefix
argument for forward-sentence is a negative number. This is for
going backwards. The body of this loop is similar to the body of the
second while clause, but it is not exactly the same. We will
skip this while loop and concentrate on the second
while loop.
The second while loop is for moving point forward. Its skeleton
looks like this:
    (while (> arg 0)                ; true-or-false-test
      (let varlist
        (if (true-or-false-test)
            then-part
          else-part
      (setq arg (1- arg))))         ; decrementer
The while loop is of the decrementing kind.
(See section Loop with a Decrementing Counter.) It
has a true-or-false-test that tests true so long as the counter (in
this case, the variable
arg) is greater than zero; and it has a
decrementer that subtracts 1 from the value of the counter every time
the loop repeats.
If no prefix argument is given to
forward-sentence, which is
the most common way the command is used, this
while loop will
run once, since the value of
arg will be 1.
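A decrementing loop of this shape can be exercised in isolation. In this sketch, my-count-iterations is a hypothetical function that counts how often the body of the loop runs for a given starting value of arg.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-count-iterations (arg)
  "Return how many times a decrementing `while' loop runs for ARG."
  (let ((iterations 0))
    (while (> arg 0)                      ; true-or-false-test
      (setq iterations (1+ iterations))   ; stand-in for the real body
      (setq arg (1- arg)))                ; decrementer
    iterations))

(my-count-iterations 1)    ; the usual case: the body runs once
(my-count-iterations -2)   ; a negative count never enters this loop
```

With an argument of 1 the body runs once; with a negative argument it does not run at all, which is why forward-sentence needs its other while loop for moving backwards.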
The body of the
while loop consists of a let expression,
which creates and binds a local variable, and has, as its body, an
if expression.
The body of the
while loop looks like this:
    (let ((par-end
           (save-excursion (end-of-paragraph-text) (point))))
      (if (re-search-forward sentence-end par-end t)
          (skip-chars-backward " \t\n")
        (goto-char par-end)))
The let expression creates and binds the local variable
par-end. As we shall see, this local variable is designed to
provide a bound or limit to the regular expression search. If the
search fails to find a proper sentence ending in the paragraph, it will
stop on reaching the end of the paragraph.
But first, let us examine how
par-end is bound to the value of
the end of the paragraph. What happens is that the
let sets the value of
par-end to the value returned when the Lisp interpreter
evaluates the expression
(save-excursion (end-of-paragraph-text) (point))
In this expression,
(end-of-paragraph-text) moves point to the
end of the paragraph,
(point) returns the value of point, and then
save-excursion restores point to its original position. Thus,
the let binds par-end to the value returned by the
save-excursion expression, which is the position of the end of
the paragraph. (The
(end-of-paragraph-text) function uses
forward-paragraph, which we will discuss shortly.)
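The same trick of finding a position without staying there can be tried with a simpler motion command. In this sketch I use end-of-line in place of end-of-paragraph-text; the function my-line-end-demo is hypothetical, and it works in a temporary buffer.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-line-end-demo (text)
  "Return a list (LINE-END POINT) for a temporary buffer holding TEXT.
LINE-END is found inside `save-excursion', so POINT, read
afterwards, is still at the beginning of the buffer."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((line-end (save-excursion
                      (end-of-line)     ; move point...
                      (point))))        ; ...and report where it is
      ;; `save-excursion' has already put point back.
      (list line-end (point)))))

(my-line-end-demo "one\ntwo\n")
```

The first element of the result is the position at the end of the first line; the second shows that point itself has not moved from the start of the buffer.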
Emacs next evaluates the body of the
let, which is an if
expression that looks like this:
    (if (re-search-forward sentence-end par-end t) ; if-part
        (skip-chars-backward " \t\n")              ; then-part
      (goto-char par-end))                         ; else-part
The if tests whether its first argument is true and if so,
evaluates its then-part; otherwise, the Emacs Lisp interpreter
evaluates the else-part. The true-or-false-test of the if
expression is the regular expression search.
It may seem odd to have what looks like the `real work' of the
forward-sentence function buried here, but this is a common
way this kind of operation is carried out in Lisp.
The re-search-forward function searches for the end of the
sentence, that is, for the pattern defined by the sentence-end
regular expression. If the pattern is found--if the end of the sentence is
found--then the re-search-forward function does two things:
1. The re-search-forward function carries out a side effect, which is to move point to the end of the occurrence found.
2. The re-search-forward function returns a value of true. This is the value received by the if, and means that the search was successful.
The side effect, the movement of point, is completed before the
if function is handed the value returned by the successful
conclusion of the search.
When the if function receives the value of true from a successful
call to re-search-forward, the if evaluates the then-part,
which is the expression
(skip-chars-backward " \t\n"). This
expression moves backwards over any blank spaces, tabs or carriage
returns until a printed character is found and then leaves point after
the character. Since point has already been moved to the end of the
pattern that marks the end of the sentence, this action leaves point
right after the closing printed character of the sentence, which is
usually a period.
On the other hand, if the
re-search-forward function fails to
find a pattern marking the end of the sentence, the function returns
false. The false then causes the
if to evaluate its third
argument, which is
(goto-char par-end): it moves point to the
end of the paragraph.
Regular expression searches are exceptionally useful and the pattern
illustrated by re-search-forward, in which the search is the
test of an
if expression, is handy. You will see or write code
incorporating this pattern often.
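Here is the pattern on its own, as a sketch rather than as the code Emacs uses. The function my-forward-to-sentence-end is hypothetical; it works in a temporary buffer, uses the end of that buffer as the bound in place of the end of the paragraph, and assumes a sentence-end pattern modeled on the one described earlier, with the tab and newline written as \t and \n.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-forward-to-sentence-end (text)
  "Move to the first sentence end in TEXT and return the value of point.
If no sentence end is found, point is left at the bound."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((bound (point-max))
          ;; Assumed pattern, modeled on the one described in the text.
          (regexp "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"))
      (if (re-search-forward regexp bound t)   ; the search is the test
          (skip-chars-backward " \t\n")        ; then-part
        (goto-char bound))                     ; else-part
      (point))))

;; Found: point ends up right after the period.
(my-forward-to-sentence-end "Hello there.  Next.")

;; Not found: point ends up at the bound.
(my-forward-to-sentence-end "no end here")
```

In the first call the search leaves point after the two spaces, and skip-chars-backward then backs up over them; in the second the search fails and the else-part moves point to the bound.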
forward-paragraph: a Goldmine of Functions
The forward-paragraph function moves point forward to the end
of the paragraph. It is usually bound to M-} and makes use of a
number of functions that are important in themselves, including
let*, match-beginning, and looking-at.
The function definition for
forward-paragraph is considerably
longer than the function definition for forward-sentence
because it works with a paragraph, each line of which may begin with a
fill prefix.
A fill prefix consists of a string of characters that are repeated at the beginning of each line. For example, in Lisp code, it is a convention to start each line of a paragraph-long comment with `;;; '. In Text mode, four blank spaces make up another common fill prefix, creating an indented paragraph. (See section `Fill Prefix' in The GNU Emacs Manual, for more information about fill prefixes.)
The existence of a fill prefix means that in addition to being able to
find the end of a paragraph whose lines begin on the left-most
column, the forward-paragraph function must be able to find the
end of a paragraph when all or many of the lines in the buffer begin
with the fill prefix.
Moreover, it is sometimes practical to ignore a fill prefix that exists, especially when blank lines separate paragraphs. This is an added complication.
Rather than print all of the
forward-paragraph function, we
will only print parts of it. Read without preparation, the function
can be daunting!
In outline, the function looks like this:
    (defun forward-paragraph (&optional arg)
      "documentation..."
      (interactive "p")
      (or arg (setq arg 1))
      (let*
          varlist
        (while (< arg 0)        ; backward-moving-code
          ...
          (setq arg (1+ arg)))
        (while (> arg 0)        ; forward-moving-code
          ...
          (setq arg (1- arg)))))
The first parts of the function are routine: the function's argument list consists of one optional argument. Documentation follows.
The lower case `p' in the
interactive declaration means
that the processed prefix argument, if any, is passed to the function.
This will be a number, and is the repeat count of how many paragraphs
point will move. The
or expression in the next line handles
the common case when no argument is passed to the function, which occurs
if the function is called from other code rather than interactively.
This case was described earlier. (See section
forward-sentence.) Now we reach the end of the
familiar part of this function.
The next line of the
forward-paragraph function begins a
let* expression. This is a different kind of expression than
we have seen so far. The symbol is let followed by an asterisk, `*'.
The let* special form is like
let except that Emacs sets
each variable in sequence, one after another, and variables in the
latter part of the varlist can make use of the values to which Emacs
set variables in the earlier part of the varlist.
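In the let* expression in this function, Emacs binds two
variables: fill-prefix-regexp and paragraph-separate.
The value to which
paragraph-separate is bound depends on the
value of fill-prefix-regexp.
Let's look at each in turn. The symbol fill-prefix-regexp is
set to the value returned by evaluating the following list: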
    (and fill-prefix
         (not (equal fill-prefix ""))
         (not paragraph-ignore-fill-prefix)
         (regexp-quote fill-prefix))
This is an expression whose first element is the function and.
The and function evaluates each of its arguments until one of
the arguments returns a value of
nil, in which case the
and expression returns
nil; however, if none of the
arguments returns a value of
nil, the value resulting from
evaluating the last argument is returned. (Since such a value is not
nil, it is considered true in Lisp.) In other words, an
and expression returns a true value only if all its arguments
are true.
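The way and hands back the value of its last argument is worth trying out. In this sketch, my-usable-prefix is a hypothetical function, much simpler than the expression in forward-paragraph, that returns a string only when the string exists and is not empty.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-usable-prefix (prefix)
  "Return PREFIX if it is a non-empty string; otherwise return nil."
  (and prefix                     ; nil stops the evaluation here
       (not (equal prefix ""))    ; an empty string stops it here
       prefix))                   ; otherwise this value is returned

(my-usable-prefix ";;; ")   ; returns the prefix itself
(my-usable-prefix "")       ; returns nil
(my-usable-prefix nil)      ; returns nil
```

The value returned in the first case is the string itself, not merely t; forward-paragraph relies on exactly this when it binds fill-prefix-regexp.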
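In this case, the variable
fill-prefix-regexp is bound to a
non-nil value only if the following four expressions produce a
true (i.e., a non-nil) value when they are evaluated; otherwise,
fill-prefix-regexp is bound to nil.
1. fill-prefix: When this variable is evaluated, the value of the fill prefix, if any, is returned. If there is no fill prefix, this variable returns nil.
2. (not (equal fill-prefix "")): This expression checks whether an existing fill prefix is an empty string, that is, a string with no characters in it. It returns nil if the fill prefix is empty.
3. (not paragraph-ignore-fill-prefix): This expression returns nil if the variable paragraph-ignore-fill-prefix has been turned on by being set to a true value such as t.
4. (regexp-quote fill-prefix): This is the last argument to the and function. If all the arguments to the and are true, the value resulting from evaluating this expression will be returned by the and expression and bound to the variable fill-prefix-regexp.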
The result of evaluating this
and expression successfully is that
fill-prefix-regexp will be bound to the value of
fill-prefix as modified by the regexp-quote function. What
regexp-quote does is read a string and return a regular
expression that will exactly match the string and match nothing else.
This means that
fill-prefix-regexp will be set to a value that
will exactly match the fill prefix if the fill prefix exists.
Otherwise, the variable will be set to nil.
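You can see what regexp-quote does by giving it a string that contains a character that is special in regular expressions. This is a sketch; my-line-starts-with-p is a hypothetical function, and the strings are invented for the purpose.

```elisp
;; `regexp-quote' puts a backslash in front of the period, which
;; would otherwise match any character.
(regexp-quote "a.b")

;; Hypothetical helper, for illustration only.
(defun my-line-starts-with-p (prefix line)
  "Return t if LINE begins with the literal string PREFIX, else nil."
  (and (string-match (concat "^" (regexp-quote prefix)) line)
       t))

(my-line-starts-with-p "a.b" "a.b rest")   ; the literal prefix is there
(my-line-starts-with-p "a.b" "axb rest")   ; `x' is not a period
```

Without the call to regexp-quote, the second test would succeed, since an unquoted period matches the `x'.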
The second local variable in the
let* expression is
paragraph-separate. It is bound to the value returned by
evaluating the expression:
    (if fill-prefix-regexp
        (concat paragraph-separate
                "\\|^" fill-prefix-regexp "[ \t]*$")
      paragraph-separate)))
This expression shows why
let* rather than
let was used.
The true-or-false-test for the
if depends on whether the variable
fill-prefix-regexp evaluates to
nil or some other value.
If fill-prefix-regexp does not have a value, Emacs evaluates
the else-part of the
if expression and binds
paragraph-separate to its local value.
(paragraph-separate is a regular expression that matches what
separates paragraphs.)
But if fill-prefix-regexp does have a value, Emacs evaluates
the then-part of the
if expression and binds
paragraph-separate to a regular expression that includes the
fill-prefix-regexp as part of the pattern.
Specifically, paragraph-separate is set to the original value
of the paragraph separate regular expression concatenated with an
alternative expression that consists of the fill-prefix-regexp
followed by a blank line. The `^' indicates that the
fill-prefix-regexp must begin a line, and the optional
whitespace to the end of the line is defined by "[ \t]*$".
The `\\|' defines this portion of the regexp as an alternative to
paragraph-separate.
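The concatenation is easier to follow with concrete strings. In this sketch, my-separator-regexp is a hypothetical function that takes the two values as arguments instead of reading the variables; the separator "[ \t\f]*$" and the fill prefix ";;; " are assumptions chosen for the example, not values taken from your Emacs.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-separator-regexp (separator prefix-regexp)
  "Return SEPARATOR, extended so a bare fill prefix also separates.
PREFIX-REGEXP is a regexp for the fill prefix, or nil if none."
  (if prefix-regexp
      (concat separator "\\|^" prefix-regexp "[ \t]*$")
    separator))

;; With a fill prefix, an alternative is appended to the separator.
(my-separator-regexp "[ \t\f]*$" (regexp-quote ";;; "))

;; Without one, the separator comes back unchanged.
(my-separator-regexp "[ \t\f]*$" nil)
```

The first call returns a string in which the original separator is followed by `\|' and then by a pattern for a line that holds the fill prefix and nothing else but whitespace.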
Now we get into the body of the
let*. The first part of the body
of the let* deals with the case when the function is given a
negative argument and is therefore moving backwards. We will skip this
section.
The second part of the body of the
let* deals with forward
motion. It is a
while loop that repeats itself so long as the
value of arg is greater than zero. In the most common use of
the function, the value of the argument is 1, so the body of the
while loop is evaluated exactly once, and the cursor moves
forward one paragraph.
This part handles three situations: when point is between paragraphs, when point is within a paragraph and there is a fill prefix, and when point is within a paragraph and there is no fill prefix.
The forward motion while loop looks like this:
    (while (> arg 0)
      (beginning-of-line)

      ;; between paragraphs
      (while (prog1 (and (not (eobp))
                         (looking-at paragraph-separate))
               (forward-line 1)))

      ;; within paragraphs, with a fill prefix
      (if fill-prefix-regexp
          ;; There is a fill prefix; it overrides paragraph-start.
          (while (and (not (eobp))
                      (not (looking-at paragraph-separate))
                      (looking-at fill-prefix-regexp))
            (forward-line 1))

        ;; within paragraphs, no fill prefix
        (if (re-search-forward paragraph-start nil t)
            (goto-char (match-beginning 0))
          (goto-char (point-max))))

      (setq arg (1- arg)))
We can see immediately that this is a decrementing counter
loop, using the expression
(setq arg (1- arg)) as the decrementer.
The body of the loop consists of three expressions:
    ;; between paragraphs
    (beginning-of-line)
    (while
        body-of-while)

    ;; within paragraphs, with fill prefix
    (if true-or-false-test
        then-part

    ;; within paragraphs, no fill prefix
      else-part
When the Emacs Lisp interpreter evaluates the body of the
while loop, the first thing it does is evaluate the
(beginning-of-line) expression and move point to the beginning
of the line. Then there is an inner
while loop. This
while loop is designed to move the cursor out of the blank
space between paragraphs, if it should happen to be there. Finally
there is an
if expression that actually moves point to the end
of the paragraph.
First, let us look at the inner
while loop. This loop handles
the case when point is between paragraphs; it uses three functions
that are new to us:
prog1 is similar to the progn function, except that prog1 evaluates its arguments in sequence and then returns the value of its first argument as the value of the whole expression. (progn returns the value of its last argument as the value of the expression.) The second and subsequent arguments to prog1 are evaluated only for their side effects.
eobp is an abbreviation of `End Of Buffer P' and is a function that returns true if point is at the end of the buffer.
looking-at is a function that returns true if the text following point matches the regular expression passed looking-at as its argument.
The while loop we are studying looks like this:
    (while (prog1 (and (not (eobp))
                       (looking-at paragraph-separate))
             (forward-line 1)))
This is a
while loop with no body! The true-or-false-test of the
loop is the expression:
    (prog1 (and (not (eobp))
                (looking-at paragraph-separate))
      (forward-line 1))
The first argument to the
prog1 is the
and expression. It
has within it a test of whether point is at the end of the buffer and
also a test of whether the pattern following point matches the regular
expression for separating paragraphs.
If the cursor is not at the end of the buffer and if the characters
following the cursor mark the separation between two paragraphs, then
the and expression is true. After evaluating the and
expression, the Lisp interpreter evaluates the second argument to
prog1, which is
forward-line. This moves point forward
one line. The value returned by the
prog1 however, is the
value of its first argument, so the
while loop continues so
long as point is not at the end of the buffer and is between
paragraphs. When, finally, point is moved to a paragraph, the
and expression tests false. Note however, that the
forward-line command is carried out anyhow. This means that
when point is moved from between paragraphs to a paragraph, it is left
at the beginning of the second line of the paragraph.
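The effect of the extra forward-line is easiest to believe when you see where point lands. This is a sketch under a simplifying assumption: blank lines, matched by "[ \t]*$", stand in for the value of paragraph-separate. The function my-skip-separators-demo is hypothetical and works in a temporary buffer.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-skip-separators-demo (text)
  "Run a body-less `while' loop over TEXT; return point afterwards.
Blank lines stand in for the paragraph separator."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; The test does all the work: `prog1' returns the value of the
    ;; `and' expression, but `forward-line' is evaluated every time,
    ;; including the final time, when the test is false.
    (while (prog1 (and (not (eobp))
                       (looking-at "[ \t]*$"))
             (forward-line 1)))
    (point)))

;; Two blank lines, then a paragraph of two lines.
(my-skip-separators-demo "\n\nfirst\nsecond\n")
```

The position returned is the beginning of the line that reads `second': the loop has stepped over both blank lines and, on its last test, over the first line of the paragraph as well.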
The next expression in the outer
while loop is an if
expression. The Lisp interpreter evaluates the then-part of the
if when the
fill-prefix-regexp variable has a value other
than nil, and it evaluates the else-part when the value of
fill-prefix-regexp is
nil, that is, when there is no fill prefix.
It is simplest to look at the code for the case when there is no fill
prefix first. This code consists of yet another inner if
expression, and reads as follows:
    (if (re-search-forward paragraph-start nil t)
        (goto-char (match-beginning 0))
      (goto-char (point-max)))
This expression actually does the work that most people think of as
the primary purpose of the
forward-paragraph command: it causes
a regular expression search to occur that searches forward to the
start of the next paragraph and if it is found, moves point there; but
if the start of another paragraph is not found, it moves point to the
end of the accessible region of the buffer.
The only unfamiliar part of this is the use of match-beginning.
This is another function that is new to us. The
match-beginning function returns a number specifying the
location of the start of the text that was matched by the last regular
expression search.
The match-beginning function is used here because of a
characteristic of a forward search: a successful forward search,
regardless of whether it is a plain search or a regular expression
search, will move point to the end of the text that is found. In this
case, a successful search will move point to the end of the pattern for
paragraph-start, which will be the beginning of the next
paragraph rather than the end of the current one.
However, we want to put point at the end of the current paragraph, not at the beginning of the next one. The two positions may be different, because there may be several blank lines between paragraphs.
When given an argument of 0,
match-beginning returns the position
that is the start of the text that the most recent regular
expression search matched. In this case, the most recent regular
expression search is the one looking for paragraph-start, so
match-beginning returns the beginning position of the pattern,
rather than the end of the pattern. The beginning position is the end
of the paragraph.
(Incidentally, when passed a positive number as an argument, the
match-beginning function returns the position of the start of the text
matched by that parenthesized expression in the last regular expression
search. It is a useful function.)
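The difference between where the search leaves point and where the match began can be seen in a short sketch. Both functions here, my-match-positions and my-group-start, are hypothetical, and the strings are invented for the example.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-match-positions (regexp text)
  "Search for REGEXP in TEXT and report where the match lies.
Return a list (MATCH-START POINT-AFTER-SEARCH), or nil if no match."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (list (match-beginning 0) (point)))))

(defun my-group-start (regexp n text)
  "Return where the Nth parenthesized part of REGEXP matched in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (match-beginning n))))

;; The b's start at position 3; the search leaves point at 6.
(my-match-positions "b+" "aabbbcc")

;; The second parenthesized part, the b's, also starts at 3.
(my-group-start "\\(a+\\)\\(b+\\)" 2 "aabbbcc")
```

Note that match-beginning only reports a position; it is goto-char, in forward-paragraph, that actually moves point there.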
The if expression just discussed is the else-part of an enclosing
if expression which tests whether there is a fill prefix. If
there is a fill prefix, the then-part of this
if is evaluated.
It looks like this:
    (while (and (not (eobp))
                (not (looking-at paragraph-separate))
                (looking-at fill-prefix-regexp))
      (forward-line 1))
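What this expression does is move point forward line by line so long as three conditions are true:
1. Point is not at the end of the buffer.
2. The text following point does not separate paragraphs.
3. The pattern following point is the fill prefix regular expression.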
The last condition may be puzzling, until you remember that point was
moved to the beginning of the line early in the forward-paragraph
function. This means that if the text has a fill prefix, the
looking-at function will see it.
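Here is the same loop in a form you can run, as a sketch with simplifying assumptions: the fill prefix and the separator are passed as arguments, blank lines serve as the separator, and the function my-skip-prefixed-lines is hypothetical. It works in a temporary buffer.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-skip-prefixed-lines (text prefix separator)
  "Move over the lines of TEXT that begin with PREFIX; return point.
Stop at the end of the buffer, at a line matching SEPARATOR, or at
a line that does not begin with PREFIX."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((prefix-regexp (regexp-quote prefix)))
      (beginning-of-line)
      (while (and (not (eobp))
                  (not (looking-at separator))
                  (looking-at prefix-regexp))
        (forward-line 1))
      (point))))

;; Two commented lines, a blank line, then another commented line.
(my-skip-prefixed-lines ";; one\n;; two\n\n;; three\n" ";; " "[ \t]*$")
```

The position returned is the start of the blank line: the loop steps over the two lines that carry the fill prefix and stops as soon as it is looking at a separator.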
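In summary, when moving forward, the forward-paragraph function
does the following:
1. Move point to the beginning of the line.
2. Skip over lines between paragraphs.
3. Check whether there is a fill prefix, and if there is, go forward line by line so long as the line begins with the fill prefix and is not a paragraph separating line.
4. But if there is no fill prefix, search for the next paragraph start pattern, and go to the beginning of that pattern, which will be the end of the previous paragraph; or else go to the end of the accessible portion of the buffer.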
For review, here is the code we have just been discussing, formatted for clarity:
    (interactive "p")
    (or arg (setq arg 1))
    (let* (
           (fill-prefix-regexp
            (and fill-prefix (not (equal fill-prefix ""))
                 (not paragraph-ignore-fill-prefix)
                 (regexp-quote fill-prefix)))

           (paragraph-separate
            (if fill-prefix-regexp
                (concat paragraph-separate
                        "\\|^"
                        fill-prefix-regexp
                        "[ \t]*$")
              paragraph-separate)))

      backward-moving-code (omitted) ...

      (while (> arg 0)                ; forward-moving-code
        (beginning-of-line)

        (while (prog1 (and (not (eobp))
                           (looking-at paragraph-separate))
                 (forward-line 1)))

        (if fill-prefix-regexp
            (while (and (not (eobp))  ; then-part
                        (not (looking-at paragraph-separate))
                        (looking-at fill-prefix-regexp))
              (forward-line 1))
                                      ; else-part: the inner-if
          (if (re-search-forward paragraph-start nil t)
              (goto-char (match-beginning 0))
            (goto-char (point-max))))

        (setq arg (1- arg)))))        ; decrementer
The full definition for the
forward-paragraph function not only
includes this code for going forwards, but also code for going backwards.
If you are reading this inside of GNU Emacs and you want to see the
whole function, you can type M-. (find-tag) and the name
of the function when prompted for it. If the find-tag function
first asks you for the name of a `TAGS' table, give it the name
of the `TAGS' file in your `emacs/src' directory, which will
have a pathname such as `/usr/local/lib/emacs/19.23/src/TAGS'.
(The exact path to the `emacs/src' directory depends on how your
copy of Emacs was installed. If you don't know the path, you can
sometimes find out by typing C-h i to enter Info and then typing
C-x C-f to see the path to the `emacs/info' directory. The
path to the `TAGS' file is often the corresponding
`emacs/src' path; sometimes, however, Info files are stored elsewhere.)
You can also create your own `TAGS' file for directories that lack one.
You can create your own `TAGS' file to help you jump to sources.
For example, if you have a large number of files in your
`~/emacs' directory, as I do--I have 137 `.el' files in it,
of which I load 17-- you will find it easier to jump to specific
functions if you create a `TAGS' file for that directory than if
you search for the function name with
grep or some other tool.
You can create a `TAGS' file by calling the etags program
that comes as a part of the Emacs distribution. Usually, etags
is compiled and installed when Emacs is built. (etags is not
an Emacs Lisp function or a part of Emacs; it is a C program.)
To create a `TAGS' file, first switch to the directory in which
you want to create the file. In Emacs you can do this with the
M-x cd command, or by visiting a file in the directory, or by
listing the directory with C-x d (dired). Then type
M-! etags *.el
to create a `TAGS' file. The
etags program takes all the
usual shell `wildcards'. For example, if you have two directories for
which you want a single `TAGS' file, type the command like this,
where `../elisp/' is the second directory:
M-! etags *.el ../elisp/*.el
Type
M-! etags --help
to see a list of the options accepted by etags.
The etags program handles Emacs Lisp, Common Lisp, Scheme, C,
Fortran, Pascal, LaTeX, and most assemblers. The program has no
switches for specifying the language; it recognizes the language in an
input file according to its file name and contents.
Also, `etags' is very helpful when you are writing code yourself
and want to refer back to functions you have already written. Just
run etags again at intervals as you write new functions, so
they become part of the `TAGS' file.
Here is a brief summary of some recently introduced functions.
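while
    Repeatedly evaluate the body of the expression so long as the true-or-false-test returns true. Then return nil. (The expression is evaluated only for its side effects.) For example:
        (let ((foo 2))
          (while (> foo 0)
            (insert (format "foo is %d.\n" foo))
            (setq foo (1- foo))))
             => foo is 2.
                foo is 1.
                nil
    (The insert function inserts its arguments at point; the format function returns a string formatted from its arguments the way message formats its arguments; \n produces a new line.)
re-search-forward
    Search for a pattern, and if the pattern is found, move point to rest just after it. Takes four arguments, like search-forward: a regular expression to search for; optionally, the limit of the search; optionally, what to do if the search fails, return nil or an error message; and optionally, a repeat count.
let*
    Bind some variables locally to particular values, and then evaluate the remaining arguments, returning the value of the last one. While binding the local variables, use the local values of variables bound earlier, if any. For example:
        (let* ((foo 7)
               (bar (* 3 foo)))
          (message "`bar' is %d." bar))
             => `bar' is 21.
match-beginning
    Return the position of the start of the text found by the last regular expression search.
looking-at
    Return t for true if the text after point matches the argument, which should be a regular expression.
eobp
    Return t for true if point is at the end of the accessible part of a buffer. The end of the accessible part is the end of the buffer if the buffer is not narrowed; it is the end of the narrowed part if the buffer is narrowed.
prog1
    Evaluate each argument in sequence and then return the value of the first. For example:
        (prog1 1 2 3 4)
             => 1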