LinuxSelfhelp.com



Running @command{awk} and @command{gawk}

This major node covers how to run awk, both POSIX-standard and @command{gawk}-specific command-line options, and what @command{awk} and @command{gawk} do with non-option arguments. It then proceeds to cover how @command{gawk} searches for source files, obsolete options and/or features, and known bugs in @command{gawk}. This major node rounds out the discussion of @command{awk} as a program and as a language.

While a number of the options and features described here were discussed in passing earlier in the book, this major node provides the full details.

Invoking @command{awk}

There are two ways to run @command{awk}: with an explicit program or with one or more program files. Here are templates for both of them; items enclosed in [...] in these templates are optional:

awk [options] -f progfile [--] file ...
awk [options] [--] 'program' file ...
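
As a minimal sketch of the two templates, the following runs the same one-rule program both ways. The file names `count.awk' and `data.txt' are made up for illustration, and the sketch assumes that `mktemp -d' is available to create a scratch directory:

```shell
# Scratch directory and sample files (all names are hypothetical).
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/data.txt"
printf 'END { print NR }\n' > "$dir/count.awk"

# Template 1: the program text is read from a file named with -f.
from_file=$(awk -f "$dir/count.awk" "$dir/data.txt")

# Template 2: the program text is the first non-option argument.
inline=$(awk 'END { print NR }' "$dir/data.txt")

echo "$from_file $inline"   # prints: 3 3
rm -rf "$dir"
```

Both forms count the same three records; the only difference is where the program text comes from.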

Besides traditional one-letter POSIX-style options, @command{gawk} also supports GNU long options.

It is possible to invoke @command{awk} with an empty program:

awk '' datafile1 datafile2

Doing so makes little sense though; @command{awk} exits silently when given an empty program. (d.c.) If @option{--lint} has been specified on the command-line, @command{gawk} issues a warning that the program is empty.

Command-Line Options

Options begin with a dash and consist of a single character. GNU-style long options consist of two dashes and a keyword. The keyword can be abbreviated, as long as the abbreviation allows the option to be uniquely identified. If the option takes an argument, then the keyword is either immediately followed by an equals sign (`=') and the argument's value, or the keyword and the argument's value are separated by whitespace. If a particular option with a value is given more than once, it is the last value that counts.

Each long option for @command{gawk} has a corresponding POSIX-style option. The long and short options are interchangeable in all contexts. The options and their meanings are as follows:

-F fs
--field-separator fs
Sets the FS variable to fs (see section Specifying How Fields Are Separated).
-f source-file
--file source-file
Indicates that the @command{awk} program is to be found in source-file instead of in the first non-option argument.
-v var=val
--assign var=val
Sets the variable var to the value val before execution of the program begins. Such variable values are available inside the BEGIN rule (see section Other Command-Line Arguments). The @option{-v} option can only set one variable, but it can be used more than once, setting another variable each time, like this: `awk -v foo=1 -v bar=2 ...'. Caution: Using @option{-v} to set the values of the built-in variables may lead to surprising results. @command{awk} will reset the values of those variables as it needs to, possibly ignoring any predefined value you may have given.
-mf N
-mr N
Set various memory limits to the value N. The `f' flag sets the maximum number of fields and the `r' flag sets the maximum record size. These two flags and the @option{-m} option are from the Bell Laboratories research version of Unix @command{awk}. They are provided for compatibility but otherwise ignored by @command{gawk}, since @command{gawk} has no predefined limits. (The Bell Laboratories @command{awk} no longer needs these options; it continues to accept them to avoid breaking old programs.)
-W gawk-opt
Following the POSIX standard, implementation-specific options are supplied as arguments to the @option{-W} option. These options also have corresponding GNU-style long options. Note that the long options may be abbreviated, as long as the abbreviations remain unique. The full list of @command{gawk}-specific options is provided next.
--
Signals the end of the command-line options. The following arguments are not treated as options even if they begin with `-'. This interpretation of @option{--} follows the POSIX argument parsing conventions. This is useful if you have file names that start with `-', or in shell scripts, if you have file names that will be specified by the user that could start with `-'.
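
The options above can be combined in one command. The following is only a sketch: the data file `-ages', the program file `older.awk', and the variable `min' are invented for the example. It shows -F setting the field separator, -v setting a variable before the program starts, and -- protecting a data file whose name begins with a dash:

```shell
dir=$(mktemp -d)
# A data file whose name starts with a dash, and a hypothetical program file.
printf 'alice:30\nbob:25\n' > "$dir/-ages"
printf '$2 >= min { print $1 }\n' > "$dir/older.awk"

# -F: splits fields on colons, -v min=28 is set before execution begins,
# and -- ends option parsing so that "-ages" is taken as a file name.
names=$(cd "$dir" && awk -F: -v min=28 -f older.awk -- -ages)

echo "$names"   # prints: alice
rm -rf "$dir"
```

The -- matters here because the program comes from -f, so the first non-option argument is a data file rather than the program text.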

The previous list described options mandated by the POSIX standard, as well as options available in the Bell Laboratories version of @command{awk}. The following list describes @command{gawk}-specific options:

-W compat
-W traditional
--compat
--traditional
Specifies compatibility mode, in which the GNU extensions to the @command{awk} language are disabled, so that @command{gawk} behaves just like the Bell Laboratories research version of Unix @command{awk}. @option{--traditional} is the preferred form of this option. See section Extensions in @command{gawk} Not in POSIX @command{awk}, which summarizes the extensions. Also see section Downward Compatibility and Debugging.
-W copyright
--copyright
Print the short version of the General Public License and then exit.
-W copyleft
--copyleft
Just like @option{--copyright}. This option may disappear in a future version of @command{gawk}.
-W dump-variables[=file]
--dump-variables[=file]
Print a sorted list of global variables, their types, and final values to file. If no file is provided, @command{gawk} prints this list to a file named `awkvars.out' in the current directory. Having a list of all the global variables is a good way to look for typographical errors in your programs. You would also use this option if you have a large program with a lot of functions, and you want to be sure that your functions don't inadvertently use global variables that you meant to be local. (This is a particularly easy mistake to make with simple variable names like i, j, and so on.)
-W gen-po
--gen-po
Analyze the source program and generate a GNU gettext Portable Object file on standard output for all string constants that have been marked for translation. See section Internationalization with @command{gawk}, for information about this option.
-W help
-W usage
--help
--usage
Print a "usage" message summarizing the short and long style options that @command{gawk} accepts and then exit.
-W lint[=fatal]
--lint[=fatal]
Warn about constructs that are dubious or non-portable to other @command{awk} implementations. Some warnings are issued when @command{gawk} first reads your program. Others are issued at runtime, as your program executes. With an optional argument of `fatal', lint warnings become fatal errors. This may be drastic but its use will certainly encourage the development of cleaner @command{awk} programs.
-W lint-old
--lint-old
Warn about constructs that are not available in the original version of @command{awk} from Version 7 Unix (see section Major Changes Between V7 and SVR3.1).
-W non-decimal-data
--non-decimal-data
Enable automatic interpretation of octal and hexadecimal values in input data (see section Allowing Non-Decimal Input Data). Caution: This option can severely break old programs. Use with care.
-W posix
--posix
Operate in strict POSIX mode. This disables all @command{gawk} extensions (just like @option{--traditional}) and adds further restrictions; for example, `-Ft' no longer sets FS to the tab character. If you supply both @option{--traditional} and @option{--posix} on the command-line, @option{--posix} takes precedence. @command{gawk} also issues a warning if both options are supplied.
-W profile[=file]
--profile[=file]
Enable profiling of @command{awk} programs (see section Profiling Your @command{awk} Programs). By default, profiles are created in a file named `awkprof.out'. The optional file argument allows you to specify a different file name for the profile file. When run with @command{gawk}, the profile is just a "pretty printed" version of the program. When run with @command{pgawk}, the profile contains execution counts for each statement in the program in the left margin, and function call counts for each function.
-W re-interval
--re-interval
Allow interval expressions (see section Regular Expression Operators) in regexps. Because interval expressions were traditionally not available in @command{awk}, @command{gawk} does not provide them by default. This prevents old @command{awk} programs from breaking.
-W source program-text
--source program-text
Program source code is taken from the program-text. This option allows you to mix source code in files with source code that you enter on the command-line. This is particularly useful when you have library functions that you want to use from your command-line programs (see section The @env{AWKPATH} Environment Variable).
-W version
--version
Print version information for this particular copy of @command{gawk}. This allows you to determine if your copy of @command{gawk} is up to date with respect to whatever the Free Software Foundation is currently distributing. It is also useful for bug reports (see section Reporting Problems and Bugs).
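
To illustrate how --dump-variables can expose a misspelled variable name, here is a small sketch. It assumes that gawk is installed under the name `gawk' and does nothing otherwise; the dump file name `vars.out' and the variable names are invented, and the exact layout of each dump line may vary between gawk versions:

```shell
if command -v gawk >/dev/null 2>&1; then
    dir=$(mktemp -d)
    # Bug on purpose: the END rule says "totl" where "total" was meant,
    # so the program prints an empty line instead of the sum.
    printf '1\n2\n3\n' |
        gawk --dump-variables="$dir/vars.out" '{ total += $1 } END { print totl }'
    # Both spellings are listed in the dump, which makes the typo visible.
    suspects=$(grep -E '^(total|totl):' "$dir/vars.out")
    echo "$suspects"
    rm -rf "$dir"
fi
```

Seeing two nearly identical names side by side in the sorted list is usually enough to spot this kind of mistake.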

As long as program text has been supplied, any other options are flagged as invalid with a warning message but are otherwise ignored.

In compatibility mode, as a special case, if the value of fs supplied to the @option{-F} option is `t', then FS is set to the tab character ("\t"). This is only true for @option{--traditional} and not for @option{--posix} (see section Specifying How Fields Are Separated).

The @option{-f} option may be used more than once on the command-line. If it is, @command{awk} reads its program source from all of the named files, as if they had been concatenated together into one big file. This is useful for creating libraries of @command{awk} functions. These functions can be written once and then retrieved from a standard place, instead of having to be included into each individual program. (As mentioned in section Function Definition Syntax, function names must be unique.)
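
A minimal sketch of this library arrangement follows. The file names `lib.awk' and `main.awk' and the function `double' are hypothetical:

```shell
dir=$(mktemp -d)
# A tiny "library" file holding one function.
cat > "$dir/lib.awk" <<'EOF'
function double(x) { return 2 * x }
EOF
# The main program lives in its own file and calls the library function.
cat > "$dir/main.awk" <<'EOF'
{ print double($1) }
EOF

# Both files are read as if they were one concatenated program.
doubled=$(printf '4\n10\n' | awk -f "$dir/lib.awk" -f "$dir/main.awk")
echo "$doubled"   # prints 8, then 20
rm -rf "$dir"
```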

Library functions can still be used, even if the program is entered at the terminal, by specifying `-f /dev/tty'. After typing your program, type Ctrl-d (the end-of-file character) to terminate it. (You may also use `-f -' to read program source from the standard input but then you will not be able to also use the standard input as a source of data.)

Because it is clumsy using the standard @command{awk} mechanisms to mix source file and command-line @command{awk} programs, @command{gawk} provides the @option{--source} option. This does not require you to pre-empt the standard input for your source code; it allows you to easily mix command-line and library source code (see section The @env{AWKPATH} Environment Variable).
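
As a sketch of mixing the two sources, the following takes a function from a library file and a one-off program from the command line. It assumes that gawk is installed under the name `gawk' and does nothing otherwise; `lib.awk' and `double' are made-up names:

```shell
if command -v gawk >/dev/null 2>&1; then
    dir=$(mktemp -d)
    printf 'function double(x) { return 2 * x }\n' > "$dir/lib.awk"
    # Library code comes from -f; the command-line program comes from --source.
    answer=$(gawk -f "$dir/lib.awk" --source 'BEGIN { print double(21) }')
    echo "$answer"   # prints: 42
    rm -rf "$dir"
fi
```

The standard input stays free for data, because neither source is read from it.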

If no @option{-f} or @option{--source} option is specified, then @command{gawk} uses the first non-option command-line argument as the text of the program source code.

If the environment variable @env{POSIXLY_CORRECT} exists, then @command{gawk} behaves in strict POSIX mode, exactly as if you had supplied the @option{--posix} command-line option. Many GNU programs look for this environment variable to turn on strict POSIX mode. If @option{--lint} is supplied on the command-line and @command{gawk} turns on POSIX mode because of @env{POSIXLY_CORRECT}, then it issues a warning message indicating that POSIX mode is in effect. You would typically set this variable in your shell's startup file. For a Bourne-compatible shell (such as @command{bash}), you would add these lines to the `.profile' file in your home directory:

POSIXLY_CORRECT=true
export POSIXLY_CORRECT

For a @command{csh}-compatible shell, you would add this line to the `.login' file in your home directory:

setenv POSIXLY_CORRECT true

Having @env{POSIXLY_CORRECT} set is not recommended for daily use, but it is good for testing the portability of your programs to other environments.

Other Command-Line Arguments

Any additional arguments on the command-line are normally treated as input files to be processed in the order specified. However, an argument that has the form var=value assigns the value value to the variable var; it does not specify a file at all. (This was discussed earlier in section Assigning Variables on the Command Line.)

All these arguments are made available to your @command{awk} program in the ARGV array (see section Built-in Variables). Command-line options and the program text (if present) are omitted from ARGV. All other arguments, including variable assignments, are included. As each element of ARGV is processed, @command{gawk} sets the variable ARGIND to the index in ARGV of the current element.
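
The contents of ARGV can be inspected with a short sketch like the one below. The argument names (`skipped', `mode', `data.txt') are arbitrary, and ARGV[0] is left out because its value depends on how the interpreter was invoked:

```shell
# Only a BEGIN rule is used, so the named file is never opened
# and does not need to exist.
args=$(awk -v skipped=1 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' \
    mode=fast data.txt)
echo "$args"
# prints:
# 1 mode=fast
# 2 data.txt
```

The -v option and the program text do not appear, while the variable assignment `mode=fast' does.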

The distinction between file name arguments and variable-assignment arguments is made when @command{awk} is about to open the next input file. At that point in execution, it checks the file name to see whether it is really a variable assignment; if so, @command{awk} sets the variable instead of reading a file.

Therefore, the variables actually receive the given values after all previously specified files have been read. In particular, the values of variables assigned in this fashion are not available inside a BEGIN rule (see section The BEGIN and END Special Patterns), because such rules are run before @command{awk} begins scanning the argument list.

The variable values given on the command-line are processed for escape sequences (see section Escape Sequences). (d.c.)
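
For instance, in the following sketch the shell passes the characters backslash and `t' unchanged, and awk converts them to a tab when it makes the assignment. The variable name `sep' and the file name are invented:

```shell
dir=$(mktemp -d)
printf 'ignored\n' > "$dir/data.txt"
# The argument is the literal text sep=a\tb; awk turns \t into a tab.
joined=$(awk '{ print sep }' 'sep=a\tb' "$dir/data.txt")
echo "$joined"
rm -rf "$dir"
```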

In some earlier implementations of @command{awk}, when a variable assignment occurred before any file names, the assignment would happen before the BEGIN rule was executed. @command{awk}'s behavior was thus inconsistent; some command-line assignments were available inside the BEGIN rule, while others were not. Unfortunately, some applications came to depend upon this "feature." When @command{awk} was changed to be more consistent, the @option{-v} option was added to accommodate applications that depended upon the old behavior.
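
The difference in timing can be seen with a small sketch (the variable `x' and the file name are arbitrary). A variable set by a command-line assignment is still empty in the BEGIN rule, whereas one set with -v already has its value there:

```shell
dir=$(mktemp -d)
printf 'a line\n' > "$dir/data.txt"

# Assignment as an argument: made when awk reaches it in the argument list.
late=$(awk 'BEGIN { print "BEGIN sees [" x "]" }
                  { print "rule sees [" x "]" }' x=5 "$dir/data.txt")

# Assignment with -v: made before the BEGIN rule runs.
early=$(awk -v x=5 'BEGIN { print "BEGIN sees [" x "]" }')

echo "$late"    # prints "BEGIN sees []", then "rule sees [5]"
echo "$early"   # prints "BEGIN sees [5]"
rm -rf "$dir"
```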

The variable assignment feature is most useful for assigning to variables such as RS, OFS, and ORS, which control input and output formats before scanning the data files. It is also useful for controlling state if multiple passes are needed over a data file. For example:

awk 'pass == 1  { pass 1 stuff }
     pass == 2  { pass 2 stuff }' pass=1 mydata pass=2 mydata
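
As one concrete, purely illustrative way to fill in the two passes, the sketch below totals a column on the first pass and prints each value with its share of the total on the second. The sample data is invented:

```shell
dir=$(mktemp -d)
printf '10\n30\n60\n' > "$dir/mydata"

# Pass 1 accumulates the total; pass 2 reads the same file again
# and reports each value as a percentage of that total.
shares=$(awk 'pass == 1 { total += $1 }
              pass == 2 { printf "%d %d%%\n", $1, 100 * $1 / total }' \
    pass=1 "$dir/mydata" pass=2 "$dir/mydata")

echo "$shares"
# prints:
# 10 10%
# 30 30%
# 60 60%
rm -rf "$dir"
```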

Given the variable assignment feature, the @option{-F} option for setting the value of FS is not strictly necessary. It remains for historical compatibility.
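
The equivalence can be checked with a short sketch (the sample line and file name are made up). Assigning FS as a command-line argument placed before the data file has the same effect on that file as -F:

```shell
dir=$(mktemp -d)
printf 'root:x:0\n' > "$dir/data.txt"

with_option=$(awk -F: '{ print $1 }' "$dir/data.txt")
with_assignment=$(awk '{ print $1 }' FS=: "$dir/data.txt")

echo "$with_option $with_assignment"   # prints: root root
rm -rf "$dir"
```

Unlike -F, the assignment form does not take effect until awk reaches it in the argument list, so it is not visible in a BEGIN rule.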

The @env{AWKPATH} Environment Variable

In most @command{awk} implementations, you must supply a precise path name for each program file, unless the file is in the current directory. But in @command{gawk}, if the file name supplied to the @option{-f} option does not contain a `/', then @command{gawk} searches a list of directories (called the search path), one by one, looking for a file with the specified name.

The search path is a string consisting of directory names separated by colons. @command{gawk} gets its search path from the @env{AWKPATH} environment variable. If that variable does not exist, @command{gawk} uses a default path, which is `.:/usr/local/share/awk'. (Your version of @command{gawk} may use a different directory; it will depend upon how @command{gawk} was built and installed. The actual directory is the value of `$(datadir)' generated when @command{gawk} was configured. You probably don't need to worry about this though.) (Programs written for use by system administrators should use an @env{AWKPATH} variable that does not include the current directory, `.'.)

The search path feature is particularly useful for building libraries of useful @command{awk} functions. The library files can be placed in a standard directory in the default path and then specified on the command-line with a short file name. Otherwise, the full file name would have to be typed for each file.

By using both the @option{--source} and @option{-f} options, your command-line @command{awk} programs can use facilities in @command{awk} library files. See section A Library of @command{awk} Functions. Path searching is not done if @command{gawk} is in compatibility mode. This is true for both @option{--traditional} and @option{--posix}. See section Command-Line Options.

Note: If you want files in the current directory to be found, you must include the current directory in the path, either by including `.' explicitly in the path or by writing a null entry in the path. (A null entry is indicated by starting or ending the path with a colon or by placing two colons next to each other (`::').) If the current directory is not included in the path, then files cannot be found in the current directory. This path search mechanism is identical to the shell's.
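
The path search can be tried out with a sketch such as the following. It assumes that gawk is installed under the name `gawk' and does nothing otherwise; the library file `double_lib.awk' and its function are hypothetical:

```shell
if command -v gawk >/dev/null 2>&1; then
    libdir=$(mktemp -d)
    printf 'function double(x) { return 2 * x }\n' > "$libdir/double_lib.awk"
    # The name given to -f contains no slash, so gawk looks for it
    # in the directories listed in AWKPATH.
    found=$(AWKPATH="$libdir" gawk -f double_lib.awk --source 'BEGIN { print double(21) }')
    echo "$found"   # prints: 42
    rm -rf "$libdir"
fi
```

Setting AWKPATH only for the one command, as done here, leaves the search path of the surrounding shell session untouched.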

Starting with version 3.0, if @env{AWKPATH} is not defined in the environment, @command{gawk} places its default search path into ENVIRON["AWKPATH"]. This makes it easy to determine the actual search path that @command{gawk} will use from within an @command{awk} program.

While you can change ENVIRON["AWKPATH"] within your @command{awk} program, this has no effect on the running program's behavior. This makes sense: the @env{AWKPATH} environment variable is used to find the program source files. Once your program is running, all the files have been found, and @command{gawk} no longer needs to use @env{AWKPATH}.

Obsolete Options and/or Features

This minor node describes features and/or command-line options from previous releases of @command{gawk} that are either not available in the current version or that are still supported but deprecated (meaning that they will not be in the next release).

For version 3.1 of @command{gawk}, there are no deprecated command-line options from the previous version of @command{gawk}. The use of `next file' (two words) for nextfile was deprecated in @command{gawk} 3.0 but still worked. Starting with version 3.1, the two word usage is no longer accepted.

The process-related special files described in section Special Files for Process-Related Information work as described but are now considered deprecated. @command{gawk} prints a warning message every time they are used. (Use PROCINFO instead; see section Built-in Variables That Convey Information.) They will be removed from the next release of @command{gawk}.

Undocumented Options and Features

Use the Source, Luke!
Obi-Wan

This minor node intentionally left blank.

Known Bugs in @command{gawk}

