246 lines
8.4 KiB
Groff
246 lines
8.4 KiB
Groff
.CD "awk \(en pattern matching language"
|
|
.SX "awk \fIrules\fR [\fIfile\fR] ...
|
|
.FL "\fR(none)"
|
|
.EX "awk rules input" "Process \fIinput\fR according to \fIrules\fR"
|
|
.EX "awk rules \(en >out" "Input from terminal, output to \fIout\fR"
|
|
.PP
|
|
AWK is a programming language devised by Aho, Weinberger, and Kernighan
|
|
at Bell Labs (hence the name).
|
|
\fIAwk\fR programs search files for
|
|
specific patterns and performs \*(OQactions\*(CQ for every occurrence
|
|
of these patterns. The patterns can be \*(OQregular expressions\*(CQ
|
|
as used in the \fIed\fR editor. The actions are expressed
|
|
using a subset of the C language.
|
|
.PP
|
|
The patterns and actions are usually placed in a \*(OQrules\*(CQ file
|
|
whose name must be the first argument in the command line,
|
|
preceded by the flag \fB\(enf\fR. Otherwise, the first argument on the
|
|
command line is taken to be a string containing the rules
|
|
themselves. All other arguments are taken to be the names of text
|
|
files on which the rules are to be applied, with \fB\(en\fR being the
|
|
standard input. To take rules from the standard input, use \fB\(enf \(en\fR.
|
|
.PP
|
|
The command:
|
|
.HS
|
|
.Cx "awk rules prog.\d\s+2*\s0\u"
|
|
.HS
|
|
would read the patterns and actions rules from the file \fIrules\fR
|
|
and apply them to all the arguments.
|
|
.PP
|
|
The general format of a rules file is:
|
|
.HS
|
|
~~~<pattern> { <action> }
|
|
~~~<pattern> { <action> }
|
|
~~~...
|
|
.HS
|
|
There may be any number of these <pattern> { <action> }
|
|
sequences in the rules file. \fIAwk\fR reads a line of input from
|
|
the current input file and applies every <pattern> { <action> }
|
|
in sequence to the line.
|
|
.PP
|
|
If the <pattern> corresponding to any { <action> } is missing,
|
|
the action is applied to every line of input. The default
|
|
{ <action> } is to print the matched input line.
|
|
.SS "Patterns"
|
|
.PP
|
|
The <pattern>s may consist of any valid C expression. If the
|
|
<pattern> consists of two expressions separated by a comma, it
|
|
is taken to be a range and the <action> is performed on all
|
|
lines of input that match the range. <pattern>s may contain
|
|
\*(OQregular expressions\*(CQ delimited by an @ symbol. Regular
|
|
expressions can be thought of as a generalized \*(OQwildcard\*(CQ
|
|
string matching mechanism, similar to that used by many
|
|
operating systems to specify file names. Regular expressions
|
|
may contain any of the following characters:
|
|
.HS
|
|
.in +0.75i
|
|
.ta +0.5i
|
|
.ti -0.5i
|
|
x An ordinary character
|
|
.ti -0.5i
|
|
\\ The backslash quotes any character
|
|
.ti -0.5i
|
|
^ A circumflex at the beginning of an expr matches the beginning of a line.
|
|
.ti -0.5i
|
|
$ A dollar-sign at the end of an expression matches the end of a line.
|
|
.ti -0.5i
|
|
\&. A period matches any single character except newline.
|
|
.ti -0.5i
|
|
* An expression followed by an asterisk matches zero or more occurrences
|
|
of that expression: \*(OQfo*\*(CQ matches \*(OQf\*(CQ, \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
|
|
.ti -0.5i
|
|
+ An expression followed by a plus sign matches one or more occurrences
|
|
of that expression: \*(OQfo+\*(CQ matches \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
|
|
.ti -0.5i
|
|
[] A string enclosed in square brackets matches any single character in that
|
|
string, but no others. If the first character in the string is a circumflex, the
|
|
expression matches any character except newline and the characters in the
|
|
string. For example, \*(OQ[xyz]\*(CQ matches \*(OQxx\*(CQ and \*(OQzyx\*(CQ, while
|
|
\*(OQ[^xyz]\*(CQ matches \*(OQabc\*(CQ but not \*(OQaxb\*(CQ. A range of characters may be
|
|
specified by two characters separated by \*(OQ-\*(CQ.
|
|
.in -0.75i
|
|
.SS "Actions"
|
|
.PP
|
|
Actions are expressed as a subset of the C language. All
|
|
variables are global and default to int's if not formally
|
|
declared.
|
|
Only char's and int's and pointers and arrays of
|
|
char and int are allowed. \fIAwk\fR allows only decimal integer
|
|
constants to be used\(emno hex (0xnn) or octal (0nn). String
|
|
and character constants may contain all of the special C
|
|
escapes (\\n, \\r, etc.).
|
|
.PP
|
|
\fIAwk\fR supports the \*(OQif\*(CQ, \*(OQelse\*(CQ,
|
|
\*(OQwhile\*(CQ and \*(OQbreak\*(CQ flow of
|
|
control constructs, which behave exactly as in C.
|
|
.PP
|
|
Also supported are the following unary and binary operators,
|
|
listed in order from highest to lowest precedence:
|
|
.HS
|
|
.ta 0.25i 1.75i 3.0i
|
|
.nf
|
|
\fB Operator Type Associativity\fR
|
|
() [] unary left to right
|
|
.tr ~~
|
|
! ~ ++ \(en\(en \(en * & unary right to left
|
|
.tr ~
|
|
* / % binary left to right
|
|
+ \(en binary left to right
|
|
<< >> binary left to right
|
|
< <= > >= binary left to right
|
|
== != binary left to right
|
|
& binary left to right
|
|
^ binary left to right
|
|
| binary left to right
|
|
&& binary left to right
|
|
|| binary left to right
|
|
= binary right to left
|
|
.fi
|
|
.HS
|
|
Comments are introduced by a '#' symbol and are terminated by
|
|
the first newline character. The standard \*(OQ/*\*(CQ and \*(OQ*/\*(CQ
|
|
comment delimiters are not supported and will result in a
|
|
syntax error.
|
|
.SP 0.5
|
|
.SS "Fields"
|
|
.SP 0.5
|
|
.PP
|
|
When \fIawk\fR reads a line from the current input file, the
|
|
record is automatically separated into \*(OQfields.\*(CQ A field is
|
|
simply a string of consecutive characters delimited by either
|
|
the beginning or end of line, or a \*(OQfield separator\*(CQ character.
|
|
Initially, the field separators are the space and tab character.
|
|
The special unary operator '$' is used to reference one of the
|
|
fields in the current input record (line). The fields are
|
|
numbered sequentially starting at 1. The expression \*(OQ$0\*(CQ
|
|
references the entire input line.
|
|
.PP
|
|
Similarly, the \*(OQrecord separator\*(CQ is used to determine the end
|
|
of an input \*(OQline,\*(CQ initially the newline character. The field
|
|
and record separators may be changed programatically by one of
|
|
the actions and will remain in effect until changed again.
|
|
.PP
|
|
Multiple (up to 10) field separators are allowed at a time, but
|
|
only one record separator.
|
|
.PP
|
|
Fields behave exactly like strings; and can be used in the same
|
|
context as a character array. These \*(OQarrays\*(CQ can be considered
|
|
to have been declared as:
|
|
.SP 0.15
|
|
.HS
|
|
~~~~~char ($n)[ 128 ];
|
|
.HS
|
|
.SP 0.15
|
|
In other words, they are 128 bytes long. Notice that the
|
|
parentheses are necessary because the operators [] and $
|
|
associate from right to left; without them, the statement
|
|
would have parsed as:
|
|
.HS
|
|
.SP 0.15
|
|
~~~~~char $(1[ 128 ]);
|
|
.HS
|
|
.SP 0.15
|
|
which is obviously ridiculous.
|
|
.PP
|
|
If the contents of one of these field arrays is altered, the
|
|
\*(OQ$0\*(CQ field will reflect this change. For example, this
|
|
expression:
|
|
.HS
|
|
.SP 0.15
|
|
~~~~~*$4 = 'A';
|
|
.HS
|
|
.SP 0.15
|
|
will change the first character of the fourth field to an upper-
|
|
case letter 'A'. Then, when the following input line:
|
|
.HS
|
|
.SP 0.15
|
|
~~~~~120 PRINT "Name address Zip"
|
|
.SP 0.15
|
|
.HS
|
|
is processed, it would be printed as:
|
|
.HS
|
|
.SP 0.15
|
|
~~~~~120 PRINT "Name Address Zip"
|
|
.HS
|
|
.SP 0.15
|
|
Fields may also be modified with the strcpy() function (see
|
|
below). For example, the expression:
|
|
.HS
|
|
~~~~~strcpy( $4, "Addr." );
|
|
.HS
|
|
applied to the same line above would yield:
|
|
.HS
|
|
~~~~~120 PRINT "Name Addr. Zip"
|
|
.HS
|
|
.SS "Predefined Variables"
|
|
.PP
|
|
The following variables are pre-defined:
|
|
.HS
|
|
.in +1.5i
|
|
.ta +1.25i
|
|
.ti -1.25i
|
|
FS Field separator (see below).
|
|
.ti -1.25i
|
|
RS Record separator (see below also).
|
|
.ti -1.25i
|
|
NF Number of fields in current input record (line).
|
|
.ti -1.25i
|
|
NR Number of records processed thus far.
|
|
.ti -1.25i
|
|
FILENAME Name of current input file.
|
|
.ti -1.25i
|
|
BEGIN A special <pattern> that matches the beginning of input text.
|
|
.ti -1.25i
|
|
END A special <pattern> that matches the end of input text.
|
|
.in -1.5i
|
|
.HS
|
|
\fIAwk\fR also provides some useful built-in functions for string
|
|
manipulation and printing:
|
|
.HS
|
|
.in +1.5i
|
|
.ta +1.25i
|
|
.ti -1.25i
|
|
print(arg) Simple printing of strings only, terminated by '\\n'.
|
|
.ti -1.25i
|
|
printf(arg...) Exactly the printf() function from C.
|
|
.ti -1.25i
|
|
getline() Reads the next record and returns 0 on end of file.
|
|
.ti -1.25i
|
|
nextfile() Closes the current input file and begins processing the next file
|
|
.ti -1.25i
|
|
strlen(s) Returns the length of its string argument.
|
|
.ti -1.25i
|
|
strcpy(s,t) Copies the string \*(OQt\*(CQ to the string \*(OQs\*(CQ.
|
|
.ti -1.25i
|
|
strcmp(s,t) Compares the \*(OQs\*(CQ to \*(OQt\*(CQ and returns 0 if they match.
|
|
.ti -1.25i
|
|
toupper(c) Returns its character argument converted to upper-case.
|
|
.ti -1.25i
|
|
tolower(c) Returns its character argument converted to lower-case.
|
|
.ti -1.25i
|
|
match(s,@re@) Compares the string \*(OQs\*(CQ to the regular expression \*(OQre\*(CQ and
|
|
returns the number of matches found (zero if none).
|
|
.in -1.5i
|
|
.SS "Authors"
|
|
.PP
|
|
\fIAwk\fR was written by Saeko Hirabauashi and Kouichi Hirabayashi.
|