BBS 水木清华站: Digest Area
From: dfbb (赵无忌), Board: Linux
Subject: [doc]Programming Perl Chap1
Posted from: BBS 水木清华站 (Fri May 8 22:06:33 1998)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>[Chapter 1] An Overview of Perl</TITLE>
<META NAME="author" CONTENT="Larry Wall, Tom Christiansen, and Randal Schwartz">
<META NAME="date" CONTENT="Fri Aug 29 18:27:39 1997">
<META NAME="form" CONTENT="html">
<META NAME="metadata" CONTENT="dublincore.0.1">
<META NAME="objecttype" CONTENT="book part">
<META NAME="otheragent" CONTENT="gmat dbtohtml">
<META NAME="publisher" CONTENT="O'Reilly & Associates, Inc.">
<META NAME="source" CONTENT="SGML">
<META NAME="subject" CONTENT="Perl">
<META NAME="title" CONTENT="Programming Perl, Second Edition">
<META HTTP-EQUIV="Content-Script-Type" CONTENT="text/javascript">
<META NAME="GENERATOR" CONTENT="Mozilla/3.01Gold (Win95; I) [Netscape]">
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF">
<H1><A NAME="PERL2-CH-1"></A>1. An Overview of Perl</H1>
<P><B>Contents:<BR>
</B>Getting Started<BR>
<A HREF="ch01_02.htm">Natural and Artificial Languages<BR>
</A><A HREF="ch01_03.htm">A Grade Example<BR>
</A><A HREF="ch01_04.htm">Filehandles<BR>
</A><A HREF="ch01_05.htm">Operators<BR>
</A><A HREF="ch01_06.htm">Control Structures<BR>
</A><A HREF="ch01_07.htm">Regular Expressions<BR>
</A><A HREF="ch01_08.htm">List Processing<BR>
</A><A HREF="ch01_09.htm">What You Don't Know Won't Hurt You (Much)</A><BR>
</P>
<H2><A NAME="PERL2-CH-1-SECT-1"></A>1.1 Getting Started</H2>
<P>We think that Perl is an easy language to learn and use, and we hope
to convince you that we're right. One thing that's easy about Perl is that
you don't have to say much before you say what you want to say. In many
programming languages, you have to declare the types, variables, and subroutines
you are going to use before you can write the first statement of executable
code. And for complex problems demanding complex data structures, this
is a good idea. But for many simple, everyday problems, you would like
a programming language in which you can simply say: </P>
<PRE>print "Howdy, world!\n";
</PRE>
<P>and expect the program to do just that. </P>
<P>Perl is such a language. In fact, the example is a complete program,[1]
and if you feed it to the Perl interpreter, it will print "<TT>Howdy,
world!</TT>" on your screen. </P>
<BLOCKQUOTE class=footnote>
<P>[1] Or script, or application, or executable, or doohickey. Whatever.
</P>
</BLOCKQUOTE>
<P>And that's that. You don't have to say much <I>after</I> you say what
you want to say, either. Unlike many languages, Perl thinks that falling
off the end of your program is just a normal way to exit the program. You
certainly <I>may</I> call the <A HREF="ch03_02.htm#PERL2-CMD-EXIT">exit</A>
function explicitly if you wish, just as you <I>may</I> declare some of
your variables and subroutines, or even <I>force</I> yourself to declare
all your variables and subroutines. But it's your choice. With Perl you're
free to do The Right Thing, however you care to define it. </P>
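<P>As a minimal sketch of the "force yourself" option, the variant below
turns on <TT>use strict</TT>, which makes Perl insist that every variable
be declared (here with <TT>my</TT>). The variable name <TT>$greeting</TT>
is ours, chosen only for illustration. </P>

```perl
use strict;      # insist that every variable is declared
use warnings;    # and ask for helpful warnings while we're at it

my $greeting = "Howdy, world!\n";   # "my" declares the variable
print $greeting;

# No explicit exit is needed: falling off the end is a normal exit.
```

<P>Leave out both <TT>use</TT> lines and the <TT>my</TT>, and the program
still runs. The choice is yours. </P>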
<P>There are many other reasons why Perl is easy to use, but it would be
pointless to list them all here, because that's what the rest of the book
is for. The devil may be in the details, as they say, but Perl tries to
help you out down there in the hot place too. At every level, Perl is about
helping you get from here to there with minimum fuss and maximum enjoyment.
That's why so many Perl programmers go around with a silly grin on their
face. </P>
<P>This chapter is an overview of Perl, so we're not trying to present
Perl to the rational side of your brain. Nor are we trying to be complete,
or logical. That's what the next chapter is for.[2] This chapter presents
Perl to the <I>other</I> side of your brain, whether you prefer to call
it associative, artistic, passionate, or merely spongy. To that end, we'll
be presenting various views of Perl that will hopefully give you as clear
a picture of Perl as the blind men had of the elephant. Well, okay, maybe
we can do better than that. We're dealing with a camel here. Hopefully,
at least one of these views of Perl will help get you over the hump. </P>
<BLOCKQUOTE class=footnote>
<P>[2] Vulcans (and like-minded humans) should skip this overview and go
straight to <A HREF="ch02_01.htm">Chapter 2, <I>The Gory Details</I></A>,
for maximum information density. If, on the other hand, you're looking
for a carefully paced tutorial, you should probably get Randal's nice book,
<I>Learning Perl</I> (published by O'Reilly & Associates). But don't
throw out this book just yet. </P>
</BLOCKQUOTE>
<H2><A NAME="PERL2-CH-1-SECT-2"></A>1.2 Natural and Artificial Languages</H2>
<P><A NAME="CH01.LANG"></A><A NAME="CH01.NAT"></A><A NAME="CH01.ART"></A>Languages
were first invented by humans, for the benefit of humans. In the annals
of computer science, this fact has occasionally been forgotten.[3] Since
Perl was designed (loosely speaking) by an occasional linguist, it was
designed to work smoothly in the same ways that natural language works
smoothly. Naturally, there are many aspects to this, since natural language
works well at many levels simultaneously. We could enumerate many of these
linguistic principles here, but the most important principle of language
design is simply that easy things should be easy, and hard things should
be possible. That may seem obvious, but many computer languages fail at
one or the other. </P>
<BLOCKQUOTE class=footnote>
<P>[3] More precisely, this fact has occasionally been remembered. </P>
</BLOCKQUOTE>
<P>Natural languages are good at both because people are continually trying
to express both easy things and hard things, so the language evolves to
handle both. Perl was designed first of all to evolve, and indeed it has
evolved. Many people have contributed to the evolution of Perl over the
years. We often joke that a camel is a horse designed by a committee, but
if you think about it, the camel is pretty well adapted for life in the
desert. The camel has evolved to be relatively self-sufficient.[4] </P>
<BLOCKQUOTE class=footnote>
<P>[4] On the other hand, the camel has not evolved to smell good. Neither
has Perl. </P>
</BLOCKQUOTE>
<P>Now when someone utters the word "linguistics", many people
think of one of two things. Either they think of words, or they think of
sentences. But words and sentences are just two handy ways to "chunk"
speech. Either may be broken down into smaller units of meaning, or combined
into larger units of meaning. And the meaning of any unit depends heavily
on the syntactic, semantic, and pragmatic context in which the unit is
located. Natural language has words of various sorts, nouns and verbs and
such. If I say "dog" in isolation, you think of it as a noun,
but I can also use the word in other ways. That is, a noun can function
as a verb, an adjective or an adverb when the context demands it. If you
dog a dog during the dog days of summer, you'll be a dog tired dogcatcher.[5]
</P>
<BLOCKQUOTE class=footnote>
<P>[5] And you're probably dog tired of all this linguistics claptrap.
But we'd like you to understand why Perl is different from the typical
computer language, doggone it! </P>
</BLOCKQUOTE>
<P>Perl also evaluates words differently in various contexts. We will see
how it does that later. Just remember that Perl is trying to understand
what you're saying, like any good listener does. Perl works pretty hard
to try to keep up its end of the bargain. Just say what you mean, and Perl
will usually "get it". (Unless you're talking nonsense, of course--the
Perl parser understands Perl a lot better than either English or Swahili.)
</P>
<P>But back to nouns. A noun can name a particular object, or it can name
a class of objects generically without specifying which one or ones are
currently being referred to. Most computer languages make this distinction,
only we call the particular thing a value and the generic one a variable.
A value just exists somewhere, who knows where, but a variable gets associated
with one or more values over its lifetime. So whoever is interpreting the
variable has to keep track of that association. That interpreter may be
in your brain, or in your computer. </P>
<H3><A NAME="PERL2-CH-1-SECT-2.1"></A>Nouns</H3>
<P><A NAME="CH01.NOUNS"></A><A NAME="CH01.VAR"></A><A NAME="CH01.NV"></A>A
variable is just a handy place to keep something, a place with a name,
so you know where to find your special something when you come back looking
for it later. As in real life, there are various kinds of places to store
things, some of them rather private, and some of them out in public. Some
places are temporary, and other places are more permanent. Computer scientists
love to talk about the "scope" of variables, but that's all they
mean by it. Perl has various handy ways of dealing with scoping issues,
which you'll be happy to learn later when the time is right. Which is not
yet. (Look up the adjectives "local" and "my" in <A HREF="ch03_01.htm">Chapter
3, <I>Functions</I></A>, when you get curious.) </P>
<P>But a more immediately useful way of classifying variables is by what
sort of data they can hold. As in English, Perl's primary type distinction
is between singular and plural data. Strings and numbers are singular pieces
of data, while lists of strings or numbers are plural. (And when we get
to object-oriented programming, you'll find that an object looks singular
from the outside, but may look plural from the inside, like a class of
students.) We call a singular variable a <I>scalar</I>, and a plural variable
an <I>array</I>. Since a string can be stored in a scalar variable, we
might write a slightly longer (and commented) version of our first example
like this: </P>
<PRE>$phrase = "Howdy, world!\n"; # Set a variable.
print $phrase; # Print the variable.
</PRE>
<P>Note that we did not have to predefine what kind of variable <TT>$phrase</TT>
is. The <TT>$</TT> character tells Perl that <TT>phrase</TT> is a scalar
variable, that is, one containing a singular value. An array variable,
by contrast, would start with an <TT>@</TT> character. (It may help you
to remember that a <TT>$</TT> is a stylized "S", for "scalar",
while <TT>@</TT> is a stylized "a", for "array".) </P>
<P>Perl has some other variable types, with unlikely names like "hash",
"handle", and "typeglob". Like scalars and arrays,
these types of variables are also preceded by funny characters.[6] For
completeness, <A HREF="ch01_02.htm#PERL2-CH-1-TAB-1">Table 1.1</A> lists
all the funny characters you'll encounter. </P>
<BLOCKQUOTE class=footnote>
<P>[6] Some language purists point to these funny characters as a reason
to abhor Perl. This is superficial. These characters have many benefits:
Variables can be interpolated into strings with no additional syntax. Perl
scripts are easy to read (for people who have bothered to learn Perl!)
because the nouns stand out from verbs, and new verbs can be added to the
language without breaking old scripts. (We told you Perl was designed to
evolve.) And the noun analogy is not frivolous--there is ample precedent
in various natural languages for requiring grammatical noun markers. It's
how we think! (We think.) </P>
</BLOCKQUOTE>
<TABLE>
<CAPTION>
<P><A NAME="PERL2-CH-1-TAB-1"></A>Table 1.1: Variable Syntax</P>
</CAPTION>
<TR CLASS=row>
<TH ALIGN="left">Type</TH>
<TH ALIGN="left">Character</TH>
<TH ALIGN="left">Example</TH>
<TH ALIGN="left">Is a name for:</TH>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Scalar</TD>
<TD ALIGN="left"><TT>$</TT></TD>
<TD ALIGN="left"><TT>$cents</TT></TD>
<TD ALIGN="left">An individual value (number or string)</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Array</TD>
<TD ALIGN="left"><TT>@</TT></TD>
<TD ALIGN="left"><TT>@large</TT></TD>
<TD ALIGN="left">A list of values, keyed by number</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Hash</TD>
<TD ALIGN="left"><TT>%</TT></TD>
<TD ALIGN="left"><TT>%interest</TT></TD>
<TD ALIGN="left">A group of values, keyed by string</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Subroutine</TD>
<TD ALIGN="left"><TT>&</TT></TD>
<TD ALIGN="left"><TT>&how</TT></TD>
<TD ALIGN="left">A callable chunk of Perl code</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Typeglob</TD>
<TD ALIGN="left"><TT>*</TT></TD>
<TD ALIGN="left"><TT>*struck</TT></TD>
<TD ALIGN="left">Everything named <TT>struck</TT></TD>
</TR>
</TABLE>
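<P>Here is a small sketch that puts most of the funny characters from the
table to work. The values are invented for illustration, and we've left
the typeglob out because it needs more background than we have so far. </P>

```perl
use strict;
use warnings;

my $cents    = 42;                                # scalar: one value
my @large    = (1_000, 1_000_000);                # array: keyed by number
my %interest = (savings => 0.02, loan => 0.07);   # hash: keyed by string

sub how { return "like this" }                    # subroutine: callable code

print "cents: $cents\n";
print "second large number: $large[1]\n";         # one element is a scalar
print "loan interest: $interest{loan}\n";         # so is one hash value
print "how? ", &how(), "\n";
```

<P>Notice that a single element of <TT>@large</TT> or <TT>%interest</TT>
is written with a <TT>$</TT>, because one element is a singular value. </P>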
<H4><A NAME="PERL2-CH-1-SECT-2.1.1"></A>Singularities</H4>
<P>From our example, you can see that scalars may be assigned a new value
with the <TT>=</TT> operator, just as in many other computer languages.
Scalar variables can be assigned any form of scalar value: integers, floating-point
numbers, strings, and even esoteric things like references to other variables,
or to objects. There are many ways of generating these values for assignment.
</P>
<P>As in the UNIX shell, you can use different quoting mechanisms to make
different kinds of values. Double quotation marks (double quotes) do variable
interpolation[7] and backslash interpretation,[8] while single quotes suppress
both interpolation and interpretation. And backquotes (the ones leaning
to the left) will execute an external program and return the output of
the program, so you can capture it as a single string containing all the
lines of output. </P>
<BLOCKQUOTE class=footnote>
<P>[7] Sometimes called "substitution" by shell programmers,
but we prefer to reserve that word for something else in Perl. So please
call it interpolation. We're using the term in the textual sense ("this
passage is a Gnostic interpolation") rather than in the mathematical
sense ("this point on the graph is an interpolation between two other
points"). </P>
<P>[8] Such as turning <TT>\t</TT> into a tab, <TT>\n</TT> into a newline,
<TT>\001</TT> into a CTRL-A, and so on, in the tradition of many UNIX
programs. </P>
</BLOCKQUOTE>
<PRE>$answer = 42; # an integer
$pi = 3.14159265; # a "real" number
$avocados = 6.02e23; # scientific notation
$pet = "Camel"; # string
$sign = "I love my $pet"; # string with interpolation
$cost = 'It costs $100'; # string without interpolation
$thence = $whence; # another variable
$x = $moles * $avocados; # an expression
$cwd = `pwd`; # string output from a command
$exit = system("vi $x"); # numeric status of a command
$fido = new Camel "Fido"; # an object
</PRE>
<P>Uninitialized variables automatically spring into existence as needed.
Following the principle of least surprise, they are created with a null
value, either <TT>""</TT> or <TT>0</TT>. Depending on where you
use them, variables will be interpreted automatically as strings, as numbers,
or as "true" and "false" values (commonly called Boolean
values). Various operators expect certain kinds of values as parameters,
so we will speak of those operators as "providing" or "supplying"
a scalar context to those parameters. Sometimes we'll be more specific,
and say it supplies a numeric context, a string context, or a Boolean context
to those parameters. (Later we'll also talk about list context, which is
the opposite of scalar context.) Perl will automatically convert the data
into the form required by the current context, within reason. For example,
suppose you said this: </P>
<PRE>$camels = '123';
print $camels + 1, "\n";
</PRE>
<P>The original value of <TT>$camels</TT> is a string, but it is converted
to a number to add <TT>1</TT> to it, and then converted back to a string
to be printed out as <TT>124</TT>. The newline, represented by <TT>"\n"</TT>,
is also in string context, but since it's already a string, no conversion
is necessary. But notice that we had to use double quotes there--using
single quotes to say <TT>'\n'</TT> would result in a two-character string
consisting of a backslash followed by an "<TT>n</TT>", which
is not a newline by anybody's definition. </P>
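<P>The conversions described above can be sketched as follows. The variable
names other than <TT>$camels</TT> are ours, and <TT>length</TT> is used
only to count the characters in each string. </P>

```perl
use strict;
use warnings;

my $camels = '123';                 # starts out as a string
my $total  = $camels + 1;           # numeric context: string becomes a number
my $label  = $total . " camels";    # string context: number becomes a string

my $double = "\n";                  # one character: a newline
my $single = '\n';                  # two characters: a backslash and an "n"

print $label, "\n";
print length($double), " vs ", length($single), "\n";
```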
<P>So, in a sense, double quotes and single quotes are yet another way
of specifying context. The interpretation of the innards of a quoted string
depends on which quotes you use. Later we'll see some other operators that
work like quotes syntactically, but use the string in some special way,
such as for pattern matching or substitution. These all work like double-quoted
strings too. The <I>double-quote</I> context is the "interpolative"
context of Perl, and is supplied by many operators that don't happen to
resemble double quotes. </P>
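<P>As a rough preview, and assuming only that the match and substitution
operators interpolate the way this paragraph says, the sketch below uses
a variable inside each of them. The strings and variable names are made
up for the occasion. </P>

```perl
use strict;
use warnings;

my $pet  = "Camel";
my $sign = "I love my Camel";

# The pattern is interpolated like a double-quoted string,
# so $pet is replaced by its value before the match is tried.
my $found = ($sign =~ /my $pet/) ? "yes" : "no";
print "match: $found\n";

# The substitution operator interpolates both of its parts the same way.
my $new_pet = "Llama";
(my $revised = $sign) =~ s/$pet/$new_pet/;
print "$revised\n";
```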
<H4><A NAME="PERL2-CH-1-SECT-2.1.2"></A>Pluralities</H4>
<P>Some kinds of variables hold multiple values that are logically tied
together. Perl has two types of multivalued variables: arrays and hashes.
In many ways these behave like scalars. They spring into existence with
nothing in them when needed. When you assign to them, they supply a <I>list</I>
context to the right side of the assignment. </P>
<P>You'd use an array when you want to look something up by number. You'd
use a hash when you want to look something up by name. The two concepts
are complementary. You'll often see people using an array to translate
month numbers into month names, and a corresponding hash to translate month
names back into month numbers. (Though hashes aren't limited to holding
only numbers. You could have a hash that translates month names to birthstone
names, for instance.) </P>
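<P>The month example might be sketched like this. We list only three
months to keep it short, and we number them from 0, which is an assumption
of this sketch rather than a rule. </P>

```perl
use strict;
use warnings;

# Array: look up a month name by number (0 is January here).
my @month_name = ("January", "February", "March");

# Hash: look up a month number by name, built from the array.
my %month_number;
for my $i (0 .. $#month_name) {        # $#month_name is the last subscript
    $month_number{ $month_name[$i] } = $i;
}

print "Month 1 is $month_name[1]\n";
print "March is month $month_number{March}\n";
```

<P>Building the hash from the array keeps the two lookups in agreement,
since there is only one list of names to maintain. </P>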
<H5><A NAME="PERL2-CH-1-SECT-2.1.2.1"></A>Arrays.</H5>
<P>An <I>array</I> is an ordered list of scalars, accessed[9] by the scalar's
position in the list. The list may contain numbers, or strings, or a mixture
of both. (In fact, it could also contain references to other lists, but
we'll get to that in <A HREF="ch04_01.htm">Chapter 4, <I>References and
Nested Data Structures</I></A>, when we're discussing multidimensional
arrays.) To assign a list value to an array, you simply group the values
together (with a set of parentheses): </P>
<BLOCKQUOTE class=footnote>
<P>[9] Or keyed, or indexed, or subscripted, or looked up. Take your pick.
</P>
</BLOCKQUOTE>
<PRE>@home = ("couch", "chair", "table", "stove");
</PRE>
<P>Conversely, if you use <TT>@home</TT> in a list context, such as on
the right side of a list assignment, you get back out the same list you
put in. So you could set four scalar variables from the array like this:
</P>
<PRE>($potato, $lift, $tennis, $pipe) = @home;
</PRE>
<P>These are called list assignments. They logically happen in parallel,
so you can swap two variables by saying: </P>
<PRE>($alpha,$omega) = ($omega,$alpha);
</PRE>
<P>As in C, arrays are zero-based, so while you would talk about the first
through fourth elements of the array, you would get to them with subscripts
0 through 3.[10] Array subscripts are enclosed in square brackets [like
this], so if you want to select an individual array element, you would
refer to it as <TT>$home[</TT><I>n</I><TT>]</TT>, where <I>n</I> is the
subscript (one less than the element number) you want. See the example
below. Since the element you are dealing with is a scalar, you always precede
it with a <TT>$</TT>. </P>
<BLOCKQUOTE class=footnote>
<P>[10] If this seems odd to you, just think of the subscript as an offset,
that is, the count of how many array elements come before it. Obviously,
the first element doesn't have any elements before it, and so has an offset
of 0. This is how computers think. (We think.) </P>
</BLOCKQUOTE>
<P>If you want to assign to one array element at a time, you could write
the earlier assignment as: </P>
<PRE>$home[0] = "couch";
$home[1] = "chair";
$home[2] = "table";
$home[3] = "stove";
</PRE>
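<P>Reading elements back out works the same way. In this sketch the names
<TT>$first</TT>, <TT>$last</TT>, and <TT>$count</TT> are ours, and
<TT>scalar</TT> is used to ask the array how many elements it holds. </P>

```perl
use strict;
use warnings;

my @home = ("couch", "chair", "table", "stove");

my $first = $home[0];          # first element, subscript 0
my $last  = $home[3];          # fourth element, subscript 3
my $count = scalar(@home);     # an array in scalar context gives its length

print "first=$first last=$last count=$count\n";
```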
<P>Since arrays are ordered, there are various useful operations that you
can do on them, such as the stack operations, <A HREF="ch03_02.htm#PERL2-CMD-PUSH">push</A>
and <A HREF="ch03_02.htm#PERL2-CMD-POP">pop</A>. A stack is, after all,
just an ordered list, with a beginning and an end. Especially an end. Perl
regards the end of your list as the top of a stack. (Although most Perl
programmers think of a list as horizontal, with the top of the stack on
the right.) </P>
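<P>A minimal sketch of the two stack operations, using the same furniture
as before: </P>

```perl
use strict;
use warnings;

my @home = ("couch", "chair", "table");

push(@home, "stove");          # add to the end (the top of the stack)
my $top = pop(@home);          # remove and return the last element

print "popped: $top\n";
print "left: @home\n";         # an array in a string is joined with spaces
```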
<H5><A NAME="PERL2-CH-1-SECT-2.1.2.2"></A>Hashes.</H5>
<P>A <I>hash</I> is an unordered set of scalars, accessed[11] by some string
value that is associated with each scalar. For this reason hashes are often
called "associative arrays". But that's too long for lazy typists
to type, and we talk about them so often that we decided to name them something
short and snappy.[12] The other reason we picked the name "hash"
is to emphasize the fact that they're disordered. (They are, coincidentally,
implemented internally using a hash-table lookup, which is why hashes are
so fast, and stay so fast no matter how many values you put into them.)
You can't <A HREF="ch03_02.htm#PERL2-CMD-PUSH">push</A> or <A HREF="ch03_02.htm#PERL2-CMD-POP">pop</A>
a hash though, because it doesn't make sense. A hash has no beginning or
end. Nevertheless, hashes are extremely powerful and useful. Until you
start thinking in terms of hashes, you aren't really thinking in Perl.
</P>
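<P>To make that concrete, here is a sketch of a small hash. The
<TT>%birthstone</TT> hash and its contents are hypothetical, borrowed from
the birthstone remark earlier. </P>

```perl
use strict;
use warnings;

# Hypothetical data: month name => birthstone name.
my %birthstone = (
    "January"  => "garnet",
    "February" => "amethyst",
);

$birthstone{"March"} = "aquamarine";     # add one more pair

print "February: $birthstone{February}\n";

# keys returns the names in no particular order, so sort them for display.
for my $month (sort keys %birthstone) {
    print "$month => $birthstone{$month}\n";
}
```

<P>The <TT>sort</TT> is there only because a hash makes no promise about
the order in which it hands back its keys. </P>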
<P>... to find your script. Something like
</P>
<BLOCKQUOTE class=footnote>
<P>[16] Although Perl has its share of funny notations, this one must be
blamed on UNIX. <I>chmod</I> (1) means you should refer to the manpage
for the <I>chmod</I> command in section one of your UNIX manual. If you
type either <TT>man 1 chmod</TT> or <TT>man -s 1 chmod</TT> (depending
on your flavor of UNIX), you should be able to find out all the interesting
information your system knows about the command <I>chmod</I>. (Of course,
if your flavor of UNIX happens to be "Not UNIX!" then you'll
need to refer to your system's documentation for the equivalent command,
presuming you are so blessed. Your chief consolation is that, if an equivalent
command does exist, it will have a much better name than <I>chmod</I>.)
</P>
</BLOCKQUOTE>
<PRE>% ../bin/gradation
</PRE>
<P>Finally, if you are unfortunate enough to be on an ancient UNIX system
that doesn't support the magic <TT>#!</TT> line, or if the path to your
interpreter is longer than 32 characters (a built-in limit on many systems),
you may be able to work around it like this: </P>
<PRE>#!/bin/sh -- # perl, to stop looping
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if 0;
</PRE>
<P>Some operating systems may require variants on this to deal with <I>/bin/csh</I>,
<I>DCL</I>, <I>COMMAND.COM</I>, or whatever happens to be your default
command interpreter. Ask your Local Expert. </P>
<P>Throughout this book, we'll just use <TT>#!/usr/bin/perl</TT> to represent
all these notions and notations, but you'll know what we really mean by
it. </P>
<P>A random clue: when you write a test script, don't call your script
<I>test</I>. UNIX systems have a built-in test command, which will likely
be executed instead of your script. Try <I>try</I> instead. </P>
<P>A not-so-random clue: while learning Perl, and even after you think
you know what you're doing, we suggest using the <B>-w</B> option, especially
during development. This option will turn on all sorts of useful and interesting
warning messages, not necessarily in that order. You can put the <B>-w</B>
switch on the shebang line, like this: </P>
<PRE>#!/usr/bin/perl -w
</PRE>
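<P>As a small illustration of the kind of thing <B>-w</B> catches, the
sketch below uses a variable before giving it a value. The variable names
are ours, and the exact wording of the warning depends on your version
of Perl. </P>

```perl
#!/usr/bin/perl -w
my $count;                   # declared but never given a value
my $total = $count + 1;      # with -w, Perl warns about the uninitialized value
print "total is $total\n";   # the program still runs and prints "total is 1"
```

<P>Without the switch, the same program runs silently, which is exactly
why the switch is worth having during development. </P>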
<P>Now that you know how to run your own Perl program (not to be confused
with the <I>perl</I> program), let's get back to our example. </P>
<H2><A NAME="PERL2-CH-1-SECT-4"></A>1.4 Filehandles</H2>
<P><A NAME="CH01.IOF1"></A><A NAME="CH01.IOF2"></A><A NAME="CH01.FH"></A>Unless
you're using artificial intelligence to model a solipsistic philosopher,
your program needs some way to communicate with the outside world. In lines
3 and 4 of our grade example you'll see the word <TT>GRADES</TT>, which
exemplifies another of Perl's data types, the <I>filehandle</I>. A filehandle
is just a name you give to a file, device, socket, or pipe to help you
remember which one you're talking about, and to hide some of the complexities
of buffering and such. (Internally, filehandles are similar to streams
from a language like C++, or I/O channels from BASIC.) </P>
<P>Filehandles make it easier for you to get input from and send output
to many different places. Part of what makes Perl a good glue language
is that it can talk to many files and processes at once. Having nice symbolic
names for various external objects is just part of being a good glue language.[17]
</P>
<BLOCKQUOTE class=footnote>
<P>[17] Some of the other things that make Perl a good glue language are:
it's 8-bit clean, it's embeddable, and you can embed other things in it
via extension modules. It's concise, and networks easily. It's environmentally
conscious, so to speak. You can invoke it in many different ways (as we
saw earlier). But most of all, the language itself is not so rigidly structured
that you can't get it to "flow" around your problem. It comes
back to that TMTOWTDI thing again. </P>
</BLOCKQUOTE>
<P>You create a filehandle and attach it to a file by using the <A HREF="ch03_02.htm#PERL2-CMD-OPEN">open</A>
function. <A HREF="ch03_02.htm#PERL2-CMD-OPEN">open</A> takes two parameters:
the filehandle and the filename you want to associate it with. Perl also
gives you some predefined (and preopened) filehandles. <TT>STDIN</TT> is
your program's normal input channel, while <TT>STDOUT</TT> is your program's
normal output channel. And <TT>STDERR</TT> is an additional output channel
so that your program can make snide remarks off to the side while it transforms
(or attempts to transform) your input into your output.[18] </P>
<BLOCKQUOTE class=footnote>
<P>[18] These filehandles are typically attached to your terminal, so you
can type to your program and see its output, but they may also be attached
to files (and such). Perl can give you these predefined handles because
your operating system already provides them, one way or another. Under
UNIX, processes inherit standard input, output, and error from their parent
process, typically a shell. One of the duties of a shell is to set up these
I/O streams so that the child process doesn't need to worry about them.
</P>
</BLOCKQUOTE>
<P>Since you can use the <A HREF="ch03_02.htm#PERL2-CMD-OPEN">open</A>
function to create filehandles for various purposes (input, output, piping),
you need to be able to specify which behavior you want. As you would do
on the UNIX command line, you simply add characters to the filename. </P>
<PRE>open(SESAME, "filename"); # read from existing file
open(SESAME, "&lt;filename"); # (same thing, explicitly)
open(SESAME, ">filename"); # create file and write to it
open(SESAME, ">>filename"); # append to existing file
open(SESAME, "| output-pipe-command"); # set up an output filter
open(SESAME, "input-pipe-command |"); # set up an input filter
</PRE>
<P>As you can see, the name you pick is arbitrary. Once opened, the filehandle
<TT>SESAME</TT> can be used to access the file or pipe until it is explicitly
closed (with, you guessed it, <TT>close(SESAME)</TT>), or the filehandle
is attached to another file by a subsequent <A HREF="ch03_02.htm#PERL2-CMD-OPEN">open</A>
on the same filehandle.[19] </P>
<BLOCKQUOTE class=footnote>
<P>[19] Opening an already opened filehandle implicitly closes the first
file, making it inaccessible to the filehandle, and opens a different file.
You must be careful that this is what you really want to do. Sometimes
it happens accidentally, like when you say <TT>open($handle,$file)</TT>,
and <TT>$handle</TT> happens to contain the null string. Be sure to set
<TT>$handle</TT> to something unique, or you'll just open a new file on
the null filehandle. </P>
</BLOCKQUOTE>
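<P>Putting a few of these forms together, a sketch might create a file,
append to it, and read it back. The scratch file name and the
<TT>|| die</TT> checks are our additions, on the assumption that you'd
rather hear about a failed <TT>open</TT> than not. </P>

```perl
use strict;
use warnings;

my $file = "sesame_demo_" . $$ . ".txt";   # hypothetical scratch file name

# Create the file and write two lines to it.
open(SESAME, ">$file") || die "Can't create $file: $!";
print SESAME "first line\n";
print SESAME "second line\n";
close(SESAME);

# Append one more line to the existing file.
open(SESAME, ">>$file") || die "Can't append to $file: $!";
print SESAME "third line\n";
close(SESAME);

# Read the file back, one line at a time.
open(SESAME, "<$file") || die "Can't read $file: $!";
my @lines;
while (my $line = <SESAME>) {
    push @lines, $line;
}
close(SESAME);

print "read ", scalar(@lines), " lines\n";
unlink($file);                             # clean up the scratch file
```

<P>The same filehandle name is reused for all three opens, which is safe
here only because each one is closed before the next begins. </P>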
<P>Once you've opened a filehandle for input (or if you want to use <TT>STDIN</TT>),
you can read a line using the line reading operator, <TT>&lt;&gt;</TT>.
This is also known as the angle operator, because of its shape. The angle
operator encloses the filehandle (<TT>&lt;SESAME&gt;</TT>) you want to
read lines from.[20] An example using the <TT>STDIN</TT> filehandle to
read an answer supplied by the user would look something like this: </P>
<BLOCKQUOTE class=footnote>
<P>[20] The empty angle operator, <TT>&lt;&gt;</TT>, will read lines from
all the files specified on the command line, or <TT>STDIN</TT>, if none
were specified. (This is standard behavior for many UNIX filter programs.)
</P>
</BLOCKQUOTE>
<PRE>print STDOUT "Enter a number: "; # ask for a number
$number = &lt;STDIN&gt;; # input the number
print STDOUT "The number is $number\n"; # print the number
</PRE>
<P>Did you see what we just slipped by you? What's the <TT>STDOUT</TT>
doing in those <A HREF="ch03_02.htm#PERL2-CMD-PRINT">print</A> statements
there? Well, that's one of the ways you can use an output filehandle. A
filehandle may be supplied as the first argument to the <A HREF="ch03_02.htm#PERL2-CMD-PRINT">print</A>
statement, and if present, tells the output where to go. In this case,
the filehandle is redundant, because the output would have gone to <TT>STDOUT</TT>
anyway. Much as <TT>STDIN</TT> is the default for input, <TT>STDOUT</TT>
is the default for output. (In line 18 of our grade example, we left it
out, to avoid confusing you up till now.) </P>
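<P>A short sketch of the same idea, with <TT>STDERR</TT> thrown in for
contrast. The messages and the value of <TT>$number</TT> are invented. </P>

```perl
use strict;
use warnings;

my $number = 9;

print STDOUT "The number is $number\n";   # explicit filehandle
print        "The number is $number\n";   # same thing: STDOUT is the default
print STDERR "Just a remark, off to the side\n";   # the error channel
```

<P>Note that there is no comma between the filehandle and the first thing
to be printed. </P>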
<P>We also did something else to trick you. If you try the above example,
you may notice that you get an extra blank line. This happens because the
read does not automatically remove the newline from your input line (your
input would be, for example, "<TT>9\n</TT>"). For those times
when you do want to remove the newline, Perl provides the <A HREF="ch03_02.htm#PERL2-CMD-CHOP">chop</A>
and <A HREF="ch03_02.htm#PERL2-CMD-CHOMP">chomp</A> functions. <A HREF="ch03_02.htm#PERL2-CMD-CHOP">chop</A>
will indiscriminately remove (and return) the last character passed to
it, while <A HREF="ch03_02.htm#PERL2-CMD-CHOMP">chomp</A> will only remove
the end of record marker (generally, "<TT>\n</TT>"), and return
the number of characters so removed. You'll often see this idiom for inputting
a single line: </P>
<PRE>chop($number = <STDIN>); # input number and remove newline
</PRE>
<P>which means the same thing as </P>
<PRE>$number = &lt;STDIN&gt;; # input number
chop($number); # remove newline
</PRE>
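<P>The difference between the two functions shows up when there is no
newline left to remove. The sketch below feeds the same made-up input to
each; the variable names are ours. </P>

```perl
use strict;
use warnings;

my $chomped = "9\n";
my $removed = chomp($chomped);   # removes the trailing newline, returns 1
chomp($chomped);                 # nothing left to remove: string is unchanged

my $chopped = "9\n";
chop($chopped);                  # removes the newline
my $lost = chop($chopped);       # removes (and returns) the "9" as well

print "chomp: '$chomped' (removed $removed)\n";
print "chop:  '$chopped' (lost '$lost')\n";
```

<P>If you can't be sure the newline is there, <TT>chomp</TT> is the more
forgiving choice. </P>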
<H2><A NAME="PERL2-CH-1-SECT-5"></A>1.5 Operators</H2>
<P><A NAME="CH01.OP"></A>As we alluded to earlier, Perl is also a mathematical
language. This is true at several levels, from low-level bitwise logical
operations, up through number and set manipulation, on up to larger predicates
and abstractions of various sorts. And as we all know from studying math
in school, mathematicians love strange symbols. What's worse, computer
scientists have come up with their own versions of these strange symbols.
Perl has a number of these strange symbols too, but take heart, most are
borrowed directly from C, FORTRAN, <I>sed</I> (1) or <I>awk</I> (1), so
they'll at least be familiar to users of those languages. </P>
<P>Perl's built-in operators may be classified by number of operands into
unary, binary, and trinary operators. They may be classified by whether
they're infix operators or prefix operators. They may also be classified
by the kinds of objects they work with, such as numbers, strings, or files.
Later, we'll give you a table of all the operators, but here are some to
get you started. </P>
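<P>To put a face on those categories, here is a sketch with one operator
of each arity, plus a file test as an example of an operator that works
on files. The values and the file name are hypothetical. </P>

```perl
use strict;
use warnings;

my ($x, $y) = (7, 3);

my $negated = -$x;                  # unary, prefix: one operand
my $sum     = $x + $y;              # binary, infix: two operands
my $larger  = $x > $y ? $x : $y;    # trinary (?:): three operands

# -e is a unary operator that works on files: does the named file exist?
my $exists = (-e "no_such_file_here") ? "yes" : "no";

print "$negated $sum $larger\n";
```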
<H3><A NAME="PERL2-CH-1-SECT-5.1"></A>Arithmetic Operators</H3>
<P>Arithmetic operators do exactly what you would expect from learning
them in school. They perform some sort of mathematical function on numbers.
</P>
<TABLE>
<CAPTION>
<P><A NAME="PERL2-CH-1-TAB-2"></A>Table 1.2: Some Binary Arithmetic Operators</P>
</CAPTION>
<TR CLASS=row>
<TH ALIGN="left">Example</TH>
<TH ALIGN="left">Name</TH>
<TH ALIGN="left">Result</TH>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>$a + $b</TT></TD>
<TD ALIGN="left">Addition</TD>
<TD ALIGN="left">Sum of <TT>$a</TT> and <TT>$b</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>$a * $b</TT></TD>
<TD ALIGN="left">Multiplication</TD>
<TD ALIGN="left">Product of <TT>$a</TT> and <TT>$b</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>$a % $b</TT></TD>
<TD ALIGN="left">Modulus</TD>
<TD ALIGN="left">Remainder of <TT>$a</TT> divided by <TT>$b</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>$a ** $b</TT></TD>
<TD ALIGN="left">Exponentiation</TD>
<TD ALIGN="left"><TT>$a</TT> to the power of <TT>$b</TT></TD>
</TR>
</TABLE>
<P>Yes, we left subtraction and division out of <A HREF="ch01_05.htm#PERL2-CH-1-TAB-2">Table
1.2</A>. But we suspect you can figure out how they should work. Try them
and see if you're right. (Or cheat and look in the index.) Arithmetic operators
are evaluated in the order your math teacher taught you (exponentiation
before multiplication, and multiplication before addition). You can always
use parentheses to make it come out differently. </P>
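<P>If you'd rather not take our word for it, here is a minimal sketch you
can run. The variable names are ours, chosen only for illustration; the
point is the order of evaluation and the effect of parentheses. </P>

```perl
use strict;
use warnings;

# Subtraction and division, the two operators left out of Table 1.2.
my $difference = 10 - 4;          # 6
my $quotient   = 10 / 4;          # 2.5 (division is not integer division)

# Exponentiation binds tighter than multiplication,
# which binds tighter than addition.
my $default = 2 + 3 * 4 ** 2;     # 2 + (3 * (4 ** 2)) = 50

# Parentheses override the default order.
my $grouped = (2 + 3) * 4 ** 2;   # 5 * 16 = 80

print "$difference $quotient $default $grouped\n";   # prints 6 2.5 50 80
```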
<H3><A NAME="PERL2-CH-1-SECT-5.2"></A>String Operators</H3>
<P>There is also an "addition" operator for strings that does
concatenation. Unlike some languages that confuse this with numeric addition,
Perl defines a separate operator (<TT>.</TT>) for string concatenation:
</P>
<PRE>$a = 123;
$b = 456;
print $a + $b; # prints 579
print $a . $b; # prints 123456
</PRE>
<P>There's also a "multiply" operation for strings, also called
the <I>repeat</I> operator. Again, it's a separate operator (<B>x</B>)
to keep it distinct from numeric multiplication: </P>
<PRE>$a = 123;
$b = 3;
print $a * $b; # prints 369
print $a x $b; # prints 123123123
</PRE>
<P>These string operators bind as tightly as their corresponding arithmetic
operators. The repeat operator is a bit unusual in taking a string for
its left argument but a number for its right argument. Note also how Perl
is automatically converting from numbers to strings. You could have put
all the literal numbers above in quotes, and it would still have produced
the same output. Internally though, it would have been converting in the
opposite direction (that is, from strings to numbers). </P>
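<P>As a quick check of that claim, the sketch below repeats the two examples
with every literal in quotes. The variable names are made up for the
illustration; the results should match the unquoted versions above. </P>

```perl
use strict;
use warnings;

my $first  = "123";   # strings this time, not numbers
my $second = "456";
my $count  = "3";

my $sum      = $first + $second;   # both strings converted to numbers: 579
my $joined   = $first . $second;   # no conversion needed: "123456"
my $product  = $first * $count;    # 369
my $repeated = $first x $count;    # right operand converted to a number

print "$sum $joined $product $repeated\n";   # prints 579 123456 369 123123123
```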
<P>A couple more things to think about. String concatenation is also implied
by the interpolation that happens in double-quoted strings. When you print
out a list of values, you're also effectively concatenating strings. So
the following three statements produce the same output: </P>
<PRE>print $a . ' is equal to ' . $b . "\n"; # dot operator
print $a, ' is equal to ', $b, "\n"; # list
print "$a is equal to $b\n"; # interpolation
</PRE>
<P>Which of these you use in any particular situation is entirely up to
you. </P>
<P>The <B>x</B> operator may seem relatively worthless at first glance,
but it is quite useful at times, especially for things like this: </P>
<PRE>print "-" x $scrwid, "\n";
</PRE>
<P>which draws a line across your screen, presuming your screen width is
in <TT>$scrwid</TT>. </P>
<H3><A NAME="PERL2-CH-1-SECT-5.3"></A>Assignment Operators</H3>
<P>Although it's not exactly a mathematical operator, we've already made
extensive use of the simple assignment operator, <TT>=</TT>. Try to remember
that <TT>=</TT> means "gets set to" rather than "equals".
(There is also a mathematical equality operator <TT>==</TT> that means
"equals", and if you start out thinking about the difference
between them now, you'll save yourself a lot of headache later.) </P>
<P>Like the operators above, assignment operators are binary infix operators,
which means they have an operand on either side of the operator. The right
operand can be any expression you like, but the left operand must be a
valid <I>lvalue</I> (which, when translated to English, means a valid storage
location like a variable, or a location in an array). The most common assignment
operator is simple assignment. It determines the value of the expression
on its right side, and sets the variable on the left side to that value:
</P>
<PRE>$a = $b;
$a = $b + 5;
$a = $a * 3;
</PRE>
<P>Notice the last assignment refers to the same variable twice; once for
the computation, once for the assignment. There's nothing wrong with that,
but it's a common enough operation that there's a shortcut for it (borrowed
from C). If you say: </P>
<PRE>lvalue operator= expression
</PRE>
<P>it is evaluated as if it were: </P>
<PRE>lvalue = lvalue operator expression
</PRE>
<P>except that the lvalue is not computed twice. (This only makes a difference
if evaluation of the lvalue has side effects. But when it <I>does</I> make
a difference, it usually does what you want. So don't sweat it.) </P>
<P>So, for example, you could write the above as: </P>
<PRE>$a *= 3;
</PRE>
<P>which reads "multiply <TT>$a</TT> by 3". You can do this with
almost any binary operator in Perl, even some that you can't do it with
in C: </P>
<PRE>$line .= "\n"; # Append newline to $line.
$fill x= 80; # Make string $fill into 80 repeats of itself.
$val ||= "2"; # Set $val to 2 if it isn't already set.
</PRE>
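<P>To see what "the lvalue is not computed twice" means in practice, here
is a small sketch with an lvalue that has a side effect. The array and
index names are hypothetical. </P>

```perl
use strict;
use warnings;

my @scores = (10, 20, 30);
my $index  = 0;

# The lvalue $scores[$index++] is evaluated once, so $index goes up
# by exactly one and only the first element changes.
$scores[$index++] += 5;

print "@scores\n";   # prints 15 20 30
print "$index\n";    # prints 1

# The other shortcut forms, applied to made-up variables.
my $line = "text";
$line .= "\n";       # "text\n"

my $fill = "-";
$fill x= 8;          # "--------"

my $val;
$val ||= "2";        # $val was undefined (false), so it becomes "2"
```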
<P>Line 6 of our grade example contains two string concatenations, one
of which is an assignment operator. And line 14 contains a <TT>+=</TT>.
</P>
<P>Regardless of which kind of assignment operator you use, the final value
is returned as the value of the assignment as a whole. (This is unlike,
say, Pascal, in which assignment is a statement and has no value.) This
is why we could say: </P>
<PRE>chop($number = <STDIN>);
</PRE>
<P>and have it chop the final value of <TT>$number</TT>. You also frequently
see assignment as the condition of a <B>while</B> loop, as in line 4 of
our grade example. </P>
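<P>Here are a few more places where the value of an assignment gets used,
sketched with made-up variables. Nothing here is special to these names;
any lvalue would do. </P>

```perl
use strict;
use warnings;

# Chained assignment: the value of ($y = 0) is assigned to $x.
my ($x, $y);
$x = $y = 0;

# The result of an assignment can itself be assigned to again.
my $total;
($total = 10) += 5;          # $total is now 15

# Copy a string and strip the newline from the copy only.
my $raw = "42\n";
my $number;
chomp($number = $raw);       # $raw still ends in a newline

print "$x $y $total $number\n";   # prints 0 0 15 42
```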
<H3><A NAME="PERL2-CH-1-SECT-5.4"></A>Autoincrement and Autodecrement Operators</H3>
<P>As if <TT>$variable += 1</TT> weren't short enough, Perl borrows from
C an even shorter way to increment a variable. The autoincrement and autodecrement
operators simply add (or subtract) one from the value of the variable.
They can be placed on either side of the variable, depending on when you
want them to be evaluated (see <A HREF="ch01_05.htm#PERL2-CH-1-TAB-3">Table
1.3</A>). </P>
<TABLE>
<CAPTION>
<P><A NAME="PERL2-CH-1-TAB-3"></A>Table 1.3: Unary Arithmetic Operators</P>
</CAPTION>
<TR CLASS=row>
<TH ALIGN="left">Example</TH>
<TH ALIGN="left">Name</TH>
<TH ALIGN="left">Result</TH>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>++$a, $a++</TT></TD>
<TD ALIGN="left">Autoincrement</TD>
<TD ALIGN="left">Add 1 to <TT>$a</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>--$a, $a--</TT></TD>
<TD ALIGN="left">Autodecrement</TD>
<TD ALIGN="left">Subtract 1 from <TT>$a</TT></TD>
</TR>
</TABLE>
<P>If you place one of the auto operators before the variable, it is known
as a pre-incremented (pre-decremented) variable. Its value will be changed
before it is referenced. If it is placed after the variable, it is known
as a post-incremented (post-decremented) variable and its value is changed
after it is used. For example: </P>
<PRE>$a = 5; # $a is assigned 5
$b = ++$a; # $b is assigned the incremented value of $a, 6
$c = $a--; # $c is assigned 6, then $a is decremented to 5
</PRE>
<P>Line 15 of our grade example increments the number of scores by one,
so that we'll know how many scores we're averaging the grade over. It uses
a post-increment operator (<TT>$scores++</TT>), but in this case it doesn't
matter, since the expression is in a void context, which is just a funny
way of saying that the expression is being evaluated only for the side
effect of incrementing the variable. The value returned is being thrown
away.[21] </P>
<BLOCKQUOTE class=footnote>
<P>[21] The optimizer will notice this and optimize the post-increment
into a pre-increment, because that's a little more efficient to execute.
(You didn't need to know that, but we hoped it would cheer you up.) </P>
</BLOCKQUOTE>
<H3><A NAME="PERL2-CH-1-SECT-5.5"></A>Logical Operators</H3>
<P>Logical operators, also known as "short-circuit" operators,
allow the program to make decisions based on multiple criteria, without
using nested conditionals. They are known as short-circuit because they
skip evaluating their right argument if evaluating their left argument
is sufficient to determine the overall value. </P>
<P>Perl actually has two sets of logical operators, a crufty old set borrowed
from C, and a nifty new set of ultralow-precedence operators that parse
more like people expect them to parse, and are also easier to read. (Once
they're parsed, they behave identically though.) See <A HREF="ch01_05.htm#PERL2-CH-1-TAB-4">Table
1.4</A> for examples of logical operators. </P>
<TABLE>
<CAPTION>
<P><A NAME="PERL2-CH-1-TAB-4"></A>Table 1.4: Logical Operators</P>
</CAPTION>
<TR CLASS=row>
<TH ALIGN="left">Example</TH>
<TH ALIGN="left">Name</TH>
<TH ALIGN="left">Result</TH>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>$a && $b</TT></TD>
<TD ALIGN="left">And</TD>
<TD ALIGN="left"><TT>$a</TT> if <TT>$a</TT> is false, <TT>$b</TT> otherwise</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>$a || $b</TT></TD>
<TD ALIGN="left">Or</TD>
<TD ALIGN="left"><TT>$a</TT> if <TT>$a</TT> is true, <TT>$b</TT> otherwise</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>! $a</TT></TD>
<TD ALIGN="left">Not</TD>
<TD ALIGN="left">True if <TT>$a</TT> is not true</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>$a and $b</TT></TD>
<TD ALIGN="left">And</TD>
<TD ALIGN="left"><TT>$a</TT> if <TT>$a</TT> is false, <TT>$b</TT> otherwise</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>$a or $b</TT></TD>
<TD ALIGN="left">Or</TD>
<TD ALIGN="left"><TT>$a</TT> if <TT>$a</TT> is true, <TT>$b</TT> otherwise</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>not $a</TT></TD>
<TD ALIGN="left">Not</TD>
<TD ALIGN="left">True if <TT>$a</TT> is not true</TD>
</TR>
</TABLE>
<P>Since the logical operators "short circuit" the way they do,
they're often used to conditionally execute code. The following line (from
our grade example) tries to open the file <I>grades</I>. </P>
<PRE>open(GRADES, "grades") or die "Can't open file grades: $!\n";
</PRE>
<P>If it opens the file, it will jump to the next line of the program.
If it can't open the file, it will provide us with an error message and
then stop execution. </P>
<P>Literally, the above statement means "Open <I>grades</I> or die!"
Besides being another example of natural language, the short-circuit operators
preserve the visual flow. Important actions are listed down the left side
of the screen, and secondary actions are hidden off to the right. (The
<B>$!</B> variable contains the error message returned by the operating
system--see "Special Variables" in <A HREF="ch02_01.htm">Chapter
2, <I>The Gory Details</I></A>). Of course, these logical operators can
also be used within the more traditional kinds of conditional constructs,
such as the <B>if</B> and <B>while</B> statements. </P>
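<P>The sketch below shows both halves of the story: the operators return
one of their operands (not just true or false), and the right operand is
skipped when it isn't needed. The <TT>noisy</TT> subroutine and the variable
names are ours, invented so that you can count how often the right side
actually runs. </P>

```perl
use strict;
use warnings;

my $calls = 0;
sub noisy { $calls++; return "fallback" }   # counts how often it runs

my ($given, $empty, $zero) = ("given", "", 0);

# || returns its left operand if that is true, without touching the right.
my $first = $given || noisy();    # "given"; noisy() is never called

# With a false left operand, the right operand is evaluated and returned.
my $second = $empty || noisy();   # "fallback"; noisy() runs once

# && returns the left operand if it is false, skipping the right.
my $third = $zero && noisy();     # 0; noisy() is skipped again

print "$first $second $third $calls\n";   # prints given fallback 0 1
```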
<H3><A NAME="PERL2-CH-1-SECT-5.6"></A>Comparison Operators</H3>
<P>Comparison, or relational, operators tell us how two scalar values (numbers
or strings) relate to each other. There are two sets of operators--one
does numeric comparison and the other does string comparison. (In either
case, the arguments will be "coerced" to have the appropriate
type first.) <A HREF="ch01_05.htm#PERL2-CH-1-TAB-5">Table 1.5</A> assumes
<TT>$a</TT> and <TT>$b</TT> are the left and right arguments, respectively.
</P>
<TABLE>
<CAPTION>
<P><A NAME="PERL2-CH-1-TAB-5"></A>Table 1.5: Some Numeric and String Comparison
Operators</P>
</CAPTION>
<TR CLASS=row>
<TH ALIGN="left">Comparison</TH>
<TH ALIGN="left">Numeric</TH>
<TH ALIGN="left">String</TH>
<TH ALIGN="left">Return Value</TH>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Equal</TD>
<TD ALIGN="left"><TT>==</TT></TD>
<TD ALIGN="left"><TT>eq</TT></TD>
<TD ALIGN="left">True if <TT>$a</TT> is equal to <TT>$b</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Not equal</TD>
<TD ALIGN="left"><TT>!=</TT></TD>
<TD ALIGN="left"><TT>ne</TT></TD>
<TD ALIGN="left">True if <TT>$a</TT> is not equal to <TT>$b</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Less than</TD>
<TD ALIGN="left"><TT><</TT></TD>
<TD ALIGN="left"><TT>lt</TT></TD>
<TD ALIGN="left">True if <TT>$a</TT> is less than <TT>$b</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Greater than</TD>
<TD ALIGN="left"><TT>></TT></TD>
<TD ALIGN="left"><TT>gt</TT></TD>
<TD ALIGN="left">True if <TT>$a</TT> is greater than <TT>$b</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Less than or equal</TD>
<TD ALIGN="left"><TT><=</TT></TD>
<TD ALIGN="left"><TT>le</TT></TD>
<TD ALIGN="left">True if <TT>$a</TT> is not greater than <TT>$b</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Comparison</TD>
<TD ALIGN="left"><TT><=></TT></TD>
<TD ALIGN="left"><TT>cmp</TT></TD>
<TD ALIGN="left">0 if equal, 1 if <TT>$a</TT> greater, -1 if <TT>$b</TT>
greater</TD>
</TR>
</TABLE>
<P>The last pair of operators (<TT><=></TT> and <TT>cmp</TT>) are
entirely redundant. However, they're incredibly useful in <A HREF="ch03_02.htm#PERL2-CMD-SORT">sort</A>
subroutines (see <A HREF="ch03_01.htm">Chapter 3, <I>Functions</I></A>).[22]
</P>
<BLOCKQUOTE class=footnote>
<P>[22] Some folks feel that such redundancy is evil because it keeps a
language from being minimalistic, or orthogonal. But Perl isn't an orthogonal
language; it's a diagonal language. By which we mean that Perl doesn't
force you to always go at right angles. Sometimes you just want to follow
the hypotenuse of the triangle to get where you're going. TMTOWTDI is about
shortcuts. Shortcuts are about efficiency. </P>
</BLOCKQUOTE>
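<P>A short sketch of why it matters which set of operators you pick. The
data is made up; the two <B>sort</B> blocks differ only in the comparison
operator. </P>

```perl
use strict;
use warnings;

my @values = (10, 9, 100);

# <=> compares as numbers, cmp compares as strings.
my @numeric = sort { $a <=> $b } @values;
my @string  = sort { $a cmp $b } @values;

print "@numeric\n";   # prints 9 10 100
print "@string\n";    # prints 10 100 9

# The same pair of values can be equal one way and unequal the other.
my ($left, $right) = ("1.0", "1");
print "numerically equal\n" if $left == $right;   # both are the number 1
print "different strings\n" if $left ne $right;   # "1.0" is not "1"
```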
<H3><A NAME="PERL2-CH-1-SECT-5.7"></A>File Test Operators</H3>
<P>The file test operators allow you to test whether certain file attributes
are set before you go and blindly muck about with the files. For example,
it would be very nice to know that the file <I>/etc/passwd</I> already
exists before you go and open it as a new file, wiping out everything that
was in there before. See <A HREF="ch01_05.htm#PERL2-CH-1-TAB-6">Table 1.6</A>
for examples of file test operators. </P>
<TABLE>
<CAPTION>
<P><A NAME="PERL2-CH-1-TAB-6"></A>Table 1.6: Some File Test Operators</P>
</CAPTION>
<TR CLASS=row>
<TH ALIGN="left">Example</TH>
<TH ALIGN="left">Name</TH>
<TH ALIGN="left">Result</TH>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>-e $a</TT></TD>
<TD ALIGN="left">Exists</TD>
<TD ALIGN="left">True if file named in <TT>$a</TT> exists</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>-r $a</TT></TD>
<TD ALIGN="left">Readable</TD>
<TD ALIGN="left">True if file named in <TT>$a</TT> is readable</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>-w $a</TT></TD>
<TD ALIGN="left">Writable</TD>
<TD ALIGN="left">True if file named in <TT>$a</TT> is writable</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>-d $a</TT></TD>
<TD ALIGN="left">Directory</TD>
<TD ALIGN="left">True if file named in <TT>$a</TT> is a directory</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>-f $a</TT></TD>
<TD ALIGN="left">File</TD>
<TD ALIGN="left">True if file named in <TT>$a</TT> is a regular file</TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left"><TT>-T $a</TT></TD>
<TD ALIGN="left">Text File</TD>
<TD ALIGN="left">True if file named in <TT>$a</TT> is a text file</TD>
</TR>
</TABLE>
<P>Here are some examples: </P>
<PRE>-e "/usr/bin/perl" or warn "Perl is improperly installed\n";
-f "/vmunix" and print "Congrats, we seem to be running BSD Unix\n";
</PRE>
<P>Note that a regular file is not the same thing as a text file. Binary
files like <I>/vmunix</I> are regular files, but they aren't text files.
Text files are the opposite of binary files, while regular files are the
opposite of irregular files like directories and devices. </P>
<P>There are a lot of file test operators, many of which we didn't list.
Most of the file tests are unary Boolean operators: they take only one
operand, a scalar that evaluates to a file or a filehandle, and they return
either a true or false value. A few of them return something fancier, like
the file's size or age, but you can look those up when you need them. </P>
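<P>Here's a rough sketch that combines a few file tests, including
<TT>-s</TT>, one of the fancier ones that returns a size, and <TT>-z</TT>,
which is true for an empty file. The <TT>describe</TT> subroutine is our own
invention, and we assume the standard <TT>File::Temp</TT> module is available
to create a scratch file to test against. </P>

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Describe a path using a few file test operators.
sub describe {
    my ($path) = @_;
    return "missing"    unless -e $path;
    return "directory"  if -d $path;
    return "empty file" if -f $path and -z $path;
    return "file of " . (-s $path) . " bytes" if -f $path;
    return "something else";
}

# Create a scratch file so the tests have something real to look at.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "hello";
close $fh;

print describe($filename), "\n";          # prints file of 5 bytes
print describe("."), "\n";                # prints directory
print describe("$filename.nope"), "\n";   # prints missing
```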
<P>
<HR align=left width=515></P>
<H2><A NAME="PERL2-CH-1-SECT-6"></A>1.6 Control Structures</H2>
<P><A NAME="CH01.CONTROL"></A>So far, except for our one large example,
all of our examples have been completely linear; we executed each command
in order. We've seen a few examples of using the short circuit operators
to cause a single command to be (or not to be) executed. While you can
write some very useful linear programs (a lot of CGI scripts fall into
this category), you can write much more powerful programs if you have conditional
expressions and looping mechanisms. Collectively, these are known as control
structures. So you can also think of Perl as a control language. </P>
<P>But to have control, you have to be able to decide things, and to decide
things, you have to know the difference between what's true and what's
false. </P>
<H3><A NAME="PERL2-CH-1-SECT-6.1"></A>What Is Truth?</H3>
<P>We've bandied about the term truth,[23] and we've mentioned that certain
operators return a true or a false value. Before we go any further, we
really ought to explain exactly what we mean by that. Perl treats truth
a little differently than most computer languages, but after you've worked
with it awhile it will make a lot of sense. (Actually, we're hoping it'll
make a lot of sense after you've read the following.) </P>
<BLOCKQUOTE class=footnote>
<P>[23] Strictly speaking, this is not true. </P>
</BLOCKQUOTE>
<P>Basically, Perl holds truths to be self-evident. That's a glib way of
saying that you can evaluate almost anything for its truth value. Perl
uses practical definitions of truth that depend on the type of thing you're
evaluating. As it happens, there are many more kinds of truth than there
are of nontruth. </P>
<P>Truth in Perl is always evaluated in a scalar context. (Other than that,
no type coercion is done.) So here are the rules for the various kinds
of values that a scalar can hold: </P>
<OL CLASS=orderedlist>
<LI>Any string is true except for <TT>""</TT> and <TT>"0"</TT>.
</LI>
<LI>Any number is true except for 0. </LI>
<LI>Any reference is true. </LI>
<LI>Any undefined value is false. </LI>
</OL>
<P>Actually, the last two rules can be derived from the first two. Any
reference (rule 3) points to something with an address, and would evaluate
to a number or string containing that address, which is never 0. And any
undefined value (rule 4) would always evaluate to 0 or the null string.
</P>
<P>And in a way, you can derive rule 2 from rule 1 if you pretend that
everything is a string. Again, no coercion is actually done to evaluate
truth, but if a coercion to string <I>were</I> done, then any numeric value
of 0 would simply turn into the string <TT>"0"</TT>, and be false.
Any other number would not turn into the string <TT>"0"</TT>,
and so would be true. Let's look at some examples so we can understand
this better: </P>
<PRE>0 # would become the string "0", so false
1 # would become the string "1", so true
10 - 10 # 10-10 is 0, would convert to string "0", so false
0.00 # becomes 0, would convert to string "0", so false
"0" # the string "0", so false
"" # a null string, so false
"0.00" # the string "0.00", neither empty nor exactly "0", so true
"0.00" + 0 # the number 0 (coerced by the +), so false.
\$a # a reference to $a, so true, even if $a is false
undef() # a function returning the undefined value, so false
</PRE>
<P>Since we mumbled something earlier about truth being evaluated in a
scalar context, you might be wondering what the truth value of a list is.
Well, the simple fact is, there <I>is</I> no operation in Perl that will
return a list in a scalar context. They all return a scalar value instead,
and then you apply the rules of truth to that scalar. So there's no problem,
as long as you can figure out what any given operator will return in a
scalar context. </P>
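<P>If you'd like to try the rules yourself, here is a minimal sketch. The
<TT>truth</TT> subroutine is a hypothetical helper of ours that simply
reports how its argument fares in a conditional. </P>

```perl
use strict;
use warnings;

# Report how Perl judges a value in a conditional.
sub truth { return $_[0] ? "true" : "false" }

my $zero  = 0;
my @empty = ();
my @three = (7, 8, 9);

print truth(0), "\n";            # false
print truth("0"), "\n";          # false
print truth(""), "\n";           # false
print truth("0.00"), "\n";       # true: a string that is neither "" nor "0"
print truth("0.00" + 0), "\n";   # false: the + coerces it to the number 0
print truth(\$zero), "\n";       # true: a reference, even to a false value
print truth(undef), "\n";        # false

# An array in scalar context yields its element count.
print truth(scalar @empty), "\n";   # false: the count is 0
print truth(scalar @three), "\n";   # true: the count is 3
```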
<H4><A NAME="PERL2-CH-1-SECT-6.1.1"></A>The if and unless statements</H4>
<P>We saw earlier how a logical operator could function as a conditional.
A slightly more complex form of the logical operators is the <B>if</B> statement.
The <B>if</B> statement evaluates a truth condition, and executes a block
if the condition is true. </P>
<P>A block is one or more statements grouped together by a set of braces.
Since the <B>if</B> statement executes a block, the braces are required
by definition. If you know a language like C, you'll notice that this is
different. Braces are optional in C if you only have a single line of code,
but they are not optional in Perl. </P>
<PRE>if ($debug_level > 0) {
    # Something has gone wrong. Tell the user.
    print "Debug: Danger, Will Robinson, danger!\n";
    print "Debug: Answer was '54', expected '42'.\n";
}
</PRE>
<P>Sometimes, just executing a block when a condition is met isn't enough.
You may also want to execute a different block if that condition <I>isn't</I>
met. While you could certainly use two <B>if</B> statements, one the negation
of the other, Perl provides a more elegant solution. After the block, <B>if</B>
can take an optional second block, introduced by <B>else</B>, to be executed
only if the truth condition is false. (Veteran computer programmers will
not be surprised at this point.) </P>
<P>Other times, you may even have more than two possible choices. In this
case, you'll want to add an <B>elsif</B> truth condition for the other
possible choices. (Veteran computer programmers may well be surprised by
the spelling of "elsif", for which nobody here is going to apologize.
Sorry.) </P>
<PRE>if ($city eq "New York") {
    print "New York is northeast of Washington, D.C.\n";
}
elsif ($city eq "Chicago") {
    print "Chicago is northwest of Washington, D.C.\n";
}
elsif ($city eq "Miami") {
    print "Miami is south of Washington, D.C. And much warmer!\n";
}
else {
    print "I don't know where $city is, sorry.\n";
}
</PRE>
<P>The <B>if</B> and <B>elsif</B> clauses are each computed in turn, until
one is found to be true or the <B>else</B> condition is reached. When one
of the conditions is found to be true, its block is executed and all the
remaining branches are skipped. Sometimes, you don't want to do anything
if the condition is true, only if it is false. Using an empty <B>if</B>
with an <B>else</B> may be messy, and a negated <B>if</B> may be illegible;
it sounds weird to say "do something if not this is true". In
these situations, you would use the <B>unless</B> statement. </P>
<PRE>unless ($destination eq $home) {
    print "I'm not going home.\n";
}
</PRE>
<P>There is no "elsunless" though. This is generally construed
as a feature. </P>
<H3><A NAME="PERL2-CH-1-SECT-6.2"></A>Iterative (Looping) Constructs</H3>
<P>Perl has four main iterative statement types: <B>while</B>, <B>until</B>,
<B>for</B>, and <B>foreach</B>. These statements allow a Perl program to
repeatedly execute the same code for different values. </P>
<H4><A NAME="PERL2-CH-1-SECT-6.2.1"></A>The while and until statements</H4>
<P>The <B>while</B> and <B>until</B> statements function similarly to the
<B>if</B> and <B>unless</B> statements, in a looping fashion. First, the
conditional part of the statement is checked. If the condition is met (if
it is true for a <B>while</B> or false for an <B>until</B>) the block of
the statement is executed. </P>
<PRE>while ($tickets_sold < 10000) {
    $available = 10000 - $tickets_sold;
    print "$available tickets are available. How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
    $tickets_sold += $purchase;
}
</PRE>
<P>Note that if the original condition is never met, the loop will never
be entered at all. For example, if we've already sold 10,000 tickets, we
might want to have the next line of the program say something like: </P>
<PRE>print "This show is sold out, please come back later.\n";
</PRE>
<P>In our grade example earlier, line 4 reads: </P>
<PRE>while ($line = <GRADES>) {
</PRE>
<P>This assigns the next line to the variable <TT>$line</TT>, and as we
explained earlier, returns the value of <TT>$line</TT> so that the condition
of the <B>while</B> statement can evaluate <TT>$line</TT> for truth. You
might wonder whether Perl will get a false negative on blank lines and
exit the loop prematurely. The answer is that it won't. The reason is clear,
if you think about everything we've said. The line input operator leaves
the newline on the end of the string, so a blank line has the value <TT>"\n"</TT>.
And you know that <TT>"\n"</TT> is not one of the canonical false
values. So the condition is true, and the loop continues even on blank
lines. </P>
<P>On the other hand, when we finally do reach the end of the file, the
line input operator returns the undefined value, which always evaluates
to false. And the loop terminates, just when we wanted it to. There's no
need for an explicit test against the <A HREF="ch03_02.htm#PERL2-CMD-EOF">eof</A>
function in Perl, because the input operators are designed to work smoothly
in a conditional context. </P>
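<P>You can convince yourself of this with a small sketch. So that it runs
without a data file, the input here comes from a string read through an
in-memory filehandle, which assumes Perl 5.8 or later; the variable names
are ours. </P>

```perl
use strict;
use warnings;

# Three lines of input, the middle one blank.
my $text = "first\n\nlast\n";
open(my $in, "<", \$text) or die "Can't open in-memory file: $!\n";

my ($lines, $blank) = (0, 0);
while (my $line = <$in>) {
    $lines++;
    $blank++ if $line eq "\n";   # a blank line is "\n", which is true
}
close $in;

# The loop ended only when the input operator returned undef at end of file.
print "$lines lines, $blank blank\n";   # prints 3 lines, 1 blank
```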
<P>In fact, almost everything is designed to work smoothly in a conditional
context. For instance, an array in a scalar context returns its length.
So you often see: </P>
<PRE>while (@ARGV) {
    process(shift @ARGV);
}
</PRE>
<P>The loop automatically exits when <TT>@ARGV</TT> is exhausted. </P>
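<P>Here is the same idiom as a self-contained sketch. We use an ordinary
array as a stand-in for <TT>@ARGV</TT>, and uppercasing as a placeholder for
whatever real processing you have in mind. </P>

```perl
use strict;
use warnings;

# Stand-in for @ARGV so the sketch runs without command-line arguments.
my @args = ("alpha", "beta", "gamma");
my @seen;

while (@args) {                 # true while at least one element remains
    my $arg = shift @args;      # remove the first element
    push @seen, uc $arg;        # placeholder for real processing
}

print "@seen\n";                # prints ALPHA BETA GAMMA
print scalar(@args), "\n";      # prints 0
```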
<H4><A NAME="PERL2-CH-1-SECT-6.2.2"></A>The for statement</H4>
<P>Another iterative statement is the <B>for</B> loop. A <B>for</B> loop
runs exactly like the <B>while</B> loop, but looks a good deal different.
(C programmers will find it very familiar though.) </P>
<PRE>for ($sold = 0; $sold < 10000; $sold += $purchase) {
    $available = 10000 - $sold;
    print "$available tickets are available. How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
}
</PRE>
<P>The <B>for</B> loop takes three expressions within the loop's parentheses:
an expression to set the initial state of the loop variable, a condition
to test the loop variable, and an expression to modify the state of the
loop variable. When the loop starts, the initial state is set and the truth
condition is checked. If the condition is true, the block is executed.
When the block finishes, the modification expression is executed, the truth
condition is again checked, and if true, the block is rerun with the new
values. As long as the truth condition remains true, the block and the
modification expression will continue to be executed. </P>
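<P>To make the correspondence concrete, the following sketch computes the
same sum twice, once with <B>for</B> and once with the equivalent
<B>while</B>. The variable names are purely illustrative. </P>

```perl
use strict;
use warnings;

# Sum the integers 1 through 5 with a for loop.
my $for_total = 0;
for (my $i = 1; $i <= 5; $i++) {
    $for_total += $i;
}

# The same computation spelled out as a while loop.
my $while_total = 0;
my $j = 1;                      # initial state
while ($j <= 5) {               # truth condition
    $while_total += $j;
    $j++;                       # modification expression
}

print "$for_total $while_total\n";   # prints 15 15
```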
<H4><A NAME="PERL2-CH-1-SECT-6.2.3"></A>The foreach statement</H4>
<P>The last of Perl's main iterative statements is the <B>foreach</B> statement.
<B>foreach</B> is used to execute the same code for each of a known set
of scalars, such as an array: </P>
<PRE>foreach $user (@users) {
    if (-f "$home{$user}/.nexrc") {
        print "$user is cool... they use a perl-aware vi!\n";
    }
}
</PRE>
<P>In a <B>foreach</B> statement, the expression in parentheses is evaluated
to produce a list. Then each element of the list is aliased to the loop
variable in turn, and the block of code is executed once for each element.
Note that the loop variable becomes an alias for the element itself,
rather than a copy of the element. Hence, modifying the loop variable will
modify the original array. </P>
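<P>A brief sketch of that last point, using made-up arrays. The first loop
changes the array through the loop variable; the second works on a copy of
each element and leaves the array alone. </P>

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);

# $price stands for each element in turn, so the assignment
# changes @prices itself.
foreach my $price (@prices) {
    $price *= 2;
}
print "@prices\n";      # prints 20 40 60

# To leave the array alone, work on a copy of each element.
my @originals = (1, 2, 3);
foreach my $value (@originals) {
    my $copy = $value;
    $copy *= 2;
}
print "@originals\n";   # prints 1 2 3
```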
<P>You find many more <B>foreach</B> loops in the typical Perl program
than <B>for</B> loops, because it's very easy in Perl to generate the lists
that <B>foreach</B> wants to iterate over. A frequently seen idiom is a
loop to iterate over the sorted keys of a hash: </P>
<PRE>foreach $key (sort keys %hash) {
</PRE>
<P>In fact, line 9 of our grade example does precisely that. </P>
<H4><A NAME="PERL2-CH-1-SECT-6.2.4"></A>Breaking out: next and last</H4>
<P>The <A HREF="ch03_02.htm#PERL2-CMD-NEXT">next</A> and <A HREF="ch03_02.htm#PERL2-CMD-LAST">last</A>
operators allow you to modify the flow of your loop. It is not at all uncommon
to have a special case; you may want to skip it, or you may want to quit
when you encounter it. For example, if you are dealing with UNIX accounts,
you may want to skip the system accounts (like root or lp). The <A HREF="ch03_02.htm#PERL2-CMD-NEXT">next</A>
operator would allow you to skip to the end of your current loop iteration,
and start the next iteration. The <A HREF="ch03_02.htm#PERL2-CMD-LAST">last</A>
operator would allow you to skip to the end of your block, as if your test
condition had returned false. This might be useful if, for example, you
are looking for a specific account and want to quit as soon as you find
it. </P>
<PRE>foreach $user (@users) {
    if ($user eq "root" or $user eq "lp") {
        next;
    }
    if ($user eq "special") {
        print "Found the special account.\n";
        # do some processing
        last;
    }
}
</PRE>
<P>It's possible to break out of multi-level loops by labeling your loops
and specifying which loop you want to break out of. Together with statement
modifiers (another form of conditional we haven't talked about), this can
make for very readable loop exits, if you happen to think English is readable:
</P>
<PRE>LINE: while ($line = <ARTICLE>) {
    last LINE if $line eq "\n"; # stop on first blank line
    next LINE if $line =~ /^#/; # skip comment lines
    # your ad here
}
</PRE>
<P>You may be saying, "Wait a minute, what's that funny <TT>^#</TT>
thing there inside the leaning toothpicks? That doesn't look much like
English." And you're right. That's a pattern match containing a regular
expression (albeit a rather simple one). And that's what the next section
is about. Perl is above all a text processing language, and regular expressions
are at the heart of Perl's text processing. </P>
<P>
<HR align=left width=515></P>
<H2><A NAME="PERL2-CH-1-SECT-7"></A>1.7 Regular Expressions</H2>
<P><A NAME="CH01.RE"></A><I>Regular expressions</I> (aka regexps, regexes
or REs) are used by many UNIX programs, such as <I>grep</I>, <I>sed</I>
and <I>awk</I>,[24] editors like <I>vi</I> and <I>emacs</I>, and even some
of the shells. A regular expression is a way of describing a set of strings
without having to list all the strings in your set. </P>
<BLOCKQUOTE class=footnote>
<P>[24] A good source of information on regular expression concepts is
the Nutshell Handbook <I>sed & awk</I> by Dale Dougherty (O'Reilly
& Associates). You might also keep an eye out for Jeffrey Friedl's
forthcoming book, <I>Mastering Regular Expressions</I> (O'Reilly &
Associates). </P>
</BLOCKQUOTE>
<P>Regular expressions are used several ways in Perl. First and foremost,
they're used in conditionals to determine whether a string matches a particular
pattern. So when you see something that looks like <TT>/foo/</TT>, you
know you're looking at an ordinary <I>pattern-matching</I> operator. </P>
<P>Second, if you can locate patterns within a string, you can replace
them with something else. So when you see something that looks like <TT>s/foo/bar/</TT>,
you know it's asking Perl to substitute "bar" for "foo",
if possible. We call that the <I>substitution</I> operator. </P>
<P>Finally, patterns can specify not only where something is, but also
where it isn't. So the <A HREF="ch03_02.htm#PERL2-CMD-SPLIT">split</A>
operator uses a regular expression to specify where the data isn't. That
is, the regular expression defines the <I>delimiters</I> that separate
the fields of data. Our grade example has a couple of trivial examples
of this. Lines 5 and 12 each split strings on the space character in order
to return a list of words. But you can split on any delimiter you can specify
with a regular expression. </P>
<P>(There are various modifiers you can use in each of these situations
to do exotic things like ignore case when matching alphabetic characters,
but these are the sorts of gory details that we'll cover in <A HREF="ch02_01.htm">Chapter
2, <I>The Gory Details</I></A>.) </P>
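<P>Before we go on, here is a rough sketch of all three uses on one string.
The record is made-up data in the spirit of the grade example, and the
<TT>/i</TT> on the first pattern is the ignore-case modifier just
mentioned. </P>

```perl
use strict;
use warnings;

my $record = "Noel  25,Ben 76";

# 1. Pattern match: does the string contain "ben", ignoring case?
my $has_ben = $record =~ /ben/i ? "yes" : "no";

# 2. Substitution: copy the record, then replace "Noel" in the copy.
(my $renamed = $record) =~ s/Noel/Leon/;

# 3. Split: commas delimit records, runs of spaces delimit fields.
my @people = split /,/, $record;
my @fields = split / +/, $people[0];

print "$has_ben\n";                       # prints yes
print "$renamed\n";                       # the copy now starts with Leon
print scalar(@people), " people\n";       # prints 2 people
print "$fields[0] scored $fields[1]\n";   # prints Noel scored 25
```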
<P>The simplest use of regular expressions is to match a literal expression.
In the case of the splits we just mentioned, we matched on a single space.
But if you match on several characters in a row, they all have to match
sequentially. That is, the pattern looks for a substring, much as you'd
expect. Let's say we want to show all the lines of an HTML file that are
links to other HTML files (as opposed to FTP links). Let's imagine we're
working with HTML for the first time, and we're being a little naive yet.
We know that these links will always have "http:" in them somewhere.
We could loop through our file with this:[25] </P>
<BLOCKQUOTE class=footnote>
<P>[25] This is very similar to what the UNIX command <TT>grep 'http:'
file</TT> would do. On MS-DOS you could use the <I>find</I> command, but
it doesn't know how to do more complicated regular expressions. (However,
the misnamed <I>findstr</I> program of Windows NT does know about regular
expressions.) </P>
</BLOCKQUOTE>
<PRE>while ($line = <FILE>) {
    if ($line =~ /http:/) {
        print $line;
    }
}
</PRE>
<P>Here, the <TT>=~</TT> (pattern binding operator) is telling Perl to
look for a match of the regular expression <TT>http:</TT> in the variable
<TT>$line</TT>. If it finds the expression, the operator returns a true
value and the block (a <B>print</B> command) is executed. By the way, if
you don't use the <TT>=~</TT> binding operator, then Perl will search a
default variable instead of <TT>$line</TT>. This default space is really
just a special variable that goes by the odd name of <B>$_</B>. In fact,
many of the operators in Perl default to using the <B>$_</B> variable,
so an expert Perl programmer might write the above as: </P>
<PRE>while (<FILE>) {
    print if /http:/;
}
</PRE>
<P>(Hmm, another one of those statement modifiers seems to have snuck in
there. Insidious little beasties.) </P>
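<P>To see that the two forms really do the same thing, here is a small sketch
that runs both over the same data. We assume a hypothetical array of lines
standing in for the contents of <TT>FILE</TT>, and the array names are ours:
</P>
<PRE>use strict;
use warnings;

# Hypothetical lines standing in for what would be read from FILE.
my @lines = (
    "see http://www.perl.com/ for details\n",
    "no link on this line\n",
);

my @explicit;
for my $line (@lines) {
    push @explicit, $line if $line =~ /http:/;   # explicit binding with =~
}

my @implicit;
for (@lines) {
    push @implicit, $_ if /http:/;               # no =~, so $_ is searched
}

print "explicit: @explicit";
print "implicit: @implicit";
</PRE>
<P>Both <B>print</B> statements should show the same single line, the one
containing <TT>http:</TT>. </P>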
<P>This stuff is pretty handy, but what if we wanted to find all the links,
not just the HTTP links? We could give a list of links, like "<TT>http:</TT>",
"<TT>ftp:</TT>", "<TT>mailto:</TT>", and so on. But
that list could get long, and what would we do when a new kind of link
was added? </P>
<PRE>while (<FILE>) {
    print if /http:/;
    print if /ftp:/;
    print if /mailto:/;
    # What next?
}
</PRE>
<P>Since regular expressions are descriptive of a set of strings, we can
just describe what we are looking for: a number of alphabetic characters
followed by a colon. In regular expression talk (Regexpese?), that would
be <TT>/[a-zA-Z]+:/</TT>, where the brackets define a <I>character class</I>.
The <TT>a-z</TT> and <TT>A-Z</TT> represent all alphabetic characters (the
dash means the range of all characters between the starting and ending
character, inclusive). And the <TT>+</TT> is a special character which
says "one or more of whatever was before me". It's what we call
a <I>quantifier</I>, meaning a gizmo that says how many times something
is allowed to repeat. (The slashes aren't really part of the regular expression,
but rather part of the pattern match operator. The slashes are acting like
quotes that just happen to contain a regular expression.) </P>
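<P>A quick sketch of that pattern at work follows. The sample lines are
hypothetical; only the first three contain letters immediately followed by
a colon. </P>
<PRE>use strict;
use warnings;

# Hypothetical sample lines.
my @lines = (
    'http://www.perl.com/',
    'ftp://ftp.example.com/pub/',
    'news:comp.lang.perl.misc',
    '12345: digits before the colon',
    'no colon at all',
);

# Keep every line that has one or more letters followed by a colon.
my @matched;
for (@lines) {
    push @matched, $_ if /[a-zA-Z]+:/;
}

print "$_\n" for @matched;    # prints the http, ftp, and news lines
</PRE>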
<P>Because certain classes like the alphabetics are so commonly used, Perl
defines special cases for them. See <A HREF="ch01_07.htm#PERL2-CH-1-TAB-7">Table
1.7</A> for these special cases. </P>
<TABLE>
<CAPTION>
<P><A NAME="PERL2-CH-1-TAB-7"></A>Table 1.7: Regular Expression Character
Classes</P>
</CAPTION>
<TR CLASS=row>
<TH ALIGN="left">Name</TH>
<TH ALIGN="left">Definition</TH>
<TH ALIGN="left">Code</TH>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Whitespace</TD>
<TD ALIGN="left"><TT>[ \t\n\r\f]</TT></TD>
<TD ALIGN="left"><TT>\s</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Word character</TD>
<TD ALIGN="left"><TT>[a-zA-Z_0-9]</TT></TD>
<TD ALIGN="left"><TT>\w</TT></TD>
</TR>
<TR CLASS=row>
<TD ALIGN="left">Digit</TD>
<TD ALIGN="left"><TT>[0-9]</TT></TD>
<TD ALIGN="left"><TT>\d</TT></TD>
</TR>
</TABLE>
<P>Note that these match <I>single</I> characters. A <TT>\w</TT> will match
any single word character, not an entire word. (Remember that <TT>+</TT>
quantifier? You can say <TT>\w+</TT> to match a word.) Perl also provides
the negation of these classes by using the uppercased character, such as
<TT>\D</TT> for a non-digit character. </P>
<P>(We should note that <TT>\w</TT> is not always equivalent to <TT>[a-zA-Z_0-9]</TT>.
Some locales define additional alphabetic characters outside the ASCII
sequence, and <TT>\w</TT> respects them.) </P>
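<P>Here is a small sketch that reports which of these classes find at least
one character in a few strings. The sample strings are made up, and we assume
plain ASCII data so that the locale caveat above doesn't come into play. </P>
<PRE>use strict;
use warnings;

my @report;
for my $s ("camel42", "two humps", "3.14") {
    my @found;
    push @found, "digit"      if $s =~ /\d/;
    push @found, "whitespace" if $s =~ /\s/;
    push @found, "non-digit"  if $s =~ /\D/;
    push @found, "non-word"   if $s =~ /\W/;
    push @report, "'$s': @found";
}

print "$_\n" for @report;
# 'camel42': digit non-digit
# 'two humps': whitespace non-digit non-word
# '3.14': digit non-digit non-word
</PRE>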
<P>There is one other very special character class, written with a "<TT>.</TT>",
that will match any character whatsoever.[26] For example, <TT>/a./</TT>
will match any string containing an "<TT>a</TT>" that is not
the last character in the string. Thus it will match "<TT>at</TT>"
or "<TT>am</TT>" or even "<TT>a+</TT>", but not "<TT>a</TT>",
since there's nothing after the "<TT>a</TT>" for the dot to match.
Since it's searching for the pattern anywhere in the string, it'll match
"<TT>oasis</TT>" and "<TT>camel</TT>", but not "<TT>sheba</TT>".
It matches "<TT>caravan</TT>" on the first "<TT>a</TT>".
It could match on the second "<TT>a</TT>", but it stops after
it finds the first suitable match, searching from left to right. </P>
<BLOCKQUOTE class=footnote>
<P>[26] Except that it won't normally match a newline. When you think about
it, a "<TT>.</TT>" doesn't normally match a newline in <I>grep</I>(1)
either. </P>
</BLOCKQUOTE>
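<P>If you'd like to check those claims yourself, a sketch along these lines
should do. It tries <TT>/a./</TT> against the same words used above; the array
names are ours. </P>
<PRE>use strict;
use warnings;

my (@matched, @unmatched);
for my $word (qw(at am a+ a oasis camel sheba caravan)) {
    if ($word =~ /a./) {
        push @matched, $word;      # an "a" with some character after it
    }
    else {
        push @unmatched, $word;    # no "a", or "a" only at the very end
    }
}

print "matched:   @matched\n";     # at am a+ oasis camel caravan
print "unmatched: @unmatched\n";   # a sheba
</PRE>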
<H3><A NAME="PERL2-CH-1-SECT-7.1"></A>Quantifiers</H3>
<P>The characters and character classes we've talked about all match single
characters. We mentioned that you could match multiple "word"
characters with <TT>\w+</TT> in order to match an entire word. The <TT>+</TT>
is one kind of quantifier, but there are others. (All of them are placed
after the item being quantified.) </P>
<P>The most general form of quantifier specifies both the minimum and maximum
number of times an item can match. You put the two numbers in braces, separated
by a comma. For example, if you were trying to match North American phone
numbers, <TT>/\d{7,11}/</TT> would match at least seven digits, but no
more than eleven digits. If you put a single number in the braces, the
number specifies both the minimum and the maximum; that is, the number
specifies the exact number of times the item can match. (If you think about
it, all unquantified items have an implicit <TT>{1}</TT> quantifier.) </P>
<P>If you put the minimum and the comma but omit the maximum, then the
maximum is taken to be infinity. In other words, it will match at least
the minimum number of times, plus as many as it can get after that. For
example, <TT>/\d{7}/</TT> will only match a local (North American) phone
number (seven digits), while <TT>/\d{7,}/</TT> will match any phone number,
even an international one (unless it happens to be shorter than seven digits).
There is no special way of saying "at most" a certain number
of times. Just say <TT>/.{0,5}/</TT>, for example, to find at most five
arbitrary characters. </P>
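<P>One way to watch these braces at work is to delete whatever each pattern
matches and count how many digits went missing. This is only a sketch: the two
digit strings are invented, and the counting trick relies on the substitution
operator mentioned at the start of this section. </P>
<PRE>use strict;
use warnings;

my @report;
for my $digits ("12345", "441632960123") {
    for my $pattern ('\d{7}', '\d{7,11}', '\d{7,}') {
        my $rest = $digits;
        $rest =~ s/$pattern//;    # delete whatever the pattern matched
        my $count = length($digits) - length($rest);
        push @report, "/$pattern/ consumed $count of "
                    . length($digits) . " digits";
    }
}

print "$_\n" for @report;
# /\d{7}/ consumed 0 of 5 digits
# /\d{7,11}/ consumed 0 of 5 digits
# /\d{7,}/ consumed 0 of 5 digits
# /\d{7}/ consumed 7 of 12 digits
# /\d{7,11}/ consumed 11 of 12 digits
# /\d{7,}/ consumed 12 of 12 digits
</PRE>
<P>Notice that <TT>/\d{7}/</TT> still succeeds on the twelve-digit string; it
simply stops after seven digits. Restricting a match to exactly seven digits
and nothing more takes the anchors described later in this section. </P>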
<P>Certain combinations of minimum and maximum occur frequently, so Perl
defines special quantifiers for them. We've already seen <TT>+</TT>, which
is the same as <TT>{1,}</TT>, or "at least one of the preceding item".
There is also <TT>*</TT>, which is the same as <TT>{0,}</TT>, or "zero
or more of the preceding item", and <TT>?</TT>, which is the same
as <TT>{0,1}</TT>, or "zero or one of the preceding item" (that
is, the preceding item is optional). </P>
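<P>The three shorthand quantifiers can be compared side by side with a sketch
like the following, which asks how many "<TT>b</TT>" characters may sit between
an "<TT>a</TT>" and a "<TT>c</TT>". The test strings are our own. </P>
<PRE>use strict;
use warnings;

my @report;
for my $s (qw(ac abc abbbc)) {
    my @hits;
    push @hits, 'b*' if $s =~ /ab*c/;    # zero or more b's
    push @hits, 'b+' if $s =~ /ab+c/;    # one or more b's
    push @hits, 'b?' if $s =~ /ab?c/;    # zero or one b
    push @report, "$s: @hits";
}

print "$_\n" for @report;
# ac: b* b?
# abc: b* b+ b?
# abbbc: b* b+
</PRE>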
<P>There are a couple of things about quantification that you need to be careful
of. First of all, Perl quantifiers are by default <I>greedy</I>. This means
that they will attempt to match as much as they can as long as the entire
expression still matches. For example, if you are matching <TT>/\d+/</TT>
against "<TT>1234567890</TT>", it will match the entire string.
This is something to especially watch out for when you are using "<TT>.</TT>",
any character. Often, someone will have a string like: </P>
<PRE>spp:Fe+H20=FeO2;H:2112:100:Stephen P Potter:/home/spp:/bin/tcsh
</PRE>
<P>and try to match "<TT>spp</TT>" with <TT>/.+:/</TT>. However,
since the <TT>+</TT> quantifier is greedy, this pattern will match everything
up to and including "<TT>/home/spp</TT>". Sometimes you can avoid
this by using a negated character class, that is, by saying <TT>/[^:]+:/</TT>,
which says to match one or more non-colon characters (as many as possible),
up to the first colon. It's that little caret in there that negates the
sense of the character class.[27] </P>
<P>The other point to be careful about is
that regular expressions will try to match as <I>early</I> as possible.
This even takes precedence over being greedy. Since scanning happens left-to-right,
this means that the pattern will match as far left as possible, even if
there is some other place where it could match longer. (Regular expressions
are greedy, but they aren't into delayed gratification.) For example, suppose
you're using the substitution command (<TT>s///</TT>) on the default variable
space (variable <B>$_</B>, that is), and you want to remove a string of
x's from the middle of the string. If you say: </P>
<BLOCKQUOTE class=footnote>
<P>[27] Sorry, we didn't pick that notation, so don't blame us. That's
just how regular expressions are customarily written in UNIX culture. </P>
</BLOCKQUOTE>
<PRE>$_ = "fred xxxxxxx barney";
s/x*//;
</PRE>
<P>it will have absolutely no effect. This is because the <TT>x*</TT> (meaning
zero or more "<TT>x</TT>" characters) will be able to match the
"nothing" at the beginning of the string, since the null string
happens to be zero characters wide and there's a null string just sitting
there plain as day before the "<TT>f</TT>" of "<TT>fred</TT>".[28]
</P>
<BLOCKQUOTE class=footnote>
<P>[28] Even the authors get caught by this from time to time. </P>
</BLOCKQUOTE>
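<P>One possible way out, offered here as a sketch rather than as the only cure,
is to insist on at least one "<TT>x</TT>" by using <TT>+</TT> instead of <TT>*</TT>.
The pattern then can't settle for the null string at the front. The variable
names are ours: </P>
<PRE>use strict;
use warnings;

my $zero_or_more = "fred xxxxxxx barney";
$zero_or_more =~ s/x*//;    # matches the null string before the "f"

my $one_or_more = "fred xxxxxxx barney";
$one_or_more =~ s/x+//;     # must find at least one "x" first

print "[$zero_or_more]\n";  # [fred xxxxxxx barney]
print "[$one_or_more]\n";   # [fred  barney]
</PRE>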
<P>There's one other thing you need to know. By default quantifiers apply
to a single preceding character, so <TT>/bam{2}/</TT> matches "<TT>bamm</TT>"
but not "<TT>bambam</TT>". To apply a quantifier to more than
one character, use parentheses. So to match "<TT>bambam</TT>",
use the pattern <TT>/(bam){2}/</TT>. </P>
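<P>A short sketch confirms the difference between the two patterns on exactly
those two strings: </P>
<PRE>use strict;
use warnings;

my @report;
for my $s (qw(bamm bambam)) {
    my $single  = $s =~ /bam{2}/   ? "yes" : "no";   # {2} applies to "m" only
    my $grouped = $s =~ /(bam){2}/ ? "yes" : "no";   # {2} applies to "bam"
    push @report, "$s: single=$single grouped=$grouped";
}

print "$_\n" for @report;
# bamm: single=yes grouped=no
# bambam: single=no grouped=yes
</PRE>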
<H3><A NAME="PERL2-CH-1-SECT-7.2"></A>Minimal Matching</H3>
<P>If you were using an ancient version of Perl and you didn't want greedy
matching, you had to use a negated character class. (And really, you were
still getting greedy matching of a constrained variety.) </P>
<P>In modern versions of Perl, you can force nongreedy, minimal matching
by use of a question mark after any quantifier. Our same username match
would now be <TT>/.*?:/</TT>. That <TT>.*?</TT> will now try to match as
few characters as possible, rather than as many as possible, so it stops
at the first colon rather than the last. </P>
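<P>To compare the three approaches on the sample line from earlier, the sketch
below deletes whatever each pattern matches and prints what is left over. The
variable names are ours. </P>
<PRE>use strict;
use warnings;

my $line = "spp:Fe+H20=FeO2;H:2112:100:Stephen P Potter:/home/spp:/bin/tcsh";

# Three copies of the line, one per pattern.
my ($greedy, $negated, $minimal) = ($line) x 3;

$greedy  =~ s/.+://;       # greedy: deletes through the last colon
$negated =~ s/[^:]+://;    # negated class: deletes through the first colon
$minimal =~ s/.*?://;      # minimal: deletes through the first colon

print "$greedy\n";         # /bin/tcsh
print "$negated\n";        # everything after "spp:"
print "$minimal\n";        # same as the line above
</PRE>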
<H3><A NAME="PERL2-CH-1-SECT-7.3"></A>Nailing Things Down</H3>
<P>Whenever you try to match a pattern, it's going to try to match in every
location till it finds a match. An <I>anchor</I> allows you to restrict
where the pattern can match. Essentially, an anchor is something that matches
a "nothing", but a special kind of nothing that depends on its
surroundings. You could also call it a rule, or a constraint, or an assertion.
Whatever you care to call it, it tries to match something of zero width,
and either succeeds or fails. (If it fails, it merely means that the pattern
can't match that particular way. The pattern will go on trying to match
some other way, if there are any other ways to try.) </P>
<P>The special character string <TT>\b</TT> matches at a word boundary,
which is defined as the "nothing" between a word character (<TT>\w</TT>)
and a non-word character (<TT>\W</TT>), in either order. (The characters
that don't exist off the beginning and end of your string are considered
to be non-word characters.) For example, </P>
<PRE>/\bFred\b/
</PRE>
<P>would match both "<TT>The Great Fred</TT>" and "<TT>Fred
the Great</TT>", but would not match "<TT>Frederick the Great</TT>"
because the "<TT>de</TT>" in "<TT>Frederick</TT>" does
not contain a word boundary. </P>
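<P>Here is that claim as a runnable sketch, using the same three strings: </P>
<PRE>use strict;
use warnings;

my @report;
for my $s ("The Great Fred", "Fred the Great", "Frederick the Great") {
    my $verdict = $s =~ /\bFred\b/ ? "matches" : "does not match";
    push @report, "$s: $verdict";
}

print "$_\n" for @report;
# The Great Fred: matches
# Fred the Great: matches
# Frederick the Great: does not match
</PRE>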
<P>In a similar vein, there are also anchors for the beginning of the string
and the end of the string. If it is the first character of a pattern, the
caret (<TT>^</TT>) matches the "nothing" at the beginning of
the string. Therefore, the pattern <TT>/^Fred/</TT> would match "Frederick
the Great" and not "The Great Fred", whereas <TT>/Fred^/</TT>
wouldn't match either. (In fact, it doesn't even make much sense.) The
dollar sign (<TT>$</TT>) works like the caret, except that it matches the
"nothing" at the end of the string instead of the beginning.[29]
</P>
<BLOCKQUOTE class=footnote>
<P>[29] This is a bit oversimplified, since we're assuming here that your
string contains only one line. <TT>^</TT> and <TT>$</TT> are actually anchors
for the beginnings and endings of lines rather than strings. We'll try
to straighten this all out in <A HREF="ch02_01.htm">Chapter 2, <I>The Gory
Details</I></A> (to the extent that it can be straightened out). </P>
</BLOCKQUOTE>
<P>So now you can probably figure out that when we said: </P>
<PRE>next LINE if $line =~ /^#/;
</PRE>
<P>we meant "Go to the next iteration of the <TT>LINE</TT> loop if this
line happens to begin with a <TT>#</TT> character." </P>
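<P>A small sketch shows both anchors together. The sample lines are invented,
and we assume each string holds a single line with no trailing newline: </P>
<PRE>use strict;
use warnings;

# Hypothetical input lines.
my @lines = (
    "# a comment",
    "code # trailing comment",
    "ends with a dot.",
);

my @kept;
for my $line (@lines) {
    next if $line =~ /^#/;    # "#" must be the very first character
    my $note = $line =~ /\.$/ ? " (ends with a period)" : "";
    push @kept, "$line$note";
}

print "$_\n" for @kept;
# code # trailing comment
# ends with a dot. (ends with a period)
</PRE>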
<H3><A NAME="PERL2-CH-1-SECT-7.4"></A>Backreferences</H3>
<P>We mentioned earlier that you can use parentheses to group things for
quantifiers, but you can also use parentheses to remember bits and pieces
of what you matched. A pair of parentheses around a part of a regular expression
causes whatever was matched by that part to be remembered for later use.
It doesn't change what the part matches, so <TT>/\d+/</TT> and <TT>/(\d+)/</TT>
will still match as many digits as possible, but in the latter case they
will be remembered in a special variable to be backreferenced later. </P>
<P>How you refer back to the remembered part of the string depends on where
you want to do it from. Within the same regular expression, you use a backslash
followed by an integer. The integer corresponding to a given pair of parentheses
is determined by counting left parentheses from the beginning of the pattern,
starting with one. So for example, to match something similar to an HTML
tag (like "<TT><B>Bold</B></TT>"), you might use <TT>/<(.*?)>.*?<\/\1>/</TT>.
This forces the two parts of the pattern to match the exact same string,
such as the "<TT>B</TT>" above. </P>
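<P>The same <TT>\1</TT> mechanism can be tried on a simpler problem. The sketch
below is our own example rather than the tag pattern above: it looks for a word
that is immediately repeated, by requiring the second word to be the exact
string the parentheses remembered. </P>
<PRE>use strict;
use warnings;

my @report;
for my $s ("Paris in the the spring", "Paris in the spring") {
    # (\w+) remembers a word; \1 demands that same word again after a space.
    my $verdict = $s =~ /\b(\w+) \1\b/ ? "repeated word" : "no repeated word";
    push @report, "$s: $verdict";
}

print "$_\n" for @report;
# Paris in the the spring: repeated word
# Paris in the spring: no repeated word
</PRE>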
<P>Outside the regular expression itself, such as in the replacement part
of a substitution, the special variable is used as if it were a normal
scalar variable named by the integer. So, if you wanted to swap the first
two words of a string, for example, you could use: </P>
<PRE>s/(\S+)\s+(\S+)/$2 $1/
</PRE>
<P>The right side of the substitution is really just a funny kind of double-quoted
string, which is why you can interpolate variables there, including backreference
variables. This is a powerful concept: interpolation (under controlled
circumstances) is one of the reasons Perl is a good text-processing language.
The other reason is the pattern matching, of course. Regular expressions
are good for picking things apart, and interpolation is good for putting
things back together again. Perhaps there's hope for Humpty Dumpty after
all. </P>
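<P>Here is the word-swapping substitution applied to an actual string. The
sample text and the variable name are ours: </P>
<PRE>use strict;
use warnings;

my $name = "Wall Larry (author)";

# $1 and $2 hold the first and second runs of non-whitespace characters.
$name =~ s/(\S+)\s+(\S+)/$2 $1/;

print "$name\n";    # Larry Wall (author)
</PRE>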
<H2><A NAME="PERL2-CH-1-SECT-8"></A>1.8 List Processing</H2>
<P><A NAME="CH01.LC"></A>Much earlier in this chapter, we mentioned that
Perl has two main contexts, scalar context (for dealing with singular things)
and list context (for dealing with plural things). Many of the traditional
operators we've described so far have been strictly scalar in their operation.
They always take singular arguments (or pairs of singular arguments for
binary operators), and always produce a singular result, even in a list
context. So if you write this: </P>
<PRE>@array = (1 + 2, 3 - 4, 5 * 6, 7 / 8);
</PRE>
<P>you know that the list on the right side contains exactly four values,
because the ordinary math operators always produce scalar values, even
in the list context provided by the assignment to an array. </P>
<P>However, other Perl operators can produce either a scalar or a list
value, depending on their context. They just "know" whether a
scalar or a list is expected of them. But how will you know that? It turns
out to be pretty easy to figure out, once you get your mind around a few
key concepts. </P>
<P>First, list context has to be provided by something in the "surroundings".
In the example above, the list assignment provides it. If you look at the
various syntax summaries scattered throughout <A HREF="ch02_01.htm">Chapter
2, <I>The Gory Details</I></A> and <A HREF="ch03_01.htm">Chapter 3, <I>Functions</I></A>,
you'll see various operators that are defined to take a <I><TT>LIST</TT></I>
as an argument. Those are the operators that <I>provide</I> a list context.
Throughout this book, <I><TT>LIST</TT></I> is used as a specific technical
term to mean "a syntactic construct that provides a list context".
For example, if you look up <A HREF="ch03_02.htm#PERL2-CMD-SORT">sort</A>,
you'll find the syntax summary: </P>
<PRE>sort <I>LIST</I>
</PRE>
<P>That means that <A HREF="ch03_02.htm#PERL2-CMD-SORT">sort</A> provides
a list context to its arguments. </P>
<P>Second, at compile time, any operator that takes a <I><TT>LIST</TT></I>
provides a list context to each syntactic element of that <I><TT>LIST</TT></I>.
So every top-level operator or entity in the <I><TT>LIST</TT></I> knows
that it's supposed to produce the best list it knows how to produce. This
means that if you say: </P>
<PRE>sort @guys, @gals, other();
</PRE>
<P>then each of <TT>@guys</TT>, <TT>@gals</TT>, and <TT>other()</TT> knows
that it's supposed to produce a list value. </P>
<P>Finally, at run-time, each of those <I><TT>LIST</TT></I> elements produces
its list in turn, and then (this is important) all the separate lists are
joined together, end to end, into a single list. And that squashed-flat,
one-dimensional list is what is finally handed off to the function that
wanted a <I><TT>LIST</TT></I> in the first place. So if <TT>@guys</TT>
contains <TT>(Fred,Barney)</TT>, <TT>@gals</TT> contains <TT>(Wilma,Betty)</TT>,
and the <TT>other()</TT> function returns the single-element list <TT>(Dino)</TT>,
then the <I><TT>LIST</TT></I> that sort sees is </P>
<PRE>(Fred,Barney,Wilma,Betty,Dino)
</PRE>
<P>and the <I><TT>LIST</TT></I> that <B>sort</B> returns is </P>
<PRE>(Barney,Betty,Dino,Fred,Wilma)
</PRE>
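<P>The flattening can be seen directly in a sketch like this one. We haven't
shown you how to write subroutines yet, so take the one-line definition of
<TT>other()</TT> on faith; it is a stand-in that merely returns the
single-element list described above. </P>
<PRE>use strict;
use warnings;

my @guys = ("Fred", "Barney");
my @gals = ("Wilma", "Betty");

# Hypothetical stand-in for other(): returns a one-element list.
sub other { return ("Dino") }

my @flat   = (@guys, @gals, other());      # the flattened LIST
my @sorted = sort @guys, @gals, other();   # what sort hands back

print "@flat\n";      # Fred Barney Wilma Betty Dino
print "@sorted\n";    # Barney Betty Dino Fred Wilma
</PRE>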
<P>Some operators produce lists (like <A HREF="ch03_02.htm#PERL2-CMD-KEYS">keys</A>),
some consume them (like <A HREF="ch03_02.htm#PERL2-CMD-PRINT">print</A>),
and some transform lists into other lists (like <A HREF="ch03_02.htm#PERL2-CMD-SORT">sort</A>).
Operators in the last category can be considered filters; only, unlike
in the shell, the flow of data is from right to left, since list operators
operate on their arguments passed in from the right. You can stack up several
list operators in a row: </P>
<PRE>print reverse sort map {lc} keys %hash;
</PRE>
<P>That takes the keys of <TT>%hash</TT> and returns them to the <A HREF="ch03_02.htm#PERL2-CMD-MAP">map</A>
function, which lowercases all the keys by applying the <A HREF="ch03_02.htm#PERL2-CMD-LC">lc</A>
operator to each of them, and passes them to the <A HREF="ch03_02.htm#PERL2-CMD-SORT">sort</A>
function, which sorts them, and passes them to the <B>reverse</B> function,
which reverses the order of the list elements, and passes them to the <B>print</B>
function, which prints them. </P>
<P>As you can see, that's much easier to describe in Perl than in English.
</P>
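<P>For the skeptical, here is the same pipeline run on a small hash. The keys
and values are invented for illustration, and we capture the list in an array
before printing so that the elements can be shown with spaces between them: </P>
<PRE>use strict;
use warnings;

# Hypothetical hash with keys in mixed case.
my %hash = ("Fred", 1, "BARNEY", 2, "wilma", 3);

# Read right to left: keys, then lowercase, then sort, then reverse.
my @result = reverse sort map {lc} keys %hash;

print "@result\n";    # wilma fred barney
</PRE>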
<H2><A NAME="PERL2-CH-1-SECT-9"></A>1.9 What You Don't Know Won't Hurt
You (Much)</H2>
<P>Finally, allow us to return once more to the concept of Perl as a natural
language. Speakers of a natural language are allowed to have differing
skill levels, to speak different subsets of the language, to learn as they
go, and generally, to put the language to good use before they know the
whole language. You don't know all of Perl yet, just as you don't know
all of English. But that's Officially Okay in Perl culture. You can work
with Perl usefully, even though we haven't even told you how to write your
own subroutines yet. We've scarcely begun to explain how to view Perl as
a system management language, or a rapid prototyping language, or a networking
language, or an object-oriented language. We could write chapters about
some of these things. (Come to think of it, we already did.) </P>
<P>But in the end, you must create your own view of Perl. It's your privilege
as an artist to inflict the pain of creativity on yourself. We can teach
you how <I>we</I> paint, but we can't teach you how <I>you</I> paint. There's
More Than One Way To Do It. </P>
<P>Have the appropriate amount of fun. </P>
</BODY>
</HTML>