Strings
Now what was all that "Hello, " + name + "!" stuff about? The first program in this chapter was simply
print("Hello, world!")
It is customary to begin with a program like this in programming tutorials. The problem is that I haven’t really explained how it works yet. You know the basics of the print function (I’ll have more to say about that later), but what is "Hello, world!"? It’s called a string (as in “a string of characters”). Strings are found in almost every useful, real-world Python program and have many uses. Their main use is to represent bits of text, such as the exclamation “Hello, world!”
Single-Quoted Strings and Escaping Quotes
Strings are values, just as numbers are:
>>> "Hello, world!"
'Hello, world!'
There is one thing that may be a bit surprising about this example, though: when Python printed out our string, it used single quotes, whereas we used double quotes. What’s the difference? Actually, there is no difference.
>>> 'Hello, world!'
'Hello, world!'
Here, we use single quotes, and the result is the same. So why allow both? Because in some cases it may be useful.
>>> "Let's go!"
"Let's go!"
>>> '"Hello, world!" she said'
'"Hello, world!" she said'
In the preceding code, the first string contains a single quote (or an apostrophe, as we should perhaps call it in this context), and therefore we can’t use single quotes to enclose the string. If we did, the interpreter would complain (and rightly so).
>>> 'Let's go!'
SyntaxError: invalid syntax
Here, the string is 'Let', and Python doesn’t quite know what to do with the following s (or the rest of the line, for that matter).
In the second string, we use double quotes as part of our sentence. Therefore, we have to use single quotes to enclose our string, for the same reasons as stated previously. Or, actually we don’t have to. It’s just convenient. An alternative is to use the backslash character (\) to escape the quotes in the string, like this:
>>> 'Let\'s go!'
"Let's go!"
Python understands that the middle single quote is a character in the string and not the end of the string. (Even so, Python chooses to use double quotes when printing out the string.) The same works with double quotes, as you might expect.
>>> "\"Hello, world!\" she said"
'"Hello, world!" she said'
Escaping quotes like this can be useful, and sometimes necessary. For example, what would you do without the backslash if your string contained both single and double quotes, as in the string 'Let\'s say "Hello, world!"'?
■ Note Tired of backslashes? As you will see later in this chapter, you can avoid most of them by using long strings and raw strings (which can be combined).
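To make the escaping rule concrete, here is a small sketch that writes the mixed-quote string in two ways and checks that both produce the same value. The variable names are only illustrative.

```python
# Two spellings of one string; the variable names are illustrative.
escaped_single = 'Let\'s say "Hello, world!"'    # escape the apostrophe
escaped_double = "Let's say \"Hello, world!\""   # escape the double quotes

# The backslashes are part of the literal syntax, not of the string itself.
print(escaped_single == escaped_double)
print(escaped_single)
```

Which quote you escape is a matter of taste; the resulting string contains no backslashes either way.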
Concatenating Strings
Just to keep whipping this slightly tortured example, let me show you another way of writing the same string:
>>> "Let's say " '"Hello, world!"'
'Let\'s say "Hello, world!"'
I’ve simply written two strings, one after the other, and Python automatically concatenates them (makes them into one string). This mechanism isn’t used very often, but it can be useful at times. However, it works only when you actually write both strings at the same time, directly following one another.
>>> x = "Hello, "
>>> y = "world!"
>>> x y
SyntaxError: invalid syntax
In other words, this is just a special way of writing strings, not a general method of concatenating them. How, then, do you concatenate strings? Just like you add numbers:
>>> "Hello, " + "world!"
'Hello, world!'
>>> x = "Hello, "
>>> y = "world!"
>>> x + y
'Hello, world!'
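The two mechanisms can be put side by side in a short script. This is only a sketch, and the names literal_joined and plus_joined are made up for the comparison.

```python
x = "Hello, "
y = "world!"

# Adjacent string literals are joined into one literal.
literal_joined = "Hello, " "world!"

# The + operator works on any string values, including variables.
plus_joined = x + y

print(literal_joined == plus_joined)
print(plus_joined)
```

Adjacent-literal joining is handy for splitting a long literal across source lines, while + is what you reach for when the pieces are held in variables.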
String Representations, str and repr
Throughout these examples, you have probably noticed that all the strings printed out by Python are still quoted. That’s because it prints out the value as it might be written in Python code, not how you would like it to look for the user. If you use print, however, the result is different.
>>> "Hello, world!"
'Hello, world!'
>>> print("Hello, world!")
Hello, world!
The difference is even more obvious if we sneak in the special linefeed character code \n.
>>> "Hello,\nworld!"
'Hello,\nworld!'
>>> print("Hello,\nworld!")
Hello,
world!
Values are converted to strings through two different mechanisms. You can access both mechanisms yourself, by using the functions str and repr.[9] With str, you convert a value into a string in some reasonable fashion that will probably be understood by a user, for example, converting any special character codes to the corresponding characters, where possible. If you use repr, however, you will generally get a representation of the value as a legal Python expression.
>>> print(repr("Hello,\nworld!"))
'Hello,\nworld!'
>>> print(str("Hello,\nworld!"))
Hello,
world!
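One way to see what “a legal Python expression” means is to feed the result of repr back to Python. The sketch below uses ast.literal_eval from the standard library for that. Round-tripping like this works for simple values such as strings and numbers, but it is not guaranteed for every object.

```python
import ast

text = "Hello,\nworld!"

as_str = str(text)     # for a string, str gives back the same text
as_repr = repr(text)   # quoted, with the newline written as backslash + n

# The repr is longer: it includes the quotes and a two-character \n.
print(len(as_str), len(as_repr))

# Evaluating the repr as a literal recovers the original value.
print(ast.literal_eval(as_repr) == text)
```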
Long Strings, Raw Strings, and bytes
There are some useful, slightly specialized ways of writing strings. For example, there’s a custom syntax for writing strings that include newlines (long strings) or backslashes (raw strings). In Python 2, there was also a separate syntax for writing strings with special symbols of different kinds, producing objects of the unicode type. The syntax still works but is now redundant, because all strings in Python 3 are Unicode strings. Instead, a new syntax has been introduced to specify a bytes object, roughly corresponding to the oldschool strings. As we shall see, these still play an important part in the handling of Unicode encodings.
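As a quick preview of the prefixes mentioned here, the following sketch compares an ordinary string, a u-prefixed string, and a bytes literal. The choice of ASCII as the encoding is just an assumption for the example.

```python
text = "Hello, world!"       # an ordinary (Unicode) string
data = b"Hello, world!"      # a bytes literal, marked by the b prefix

print(type(text).__name__)
print(type(data).__name__)

# The u prefix from Python 2 is still accepted and yields a plain str.
print(u"Hello, world!" == text)

# Converting between str and bytes goes through an encoding.
print(data.decode("ascii") == text)
print(text.encode("ascii") == data)
```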
Long Strings
If you want to write a really long string, one that spans several lines, you can use triple quotes instead of ordinary quotes.
print('''This is a very long string. It continues here.
And it's not over yet. "Hello, world!"
Still here.''')
You can also use triple double quotes, """like this""". Note that because of the distinctive enclosing quotes, both single and double quotes are allowed inside, without being backslash-escaped.
■ Tip Ordinary strings can also span several lines. If the last character on a line is a backslash, the line break itself is “escaped” and ignored. For example:
print("Hello, \
world!")
would print out Hello, world!. The same goes for expressions and statements in general.
>>> 1 + 2 + \
    4 + 5
12
>>> print \
    ('Hello, world')
Hello, world
[9] Actually, str is a class, just like int. repr, however, is a function.
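The difference between a triple-quoted string and a backslash-continued one is easy to check: the first keeps its line breaks as newline characters, while the second drops them. A minimal sketch, with illustrative variable names:

```python
long_string = '''This is a very long string. It continues here.
And it's not over yet. "Hello, world!"
Still here.'''

continued = "Hello, \
world!"

# The triple-quoted string contains one newline per line break.
print(long_string.count("\n"))

# The escaped line break is ignored, so the result is a one-line string.
print(continued == "Hello, world!")
```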
Raw Strings
Raw strings aren’t too picky about backslashes, which can be very useful sometimes.[10] In ordinary strings, the backslash has a special role: it escapes things, letting you put things into your string that you couldn’t normally write directly. For example, as we’ve seen, a newline is written \n and can be put into a string like this:
>>> print('Hello,\nworld!')
Hello,
world!
This is normally just dandy, but in some cases, it’s not what you want. What if you wanted the string to include a backslash followed by an n? You might want to put the DOS pathname C:\nowhere into a string.
>>> path = 'C:\nowhere'
>>> path
'C:\nowhere'
This looks correct, until you print it and discover the flaw.
>>> print(path)
C:
owhere
It’s not exactly what we were after, is it? So what do we do? We can escape the backslash itself.
>>> print('C:\\nowhere')
C:\nowhere
This is just fine. But for long paths, you wind up with a lot of backslashes.
path = 'C:\\Program Files\\fnord\\foo\\bar\\baz\\frozz\\bozz'
Raw strings are useful in such cases. They don’t treat the backslash as a special character at all. Every character you put into a raw string stays the way you wrote it.
[10] Raw strings can be especially useful when writing regular expressions.
>>> print(r'C:\nowhere')
C:\nowhere
>>> print(r'C:\Program Files\fnord\foo\bar\baz\frozz\bozz')
C:\Program Files\fnord\foo\bar\baz\frozz\bozz
As you can see, raw strings are prefixed with an r. It would seem that you can put anything inside a raw string, and that is almost true. Quotes must be escaped as usual, although that means you get a backslash in your final string, too.
>>> print(r'Let\'s go!')
Let\'s go!
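To compare the three spellings seen so far, the sketch below measures each string and checks which ones are equal. The variable names are only for illustration.

```python
ordinary = 'C:\nowhere'    # \n is read as a single newline character
escaped = 'C:\\nowhere'    # \\ is read as a single backslash
raw = r'C:\nowhere'        # the raw string keeps the backslash as written

# The ordinary string is one character shorter than the other two.
print(len(ordinary), len(escaped), len(raw))
print(raw == escaped)

# In a raw string, the backslash used to escape a quote stays in the string.
quoted = r'Let\'s go!'
print('\\' in quoted)
```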
The one thing you can’t have in a raw string is a lone, final backslash. In other words, the last character in a raw string cannot be a backslash unless you escape it (and then the backslash you use to escape it will be part of the string, too). Given the previous example, that ought to be obvious. If the last character (before the final quote) is an unescaped backslash, Python won’t know whether or not to end the string.
>>> print(r"This is illegal\")
SyntaxError: EOL while scanning string literal
Okay, so it’s reasonable, but what if you want the last character in your raw string to be a backslash? (Perhaps it’s the end of a DOS path, for example.) Well, I’ve given you a whole bag of tricks in this section that should help you solve that problem, but basically you need to put the backslash in a separate string. A simple way of doing that is the following:
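A minimal sketch of that idea, using a shortened, hypothetical path as a stand-in: the raw string holds everything except the final backslash, and an ordinary string literal '\\' written directly after it supplies the last character.

```python
# Hypothetical path, used only for illustration.
# The raw string stops before the final backslash; the adjacent
# ordinary literal '\\' contributes that last character.
path = r'C:\Program Files\foo\bar' '\\'

print(path)
print(path.endswith('\\'))
```

This relies on the automatic concatenation of adjacent string literals described earlier, so it works only when both parts are written out as literals.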