String Functions

PL/SQL provides a rich set of string functions that allow you to get information about strings and modify the contents of those strings in very high-level, powerful ways. Table 8-3 shows the string functions covered in detail in this section. The remaining functions are specific to Oracle's National Language Support (NLS) and Trusted Oracle features. National Language Support functions are covered later in this chapter in Section 8.5.

Table 8-3. String functions
Name Description
ASCII Returns the ASCII code of a character.
ASCIISTR Converts a string to a string of ASCII characters.
CHR Returns the character associated with the specified collating code.
COMPOSE Takes a Unicode string and returns it in a fully normalized form.
CONCAT Concatenates two strings into one.
DECOMPOSE Takes a Unicode string and returns it in with any precomposed characters decomposed into separate elements.
GREATEST Returns the string that would come last (if sorted alphabetically).
INITCAP Sets the first letter of each word to uppercase. All other letters are set to lowercase.
INSTR, INSTRB, INSTRC, INSTR2, INSTR4 Returns the location in a string of the specified substring.
LEAST Returns the string that would come first (if sorted alphabetically).
LENGTH, LENGTHB, LENGTHC, LENGTH2, LENGTH4 Returns the length of a string.
LOWER Converts all letters to lowercase.
LPAD Pads a string on the left with the specified characters.
LTRIM Trims the left side of a string of all specified characters.
REPLACE Replaces a character sequence in a string with a different set of characters.
RPAD Pads a string on the right with the specified characters.
RTRIM Trims the right side of a string of all specified characters.
SOUNDEX Returns the "soundex" of a string.
SUBSTR, SUBSTRB, SUBSTRC, SUBSTR2, SUBSTR4 Returns the specified portion of a string.
TO_CHAR Converts national character set data into its database character set equivalent.
TO_MULTI_BYTE Takes a string as input and, where possible, converts any single-byte characters to their multibyte equivalents. In UTF-8, for example, many characters can be represented using one byte or using multiple bytes.
TO_SINGLE_BYTE Does the reverse of TO_MULTI_BYTE, converting multibyte characters to their single-byte equivalents.
TRANSLATE Translates single characters in a string to different characters.
TRANSLATE...USING Translates character data between character sets.
TRIM Combines functionality of RTRIM and LTRIM.
UNISTR Does the reverse of ASCIISTR, converting a string into Unicode.
UPPER Converts all letters in the string to uppercase.

The following sections briefly describe each of the PL/SQL string functions.

ASCII  
   

The ASCII function returns the NUMBER code that represents the specified character in the database character set. The specification of the ASCII function is:

FUNCTION ASCII (single_character IN VARCHAR2) RETURN NUMBER

where single_character is the character to be located in the collating sequence. Even though the function is named ASCII, it will return the code location in whatever the database character set is set to, such as EBCDIC Code Page 500 or 7-bit ASCII. Remember that the collating code for uppercase letters is different from that for lowercase letters. For example, in the 7-bit ASCII character set, ASCII ("a") returns 97. ASCII ("A") returns 65 because the uppercase letters come before the lowercase letters in the sequence.

If you pass more than one character in the parameter to ASCII, it returns the collating code for the first character and ignores the other characters. As a result, the following calls to ASCII all return the same value of 100:

ASCII ('defg')ASCII ('d')ASCII ('d_e_f_g')
ASCIISTR  
   

The ASCIISTR function takes a string in any character set and converts it into a string of ASCII characters. Any non-ASCII characters are represented using the form \XXXX, where XXXX represents the Unicode value for the character. The specification for ASCIISTR is:

FUNCTION ASCIISTR (string1 IN VARCHAR2) RETURN VARCHAR2;

Following is an example of ASCIISTR in use:

BEGIN DBMS_OUTPUT.PUT_LINE( ASCIISTR('The letter ã is not an ASCII character.') );END;

The output is:

The letter \00E3 is not an ASCII character.

The UNISTR function, described later in this chapter, is the inverse of ASCIISTR.

String Functions - student2.ru For information on Unicode, including the underlying bytecodes used to represent characters in the Unicode character set, visit http://unicode.org.
CHR  
   

The CHR function is the inverse of ASCII. It returns a VARCHAR2 character (length 1) that corresponds to the location in the collating sequence provided as a parameter. The specification of the CHR function is:

FUNCTION CHR (code_location IN NUMBER) RETURN VARCHAR2

where code_location is the number specifying the location in the collating sequence.

The CHR function is especially valuable when you need to make reference to a nonprintable character in your code. For example, the location in the standard ASCII collating sequence for the newline character is 10 (the ASCII linefeed). The CHR function gives you a way to search for the newline character in a string, and perform operations on a string based on the presence of that control character.

You can also insert a linefeed into a character string using the CHR function. For example, suppose you have to build a report that displays the address of a company. A company can have up to four address strings (in addition to city, state, and zip code). You need to put each address string on a new line, but you don't want any blank lines embedded in the address. The following SELECT will not do the trick:

SELECT name, address1, address2, address3, address4, city || ', ' || state || ' ' || zipcode location FROM company;

Assuming that each column (report field) goes on a new line, you will end up using six lines per address, no matter how many of these address strings are NULL. For example:

HAROLD HENDERSON22 BUNKER COURTSUITE 100 WYANDANCH, MN 66557

You can use the CHR function to suppress these internal blank lines as follows:

SELECT name || DECODE (address1, NULL, NULL, CHR (10) || address1) || DECODE (address2, NULL, NULL, CHR (10) || address2) || DECODE (address3, NULL, NULL, CHR (10) || address3) || DECODE (address4, NULL, NULL, CHR (10) || address4) || CHR (10) || city || ', ' || state || ' ' || zipcode FROM company;

Now the query returns a single formatted column per company. The DECODE statement offers IF-THEN logic within SQL and executes as follows: "If the address string is NULL then concatenate NULL; otherwise insert a linefeed character concatenated with the address string." In this way, blank address lines are ignored, and the address will be scrunched down to:

HAROLD HENDERSON22 BUNKER COURTSUITE 100WYANDANCH, MN 66557

By default, the CHR function translates the character code you specify into a character from the database character set. If you wish to use a character from the national character set instead, you can use the USING NCHAR_CS clause:

DECLARE x NVARCHAR2(30);BEGIN x := CHR(65 USING NCHAR_CS); ...

The NCHR function also provides this same functionality.

COMPOSE  
   

COMPOSE takes a Unicode string as input and returns that string in its fully normalized form. The specification for COMPOSE is:

FUNCTION COMPOSE (string1 IN VARCHAR2) RETURN VARCHAR2

Unicode allows certain characters to be represented in multiple ways. For example, the ã character that you've seen throughout this chapter can be represented in the following two ways:

ã = \00E3ã = a\0303

In the first case, \00E3 is the Unicode code point (in hexadecimal) for the character ã. In the second case, \0303 is a nonspacing Unicode character representing the addition of a ~ to the preceding character. Thus, ã is built up starting with a simple "a" and then adding the ~ on top. The complete character \00E3 is known as a precomposed character. The COMPOSE function takes decomposed strings such as "a\0303" and converts them into their composed counterparts. For example:

DECLARE x VARCHAR2(30); y VARCHAR2(30);BEGIN SELECT COMPOSE(UNISTR('a\0303 is a composed character')), ASCIISTR(COMPOSE(UNISTR('a\0303 is a composed character'))) INTO x, y FROM dual; DBMS_OUTPUT.PUT_LINE(x); DBMS_OUTPUT.PUT_LINE(y);END;

The output is:

ã is a composed character\00E3 is a composed character
String Functions - student2.ru In Oracle9i Release 1, COMPOSE must be called from a SQL statement; it cannot be used in a PL/SQL expression. In Oracle9i Release 2, however, you can invoke COMPOSE from a PL/SQL expression.
CONCAT  
   

The CONCAT function concatenates by taking two VARCHAR2 strings and returning them appended together in the order specified. The specification of the CONCAT function is:

FUNCTION CONCAT (string1 IN VARCHAR2, string2 IN VARCHAR2) RETURN VARCHAR2

CONCAT always appends string2 to the end of string1. If either string is NULL, CONCAT returns the non-NULL argument all by its lonesome. If both strings are NULL, CONCAT returns NULL. Here are some examples of uses of CONCAT (where --> means that the function returns the value shown):

CONCAT ('abc', 'defg') --> 'abcdefg'CONCAT (NULL, 'def') --> 'def'CONCAT ('ab', NULL) --> 'ab'CONCAT (NULL, NULL) --> NULL

I have a confession to make about CONCAT: I have never used it in all my years of PL/SQL coding. In fact, I never even noticed it was available until I did the research for this book. How can this be? I certainly have performed many acts of concatenation in my time. However, PL/SQL (and the Oracle RDBMS) offers a secondconcatenation operator—the double vertical bars (||). For example:

DECLARE x VARCHAR2(100);BEGIN x := 'abc' || 'def' || 'ghi'; DBMS_OUTPUT.PUT_LINE(x);END;

The output is:

abcdefghi

To perform the identical concatenation using CONCAT, we'd need to nest one call to CONCAT inside another:

x := CONCAT(CONCAT('abc','def'),'ghi');

You can see that the || operator is not only much easier to use than CONCAT, but results in much more readable code.

DECOMPOSE  
   

DECOMPOSE is the opposite of the COMPOSE function. It takes a Unicode string as input, and returns that string with anyprecomposed characters decomposed into their separate elements. Its specification is:

FUNCTION DECOMPOSE(string1 IN VARCHAR2) RETURN VARCHAR2

The following example undoes the effect of the COMPOSE example shown earlier:

DECLARE x VARCHAR2(30);BEGIN x := 'ã is a composed character'; SELECT ASCIISTR(DECOMPOSE(x)) INTO x FROM dual; DBMS_OUTPUT.PUT_LINE(x);END;

The output is:

a\0303 is a composed character

In this case, the character ã in the input string has been decomposed into the letter "a" followed by the nonspacing character \0303 (indicating the tilde).

String Functions - student2.ru Like COMPOSE, DECOMPOSE cannot be invoked directly from a PL/SQL expression in Oracle 9i Release 1; you must invoke it from a SQL statement. This restriction is removed in Oracle9i Release 2.
GREATEST  
   

GREATEST takes one or more strings as input, and returns the string that would come last (i.e., that is the greatest) if the inputs were to be sorted in ascending order. The ordering of the strings is based on character code in the database character set. GREATEST has the following specification:

FUNCTION GREATEST (string1 IN VARCHAR2, string2 IN VARCHAR2,...) RETURN VARCHAR2

Following is an example:

BEGIN DBMS_OUTPUT.PUT_LINE(GREATEST('Jonathan','Steven','Bill'));END; Steven

Also see the LEAST function, which is the opposite of GREATEST.

INITCAP  
   

The INITCAP function reformats the case of the string argument, setting the first letter of each word to uppercase and the remainder of the letters to lowercase. A word is a set of characters separated by a space or nonalphanumeric character (such as # or _ ). The specification of INITCAP is:

FUNCTION INITCAP (string_in IN VARCHAR2) RETURN VARCHAR2

Here are some examples of the impact of INITCAP on your strings:

· Shift all lowercase to mixed case:

INITCAP ('this is lower') --> 'This Is Lower'

· Shift all uppercase to mixed case:

INITCAP ('BIG>AND^TALL') --> 'Big>And^Tall'

· Shift a confusing blend of cases to consistent initcap format:

INITCAP ('wHatISthis_MESS?') --> 'Whatisthis_Mess?'

· Create Visual Basic-style variable names (I use REPLACE, explained later, to strip out the embedded spaces):

· REPLACE (INITCAP ('ALMOST_UNREADABLE_VAR_NAME'), '_', NULL) --> 'AlmostUnreadableVarName'

When and why would you use INITCAP? Many Oracle shops like to store all character string data, such as names and addresses, in uppercase for consistency. This practice makes it easier to search for records that match certain criteria.

The problem with storing all the data in uppercase is that, while it is a convenient "machine format," it is not particularly readable or presentable. How easy is it to scan a page of information that looks like the following?

CUSTOMER TRACKING LIST - GENERATED ON 12-MAR-1994LAST PAYMENT WEDNESDAY: PAUL JOHNSON, 123 MADISON AVE - $1200LAST PAYMENT MONDAY: HARRY SIMMERSON, 555 AXELROD RD - $1500

It is hard for the eye to pick out the individual words and different types of information; all that text just blends together. Furthermore, solid uppercase has a "machine" or even "mainframe" feel to it; you'd never actually type it that way. A mixture of upper- and lowercase can make your output much more readable and friendly in appearance:

Customer Tracking List - Generated On 12-Mar-1994Last Payment Wednesday: Paul Johnson, 123 Madison Ave - $1200Last Payment Monday: Harry Simmerson, 555 Axelrod Rd - $1500

Can you see any problems with using INITCAP to format output? There are a couple of drawbacks to the way it works. First, as with the string "BIG AND TALL", INITCAP is not very useful for generating titles because it doesn't know that little words like "and" and "the" should not be capitalized. That is a relatively minor problem compared with the second one: INITCAP is completely ignorant of real-world surname conventions. Names with internal capital letters, in particular, cannot be generated with INITCAP. Consider the following example in which the "D" in "McDonald's" ends up in lowercase.

INITCAP ('HAMBURGERS BY THE BILLIONS AT MCDONALDS') --> 'Hamburgers By The Billions At Mcdonalds'

For these reasons, use INITCAP with caution when printing reports or displaying data. The information it produces may not always be formatted correctly.

INSTR, INSTRB, INSTRC, INSTR2, and INSTR4  
   

The INSTR family of functions allow you to search a string to find a match for a substring. If the substring is found, the functions return the position, in the source string, of the first character of the substring. If there is no match, then the functions return 0.

The five INSTR functions differ only in terms of how they look at the string and substring:

INSTR

Strings consist of characters. The return value indicates the character position at which the substring is found.

INSTRB

Strings consist ofbytes. The return value indicates the byte position at which the substring is found.

INSTRC

Strings consist of Unicode characters. Decomposed Unicode characters are recognized (e.g., a\0303 is recognized as being the same as \00E3 or ã).

INSTR2

Looks at strings in terms of Unicode code units.

INSTR4

Looks at strings in terms of Unicode code points.

All of the INSTR functions share the same specification:

FUNCTION INSTR (string1 IN VARCHAR2, string2 IN VARCHAR2 [,start_position IN NUMBER := 1 [, nth_appearance IN NUMBER := 1]])RETURN NUMBER

where string1 is the string searched by INSTR for the position in which the nth_appearance of string2 is found. The start_position parameter is the character (not byte) position in the string where the search will start. It is optional and defaults to 1 (the beginning of string1). The nth_appearance parameter is also optional and defaults to 1.

Both the start_position and the nth_appearance parameters can be literals (like 5 or 157), variables, or complex expressions, as follows:

INSTR (company_name, 'INC', (last_location + 5) * 10)

If start_position is negative, then INSTR counts back start_position number of characters from the end of the string, and then searches from that point toward the beginning of the string for the nth match. Figure 8-2 illustrates the two directions in which INSTR searches, depending on whether the start_position parameter is positive or negative.

Наши рекомендации