Core Function Scanf
Scanf( <expression>, <def> [, <extra> ...] )
Description
Parses input from a string according to a format.
Parameters
expression
The string to evaluate.
def
The format string that defines how the input string is parsed.
extra ...
Optional. Variables, passed by reference, that receive the parsed values.
Return Value
If the extra parameters are used:
Success: Returns the number of matched values and fills in the extra variables. Any variable for which no match was found is set to 0.
Failure: Returns 0.
If the extra parameters are not used:
Success: Returns an array of all values captured from the parsed string.
Failure: Returns an empty array.
Remarks
If only two parameters are passed to this function, the parsed values are returned as an array. If optional parameters are also passed, the function returns the number of assigned values.
If the format expects more fields than the input string (expression) contains, the extra fields are ignored and only the values that did match are returned.
C and C++ developers have used the scanf() family of functions (scanf(), sscanf(), fscanf(), etc.) as a quick and easy way to parse well-structured input. The basic idea is to be able to specify the format of an input string in a way that allows the function to extract fields from that string.
For example, if your input string is "X123 Y456", you could specify a format string of "X%d Y%d" to extract the two numeric values, 123 and 456. The "%d" tells the function to extract a decimal value at the current location. The parser reads the decimal value until a non-digit character is encountered. The values are then assigned to variables and returned to the caller. In this example, the X and Y are character literals. These are characters in the input string expected to match the same characters in the format string.
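For comparison, here is the same example written with the C standard library's sscanf(), the function the description above is based on. This is a C sketch, not code in this scripting language. The helper name parse_xy is hypothetical, and Scanf() itself may differ in details.

```c
#include <stdio.h>

/* Parses strings shaped like "X123 Y456" with sscanf().
 * 'X' and 'Y' are literal characters that must appear in the input;
 * each %d reads a decimal integer until a non-digit is reached.
 * Returns the number of fields assigned (2 on full success). */
int parse_xy(const char *input, int *x, int *y)
{
    return sscanf(input, "X%d Y%d", x, y);
}
```

Calling parse_xy("X123 Y456", &x, &y) returns 2 and sets x to 123 and y to 456.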
Processing stops either when the end of the format string is reached, or when characters in the input string cannot be processed according to the format string.
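The following C sketch shows this stopping rule with sscanf(). The return value counts only the fields assigned before processing stopped, whether the input ran out or a literal character failed to match. The helper parse_time and its "%d:%d:%d" format are hypothetical. Fields that are never reached keep their previous values, so the sketch assumes the caller initialises them.

```c
#include <stdio.h>

/* Parses an "hh:mm:ss" time with sscanf().
 * A short or malformed input yields a smaller count rather than an error;
 * fields that were not reached are left unchanged. */
int parse_time(const char *input, int *h, int *m, int *s)
{
    return sscanf(input, "%d:%d:%d", h, m, s);
}
```

With "12:34:56" it returns 3. With "12:34" it returns 2, because the input runs out. With "12-34:56" it returns 1, because '-' does not match the literal ':'.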
To be sure, there are limits to this approach. For example, you wouldn't use scanf() to parse source code. It works best with well-structured input that can readily be divided into fields. Those who prefer regular expressions can use them to parse well-structured text as well. However, for cases where scanf() works, or for developers who are accustomed to using scanf(), it provides a simple and convenient way to parse many types of text.
The scanf() Format String
The scanf() format string provides a flexible way to describe the fields in the input string. Although there are standards, different C compilers have slightly different rules about the meaning of some parts of the format string. The following definition applies to format strings used by this Scanf().
Whitespace: Any whitespace character in the format string causes the position to advance to the next non-whitespace character in the input string. Whitespace characters include spaces, tabs and newlines.
Non-whitespace except percent (%): Any character that is not whitespace or part of a format specifier (which begins with a % character) must match the same character in the input string, and the position advances past it.
Format specifier: A sequence beginning with a percent sign (%) that marks a field to be parsed and stored in a variable. A format specifier has the following form.
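The following C sketch shows whitespace and literal characters at work in sscanf(). For these two cases, C's rules match the descriptions above: a whitespace directive skips any amount of whitespace, including none. The helper parse_date and its "%d / %d / %d" format are hypothetical examples.

```c
#include <stdio.h>

/* Parses a date such as "2024/05/17" or "2024 / 05 / 17".
 * - Each space in the format skips any run of whitespace (possibly empty).
 * - Each '/' is a literal that must match a '/' in the input. */
int parse_date(const char *input, int *year, int *month, int *day)
{
    return sscanf(input, "%d / %d / %d", year, month, day);
}
```

Both "2024/05/17" and "2024 / 05 /\t17" yield 3 fields. "2024-05-17" yields only 1, because '-' does not match the literal '/'.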
%[*][width][modifiers]type
Items within square brackets ([]) are optional. The following table describes elements within the format specifier.
*: Indicates that this field is parsed normally but not stored in a variable.
width: Specifies the maximum number of characters to be read for this field.
modifiers: If supplied, modifies the size of the data type in which the field is stored. If not supplied, the default size is used. Supported modifiers:
  hh: For integer fields, the result is stored in an 8-bit variable. Ignored for floating-point fields.
  h: For integer fields, the result is stored in a 16-bit variable. Ignored for floating-point fields.
  l: For integer fields, the result is stored in a 64-bit variable. Floating-point fields are stored in a double.
  ll: Same effect as the l modifier.
type: Specifies the field type, as described below.
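The following C sketch shows the *, width and modifier elements in sscanf(). The helper names are hypothetical. There is one difference from the list above: in standard C, the l modifier stores into a long, whose size is platform-dependent, whereas this page documents l as 64-bit for Scanf().

```c
#include <stdio.h>

/* width: "%4d%2d%2d" splits a compact "YYYYMMDD" date into three fields. */
int split_compact_date(const char *input, int *year, int *month, int *day)
{
    return sscanf(input, "%4d%2d%2d", year, month, day);
}

/* *: "%*s" reads one word and discards it; it is not counted in the
 * return value, so a successful parse returns 1. */
int skip_label(const char *input, int *value)
{
    return sscanf(input, "%*s %d", value);
}

/* modifiers: "%hhu" stores the value in an unsigned char. */
int read_byte(const char *input, unsigned char *out)
{
    return sscanf(input, "%hhu", out);
}
```

For example, "20240517" splits into 2024, 5 and 17. "count: 42" yields 42 with the label discarded.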
c: Reads a single character. If a width greater than 1 is specified, an array of characters is read.
d, i: Reads a decimal integer. The number may begin with 0 (octal), 0x (hexadecimal) or a + or - sign.
e, E, f, g, G: Reads a floating-point value. The number may begin with a + or - sign and may be written in exponential notation.
o: Reads an unsigned octal integer.
s: Reads a string of characters up to the end of the input string or the next whitespace character, or until the number of characters specified by the width has been read.
u: Reads an unsigned decimal integer. The number may begin with 0 (octal), 0x (hexadecimal) or a + sign.
x, X: Reads an unsigned hexadecimal integer.
[]: Reads a string made up of the characters listed within the square brackets (the scanset). For example, "%[abc]" reads all characters that are a, b, or c. Use "%[^abc]" to read all characters that are not a, b, or c. If the first character after "[" or after "[^" is "]", that bracket is treated as one of the characters rather than as the end of the scanset. Ranges are also supported: "%[a-z]" reads any lowercase letter from a to z, so to read only hexadecimal digits you could use "%[a-fA-F0-9]".
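The C sketch below shows the difference between %s and a scanset in sscanf(), using the copyright string from the example that follows. The helper names and buffer sizes are hypothetical. One caveat applies: in standard C, a '-' inside a scanset (as in %[0-9a-fA-F]) has implementation-defined meaning. Common C libraries such as glibc treat it as a range, and this sketch assumes that behaviour.

```c
#include <stdio.h>
#include <string.h>

/* %63s stops at whitespace, so it reads only "CompanyName".
 * %127[^)] reads everything up to (not including) the closing ')'.
 * The widths keep each string inside its buffer, leaving room for '\0'. */
int parse_copyright(const char *input, int *from, int *to,
                    char company[64], char message[128])
{
    return sscanf(input, "Copyright %d-%d %63s (%127[^)]",
                  from, to, company, message);
}

/* Copies the leading run of hexadecimal digits into out and returns its
 * length, or 0 if the input does not start with a hex digit. */
size_t read_hex_digits(const char *input, char out[16])
{
    if (sscanf(input, "%15[0-9a-fA-F]", out) != 1) {
        out[0] = '\0';
        return 0;
    }
    return strlen(out);
}
```

The copyright line yields 2009, 2011, "CompanyName" and "Multi-Word Message". read_hex_digits("1aF9zz", buf) copies "1aF9" and returns 4.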
Example
my $RET = Scanf("X123 Y456", "X%d Y%d");
printr($RET);
$RET = Scanf("Copyright 2009-2011 CompanyName (Multi-Word Message)", "Copyright %d-%d %s (%[^)]");
printr($RET);
Not using optional parameters
// getting the serial number
list($serial) = Scanf("SN/2350001", "SN/%d");
// and the date of manufacturing
$mandate = "January 01 2000";
list($month, $day, $year) = Scanf($mandate, "%s %d %d");
println("Item $serial was manufactured on: $year-" . substr($month, 0, 3) . "-$day");
Using optional parameters
If optional parameters are passed, the function will return the number of assigned values.
// get author info and generate DocBook entry
$auth = "24\tLewis Carroll";
$n = Scanf($auth, "%d\t%s %s", $id, $first, $last);
print("<author id='$id'> <firstname>$first</firstname> <surname>$last</surname> </author>\n");
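In C, "by reference" corresponds to passing pointers to sscanf(), and the return value is likewise the number of assigned fields. This sketch mirrors the author example above. parse_author and the 32-byte buffers are hypothetical choices, and the widths (%31s) keep each name within its buffer.

```c
#include <stdio.h>

/* C counterpart of the author example: outputs are written through
 * pointers, and the return value counts the fields that were assigned.
 * The '\t' in the format is whitespace, so it skips any run of whitespace. */
int parse_author(const char *line, int *id, char first[32], char last[32])
{
    return sscanf(line, "%d\t%31s %31s", id, first, last);
}
```

For "24\tLewis Carroll" it returns 3, with id 24, first "Lewis" and last "Carroll".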
Example of parsing a file name so that the "." is not trapped inside the first field, as it would be with a plain %s
$out = Scanf('file_name.gif', 'file_%[^.].%s', $fpart1, $fpart2);
println("Name '$fpart1' Ext '$fpart2'");
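The same technique in C: %[^.] stops at the first '.', the literal '.' in the format then consumes it, and %s reads the extension. This is a sketch using sscanf(), with split_file_name and the buffer sizes as hypothetical choices.

```c
#include <stdio.h>

/* Splits "file_name.gif" into "name" and "gif".
 * A plain %s would read "name.gif" in one go, since it stops only at
 * whitespace; the scanset %[^.] stops at the first '.' instead. */
int split_file_name(const char *input, char name[64], char ext[16])
{
    return sscanf(input, "file_%63[^.].%15s", name, ext);
}
```

For "file_a.b.c", the name part is "a" and the extension is "b.c", because %s does not stop at further dots.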
Example of using [] to specify a character set
$date = 'january-2008';
// notice it is scanning for all lowercase a-z and uppercase A-Z characters,
// so it will match the month name in any case
Scanf($date, '%[a-zA-Z]-%d', $month, $year);
println("Parsed values: '$month', '$year'");