Struct
GLib.Regex
since: 2.14
Description
struct GRegex {
/* No available fields */
}
A GRegex is the “compiled” form of a regular expression pattern. GRegex implements regular expression pattern matching using syntax and semantics similar to Perl regular expressions. See the PCRE documentation for the syntax definition.
Some functions accept a start_position argument. Setting it differs from just passing over a shortened string and setting G_REGEX_MATCH_NOTBOL in the case of a pattern that begins with any kind of lookbehind assertion. For example, consider the pattern “\Biss\B”, which finds occurrences of “iss” in the middle of words. (“\B” matches only if the current position in the subject is not a word boundary.) When applied to the string “Mississippi” starting from byte offset 4, namely “issippi”, it does not match, because “\B” is always false at the start of the subject, which is deemed to be a word boundary. However, if the entire string is passed, but with start_position set to 4, it finds the second occurrence of “iss”, because it is able to look behind the starting point to discover that it is preceded by a letter.
Note that, unless you set the G_REGEX_RAW flag, all the strings passed to these functions must be encoded in UTF-8. The lengths and the positions inside the strings are in bytes and not in characters, so, for instance, “\xc3\xa0” (i.e. “à”) is two bytes long but it is treated as a single character. If you set G_REGEX_RAW, the strings can be invalid UTF-8 and each byte is treated as a character, so “\xc3\xa0” is two bytes and two characters long.
When matching a pattern, “\n” matches only a “\n” character in the string, and “\r” matches only a “\r” character. To match any newline sequence, use “\R”. This escape matches either the two-character sequence CR + LF (“\r\n”), or one of the single characters LF (linefeed, U+000A, “\n”), VT (vertical tab, U+000B, “\v”), FF (formfeed, U+000C, “\f”), CR (carriage return, U+000D, “\r”), NEL (next line, U+0085), LS (line separator, U+2028), or PS (paragraph separator, U+2029).
The behaviour of the dot, circumflex, and dollar metacharacters is affected by newline characters. By default, any newline character is recognized (the same characters recognized by “\R”). This can be changed with the G_REGEX_NEWLINE_CR, G_REGEX_NEWLINE_LF and G_REGEX_NEWLINE_CRLF compile options, and with the G_REGEX_MATCH_NEWLINE_ANY, G_REGEX_MATCH_NEWLINE_CR, G_REGEX_MATCH_NEWLINE_LF and G_REGEX_MATCH_NEWLINE_CRLF match options. These settings are also relevant when compiling a pattern if G_REGEX_EXTENDED is set and an unescaped “#” outside a character class is encountered. This indicates a comment that lasts until after the next newline.
Creating and manipulating the same GRegex structure from different threads is not a problem, as GRegex does not modify its internal state between creation and destruction. GMatchInfo, on the other hand, is not thread-safe.
The low-level regular expression functionality is provided by the excellent PCRE library written by Philip Hazel.
Available since: 2.14
Constructors
g_regex_new
Compiles the regular expression to an internal form, and does the initial setup of the GRegex structure.
since: 2.14
Functions
g_regex_check_replacement
Checks whether replacement is a valid replacement string (see g_regex_replace()), i.e. that all escape sequences in it are valid.
since: 2.14
g_regex_escape_nul
Escapes the nul characters in string to “\x00”. It can be used to compile a regex with embedded nul characters.
since: 2.30
g_regex_escape_string
Escapes the special characters used for regular expressions in string; for instance, “a.b*c” becomes “a\.b\*c”. This function is useful to dynamically generate regular expressions.
since: 2.14
g_regex_split_simple
Breaks the string on the pattern, and returns an array of the tokens. If the pattern contains capturing parentheses, then the text for each of the substrings will also be returned. If the pattern does not match anywhere in the string, then the whole string is returned as the first token.
since: 2.14
Instance methods
g_regex_get_has_cr_or_lf
Checks whether the pattern contains explicit CR or LF references.
since: 2.34
g_regex_get_max_backref
Returns the number of the highest back reference in the pattern, or 0 if the pattern does not contain back references.
since: 2.14
g_regex_get_max_lookbehind
Gets the number of characters in the longest lookbehind assertion in the pattern. This information is useful when doing multi-segment matching using the partial matching facilities.
since: 2.38
g_regex_get_pattern
Gets the pattern string associated with regex, i.e. a copy of the string passed to g_regex_new().
since: 2.14
g_regex_match
Scans for a match in string for the pattern in regex. The match_options are combined with the match options specified when the regex structure was created, letting you have more flexibility in reusing GRegex structures.
since: 2.14
g_regex_match_all
With the standard algorithm for regular expression matching, only the longest match in the string is retrieved. This function uses a different algorithm, so it can retrieve all the possible matches. For more documentation, see g_regex_match_all_full().
since: 2.14
g_regex_match_all_full
With the standard algorithm for regular expression matching, only the longest match in the string is retrieved; it is not possible to obtain all the available matches. For instance, matching "<a> <b> <c>" against the pattern "<.*>" gives "<a> <b> <c>".
since: 2.14
g_regex_match_full
Scans for a match in string for the pattern in regex. The match_options are combined with the match options specified when the regex structure was created, letting you have more flexibility in reusing GRegex structures.
since: 2.14
g_regex_replace
Replaces all occurrences of the pattern in regex with the replacement text. Backreferences of the form \number or \g<number> in the replacement text are interpolated by the number-th captured subexpression of the match, and \g<name> refers to the captured subexpression with the given name. \0 refers to the complete match, but \0 followed by a number is the octal representation of a character. To include a literal \ in the replacement, write \\ (that is, "\\\\" in a C string literal).
since: 2.14
g_regex_replace_eval
Replaces occurrences of the pattern in regex with the output of eval for that occurrence.
since: 2.14
g_regex_replace_literal
Replaces all occurrences of the pattern in regex with the replacement text. replacement is inserted literally; to include backreferences, use g_regex_replace().
since: 2.14
g_regex_split
Breaks the string on the pattern, and returns an array of the tokens. If the pattern contains capturing parentheses, then the text for each of the substrings will also be returned. If the pattern does not match anywhere in the string, then the whole string is returned as the first token.
since: 2.14
g_regex_split_full
Breaks the string on the pattern, and returns an array of the tokens. If the pattern contains capturing parentheses, then the text for each of the substrings will also be returned. If the pattern does not match anywhere in the string, then the whole string is returned as the first token.
since: 2.14
g_regex_unref
Decreases the reference count of regex by 1. When the reference count drops to zero, all the memory associated with the regex structure is freed.
since: 2.14