Method

GLib.MatchInfo.fetch_pos

since: 2.14

Declaration

gboolean
g_match_info_fetch_pos (
  const GMatchInfo* match_info,
  gint match_num,
  gint* start_pos,
  gint* end_pos
)

Description

Returns the start and end positions (in bytes) of a successfully matching capture parenthesis.

Valid values for match_num are 0 for the full text of the match, 1 for the first paren set, 2 for the second, and so on.

As end_pos is set to the byte after the final byte of the match (on success), the length of the match can be calculated as end_pos - start_pos.

As a best practice, initialize start_pos and end_pos to identifiable values, such as G_MAXINT, so that you can test if g_match_info_fetch_pos() actually changed the value for a given capture parenthesis.

The parameter match_num corresponds to a matched capture parenthesis. The actual value you use for match_num depends on the method used to generate match_info. The following sections describe those methods.

Methods Using Non-deterministic Finite Automata Matching

The methods g_regex_match() and g_regex_match_full() return a GMatchInfo using traditional (greedy) pattern matching, also known as Non-deterministic Finite Automaton (NFA) matching. You pass the returned GMatchInfo from these methods to g_match_info_fetch_pos() to determine the start and end positions of capture parentheses. The values for match_num correspond to the capture parentheses in order, with 0 corresponding to the entire matched string.

match_num can refer to a capture parenthesis with no match. For example, the string b matches against the pattern (a)?b, but the capture parenthesis (a) has no match. In this case, g_match_info_fetch_pos() returns true and sets start_pos and end_pos to -1 when called with match_num as 1 (for (a)).

For an expanded example, consider the regex pattern (a)?(.*?)the (.*) and the candidate string glib regexes are the best. In this scenario there are four capture parentheses numbered 0–3: an implicit one for the entire string, and three explicitly declared in the regex pattern.

Given this example, the following table describes the return values from g_match_info_fetch_pos() for various values of match_num.

match_num  Contents                        Return value  Returned start_pos  Returned end_pos
0          Matches entire string           True          0                   25
1          Does not match first character  True          -1                  -1
2          All text before the             True          0                   17
3          All text after the              True          21                  25
4          Capture paren out of range      False         Unchanged           Unchanged

The following code sample and its output implement this example.

#include <glib.h>

int
main (int argc, char *argv[])
{
  g_autoptr(GError) local_error = NULL;
  const char *regex_pattern = "(a)?(.*?)the (.*)";
  const char *test_string = "glib regexes are the best";
  g_autoptr(GRegex) regex = NULL;

  regex = g_regex_new (regex_pattern,
                       G_REGEX_DEFAULT,
                       G_REGEX_MATCH_DEFAULT,
                       &local_error);
  if (regex == NULL)
    {
      g_printerr ("Error creating regex: %s\n", local_error->message);
      return 1;
    }

  g_autoptr(GMatchInfo) match_info = NULL;
  g_regex_match (regex, test_string, G_REGEX_MATCH_DEFAULT, &match_info);

  int n_matched_strings = g_match_info_get_match_count (match_info);

  // Print header line
  g_print ("match_num Contents                  Return value returned start_pos returned end_pos\n");

  // Iterate over each capture paren, including one that is out of range as a demonstration.
  for (int match_num = 0; match_num <= n_matched_strings; match_num++)
    {
      gboolean found_match;
      g_autofree char *paren_string = NULL;
      int start_pos = G_MAXINT;
      int end_pos = G_MAXINT;

      found_match = g_match_info_fetch_pos (match_info,
                                            match_num,
                                            &start_pos,
                                            &end_pos);

      // If no match, display N/A as the found string.
      if (start_pos == G_MAXINT || start_pos == -1)
        paren_string = g_strdup ("N/A");
      else
        paren_string = g_strndup (test_string + start_pos, end_pos - start_pos);

      g_print ("%-9d %-25s %-12d %-18d %d\n", match_num, paren_string, found_match, start_pos, end_pos);
    }

  return 0;
}

match_num Contents                  Return value returned start_pos returned end_pos
0         glib regexes are the best 1            0                  25
1         N/A                       1            -1                 -1
2         glib regexes are          1            0                  17
3         best                      1            21                 25
4         N/A                       0            2147483647         2147483647

Methods Using Deterministic Finite Automata Matching

The methods g_regex_match_all() and g_regex_match_all_full() return a GMatchInfo using Deterministic Finite Automaton (DFA) pattern matching. This algorithm detects overlapping matches. You pass the returned GMatchInfo from these methods to g_match_info_fetch_pos() to determine the start and end positions of each overlapping match. Use the method g_match_info_get_match_count() to determine the number of overlapping matches.

For example, consider the regex pattern <.*> and the candidate string <a> <b> <c>. In this scenario there are three implicit capture parentheses: one for the entire string, one for <a> <b>, and one for <a>.

Given this example, the following table describes the return values from g_match_info_fetch_pos() for various values of match_num.

match_num  Contents                    Return value  Returned start_pos  Returned end_pos
0          Matches entire string       True          0                   11
1          Matches <a> <b>             True          0                   7
2          Matches <a>                 True          0                   3
3          Capture paren out of range  False         Unchanged           Unchanged

The following code sample and its output implement this example.

#include <glib.h>

int
main (int argc, char *argv[])
{
  g_autoptr(GError) local_error = NULL;
  const char *regex_pattern = "<.*>";
  const char *test_string = "<a> <b> <c>";
  g_autoptr(GRegex) regex = NULL;

  regex = g_regex_new (regex_pattern,
                       G_REGEX_DEFAULT,
                       G_REGEX_MATCH_DEFAULT,
                       &local_error);
  if (regex == NULL)
    {
      g_printerr ("Error creating regex: %s\n", local_error->message);
      return 1;
    }

  g_autoptr(GMatchInfo) match_info = NULL;
  g_regex_match_all (regex, test_string, G_REGEX_MATCH_DEFAULT, &match_info);

  int n_matched_strings = g_match_info_get_match_count (match_info);

  // Print header line
  g_print ("match_num Contents                  Return value returned start_pos returned end_pos\n");

  // Iterate over each capture paren, including one that is out of range as a demonstration.
  for (int match_num = 0; match_num <= n_matched_strings; match_num++)
    {
      gboolean found_match;
      g_autofree char *paren_string = NULL;
      int start_pos = G_MAXINT;
      int end_pos = G_MAXINT;

      found_match = g_match_info_fetch_pos (match_info, match_num, &start_pos, &end_pos);

      // If no match, display N/A as the found string.
      if (start_pos == G_MAXINT || start_pos == -1)
        paren_string = g_strdup ("N/A");
      else
        paren_string = g_strndup (test_string + start_pos, end_pos - start_pos);

      g_print ("%-9d %-25s %-12d %-18d %d\n", match_num, paren_string, found_match, start_pos, end_pos);
    }

  return 0;
}

match_num Contents                  Return value returned start_pos returned end_pos
0         <a> <b> <c>               1            0                  11
1         <a> <b>                   1            0                  7
2         <a>                       1            0                  3
3         N/A                       0            2147483647         2147483647

Parameters

match_num

Type: gint

Number of the capture parenthesis.

start_pos

Type: gint*

Pointer to location where to store the start position, or NULL.

The argument will be set by the function.
The argument can be NULL.

end_pos

Type: gint*

Pointer to location where to store the end position (the byte after the final byte of the match), or NULL.

The argument will be set by the function.
The argument can be NULL.

Return value

Type: gboolean

True if match_num is within range, false otherwise. If the capture paren has a match, start_pos and end_pos contain the start and end positions (in bytes) of the matching substring. If the capture paren has no match, start_pos and end_pos are -1. If match_num is out of range, start_pos and end_pos are left unchanged.