Function

GLib.str_tokenize_and_fold

since: 2.40

Declaration

gchar**
g_str_tokenize_and_fold (
  const gchar* string,
  const gchar* translit_locale,
  gchar*** ascii_alternates
)

Description

Tokenizes string and performs folding on each token.

A token is a non-empty sequence of alphanumeric characters in the source string, separated by non-alphanumeric characters. An “alphanumeric” character for this purpose is one that matches g_unichar_isalnum() or g_unichar_ismark().

Each token is then (Unicode) normalised and case-folded. If ascii_alternates is non-NULL and some of the returned tokens contain non-ASCII characters, ASCII alternatives will be generated.

The number of ASCII alternatives that are generated and the method for generating them are unspecified, but translit_locale (if specified) may improve the transliteration if the language of the source string is known.

Available since: 2.40

Parameters

string

Type: const gchar*

A string to tokenize.

The data is owned by the caller of the function.
The value is a NUL terminated UTF-8 string.

translit_locale

Type: const gchar*

The language code (like ‘de’ or ‘en_GB’) of the language in which string is written.

The argument can be NULL.
The data is owned by the caller of the function.
The value is a NUL terminated UTF-8 string.

ascii_alternates

Type: An array of gchar**

A return location for ASCII alternates.

The argument will be set by the function.
The argument can be NULL.
The array is NULL-terminated.
The caller of the function takes ownership of the returned data, and is responsible for freeing it.
Each element is a NUL terminated UTF-8 string.

Return value

Type: An array of utf8

The folded tokens.

The array is NULL-terminated.
The caller of the function takes ownership of the data, and is responsible for freeing it.
Each element is a NUL terminated UTF-8 string.