Data Structures

Data Structures

GLib includes a number of basic data structures, such as arrays, linked lists, hash tables, queues, trees, etc.

Arrays

GLib arrays (GArray) are similar to standard C arrays, except that they grow automatically as elements are added.

Array elements can be of any size (though all elements of one array are the same size), and the array can be automatically cleared to ‘0’s and zero-terminated.

To create a new array use g_array_new().

To add elements to an array with a cost of O(n) at worst, use g_array_append_val(), g_array_append_vals(), g_array_prepend_val(), g_array_prepend_vals(), g_array_insert_val() and g_array_insert_vals().

To access an element of an array in O(1) (to read it or to write it), use g_array_index().

To set the size of an array, use g_array_set_size().

To free an array, use g_array_unref() or g_array_free().

All the sort functions are internally calling a quick-sort (or similar) function with an average cost of O(n log(n)) and a worst case cost of O(n^2).

Here is an example that stores integers in a GArray:

GArray *array;
int i;

// We create a new array to store int values.
// We don't want it zero-terminated or cleared to 0's.
array = g_array_new (FALSE, FALSE, sizeof (int));

for (i = 0; i < 10000; i++)
  {
    g_array_append_val (array, i);
  }

for (i = 0; i < 10000; i++)
  {
    if (g_array_index (array, int, i) != i)
      g_print ("ERROR: got %d instead of %d\n",
               g_array_index (array, int, i), i);
  }

g_array_free (array, TRUE);

Pointer Arrays

Pointer Arrays (GPtrArray) are similar to Arrays but are used only for storing pointers.

If you remove elements from the array, elements at the end of the array are moved into the space previously occupied by the removed element. This means that you should not rely on the index of particular elements remaining the same. You should also be careful when deleting elements while iterating over the array.

To create a pointer array, use g_ptr_array_new().

To add elements to a pointer array, use g_ptr_array_add().

To remove elements from a pointer array, use g_ptr_array_remove(), g_ptr_array_remove_index() or g_ptr_array_remove_index_fast().

To access an element of a pointer array, use g_ptr_array_index().

To set the size of a pointer array, use g_ptr_array_set_size().

To free a pointer array, use g_ptr_array_unref() or g_ptr_array_free().

An example using a GPtrArray:

GPtrArray *array;
char *string1 = "one";
char *string2 = "two";
char *string3 = "three";

array = g_ptr_array_new ();
g_ptr_array_add (array, (gpointer) string1);
g_ptr_array_add (array, (gpointer) string2);
g_ptr_array_add (array, (gpointer) string3);

if (g_ptr_array_index (array, 0) != (gpointer) string1)
  g_print ("ERROR: got %p instead of %p\n",
           g_ptr_array_index (array, 0), string1);

g_ptr_array_free (array, TRUE);

Byte Arrays

GByteArray is a mutable array of bytes based on GArray, to provide arrays of bytes which grow automatically as elements are added.

To create a new GByteArray use g_byte_array_new().

To add elements to a GByteArray, use g_byte_array_append() and g_byte_array_prepend().

To set the size of a GByteArray, use g_byte_array_set_size().

To free a GByteArray, use g_byte_array_unref() or g_byte_array_free().

An example for using a GByteArray:

GByteArray *array;
int i;

array = g_byte_array_new ();
for (i = 0; i < 10000; i++)
  {
    g_byte_array_append (array, (guint8*) "abcd", 4);
  }

for (i = 0; i < 10000; i++)
  {
    g_assert (array->data[4*i] == 'a');
    g_assert (array->data[4*i+1] == 'b');
    g_assert (array->data[4*i+2] == 'c');
    g_assert (array->data[4*i+3] == 'd');
  }

g_byte_array_free (array, TRUE);

See GBytes if you are interested in an immutable object representing a sequence of bytes.

Singly-linked Lists

The GSList structure and its associated functions provide a standard singly-linked list data structure. The benefit of this data structure is to provide insertion/deletion operations in O(1) complexity where access/search operations are in O(n). The benefit of GSList over GList (doubly-linked list) is that they are lighter in space as they only need to retain one pointer but it double the cost of the worst case access/search operations.

Each element in the list contains a piece of data, together with a pointer which links to the next element in the list. Using this pointer it is possible to move through the list in one direction only (unlike the doubly-linked lists, which allow movement in both directions).

The data contained in each element can be either integer values, by using one of the Type Conversion Macros, or simply pointers to any type of data.

Note that most of the GSList functions expect to be passed a pointer to the first element in the list. The functions which insert elements return the new start of the list, which may have changed.

There is no function to create a GSList. NULL is considered to be the empty list so you simply set a GSList* to NULL.

To add elements, use g_slist_append(), g_slist_prepend(), g_slist_insert() and g_slist_insert_sorted().

To remove elements, use g_slist_remove().

To find elements in the list use g_slist_last(), g_slist_next(), g_slist_nth(), g_slist_nth_data(), g_slist_find() and g_slist_find_custom().

To find the index of an element use g_slist_position() and g_slist_index().

To call a function for each element in the list use g_slist_foreach().

To free the entire list, use g_slist_free().

Doubly-linked Lists

The GList structure and its associated functions provide a standard doubly-linked list data structure. The benefit of this data-structure is to provide insertion/deletion operations in O(1) complexity where access/search operations are in O(n). The benefit of GList over GSList (singly-linked list) is that the worst case on access/search operations is divided by two which comes at a cost in space as we need to retain two pointers in place of one.

Each element in the list contains a piece of data, together with pointers which link to the previous and next elements in the list. Using these pointers it is possible to move through the list in both directions (unlike the singly-linked GSList, which only allows movement through the list in the forward direction).

The doubly-linked list does not keep track of the number of items and does not keep track of both the start and end of the list. If you want fast access to both the start and the end of the list, and/or the number of items in the list, use a GQueue instead.

The data contained in each element can be either integer values, by using one of the Type Conversion Macros, or simply pointers to any type of data.

Note that most of the GList functions expect to be passed a pointer to the first element in the list. The functions which insert elements return the new start of the list, which may have changed.

There is no function to create a GList. NULL is considered to be a valid, empty list so you simply set a GList* to NULL to initialize it.

To add elements, use g_list_append(), g_list_prepend(), g_list_insert() and g_list_insert_sorted().

To visit all elements in the list, use a loop over the list:

GList *l;
for (l = list; l != NULL; l = l->next)
  {
    // do something with l->data
  }

To call a function for each element in the list, use g_list_foreach().

To loop over the list and modify it (e.g. remove a certain element) a while loop is more appropriate, for example:

GList *l = list;
while (l != NULL)
  {
    GList *next = l->next;
    if (should_be_removed (l))
      {
        // possibly free l->data
        list = g_list_delete_link (list, l);
      }
    l = next;
  }

To remove elements, use g_list_remove().

To navigate in a list, use g_list_first(), g_list_last(), g_list_next(), g_list_previous().

To find elements in the list use g_list_nth(), g_list_nth_data(), g_list_find() and g_list_find_custom().

To find the index of an element use g_list_position() and g_list_index().

To free the entire list, use g_list_free() or g_list_free_full().

Hash Tables

A GHashTable provides associations between keys and values which is optimized so that given a key, the associated value can be found, inserted or removed in amortized O(1). All operations going through each element take O(n) time (list all keys/values, table resize, etc.).

Note that neither keys nor values are copied when inserted into the GHashTable, so they must exist for the lifetime of the GHashTable. This means that the use of static strings is OK, but temporary strings (i.e. those created in buffers and those returned by GTK widgets) should be copied with g_strdup() before being inserted.

If keys or values are dynamically allocated, you must be careful to ensure that they are freed when they are removed from the GHashTable, and also when they are overwritten by new insertions into the GHashTable. It is also not advisable to mix static strings and dynamically-allocated strings in a GHashTable, because it then becomes difficult to determine whether the string should be freed.

To create a GHashTable, use g_hash_table_new().

To insert a key and value into a GHashTable, use g_hash_table_insert().

To look up a value corresponding to a given key, use g_hash_table_lookup() or g_hash_table_lookup_extended().

g_hash_table_lookup_extended() can also be used to simply check if a key is present in the hash table.

To remove a key and value, use g_hash_table_remove().

To call a function for each key and value pair use g_hash_table_foreach() or use an iterator to iterate over the key/value pairs in the hash table, see GHashTableIter. The iteration order of a hash table is not defined, and you must not rely on iterating over keys/values in the same order as they were inserted.

To destroy a GHashTable use g_hash_table_unref() or g_hash_table_destroy().

A common use-case for hash tables is to store information about a set of keys, without associating any particular value with each key. GHashTable optimizes one way of doing so: If you store only key-value pairs where key == value, then GHashTable does not allocate memory to store the values, which can be a considerable space saving, if your set is large. The functions g_hash_table_add() and g_hash_table_contains() are designed to be used when using GHashTable this way.

GHashTable is not designed to be statically initialised with keys and values known at compile time. To build a static hash table, use a tool such as gperf.

Double-ended Queues

The GQueue structure and its associated functions provide a standard queue data structure. Internally, GQueue uses the same data structure as GList to store elements with the same complexity over insertion/deletion (O(1)) and access/search (O(n)) operations.

The data contained in each element can be either integer values, by using one of the Type Conversion Macros, or simply pointers to any type of data.

As with all other GLib data structures, GQueue is not thread-safe. For a thread-safe queue, use GAsyncQueue.

To create a new GQueue, use g_queue_new().

To initialize a statically-allocated GQueue, use G_QUEUE_INIT or g_queue_init().

To add elements, use g_queue_push_head(), g_queue_push_head_link(), g_queue_push_tail() and g_queue_push_tail_link().

To remove elements, use g_queue_pop_head() and g_queue_pop_tail().

To free the entire queue, use g_queue_free().

Asynchronous Queues

Often you need to communicate between different threads. In general it’s safer not to do this by shared memory, but by explicit message passing. These messages only make sense asynchronously for multi-threaded applications though, as a synchronous operation could as well be done in the same thread.

Asynchronous queues are an exception from most other GLib data structures, as they can be used simultaneously from multiple threads without explicit locking and they bring their own builtin reference counting. This is because the nature of an asynchronous queue is that it will always be used by at least 2 concurrent threads.

For using an asynchronous queue you first have to create one with g_async_queue_new(). GAsyncQueue structs are reference counted, use g_async_queue_ref() and g_async_queue_unref() to manage your references.

A thread which wants to send a message to that queue simply calls g_async_queue_push() to push the message to the queue.

A thread which is expecting messages from an asynchronous queue simply calls g_async_queue_pop() for that queue. If no message is available in the queue at that point, the thread is now put to sleep until a message arrives. The message will be removed from the queue and returned. The functions g_async_queue_try_pop() and g_async_queue_timeout_pop() can be used to only check for the presence of messages or to only wait a certain time for messages respectively.

For almost every function there exist two variants, one that locks the queue and one that doesn’t. That way you can hold the queue lock (acquire it with g_async_queue_lock() and release it with g_async_queue_unlock() over multiple queue accessing instructions. This can be necessary to ensure the integrity of the queue, but should only be used when really necessary, as it can make your life harder if used unwisely. Normally you should only use the locking function variants (those without the _unlocked suffix).

In many cases, it may be more convenient to use GThreadPool when you need to distribute work to a set of worker threads instead of using GAsyncQueue manually. GThreadPool uses a GAsyncQueue internally.

Binary Trees

The GTree structure and its associated functions provide a sorted collection of key/value pairs optimized for searching and traversing in order. This means that most of the operations (access, search, insertion, deletion, …) on GTree are O(log(n)) in average and O(n) in worst case for time complexity. But, note that maintaining a balanced sorted GTree of n elements is done in time O(n log(n)).

To create a new GTree use g_tree_new().

To insert a key/value pair into a GTree use g_tree_insert() (O(n log(n))).

To remove a key/value pair use g_tree_remove() (O(n log(n))).

To look up the value corresponding to a given key, use g_tree_lookup() and g_tree_lookup_extended().

To find out the number of nodes in a GTree, use g_tree_nnodes(). To get the height of a GTree, use g_tree_height().

To traverse a GTree, calling a function for each node visited in the traversal, use g_tree_foreach().

To destroy a GTree, use g_tree_destroy().

N-ary Trees

The GNode struct and its associated functions provide a N-ary tree data structure, where nodes in the tree can contain arbitrary data.

To create a new tree use g_node_new().

To insert a node into a tree use g_node_insert(), g_node_insert_before(), g_node_append() and g_node_prepend(),

To create a new node and insert it into a tree use g_node_insert_data(), g_node_insert_data_after(), g_node_insert_data_before(), g_node_append_data() and g_node_prepend_data().

To reverse the children of a node use g_node_reverse_children().

To find a node use g_node_get_root(), g_node_find(), g_node_find_child(), g_node_child_index(), g_node_child_position(), g_node_first_child(), g_node_last_child(), g_node_nth_child(), g_node_first_sibling(), g_node_prev_sibling(), g_node_next_sibling() or g_node_last_sibling().

To get information about a node or tree use G_NODE_IS_LEAF(), G_NODE_IS_ROOT(), g_node_depth(), g_node_n_nodes(), g_node_n_children(), g_node_is_ancestor() or g_node_max_height().

To traverse a tree, calling a function for each node visited in the traversal, use g_node_traverse() or g_node_children_foreach().

To remove a node or subtree from a tree use g_node_unlink() or g_node_destroy().

Scalable Lists

The GSequence data structure has the API of a list, but is implemented internally with a balanced binary tree. This means that most of the operations (access, search, insertion, deletion, …) on GSequence are O(log(n)) in average and O(n) in worst case for time complexity. But, note that maintaining a balanced sorted list of n elements is done in time O(n log(n)). The data contained in each element can be either integer values, by using of the Type Conversion Macros, or simply pointers to any type of data.

A GSequence is accessed through “iterators”, represented by a GSequenceIter. An iterator represents a position between two elements of the sequence. For example, the “begin” iterator represents the gap immediately before the first element of the sequence, and the “end” iterator represents the gap immediately after the last element. In an empty sequence, the begin and end iterators are the same.

Some methods on GSequence operate on ranges of items. For example g_sequence_foreach_range() will call a user-specified function on each element with the given range. The range is delimited by the gaps represented by the passed-in iterators, so if you pass in the begin and end iterators, the range in question is the entire sequence.

The function g_sequence_get() is used with an iterator to access the element immediately following the gap that the iterator represents. The iterator is said to “point” to that element.

Iterators are stable across most operations on a GSequence. For example an iterator pointing to some element of a sequence will continue to point to that element even after the sequence is sorted. Even moving an element to another sequence using for example g_sequence_move_range() will not invalidate the iterators pointing to it. The only operation that will invalidate an iterator is when the element it points to is removed from any sequence.

To sort the data, either use g_sequence_insert_sorted() or g_sequence_insert_sorted_iter() to add data to the GSequence or, if you want to add a large amount of data, it is more efficient to call g_sequence_sort() or g_sequence_sort_iter() after doing unsorted insertions.

Reference-counted strings

Reference-counted strings are normal C strings that have been augmented with a reference count to manage their resources. You allocate a new reference counted string and acquire and release references as needed, instead of copying the string among callers; when the last reference on the string is released, the resources allocated for it are freed.

Typically, reference-counted strings can be used when parsing data from files and storing them into data structures that are passed to various callers:

PersonDetails *
person_details_from_data (const char *data)
{
  // Use g_autoptr() to simplify error cases
  g_autoptr(GRefString) full_name = NULL;
  g_autoptr(GRefString) address =  NULL;
  g_autoptr(GRefString) city = NULL;
  g_autoptr(GRefString) state = NULL;
  g_autoptr(GRefString) zip_code = NULL;

  // parse_person_details() is defined elsewhere; returns refcounted strings
  if (!parse_person_details (data, &full_name, &address, &city, &state, &zip_code))
    return NULL;

  if (!validate_zip_code (zip_code))
    return NULL;

  // add_address_to_cache() and add_full_name_to_cache() are defined
  // elsewhere; they add strings to various caches, using refcounted
  // strings to avoid copying data over and over again
  add_address_to_cache (address, city, state, zip_code);
  add_full_name_to_cache (full_name);

  // person_details_new() is defined elsewhere; it takes a reference
  // on each string
  PersonDetails *res = person_details_new (full_name,
                                           address,
                                           city,
                                           state,
                                           zip_code);

  return res;
}

In the example above, we have multiple functions taking the same strings for different uses; with typical C strings, we’d have to copy the strings every time the life time rules of the data differ from the life-time of the string parsed from the original buffer. With reference counted strings, each caller can take a reference on the data, and keep it as long as it needs to own the string.

Reference-counted strings can also be “interned” inside a global table owned by GLib; while an interned string has at least a reference, creating a new interned reference-counted string with the same contents will return a reference to the existing string instead of creating a new reference-counted string instance. Once the string loses its last reference, it will be automatically removed from the global interned strings table.

Reference-counted strings were added to GLib in 2.58.