3.5. Arrays

An array is a data structure that is similar to the library vector type (§ 3.3, p. 96) but offers a different trade-off between performance and flexibility. Like a vector, an array is a container of unnamed objects of a single type that we access by position. Unlike a vector, arrays have fixed size; we cannot add elements to an array. Because arrays have fixed size, they sometimes offer better run-time performance for specialized applications. However, that run-time advantage comes at the cost of lost flexibility.

Tip

If you don’t know exactly how many elements you need, use a vector.

3.5.1. Defining and Initializing Built-in Arrays

Arrays are a compound type (§ 2.3, p. 50). An array declarator has the form a[d], where a is the name being defined and d is the dimension of the array. The dimension specifies the number of elements and must be greater than zero. The number of elements in an array is part of the array’s type. As a result, the dimension must be known at compile time, which means that the dimension must be a constant expression (§ 2.4.4, p. 65):

unsigned cnt = 42;          // not a constant expression
constexpr unsigned sz = 42; // constant expression
                            // constexpr see § 2.4.4 (p. 66)
int arr[10];                // array of ten ints
int *parr[sz];              // array of 42 pointers to int
string bad[cnt];            // error: cnt is not a constant expression
string strs[get_size()];    // ok if get_size is constexpr, error otherwise
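As a minimal sketch of the last case, we can assume a constexpr version of get_size. The body shown here is an assumption made for illustration, as is the helper element_count. Because the dimension is part of the array's type, sizeof can recover the number of elements:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constexpr function; the name mirrors get_size in the text,
// but the returned value is an assumption made for this sketch.
constexpr std::size_t get_size() { return 8; }

// Returns the number of elements in an array whose dimension is computed
// from constant expressions only.
std::size_t element_count()
{
    constexpr std::size_t sz = get_size() * 2;  // still a constant expression
    int arr[sz] = {};                           // dimension known at compile time
    assert(arr[sz - 1] == 0);                   // = {} value initializes every element
    return sizeof(arr) / sizeof(arr[0]);        // total bytes / bytes per element
}
```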

By default, the elements in an array are default initialized (§ 2.2.1, p. 43).

Warning

As with variables of built-in type, a default-initialized array of built-in type that is defined inside a function will have undefined values.

When we define an array, we must specify a type for the array. We cannot use auto to deduce the type from a list of initializers. As with vector, arrays hold objects. Thus, there are no arrays of references.

Explicitly Initializing Array Elements

We can list initialize (§ 3.3.1, p. 98) the elements in an array. When we do so, we can omit the dimension. If we omit the dimension, the compiler infers it from the number of initializers. If we specify a dimension, the number of initializers must not exceed the specified size. If the dimension is greater than the number of initializers, the initializers are used for the first elements and any remaining elements are value initialized (§ 3.3.1, p. 98):

const unsigned sz = 3;
int ia1[sz] = {0,1,2};        // array of three ints with values 0, 1, 2
int a2[] = {0, 1, 2};         // an array of dimension 3
int a3[5] = {0, 1, 2};        // equivalent to a3[] = {0, 1, 2, 0, 0}
string a4[3] = {"hi", "bye"}; // same as a4[] = {"hi", "bye", ""}
int a5[2] = {0,1,2};          // error: too many initializers

Character Arrays Are Special

Character arrays have an additional form of initialization: We can initialize such arrays from a string literal (§ 2.1.3, p. 39). When we use this form of initialization, it is important to remember that string literals end with a null character. That null character is copied into the array along with the characters in the literal:

char a1[] = {'C', '+', '+'};       // list initialization, no null
char a2[] = {'C', '+', '+', '\0'}; // list initialization, explicit null
char a3[] = "C++";                 // null terminator added automatically
const char a4[6] = "Daniel";       // error: no space for the null!

The dimension of a1 is 3; the dimensions of a2 and a3 are both 4. The definition of a4 is in error. Although the literal contains only six explicit characters, the array size must be at least seven—six to hold the literal and one for the null.
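The difference between the dimension of such an array and the length of the string it holds can be checked directly. The following sketch uses two hypothetical helper functions, literal_dimension and literal_length, that are not part of the original text:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Dimension of an array initialized from the literal "C++", null included.
std::size_t literal_dimension()
{
    const char a3[] = "C++";   // the compiler appends '\0'
    assert(a3[3] == '\0');     // the terminator occupies the last slot
    return sizeof(a3);         // sizeof(char) is 1, so this is the dimension
}

// Number of characters before the null, as std::strlen reports it.
std::size_t literal_length()
{
    const char a3[] = "C++";
    return std::strlen(a3);
}
```

The dimension is 4, while the length reported by strlen is 3, because strlen does not count the null.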

No Copy or Assignment

We cannot initialize an array as a copy of another array, nor is it legal to assign one array to another:

int a[] = {0, 1, 2}; // array of three ints
int a2[] = a;        // error: cannot initialize one array with another
a2 = a;              // error: cannot assign one array to another

Warning

Some compilers allow array assignment as a compiler extension. It is usually a good idea to avoid using nonstandard features. Programs that use such features may not work with a different compiler.
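Although we cannot copy an array as a whole, we can always copy its elements one at a time. The sketch below assumes both arrays have the same dimension; the function name copy_demo is made up for this illustration:

```cpp
#include <cassert>
#include <cstddef>

// Copies a into a2 one element at a time and returns the sum of the copy.
int copy_demo()
{
    constexpr std::size_t sz = 3;
    int a[sz] = {0, 1, 2};
    int a2[sz] = {};                  // same dimension, value initialized
    for (std::size_t i = 0; i != sz; ++i)
        a2[i] = a[i];                 // assigning individual elements is legal
    assert(a2[2] == a[2]);
    return a2[0] + a2[1] + a2[2];
}
```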

Understanding Complicated Array Declarations

Like vectors, arrays can hold objects of most any type. For example, we can have an array of pointers. Because an array is an object, we can define both pointers and references to arrays. Defining arrays that hold pointers is fairly straightforward; defining a pointer or reference to an array is a bit more complicated:

int *ptrs[10];            // ptrs is an array of ten pointers to int
int &refs[10] = /* ? */;  // error: no arrays of references
int (*Parray)[10] = &arr; // Parray points to an array of ten ints
int (&arrRef)[10] = arr;  // arrRef refers to an array of ten ints

By default, type modifiers bind right to left. Reading the definition of ptrs from right to left (§ 2.3.3, p. 58) is easy: We see that we’re defining an array of size 10, named ptrs, that holds pointers to int.

Reading the definition of Parray from right to left isn’t as helpful. Because the array dimension follows the name being declared, it can be easier to read array declarations from the inside out rather than from right to left. Reading from the inside out makes it much easier to understand the type of Parray. We start by observing that the parentheses around *Parray mean that Parray is a pointer. Looking right, we see that Parray points to an array of size 10. Looking left, we see that the elements in that array are ints. Thus, Parray is a pointer to an array of ten ints. Similarly, (&arrRef) says that arrRef is a reference. The type to which it refers is an array of size 10. That array holds elements of type int.

Of course, there are no limits on how many type modifiers can be used:

int *(&arry)[10] = ptrs; // arry is a reference to an array of ten pointers

Reading this declaration from the inside out, we see that arry is a reference. Looking right, we see that the object to which arry refers is an array of size 10. Looking left, we see that the element type is pointer to int. Thus, arry is a reference to an array of ten pointers.
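One place where these declarations show up in practice is in function parameters. As a sketch only, the hypothetical function sum_ten below takes a reference to an array of exactly ten ints, and sum_demo reaches the same array through both a pointer and a reference:

```cpp
#include <cassert>

// Takes a reference to an array of exactly ten ints; the dimension is part
// of the parameter's type, so arrays of any other size are rejected.
int sum_ten(const int (&arr)[10])
{
    int total = 0;
    for (int v : arr)    // range for works because the dimension is known
        total += v;
    return total;
}

int sum_demo()
{
    int arr[10] = {1, 2, 3};         // the remaining seven elements are 0
    int (*parray)[10] = &arr;        // pointer to an array of ten ints
    int (&arr_ref)[10] = arr;        // reference to the same array
    assert(&(*parray)[0] == &arr_ref[0]);  // both denote the same elements
    return sum_ten(*parray) + sum_ten(arr_ref);
}
```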

Tip

It can be easier to understand array declarations by starting with the array’s name and reading them from the inside out.

Exercises Section 3.5.1

Exercise 3.27: Assuming txt_size is a function that takes no arguments and returns an int value, which of the following definitions are illegal? Explain why.

unsigned buf_size = 1024;

(a) int ia[buf_size];

(b) int ia[4 * 7 - 14];

(c) int ia[txt_size()];

(d) char st[11] = "fundamental";

Exercise 3.28: What are the values in the following arrays?

string sa[10];
int ia[10];
int main() {
    string sa2[10];
    int ia2[10];
}

Exercise 3.29: List some of the drawbacks of using an array instead of a vector.

3.5.2. Accessing the Elements of an Array

As with the library vector and string types, we can use a range for or the subscript operator to access elements of an array. As usual, the indices start at 0. For an array of ten elements, the indices are 0 through 9, not 1 through 10.

When we use a variable to subscript an array, we normally should define that variable to have type size_t. size_t is a machine-specific unsigned type that is guaranteed to be large enough to hold the size of any object in memory. The size_t type is defined in the cstddef header, which is the C++ version of the stddef.h header from the C library.

With the exception that arrays are fixed size, we use arrays in ways that are similar to how we use vectors. For example, we can reimplement our grading program from § 3.3.3 (p. 104) to use an array to hold the cluster counters:

// count the number of grades by clusters of ten: 0--9, 10--19, ... 90--99, 100
unsigned scores[11] = {}; // 11 buckets, all value initialized to 0
unsigned grade;
while (cin >> grade) {
    if (grade <= 100)
        ++scores[grade/10]; // increment the counter for the current cluster
}

The only obvious difference between this program and the one on page 104 is the declaration of scores. In this program scores is an array of 11 unsigned elements. The not so obvious difference is that the subscript operator in this program is the one that is defined as part of the language. This operator can be used on operands of array type. The subscript operator used in the program on page 104 was defined by the library vector template and applies to operands of type vector.

As in the case of string or vector, it is best to use a range for when we want to traverse the entire array. For example, we can print the resulting scores as follows:

for (auto i : scores)  // for each counter in scores
    cout << i << " ";  // print the value of that counter
cout << endl;

Because the dimension is part of each array type, the system knows how many elements are in scores. Using a range for means that we don’t have to manage the traversal ourselves.

Checking Subscript Values

As with string and vector, it is up to the programmer to ensure that the subscript value is in range—that is, that the index value is equal to or greater than zero and less than the size of the array. Nothing stops a program from stepping across an array boundary except careful attention to detail and thorough testing of the code. It is possible for programs to compile and execute yet still be fatally wrong.

Warning

Buffer overflow bugs are the most common source of security problems. Such bugs occur when a program fails to check a subscript and mistakenly uses memory outside the range of an array or similar data structure.
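One way to guard a subscript is to test it before using it. The following is a sketch, not a library facility: the names bucket_count, checked_score, and rejects_bad_index are hypothetical, and reporting the problem by throwing out_of_range is just one possible policy:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

constexpr std::size_t bucket_count = 11;  // hypothetical name for the dimension

// Returns scores[index], throwing instead of reading outside the array.
unsigned checked_score(const unsigned (&scores)[bucket_count], std::size_t index)
{
    if (index >= bucket_count)            // size_t is unsigned, so no check against 0
        throw std::out_of_range("subscript outside scores");
    return scores[index];
}

// Returns true if an index one past the last valid position is rejected.
bool rejects_bad_index()
{
    unsigned scores[bucket_count] = {};
    assert(checked_score(scores, 0) == 0);    // valid index, value initialized element
    try {
        checked_score(scores, bucket_count);  // one past the last valid index
    } catch (const std::out_of_range &) {
        return true;
    }
    return false;
}
```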

3.5.3. Pointers and Arrays

In C++ pointers and arrays are closely intertwined. In particular, as we’ll see, when we use an array, the compiler ordinarily converts the array to a pointer.

Normally, we obtain a pointer to an object by using the address-of operator (§ 2.3.2, p. 52). Generally speaking, the address-of operator may be applied to any object. The elements in an array are objects. When we subscript an array, the result is the object at that location in the array. As with any other object, we can obtain a pointer to an array element by taking the address of that element:

string nums[] = {"one", "two", "three"}; // array of strings
string *p = &nums[0];                    // p points to the first element in nums

However, arrays have a special property—in most places when we use an array, the compiler automatically substitutes a pointer to the first element:

string *p2 = nums; // equivalent to p2 = &nums[0]

Note

In most expressions, when we use an object of array type, we are really using a pointer to the first element in that array.

There are various implications of the fact that operations on arrays are often really operations on pointers. One such implication is that when we use an array as an initializer for a variable defined using auto (§ 2.5.2, p. 68), the deduced type is a pointer, not an array:

int ia[] = {0,1,2,3,4,5,6,7,8,9}; // ia is an array of ten ints
auto ia2(ia); // ia2 is an int* that points to the first element in ia
ia2 = 42;     // error: ia2 is a pointer, and we can't assign an int to a pointer

Although ia is an array of ten ints, when we use ia as an initializer, the compiler treats that initialization as if we had written

auto ia2(&ia[0]); // now it's clear that ia2 has type int*

It is worth noting that this conversion does not happen when we use decltype (§ 2.5.3, p. 70). The type returned by decltype(ia) is array of ten ints:

// ia3 is an array of ten ints
decltype(ia) ia3 = {0,1,2,3,4,5,6,7,8,9};
ia3 = p;    // error: can't assign an int* to an array
ia3[4] = i; // ok: assigns the value of i to an element in ia3
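We can ask the compiler to confirm both deductions. This sketch relies on std::is_same from the type_traits header, which the text has not introduced, and wraps the checks in a hypothetical function named auto_decays_to_pointer:

```cpp
#include <cassert>
#include <type_traits>

// Returns true if the variable deduced by auto points at the first element.
bool auto_decays_to_pointer()
{
    int ia[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    auto ia2(ia);             // array-to-pointer conversion applies: int*
    decltype(ia) ia3 = {};    // no conversion: array of ten ints

    static_assert(std::is_same<decltype(ia2), int*>::value,
                  "auto deduces a pointer to the first element");
    static_assert(std::is_same<decltype(ia3), int[10]>::value,
                  "decltype preserves the array type");

    assert(sizeof(ia3) == 10 * sizeof(int));  // ia3 really holds ten ints
    return ia2 == &ia[0];
}
```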

Pointers Are Iterators

Pointers that address elements in an array have additional operations beyond those we described in § 2.3.2 (p. 52). In particular, pointers to array elements support the same operations as iterators on vectors or strings (§ 3.4, p. 106). For example, we can use the increment operator to move from one element in an array to the next:

int arr[] = {0,1,2,3,4,5,6,7,8,9};
int *p = arr; // p points to the first element in arr
++p;          // p points to arr[1]

Just as we can use iterators to traverse the elements in a vector, we can use pointers to traverse the elements in an array. Of course, to do so, we need to obtain pointers to the first and one past the last element. As we’ve just seen, we can obtain a pointer to the first element by using the array itself or by taking the address-of the first element. We can obtain an off-the-end pointer by using another special property of arrays. We can take the address of the nonexistent element one past the last element of an array:

int *e = &arr[10]; // pointer just past the last element in arr

Here we used the subscript operator to index a nonexisting element; arr has ten elements, so the last element in arr is at index position 9. The only thing we can do with this element is take its address, which we do to initialize e. Like an off-the-end iterator (§ 3.4.1, p. 106), an off-the-end pointer does not point to an element. As a result, we may not dereference or increment an off-the-end pointer.

Using these pointers we can write a loop to print the elements in arr as follows:

for (int *b = arr; b != e; ++b)
    cout << *b << endl; // print the elements in arr

The Library `begin` and `end` Functions

Although we can compute an off-the-end pointer, doing so is error-prone. To make it easier and safer to use pointers, the new library includes two functions, named begin and end. These functions act like the similarly named container members (§ 3.4.1, p. 106). However, arrays are not class types, so these functions are not member functions. Instead, they take an argument that is an array:

int ia[] = {0,1,2,3,4,5,6,7,8,9}; // ia is an array of ten ints
int *beg = begin(ia);  // pointer to the first element in ia
int *last = end(ia);   // pointer one past the last element in ia

begin returns a pointer to the first element, and end returns a pointer one past the last element in the given array. These functions are defined in the iterator header.

Using begin and end, it is easy to write a loop to process the elements in an array. For example, assuming arr is an array that holds int values, we might find the first negative value in arr as follows:

// pbeg points to the first and pend points just past the last element in arr
int *pbeg = begin(arr), *pend = end(arr);
// find the first negative element, stopping if we've seen all the elements
while (pbeg != pend && *pbeg >= 0)
    ++pbeg;

We start by defining two int pointers named pbeg and pend. We position pbeg to denote the first element and pend to point one past the last element in arr. The while condition uses pend to know whether it is safe to dereference pbeg. If pbeg does point at an element, we dereference and check whether the underlying element is negative. If so, the condition fails and we exit the loop. If not, we increment the pointer to look at the next element.

Note

A pointer “one past” the end of a built-in array behaves the same way as the iterator returned by the end operation of a vector. In particular, we may not dereference or increment an off-the-end pointer.
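The search loop can be packaged as a function that returns the off-the-end pointer when nothing is found, so that the caller tests the result before dereferencing it. The names first_negative and first_negative_demo, and the sample values, are assumptions made for this sketch:

```cpp
#include <cassert>
#include <iterator>

// Returns a pointer to the first negative element in [pbeg, pend),
// or pend itself if there is no such element.
const int *first_negative(const int *pbeg, const int *pend)
{
    while (pbeg != pend && *pbeg >= 0)
        ++pbeg;
    return pbeg;
}

// Returns the first negative value in a sample array, or 0 if there is none.
int first_negative_demo()
{
    int arr[] = {3, 7, -2, 9, -5};
    const int *pos = first_negative(std::begin(arr), std::end(arr));
    if (pos == std::end(arr))   // off-the-end: must not be dereferenced
        return 0;
    assert(*pos < 0);
    return *pos;
}
```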

Pointer Arithmetic

Pointers that address array elements can use all the iterator operations listed in Table 3.6 (p. 107) and Table 3.7 (p. 111). These operations—dereference, increment, comparisons, addition of an integral value, subtraction of two pointers—have the same meaning when applied to pointers that point at elements in a built-in array as they do when applied to iterators.

When we add (or subtract) an integral value to (or from) a pointer, the result is a new pointer. That new pointer points to the element the given number ahead of (or behind) the original pointer:

constexpr size_t sz = 5;
int arr[sz] = {1,2,3,4,5};
int *ip = arr;     // equivalent to int *ip = &arr[0]
int *ip2 = ip + 4; // ip2 points to arr[4], the last element in arr

The result of adding 4 to ip is a pointer that points to the element four elements further on in the array from the one to which ip currently points.

The result of adding an integral value to a pointer must be a pointer to an element in the same array, or a pointer just past the end of the array:

// ok: arr is converted to a pointer to its first element; p points one past the end of arr
int *p = arr + sz;  // use caution -- do not dereference!
int *p2 = arr + 10; // error: arr has only 5 elements; p2 has undefined value

When we add sz to arr, the compiler converts arr to a pointer to the first element in arr. When we add sz to that pointer, we get a pointer that points sz positions (i.e., 5 positions) past the first one. That is, it points one past the last element in arr. Computing a pointer more than one past the last element is an error, although the compiler is unlikely to detect such errors.

As with iterators, subtracting two pointers gives us the distance between those pointers. The pointers must point to elements in the same array:

auto n = end(arr) - begin(arr); // n is 5, the number of elements in arr

The result of subtracting two pointers is a library type named ptrdiff_t. Like size_t, the ptrdiff_t type is a machine-specific type and is defined in the cstddef header. Because subtraction might yield a negative distance, ptrdiff_t is a signed integral type.
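A short sketch shows why the result type must be signed. The function name distance_demo is hypothetical; both pointers address elements of the same array, as the subtraction requires:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Subtracts two pointers into the same array in both directions and
// returns the negative distance.
std::ptrdiff_t distance_demo()
{
    int arr[5] = {1, 2, 3, 4, 5};
    int *first = std::begin(arr);
    int *fourth = arr + 3;                     // points to arr[3]
    std::ptrdiff_t forward = fourth - first;   // 3
    std::ptrdiff_t backward = first - fourth;  // -3: the result is signed
    assert(forward == -backward);
    return backward;
}
```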

We can use the relational operators to compare pointers that point to elements of an array, or one past the last element in that array. For example, we can traverse the elements in arr as follows:

int *b = arr, *e = arr + sz;
while (b < e) {
    // use *b
    ++b;
}

We cannot use the relational operators on pointers to two unrelated objects:

int i = 0, sz = 42;
int *p = &i, *e = &sz;
// undefined: p and e are unrelated; comparison is meaningless!
while (p < e)

Although the utility may be obscure at this point, it is worth noting that pointer arithmetic is also valid for null pointers (§ 2.3.2, p. 53) and for pointers that point to an object that is not an array. In the latter case, the pointers must point to the same object, or one past that object. If p is a null pointer, we can add or subtract an integral constant expression (§ 2.4.4, p. 65) whose value is 0 to p. We can also subtract two null pointers from one another, in which case the result is 0.

Interaction between Dereference and Pointer Arithmetic

The result of adding an integral value to a pointer is itself a pointer. Assuming the resulting pointer points to an element, we can dereference the resulting pointer:

int ia[] = {0,2,4,6,8}; // array with 5 elements of type int
int last = *(ia + 4);   // ok: initializes last to 8, the value of ia[4]

The expression *(ia + 4) calculates the address four elements past ia and dereferences the resulting pointer. This expression is equivalent to writing ia[4].

Recall that in § 3.4.1 (p. 109) we noted that parentheses are required in expressions that contain dereference and dot operators. Similarly, the parentheses around this pointer addition are essential. Writing

last = *ia + 4; // ok: last = 4, equivalent to ia[0] + 4

means dereference ia and add 4 to the dereferenced value. We’ll cover the reasons for this behavior in § 4.1.2 (p. 136).

Subscripts and Pointers

As we’ve seen, in most places when we use the name of an array, we are really using a pointer to the first element in that array. One place where the compiler does this transformation is when we subscript an array. Given

int ia[] = {0,2,4,6,8}; // array with 5 elements of type int

if we write ia[0], that is an expression that uses the name of an array. When we subscript an array, we are really subscripting a pointer to an element in that array:

int i = ia[2]; // ia is converted to a pointer to the first element in ia
               // ia[2] fetches the element to which (ia + 2) points
int *p = ia;   // p points to the first element in ia
i = *(p + 2);  // equivalent to i = ia[2]

We can use the subscript operator on any pointer, as long as that pointer points to an element (or one past the last element) in an array:

int *p = &ia[2]; // p points to the element indexed by 2
int j = p[1];    // p[1] is equivalent to *(p + 1),
                 // p[1] is the same element as ia[3]
int k = p[-2];   // p[-2] is the same element as ia[0]

This last example points out an important difference between arrays and library types such as vector and string that have subscript operators. The library types force the index used with a subscript to be an unsigned value. The built-in subscript operator does not. The index used with the built-in subscript operator can be a negative value. Of course, the resulting address must point to an element in (or one past the end of) the array to which the original pointer points.
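The equivalences described above can be verified with a few assertions. This is only a sketch; the function name negative_subscript_demo is made up, and every subscript used stays within the five-element array:

```cpp
#include <cassert>

// Subscripts a pointer into the middle of an array in both directions.
int negative_subscript_demo()
{
    int ia[] = {0, 2, 4, 6, 8};
    int *p = &ia[2];              // p points to the element indexed by 2
    assert(&p[-2] == &ia[0]);     // a negative index moves toward the front
    assert(p[1] == *(p + 1));     // subscripting is pointer arithmetic plus dereference
    return p[-2] + p[2];          // ia[0] + ia[4]
}
```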

Warning

Unlike subscripts for vector and string, the index of the built-in subscript operator is not an unsigned type.

Exercises Section 3.5.3

Exercise 3.34: Given that p1 and p2 point to elements in the same array, what does the following code do? Are there values of p1 or p2 that make this code illegal?

p1 += p2 - p1;

Exercise 3.35: Using pointers, write a program to set the elements in an array to zero.

Exercise 3.36: Write a program to compare two arrays for equality. Write a similar program to compare two vectors.

3.5.4. C-Style Character Strings

Warning

Although C++ supports C-style strings, they should not be used by C++ programs. C-style strings are a surprisingly rich source of bugs and are the root cause of many security problems. They’re also harder to use!

Character string literals are an instance of a more general construct that C++ inherits from C: C-style character strings. C-style strings are not a type. Instead, they are a convention for how to represent and use character strings. Strings that follow this convention are stored in character arrays and are null terminated. By null-terminated we mean that the last character in the string is followed by a null character ('\0'). Ordinarily we use pointers to manipulate these strings.

C Library String Functions

The Standard C library provides a set of functions, listed in Table 3.8, that operate on C-style strings. These functions are defined in the cstring header, which is the C++ version of the C header string.h.

Table 3.8. C-Style Character String Functions

Warning

The functions in Table 3.8 do not verify their string parameters.

The pointer(s) passed to these routines must point to null-terminated array(s):

char ca[] = {'C', '+', '+'}; // not null terminated
cout << strlen(ca) << endl;  // disaster: ca isn't null terminated

In this case, ca is an array of char but is not null terminated. The result is undefined. The most likely effect of this call is that strlen will keep looking through the memory that follows ca until it encounters a null character.

Comparing Strings

Comparing two C-style strings is done quite differently from how we compare library strings. When we compare two library strings, we use the normal relational or equality operators:

string s1 = "A string example";
string s2 = "A different string";
if (s1 < s2) // false: s2 is less than s1

Using these operators on similarly defined C-style strings compares the pointer values, not the strings themselves:

const char ca1[] = "A string example";
const char ca2[] = "A different string";
if (ca1 < ca2) // undefined: compares two unrelated addresses

Remember that when we use an array, we are really using a pointer to the first element in the array (§ 3.5.3, p. 117). Hence, this condition actually compares two const char* values. Those pointers do not address the same object, so the comparison is undefined.

To compare the strings, rather than the pointer values, we can call strcmp. That function returns 0 if the strings are equal, or a positive or negative value, depending on whether the first string is larger or smaller than the second:

if (strcmp(ca1, ca2) < 0) // same effect as string comparison s1 < s2
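Because strcmp promises only the sign of its result, not a particular value, code that needs exactly -1, 0, or 1 has to normalize it. The wrapper below, compare_c_strings, is a hypothetical helper written for this sketch:

```cpp
#include <cassert>
#include <cstring>

// Reduces the result of std::strcmp to -1, 0, or 1.
int compare_c_strings(const char *lhs, const char *rhs)
{
    assert(lhs != nullptr && rhs != nullptr);  // both must point to null-terminated arrays
    const int result = std::strcmp(lhs, rhs);
    return (result > 0) - (result < 0);        // keep only the sign
}
```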

Caller Is Responsible for Size of a Destination String

Concatenating or copying C-style strings is also very different from the same operations on library strings. For example, if we want to concatenate the two strings s1 and s2 defined above, we can do so directly:

// initialize largeStr as a concatenation of s1, a space, and s2
string largeStr = s1 + " " + s2;

Doing the same with our two arrays, ca1 and ca2, would be an error. The expression ca1 + ca2 tries to add two pointers, which is illegal and meaningless.

Instead we can use strcat and strcpy. However, to use these functions, we must pass an array to hold the resulting string. The array we pass must be large enough to hold the generated string, including the null character at the end. The code we show here, although a common usage pattern, is fraught with potential for serious error:

// disastrous if we miscalculated the size of largeStr
strcpy(largeStr, ca1); // copies ca1 into largeStr
strcat(largeStr, " "); // adds a space at the end of largeStr
strcat(largeStr, ca2); // concatenates ca2 onto largeStr

The problem is that we can easily miscalculate the size needed for largeStr. Moreover, any time we change the values we want to store in largeStr, we have to remember to double-check that we calculated its size correctly. Unfortunately, programs similar to this code are widely distributed. Programs with such code are error-prone and often lead to serious security leaks.
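If we must use these functions, one way to reduce the risk is to compute the size from the inputs rather than guess it. The sketch below is one possible approach, not the only one: the function concatenate is hypothetical, and it uses a vector<char> as the destination so that the buffer is sized at run time:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Builds "lhs rhs" in a buffer whose size is computed from the inputs.
std::vector<char> concatenate(const char *lhs, const char *rhs)
{
    // one extra char for the space and one for the terminating null
    const std::size_t needed = std::strlen(lhs) + 1 + std::strlen(rhs) + 1;
    std::vector<char> largeStr(needed);   // every element starts as '\0'
    std::strcpy(largeStr.data(), lhs);    // copies lhs into largeStr
    std::strcat(largeStr.data(), " ");    // adds a space at the end
    std::strcat(largeStr.data(), rhs);    // concatenates rhs
    assert(std::strlen(largeStr.data()) + 1 == needed);
    return largeStr;
}
```

Even with the size computed this way, every change to what is stored in the buffer requires revisiting the calculation, which the library string type does for us.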

Tip

For most applications, in addition to being safer, it is also more efficient to use library strings rather than C-style strings.

3.5.5. Interfacing to Older Code

Many C++ programs predate the standard library and do not use the string and vector types. Moreover, many C++ programs interface to programs written in C or other languages that cannot use the C++ library. Hence, programs written in modern C++ may have to interface to code that uses arrays and/or C-style character strings. The C++ library offers facilities to make the interface easier to manage.

Mixing Library `string`s and C-Style Strings

In § 3.2.1 (p. 84) we saw that we can initialize a string from a string literal:

string s("Hello World"); // s holds Hello World

More generally, we can use a null-terminated character array anywhere that we can use a string literal:

• We can use a null-terminated character array to initialize or assign a string.

• We can use a null-terminated character array as one operand (but not both operands) to the string addition operator or as the right-hand operand in the string compound assignment (+=) operator.

The reverse functionality is not provided: There is no direct way to use a library string when a C-style string is required. For example, there is no way to initialize a character pointer from a string. There is, however, a string member function named c_str that we can often use to accomplish what we want:

char *str = s;               // error: can't initialize a char* from a string
const char *str = s.c_str(); // ok

The name c_str indicates that the function returns a C-style character string. That is, it returns a pointer to the beginning of a null-terminated character array that holds the same data as the characters in the string. The type of the pointer is const char*, which prevents us from changing the contents of the array.

The array returned by c_str is not guaranteed to be valid indefinitely. Any subsequent use of s that might change the value of s can invalidate this array.

Warning

If a program needs continuing access to the contents of the array returned by c_str(), the program must copy the array returned by c_str.
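One way to make such a copy is to construct a vector<char> from the range of characters that c_str returns, including the null. The helper copy_c_str is a hypothetical name used only in this sketch:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Returns an independent, null-terminated copy of the characters in s.
std::vector<char> copy_c_str(const std::string &s)
{
    const char *str = s.c_str();
    // s.size() characters plus the terminating null
    std::vector<char> copy(str, str + s.size() + 1);
    assert(copy.back() == '\0');
    return copy;
}
```

The copy remains valid even if the original string is later changed or destroyed.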

Using an Array to Initialize a `vector`

In § 3.5.1 (p. 114) we noted that we cannot initialize a built-in array from another array. Nor can we initialize an array from a vector. However, we can use an array to initialize a vector. To do so, we specify the address of the first element and one past the last element that we wish to copy:

int int_arr[] = {0, 1, 2, 3, 4, 5};
// ivec has six elements; each is a copy of the corresponding element in int_arr
vector<int> ivec(begin(int_arr), end(int_arr));

The two pointers used to construct ivec mark the range of values to use to initialize the elements in ivec. The second pointer points one past the last element to be copied. In this case, we used the library begin and end functions (§ 3.5.3, p. 118) to pass pointers to the first and one past the last elements in int_arr. As a result, ivec will have six elements each of which will have the same value as the corresponding element in int_arr.

The specified range can be a subset of the array:

// copies three elements: int_arr[1], int_arr[2], int_arr[3]
vector<int> subVec(int_arr + 1, int_arr + 4);

This initialization creates subVec with three elements. The values of these elements are copies of the values in int_arr[1] through int_arr[3].
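Although we cannot initialize an array from a vector, we can copy a vector's elements into an existing array that is large enough. The sketch below uses the library copy algorithm from the algorithm header, which the text has not yet covered; the function name round_trip_demo is an assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Copies an array into a vector and back into a second array.
// Returns true if the second array matches the first.
bool round_trip_demo()
{
    int int_arr[] = {0, 1, 2, 3, 4, 5};
    std::vector<int> ivec(std::begin(int_arr), std::end(int_arr));

    int back[6] = {};             // destination must hold at least ivec.size() elements
    assert(ivec.size() == 6);
    std::copy(ivec.begin(), ivec.end(), std::begin(back));
    return std::equal(std::begin(int_arr), std::end(int_arr), std::begin(back));
}
```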

Advice: Use Library Types Instead of Arrays

Pointers and arrays are surprisingly error-prone. Part of the problem is conceptual: Pointers are used for low-level manipulations and it is easy to make bookkeeping mistakes. Other problems arise because of the syntax, particularly the declaration syntax used with pointers.

Modern C++ programs should use vectors and iterators instead of built-in arrays and pointers, and use strings rather than C-style array-based character strings.