An array is a data structure that is similar to the library vector type (§ 3.3, p. 96) but offers a different trade-off between performance and flexibility. Like a vector, an array is a container of unnamed objects of a single type that we access by position. Unlike a vector, arrays have fixed size; we cannot add elements to an array. Because arrays have fixed size, they sometimes offer better run-time performance for specialized applications. However, that run-time advantage comes at the cost of lost flexibility.
Arrays are a compound type (§ 2.3, p. 50). An array declarator has the form a[d], where a is the name being defined and d is the dimension of the array. The dimension specifies the number of elements and must be greater than zero. The number of elements in an array is part of the array’s type. As a result, the dimension must be known at compile time, which means that the dimension must be a constant expression (§ 2.4.4, p. 65):
unsigned cnt = 42; // not a constant expression
constexpr unsigned sz = 42; // constant expression
// constexpr see § 2.4.4 (p. 66)
int arr[10]; // array of ten ints
int *parr[sz]; // array of 42 pointers to int
string bad[cnt]; // error: cnt is not a constant expression
string strs[get_size()]; // ok if get_size is constexpr, error otherwise
By default, the elements in an array are default initialized (§ 2.2.1, p. 43).
As with variables of built-in type, a default-initialized array of built-in type that is defined inside a function will have undefined values.
When we define an array, we must specify a type for the array. We cannot use auto to deduce the type from a list of initializers. As with vector, arrays hold objects. Thus, there are no arrays of references.
We can list initialize (§ 3.3.1, p. 98) the elements in an array. When we do so, we can omit the dimension. If we omit the dimension, the compiler infers it from the number of initializers. If we specify a dimension, the number of initializers must not exceed the specified size. If the dimension is greater than the number of initializers, the initializers are used for the first elements and any remaining elements are value initialized (§ 3.3.1, p. 98):
const unsigned sz = 3;
int ia1[sz] = {0,1,2}; // array of three ints with values 0, 1, 2
int a2[] = {0, 1, 2}; // an array of dimension 3
int a3[5] = {0, 1, 2}; // equivalent to a3[] = {0, 1, 2, 0, 0}
string a4[3] = {"hi", "bye"}; // same as a4[] = {"hi", "bye", ""}
int a5[2] = {0,1,2}; // error: too many initializers
Character arrays have an additional form of initialization: We can initialize such arrays from a string literal (§ 2.1.3, p. 39). When we use this form of initialization, it is important to remember that string literals end with a null character. That null character is copied into the array along with the characters in the literal:
char a1[] = {'C', '+', '+'}; // list initialization, no null
char a2[] = {'C', '+', '+', '\0'}; // list initialization, explicit null
char a3[] = "C++"; // null terminator added automatically
const char a4[6] = "Daniel"; // error: no space for the null!
The dimension of a1 is 3; the dimensions of a2 and a3 are both 4. The definition of a4 is in error. Although the literal contains only six explicit characters, the array size must be at least seven: six to hold the literal and one for the null.
We cannot initialize an array as a copy of another array, nor is it legal to assign one array to another:
int a[] = {0, 1, 2}; // array of three ints
int a2[] = a; // error: cannot initialize one array with another
a2 = a; // error: cannot assign one array to another
Some compilers allow array assignment as a compiler extension. It is usually a good idea to avoid using nonstandard features. Programs that use such features may not work with a different compiler.
Like vectors, arrays can hold objects of almost any type. For example, we can have an array of pointers. Because an array is an object, we can define both pointers and references to arrays. Defining arrays that hold pointers is fairly straightforward; defining a pointer or reference to an array is a bit more complicated:
int *ptrs[10]; // ptrs is an array of ten pointers to int
int &refs[10] = /* ? */; // error: no arrays of references
int (*Parray)[10] = &arr; // Parray points to an array of ten ints
int (&arrRef)[10] = arr; // arrRef refers to an array of ten ints
By default, type modifiers bind right to left. Reading the definition of ptrs from right to left (§ 2.3.3, p. 58) is easy: We see that we’re defining an array of size 10, named ptrs, that holds pointers to int.
Reading the definition of Parray from right to left isn’t as helpful. Because the array dimension follows the name being declared, it can be easier to read array declarations from the inside out rather than from right to left. Reading from the inside out makes it much easier to understand the type of Parray. We start by observing that the parentheses around *Parray mean that Parray is a pointer. Looking right, we see that Parray points to an array of size 10. Looking left, we see that the elements in that array are ints. Thus, Parray is a pointer to an array of ten ints. Similarly, (&arrRef) says that arrRef is a reference. The type to which it refers is an array of size 10. That array holds elements of type int.
Of course, there are no limits on how many type modifiers can be used:
int *(&arry)[10] = ptrs; // arry is a reference to an array of ten pointers
Reading this declaration from the inside out, we see that arry is a reference. Looking right, we see that the object to which arry refers is an array of size 10. Looking left, we see that the element type is pointer to int. Thus, arry is a reference to an array of ten pointers.
It can be easier to understand array declarations by starting with the array’s name and reading them from the inside out.
Exercises Section 3.5.1
Exercise 3.27: Assuming txt_size is a function that takes no arguments and returns an int value, which of the following definitions are illegal? Explain why.
unsigned buf_size = 1024;
(a) int ia[buf_size];
(b) int ia[4 * 7 - 14];
(c) int ia[txt_size()];
(d) char st[11] = "fundamental";
Exercise 3.28: What are the values in the following arrays?
string sa[10];
int ia[10];
int main() {
string sa2[10];
int ia2[10];
}
Exercise 3.29: List some of the drawbacks of using an array instead of a vector.
As with the library vector and string types, we can use a range for or the subscript operator to access elements of an array. As usual, the indices start at 0. For an array of ten elements, the indices are 0 through 9, not 1 through 10.
When we use a variable to subscript an array, we normally should define that variable to have type size_t. size_t is a machine-specific unsigned type that is guaranteed to be large enough to hold the size of any object in memory. The size_t type is defined in the cstddef header, which is the C++ version of the stddef.h header from the C library.
With the exception that arrays are fixed size, we use arrays in ways that are similar to how we use vectors. For example, we can reimplement our grading program from § 3.3.3 (p. 104) to use an array to hold the cluster counters:
// count the number of grades by clusters of ten: 0--9, 10--19, ... 90--99, 100
unsigned scores[11] = {}; // 11 buckets, all value initialized to 0
unsigned grade;
while (cin >> grade) {
if (grade <= 100)
++scores[grade/10]; // increment the counter for the current cluster
}
The only obvious difference between this program and the one on page 104 is the declaration of scores. In this program scores is an array of 11 unsigned elements. The less obvious difference is that the subscript operator in this program is the one that is defined as part of the language. This operator can be used on operands of array type. The subscript operator used in the program on page 104 was defined by the library vector template and applies to operands of type vector.
As in the case of string or vector, it is best to use a range for when we want to traverse the entire array. For example, we can print the resulting scores as follows:
for (auto i : scores) // for each counter in scores
cout << i << " "; // print the value of that counter
cout << endl;
Because the dimension is part of each array type, the system knows how many elements are in scores. Using a range for means that we don’t have to manage the traversal ourselves.
As with string and vector, it is up to the programmer to ensure that the subscript value is in range; that is, the index value must be equal to or greater than zero and less than the size of the array. Nothing stops a program from stepping across an array boundary except careful attention to detail and thorough testing of the code. It is possible for programs to compile and execute yet still be fatally wrong.
Buffer overflow bugs are among the most common sources of security problems. Such bugs occur when a program fails to check a subscript and mistakenly uses memory outside the range of an array or similar data structure.
Exercises Section 3.5.2
Exercise 3.30: Identify the indexing errors in the following code:
constexpr size_t array_size = 10;
int ia[array_size];
for (size_t ix = 1; ix <= array_size; ++ix)
ia[ix] = ix;
Exercise 3.31: Write a program to define an array of ten ints. Give each element the same value as its position in the array.
Exercise 3.32: Copy the array you defined in the previous exercise into another array. Rewrite your program to use vectors.
Exercise 3.33: What would happen if we did not initialize the scores array in the program on page 116?
In C++ pointers and arrays are closely intertwined. In particular, as we’ll see, when we use an array, the compiler ordinarily converts the array to a pointer.
Normally, we obtain a pointer to an object by using the address-of operator (§ 2.3.2, p. 52). Generally speaking, the address-of operator may be applied to any object. The elements in an array are objects. When we subscript an array, the result is the object at that location in the array. As with any other object, we can obtain a pointer to an array element by taking the address of that element:
string nums[] = {"one", "two", "three"}; // array of strings
string *p = &nums[0]; // p points to the first element in nums
However, arrays have a special property—in most places when we use an array, the compiler automatically substitutes a pointer to the first element:
string *p2 = nums; // equivalent to p2 = &nums[0]
In most expressions, when we use an object of array type, we are really using a pointer to the first element in that array.
There are various implications of the fact that operations on arrays are often really operations on pointers. One such implication is that when we use an array as an initializer for a variable defined using auto (§ 2.5.2, p. 68), the deduced type is a pointer, not an array:
int ia[] = {0,1,2,3,4,5,6,7,8,9}; // ia is an array of ten ints
auto ia2(ia); // ia2 is an int* that points to the first element in ia
ia2 = 42; // error: ia2 is a pointer, and we can't assign an int to a pointer
Although ia is an array of ten ints, when we use ia as an initializer, the compiler treats that initialization as if we had written
auto ia2(&ia[0]); // now it's clear that ia2 has type int*
It is worth noting that this conversion does not happen when we use decltype (§ 2.5.3, p. 70). The type returned by decltype(ia) is array of ten ints:
// ia3 is an array of ten ints
decltype(ia) ia3 = {0,1,2,3,4,5,6,7,8,9};
ia3 = p; // error: can't assign an int* to an array
ia3[4] = i; // ok: assigns the value of i to an element in ia3
Pointers that address elements in an array have additional operations beyond those we described in § 2.3.2 (p. 52). In particular, pointers to array elements support the same operations as iterators on vectors or strings (§ 3.4, p. 106). For example, we can use the increment operator to move from one element in an array to the next:
int arr[] = {0,1,2,3,4,5,6,7,8,9};
int *p = arr; // p points to the first element in arr
++p; // p points to arr[1]
Just as we can use iterators to traverse the elements in a vector, we can use pointers to traverse the elements in an array. Of course, to do so, we need to obtain pointers to the first and one past the last element. As we’ve just seen, we can obtain a pointer to the first element by using the array itself or by taking the address of the first element. We can obtain an off-the-end pointer by using another special property of arrays. We can take the address of the nonexistent element one past the last element of an array:
int *e = &arr[10]; // pointer just past the last element in arr
Here we used the subscript operator to index a nonexistent element; arr has ten elements, so the last element in arr is at index position 9. The only thing we can do with this element is take its address, which we do to initialize e. Like an off-the-end iterator (§ 3.4.1, p. 106), an off-the-end pointer does not point to an element. As a result, we may not dereference or increment an off-the-end pointer.
Using these pointers we can write a loop to print the elements in arr as follows:
for (int *b = arr; b != e; ++b)
cout << *b << endl; // print the elements in arr
The Library begin and end Functions
Although we can compute an off-the-end pointer, doing so is error-prone. To make it easier and safer to use pointers, the new library includes two functions, named begin and end. These functions act like the similarly named container members (§ 3.4.1, p. 106). However, arrays are not class types, so these functions are not member functions. Instead, they take an argument that is an array:
int ia[] = {0,1,2,3,4,5,6,7,8,9}; // ia is an array of ten ints
int *beg = begin(ia); // pointer to the first element in ia
int *last = end(ia); // pointer one past the last element in ia
begin returns a pointer to the first element, and end returns a pointer one past the last element in the given array. These functions are defined in the iterator header.
Using begin and end, it is easy to write a loop to process the elements in an array. For example, assuming arr is an array that holds int values, we might find the first negative value in arr as follows:
// pbeg points to the first and pend points just past the last element in arr
int *pbeg = begin(arr), *pend = end(arr);
// find the first negative element, stopping if we've seen all the elements
while (pbeg != pend && *pbeg >= 0)
++pbeg;
We start by defining two int pointers named pbeg and pend. We position pbeg to denote the first element and pend to point one past the last element in arr. The while condition uses pend to know whether it is safe to dereference pbeg. If pbeg does point at an element, we dereference and check whether the underlying element is negative. If so, the condition fails and we exit the loop. If not, we increment the pointer to look at the next element.
A pointer “one past” the end of a built-in array behaves the same way as the iterator returned by the end operation of a vector. In particular, we may not dereference or increment an off-the-end pointer.
Pointers that address array elements can use all the iterator operations listed in Table 3.6 (p. 107) and Table 3.7 (p. 111). These operations—dereference, increment, comparisons, addition of an integral value, subtraction of two pointers—have the same meaning when applied to pointers that point at elements in a built-in array as they do when applied to iterators.
When we add (or subtract) an integral value to (or from) a pointer, the result is a new pointer. That new pointer points to the element the given number ahead of (or behind) the original pointer:
constexpr size_t sz = 5;
int arr[sz] = {1,2,3,4,5};
int *ip = arr; // equivalent to int *ip = &arr[0]
int *ip2 = ip + 4; // ip2 points to arr[4], the last element in arr
The result of adding 4 to ip is a pointer that points to the element four elements further on in the array from the one to which ip currently points.
The result of adding an integral value to a pointer must be a pointer to an element in the same array, or a pointer just past the end of the array:
// ok: arr is converted to a pointer to its first element; p points one past the end of arr
int *p = arr + sz; // use caution -- do not dereference!
int *p2 = arr + 10; // error: arr has only 5 elements; p2 has undefined value
When we add sz to arr, the compiler converts arr to a pointer to the first element in arr. When we add sz to that pointer, we get a pointer that points sz positions (i.e., 5 positions) past the first one. That is, it points one past the last element in arr. Computing a pointer more than one past the last element is an error, although the compiler is unlikely to detect such errors.
As with iterators, subtracting two pointers gives us the distance between those pointers. The pointers must point to elements in the same array:
auto n = end(arr) - begin(arr); // n is 5, the number of elements in arr
The result of subtracting two pointers is a library type named ptrdiff_t. Like size_t, the ptrdiff_t type is a machine-specific type and is defined in the cstddef header. Because subtraction might yield a negative distance, ptrdiff_t is a signed integral type.
We can use the relational operators to compare pointers that point to elements of an array, or one past the last element in that array. For example, we can traverse the elements in arr as follows:
int *b = arr, *e = arr + sz;
while (b < e) {
// use *b
++b;
}
We cannot meaningfully use the relational operators on pointers to two unrelated objects:
int i = 0, sz = 42;
int *p = &i, *e = &sz;
// unspecified: p and e are unrelated; comparison is meaningless!
while (p < e) { /* ... */ }
Although the utility may be obscure at this point, it is worth noting that pointer arithmetic is also valid for null pointers (§ 2.3.2, p. 53) and for pointers that point to an object that is not an array. In the latter case, the pointers must point to the same object, or one past that object. If p is a null pointer, we can add or subtract an integral constant expression (§ 2.4.4, p. 65) whose value is 0 to p. We can also subtract two null pointers from one another, in which case the result is 0.
The result of adding an integral value to a pointer is itself a pointer. Assuming the resulting pointer points to an element, we can dereference the resulting pointer:
int ia[] = {0,2,4,6,8}; // array with 5 elements of type int
int last = *(ia + 4); // ok: initializes last to 8, the value of ia[4]
The expression *(ia + 4) calculates the address four elements past ia and dereferences the resulting pointer. This expression is equivalent to writing ia[4].
Recall that in § 3.4.1 (p. 109) we noted that parentheses are required in expressions that contain dereference and dot operators. Similarly, the parentheses around this pointer addition are essential. Writing
last = *ia + 4; // ok: last = 4, equivalent to ia[0] + 4
means dereference ia and add 4 to the dereferenced value. We’ll cover the reasons for this behavior in § 4.1.2 (p. 136).
As we’ve seen, in most places when we use the name of an array, we are really using a pointer to the first element in that array. One place where the compiler does this transformation is when we subscript an array. Given
int ia[] = {0,2,4,6,8}; // array with 5 elements of type int
if we write ia[0], that is an expression that uses the name of an array. When we subscript an array, we are really subscripting a pointer to an element in that array:
int i = ia[2]; // ia is converted to a pointer to the first element in ia
// ia[2] fetches the element to which (ia + 2) points
int *p = ia; // p points to the first element in ia
i = *(p + 2); // equivalent to i = ia[2]
We can use the subscript operator on any pointer, as long as that pointer points to an element (or one past the last element) in an array:
int *p = &ia[2]; // p points to the element indexed by 2
int j = p[1]; // p[1] is equivalent to *(p + 1),
// p[1] is the same element as ia[3]
int k = p[-2]; // p[-2] is the same element as ia[0]
This last example points out an important difference between arrays and library types such as vector and string that have subscript operators. The library types force the index used with a subscript to be an unsigned value. The built-in subscript operator does not. The index used with the built-in subscript operator can be a negative value. Of course, the resulting address must point to an element in (or one past the end of) the array to which the original pointer points.
Unlike subscripts for vector and string, the index of the built-in subscript operator is not an unsigned type.
Exercises Section 3.5.3
Exercise 3.34: Given that p1 and p2 point to elements in the same array, what does the following code do? Are there values of p1 or p2 that make this code illegal?
p1 += p2 - p1;
Exercise 3.35: Using pointers, write a program to set the elements in an array to zero.
Exercise 3.36: Write a program to compare two arrays for equality. Write a similar program to compare two vectors.
Although C++ supports C-style strings, they should not be used by C++ programs. C-style strings are a surprisingly rich source of bugs and are the root cause of many security problems. They’re also harder to use!
Character string literals are an instance of a more general construct that C++ inherits from C: C-style character strings. C-style strings are not a type. Instead, they are a convention for how to represent and use character strings. Strings that follow this convention are stored in character arrays and are null terminated. By null-terminated we mean that the last character in the string is followed by a null character ('\0'). Ordinarily we use pointers to manipulate these strings.
The Standard C library provides a set of functions, listed in Table 3.8, that operate on C-style strings. These functions are defined in the cstring header, which is the C++ version of the C header string.h.
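Table 3.8. C-Style Character String Functions
strlen(p): Returns the length of p, not counting the null.
strcmp(p1, p2): Compares p1 and p2 for equality. Returns 0 if p1 == p2, a positive value if p1 > p2, a negative value if p1 < p2.
strcat(p1, p2): Appends p2 to p1. Returns p1.
strcpy(p1, p2): Copies p2 into p1. Returns p1.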
The functions in Table 3.8 do not verify their string parameters.
The pointer(s) passed to these routines must point to null-terminated array(s):
char ca[] = {'C', '+', '+'}; // not null terminated
cout << strlen(ca) << endl; // disaster: ca isn't null terminated
In this case, ca is an array of char but is not null terminated. The result is undefined. The most likely effect of this call is that strlen will keep looking through the memory that follows ca until it encounters a null character.
Comparing two C-style strings is done quite differently from how we compare library strings. When we compare two library strings, we use the normal relational or equality operators:
string s1 = "A string example";
string s2 = "A different string";
if (s1 < s2) // false: s2 is less than s1
Using these operators on similarly defined C-style strings compares the pointer values, not the strings themselves:
const char ca1[] = "A string example";
const char ca2[] = "A different string";
if (ca1 < ca2) // unspecified: compares two unrelated addresses
Remember that when we use an array, we are really using a pointer to the first element in the array (§ 3.5.3, p. 117). Hence, this condition actually compares two const char* values. Those pointers do not address the same object, so the result of the comparison is unspecified.
To compare the strings, rather than the pointer values, we can call strcmp. That function returns 0 if the strings are equal, or a positive or negative value, depending on whether the first string is larger or smaller than the second:
if (strcmp(ca1, ca2) < 0) // same effect as string comparison s1 < s2
Concatenating or copying C-style strings is also very different from the same operations on library strings. For example, if we want to concatenate the two strings s1 and s2 defined above, we can do so directly:
// initialize largeStr as a concatenation of s1, a space, and s2
string largeStr = s1 + " " + s2;
Doing the same with our two arrays, ca1 and ca2, would be an error. The expression ca1 + ca2 tries to add two pointers, which is illegal and meaningless.
Instead we can use strcat and strcpy. However, to use these functions, we must pass an array to hold the resulting string. The array we pass must be large enough to hold the generated string, including the null character at the end. The code we show here, although a common usage pattern, is fraught with potential for serious error:
// here largeStr is assumed to be a character array, not the string defined above
// disastrous if we miscalculated the size of largeStr
strcpy(largeStr, ca1); // copies ca1 into largeStr
strcat(largeStr, " "); // adds a space at the end of largeStr
strcat(largeStr, ca2); // concatenates ca2 onto largeStr
The problem is that we can easily miscalculate the size needed for largeStr. Moreover, any time we change the values we want to store in largeStr, we have to remember to double-check that we calculated its size correctly. Unfortunately, programs similar to this code are widely distributed. Programs with such code are error-prone and often lead to serious security leaks.
For most applications, in addition to being safer, it is also more efficient to use library strings rather than C-style strings.
Exercises Section 3.5.4
Exercise 3.37: What does the following program do?
const char ca[] = {'h', 'e', 'l', 'l', 'o'};
const char *cp = ca;
while (*cp) {
cout << *cp << endl;
++cp;
}
Exercise 3.38: In this section, we noted that it was not only illegal but meaningless to try to add two pointers. Why would adding two pointers be meaningless?
Exercise 3.39: Write a program to compare two strings. Now write a program to compare the values of two C-style character strings.
Exercise 3.40: Write a program to define two character arrays initialized from string literals. Now define a third character array to hold the concatenation of the two arrays. Use strcpy and strcat to copy the two arrays into the third.
Many C++ programs predate the standard library and do not use the string and vector types. Moreover, many C++ programs interface to programs written in C or other languages that cannot use the C++ library. Hence, programs written in modern C++ may have to interface to code that uses arrays and/or C-style character strings. The C++ library offers facilities to make the interface easier to manage.
Mixing Library strings and C-Style Strings
In § 3.2.1 (p. 84) we saw that we can initialize a string from a string literal:
string s("Hello World"); // s holds Hello World
More generally, we can use a null-terminated character array anywhere that we can use a string literal:
• We can use a null-terminated character array to initialize or assign a string.
• We can use a null-terminated character array as one operand (but not both operands) to the string addition operator or as the right-hand operand in the string compound assignment (+=) operator.
The reverse functionality is not provided: There is no direct way to use a library string when a C-style string is required. For example, there is no way to initialize a character pointer from a string. There is, however, a string member function named c_str that we can often use to accomplish what we want:
char *str = s; // error: can't initialize a char* from a string
const char *str = s.c_str(); // ok
The name c_str indicates that the function returns a C-style character string. That is, it returns a pointer to the beginning of a null-terminated character array that holds the same data as the characters in the string. The type of the pointer is const char*, which prevents us from changing the contents of the array.
The array returned by c_str is not guaranteed to be valid indefinitely. Any subsequent use of s that might change the value of s can invalidate this array.
If a program needs continuing access to the contents of the array returned by c_str(), the program must copy the array returned by c_str.
Using an Array to Initialize a vector
In § 3.5.1 (p. 114) we noted that we cannot initialize a built-in array from another array. Nor can we initialize an array from a vector. However, we can use an array to initialize a vector. To do so, we specify the address of the first element and one past the last element that we wish to copy:
int int_arr[] = {0, 1, 2, 3, 4, 5};
// ivec has six elements; each is a copy of the corresponding element in int_arr
vector<int> ivec(begin(int_arr), end(int_arr));
The two pointers used to construct ivec mark the range of values to use to initialize the elements in ivec. The second pointer points one past the last element to be copied. In this case, we used the library begin and end functions (§ 3.5.3, p. 118) to pass pointers to the first and one past the last elements in int_arr. As a result, ivec will have six elements, each of which will have the same value as the corresponding element in int_arr.
The specified range can be a subset of the array:
// copies three elements: int_arr[1], int_arr[2], int_arr[3]
vector<int> subVec(int_arr + 1, int_arr + 4);
This initialization creates subVec with three elements. The values of these elements are copies of the values in int_arr[1] through int_arr[3].
Pointers and arrays are surprisingly error-prone. Part of the problem is conceptual: Pointers are used for low-level manipulations and it is easy to make bookkeeping mistakes. Other problems arise because of the syntax, particularly the declaration syntax used with pointers.
Modern C++ programs should use vectors and iterators instead of built-in arrays and pointers, and use strings rather than C-style array-based character strings.