
6.8 Buffer Boundary Violation [HCB]

6.8.1 Applicability to language

The vulnerability as described in ISO/IEC TR 24772-1 clause 6.8 exists in C++ when arrays are managed using raw pointers or indexing. For a plain array a, the valid raw pointers range from the first element to one past the last element, i.e., from std::begin(a) to std::end(a) inclusive. Only pointers in the half-open range [std::begin(a), std::end(a)) can be dereferenced. An object o can be treated as a single-element array with respect to pointers referring to it.
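The pointer range described above can be illustrated with a minimal sketch. The helper names sum_array and sum_single are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical helper: sums a plain array by walking [first, last).
template <std::size_t N>
int sum_array(const int (&a)[N]) {
    const int* first = std::begin(a); // points to a[0]
    const int* last = std::end(a);    // one past a[N-1]: may be formed and compared, not dereferenced
    int sum = 0;
    for (const int* p = first; p != last; ++p) {
        sum += *p; // p is always inside [first, last) here
    }
    return sum;
}

// Hypothetical helper: a single object traversed like a one-element array.
inline int sum_single(const int& o) {
    const int* first = &o;
    const int* last = first + 1; // one past the object: may be formed and compared
    int sum = 0;
    for (const int* p = first; p != last; ++p) {
        sum += *p;
    }
    return sum;
}
```

The one-past-the-end pointer serves only as a loop bound in both functions and is never dereferenced.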

C++ provides facilities to encapsulate code that is exposed to this vulnerability. The standard library defines features that mitigate or circumvent this vulnerability. For example, std::string, std::vector, std::deque, and iostreams manage buffers internally. The “range-for” statement, such as for (auto &e : some_container), and the algorithm library access the elements e of a container without explicit indexing or pointer arithmetic, which avoids buffer boundary violations when they are used correctly.
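As a minimal sketch of these facilities, the two hypothetical functions below (total_range_for and total_accumulate are names introduced here) visit every element of a std::vector without any index or pointer arithmetic in user code.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: the range-for statement derives both bounds from the container.
inline int total_range_for(const std::vector<int>& v) {
    int total = 0;
    for (const auto& e : v) {
        total += e;
    }
    return total;
}

// Hypothetical helper: the same traversal through the algorithm library.
// The iterator pair comes from one container, so the range is consistent.
inline int total_accumulate(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}
```

Both forms also handle an empty container, because the loop body or the accumulation step is never entered.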

However, the member function data() of the contiguous sequence containers returns a non-const pointer to the underlying elements when called on a non-const container. This allows manipulating the underlying memory directly, bypassing the safety features of the container, which can lead to this vulnerability. For example, std::string::data() returns a non-const char*.
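Where data() cannot be avoided, one possible precaution is to pair the pointer with the size() of the same container. The following is a sketch under that assumption and requires C++17 for the non-const std::string::data(). The function copy_into is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies at most dst.size() characters from src
// into the existing elements of dst and returns the number copied.
// Assumes src points to at least src_len readable characters.
inline std::size_t copy_into(std::string& dst, const char* src, std::size_t src_len) {
    const std::size_t n = std::min(src_len, dst.size()); // never write past the container's elements
    std::memcpy(dst.data(), src, n);
    return n;
}
```

The length check is the programmer's responsibility here, because the container cannot check writes made through the raw pointer.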

When working directly with iterators referring to a container, one needs to ensure that those iterators are and remain valid. For example, for a container c, incrementing an iterator beyond the end(c) iterator or dereferencing the iterator denoted by end(c) is undefined behavior.
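One way to stay within the valid iterator range is to bound every advance by the distance to end(c). The sketch below assumes that the iterator passed in is valid for c and that n is not negative. The function advance_clamped is a hypothetical helper.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: moves it forward by at most n steps, never beyond c.end().
// Assumes it is a valid iterator into c and n >= 0.
template <typename Container>
typename Container::const_iterator
advance_clamped(const Container& c, typename Container::const_iterator it, std::ptrdiff_t n) {
    const auto remaining = std::distance(it, c.end()); // steps left before end(c)
    std::advance(it, n < remaining ? n : remaining);
    return it;
}
```

The caller still needs to compare the result with c.end() before dereferencing it.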

In general, keeping iterators valid requires programmer care to prevent out-of-bounds access to the underlying container.

For example, the following code uses algorithms and iterators correctly to convert an input string to lower case:

std::string to_lowercase(std::string_view s){
    std::string result{};
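    std::transform(
        begin(s), end(s), // input range #1
        std::back_inserter(result), // output iterator #2
        // unsigned char parameter: std::tolower is undefined for negative char values
        [](unsigned char c){ return static_cast<char>(std::tolower(c));});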
    return result;
}

The above example passes an input range (#1) and the start of an output range (#2) to the transform algorithm. Potential errors due to a boundary violation could be caused by the following changes:

The second problem occurs in the following code if s is longer than 31 characters, because transform writes one element for each input character, and result holds only 31 elements:

std::string to_lowercase(std::string_view s){
    std::string result(31, '\0'); // 31 characters, each '\0'
    std::transform(
        begin(s), end(s),
        begin(result), // error, only space for 31 characters
        [](unsigned char c){ return static_cast<char>(std::tolower(c));});
    return result; // size(result) == 31
}
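One possible correction, sketched here as an alternative to std::back_inserter, is to size the output string from the input before writing through begin(result). The name to_lowercase_sized is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical corrected variant: the output holds exactly as many
// elements as the input, so transform cannot write past its end.
inline std::string to_lowercase_sized(std::string_view s) {
    std::string result(s.size(), '\0');
    std::transform(
        s.begin(), s.end(),
        result.begin(),
        // unsigned char parameter avoids undefined behavior in std::tolower
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return result;
}
```

With this sizing, inputs longer than 31 characters are handled like any other input.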

An additional problem occurs when performing an operation that invalidates an in-use iterator, such as the iterator internally used by the range-for statement below:

std::string to_lowercase(std::string s){
    for (auto &c:s){
       s.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(c)))); // error, invalidates in-use iterator
    }
    return s;
}
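If the intent is to append a lower-case copy of the original characters, which is an assumption about the example above, one sketch that avoids holding an iterator across the modification is to loop over indices bounded by the original size. The name append_lowercase is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical variant: appends the lower-case form of each original character.
// No iterator or reference is kept across push_back, so reallocation is harmless.
inline std::string append_lowercase(std::string s) {
    const std::size_t original_size = s.size(); // bound fixed before the string grows
    s.reserve(2 * original_size);               // optional: avoids repeated reallocation
    for (std::size_t i = 0; i < original_size; ++i) {
        const auto c = static_cast<unsigned char>(s[i]); // copy the character before modifying s
        s.push_back(static_cast<char>(std::tolower(c)));
    }
    return s;
}
```

Each character is copied out of the string before push_back is called, so nothing refers to storage that might be reallocated.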

Overflows can also occur through the use of C-style strings, which are arrays of characters terminated by a nul character. Mishandling the nul termination can cause reads or writes beyond the end of the array. See clause 6.7 String Termination [CJM].

Since plain (C-style) arrays decay to pointers when passed as function arguments, the array dimension is lost. C++ provides several means of keeping the array dimension available to the called function:
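Two such means, shown here only as an illustration, are passing the array by reference so that its extent is deduced, and using std::array, which carries its size in its type. The function name extent_of is hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: a reference-to-array parameter does not decay,
// so N is deduced from the argument at compile time.
template <typename T, std::size_t N>
constexpr std::size_t extent_of(const T (&)[N]) noexcept {
    return N;
}

// Hypothetical overload: std::array keeps the dimension as part of its type.
template <typename T, std::size_t N>
constexpr std::size_t extent_of(const std::array<T, N>& a) noexcept {
    return a.size();
}
```

In both overloads the called function obtains the dimension from the argument's type, so no separate length parameter can fall out of sync with the array.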

For further explanation and examples, see

6.8.2 Avoidance mechanisms for language users

To avoid the vulnerability or mitigate its ill effects, C++ software developers can: