
6.25 Likely Incorrect Expression [KOA]

6.25.1 Applicability to language

The vulnerability as described in ISO/IEC TR 24772-1:2019 clause 6.25 exists in C++.

C++ has several instances of operators which are similar in structure, but different in meaning. Examples of operators in C-based languages that can cause confusion are:

The typographical similarity can lead to code like the following, where it is unclear if the expression as spelled is actually intended, or if the author has typos in it, meaning a different operator instead:

auto f(unsigned i, unsigned j)
{
  return (i > 1) & (j = 1); // (>>, &&, ==)?
}

The following code in a production phone OS caused the “bricking” of many users' phones:

if (key_data_.has_value() & !key_data_->label().empty())

instead of

if (key_data_.has_value() && !key_data_->label().empty())

or, even more clearly, using the alternative operator representation and for &&:

if (key_data_.has_value() and !key_data_->label().empty())
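
The practical difference between the two spellings is short-circuit evaluation: with &, both operands are always evaluated, so the right-hand side runs even when the left-hand check has failed. The following is a minimal sketch of that difference. The Probe type and both function names are hypothetical and introduced only for illustration; they are not taken from the code quoted above.

```cpp
// Hypothetical probe that counts how often it is evaluated (illustration only).
struct Probe {
  int evaluations = 0;
  bool touch() { ++evaluations; return true; }
};

// Bitwise &: both operands are always evaluated.
int evaluations_with_bitwise_and(bool lhs)
{
  Probe p;
  bool result = lhs & p.touch();
  (void)result;  // only the evaluation count matters here
  return p.evaluations;
}

// Logical &&: the right operand is skipped when lhs is false.
int evaluations_with_logical_and(bool lhs)
{
  Probe p;
  bool result = lhs && p.touch();
  (void)result;  // only the evaluation count matters here
  return p.evaluations;
}
```

When the left operand is false, the bitwise version still evaluates the probe once, whereas the logical version does not evaluate it at all. If the right operand dereferences an object that the left operand was meant to guard, only the logical form provides that guard.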

As a general rule, the use of =, +=, or -= within an expression, other than as the final assignment to a variable, is unsafe: the assignment creates side effects within the expression that are difficult for a human reader to analyze and that can produce different results depending on the order in which the terms of the expression are evaluated.

Even in a plain assignment statement, transposing the operator and the = of a compound assignment yields valid code that was not intended:

int i{42};

i += 22; // i becomes 64
i =+ 22; // i becomes 22
i =- 22; // i becomes -22

C++ provides significant freedom in constructing statements. This freedom, if misused, can lead to unexpected results and potential vulnerabilities.

Since the order of evaluation within expressions is only partially defined, sub-expressions with side effects on variables used within the overall expression can result in undefined behaviour.
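
As a hedged illustration of the evaluation-order issue, the sketch below uses a hypothetical helper next_value with a side effect. It deliberately shows the milder case, where the order of the two calls is unspecified rather than undefined, so that the example itself stays well-defined. All names are assumptions made for this example.

```cpp
// Hypothetical helper with a side effect: increments the counter and returns it.
int next_value(int& counter) { return ++counter; }

// Unclear: the two calls may be evaluated in either order, so the
// result may be -1 or 1 depending on the implementation.
int difference_unclear(int& counter)
{
  return next_value(counter) - next_value(counter);
}

// Clear: each side effect has its own statement, so the order is fixed
// and the result is always -1.
int difference_clear(int& counter)
{
  const int first = next_value(counter);
  const int second = next_value(counter);
  return first - second;
}
```

Giving each side effect its own statement makes the intended order explicit to both the compiler and the reader.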

The flexibility of C++ can obscure the intent of a programmer. Consider:

int x,y;
/* ... */
if (x = y){
  /* ... */
}

A fair amount of analysis may need to be done to determine whether the programmer intended to do an assignment as part of the if statement (valid in C++) or whether the programmer made the common mistake of using an = (assignment) instead of a == (equality).

This confusion can be corrected by moving assignments outside of Boolean contexts. This would change the example code to:

int x,y;
/* ... */
x = y;
if (x != 0) {
  /* ... */
}

This would clearly state what the programmer meant and that the assignment of y to x was intended.
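
Assuming C++17 or later, an if statement with an initializer is another way to keep the assignment visibly separate from the test while limiting the scope of the variable. The function name below is hypothetical and the sketch is only one possible formulation.

```cpp
// Hypothetical function: copies y into a local variable and reports
// whether the copy is non-zero.
bool copy_is_nonzero(int y)
{
  if (int x = y; x != 0) {  // initialization and test are visibly separate
    return true;
  }
  return false;
}
```

The condition x != 0 states explicitly what if (x = y) tests implicitly, namely that the assigned value is non-zero.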

Additional confusion occurs in the use of the logical && or || operators and the bitwise & or | operators. The compiler will implicitly convert arithmetic expressions to bool for operands of the logical operators. Similarly, operands of bool type will be promoted to int for operands of the bitwise operators (see Conversion Errors [FLC]).
It may not be clear whether the programmer intended to use the logical operator && or bitwise operator & instead:

unsigned f(unsigned i, unsigned j)
{
  return (i > 0) & j;
}

Using the alternative tokens and / or in lieu of && and || reduces the possibility of confusion. Similarly, a not_eq b is preferable to a != b since the latter is easily confused with the equally valid expression a |= b.
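
The following sketch shows how the alternative tokens read in practice. Both function names are hypothetical. The second function spells the bitwise operation as bitand, so a reader can see that the bitwise operator was chosen deliberately.

```cpp
// Hypothetical predicate written with the alternative tokens and, not_eq.
bool in_range_and_distinct(unsigned i, unsigned j, unsigned limit)
{
  return (i < limit and j < limit) and i not_eq j;
}

// Deliberate bitwise use: (i > 0u) converts to 0 or 1, so the result
// is the lowest bit of j when i is positive, and 0 otherwise.
unsigned low_bit_if_positive(unsigned i, unsigned j)
{
  return (i > 0u) bitand j;
}
```

The second function also illustrates why an accidental & in place of && is dangerous: the result depends only on the lowest bit of j, not on whether j is non-zero.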

Programmers can easily get in the habit of inserting the ; statement terminator at the end of statements. However, inadvertently doing this can drastically alter the meaning of code, even though the code is valid as in the following example:

int a,b;
/* ... */
if (a == b);  // the semi-colon will make the following code always execute
{
  /* ... */
}

Because of the misplaced semi-colon, the code block following the if will always be executed. In this case, it is extremely likely that the programmer did not intend to put the semi-colon there.

Unary + on a variable is (almost) a no-op, since it only applies the usual promotions, and is possibly a mistyping of ++. Unary - on a variable negates its value, unless it is applied to a variable of an unsigned type that is not promoted to int, in which case the result is 2^n minus the value, reduced modulo 2^n, where n is the number of bits in the unsigned type.
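
The behaviour of the unary operators can be checked at compile time. The sketch below is illustrative only, and the helper negate_unsigned is a hypothetical name.

```cpp
#include <limits>
#include <type_traits>

// Unary + applied to a char promotes it to int; the value is unchanged.
static_assert(std::is_same<decltype(+'a'), int>::value,
              "unary + promotes char to int");

// Unary - applied to an unsigned value wraps around modulo 2^n.
static_assert(-1u == std::numeric_limits<unsigned>::max(),
              "-1u is 2^n - 1");
static_assert(-0u == 0u, "negating zero yields zero");

// Hypothetical helper: negation of an unsigned value never yields a
// negative number; adding the original value to the result gives 0 modulo 2^n.
unsigned negate_unsigned(unsigned value)
{
  return -value;
}
```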

C++ overloading of operators can also cause confusion.

The language does not impose any restrictions on semantics of overloaded operators. This can cause (potentially generic) code to behave in completely unobvious ways, when such types with “unusual” operator semantics are used.

For example, the Boost.Spirit library allows code like the following to create parser rules:

r = real_p >> *(ch_p(',') >> real_p); // rule that accepts a comma-separated list of real numbers

This library uses C++ operator overloads to create an embedded domain-specific language for grammar rules, allowing the specification of parser rules as C++ expressions.

When overloaded, related operators, such as a compound assignment and its base operator, are no longer guaranteed to keep the behavioral relationship that they have for built-in types. For example, a += b is not guaranteed to behave like a = a + b, or even to be defined at all.

Similarly for overloaded relational operators, for a == b, there is no guarantee that a != b is equivalent to !(a == b) if both are overloaded by the user.
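
One common way to keep related operators consistent is to implement one of them as the primary operation and to derive the other from it. The sketch below uses a hypothetical Meters type that is introduced only for illustration.

```cpp
// Hypothetical value type used only for illustration.
struct Meters {
  int value;

  // Compound assignment is the primary operation.
  Meters& operator+=(const Meters& rhs)
  {
    value += rhs.value;
    return *this;
  }

  friend bool operator==(const Meters& a, const Meters& b)
  {
    return a.value == b.value;
  }

  // != is defined as the negation of ==, so the two cannot disagree.
  friend bool operator!=(const Meters& a, const Meters& b)
  {
    return !(a == b);
  }
};

// + is derived from +=, so a + b and a += b produce the same value.
inline Meters operator+(Meters lhs, const Meters& rhs)
{
  lhs += rhs;
  return lhs;
}
```

Because each derived operator delegates to its primary counterpart, a later change to the primary operator cannot silently break the relationship between them.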

Unless all relational operators for a type are defined either explicitly in a consistent way or implicitly, unexpected results can occur. A user-declared three-way comparison operator (<=>) is used by the compiler to synthesize the relational operators consistently. If operator<=> is defined as =default, the equality comparison operators will also be defined; and if operator== with return type bool is defined, a corresponding inequality operator!= is also defined implicitly.

6.25.2 Avoidance mechanisms for language users

To avoid the vulnerability or mitigate its ill effects, C++ software developers can:

— Use the avoidance mechanisms of ISO/IEC 24772-1 clause 6.25.5.