What Is the Value of std::indirect<T>?
tl;dr: It’s just a T! Not an optional<T>.
std::indirect<T> is a new class template in C++26. It wraps an object of type T, providing value-like semantics. Unlike other wrappers such as std::optional, it allocates the T on the heap. This can be useful if T is an incomplete type (as is the case with the PIMPL pattern) or if you want to reduce the object size.
How is it different from std::unique_ptr<T>, then? std::unique_ptr has pointer semantics:
- operator= moves the pointer
- operator== compares the pointer
In other words, the value of a std::unique_ptr is the pointer to the managed object, not the object itself.
The values of a std::indirect<T>, on the other hand, are exactly the values of the managed object itself:
- operator= copies/moves the T object
- operator== compares the T object
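To make the contrast concrete, here is a small sketch. Because many standard libraries do not ship std::indirect yet, it uses a hypothetical, heavily simplified stand-in called deep_box (not the real class: no move support, no allocator, no valueless state) next to the real std::unique_ptr. The helper compare_unique_ptrs is also hypothetical.

```cpp
#include <memory>
#include <utility>

// Hypothetical, simplified stand-in for an indirect-like wrapper (NOT std::indirect).
// It owns a heap-allocated T but copies and compares the T itself.
template <class T>
class deep_box {
public:
    explicit deep_box(T v) : p_(std::make_unique<T>(std::move(v))) {}

    // Copying allocates a new T and copies the value, not the pointer.
    deep_box(const deep_box& other) : p_(std::make_unique<T>(*other.p_)) {}

    // Copy assignment copies the value into the T we already own.
    deep_box& operator=(const deep_box& other) {
        *p_ = *other.p_;
        return *this;
    }

    T& operator*() { return *p_; }
    const T& operator*() const { return *p_; }

    // Equality compares the owned values, not their addresses.
    friend bool operator==(const deep_box& lhs, const deep_box& rhs) { return *lhs == *rhs; }

private:
    std::unique_ptr<T> p_;
};

// With std::unique_ptr, operator== compares the pointers themselves.
// Returns {pointers equal?, pointees equal?} for two separately allocated 42s.
std::pair<bool, bool> compare_unique_ptrs() {
    auto p = std::make_unique<int>(42);
    auto q = std::make_unique<int>(42);
    return {p == q, *p == *q};  // {false, true}: distinct pointers, equal values
}
```

Copying a deep_box and modifying the copy leaves the original untouched, and two boxes holding 42 compare equal, while two unique_ptrs holding 42 do not.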
There is one small problem, though: moving from a std::indirect<T> leaves it in a “valueless” state. This makes sense in a way: under the hood, std::indirect is implemented as a pointer to the managed object. Moving it just transfers this pointer to the other object and sets the source pointer to nullptr. Otherwise, moving would have to create a new “valid” object in the emptied source, incurring a heap allocation that we would like to avoid.
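The move mechanism described above can be sketched with a hypothetical moving_box class (not std::indirect, and with copying omitted for brevity). Moving hands over the pointer with std::exchange and leaves nullptr behind, so no allocation happens; the null pointer is exactly the “valueless” state.

```cpp
#include <utility>

// Hypothetical sketch of how an indirect-like wrapper can implement move
// without allocating: the pointer is transferred and the source is left null.
template <class T>
class moving_box {
public:
    explicit moving_box(T v) : p_(new T(std::move(v))) {}

    // Steal the pointer; the source becomes "valueless".
    moving_box(moving_box&& other) noexcept : p_(std::exchange(other.p_, nullptr)) {}

    moving_box& operator=(moving_box&& other) noexcept {
        if (this != &other) {
            delete p_;
            p_ = std::exchange(other.p_, nullptr);
        }
        return *this;
    }

    // Copying is omitted in this sketch.
    moving_box(const moving_box&) = delete;
    moving_box& operator=(const moving_box&) = delete;

    ~moving_box() { delete p_; }

    bool valueless_after_move() const noexcept { return p_ == nullptr; }
    const T& operator*() const { return *p_; }

private:
    T* p_;
};
```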
Should you be concerned, though? The answer is no: while there is a member function valueless_after_move() that you can use to check for this empty state, you should never have to call it. Structure your program so that you never need to look at moved-from objects. Functions that take a std::indirect<T> as an argument should have an implicit precondition that the argument is not “valueless after move”:
```cpp
#include <memory>  // std::indirect (C++26)

void f(const std::indirect<int>& i) {
    // Do _not_ check for `valueless_after_move()` here!
}
```
Is std::indirect<T> confused about what it is?
Above I said that the possible values of std::indirect<T> are exactly the possible values of T. Is this true, though? Let’s take a short detour and talk about values.
The meaning of value is central to C++ and to programming in general. From n2479:
“value: a notion of a unique abstract entity in a mathematical type system.”
A type then is a set of values. In addition, in order to represent values in memory, there must be a mapping from bit patterns to those values. This mapping can be interpreted as a mathematical function that is partial (meaning that not all possible bit patterns must map to a value) and surjective (meaning that every value is represented by at least one bit pattern, but there may be multiple bit patterns that represent the same value). Elements of Programming calls this mapping a value type (not to be confused with the colloquial use of the term!).
To make matters more complex, the bit patterns (or datums) that represent values are often not laid out contiguously in memory, but mixed with padding bytes or even split up between stack and heap (as is the case for std::vector<T>). Endianness comes into play as well. But in theory, you could collect all the bits that participate in representing the value from an object by following all the pointers, skipping padding or “unimportant” elements, and serialize them. EOP calls this mapping of a value type to concrete objects an object type.
Let’s have a look at a few examples. Here is int (for a 32-bit, little-endian machine that uses two’s complement). int models a finite range of the mathematical integers (ℤ).
```mermaid
block-beta
columns 6
o1["values:"]:1
block:group2:5
space
o1v0["-2147483648"]
o1v1["..."]
o1v2["-1"]
o1v3["0"]
o1v4["1"]
o1v5["..."]
o1v6["2147483647"]
end
o2["value type"]:1
block:group3:5
space
o2v0["0x80000000"]
o2v1["..."]
o2v2["0xFFFFFFFF"]
o2v3["0x00000000"]
o2v4["0x00000001"]
o2v5["..."]
o2v6["0x7FFFFFFF"]
end
o3["object type\n(little endian)"]:1
block:group4:5
o3ve["erroneous"]
o3v0["0x00000080"]
o3v1["..."]
o3v2["0xFFFFFFFF"]
o3v3["0x00000000"]
o3v4["0x01000000"]
o3v5["..."]
o3v6["0xFFFFFF7F"]
end
o2v0-->o1v0
o2v2-->o1v2
o2v3-->o1v3
o2v4-->o1v4
o2v6-->o1v6
o3v0-->o2v0
o3v2-->o2v2
o3v3-->o2v3
o3v4-->o2v4
o3v6-->o2v6
style o1 fill:#fff0,stroke:#fff0
style o2 fill:#fff0,stroke:#fff0
style o3 fill:#fff0,stroke:#fff0
style o1v1 fill:#fff0,stroke:#fff0
style o1v5 fill:#fff0,stroke:#fff0
style o2v1 fill:#fff0,stroke:#fff0
style o2v5 fill:#fff0,stroke:#fff0
style o3v1 fill:#fff0,stroke:#fff0
style o3v5 fill:#fff0,stroke:#fff0
```
One interesting observation is the “erroneous” state. This is the state of a default-initialized int, which holds no value and has no meaning:
```cpp
int i; // holds no value
```
In this state, the int can only be assigned to and destroyed. In particular, calling equality and comparison operators with such objects is not defined, since they operate on values, just like in math.
A more complex example is float, which models the extended real numbers:
```mermaid
block-beta
columns 6
o1["values:"]:1
block:group2:5
space
space
o1v0["-∞"]
o1v1["..."]
o1v2["-1.0"]
o1v3["..."]
o1v4["0.0"]
o1v5["..."]
o1v6["1.0"]
o1v7["..."]
o1v8["∞"]
end
o2["value type"]:1
block:group3:5
space
space
o2v0["0xFF80-\n0000"]
o2v1["..."]
o2v2["0xBF80-\n0000"]
o2v3["..."]
o2v4a["0x8000-\n0000"]
o2v4b["0x0000-\n0000"]
o2v5["..."]
o2v6["0x3F80-\n0000"]
o2v7["..."]
o2v8["0x7F80-\n0000"]
end
o3["object type\n(little endian)"]:1
block:group4:5
o3ve["erroneous"]
o3vn["NaN \n(many)"]
o3v0["0x0000-\n80FF"]
o3v1["..."]
o3v2["0x0000-\n80BF"]
o3v3["..."]
o3v4a["0x0000-\n0080"]
o3v4b["0x0000-\n0000"]
o3v5["..."]
o3v6["0x0000-\n803F"]
o3v7["..."]
o3v8["0x0000-\n807F"]
end
o2v0-->o1v0
o2v2-->o1v2
o2v4a-->o1v4
o2v4b-->o1v4
o2v6-->o1v6
o2v8-->o1v8
o3v0 --> o2v0
o3v2 --> o2v2
o3v4a--> o2v4a
o3v4b--> o2v4b
o3v6 --> o2v6
o3v8 --> o2v8
style o1 fill:#fff0,stroke:#fff0
style o2 fill:#fff0,stroke:#fff0
style o3 fill:#fff0,stroke:#fff0
style o1v1 fill:#fff0,stroke:#fff0
style o1v3 fill:#fff0,stroke:#fff0
style o1v5 fill:#fff0,stroke:#fff0
style o1v7 fill:#fff0,stroke:#fff0
style o2v1 fill:#fff0,stroke:#fff0
style o2v3 fill:#fff0,stroke:#fff0
style o2v5 fill:#fff0,stroke:#fff0
style o2v7 fill:#fff0,stroke:#fff0
style o3v1 fill:#fff0,stroke:#fff0
style o3v3 fill:#fff0,stroke:#fff0
style o3v5 fill:#fff0,stroke:#fff0
style o3v7 fill:#fff0,stroke:#fff0
```
In addition to an erroneous state, float can hold many bit patterns that mean “not a number”. It would have been fine to leave operator== undefined for those, as for the erroneous state, but the designers of IEEE 754 decided to “fill the semantic hole”: operator== always returns false when either operand is a NaN, even when a NaN is compared with itself.¹
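A short sketch of what “filling the semantic hole” means in practice: a NaN compares unequal to everything, including itself, and every ordering comparison involving it is false. The helper is_nan_by_self_compare is hypothetical; it relies on the compiler following IEEE 754, which options such as -ffast-math may break.

```cpp
#include <limits>

// NaN compares unequal to everything, including itself, so self-comparison
// works as a NaN test (assuming IEEE 754 semantics; -ffast-math may break this).
bool is_nan_by_self_compare(float x) { return x != x; }
```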
Also, zero is special. There are two bit patterns that map to it: one for -0.0f and one for 0.0f. Both represent the same mathematical zero, and operator== respects this. Some functions, such as division, behave differently when called with the different representations, however: under IEEE 754 arithmetic, 1.0f / 0.0f is +∞ while 1.0f / -0.0f is -∞. A function like this is not “regular” (using the definition from EOP) or “equality-preserving” (a term from the C++ standard).
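Division by zero is formally undefined behavior in standard C++ even on IEEE 754 hardware, so this sketch shows the same effect with two other standard functions that are not equality-preserving on zeros: std::signbit and std::copysign. The wrapper sign_of is a hypothetical name.

```cpp
#include <cmath>

// Returns +1.0f or -1.0f depending on the sign bit of x.
// 0.0f == -0.0f, yet sign_of gives different results for them,
// so sign_of is not equality-preserving.
float sign_of(float x) { return std::copysign(1.0f, x); }
```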
Again, there is (or should be) an implicit precondition for every function taking an int or float as an argument that it represents a value (i.e. is neither erroneous nor NaN):
```cpp
void f(const int& i) {
    // Can assume `i` is valid.
}

void g(const float& f) {
    // Can assume `f` represents a valid extended real, i.e. is not erroneous or NaN.
}
```
If a function is equipped to deal with NaNs, it should document this explicitly.
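The standard library has a good example of such documentation: std::fmax is specified to treat a NaN argument as missing data and return the other argument, whereas std::max assumes its arguments are ordered by <, which a NaN breaks. The sketch below builds on that with a hypothetical helper, max_ignoring_nan, whose comment states its NaN handling explicitly.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper that documents its NaN handling explicitly:
// NaN elements are treated as missing data and skipped.
// Returns NaN if the input is empty or contains only NaNs.
float max_ignoring_nan(const std::vector<float>& xs) {
    float result = std::numeric_limits<float>::quiet_NaN();
    for (float x : xs) {
        // std::fmax returns the non-NaN argument if exactly one argument is NaN.
        result = std::fmax(result, x);
    }
    return result;
}
```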
OK, now let’s have a look at std::indirect<int>:
```mermaid
block-beta
columns 6
o1["values:"]:1
block:group2:5
space
o1v0["-2147483648"]
o1v1["..."]
o1v2["-1"]
o1v3["0"]
o1v4["1"]
o1v5["..."]
o1v6["2147483647"]
end
o2["value type"]:1
block:group3:5
space
o2v0["0x80000000"]
o2v1["..."]
o2v2["0xFFFFFFFF"]
o2v3["0x00000000"]
o2v4["0x00000001"]
o2v5["..."]
o2v6["0x7FFFFFFF"]
end
o3["object type\n(little endian)"]:1
block:group4:5
o3ve["<code>nullptr</code>\n(valueless\nafter move)"]
o3v0["ptr→\n0x00000080"]
o3v1["..."]
o3v2["ptr→\n0xFFFFFFFF"]
o3v3["ptr→\n0x00000000"]
o3v4["ptr→\n0x01000000"]
o3v5["..."]
o3v6["ptr→\n0xFFFFFF7F"]
end
o2v0-->o1v0
o2v2-->o1v2
o2v3-->o1v3
o2v4-->o1v4
o2v6-->o1v6
o3v0-->o2v0
o3v2-->o2v2
o3v3-->o2v3
o3v4-->o2v4
o3v6-->o2v6
style o1 fill:#fff0,stroke:#fff0
style o2 fill:#fff0,stroke:#fff0
style o3 fill:#fff0,stroke:#fff0
style o1v1 fill:#fff0,stroke:#fff0
style o1v5 fill:#fff0,stroke:#fff0
style o2v1 fill:#fff0,stroke:#fff0
style o2v5 fill:#fff0,stroke:#fff0
style o3v1 fill:#fff0,stroke:#fff0
style o3v5 fill:#fff0,stroke:#fff0
```
The erroneous state is not present anymore, since a default-constructed std::indirect always value-initializes the wrapped object.
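As I read the C++26 wording, the default constructor of std::indirect creates its owned object through its allocator with an empty argument list. This sketch does not use std::indirect itself; it mimics that step with std::allocator and std::allocator_traits in a hypothetical helper, default_constructed_owned_int, to show that constructing an int with no arguments value-initializes it to zero instead of leaving it erroneous.

```cpp
#include <memory>

// Allocates one int and constructs it with an empty argument list, the way an
// allocator-aware wrapper creates its owned object. Constructing with no
// arguments value-initializes, so the int is zero rather than erroneous.
int default_constructed_owned_int() {
    using traits = std::allocator_traits<std::allocator<int>>;
    std::allocator<int> alloc;
    int* p = traits::allocate(alloc, 1);
    traits::construct(alloc, p);  // value-initialization: *p == 0
    const int result = *p;
    traits::destroy(alloc, p);
    traits::deallocate(alloc, p, 1);
    return result;
}
```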
More interesting is the introduction of the “valueless after move” state. This state does not carry any meaning. You would assume that functions like operator== or the comparison operators would have a precondition that the object is not in this “valueless after move” state. And indeed, in p3019r3 we can find wording to that effect:
X.Y.8 Relational operators [indirect.rel]
template <class U, class AA> constexpr auto operator==(const indirect& lhs, const indirect<U, AA>& rhs) noexcept(noexcept(*lhs == *rhs));
[…]
2. Preconditions: lhs is not valueless, rhs is not valueless.
3. Effects: Returns *lhs op *rhs.
Contrast with the accepted p3019r13:
X.Y.8 Relational operators [indirect.relops]
template <class U, class AA> constexpr bool operator==(const indirect& lhs, const indirect<U, AA>& rhs) noexcept(noexcept(*lhs == *rhs));
[…]
2. Returns: If lhs is valueless or rhs is valueless, lhs.valueless_after_move() == rhs.valueless_after_move(); otherwise *lhs == *rhs.
The preconditions have been dropped. Two “valueless” indirects are now guaranteed to compare equal, and a valueless indirect compares unequal to any “valueful” one. Similarly, operator<=> orders valueless indirects before any valueful indirect.
I think this raises interesting questions. Is the “valueless after move” state now considered to be a proper value of std::indirect<T>? If not, is it still a conceptual error to call operator== and friends even though their behavior is now well defined? Is there any (generic) code that may call functions on valueless indirects?
p3019r13 has this to say:
“While the notion that a valueless indirect or polymorphic is toxic and must not be passed around code is appealing, it would not interact well with generic code which may need to handle a variety of types. […] We opt for consistency with existing standard library types (namely variant, which has a valueless state) and allow copy, move, assignment and move assignment of a valueless indirect and polymorphic.”
When I read this, I wondered why generic code would have to access the moved-from state. I would be very interested in a piece of generic code that actually has a good reason to execute those code paths.
The best answer I found is this Reddit comment by Howard Hinnant:
“A valid and correct sort algorithm could move from an object and then compare it with itself. This would not be an optimal algorithm, but it would be legal. Stranger things have happened. One implementation of std::reverse once swapped the middle element of an odd-numbered sequence with itself. Smart? Not really. Correct? Yes. Legal? Yes.”
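The std::reverse anecdote is easy to reproduce. The following sketch is not the implementation Hinnant refers to; Tracked, swap3 and naive_reverse are hypothetical names. For an odd-sized sequence, the loop reaches the middle element and swaps it with itself, and the three-move swap then move-assigns a moved-from object to itself. Tracked counts how often a moved-from object is read as the source of a move.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often a moved-from object is read
// as the source of a move.
struct Tracked {
    int value = 0;
    bool moved_from = false;
    static inline int moved_from_reads = 0;

    explicit Tracked(int v) : value(v) {}

    Tracked(Tracked&& other) noexcept : value(other.value) {
        if (other.moved_from) ++moved_from_reads;
        other.moved_from = true;
    }

    Tracked& operator=(Tracked&& other) noexcept {
        if (other.moved_from) ++moved_from_reads;
        const bool source_was_moved_from = other.moved_from;
        value = other.value;
        other.moved_from = true;
        moved_from = source_was_moved_from;  // also correct when &other == this
        return *this;
    }
};

// Plain three-move swap.
template <class T>
void swap3(T& a, T& b) {
    T tmp = std::move(a);
    a = std::move(b);
    b = std::move(tmp);
}

// Legal but suboptimal reverse: for odd sizes the loop reaches the middle
// element and swaps it with itself.
template <class T>
void naive_reverse(std::vector<T>& v) {
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < (n + 1) / 2; ++i) {
        swap3(v[i], v[n - 1 - i]);
    }
}
```

Reversing three elements produces the correct result, but reads a moved-from object exactly once, during the self-swap of the middle element.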
So this is the reason why the “regular” operations (copy, move, comparisons) should have defined semantics on a moved-from object. I would count std::hash in as well.² But this is just a concession to code that doesn’t behave very well. You should treat those operations as being undefined!
What about other operations, such as formatting? Should they return something valid for the moved-from state as well? The next paragraph from p3019r13 gives a sad answer:
“Like variant, indirect does not support formatting by forwarding to the owned object. There may be no owned object to format so we require the user to write code to determine how to format a valueless indirect or to validate that the indirect is not valueless before formatting *i (where i is an instance of indirect for some formattable type T).”
I think this is needlessly pessimistic. indirect<T> is a type that holds the same values as T. You should never even end up in a situation where the indirect<T> you are trying to format is potentially “valueless”.
This line of reasoning would only make sense if the value set of indirect<T> were not the same as that of T, but instead “the values of T plus one ‘valueless value’”:
```mermaid
block-beta
columns 6
o1["values:"]:1
block:group2:5
o1ve["∅"]
o1v0["-2147483648"]
o1v1["..."]
o1v2["-1"]
o1v3["0"]
o1v4["1"]
o1v5["..."]
o1v6["2147483647"]
end
o2["value type"]:1
block:group3:5
o2ve["0x000000000"]
o2v0["0x180000000"]
o2v1["..."]
o2v2["0x1FFFFFFFF"]
o2v3["0x100000000"]
o2v4["0x100000001"]
o2v5["..."]
o2v6["0x17FFFFFFF"]
end
o3["object type\n(little endian)"]:1
block:group4:5
o3ve["<code>nullptr</code>\n(valueless\nafter move)"]
o3v0["ptr→\n0x00000080"]
o3v1["..."]
o3v2["ptr→\n0xFFFFFFFF"]
o3v3["ptr→\n0x00000000"]
o3v4["ptr→\n0x01000000"]
o3v5["..."]
o3v6["ptr→\n0xFFFFFF7F"]
end
o2ve-->o1ve
o2v0-->o1v0
o2v2-->o1v2
o2v3-->o1v3
o2v4-->o1v4
o2v6-->o1v6
o3ve-->o2ve
o3v0-->o2v0
o3v2-->o2v2
o3v3-->o2v3
o3v4-->o2v4
o3v6-->o2v6
style o1 fill:#fff0,stroke:#fff0
style o2 fill:#fff0,stroke:#fff0
style o3 fill:#fff0,stroke:#fff0
style o1v1 fill:#fff0,stroke:#fff0
style o1v5 fill:#fff0,stroke:#fff0
style o2v1 fill:#fff0,stroke:#fff0
style o2v5 fill:#fff0,stroke:#fff0
style o3v1 fill:#fff0,stroke:#fff0
style o3v5 fill:#fff0,stroke:#fff0
```
In this case, encountering a “valueless” indirect<T> would be perfectly normal, in the same way as encountering an empty std::optional<T> is.
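For comparison, here is a short sketch of how std::optional treats its empty state as an ordinary value: it compares equal to other empty optionals and orders before every engaged one. Notably, moving from an engaged optional does not make it empty, so “empty” is never an artifact of a move. The helper name moved_from_optional_still_engaged is hypothetical.

```cpp
#include <optional>
#include <utility>

// Moving from an engaged std::optional does NOT make it empty:
// the source still holds a (moved-from) int.
bool moved_from_optional_still_engaged() {
    std::optional<int> a = 5;
    std::optional<int> b = std::move(a);
    return a.has_value() && b == 5;
}
```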
But because indirect<T> is not like this, throwing an exception when trying to format a moved-from indirect<T> would be perfectly fine (and, I believe, would still satisfy the Formatter named requirement).
Conclusion
So, what is a std::indirect<T>? I’d like to think it is conceptually just a T, holding the same values as T.
This paragraph from p3019r13 makes me think I’m on the right track:
“Both indirect and polymorphic have a valueless state that is used to implement move. The valueless state is not intended to be observable to the user. There is no operator bool or has_value member function. Accessing the value of an indirect or polymorphic after it has been moved from is undefined behaviour.”
Reducing UB in the specification of indirect was most likely motivated by making the class safe to use in generic algorithms, even those that may not always behave well while still being “correct” according to the standard. This doesn’t mean that indirect’s designers elevated the “valueless” state into a proper value. It’s even in the name!
You should not even have to think about the valueless state in your code. Treat !valueless_after_move() as an implicit precondition of all functions. Make sure you don’t maneuver yourself into a situation where you feel the need to check for the valueless state. If you absolutely need to deal with valueless indirects, don’t rely on their behavior on assignment, comparison, etc.
Knowing which values a type can hold is essential for reasoning about code.
1. One reason they defined NaN == NaN to be false is so that users can detect NaNs on machines and in languages that don’t have an IsNaN(x) function (quote from here): “The exceptions are C predicates x == x and x != x, which are respectively 1 and 0 for every infinite or finite number x but reverse if x is Not a Number (NaN); these provide the only simple unexceptional distinction between NaNs and numbers in languages that lack a word for NaN and a predicate IsNaN(x). Over-optimizing compilers that substitute 1 for x == x violate IEEE 754.”
2. What all those operations have in common is that they:
   - can be generated by the compiler (at least in principle; std::hash currently is not)
   - operate memberwise
   Since they operate memberwise, they don’t necessarily require “valid” objects as arguments, where “valid” means “a valid value of the type”. They only need each member to also be “valid regarding the regular operations, but otherwise unspecified”. In this way, this property composes! It slots right in between the well-known basic guarantee, where all invariants of the object hold, and the minimal guarantee, where an object is just destructible (and, I guess, assignable-to). Maybe call this guarantee the “regular guarantee”?