The “unsigned” Conundrum
A few weeks ago, CppCon16 conference organizer Jon Kalb gave a great little lightning talk titled “unsigned: A Guideline For Better Code”. Right up front, he asked the audience what they thought this code would print out to the standard console:
Even though -1 is obviously less than 1, the program prints out “a is not less than b”. WTF?
The apparently erroneous result is due to the convoluted type conversion rules inherited from C regarding unsigned/signed types.
Before evaluating the (a < b) expression, the rules dictate that the signed int object, a, gets implicitly converted to an unsigned int type. For an 8-bit CPU, the figure below shows how the bit pattern 0xFF is interpreted differently by C/C++ compilers depending upon how it is declared:
Thus, after the implicit type conversion of a from -1 to 255, the comparison expression becomes (255 < 1) – which produces the “a is not less than b” output.
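The snippet from the talk is not reproduced here, so the following is only a minimal sketch of the kind of comparison described above. The function names compare_mixed and compare_widened are my own, not Mr. Kalb's. Note that on a typical platform with a 32-bit unsigned int, -1 converts to UINT_MAX (4294967295) rather than the 255 of the 8-bit illustration. The second function assumes long long is wide enough to hold every unsigned int value, which holds on mainstream platforms but is not guaranteed by the standard.

```cpp
#include <string>

// Hypothetical reconstruction: mixes signed and unsigned in one comparison.
std::string compare_mixed(int a, unsigned int b) {
    // `a` is implicitly converted to unsigned int before the comparison,
    // so a negative `a` turns into a very large positive value.
    if (a < b) {
        return "a is less than b";
    }
    return "a is not less than b";
}

// One possible workaround: widen both operands to a signed type first.
// Assumption: long long can represent every unsigned int value.
std::string compare_widened(int a, unsigned int b) {
    if (static_cast<long long>(a) < static_cast<long long>(b)) {
        return "a is less than b";
    }
    return "a is not less than b";
}
```

Calling compare_mixed(-1, 1u) takes the "not less than" branch, while compare_widened(-1, 1u) gives the answer most people expect. Most compilers flag the first comparison when sign-compare warnings are enabled.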
Since it’s unreasonable to expect most C++ programmers to remember the entire arcane rule set for implicit conversions/promotions, what heuristic should programmers use to prevent nasty unsigned surprises like Mr. Kalb’s example? Here is his list of initial candidates:
If you’re trolling this post and you’re a C++ hater, then the first guideline is undoubtedly your choice :). If you’re a C++ programmer, the second two are pretty much impractical – especially since unsigned (in the form of size_t) is used liberally throughout the C++ standard library. (By the way, I once heard Bjarne Stroustrup say in a video talk that requiring size_t to be unsigned was a mistake). The third and fourth guidelines are reasonable suggestions; and those are the ones I use in writing my own code and reviewing the code of others.
At the end of his interesting talk, Mr. Kalb presented his own guideline:
I think Jon’s guideline is a nice, thoughtful addition to the last two guidelines on the previous chart. I would like to say that “Don’t use “unsigned” for quantities” subsumes those two, but I’m not sure it does. What do you think?
I agree it was a thoughtful talk, although the fact that we have size_t everywhere does make the advice a bit troublesome. Chandler’s undefined behavior talk made a more interesting case. Around 40 minutes in (https://youtu.be/yG1OZ69H_-o?t=39m41s), he gives an interesting example where using an unsigned type as an index results in a pessimization. This is a more convincing case, but I had problems godbolting an idiomatic example using containers and size_t or uint32_t that had a similar issue.
Typo: it’s “Jon Kalb”, not “John Kalb”.
I fixed it. Thanks 🙂
I think you missed it in one case: “I think John’s guideline…” and in the tags.
Fixed ’em. 🙂
implicit casts, the root of all evil 😉
Not all evil, just a lot of evil. 🙂
I’ve been playing with foonathan’s type_safe library, which promises to fix both the mixing of unsigned/signed and the pessimistic optimization problems.
I follow the rule of never using unsigned types (unless you absolutely have to). The biggest problem with this is when you have to call the size() member of the standard containers and you get back a size_t. To get around this problem I use a simple template function to cast it to the signed type I want. So instead of using the ugly:
I use
which (at least to me) is slightly less ugly.
Looks like the commenting system messed up the templates… it should have the type int after static_cast, and also after my size function… I give up.
Let me try again: Instead of
static_cast<int>(myvec.size())
I use
size<int>(myvec)
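The commenter's actual helper was not shown, so this is just one way such a function might look. The util namespace is my own addition (it keeps the name from colliding with std::size in C++17 and later), and the cast silently truncates if the container holds more elements than the target type can represent.

```cpp
#include <vector>

namespace util {  // hypothetical namespace, not from the comment

// Returns c.size() converted to the requested type T.
// No range check: the caller is assumed to know the size fits in T.
template <typename T, typename Container>
T size(const Container& c) {
    return static_cast<T>(c.size());
}

}  // namespace util
```

With this in place, util::size<int>(myvec) reads the same as the size<int>(myvec) call above.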
Q1. “What does this code do?”
A1. Trigger a warning.
Q2. “What guideline will help?”
A2. Compile with warnings enabled.
+1
It’s stupid advice to use signed ints instead of unsigned ones everywhere. Just look at Java: https://blogs.oracle.com/darcy/entry/unsigned_api. After 20 years they added some clumsy support for unsigned integers.
The rule is straightforward: use unsigned integers where you need non-negative integer values. But don’t mix them with signed ones in one expression – use explicit casts as appropriate.
Over the years I have been burnt with unsigned just too many times… not worth it for that one extra bit. So now unless I am doing something like bit manipulation I find it safer to just avoid unsigned altogether.
It is not just “one extra bit”. It is a natural representation for object size. In 16-bit address space you need either uint16_t or int32_t which is a double word on this arch.
Second, it is stupid to write things like this:
void f(int arg);
void f(int arg)
{
    if (arg < 0) throw std::domain_error("The argument must be non-negative");
    …
}
When you must state it directly in the interface of the function:
void f(unsigned arg);
I actually prefer your f(int) to this:
void f(unsigned arg)
{
    if (arg > (UINT_MAX >> 1)) throw std::domain_error("Wow, that seems a bit high. Are you sure your calculation did not wrap into negative");
    …
}
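To make the point of the f(unsigned) variant concrete, here is a small sketch of what happens when a caller holding a signed value calls it. The call_rejects helper and the shortened error message are my own additions, and the body of f is reduced to the check alone.

```cpp
#include <climits>
#include <stdexcept>

// Reduced version of the f(unsigned) above: only the range heuristic remains.
void f(unsigned arg) {
    if (arg > (UINT_MAX >> 1)) {
        throw std::domain_error("argument looks like a wrapped negative value");
    }
}

// Hypothetical caller: the int is implicitly converted to unsigned at the
// call site, so f never sees the negative value itself.
bool call_rejects(int value) {
    try {
        f(value);
    } catch (const std::domain_error&) {
        return true;
    }
    return false;
}
```

Here call_rejects(-1) returns true only because -1 converts to UINT_MAX and trips the heuristic; the function has no way to tell that apart from a caller who really meant a huge value.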
I agree with __vic. size_t is exactly what it says on the tin: the size of an object in memory. To make it signed is a total perversion of that concept.
I’ve seen so many awful examples of mixing signed and unsigned and there lies the real danger.
I do agree with the logic of not doing arithmetic on size_t values though. It makes no semantic sense to do so, which is why we have things like ptrdiff_t.
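A quick sketch of why arithmetic on size_t values bites, using an empty vector. The two function names are made up for illustration, and the signed version assumes the size fits in ptrdiff_t.

```cpp
#include <cstddef>
#include <vector>

// Arithmetic done directly on size_t: wraps around when v is empty.
std::size_t last_index_unsigned(const std::vector<int>& v) {
    return v.size() - 1;  // 0 - 1 wraps to the largest size_t value
}

// Same arithmetic after converting to ptrdiff_t: yields -1 when v is empty.
// Assumption: v.size() is small enough to fit in std::ptrdiff_t.
std::ptrdiff_t last_index_signed(const std::vector<int>& v) {
    return static_cast<std::ptrdiff_t>(v.size()) - 1;
}
```

For an empty vector, the first function returns a huge index while the second returns -1, which is at least easy to test for.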
+1
The compiler should already bark at the illegal assignment of the negative value.
Bulldozer00, Thanks for the posting.
Eelis and __vic, I encourage you to watch the video (it is only six minutes). It may not change your minds, but I feel you owe it to yourselves to at least understand the arguments.
Brolloks, I really appreciated your comment (with the domain_error), but it looks like I can’t reply directly to it (I assume because it is too far down the hierarchy).
Jon
It seems that I got the wrong link:
Watched, thanks. Still find the argumentation doubtful.
Taking the opportunity, I would like to thank you for your talks/writings about exceptions. They were really eye-opening for me at the time 🙂
__vic, Thank you for the kind words.
Ur welcome. Thanks for the interesting lightning talk Jon!
For the definitive solution to this, check out the safe numerics project at the Boost Library Incubator – http://www.blincubator.com