
The “unsigned” Conundrum

A few weeks ago, CppCon16 conference organizer Jon Kalb gave a great little lightning talk titled “unsigned: A Guideline For Better Code”. Right up front, he asked the audience what they thought this code would print out to the standard console:

[Image: mixed-mode-math code listing]

Even though -1 is obviously less than 1, the program prints out “a is not less than b”. WTF?

The apparently erroneous result is due to the convoluted type conversion rules for signed/unsigned types that C++ inherited from C.

Before evaluating the (a < b) expression, the rules dictate that the signed int object, a, gets implicitly converted to unsigned int. Using an 8-bit value for illustration, the figure below shows how the bit pattern 0xFF is interpreted differently by C/C++ compilers depending upon how the object is declared:

 

[Image: eightbits, the bit pattern 0xFF interpreted as signed and as unsigned]

 

Thus, after the implicit type conversion of a from -1 to 255, the comparison expression becomes (255 < 1) –  which produces the “a is not less than b” output.
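The listing from the talk is not reproduced here, so the following is only a minimal sketch of the same effect with ordinary int and unsigned int objects. The helper names mixed_less and converted are hypothetical, chosen for illustration. On a real platform the converted value is the largest unsigned int rather than 255, but the mechanism is the one described above.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// The usual arithmetic conversions pick unsigned int as the common type
// of int and unsigned int, so the int operand is converted first.
static_assert(std::is_same<decltype(0 + 0u), unsigned int>::value,
              "int combined with unsigned int yields unsigned int");

// Converting -1 to unsigned int wraps around to the largest value.
static_assert(static_cast<unsigned int>(-1) ==
                  std::numeric_limits<unsigned int>::max(),
              "-1 converts to the maximum unsigned int");

// Hypothetical helper: the mixed comparison itself.
// Compilers typically warn about this line when warnings are enabled
// (for example -Wsign-compare in GCC and Clang).
bool mixed_less(int a, unsigned int b) {
    return a < b;  // evaluated as static_cast<unsigned int>(a) < b
}

// Hypothetical helper: the value `a` has after the implicit conversion.
unsigned int converted(int a) {
    return static_cast<unsigned int>(a);
}
```

Calling mixed_less(-1, 1u) returns false, which corresponds to the “a is not less than b” branch, while mixed_less(0, 1u) behaves as expected and returns true.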

Since it’s unreasonable to expect most C++ programmers to remember the entire arcane rule set for implicit conversions/promotions, what heuristic should programmers use to prevent nasty unsigned surprises like Mr. Kalb’s example?  Here is his list of initial candidates:

 

[Image: list of candidate guidelines]

If you’re trolling this post and you’re a C++ hater, then the first guideline is undoubtedly your choice :). If you’re a C++ programmer, the second two are pretty much impractical – especially since unsigned (in the form of size_t) is used liberally throughout the C++ standard library. (By the way, I once heard Bjarne Stroustrup say in a video talk that requiring size_t to be unsigned was a mistake). The third and fourth guidelines are reasonable suggestions; and those are the ones I use in writing my own code and reviewing the code of others.

At the end of his interesting talk, Mr. Kalb presented his own guideline:

[Image: Jon Kalb's guideline]

I think Jon’s guideline is a nice, thoughtful addition to the last two guidelines on the previous chart. I would like to say that “Don’t use “unsigned” for quantities” subsumes those two, but I’m not sure it does. What do you think?

  1. October 16, 2016 at 12:54 pm

    I agree it was a thoughtful talk, although the fact that we have size_t everywhere does make the advice a bit troublesome. Chandler’s undefined behavior talk made a more interesting case. Around 40 minutes in: https://youtu.be/yG1OZ69H_-o?t=39m41s he gives an interesting example where using an unsigned type as an index results in a pessimization. This is a more convincing case, but I had problems godbolting an idiomatic example using containers and size_t or uint32_t that had a similar issue.

  2. October 16, 2016 at 3:54 pm

    Typo: it’s “Jon Kalb”, not “John Kalb”.

    • October 16, 2016 at 5:11 pm

      I fixed it. Thanks 🙂

      • JonKalb
        October 19, 2016 at 2:03 pm

        I think you missed it in one case: “I think John’s guideline…” and in the tags.

      • October 19, 2016 at 8:02 pm

        Fixed ’em. 🙂

  3. October 17, 2016 at 6:06 am

    implicit casts, the root of all evil 😉

    • JonKalb
      October 19, 2016 at 2:06 pm

      Not all evil, just a lot of evil. 🙂

  4. fisherro
    October 18, 2016 at 6:19 pm

    I’ve been playing with foonathan’s type_safe library, which promises to fix both the mixing of unsigned/signed and the pessimistic optimization problems.

  5. October 19, 2016 at 2:01 am

    I follow the rule of never using unsigned types (unless you absolutely have to). The biggest problem with this is when you have to call the size() member of the standard containers and you get back a size_t. To get around this problem I use a simple template function to cast it to the signed type I want. So instead of using the ugly:

     static_cast(myvec.size()) 

    I use

     size(myvec) 

    which (at least to me) is slightly less ugly.

    • October 19, 2016 at 2:08 am

      Looks like the commenting system messed up the templates… it should have the type int after static_cast, and also after my size function… I give up.

    • October 19, 2016 at 7:39 am

      Let me try again: Instead of
      static_cast<int>(myvec.size())
      I use
      size<int>(myvec)
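The commenter does not show the helper itself, so this is only a guess at its shape. The name size_as is hypothetical (picked to avoid clashing with std::size from C++17), and the range check is an addition that the comment does not mention.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: returns c.size() converted to the requested
// integer type T, so call sites avoid a bare static_cast.
template <typename T, typename Container>
T size_as(const Container& c) {
    // Added safety check (an assumption, not part of the comment):
    // the size must be representable in T.
    assert(static_cast<std::uintmax_t>(c.size()) <=
           static_cast<std::uintmax_t>(std::numeric_limits<T>::max()));
    return static_cast<T>(c.size());
}
```

With a three-element vector v, size_as<int>(v) - 4 evaluates to -1, whereas v.size() - 4 would wrap around to a very large unsigned value.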

  6. Eelis
    October 19, 2016 at 6:03 am

    Q1. “What does this code do?”
    A1. Trigger a warning.

    Q2. “What guideline will help?”
    A2. Compile with warnings enabled.

    • October 19, 2016 at 7:13 am

      +1

      It’s stupid advice to use signed ints instead of unsigned ones everywhere. Just look at Java: https://blogs.oracle.com/darcy/entry/unsigned_api. After 20 years they added some clumsy support for unsigned integers.

      The rule is straightforward: use unsigned integers where you need non-negative integer values. But don’t mix them in the one expression – use explicit casts as appropriate.
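One caveat with explicit casts is that casting a negative int to unsigned reproduces the surprise from the post. A rough sketch of a comparison that checks the sign before casting might look like the following. The name safe_less is hypothetical. C++20 provides std::cmp_less in <utility> for the same purpose.

```cpp
#include <cassert>

// Hypothetical helper: compares a signed and an unsigned value by their
// mathematical values instead of relying on the implicit conversion.
bool safe_less(int a, unsigned int b) {
    // A negative value is smaller than every unsigned value.
    if (a < 0) return true;
    // Here a >= 0, so the cast preserves its value.
    return static_cast<unsigned int>(a) < b;
}

// Same idea with the operand types swapped.
bool safe_less(unsigned int a, int b) {
    // No unsigned value is smaller than a negative one.
    if (b < 0) return false;
    return a < static_cast<unsigned int>(b);
}
```

With these overloads, safe_less(-1, 1u) returns true and safe_less(1u, -1) returns false, matching the arithmetic a reader would expect.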

      • October 19, 2016 at 7:59 am

        Over the years I have been burnt with unsigned just too many times… not worth it for that one extra bit. So now unless I am doing something like bit manipulation I find it safer to just avoid unsigned altogether.

      • October 19, 2016 at 8:25 am

        It is not just “one extra bit”. It is a natural representation for object size. In 16-bit address space you need either uint16_t or int32_t which is a double word on this arch.

        Second, it is stupid to write things like this:

        void f(int arg);

        void f(int arg)
        {
        if(arg < 0) throw std::domain_error("The argument must be non-negative");

        }

        When you must state it directly in the interface of the function:

        void f(unsigned arg);

      • October 19, 2016 at 8:35 am

        When you must state it directly
        Please read as
        “When you can state it directly”

      • October 19, 2016 at 8:39 am

        I actually prefer your f(int) to this:

        void f(unsigned arg)
        {
        if(arg > (UINT_MAX>>1)) throw std::domain_error("Wow, that seems a bit high. Are you sure your calculation did not wrap into negative?");

        }
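To make the disagreement concrete, here is a small sketch of what a function with an unsigned parameter actually receives when a caller passes a negative int. The names received and looks_wrapped are hypothetical. looks_wrapped mirrors the threshold used in the comment above.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for f(unsigned): returns the value it received.
unsigned int received(unsigned int arg) {
    return arg;
}

// The heuristic from the comment above: values in the upper half of the
// unsigned range are suspected to be wrapped negative numbers.
bool looks_wrapped(unsigned int arg) {
    return arg > (UINT_MAX >> 1);
}
```

A call such as received(-1) compiles and yields UINT_MAX, so the unsigned signature alone does not reject negative arguments. It only changes what they look like inside the function.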

      • skelband
        October 19, 2016 at 12:28 pm

        I agree with __vic. size_t is exactly what it says on the tin: the size of an object in memory. To make it signed is a total perversion of that concept.
        I’ve seen so many awful examples of mixing signed and unsigned and there lies the real danger.
        I do agree with the logic of not doing arithmetic on size_t values though. It makes no semantic sense to do so, which is why we have things like ptrdiff_t.

    • Bitbeisser
      October 19, 2016 at 10:55 am

      +1
      The compiler should already bark at the illegal assignment of the negative value.

  7. JonKalb
    October 19, 2016 at 2:14 pm

    Bulldozer00, Thanks for the posting.

    Eelis and __vic, I encourage you to watch the video (it is only six minutes). It may not change your minds, but I feel you owe it to yourselves to at least understand the arguments.

    Brolloks, I really appreciated your comment (with the domain_error), but it looks like I can’t reply directly to it (I assume because it is too far down the hierarchy).

    Jon

    • JonKalb
      October 19, 2016 at 9:35 pm

      It seems that I got the wrong link:

    • October 20, 2016 at 2:12 pm

      Watched, thanks. Still find the argumentation doubtful.

      Taking the opportunity I would like to thank you for your talks/writings about exceptions. They were really eye-opening some time for me 🙂

      • JonKalb
        October 20, 2016 at 4:36 pm

        __vic, Thank you for the kind words.

    • October 20, 2016 at 4:42 pm

      Ur welcome. Thanks for the interesting lightning talk Jon!

  8. October 19, 2016 at 3:21 pm

    For the definitive solution to this, check out the safe numerics project at the Boost Library Incubator – http://www.blincubator.com
