Unique IDs in Golang, part 2

This is a continuing series on UID alternatives:

  • Part1 Introduces the topic
  • Part2 Talks about UUID (this post)
  • Part3 Talks about ULID

Universal Unique Identifiers (UUID) are an standard way of generating and representing 128-bit numbers to be used as identifiers.

The standard RFC 4122 defines a way to represent these 128 bits in a string of 32 hexadecimal digits, separated with dashes in this arrangement 8-4-4-4-12, looking like:

12345678-1234-V234-v234-123456789abc

Taking a total of 36 characters. The two positions we have marked with V and v in the example have a special meaning:

The three most significant bits of V denote the UUID variant. The RFC specifies 3:

Both V1/V2 are the same except for the V value and the endianness in byte representation, Variant 1 uses little-endian, variant 2, big-endian.

The 4 most significant bits of v denotes the UUID version. Both variants support 5 versions of UUIDs:

The meaning of each dash-separated section on the string representation varies on each version.

Because type 4 is the most flexible, falling in the Random UID category that we outlined in part 1, we are gonna center the examples on that version.

Golang RNG

Before comparing particular UUID packages, it’s interesting to point out that there are two ways to generate random numbers in the Golang standard library, crypto/rand and math/rand. Without entering into details, the main difference is that crypto/rand uses a cryptographically secure source of entropy from the operating system (/dev/urandom) if available, or falls back to a cryptographically secure algorithm for generating random values. This involves system calls and isn’t fast. In contrast, math/rand uses a faster algorithm, but even when properly seeded it isn’t suitable for security-related uses. Since UUID are based on good-quality randomness, crypto/rand is the logical choice for most packages, but some use cases might prefer a faster approach.

Also note that it is possible to use some strategies to get the most of crypto/rand, and that naive usage of math/rand isn’t optimal, specially in concurrent scenarios, which are the norm in most non-trivial Golang projects.

Golang UUID packages

By far, the most popular package for handling UUID in Golang is satori/go.uuid, with 1136 stars on Github at the time of writing. It supports the five UUID versions, it’s well tested and documented. This package uses crypto/rand to generate the random bits so it’s as secure as it can be, but I couldn’t see how to specify my own source of entropy so it’s not easy to get more performance if needed.

A contender that is also well tested and RFC-compliant is pborman/uuid. It also uses crypto/rand under the hood, but it includes a func SetRand(io.Reader) function to easily set our own source of entropy, potentially enabling us to get more performance or fine-tune it for particular scenarios were math/rand is acceptable and we want more speed.

With both packages claiming RFC compliance and being well tested, let’s center the comparison on performance, as for some systems generating and parsing identifiers is part of the core functionality, so the format being equal, this could be the deciding factor on which package to use.

Generating Performance

For this test we’ll be generating random, version 4 UUIDs. We’ll use this benchmark code:

  package main

  import (
      "math/rand"
      "testing"

      pborman "github.com/pborman/uuid"
      satori "github.com/satori/go.uuid"
  )

  func BenchmarkSatoriV4(b *testing.B) {
      for i := 0; i < b.N; i++ {
          satori.NewV4()
      }
  }

  func BenchmarkPbormanV4(b *testing.B) {
      for i := 0; i < b.N; i++ {
          pborman.NewRandom()
      }
  }

  func BenchmarkPbormanV4MathRand(b *testing.B) {
      rsource := rand.New(rand.NewSource(1)) // Unsafe concurrent use!
      pborman.SetRand(rsource)
      for i := 0; i < b.N; i++ {
          pborman.NewRandom()
      }
  }

Running with go test uuid_generating_test.go -bench=. gives us:

  BenchmarkSatoriV4-8              3000000           497 ns/op
  BenchmarkPbormanV4-8             3000000           499 ns/op
  BenchmarkPbormanV4MathRand-8    20000000            72.6 ns/op

In their default mode, they have pretty much the same performance, which makes sense, but pborman/uuid can be more customizable. For the shake of comparison, I used a math/rand RNG without any mutex, so really fast, naive, and unsafe for concurrent use, but this could be ok having a separate RNG per goroutine or connection, or using a sync/pool, so it’s still a useful benchmark.

To/From string performance

Another typical usage is to marshal/unmarshal UUIDs to/from string, for example to save them into text files, logs, or databases.

Let’s benchmark the two contenders for this common use case with this code:

  package main

  import (
      "testing"

      pborman "github.com/pborman/uuid"
      satori "github.com/satori/go.uuid"
  )

  var (
      pbormanUUID = pborman.NewRandom()
      satoriUUID  = satori.NewV4()
      UUIDstring  = satoriUUID.String()
  )

  func BenchmarkSatoriToString(b *testing.B) {
      for i := 0; i < b.N; i++ {
          _ = satoriUUID.String()
      }
  }

  func BenchmarkPbormanToString(b *testing.B) {
      for i := 0; i < b.N; i++ {
          _ = pbormanUUID.String()
      }
  }

  func BenchmarkSatoriFromString(b *testing.B) {
      for i := 0; i < b.N; i++ {
          satori.FromString(UUIDstring)
      }
  }

  func BenchmarkPbormanFromString(b *testing.B) {
      for i := 0; i < b.N; i++ {
          pborman.Parse(UUIDstring)
      }
  }

Again running with go test uuid_string_test.go -bench=. gives us:

  BenchmarkSatoriToString-8       20000000            95.2 ns/op
  BenchmarkPbormanToString-8      20000000            92.0 ns/op
  BenchmarkSatoriFromString-8     10000000           142 ns/op
  BenchmarkPbormanFromString-8    20000000            62.1 ns/op

Seems like parsing an UUID to a string is roughly equivalent, but satori/go.uuid is 2x slower than pborman/uuid when parsing from a string.

The culprit seems to be the function UnmarshalText which, among other things, supports an extra UUID format, {UUID} (sometimes used by Microsoft GUIDs) that pborman/uuid Parse doesn’t, thus doing less work. But, what if we need that format? Using pborman/uuid would require code like this:

  func withBraceParse(s string) pborman.UUID {
      if s[0] == '{' {
          s = s[1 : len(s)-1]
      }
      return pborman.Parse(s)
  }

Which, when benchmarked with this code is still much faster than satori/go.uuid for this operation:

  BenchmarkSatoriFromBraceString-8        10000000           135 ns/op
  BenchmarkPbormanFromBraceString-8       20000000            70.2 ns/op

So the need to support that format isn’t a selling point for satori/go.uuid.

In conclusion, for easy of use, start with satori/go.uuid. However if you want more performance or customization, pborman/uuid seems to be the way to go.

Shortcomings of UUID.

While UUID usage is widespread, it isn’t without shortcomings:

UUID Alternatives

Tackling UUID shortcomings, while still being simple and random-generated UIDs, the most popular alternative is perhaps ULID, and a discussion of the format with comparison of Golang libraries implementing forms part three.

Hybrid sequential + random UID systems can offer even better performance by reducing the amount of entropy needed for each single UID, at the cost of being more complex to setup and operate. A popular system using this schema is Twitter’s Snowflake. We’ll be discussing an Snowflake alternative, noeqd, and it’s Golang library, in part 4.

References

comments powered by Disqus