Unique IDs in Golang, part 2

May 28, 2017 · 7 minute read · Comments
programming golang uid uuid

This is a continuing series on UID alternatives:

Part1 Introduces the topic

Part2 Talks about UUID (this post)

Part3 Talks about ULID

Universal Unique Identifiers (UUID) are an standard way of generating and representing 128-bit numbers to be used as identifiers.

The standard RFC 4122 defines a way to represent these 128 bits in a string of 32 hexadecimal digits, separated with dashes in this arrangement 8-4-4-4-12, looking like:

12345678-1234-V234-v234-123456789abc

Taking a total of 36 characters. The two positions we have marked with V and v in the example have a special meaning:

The three most significant bits of V denote the UUID variant. The RFC specifies 3:

Variant 0 is obsolete
Variant 1 is the most commonly used
Variant 2 was used for GUIDs in the early Windows versions and is reserved for that

Both V1/V2 are the same except for the V value and the endianness in byte representation, Variant 1 uses little-endian, variant 2, big-endian.

The 4 most significant bits of v denotes the UUID version. Both variants support 5 versions of UUIDs:

Type 1 concatenates the unique MAC address of the generating node with a 60-bit timestamp, corresponding to a single point of time and space and thus deemed unique. It’s however a security concern as the MAC address can be traced to the generating node
Type 2 is similar to type 1, but the timestamp is truncated to accommodate a “local domain” number, representing user ids, group ids, or the like. The reduced timestamp resolution means that type 2 is not suitable for cases were the UUIDs are issued per node/domain at a rate exceeding 1 every 7 seconds
Type 3 deterministically hashes a predefined name from a namespace with md5 to get the UUID. Not suitable for security credentials, but useful to assign an UUID to some entity with an already unique name in another format
Type 5 as type 3, but uses SHA1 as the hash, preferable over type 3, but still not suitable for security credentials
Type 4 is completely random with 122 bits of entropy

The meaning of each dash-separated section on the string representation varies on each version.

Because type 4 is the most flexible, falling in the Random UID category that we outlined in part 1, we are gonna center the examples on that version.

Golang RNG

Before comparing particular UUID packages, it’s interesting to point out that there are two ways to generate random numbers in the Golang standard library, crypto/rand and math/rand. Without entering into details, the main difference is that crypto/rand uses a cryptographically secure source of entropy from the operating system (/dev/urandom) if available, or falls back to a cryptographically secure algorithm for generating random values. This involves system calls and isn’t fast. In contrast, math/rand uses a faster algorithm, but even when properly seeded it isn’t suitable for security-related uses. Since UUID are based on good-quality randomness, crypto/rand is the logical choice for most packages, but some use cases might prefer a faster approach.

Also note that it is possible to use some strategies to get the most of crypto/rand, and that naive usage of math/rand isn’t optimal, specially in concurrent scenarios, which are the norm in most non-trivial Golang projects.

Golang UUID packages

By far, the most popular package for handling UUID in Golang is satori/go.uuid, with 1136 stars on Github at the time of writing. It supports the five UUID versions, it’s well tested and documented. This package uses crypto/rand to generate the random bits so it’s as secure as it can be, but I couldn’t see how to specify my own source of entropy so it’s not easy to get more performance if needed.

A contender that is also well tested and RFC-compliant is pborman/uuid. It also uses crypto/rand under the hood, but it includes a func SetRand(io.Reader) function to easily set our own source of entropy, potentially enabling us to get more performance or fine-tune it for particular scenarios were math/rand is acceptable and we want more speed.

With both packages claiming RFC compliance and being well tested, let’s center the comparison on performance, as for some systems generating and parsing identifiers is part of the core functionality, so the format being equal, this could be the deciding factor on which package to use.

Generating Performance

For this test we’ll be generating random, version 4 UUIDs. We’ll use this benchmark code:

  package main

  import (
      "math/rand"
      "testing"

      pborman "github.com/pborman/uuid"
      satori "github.com/satori/go.uuid"
  )

  func BenchmarkSatoriV4(b *testing.B) {
      for i := 0; i < b.N; i++ {
          satori.NewV4()
      }
  }

  func BenchmarkPbormanV4(b *testing.B) {
      for i := 0; i < b.N; i++ {
          pborman.NewRandom()
      }
  }

  func BenchmarkPbormanV4MathRand(b *testing.B) {
      rsource := rand.New(rand.NewSource(1)) // Unsafe concurrent use!
      pborman.SetRand(rsource)
      for i := 0; i < b.N; i++ {
          pborman.NewRandom()
      }
  }

Running with go test uuid_generating_test.go -bench=. gives us:

  BenchmarkSatoriV4-8              3000000           497 ns/op
  BenchmarkPbormanV4-8             3000000           499 ns/op
  BenchmarkPbormanV4MathRand-8    20000000            72.6 ns/op

In their default mode, they have pretty much the same performance, which makes sense, but pborman/uuid can be more customizable. For the shake of comparison, I used a math/rand RNG without any mutex, so really fast, naive, and unsafe for concurrent use, but this could be ok having a separate RNG per goroutine or connection, or using a sync/pool, so it’s still a useful benchmark.

To/From string performance

Another typical usage is to marshal/unmarshal UUIDs to/from string, for example to save them into text files, logs, or databases.

Let’s benchmark the two contenders for this common use case with this code:

  package main

  import (
      "testing"

      pborman "github.com/pborman/uuid"
      satori "github.com/satori/go.uuid"
  )

  var (
      pbormanUUID = pborman.NewRandom()
      satoriUUID  = satori.NewV4()
      UUIDstring  = satoriUUID.String()
  )

  func BenchmarkSatoriToString(b *testing.B) {
      for i := 0; i < b.N; i++ {
          _ = satoriUUID.String()
      }
  }

  func BenchmarkPbormanToString(b *testing.B) {
      for i := 0; i < b.N; i++ {
          _ = pbormanUUID.String()
      }
  }

  func BenchmarkSatoriFromString(b *testing.B) {
      for i := 0; i < b.N; i++ {
          satori.FromString(UUIDstring)
      }
  }

  func BenchmarkPbormanFromString(b *testing.B) {
      for i := 0; i < b.N; i++ {
          pborman.Parse(UUIDstring)
      }
  }

Again running with go test uuid_string_test.go -bench=. gives us:

  BenchmarkSatoriToString-8       20000000            95.2 ns/op
  BenchmarkPbormanToString-8      20000000            92.0 ns/op
  BenchmarkSatoriFromString-8     10000000           142 ns/op
  BenchmarkPbormanFromString-8    20000000            62.1 ns/op

Seems like parsing an UUID to a string is roughly equivalent, but satori/go.uuid is 2x slower than pborman/uuid when parsing from a string.

The culprit seems to be the function UnmarshalText which, among other things, supports an extra UUID format, {UUID} (sometimes used by Microsoft GUIDs) that pborman/uuid Parse doesn’t, thus doing less work. But, what if we need that format? Using pborman/uuid would require code like this:

  func withBraceParse(s string) pborman.UUID {
      if s[0] == '{' {
          s = s[1 : len(s)-1]
      }
      return pborman.Parse(s)
  }

Which, when benchmarked with this code is still much faster than satori/go.uuid for this operation:

  BenchmarkSatoriFromBraceString-8        10000000           135 ns/op
  BenchmarkPbormanFromBraceString-8       20000000            70.2 ns/op

So the need to support that format isn’t a selling point for satori/go.uuid.

In conclusion, for easy of use, start with satori/go.uuid. However if you want more performance or customization, pborman/uuid seems to be the way to go.

Shortcomings of UUID.

While UUID usage is widespread, it isn’t without shortcomings:

Sections are hard to parse for a human and are UUID version dependent. If we can’t directly interpret each section, the dashes on the string representation add no value
If we only care about an unique variant/version, we wouldn’t need to encode the rest
Having to support more variants and versions, the parsing and generation is slower than it could be
Version 4, being fully random, produces fragmentation in many data structures, and aren’t sortable in a meaningful way
Version 4 are slow to generate, requiring 122 bits of good quality entropy
It’s just not the most efficient way to encode 128 bits into a string. For example, base64 without padding would require 22 characters, not 36

UUID Alternatives

Tackling UUID shortcomings, while still being simple and random-generated UIDs, the most popular alternative is perhaps ULID, and a discussion of the format with comparison of Golang libraries implementing forms part three.

Hybrid sequential + random UID systems can offer even better performance by reducing the amount of entropy needed for each single UID, at the cost of being more complex to setup and operate. A popular system using this schema is Twitter’s Snowflake. We’ll be discussing an Snowflake alternative, noeqd, and it’s Golang library, in part 4.

References

UUID Wikipedia article
RFC 4122 UUID RFC document
Managing Syscall Overhead with crypto/rand is an excellent blog post on how to get better performance from crypto/rand
The hidden dangers of default rand another excellent post, this time about math/rand

AntonioMO