ID - Sequential Guid (COMB) Vs. Int Identity, using Entity Framework

Introduction

Recently i had to choose a type for an Id in a SQL server Database. We are using Entity framework to access the database. I had to choose between Guid an Int. I seems to me that Guid must be a slow solution but in a real world scenario, this was not true (see results later in this blog). First an (optional) small introduction about some terms used here. Then the testresult , folowwing by the C# implementation of a third option, called Sequential Guid.

Guid

Is a 16 byte random number (for example c68199c2-f8e3-42bf-83f7-479bd33c94e6). It is the Micrsoft implementation of a UUDI. Think of all the possible numbers as unique tickets of a lottery. You need to sell 200.000.000.000 tickets every second from the big bang unitil now (13.700.000.000 years) to sell all the unique tickets (numbers). If you use that random chosen number as an Id in a database, the chance of a duplicate is virtualy zero. The probability of one duplicate would be about 50% if every person on earth owned 600.000.000 Guids.

UniqueIdentifier

A Guid is called a uniqueidentifier in the SQL server database. The last 6 bytes of it are the most significant bytes when the records are indexed / sorted.

Sequential Guid / COMB

Is a Guid but the most significant bytes are created sequentially. It is also called COMB (combination of Guid and sequence).

SqlGuid

A .NET framework object that sorts Guids the same way as the uniqueidentifier in the SQL server database.

Advantages of a Guid

Using a Guid as Id instead of a int wit a autonummer / database-sequence / Identity has a lot of (functional) advantages:

  • Object orientation: An object can creates it own Id. That supports one of the most important object orientation principles: encapsulation.
  • Safer: It is very hard to guess an Id.
  • Faster: Getting an Id using a connection to the database or even a webservice call to a webservice that calls to a database, is much slower than creating a Guid with Guid.NewGuid().
  • Less errors: Less code means less errors. Creating a guid uses less code to get the new Id-value in your object.
  • No connection needed. If your app in offline, the app can still add new record locally and safe them later in the database.
  • Synchronizing easier: synchronize data from multiple sources is much easier.
  • Merging tables easier: This is much easier as the Guid is always unique.

Advantages of Int

Using an int also has advantages:

  • Testing and debugging: Working with 22 is easier than fbc32944-d573-4d67-bed2-565cbe6d5f70, even when using test-guids like 00000000-0000-4000-0000-000000000022.
  • Uses less memory: an int uses less space on disk or in memory. Note that an Int in memory on a 64 bits computer takes 64 bits and a guid 128. Not only the fields but also the indexes takes more space.
  • Better support: an Int is supported by all databases. However the implementation of the autonumbering is different.

How fast (or slow) is a Guid?

Searching the internet, I found many statements that Guid is much slower, in particular when inserting a new record in the database. The Id of a table is the clustered index bij default (the fysical order of the records) so you can image that inserting record random in an existing list less efficent than adding records at the end.

Now we know that adding at the end is easier for the database. We can aldo use a Sequential Guid. In this case I created a Sequential Guid with 10 random bytes and 6 bytes created sequentialy, running out of sequential values in almost 90 years (or 281.474.976.710.656 values).

I created 3.000.000 records in a database on the same computer as the C# application that uses the database. So you get an idea of the total costst. Every insert uses a new DbContext. I tested 5 scenario's (SQL create statements here):

  • Int not declared with Identity(1,1) / autonumber. This is not a real world scenario, but it shows the extra time that it takes to use Identity in an other Scenario.
  • Sequential Guid
  • Int declared with Identity(1,1) / autonumber. I didn't used a the Sequence introduced in SqlServer 2012 because it is likely to be slower.
  • Regular Guid with clusted index.
  • Regular Guid with non clustered index and an extra field with a Clustered Index declared with Identity(1,1).

Here are the results of 3.000.000 inserts:

  Int (Not unique)
Sequential guid Int, identity Guid clustered Guid, non clustered 
500.000 0:45:01 00:49:14 00:51:31 00:49:36 00:57:48
1.000.000 01:30:54 01:38:48 01:43:23 01:45:37 01:58:14
1.500.000 02:15:30 02:27:19 02:36:34 02:42:43 03:00:08
2.000.000 03:00:22 03:15:05 03:28:42 03:39:59 04:03:10
2.500.000 03:45:20 04:02:40 04:21:53 04:36:28 05:06:09
3.000.000 04:31:29 04:51:06 05:14:36 05:32:05 06:11:19
Absolute   00:19:37 00:43:07 01:00:36 01:39:50
Percentage   7,23% 14,81% 19,26% 30,06%

Result

The sequential Guid is actually faster than the Int, declared with Identity!

The Sequential Guid Gernerator implementation

The sequential Guid is essentially a Guid with the last 6 bytes replaced by a number that corresponds to the current date and time. I saw some implemtations on the internet that created the 6 bytes this way:

Date Time Value
Januari 1, 0001 00:00:00 Hex: 00000
December 31, 9999 23:59:59 Hex: FFFFFF

This makes no sense as the application wil never run 2000 years ago and it is also unlikely that it runs for almost 8000 years. That is why I used this:

Date Time Value
Oktober 15, 2011 00:00:00 Hex: 00000
Januari 1, 2100 00:00:00 Hex: FFFFFF

This are the default values but you can use other values (when calling the constructor). In this case, the counter is incremented 3 times per millisecond.

Some examples of using the SequentialGuid:


      // Based on DateTime.Now
      Guid sqlGuid1 = SequentialGuid.NewGuid();// The easy way suitable for 99% of the usages
      Guid sqlGuid2 = new SequentialGuid().GetGuid();
       
      // Use an other range 
      SequentialGuid sqlGuidCenturyObject = new SequentialGuid(new DateTime(2013, 10, 30), new DateTime(2113, 10, 30));
      // 25 Guids from the object above
      IEnumerable<Guid> guids = Enumerable.Range(1, 25).Select(o => sqlGuidCenturyObject.GetGuid());
       
      // Based on an other date within the range. Used in unit tests of the SequentialGuid class
      Guid sqlGuid3 = new SequentialGuid().GetGuid(new DateTime(2015, 1, 1));
      Guid sqlGuid4 = new SequentialGuid().GetGuid(DateTime.Now.Ticks + 1000);
      

Source code of SequentialGuid

Note: A newer version of the Source code can be found here.

This is the source code used for this post:


      public class SequentialGuid
      {
       
          public DateTime SequenceStartDate { get; private set; }
          public DateTime SequenceEndDate { get; private set; }
       
          private const int NumberOfBytes = 6;
          private const int PermutationsOfAByte = 256;
          private readonly long _maximumPermutations = (long)Math.Pow(PermutationsOfAByte, NumberOfBytes);
          private long _lastSequence;
       
          public SequentialGuid(DateTime sequenceStartDate, DateTime sequenceEndDate)
          {
              SequenceStartDate = sequenceStartDate;
              SequenceEndDate = sequenceEndDate;
          }
       
          public SequentialGuid()
              : this(new DateTime(2011, 10, 15), new DateTime(2100, 1, 1))
          {
          }
       
          private static readonly Lazy<SequentialGuid> InstanceField = new Lazy<SequentialGuid>(() => new SequentialGuid());
          internal static SequentialGuid Instance
          {
              get
              {
                  return InstanceField.Value;
              }
          }
       
          public static Guid NewGuid()
          {
              return Instance.GetGuid();
          }
       
          public TimeSpan TimePerSequence
          {
              get
              {
                  var ticksPerSequence = TotalPeriod.Ticks / _maximumPermutations;
                  var result = new TimeSpan(ticksPerSequence);
                  return result;
              }
          }
       
          public TimeSpan TotalPeriod
          {
              get
              {
                  var result = SequenceEndDate - SequenceStartDate;
                  return result;
              }
          }
       
          private long GetCurrentSequence(DateTime value)
          {
              var ticksUntilNow = value.Ticks - SequenceStartDate.Ticks;
              var result = ((decimal)ticksUntilNow / TotalPeriod.Ticks * _maximumPermutations);
              return (long)result;
          }
       
          public Guid GetGuid()
          {
              return GetGuid(DateTime.Now);
          }
       
          private readonly object _synchronizationObject = new object();
          internal Guid GetGuid(DateTime now)
          {
              if (now < SequenceStartDate || now >= SequenceEndDate)
              {
                  return Guid.NewGuid(); // Outside the range, use regular Guid
              }
       
              var sequence = GetCurrentSequence(now);
              return GetGuid(sequence);
          }
       
          internal Guid GetGuid(long sequence)
          {
              lock (_synchronizationObject)
              {
                  if (sequence <= _lastSequence)
                  {
                      // Prevent double sequence on same server
                      sequence = _lastSequence + 1;
                  }
                  _lastSequence = sequence;
              }
       
              var sequenceBytes = GetSequenceBytes(sequence);
              var guidBytes = GetGuidBytes();
              var totalBytes = guidBytes.Concat(sequenceBytes).ToArray();
              var result = new Guid(totalBytes);
              return result;
          }
       
          private IEnumerable<byte> GetSequenceBytes(long sequence)
          {
              var sequenceBytes = BitConverter.GetBytes(sequence);
              var sequenceBytesLongEnough = sequenceBytes.Concat(new byte[NumberOfBytes]);
              var result = sequenceBytesLongEnough.Take(NumberOfBytes).Reverse();
              return result;
          }
       
          private IEnumerable<byte> GetGuidBytes()
          {
              var result = Guid.NewGuid().ToByteArray().Take(10).ToArray();
              return result;
          }
      }
      

Note: In the real implementation you better don't use DateTime.Now but an other version that is easy to mock when unittesting. The production code that I use, goes out of scope for this blog.

Leave a Comment

Comment

Comments

C# CSharp Blog Comment

Inzamelactie Boksen 23-04-2024 / Reply

Wat een geweldige inzamelactie voor het boksen! Ik ben onder de indruk van de solidariteit en steun binnen de gemeenschap.