Recently i had to choose a type for an Id in a SQL server Database. We are using Entity framework to access the database. I had to choose between Guid an Int. I seems to me that Guid must be a slow solution but in a real world scenario, this was not true (see results later in this blog). First an (optional) small introduction about some terms used here. Then the testresult , folowwing by the C# implementation of a third option, called Sequential Guid.
Is a 16 byte random number (for example c68199c2-f8e3-42bf-83f7-479bd33c94e6). It is the Micrsoft implementation of a UUDI. Think of all the possible numbers as unique tickets of a lottery. You need to sell 200.000.000.000 tickets every second from the big bang unitil now (13.700.000.000 years) to sell all the unique tickets (numbers). If you use that random chosen number as an Id in a database, the chance of a duplicate is virtualy zero. The probability of one duplicate would be about 50% if every person on earth owned 600.000.000 Guids.
A Guid is called a uniqueidentifier in the SQL server database. The last 6 bytes of it are the most significant bytes when the records are indexed / sorted.
Is a Guid but the most significant bytes are created sequentially. It is also called COMB (combination of Guid and sequence).
A .NET framework object that sorts Guids the same way as the uniqueidentifier in the SQL server database.
Using a Guid as Id instead of a int wit a autonummer / database-sequence / Identity has a lot of (functional) advantages:
Using an int also has advantages:
Searching the internet, I found many statements that Guid is much slower, in particular when inserting a new record in the database. The Id of a table is the clustered index bij default (the fysical order of the records) so you can image that inserting record random in an existing list less efficent than adding records at the end.
Now we know that adding at the end is easier for the database. We can aldo use a Sequential Guid. In this case I created a Sequential Guid with 10 random bytes and 6 bytes created sequentialy, running out of sequential values in almost 90 years (or 281.474.976.710.656 values).
I created 3.000.000 records in a database on the same computer as the C# application that uses the database. So you get an idea of the total costst. Every insert uses a new DbContext. I tested 5 scenario's (SQL create statements here):
Here are the results of 3.000.000 inserts:
Int (Not unique) |
Sequential guid | Int, identity | Guid clustered | Guid, non clustered | |
500.000 | 0:45:01 | 00:49:14 | 00:51:31 | 00:49:36 | 00:57:48 |
1.000.000 | 01:30:54 | 01:38:48 | 01:43:23 | 01:45:37 | 01:58:14 |
1.500.000 | 02:15:30 | 02:27:19 | 02:36:34 | 02:42:43 | 03:00:08 |
2.000.000 | 03:00:22 | 03:15:05 | 03:28:42 | 03:39:59 | 04:03:10 |
2.500.000 | 03:45:20 | 04:02:40 | 04:21:53 | 04:36:28 | 05:06:09 |
3.000.000 | 04:31:29 | 04:51:06 | 05:14:36 | 05:32:05 | 06:11:19 |
Absolute | 00:19:37 | 00:43:07 | 01:00:36 | 01:39:50 | |
Percentage | 7,23% | 14,81% | 19,26% | 30,06% |
The sequential Guid is actually faster than the Int, declared with Identity!
The sequential Guid is essentially a Guid with the last 6 bytes replaced by a number that corresponds to the current date and time. I saw some implemtations on the internet that created the 6 bytes this way:
Date | Time | Value |
Januari 1, 0001 | 00:00:00 | Hex: 00000 |
December 31, 9999 | 23:59:59 | Hex: FFFFFF |
This makes no sense as the application wil never run 2000 years ago and it is also unlikely that it runs for almost 8000 years. That is why I used this:
Date | Time | Value |
Oktober 15, 2011 | 00:00:00 | Hex: 00000 |
Januari 1, 2100 | 00:00:00 | Hex: FFFFFF |
This are the default values but you can use other values (when calling the constructor). In this case, the counter is incremented 3 times per millisecond.
Some examples of using the SequentialGuid:
// Based on DateTime.Now
Guid sqlGuid1 = SequentialGuid.NewGuid();// The easy way suitable for 99% of the usages
Guid sqlGuid2 = new SequentialGuid().GetGuid();
// Use an other range
SequentialGuid sqlGuidCenturyObject = new SequentialGuid(new DateTime(2013, 10, 30), new DateTime(2113, 10, 30));
// 25 Guids from the object above
IEnumerable<Guid> guids = Enumerable.Range(1, 25).Select(o => sqlGuidCenturyObject.GetGuid());
// Based on an other date within the range. Used in unit tests of the SequentialGuid class
Guid sqlGuid3 = new SequentialGuid().GetGuid(new DateTime(2015, 1, 1));
Guid sqlGuid4 = new SequentialGuid().GetGuid(DateTime.Now.Ticks + 1000);
Note: A newer version of the Source code can be found here.
This is the source code used for this post:
public class SequentialGuid
{
public DateTime SequenceStartDate { get; private set; }
public DateTime SequenceEndDate { get; private set; }
private const int NumberOfBytes = 6;
private const int PermutationsOfAByte = 256;
private readonly long _maximumPermutations = (long)Math.Pow(PermutationsOfAByte, NumberOfBytes);
private long _lastSequence;
public SequentialGuid(DateTime sequenceStartDate, DateTime sequenceEndDate)
{
SequenceStartDate = sequenceStartDate;
SequenceEndDate = sequenceEndDate;
}
public SequentialGuid()
: this(new DateTime(2011, 10, 15), new DateTime(2100, 1, 1))
{
}
private static readonly Lazy<SequentialGuid> InstanceField = new Lazy<SequentialGuid>(() => new SequentialGuid());
internal static SequentialGuid Instance
{
get
{
return InstanceField.Value;
}
}
public static Guid NewGuid()
{
return Instance.GetGuid();
}
public TimeSpan TimePerSequence
{
get
{
var ticksPerSequence = TotalPeriod.Ticks / _maximumPermutations;
var result = new TimeSpan(ticksPerSequence);
return result;
}
}
public TimeSpan TotalPeriod
{
get
{
var result = SequenceEndDate - SequenceStartDate;
return result;
}
}
private long GetCurrentSequence(DateTime value)
{
var ticksUntilNow = value.Ticks - SequenceStartDate.Ticks;
var result = ((decimal)ticksUntilNow / TotalPeriod.Ticks * _maximumPermutations);
return (long)result;
}
public Guid GetGuid()
{
return GetGuid(DateTime.Now);
}
private readonly object _synchronizationObject = new object();
internal Guid GetGuid(DateTime now)
{
if (now < SequenceStartDate || now >= SequenceEndDate)
{
return Guid.NewGuid(); // Outside the range, use regular Guid
}
var sequence = GetCurrentSequence(now);
return GetGuid(sequence);
}
internal Guid GetGuid(long sequence)
{
lock (_synchronizationObject)
{
if (sequence <= _lastSequence)
{
// Prevent double sequence on same server
sequence = _lastSequence + 1;
}
_lastSequence = sequence;
}
var sequenceBytes = GetSequenceBytes(sequence);
var guidBytes = GetGuidBytes();
var totalBytes = guidBytes.Concat(sequenceBytes).ToArray();
var result = new Guid(totalBytes);
return result;
}
private IEnumerable<byte> GetSequenceBytes(long sequence)
{
var sequenceBytes = BitConverter.GetBytes(sequence);
var sequenceBytesLongEnough = sequenceBytes.Concat(new byte[NumberOfBytes]);
var result = sequenceBytesLongEnough.Take(NumberOfBytes).Reverse();
return result;
}
private IEnumerable<byte> GetGuidBytes()
{
var result = Guid.NewGuid().ToByteArray().Take(10).ToArray();
return result;
}
}
Note: In the real implementation you better don't use DateTime.Now but an other version that is easy to mock when unittesting. The production code that I use, goes out of scope for this blog.
Wat een geweldige inzamelactie voor het boksen! Ik ben onder de indruk van de solidariteit en steun binnen de gemeenschap.