So, I have a project with Code Analysis turned on and it treats warnings as errors. In this project, I have a class that overrides Equals() so that I can use some of the nice collection based linq expression (like Except()). Well, code analysis doesn’t like it if you override Equals, but forget to override GetHashCode().
Needless to say, I have to override GetHashCode(). But how does one do this? Some searching turned up a bunch of interesting resources showing how to do this. However, there isn’t some cut and paste code that solves the problem. You have to really understand the implications and usage of this method. For that purpose, I recommend: MSDN’s Object.GetHashCode Method documentation. I also found some interesting discussions on StackOverflow regarding the issue.
In my case, I have an object that I want to be “Equal” to another if three internal values are the same. This is where things get interesting. GetHashCode is just an integer. As such, how to I get it to return a value that is guaranteed not to have the same value as another object with a different set of 3 properties?
For Example, if I just add the numbers, then I could have a problem like this:
| ||Object A ||Object B |
|Field 1 Hash ||1 ||1 |
|Field 2 Hash ||5 ||100 |
|Field 3 Hash ||100 ||5 |
|Sum of Hashes ||106 ||106 |
The above sums are equal, but the objects are different!
So, a number of implementations use seeds and factors to adjust the numbers and reduce the collisions. One such example (from the above stack overflow article) does so like this:
int hash = 13;
hash = (hash * 7) + field1.GetHashCode();
hash = (hash * 7) + field2.GethashCode();
hash = (hash * 7) + field3.GethashCode();
I’m not sure I fully understand this yet, but I guess I can see how this would improve the chance that things are not going to collide. In the above scenario, I would end up with something like the following:
|Field 1 Hash
|Field 2 Hash
|Field 3 Hash
|Sum of Hashes wtih Seed and Factor
||91 + 1 + 644 + 5 + 4543 + 100
|91 + 1 + 644 + 100 + 5208 + 5
So this seems better, but we’re still not guaranteed that these are unique (at least as far as my feeble mind can comprehend).
Microsoft’s MSDN documentation uses a bitwise XOR to combine hash codes from multiple contained fields. This too seems a bit risky, but I haven’t explored how it works in detail.
So all you intelligent people out there, please tell me what I’m missing. Is there some “algorithm” or factor + seed magic that does guarantee unique hash codes?