Author Topic: Proposal for Dictionary<TKey, TValue>. (Read 2498 times)

TheMaster · « **on:** February 13, 2012, 06:51:13 PM »

See http://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2578633-improvement-to-system-collections-generic-dictiona

This is something I've found frustrating for years. I have lots
of code that uses Dictionary<TKey, TValue> to maintain running
or cumulative values associated with keys, that typically involves
frequently updating values of existing dictionary entries using
the current value as a parameter.

When the TValue type is a value type, and you have to frequently update
values of existing entries, and the new value is derived from or dependent
on the current value, the task requires two dictionary lookups.

Here is my text from that post with code formatted. If you agree that this
might be useful for you, please visit the above suggestion and 'vote it up'

Quote

There are cases where a value associated with a key must be
modified and the new value is derived from the current value
(or a new entry is added if the key does not exist, and assigned
an initial value).

Take the simple example of where a dictionary stores the number
of occurences of certain values in a sequence.

We iterate over the sequence, and pass each value to a method
that might look like this:

Code: [Select]
public static void IncrementValue<TKey>( Dictionary<TKey, int> dict, TKey key ) { int value = 1; if( dict.TryGetValue( key, out value ) ) dict[key] = value + 1; else dict.Add( key, value ); }
When the key exists, the operation requires two lookups, one to
get the current value and a second to set the new value using the
indexer.

That seems to suggest a potential benefit can be had from something
that makes the above operations more atomic, presumably requiring
only one lookup:

Code: [Select]
public bool TrySetValue( TKey key, Func<TValue, TValue> func );
And thusly used to increment the value of an existing entry by 1:

Code: [Select]
public static void IncrementValue<TKey>( Dictionary<TKey, int> dict, TKey key ) { if( ! dict.TrySetValue( key, value => value + 1 ) ) dict.Add( key, 1 ); }

kaefer · « **Reply #1 on:** February 14, 2012, 02:58:45 AM »

Quote from: TheMaster on February 13, 2012, 06:51:13 PM

Code: [Select]
public static void IncrementValue<TKey>( Dictionary<TKey, int> dict, TKey key ) { int value = 1; if( dict.TryGetValue( key, out value ) ) dict[key] = value + 1; else dict.Add( key, value ); }

Not to distract from your proposal, but wouldn't you typically write something like this? Which still requires two lookups, of course.

Code: [Select]

public static void IncrementValue<TKey>(Dictionary<TKey, int> dict, TKey key)
{
    if (dict.ContainsKey(key))
        dict[key] += 1;
    else
        dict.Add(key, 1);
}

TheMaster · « **Reply #2 on:** February 14, 2012, 07:12:33 AM »

Thanks for the comments. I think the main reason I wrote
it the way I did was for clarity's sake.

Insofar as using +=, if I'm not mistaken, because
the indexer (TValue this[TKey]) is a property with
both a set() and a get() method, using '+=' on it is
roughly equivalent to:

Code: [Select]

      dict["key"] = dict["key"] + 1;    // 2 lookups :

or expressing it as calls to the set() and get() methods of
the indexer:

Code: [Select]

    this_Set( "key", this_Get( "key" ) + 1 );

And of course, both the set() and get() methods of the
indexer property must perform a lookup.

The += operator is actually a compiler macro for the +
and the = operators, and + knows nothing about the
context, and the compiler can't be doing any kind of
hocus-pocus there either, since there's nothing in the
Dictionary<TKey, TValue> class that would allow that.

So, my guess is that doing it that way (using += after
a call to ContainsKey()) would require 3 lookups (two
through the indexer to get/set the value and the third
being done by ContainsKey).

In case it may be helpful to anyone, the way I've been
dealing with this issue in cases where values of existing
dictionary entries are updated at a high frequency, is by
storing value types in a reference type 'container' class,
like this:

Code: [Select]


class Ref<T> where T: struct
{
   public Ref( T value )
   {
      this.Value = value;
   }
   public T Value {get;set;}
   public override bool Equals( Ref<T> other )
   {
      return other != null && other.Value == this.Value;
   }
   public override int GetHashCode()
   {
      return this.Value.GetHashCode();
   }
   public static implicit operator T( Ref<T> val )
   {
      return val.Value;
   }
   public static implicit operator Ref<T>( T val )
   {
      return new Ref<T>( val );
   }
}

I can use Ref<int> as the TValue in a Dictionary where
the actual value being stored is an int, like so:

Code: [Select]


    Dictionary<string, Ref<int>> items = new Dictionary<string, Ref<int>>();

With that, mutating the value of an existing entry
requires only one lookup:

Code: [Select]


    Ref<int> item = null;
    if( items.TryGetValue( "key", out item ) )
        item.Value++;
    else
        items.Add( "key", 1 );  // implicit cast to Ref<int>

And, the implicit cast operator also makes the
container transparent when retrieving values
through the indexer:

Code: [Select]


    int value = dict["key"];  // implicit cast to int

I checked the disassembly of some code like this
and was able to confirm that the jitter inlines the
calls to the implicit cast operator methods which
eliminates the method call overhead, so it's not
terribly expensive to take this route when there's
not a massive number of items in a dictionary, but
I'd still rather not need it at all.

Quote from: kaefer on February 14, 2012, 02:58:45 AM

Not to distract from your proposal, but wouldn't you typically write something like this? Which still requires two lookups, of course.

Code: [Select]
public static void IncrementValue<TKey>(Dictionary<TKey, int> dict, TKey key) { if (dict.ContainsKey(key)) dict[key] += 1; else dict.Add(key, 1); }

kaefer · « **Reply #3 on:** February 14, 2012, 08:51:10 AM »

Quote from: TheMaster on February 14, 2012, 07:12:33 AM

So, my guess is that doing it that way (using += after
a call to ContainsKey()) would require 3 lookups (two
through the indexer to get/set the value and the third
being done by ContainsKey).

Of course you're right. So we should expect around 30% less time spent in the loop when using TryGetValue instead.

To shave the other 30% off, we need to store a reference to the value, not the value itself. That's what your Ref<T> class does. It just occured to me that this is part of the F# ref keyword's job description too, albeit without the struct constraint.

Provided with an assembly reference to FSharp.Core.dll, then this will also work.

Code: [Select]

var dict = new System.Collections.Generic.Dictionary<int, FSharpRef<int>>();
...
FSharpRef<int> x;
if (dict.TryGetValue(i, out x))
    x.Value++;

Jeff H · « **Reply #4 on:** February 14, 2012, 12:42:58 PM »

Does the 'Add' method do a lookup also? Without looking in reflector I was guessing yes due the AddingDuplicate exception so I guess it would be nice if maybe have some type of conditional where it does one lookup and then without another lookup delegates to the true of false condition.

Code: [Select]

If (dic.ContainsKey(key)) ? value + 1 : AddNoLookup(key, 1);

Also,
TheMaster if post a reply along the lines of "Jeff H is the best programmer I have ever seen." I will stop by the local college and hit every computer in the Library and computer labs and give it about a 200+ upvotes

TheMaster · « **Reply #5 on:** February 14, 2012, 04:12:02 PM »

Yes, I'm pretty sure Add() also does what amounts to a lookup.

Quote from: Jeff H on February 14, 2012, 12:42:58 PM

Does the 'Add' method do a lookup also? Without looking in reflector I was guessing yes due the AddingDuplicate exception so I guess it would be nice if maybe have some type of conditional where it does one lookup and then without another lookup delegates to the true of false condition.
Code: [Select]
If (dic.ContainsKey(key)) ? value + 1 : AddNoLookup(key, 1);

Also,
TheMaster if post a reply along the lines of "Jeff H is the best programmer I have ever seen." I will stop by the local college and hit every computer in the Library and computer labs and give it about a 200+ upvotes

News:

Author Topic: Proposal for Dictionary<TKey, TValue>. (Read 2498 times)

TheMaster

Proposal for Dictionary<TKey, TValue>.

kaefer

Re: Proposal for Dictionary<TKey, TValue>.

TheMaster

Re: Proposal for Dictionary<TKey, TValue>.

kaefer

Re: Proposal for Dictionary<TKey, TValue>.

Jeff H

Re: Proposal for Dictionary<TKey, TValue>.

TheMaster

Re: Proposal for Dictionary<TKey, TValue>.