[Dcmlib] std::map vs std::hash_map

Mathieu Malaterre mathieu.malaterre at kitware.com
Wed Jul 6 17:13:35 CEST 2005


> I didn't catch exactly what you did with the union.
> 
> I suppose the uint32_t tagkey is the 'concatenation' of Dicom Group 
> number and Dicom Element number.
> What is uint16_t tab used for ?
> 
> In the original version, we did something stupid : transform the tag 
> (e.g 0x0018,0x0050) into a TagKey ( e.g "0018|0050")
> We can save time removing the transformation tag -> TagKey.
> But we don't know (maybe Sean  knows) the way the hash function should act.
> (in the very late 70's -there was no stl ...-  we splitted the 'key' 
> into 4 bytes pieces, Xor'ed the pieces to get a 'random like' int32, and 
> divide it by the number of blocks, to get the target block number, and 
> perform a chain when collision occurs)
> 
> I think (I'm not sure), nowadays Htable methods that use 'tries' look at 
> the first bits of the number to know whether they have to split the 
> target block or not, right ?
> 
> The first uint16 of the tag has no more than 100 distinct values (65 in 
> the Dicom V3 directory, + a few 'shadow groups'),
> The second uint16 of the tag has more than 1250 distinct values, the 
> Dicom V3 directory.
> Maybe we could have something more efficent if we describe the key as 
> 'Element number Group number' instead of 'Group number Element number '.
> Or maybe my idea is meaningless :-(


In fact even when properly coding meaning always passing by const 
reference and return reference... There are always cases where string is 
just pain (comparisons in particular). Which is the case in the hash 
table, we need to quickly compare two TagKey. I first thought of simply 
replacing TagKey by uint16_t t[2]. But then when you return this from a 
function it becomes a pointer, so you have to deals with NULL pointer -> 
this is a pain.

The other solutions I thought of where: std::pair<uint16_t, uint16_t>, 
struct { uint16_t[2] }; But I found the union the easiest/fastest to 
implement because the notion of order is implicit. And on a 32bits 
machine comparing two ints should be extremely fast. Therefore we read 
in two uint16_t, but the hash table (just like most of my local gdcm) 
only 'SEE' one uint32_t...

Therefore the hash is really immediate (remember the improvement was 23%!).

The only problem I have to deals with now is the representation of a 
TagKey within a sequence.


Does this make sense now, JP ?

> Ten more days to wait for the Bastille Day.
> Mathieu, don't forget to bring some fire crakers, at Kitware's.

I bought a few M-80 from another state ;-)

Mathieu



More information about the Dcmlib mailing list