Kademlia - A ba dum tss
This evening has been a good one. A welcome break if I may, from a few weeks without enough to feed the brain. Of course, a minor detour existed with refreshing logarithms and some trigonometry.
Cutting back, stumbled upon some excellent content on Distributed Hash Tables almost by accident. The Kademlia DHT is just about old enough for its first job post a bachelors.
The genius of the trick lies in the binary of XOR for distances and a sort of binary tree space for retaining information about neighbouring nodes. Considering Kademlia focuses on distributed hash tables, one of the first concerns is about calling back and identifying nodes prior to fetching data.
This is coded by Kademlia in what it calls k-buckets; the pseudo binary tree.
Each node stores information about nodes at a distance of 2^0 to 2^n from it.
For a DHT of bitsize 4, this would be 0 -16.
Considering XOR distance, this would be the node, and 15 nodes on either end.
The XOR is a fabulous choice; almost a hare from a magician’s hat so to speak. Given it plays well with a binary tree space, the structure Kademlia uses at its core, it becomes a natural choice. Not to mention, the symmetry since it operates on individual bits. This reduces the need to store distances on either side; smart. And yes, take into the fact that XOR is a native instruction set in all ISAs, you have a super efficient operation.
Pushing ahead, to the binary tree. One might wonder at the near panacea nature of binary trees but after every step, it gives more. Here, the handiness lies in the usual O(logn) nature of binary tree like structures. Looking at routing tables for each node, Kademlia suggests skipping of fairly farther nodes. This makes the search space easy to operate, however counterintuitive it sounds.
Take an example of Node1 looking for Value15 in a DHT of bitsize 4. Value15 is possibly closer to Node15 by an assumption. Here, it is almost a wasteful exercise to hold on to dial codes to every single node in the 2^n range. Rather, the nifty choice becomes a connect to say, Node7 that’s at the farthest held by the node in the bucket. Node7, by virtue of its closeness, can return information about the node that has the value (possibly), can direct Node1 to hop to the node that (possibly) contains the data.
While it does look costly an operation repeating hops, considering the binary tree like structure of how the buckets are organised, for a million nodes, the number of hops is just 6. Amazing, ain’t it!
Browsing through some literature around, I got a couple of thoughts about things gone by. About four years back, I had a discussion regarding building a hashtable using a (doubly) linked list as part of an interview. Where I fumbled most was when the question moved towards a DHT. How I wish I knew of Kademlia earlier! Does reiterate the need for me to get back to studying academic work like I once used to while still in academia.
Second interesting ponder was about our good old work while at crisp; vectorised distances we worked with partially and some learnings from this dimension. Perhaps, something for another life.