Search v2 works! Well, sort of. The core search mechanism works. I now have to turn it into a server daemon, but that's not much work.
So, here's how it works. There are 4 main parts to the indexing mechanism.
- Links dictionary - Currently a b-tree keyed by link URL, with each link's ID as the value. There's also a reverse lookup table (a vector) that maps IDs back to URLs.
- Blogs dictionary - Currently a map (though it will probably end up as a b-tree) with the blog URL as the key and its ID as the data. There's also a reverse lookup table.
- Link<->Blog Graph - A map with the link ID as the key and a vector of blog IDs as the data.
- Blog<->Link Graph - Also a map, with the blog ID as the key and a vector of link IDs as the data.
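Here's a rough sketch of how those four pieces fit together. It's only an illustration under some assumptions: a plain Python dict stands in for both the b-tree and the map (a real b-tree would also keep URLs sorted), IDs are handed out sequentially, and the names `Dictionary`, `Index`, `intern` and `add` are made up for this example.

```python
from collections import defaultdict


class Dictionary:
    """URL <-> integer ID mapping (dict used here in place of a b-tree/map)."""

    def __init__(self):
        self.by_url = {}  # URL -> ID
        self.by_id = []   # reverse lookup table: ID -> URL

    def intern(self, url):
        """Return the ID for url, assigning the next sequential ID if it is new."""
        if url not in self.by_url:
            self.by_url[url] = len(self.by_id)
            self.by_id.append(url)
        return self.by_url[url]


class Index:
    def __init__(self):
        self.links = Dictionary()
        self.blogs = Dictionary()
        self.link_to_blogs = defaultdict(list)  # link ID -> IDs of blogs linking to it
        self.blog_to_links = defaultdict(list)  # blog ID -> IDs of links on that blog

    def add(self, blog_url, link_url):
        """Record that the blog at blog_url contains a link to link_url."""
        blog_id = self.blogs.intern(blog_url)
        link_id = self.links.intern(link_url)
        self.link_to_blogs[link_id].append(blog_id)
        self.blog_to_links[blog_id].append(link_id)


idx = Index()
idx.add("http://a.example/", "http://x.example/post")
idx.add("http://b.example/", "http://x.example/post")
idx.add("http://a.example/", "http://y.example/")
print(idx.link_to_blogs[0])  # [0, 1]
print(idx.blog_to_links[0])  # [0, 1]
```

Storing integer IDs in the two graphs, rather than URLs, keeps the vectors small and makes every hop between the graphs a cheap lookup.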
Building these four structures takes about 2 seconds right now, and searches take ...drum roll... around 0.03 seconds! You read that right. Not 0.3, but 0.03, which is just 6% of my 0.5-second goal! Of course, turning it into a server daemon will add some overhead, but I might actually be able to keep searches under 0.1 seconds.
And the really nifty thing is that search times will no longer grow in proportion to the number of blogs indexed. In fact, search times will now depend more on the number of links a site has than the number of blogs in the index.
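To show why the cost tracks a site's links rather than the size of the index, here's a hypothetical query: "find blogs that share links with this one", ranked by how many links they share. The query type and the `related_blogs` function are assumptions for illustration, not necessarily what the real search does. The point is that it only touches the query blog's links and the blogs attached to those links, never the whole blog list.

```python
from collections import Counter


def related_blogs(blog_id, blog_to_links, link_to_blogs):
    """Rank other blogs by how many links they share with blog_id.

    Only the query blog's links, and the blogs attached to those links, are
    visited, so the work grows with that neighbourhood, not with the total
    number of blogs in the index.
    """
    shared = Counter()
    for link_id in blog_to_links.get(blog_id, []):
        for other in link_to_blogs.get(link_id, []):
            if other != blog_id:
                shared[other] += 1
    return shared.most_common()


# Toy graphs: blog ID -> link IDs, and link ID -> blog IDs.
blog_to_links = {0: [10, 11], 1: [10], 2: [10, 11], 3: [12]}
link_to_blogs = {10: [0, 1, 2], 11: [0, 2], 12: [3]}

print(related_blogs(0, blog_to_links, link_to_blogs))  # [(2, 2), (1, 1)]
```

Blog 3 is never looked at, and adding more unrelated blogs wouldn't change the work done for this query.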