I kind of started working on the word indexing engine for BlogMatcher in C++, but I'm starting to see that that may not be such a great idea. I thought of using C++ because I wanted to use the STL map and a heap (implemented on top of an STL vector), but somehow I have a feeling that it's going to be slow as all hell. Part of it is that a vector has to reallocate and copy its contents every time it outgrows its capacity (most implementations grow that capacity geometrically), so there's a fair amount of memory allocation and copying going on, not to mention all the other overhead. On top of that, some very minimal code using the heap class compiles to a 40KB executable. In comparison, my entire search program is only 17KB.
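For reference, here is roughly what the STL-based approach described above might look like. This is a minimal sketch, not BlogMatcher's actual code: the function names `countWords` and `topWords`, the plain whitespace tokenization, and the "top k words" query are all assumptions made for illustration. Calling `reserve()` on the vector allocates the heap's storage once up front, which avoids the repeated regrowth mentioned above whenever the heap's maximum size is known in advance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: count how often each whitespace-separated word
// appears in a document, using std::map as the word index.
std::map<std::string, int> countWords(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word];  // inserts the word with count 0 on first sight
    }
    return counts;
}

// Hypothetical helper: return the k most frequent words as (count, word)
// pairs, most frequent first. Uses a min-heap kept in a std::vector via
// std::push_heap / std::pop_heap, so the smallest entry is always on top
// and can be evicted cheaply.
std::vector<std::pair<int, std::string>> topWords(
        const std::map<std::string, int>& counts, std::size_t k) {
    using Entry = std::pair<int, std::string>;
    std::vector<Entry> heap;
    heap.reserve(k + 1);               // one allocation; the heap never exceeds k + 1
    const std::greater<Entry> cmp;     // greater-than comparator => min-heap

    for (const auto& entry : counts) {
        heap.emplace_back(entry.second, entry.first);
        std::push_heap(heap.begin(), heap.end(), cmp);
        if (heap.size() > k) {
            std::pop_heap(heap.begin(), heap.end(), cmp);  // move smallest to back
            heap.pop_back();                               // and drop it
        }
    }
    std::sort(heap.begin(), heap.end(), cmp);  // descending: highest count first
    return heap;
}
```

Called as `topWords(countWords(text), 10)`, this returns up to ten `(count, word)` pairs. A plain C version would replace `std::map` with a hand-written hash table or tree and manage the heap array with `malloc`/`realloc` directly, which is more code but gives full control over allocation.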
Yeah. So the bottom line is, if you want ease of use (and OO), go with C++. If you want raw performance, C probably still kicks ass. That makes me wonder... Why does Google use C++?
UPDATE: On second thought, performance really isn't an issue for the indexer... Duh. That's what happens when you watch TV. I'll stop saying stupid things like this when our cable subscription runs out and the two TVs move out along with the.....
UPDATE2: Dirvish asks: "Doesn't performance for the indexer still matter for Google?"
That's the weird thing... Google's job descriptions mention C++ and Python, but C is suspiciously omitted. You'd think they'd do everything in C, but when you have tens of thousands of high-performance servers, maybe it doesn't matter too much. On the other hand, C++ is largely a superset of C, so it could be that they implicitly include C (even though they're really different beasts).
Posted Wed, April 23, 2003 09:52 by dirvish
Doesn't performance for the indexer still matter for Google? Don't they run their indexer non-stop? If it was faster they could index more often.