Simple Q/A

How do you implement your search functionality?

We first read in all the data (in our case, all the posts), and build a postObj for each entry. Then, we create an index using B+Tree, with the key to be the username or tag, while the value is the PostObj. We wrap all those functions into our store engine. Then we build a query engine to implement the actual query logic with the help of a tokenizer and parser. The logic is blablabla. So the user could have queries like e.g., #NoTag & @User.

How do you realise your B+Tree, what is the difference to BTree?

B+Tree is a special version of BTree with some modifications. There are two major differences. First, compared with BTree, BTree has distinct LeafNode and TreeNode(Non-leaf node), Only Treenode has children, while Leafnode doesn't, But the leaf node has connections in between. So, we implement it as prev and next. Second, for the split, the split-leaf node does not go up, still has a position in the left leaf node. It stays at the bottom. This enables faster range search and storing multiple data with the same key.

What is the key of you B+Tree, and what is the value?

key is user or tag, value is the PostObj

Why do you design you indexPostObj in such way?

Because we have a user tree and tag tree, both user tree and tag tree have the same index type which is string so we only need one indexpostobj, the compare to method defines how we compare our keys, which in our case, just use default string comparison.

How do you add a value to your B+ Tree?

starts from root, go to left if key is smaller or equal to and go right if key is larger until reach the leaf node, find the correct spot to insert it, if size of the keys after insertion equals to the order, split and go upwards

How do you get a value from your B+ Tree?

starts from root, go to left if key is smaller or equal to and go right if key is larger until reach the leaf node, find the first match, and iteratively go through consecutive nodes until find a different.

Why do you use a b+tree?

we can have faster range search, consecutive memory storage leads to faster read, etc.....

How do you implement your And operator?

take first arg get a set (no duplication) take second arg get another set first set retainall second set