Notes of Introduction to Algorithm(B-Tree)

When I was a college student and took the course Data Structure and Algorithm, the B-Tree section was not included in the exam requirement. At that time the teacher just said little about it and therefore I knew little about it. It is ridiculous that I totally understand B-Tree until last month. So what is the B-Tree? B-Trees are balanced search trees designed to work well on magnetic disks or other direct-access secondary storage devices. B-Trees are similar to red-black trees. but they are better at minimizing disk I/O operations. Many database systems use B-Trees, or variants of B-Trees, to store information.

B-Trees differ from red-black trees in that B-Tree nodes may have many children. B-Trees are similar to red-black trees in that every n-node B-Tree has height O(lg n), although the height of a B-Tree can be considerably less than that of a red-black tree because its branching factor can be much larger. Therefore, B-Trees can also be used to implement many dynamic-set operations in time O(lg n).

The precise definition of B-Trees is as follows:

A B-tree T is a rooted tree (whose root is m_ root [ T ]) having the following properties:

  1. Every node x has the following fields:
    1. n [ x ], the number of keys currently stored in node x ,
    2. the n [ x ] keys themselves, stored in nondecreasing order, so that key 1 [ x ] ≤ key 2 [ x ] ≤ ··· ≤ key n [ x ] [ x ],
    3. leaf [ x ], a boolean value that is TRUE if x is a leaf and FALSE if x is an internal node.
  2. Each internal node x also contains n [ x ]+ 1 pointers c 1 [ x ], c 2 [ x ], …, c n [ x ]+1 [ x ] to its children. Leaf nodes have no children, so their c i fields are undefined.
  3. The keys key i [ x ] separate the ranges of keys stored in each subtree: if k i is any key stored in the subtree with root c i [ x ], then k 1 ≤ key 1 [ x ] ≤ k 2 ≤ key 2 [ x ] ≤··· ≤ key n [ x ] [ x ] ≤ k n [ x ]+1 .
  4. All leaves have the same depth, which is the tree’s height h .
  5. There are lower and upper bounds on the number of keys a node can contain. These bounds can be expressed in terms of a fixed integer t ≥ 2 called the minimum degree of the B-tree:
    1. Every node other than the root must have at least t – 1 keys. Every internal node other than the root thus has at least t children. If the tree is nonempty, the root must have at least one key.
    2. Every node can contain at most 2 t – 1 keys. Therefore, an internal node can have at most 2 t children. We say that a node is full if it contains exactly 2 t – 1 keys.

The simplest B-tree occurs when t = 2. Every internal node then has either 2, 3, or 4 children, and we have a 2-3-4 tree . In practice, however, much larger values of t are typically used.

 

The following is my implement of B-Tree. The template class Element is data type stored inside the B-Tree nodes. I use two STL’s vectors to store the info of the elements and children in each node. One is vector m_data; another is vector m_child .

 1 //the class definition of the element stored in each node;
 2 template<typename T>
 3 class Element
 4 {
 5 public:
 6 	T data;
 7 	bool operator < (Element& e) const { return data < e.data;}
 8 	bool operator <= (Element& e) const { return data <= e.data;}
 9 	bool operator > (Element& e) const { return data > e.data;}
10 	bool operator >= (Element& e) const { return data >= e.data;}
11 	bool operator == (Element& e) const { return data == e.data;}
12 	Element& operator = (const Element& e)
13 	{
14 		data=e.data;
15 		return *this;
16 	}
17 	Element(T& d) : data(d){}
18 };
19
20 //An integer element.
21 typedef Element<int> Elem;
22
23 class BTreeNode
24 {
25 public:
26 	vector<Elem> m_data;
27 	//m_child[i] is the child whose keys are smaller than that of m_data[i];
28 	//and the size of m_child is always one more than m_data
29 	vector<BTreeNode*> m_child;
30 //Here is my B-tree strcture representation
31 //
32 //                     m_data[0]  m_data[1] ... m_data[i] ...
33 //      m_child[0]--> /         |          |              <--m_child[i+1]
34 //                   /          |          |                   
35 //
36 	BTreeNode* m_parent; // pointer to the parent BTreeNode
37 	BTreeNode(BTreeNode* mp=NULL) : m_parent(mp){}
38 };
39 class BTree
40 {
41 	typedef void (*v_func)(Elem &data);//visit function		
42 protected:
43 	int m_order; //maximum number of the elements stored a node
44 	BTreeNode* m_root;//the root of the tree
45 	v_func m_visit;
46 	//return true if the insertion does not need split the current node
47 	bool Normal_insert(Elem &);
48 	//When the number of current node's element is overflow, perform split insertion.
49 	bool Split_insert(Elem &);
50 	//store the result of every search
51 	BTreeNode* search_result;
52 public:
53 	BTree(v_func f, int n, BTreeNode* r=NULL):m_visit(f),m_order(n),m_root(r)
54 	{
55 		search_result = NULL;
56 	};
57 	~BTree();
58 	bool Insertion(Elem&);//Insert a element to the B-tree.
59 	bool Deletion(Elem&);
60 	BTreeNode* Search(Elem&);//Search an Element in B-tree.
61 	void travel(BTreeNode*) const; // travel the Btree
62 	void print() const { travel(m_root); } // print the elements in a sorted order
63 };

Searching a B-tree is much like searching a binary search tree, except that instead of making a binary, or “two-way,” branching decision at each node, we make a multiway branching decision according to the number of the node’s children. More precisely, at each internal B-tree node x, in my code, I perform a binary search.

 1 //Search an Element in B-tree. Return the node pointer which contain the 
 2 //desired element or return NULL if the element isn't found.
 3 BTreeNode* BTree::Search(Elem& t)
 4 {
 5 	BTreeNode* p = m_root;
 6 	while (p)
 7 	{
 8 		//store the current search result.
 9 		search_result = p;
10 		//perform binary search of the node
11 		int first=0, last=p->m_data.size()-1, mid = (first+last)/2;
12 		while (first<=last)
13 		{
14 			mid=(first+last)/2;
15 			if (p->m_data[mid]==t) {
16 				return p;
17 			}
18 			if (p->m_data[mid]>t) {
19 				last=mid-1;
20 			}
21 			if (p->m_data[mid]<t) {
22 				first=mid+1;
23 			}
24 		}
25 		//continue search in the child.
26 		if(p->m_data[mid]>t)
27 			p = p->m_child[mid];
28 		else
29 			p = p->m_child[mid+1];
30 	}
31 	return p;
32 }

As with binary search trees, we search for the leaf position at which to insert the new key. With a B-tree, however, we cannot simply create a new leaf node and insert it, as the resulting tree would fail to be a valid B-tree. Instead, we insert the new key into an existing leaf node. Since we cannot insert a key into a leaf node that is full, we introduce an operation that splits a full node y (having 2t – 1 keys(having m_order elements)) around its median key keyt[y] into two nodes having t – 1 keys each. The median key moves up into y’s parent to identify the dividing point between the two new trees. But if y’s parent is also full, it must be split before the new key can be inserted, and thus this need to split full nodes can propagate all the way up the tree.

  1 bool BTree::Insertion(Elem& t)
  2 {
  3 	if(Search(t))//The element already exits in the tree
  4 		return false;
  5 	if(m_root==NULL)//The tree is empty
  6 	{
  7 		m_root = new BTreeNode;
  8 		m_root->m_data.push_back(t);
  9 		m_root->m_child.push_back(NULL);
 10 		m_root->m_child.push_back(NULL);
 11 		m_root->m_parent = NULL;
 12 		return true;
 13 	}
 14 	else
 15 	{
 16 		if(search_result)
 17 		{
 18 	//If the element can fit into vector m_data, slide all the elements
 19 	//greater than the arguement forward one position and insert the newly
 20 	//vacated slot, then return true. Otherwise performan split.
 21 			vector<Elem> *p = &search_result->m_data;
 22 			vector<BTreeNode*> *pc = &search_result->m_child;
 23 			int i;
 24 			//locate the right position to insert
 25 			for(i=0;i<p->size() && (*p)[i]<t;i++);
 26 			p->insert(p->begin()+i,t);//insert the new Element
 27 			pc->insert(pc->begin()+i,NULL);//insert the new child.
 28 			if(p->size() < m_order)//dosen't need to split this node
 29 			{
 30   		            return true;
 31 			}
 32 			else
 33 				return Split_insert(t);
 34 		}
 35 		else
 36 			return false;
 37 	}
 38
 39 }
 40
 41
 42 bool BTree::Split_insert(Elem &t)
 43 {
 44 	BTreeNode* sr = search_result;
 45 	if(sr)
 46 	{
 47 		BTreeNode* sr = search_result;
 48 		vector<Elem> *p = &sr->m_data;
 49 		vector<BTreeNode*> *pc = &sr->m_child;
 50 		//begin to insert and split
 51 		while(p->size() >= m_order)
 52 		{
 53 			int split_point = p->size()/2;
 54 			BTreeNode* new_node = new BTreeNode(sr->m_parent);
 55 	//new node recieve rightmost half of current node. And after copying 
 56 	//data to new node, delete the righmost half of the current node
 57 			int i,total= p->size();
 58 			for(i=split_point+1;i<total;i++)
 59 			{
 60 				new_node->m_data.push_back((*p)[split_point+1]);
 61 				new_node->m_child.push_back((*pc)[split_point+1]);
 62 	//make the new node be the parent of the righmost half's children
 63 				if((*pc)[split_point+1])
 64 					(*pc)[split_point+1]->m_parent = new_node;
 65 				p->erase(p->begin()+split_point+1);
 66 				pc->erase(pc->begin()+split_point+1);
 67 			}
 68 			//deal with the one more child
 69 			new_node->m_child.push_back((*pc)[split_point+1]);
 70 	//make the new node be the parent of the righmost half's children
 71 			if((*pc)[split_point+1])
 72 				(*pc)[split_point+1]->m_parent = new_node;
 73 			pc->erase(pc->begin()+split_point+1);
 74 			//delete the split_point
 75 			Elem upward = (*p)[split_point];
 76 			p->erase(p->begin()+split_point);
 77 			//make the current's parent to be new_node's parent
 78 			new_node->m_parent = sr->m_parent;
 79 			if(sr == m_root)	//reach the root
 80 			{
 81 				//make a new node to be root.
 82 				BTreeNode* new_node2 = new BTreeNode(NULL);
 83 				m_root = new_node2;
 84 				new_node2->m_data.push_back(upward);
 85 //the new root's two children are the splited two from current node
 86 				new_node2->m_child.push_back(sr);
 87 				new_node2->m_child.push_back(new_node);
 88 				sr->m_parent = new_node2;
 89 				new_node->m_parent = new_node2;
 90 				return true;
 91 			}
 92
 93 			//add the upward element to the parent of current node				
 94 			p = &sr->m_parent->m_data;
 95 			pc = &sr->m_parent->m_child;
 96 			//locate the right position to insert
 97 			for(i=0;i<p->size() && (*p)[i]<upward;i++);
 98 			p->insert(p->begin()+i,upward);
 99 			//adjust the parent's child
100 			pc->insert(pc->begin()+i+1,new_node);
101 			sr=sr->m_parent;
102 		}
103         return true;
104 	}
105 	return false;
106 }

Here is the the ful list code. I have not tested it seriously, so if you find that there is a bug in it, please comment on. Thanks!
Deletion from a B-tree is analogous to insertion but a little more complicated, I will implement it later if have any spare time.

One Reply to “Notes of Introduction to Algorithm(B-Tree)”

Leave a Reply

Your email address will not be published. Required fields are marked *