# 编写一个程序，计算使用线性探测、平方探测以及双散列插入的长随机序列所需要的冲突次数

线性探测: $F(i) = i$
平方探测: $F(i) = i^2$
双散列: $F(i) = i*h(X)$ , 其中 $h(X) = 7-(X mod 7)$
计算冲突次数直接在函数里面加个静态变量就行。

	#define R 7
	enum KindOfEntry {
	Legitmate,
	Empty,
	Deleted
	};
	typedef int HashElementType;
	struct HashEntry {
	HashElementType Element;
	enum KindOfEntry Info;
	};
	typedef struct HashEntry Cell;
	struct HashTb1{
	int Tablesize;
	Cell* TheCell;
	};
	typedef struct HashTb1* HashTable;
	typedef unsigned int HashIndex;
	typedef HashIndex HashPosition;
	enum InsertOfEntry{
	Linear,
	Square,
	DoubleHash
	};

	// Key is string usually,but for test easily , I use int
	HashIndex Hash(HashElementType Key, int Tablesize)
	{
	return Key % Tablesize;
	}

	HashIndex Hash2(HashElementType X)
	{
	return R - (X % R);
	}
	// F(i) = i*i and F(i) = F(i-1) + 2i -1 equally
	// the function is find empty cell
	HashPosition FindSquare(HashElementType Key, HashTable H)
	{
	HashPosition CurrentPos;
	int CollisionNum = 0;
	// static int count = 0;

	CurrentPos = Hash(Key, H->Tablesize);
	while(H->TheCell[CurrentPos].Info != Empty &&
	H->TheCell[CurrentPos].Element != Key)
	{
	CurrentPos += 2 * ++CollisionNum -1;
	if(CurrentPos>=H->Tablesize)
	CurrentPos -= H->Tablesize;
	// printf("FindSquare: %d\n", ++count);
	}
	return CurrentPos;
	}
	// F(i) = i.
	// the function is find empty cell.
	HashPosition FindLinear(HashElementType Key, HashTable H)
	{
	HashPosition CurrentPos;
	int CollisionNum = 0;
	// static int count = 0;

	CurrentPos = Hash(Key, H->Tablesize);
	while(H->TheCell[CurrentPos].Info != Empty &&
	H->TheCell[CurrentPos].Element != Key)
	{
	CurrentPos += 1;
	if(CurrentPos>=H->Tablesize)
	CurrentPos -= H->Tablesize;

	// printf("FindLinear: %d\n", ++count);
	}
	return CurrentPos;
	}
	// F(i) = i*Hash2(X).
	// the function is find empty cell.
	HashPosition FindDoubleHash(HashElementType Key, HashTable H)
	{
	HashPosition CurrentPos;
	int CollisionNum = 0;
	// static int count = 0;
	// int num = 0;

	CurrentPos = Hash(Key, H->Tablesize);
	while(H->TheCell[CurrentPos].Info != Empty &&
	H->TheCell[CurrentPos].Element != Key)
	{
	CurrentPos += ++CollisionNum * Hash2(Key);
	if(CurrentPos>=H->Tablesize)
	CurrentPos -= H->Tablesize;
	// when you check CurrentPos>=H->Tablesize
	// it means that key can not insert hash table
	// num++;
	if(CurrentPos>=H->Tablesize)
	{
	// count -= num;
	// printf("FindDoubleHash: %d\n", ++count);
	break;
	}
	// printf("FindDoubleHash: %d\n",++count);
	}
	return CurrentPos;
	}

# 在分离链接散列表中进行大量的删除可能造成表非常稀疏，浪费空间。在这种情况下，我们可以再散列一个表，大小为原表的一半。设当存在相当于表的大小的二倍那么多的元素的时候，我们再散列到一个更大的表。在再散列到一个更小的表之前，该表应该有多么稀疏？

设再散列的阈值为 p, 旧表大小为 2N, 新表为 N。
则在 2N-2pN 插入或 pN 次删除后会再散列到一个更小的表。
综上考虑 p=2/3 时，效果更好。

# 有一种冲突解决策略是定义一个序列 $F(i)=r_i$ ，其中 $r_0=0$ 且 $r_1,r_2,...,r_N$ 是前 N 个整数的一个随机排列（每个整数恰好出现一次）。

# a. 证明，在这种策略下，如果表不满，那么冲突总能够被解决

假设每个表大小为 N, $r_i$ 每次不出现重复的数字且 $i<N$ , 则在 $i/N$ 次次聚集冲突后可以填入表中。

# b. 能够期望这种策略会消除聚集吗？

只消除了主聚集问题，次聚集问题没有消除。因为散列到某个位置的所有元素都将尝试相同的冲突解决顺序。

# c. 如果表的装填因子是 λ，执行一次插入的期望时间是多少？

$O(\lambda-i)$ 。

# d. 如果表的装填因子是 λ，执行一次成功查找的期望时间是多少？

一样是 $O(\lambda-i)$ 。

# e. 给出一个有效算法（理论上以及实际上）生成随机序列。解释为什么选择 P 的那些法则是重要的？

随机数算法的选取影响随机数的生成效率，硬件生成随机数是最慢的，所以选 P 的算法要慎重。

# 各种冲突解决方法的优点和缺点是什么？

分离链接法，不会发生冲突问题，但会消耗昂贵的内存和指针。
线性探测法，实现简单，但随着插入增加会加重主聚类问题，使性能下降。
平方探测法，实现稍微困难，有着和线性探测法一样的问题。
双散列，消除了主聚类和次聚类，但第二个散列函数计算成本可能很高。

# 描述一个避免初始化散列表的过程（以内存消耗作为代价）。

添加一个字段表示散列表已填的状态即可避免初始化。