Taking the PostgreSQL 16 code as an example, the TRUNCATE logic is easy to find: DropRelationsAllBuffers. The overall code path is fairly long, so let's walk through it step by step. The header comment states its job plainly: remove from the buffer pool all pages of all forks of the specified relations.
/* ---------------------------------------------------------------------
 *      DropRelationsAllBuffers
 *
 *      This function removes from the buffer pool all the pages of all
 *      forks of the specified relations. It's equivalent to calling
 *      DropRelationBuffers once per fork per relation with firstDelBlock = 0.
 * --------------------------------------------------------------------
 */
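Before the individual excerpts, here is the function's signature and local setup, condensed from src/backend/storage/buffer/bufmgr.c (PostgreSQL 16); these are the variables (smgr_reln, nlocators, rels, n, block, cached, and so on) that the snippets below keep referring to:

void
DropRelationsAllBuffers(SMgrRelation *smgr_reln, int nlocators)
{
    int         i;
    int         n = 0;                          /* number of non-local relations */
    SMgrRelation *rels;
    BlockNumber (*block)[MAX_FORKNUM + 1] = NULL;   /* cached fork sizes */
    uint64      nBlocksToInvalidate = 0;
    RelFileLocator *locators;
    bool        cached = true;
    bool        use_bsearch;

    if (nlocators == 0)
        return;

    /* non-local (shared-buffer) relations get collected into rels[] below */
    rels = palloc(sizeof(SMgrRelation) * nlocators);
    ...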
/* If it's a local relation, it's localbuf.c's problem. */
for (i = 0; i < nlocators; i++)
{
    if (RelFileLocatorBackendIsTemp(smgr_reln[i]->smgr_rlocator))
    {
        if (smgr_reln[i]->smgr_rlocator.backend == MyBackendId)
            DropRelationAllLocalBuffers(smgr_reln[i]->smgr_rlocator.locator);
    }
    else
        rels[n++] = smgr_reln[i];
}
/*
 * If there are no non-local relations, then we're done. Release the
 * memory and return.
 */
if (n == 0)
{
    pfree(rels);
    return;
}
Next, a two-dimensional array is allocated to record the number of blocks in every fork of each relation being dropped or truncated:
/*
 * This is used to remember the number of blocks for all the relations
 * forks.
 */
block = (BlockNumber (*)[MAX_FORKNUM + 1])
    palloc(sizeof(BlockNumber) * n * (MAX_FORKNUM + 1));
Next comes the core of the function. Start with the comment:
We can avoid scanning the entire buffer pool if we know the exact size of each of the given relation forks. See DropRelationBuffers.
/*
 * We can avoid scanning the entire buffer pool if we know the exact size
 * of each of the given relation forks. See DropRelationBuffers.
 */
for (i = 0; i < n && cached; i++)
{
    for (int j = 0; j <= MAX_FORKNUM; j++)
    {
        /* Get the number of blocks for a relation's fork. */
        block[i][j] = smgrnblocks_cached(rels[i], j);   /* outside recovery this returns InvalidBlockNumber */
        /* We need to only consider the relation forks that exists. */
        if (block[i][j] == InvalidBlockNumber)
        {
            if (!smgrexists(rels[i], j))    /* fork file doesn't exist: skip it */
                continue;
            cached = false;                 /* fork exists but its size isn't cached */
            break;
        }
        /* calculate the total number of blocks to be invalidated */
        nBlocksToInvalidate += block[i][j];
    }
}
/*
 * To remove all the pages of the specified relation forks from the buffer
 * pool, we need to scan the entire buffer pool but we can optimize it by
 * finding the buffers from BufMapping table provided we know the exact
 * size of each fork of the relation. The exact size is required to ensure
 * that we don't leave any buffer for the relation being dropped as
 * otherwise the background writer or checkpointer can lead to a PANIC
 * error while flushing buffers corresponding to files that don't exist.
 *
 * To know the exact size, we rely on the size cached for each fork by us
 * during recovery which limits the optimization to recovery and on
 * standbys but we can easily extend it once we have shared cache for
 * relation size.
 *
 * In recovery, we cache the value returned by the first lseek(SEEK_END)
 * and the future writes keeps the cached value up-to-date. See
 * smgrextend. It is possible that the value of the first lseek is smaller
 * than the actual number of existing blocks in the file due to buggy
 * Linux kernels that might not have accounted for the recent write. But
 * that should be fine because there must not be any buffers after that
 * file size.
 */
/*
 * smgrnblocks_cached() -- Get the cached number of blocks in the supplied
 *                         relation.
 *
 * Returns an InvalidBlockNumber when not in recovery and when the relation
 * fork size is not cached.
 */
BlockNumber
smgrnblocks_cached(SMgrRelation reln, ForkNumber forknum)
{
    /*
     * For now, we only use cached values in recovery due to lack of a shared
     * invalidation mechanism for changes in file size.
     */
    if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
        return reln->smgr_cached_nblocks[forknum];

    return InvalidBlockNumber;
}
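For context on how that cache gets filled: outside of this path, smgrnblocks() remembers the result of the real size probe in smgr_cached_nblocks, and smgrextend()/smgrtruncate() keep it up to date as the file grows or shrinks during recovery. A rough sketch, paraphrased from src/backend/storage/smgr/smgr.c with details trimmed:

BlockNumber
smgrnblocks(SMgrRelation reln, ForkNumber forknum)
{
    BlockNumber result;

    /* Check and return if we get the cached value for the number of blocks. */
    result = smgrnblocks_cached(reln, forknum);
    if (result != InvalidBlockNumber)
        return result;

    /* ask the storage manager; for md.c this is an lseek(SEEK_END) */
    result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);

    /* remember it so the cached value can be reused next time */
    reln->smgr_cached_nblocks[forknum] = result;

    return result;
}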
If the number of pages to invalidate is below BUF_DROP_FULL_SCAN_THRESHOLD (NBuffers / 32), the buffers are looked up in the hash table; otherwise the whole buffer pool is scanned.

/*
 * We apply the optimization iff the total number of blocks to invalidate
 * is below the BUF_DROP_FULL_SCAN_THRESHOLD.
 */
if (cached && nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
{
    for (i = 0; i < n; i++)
    {
        for (int j = 0; j <= MAX_FORKNUM; j++)
        {
            /* ignore relation forks that doesn't exist */
            if (!BlockNumberIsValid(block[i][j]))
                continue;
            /* drop all the buffers for a particular relation fork */
            FindAndDropRelationBuffers(rels[i]->smgr_rlocator.locator,
                                       j, block[i][j], 0);
        }
    }
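FindAndDropRelationBuffers is where the "look it up instead of scanning" optimization actually happens: for every block of the fork it builds the buffer tag, probes the BufMapping hash table, and only invalidates buffers it actually finds. A condensed sketch, paraphrased from bufmgr.c (PostgreSQL 16) with comments and some rechecks trimmed:

static void
FindAndDropRelationBuffers(RelFileLocator rlocator, ForkNumber forkNum,
                           BlockNumber nForkBlock, BlockNumber firstDelBlock)
{
    BlockNumber curBlock;

    for (curBlock = firstDelBlock; curBlock < nForkBlock; curBlock++)
    {
        BufferTag   bufTag;
        uint32      bufHash;
        LWLock     *bufPartitionLock;
        int         buf_id;
        BufferDesc *bufHdr;
        uint32      buf_state;

        /* build the tag and probe the BufMapping hash table */
        InitBufferTag(&bufTag, &rlocator, forkNum, curBlock);
        bufHash = BufTableHashCode(&bufTag);
        bufPartitionLock = BufMappingPartitionLock(bufHash);

        LWLockAcquire(bufPartitionLock, LW_SHARED);
        buf_id = BufTableLookup(&bufTag, bufHash);
        LWLockRelease(bufPartitionLock);

        /* block not in shared buffers at all: nothing to drop */
        if (buf_id < 0)
            continue;

        /* recheck under the buffer header lock, then invalidate */
        bufHdr = GetBufferDescriptor(buf_id);
        buf_state = LockBufHdr(bufHdr);
        if (BufTagMatchesRelFileLocator(&bufHdr->tag, &rlocator) &&
            BufTagGetForkNum(&bufHdr->tag) == forkNum &&
            bufHdr->tag.blockNum >= firstDelBlock)
            InvalidateBuffer(bufHdr);   /* releases the spinlock */
        else
            UnlockBufHdr(bufHdr, buf_state);
    }
}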
The threshold to use is rather a guess than an exactly determined value
For example, with drop test1, test2, test…: if more than 20 relations are being dropped, binary search is used to match each buffer against the relation list; otherwise the binary search is skipped to avoid its overhead.
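The value 20 is the constant RELS_BSEARCH_THRESHOLD, defined near the top of bufmgr.c; the comment below is mine, but the value matches PostgreSQL 16:

/* above this many relations, use bsearch() instead of a linear walk */
#define RELS_BSEARCH_THRESHOLD      20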
/*
 * For low number of relations to drop just use a simple walk through, to
 * save the bsearch overhead. The threshold to use is rather a guess than
 * an exactly determined value, as it depends on many factors (CPU and RAM
 * speeds, amount of shared buffers etc.).
 */
use_bsearch = n > RELS_BSEARCH_THRESHOLD;
/* sort the list of rlocators if necessary */
if (use_bsearch)
    pg_qsort(locators, n, sizeof(RelFileLocator), rlocator_comparator);
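Right after this comes the part that gives DROP/TRUNCATE its O(shared_buffers) cost: every buffer descriptor in the pool is examined, and its tag is compared against the locator list either with a linear walk or with bsearch(). A condensed sketch, paraphrased from bufmgr.c (PostgreSQL 16):

for (i = 0; i < NBuffers; i++)
{
    RelFileLocator *rlocator = NULL;
    BufferDesc *bufHdr = GetBufferDescriptor(i);
    uint32      buf_state;

    /* unlocked precheck: does this buffer belong to one of the relations? */
    if (!use_bsearch)
    {
        int         j;

        for (j = 0; j < n; j++)
        {
            if (BufTagMatchesRelFileLocator(&bufHdr->tag, &locators[j]))
            {
                rlocator = &locators[j];
                break;
            }
        }
    }
    else
    {
        RelFileLocator locator;

        locator = BufTagGetRelFileLocator(&bufHdr->tag);
        rlocator = bsearch((const void *) &(locator),
                           locators, n, sizeof(RelFileLocator),
                           rlocator_comparator);
    }

    /* buffer doesn't belong to any of the given relfilelocators; skip it */
    if (rlocator == NULL)
        continue;

    /* recheck under the header spinlock and invalidate */
    buf_state = LockBufHdr(bufHdr);
    if (BufTagMatchesRelFileLocator(&bufHdr->tag, rlocator))
        InvalidateBuffer(bufHdr);   /* releases spinlock */
    else
        UnlockBufHdr(bufHdr, buf_state);
}

This is the loop that runs NBuffers times no matter how small the dropped table is.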
For temporary relations, as we saw at the very beginning, the work is handed off to localbuf.c, where DropRelationAllLocalBuffers simply walks every local buffer of the backend:

/*
 * DropRelationAllLocalBuffers
 *      This function removes from the buffer pool all pages of all forks
 *      of the specified relation.
 *
 * See DropRelationsAllBuffers in bufmgr.c for more notes.
 */
void
DropRelationAllLocalBuffers(RelFileLocator rlocator)
{
    int         i;

    for (i = 0; i < NLocBuffer; i++)
    {
        BufferDesc *bufHdr = GetLocalBufferDescriptor(i);
        LocalBufferLookupEnt *hresult;
        uint32      buf_state;

        buf_state = pg_atomic_read_u32(&bufHdr->state);

        if ((buf_state & BM_TAG_VALID) &&
            BufTagMatchesRelFileLocator(&bufHdr->tag, &rlocator))
        {
            if (LocalRefCount[i] != 0)
                elog(ERROR, "block %u of %s is still referenced (local %u)",
                     bufHdr->tag.blockNum,
                     relpathbackend(BufTagGetRelFileLocator(&bufHdr->tag),
                                    MyBackendId,
                                    BufTagGetForkNum(&bufHdr->tag)),
                     LocalRefCount[i]);
            /* Remove entry from hashtable */
            hresult = (LocalBufferLookupEnt *)
                hash_search(LocalBufHash, &bufHdr->tag, HASH_REMOVE, NULL);
            if (!hresult)       /* shouldn't happen */
                elog(ERROR, "local buffer hash table corrupted");
            /* Mark buffer invalid */
            ClearBufferTag(&bufHdr->tag);
            buf_state &= ~BUF_FLAG_MASK;
            buf_state &= ~BUF_USAGECOUNT_MASK;
            pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
        }
    }
}
One more thing: the count of objects to drop, computed at the very beginning, covers all objects belonging to the table, such as its TOAST table and the TOAST table's index. You can verify this yourself: for create table test1(id int), n is 1; for create table test2(info text), n is 3 (the table, its TOAST table, and the TOAST index). Every one of these objects has to go through the scan, so the more complex the table and the more indexes it has, the slower the drop becomes!
/* If it's a local relation, it's localbuf.c's problem. */
for (i = 0; i < nlocators; i++)
{
    if (RelFileLocatorBackendIsTemp(smgr_reln[i]->smgr_rlocator))
    {
        if (smgr_reln[i]->smgr_rlocator.backend == MyBackendId)
            DropRelationAllLocalBuffers(smgr_reln[i]->smgr_rlocator.locator);
    }
    else
        rels[n++] = smgr_reln[i];
}
Here I used gdb's until command to run past the loop, and in the end you can see that it iterated 16384 times, which is the entire size of shared buffers (16384 buffers × 8kB = 128MB, the default shared_buffers setting).