Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added debug log of AllocatedIPCount of ippool #3926

Merged
merged 1 commit into from
Sep 19, 2024

Conversation

ty-dc
Copy link
Collaborator

@ty-dc ty-dc commented Aug 22, 2024

Thanks for contributing!

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes ##3771

Special notes for your reviewer:

SpiderIPPool's AllocatedIPCount statistics are aligned with SpiderSubnet, and some debug logs are added.

@ty-dc ty-dc added the pr/not-ready not ready for merging label Aug 22, 2024
Copy link

codecov bot commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.81%. Comparing base (1d5d889) to head (6ef5fc2).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
pkg/ippoolmanager/ippool_manager.go 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3926      +/-   ##
==========================================
- Coverage   81.34%   80.81%   -0.54%     
==========================================
  Files          51       51              
  Lines        4514     4519       +5     
==========================================
- Hits         3672     3652      -20     
- Misses        672      700      +28     
+ Partials      170      167       -3     
Flag Coverage Δ
unittests 80.81% <71.42%> (-0.54%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
pkg/ippoolmanager/ippool_manager.go 88.64% <71.42%> (-0.64%) ⬇️

... and 1 file with indirect coverage changes

@ty-dc ty-dc force-pushed the fix/error-allocatedipcount branch from e1cda70 to d2cedea Compare August 23, 2024 02:16
@weizhoublue
Copy link
Collaborator

weizhoublue commented Aug 23, 2024

需要找出 用量统计的根因,gc并不能解决真正的问题,只是事后补救,这是两个纬度

@ty-dc
Copy link
Collaborator Author

ty-dc commented Aug 23, 2024

需要找出 用量统计的根因,gc并不能解决真正的问题,只是事后补救,这是两个纬度

image
原本分配 IP 的流程,是在已有的 allcateIPCount 数量++,如果分配地址时,并发进行一个在分配、一个在释放,而释放的时候 AllocatedIPCount 还没有完成 -- 这个动作,而分配的时候拿到的是未 -- 的 AllocatedIPCount ,并进行了 ++ ,从而导致出现了统计出现不正确的现象(猜测,暂未能证实猜想)

@ty-dc ty-dc force-pushed the fix/error-allocatedipcount branch from d2cedea to 42941c7 Compare August 23, 2024 08:30
@@ -586,6 +586,23 @@ var _ = Describe("test ip with reclaim ip case", Label("reclaim"), func() {
// Delete Pod
Expect(frame.DeletePod(podName, namespace)).To(Succeed(), "Failed to delete pod %v/%v\n", namespace, podName)
GinkgoWriter.Printf("succeed to delete pod %v/%v\n", namespace, podName)

// Check whether the dirty IP data is recovered successfully and whether the AllocatedIPCount decreases and meets expectations?
Copy link
Collaborator Author

@ty-dc ty-dc Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用例是在手动更新 AllocatedIPCount 为一个错误的值,期望在新的改进引入后,分配 IP 或则 释放 IP,或则 gc all 回收异常 IP,都能够纠正这个 AllocatedIPCount 值。

IPPool 的状态是健壮的。

@ty-dc ty-dc added release/bug cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. and removed pr/not-ready not ready for merging labels Aug 23, 2024
@ty-dc ty-dc force-pushed the fix/error-allocatedipcount branch from 42941c7 to e7ee784 Compare August 23, 2024 09:25
@weizhoublue weizhoublue added pr/not-ready not ready for merging and removed cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. labels Sep 3, 2024
@ty-dc ty-dc force-pushed the fix/error-allocatedipcount branch 2 times, most recently from f59f549 to a9069ca Compare September 9, 2024 10:11
@ty-dc ty-dc removed the pr/not-ready not ready for merging label Sep 9, 2024
@weizhoublue
Copy link
Collaborator

(1) 这个日志 真有 能帮助 debug 问题根因么
(2)这个 pr 的 title 并不能 fix

@ty-dc ty-dc changed the title fix: the statistics of ippool's status.AllocatedIPCount are inaccurate Added debug log of AllocatedIPCount of ippool Sep 10, 2024
@ty-dc ty-dc added release/none no release note and removed release/bug labels Sep 10, 2024
@ty-dc
Copy link
Collaborator Author

ty-dc commented Sep 10, 2024

(1) 这个日志 真有 能帮助 debug 问题根因么

当前没有任何日志去获悉 ippool 的 AllocatedIPCount 数量在什么时候开始出现异常的。AllocatedIPCount++,AllocatedIPCount-- 只出现在文中两处,补充这样的日志,当 IPPool 分配与释放 IP 打印当前的 AllocatedIPCount,能够知道在哪一次分配和释放出现的记录错误的问题。

@weizhoublue
Copy link
Collaborator

weizhoublue commented Sep 12, 2024

(1) 这个日志 真有 能帮助 debug 问题根因么

当前没有任何日志去获悉 ippool 的 AllocatedIPCount 数量在什么时候开始出现异常的。AllocatedIPCount++,AllocatedIPCount-- 只出现在文中两处,补充这样的日志,当 IPPool 分配与释放 IP 打印当前的 AllocatedIPCount,能够知道在哪一次分配和释放出现的记录错误的问题。

你再想想,你这行 日志 真的能 进行 生产监控么
一直在这里 debug 级别 打印 数字

@@ -215,6 +215,8 @@ func (im *ipPoolManager) genRandomIP(ctx context.Context, ipPool *spiderpoolv2be
ipPool.Status.AllocatedIPCount = new(int64)
}

// reference issue: https://github.com/spidernet-io/spiderpool/issues/3771
logger.Sugar().Debugf("Handling AllocatedIPCount when allocate IP from IPPool %s, the current AllocatedIPCount is: %d", ipPool.Name, *ipPool.Status.AllocatedIPCount)
Copy link
Collaborator

@weizhoublue weizhoublue Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这么检测,是帮助不了 捕获问题的
它的逻辑应该是

if  len( ip 分配数组 ) != AllocatedIPCount {
      log(error, ....... )
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@weizhoublue
Copy link
Collaborator

同理, subnet 对象的统计 也可能会有问题

@ty-dc ty-dc force-pushed the fix/error-allocatedipcount branch from a9069ca to 6168361 Compare September 14, 2024 06:47
@ty-dc
Copy link
Collaborator Author

ty-dc commented Sep 14, 2024

同理, subnet 对象的统计 也可能会有问题

子网根据实际的来算的

func subnetStatusCount(subnet *spiderpoolv2beta1.SpiderSubnet) (totalCount, allocatedCount int64) {
	s, _ := spiderpoolip.NewCIDR(subnet.Spec.Subnet, subnet.Spec.IPs, subnet.Spec.ExcludeIPs)

	if subnet.Status.ControlledIPPools == nil {
		return 0, 0
	}
	var controlledIPPools spiderpoolv2beta1.PoolIPPreAllocations
	err := json.Unmarshal([]byte(*subnet.Status.ControlledIPPools), &controlledIPPools)
	if nil != err {
		return 0, 0
	}

	// allocated IP Count
	var allocatedIPCount int64
	for _, poolAllocation := range controlledIPPools {
		tmpIPs, _ := spiderpoolip.ParseIPRanges(*subnet.Spec.IPVersion, poolAllocation.IPs)
		allocatedIPCount += int64(len(tmpIPs))
	}
	return int64(s.TotalIPInt()), allocatedIPCount
}

@weizhoublue
Copy link
Collaborator

同理, subnet 对象的统计 也可能会有问题

子网根据实际的来算的

func subnetStatusCount(subnet *spiderpoolv2beta1.SpiderSubnet) (totalCount, allocatedCount int64) {
	s, _ := spiderpoolip.NewCIDR(subnet.Spec.Subnet, subnet.Spec.IPs, subnet.Spec.ExcludeIPs)

	if subnet.Status.ControlledIPPools == nil {
		return 0, 0
	}
	var controlledIPPools spiderpoolv2beta1.PoolIPPreAllocations
	err := json.Unmarshal([]byte(*subnet.Status.ControlledIPPools), &controlledIPPools)
	if nil != err {
		return 0, 0
	}

	// allocated IP Count
	var allocatedIPCount int64
	for _, poolAllocation := range controlledIPPools {
		tmpIPs, _ := spiderpoolip.ParseIPRanges(*subnet.Spec.IPVersion, poolAllocation.IPs)
		allocatedIPCount += int64(len(tmpIPs))
	}
	return int64(s.TotalIPInt()), allocatedIPCount
}

为什么不 两个拉齐 一个算法
都使用 切片长度

@ty-dc
Copy link
Collaborator Author

ty-dc commented Sep 18, 2024

为什么不 两个拉齐 一个算法
都使用 切片长度

开始修改就是对齐子网,使用切片长度来算的,然后被你否了。你说没找到根因怎么知道修改是否有效?

@ty-dc ty-dc force-pushed the fix/error-allocatedipcount branch from 6168361 to e9f0536 Compare September 18, 2024 06:13
@ty-dc ty-dc force-pushed the fix/error-allocatedipcount branch from e9f0536 to 6ef5fc2 Compare September 18, 2024 06:17
Copy link
Collaborator

@cyclinder cyclinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议 PR 加些描述,方便 release-notes 展示

@ty-dc
Copy link
Collaborator Author

ty-dc commented Sep 19, 2024

建议 PR 加些描述,方便 release-notes 展示

done

@ty-dc ty-dc added cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. labels Sep 19, 2024
@ty-dc ty-dc merged commit 51c3e4b into spidernet-io:main Sep 19, 2024
57 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 19, 2024
Added debug log of AllocatedIPCount of ippool

Signed-off-by: robot <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 19, 2024
Added debug log of AllocatedIPCount of ippool

Signed-off-by: robot <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 19, 2024
Added debug log of AllocatedIPCount of ippool

Signed-off-by: robot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. release/none no release note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants