-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The question about the redcap algorithm #2467
Comments
Thanks for your question! @JagnDC Can you post your step-by-step manual calculation so we can check what might be the problem? You can see a step-by-step manual calculation using FullOrder-CompleteLinkage in our GeoDa workbook: https://geodacenter.github.io/workbook/9c_spatial3/lab9c.html#appendix |
Can you also post how did you update the centroid (with average linkage) of the merged cluster after making a connection? I didn't see it in your attached image. You can also post the dendrogram of the hierarchical clustering of average linkage, which helps to explain the calculation (like in the GeoDa documentation I posted). Thanks! |
The calculation doesn't look correct to me: after merging 2 and 4, why all the sudden 3 has connection to 2? Yes, It's better to post a screenshot of the dendrogram of SCHC, see in GeoDa documentation: Thanks! A careful consideration of the various REDCAP algorithms reveals that they essentially consist of three steps. First, a dendrogram for contiguity constrained hierarchical clustering is constructed, using the given linkage function. This yields the exact same dendrogram as produced by SCHC. Next, this dendrogram is turned into a spanning tree, using standard graph manipulation principles. Finally, the optimal cuts in the spanning tree are obtained using the same logic (and computations) as in SKATER, up to the desired level of k. |
Here's the dendrogram I drew and the SCHC dendrogram, and I've also labeled the order of the edges. To be clear, I'm just building a spanning tree here. For your question, you can see my clustering on the right side of the distance matrix diagram. Since 2 and 4 are merged into one cluster, cluster 2 in state 2 represents the set of cluster 2 and cluster 4 in state 1. When updating the distance between cluster 2 and other clusters in the calculation process, all distances are calculated. I devised a separate step to determine whether clusters are adjacent. To avoid misunderstandings, I have updated the process plot of the distance matrix here. |
Thank you for your reply. For your question, the average of all distances between two clusters I calculated here, because it is the calculation between feature values, so no judgment of adjacency is made, and constraints are made in the subsequent calculation process, so the final calculation results will not be affected. However, I would like to thank you for reminding me. I have optimized my calculation process and reduced unnecessary calculation process. |
You are welcome! Just noticed another one: the 4.57081 values in state 2 seems not correct.
All these calculations will impact the final results. The SCHC is a special case of classic hierarchical clustering with fully connected weights. The implementation is identical to other clustering libraries.
On Nov 23, 2023, at 9:38 PM, dingchen ***@***.***> wrote:
Thank you for your reply. For your question, the average of all distances between two clusters I calculated here, because it is the calculation between feature values, so no judgment of adjacency is made, and constraints are made in the subsequent calculation process, so the final calculation results will not be affected.
However, I would like to thank you for reminding me. I have optimized my calculation process and reduced unnecessary calculation process.
—
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/GeoDaCenter/geoda/issues/2467*issuecomment-1825138841__;Iw!!BpyFHLRN4TMTrA!9awoiDVpcRBA_hp4zAdo48tEoQlB8Sm5cFrUEEYBD_Aj8z1T8sEVcX7fGhMP-SQEABJcZrrVVNu2C49KFQDzWQ2h0g$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AASPYT7MAXCT3DEDLDZNS23YGAQFXAVCNFSM6AAAAAA7NTOWP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRVGEZTQOBUGE__;!!BpyFHLRN4TMTrA!9awoiDVpcRBA_hp4zAdo48tEoQlB8Sm5cFrUEEYBD_Aj8z1T8sEVcX7fGhMP-SQEABJcZrrVVNu2C49KFQBNv82aTA$>.
You are receiving this because you commented.Message ID: ***@***.***>
|
Thank you for your reply. I see what you mean. I want to briefly explain my calculation process. According to the full pattern in the original article, when I merge nodes 2,4 in state 1, I should calculate the distance between the other clusters, which are cluster 0 and cluster 1. You are confused about the distance between cluster 1 and cluster 2
|
I am confused: among node 1-5, you are merging node 3 and 5 at first step from your screenshot. This is corresponding to row/column 2 and 4 in state 0 table. It should be simply the average of the two columns/rows. In state 1 table, 4.93383 is correct, and 4.57081 is not correct and some others are not correct as well. Please check and state carefully.
On Nov 23, 2023, at 10:45 PM, dingchen ***@***.***> wrote:
Thank you for your reply. I see what you mean. I want to briefly explain my calculation process.
According to the full pattern in the original article, when I merge nodes 2,4 in state 1, I should calculate the distance between the other clusters, which are cluster 0 and cluster 1. You are confused about the distance between cluster 1 and cluster 2
dis(node 1, node 2)=√(9+0+9)=√18
dis(node 1, node 4)= √(16+4+4)=√24
avgdis(cluster 1, cluster 2)= √18+ √24=4.5708
—
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/GeoDaCenter/geoda/issues/2467*issuecomment-1825175890__;Iw!!BpyFHLRN4TMTrA!81WITMV5Zf3pokGKGavnV60tbxBtaJ6jlH1cnZLAcBau5QDZh39eAXglly0_aiytkw654W1W_6B3jAL5CNIU3EvlcA$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AASPYT3U2CNQVILOWJR4HSDYGAX5RAVCNFSM6AAAAAA7NTOWP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRVGE3TKOBZGA__;!!BpyFHLRN4TMTrA!81WITMV5Zf3pokGKGavnV60tbxBtaJ6jlH1cnZLAcBau5QDZh39eAXglly0_aiytkw654W1W_6B3jAL5CNIdLZ6Brg$>.
You are receiving this because you commented.Message ID: ***@***.***>
|
I see. Can you send me the files you used in GeoDa? So I can check the details. Thanks! |
test_data.zip Feel free to contact me if you have any new findings or questions! |
I would like to ask a question about the REDCAP algorithm, I wrote the REDCAP algorithm according to the paper, but the calculation results seem to be different from the results of geoda, and some problems are found through comparison.
For FullOrder-AverageLinkage, my data property values are as follows:
The initial adjacency relation is as follows.
I used this to cluster five nodes and set the number of clusters to two.
My code calculates the following:
{0: [0, 4, 1, 2], 1: [3]}
geoda calculates the following:
{0: [0, 1, 3], 1: [2, 4]}
I didn't know which of these calculations was correct, so I looked at the spanning tree, and I found that my algorithm built the following tree adjacency:
geoda's tree is:
After manual calculation, I found that according to my understanding, the results of the paper should be consistent with the results of my algorithm. May I ask if there is a problem with my calculation results? Or maybe there's something wrong in some other place that I don't understand. Thanks to the Author
The text was updated successfully, but these errors were encountered: