Data Mining Must-Read Materials | 数据挖掘必读论文
This repository collects must-read materials for data mining that I gathered while studying the field. I will do my best to cover its major areas, and I hope the collection saves you time searching the literature.
Here is how to use this repo.
Each branch of data mining has its own heading, with the reading materials listed below it as items, indexed by full citation (paper titles are in bold). A ☑️ before an item means its PDF is available in the corresponding directory of this repo, for study and academic exchange only. If the paper can be downloaded directly, I suggest clicking the item to download it instead.
A ⭐ before an item indicates that the paper is highly influential in its field and that I recommend reading it carefully.
Below some items I also record my own notes from reading them. I hope they help you grasp the core of each paper quickly.
Finally, if you have any questions or suggestions about this repo, or would like to help others together with me, please email me. Thanks!
Let's begin. Good luck with your studies and research!
Table of Contents | ||
---|---|---|
Survey | Basic Theory | Data Preprocessing |
Visualization | Classification | Clustering |
Frequent Pattern | Link Mining | Bagging and Boosting |
Graph Mining | Sequential Patterns | Integrated Mining |
Deep Learning | ... | ... |
Note: The table of contents is based on a talk given by Prof. Xindong Wu at the CCF (China Computer Federation).
-
- This paper focuses on educational data mining (EDM), which deals with the different types of data found in education. Although EDM does not cover data mining as a whole, the paper still conveys the core concepts of data mining.
-
- This paper introduces 10 algorithms in data mining, covering association analysis, classification, clustering, statistical learning, bagging and boosting, sequential patterns, integrated mining, rough sets, link mining, and graph mining, all of which are widely used in data mining research. It is a good way to learn how these algorithms are applied in practice.
-
- This paper discusses ten of the hottest topics in data mining, such as time series mining, mining complex knowledge, and mining network streams. Although it was published in 2005, it is still of great value to data miners today.
-
(Lagrange multiplier) Klein, Dan. "Lagrange multipliers without permanent scarring." University of California at Berkeley, Computer Science Division (2004): 1-11.
- The method of Lagrange multipliers is a core technique for solving optimization problems under constraints. Once you understand it thoroughly, many machine learning algorithms become much quicker to master.
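As a small worked illustration of the idea (the problem below is my own toy example, not one taken from the paper), consider maximizing $f(x, y) = xy$ subject to $x + y = 1$:

```latex
\begin{align*}
\mathcal{L}(x, y, \lambda) &= xy - \lambda\,(x + y - 1) \\
\frac{\partial \mathcal{L}}{\partial x} &= y - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 1) = 0 \\
\Rightarrow\quad x &= y = \lambda = \tfrac{1}{2}, \qquad f\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{4}
\end{align*}
```

Setting all partial derivatives of the Lagrangian to zero turns the constrained problem into a system of ordinary equations, which is the pattern that reappears in the derivations of many learning algorithms.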
-
Boyd S P, Vandenberghe L. Convex optimization[M]. Cambridge University Press, 2004.
- An excellent book on mathematical optimization that is free to download. Everything (e-book, extra exercises, code) is available on the book's website. The accompanying code is written in Python, Matlab, and Julia, and you can also download it directly from this repo.
-
- This paper proposes a key-sorting algorithm for removing duplicate entities across multiple databases, where the same real-world entity is represented differently in each data set. It is very useful when you have to merge/purge records from different databases.
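The key-sorting idea can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the key layout in `make_key`, the string-similarity rule in `similar`, the threshold, and the sample records are all assumptions of mine.

```python
from difflib import SequenceMatcher


def make_key(record):
    """Hypothetical sorting key: surname prefix + given-name initial + zip code."""
    return (record["last"][:3] + record["first"][:1] + record["zip"]).lower()


def similar(a, b, threshold=0.85):
    """Hypothetical matching rule: compare full names by string-similarity ratio."""
    name_a = f'{a["first"]} {a["last"]}'.lower()
    name_b = f'{b["first"]} {b["last"]}'.lower()
    return SequenceMatcher(None, name_a, name_b).ratio() >= threshold


def sorted_neighborhood(records, window=3):
    """Return index pairs of likely duplicates found inside a sliding window."""
    order = sorted(range(len(records)), key=lambda i: make_key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        # Compare each record only with the next (window - 1) records in key order.
        for j in order[pos + 1 : pos + window]:
            if similar(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


records = [
    {"first": "John", "last": "Smith", "zip": "10001"},
    {"first": "Maria", "last": "Garcia", "zip": "94110"},
    {"first": "Jon", "last": "Smith", "zip": "10001"},
    {"first": "Wei", "last": "Chen", "zip": "60614"},
]
duplicates = sorted_neighborhood(records)
print(duplicates)
```

Sorting brings likely duplicates next to each other, so only records within a small window are compared instead of every possible pair.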
-
- This paper proposes a synthetic over-sampling method for building classifiers from imbalanced data sets. It also shows that combining over-sampling of the minority class with under-sampling of the majority class performs better than under-sampling the majority class alone. Experiments are performed using C4.5, Ripper, and a Naive Bayes classifier. The authors also summarize many previous approaches and describe their disadvantages.
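The core of synthetic over-sampling can be sketched in a few lines: a new minority point is placed somewhere on the segment between a minority sample and one of its nearest minority neighbors. The version below is a simplified variant under my own assumptions (random choice of the base point, Euclidean distance, the function name `smote`, and the toy data); it is not the paper's exact procedure.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the base point (excluding itself).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies between two real minority samples, the minority region is filled in rather than merely duplicated, which is what distinguishes this from over-sampling with replacement.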
-
- This paper explains what the class imbalance problem is, how to tackle it with re-sampling or cost-modifying methods, and how effective those methods are. Finally, it discusses the effect on modern classifiers such as decision trees, support vector machines, and neural networks.
-
- Principal component analysis (PCA) is widely used, but it is often treated as a black box. This paper aims to dispel the magic behind that black box. Reading it alone may be enough to understand PCA, it suits readers of all levels, and I find it a better starting point than a Google search.
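To make the black box a little more concrete, here is a minimal dependency-free sketch of the first step of PCA: center the data, form the covariance matrix, and find its dominant eigenvector. Using power iteration is my own choice to keep the example self-contained (in practice a linear algebra library's eigendecomposition or SVD would be used), and the function name and toy data are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the first principal component of the rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix C = X^T X / (n - 1) of the centered data.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly apply C and renormalize to approach
    # the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v is the quadratic form v^T C v.
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance


data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, variance = first_principal_component(data)
print(direction, variance)
```

The toy points all lie on the line y = 2x, so the recovered direction is parallel to (1, 2). Note that power iteration assumes the starting vector is not orthogonal to the dominant eigenvector.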
-
- A very useful review of visualization techniques. It proposes a classification of these techniques based on the data type to be visualized, the visualization technique, and the interaction and distortion technique. It also points to many valuable papers.
-
- This paper proposes t-distributed Stochastic Neighbor Embedding (t-SNE), an unsupervised learning method that works better than PCA for visualization.
-
- This paper presents many data visualization techniques in common use today, especially for high-dimensional data, and explains how they work.
-
⭐ (CART) Breiman, Leo. Classification and regression trees. Routledge, 2017.
- A classic book on classification and regression trees (CART).
-
- This paper gives an overview of CART (classification and regression trees) and summarizes the algorithms involved. It is a solid source of background on CART.
-
(Naive Bayes) Duda, Richard O., Peter E. Hart, and David G. Stork. Pattern classification. John Wiley & Sons, 2012.
-
(Bayesian Network) Friedman, Nir, Dan Geiger, and Moises Goldszmidt. "Bayesian network classifiers." Machine learning 29.2-3 (1997): 131-163.
-
⭐ (Overview) J. Han, H. Cheng, D. Xin, and X. Yan, “Frequent pattern mining: Current status and future directions”.
-
- The Apriori algorithm. A true classic on the discovery of frequent patterns.
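For readers new to the algorithm, the level-wise search can be sketched as below. This is a compact illustration rather than the paper's optimized version; the function name, the use of an absolute support count, and the toy baskets are my own assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

The prune step relies on the Apriori property: an itemset can only be frequent if all of its subsets are frequent, which keeps the number of candidates counted at each level small.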
-
- The frequent pattern growth (FP-growth) algorithm.
-
⭐ (Overview) LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
- The best introductory paper on deep learning, written by LeCun, Bengio, and Hinton.