http://nlp.hivefire.com/ NLP News
https://nlppeople.com/ NLP Jobs
http://www.cs.rochester.edu/~tetreaul/conferences.html Computational Linguistics / NLP Conferences
http://www.ldc.upenn.edu/ LDC: The Linguistic Data Consortium
http://www.clt.gu.se/wiki/nlp-resources NLP Resources
http://www.aaai.org/AITopics/html/natlang.html AAAI Topics on NLP
http://www-nlp.stanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources
http://wordnet.princeton.edu/ WordNet
http://www.corpus4u.org/ Corpus Linguistics Online
- The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992 as part of the TIPSTER Text program.
http://nlp.cs.berkeley.edu/tutorials/variational-tutorial-slides.pdf Variational Inference in Structured NLP Models, Presented at NAACL 2012 with David Burkett.
http://pages.cs.wisc.edu/~jerryzhu/pub/ZhuCCFADL46.pdf Tutorial on Statistical Machine Learning for NLP 2013
http://www.stanford.edu/class/cs224n/ CS 224N / Ling 284 — Natural Language Processing
http://www.cs.berkeley.edu/~klein/cs288/sp10/ CS 288: Statistical Natural Language Processing, Spring 2010
http://demo.clab.cs.cmu.edu/fa2013-11711/index.php/Main_Page Algorithms for NLP: Basic Information (Fall 2013)
http://www.cs.colorado.edu/~martin/csci5832/lectures_and_readings.html Natural Language Processing, CSCI 5832 FALL 2013
http://www.cs.columbia.edu/~cs4705/ COMS W4705: Natural Language Processing 2013
http://www1.cs.columbia.edu/~julia/courses/CS4705/syllabus10.htm COMS 4705: Natural Language Processing, Fall 2010
http://www1.cs.columbia.edu/~julia/courses/CS4706/syllabus12.htm CS4706: Spoken Language Processing, Spring 2012
http://www.cs.cornell.edu/courses/cs4740/2014sp/ CS 4740/5740 - Introduction to Natural Language Processing, Spring 2014
http://l2r.cs.uiuc.edu/~danr/Teaching/CS546-13/ Machine Learning and Natural Language Spring 2013
http://www.cs.jhu.edu/~jason/465/ Natural Language Processing Course # 600.465 — Fall 2013
http://web.stanford.edu/class/cs224s/ CS 224S/LINGUIST 285 Spoken Language Processing
http://www.umiacs.umd.edu/~resnik/ling773_sp2014/ Ling773/CMSC773/INST728C, Spring 2014 Computational Linguistics II
http://cs.nyu.edu/courses/spring13/CSCI-GA.2590-001/index.html
http://www.cis.upenn.edu/~cis530/ CIS 530 Fall 2013 Computational Linguistics
http://pages.cs.wisc.edu/~jerryzhu/cs769.html CS 769: Advanced Natural Language Processing Spring 2010
http://pages.cs.wisc.edu/~bsnyder/cs769.html
http://nlp.stanford.edu/ Stanford NLP group
http://nlp.cs.berkeley.edu/ Berkeley NLP group
http://www.lti.cs.cmu.edu/ CMU Language Technologies Institute
http://nlp.ict.ac.cn/index_zh.php NLP Research Group, Institute of Computing Technology, Chinese Academy of Sciences
http://www.sogou.com/labs/ Sogou Labs
http://linguistics.georgetown.edu/ Department of Linguistics, Georgetown University
http://ir.hit.edu.cn/ Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
https://wiki.umiacs.umd.edu/clip/index.php/Main_Page
http://www.eng.utah.edu/~cs5340/
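http://www.cs.colorado.edu/~martin/slp2.html Speech and Language Processing, 2nd edition, 2009
- 浔雨: The authority of "自然语言处理综论" (the Chinese translation of this book) goes without saying. The translators are Feng Zhiwei (冯志伟) and Sun Le (孙乐). When I read it I did not yet know who Professor Feng was, but it read very well; a translation this fluent would not be possible without years of accumulated expertise in the field. The book is well regarded both in China and abroad, and it covers what both schools of natural language processing (the linguistic school and the statistical school) care about, though it loses some focus as a result. I lean more toward the statistical part, so I need to understand statistics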
http://cognet.mit.edu/library/books/view?isbn=0262133601 Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press. Cambridge, MA: May 1999.
http://nlp.stanford.edu/~manning/
http://www.umiacs.umd.edu/~hal/
http://mimno.infosci.cornell.edu/ David Mimno
- maintainer of MALLET
http://www.cs.berkeley.edu/~klein/ Dan Klein
http://cs.brown.edu/people/ec/home.html Eugene Charniak
http://www.cs.colorado.edu/~martin/
http://www.cs.columbia.edu/~mcollins/
http://www1.cs.columbia.edu/~julia/
http://www.cs.cornell.edu/home/cardie/
http://www.eecs.harvard.edu/shieber/
- computational linguistics
http://www.stanford.edu/~jurafsky/
http://www.umiacs.umd.edu/~resnik/
http://homes.cs.washington.edu/~taskar/
http://www.cis.upenn.edu/~nenkova/
http://www.cs.utah.edu/~riloff/
http://pages.cs.wisc.edu/~jerryzhu/
http://pages.cs.wisc.edu/~bsnyder/
http://www.cs.cmu.edu/~nasmith/
http://www.cs.cmu.edu/~alavie/
http://gate.ac.uk GATE
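- 孔牧: You can add components to it according to its requirements to complete your own NLP tasks. My project team once tried using it; although it supports component development, it is still not very flexible, so we developed our own pipeline.
http://nltk.org Natural Language Toolkit (NLTK)
http://mallet.cs.umass.edu MALLET: MAchine Learning for LanguagE Toolkit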
http://opennlp.apache.org/ OpenNLP
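http://alias-i.com/lingpipe/ LingPipe is a toolkit for processing text using computational linguistics.
https://textblob.readthedocs.org/en/dev/ TextBlob: Simplified Text Processing (Python)
https://github.com/HIT-SCIR/ltp Language Technology Platform (LTP), a complete Chinese language processing system developed over ten years by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology.
- http://www.ltp-cloud.com/ Language Technology Platform Cloud (LTP-Cloud)
- 孔牧: This is a fairly complete pipeline. Quality aside, it provides word segmentation, semantic annotation, syntactic dependency parsing, and entity recognition. It does produce wrong results, but I could not find anything better.
https://github.com/xpqiu/fnlp/ Chinese natural language processing toolkit
- 邱锡鹏: I recommend our own FudanNLP.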
http://snowball.tartarus.org/ Snowball
http://nlp.stanford.edu/software/tagger.shtml Stanford POS Tagger
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/ TreeTagger
http://www.coli.uni-saarland.de/~thorsten/tnt/ TnT
http://nlp.stanford.edu/software/lex-parser.shtml Stanford Parser
http://nlp.cs.berkeley.edu/software.shtml Berkeley Parser
https://github.com/BLLIP/bllip-parser BLLIP Parser (copyright Mark Johnson, Eugene Charniak, 24th November 2005 to August 2006)
http://www.nzdl.org/Kea/index_old.html KEA keyphrase extraction
http://nlp.stanford.edu/software/CRF-NER.shtml Stanford NER
http://nlp.stanford.edu/software/segmenter.shtml Stanford Word Segmenter
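https://github.com/fxsjy/jieba jieba Chinese word segmentation
http://ictclas.org/ ICTCLAS, the Chinese Academy of Sciences word segmenter
- 孔牧: A fairly authoritative segmenter. I believe you will end up choosing it as the segmentation tool for your project; it has many problems of its own, but I could not find a better open-source project.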
http://msdn.microsoft.com/zh-cn/library/jj163981.aspx
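- 孔牧: This one is of course not open source, but its segmentation is very accurate. Unfortunately, it performs segmentation and entity recognition in a single step, and the segmentation (in the tools it provides) does not include part-of-speech tagging.
https://github.com/ansjsun/ansj_seg ansj word segmentation: a true Java implementation of ICT. Segmentation quality and speed both exceed the open-source version of ICT. Chinese word segmentation, person name recognition, part-of-speech tagging, user-defined dictionaries.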
http://cmusphinx.sourceforge.net/ CMU Sphinx
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm Matlab Topic Modeling Toolbox 1.4
http://gibbslda.sourceforge.net/ GibbsLDA++
http://code.google.com/p/glda/ GLDA GPU-accelerated Latent Dirichlet allocation training
http://lucene.apache.org/ Lucene
http://zhangkaixu.github.io/bibpage/cws.html Literature list compiled by Zhang Kaixu (张开旭)
(2008) Sunita Sarawagi. Information extraction. Foundations and Trends in Databases.
(2000) Rosenfeld, R. Two decades of statistical language modeling: where do we go from here? Proc. IEEE. http://www.cs.cmu.edu/~roni/papers/survey-slm-IEEE-PROC-0004.pdf
(2009) Chengxiang Zhai. Statistical Language Models for Information Retrieval. Lecture Notes.
(2009) Sandra Kübler, Ryan McDonald, Joakim Nivre. Dependency Parsing. Synthesis Lectures on Human Language Technologies.
(2008) Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval.
(2009) Navigli, R. Word sense disambiguation: A survey. ACM Computing Surveys.
http://mimno.infosci.cornell.edu/topics.html Topic modeling bibliography
Parsing (syntactic structure analysis; heavy on linguistic knowledge, so it can be rather dry)
Klein & Manning: "Accurate Unlexicalized Parsing"
Klein & Manning: "Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency" (a revolutionary work that built a parser with unsupervised learning)
Nivre "Deterministic Dependency Parsing of English Text" (shows that deterministic parsing actually works quite well)
McDonald et al. "Non-Projective Dependency Parsing using Spanning-Tree Algorithms" (the other main method of dependency parsing, MST parsing)
Machine Translation (can be skipped if you do not work on machine translation, though translation models are also applied in other areas)
Knight "A statistical MT tutorial workbook" (easy to understand, use instead of the original Brown paper)
Och "The Alignment-Template Approach to Statistical Machine Translation" (foundations of phrase based systems)
Wu "Inversion Transduction Grammars and the Bilingual Parsing of Parallel Corpora" (arguably the first realistic method for biparsing, which is used in many systems)
Chiang "Hierarchical Phrase-Based Translation" (significantly improves accuracy by allowing for gappy phrases)
Language Modeling
Goodman "A bit of progress in language modeling" (a survey that describes just about everything related to n-gram language models, including smoothing and clustering)
Teh "A Bayesian interpretation of Interpolated Kneser-Ney" (shows how to get state-of-the-art accuracy in a Bayesian framework, opening the path for other applications)
Machine Learning for NLP
Sutton & McCallum "An introduction to conditional random fields for relational learning" (CRFs are extremely useful in NLP, and many ready-made tools implement them. This is a fairly simple paper explaining CRFs, though it is still quite mathematical.)
Knight "Bayesian Inference with Tears" (explains the general idea of bayesian techniques quite well)
Berg-Kirkpatrick et al. "Painless Unsupervised Learning with Features" (this is from this year and thus a bit of a gamble, but this has the potential to bring the power of discriminative methods to unsupervised learning)
Information Extraction
Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora. COLING 1992. (The very first paper for all the bootstrapping methods in NLP. It is a hypothetical work in the sense that it gives no experimental results, but it influenced its followers a lot.)
Collins and Singer. Unsupervised Models for Named Entity Classification. EMNLP 1999. (It applies several variants of co-training-like IE methods to the NER task and explains the motivation for doing so. Students can learn from this work the logic of writing a good research paper in NLP.)
Computational Semantics
Gildea and Jurafsky. Automatic Labeling of Semantic Roles. Computational Linguistics 2002. (It started the trend of semantic role labeling in NLP, followed by several CoNLL shared tasks dedicated to SRL. It shows how linguistics and engineering can collaborate with each other. A shorter version appeared at ACL 2000.)
Pantel and Lin. Discovering Word Senses from Text. KDD 2002. (Supervised WSD was explored a lot in the early 2000s thanks to the Senseval workshop, but few systems actually benefit from WSD because manually crafted sense mappings are hard to obtain. These days we see a lot of evidence that unsupervised clustering improves NLP tasks such as NER, parsing, SRL, etc,
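http://www.newsmth.net/nForum/#!article/NLP/43 zibuyu (得之我幸失之我命), "Commonly Used NLP Information Resources", Shuimu Community (水木社区) (Wed Mar 14 23:56:43 2007)
http://www.newsmth.net/nForum/#!article/NLP/3849 zibuyu (得之我幸失之我命), "Commonly Used Open-Source/Free NLP Tools", Shuimu Community (水木社区) (Wed Mar 14 23:56:43 2007)
http://www.newsmth.net/nForum/#!article/NLP/5461 zibuyu (得之我幸失之我命), "Classic Surveys in the NLP Field", Shuimu Community (水木社区) (Tue Feb 24 11:13:53 2009)
http://www.zhihu.com/question/19929473 "What are the commonly used open-source NLP projects and development kits today?" 孔牧, 邱锡鹏, 裴飞, 贺一帆, 武博文
http://www.zhihu.com/question/19895141 "What is the fastest way to get started with natural language processing?"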