Term: Spring 2017
- Team 2
- Project title: Who is Who -- Author Disambiguation
- Team members
- Jiahui Tan
- Ruxue Peng
- Jiahao Zhang
- Xuanzi Xu
- Tongyue Liu
- Project summary: In this project, we implement two algorithms for author disambiguation: (1) the Naive Bayes method proposed in "Two Supervised Learning Approaches for Name Disambiguation in Citations" (Han, 2004) and (2) the agglomerative clustering method proposed in "Author Disambiguation Using Error Driven Machine Learning with a Ranking Loss Function" (Culotta, 2007). We then compare the two methods on prediction accuracy, algorithm sensitivity, and ease of implementation. Because of limited computational capacity, we made some modifications to the original algorithm proposed in paper 5 (Culotta, 2007). Suggestions for further improvement are also provided.
Contribution statement: (default)
Paper 2 (Naive Bayes) was implemented by Xuanzi Xu and Tongyue Liu.
Paper 5 (Error Driven) was implemented by Jiahui Tan, Ruxue Peng and Jiahao Zhang.
All team members contributed equally in all stages of this project. All team members approve our work presented in this GitHub repository including this contributions statement.
Following suggestions by Rich FitzJohn (@richfitz), this folder is organized as follows.
proj/
├── lib/
├── data/
├── doc/
├── figs/
└── output/
To reproduce the results, please first see the README file in the doc subfolder.