This project, conducted during a research internship, explores stance detection methods for sentiment analysis. Its primary focus was building reliable classifiers that categorize reviews as neutral, opposing, or supportive, using both machine learning and deep learning models.
The main objectives of the project were:
- To investigate the efficacy of different models in stance detection.
- To develop reliable classifiers using machine learning and deep learning techniques.
- To analyze the impact of class imbalance and apply techniques like SMOTE and back translation to mitigate it.
- To enhance stance categorization with topic modeling and introduce trust-based accuracy metrics for reliable predictions.
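To make the class-imbalance objective concrete, the following is a minimal pure-Python sketch of the core idea behind SMOTE: each synthetic sample is a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, its parameters, and the toy feature vectors are illustrative assumptions, not the project's code. SMOTE operates on numeric feature vectors (for example TF-IDF features), not on raw text.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from minority-class feature vectors.

    For each synthetic point: pick a minority sample, pick one of its k
    nearest minority neighbours, and take a random point on the segment
    joining the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for an under-represented stance class.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority_features, n_new=6)
print(len(new_points))
```

In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from the imbalanced-learn package would likely be preferable. Oversampling should be applied to the training split only, so that synthetic samples do not leak into evaluation.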
The methodology adopted for this project involved several key steps:
- Literature Review: Conducted a comprehensive review of existing literature to identify gaps and opportunities in stance detection.
- Data Collection: Gathered data from sources including Amazon, FNC-1, Ethereum, and mixed datasets using standard data collection methods.
- Data Analysis: Applied statistical and machine learning techniques, including support vector machines, random forests, and deep learning models such as LSTM and BiLSTM.
- Model Development: Developed predictive models for stance detection in review texts using Scikit-learn and Keras.
- Validation: Evaluated the models using accuracy and trust-based accuracy to check their robustness.
- Implementation: Implemented the solution in a practical environment, demonstrating its utility in sentiment analysis and opinion mining.
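As a rough illustration of the model development step, the sketch below trains a Random Forest stance classifier on TF-IDF features with Scikit-learn. The toy reviews, the label strings, and the hyperparameters are assumptions made for the example; the project's actual preprocessing and settings are not described here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews; the real datasets are far larger.
texts = [
    "I fully support this product, it works great",
    "Great value, I would recommend it to anyone",
    "I am against buying this, it broke in a week",
    "Terrible quality, I strongly oppose this purchase",
    "The package arrived on Tuesday",
    "It comes in a blue box with a manual",
]
labels = [
    "supportive", "supportive",
    "opposing", "opposing",
    "neutral", "neutral",
]

# TF-IDF turns each review into a sparse feature vector;
# the Random Forest then learns the stance from those features.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(texts, labels)

predictions = model.predict(["I support this, it works great"])
print(predictions[0])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on training data only, which matters once the data is split for validation.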
The project yielded significant findings:
- The best model based on accuracy for the Amazon dataset was Random Forest with an accuracy of 0.9756.
- The best model based on trust-based accuracy for the Amazon dataset was also Random Forest with a trust-based accuracy of 1.0.
- For the mixed dataset, the best model based on accuracy was CNN with an accuracy of 0.8224, and Naive Bayes had the best trust-based accuracy of 0.9523.
- Various models were compared across different datasets, highlighting the importance of addressing class imbalance for improved performance.
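The results above report trust-based accuracy alongside plain accuracy, but this summary does not define the metric. The sketch below assumes one plausible reading: accuracy measured only over predictions whose confidence meets a threshold. The function `trust_based_accuracy`, the 0.8 threshold, and the sample data are all hypothetical and may differ from the definition used in the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def trust_based_accuracy(y_true, y_pred, confidence, threshold=0.8):
    """Accuracy over predictions with confidence >= threshold.

    ASSUMED definition, not taken from the project. Returns a pair
    (score, coverage), where coverage is the share of predictions that
    were trusted; score is None when nothing meets the threshold.
    """
    trusted = [
        (t, p)
        for t, p, c in zip(y_true, y_pred, confidence)
        if c >= threshold
    ]
    coverage = len(trusted) / len(y_true)
    if not trusted:
        return None, coverage
    score = sum(t == p for t, p in trusted) / len(trusted)
    return score, coverage


# Hypothetical labels, predictions, and model confidences.
y_true = ["supportive", "opposing", "neutral", "supportive", "opposing"]
y_pred = ["supportive", "opposing", "supportive", "supportive", "neutral"]
confidence = [0.95, 0.90, 0.55, 0.85, 0.60]

plain = accuracy(y_true, y_pred)
trusted_score, coverage = trust_based_accuracy(y_true, y_pred, confidence)
print(plain, trusted_score, coverage)  # 0.6 1.0 0.6
```

Under this reading, a model can reach a trust-based accuracy of 1.0 while its plain accuracy is lower, because the low-confidence mistakes are excluded. Reporting coverage next to the score shows how many predictions were excluded.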
The project demonstrated that machine learning and deep learning models can be applied effectively to stance detection. SMOTE and back translation were used to mitigate class imbalance, and trust-based accuracy complemented plain accuracy when judging how reliable the predictions were. Topic modeling further improved feature extraction, leading to better model performance.
- Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
- Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321-357.
- Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
- Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830.