Veganism is a social justice movement dedicated to ending the exploitation and oppression of animals. Advocates view it as an ethical commitment to stop using animals as commodities, emphasizing their right to live without harm. The movement challenges long-standing norms that have accepted animal exploitation, promoting both individual responsibility and broader societal change. As veganism gains traction, it elicits a range of public responses, influencing societal acceptance and policy decisions. This project aims to develop a sentiment analysis system to gauge public attitudes toward veganism, providing valuable insights for advocacy and fostering informed discussions about animal rights. By examining language and tone in online conversations, the system can identify patterns in public perception and track changes in understanding and support. This detailed analysis can help vegan advocates and animal rights organizations tailor their strategies, contributing to a more empathetic public awareness of animal rights issues. Ultimately, the tool seeks to support the vegan movement by clarifying public perceptions and encouraging ethical discussions about animal treatment. Additionally, understanding these sentiments can highlight areas of resistance and support, guiding more effective outreach and education efforts. By leveraging machine learning, this project aims to offer a scalable and robust solution for monitoring public opinion on veganism, aiding in the broader goal of achieving justice and compassion for all sentient beings.
LITERATURE REVIEW:
Rikters and Kale (2021) analyze Twitter sentiments on meat consumption over ten years, emphasizing its health and environmental impacts. They use sentiment analysis to categorize tweets in Latvian, revealing public attitudes towards meat and alternative proteins. The study highlights the environmental cost of meat production and the influence of seasonal food preferences, advocating for interdisciplinary research to address food consumption and sustainability.
Shamoi et al. (2022) analyze public sentiment on vegan diets using Twitter data, employing mutual information for feature selection. The study reveals a positive shift in sentiment towards veganism over 12 years, driven by health benefits, COVID-19, and climate concerns. The authors emphasize the importance of sentiment analysis in promoting healthy, sustainable eating habits.
Jennings et al. (2021) investigate perceptions of veganism through surveys and social media analysis. They find that non-vegans are skeptical about the health benefits of veganism and, compared with vegans, perceive it as less healthy and more difficult to follow. Social media analysis reveals positive sentiment towards veganism, suggesting that leveraging social media could help improve public perceptions and encourage adoption. The study highlights the role of social media in shaping dietary behaviors and calls for further research on its impact on public health initiatives.
Park and Kim (2022) investigate how vegans and nonvegans view veganism during the COVID-19 pandemic. Through Word2Vec and qualitative analysis of Reddit discussions, they explore key aspects of veganism, including lifestyle, animal rights, and food. The study reveals biases against veganism among nonvegans and examines how the pandemic has influenced food choices. The authors propose that understanding these differing perspectives can help address biases and encourage the adoption of veganism.
Kadel et al. (2024) explore how Instagram influences perceptions of veganism and its effect on eating intentions. By analyzing 44,316 posts tagged with #vegan, they discover that content frequently focuses on food, health, cosmetics, and photography, with a generally positive sentiment. The study indicates that viewing vegan content on Instagram is associated with eating intentions, with attitude and self-identity playing significant roles. The authors emphasize the potential of social media to encourage healthy eating habits and suggest further research on its impact on dietary choices.
Gangrade, Shrivastava, and Gangrade (2018-19) investigate sentiment analysis on Instagram using natural language processing and Thayer’s psychologically defined model. They propose a method for classifying sentiments by extracting keywords from hashtags, achieving an accuracy rate of 90.7%. The study highlights Instagram's role in emotional expression and suggests that their method offers a nuanced understanding of user emotions, surpassing traditional polarity classification. The authors recommend applying this approach to analyze social phenomena across various fields.
Karimvand et al. (2018-19) introduce a multimodal deep learning method for sentiment analysis of Persian Instagram posts, using a bi-directional gated recurrent unit for text and a 2-dimensional convolutional neural network for images. Their new dataset, MPerInst, demonstrates that combining text and image modalities significantly enhances sentiment detection accuracy and F1-score. The proposed model outperforms existing deep fusion models and highlights the potential of multimodal approaches for analyzing social media sentiment. The authors advocate for further research in multimodal sentiment analysis and its diverse applications.
Architecture and Design:
a. Objective Definition:
The main goal is to create a dependable and precise machine learning-based sentiment analysis system that categorizes sentiments about veganism as positive, negative, or neutral. This system aims to analyze public opinion, offering valuable insights for advocates, researchers, and organizations dedicated to veganism as a social justice movement.
b. System Components:
- Data Collection:
Gather a comprehensive dataset containing text data reflecting sentiments toward veganism. Ensuring a balanced dataset is crucial to accurately represent positive, negative, and neutral sentiments without bias. For this project, data has been sourced from Kaggle, ensuring a diverse range of opinions and contexts.
- Data Preprocessing:
Conduct standard text cleaning and preprocessing to prepare the data for analysis. This involves converting text to lowercase, removing punctuation, and using TF-IDF vectorization to transform text into numerical features. Preprocessing is vital to ensure consistency and enhance the model’s ability to generalize across various text sources.
- Model Development:
Develop and evaluate three machine learning models, namely Support Vector Classification (SVC), Logistic Regression, and K-Nearest Neighbors (KNN), to compare their performance. These models are selected for their complementary strengths in text classification, from margin-based boundary definition (SVC) and interpretable linear scoring (Logistic Regression) to similarity-based classification (KNN).
c. Implementation Plan:
Dataset Preparation: The dataset, sourced from Kaggle, consists of 50,000 rows of text data labeled with positive, negative, or neutral sentiments toward veganism. Preprocessing involves standardizing the input by converting text to lowercase and removing punctuation or non-alphabetic characters. To enhance the model’s generalization capabilities, 10% of sentiment labels are modified to simulate noise. Text data is then transformed into numerical features using TF-IDF vectorization with n-grams (1-2) and a feature limit of 5,000 to capture key patterns and phrases.
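The cleaning and label-noise steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `clean_text` and `add_label_noise`, the random seed, and the toy labels are all assumptions introduced here, and the sketch assumes that "modifying" a label means switching it to a different class.

```python
import random
import re


def clean_text(text: str) -> str:
    """Lowercase the text and keep only letters separated by single spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)     # drop punctuation, digits, symbols
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


def add_label_noise(labels, classes, fraction=0.10, seed=42):
    """Return a copy of `labels` with `fraction` of entries switched to another class.

    Hypothetical helper: the seed and the switching rule are assumptions.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(fraction * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy


CLASSES = ["positive", "negative", "neutral"]
labels = ["positive", "negative", "neutral", "positive", "negative"] * 4  # 20 toy labels
noisy_labels = add_label_noise(labels, CLASSES)
changed = sum(a != b for a, b in zip(labels, noisy_labels))

print(clean_text("I LOVE vegan food!!! 100%"))
print(changed, "of", len(labels), "labels changed")
```

Fixing the seed keeps the noisy labels reproducible between runs, which makes later model comparisons easier to interpret.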
Model Architecture: The proposed system architecture integrates multiple components to process and analyze sentiment data effectively.
Feature Extraction: To prepare text data for machine learning, Term Frequency-Inverse Document Frequency (TF-IDF) vectorization is applied with n-grams (1-2) and a maximum of 5,000 to 6,000 features. Including bigrams alongside single words lets the features capture short phrases as well as individual terms, which helps the models differentiate sentiment classes.
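As a rough illustration of this feature extraction step, the sketch below uses scikit-learn's `TfidfVectorizer` with unigrams and bigrams. The choice of scikit-learn and the three toy documents are assumptions made for this example; the feature cap of 5,000 follows the limit given in the dataset preparation step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents standing in for cleaned dataset rows (assumed, not real data).
docs = [
    "vegan food is delicious and kind to animals",
    "i do not like vegan food at all",
    "the store sells vegan food and regular food",
]

# ngram_range=(1, 2) keeps single words and two-word phrases;
# max_features caps the vocabulary at the most frequent terms.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document
vocab = vectorizer.vocabulary_      # maps each term to its column index

print(X.shape)
print("vegan food" in vocab)
```

In practice the vectorizer would be fitted on the training split only and then applied to the validation split, so that validation vocabulary and document frequencies do not leak into training.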
Model Implementation and Training: Three machine learning models are implemented and trained: Support Vector Classification (SVC) for its precise boundary definition, Logistic Regression as a baseline model for its interpretability, and K-Nearest Neighbors (KNN) for its similarity-based classification to capture sentiment trends in context.
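One possible way to set up the three classifiers is sketched below with scikit-learn pipelines. The library choice, the hyperparameters (`kernel="linear"`, `max_iter=1000`, `n_neighbors=3`), and the nine toy sentences are assumptions for illustration only; the source does not specify them.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy training data (assumed): three examples per sentiment class.
texts = [
    "vegan meals are wonderful", "i love plant based food", "going vegan felt great",
    "vegan food tastes awful", "i hate this vegan trend", "plant based diets are terrible",
    "the cafe has a vegan menu", "vegan options are listed online", "the shop sells vegan cheese",
]
labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3

# Hyperparameters below are illustrative assumptions, not tuned values.
models = {
    "SVC": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

fitted = {}
for name, clf in models.items():
    # Each model gets its own vectorizer so the pipelines stay independent.
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), max_features=5000), clf)
    fitted[name] = pipe.fit(texts, labels)

predictions = {name: pipe.predict(["vegan meals are great"])[0] for name, pipe in fitted.items()}
print(predictions)
```

Wrapping the vectorizer and classifier in one pipeline means the same preprocessing is applied at training and prediction time, which keeps the three models directly comparable.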
Training and Validation: The dataset is split into training (80%) and validation (20%) sets. k-Fold Cross-Validation is utilized to enhance model generalization and monitor for overfitting. Performance metrics such as accuracy, precision, recall, and F1 score are tracked to optimize model performance.
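The split and cross-validation procedure might look like the following sketch. It assumes scikit-learn, a stratified split, k = 5 folds, and Logistic Regression as the example model; the source states the 80/20 ratio but not the value of k or whether the split is stratified, and the toy corpus is invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Toy corpus (assumed): 10 rows per class, 30 rows in total.
texts = (["vegan food is great"] * 10
         + ["vegan food is awful"] * 10
         + ["vegan food is sold here"] * 10)
labels = ["positive"] * 10 + ["negative"] * 10 + ["neutral"] * 10

# 80% training, 20% validation, keeping class proportions equal in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

pipe = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=5000),
    LogisticRegression(max_iter=1000),
)

# k-fold cross-validation on the training part only (k = 5 is an assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="f1_weighted")

print(len(X_train), len(X_val))
print(scores.mean())
```

A large gap between the cross-validation scores and the score on the held-out validation set would be one sign of overfitting.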
Model Evaluation: Overall accuracy is measured to understand the proportion of correct predictions across all sentiment classes. A detailed classification report is generated to review precision, recall, and F1-score for each class (positive, negative, neutral), providing insights into the model’s performance on individual sentiments. The weighted F1 score is calculated to balance precision and recall, especially useful for assessing performance across imbalanced sentiment classes. The confusion matrix is analyzed to observe the distribution of true positives, true negatives, false positives, and false negatives, helping identify patterns in misclassification and areas where the model may need refinement.
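The evaluation metrics listed above can be computed as in this sketch, which assumes scikit-learn's `sklearn.metrics` module. The true and predicted labels are invented toy values, not results from the study.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
)

LABELS = ["positive", "negative", "neutral"]

# Toy ground truth and predictions (assumed values, for illustration only).
y_true = ["positive", "positive", "positive",
          "negative", "negative", "negative",
          "neutral", "neutral", "neutral", "neutral"]
y_pred = ["positive", "positive", "neutral",
          "negative", "negative", "positive",
          "neutral", "neutral", "neutral", "negative"]

accuracy = accuracy_score(y_true, y_pred)                 # share of correct predictions
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # per-class F1 weighted by support
cm = confusion_matrix(y_true, y_pred, labels=LABELS)      # rows: true class, columns: predicted

print(classification_report(y_true, y_pred, labels=LABELS))
print(accuracy, weighted_f1)
print(cm)
```

In the three-class setting, each row of the confusion matrix shows how the examples of one true class were distributed across the predicted classes, so off-diagonal cells point to the specific pairs of sentiments the model confuses.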
Deployment Strategy: The model is deployed on a cloud platform to provide scalable, remote access for social justice organizations and researchers.
Training and Support: User training materials are developed to help users effectively monitor and analyze public opinion. Ongoing technical support is offered to ensure effective implementation and troubleshooting.
Feedback Loop: A mechanism for collecting user feedback is established to support continuous improvement of the system. The model is regularly retrained with new data to maintain accuracy and relevance.
Figure: Data flow diagram of the methodology.
Sunali Bhattacherji * 2
Omkar Singh 1
10.5281/zenodo.15082266