Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12323/8047
Title: Sentiment Analysis Using Machine Learning Methods On Social Media
Other Titles: Sosial mediada maşın öyrənmə metodlarından istifadə edərək sentiment analizin aparılması
Authors: Damirova, Jamila Ilham
Keywords: machine learning methods
sentiment analysis
Issue Date: 2025
Series/Report no.: Master thesis
Abstract: Social media has rapidly become a key platform on which people express thoughts, feelings, and reactions to current events, political decisions, social issues, and commercial experiences. Among these platforms, Twitter stands out for the brevity of its posts and the intensity of public interaction, which makes it well suited to sentiment analysis. This thesis focuses on extracting meaningful emotional patterns from user-generated Twitter content using machine learning algorithms. The research scope is not restricted to identifying sentiment polarity; it extends to the structural optimisation of sentiment analysis models on Azerbaijani-language data, with specific emphasis on recent sociopolitical discourse about aviation incidents. Unlike conventional surveys and structured feedback forms, Twitter posts capture spontaneous, unfiltered reactions. This spontaneity introduces linguistic noise, informal syntax, and wide-ranging use of abbreviations, emojis, and colloquialisms, a problem that is particularly acute for Azerbaijani because the language lacks adequate annotated corpora for computational analysis. To overcome these limitations, the study adopts a strict preprocessing pipeline comprising spelling normalization, tokenization, stop-word removal, and lemmatization, with careful handling of non-standard text elements. After filtering, the data is vectorized using word embeddings, specifically Word2Vec and BERT, which provide semantic and contextual representations that go beyond frequency analysis alone. The dataset was further refined with stratified sampling to ensure a balanced representation of sentiments, thereby minimising bias in both model training and evaluation.
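The stratified sampling step mentioned above can be illustrated with a minimal sketch. The record does not describe the thesis's actual implementation, so the function name `stratified_split`, the 20% test fraction, and the toy labels are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each label keeps roughly the same share in both parts.

    Hypothetical helper for illustration; not taken from the thesis.
    Note: a class with a single example ends up entirely in the test part.
    """
    # Group example indices by their sentiment label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction from every class, at least one example.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy, made-up label distribution (not data from the thesis).
labels = ["positive"] * 10 + ["negative"] * 10 + ["neutral"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_labels = sorted(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_labels)
```

Sampling within each class separately keeps the class proportions of the test part close to those of the full dataset, which is the property the abstract relies on to limit bias in training and evaluation.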
Beyond classical evaluation metrics, confusion matrices were analysed visually to identify classification errors and to help refine the models' decision boundaries. Attention was also paid to temporal trends in sentiment expression, which revealed shifts in emotional tone across phases of public discourse. Language-specific challenges, such as the scarcity of sentiment lexicons and pre-trained models for Azerbaijani, were mitigated through manual annotation and domain adaptation strategies. To improve model generalisation, k-fold cross-validation was applied together with hyperparameter tuning via randomised search. The machine learning algorithms applied (Logistic Regression, Support Vector Machines, and Random Forest) were compared thoroughly using accuracy, precision, recall, and F1-score. Each model was evaluated both on a general sentiment dataset with emotionally tagged keywords such as "love", "hate", and "disappointed", and on a context-specific dataset of Azerbaijani tweets related to the Aktau aviation incident. The results show that Random Forest and SVM handle subtle language cues, especially irony, blame, and sarcasm, better than Logistic Regression, although Logistic Regression performs adequately in simple contexts. Among the three, Random Forest proved the most robust, reaching up to 89% accuracy with balanced performance across sentiment classes. The analysis also confirms that social media posts are expressive indicators of public emotion and function as a lens through which political and cultural narratives unfold. For instance, tweets about the aviation incident conveyed not only fear and grief but also politically charged accusations and calls for accountability. When charted, these feelings reveal public trust, anger, or admiration, each varying in response to government and international reactions.
This confirms the strategic value of sentiment analysis in public policy evaluation, crisis communication, and media monitoring. By integrating domain-specific feature engineering, contextual embeddings, and the linguistic particularities of Azerbaijani, the study contributes a methodological framework that can be adapted to many under-resourced languages. It also puts forward a dynamic approach to sentiment classification that remains effective as online vernacular evolves. In contrast to static, lexicon-based models, the ensemble learning method is more responsive to current data, enabling institutions, journalists, and researchers to interpret digital emotions faster and more accurately. In conclusion, the research shows that machine-learning-based sentiment analysis on Twitter is more than mere computation; it is a socio-technical study of how societies feel and voice emotion online. By decoding emotional undercurrents in social media discourse, notably during times of crisis, decision-makers are better equipped to understand collective psychology, respond directly to misinformation, and engage meaningfully with citizens in the era of digitised public opinion.
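The evaluation metrics named in the abstract (accuracy, precision, recall, F1-score) can all be derived from a confusion matrix. The sketch below shows one way to do that in plain Python. The helper names and the six toy predictions are assumptions for illustration; they are not results or code from the thesis.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Compute precision, recall and F1 per class from confusion counts.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    # counts[(true_label, predicted_label)] is one cell of the confusion matrix.
    counts = Counter(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for c in classes:
        tp = counts[(c, c)]
        fp = sum(counts[(t, c)] for t in classes if t != c)  # predicted c, wrongly
        fn = sum(counts[(c, p)] for p in classes if p != c)  # missed true c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up labels and predictions (not data from the thesis).
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
metrics = per_class_metrics(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(acc, metrics)
```

Reporting the metrics per class, rather than accuracy alone, is what makes it possible to check whether a model's performance is balanced across sentiment classes.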
Description: Faculty: Graduate School of Science, Art and Technology; Department: Computer Science; Specialty: Informatics; Supervisor: PhD, Associate Professor Leyla Muradkhanli Gazanfar
URI: http://hdl.handle.net/20.500.12323/8047
Appears in Collections: Thesis

Files in This Item:
File: Sentiment Analysis Using Machine Learning Methods On Social Media.pdf
Size: 1.03 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.