2B. Statistical methods for textual data

14:30 - 15:40, Aula 10


Organizer: Michelangelo Misuraca

Chair: Michelangelo Misuraca


PROCSIMA: Probability Distribution Clustering Using Similarity Matrix Analysis


Marco Ortu


Abstract: This study presents PROCSIMA, a methodological approach to document clustering that defines a similarity metric, derived from the Jensen-Shannon divergence, for measuring similarities between topic probability distributions obtained from Topic Modeling techniques such as Latent Dirichlet Allocation (LDA). Unlike conventional approaches that assign each document to its single most pertinent topic, PROCSIMA clusters documents by considering their full topic distribution. It transforms the similarity matrix into an adjacency matrix and then applies community detection algorithms to define the document clusters. In empirical validation on both synthetic and real-world datasets, PROCSIMA, with the optimal number of network communities chosen by bootstrapping, outperforms traditional clustering methods.
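
The pipeline described in this abstract can be illustrated with a minimal sketch; it is not the author's implementation. It rests on several assumptions not stated in the abstract: similarity is taken as 1 minus the base-2 Jensen-Shannon divergence, the adjacency matrix is obtained with a fixed threshold (0.9, chosen only for this toy example), and connected components stand in for a proper community detection algorithm. The function names and the toy topic distributions are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def similarity_matrix(dists):
    """Pairwise similarity, assumed here to be 1 - JS divergence."""
    n = len(dists)
    return [[1.0 - js_divergence(dists[i], dists[j]) for j in range(n)]
            for i in range(n)]


def adjacency(sim, threshold):
    """Link two documents when their similarity reaches the (assumed) threshold."""
    n = len(sim)
    return [[1 if i != j and sim[i][j] >= threshold else 0 for j in range(n)]
            for i in range(n)]


def connected_components(adj):
    """Label each node by its connected component.

    A simplified stand-in for community detection, used only to keep
    the sketch dependency-free.
    """
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if adj[u][v] and labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


# Toy document-topic distributions (rows sum to 1), as LDA might produce.
doc_topics = [
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.15, 0.75],
    [0.05, 0.20, 0.75],
]

sim = similarity_matrix(doc_topics)
adj = adjacency(sim, threshold=0.9)
labels = connected_components(adj)
print(labels)
```

In this toy example the first two documents share one label and the last two share another, even though a hard assignment to the single dominant topic would give the same grouping; the two approaches differ when documents mix several topics in similar proportions.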


Exploring Anti-Migrant Rhetoric on Italian Social Media


Lara Fontanella, Annalina Sarra, Emiliano del Gobbo, Alex Cucco and Sara Fontanella


Abstract: In our digital era, the pervasive expansion of social media is leading to a significant transformation in the dynamics of communication. Social media platforms have evolved into arenas where both individuals and communities articulate their sentiments and convictions, be they positive or negative, regarding relevant topics. In the wake of the refugee crisis of 2015, anti-immigration sentiments are permeating public discourse across the global North. Digital environments unleash and amplify everyday racism, facilitated by the ability to remain anonymous and the widespread availability of these messages. The primary objective of this study is to systematically analyze how social media platforms have portrayed immigrants, migrants, asylum seekers, and refugees in recent years. Over 185,000 comments were collected and analyzed using a seededLDA technique.


Causal inference from texts: a random-forest approach


Chiara Di Maria, Alessandro Albano, Mariangela Sciandra and Antonella Plaia


Abstract: This paper employs causal random forests to analyse textual reviews in an e-commerce context, specifically investigating the causal impact of sentiment on the Positive Feedback Count (PFC). The PFC denotes the number of users who found the review helpful. The results uncover a negative causal effect, indicating that transitioning from negative to positive sentiment reduces the count of users perceiving a review as helpful. The analysis further explores heterogeneity, highlighting the nuanced influence of specific words and variations in treatment effects. This research underscores the efficacy of causal inference in elucidating the intricate dynamics between sentiment and the perceived utility of reviews.


 

A work by Gianluca Sottile

(on behalf of the local organizing committee)