C8. Advances in text mining

08:40 - 09:50, Aula 12


Chair: Mariangela Sciandra


Can Correspondence Analysis Challenge Transformers in Authorship Attribution Tasks?


Andrea Sciandra and Arjuna Tuzzi


Abstract: With reference to a large corpus of 76 Italian contemporary popular mystery novels by 16 different authors, this study aims to assess the performance of large language models in an authorship attribution test. The results obtained through both transformers and correspondence analysis vector representations are compared and contrasted in machine learning classification tasks. Although in previous works transformers have been shown to perform better than other alternatives, in this case, correspondence analysis wins the challenge. Results support the hypothesis that specialized large corpora require tailor-made representations.


EmurStat: a digital tool for statistical analysis of emur flow


Simone Paesano, Maria Gabriella Grassia, Marina Marino, Dario Sacco and Rocco Mazza


Abstract: New Public Management (NPM) emphasizes the use of market-based techniques to improve efficiency and effectiveness in public service delivery. This approach seeks to promote accountability and performance measurement. Key performance indicators describe the performance of processes that characterize a specific workflow. One of the concepts that has emerged in the last decade is Precision Public Health, which integrates traditional determinants of health with new approaches such as data science and health economics. Moreover, visualizations help to understand social determinants of health and public health indicators. This paper aims to present a useful application for data visualization, processing, and analysis for understanding and evaluating the performance of services provided by emergency rooms, through an in-depth look at a specific case.


Graph Neural Networks for clustering medical documents


Vittorio Torri and Francesca Ieva


Abstract: Clustering is one of the most challenging tasks in the field of Natural Language Processing, due to the high dimensionality of textual data. Different types of document embeddings have been proposed in the past, often based on the transformer neural network architecture. In this work, we propose to exploit a graph-based representation, combining it with recent advances in the field of graph neural networks. While graph neural networks have achieved promising results in document classification, their potential for document clustering has not been explored yet. In particular, we propose an application in the medical domain, where document clustering is of paramount importance due to the large amount of information present in medical documents and the difficulties in labelling them.


 

A work by Gianluca Sottile

(on behalf of the local organizing committee)