Research projects

Our students are invited to take part in the projects under way at the Section of Theoretical and Applied Linguistics.

They can express their interest by contacting the faculty members responsible for each of the projects listed below.

Verb Valency in Germanic. Diachronic analysis and reconstruction of protolinguistic scenario

Dr. Matteo Tarsi - Marie Skłodowska-Curie Fellow

Verbal valency is the number of core arguments a verb can take, which determines its valency frame. There are, for instance, zero-valent verbs, e.g. to rain (Eng. it rains); monovalent verbs, e.g. to sleep (Eng. I sleep); bivalent verbs, e.g. to kiss (Eng. the girl kisses the boy); trivalent verbs, e.g. to give (Eng. the boy gives a present to the girl); etc. The project’s overarching aim is to map out and analyze the valency patterns of Germanic. This is done by selecting a number of representative language varieties within Germanic (= Gmc) and by tracing the evolution and variation of valency patterns for a selection of verb meanings, following the methodological gold standard set out by the ValPaL project.
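As a purely illustrative aid, the notion of a valency frame can be sketched as a small data structure. The class name, the role labels and the label table below are assumptions made for this example; they are not taken from ValPaL or PaVeDa.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ValencyFrame:
    """A verb together with its core arguments (role labels are illustrative)."""
    verb: str
    arguments: Tuple[str, ...]

    @property
    def valency(self) -> int:
        # Valency is simply the number of core arguments in the frame.
        return len(self.arguments)


# The four English examples mentioned in the text above.
FRAMES = [
    ValencyFrame("rain", ()),
    ValencyFrame("sleep", ("sleeper",)),
    ValencyFrame("kiss", ("kisser", "kissee")),
    ValencyFrame("give", ("giver", "gift", "recipient")),
]

# Hypothetical lookup table from argument count to the traditional label.
LABELS = {0: "zero-valent", 1: "monovalent", 2: "bivalent", 3: "trivalent"}


def describe(frame: ValencyFrame) -> str:
    """Return a short description such as 'give: trivalent'."""
    label = LABELS.get(frame.valency, str(frame.valency) + "-valent")
    return frame.verb + ": " + label


if __name__ == "__main__":
    for frame in FRAMES:
        print(describe(frame))
```

A real valency database records much more than the argument count (case marking, adpositions, alternations), so this sketch only captures the basic definition.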

The languages under investigation are: Gothic (East Gmc); English, German, Dutch, and Frisian (West Gmc); Icelandic and Swedish (North Gmc). The present project contributes to advancing that methodology and is part of the research activity carried out by the Pavia Linguistics Team within the Pavia Verbs Database project (PaVeDa).

The overarching aim of the project comprises the following objectives:

to gather all elicited data in a digital resource;
to pursue a diachronic analysis of valency patterns within the history of single languages;
to carry out a synchronic analysis of valency patterns in the Germanic languages at different language stages;
to reconstruct valency patterns of Germanic in prehistoric times.

The most relevant direct outcomes of the project are:

to make the data available to other researchers;
to contribute to current research in diachronic morphosyntax;
to expand the PaVeDa database with diachronic data for Germanic;
to establish a well-defined benchmark within the addressed research area, by implementing a standard tool designed at the host institution.

Go to the project website!

Inequality and Cancer: Investigating Access to Resources for Prevention and Treatment (ICaRe)

Dr. Chiara Cassani - Principal Investigator

Prof. Silvia Luraghi - head of the University of Pavia unit

Disparities in access to healthcare resources and information pose a significant challenge, disproportionately affecting the lives and cancer outcomes of disadvantaged populations. The ICaRe research project investigates the intersectionality of factors contributing to inequalities in accessing cancer information and resources in Italy. By analyzing variables such as gender, sexual orientation, geographic origin, age, education level and literacy, the project aims to identify the key drivers of these inequalities. These variables, both independently and through their interactions, contribute to a complex landscape of inequality, shaping difficulties in accessing and understanding information and resources on cancer prevention and treatment. Particular attention is given to individuals who identify with sexual and gender minorities (SGM) and who exhibit one or more of the other vulnerabilities outlined above, as they constitute a minority within minorities. An important component of the ICaRe project will involve investigating how current cancer health communication materials in Italy fail to achieve linguistic accessibility and inclusivity for the vulnerable populations defined by the aforementioned factors. At the same time, the current situation calls for the development of more accessible and inclusive communication strategies. The insights of this investigation will serve as a springboard for developing targeted interventions and evidence-based policies to promote better access to cancer resources and information in Italy. Ultimately, this could contribute to achieving health equity and reducing disparities in cancer outcomes.

Researchers of the University of Pavia unit: Luca Brigada Villa, Serena Coschignano, Ilaria Fiorentini, Marco Forlano, Chiara Zanchi

Linked WordNets for Ancient Indo-European languages

PRIN 2022 - MUR funded - 2022YAPFNJ

Prof. Chiara Zanchi - project lead

This project aims to expand and link three existing WordNets (WNs) of ancient Indo-European languages (Sanskrit, Ancient Greek, and Latin) with other linguistic resources.

WNs are lexical databases representing the lexicon in a relational way. Meanings are associated with lexical entries as synsets, brief glosses identified by an ID number. The lexical entries sharing the same synset form a synonymic set. Synsets are connected with one another through conceptual-semantic relations, while lexical entries are linked through lexical relations. The original WN was designed for English (Miller et al 1990); later, several WNs were developed for modern and ancient languages (on Latin and Ancient Greek, see Minozzi 2009, Bizzoni et al 2014, Boschetti 2019).
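The relational structure described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class names, the synset ID "s-001", its gloss and the sample lemmas are hypothetical and do not reflect the actual schema or data of the project's WordNets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Synset:
    """A meaning: a brief gloss identified by an ID (values here are invented)."""
    synset_id: str
    gloss: str
    # Conceptual-semantic relation: IDs of more general synsets.
    hypernyms: List[str] = field(default_factory=list)


class MiniWordNet:
    """Toy lexical database linking (language, lemma) pairs to shared synsets."""

    def __init__(self) -> None:
        self.synsets: Dict[str, Synset] = {}
        self.entries: Dict[Tuple[str, str], Set[str]] = {}

    def add_synset(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def link(self, lang: str, lemma: str, synset_id: str) -> None:
        if synset_id not in self.synsets:
            raise KeyError("unknown synset: " + synset_id)
        self.entries.setdefault((lang, lemma), set()).add(synset_id)

    def equivalents(self, lang: str, lemma: str, target_lang: str) -> Set[str]:
        """Lemmas of target_lang sharing at least one synset with the given entry."""
        own = self.entries.get((lang, lemma), set())
        return {
            other
            for (other_lang, other), ids in self.entries.items()
            if other_lang == target_lang and ids & own and (other_lang, other) != (lang, lemma)
        }

    def synonyms(self, lang: str, lemma: str) -> Set[str]:
        """Entries of the same language sharing a synset form a synonymic set."""
        return self.equivalents(lang, lemma, lang)


# Illustrative data only: one shared synset, two Latin entries, one Greek entry.
wn = MiniWordNet()
wn.add_synset(Synset("s-001", "a large hoofed mammal used for riding"))
wn.link("lat", "equus", "s-001")
wn.link("lat", "caballus", "s-001")
wn.link("grc", "hippos", "s-001")
```

Because all languages point to the same synset IDs, the same lookup that finds synonyms within one language also finds near-equivalents across languages, which is the property that makes crosslinguistic semantic comparison possible.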

The first goal of this project is to harmonize and refine these three WNs and make them interoperable. WNs have the potential to allow for crosslinguistic semantic comparison by using the same set of synsets from the original WN. To ensure interoperability, our WNs will share the same architecture, theoretical framework, annotation workflow, and guidelines. In addition, the creation of new synsets and other structural adjustments will be kept to the very minimum. However, we will introduce some crucial innovations. Our lexicographic work will be framed within a principled view of polysemy drawn from cognitive linguistics (Winters et al 2020).

Furthermore, we will enrich our WNs with philological and morphological information (periodization, literary genre, loci of attestation, principal parts and alternative/irregular forms of paradigms, etymology). These addenda will account for the dynamic nature of the languages’ lexicon and make our WNs appealing to a larger audience of scholars and students.

Our second goal is to enlarge the WNs. To do this, our methods will combine automatic and manual annotations. We will import as much data as possible from available resources (e.g., etymological and domain-specific dictionaries, morphological analyzers and lemmatizers). We will try out and evaluate the application of data-driven methods to ancient languages, such as word embeddings (Khodak 2017), parallel corpora (Apidianaki/Sagot 2014), and automatic hypernym discovery from learnt syntactic patterns (Snow et al 2005); the results obtained will be validated by human annotators. Human annotators will also perform parallel manual annotations on sets of agreed near-equivalents in the three languages.

Our third goal is to link our WNs with other textual and lexical resources by implementing the principles of Linguistic Linked Open Data (Cimiano et al 2020). In particular, we will add sentence and semantic frame information to verbal entries, by linking them with morphosyntactically annotated corpora, valency lexica, and FrameNet. Finally, we aim to make our WNs and interlinked resources accessible to everyone through a user-friendly open-source interface.

Main researchers of the Pavia Unit:
Chiara Zanchi, Erica Biagetti, Stefano Rocchi

This project continues the work on the Ancient Greek WordNet and the Sanskrit WordNet carried out in the framework of an international agreement between the University of Pavia (contact person: Silvia Luraghi) and the University of Exeter (contact person: William M. Short).

Visit the project website!

MinEdu – Supporting minority languages in educational contexts

PRIN 2022 - MUR funded project - 2022HBK4NP

Prof. Ilaria Fiorentini - project lead

The MinEdu project (Pavia, Bolzano, Udine, Pisa) aims at supporting the development of historical
minority languages in Italy through the creation of collaborative networks that support teaching in
multilingual contexts. Over the last twenty years, national and local legislation has facilitated the
implementation of educational initiatives in multilingual contexts. However, several challenges
have arisen, including the establishment of curricula, the maintenance of vertical continuity, and a
shortage of university courses providing adequate preparation for teachers. It is also crucial to share
experiences and best practices while ensuring that teachers possess effective linguistic,
methodological, and cultural skills. In this context, our aim is to collect authentic data from Friuli,
Dolomitic Ladinia and Sardinia to gain a complete overview of existing practices and problems in
these areas. This will allow us to offer theoretical insights on teaching minority languages within
the wider context of language education, at the same time providing practical support for minority
languages in general, and the dissemination of the outcomes for both the scientific and educational
communities.

Main researchers of the Pavia Unit:
Ilaria Fiorentini, Michel Florent Georges Wauthion

Verbs’ constructional patterns across languages: a multi-dimensional investigation

PRIN 2022 - MUR funded project - 20223XH5XM

Prof. Silvia Luraghi - Project Lead

Cross-linguistically, verb classes show similar patterns in their argument structure constructions, or valency patterns. This similarity has attracted the attention of researchers, who have studied the range of possible variation across verb classes in languages of different genetic and areal affiliation in order to extract general coding tendencies. In spite of numerous attempts to investigate constructional patterns cross-linguistically, no systematic comparative study on diachronic developments across languages is available. The Pavia research team developed a database under the PaVeDa project funded by the University of Pavia for the year 2021 (https://hodel.unipv.it/paveda), which is configured to contrastively display valency patterns simultaneously in different languages. The datasets from several ancient languages (Early Latin, Ancient Greek, Gothic, Old Irish, Old English, Classical Armenian) are ready to be uploaded to the database, along with the modern languages stored in the ValPaL database (Hartmann et al 2013; http://valpal.info), which can be imported by means of a script that we have designed. The latter database contains data for 80 verb meanings from 36 languages; it is the result of the project “Valency classes in the languages of the world” carried out at Leipzig University from 2009 to 2013 (Malchukov, Comrie 2015). While the ValPaL database does not allow for contrastive visualization of constructions across the languages it contains, the developers of the PaVeDa database designed a special layer of annotation that allows generalizing over language-specific patterns and makes them visually comparable. Work on ancient languages also led to a redesign of the methodology, as these languages can only be studied on the basis of corpus data rather than native speakers’ knowledge.
This practice brings about a usage-based methodology that we plan to implement for modern languages too, linking the data on constructional patterns to existing digitized corpora. With our project, we aim to further develop both typological and diachronic comparison by adding more languages, both ancient and modern, from language families already represented in the ValPaL database (Indo-European and Afro-Asiatic) as well as from families that are not represented (Uralic and Turkic). Contrastive study will also enable applications with an impact on language acquisition and on L2 learning and teaching of verbal constructions. Our project is embedded in current research on verbal constructions. We build on results reached by members of the team in earlier projects: the ValPaL project, to which some members of our research team contributed; the PRIN 2015 project “Transitivity and argument structure in flux” (20159M7X5P); and the PaVeDa project. The database will be made available to the scientific community on an open-source basis through a dedicated web platform. We plan to cooperate with a number of international partners who have already agreed to participate.

Main researchers of the Pavia Unit:
Silvia Luraghi, Vermondo Brugnatelli (Università degli Studi di Milano Bicocca), Erica Pinelli, Elisa Roma

This project continues the work carried out in the framework of the project PaVeDa–Pavia Verbs Database. Information on this project and related events can be found here.

Visit the project website!

SELSI (Spoken Easy Language for Social Inclusion)

The two-year project SELSI (Spoken Easy Language for Social Inclusion, 2022-2024), which is funded by the European Union under the Erasmus+ programme, Key Action KA220-ADU - Cooperation Partnership in Adult Education, started on 1 November 2022 with a budget of €250,000.00.

The aim of the project is to analyse, both from a theoretical and practical point of view, different aspects of linguistic simplification (Easy Language) in oral contexts and to identify recommendations and practical strategies to promote simplified oral communication, regardless of the language used, in order to provide adequate support to adults with cognitive and intellectual disabilities.

The distinctive feature of the project is precisely its focus on oral contexts. Easy Language (a language variety designed for maximum comprehensibility) has so far been studied by the European scientific community almost exclusively in written form. The results of the project will therefore represent an innovative contribution and an important starting point for further research, including language-specific research.

The project is coordinated by ZAVOD RISA (Slovenian Centre for General, Functional and Cultural Literacy). The project partnership includes the Universities of Pavia (PI: E. Perego, Department of Humanities), Trieste and Vilnius, the Radiotelevision of the Republic of Slovenia, the Easy Language Authority of Riga, the Information Collection and Dissemination Centre of Vilnius and the Swedish National Association for Dyslexia.

Main researcher of the Pavia Unit: Elisa Perego

Visit the project website!

The informalisation of English language learning through the media: language input, learning outcomes and sociolinguistic attitudes from an Italian perspective

PRIN - MUR funded national project (2022-2025)

Prof. Maria Pavesi – Principal Investigator

Owing to contemporary globalisation, multilingualism and media saturation, the availability of English in traditional and new media has increased at an unprecedented rate. Extensive access to the language outside institutional settings is leading to a growing informalisation of L2 learning and use. No longer restricted to institutional sites, the learning of English as an additional language (L2) emerges naturally from user-initiated extramural engagement with popular media, web technologies and social encounters, undertaken for reasons other than language learning. Concurrently, the informalisation of L2 learning and use is changing students’ language attitudes toward English. These multifaceted trends have been investigated in many European countries. By contrast, although Italy appears to be experiencing a similar radical change, little research has been carried out on the acquisitional and sociolinguistic impact of media-induced contact with English in this country.

This project probes Italian university students’ private worlds and the undetected processes which are shaping English informal learning and use. It studies the degree, type and modality of media exposure, while engaging with the outcomes of informal language learning and learner-users’ beliefs and orientations to English as a native language (ENL), foreign language (EFL) and lingua franca (ELF). The project draws on functional, interaction-based and cognitive approaches to second language acquisition (SLA) with a focus on media genres.

The investigation involves four universities in different parts of the country: Pavia (the lead university), Pisa, Salento and Catania. It employs a mixed research design that combines cross-sectional and longitudinal data collection with an array of empirical tools and quantitative and qualitative approaches to data analysis. It is organized in three phases:
1) By means of 2,000 questionnaires, information will be collected on students’ personal data,
extra-linguistic and social characteristics. The main variables to be tapped include frequency, type of media exposure (e.g., TV series, songs, social networks, blogs/vlogs, YouTube, videogames, websites, (online) press, radio, podcast) and modality of access (English-only vs
subtitled/multilingual input, receptive-only vs interactional use), motivation and goals, instruction and non-media contact with English (e.g., travelling and mobility programmes). The survey will comprise self-evaluations of linguistic competence and assessment of general lexical knowledge to allow correlations between the factors investigated and language learning outcomes.
2) Due to the private nature of most informal learning, ethnographic studies through semi-structured interviews will be carried out on purposefully selected respondents to gain an in-depth picture of behavioural patterns across time, beliefs and motivations associated with English-language media.
3) Longitudinal studies of presently untutored high-exposure respondents will be conducted within the Complexity, Accuracy, Fluency (CAF) framework and testing specific late-acquired areas of the L2 – including spoken language grammar and pragmatics.

Corpus-based descriptions of relevant media genres will inform hypotheses on the impact of
different input types on the acquisition of L2 English.

The four research units will follow the same protocols in the first stages of the project to guarantee the comparability of results, while developing specific components in later stages of research. The Pavia team will coordinate the development and implementation of the research design. It will contextualise the main issues in informal learning of L2 English while providing corpus-based descriptions of spoken language in audiovisual (AV) dialogue. In the interviews, the Pavia team will focus on participants’ comprehension of AV input and other cognitive processes that may lead to second language acquisition. It will also address participants’ attitudes towards English (ENL, EFL, ELF), their identities as learner-users, their perception of L2-self/selves and the multilingual repertoire developed via the media. In the longitudinal studies, trajectories of CAF will be investigated, focusing on advanced spoken morpho-syntax.

The present PRIN project is the first in Italy to chart this highly dynamic and largely unexplored landscape. It has applications and implications for English second language acquisition, applied linguistics, sociolinguistics, and educational and translation policies. The questionnaire built for the project qualifies as a scientific instrument available for future large-scale investigations in Italy and other countries to monitor the evolving role of English (inter)nationally and observe the correlation patterns between the different factors (including instruction) involved in contemporary language learning. The investigation will advance our understanding of how learners acquire English informally and deploy specific actions that empower them linguistically. It will show how the ubiquity of English has implications for language learning landscapes and the conceptualisation of L2 users, with an effect on speakers’ orientations towards native, learner and ELF varieties.

Main researchers of the Pavia Unit:
Maria Pavesi, Maicol Formentelli, Silvia Monti, Camilla De Riso (Department of Humanities); Elisa Ghia, Cristina Mariotti (Department of Political and Social Sciences); Erik Castello (University of Padua)

Visit the project website!

Il paesaggio linguistico torinese (The linguistic landscape of Turin)

Prof. Ilaria Fiorentini - coordinator

The project “Il paesaggio linguistico torinese” stems from the observation that, unlike other large Italian cities (such as Milan or Rome), Turin's linguistic landscape remains largely unexplored, despite the city's acknowledged multilingualism.

Field data collection took place in Turin on 12 and 13 November 2022, with the participation of 20 students from the University of Pavia (Department of Humanities). Various areas of the San Salvario, Centro, Porta Palazzo and Quadrilatero districts, which differ in their degree of multilingualism, were mapped. About 600 photographs were collected, a selection of which will be presented at the exhibition to be held at the University of Pavia on 18 March 2023 (room and time to be confirmed).

The activity was funded through the University co-funding for field research activities in the humanities (year 2022).

Visit the project website!

The language of inclusive tactile art: communicative moves and Italian/English linguistic strategies of interactive procedural narration for a blind and visually impaired audience

PI: Elisa Perego

University co-funding for field research activities in the humanities (year 2024)

Considering the relevance of social inclusion and accessibility to cultural institutions, also in a European dimension, it is increasingly important to address the characteristics of accessible and inclusive museum communication, a type of communication that is particularly appreciated in multimodal consumption contexts. The project focuses on a specific form of accessible and inclusive communication that is becoming increasingly widespread and appealing: audio description for blind and visually impaired people, which accompanies the tactile exploration of reproduced artworks intended to be touched. This specialized text type takes the form of a procedural, often interactive narrative (guide-visitor).

Field research at the Anteros Tactile Museum of Ancient and Modern Painting of the "Francesco Cavazza" Institute for the Blind in Bologna (https://www.cavazza.it/drupal/it/node/332) will involve semi-structured interviews with staff and blind users and the collection of essential linguistic data in Italian and English in order to establish a detailed linguistic profile of the textual genre in question. This genre is still little researched, especially in its most natural form, namely the one delivered spontaneously (live vs. pre-recorded) and in direct contact with the end user with special needs.

The results of the research and of the quantitative and qualitative linguistic analysis will make it possible 1) to provide the scientific community with results that help to better understand the textual mechanisms, communicative moves and linguistic norms of this textual genre, also from a contrastive Italian-English perspective, with the aim of identifying the differences and analogies between spontaneous and pre-recorded procedural narration; 2) to produce concise, strategic and flexible guidelines in English to help professionals in the field of accessible museum communication select the most appropriate linguistic behaviours to make tactile exploration effective and rewarding; and 3) to create innovative audiovisual content for educational and informational purposes.

The project results will be an important linchpin for the expansion of research on accessible English communication at European level.

WhAP! Corpus WhatsApp Pavia

Prof. Ilaria Fiorentini - Principal Investigator

WhAP! is an Italian linguistic corpus which, following the example of other corpora, collects data on the written and spoken language of WhatsApp users. What makes WhatsApp interesting to scholars is precisely its combination of voice and written messages, which yields a form of interaction that is entirely new in the world of Computer-Mediated Communication (CMC). The data are accompanied by metadata describing the participants (gender, age group, geographic origin, etc.), so that variants can be placed in their sociolinguistic context.

The creators of the corpus reconstructed the chats with their own contacts, covering both written messages and voice messages, the latter transcribed with ELAN, software created specifically for linguistic annotation. From our portal, users will be able to access these chats directly, for private or outreach purposes, searching for specific phenomena or filtering the chats by geographic area or age group. Although each chat comes with metadata, our annotators anonymised every name, street or contextual reference that could lead back to the identity of the speaker or writer, in order to protect their privacy.

WhatsApp chats are a field of study untouched by user self-monitoring: with their own contacts, users employ register varieties that are not influenced by being observed by third parties. In addition, voice messages can be studied; they contain features of spontaneous speech that are very hard to obtain in other ways, precisely for privacy reasons. Sociolinguistics can thus gain a wealth of data on phenomena such as code mixing, reformulations and others, in a medium that sits exactly halfway between prototypical writing and spontaneous speech.

People
Ilaria Fiorentini, Marco Forlano, Nicholas Nese, Chiara Zanchi, and the students of the Master's degree programme (Laurea Magistrale) in Linguistica Teorica, Applicata e delle Lingue Moderne

Contribute to the WhAP! corpus!

Pavia Corpus of Film Dialogue

Prof. Maria Pavesi – Project Lead

The Pavia Corpus of Film Dialogue (PCFD) is a parallel and comparable corpus made up of original Italian films and original English films together with their dubbed Italian translations. The corpus was created at the University of Pavia, where it has been developed since 2005 to investigate translated and non-translated audiovisual dialogues on their own, in parallel and contrastively.

Main aims of the corpus

The PCFD has been conceived as a flexible tool for analysing and comparing film language and audiovisual translation, with a focus on the English-Italian language pair. The corpus allows the pursuit of several objectives. Moving from a target-language orientation to dubbing, a systematic study can be carried out of linguistic, sociolinguistic, pragmatic and translational phenomena to ultimately delineate a profile of contemporary dubbed Italian. The component of the Anglophone original dialogue can also be inspected independently of its dubbed counterpart to look for conversational features and uncover their specific functions. The comparable component comprising original Italian productions makes it possible to draw comparisons between dubbed and original Italian films and between English and original Italian films. As the films included in the corpus cover a time span of almost 30 years (from 1990 to 2017), the PCFD is also suitable for short-term diachronic studies. Finally, the corpus can be exploited for language learning purposes.

Current research team

Maria Pavesi
Maicol Formentelli
Liviana Galiano
Raffaele Zago

For more information, please download the following file:

Pavia Corpus of Film Dialogue