In democratic countries, a parliament is a central representative and legislative institution. It is composed of elected representatives through whom citizens have a voice in shaping and enacting laws and thus participate in governing all areas of social life. In addition, the parliament often controls the executive branch (Norton, 2002). Due to the parliament’s crucial role in the development of society, its activity has always been an important topic of research in the humanities and social sciences.
In the last two decades, technological progress, the increased interest of the media and citizens in the work of parliament, and the desire for greater transparency have made data on parliamentary activity, including the records of parliamentary debates, more accessible (Norton, 2002). The records are a unique research source, as parliamentary debates reflect the political, societal, and cultural atmosphere of a given period (Ilie, 2010). Since parliamentary discourse is highly regulated and parliamentary records are often available in digital form, they are a convenient source for building parliamentary corpora. These are temporally delimited, structured collections of debate records enriched with metadata on the speakers and speeches and with linguistic annotations (Truan and Romary, 2021).
Parliamentary corpora usually contain large amounts of data that cannot be analysed by hand within a reasonable time frame. Concordancers are popular tools for analysing corpora. You can familiarize yourself with them in a related tutorial (Fišer and Pahor de Maiti, 2021). Other tools, such as Orange (Demšar et al., 2013), which is used in this tutorial, support text mining approaches that process large amounts of data to extract patterns and information that are not obvious from the text at first glance (Wiedemann, 2016).
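As a rough illustration of what a concordancer does at its core, the sketch below lists every occurrence of a search word together with a few words of context on either side (a "keyword in context" view). It is a minimal sketch and not the implementation of any particular tool: the function name `kwic`, the `width` parameter, and the two sample sentences are invented for this example and are not taken from the ParlaMint corpus.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit.

    Hypothetical helper for illustration only: tokenisation is a simple
    lower-cased word split, far cruder than what real concordancers do.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Invented sample text, not a real debate record.
sample = (
    "The honourable member asked about the budget. "
    "The budget was debated at length in the House."
)

for left, word, right in kwic(sample, "budget"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} | {word} | {right}")
```

Reading results line by line in this way works for a handful of hits, but it does not scale to millions of words, which is where text mining approaches become useful.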
Among other things, text mining techniques have been used for sentiment analysis of parliamentary debates (Rheault et al., 2016; Rudkowsky et al., 2017), for modelling policy conflict between the cabinet parties (Bergmann et al., 2018), for opinion mining (Abercrombie and Batista-Navarro, 2020), and for modelling argumentation (Petukhova et al., 2015). Among these, topic modelling (Meeks and Weingart, 2012) is one of the most frequently used text mining techniques in the digital humanities and the one that will be the focus of this tutorial.
This tutorial introduces researchers in the humanities and social sciences to text mining and shows the value of such approaches for research in these scientific fields. The tutorial breaks down the particularities of parliamentary discourse and topic modelling by answering concrete research questions. The analysis is based on the freely accessible corpus of British parliamentary debates ParlaMint (Erjavec et al., 2021) and the Orange tool (Demšar et al., 2013), which enables the use of advanced text mining techniques without any programming knowledge.