Abstract
In many countries, presidents offer annual addresses to their respective legislatures or citizens. These speeches serve as a platform for leaders to communicate their vision, achievements, and priorities. Presidential speeches are key instruments of political power, serving as tools for agenda-setting and public engagement. However, studies on presidential rhetoric are often limited by traditional methodologies that focus on specific leaders or short periods. This article addresses these limitations by applying Latent Dirichlet Allocation to analyze Chilean presidential speeches from 1832 to 2021, uncovering thematic patterns and their evolution over nearly two centuries. Chile’s case, with its stable democratic tradition and rich history of presidential messages, demonstrates Latent Dirichlet Allocation’s utility for comprehensive, long-term discourse analysis. Our findings highlight Latent Dirichlet Allocation’s effectiveness in capturing thematic shifts that reflect broader socio-political changes, showcasing its potential for studying executive speeches across various political systems. This methodology bridges computational analysis and qualitative inquiry, offering a replicable framework for comparative research. We conclude by discussing its broader implications for future studies on political communication and cross-national analysis.
Introduction
Presidential speeches are more than just the words of a leader; they embody the formalities of their institution, reflect prevailing values and ideas, and aim to engage and influence their audience. They serve as key instruments of political power and authority, leveraging language to establish positions, persuade, and evoke emotions. The prominence of presidents in presidential regimes has made their speeches a focal point in social science research, with scholars examining their strategic role in agenda setting, public engagement, and framing political discourse.
The tradition of presidential speeches is not exclusive to presidential regimes; it spans parliamentary (e.g. India) and semi-presidential (e.g. France, Portugal) systems. While annual addresses date back to the nineteenth or early twentieth century in countries such as Argentina, Chile, Costa Rica, Mexico, Peru, and the United States, to name a few, existing studies primarily rely on approaches limited to word counts and predefined categories. This has left a significant gap in the study of presidential speeches, which has been restricted to specific administrations and confined to relatively short timespans.
Our article bridges these gaps by presenting an innovative methodological approach that leverages Latent Dirichlet Allocation (LDA)—an unsupervised machine learning algorithm—to facilitate the large-scale, longitudinal analysis of presidential speeches. This approach allows for an in-depth exploration of thematic structures and trends in political discourse over extended periods, offering scholars a robust tool applicable across various contexts. By demonstrating the utility of LDA through an analysis of Chilean presidential speeches from 1832 to 2021, we propose a framework that can be adapted to other political systems and historical contexts, providing generalizable insights into how leaders use speeches to respond to changing socio-political landscapes.
Chile’s stable democracy and tradition of annual presidential messages provide an exemplary case for this analysis. Spanning nearly 200 years, from 1832 to the present, these annual speeches allow for an in-depth analysis of both subtle and significant changes in political discourse, enabling a comprehensive examination of the topics and ideas that shape and are shaped by presidential discourse over time. While this study focuses on Chile, the implications extend beyond its borders, contributing to broader discussions in political communication, agenda-setting, and the role of executive speeches in institutional change.
Recent computational advances have made it possible to analyze extensive texts efficiently. Techniques like LDA enable researchers to uncover latent topics, track their evolution, and reveal underlying relationships without predefined categories. This is especially advantageous for research involving voluminous, multi-themed documents such as presidential speeches, where traditional qualitative methods might overlook significant details or trends.
Using LDA to analyze presidential speeches reveals latent thematic patterns that other methods, like traditional qualitative analyses or keyword-based quantitative approaches, may miss. While manual analyses rely on subjective interpretations and predefined aspects, LDA uncovers emerging topics without prior assumptions, providing a broader view of recurring content. In analyzing Chilean presidential speeches from 1832 to 2021, LDA highlighted thematic continuities and shifts over nearly two centuries. It also quantified each topic’s prevalence over time, offering a longitudinal perspective that complements contextual analyses.
LDA operates inductively, revealing topics through statistical word co-occurrence patterns and reducing bias from external categories. Its flexibility in identifying multiple themes better reflects the multifaceted nature of political speeches shaped by diverse agendas and audiences. This method enhances the interpretative scope by uncovering connections and patterns that might be overlooked in analyses with predefined categories, capturing the complexities of the original texts.
The article below consists of six sections in addition to this introduction. The following section reviews the existing literature and discusses its advancements and limitations while situating our study in this body of work. In section 3, we explain LDA’s main features and advantages for analyzing presidential speeches. This is followed by a quick description of Chile’s annual presidential speech tradition and our dataset of 172 presidential addresses delivered by 30 presidents. Section 4 offers a discussion of the significant findings from our analysis. Section 5 lays out our article’s main methodological contributions and identifies a path for future research. Finally, we offer our concluding remarks, emphasizing how this methodological approach enhances the analytical toolkit for scholars studying political discourse and supports the theoretical exploration of the interplay between ideas, leadership, and institutional evolution.
Studies on Presidential Speech: An Overview
Many presidents use public messages, field visits, tours, and speeches to communicate directly with citizens (Kernell, 1997). Presidential speeches have captivated scholars due to their strategic significance in shaping public discourse, policy agenda, and political narratives. These speeches are more than ceremonial; they represent a confluence of power, institutional legitimacy, and public persuasion. Their importance also stems from the president’s centrality in presidential systems, spurring a body of research mostly grounded in understanding presidential speeches’ main features and consequences. Extensive research has delved into these speeches, analyzing how they shape public opinion, legislative agendas, policy-making, judicial decisions, and discourse on various national issues (Cohen, 1995; Edwards and Wood, 1999; Eshbaugh-Soha and Collins, 2015; Kingdon, 2014; Moen, 1988; Powell, 1999; Rex, 2011; Rottinghaus, 2010; Schaefer, 1997; Young and Perkins, 2005). Nevertheless, this body of work often remains constrained in its scope, focusing primarily on short timeframes or specific leaders.
One strand of this literature has focused mainly on the frequency of particular words in presidential speeches, evaluating content and its evolution over time (Lim, 2002). It has also been found that while the duration of presidential speeches has decreased, there has been a trend toward increased public engagement (Teten, 2003, 2007, 2008). The extant literature has identified distinct periods characterized by differences in speech content, methods of broadcasting, and their impacts, shaped by the growth in sample sizes and the evolving dynamics of media (Hoffman and Howard, 2006; Murphy, 2008). However, these methodologies frequently limit themselves to specific periods or predefined themes, omitting broader insights into evolving patterns across more extended historical periods and diverse political systems.
In Latin America, research on presidential speeches has been narrower in scope, mainly focusing on specific topics and periods. Studies have examined the discourses of specific presidents, such as Andrés Manuel López Obrador’s written, audio, and visual speeches from his 2018 presidential campaign in Mexico (Amparan, 2021); Lenín Moreno in Ecuador (Von Schoettler, 2020); Fernando de la Rúa in Argentina (Fair, 2017); and the promises made by Chilean presidents (Faúndez Caicedo and Navia, 2024). A different body of research has centered its attention on presidential discourses from particular ruling coalitions or parties, such as Concertación governments in Chile (López et al., 2011) and the use of dominant concepts in presidential speeches in Argentina and Chile (Durán Migliardi, 2017; López et al., 2011; Rosaenz, 2017; Soto, 2016). To our knowledge, the most extensive and ambitious study examines the annual speeches of 73 presidents across 13 Latin American countries from 1980 to 2014 (Arnold et al., 2017). However, even this relied heavily on word counts, limiting its capacity to capture thematic nuances and shifts.
Substantial strides have been made in assessing the impact of presidential speeches on several topics, such as media coverage (Schaefer, 1997, 2007; Young and Perkins, 2005), legislative agendas (Barrett, 2004; Edwards and Wood, 1999; Moen, 1988; Powell, 1999), and public opinion (Eshbaugh-Soha, 2010; Ragsdale, 1984; Rottinghaus, 2006, 2010; Young et al., 2003). The linguistic and thematic analysis of presidential speeches has also made notable advances, with some studies focusing on specific topics, concepts, or emotions (Coe, 2007; Erisen and Villalobos, 2014; Towle and Collier, 2010; Vavilova and Galieva, 2023; Wegner, 2013) and others centering on language style and semantics (Lim, 2002; Mohammadi et al., 2020; Savoy, 2014; Teten, 2003; Vavilova and Galieva, 2023). In terms of techniques, these studies have used quantitative and qualitative methodologies or a combination of both. Some have acknowledged limitations regarding the volume of documents to be processed, opting, for example, for representative or purposive samples (Coe and Neumann, 2011; Teten, 2003). Nevertheless, these methods, while valuable, are intrinsically limited in how they systematize information and in their temporal coverage.
To deal with these limitations, some have focused exclusively on the rhetorical analysis of some presidencies (Collier and Towle, 2011; Sigelman and Whissell, 2002; Towle and Collier, 2010), whereas others have employed linguistic software, focusing on the words used and their frequency (Vavilova and Galieva, 2023; Wegner, 2013), semantic complexity, and vocabulary variety (Lim, 2002; Painter and Fernandes, 2021; Savoy, 2014).
While the academic interest in presidential speeches continues to rise across various social science fields, overall, studies need to be expanded in scope. This is mainly due to a reliance on qualitative analyses of brief time frames and quantitative methods confined to word counts or predefined categories, all resulting from previous computational limitations. Fortunately, new methodologies and computational tools have made the analysis of large volumes of information, such as speeches, possible. A recent example is the processing of thousands of speeches of parliamentarians in Brazil to identify the speakers’ agendas, opinions, feelings, and their relationship with the government (Inácio et al., 2023; Izumi and Medeiros, 2021; Moreira, 2020).
A pervasive problem with these methodological tools is that both software-based and qualitative analyses of samples require manual labeling or ranking of words and concepts. Because that labeling reflects the researcher’s interests or conceptual categories, it can introduce selection biases and limit the ability to recognize underlying patterns and topics that evolve over time. Moreover, these methodologies struggle to process and distinguish the topics or argumentative elements that emerge throughout a presidency, hindering the identification of subtle ideational changes, such as the presidential bricolage described by Carstensen (2011a, 2011b).
Recent advances in computational text analysis, such as LDA, have opened new pathways for addressing these challenges. Our study contributes to this emerging methodological frontier by demonstrating how LDA can analyze an extensive corpus of Chilean presidential speeches spanning nearly 200 years. This application enriches the toolkit available for political text analysis and showcases how a similar approach could be used in various geopolitical contexts. The Chilean case provides an exemplary starting point due to its long tradition of annual presidential addresses. However, the methodology has the potential for general application, aiding scholars in examining executive communication in other presidential or semi-presidential systems.
By integrating computational methods with historical and contextual analyses, we seek to address the broader scholarly interest in how political leaders communicate over time. The insights gained from Chile’s presidential rhetoric can contribute to understanding thematic consistencies and variations that might resonate in other contexts, thus bridging the methodological and empirical gaps present in current literature.
Research Design
Single-case studies, such as the one we conduct here, help to validate methodological approaches and provide diverse and reliable data for future analyses. In political science, causal analysis has traditionally been prioritized over descriptive approaches, often considering the latter secondary (Gerring, 2012). However, Gerring (2012) argues that description holds intrinsic value for several reasons: it precedes causal analysis in under-explored topics, and it offers diverse data collection that can support long-term research beyond hypothesis-driven studies. By focusing on descriptive analysis, this study lays the groundwork for more nuanced investigations, bridging the gap between comprehensive data exploration and deeper causal inquiries.
Furthermore, new methodological approaches are particularly valuable in contexts where causality is uncertain, as they enable an independent descriptive inference that is not tethered to specific hypotheses. This promotes a broader and more objective understanding of political phenomena.
Finally, applying this method in a case study allows us to address one of the literature’s persistent challenges: validating unsupervised probabilistic topic models and their labels to ensure they accurately represent relevant themes for measuring social science concepts (Ying et al., 2022). Automatic models identify broad patterns but benefit from human calibration to achieve precision in nuanced contexts (Lowe and Benoit, 2013). This research includes a longitudinal case study that seeks to validate the model’s labels through contextual analysis, triangulating with historiographical sources, economic and social data, and the extracted content of presidential speeches.
With the computational advances of recent decades, several automated techniques for quantitative text analysis have been developed and used in political science. These techniques do not dispense with close reading and the researcher’s substantive knowledge of the topic. However, their use allows us to work with more documents than was previously imaginable. Among the various computational techniques available to identify topics in a set of documents, LDA stands out. LDA is widely used for topic analysis of textual documents. It assumes that each document is a combination of various topics and that each word is associated with one of these topics.
In the social sciences, LDA and its variants have been applied to topic analysis in various contexts. For instance, Quinn et al. (2010) analyzed 118,000 U.S. Senate speeches delivered between 1997 and 2004, examining the emergence and evolution of different topics over time. Similarly, Grimmer and Stewart (2013) studied 24,000 press releases from U.S. senators in 2007, mapping the principal political agendas over time. In another notable study, Barron et al. (2018) analyzed over 40,000 speeches from the French Revolution’s parliamentary debates, revealing patterns of discourse evolution. Their findings supported evidence that left-wing parliamentarians innovated while right-wing members preserved previous patterns.
In addition, Barberá et al. (2019) examined interactions between politicians and the public through social media, analyzing over 4 million tweets. They found that legislators tend to follow rather than lead public discussions and respond more to supporters than the general public. Finally, Moreira (2020) explored how government-opposition dynamics influence the agendas of Brazilian legislators, highlighting the impact of ideology, gender, and popularity on parliamentary behavior. In summary, LDA is a versatile tool for exploring textual data, offering valuable insights into political discourse and agenda dynamics.
Previous research on presidential speeches can illuminate the prioritized issues, aiding in creating a more sophisticated model for word classification. However, this approach has been restricted primarily to recent years, which may inadvertently exclude topics specific to the nineteenth and twentieth centuries (or sub-periods within them) from consideration. Although unsupervised models like LDA may present limitations and potential challenges regarding classification accuracy, their value is particularly notable in exploratory analyses. They can reveal unexpected patterns, trends, or underlying themes that might not otherwise be evident. In addition, LDA is a flexible and scalable model capable of managing large datasets and adapting to dynamic changes in speech patterns over time. This makes it highly suitable for the extensive temporal range of our study.
Comparing LDA to alternative techniques reveals its distinct advantages. For instance, conventional clustering algorithms may offer simplicity in grouping speeches based on word usage similarities but often struggle with the hierarchical nature of topics and linguistic ambiguities. Conversely, neural network-based models or deep learning approaches boast scalability and complexity, capable of capturing intricate patterns in vast datasets. However, these methods demand significant computational resources and expertise for implementation. In contrast, LDA strikes a balance between interpretability and computational efficiency. Its probabilistic nature allows for identifying coherent topics while maintaining a straightforward implementation process. In addition, LDA’s ability to model topic distributions at both document and word levels enhances its suitability for longitudinal analyses of speeches, ensuring robust and insightful results. Hence, our article presents an innovative, cost-effective, and accessible research method well suited for analyzing speeches across various contexts and extended timeframes.
LDA’s algorithm starts from the idea that any textual document is composed of multiple topics that appear in different proportions, and each word in each document is assigned to one of the topics. A topic is nothing more than a distribution over the set of all words, i.e. for each word in our vocabulary, a probability of occurrence is assigned. For example, the word “army” is more likely to appear in a topic related to war than in a topic related to education.
Formally,
1. For each topic $k$, a distribution over words is drawn: $\phi_k \sim \mathrm{Dirichlet}(\alpha)$;
2. For each document $d$, words are generated from a two-stage process:
(a) A distribution over topics is drawn: $\theta_d \sim \mathrm{Dirichlet}(\beta)$;
(b) For each word $i$ in document $d$:
i. A topic is drawn from the distribution over topics: $z_{d,i} \sim \mathrm{Multinomial}(\theta_d)$;
ii. A word is drawn from the distribution over words of that topic: $w_{d,i} \sim \mathrm{Multinomial}(\phi_{z_{d,i}})$.
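The generative process listed above can be simulated in a few lines. The following is only an illustrative sketch, not the estimation procedure used in this article: the function names, the toy vocabulary, and the hyperparameter values are assumptions introduced for the example, and the notation follows the text (alpha governs the topic-word distributions, beta the document-topic distributions).

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_length, vocab, n_topics, alpha, beta, seed=0):
    """Simulate documents from the LDA generative process (hypothetical helper)."""
    rng = random.Random(seed)
    # Step 1: one distribution over words per topic, phi_k ~ Dirichlet(alpha)
    phi = [sample_dirichlet(alpha, len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Step 2(a): topic proportions for this document, theta_d ~ Dirichlet(beta)
        theta = sample_dirichlet(beta, n_topics, rng)
        words = []
        for _ in range(doc_length):
            # Step 2(b)i: draw a topic z_{d,i} ~ Multinomial(theta_d)
            z = rng.choices(range(n_topics), weights=theta, k=1)[0]
            # Step 2(b)ii: draw a word w_{d,i} ~ Multinomial(phi_z)
            words.append(rng.choices(vocab, weights=phi[z], k=1)[0])
        corpus.append(words)
    return phi, corpus
```

Estimating an LDA model runs this logic in reverse: given only the observed words, it infers the topic-word and document-topic distributions that most plausibly generated them.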
The LDA technique is suitable for our research problem for two main reasons. First, because it is an unsupervised learning model, it is not necessary to predefine the thematic categories of presidential speeches. This is especially beneficial for our extended period of almost 200 years, in which a wide variety of topics is expected in the documents. This way, we avoid limiting the analysis to prior knowledge of categories and allow themes to emerge more organically from the data.
Second, LDA recognizes that each document may contain a combination of multiple themes rather than being restricted to a single theme. This will allow us to identify the different relationships or sets of pieces containing the ideas and their prevalence in the speeches issued by the presidents. Since presidential speeches tend to be lengthy communications with an average of more than 9,000 words, they commonly address various topics. Classifying them into a single topic would lead to losing relevant information.
Before estimating the LDA model, we preprocess the data by representing the documents in a document-term matrix, where each row corresponds to a presidential message and each column represents a word stem. In addition, we exclude stems that appear in fewer than 10% or more than 90% of the speeches, as they provide little useful information.
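As a rough illustration of this preprocessing step, the sketch below builds a document-term matrix from already-stemmed documents and applies the 10% and 90% document-frequency thresholds. The function name and the input format (one list of stems per speech) are assumptions made for the example; the article does not specify the software used.

```python
from collections import Counter


def build_dtm(stemmed_docs, min_share=0.10, max_share=0.90):
    """Return (vocab, matrix): one row per document, one column per kept stem."""
    n_docs = len(stemmed_docs)
    # Document frequency: the number of documents in which each stem appears
    doc_freq = Counter()
    for doc in stemmed_docs:
        doc_freq.update(set(doc))
    # Drop stems found in fewer than min_share or more than max_share of documents
    vocab = sorted(
        stem for stem, df in doc_freq.items()
        if min_share <= df / n_docs <= max_share
    )
    index = {stem: j for j, stem in enumerate(vocab)}
    matrix = []
    for doc in stemmed_docs:
        row = [0] * len(vocab)
        for stem in doc:
            j = index.get(stem)
            if j is not None:
                row[j] += 1
        matrix.append(row)
    return vocab, matrix
```

Stems that occur in nearly every speech or in almost none do little to separate one topic from another, which is the rationale for trimming both tails.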
Defining the number of topics to be estimated by the LDA model poses a significant challenge in our research. Although there are metrics in the literature that can help in this task (Griffiths and Steyvers, 2004), often their results are not substantively interesting or theoretically useful. Therefore, a qualitative assessment of the results is essential (Grimmer and Stewart, 2013).
To address this challenge, and based on the analysis of Chilean presidential speeches presented in the following sections, we estimated 18 different models: 16 with 5 to 20 themes, and two additional models with 25 and 30 themes. Then, we evaluated the cohesion of the 10 stems most associated with each theme and analyzed the temporal distribution of the themes to verify whether it is consistent with what is described in the historiography. In addition, we have carefully read a sample of the presidential speeches. These qualitative assessments allow the analyst to determine the optimal number of themes based on the substantive cohesion of the themes extracted by the model.
Combining quantitative and qualitative techniques will allow us to obtain a more complete and meaningful understanding of the themes present in presidential speeches over time in Chile.
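The selection procedure described above can be sketched as follows. The example assumes that one model was fitted for every integer from 5 to 20, which together with 25 and 30 gives the 18 models reported, and that "most associated" means highest probability within the topic; both are assumptions, as are the function names.

```python
def candidate_topic_counts():
    """Topic counts to try: 5 through 20 inclusive (16 values), plus 25 and 30."""
    return list(range(5, 21)) + [25, 30]


def top_stems(topic_word_probs, vocab, n=10):
    """Return the n highest-probability stems for each topic.

    topic_word_probs holds one list of probabilities per topic, aligned with vocab.
    """
    tops = []
    for probs in topic_word_probs:
        ranked = sorted(range(len(vocab)), key=lambda j: probs[j], reverse=True)
        tops.append([vocab[j] for j in ranked[:n]])
    return tops
```

For each candidate count, the resulting lists of top stems are what the analyst reads and judges for substantive cohesion.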
In the first column of Table 1, we present the labels assigned to each theme after reading some of the presidential speeches. In the second column, we find the 10 stems most associated with each label, which provides relevant information for their assignment. For example, the theme “War and Security” is strongly associated with terms such as “war,” “army,” “Peru,” “territory,” and “occupation.” In the third column, we provide examples of stems from English words to aid understanding. Finally, the fourth column shows the percentage of speeches classified in each theme. We observe that the theme “War and Security” is the most frequent (19.95%), followed by “Infrastructure” (16.14%) and “Schools and Primary Education” (14.60%). These results provide us with an overview of the themes most prevalent in presidential speeches over time in Chile.
Table 1. Topics and Associated Stems.
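One plausible way to obtain a "percentage of speeches classified in each theme" is to assign every speech to the topic with the largest estimated proportion and then tabulate the shares. The article does not state that this is how the fourth column was computed, so the sketch below is an assumption, with a hypothetical function name and toy inputs.

```python
from collections import Counter


def dominant_topic_shares(doc_topic, labels):
    """Percentage of documents whose largest topic proportion falls on each label.

    doc_topic holds one list of topic proportions per document, aligned with labels.
    """
    counts = Counter()
    for theta in doc_topic:
        k = max(range(len(theta)), key=lambda i: theta[i])
        counts[labels[k]] += 1
    n_docs = len(doc_topic)
    return {label: 100.0 * counts[label] / n_docs for label in labels}
```

Classifying by the dominant topic discards the secondary themes of each speech, so these shares summarize the corpus without replacing the full document-topic proportions.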
Presidential Speeches in Chile (1832–2021): Quick Description and Dataset
The presidential speech is one of the most traditional republican rites in Chile. It was first enumerated among the president’s duties in the short-lived 1828 Constitution. Although this duty of the president was not explicit in the Constitution of 1833, on June 1, 1832, President José Joaquín Prieto (1831–1841) inaugurated the institutional ceremony that opened the ordinary sessions of Congress. During this ceremony, the president reports on his administration’s work and delivers the highlights of his agenda to the entire Congress. This tradition has been maintained, with few exceptions, until the present day, with some changes in date and time. For example, in 1925, the date of the annual speech was changed to May 21 to coincide with the celebration of Navy Day, though in 2018, it was changed back to the original date of June 1.
However, this long-standing institutional ceremony has also been subject to various political and economic fluctuations in the country. This annual rendering of accounts was suspended by Arturo Alessandri in 1925 following the political upheaval that led to the dissolution of Congress in 1924. A separate case is President Carlos Ibáñez del Campo’s disdain toward the political class. This attitude motivated him to send in brief written speeches between 1927 and 1931, which the Secretary of the Senate read. At another time, President Jorge Alessandri was forced to interrupt his speech due to the great earthquake of 1960 that affected the country on May 21 and 22. This rite was also canceled during the dictatorship (September 1973–March 1990) when the Congress was dissolved. The Military Junta addressed the country once a year on September 11. This republican ceremony resumed in 1990 with the return to democracy under the presidency of Patricio Aylwin.
Another important fact to consider is that this rite was initially performed exclusively before the National Congress. However, this situation changed with technological advances in mass communication, such as radio and television. Thus, the first radio transmission of the speech took place in 1924 with President Arturo Alessandri Palma. Later, on May 21, 1962, the speech of President Jorge Alessandri Rodríguez was the first to be broadcast on television.
It is important to highlight the importance of this discourse for analysis since it exhibits a long periodicity that allows us to examine the transformations of the “rhetorical context” over time (Martin, 2015). The rhetorical context refers to the immediate conditions of the discourse, such as the historical time in which it is framed, the place where it is issued, and the contextual demands to which it responds (Martin, 2015).
This study is based on an analytical framework that maintains that ideas are relational in nature; therefore, it focuses on examining the semantic relationships within and outside each idea. According to Carstensen (2011a), ideas are composed of pieces whose relevance may change over time, which does not necessarily imply a complete paradigm shift. Under this approach, we can identify gradual changes in ideational explanations, which brings greater dynamism to the analysis.
Moreover, thanks to the systematicity of the sources at our disposal—covering a large number of presidents (30 subjects) over a long period of 189 years—and considering the variations of the context, we hope to make a significant contribution to the theoretical discussion of the relationship between agent and structure in the ideational realm. Specifically, this article analyzes all presidential speeches presented to the Chilean Congress between 1832 and 2021. We exclude speeches between 1974 and 1989 because they were delivered under an authoritarian regime. In addition, there were no presidential speeches in 1838 and 1925.
In sum, we assessed 172 speeches made by 30 different presidents, equivalent to an average of 5.73 speeches per president. Alfredo Duhalde, Emiliano Figueroa, and Juan Esteban Rodríguez delivered the fewest speeches to Congress, with only one speech each. On the other hand, Carlos Ibáñez del Campo, with 11 speeches, was the president with the largest number of observations. The longitudinal analysis of these speeches will enable us to track the evolution of the country’s most salient topics (or sets of ideas) over nearly two centuries.
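The corpus size can be checked with simple arithmetic. The sketch below assumes exactly one speech per calendar year outside the excluded years, which is an assumption that happens to reproduce the reported totals.

```python
# Calendar years covered by the study, inclusive of both endpoints
all_years = set(range(1832, 2021 + 1))      # 190 years
# Years excluded because speeches were delivered under an authoritarian regime
authoritarian = set(range(1974, 1989 + 1))  # 16 years
# Years in which no presidential speech was delivered
no_speech = {1838, 1925}

included = all_years - authoritarian - no_speech
n_speeches = len(included)                  # 190 - 16 - 2
mean_per_president = n_speeches / 30        # 30 presidents in the dataset
```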
Findings
In this section, we first present the descriptive results of presidential speech lengths, followed by an analysis of the dominant topics addressed. The first column of Table 2 shows the statistics on the speeches’ length. On average, the speeches are 9,576 words long. There is considerable variation among the speeches. The shortest speech, consisting of 390 words, was delivered by Carlos Ibáñez del Campo in 1954, while the longest, exceeding 47,000 words, was given by Arturo Alessandri Palma in 1924. This variation is captured by the standard deviation of 8,095 words.
Table 2. Descriptive Statistics of the Corpus of Presidential Speeches.
In order to carry out the quantitative analysis of the texts, it is essential to reduce the amount of information present in the speeches. We can simplify the data by excluding irrelevant items, such as punctuation, numbers, and other stopwords (such as articles, connectives, and pronouns). We also reduce words to their root or stem, which implies that words that share the same meaning, such as “economy,” “economist,” and “economic,” are reduced to the root “econom-.”
In the second column of Table 2, we present the results after preprocessing the text described above, which led to a drastic reduction in the size of the speeches. The average length of the speeches was roughly halved, from 9,576 words to 4,524 stems.
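The preprocessing described above can be illustrated with a toy pipeline. The stopword list and suffix rules below are invented for the example and work only on a handful of English words; a real pipeline would rely on an established stopword list and stemmer for the language of the corpus, and the article does not specify which tools were used.

```python
import re

# Illustrative stopword list only; real lists are far longer
STOPWORDS = {"the", "of", "and", "a", "in", "is", "to", "by", "our"}
# Toy suffix rules, ordered from longest to shortest
SUFFIXES = ("ists", "ist", "ics", "ies", "ic", "y", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop punctuation, numbers, and stopwords, then stem."""
    # Matching runs of ASCII letters discards punctuation and digits;
    # accented characters would need a broader pattern
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]
```

Applied to a sentence containing "economy," "economist," and "economic," this sketch maps all three to the stem "econom," which is why the stemmed corpus is so much smaller than the raw text.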
Figure 1 shows the number of stems over time. In general terms, presidential speeches given before Congress until the early twentieth century tended to be relatively short. However, after 1924, with the introduction of radio, the length of the speeches increased, though with considerable variation over time.

Figure 1. Number of Stems Over Time.
The impact of the audience on presidential discourse in Chile shows results that are different from those found in studies of presidential discourse in the United States. Unlike the trend of “modern” rhetoric in the United States, characterized by a reduction in content and a greater use of the pronoun “we” to connect with the audience (Lim, 2002; Murphy, 2008; Teten, 2003, 2007), in Chile, there has been a sustained increase in the number of words in presidential speeches from the 1950s onwards.
This increase in the number of words can be explained by the structure of the presidential speech in Chile, which includes a report on the State of the nation and an account of the different ministerial portfolios. The increasing number of ministries, from three to seven in the nineteenth century and from seven to 23 in the twentieth century, has led to the expansion of presidential speeches.
This discursive structure may also explain the dynamism in the topics addressed in presidential speeches. As the State responds to social problems and creates institutions to address various needs, policies are generated and announced through presidential speeches, both for the political class and for the citizenry.
In summary, unlike the findings in the United States, the broadening of the audience for the presidential speech enabled by the development of mass media has apparently not reduced the size of the speech nor changed its tone to be more direct and closer to the public. Rather, the length and rigid structure of Chilean presidential discourse have persisted over time.
Presidential Speeches and Dominant Topics in Historical Perspective
Examining presidential speeches across nearly two centuries reveals significant shifts in the most prominent issues. Initially, presidents focused on maintaining political order and advancing material and social progress. Over time, however, this focus has evolved into discourses increasingly aligned with post-material values and beliefs, emphasizing themes like effort and opportunities in recent decades.
The results show that topics in presidential speeches transcend particular administrations. These dominant topics come and go in waves. To illustrate, let us briefly discuss four of the most frequent topics identified in Table 1: War and Security (19.95%), Infrastructure (16.14%), Schools and Primary Education (14.60%), and Constitutionalism and Patriotism (11.32%). For instance, throughout the nineteenth century, two themes remained prevalent: War and Security and Schools and Primary Education. These issues were addressed almost equally by conservative and liberal presidents. In this sense, the discursive stamp of this period was determined more by the context than by partisanship, which indicates a consensus among the nineteenth-century political elite.
The first topic is closely linked to the problems of the nineteenth century, in which the country faced three significant wars: the war against the Peruvian-Bolivian Confederation (1836–1839); the war that Chile declared against Spain in solidarity with Peru (1865–1867); and the War of the Pacific, which involved Chile, Peru, and Bolivia (1879–1884). It is important to note that the periods of these conflicts were not limited exclusively to the years of military deployment or warlike confrontation; before and after the wars, there was intense related activity. An armed conflict can therefore occupy the political agenda for an entire decade, which justifies treating it as a contingent topic. This does not mean, however, that we are diminishing its importance or significance. It is important to emphasize that the presidential system and its concentration of power in Latin America were strongly related to the conflicts of emancipation, the definition of borders, and the organization of the State (Toro and Arellano, 2017).
Indeed, these three conflicts played an important role in the process of consolidation of the Chilean State. According to historian Mario Góngora (1981), the notion of the Chilean State and the feeling of “Chileanness” were constituted based on the idea of being a “land of war.” Góngora highlighted the large number of wars that Chile faced during the nineteenth century. For example, Chile’s triumph in the war against the Peruvian-Bolivian Confederation brought a sense of unity to the elite. It led to the installation of a war hero with institutionalist rather than caudillo-like characteristics, Manuel Bulnes, in the presidency (Valenzuela and Valenzuela, 1983: 44). In addition, the success in the War of the Pacific allowed the Chilean State to consolidate a national identity and unprecedented access to fiscal resources thanks to the exploitation of saltpeter.
“Schools and Primary Education” was widely addressed by conservative and liberal presidents of the nineteenth century. This topic is associated with an ideological debate during that era on the role of the State in education. In the context of a largely illiterate society, with 86.6% of the population not knowing how to read or write in 1854 (Díaz et al., 2016), the need to train citizens with “civic virtues” led to the consideration that the State should play a leading role in the education of the population. This process began in the 1840s, driven by prominent intellectuals such as Domingo Faustino Sarmiento. Later, in the 1850s, during Manuel Montt’s presidency, an intense policy of school construction began. The culmination of this educational policy was the promulgation of the Law of Primary Instruction in 1860, during the government of José Joaquín Pérez. This law established the role of the State in the education of the masses through the creation of a primary education system with a basic formative approach. In addition, secondary education was established, which had a more restricted and elitist character.
Figure 2 also shows a greater diversity of topics throughout the twentieth century, since none reached a significantly high frequency; the most important reached only 11.32%. This indicates that the presidential speeches of this period encompassed a greater diversity of topics and issues, reflecting the changes and challenges of the historical context.

Figure 2. Presidential Speech Topics Over Time.
Interestingly, the topic “Constitutionalism and Patriotism” exhibited the highest percentage during the twentieth century. This topic can be classified as contingent or situational since, in general, this rhetoric emerges in times of political crisis or institutional instability. This discursive resource initially came to the fore during the administration of José Manuel Balmaceda (1886–1891), which ended tragically when he committed suicide amid the civil war.
In the twentieth century, the salience of this topic was associated with moments of national or international political crises, as well as with situations of national distress, such as the 1960 earthquake. The highest percentage for this topic was recorded in 1935, during the second administration of Arturo Alessandri, in the context of a request to Congress for extraordinary powers to maintain public order. In part of his speech, Alessandri highlighted the “patriotism” of Congress for approving his extraordinary powers:
“Your patriotism, never denied, made you grant on two occasions extraordinary powers to maintain public order. I am certain that, if the Executive had considered it necessary to resort again to such arbitration, it would have found in you the same patriotic welcome” (Presidential Message, 1935: 5).
“Constitutionalism and Patriotism” shows how this rhetoric has been used in critical moments to reinforce the legitimacy of the government and seek the support of the population and Congress in exceptional situations. Thus, analyzing presidential speeches over time allows us to capture the discursive dynamics and rhetorical resources used by presidents to face different political and historical challenges.
It is interesting to note how this rhetorical resource of thanking or invoking a patriotic attitude in the face of internal or external threats was widely used by presidents during this period. A noteworthy example is González Videla’s speech in 1948, in the context of the Cold War and the implementation of the Law for the Defense of Democracy, which outlawed the Communist Party. In his speech, González Videla declared: “To avert the immediate dangers of the dissociating action of communism and the speculative desires of unscrupulous elements, you were good enough to grant me extraordinary powers in a patriotic attitude that exalts the Parliament” (Presidential Message, 1948: 41).
“Infrastructure” is another dominant theme, which emerged in the second half of the nineteenth century. From approximately 1886 to 1925, presidents emphasized public works, such as constructing public buildings, expanding the railroad network, and improving roads, housing, and sewage, among others. For instance, the state railway network went from 950 kilometers in 1887 to 5,459 kilometers in 1925 (Díaz et al., 2016). This orientation toward infrastructure development was due to the conception of progress promoted by the political elite of the time and the State’s access to important economic resources due to the economic boom generated by the exploitation of saltpeter (Meller, 2007).
Interestingly, the topic of “Infrastructure” temporarily overlapped with the rise of social welfare in presidential speeches. These subjects epitomize the global efforts, particularly in Latin America, to identify and implement development models aimed at assisting poorer nations in surmounting their well-known socio-economic challenges. In fact, during the twentieth century, marked by two world wars and a prolonged “Cold War,” Latin America was a significant focus of international development efforts. Organizations such as the Economic Commission for Latin America and the Caribbean (ECLAC) and programs such as the Alliance for Progress had a strong presence in the region. They sought to deter countries from following the path traced by Cuba or other countries in the orbit of the USSR. As a result, governments such as those of Eduardo Frei M. (1964–1970) and Salvador Allende (1970–1973) built their government programs based on foreign ideological constructions, defined as “global planning” (Góngora, 1981).
Finally, the topics that emerged strongly at the beginning of the twenty-first century are worth noting. Following Inglehart and Welzel (2006), these topics are post-material. Concern for ideas such as effort, opportunities, education, and family is more typical of societies that have managed to stabilize their material needs. Individual autonomy emerges as a fundamental value, as do aspirations to satisfy “subjective well-being and quality of life” as societies move toward higher levels of economic development (Inglehart and Welzel, 2006: 77). At the onset of the twenty-first century, the emphasis on this type of value by the presidents of the period is consistent with the improvement in the country’s social indicators. For example, in 1990, the poverty rate in the country was 68%, while in 2015, it was reduced to 11.7% (PNUD, 2017: 21). In summary, at the beginning of the twenty-first century, the presidential narrative promoted a vision that offered (and promised) thousands of families emerging from poverty a country of opportunities and a better quality of life.
Speeches and Cross-President Variation
In the previous section, we discussed how most Chilean presidents were immersed in waves of topics that lasted approximately two decades. We attempted to link the presence of significant topics and specific national contexts. Now, it remains to analyze the more agential dimension of presidential discourse, that is, to examine whether Chilean presidents used presidential speeches as strategic resources to promote their agendas or ideas, an approach termed in the literature “going public” (Barrett, 2004; Kernell, 1997; Powell, 1999). Therefore, examining the “microstructure” of the discourses is appropriate to ascertain the dynamism of ideas and how agents can lead in installing new ideas or giving new directions to the discourses (Carstensen, 2011a, 2011b; Martin, 2015).
Generally, most presidential speeches conform to the waves of topics (Figure 2). However, examining the topics addressed by each president (Figure 3) shows that some manage to impose their agenda or make a thematic shift. Some push one topic firmly, influencing their successors, while others work on several topics simultaneously. This shows that, despite the rigidity of the discursive ritual and the waves of topics, the presidents can put their thematic stamp on their speeches.

Figure 3. Speech Topics by President.
An interesting case is that of President Manuel Montt. Throughout his two consecutive administrations, which lasted a decade, his speeches moved away from the dominant topic of “War and Security.” Instead, he consistently spoke about school construction and primary education. Although this issue was also addressed by President Manuel Bulnes, it was Montt who emphasized it prominently throughout his terms in office, as shown in Figure 3. In his 1852 message, he stated:
“I cannot leave the field of public education without calling your attention to the need to give primary education a fixed and permanent organization, and that you find the means to provide the funds that are necessary to spread [education] and generalize it according to the needs of the state. This is perhaps the best means of protecting society from the dangers of the times, and of assuring the Republic a brilliant future” (Presidential Message, 1852: 7).
In the twentieth century, the figures of Jorge Alessandri (1959–1964) and Eduardo Frei M. (1964–1970) stand out. Alessandri focused his administration on economics, considering it centrally linked to the country’s social and productive aspects. In his 1959 presidential address, he expressed that these issues form a complex and indissoluble whole.
For his part, Eduardo Frei M. also imposed his discursive agenda in the middle of his mandate, focusing on “Productive Development and Participation.” His rhetoric emphasized the importance of citizen participation in all spheres of society, from politics to labor and social issues. For him, a democratic society had to understand and satisfy modern man’s yearning for participation and needed to create structures supporting this aspiration, as he expressed in his 1969 presidential address.
These examples show that some presidents managed to escape the prevailing clichés and set their discursive agenda during their terms of office. Moreover, as the next paragraphs describe, some presidents were able to establish ideas that became central to the discursive agendas of subsequent administrations.
In his first term, Carlos Ibáñez del Campo (1927–1931) managed to promote the topic of “social welfare,” which later became a trend in successive administrations. This topic was one of the most prominent in the twentieth century, reaching 7.19% in presidential speeches until 1946. Ibáñez emphasized in his 1929 speech that the progressive improvement of the welfare of the working classes and the harmony between capital and labor, established based on justice and cooperation, was a crucial objective of his government.
Subsequently, President Ricardo Lagos’s term (2000–2006) marked a turn in the content of presidential speeches, appealing to post-material values and stressing the need for a joint effort to achieve a developed, socially just, and culturally mature society by the bicentennial of Independence in 2010: “It is not a utopia; it is a possible goal, which depends on us, on our effort and capacity for cooperation” (Presidential Message, 2001).
In line with this trend, President Sebastián Piñera (2010–2014), in his first term, continued with a rhetoric that emphasized a society that affords opportunities, a theme that lasted until 2020. In his 2011 speech, Piñera emphasized: “A society of opportunities means that all Chileans can fulfill themselves as people and develop to the full the talents God gave us” (Presidential Message, 2011).
In Figure 3, we can also identify the presidents whose speeches were more balanced across topics. The first of them is Domingo Santa María (1881–1886), who had to face the closing of the War of the Pacific and therefore addressed “War and Security,” but who also addressed issues such as infrastructure and his concern for primary instruction in schools. This is undoubtedly explained by the fact that victory in the war gave Chile access to substantial resources thanks to the taxes levied on saltpeter. In his last speech to Congress, he said: “Above all, I have sought to maintain foreign peace, since wars are a painful calamity for nations; but I have not forgotten that the most valuable asset entrusted to me by the nation was its glorious name” (Presidential Message, 1886).
The same can be observed in the presidency of Patricio Aylwin (1990–1994), the first government after the dictatorship, which was characterized as transitional. He maintained the relevance of the topic of “Democracy” but also introduced a new theme called “Productive and Social Development.” As he said at the end of his term:
“Since I took office, I have maintained that, in order to face this challenge successfully, we had to fulfill several simultaneous tasks: to build national unity, consolidate and perfect our democracy, develop and modernize our economy, promote social justice, and integrate Chile as a respected player in the international community” (Presidential Message, 1993: 2).
Both presidencies are characterized by a balanced distribution of topics, and both faced the transition out of a critical juncture: a war in one case and a dictatorship in the other. This can be explained by the fact that these presidents were forced to work on a more diverse and balanced set of topics in order to promote a gradual change in topics.
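One way to make the notion of a “balanced” set of topics concrete is to summarize each president’s topic shares with a single dispersion score. The sketch below is illustrative only: it uses normalized Shannon entropy, which is our assumption and not a measure reported in the article, and the function name `topic_balance` and the toy shares are hypothetical.

```python
import math


def topic_balance(shares):
    """Normalized Shannon entropy of topic shares.

    Returns a value near 0.0 when one topic dominates and 1.0 when all
    topics receive equal attention. `shares` is a list of non-negative
    topic proportions for one president (hypothetical input).
    """
    total = sum(shares)
    if total <= 0 or len(shares) < 2:
        return 0.0
    # Rescale so the shares sum to one, skipping empty topics.
    probs = [s / total for s in shares if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    # Divide by the maximum possible entropy for this number of topics.
    return entropy / math.log(len(shares))


# Toy, made-up shares (not the article's estimates).
focused = topic_balance([0.7, 0.1, 0.1, 0.1])
balanced = topic_balance([0.4, 0.3, 0.2, 0.1])
```

Under this assumed measure, a president who spreads attention across several topics scores higher than one who concentrates on a single topic, which would allow the cases discussed above to be compared on a common scale.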
Methodological Contribution and Avenues for Future Research
In the previous section, we presented different approaches to study presidential speeches using LDA: examining their length and prevalent topics, as well as their variation over time and across presidents. LDA enabled us to move beyond the limited usefulness of word-counting strategies, hence contributing to a greater understanding of presidential speeches’ main features, patterns, and evolution.
LDA’s Methodological Strength
Our article introduces a methodological approach that transcends a mere dataset description, focusing on the pioneering application of computational techniques to reveal concealed patterns, topics, and trends within presidential speeches. This endeavor enriches the methodological toolkit available to researchers engaged in the study of political discourse analysis by providing a scalable and adaptable model suitable for different political systems and historical periods.
The methodological innovation of applying an LDA model to Chilean presidential speeches from 1832 to 2021 lies in the strategic employment of computational linguistics and topic modeling techniques. This approach uncovers latent thematic structures within this extensive corpus and supports the identification, analysis, and interpretation of the underlying topics and themes that permeate presidential speeches across a broad historical timeline, as highlighted by Jelodar et al. (2019). This method not only provides a new lens through which to examine political discourse but also offers a comprehensive view of its evolution over nearly two centuries.
LDA effectively highlights the thematic evolution in Chilean presidential speeches, showing a shift from topics like “war,” “territory,” and “construction” in the nineteenth century, which reflected post-independence concerns, to more recent themes of “effort,” “social development,” and “participation.” These results emerged directly from the data patterns without requiring predefined categories, underscoring the method’s flexibility in capturing discursive shifts over time. This thematic evolution, captured organically by LDA, allowed for tracing the discursive transformation of presidents in response to structural changes in Chilean society, something that qualitative approaches might have missed due to subjectivity or limited scope.
Potential for Comparative Applications
By employing LDA, we automated the identification of topics within the speeches without the need for prior topic labeling. This model facilitated a comprehensive examination of how topics evolve over time, offering valuable insights into the dynamic nature of political priorities, shifts in public discourse, and contextual influences on presidential speeches.
The flexibility of LDA allows it to be applied in various contexts, making it a powerful tool for diverse research questions, particularly in political communication and agenda-setting. For instance, in studying how presidential speeches influence the political agenda and public perception over time, LDA can identify emerging topics in these speeches at different historical moments. This analysis reveals how themes like institutional reforms or social rights become prominent as political priorities shift. It helps explore how presidents shape discussions and shows the impact of their communication on agenda-setting and public perceptions. LDA effectively uncovers evolving topic patterns over time, which can be challenging to capture with traditional methods.
The insights gained from our application of LDA can extend beyond Chile, serving as a template for studying presidential speeches or executive communications in other nations. Researchers might replicate this approach to analyze speeches across various geopolitical environments, uncovering thematic consistencies and divergences shaped by different political and cultural landscapes. This adaptability supports more generalized studies of how political leaders utilize rhetoric to navigate national and international challenges over time.
Theoretical Connections
In addition to the advantages of using LDA for the study of presidential speeches, our study provides a foundation for theoretical exploration and hypothesis development. We propose three research pathways for theory building and hypothesis testing in this subfield. First, our methodological approach helps bring the theory of political ideas closer to the study of presidential discourse. Future studies can explore the prevalence and application of political ideas in political action expressed in presidential addresses, especially in countries with popularly elected heads of State (Béland and Cox, 2011; Berman, 2011; Hall, 1993). Specifically, we propose two complementary avenues for exploring how ideas connect to presidential speeches, that is, ideas working primarily at either a normative or cognitive level (Campbell, 1998). At the normative level, ideas pertaining to “public sentiments” outline what is considered desirable and legitimate in the public sphere, employing symbolic and emotive language to engage with the public. In contrast, at the cognitive level, ideas as paradigms strive to offer a more elaborate and general societal vision. Also at the cognitive level are “programmatic ideas,” which present more specific, targeted solutions (often crafted by technicians or politicians) for immediate and contingent issues. A more thorough and detailed analysis of presidential speeches using LDA could reveal how, why, and when presidents use ideas based on either normative or cognitive levels and their shifts over time (Carstensen, 2011a).
Second, as shown in Figures 1 and 2, speeches’ lengths and prevalent topics exhibit patterns over time. More clearly in the case of dominant topics, they seem to come in waves. This may well be the result of national (e.g. economic development and crisis, unemployment, state capacity, main economic activities, social mobilizations, and presidents’ legislative support, among others) and international factors (e.g. wars, commodity prices, financial crises, etc.). This stream of research may examine how the international and national contexts shape how presidents prioritize topics and frame their annual speeches. However, one limitation in performing this type of analysis is data availability. Although not reported here, we ran time-series analyses to test what factors are behind the prevalence of topics. Nevertheless, due to limited time coverage and lack of important data on ruling parties’ majorities in Congress, street protests, party system fragmentation, and other crucial independent variables, our results were not as theoretically meaningful and empirically robust as we expected.
Third, as Figure 3 and the subsequent discussion demonstrate, a share of presidents actively shapes political ideas through their discourses. Presidents’ speeches are not necessarily mere reflections of the prevailing sociopolitical or ideational context. In fact, some presidents’ discourses reveal a deliberate effort to sway the political agenda. By examining factors at the contextual and individual levels behind presidents’ emphasis on certain topics, we can gain insights into the scope of their influence on the development of political ideas over time. This includes understanding how they adapt and transform political narratives in response to various challenges and opportunities, as well as their role in molding the political agenda and decision-making processes within the country.
Concluding Remarks
This study has demonstrated the application of LDA to analyze nearly 200 years of Chilean presidential speeches, revealing patterns and prevalent themes. Through this approach, we have underscored the potential for combining computational text analysis with historical and contextual validation to enhance the study of political discourse. Our findings highlight the value of descriptive analysis, not just as a preliminary step to causal inference but as a significant method for revealing the thematic underpinnings of political communication.
Using LDA to analyze presidential speeches enables the identification of latent thematic patterns that other approaches, such as traditional qualitative analyses or quantitative methods based on keywords, may fail to capture. While manual analyses often rely on subjective interpretations and focus on predefined aspects, LDA uncovers emerging topics without prior assumptions, allowing for a broader perspective on recurring content. In the case of Chilean presidential speeches from 1832 to 2021, LDA revealed continuities and thematic shifts over nearly two centuries of political history. In addition, LDA quantified the prevalence of each topic over time, offering a longitudinal and comparative perspective that complements contextual analyses. Thus, the use of LDA not only organizes extensive textual data efficiently but also provides an understanding of historical and discursive dynamics that might go unnoticed in conventional approaches.
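To illustrate how topic prevalence can be tracked over time, the following minimal sketch averages per-speech topic proportions within each decade. It assumes that a fitted LDA model has already produced one vector of topic shares per speech; the function name `prevalence_by_decade`, the decade grouping, and the toy numbers are our own illustrative choices, not the article’s actual procedure or estimates.

```python
from collections import defaultdict


def prevalence_by_decade(doc_topics):
    """Average each topic's share over all speeches in the same decade.

    doc_topics: iterable of (year, shares) pairs, where `shares` is the
    list of per-topic proportions for one speech (hypothetical LDA output).
    Returns {decade: [mean share of topic 0, mean share of topic 1, ...]}.
    """
    sums = {}
    counts = defaultdict(int)
    for year, shares in doc_topics:
        decade = (year // 10) * 10
        if decade not in sums:
            sums[decade] = [0.0] * len(shares)
        for k, share in enumerate(shares):
            sums[decade][k] += share
        counts[decade] += 1
    return {d: [s / counts[d] for s in sums[d]] for d in sorted(sums)}


# Toy, made-up proportions for three topics (not the article's estimates).
toy = [
    (1836, [0.70, 0.20, 0.10]),
    (1838, [0.50, 0.30, 0.20]),
    (1852, [0.20, 0.60, 0.20]),
]
result = prevalence_by_decade(toy)
```

In this toy input, the first topic dominates the 1830s while the second dominates the 1850s, which is the kind of wave-like pattern that a longitudinal plot of such averages would make visible.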
Unlike approaches that presuppose analytical categories based on theoretical frameworks or prior judgments, LDA operates inductively, allowing topics to be revealed through statistical word co-occurrence patterns. This characteristic reduces the risk of bias from imposing external categories on the text, whether through theoretical choices or analytical preferences. Moreover, LDA’s flexibility in identifying multiple themes within a document better reflects the multifaceted nature of political speeches, often shaped by diverse agendas and audiences. Thus, the method broadens the interpretative scope by uncovering connections and patterns that might be overlooked in analyses with predefined categories while preserving the ability to capture complexities that reflect the semantic richness of the original texts.
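For readers less familiar with the model, the inductive logic described above can be summarized with the standard generative formulation of LDA. The notation is our own assumption and does not appear in the article: $D$ speeches, $K$ topics, Dirichlet hyperparameters $\alpha$ and $\beta$, topic proportions $\theta_d$ for speech $d$, and word distributions $\phi_k$ for topic $k$. The specific hyperparameter values and estimation procedure used in the study are not restated here.

```latex
\begin{align}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}}) \\
p(w_{d,n} = v \mid \theta_d, \phi) &= \sum_{k=1}^{K} \theta_{d,k}\,\phi_{k,v}
\end{align}
```

Because each speech has its own mixture $\theta_d$ over all $K$ topics, a single address can combine several themes, and the topics $\phi_k$ are estimated from word co-occurrence rather than assigned in advance.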
This method allows scholars to undertake similar analyses in other countries and political contexts. Its adaptability makes it possible to examine how executive communication reflects shifts in political priorities, strategies, and responses to socio-political events across different regimes and periods. Such comparative applications could provide new insights into how leaders in various systems—from presidential to semi-presidential and parliamentary—utilize rhetoric to shape political narratives and influence public perception.
Furthermore, this methodology offers invaluable insights into theoretical discussions, particularly regarding the part played by ideas in institutional evolution. For instance, our findings show how specific topics persist across various presidencies, spanning decades; some quickly fade while others undergo significant shifts. These patterns are consistent with Hall’s (1993) work on paradigms. Policy paradigms serve as a collective cognitive framework, shaping how political figures and specialists conceptualize challenges and policy goals.
This study opens research avenues that bridge the gap between qualitative and quantitative methodologies. It reinforces the importance of validating automated models with contextual and human insights, ensuring the robustness of findings in complex social science studies. Future investigations could apply this framework to broader datasets or incorporate more granular comparisons between leaders or political eras. This would contribute to a more comprehensive understanding of how rhetoric evolves and impacts political landscapes globally.
Ultimately, this approach enhances the toolkit for political scientists, historians, and other scholars interested in political communication and institutional development, offering a means to track the interplay between discourse, policy, and public sentiment over time. The insights garnered here open up new horizons for further scholarly inquiry, underscoring the enduring interplay between ideas, speech, leadership, and politics in shaping history.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This article is part of FONDECYT REGULAR projects No. 1220123 and No. 1210157 of the National Agency for Research and Development (ANID). This research has also been supported by the Millennium Nucleus Nº NCS2024_065 on Political Crises in Latin America – CRISPOL, an initiative funded by Chile’s National Research and Development Agency (ANID).
