In this article, we continue studying the texts of the election programs presented by the presidential candidates. The previous article on this topic is here.
In the previous article, we identified the words that are repeated most often in the texts of the election programs. Unfortunately, the most frequent words were the familiar "power", "secure" and the like. Such words are either not informative at all or merely hint at possible measures. In fact, we are afraid that the government will "secure" the "power" for itself and ignore the ideals of the Maidan.
To study the relations between words in the texts, we used two methods: Text Mining (TM) and Social Network Analysis (SNA). TM was used for word clustering, applied to a selected set of the most frequently used words. SNA was used to build the flowgraph of word relations across the whole text; here, too, we limited ourselves to the most popular words. As in our other studies, the entire analysis was done in the R programming language.
Let's start with the SNA method. The flowgraph of word relations is shown below. Besides the words themselves, the graph shows the groups of words identified during the analysis.
In fact, the graph needs no explanation; the arrangement of the words speaks for itself. We are glad that the words "development", "state", and "citizens" are close to each other in the graph.
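The article does not say how the word graph was built, so the following is only a minimal sketch of one common approach, using base R: count how many programs contain both words of a pair and treat that count as the weight of an undirected link. The matrix `tdm`, its counts, and the program names are all hypothetical.

```r
# Hypothetical term-document matrix: rows are words, columns are programs.
# The counts are made up for illustration only.
tdm <- matrix(
  c(3, 0, 2,
    1, 4, 0,
    2, 1, 1,
    0, 2, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("development", "state", "citizens", "power"),
    c("program_A", "program_B", "program_C")
  )
)

# Binary incidence: 1 if the word appears in the program at all, else 0.
incidence <- (tdm > 0) * 1

# Word-by-word co-occurrence: number of programs in which both words appear.
cooc <- incidence %*% t(incidence)
diag(cooc) <- 0  # a word is not linked to itself

# Edge list of linked word pairs, taking each unordered pair once.
idx <- which(upper.tri(cooc) & cooc > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(cooc)[idx[, "row"]],
  to     = colnames(cooc)[idx[, "col"]],
  weight = cooc[idx]
)
print(edges)
```

An edge list like this is the usual input for a graph package such as igraph, which can then lay out the graph and detect groups of closely linked words.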
In the cluster analysis, we used Euclidean distance as the distance metric and Ward's method as the clustering algorithm. As a result, we obtained the following image:
We have put red rectangles around the clusters that, in our opinion, are most closely related to the semantic characteristics of the texts. Note the rather remote position of the word "Ukraine". It means that the word is used excessively in the texts.
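The clustering step described above (Euclidean distance, Ward's method, red rectangles around selected clusters) can be sketched in base R roughly as follows. The frequency matrix `word_freq` and its counts are hypothetical, and the choice of `"ward.D2"` and of `k = 3` is an assumption: the article does not say which Ward variant or how many clusters were used.

```r
# Hypothetical word-by-candidate frequency matrix (made-up counts).
word_freq <- matrix(
  c(12, 10, 11,  9,
     3,  4,  2,  5,
     4,  3,  3,  4,
     1,  0,  2,  1,
     2,  1,  1,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Ukraine", "state", "citizens", "power", "secure"),
    c("cand_A", "cand_B", "cand_C", "cand_D")
  )
)

# Euclidean distances between words (rows of the matrix).
d <- dist(word_freq, method = "euclidean")

# Hierarchical clustering with Ward's criterion.
# Assumption: "ward.D2" is used here; the original analysis may have
# used a different Ward variant.
hc <- hclust(d, method = "ward.D2")

# Dendrogram with red rectangles around k clusters (k = 3 is arbitrary).
plot(hc, main = "Word clusters (illustrative data)", xlab = "", sub = "")
rect.hclust(hc, k = 3, border = "red")

# Cluster membership of each word.
groups <- cutree(hc, k = 3)
print(groups)
```

In this toy data the word with much higher counts than the rest ends up in a cluster of its own, which is the kind of remote position the text describes for "Ukraine".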
The same data also allow another kind of analysis: identifying groups of candidates. We did something similar in the previous article, when we calculated the Ochiai coefficient, but that was a pairwise comparison. This time, we perform a multidimensional analysis.
To group the candidates, we transposed the word matrix so that the words became the column names and the candidates' surnames became the row names. The dendrogram of the cluster analysis is given below.
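As a rough illustration of the transposition step, the sketch below turns a word-by-candidate matrix into a candidate-by-word matrix with `t()` and clusters its rows. The data and candidate labels are hypothetical, and reusing Euclidean distance with Ward's method (`"ward.D2"`) for the candidates is an assumption carried over from the word clustering.

```r
# Hypothetical word-by-candidate frequency matrix (made-up counts).
word_freq <- matrix(
  c(12, 11,  3,  2,
     3,  4,  9, 10,
     4,  3,  8,  9,
     1,  2,  7,  6),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Ukraine", "state", "citizens", "power"),
    c("cand_A", "cand_B", "cand_C", "cand_D")
  )
)

# Transpose: candidates become rows, words become columns.
cand_by_word <- t(word_freq)

# Cluster the candidates by their word-usage profiles.
# Assumption: same distance and linkage as for the word clustering.
hc_cand <- hclust(dist(cand_by_word, method = "euclidean"),
                  method = "ward.D2")

plot(hc_cand, main = "Candidate clusters (illustrative data)",
     xlab = "", sub = "")

# Split the candidates into two groups.
cand_groups <- cutree(hc_cand, k = 2)
print(cand_groups)
```

Because `dist()` always compares rows, transposing the matrix is all that is needed to switch from clustering words to clustering candidates.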
Although the dendrogram looks rather unusual, it contains two types of clusters. These types are as different as "heaven and earth", in the literal sense of the phrase. In our opinion, the clusters of the upper group contain the most "puffy", literary texts of election programs. The lower group contains simple texts written in the style of a "report from the battlefield".
Thus, we may say that the PR services of the presidential candidates provide different levels of "literary support".