Analyzing social media data: A mixed-methods framework combining computational and qualitative text analysis
- Published: 02 April 2019
- Volume 51, pages 1766–1781 (2019)
- Matthew Andreotta 1,2
- Robertus Nugroho 2,3
- Mark J. Hurlstone 1
- Fabio Boschetti 4
- Simon Farrell 1
- Iain Walker 5
- Cecile Paris 2
To qualitative researchers, social media offers a novel opportunity to harvest a massive and diverse range of content without the need for intrusive or intensive data collection procedures. However, performing a qualitative analysis across a massive social media data set is cumbersome and impractical. Instead, researchers often extract a subset of content to analyze, but a framework to facilitate this process is currently lacking. We present a four-phased framework for improving this extraction process, which blends the capacities of data science techniques to compress large data sets into smaller spaces, with the capabilities of qualitative analysis to address research questions. We demonstrate this framework by investigating the topics of Australian Twitter commentary on climate change, using quantitative (non-negative matrix inter-joint factorization; topic alignment) and qualitative (thematic analysis) techniques. Our approach is useful for researchers seeking to perform qualitative analyses of social media, or researchers wanting to supplement their quantitative work with a qualitative analysis of broader social context and meaning.
Introduction
Social scientists use qualitative modes of inquiry to explore the detailed descriptions of the world that people see and experience (Pistrang & Barker, 2012). To collect the voices of people, researchers can elicit textual descriptions of the world through interview or survey methodologies. However, with the popularity of the Internet and social media technologies, new avenues for data collection are possible. Social media platforms allow users to create content (e.g., Weinberg & Pehlivan, 2011), and interact with other users (e.g., Correa, Hinsley, & de Zúñiga, 2011; Kietzmann, Hermkens, McCarthy, & Silvestre, 2010), in settings where “Anyone can say Anything about Any topic” (AAA slogan, Allemang & Hendler, 2011, p. 6). Combined with the high rate of content production, social media platforms can offer researchers massive and diverse dynamic data sets (Yin & Kaynak, 2015; Gudivada et al., 2015). With technologies increasingly capable of harvesting, storing, processing, and analyzing this data, researchers can now explore data sets that would be infeasible to collect through more traditional qualitative methods.
Many social media platforms can be considered as textual corpora, willingly and spontaneously authored by millions of users. Researchers can compile a corpus using automated tools and conduct qualitative inquiries of content or focused analyses on specific users (Marwick, 2014 ). In this paper, we outline some of the opportunities and challenges of applying qualitative textual analyses to the big data of social media. Specifically, we present a conceptual and pragmatic justification for combining qualitative textual analyses with data science text-mining tools. This process allows us to both embrace and cope with the volume and diversity of commentary over social media. We then demonstrate this approach in a case study investigating Australian commentary on climate change, using content from the social media platform: Twitter.
Opportunities and challenges for qualitative researchers using social media data
Through social media, qualitative researchers gain access to a massive and diverse range of individuals, and the content they generate. Researchers can identify voices which may not be otherwise heard through more traditional approaches, such as semi-structured interviews and Internet surveys with open-ended questions. This can be done through diagnostic queries to capture the activity of specific peoples, places, events, times, or topics. Diagnostic queries may specify geotagged content, the time of content creation, textual content of user activity, and the online profile of users. For example, Freelon et al. (2018) identified the Twitter activity of three separate communities (‘Black Twitter’, ‘Asian-American Twitter’, ‘Feminist Twitter’) through the use of hashtags in tweets from 2015 to 2016. A similar process can be used to capture specific events or moments (Procter et al., 2013; Denef et al., 2013), places (Lewis et al., 2013), and specific topics (Hoppe, 2009; Sharma et al., 2017).
Collecting social media data may be more scalable than traditional approaches. Once equipped with the resources to access and process data, researchers can potentially scale data harvesting without expending a great deal of resources. This differs from interviews and surveys, where collecting data can require an effortful and time-consuming contribution from participants and researchers.
Social media analyses may also be more ecologically valid than traditional approaches. Unlike approaches where responses from participants are elicited in artificial social contexts (e.g., Internet surveys, laboratory-based interviews), social media data emerges from real-world social environments encompassing a large and diverse range of people, without any prompting from researchers. Thus, in comparison with traditional methodologies (Onwuegbuzie & Leech, 2007; Lietz & Zayas, 2010; McKechnie, 2008), participant behavior is relatively, if not entirely, unconstrained by the behaviors of researchers.
These opportunities also come with challenges, which arise from the following attributes of social media (Parker et al., 2011). Firstly, social media can be interactive: its content involves the interactions of users with other users (e.g., conversations), or even external websites (e.g., links to news websites). The ill-defined boundaries of user interaction have implications for determining the units of analysis of a qualitative study. For example, conversations can be lengthy, with multiple users, without a clear structure or end-point. Interactivity thus blurs the boundaries between users, their content, and external content (Herring, 2009; Parker et al., 2011). Secondly, content can be ephemeral and dynamic. The users and content of their postings are transient (Parker et al., 2011; Boyd & Crawford, 2012; Weinberg & Pehlivan, 2011). This feature arises from the diversity of users, the dynamic socio-cultural context surrounding platform use, and the freedom users have to create, distribute, display, and dispose of their content (Marwick & Boyd, 2011). Lastly, social media content is massive in volume. The accumulated postings of users can lead to a large amount of data, and due to the diverse and dynamic content, postings may be largely unrelated and accumulate over a short period of time. Researchers hoping to harness the opportunities of social media data sets must therefore develop strategies for coping with these challenges.
A framework integrating computational and qualitative text analyses
Our framework—a mixed-method approach blending the capabilities of data science techniques with the capacities of qualitative analysis—is shown in Fig. 1 . We overcome the challenges of social media data by automating some aspects of the data collection and consolidation, so that the qualitative researcher is left with a manageable volume of data to synthesize and interpret. Broadly, our framework consists of the following four phases: (1) harvest social media data and compile a corpus, (2) use data science techniques to compress the corpus along a dimension of relevance, (3) extract a subset of data from the most relevant spaces of the corpus, and (4) perform a qualitative analysis on this subset of data.
Fig. 1 Schematic overview of the four-phased framework
Phase 1: Harvest social media data and compile a corpus
Researchers can use automated tools to query records of social media data, extract this data, and compile it into a corpus. Researchers may query for content posted in a particular time frame (Procter et al., 2013 ), content containing specified terms (Sharma et al., 2017 ), content posted by users meeting particular characteristics (Denef et al., 2013 ; Lewis et al., 2013 ), and content pertaining to a specified location (Hoppe, 2009 ).
Phase 2: Use data science techniques to compress the corpus along a dimension of relevance
Although researchers may be interested in examining the entire data set, it is often more practical to focus on a subsample of data (McKenna et al., 2017 ). Specifically, we advocate dividing the corpus along a dimension of relevance, and sampling from spaces that are more likely to be useful for addressing the research questions under consideration. By relevance, we refer to an attribute of content that is both useful for addressing the research questions and usable for the planned qualitative analysis.
To organize the corpus along a dimension of relevance , researchers can use automated, computational algorithms. This process provides both formal and informal advantages for the subsequent qualitative analysis. Formally, algorithms can assist researchers in privileging an aspect of the corpus most relevant for the current inquiry. For example, topic modeling clusters massive content into semantic topics—a process that would be infeasible using human coders alone. A plethora of techniques exist for separating social media corpora on the basis of useful aspects, such as sentiment (e.g., Agarwal, Xie, Vovsha, Rambow, & Passonneau, 2010 ; Paris, Christensen, Batterham, & O’Dea, 2015 ; Pak & Paroubek, 2011 ) and influence (Weng et al., 2010 ).
Algorithms also produce an informal advantage for qualitative analysis. As mentioned, it is often infeasible for analysts to explore large data sets using qualitative techniques. Computational models of content can allow researchers to consider meaning at a corpus-level when interpreting an individual datum or the relationships between a subset of data. For example, in an inspection of 2.6 million tweets, Procter et al. (2013) used the output of an information flow analysis to derive rudimentary codes for inspecting individual tweets. Thus, algorithmic output can form a meaningful scaffold for qualitative analysis by providing analysts with summaries of potentially disjunct and multifaceted data (due to the interactive, ephemeral, and dynamic attributes of social media).
Phase 3: Extract a subset of data from the most relevant spaces of the corpus
Once the corpus is organized on the basis of relevance, researchers can extract the data most relevant for answering their research questions, in an amount that is manageable for qualitative analysis. For example, if the most relevant space of the corpus is too large for qualitative analysis, the researcher may choose to randomly sample from that space. If the most relevant space is small, the researcher may revisit Phase 2 and adopt a more lenient criterion of relevance.
Phase 4: Perform a qualitative analysis on this subset of data
The final phase involves performing the qualitative analysis to address the research question. As discussed above, researchers may draw on the computational models as a preliminary guide to the data.
Contextualizing the framework within previous qualitative social media studies
The proposed framework generalizes a number of previous approaches (Collins and Nerlich, 2015 ; McKenna et al., 2017 ) and individual studies (e.g., Lewis et al., 2013 ; Newman, 2016 ), in particular that of Marwick ( 2014 ). In Marwick’s general description of qualitative analysis of social media textual corpora, researchers: (1) harvest and compile a corpus, (2) extract a subset of the corpus, and (3) perform a qualitative analysis on the subset. As shown in Fig. 1 , our framework differs in that we introduce formal considerations of relevance, and the use of quantitative techniques to inform the extraction of a subset of data. Although researchers sometimes identify a subset of data most relevant to answering their research question, they seldom deploy data science techniques to identify it. Instead, researchers typically depend on more crude measures to isolate relevant data. For example, researchers have used the number of repostings of user content to quantify influence and recognition (e.g., Newman, 2016 ).
The steps in the framework may not be obvious without a concrete example. Next, we demonstrate our framework by applying it to Australian commentary regarding climate change on Twitter.
Application Example: Australian Commentary regarding Climate Change on Twitter
Social media platform of interest.
We chose to explore user commentary of climate change over Twitter. Twitter activity contains information about: the textual content generated by users (i.e., content of tweets), interactions between users, and the time of content creation (Veltri and Atanasova, 2017 ). This allows us to examine the content of user communication, taking into account the temporal and social contexts of their behavior. Twitter data is relatively easy for researchers to access. Many tweets reside within a public domain, and are accessible through free and accessible APIs.
The characteristics of Twitter’s platform are also favorable for data analysis. An established literature describes computational techniques and considerations for interpreting Twitter data. We used the approaches and findings from other empirical investigations to inform our approach. For example, we drew on past literature to inform the process of identifying which tweets were related to climate change.
Public discussion on climate change
Climate change is one of the greatest challenges facing humanity (Schneider, 2011 ). Steps to prevent and mitigate the damaging consequences of climate change require changes on different political, societal, and individual levels (Lorenzoni & Pidgeon, 2006 ). Insights into public commentary can inform decision making and communication of climate policy and science.
Traditionally, public perceptions are investigated through survey designs and qualitative work (Lorenzoni & Pidgeon, 2006). Inquiries into social media allow researchers to explore a large and diverse range of climate change-related dialogue (Auer et al., 2014). Yet, existing inquiries of Twitter activity are few in number and typically constrained to specific events related to climate change, such as the release of the Fifth Assessment Report by the Intergovernmental Panel on Climate Change (Newman, 2016; O’Neill et al., 2015; Pearce et al., 2014) and the 2015 United Nations Climate Change Conference, held in Paris (Pathak et al., 2017).
When longer time scales are explored, most researchers rely heavily upon computational methods to derive topics of commentary. For example, Kirilenko and Stepchenkova ( 2014 ) examined the topics of climate change tweets posted in 2012, as indicated by the most prevalent hashtags. Although hashtags can mark the topics of tweets, it is a crude measure as tweets with no hashtags are omitted from analysis, and not all topics are indicated via hashtags (e.g., Nugroho, Yang, Zhao, Paris, & Nepal, 2017 ). In a more sophisticated approach, Veltri and Atanasova ( 2017 ) examined the co-occurrence of terms using hierarchical clustering techniques to map the semantic space of climate change tweet content from the year 2013. They identified four themes: (1) “calls for action and increasing awareness”, (2) “discussions about the consequences of climate change”, (3) “policy debate about climate change and energy”, and (4) “local events associated with climate change” (p. 729).
Our research builds on the existing literature in two ways. Firstly, we explore a new data set: Australian tweets over the year 2016. Secondly, in comparison to existing research of Twitter data spanning long time periods, we use qualitative techniques to provide a more nuanced understanding of the topics of climate change. By applying our mixed-methods framework, we address our research question: what are the common topics of Australians’ tweets about climate change?
Outline of approach
We employed our four-phased framework as shown in Fig. 2. Firstly, we harvested climate change tweets posted in Australia in 2016 and compiled a corpus (phase 1). We then utilized a topic modeling technique (Nugroho et al., 2017) to organize the diverse content of the corpus into a number of topics. We were interested in topics which commonly appeared throughout the time period of data collection, and less interested in more transitory topics. To identify enduring topics, we used a topic alignment algorithm (Chuang et al., 2015) to group similar topics occurring repeatedly throughout 2016 (phase 2). This process allowed us to identify the topics most relevant to our research question. From each of these, we extracted a manageable subset of data (phase 3). We then performed a qualitative thematic analysis (see Braun & Clarke, 2006) on this subset of data to inductively derive themes and answer our research question (phase 4).
Fig. 2 Flowchart of application of a four-phased framework for conducting qualitative analyses using data science techniques. We were most interested in topics that frequently occurred throughout the period of data collection. To identify these, we organized the corpus chronologically, and divided the corpus into batches of content. Using computational techniques (shown in blue), we uncovered topics in each batch and identified similar topics which repeatedly occurred across batches. When identifying topics in each batch, we generated three alternative representations of topics (5, 10, and 20 topics in each batch, shown in yellow). In stages highlighted in green, we determined the quality of these representations, ultimately selecting the five topics per batch solution
Phase 1: Compiling a corpus
To search Australians’ Twitter data, we used CSIRO’s Emergency Situation Awareness (ESA) platform (CSIRO, 2018). The platform was originally built to detect, track, and report on unexpected incidents related to crisis situations (e.g., fires, floods; see Cameron, Power, Robinson, & Yin, 2012). To do so, the ESA platform harvests tweets based on a location search that covers most of Australia and New Zealand.
The ESA platform archives the harvested tweets, which may be used for other CSIRO research projects. From this archive, we retrieved tweets satisfying three criteria: (1) tweets must be associated with an Australian location, (2) tweets must be harvested from the year 2016, and (3) the content of tweets must be related to climate change. We tested the viability of different markers of climate change tweets used in previous empirical work (Jang & Hart, 2015; Newman, 2016; Holmberg & Hellsten, 2016; O’Neill et al., 2015; Pearce et al., 2014; Sisco et al., 2017; Swain, 2017; Williams et al., 2015) by informally inspecting the content of tweets matching each marker. Ultimately, we employed five terms (or combinations of terms) reliably associated with climate change: (1) “climate” AND “change”; (2) “#climatechange”; (3) “#climate”; (4) “global” AND “warming”; and (5) “#globalwarming”. This yielded a corpus of 201,506 tweets.
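The five harvesting terms can be read as a simple filter over tweet text. The following is a minimal sketch of that reading, not the ESA platform's query interface: the function name, the case-insensitive matching, and the whole-word interpretation of “AND” are our assumptions.

```python
import re

# The three hashtags and two word pairs listed in the text.
HASHTAGS = {"climatechange", "climate", "globalwarming"}
WORD_PAIRS = [("climate", "change"), ("global", "warming")]


def matches_climate_query(text):
    """Return True if a tweet satisfies any of the five harvesting terms.

    Assumption: matching is case-insensitive and on whole words; the
    source does not say how the archive evaluated "AND".
    """
    lowered = text.lower()
    hashtags = set(re.findall(r"#(\w+)", lowered))
    words = set(re.findall(r"\w+", lowered))
    if hashtags & HASHTAGS:
        return True
    return any(a in words and b in words for a, b in WORD_PAIRS)
```

Under this reading, a tweet that mentions “climate” without “change” (and without a listed hashtag) is not retrieved.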
Phase 2: Using data science techniques to compress the corpus along a dimension of relevance
The next step was to organize the collection of tweets into distinct topics. A topic is an abstract representation of semantically related words and concepts. Each tweet belongs to a topic, and each topic may be represented as a list of keywords (i.e., prominent words of tweets belonging to the topic).
A vast literature surrounds the computational derivation of topics within textual corpora, and specifically within Twitter corpora (Ramage et al., 2010 ; Nugroho et al., 2017 ; Fang et al., 2016a ; Chuang et al., 2014 ). Popular methods for deriving topics include: probabilistic latent semantic analysis (Hofmann, 1999 ), non-negative matrix factorization (Lee & Seung, 2000 ), and latent Dirichlet allocation (Blei et al., 2003 ). These approaches use patterns of co-occurrence of terms within documents to derive topics. They work best on long documents. Tweets, however, are short, and thus only a few unique terms may co-occur between tweets. Consequently, approaches which rely upon patterns of term co-occurrence suffer within the Twitter environment. Moreover, these approaches ignore valuable social and temporal information (Nugroho et al., 2017 ). For example, consider a tweet t 1 and its reply t 2 . The reply feature of Twitter allows users to react to tweets and enter conversations. Therefore, it is likely t 1 and t 2 are related in topic, by virtue of the reply interaction.
To address sparsity concerns, we adopt the non-negative matrix inter-joint factorization (NMijF) of Nugroho et al., ( 2017 ). This process uses both tweet content (i.e., the patterns of co-occurrence of terms amongst tweets) and socio-temporal relationship between tweets (i.e., similarities in the users mentioned in tweets, whether the tweet is a reply to another tweet, whether tweets are posted at a similar time) to derive topics (see Supplementary Material ). The NMijF method has been demonstrated to outperform other topic modeling techniques on Twitter data (Nugroho et al., 2017 ).
Dividing the corpus into batches
Deriving many topics across a data set of hundreds of thousands of tweets is prohibitively expensive in computational terms. Therefore, we divided the corpus into smaller batches and derived the topics of each batch. To keep the temporal relationships amongst tweets (e.g., timestamps of the tweets), the batches were organized chronologically. The data was partitioned into 41 disjoint batches (40 batches of 5000 tweets; one batch of 1506 tweets).
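The batching step amounts to sorting the corpus by time and slicing it into fixed-size chunks. A minimal sketch follows; the `timestamp` field name and the dictionary representation of a tweet are hypothetical.

```python
def chronological_batches(tweets, batch_size=5000):
    """Sort tweets by time and split them into disjoint, consecutive batches.

    Assumption: each tweet is a dict with a sortable "timestamp" value.
    The final batch holds whatever remains after the full batches.
    """
    ordered = sorted(tweets, key=lambda tweet: tweet["timestamp"])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Applied to a corpus of 201,506 tweets, this slicing gives 40 batches of 5000 tweets and one batch of 1506 tweets, matching the partition described above.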
Generating topical representations for each batch
Following standard topic modeling practice, we removed features from each tweet which may compromise the quality of the topic derivation process. These features include: emoticons, punctuation, terms with fewer than three characters, stop-words (for the list of stop-words, see MySQL, 2018), and phrases used to harvest the data (e.g., “#climatechange”). Following this, the terms remaining in tweets were stemmed using the Natural Language Toolkit for Python (Bird et al., 2009). All stemmed terms were then tokenized for processing.
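A rough sketch of this cleaning step is given below, using only the standard library. Several details are assumptions: the stop-word set is a tiny illustrative subset (not the MySQL list), the set of harvest terms removed is our guess at what “phrases used to harvest the data” covers, and the sketch tokenizes before stemming for simplicity. The `stem` argument is a placeholder; a stemmer such as `nltk.stem.PorterStemmer().stem` could be passed in, although the source does not say which NLTK stemmer was used.

```python
import re

# Illustrative subset only; the study used the MySQL stop-word list.
STOP_WORDS = {"the", "and", "for", "are", "with", "this", "that"}
# Assumed set of harvest terms to strip; the source gives one example.
HARVEST_TERMS = {"climate", "change", "global", "warming",
                 "climatechange", "globalwarming"}


def preprocess(text, stem=lambda term: term):
    """Clean one tweet and return its list of (optionally stemmed) terms."""
    # Keeping only runs of ASCII letters drops punctuation, emoticons,
    # and digits in one pass (a simplification).
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens
            if len(t) >= 3 and t not in STOP_WORDS and t not in HARVEST_TERMS]
    return [stem(t) for t in kept]
```

For instance, `preprocess("The reef is dying :( #climatechange")` keeps only `reef` and `dying`.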
The NMijF topic derivation process requires three parameters (see Supplementary Material for more details). We set two of these parameters to the recommendations of Nugroho et al. (2017), based on empirical analysis. The final parameter, the number of topics derived from each batch (k), is difficult to estimate a priori, and must be chosen with some care. If k is too small, keywords and tweets belonging to a topic may be difficult to conceptualize as a singular, coherent, and meaningful topic. If k is too large, keywords and tweets belonging to a topic may be too specific and obscure. To determine a reasonable value of k, we ran the NMijF process on each batch with three different levels of the parameter: 5, 10, and 20 topics per batch. This process generated three different representations of the corpus: 205, 410, and 820 topics (41 batches with 5, 10, and 20 topics each, respectively). For each of these representations, each tweet was classified into one (and only one) topic. We represented each topic as a list of ten keywords most prevalent within the tweets of that topic.
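The sizes of the three representations and the keyword summary of a topic can be illustrated as follows. This is a sketch under assumptions: it does not implement NMijF, and it measures “prevalence” as the number of tweets in the topic that contain a term, which the source does not specify.

```python
from collections import Counter

N_BATCHES = 41
# Number of topics in each representation: batches x topics per batch.
representation_sizes = {k: N_BATCHES * k for k in (5, 10, 20)}


def topic_keywords(tokenized_tweets, n_keywords=10):
    """Return the terms that occur in the most tweets of one topic.

    Assumption: prevalence is document frequency within the topic.
    """
    counts = Counter()
    for tokens in tokenized_tweets:
        counts.update(set(tokens))  # count each term once per tweet
    return [term for term, _ in counts.most_common(n_keywords)]
```

Here `representation_sizes` evaluates to 205, 410, and 820 topics for k = 5, 10, and 20.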
Assessing the quality of topical representations
To select a topical representation for further analysis, we inspected the quality of each. Initially, we considered the use of a completely automatic process to assess or produce high quality topic derivations. However, our attempts to use completely automated techniques on tweets with a known topic structure failed to produce correct or reasonable solutions. Thus, we assessed quality using human assessment (see Table 1 ). The first stage involved inspecting each topical representation of the corpus (205, 410, and 820 topics), and manually flagging any topics that were clearly problematic. Specifically, we examined each topical representation to determine whether topics represented as separate were in fact distinguishable from one another. We discovered that the 820 topic representation (20 topics per batch) contained many closely related topics.
To quantify the distinctiveness between topics, we compared each topic to each other topic in the same batch in an automated process. If two topics shared three or more (of ten) keywords, these topics were deemed similar. We adopted this threshold from existing topic modeling work (Fang et al., 2016a, b), and verified it through an informal inspection. We found that pairs of topics below this threshold were less similar than those equal to or above it. Using this threshold, the 820 topic representation was identified as less distinctive than other representations. Of the 41 batches, nine contained at least two similar topics for the 820 topic representation (cf. zero batches for the 205 topic representation, two batches for the 410 topic representation). As a result, we chose to exclude the 820 topic representation from further analysis.
The second stage of quality assessment involved inspecting the quality of individual topics. To achieve this, we adopted the pairwise topic preference task outlined by Fang et al. ( 2016a , b ). In this task, raters were shown pairs of two similar topics (represented as ten keywords), one from the 205 topic representation and the other from the 410 topic representation. To assist in their interpretation of topics, raters could also view three tweets belonging to each topic. For each pair of topics, raters indicated which topic they believed was superior, on the basis of coherency, meaning, interpretability, and the related tweets (see Table 1 ). Through aggregating responses, a relative measure of quality could be derived.
Initially, members of the research team assessed 24 pairs of topics. Results from the task did not indicate a marked preference for either topical representation. To confirm this impression more objectively, we recruited participants from the Australian community as raters. We used Qualtrics—an online survey platform and recruitment service—to recruit 154 Australian participants, matched with the general Australian population on age and gender. Each participant completed judgments on 12 pairs of similar topics (see Supplementary Material for further information).
Participants generally preferred the 410 topic representation over the 205 topic representation (M = 6.45 of 12 judgments, SD = 1.87). Of 154 participants, 35 were classified as indifferent (selected both topic representations an equal number of times), 74 preferred the 410 topic representation (i.e., selected the 410 topic representation more often than the 205 topic representation), and 45 preferred the 205 topic representation (i.e., selected the 205 topic representation more often than the 410 topic representation). We conducted binomial tests to determine whether the proportion of participants of the three just described types differed reliably from chance levels (0.33). The proportion of indifferent participants (0.23) was reliably lower than chance (p = 0.005), whereas the proportion of participants preferring the 205 topic solution (0.29) did not differ reliably from chance levels (p = 0.305). Critically, the proportion of participants preferring the 410 topic solution (0.48) was reliably higher than expected by chance (p < 0.001). Overall, this pattern indicates a participant preference for the 410 topic representation over the 205 topic representation.
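For readers who wish to see the form of such a test, the sketch below computes an exact two-sided binomial p-value with the standard library. The two-sided rule used here (summing all outcomes no more probable than the observed one) is an assumption; the source does not state which variant was applied, so the values need not reproduce the reported p-values exactly.

```python
from math import comb


def binomial_two_sided_p(successes, n, p):
    """Exact two-sided binomial test against a null proportion p.

    Assumption: the p-value sums the probabilities of all outcomes that
    are no more likely than the observed count.
    """
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    cutoff = pmf[successes] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


# Counts reported in the text, each tested against the chance level of 0.33.
p_indifferent = binomial_two_sided_p(35, 154, 0.33)
p_prefers_205 = binomial_two_sided_p(45, 154, 0.33)
p_prefers_410 = binomial_two_sided_p(74, 154, 0.33)
```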
In summary, no topical representation was unequivocally superior. On a batch level, the 410 topic representation contained more batches of non-distinct topic solutions than the 205 topic representation, indicating that the 205 topic representation contained topics which were more distinct. In contrast, on the level of individual topics, the 410 topic representation was preferred by human raters. We use this information, in conjunction with the utility of corresponding aligned topics (see below), to decide which representation is most suitable for our research purposes.
Grouping similar topics repeated in different batches
We were most interested in topics which occurred throughout the year (i.e., in multiple batches) to identify the most stable components of climate change commentary (phase 2). We grouped similar topics from different batches using a topical alignment algorithm (see Chuang et al., 2015). This process requires a similarity metric and a similarity threshold. The similarity metric represents the similarity between two topics, which we specified as the proportion of shared keywords (from 0, no keywords shared, to 1, all ten keywords shared). The similarity threshold is a value below which two topics were deemed dissimilar. As above, we set the threshold to 0.3 (three of ten keywords shared): if two topics shared two or fewer keywords, the topics could not be justifiably classified as similar. To delineate important topics, groups of topics, and other concepts, we have provided a glossary of terms in Table 2.
The topic alignment algorithm is initialized by assigning each topic to its own group. The alignment algorithm iteratively merges the two most similar groups, where the similarity between groups is the maximum similarity between a topic belonging to one group and another topic belonging to the other. Only topics from different groups (by definition, topics from the same group are already grouped as similar) and different batches (by definition, topics from the same batch cannot be similar) can be grouped. This process continues, merging similar groups until no compatible groups remain. We found our initial implementation generated groups of largely dissimilar topics. To address this, we introduced an additional constraint—groups could only be merged if the mean similarity between pairs of topics (each belonging to the two groups in question) was greater than the similarity threshold. This process produced groups of similar topics. Functionally, this allowed us to detect topics repeated throughout the year.
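The merging procedure described above can be sketched as a small agglomerative loop. This is our reading of the text, not the implementation of Chuang et al. (2015) or of the authors. Three points are assumptions: a group may never contain two topics from the same batch; similarity is handled as a count of shared keywords (three of ten) rather than a proportion, to avoid floating-point comparisons; and a mean similarity exactly equal to the threshold is allowed to pass, whereas the text says “greater than”.

```python
from itertools import combinations


def shared(a, b):
    """Number of keywords two topics have in common."""
    return len(set(a) & set(b))


def align_topics(topics, min_shared=3):
    """Group similar topics across batches.

    topics: list of (batch_id, keywords) tuples.
    Returns a list of groups, each a list of indices into topics.
    """
    groups = [[i] for i in range(len(topics))]  # every topic starts alone
    while True:
        best = None
        for g, h in combinations(range(len(groups)), 2):
            batches_g = {topics[i][0] for i in groups[g]}
            batches_h = {topics[j][0] for j in groups[h]}
            if batches_g & batches_h:
                continue  # topics from the same batch cannot be grouped
            counts = [shared(topics[i][1], topics[j][1])
                      for i in groups[g] for j in groups[h]]
            link = max(counts)  # group similarity = most similar topic pair
            mean_ok = sum(counts) >= min_shared * len(counts)
            if link >= min_shared and mean_ok and (best is None or link > best[0]):
                best = (link, g, h)
        if best is None:
            return groups  # no compatible groups remain
        _, g, h = best
        groups[g] += groups.pop(h)  # h > g, so index g is unaffected
```

Groups that still hold a single topic when the loop ends correspond to topics that did not align with any other.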
We ran the topical alignment algorithm across both the 205 and 410 topic representations. For the 205 and 410 topic representations respectively, 22.47% and 31.60% of tweets were not associated with topics that aligned with others. This exemplifies the ephemeral and dynamic attributes of Twitter activity: over time, the content of tweets shifts, with some topics appearing only once throughout the year (i.e., in only one batch). In contrast, we identified 42 groups (69.77% of topics) and 101 groups (62.93% of topics) of related topics for the 205 and 410 topic representations respectively, occurring across different time periods (i.e., in more than one batch). Thus, both representations contained transient topics (isolated to one batch) and recurrent topics (present in more than one batch, belonging to a group of two or more topics).
Identifying topics most relevant for answering our research question
For the subsequent qualitative analyses, we were primarily interested in topics prevalent throughout the corpus. We operationalized prevalent topic groupings as any grouping of topics that spanned three or more batches. On this basis, 22 (57.50% of tweets) and 36 (35.14% of tweets) groupings of topics were identified as prevalent for the 205 and 410 topic representations, respectively (see Table 3). As an example, consider the prevalent topic groupings from the 205 topic representation, shown in Table 3. Ten topics are united by commentary on the Great Barrier Reef (Group 2), indicating this facet of climate change commentary was prevalent throughout the year. In contrast, some topics rarely occurred, such as a topic concerning a climate change comic (indicated by the keywords “xkcd” and “comic”) occurring once and twice in the 205 and 410 topic representation, respectively. Although such topics are meaningful and interesting, they are transient aspects of climate change commentary and less relevant to our research question. In sum, topic modeling and grouping algorithms have allowed us to collate massive amounts of information, and identify components of the corpus most relevant to our qualitative inquiry.
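The prevalence criterion reduces to counting the distinct batches each group spans. A minimal sketch, with a hypothetical input format (groups as lists of topic identifiers, plus a mapping from topic identifier to batch):

```python
def prevalent_groups(groups, topic_batches, min_batches=3):
    """Keep only the groups whose topics span min_batches or more batches.

    groups: list of lists of topic identifiers.
    topic_batches: mapping from topic identifier to its batch.
    """
    return [group for group in groups
            if len({topic_batches[topic] for topic in group}) >= min_batches]
```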
Selecting the most favorable topical representation
At this stage, we have two complete and coherent representations of the corpus topics, and indications of which topics are most relevant to our research question. Although some evidence indicated that the 410 topic representation contains topics of higher quality, the 205 topic representation was more parsimonious on both the level of topics and groups of topics. Thus, we selected the 205 topic representation for further analysis.
Phase 3. Extract a subset of data
Extracting a subset of data from the selected topical representation.
Before qualitative analysis, researchers must extract a subset of data manageable in size. For this process, we concerned ourselves with only the content of prevalent topic groupings, seen in Table 3. From each of the 22 prevalent topic groupings, we randomly sampled ten tweets. We selected ten tweets as a trade-off between comprehensiveness and feasibility. This reduced our data space for qualitative analysis from 201,423 tweets to 220.
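The sampling step amounts to drawing a fixed number of tweets at random from each prevalent grouping. A minimal sketch is given below; the dictionary layout, the function name, the fixed seed, and the placeholder tweet identifiers are assumptions for illustration rather than a description of the scripts used in the study.

```python
import random


def sample_per_group(tweets_by_group, n=10, seed=0):
    """Draw up to n tweets at random, without replacement, from each group.

    tweets_by_group maps a group label to the tweets assigned to it.
    A seeded generator is used so that the draw can be repeated.
    """
    rng = random.Random(seed)
    sample = {}
    for group in sorted(tweets_by_group):
        tweets = list(tweets_by_group[group])
        sample[group] = rng.sample(tweets, min(n, len(tweets)))
    return sample


# Illustrative data only: 22 groups with 50 placeholder tweets each.
tweets_by_group = {
    g: [f"tweet-{g}-{i}" for i in range(50)] for g in range(1, 23)
}
subset = sample_per_group(tweets_by_group, n=10)
total = sum(len(tweets) for tweets in subset.values())
print(total)
```

With 22 groups and ten tweets per group, the subset contains 22 × 10 = 220 tweets, matching the size of the subset described above.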
Phase 4. Perform qualitative analysis
Perform thematic analysis.
In the final phase of our analysis, we performed a qualitative thematic analysis (TA; Braun & Clarke, 2006) on the subset of tweets sampled in phase 3. This analysis generated distinct themes, each of which answers our research question: what are the common topics of Australians’ tweets about climate change? As such, the themes generated through TA are topics. However, unlike the topics derived from the preceding computational approaches, these themes are informed by the human coder’s interpretation of content and are oriented towards our specific research question. This allows the incorporation of important diagnostic information, including the broader socio-political context of discussed events or terms, and an understanding (albeit, sometimes ambiguous) of the underlying latent meaning of tweets.
We selected TA as the approach allows for flexibility in assumptions and philosophical approaches to qualitative inquiries. Moreover, the approach is used to emphasize similarities and differences between units of analysis (i.e., between tweets) and is therefore useful for generating topics. However, TA is typically applied to lengthy interview transcripts or responses to open survey questions, rather than small units of analysis produced through Twitter activity. To ease the application of TA to small units of analysis, we modified the typical TA process (shown in Table 4) as follows.
Firstly, when performing phases 1 and 2 of TA, we initially read through each prevalent topic grouping’s tweets sequentially. By doing this, we took advantage of the relative homogeneity of content within topics. That is, tweets sharing the same topic will be more similar in content than tweets belonging to separate topics. When reading ambiguous tweets, we could use the tweet’s topic (and other related topics from the same group) to aid comprehension. Through the scaffold of topic representations, we facilitated the process of interpreting the data, generating initial codes, and deriving themes.
Secondly, the prevalent topic groupings were used to create initial codes and search for themes (TA phase 2 and 3). For example, the groups of topics indicate content of climate change action (group 1), the Great Barrier Reef (group 2), climate change deniers (group 3), and extreme weather (group 5). The keywords characterizing these topics were used as initial codes (e.g., “action”, “Great Barrier Reef”, “Paris Agreement”, “denial”). In sum, the algorithmic output provided us with an initial set of codes and an understanding of the topic structure that can indicate important features of the corpus.
A member of the research team performed this augmented TA to generate themes. A second rater outside of the research team applied the generated themes to the data, and inter-rater agreement was assessed. Following this, the two raters reached a consensus on the theme of each tweet.
Through TA, we inductively generated five distinct themes. We assigned each tweet to one (and only one) theme. A degree of ambiguity is involved in designating themes for tweets, and seven tweets were too ambiguous to subsume into our thematic framework. The remaining 213 tweets were assigned to one of five themes shown in Table 5.
In an initial application of the coding scheme, the two raters agreed upon 161 (73.18%) of 220 tweets. Inter-rater reliability was satisfactory, Cohen’s κ = 0.648, p < 0.05. An assessment of agreement for each theme is presented in Table 5. The proportion of agreement is the total proportion of observations where the two coders both agreed: (1) a tweet belonged to the theme, or (2) a tweet did not belong to the theme. The proportion of specific agreement is the conditional probability that a randomly selected rater will assign the theme to a tweet, given that the other rater did (see Supplementary Material for more information). Theme 3, theme 5, and the N/A categorization had lower levels of agreement than the remaining themes, possibly as tweets belonging to themes 3 and 5 often make references to content relevant to other themes.
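The agreement statistics described above can be computed from the two raters' theme assignments. The sketch below uses the standard definitions: Cohen's κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from the raters' marginal frequencies; the proportion of agreement for a theme, (a + d) / n; and the proportion of specific agreement, 2a / (2a + b + c), where a counts tweets both raters assigned to the theme, d tweets neither assigned to it, and b and c tweets only one rater assigned to it. The function names and example labels are hypothetical, and the significance test for κ is not covered.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning one label per item."""
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / n ** 2
    if p_e == 1:  # both raters used a single identical label throughout
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def theme_agreement(rater1, rater2, theme):
    """Return (proportion of agreement, proportion of specific agreement)."""
    pairs = list(zip(rater1, rater2))
    both = sum(a == theme and b == theme for a, b in pairs)
    only_one = sum((a == theme) != (b == theme) for a, b in pairs)
    neither = len(pairs) - both - only_one
    overall = (both + neither) / len(pairs)
    if both == 0 and only_one == 0:  # theme never used by either rater
        return overall, float("nan")
    specific = 2 * both / (2 * both + only_one)
    return overall, specific


# Illustrative labels only: six tweets coded by two raters.
r1 = ["action", "action", "conseq", "deniers", "action", "conseq"]
r2 = ["action", "conseq", "conseq", "deniers", "action", "action"]
kappa = cohens_kappa(r1, r2)
overall, specific = theme_agreement(r1, r2, "action")
print(round(kappa, 3), round(overall, 3), round(specific, 3))
```

As a check on the raw agreement reported above, 161 agreements out of 220 tweets gives p_o = 161 / 220 ≈ 0.7318.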
Theme 1. Climate change action
The theme occurring most often was climate change action, whereby tweets were related to coping with, preparing for, or preventing climate change. Tweets comment on the action (and inaction) of politicians, political parties, and international cooperation between government, and to a lesser degree, industry, media, and the public. The theme encapsulated commentary on: prioritizing climate change action (“ Let’s start working together for real solutions on climate change ”); Footnote 4 relevant strategies and policies to provide such action (“ #OurOcean is absorbing the majority of #climatechange heat. We need #marinereserves to help build resilience. ”); and the undertaking (“ Labor will take action on climate change, cut pollution, secure investment & jobs in a growing renewables industry ”) or disregarding (“ act on Paris not just sign ”) of action.
Often, users were critical of current or anticipated action (or inaction) towards climate change, criticizing approaches by politicians and governments as ineffective (“ Malcolm Turnbull will never have a credible climate change policy ”), Footnote 5 and undesirable (“ Govt: how can we solve this vexed problem of climate change? Helpful bystander: u could not allow a gigantic coal mine. Govt: but srsly how? ”). Predominately, users characterized the government as unjustifiably paralyzed (“ If a foreign country did half the damage to our country as #climatechange we would declare war. ”), without a leadership focused on addressing climate change (“ an election that leaves Australia with no leadership on #climatechange - the issue of our time! ”).
Theme 2. Consequences of climate change
Users commented on the consequences and risks attributed to climate change. This theme may be further categorized into commentary of: physical systems, such as changes in climate, weather, sea ice, and ocean currents (“ Australia experiencing more extreme fire weather, hotter days as climate changes ”); biological systems, such as marine life (particularly, the Great Barrier Reef) and biodiversity (“ Reefs of the future could look like this if we continue to ignore #climatechange ”); human systems (“ You and your friends will die of old age & I’m going to die from climate change ”); and other miscellaneous consequences (“ The reality is, no matter who you supported, or who wins, climate change is going to destroy everything you love ”). Users specified a wide range of risks and impacts on human systems, such as health, cultural diversity, and insurance. Generally, the consequences of climate change were perceived as negative.
Theme 3. Conversations on climate change
Some commentary centered around discussions of climate change communication, debates, art, media, and podcasts. Frequently, these pertained to debates between politicians (“ not so gripping from No Principles Malcolm. Not one mention of climate change in his pitch. ”) and television panel discussions (“ Yes let’s all debate whether climate change is happening... #qanda ”). Footnote 6 Users condemned the climate change discussions of federal government (“ Turnbull gov echoes Stalinist Russia? Australia scrubbed from UN climate change report after government intervention ”), those skeptical of climate change (“ Trouble is climate change deniers use weather info to muddy debate. Careful???????????????? ”), and media (“ Will politicians & MSM hacks ever work out that they cannot spin our way out of the #climatechange crisis? ”). The term “climate change” was critiqued, both by users skeptical of the legitimacy of climate change (“ Weren’t we supposed to call it ‘climate change’ now? Are we back to ‘global warming’ again? What happened? Apart from summer? ”) and by users seeking action (“ Maybe governments will actually listen if we stop saying “extreme weather” & “climate change” & just say the atmosphere is being radicalized ”).
Theme 4. Climate change deniers
The fourth theme involved commentary on individuals or groups who were perceived to deny climate change. Generally, these were politicians and associated political parties, such as: Malcolm Roberts (a climate change skeptic, elected as an Australian Senator in 2016), Malcolm Turnbull, and Donald Trump. Commentary focused on the beliefs and legitimacy of those who deny the science of climate change (“ One Nation’s Malcolm Roberts is in denial about the facts of climate change ”) or support the denial of climate change science (“ Meanwhile in Australia... Malcolm Roberts, funded by climate change skeptic global groups loses the plot when nobody believes his findings ”). Some users advocated attempts to change the beliefs of those who deny climate change science (“ We have a president-elect who doesn’t believe in climate change. Millions of people are going to have to say: Mr. Trump, you are dead wrong ”), whereas others advocated disengaging from conversation entirely (“ You know I just don’t see any point engaging with climate change deniers like Roberts. Ignore him ”). In comparison to other themes, commentary revolved around individuals and their beliefs, rather than the phenomenon of climate change itself.
Theme 5. The legitimacy of climate change and climate science
Using our four-phased framework, we aimed to identify and qualitatively inspect the most enduring aspects of climate change commentary from Australian posts on Twitter in 2016. We achieved this by using computational techniques to model 205 topics of the corpus, and identify and group similar topics that repeatedly occurred throughout the year. From the most relevant topic groupings, we extracted a subsample of tweets and identified five themes with a thematic analysis: climate change action, consequences of climate change, conversations on climate change, climate change deniers, and the legitimacy of climate change and climate science. Overall, we demonstrated the process of using a mixed-methodology that blends qualitative analyses with data science methods to explore social media data.
Our workflow draws on the advantages of both quantitative and qualitative techniques. Without quantitative techniques, it would be impossible to derive topics that apply to the entire corpus. The derived topics are a preliminary map for understanding the corpus, serving as a scaffold upon which we could derive meaningful themes contextualized within the wider socio-political context of Australia in 2016. By incorporating quantitatively-derived topics into the qualitative process, we attempted to construct themes that would generalize to a larger, relevant component of the corpus. The robustness of these themes is corroborated by their association with computationally-derived topics, which repeatedly occurred throughout the year (i.e., prevalent topic groupings). Moreover, four of the five themes have been observed in existing data science analyses of Twitter climate change commentary. Within the literature, the themes of climate change action and consequences of climate change are common (Newman, 2016; O’Neill et al., 2015; Pathak et al., 2017; Pearce, 2014; Jang & Hart, 2015; Veltri & Atanasova, 2017). The themes of the legitimacy of climate change and climate science (Jang & Hart, 2015; Newman, 2016; O’Neill et al., 2015; Pearce, 2014) and climate change deniers (Pathak et al., 2017) have also been observed. The replication of these themes demonstrates the validity of our findings.
One of the five themes, conversations on climate change, has not been explicitly identified in existing data science analyses of tweets on climate change. Although not explicitly identifying the theme, Kirilenko and Stepchenkova (2014) found hashtags related to public conversations (e.g., “#qanda”, “#Debates”) were used frequently throughout the year 2012. Similar to the literature, few (if any) topics in our 205 topic solution could be construed as solely relating to the theme of “conversation”. However, as we progressed through the different phases of the framework, the theme became increasingly apparent. By the grouping stage, we identified a collection of topics unified by a keyword relating to debate. The subsequent thematic analysis clearly discerned this theme. The derivation of a theme previously undetected by other data science studies lends credence to the conclusions of Guetterman et al. (2018), who deduced that supplementing a quantitative approach with a qualitative technique can lead to the generation of more themes than a quantitative approach alone.
The uniqueness of a conversational theme can be accounted for by three potentially contributing factors. Firstly, tweets related to conversations on climate change often contained material pertinent to other themes. The overlap between this theme and others may hinder the capabilities of computational techniques to uniquely cluster these tweets, and undermine the ability of humans to reach agreement when coding content for this theme (indicated by the relatively low proportion of specific agreement in our thematic analysis). Secondly, a conversational theme may only be relevant in election years. Unlike other studies spanning long time periods (Jang & Hart, 2015; Veltri & Atanasova, 2017), Kirilenko and Stepchenkova (2014) and our study harvested data from US presidential election years (2012 and 2016, respectively). Moreover, an Australian federal election occurred in our year of observation. The occurrence of national elections and associated political debates may generate more discussion and criticisms of conversations on climate change. Alternatively, the emergence of a conversational theme may be attributable to the Australian panel discussion television program Q & A. The program regularly hosts politicians and other public figures to discuss political issues. Viewers are encouraged to participate by publishing tweets using the hashtag “#qanda”, perhaps prompting viewers to generate uniquely tagged content not otherwise observed in other countries. Importantly, in 2016, Q & A featured a debate on climate change between science communicator Professor Brian Cox and Senator Malcolm Roberts, a prominent climate science skeptic.
Although our four-phased framework capitalizes on both quantitative and qualitative techniques, it still has limitations. Namely, the sparse content relationships between data points (in our case, tweets) can jeopardize the quality and reproducibility of algorithmic results (e.g., Chuang et al., 2015). Moreover, computational techniques can require large computing resources. To a degree, our application mitigated these limitations. We adopted a topic modeling algorithm which uses additional dimensions of tweets (social and temporal) to address the influence of term-to-term sparsity (Nugroho et al., 2017). To circumvent concerns of computing resources, we partitioned the corpus into batches, modeled the topics in each batch, and grouped similar topics together using another computational technique (Chuang et al., 2015).
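The batch-then-group strategy can be illustrated with a simplified sketch. It is not the topic alignment algorithm of Chuang et al. (2015): as a stand-in, it links topics from different batches whenever the cosine similarity of their term-weight vectors reaches a threshold, and treats each connected set of topics as a group. The batch size, the threshold, the (batch, topic index) identifiers, and the term weights are all assumptions chosen for illustration.

```python
import math


def make_batches(items, batch_size):
    """Partition an ordered list of items into consecutive batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def link_topics(topics, threshold=0.5):
    """Group topics from different batches by single-link similarity.

    topics maps a (batch, topic_index) pair to a {term: weight} vector.
    """
    parent = {topic: topic for topic in topics}

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = list(topics)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Only compare topics that come from different batches.
            if a[0] != b[0] and cosine(topics[a], topics[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for topic in ids:
        groups.setdefault(find(topic), []).append(topic)
    return list(groups.values())


# Illustrative data only.
batches = make_batches(list(range(10)), 4)
topics = {
    (1, 0): {"reef": 1.0, "coral": 0.5},
    (2, 0): {"reef": 0.8, "bleaching": 0.4},
    (2, 1): {"election": 1.0},
}
groups = link_topics(topics)
print(batches, groups)
```

Because the links are transitive, this stand-in can place two topics from the same batch in one group when both resemble a topic from another batch; an alignment method may constrain this differently.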
As a demonstration of our four-phased framework, our application is limited to a single example. For data collection, we were able to draw from the procedures of existing studies which had successfully used keywords to identify climate change tweets. Without an existing literature, identifying diagnostic terms can be difficult. Nevertheless, this demonstration of our four-phased framework exemplifies some of the critical decisions analysts must make when utilizing a mixed-method approach to social media data.
Both qualitative and quantitative researchers can benefit from our four-phased framework. For qualitative researchers, we provide a novel vehicle for addressing their research questions. The diversity and volume of content of social media data may be overwhelming for both the researcher and their method. Through computational techniques, the diversity and scale of data can be managed, allowing researchers to obtain a large volume of data and extract from it a relevant sample to conduct qualitative analyses. Additionally, computational techniques can help researchers explore and comprehend the nature of their data. For the quantitative researcher, our four-phased framework provides a strategy for formally documenting the qualitative interpretations. When applying algorithms, analysts must ultimately make qualitative assessments of the quality and meaning of output. In comparison to the mathematical machinery underpinning these techniques, the qualitative interpretations of algorithmic output are not well-documented. As these qualitative judgments are inseparable from data science, researchers should strive to formalize and document their decisions—our framework provides one means of achieving this goal.
Through the application of our four-phased framework, we contribute to an emerging literature on public perceptions of climate change by providing an in-depth examination of the structure of Australian social media discourse. This insight is useful for communicators and policy makers hoping to understand and engage the Australian online public. Our findings indicate that, within Australian commentary on climate change, a wide variety of messages and sentiment are present. A positive aspect of the commentary is that many users want action on climate change. The time is ripe, it would seem, for communicators to discuss Australia’s policy response to climate change: the public are listening and they want to be involved in the discussion. Consistent with this, we find some users discussing conversations about climate change as a topic. Yet, in some quarters there is still skepticism about the legitimacy of climate change and climate science, and so there remains a pressing need to implement strategies to persuade members of the Australian public of the reality and urgency of the climate change problem. At the same time, our analyses suggest that climate communicators must counter the sometimes-held belief, expressed in our second theme on climate change consequences, that it is already too late to solve the climate problem. Members of the public need to be aware of the gravity of the climate change problem, but they also need powerful self-efficacy-promoting messages that convince them that we still have time to solve the problem, and that their individual actions matter.
Footnote 1. On Twitter, users may precede a phrase with a hashtag (#). This allows users to signify and search for tweets related to a specific theme.
Footnote 2. The analysis of this study was preregistered on the Open Science Framework: https://osf.io/mb8kh/. See the Supplementary Material for a discussion of discrepancies. Analysis scripts and interim results from computational techniques can be found at: https://github.com/AndreottaM/TopicAlignment.
Footnote 3. 83 tweets were rendered empty and discarded from the corpus.
Footnote 4. The content of tweets is reported verbatim. Sensitive information is redacted.
Footnote 5. Malcolm Turnbull was the Prime Minister of Australia during the year 2016.
Footnote 6. “#qanda” is a hashtag used to refer to Q & A, an Australian panel discussion television program.
Footnote 7. Commonwealth Scientific and Industrial Research Organisation (CSIRO) is the national scientific research agency of Australia.
Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media (pp. 30–38). Stroudsburg: Association for Computational Linguistics.
Allemang, D., & Hendler, J. (2011). Semantic web for the working ontologist: Effective modelling in RDFS and OWL (2nd edn.). United States of America: Elsevier Inc.
Auer, M.R., Zhang, Y., & Lee, P. (2014). The potential of microblogs for the study of public perceptions of climate change. Wiley Interdisciplinary Reviews: Climate Change, 5(3), 291–296. https://doi.org/10.1002/wcc.273
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. United States of America: O’Reilly Media, Inc.
Blei, D.M., Ng, A.Y., & Jordan, M.I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662–679. https://doi.org/10.1080/1369118X.2012.678878
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa
Cameron, M.A., Power, R., Robinson, B., & Yin, J. (2012). Emergency situation awareness from Twitter for crisis management. In Proceedings of the 21st international conference on World Wide Web (pp. 695–698). New York: ACM. https://doi.org/10.1145/2187980.2188183
Chuang, J., Wilkerson, J.D., Weiss, R., Tingley, D., Stewart, B.M., Roberts, M.E., et al. (2014). Computer-assisted content analysis: Topic models for exploring multiple subjective interpretations. In Advances in Neural Information Processing Systems workshop on human-propelled machine learning (pp. 1–9). Montreal, Canada: Neural Information Processing Systems.
Chuang, J., Roberts, M.E., Stewart, B.M., Weiss, R., Tingley, D., Grimmer, J., & Heer, J. (2015). TopicCheck: Interactive alignment for assessing topic model stability. In Proceedings of the conference of the North American chapter of the Association for Computational Linguistics - Human Language Technologies (pp. 175–184). Denver: Association for Computational Linguistics. https://doi.org/10.3115/v1/N15-1018
Collins, L., & Nerlich, B. (2015). Examining user comments for deliberative democracy: A corpus-driven analysis of the climate change debate online. Environmental Communication, 9(2), 189–207. https://doi.org/10.1080/17524032.2014.981560
Correa, T., Hinsley, A.W., & de Zúñiga, H.G. (2010). Who interacts on the Web?: The intersection of users’ personality and social media use. Computers in Human Behavior, 26(2), 247–253. https://doi.org/10.1016/j.chb.2009.09.003
CSIRO (2018). Emergency Situation Awareness. Retrieved 2019-02-20, from https://esa.csiro.au/ausnz/about-public.html
Denef, S., Bayerl, P.S., & Kaptein, N.A. (2013). Social media and the police: Tweeting practices of British police forces during the August 2011 riots. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 3471–3480). NY: ACM.
Fang, A., Macdonald, C., Ounis, I., & Habel, P. (2016a). Topics in Tweets: A user study of topic coherence metrics for Twitter data. In 38th European conference on IR research, ECIR 2016 (pp. 429–504). Switzerland: Springer International Publishing.
Fang, A., Macdonald, C., Ounis, I., & Habel, P. (2016b). Using word embedding to evaluate the coherence of topics from Twitter data. In Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval (pp. 1057–1060). NY: ACM Press. https://doi.org/10.1145/2911451.2914729
Freelon, D., Lopez, L., Clark, M.D., & Jackson, S.J. (2018). How black Twitter and other social media communities interact with mainstream news (Tech. Rep.). Knight Foundation. Retrieved 2018-04-20, from https://knightfoundation.org/features/twittermedia
Gudivada, V.N., Baeza-Yates, R.A., & Raghavan, V.V. (2015). Big data: Promises and problems. IEEE Computer, 48(3), 20–23.
Guetterman, T.C., Chang, T., DeJonckheere, M., Basu, T., Scruggs, E., & Vydiswaran, V.V. (2018). Augmenting qualitative text analysis with natural language processing: Methodological study. Journal of Medical Internet Research, 20, 6. https://doi.org/10.2196/jmir.9702
Herring, S.C. (2009). Web content analysis: Expanding the paradigm. In J. Hunsinger, L. Klastrup, & M. Allen (Eds.) International handbook of internet research (pp. 233–249). Dordrecht: Springer.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (Vol. 51, pp. 211–218). Berkeley: ACM. https://doi.org/10.1109/BigDataCongress.2015.21
Holmberg, K., & Hellsten, I. (2016). Integrating and differentiating meanings in tweeting about the fifth Intergovernmental Panel on Climate Change (IPCC) report. First Monday, 21, 9. https://doi.org/10.5210/fm.v21i9.6603
Hoppe, R. (2009). Scientific advice and public policy: Expert advisers’ and policymakers’ discourses on boundary work. Poiesis & Praxis, 6(3–4), 235–263. https://doi.org/10.1007/s10202-008-0053-3
Jang, S.M., & Hart, P.S. (2015). Polarized frames on “climate change” and “global warming” across countries and states: Evidence from Twitter big data. Global Environmental Change, 32, 11–17. https://doi.org/10.1016/j.gloenvcha.2015.02.010
Kietzmann, J.H., Hermkens, K., McCarthy, I.P., & Silvestre, B.S. (2011). Social media? Get serious! Understanding the functional building blocks of social media. Business Horizons, 54(3), 241–251. https://doi.org/10.1016/j.bushor.2011.01.005
Kirilenko, A.P., & Stepchenkova, S.O. (2014). Public microblogging on climate change: One year of Twitter worldwide. Global Environmental Change, 26, 171–182. https://doi.org/10.1016/j.gloenvcha.2014.02.008
Lee, D.D., & Seung, H.S. (2000). Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems (pp. 556–562). Denver: Neural Information Processing Systems.
Lewis, S.C., Zamith, R., & Hermida, A. (2013). Content analysis in an era of big data: A hybrid approach to computational and manual methods. Journal of Broadcasting & Electronic Media, 57(1), 34–52. https://doi.org/10.1080/08838151.2012.761702
Lietz, C.A., & Zayas, L.E. (2010). Evaluating qualitative research for social work practitioners. Advances in Social Work, 11(2), 188–202.
Lorenzoni, I., & Pidgeon, N.F. (2006). Public views on climate change: European and USA perspectives. Climatic Change, 77(1-2), 73–95. https://doi.org/10.1007/s10584-006-9072-z
Marwick, A.E. (2014). Ethnographic and qualitative research on Twitter. In K. Weller, A. Bruns, J. Burgess, M. Mahrt, & C. Puschmann (Eds.) Twitter and society (Vol. 89, pp. 109–121). New York: Peter Lang.
Marwick, A.E., & Boyd, D. (2011). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society, 13(1), 114–133. https://doi.org/10.1177/1461444810365313
McKechnie, L.E.F. (2008). Reactivity. In L. Given (Ed.) The SAGE encyclopedia of qualitative research methods (Vol. 2, pp. 729–730). Thousand Oaks, California, United States: SAGE Publications, Inc. https://doi.org/10.4135/9781412963909.n368
McKenna, B., Myers, M.D., & Newman, M. (2017). Social media in qualitative research: Challenges and recommendations. Information and Organization, 27(2), 87–99. https://doi.org/10.1016/j.infoandorg.2017.03.001
MySQL (2018). Full-Text Stopwords. Retrieved 2018-04-20, from https://dev.mysql.com/doc/refman/5.7/en/fulltext-stopwords.html
Newman, T.P. (2016). Tracking the release of IPCC AR5 on Twitter: Users, comments, and sources following the release of the Working Group I Summary for Policymakers. Public Understanding of Science, 26(7), 1–11. https://doi.org/10.1177/0963662516628477
Newman, D., Lau, J.H., Grieser, K., & Baldwin, T. (2010). Automatic evaluation of topic coherence. In Human language technologies: The 2010 annual conference of the North American chapter of the Association for Computational Linguistics (pp. 100–108). Stroudsburg: Association for Computational Linguistics.
Nugroho, R., Yang, J., Zhao, W., Paris, C., & Nepal, S. (2017). What and with whom? Identifying topics in Twitter through both interactions and text. IEEE Transactions on Services Computing, 1–14. https://doi.org/10.1109/TSC.2017.2696531
Nugroho, R., Zhao, W., Yang, J., Paris, C., & Nepal, S. (2017). Using time-sensitive interactions to improve topic derivation in Twitter. World Wide Web, 20(1), 61–87. https://doi.org/10.1007/s11280-016-0417-x
O’Neill, S., Williams, H.T.P., Kurz, T., Wiersma, B., & Boykoff, M. (2015). Dominant frames in legacy and social media coverage of the IPCC Fifth Assessment Report. Nature Climate Change, 5(4), 380–385. https://doi.org/10.1038/nclimate2535
Onwuegbuzie, A.J., & Leech, N.L. (2007). Validity and qualitative research: An oxymoron? Quality & Quantity, 41(2), 233–249. https://doi.org/10.1007/s11135-006-9000-3
Pak, A., & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of the international conference on Language Resources and Evaluation (Vol. 5, pp. 1320–1326). European Language Resources Association.
Paris, C., Christensen, H., Batterham, P., & O’Dea, B. (2015). Exploring emotions in social media. In 2015 IEEE Conference on Collaboration and Internet Computing (pp. 54–61). Hangzhou, China: IEEE. https://doi.org/10.1109/CIC.2015.43
Parker, C., Saundage, D., & Lee, C.Y. (2011). Can qualitative content analysis be adapted for use by social informaticians to study social media discourse? A position paper. In Proceedings of the 22nd Australasian conference on information systems: Identifying the information systems discipline (pp. 1–7). Sydney: Association of Information Systems.
Pathak, N., Henry, M., & Volkova, S. (2017). Understanding social media’s take on climate change through large-scale analysis of targeted opinions and emotions. In The AAAI 2017 Spring symposium on artificial intelligence for social good (pp. 45–52). Stanford: Association for the Advancement of Artificial Intelligence.
Pearce, W. (2014). Scientific data and its limits: Rethinking the use of evidence in local climate change policy. Evidence and Policy: A Journal of Research, Debate and Practice, 10(2), 187–203. https://doi.org/10.1332/174426514X13990326347801
Pearce, W., Holmberg, K., Hellsten, I., & Nerlich, B. (2014). Climate change on Twitter: Topics, communities and conversations about the 2013 IPCC Working Group 1 Report. PLOS ONE, 9(4), e94785. https://doi.org/10.1371/journal.pone.0094785
Pistrang, N., & Barker, C. (2012). Varieties of qualitative research: A pragmatic approach to selecting methods. In H. Cooper, P.M. Camic, D.L. Long, A.T. Panter, D. Rindskopf, & K.J. Sher (Eds.) APA handbook of research methods in psychology, vol 2: Research designs: Quantitative, qualitative, neuropsychological, and biological (pp. 5–18). Washington: American Psychological Association.
Procter, R., Vis, F., & Voss, A. (2013). Reading the riots on Twitter: Methodological innovation for the analysis of big data. International Journal of Social Research Methodology, 16(3), 197–214. https://doi.org/10.1080/13645579.2013.774172
Ramage, D., Dumais, S.T., & Liebling, D.J. (2010). Characterizing microblogs with topic models. In International AAAI conference on weblogs and social media (Vol. 10, pp. 131–137). Washington: AAAI.
Schneider, R.O. (2011). Climate change: An emergency management perspective. Disaster Prevention and Management: An International Journal, 20(1), 53–62. https://doi.org/10.1108/09653561111111081
Sharma, E., Saha, K., Ernala, S.K., Ghoshal, S., & De Choudhury, M. (2017). Analyzing ideological discourse on social media: A case study of the abortion debate. In Annual Conference on Computational Social Science. Santa Fe: Computational Social Science.
Sisco, M.R., Bosetti, V., & Weber, E.U. (2017). Do extreme weather events generate attention to climate change? Climatic Change, 143(1-2), 227–241. https://doi.org/10.1007/s10584-017-1984-2
Swain, J. (2017). Mapped: The climate change conversation on Twitter in 2016. Retrieved 2019-02-20, from https://www.carbonbrief.org/mapped-the-climate-change-conversation-on-twitter-in-2016
Veltri, G.A., & Atanasova, D. (2017). Climate change on Twitter: Content, media ecology and information sharing behaviour. Public Understanding of Science, 26(6), 721–737. https://doi.org/10.1177/0963662515613702
Weinberg, B.D., & Pehlivan, E. (2011). Social spending: Managing the social media mix. Business Horizons, 54(3), 275–282. https://doi.org/10.1016/j.bushor.2011.01.008
Weng, J., Lim, E.-P., Jiang, J., & He, Q. (2010). TwitterRank: Finding topic-sensitive influential Twitterers. In Proceedings of the Third ACM international conference on web search and data mining (pp. 261–270). New York: ACM. https://doi.org/10.1145/1718487.1718520
Williams, H.T., McMurray, J.R., Kurz, T., & Hugo Lambert, F. (2015). Network analysis reveals open forums and echo chambers in social media discussions of climate change. Global Environmental Change, 32, 126–138. https://doi.org/10.1016/j.gloenvcha.2015.03.006
Yin, S., & Kaynak, O. (2015). Big data for modern industry: Challenges and trends [point of view]. Proceedings of the IEEE, 103(2), 143–146. https://doi.org/10.1109/JPROC.2015.2388958
Author information
Authors and affiliations.
School of Psychological Science, University of Western Australia, 35 Stirling Highway, Perth, WA, 6009, Australia
Matthew Andreotta, Mark J. Hurlstone & Simon Farrell
Data61, CSIRO, Corner Vimiera and Pembroke Streets, Marsfield, NSW, 2122, Australia
Matthew Andreotta, Robertus Nugroho & Cecile Paris
Faculty of Computer Science, Soegijapranata Catholic University, Semarang, Indonesia
Robertus Nugroho
Ocean & Atmosphere, CSIRO, Indian Ocean Marine Research Centre, The University of Western Australia, Crawley, WA, 6009, Australia
Fabio Boschetti
School of Psychology and Counselling, University of Canberra, Canberra, Australia
Iain Walker
Corresponding author
Correspondence to Matthew Andreotta.
Additional information
Author note.
This research was supported by an Australian Government Research Training Program (RTP) Scholarship from the University of Western Australia and a scholarship from the CSIRO Research Office awarded to the first author, and a grant from the Climate Adaptation Flagship of the CSIRO awarded to the third and sixth authors. The authors are grateful to Bella Robinson and David Ratcliffe for their assistance with data collection, and Blake Cavve for their assistance in annotating the data.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Andreotta, M., Nugroho, R., Hurlstone, M.J. et al. Analyzing social media data: A mixed-methods framework combining computational and qualitative text analysis. Behav Res 51, 1766–1781 (2019). https://doi.org/10.3758/s13428-019-01202-8
Published: 02 April 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.3758/s13428-019-01202-8
- Topic modeling
- Thematic analysis
- Climate change
- Joint matrix factorization
- Topic alignment
Social media analytics and metrics for improving users engagement.
1. Introduction
2. Related background
2.1. Conceptualizing social media analytics and metrics: an overall point of view
2.2. Importance of social media platforms and analytics for LAMs: a scientometrics analysis
2.3. Prior efforts and research gaps
3. Methodology
3.1. Data collection and sample
3.2. Validity and reliability assessment
3.3. Predictive regression models
4.1. Validation of the proposed factors
4.2. Descriptive data summarization for initial performance estimations
4.3. Predictive regressions results
5. Discussion
5.1. Practical-managerial implications
5.2. Theoretical implications
5.3. Limitations and future work
Author contributions
Institutional review board statement
Informed consent statement
Data availability statement
Conflicts of interest
- W3C. A Standards-Based, Open and Privacy-Aware Social Web. W3C, 2010. Available online: https://www.w3.org/2005/Incubator/socialweb/XGR-socialweb-20101206/ (accessed on 11 March 2022).
- Department, Statista Research. Number of Social Network Users Worldwide from 2017 to 2025. Statista. Available online: https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/ (accessed on 11 March 2022).
- Cheng, W.W.H.; Lam, E.T.H.; Chiu, D.K.W. Social Media as a Platform in Academic Library Marketing: A Comparative Study. J. Acad. Libr. 2020 , 46 , 102188. [ Google Scholar ] [ CrossRef ]
- Fong, K.C.H.; Au, C.H.; Lam, E.T.H.; Chiu, D.K.W. Social Network Services for Academic Libraries: A Study Based on Social Capital and Social Proof. J. Acad. Librariansh. 2020 , 46 , 102091. [ Google Scholar ] [ CrossRef ]
- Choi, N.; Joo, S. Understanding Public Libraries’ Challenges, Motivators, and Perceptions toward the Use of Social Media for Marketing. Libr. Hi Tech. 2018 , 39 , 352–367. [ Google Scholar ] [ CrossRef ]
- Al-Daihani, M.S.; Abrahams, A. Analysis of Academic Libraries’ Facebook Posts: Text and Data Analytics. J. Acad. Librariansh. 2018 , 44 , 216–225. [ Google Scholar ] [ CrossRef ]
- Bountouri, L.; Giannakopoulos, G. The Use of Social Media in Archives. Procedia-Soc. Behav. Sci. 2014 , 147 , 510–517. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Garoufallou, E.; Siatri, R.; Zafeiriou, G.; Balampanidou, E. The Use of Marketing Concepts in Library Services: A Literature Review. Libr. Rev. 2013 , 62 , 312–334. [ Google Scholar ] [ CrossRef ]
- Langa, L.A. Does Twitter Help Museums Engage with Visitors? In Proceedings of the iConference 2014, Berlin, Germany, 4–7 March 2014. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Mensah, M.; Onyancha, O.B. Building and Enhancing Library Services: Patrons’ Awareness of, and Engagement with Social Media in Academic Libraries in Ghana. J. Librariansh. Inf. Sci. 2021 , 1–18. [ Google Scholar ] [ CrossRef ]
- Han, L.; Shen, Y. Design of Social Media User Satisfaction Evaluation System from the Perspective of Big Data Services. In Proceedings of the 2021 International Conference on Big Data Analysis and Computer Science (BDACS), Kunming, China, 25–27 June 2021. [ Google Scholar ] [ CrossRef ]
- Prado, C.; Javier, F.; García-Reyes, M.C.J. Social Media and Library Metrics and Indicators: How Can We Measure Impact on Performance? In Proceedings of the 17th Conference of the International Society for Scientometrics and Informetrics ISSI2019, Rome, Italy, 2–5 September 2019; Available online: https://e-archivo.uc3m.es/handle/10016/31448 (accessed on 11 March 2022).
- Drivas, I.C.; Sakas, D.P.; Giannakopoulos, G.A.; Kyriaki-Manessi, D. Big Data Analytics for Search Engine Optimization. Big Data Cogn. Comput. 2020 , 4 , 5. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Kaplan, A.M.; Haenlein, M. Users of the World, Unite! The Challenges and Opportunities of Social Media. Bus. Horiz. 2010 , 53 , 59–68. [ Google Scholar ] [ CrossRef ]
- Yang, M.; Kiang, M.; Ku, Y.; Chiu, C.; Li, Y. Social Media Analytics for Radical Opinion Mining in Hate Group Web Forums. J. Homel. Secur. Emerg. Manag. 2011 , 8 . [ Google Scholar ] [ CrossRef ]
- Zeng, D.; Chen, H.; Lusch, R.; Li, S. Social Media Analytics and Intelligence. IEEE Intell. Syst. 2010 , 25 , 13–16. [ Google Scholar ] [ CrossRef ]
- Awareness Inc. Actionable Social Analytics: From Social Media Metrics to Business Insights. 2012. Available online: https://www.cbpp.uaa.alaska.edu/afef/Actionable-Social-Analytics.pdf (accessed on 29 April 2022).
- Misirlis, N.; Vlachopoulou, M. Social Media Metrics and Analytics in Marketing–S3m: A Mapping Literature Review. Int. J. Inf. Manag. 2018 , 38 , 270–276. [ Google Scholar ] [ CrossRef ]
- Bowden, J.L. The Process of Customer Engagement: A Conceptual Framework. J. Mark. Theory Pract. 2009 , 17 , 63–74. [ Google Scholar ] [ CrossRef ]
- Barger, V.; Peltier, J.W.; Schultz, D.E. Social Media and Consumer Engagement: A Review and Research Agenda. J. Interact. Mark. 2016 , 10 , 268–287. [ Google Scholar ] [ CrossRef ]
- van Doorn, J.; Lemon, K.N.; Mittal, V.; Nass, S.; Pick, D.; Pirner, P.; Verhoef, P.C. Customer Engagement Behavior: Theoretical Foundations and Research Directions. J. Serv. Res. 2010 , 13 , 253–266. [ Google Scholar ] [ CrossRef ]
- Trunfio, M.; Rossi, S. Conceptualizing and Measuring Social Media Engagement: A Systematic Literature Review. J. Mark. 2021 , 2021 , 267–292. [ Google Scholar ] [ CrossRef ]
- Hallock, W.; Roggeveen, A.L.; Crittenden, V. Firm-Level Perspectives on Social Media Engagement: An Exploratory Study. Qual. Mark. Res. 2019 , 22 , 217–226. [ Google Scholar ] [ CrossRef ]
- Le, T.D. Influence of Wom and Content Type on Online Engagement in Consumption Communities. Online Inf. Rev. 2018 , 42 , 161–175. [ Google Scholar ] [ CrossRef ]
- Schee, V.A.B.; Peltier, J.; Dahl, A.J. Antecedent Consumer Factors, Consequential Branding Outcomes and Measures of Online Consumer Engagement: Current Research and Future Directions. J. Serv. Res. 2020 , 14 , 239–268. [ Google Scholar ] [ CrossRef ]
- Cervone, H.F. Evaluating Social Media Presence. Digit. Libr. Perspect. 2017 , 33 , 2–7. [ Google Scholar ] [ CrossRef ]
- Jones, M.J.; Harvey, M. Library 2.0: The Effectiveness of Social Media as a Marketing Tool for Libraries in Educational Institutions. J. Librariansh. Inf. Sci. 2019 , 51 , 3–19. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Pozas, A.C. Evaluando Los Medios Sociales De La Biblioteca Nacional De España: Métricas E Indicator’s. Available online: https://perma.cc/96KZ-SPW6 (accessed on 11 March 2022).
- González Fernández-Villavicencio, N. Bibliotecas, Medios Y Métricas De La Web Social. An. Doc. 2016 , 19 . [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Gerrard, D.; Sykora, M.; Jackson, T. Social Media Analytics in Museums: Extracting Expressions of Inspiration. Mus. Manag. Curatorship 2017 , 32 , 232–250. [ Google Scholar ] [ CrossRef ]
- Villaespesa, E. Diving into the Museum’s Social Media Stream. Analysis of the Visitor Experience in 140 Characters. In Proceedings of the Museums and the Web 2013 Conference, Portland, OR, USA, 17–20 April 2013; Available online: https://www.researchgate.net/publication/334285940_Diving_into_the_museum’s_social_media_stream_Analysis_of_the_visitor_experience_In_140_characters (accessed on 11 March 2022).
- Lê, J.T. #Fashionlibrarianship: A Case Study on the Use of Instagram in a Specialized Museum Library Collection. Art Doc. 2019 , 38 , 279–304. [ Google Scholar ] [ CrossRef ]
- Boulton, S. Social Engagement and Institutional Repositories: A Case Study. Insights UKSG J. 2020 , 33 . [ Google Scholar ] [ CrossRef ]
- Magier, D. Archives at the Time of Lockdown. Activity in Social Media Based on the Example of the State Archives in Siedlce between March 2020 and March 2021 (a Research Report). Hist. Świat 2021 , 10 , 454–459. [ Google Scholar ] [ CrossRef ]
- Melchers, R.E.; Beck, A.T. Structural Reliability Analysis and Prediction ; John Wiley & Sons: Hoboken, NJ, USA, 2018. [ Google Scholar ]
- Aven, T. Improving the Foundation and Practice of Reliability Engineering. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 2017 , 231 , 295–305. [ Google Scholar ] [ CrossRef ]
- Drivas, I.; Kouis, D.; Kyriaki-Manessi, D.; Giannakopoulos, G. Content Management Systems Performance and Compliance Assessment Based on a Data-Driven Search Engine Optimization Methodology. Information 2021 , 12 , 259. [ Google Scholar ] [ CrossRef ]
- Huertas, A.; Marine-Roig, E. User Reactions to Destination Brand Contents in Social Media. Inf. Technol. Tour. 2016 , 15 , 291–315. [ Google Scholar ] [ CrossRef ]
- Manca, S. Digital Memory in the Post-Witness Era: How Holocaust Museums Use Social Media as New Memory Ecologies. Information 2021 , 12 , 31. [ Google Scholar ] [ CrossRef ]
- Cassidy, E.D.; Colmenares, A.; Jones, G.; Manolovitz, T.; Shen, L.; Vieira, S. Higher Education and Emerging Technologies: Shifting Trends in Student Usage. J. Acad. Librariansh. 2014 , 40 , 124–133. [ Google Scholar ] [ CrossRef ]
- Okoroma, F.N. Use of Social Media for Modern Reference Service Delivery in Academic Libraries in Nigeria. Int. J. Asian Soc. Sci. 2018 , 8 , 518–527. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Fissi, S.; Gori, E.; Romolini, A.; Contri, M. Stakeholder Engagement: Verso Un Utilizzo Dei Social Media Nei Musei Italiani? Manag. Control 2019 , 145–160. [ Google Scholar ] [ CrossRef ]
- Camarero, C.; Garrido, M.; Jose, R.S. What Works in Facebook Content Versus Relational Communication: A Study of Their Effectiveness in the Context of Museums. Int. J. Hum.-Comput. Interact. 2018 , 34 , 1119–1134. [ Google Scholar ] [ CrossRef ]
- Mukwevho, J.; Ngoepe, M. Taking Archives to the People. Libr. Hi Tech. 2019 , 37 , 374–388. [ Google Scholar ] [ CrossRef ]
- Tkacová, H.; Králik, R.; Tvrdoň, M.; Jenisová, Z.; Martin, J.G. Credibility and Involvement of Social Media in Education—Recommendations for Mitigating the Negative Effects of the Pandemic among High School Students. Int. J. Environ. Res. Public Health 2022 , 19 , 2767. [ Google Scholar ] [ CrossRef ] [ PubMed ]
- Dziuban, C.D.; Shirkey, E.C. When Is a Correlation Matrix Appropriate for Factor Analysis? Some Decision Rules. Psychol. Bull. 1974 , 81 , 358–361. [ Google Scholar ] [ CrossRef ]
- Schmidt, A.F.; Finan, C. Linear Regression and the Normality Assumption. J. Clin. Epidemiol. 2018 , 98 , 146–151. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Razali, N.M.; Wah, Y.B. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests. J. Stat. Mod. Anal. 2011 , 2 , 21–33. Available online: https://www.nrc.gov/docs/ML1714/ML17143A100.pdf (accessed on 17 March 2022).
- Von, H.; Paul, T. Mean, Median, and Skew: Correcting a Textbook Rule. J. Educ. Stat. 2005 , 13 . [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Crede, M.; Harms, P. Questionable Research Practices When Using Confirmatory Factor Analysis. J. Manag. Psychol. 2019 , 34 , 18–30. [ Google Scholar ] [ CrossRef ]
- Hayes, A.F.; Coutts, J.J. Use Omega Rather Than Cronbach’s Alpha for Estimating Reliability. But…. Commun. Methods Meas. 2020 , 14 , 1–24. [ Google Scholar ] [ CrossRef ]
- Ursachi, G.; Horodnic, I.A.; Zait, A. How Reliable Are Measurement Scales? External Factors with Indirect Influence on Reliability Estimators. Procedia Econ. Fin. 2015 , 20 , 679–686. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Callender, J.C.; Osburn, H.G. An Empirical Comparison of Coefficient Alpha, Guttman’s Lambda-2, and Msplit Maximized Split-Half Reliability Estimates. J. Educ. Meas. 1979 , 16 , 89–99. [ Google Scholar ] [ CrossRef ]
- Revelle, W. Hierarchical Cluster Analysis and the Internal Structure of Tests. Multivar. Behav. Res. 1979 , 14 , 57–74. [ Google Scholar ] [ CrossRef ]
- Petter, S.; Straub, D.; Rai, A. Specifying Formative Constructs in Information Systems Research. MIS Q. 2007 , 31 , 623–656. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Diamantopoulos, A.; Siguaw, J.A. Formative Versus Reflective Indicators in Organizational Measure Development: A Comparison and Empirical Illustration. Br. J. Manag. 2006 , 17 , 263–282. [ Google Scholar ] [ CrossRef ]
- Bedeian, A.G.; Mossholder, K.W. Simple Question, Not So Simple Answer: Interpreting Interaction Terms in Moderated Multiple Regression. J. Manag. 1994 , 20 , 159–165. [ Google Scholar ] [ CrossRef ]
- Dankowski, T. How Libraries Are Using Social Media. Am. Libr. 2013 , 44 , 38–41. Available online: https://www.jstor.org/stable/24602212 (accessed on 11 March 2022).
- Stieglitz, S.; Mirbabaie, M.; Ross, B.; Neuberger, C. Social Media Analytics–Challenges in Topic Discovery, Data Collection, and Data Preparation. Int. J. Inf. Manag. 2018 , 39 , 156–168. [ Google Scholar ] [ CrossRef ]
- Eurostat; European Union. Culture Statistics-2019 Edition ; European Union: Luxembourg, 2019. [ CrossRef ]
- Eurostat Statistics Explained. Government Expenditure on Cultural, Broadcasting and Publishing Services. Available online: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Culture_statistics_-_government_expenditure_on_cultural,_broadcasting_and_publishing_services&oldid=554580 (accessed on 11 March 2022).
- Krstić, N.; Masliković, D. Pain Points of Cultural Institutions in Search Visibility: The Case of Serbia. Libr. Hi Tech 2019 , 37 , 496–512. [ Google Scholar ] [ CrossRef ]
- Järvinen, J.; Karjaluoto, H. The Use of Web Analytics for Digital Marketing Performance Measurement. Ind. Mark. Manag. 2015 , 50 , 117–127. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Voorbij, H. The Use of Web Statistics in Cultural Heritage Institutions. Perform. Meas. Metr. 2010 , 11 , 266–279. [ Google Scholar ] [ CrossRef ]
- Grover, V.; Chiang, R.H.L.; Liang, T.; Zhang, D. Creating Strategic Business Value from Big Data Analytics: A Research Framework. Manag. Inf. Syst. 2018 , 35 , 388–423. [ Google Scholar ] [ CrossRef ]
- Sheldon, P.; Herzfeldt, E.; Rauschnabel, P.A. Culture and Social Media: The Relationship between Cultural Values and Hashtagging Styles. Behav. Inf. Technol. 2020 , 39 , 758–770. [ Google Scholar ] [ CrossRef ]
- Lykourentzou, I.; Antoniou, A. Digital Innovation for Cultural Heritage: Lessons from the European Year of Cultural Heritage. SCIRES-IT 2019 , 9 , 91–98. [ Google Scholar ] [ CrossRef ]
- Anyim, W.O. Identifying Gaps and Opportunities to Improve Performance in University Libraries Using Benchmarking and Performance Appraisal System. Libr. Philos. Pract. 2021 , 5066. Available online: https://digitalcommons.unl.edu/libphilprac/5066 (accessed on 11 April 2022).
- Rzheuskiy, A.; Kunanets, N.; Kut, V. The Analysis of the United States of America Universities Library Information Services with Benchmarking and Pairwise Comparisons Methods. In Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, 5–8 September 2017. [ Google Scholar ] [ CrossRef ]
- Bhama, V.S.; Srividhya, S. Benchmarking for Quality Improvement in Academic Libraries: A Study with Special Reference to Bangalore City College Libraries. Int. J. Econ. Res. 2015 , 12 , 775–785. [ Google Scholar ]
- Leppink, J.; Perez-Fuster, P. We Need More Replication Research-a Case for Test-Retest Reliability. Perspect. Med. Educ. 2017 , 6 , 158–164. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Persson, S.; Svenningsson, M. Librarians as Advocates of Social Media for Researchers: A Social Media Project Initiated by Linköping University Library, Sweden. New Rev. Acad. Libr. 2016 , 22 , 304–314. [ Google Scholar ] [ CrossRef ] [ Green Version ]
Research Context Issues | Contributions |
---|---|
Staff need stronger social media skills to understand users’ engagement with the uploaded content. | Understanding social media analytics and metrics, and the possible intercorrelations between them, will improve staff skills in providing content that results in higher levels of users’ engagement. |
Most current studies examine individually how a single LAM utilizes SMPs to understand and measure users’ engagement with the published content. | Further research is needed to provide a holistic approach and, consequently, a generalizable framework for how SMAs could be utilized to increase users’ engagement and expand awareness of LAM organizations. Such a framework could also serve as a benchmarking process for LAM administrators. |
There is a lack of an SMA methodological framework that exhibits validity, reliability and internal consistency in the variables that measure LAM users’ engagement with the published content. | Suggest an assessment schema that is statistically reliable. This schema will quantitatively measure users’ engagement within an SMP of a LAM. |
Metric Name | Metric Description |
---|---|
Number of Posts | Number of posts published in a specific period. |
Link Posts | Number of posts in URL (link) format published in a specific period. |
Picture Posts | Number of posts in picture format published in a specific period. |
Video Posts | Number of posts in video format published in a specific period. |
Comments per Post | Average number of comments per post in a specific period. |
Number of Reactions | Total number of reactions (like, love, haha, thankful, wow, sad, angry) on posts published in a specific period. |
Reactions per Post | Average number of reactions per post published in a specific period. |
Number of Comments (Total) | Total number of comments, including replies to those comments, on posts published in a specific period. |
Total Reactions, Comments, Shares | Number of reactions of any type (like, love, haha, thankful, wow, sad, angry), comments and shares on posts that the LAM organization published in a specific period. |
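As a rough illustration of how the metrics defined above could be derived from raw post records, the following Python sketch aggregates a list of posts. The `Post` record, its field names and the `summarize` function are hypothetical and do not come from the article. The per-post averages are assumed to be simple totals divided by the number of posts, which may differ from how the analytics tool behind the reported figures computes them.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record for one published post (not from the article)."""
    kind: str        # assumed values: "link", "picture", "video", or anything else
    reactions: int   # like, love, haha, thankful, wow, sad, angry combined
    comments: int    # comments including replies
    shares: int


def summarize(posts):
    """Aggregate the nine metrics for posts published in one period."""
    n = len(posts)
    total_reactions = sum(p.reactions for p in posts)
    total_comments = sum(p.comments for p in posts)
    total_shares = sum(p.shares for p in posts)
    return {
        "number_of_posts": n,
        "link_posts": sum(1 for p in posts if p.kind == "link"),
        "picture_posts": sum(1 for p in posts if p.kind == "picture"),
        "video_posts": sum(1 for p in posts if p.kind == "video"),
        # Assumption: averages are plain totals divided by post count.
        "comments_per_post": total_comments / n if n else 0.0,
        "number_of_reactions": total_reactions,
        "reactions_per_post": total_reactions / n if n else 0.0,
        "number_of_comments_total": total_comments,
        "total_reactions_comments_shares":
            total_reactions + total_comments + total_shares,
    }


# Made-up sample data, for illustration only.
posts = [
    Post("picture", reactions=10, comments=2, shares=1),
    Post("video", reactions=30, comments=4, shares=3),
    Post("link", reactions=5, comments=0, shares=0),
]
metrics = summarize(posts)
print(metrics)
```

Keeping the counts and the averages in one dictionary mirrors the one-row-per-organization layout of the data table, so each organization's summary could be appended as a row.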
Name of LAM | Number of Posts | Link Posts | Picture Posts | Video Posts | Comments per Post | Number of Reactions | Reactions per Post | Number of Comments (Total) | Total Reactions, Comments, Shares |
---|---|---|---|---|---|---|---|---|---|
Denver Art Museum | 29 | 3 | 17 | 3 | 2.52 | 1799 | 78.21 | 58 | 2105 |
National Library of Spain | 19 | 1 | 9 | 5 | 7.25 | 3940 | 246.25 | 116 | 5512 |
National Archives of Georgia | 34 | 3 | 22 | 6 | 1.22 | 1847 | 59.58 | 38 | 2242 |
Administrators Actions | | Users Engagement | |
---|---|---|---|
Variable | Variable Loading | Variable | Variable Loading |
Number of posts | 0.767 | Comments per post | 0.706 |
Link posts | 0.519 | Number of reactions | 0.727 |
Picture posts | 0.667 | Reactions per post | 0.690 |
Video posts | 0.624 | Number of comments (total) | 0.655 |
| | | Total reactions, comments, shares | 0.751 |
Factors | McDonald’s ω | Cronbach’s α | Guttman’s λ-2 | Guttman’s λ-6 |
---|---|---|---|---|
Administrators Actions | 0.967 | 0.748 | 0.847 | 0.917 |
Followers Engagement | 0.915 | 0.648 | 0.889 | 0.934 |
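For readers unfamiliar with the reliability coefficients reported above, the sketch below shows one of them, Cronbach's α, computed with its standard formula α = k/(k−1) · (1 − Σ item variances / variance of the summed score). The function name and the sample data are made up for illustration; the article does not state which software produced its values, and the other coefficients (ω, λ-2, λ-6) are not covered here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    `items` holds one list of observations per variable; all lists
    must have the same length (one entry per observed unit).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Summed score of each observed unit across all items.
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up data: two perfectly correlated items on different scales.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8]])
print(round(alpha, 4))
```

Because α depends on raw variances, items measured on very different scales (as in the made-up data) pull the coefficient below 1 even when they are perfectly correlated, which is one reason several coefficients are often reported side by side.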
Statistic | Number of Posts | Link Posts | Picture Posts | Video Posts |
---|---|---|---|---|
Mean | 27.591 | 3.695 | 19.447 | 4.448 |
Std. Deviation | 23.498 | 6.069 | 17.448 | 8.498 |
Skewness | 2.304 | 4.882 | 2.206 | 4.814 |
Shapiro-Wilk | 0.811 | 0.563 | 0.816 | 0.539 |
Minimum | 1.000 | 1.000 | 1.000 | 1.000 |
Maximum | 158.000 | 56.000 | 114.000 | 75.000 |
Statistic | Comments per Post | Number of Reactions | Reactions per Post | Number of Comments (Total) | Total Reactions, Comments, Shares |
---|---|---|---|---|---|
Mean | 3.562 | 3148.467 | 101.424 | 121.029 | 3890.619 |
Std. Deviation | 6.278 | 5706.804 | 159.020 | 262.374 | 7155.607 |
Skewness | 3.506 | 3.127 | 2.964 | 4.143 | 3.147 |
Shapiro-Wilk | 0.574 | 0.576 | 0.613 | 0.487 | 0.565 |
Minimum | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Maximum | 44.429 | 35,424.000 | 871.000 | 2109.000 | 40,991.000 |
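The large positive skewness values above indicate long right tails: a few organizations post or receive far more than the rest. A minimal Python sketch of a skewness calculation is given below. It uses the moment coefficient g1 = m3 / m2^1.5, which is an assumption; the article does not say which skewness estimator was used, and bias-adjusted variants give somewhat different numbers on small samples. The function name and the sample values are made up.

```python
def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2 ** 1.5."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Second and third central moments (divided by n, not n - 1).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Made-up counts: four small values and one large one give a right tail.
right_tailed = skewness([1, 1, 1, 1, 10])
symmetric = skewness([1, 2, 3])
print(right_tailed, symmetric)
```

A value near zero suggests a roughly symmetric distribution, while values well above zero, as in the tables, suggest that the mean is pulled upward by a small number of very active cases.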
Predictor (dependent variable: Number of Total Reactions, Comments, Shares) | Constant | Coefficient | R | F | p-Value |
---|---|---|---|---|---|
Number of Posts | 785.88 | 122.02 | 0.154 | 38.44 | <0.001 |
Link Posts | | | | | |
Picture Posts | 806.75 | 168.01 | 0.155 | 38.25 | <0.001 |
Video Posts | 2802.88 | 454.38 | 0.122 | 19.16 | <0.001 |
Predictor (dependent variable: Number of Comments Total) | Constant | Coefficient | R | F | p-Value |
---|---|---|---|---|---|
Number of Posts | 9.723 | 4.46 | 0.171 | 35.47 | <0.001 |
Link Posts | | | | | |
Picture Posts | 19.548 | 6.55 | 0.108 | 21.61 | <0.001 |
Video Posts | 70.64 | 16.68 | 0.102 | 22.18 | <0.001 |
Predictor (dependent variable: Number of Reactions) | Constant | Coefficient | R | F | p-Value |
---|---|---|---|---|---|
Number of Posts | 78.87 | 127.26 | 0.163 | 40.66 | <0.001 |
Link Posts | | | | | |
Picture Posts | 170.83 | 164.79 | 0.164 | 38.96 | <0.001 |
Video Posts | 2490.82 | 322.38 | 0.107 | 15.78 | <0.001 |
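To show how the regression tables above could be read, the sketch below plugs a post count into a fitted line of the form prediction = constant + coefficient × predictor. It assumes that the first number reported for each model is the intercept and the second is the slope of a simple one-predictor regression; that reading is inferred from the table layout and is not confirmed by the text. Only the coefficients for total reactions, comments and shares are used, and the names `MODELS` and `predict` are hypothetical.

```python
# (constant, coefficient) pairs as read from the table for the dependent
# variable "Total Reactions, Comments, Shares". The Link Posts row is
# omitted because no values are reported for it.
MODELS = {
    "number_of_posts": (785.88, 122.02),
    "picture_posts": (806.75, 168.01),
    "video_posts": (2802.88, 454.38),
}


def predict(predictor, value):
    """Predicted engagement for one simple linear model."""
    constant, coefficient = MODELS[predictor]
    return constant + coefficient * value


# Example: an organization that published 29 posts in the period.
predicted = predict("number_of_posts", 29)
print(round(predicted, 2))
```

For comparison, the data table lists 2105 total reactions, comments and shares for the Denver Art Museum's 29 posts, well below this prediction. With fit statistics as modest as those reported, individual organizations can sit far from the fitted line, so such predictions are better read as rough tendencies than as forecasts.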
MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Drivas, I.C.; Kouis, D.; Kyriaki-Manessi, D.; Giannakopoulou, F. Social Media Analytics and Metrics for Improving Users Engagement. Knowledge 2022, 2, 225-242. https://doi.org/10.3390/knowledge2020014
The Power of Social Media Analytics
- Communications of the ACM 57(6):74-81
- University of Iowa
- University of Michigan
Information Processing & Management
Social media analytics and business intelligence research: a systematic review.
- A systematic review of social media analytics-based BI studies is presented.
- Social media platforms that are in use in BI research are classified and analyzed.
- Dominant methodologies or algorithms for BI research are classified and analyzed.
- The types of intelligent information that social media-based BI studies identify are analyzed.
- Promising directions of the research stream are outlined for researchers and practitioners.
The fundamental difference between social media analytics and traditional business analytics is that the former draws on near-real-time data rather than structured, historical data (Liere-Netheler et al., 2019). The data are primarily retrieved from what already exists rather than proactively created (Choi et al., 2020b). Thelwall (2018) argues that combining different types of information (e.g., comments, likes, hit counts), triangulating extensively across methods, and studying a phenomenon dynamically partially compensate for the low sample validity of social web data.
89) define social media analytics as "an emerging interdisciplinary research field that aims on combining, extending, and adapting methods for analysis of social media data". The social media analytics process involves four distinct steps: data discovery, collection, preparation, and analysis (Stieglitz et al., 2018).
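The four steps named above (data discovery, collection, preparation, and analysis) can be sketched as a chain of small functions. This is only an illustrative sketch, not the procedure of Stieglitz et al. (2018): the sample posts, the keyword filter, the tokenizer, and every function name are assumptions made for the example, and hashtag counting stands in for whatever analysis a real study would run.

```python
from collections import Counter
import re

# Hypothetical in-memory posts standing in for data returned by a platform.
SAMPLE_POSTS = [
    {"id": 1, "text": "Climate change is real! #climate"},
    {"id": 2, "text": "New phone review: battery life is great"},
    {"id": 3, "text": "Climate policy debate tonight #climate #auspol"},
]


def discover(posts, keyword):
    """Data discovery: keep only posts relevant to the topic of interest."""
    return [p for p in posts if keyword in p["text"].lower()]


def collect(posts):
    """Collection: pull out the raw text that will be stored and analyzed."""
    return [p["text"] for p in posts]


def prepare(texts):
    """Preparation: lowercase and tokenize, keeping hashtags as single tokens."""
    return [re.findall(r"#?\w+", t.lower()) for t in texts]


def analyze(token_lists):
    """Analysis: count how often each hashtag occurs across the prepared posts."""
    return Counter(
        tok for toks in token_lists for tok in toks if tok.startswith("#")
    )


def run_pipeline(posts, keyword):
    """Run the four steps in order on an assumed list of post dictionaries."""
    return analyze(prepare(collect(discover(posts, keyword))))


if __name__ == "__main__":
    print(run_pipeline(SAMPLE_POSTS, "climate"))
```

Keeping each step as its own function means a step can be swapped out (for example, replacing the keyword filter with a different discovery strategy) without touching the others.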
In this paper, we demonstrate how big data analytics meets social media and provide a comprehensive review of big data analytic approaches in social networks, covering studies published between 2013 and August 2020, with 74 papers identified.
However, few research articles consider the steps of social media analytics. Such frameworks take the form of process models. Fan and Gordon (2014) propose a process for social media analytics consisting of three steps: "capture", "understand", and "present". The authors state that the capture step consists of gathering the data and preprocessing it, whereas pertinent information ...
Stieglitz et al. (2014, p. 90) argue that social media analytics is "an approach toward research that involves multiple disciplines of knowledge." The mentioned scholars
This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment ...
The spread and use of social networks provide a rich data source that can be used to answer a wide range of research questions from various disciplines. However, the nature of social media data ...
This journal article focuses on the identification, analysis, and consolidation of social media metrics into small and medium-sized enterprises...
The targets of research in social media analytics are the commercial platforms and their users, and the relevant data reside within the physical and legal domain of the platforms themselves. ... Participants submit papers in which they present their approach, and the organizers summarize the results in a paper. Benchmarking overview.
The findings reveal the current status of social media analytics in marketing research and identify various untapped areas for further research. This paper proposes that the impact of social media analytics is not restricted as a marketing research method; it fosters or amplifies changes in marketing approach, and structure and culture in ...
The paper explores possible reasons for quality issues in the data generated on social media platforms, along with suggested measures to minimize them using the proposed social media data quality framework. ... a recent article on a survey of the social media analytics literature revealed that social media data is a source of high-quality ...
This paper provides a systematic review and analysis of the literature on sentiment analysis in social media. In addition to gaining a comprehensive understanding of the application of sentiment analysis to user-generated data, the paper identifies the challenges and issues in the existing sentiment analysis research.
The paper is organised as follows. We first discuss existing methodological literature in social media analytics, highlighting the shortage of methodological strategies for handling social media data. We go on to posit a (visual analytic) framework that seeks to address this, grounded in an abductive ontological perspective.
Recent advances in internet 2.0 create scope to connect people worldwide using society 2.0 and web 2.0 technologies. This new era allows consumers to connect directly with other individuals, business corporations, and the government. People openly share opinions, views, and ideas on any topic in different formats. This creates the opportunity to make the "Big ...
for Information Systems. Social Media Analytics is an emerging interdisciplinary research field that aims on combining, extending, and adapting methods for analysis of social media data. On the ...
Abstract: Social media analytics has emerged as a powerful tool for extracting valuable insights from the vast amounts of data generated on social media platforms. This paper provides a comprehensive review of social media analytics, highlighting its importance in modern business and society.
It requires substantial manual resources, so the solution is to apply machine learning techniques or to develop social media analytics for business intelligence [117]. Social media analytics reduces the workload and improves the business by doing useful work for companies. In [118], the authors explored the importance of SM analytics. They ...
Millions of people around the world use smartphones to connect to the internet and use social media sites such as Facebook, Instagram, WhatsApp, YouTube, Snapchat, Twitter, LinkedIn, and TikTok. Social networks collect up to trillions of data points, and marketing agents use that data for promotion. We analyze these data ...
rapidly due to market demands and enormous applications. This paper presents a comprehensive review of leading social media analytics tools available for various social networking platforms. A ...
This special issue samples the state of the art in social media analytics and intelligence research that has direct relevance to the AI subfield from either a methodological or domain perspective. Published in: IEEE Intelligent Systems (Volume: 25, Issue: 6, Nov.-Dec. 2010). Pages: 13-16. Date of Publication: 30 December 2010.
Social media platforms can be used as a tool to expand awareness and the consideration of cultural heritage organizations and their activities in the digital world. These platforms produce daily behavioral analytical data that could be exploited by the administrators of libraries, archives and museums (LAMs) to improve users' engagement with the provided published content. There are multiple ...
The Social Media Analytics Process. Social media analytics involves a three-stage process: capture, understand, and present (see Figure 1). The capture stage involves obtaining relevant social ...
This paper has attempted to analyze the applicability of social media in BI research and the recent trend of the research domain. This was done by evaluating the open data of social media as input data and by a systematic review on articles that conducted social media-based BI research.
Abstract. In the few years since the advent of 'Big Data' research, social media analytics has begun to accumulate studies drawing on social media as a resource and tool for research work. Yet, there has been relatively little attention paid to the development of methodologies for handling this kind of data.