This article was co-authored by Sara Marcucci (Research and Project Management Assistant, Nesta Italia) and Delger Enkhbayar (Senior Applied Scientist, Microsoft).
This article presents an experimental methodology that we at Nesta Italia have recently tested with the aim of analysing new trends in the world of social innovation. More specifically, we did so through a process of Twitter data mining that gathers all the tweets published during a specific time frame that satisfy a certain set of criteria. For this first test of the methodology and its potential, the time frame was the month of November.
The first section of this blog post presents the methodological strategy we developed. It was both quantitative and qualitative: in a first step the data was mined automatically, and in a second step principles of digital ethnography were used to select and interpret the data gathered.
Following the illustration of the methodology, findings are reported. More specifically, we divided the findings into three main perspectives, i.e. (1) November context, (2) problems identified in the world of social innovation, (3) solutions identified in the world of social innovation.
Finally, the article focuses on the limitations of the methodology and things we have learned during this first trial, to ultimately illustrate the areas of future research.
Our approach to making use of Twitter as a data source can be summarised in four steps:
1 | Cast a wide net
2 | Enrich and classify
3 | Filter and select
4 | Interpret as a human
1 | Cast a wide net
Source of information
As our source for news and up-to-date information about recent initiatives in the field of social innovation, our methodology uses Twitter.
In particular, we use the Twitter API to programmatically retrieve relevant tweets and conversations taking place on the Twitter platform by defining the parameters, or “keywords”, of interest for the research.
With developer access to the Twitter API we could process up to 500,000 tweets per month, with a backward-looking time window of one week at a time.
With these constraints in mind, we set out to retrieve as large a number of tweets as possible covering the main topics related to social innovation, using a weekly content retrieval routine to span the whole month of November.
To systematically retrieve content from Twitter we used a keyword-based approach, by which all tweets matching a predefined set of words and hashtags would be saved for future processing.
We wanted our keywords to span Social Innovation topics and related methodologies in order to be able to observe what activities and initiatives were taking place as well as potentially identify common threads and emerging trends.
To do this we used two different sets of keywords and retrieved tweets with a match in both:
- Social Innovation keywords: specific expressions and hashtags such as “social innovation”, “social impact”, “#socialinnovation”.
- Methodology keywords: specific words and expressions representing the 11 methodology categories used by the Stanford Social Innovation Review: Advocacy; Collaboration; Design thinking; Governance; Impact investing; Leadership; Measurement and evaluation; Organisational development; Philanthropy and funding; Scaling; Technology.
It must be noted that this approach only selects tweets in English, making the methodology blind to content written in any other language.
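The two-set matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists are abbreviated and partly hypothetical (the full lists are not reproduced in this article), the function names are ours, and plain case-insensitive substring matching stands in for whatever matching the retrieval routine actually applied.

```python
# Hypothetical, abbreviated keyword lists: the complete lists are not given in the article.
SOCIAL_INNOVATION_TERMS = ["social innovation", "social impact", "#socialinnovation"]
METHODOLOGY_TERMS = {
    "Leadership": ["leadership"],
    "Technology": ["technology"],
    "Impact investing": ["impact investing"],
}


def contains_any(text, terms):
    """Return True if any term occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def matched_methodologies(text):
    """Return the methodology categories whose keywords occur in the text."""
    return [name for name, terms in METHODOLOGY_TERMS.items() if contains_any(text, terms)]


def keep_tweet(text):
    """Keep a tweet only if it matches BOTH keyword sets."""
    return contains_any(text, SOCIAL_INNOVATION_TERMS) and bool(matched_methodologies(text))


print(keep_tweet("Social innovation needs better leadership"))  # matches both sets
print(keep_tweet("Five leadership tips for managers"))          # methodology only
```

Requiring a match in both sets keeps the volume manageable, at the cost of missing tweets that discuss social innovation without naming a methodology.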
2 | Enrich and classify
In order to provide a thematic context to the tweets, we assigned them a topic area using the framework of the Sustainable Development Goals (SDGs) of the United Nations' 2030 Agenda. In a nutshell, we tried to assign one or more of the 17 SDGs to every tweet based on the text of the tweet itself.
To do so we proceeded in two steps: keyword matching (again!) and machine learning classification.
First, we made the assignment using a keyword-matching approach, basically checking if a tweet contained one or more keywords associated with each of the 17 SDGs. These keywords were defined starting from the official definitions of the Goals by automatic scraping and manual handpicking.
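As a rough illustration of this first step, the sketch below assigns SDG numbers to a tweet by keyword matching and separates labelled from unlabelled tweets. The keyword dictionary is a hypothetical three-goal excerpt (the real lists, derived from the official Goal definitions, are not shown in this article), and the function names are ours.

```python
# Hypothetical excerpt: only three goals, with made-up keyword lists.
SDG_KEYWORDS = {
    3: ["health", "well-being"],
    4: ["education", "learning"],
    8: ["decent work", "economic growth"],
}


def assign_sdgs(text):
    """Return the sorted list of SDG numbers whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        goal
        for goal, keywords in SDG_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )


def partition(texts):
    """Split tweets into (labelled, unlabelled) according to keyword matching."""
    labelled, unlabelled = [], []
    for text in texts:
        goals = assign_sdgs(text)
        if goals:
            labelled.append((text, goals))
        else:
            unlabelled.append(text)
    return labelled, unlabelled


labelled, unlabelled = partition([
    "Distance learning and mental health during lockdown",
    "New social impact fund announced",
])
print(labelled)
print(unlabelled)
```

A tweet can receive several goals at once, which is why the second step has to be a multi-label classifier.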
Second, we used a machine learning classification algorithm to assign SDGs to the tweets for which no assignment was possible using the first method (the unlabelled tweets). We did this using pre-trained BERT sentence embeddings. In particular, we trained the multi-label classification algorithm on the tweets labelled with the keyword-matching method (with an 80/20 train/test split) and then applied it to the unlabelled tweets, only assigning SDG labels above a certain level of confidence.
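The two mechanical parts of this step, the 80/20 split and the confidence cut-off, can be sketched without any machine learning library. The sketch assumes that some classifier (not shown here) returns a probability per SDG for each tweet; the function names, the fixed random seed and the default threshold of 0.8 are illustrative choices, not a description of the code we actually ran.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the labelled examples and split them into train and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def labels_above_threshold(probabilities, threshold=0.8):
    """Keep only the SDG labels whose predicted probability reaches the threshold.

    `probabilities` maps SDG number -> probability, as a multi-label
    classifier is assumed to produce for one tweet.
    """
    return sorted(goal for goal, p in probabilities.items() if p >= threshold)


train, test = train_test_split(range(10))
print(len(train), len(test))

# Hypothetical classifier output for one unlabelled tweet.
print(labels_above_threshold({3: 0.91, 4: 0.55, 8: 0.80}))
```

A tweet for which no label reaches the threshold simply stays unlabelled, which trades coverage for precision.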
3 | Filter and select
In order to filter and select the tweets to analyse for our research, we applied a series of criteria to the initial set of data gathered through the process explained above.
More specifically, we first eliminated the duplicates of the tweets, which were in fact retweets of the original tweet. Subsequently, we decided to only include the tweets that addressed either an SDG or one of the 11 methodologies illustrated above.
Finally, since the goal of our research was to find the major trends in the world of social innovation, we only selected the tweets that had at least one like. This was based on the assumption that tweets with no likes at all would not be a good representation of social innovation trends. We chose likes rather than retweets as the signal because retweets are generally less common and harder to earn, so filtering the dataset on retweets would have lost important information.
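The selection criteria above amount to a simple filter. The sketch below assumes that duplicates have already been removed and that each tweet is a dictionary with hypothetical fields `sdgs`, `methodologies` and `likes`; these field names are ours, not those of the Twitter API.

```python
def select_tweets(tweets, min_likes=1):
    """Keep tweets that carry at least one SDG or methodology label
    and have received at least `min_likes` likes."""
    selected = []
    for tweet in tweets:
        has_topic = bool(tweet["sdgs"]) or bool(tweet["methodologies"])
        if has_topic and tweet["likes"] >= min_likes:
            selected.append(tweet)
    return selected


# Made-up example records.
sample = [
    {"id": 1, "sdgs": [4], "methodologies": [], "likes": 3},
    {"id": 2, "sdgs": [], "methodologies": ["Leadership"], "likes": 0},
    {"id": 3, "sdgs": [], "methodologies": [], "likes": 12},
]
print([tweet["id"] for tweet in select_tweets(sample)])
```

Keeping the like threshold as a parameter makes it easy to test how sensitive the final selection is to that choice.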
4 | Interpret as a human
Once the initial dataset had been filtered, the remaining tweets were examined on an individual basis. More specifically, we read and analysed both the text and the links included in each tweet.
Because many tweets were only partially interpretable on their own, as they were mainly promoting an externally linked initiative, it was necessary to investigate the context surrounding each tweet. Only in this way were we able to fully understand the meaning of every tweet. During this non-automated interpretation, many tweets were deemed not relevant based on their wider, individual meaning and context.
This process was meant to strengthen our methodology by adding a qualitative approach to the quantitative one that we employed initially, so as to make the analysis as informed as possible.
Results & Findings
Once the tweets were gathered, we had 9,229 of them. In order for the analysis to be as precise and rigorous as possible, we proceeded with a process of selection and filtering of the data available.
Thus, we first cleaned the dataset by eliminating the duplicates of the tweets. About 50% of the tweets were actually retweets that had not been picked up as such in the process of tweet mining, mainly because users had copy-pasted the text without using Twitter’s “retweet” feature. Once we deduplicated, we had 4,522 tweets left to analyse.
Using the keyword-based classifier we could label about 16% of the tweets with one or more SDGs. We then applied the machine learning classifier to the remaining tweets, using the accuracy on the test set (low, ~20%) to inform the setting of the classification threshold (therefore high, ~80%). This left us with 998 tweets, distributed across social innovation methodologies and SDGs, as depicted by the table below.
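Copy-pasted retweets can be caught by comparing a normalised version of the text rather than the raw tweet. The sketch below shows one possible normalisation; it is an assumption about how such duplicates might be detected, not a record of the exact rules we applied. It lowercases the text, strips a manual "RT @user:" prefix, drops links (shortened URLs often differ between copies) and collapses whitespace.

```python
import re


def normalise(text):
    """Reduce a tweet to a comparison key that ignores superficial differences."""
    text = text.lower()
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text)  # manual "RT @user:" prefix
    text = re.sub(r"https?://\S+", "", text)     # links
    text = re.sub(r"\s+", " ", text)             # repeated whitespace
    return text.strip()


def deduplicate(texts):
    """Keep the first occurrence of each normalised text, preserving order."""
    seen = set()
    unique = []
    for text in texts:
        key = normalise(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


# Made-up example tweets.
tweets = [
    "Great #socialinnovation event https://t.co/abc",
    "RT @nesta: Great #socialinnovation event https://t.co/xyz",
    "Another tweet about impact investing",
]
print(len(deduplicate(tweets)))
```

Exact matching on a normalised key misses copies that were lightly edited; a fuzzy similarity measure would catch more of them at the price of occasional false merges.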
Finally, filtering out the tweets with 0 likes, we were left with 576 tweets, which were read and analysed on an individual basis.
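To make the funnel easier to read, the short calculation below expresses each stage as a share of the previous stage and of the initial dataset. The counts are the ones reported in this section; the stage names and the helper function are ours.

```python
# Tweet counts reported above, in the order the filters were applied.
FUNNEL = [
    ("retrieved", 9229),
    ("after deduplication", 4522),
    ("after SDG/methodology classification", 998),
    ("with at least one like", 576),
]


def retention(funnel):
    """For each stage after the first, return (name, % of previous stage, % of first stage)."""
    initial = funnel[0][1]
    rows = []
    for (_, previous), (name, count) in zip(funnel, funnel[1:]):
        rows.append((name, round(100 * count / previous, 1), round(100 * count / initial, 1)))
    return rows


for name, of_previous, of_initial in retention(FUNNEL):
    print(f"{name}: {of_previous}% of previous stage, {of_initial}% of all retrieved tweets")
```

In other words, roughly 6% of the tweets originally retrieved reached the manual reading stage.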
What we found
Once we had the 576 tweets to analyse, we conducted the analysis thematising the data according to three main themes: (1) November context, (2) problems identified, (3) solutions identified. The following section will thus illustrate the findings related to each theme.
As for the context of our research, we identified the tweets that were specifically relevant to the time and context in which they were tweeted. In particular, the month of November proved to be a peculiar time, mainly because of the US elections.
Although the tweets related to the US elections were few (about 10), they were the most liked and retweeted ones. It is important to note that the small number of tweets does not indicate that the US elections were not talked about during the month of November; it only shows that the data mining and content classification process we employed successfully filtered out the tweets that were not specifically related to social innovation.
Once the data about the US elections was excluded, we also filtered out tweets related to Brexit and UK politics, which were not as relevant to our research but came up fairly often.
Both processes were conducted by filtering out the tweets that contained the following keywords: “Elections”; “Presidential”; “Brexit”; “Trump”; “Boris”; “Johnson”; “Cummings”; “Biden”; “Kamala”.
Furthermore, the global pandemic naturally played a crucial role in the analysis of November’s tweets. Hence, we sorted the data according to the following keywords: “Covid-19”; “coronavirus”; “Pandemic”; “social distancing”; “social distance”; “working from home”; “smart working”.
Thus, we found that the dataset contained 65 Covid-19-related tweets. The relatively small number of tweets related to the global pandemic was in fact rather surprising at first, as we know –as illustrated in one of our previous articles– that the health emergency we are going through has played and will continue to play a fundamental role in the world of social innovation.
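Both keyword passes described above (excluding political tweets, then flagging pandemic-related ones) can be sketched with two compiled regular expressions. The keyword lists are the ones quoted in this section; the use of whole-word, case-insensitive matching, the order of the two passes and the function names are assumptions made for the sake of the example.

```python
import re

POLITICS_TERMS = ["Elections", "Presidential", "Brexit", "Trump", "Boris",
                  "Johnson", "Cummings", "Biden", "Kamala"]
COVID_TERMS = ["Covid-19", "coronavirus", "Pandemic", "social distancing",
               "social distance", "working from home", "smart working"]


def compile_terms(terms):
    """Build one case-insensitive pattern that matches any term as a whole word or phrase."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


POLITICS = compile_terms(POLITICS_TERMS)
COVID = compile_terms(COVID_TERMS)


def split_by_context(texts):
    """Drop politics-related tweets, then return (kept, covid_related)."""
    kept = [text for text in texts if not POLITICS.search(text)]
    covid_related = [text for text in kept if COVID.search(text)]
    return kept, covid_related


# Made-up example tweets.
kept, covid_related = split_by_context([
    "Brexit and the future of social innovation",
    "Leadership lessons from the pandemic",
    "Impact investing round-up",
])
print(len(kept), covid_related)
```

Whole-word matching means that a term such as “Elections” does not match the singular “election”, which illustrates how easily a fixed keyword list can miss tweets that a human reader would recognise.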
In fact, once the tweets were individually analysed, it was easy to see that the pandemic was often implied in ways that could not be detected by keywords. This showed how mixed methods, which combine automated, quantitative analysis with ethnographic, qualitative analysis, make it possible to gain a much wider and more reliable picture.
Problems identified in the world of Social Innovation
Following the analysis of the November context, which is useful for framing the wider set of tweets gathered, we decided to investigate the major issues discussed with respect to social innovation. We identified the problems through the SDG mentions, based on the assumption that each SDG implies the problem that needs to be addressed (for instance, putting it plainly, SDG objective: peace / problem implied: war).
From the tweets analysed, it seems that the SDG that has been most discussed during the month of November is the 4th, aiming to achieve quality education, followed by the 8th – striving to build decent work environments and economic growth – and the 3rd – that aims to give good health and well-being to all. Given the pandemic, it seems to make sense for the 3rd goal to be among the most discussed objectives/problems.
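A ranking of this kind comes from counting how many tweets mention each goal. The sketch below does this for multi-label data, counting each goal at most once per tweet; the sample labels are made up for illustration and do not reproduce our dataset.

```python
from collections import Counter


def rank_labels(tweet_labels):
    """Count in how many tweets each label appears and return labels by descending count."""
    counts = Counter(
        label
        for labels in tweet_labels
        for label in set(labels)  # count a label once per tweet
    )
    return counts.most_common()


# Hypothetical SDG assignments for four tweets.
print(rank_labels([[4, 3], [4], [8, 4], [3]]))
```

The same function can be applied to the methodology labels to see which of the 11 categories are most and least addressed.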
However, it is very interesting to see how the Covid-19 emergency seems to be conceptualised holistically, as a social issue rather than simply a health-related one. Indeed, education and work conditions seem to be just as important, if not more.
Given how the pandemic has had fundamental consequences for both of those goals, this is certainly an aspect that seems worth examining more closely in our future research.
Solutions identified in the world of Social Innovation
In addition to analysing the most discussed issues regarding social innovation, we also investigated which solutions were the most popular in our dataset. We identified these through the 11 methodologies, on the assumption that they are the tools deemed capable of helping to reach the 17 SDGs.
On the one hand, from the tweets analysed, it seems that the methodologies most addressed are Leadership and Technology. On the other hand, Measurement and evaluation, together with Philanthropy and funding and Organisational development are the least considered.
As for Leadership, we found that 228 tweets advocated for the need to build stronger systems of leadership. Leadership seems to have been addressed both in terms of (1) political leadership, often referring to the global pandemic and its management across the world (bearing in mind that, as previously specified, the gathered tweets were only in English and thus refer to a limited number of English-speaking contexts), and (2) social entrepreneurial leadership.
Naturally, the latter is the most interesting to our research. The analysis showed that tweets predominantly promoted leadership training programmes and advocated for the need to teach young people and minority groups leadership skills to further develop and apply in the realm of social innovation and entrepreneurship. This is certainly a trend that seems worth continuing to examine in the long term and in future research.
As for Technology, the tweets analysed overwhelmingly addressed, on the one hand, the potential technology has to give people an opportunity to get an education and, on the other, the opportunities digital tools provide in terms of health and well-being. As we have seen in the previous section, both education and health were among the most discussed goals.
Interestingly, most tweets related to both of those themes referred to the need to include marginalised groups and minorities in the process of social innovation and digitalisation, highlighting how the pandemic, and with it distance learning and health-related necessities, has made such a need ever more urgent.
As for the least discussed methodologies, these were (a) Measurement and evaluation, (b) Philanthropy and funding and (c) Organisational development. It is interesting to note that, although the quantitative methods employed here ranked them as the least discussed, both (b) and (c) were in fact implicitly addressed in many of the tweets that were gathered.
Indeed, the tweets related to both Leadership and Technology did address matters such as ways to help nonprofit organisations and leaders raise funds and make their workings more efficient, and to help fundraisers and donors contribute more effectively.
On the other hand, however, Measurement and evaluation was indeed hardly discussed at all. In that regard, it may be interesting to read the recently published article by the Stanford Social Innovation Review: “Ten Reasons Not to Measure Impact—and What to Do Instead”.
Limitations & future research
Considering the experimental nature of this methodology, it seems worth conducting a discussion that is highly focused on the things we have learned through the process and the steps we plan on taking from here.
Limitations and things we learned
The large number of tweets gathered highlighted how our research question (i.e. “What are the trends in the world of social innovation?”) seems too broad for the analysis to be sufficiently exhaustive and satisfying.
Indeed, the tweets vary so widely that distilling them into a few major trends inevitably sacrifices variety. We risk losing many of the meanings in the data, and thus being less truthful to what is actually happening, for the sake of finding patterns and imposing order on an intrinsically unordered environment.
We believe a more precise research question will strengthen the methodology and allow the research to be explicitly positioned and accountable, ultimately making it possible to demonstrate why particular conclusions were drawn from the study.
Future areas of research
Our findings have pointed out a series of elements that seem worth examining further. Among these, we have identified three major topics that seem to align closely with Nesta Italia’s interests and thus represent important themes for future research:
- How the Covid-19 pandemic is being holistically conceptualised, and thus how the health emergency is in fact a social and welfare emergency;
- How leadership training is ever more needed for minorities and young people to be included in the so-called “post-pandemic world”;
- How technology in the education and health sectors relates closely to issues of social inclusion, and how the pandemic, rather than being “the great equaliser” as many thought it would be, has exacerbated and exposed fundamental exclusions.
These themes allow us to start narrowing down our research question, so as to strengthen our research. More specifically, we think the secondary research questions that give direction to the main research question could be:
What do you think? What should we further investigate? Let us know which secondary research question you think is the most important for us to get a clearer understanding of the current trends in the world of social innovation!