In my job, there is always an exploratory phase in which I need to get a feel for a topic as quickly as possible. There is not always enough time to build a good Python notebook to capture and analyze data. On those occasions, no-code tools can help in the preliminary phase … or beyond.
Let’s take an example: we want to see what kind of keywords appear when we run a first search in the field of Artificial Intelligence, Machine Learning, Deep Learning, etc. It would also be interesting to know whether there are any communities around these topics. If there are, we could follow them and analyze them in greater detail.
That is, we need to connect to the Twitter API, generate a query with those initial keywords, capture the information and separate it into fields. In addition, we need to pre-process it to be able to analyze clusters, find the most relevant keywords or detect the most relevant topics. And this is just the textual part of the problem. If we also want to analyze the network formed by the profiles that publish on these topics, we will have to identify the nodes and edges together with their most relevant attributes.
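To make the "nodes and edges" idea concrete, here is a minimal Python sketch of what that hand-coded step could look like. It is only an illustration under my own assumptions: the sample tweets, the `user`/`text` field names and the helper functions are hypothetical, and it builds a hashtag co-occurrence network rather than whatever the tools build internally.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample data; real tweets would come from the Twitter API.
SAMPLE_TWEETS = [
    {"user": "alice", "text": "Loving #MachineLearning and #AI this week"},
    {"user": "bob", "text": "#AI meets #IoT: new use cases"},
    {"user": "carol", "text": "Intro to #machinelearning with #AI and #IoT"},
]

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the sorted, lowercased, de-duplicated hashtags of a tweet."""
    return sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})


def build_hashtag_graph(tweets):
    """Count hashtag nodes and hashtag co-occurrence edges across tweets."""
    nodes = Counter()
    edges = Counter()
    for tweet in tweets:
        tags = extract_hashtags(tweet["text"])
        nodes.update(tags)
        # Every pair of hashtags in the same tweet forms one undirected edge.
        edges.update(combinations(tags, 2))
    return nodes, edges


nodes, edges = build_hashtag_graph(SAMPLE_TWEETS)
for (source, target), weight in sorted(edges.items()):
    print(source, target, weight)
```

The node counts and the weighted edge list are the kind of table a graph tool expects as input. Even this toy version needs decisions about case, duplicates and edge direction, which is exactly the effort the no-code route saves in a first pass.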
This is where no-code exploratory tools come to our aid. In an earlier post, I already mentioned one of the easiest for me: Orange3. In this case, I will also use Gephi, a long-standing reference in graph analysis. Let me give you an idea of what they can do. I will also leave the resulting workflows available for download. Of course, you will need a Twitter developer account to be able to include your API keys. Let’s start with the setup:
- Download Orange3 and install the Text Mining add-on from the Options menu. You can start exploring Twitter data analysis with their example workflow. After that, we can also add a branch for clustering analysis.
- Download Gephi and install the TwitterStreamingImporter plug-in. If you need some help, you can watch this tutorial on YouTube.
- Set up the import plug-ins with your developer account details to connect to Twitter and start gathering data. In Orange3, be sure to select the correct language in both widgets, Twitter and Preprocess Text, so that the right stop words are removed. In Gephi’s case, it’s just as simple. In any case, I leave you here the official tutorial. I have used «Bernardamus Projection» as Network Logic.
Analysis and results
For the graph network, since the importer works via streaming, I kept it active for 4 hours. I got a network of 4,450 nodes and 31,185 edges. After applying Gephi’s Modularity analysis to the network, the following image shows the closest detected classes. If we take a closer look at the hashtags they represent, we can see that one of them (the orange one) includes «digital transformation», while the blue and red ones are more about «typical» keywords like #artificialintelligence, #machinelearning or #deeplearning but also #datascience, #bigdata (no surprise) or #iot and #hyperautomation (interesting… 🧐).
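For readers curious about what a modularity score measures, the sketch below computes the standard modularity value Q for a partition that is already given. This is my own toy illustration, not Gephi’s code: the function name, the example graph and the community labels are hypothetical, and a real tool also has to search for a good partition, which this sketch does not do.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (c-d).
EDGES = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
TWO_GROUPS = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
ONE_GROUP = {node: 0 for node in TWO_GROUPS}

print(round(modularity(EDGES, TWO_GROUPS), 3))
print(round(modularity(EDGES, ONE_GROUP), 3))
```

Splitting the toy graph into its two triangles scores higher than putting every node in one group, which is the intuition behind the colored classes in the image: nodes in the same class are more densely connected to each other than chance would suggest.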
If we now use Orange3’s natural language capabilities, we can study the clusters based on the distances between the different tweets, as well as identify the most relevant topics using topic modeling with LDA.
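As a rough idea of what "distance between tweets" can mean, here is a small sketch that compares two texts with cosine distance over bag-of-words counts. It is an assumption on my part that this resembles what happens under the hood; the stop-word list, the tokenizer and the function names are deliberately tiny and hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one is language-specific and longer.
STOP_WORDS = {"the", "a", "is", "and", "of", "for", "in", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine_distance(text_a, text_b):
    """Return 1 - cosine similarity of the word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # No usable words: treat the texts as unrelated.
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance("Deep learning is the future", "The future of deep learning"))
print(cosine_distance("Deep learning", "Graph analysis"))
```

Tweets that share most of their meaningful words end up close to 0, and tweets with no words in common end up at 1. A clustering step then groups tweets whose pairwise distances are small.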
Conclusion or next steps
As you can see, the time invested is minimal compared to what it would have cost to program everything from scratch.
Of course, we do not get all the power we would have if we coded it ourselves, but for a preliminary analysis it gives more than acceptable results. In addition, we gradually build up reusable workflows in which only the input parameters have to be changed.
And as if that were not enough, if you want to create your own widgets, Orange3 lets you build and deploy them to customize it to your liking. It is not KNIME, but it is completely free as well as open source.