Word clouds have become very popular in recent years, and for that reason there are many word cloud generators out there in the wild that offer sophisticated GUIs and let you create jazzy word clouds. You supply the text and configure the style, size, color, shape, output format and much more.
If you are not a programmer and are simply looking for ready-to-use online word cloud tools, then we have an article that lists the 10 best word cloud generators for creating word clouds of all types – best word cloud generator.
If you want to build a complete GUI to generate word clouds, then keep fields on the GUI to capture the configuration details for each of the steps below, and for any further features that you would like to add –
- Load the text file into your program from a local machine or from a web URL (on a GUI, this translates to "select a text file" or "provide a URL")
- Convert the text into structured data; in R you would load the data as a corpus
- Replace the special characters in the text with spaces
- Remove stop words, change case of the words, remove white space (you can give a configuration on the GUI, if planning to create GUI based tool)
- Remove numbers and punctuation from the words
- You might also want to reduce words to their root form; for example, “removed” and “removing” can both be counted as “remove” in the word cloud. In the world of text mining this is called stemming, and there are out-of-the-box packages in Python and R that do it, so no need to worry.
- Remove any specific words that you do not want to include or replace with other words. (on the GUI, ask for word replacements from the user)
- Figure out the weight of each word, that is, the number of times the word occurs in the supplied text. Sort in descending order.
- So, we have the words and the count of occurrence of each word in some sort of a collection, Map or whatever you want to call it in your programming language.
- Assign font size proportional to the count of occurrence of the word, randomly assign colours as well to the words.
- Then you start drawing the words while avoiding collisions, with the centre as the starting point for the top word, and keep spiralling outwards with the rest of the words. Converting each word into a 2D shape is what works best.
- Finally, add some spice: link every placed word to a target URL if you want to, such as a dictionary or thesaurus website.
- You can bring in variations such as purely horizontal placement, vertical placement, placement at a specific angle and whatever else you want. The goal is that the next drawn word doesn’t collide with any other word and sits at some minimum distance from the adjacent words (with whatever padding/spacing you want between the words). This is the most difficult part of the whole game; search around to see how others have approached it.
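The core of the steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not a full generator: the stop-word list is a tiny made-up sample, stemming and colours are left out, the width of a word is guessed as 0.6 × font size per character, and each word is treated as a plain bounding rectangle rather than an exact 2D shape. The function names (`word_weights`, `font_sizes`, `overlaps`, `place_words`) are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real tools ship much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def word_weights(text, stop_words=STOP_WORDS):
    """Lower-case the text, turn non-letters into spaces, drop stop words, count the rest."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return Counter(w for w in cleaned.split() if w not in stop_words)


def font_sizes(counts, min_size=12, max_size=72):
    """Font size proportional to the count, most frequent word first, never below min_size."""
    top = max(counts.values(), default=1)
    return {w: max(min_size, max_size * c / top) for w, c in counts.most_common()}


def overlaps(a, b, padding=2):
    """True if rectangles a and b, each (x, y, width, height), come within `padding` of each other."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax < bx + bw + padding and bx < ax + aw + padding
            and ay < by + bh + padding and by < ay + ah + padding)


def place_words(sizes, step=0.1, growth=1.0, max_steps=20000):
    """Walk each word outwards along an Archimedean spiral until its box is collision-free."""
    placed = {}
    for word, size in sizes.items():
        # Crude text-extent estimate (assumption): 0.6 * font size per character.
        width, height = 0.6 * size * len(word), size
        for i in range(max_steps):
            angle = i * step
            radius = growth * angle  # radius grows linearly with the angle
            box = (radius * math.cos(angle) - width / 2,
                   radius * math.sin(angle) - height / 2,
                   width, height)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[word] = box
                break  # words that never fit within max_steps are skipped
    return placed
```

Calling `place_words(font_sizes(word_weights(text)))` returns a dictionary that maps each word to an `(x, y, width, height)` rectangle, with the most frequent word centred on the origin. A real renderer would measure the text with the actual font and test collisions against the glyph shapes instead of rectangles, which packs the words much more tightly.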
So here is the deal: you do not necessarily have to implement all the above-mentioned steps yourself to create word clouds. There are packages/libraries available that do some or most of them. Let us see what existing code bundles offer, out of the box, in various programming languages –
Python Word Cloud
When it comes to creating word clouds using Python, “word_cloud” is the name of the package (it is published on PyPI as “wordcloud”). You can install it using pip, get it from Anaconda Cloud, or download the package from GitHub and install it manually.
This package was created by Andreas Mueller and is free to use under the MIT license. It is arguably the most sophisticated word cloud package available in Python and is used heavily by developers and students. Its popularity can be gauged from the fact that Reddit cloud is built using this package. Reddit cloud generates word clouds from comments and user histories.
There is also a Twitter word cloud bot that is built using this package and generates clouds for Twitter users. You need to mention the Twitter user with a specific hashtag to generate the word cloud. Check the image below.
There are many examples available on the official website; you can check them here (word cloud examples) and, if needed, download the source code of these fully functional word cloud examples in Python.
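As a quick illustration, here is a minimal sketch of generating a cloud with this package, assuming it has been installed with `pip install wordcloud`. The helper name `make_cloud`, the image dimensions and the default output file name are assumptions chosen for the example; `WordCloud`, `STOPWORDS`, `generate` and `to_file` come from the package itself.

```python
def make_cloud(text, out_path="cloud.png"):
    """Render `text` as a word cloud image and save it to `out_path`."""
    # Imported inside the function so that this definition loads even on a
    # machine where the wordcloud package has not been installed yet.
    from wordcloud import WordCloud, STOPWORDS

    cloud = WordCloud(
        width=800,                 # canvas size in pixels (illustrative values)
        height=400,
        background_color="white",
        max_words=100,             # keep only the most frequent words
        stopwords=STOPWORDS,       # the package's built-in English stop words
    )
    cloud.generate(text)           # tokenizes, counts and lays out the words
    cloud.to_file(out_path)        # writes the rendered image to disk
    return out_path
```

Calling `make_cloud(some_text, "article.png")` with any reasonably long string writes the image and returns its path. The package handles stop-word removal, counting, font sizing and collision-free placement internally, so none of the manual steps listed earlier need to be coded by hand.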
Read more and download - Python word cloud package on Github
Word Cloud R
If R is your game, then “wordcloud” is the main package for creating word clouds in the R programming language. The package depends on “RColorBrewer” and “methods”. You would also want a few other packages, like tm (for text mining) and snowball (for text stemming), to ease the data handling tasks.
The function takes the arguments below to plot the word cloud; check the documentation for more details on types and usage.
- words: the words to be plotted
- freq: frequency of the words, i.e. count of occurrence
- min.freq: minimum count required to plot the word
- max.words: maximum number of words to be plotted
- random.order: plot the words in random order; if FALSE, they are plotted in decreasing frequency
- rot.per: proportion of words to be rotated by 90 degrees (vertical text)
- colors: list of colors, assigned from the least frequent to the most frequent words. Or simply supply only one color to be used for all words.
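Given below is a usage example for quick reference (the argument values are illustrative) -
wordcloud(words, freq, min.freq=3, max.words=Inf,
          random.order=TRUE, random.color=FALSE, rot.per=.1, colors="black")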
Here is a simple step-by-step tutorial to create a word cloud in R – Word Cloud R.
If you are looking for D3 word clouds, then there are ample examples out there that use the D3 library to create stunning word clouds that are flexible as well as responsive. Check out some of those below –
Java Word Cloud Generator
The most famous word cloud generator of all time, Wordle, is written in Java and runs as a Java applet in the browser. Though Wordle has not been updated in a long time and is dying out, you can do a quick search to find its source code.
What I am going to share here is another word cloud implementation in Java, named Kumo. It was created by Kenny Cason and takes a different approach: it returns an image file of the word cloud instead of rendering it in a Java applet.
Key features of this word cloud include variable font sizes, word rotation by providing a start angle, end angle and number of slices, use of custom color palettes and background images, polar word clouds, an option to overlay multiple word clouds, a tokenizer for Chinese words and much more.
Check out the Java word cloud here – Kumo on GitHub.
Word clouds were very popular just a couple of years back, and they are still used heavily in education by students and teachers, by researchers for qualitative data analysis, by marketers for bringing key focus points to the surface, by journalists, by social media analysts and many more.
The biggest advantage of word clouds is that they are easy to use and effective for visual communication and for engaging users. One can quickly grasp the keywords by looking at a word cloud, without having to study the numbers or think too hard.
Though there are many ready-to-use word cloud generators out there in the wild, programmers always try to build something new, with some more features, to showcase creativity and innovation. I hope you liked the article; do share your experience of using the word cloud libraries and packages via comments!