How to Create a Word Cloud: Program in Python, R and JavaScript

Word clouds have become very popular in recent years, and as a result there are many word cloud generators available that offer a sophisticated GUI and let you create jazzy word clouds. You supply the text and configure style, size, color, shape, output format and much more.

If you are not a programmer and are simply looking for ready-to-use online word cloud tools, we have an article that lists the 10 best word cloud generators for creating word clouds of all types – best word cloud generator.

Read on if you are a programmer and plan to build your own Python word cloud, JavaScript word cloud or word cloud in any other programming language.

We will look at existing packages in Python, R, JavaScript and Java that can be used, out of the box, to create word clouds. We will also talk about how a word cloud tool can be built from the ground up. The focus is on the concepts only; you do the coding in the programming language of your choice, and we will refer you to the right tutorials.

How to Create Word Cloud in Python, JavaScript and other programming languages?

So, you have decided to create a word cloud, or rather to develop a piece of code that creates word clouds. Let us start with the basic steps to follow before we get into the available packages and code in R, Python and JavaScript.

You can skip the steps below and go straight to the Python word cloud package or the JavaScript word cloud library; there is a word cloud module in R as well.

If you want to build a complete GUI to generate word clouds, then keep fields on the GUI to capture the configuration details for each of the steps below, and for any other features that you would like to add –

  1. Load the text file into your program from the local machine or from a web URL. (On a GUI, this translates to selecting a text file or providing a URL.)
  2. Convert the text into structured data; in R you would load the data as a corpus.
  3. Remove special characters from the text, replacing them with spaces.
  4. Remove stop words, normalize the case of the words and strip extra white space. (If you are planning a GUI-based tool, make these configurable on the GUI.)
  5. Remove numbers and punctuation from the words.
  6. You might also want to reduce words to their root form; for example, “removed” and “removing” can both be counted as “remove” in the word cloud. In text mining this is called stemming, and there are out-of-the-box packages in Python and R to do it.
  7. Remove any specific words that you do not want to include, or replace them with other words. (On the GUI, ask the user for word replacements.)
  8. Work out the weight of each word, which is the number of times it occurs in the supplied text. Sort in descending order.
  9. You now have the words and the count of each word in some sort of collection: a map, a dictionary or whatever it is called in your programming language.
  10. Assign each word a font size proportional to its count, and assign colours to the words at random.
  11. Start drawing the words while avoiding collisions: place the top word at the centre, then keep spiralling outwards with the rest of the words. Treating each word as a 2D shape for the collision checks works best.
  12. Add some spice: link every placed word to a target URL if you want to, such as a dictionary or thesaurus website.
  13. You can bring in variations such as horizontal-only placement, vertical placement or placement at specific angles. The goal is that the next word drawn does not collide with any other word and keeps some minimum distance from adjacent words (with whatever padding you want between them). This is the most difficult part of the whole exercise; search online for word cloud layout algorithms to see how others have approached it.
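The core of the steps above (cleaning, counting, sizing and spiral placement) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names (word_weights, font_sizes, place_words, overlaps), the tiny stop-word list, the linear font scaling, the guess that a character is about 0.6 times the font size wide, and the spiral constants are all made up for illustration. It skips stemming, handles ASCII letters only, and treats each word as a plain rectangle rather than its true glyph outline.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real tools use fuller lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for"}


def word_weights(text, stop_words=STOP_WORDS, replacements=None):
    """Steps 3-9: clean the text and count how often each word occurs."""
    replacements = replacements or {}
    # Lower-case, then keep only runs of ASCII letters; this drops digits,
    # punctuation and other special characters in one pass.
    words = re.findall(r"[a-z]+", text.lower())
    words = [replacements.get(w, w) for w in words if w not in stop_words]
    return Counter(words)


def font_sizes(weights, min_size=12, max_size=72):
    """Step 10: scale each count linearly into [min_size, max_size]."""
    if not weights:
        return {}
    top, low = max(weights.values()), min(weights.values())
    span = (top - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in weights.items()
    }


def overlaps(a, b, padding=2):
    """True if rectangles a and b, given as (x, y, w, h), are closer than padding."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax < bx + bw + padding and bx < ax + aw + padding and
            ay < by + bh + padding and by < ay + ah + padding)


def place_words(sizes, max_steps=5000):
    """Step 11: biggest word at the centre, then spiral outwards for the rest."""
    placed = {}
    for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        # Crude bounding box: assumes each character is about 0.6 * size wide.
        w, h = 0.6 * size * len(word), size
        for step in range(max_steps):
            angle = 0.1 * step
            radius = 0.5 * angle  # Archimedean spiral: radius grows with angle
            box = (radius * math.cos(angle) - w / 2,
                   radius * math.sin(angle) - h / 2, w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[word] = box
                break  # words that never fit within max_steps are skipped
    return placed
```

Calling place_words(font_sizes(word_weights(text))) returns a dictionary that maps each placed word to its rectangle, which you can then hand to whatever drawing library you prefer. Swapping the rectangle test for a pixel-level mask is what lets real libraries pack words more tightly.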

So here is the deal: you do not necessarily have to follow all of the above steps to create word clouds. There are packages and libraries available that handle some or most of them. Let us see what the existing libraries offer, out of the box, in various programming languages –

Python Word Cloud

When it comes to creating word clouds using Python, “word_cloud” is the name of the project on GitHub; the package itself is published as “wordcloud”, and you can install it using pip or from Anaconda Cloud, or download the source from GitHub and install it manually.

This package was created by Andreas Mueller and is free to use under the MIT license. It is arguably the most sophisticated word cloud package available in Python and is used heavily by developers and students. Its popularity can be gauged from the fact that Reddit cloud, which generates word clouds from comments and user histories, is built using this package.

There is also a Twitter word cloud bot built using this package that generates clouds for Twitter users. You mention the Twitter user with a specific hashtag to generate the word cloud.

There are many examples available on the official website; you can check those here (word cloud examples) and, if needed, download the source code of these fully functional Python word cloud examples.

Read more and download - Python word cloud package on Github

Word Cloud R

If R is your game, then “wordcloud” is the main package for creating word clouds in the R programming language. The package depends on “RColorBrewer” and “methods”. You will also want a few other packages, such as tm (for text mining) and SnowballC (for text stemming), to make the data handling tasks easier.

The wordcloud function takes the arguments below to plot the word cloud; check the documentation for details on types and usage.

  • words: the words to be plotted
  • freq: the frequency of each word, that is, its count of occurrences
  • min.freq: the minimum count required for a word to be plotted
  • max.words: the maximum number of words to be plotted
  • random.order: plot the words in random order; if false, they are plotted in decreasing frequency
  • rot.per: the proportion of words to be rotated by 90 degrees (drawn vertically)
  • colors: a list of colors, applied from the least frequent to the most frequent words; or simply supply a single color to be used for all words

Given below is the usage example for quick reference -

wordcloud(words,freq,scale=c(4,.5),min.freq=3,max.words=Inf,
            random.order=TRUE, random.color=FALSE, rot.per=.1,
            colors="black",ordered.colors=FALSE,use.r.layout=FALSE,
            fixed.asp=TRUE, ...)

Here is a simple step-by-step tutorial to create a word cloud in R – Word Cloud R.

JavaScript Word Cloud

When it comes to JavaScript, there are many options for creating word clouds, but the most popular is the one from Jason Davies. The library is inspired by Wordle and uses sprite masks and HTML5 canvas to achieve near-interactive speeds. For plotting the words, it uses the D3 library as well.

Check out the word cloud JavaScript library on GitHub – Jason Davies Word Cloud

If you are looking for D3 word clouds, there are ample examples out there that use the D3 library to create stunning word clouds that are flexible as well as responsive. Check out some of those below –

D3 Word Cloud implementation – This is the fully functional code of a word cloud built using D3; it also uses the JavaScript word cloud library created by Jason Davies.

D3.js Responsive word cloud – This is another good word cloud implementation, created by Julien Renaux. It is a simplified wrapper around Jason Davies’s library that takes only two arguments and makes the creation of word clouds much easier. So, you get a module built with D3, JavaScript and SVG.

We have one for Angular lovers as well: you can download this npm package, which uses D3, Angular and JavaScript; its layouts come from the Davies word cloud library.

Java Word Cloud Generator

The most famous word cloud of all time, Wordle, is written in Java and runs as a Java applet in the browser. Wordle has not been updated for a long time and is fading out, though you can do a quick search to look for its source code.

What I am going to share here is another word cloud implementation in Java, named Kumo. It was created by Kenny Cason and takes a different approach: it returns an image file of the word cloud instead of rendering it in a Java applet.

Key features of this word cloud include variable font sizes, word rotation controlled by a start angle, an end angle and a number of slices, custom color palettes and background images, polar word clouds, an option to overlay multiple word clouds, a tokenizer for Chinese words and much more.

Check out the Java word cloud here – Kumo at GitHub.

Conclusion

Word clouds were very popular just a couple of years back, and they are still used heavily in education by students and teachers, by researchers for qualitative data analysis, by marketers for bringing key focus points to the surface, and by journalists, social media analysts and many more.

The biggest advantage of word clouds is that they are easy to use and effective for visual communication and for engaging users. One can quickly pick out the keywords by looking at a word cloud, without having to study the numbers.

Though there are many ready-to-use word cloud generators out there, programmers always try to build something new, with more features, to showcase creativity and innovation. I hope you liked the article; do share your experience of using these word cloud libraries and packages in the comments!

About The Author: noeticsunil

Sunil is the founder and contributing editor at noeticforce.com. He writes about anything and everything that makes modern mobile apps, web apps and websites possible. He is passionate about coding in any language, including Python, Swift, JavaScript, PHP, Java, Android and iOS development, not excluding CSS/HTML.
