Creating a word cloud or tag cloud in Python

2020-11-08 18:35:00 Artificial intelligence meets pioneer

Author | ISHA5
Compiled by | Flin
Source | analyticsvidhya

Introduction

From the day I started working on data visualization, I fell in love with it. I have always liked drawing useful insights from data.

Before that, I only knew the basic charts, such as bar charts, scatter plots, and histograms, which are built into Tableau and Power BI for data visualization. Working on these tasks every day, I came across many new chart types, such as radial gauges and waffle charts.

So, out of curiosity, I recently searched for all the chart types used in data visualization, and the word cloud caught my attention. I found it very interesting. Until then, whenever I saw a word cloud image I assumed it was just a random picture with the words arranged at random, but I was wrong, and that is where it all started. After that, I tried building word clouds from small amounts of data in Tableau and Power BI. Once that worked, I wanted to try it in code, the way I had coded bar charts, pie charts, and other charts.

What is a word cloud?

Definition: A word cloud is a powerful visualization technique for text processing. It shows the most frequently used words in larger, bolder letters and in different colors. The smaller a word appears, the less important it is.
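To make the link between frequency and size concrete, here is a minimal sketch in plain Python. It assumes a simple linear scale between a minimum and a maximum font size; the font_sizes helper and its size limits are hypothetical and are not how the wordcloud library computes sizes internally.

```python
from collections import Counter
import re

def font_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size that grows linearly with its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    low = min(counts.values())
    span = max(top - low, 1)  # avoid division by zero when all counts are equal
    return {
        word: min_size + (max_size - min_size) * (count - low) / span
        for word, count in counts.items()
    }

sizes = font_sizes("data model data training data model")
print(sizes)  # {'data': 60.0, 'model': 35.0, 'training': 10.0}
```

The most frequent word gets the largest size and the rarest word the smallest, which is the visual rule a word cloud follows.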

Uses of tag clouds

1) Trending tags on social media (Instagram, Twitter): Social media around the world is always chasing the latest trends, so we can find the tags people use most in their posts.

2) Hot topics in the media: By analyzing news articles, we can find the keywords in the headlines and extract the top n topics in demand, which gives us the top n hot media topics.

3) Search terms in e-commerce: On e-commerce shopping sites, the site owner can build a word cloud of the most-searched items and so learn which products are in greatest demand in a given period.
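The first use case above, finding the most-used tags, can be sketched in a few lines. This is only an illustration: the top_hashtags helper and the sample posts are made up, and real social media data would need to be collected separately.

```python
from collections import Counter
import re

def top_hashtags(posts, n=3):
    """Return the n most frequent hashtags across a list of post texts."""
    tags = []
    for post in posts:
        # Lowercase each tag so "#DataViz" and "#dataviz" count as the same tag
        tags.extend(tag.lower() for tag in re.findall(r"#\w+", post))
    return Counter(tags).most_common(n)

posts = [
    "Loving the new release #Python #DataViz",
    "Word clouds are fun #dataviz",
    "#python tips for beginners #DataViz",
]
print(top_hashtags(posts, n=2))  # [('#dataviz', 3), ('#python', 2)]
```

The same counting idea applies to headline keywords or search terms: once you have the counts, the word cloud is just a way of drawing them.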

Let's implement a word cloud in Python

First, we need to install all the required libraries for use in Jupyter Notebook.

In Python, we will install the wordcloud library (a third-party package, not part of the standard library). At the Anaconda command prompt, enter the following command:

pip install wordcloud

If your Anaconda environment supports conda, enter:

conda install -c conda-forge wordcloud

Alternatively, this can be done directly in the notebook itself by adding "!" at the start of the command, like this:

!pip install wordcloud

Here, I am going to generate a word cloud from the text of a Wikipedia article on any topic. For that, I need the wikipedia library to access the Wikipedia API, which can be installed from the Anaconda command prompt as shown below:

pip install wikipedia

We also need a few other libraries: numpy, matplotlib, and pandas.

At this point, all the libraries we need are installed.
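If you want to confirm that before going further, one option is a quick check with the standard library. This is a small sketch, not part of the original walkthrough; the missing_modules helper is a hypothetical name, and it only checks that each module can be found, not that it imports cleanly.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["wordcloud", "wikipedia", "numpy", "matplotlib", "pandas"]
missing = missing_modules(required)
if missing:
    print("Still to install:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Anything it reports as missing can be installed with pip as shown above.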

import wikipedia

# Fetch the Wikipedia page on machine learning and keep its full text
result = wikipedia.page("MachineLearning")
final_result = result.content
print(final_result)

Output for the machine learning Wikipedia page:

The image above shows the output obtained by retrieving the Wikipedia machine learning page. We can also see that the output scrolls down, which means the entire page was retrieved.

Here, we can also get just a summary of the page with the summary method, for example:

result = wikipedia.summary("MachineLearning", sentences=5)
print(result)

The sentences parameter lets us retrieve a specific number of sentences.

Output of 5 sentences:

Let's create the word cloud

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

def plot_cloud(wordcloud):
    # Draw the word cloud image on a 10x10 inch figure with the axes hidden
    plt.figure(figsize=(10, 10))
    plt.imshow(wordcloud)
    plt.axis("off")

# Build a 500x500 word cloud from the page text; random_state fixes the layout
wordcloud = WordCloud(width=500, height=500, background_color='pink', random_state=10).generate(final_result)
plot_cloud(wordcloud)

Stop words are words that carry little meaning on their own, for example 'is', 'are', 'an', and 'I'.

WordCloud comes with a built-in stop word list (STOPWORDS), and the library automatically removes those stop words from the text.

Interestingly, we can add our own stop words in Python with the STOPWORDS.add() function.
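To show what stop word removal does to the counts behind a word cloud, here is a minimal sketch in plain Python. The tiny BASE_STOP_WORDS set and the count_words helper are hypothetical stand-ins for illustration; they are not the wordcloud library's own list or implementation.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stop list for illustration only
BASE_STOP_WORDS = {"is", "are", "an", "i", "a", "the", "of"}

def count_words(text, stop_words):
    """Count the words in `text`, skipping anything in `stop_words`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)

text = "Machine learning is the study of algorithms. Learning is a process."

stop_words = set(BASE_STOP_WORDS)  # copy so the base list stays unchanged
before = count_words(text, stop_words)
print(before["learning"], before["is"])  # 2 0

stop_words.add("learning")  # add a custom stop word
after = count_words(text, stop_words)
print(after["learning"])  # 0
```

With the wordcloud library itself, a customized set like this can be passed to WordCloud through its stopwords argument.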

The WordCloud constructor sets the width and height, both of which I set to 500, and the background color, which I set to pink. If you do not set random_state, the word cloud will look different every time you run the code. It can be set to any int value.

The code above produces this word cloud:

Looking at the image above, we can see that "machine learning" is the most frequently used term, and other frequently used words include model, task, training, and data. From this we can conclude that machine learning is the task of training a model on data.

We can also change the background color here with the background_color parameter and the font colors with the colormap parameter. The background color also accepts a hex color code, whereas colormap takes one of a set of built-in named color maps.
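If hex color codes are unfamiliar, this small sketch shows how one breaks down into red, green, and blue values. The hex_to_rgb helper is a hypothetical name used for illustration; WordCloud accepts the hex string directly, so this conversion is not needed in practice.

```python
def hex_to_rgb(code):
    """Convert a '#RRGGBB' hex color code to an (R, G, B) tuple of ints 0-255."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a color code of the form #RRGGBB")
    # Each pair of hex digits is one channel: red, then green, then blue
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#40E0D0"))  # (64, 224, 208)
```

A low red value with high green and blue values is what gives #40E0D0 its turquoise look.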

Let's change the background color to turquoise, a shade of cyan, using a hex code, and change the font colors to the blues and greens of the ocean colormap:

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

def plot_cloud(wordcloud):
    # Draw the word cloud image on a 10x10 inch figure with the axes hidden
    plt.figure(figsize=(10, 10))
    plt.imshow(wordcloud)
    plt.axis("off")

# Same cloud as before, with a hex background color and the "ocean" colormap
wordcloud = WordCloud(width=500, height=500, background_color='#40E0D0', colormap="ocean", random_state=10).generate(final_result)
plot_cloud(wordcloud)

Here, I specified ocean. If I pass an invalid colormap name, the code raises a ValueError in Jupyter that lists the available colormap options, as shown below:

You can also use the PIL library to draw the word cloud in the shape of any image.

Endnote

In this article, we discussed word clouds: their definition, their areas of application, and a Python example in Jupyter Notebook.

Original article: https://www.analyticsvidhya.com/blog/2020/10/word-cloud-or-tag-cloud-in-python/


Copyright notice
This article was created by [Artificial intelligence meets pioneer]. Please include a link to the original when reposting. Thank you.