Author | ISHA5
Translator | Flin
Source | analyticsvidhya
Introduction
From the day I started working on data visualization, I fell in love with it. I have always liked getting useful insights from data.
Before that, I only knew the basic charts, such as bar charts, scatter plots, and histograms, which are built into Tableau and Power BI for data visualization. By working on this every day, I came across many new charts, such as radial charts and waffle charts.
So, out of curiosity, I recently searched for all the chart types used in data visualization, and word clouds caught my attention; I found them very interesting. Until then, word cloud images made me think they were just random pictures with the words arranged randomly. But I was wrong, and that is where it all started. Afterwards, I tried creating word clouds from a small amount of data in Tableau and Power BI. After that worked, I wanted to try it in code, as I had done for bar charts, pie charts, and other charts.
What is a word cloud?
Definition: A word cloud is a powerful visualization technique for text processing. It displays the most frequently used words in bigger, bolder letters and different colors. The smaller a word appears, the less important it is.
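To make the size rule concrete, here is a simplified sketch that counts words and scales each word's font size linearly with its frequency. The helper name `font_sizes` and the 10 to 60 point range are hypothetical choices for illustration; the wordcloud library's own sizing (which also depends on parameters such as `relative_scaling`) is more involved.

```python
from collections import Counter
import re

def font_sizes(text, min_size=10, max_size=60):
    """Hypothetical helper: map each word to a font size proportional to its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    top = max(counts.values())
    # Linear scaling: the most frequent word gets max_size,
    # rarer words shrink toward min_size.
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }

sizes = font_sizes("data model data training data model")
print(sizes)
```

Here "data" (3 occurrences) gets the largest size, "model" (2) a medium size, and "training" (1) the smallest.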
Applications of word clouds
1) Trending hashtags on social media (Instagram, Twitter): Social media platforms around the world track the latest trends, so we can find the tags people use most in their posts.
2) Hot topics in the media: By analyzing news reports, we can find the keywords in headlines, extract the top n topics in highest demand, and get the result we need: the top n trending media topics.
3) Search terms in e-commerce: On e-commerce shopping sites, owners can create word clouds of the most searched items. This way, they can see which goods are in high demand during a given period.
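The "top n topics" idea from the second use case can be sketched with a plain word count over headlines. The headlines and the small stop-word set below are made up for illustration, and `top_keywords` is a hypothetical helper, not part of any library.

```python
from collections import Counter
import re

# Hypothetical stop words to ignore when counting headline keywords
STOP = {"the", "a", "an", "of", "in", "on", "for", "to", "and", "is"}

def top_keywords(headlines, n=3):
    """Return the n most frequent non-stop words across all headlines."""
    counts = Counter(
        word
        for headline in headlines
        for word in re.findall(r"[a-z]+", headline.lower())
        if word not in STOP
    )
    return counts.most_common(n)

headlines = [
    "Elections dominate the news cycle",
    "Markets react to elections",
    "Heatwave and elections in the headlines",
    "Markets rally on tech earnings",
]
print(top_keywords(headlines, n=2))  # [('elections', 3), ('markets', 2)]
```

The same counts are what a word cloud turns into font sizes, so the biggest words in the cloud correspond to the top n keywords here.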
Let's start implementing the word cloud in Python.
First, we need to install the required libraries for use in Jupyter Notebook.
In Python, we will install the third-party wordcloud library. At the Anaconda command prompt, enter the following:
pip install wordcloud
If your Anaconda environment supports conda, enter:
conda install wordcloud
Alternatively, this can be done directly in the notebook itself by adding "!" at the start of the command, like this:
!pip install wordcloud
Now, I'm going to generate a word cloud from the Wikipedia text on a chosen subject. For that, I need the wikipedia library to access the Wikipedia API. It can be installed from the Anaconda command prompt, as shown below:
pip install wikipedia
We also need a few other libraries: numpy, matplotlib, and pandas.
With the libraries installed, let's fetch the Machine learning page from Wikipedia:
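import wikipedia

# Fetch the full "Machine learning" article; auto_suggest=False uses the exact title
result = wikipedia.page("Machine learning", auto_suggest=False)
final_result = result.content  # plain-text content of the whole page
print(final_result)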
Output (Machine learning Wikipedia page):
The image above shows the output we get by retrieving the Machine learning page from Wikipedia. The output can also be scrolled down, which means the entire page content was retrieved.
We can also get a summary of the page with the summary method, for example:
# Fetch only the first 5 sentences of the page summary
result = wikipedia.summary("Machine learning", sentences=5, auto_suggest=False)
print(result)
Here, the sentences parameter lets us retrieve a specific number of sentences.
Output: 5 sentences
Let's create the word cloud:
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Helper that displays a WordCloud image without axes
def plot_cloud(wordcloud):
    plt.figure(figsize=(10, 10))
    plt.imshow(wordcloud)
    plt.axis("off")

# 500x500 pixels, pink background; random_state makes the layout reproducible
wordcloud = WordCloud(width=500, height=500, background_color='pink', random_state=10).generate(final_result)
plot_cloud(wordcloud)
Stop words are words that add little meaning on their own, for example 'is', 'are', 'an', and 'I'.
WordCloud has a built-in stop word list, STOPWORDS, and the library automatically removes these stop words from the text.
Interestingly, since STOPWORDS is a Python set, we can add our own stop words to it with STOPWORDS.add().
The WordCloud constructor sets the width and height, which I set to 500 each, and the background color, which I set to pink. If you don't set random_state, the word cloud will look different every time you run the code. It can be set to any integer value.
From the code above, we get this word cloud:
Looking at the image above, we can see that "machine learning" is the most common term, and other frequently used words are model, task, training, and data. So we can conclude that machine learning is the task of training models on data.
We can also change the background color with the background_color parameter and the font color with the colormap parameter. background_color also accepts a hex color code, whereas colormap takes one of matplotlib's built-in colormap names.
Let's change the background color to turquoise using the hex code #40E0D0, and give the words blue tones with the ocean colormap:
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Helper that displays a WordCloud image without axes
def plot_cloud(wordcloud):
    plt.figure(figsize=(10, 10))
    plt.imshow(wordcloud)
    plt.axis("off")

# Hex code for the background; the "ocean" colormap colors the words
wordcloud = WordCloud(width=500, height=500, background_color='#40E0D0', colormap="ocean", random_state=10).generate(final_result)
plot_cloud(wordcloud)
Here, I specified ocean. If I pass an invalid colormap name, Jupyter throws a ValueError and lists the available colormap options, as shown below:
You can also use the PIL library to load an image and use it as a mask, so the word cloud takes the shape of that image.
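To catch a misspelled colormap before building the cloud, the name can be checked against matplotlib's colormap registry. This sketch assumes matplotlib 3.5 or newer, where `matplotlib.colormaps` is available; `check_colormap` is a hypothetical helper that imitates the ValueError-with-options behavior, not part of wordcloud or matplotlib.

```python
import matplotlib

def check_colormap(name):
    """Hypothetical helper: raise ValueError listing some valid names if `name` is unknown."""
    if name not in matplotlib.colormaps:
        options = ", ".join(sorted(matplotlib.colormaps)[:10])
        raise ValueError(f"{name!r} is not a valid colormap; some options: {options}, ...")
    return name

print(check_colormap("ocean"))  # ocean

try:
    check_colormap("oceann")
except ValueError as err:
    print(err)
```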
Endnote
In this article, we discussed word clouds: their definition, their areas of application, and an example in Python using Jupyter Notebook.
Link to the original article: https://www.analyticsvidhya.c...