Practical guide | A roundup of tools commonly used by data journalists
2022-06-24 04:54:00 【Octopus big data】
The arrival of the big data era has visibly changed every aspect of people's lives. Data journalism, which grows out of that data, has become a new vehicle for news: its ability to describe, judge and predict makes information quicker and more convenient for readers.
But producing data journalism also places higher demands on news teams. Beyond basic professional skills such as writing, investigating, interpreting data and drawing charts, journalists also have to learn to work closely with programmers, data analysts and web developers. If you can use tools flexibly in your work, many problems become easy to solve.
Xiao Ba has sorted out some tools commonly used by data journalists in three areas: data collection, data analysis and data visualization. Bookmark them now!
01. Data acquisition tools
Data collection (data scraping), also known as data capture or web scraping, uses computer programs to collect text and data from web pages and organize them into a format that is convenient for analysis. The most common approach is to write a "crawler" program in R or Python. Alternatively, existing collection software can gather the required web data without any programming background.
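To make the idea of "collecting data from a web page and organizing it into an analysis-friendly format" concrete, here is a minimal sketch in Python using only the standard library. The sample HTML, the class name `TableParser` and the function `table_to_csv` are made up for illustration; a real crawler would first download the page and would need to handle far messier markup.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up sample page; a real crawler would download this HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>City A</td><td>100</td></tr>
  <tr><td>City B</td><td>200</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being read
        self._in_cell = False  # True while inside <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def table_to_csv(html):
    """Turn the table cells found in `html` into CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


print(table_to_csv(SAMPLE_HTML))
```

The output is plain CSV, which can be opened directly in a spreadsheet or loaded into an analysis tool.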
1. Octopus Collector
Octopus Collector is well suited to novices. It is simple and easy to use, so you can get started within a few minutes. To make things easier still, Octopus offers beginners "simple templates" for websites, covering most mainstream websites on the market. With a simple template, users can collect data without configuring a task. If you want to scrape a website that has no template, the official website also provides very detailed illustrated and video tutorials. You can also schedule cloud collection to obtain dynamic data in near real time and regularly export data to a database or any third-party platform.
2. Scrapinghub
If you want to scrape data from websites outside China, consider Scrapinghub. Scrapinghub is a cloud crawler platform built on the Python Scrapy framework. It is one of the more complex and powerful web scraping platforms on the market, and the company also provides data scraping solutions.
3. WebScraper
WebScraper is an excellent browser plug-in from overseas and another visual tool suitable for novices. You simply set up a few scraping rules and leave the rest to the browser.
4. Import.io
Import.io is a web-based data scraping tool. It was first launched in London in 2012. Since then, Import.io has shifted its business model from B2C to B2B. In 2019, Import.io acquired Connotate and became a web data integration platform. With its wide range of web data services, Import.io is a good choice for business analysis.
5. ParseHub
ParseHub is a web-based crawler. It can collect data from pages built with AJAX and JavaScript, and from pages that require login. It offers a one-week free trial.
6. Mozenda
Mozenda is a web crawler that also provides customized services for commercial data scraping. Users can scrape data with its cloud service or its local software, and host the data there.
02. Data analysis tools
1. Excel
Even after all these years, Excel remains a classic data processing tool. Now that all kinds of advanced data analysis software are popular, most data analysis projects can still be handled in Excel, and it is easier to learn. Excel supports important functions such as summarizing, visualizing and cleaning data. No matter how many data analysis tools you know, you should be familiar with Excel. For simple logical analysis and small data sets, Excel fully meets data cleaning needs, and it can also carry out simple data mining with classification, clustering, association and prediction algorithms.
2. Tableau Public
Tableau is an interactive data visualization tool with a rich visualization library, and it is easy to operate. Unlike most visualization tools that require scripting, Tableau is easy for beginners to pick up. It works like a huge PivotTable with an interactive visual dashboard: you drag and drop data fields to analyze data visually. It also offers a "Starter Kit" and plenty of training materials to help users create more analysis reports.
3. Power BI
Power BI is a suite of business analytics tools for delivering insights across an organization. It can connect to hundreds of data sources, simplify data preparation and support ad hoc analysis. You can generate attractive reports and publish them for the organization to use on the web and on mobile devices. Everyone can create a personalized dashboard for a comprehensive, unique view of their business. It scales across the enterprise, with built-in governance and security.
4. FineBI
FineBI is a new generation of self-service business intelligence product for big data analysis. It provides a complete solution covering data preparation, self-service data processing, data analysis and mining, and data visualization. Using FineBI feels similar to using Tableau: both advocate visual, exploratory analysis, a bit like an enhanced PivotTable. It is easy to get started with and has a rich visualization library. It can act as a portal for data reports or as a platform for business analysis.
5. QlikView
QlikView is one of the most popular business intelligence tools in the world. It has excellent data analysis and visualization functions and is easy to operate. At the data processing level, a few clicks are enough to delete duplicate rows, replace empty values, trim data, desensitize data, convert types and so on. QlikView lets users browse data with one click: the system automatically matches the most appropriate chart to the data in the database, helping users get a first understanding of the patterns in the data, and secondary analysis can then build on that visual overview. Many chart types are available. All charts can be linked without any configuration, or you can select only some charts to take part in linked drill-down. It also supports one-click selection of statistical methods.
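For readers who prefer code, the cleaning steps listed above (removing duplicate rows, replacing empty values, desensitizing data and converting types) can be sketched in a few lines of Python. This is only an illustration of the operations, not how QlikView implements them; the function `clean_rows`, the field names and the sample records are all made up.

```python
def clean_rows(rows, default_age=0):
    """Deduplicate rows, fill empty values, convert types and mask names."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:  # delete exact duplicate rows
            continue
        seen.add(key)
        name = row["name"]
        age_text = row["age"].strip()
        cleaned.append({
            # Desensitize: keep the first character, mask the rest.
            "name": name[:1] + "*" * (len(name) - 1),
            # Replace empty values with a default, then convert text to int.
            "age": int(age_text) if age_text else default_age,
        })
    return cleaned


# Made-up sample records with one duplicate and one empty value.
raw = [
    {"name": "Alice", "age": "30"},
    {"name": "Alice", "age": "30"},
    {"name": "Bob", "age": ""},
]
print(clean_rows(raw))
```

Whether a default of 0 is a sensible replacement for a missing age depends on the analysis; in practice it is often safer to keep missing values explicitly marked.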
6. Trifacta
Trifacta is a data wrangling tool that rethinks the traditional approach to data cleaning. Data processing in Excel is sometimes limited by the size of the data, but Trifacta has no such concern and can safely be used on very large data sets. In addition, features such as chart recommendations, built-in out-of-the-box algorithms and analytical insights make it very convenient to generate data analysis reports.
7. RapidMiner
This tool is more than a data cleaning tool. It can also be used to build machine learning models, and it integrates the commonly used machine learning algorithms. For data analysis, RapidMiner provides lightweight, fast analysis functions as well as big data, visualization and model deployment features. If your work covers the whole chain from loading and cleaning data, through analysis, to building and deploying models, RapidMiner can certainly help.
8. Weka
One advantage of Weka is that it is easy to get started with, and its interface is very intuitive. It provides data preprocessing, classification, regression, clustering and visualization. Weka was originally designed at the University of Waikato in New Zealand for research purposes, but more and more professionals now use it.
9. Data Preparator
This tool handles data mining, data cleaning and data analysis. It has a variety of built-in toolkits for discretization, numerical calculation, data scaling, attribute selection, missing values, outliers, statistics, sampling and more. A special benefit of this tool is that the data set being analyzed is not held in computer memory, so you will not run into memory problems when dealing with large data sets.
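The source does not describe how Data Preparator avoids holding data in memory. One common technique with the same effect is to stream the file row by row and keep only running totals. The sketch below assumes a CSV input; the function `streaming_summary` and the sample data are hypothetical.

```python
import csv
import io


def streaming_summary(lines, column):
    """Summarize one numeric CSV column in a single pass.

    Only running totals are kept, so memory use does not grow
    with the number of rows.
    """
    count, missing, total = 0, 0, 0.0
    minimum = maximum = None
    for row in csv.DictReader(lines):
        text = (row.get(column) or "").strip()
        if not text:  # count missing values instead of failing on them
            missing += 1
            continue
        value = float(text)
        count += 1
        total += value
        minimum = value if minimum is None else min(minimum, value)
        maximum = value if maximum is None else max(maximum, value)
    return {
        "count": count,
        "missing": missing,
        "mean": total / count if count else None,
        "min": minimum,
        "max": maximum,
    }


# Made-up sample; in practice `lines` would be an open file object.
sample = io.StringIO("id,score\n1,10\n2,\n3,20\n4,30\n")
print(streaming_summary(sample, "score"))
```

Because `csv.DictReader` reads from any iterable of lines, the same function works on a very large file opened with `open(path, newline="")`.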
10. DataCracker
DataCracker is data analysis software dedicated to processing survey data. Many companies now collect survey data, and surveys are also an indispensable step in data journalism. Survey data often contains many missing values and outliers and needs to be cleaned. DataCracker helps clean and analyze survey data quickly, and it can also load data from many mainstream survey projects.
03. Data visualization tools
1. Pyecharts
Python is gradually becoming one of the mainstream languages for data analysis and data mining. In the Python ecosystem, many developers provide rich third-party data visualization libraries for all kinds of scenarios, which let us draw attractive charts with Python. ECharts (covered later in this list) is a free, open source JavaScript data visualization library that makes it easy to draw professional business charts. When Python met ECharts, pyecharts was born. It is a Python interface to ECharts maintained by chenjiandongx and a group of other developers, so we can draw all kinds of ECharts charts from Python.
2. Bokeh
Bokeh is a Python-based interactive data visualization tool. It provides an elegant, concise way to draw a variety of graphics, can visualize large data sets and streaming data with high performance, and helps us build interactive charts, visual dashboards and more.
3. ECharts
ECharts is a free, open source JavaScript data visualization library that makes it easy to draw professional business charts. Baidu Migration, which received wide media coverage, as well as Baidu Sinan and Baidu Big Data Forecast, all use ECharts for their data visualization.
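An ECharts chart is described by a JSON "option" object that is handed to the library in the browser. As a rough sketch of what that looks like, the Python snippet below builds the option for a simple bar chart and prints it as JSON. The helper `bar_option` and the sample values are made up for illustration; only the option keys (`title`, `tooltip`, `xAxis`, `yAxis`, `series`) come from ECharts itself.

```python
import json


def bar_option(title, categories, values):
    """Return an ECharts-style option dict for a simple bar chart."""
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    return {
        "title": {"text": title},
        "tooltip": {},
        "xAxis": {"type": "category", "data": list(categories)},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": list(values)}],
    }


# Made-up sample values.
option = bar_option("Made-up example", ["Mon", "Tue", "Wed"], [5, 8, 3])
print(json.dumps(option, indent=2))
```

On the page, the resulting JSON would be passed to the chart with `setOption`. Generating the option on the Python side keeps data preparation and chart description in one place.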
4. D3
D3 (Data-Driven Documents) is another JavaScript library that supports SVG rendering. Beyond line charts and bar charts, D3 offers a large number of more complex chart styles, such as Voronoi diagrams, tree diagrams, circular clusters and word clouds.
5. CartoDB
CartoDB is an interactive map-making tool with a "one-click mapping" function: after you upload data, it automatically recommends a series of map formats for you to choose from and modify. It is convenient and practical, and suits people who lack a programming background but want to try visualization. The program was originally developed by two Spanish scientists studying biodiversity and nature conservation. So far it has more than 12 million users and is especially popular with data journalists.
6. Google Fusion
Fusion Tables is an application within Google Drive. It is a feature-rich mapping tool that works with common data formats such as CSV and Excel. For map making, one of its strengths is the ability to merge different data sets, and its geocoding function is also outstanding. KML (Keyhole Markup Language), a format for recording geographic information, is one of its commonly used formats. Note that Google retired Fusion Tables in December 2019.
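Since KML is simply XML, a small file of map points can be produced with Python's standard library. The sketch below is illustrative only: the function `build_kml` and the sample place are made up, while the namespace and the `Placemark`, `Point` and `coordinates` elements are part of KML 2.2.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(places):
    """Build a KML document from (name, longitude, latitude) tuples."""
    ET.register_namespace("", KML_NS)  # write KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in places:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as longitude,latitude (in that order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Made-up sample point.
print(build_kml([("Sample newsroom", 116.4, 39.9)]))
```

A common mistake is to swap the order of the two numbers: KML expects longitude first, then latitude.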
7. TimelineJS
TimelineJS is used to build timelines of news events. It is a free, open source visualization tool that currently supports 40 languages. You prepare a Google Spreadsheet that follows the required format, copy the spreadsheet link into TimelineJS, and a timeline is generated automatically.
8. Infogram
Infogram is an intuitive visualization tool that helps you create attractive infographics and reports. It provides more than 35 interactive charts and more than 500 maps to help you visualize data. The available chart types include column charts, bar charts, pie charts and word clouds, among many others.
9. BDP Personal Edition
BDP Personal Edition is a free online data visualization and analysis tool that needs no download. It covers data access and integration, data processing, analysis and mining, and multi-terminal visualization, which greatly improves the efficiency of data analysis. By simply dragging and dropping fields, you can produce all kinds of polished visual charts.
10. Dycharts (镝数图表)
Dycharts is a powerful, free online data visualization tool. Enter your data and it generates visual images, interactive web charts, dynamic data maps, vector charts and infographics with one click. It supports more than 110 chart types, including word clouds, Sankey diagrams, rose charts, streamgraphs and radar charts. It also provides thousands of visual templates, so visual design for content creation, media operations, marketing posters, market research, thesis writing, work summaries, personal resumes and other scenarios can all be done easily in Dycharts.