1. Foreword
First, some background. This article grew out of a Node back-end interview question I came across, so I wrote it up. After reading it, you will know something about the following:
- Creating and using a midway project
- Using TypeScript in a Node project
- Wrapping a request with Node's built-in API
- Using cheerio in a project
- Using regular expressions in a project
- Unit testing
2. Creating and using a midway project
Step 1: Run **npm init midway** to initialize a midway project.
Step 2: Select **koa-v3 - A web application boilerplate with midway v3(koa)** and press Enter.
* www npm init midway
npx: installed 1 in 4.755s
? Hello, traveller.
Which template do you like? …
⊙ v3
▸ koa-v3 - A web application boilerplate with midway v3(koa)
egg-v3 - A web application boilerplate with midway v3(egg)
faas-v3 - A serverless application boilerplate with midway v3(faas)
component-v3 - A midway component boilerplate for v3
⊙ v2
web - A web application boilerplate with midway and Egg.js
koa - A web application boilerplate with midway and koa
Step 3: Enter a name for the project you want to create, for example **midway-project**: **What name would you like to use for the new project? ‣ midway-project**
Step 4: Follow the prompts: run **cd midway-project** and then **npm run dev**. Unless you have changed the default settings, you can now open **http://localhost:7001** to see the result.
* www npm init midway
npx: installed 1 in 4.755s
Hello, traveller.
Which template do you like? · koa-v3 - A web application boilerplate with midway v3(koa)
What name would you like to use for the new project? · midway-project
Successfully created project midway-project
Get started with the following commands:
$ cd midway-project
$ npm run dev
Thanks for using Midway
Document Star: https://github.com/midwayjs/midway
╭────────────────────────────────────────────────────────────────╮
│ │
│ New major version of npm available! 6.14.15 → 8.12.1 │
│ Changelog: https://github.com/npm/cli/releases/tag/v8.12.1 │
│ Run npm install -g npm to update! │
│ │
╰────────────────────────────────────────────────────────────────╯
* www
The official midway documentation covers this in great detail, so I won't repeat it here.
3. How to fetch the content of the Baidu homepage
3.1 Wrapping a request with Node's built-in API
The https module in Node.js provides a get method that can fetch the page's HTML; see the Node.js documentation for details. I wrapped it as follows:
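import { get } from 'https';

// Fetch a page over HTTPS and resolve with its body as a string.
async function getPage(url = 'https://www.baidu.com/'): Promise<string> {
  return new Promise((resolve, reject) => {
    let data = '';
    get(url, res => {
      // Accumulate the response body chunk by chunk.
      res.on('data', chunk => {
        data += chunk;
      });
      res.on('error', err => reject(err));
      res.on('end', () => {
        resolve(data);
      });
    }).on('error', err => reject(err)); // request-level failures, e.g. DNS or connection errors
  });
}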
Well, you will want to test this method. In a Node environment that is very simple; write it like this:
(async () => {
  const ret = await getPage();
  console.log('ret:', ret);
})();
4. How to get an attribute of the target element
The task: from the HTML source text, find the img tag inside the div with id=lg, and return the value of that img tag's src attribute.
4.1 Going all-in with cheerio
If you miss the jQuery days, cheerio is worth learning. It offers a jQuery-like API: a fast, flexible implementation of core jQuery tailored for the server. See the cheerio documentation and its GitHub repository.
Once you know the above, the rest is simple: call the API and get the result. The code block below selects the div tag whose id is lg and takes its child img tags. It then calls cheerio's map, which mirrors jQuery's .map() rather than the ES6 Array.prototype.map, to collect the src attribute of each img. Finally, get() turns the result into a plain array and join(',') turns that into a string.
// Requires: import { load } from 'cheerio';
@Get('/useCheerio')
async useCheerio(): Promise<IPackResp<IHomeData>> {
  const ret = await getPage();
  // Parse the HTML text into a queryable document.
  const $ = load(ret);
  const imgSrc = $('div[id=lg]')
    .children('img')
    .map(function () {
      // Collect the src attribute of each img child.
      return $(this).attr('src');
    })
    .get()
    .join(',');
  return packResp({ func: 'useCheerio', imgSrc });
}
4.2 Going all-in with regular expressions
Faced with one big string, regular expressions are another answer that should come to mind. My regex skills are not great, so there is no single all-in-one pattern here. First write a pattern that matches the div whose id is lg, then match the src attribute of the img tag inside it. If one step is not enough, take two; the final result is the same.
@Get('/useRegExp')
async useRegExp(): Promise<IPackResp<IHomeData>> {
  const ret = await getPage();
  // Matches the content of the div whose id is lg
  const reDivLg = /(?<=<div.*?id="lg".*?>)(.*?)(?=<\/div>)/gi;
  // Matches the src attribute of an img tag
  const reSrc = /<img.*?src="(.*?)".*?\/?>/i;
  // Fall back to an empty string instead of throwing when either pattern finds nothing
  const divHtml = ret.match(reDivLg)?.[0] ?? '';
  const imgSrc = divHtml.match(reSrc)?.[1] ?? '';
  return packResp({ func: 'useRegExp', imgSrc });
}
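To see the two-step idea in isolation, here is a minimal sketch that runs against a hypothetical HTML snippet instead of the live page. The function name `extractLogoSrc` and the sample markup are assumptions made for illustration. The patterns are a slightly stricter variant of the ones above: they use `[^>]*` and capture groups in place of lookbehind, and return null when nothing matches.

```typescript
// Hypothetical sample markup standing in for the text returned by getPage().
const sampleHtml =
  '<div id="head"><div id="lg"><img hidefocus="true" src="//www.baidu.com/img/flexible/logo/pc/index.png" width="270"></div></div>';

// Hypothetical helper: returns the src of the first img inside div#lg, or null.
function extractLogoSrc(html: string): string | null {
  // Step 1: capture the inner HTML of the div whose id is "lg".
  const divMatch = /<div[^>]*\bid="lg"[^>]*>([\s\S]*?)<\/div>/i.exec(html);
  if (!divMatch) return null;
  // Step 2: capture the src attribute of the first img tag inside it.
  const srcMatch = /<img[^>]*\bsrc="([^"]*)"/i.exec(divMatch[1]);
  return srcMatch ? srcMatch[1] : null;
}

console.log(extractLogoSrc(sampleHtml));
```

Like the patterns above, this sketch only handles double-quoted attribute values; markup with unquoted attributes such as id=lg would need a looser pattern.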
5. Unit testing
Two things need to be tested here: 1. if the request takes longer than 1 second, the assertion fails; 2. if the returned value is not equal to "//www.baidu.com/img/bd_logo1.png", the assertion fails.
midway integrates jest for unit testing, and the official documentation covers it in great detail.
For the 1-second requirement, we can take timestamps around the request and compute the elapsed time, as follows:
const startTime = Date.now();
// make request
const result: any = await createHttpRequest(app).get('/useRegExp');
const cost = Date.now() - startTime;
Then make the final assertion: expect(cost).toBeLessThanOrEqual(1000);
The final code is as follows:
it.only('should GET /useRegExp', async () => {
  const startTime = Date.now();
  // make request
  const result: any = await createHttpRequest(app).get('/useRegExp');
  const cost = Date.now() - startTime;
  const {
    data: { imgSrc },
  } = result.body as IPackResp<IHomeData>;
  // 2. Check the returned value. A request from Node receives index.png rather than
  //    bd_logo1.png (see section 6), so the expected value is the one Node actually gets.
  expect(imgSrc).not.toBe('//www.baidu.com/img/bd_logo1.png');
  notDeepStrictEqual(imgSrc, '//www.baidu.com/img/bd_logo1.png');
  // 1. If the request takes longer than 1 second, the assertion fails.
  expect(cost).toBeLessThanOrEqual(1000);
  expect(imgSrc).toBe('//www.baidu.com/img/flexible/logo/pc/index.png');
  deepStrictEqual(imgSrc, '//www.baidu.com/img/flexible/logo/pc/index.png');
});
it.only('should GET /useCheerio', async () => {
  const startTime = Date.now();
  // make request
  const result: any = await createHttpRequest(app).get('/useCheerio');
  const cost = Date.now() - startTime;
  const {
    data: { imgSrc },
  } = result.body as IPackResp<IHomeData>;
  // 2. Check the returned value (same reasoning as in the /useRegExp test).
  expect(imgSrc).not.toBe('//www.baidu.com/img/bd_logo1.png');
  notDeepStrictEqual(imgSrc, '//www.baidu.com/img/bd_logo1.png');
  // 1. If the request takes longer than 1 second, the assertion fails.
  expect(cost).toBeLessThanOrEqual(1000);
  expect(imgSrc).toBe('//www.baidu.com/img/flexible/logo/pc/index.png');
  deepStrictEqual(imgSrc, '//www.baidu.com/img/flexible/logo/pc/index.png');
});
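Both tests repeat the same timing steps, so one option is to factor them into a small helper. The following is only a sketch: `timed` and `fakeRequest` are hypothetical names that do not appear in the project, and the fake request stands in for the real HTTP call so the snippet runs without a server.

```typescript
// Hypothetical helper: run an async task and report how long it took, in milliseconds.
async function timed<T>(task: () => Promise<T>): Promise<{ result: T; cost: number }> {
  const startTime = Date.now();
  const result = await task();
  return { result, cost: Date.now() - startTime };
}

// Stand-in for the real request, so the sketch runs without a server.
const fakeRequest = async (): Promise<string> => 'ok';

timed(fakeRequest).then(({ result, cost }) => {
  // In a real test this is where expect(cost).toBeLessThanOrEqual(1000) would go.
  console.log(result, cost <= 1000);
});
```

In the tests above, the task would presumably be something like `async () => createHttpRequest(app).get('/useRegExp')`, with the assertions made on the returned `cost` and `result`.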
6. Afterword
If you look closely, you will notice something interesting. Open the Baidu homepage in a browser and carry out the task above in the console, and the output looks like this:
const lg = document.getElementById('lg');
undefined
lg.childNodes.forEach((node) => { if(node.nodeName.toLowerCase() === 'img') { console.log(node.src) } })
2VM618:1 https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/logo/logo_white-d0c9fe2af5.png
VM618:1 https://www.baidu.com/img/PCfb_5bf082d29588c07f842ccde3f97243ea.png
undefined
However, with Node's built-in https module, you get //www.baidu.com/img/flexible/logo/pc/index.png instead.
Why? Shocking. What happened? What did Baidu do?
So I tested with wget, running wget -O baidu.html https://www.baidu.com, and found that a plain request returns this:
* tmp wget -O baidu.html https://www.baidu.com
--2022-06-10 00:36:17-- https://www.baidu.com/
Resolving www.baidu.com (www.baidu.com)... 182.61.200.6, 182.61.200.7
Connecting to www.baidu.com (www.baidu.com)|182.61.200.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2443 (2.4K) [text/html]
Saving to: ‘baidu.html’
baidu.html 100%[=====================================================================================================================================================>] 2.39K --.-KB/s in 0s
2022-06-10 00:36:18 (48.3 MB/s) - ‘baidu.html’ saved [2443/2443]
* tmp cat baidu.html
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title> use Baidu Search , You will know </title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus=autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value= use Baidu Search class="bg s_btn" autofocus></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav> Journalism </a> <a href=https://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav> Map </a> <a href=http://v.baidu.com name=tj_trvideo class=mnav> video </a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav> tieba </a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb> Sign in </a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb"> Sign in </a>');
</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;"> More products </a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com> About Baidu </a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/> Read Before Using Baidu </a> <a href=http://jianyi.baidu.com/ class=cp-feedback> Feedback </a> Beijing ICP Prove 030173 Number <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
* tmp
But when I send a request that simulates a browser, using wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" https://www.baidu.com, the result is different:
* tmp wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" https://www.baidu.com
--2022-06-10 00:38:53-- https://www.baidu.com/
Resolving www.baidu.com (www.baidu.com)... 182.61.200.7, 182.61.200.6
Connecting to www.baidu.com (www.baidu.com)|182.61.200.7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’
index.html [ <=> ] 350.76K --.-KB/s in 0.01s
2022-06-10 00:38:53 (35.1 MB/s) - ‘index.html’ saved [359175]
* tmp
This is consistent with the browser's behavior: the output contains three img tags.
I haven't dug into how Node.js's https module handles this. Judging only from the examples above, my guess is that the server inspects the client and returns different HTML accordingly. So if you set a user-agent like the one above on the Node.js request, I would guess you get the same result as on PC. This exercise is left to the reader; feel free to leave a comment below!
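As a starting point for that exercise, here is a minimal sketch of passing a User-Agent header through the options argument of `https.get`. The names `buildRequestOptions` and `getPageAs` are hypothetical. That the server varies its response on this header is only the guess stated above, not something this snippet verifies.

```typescript
import { get } from 'https';

// Hypothetical helper: build request options carrying a browser-like User-Agent.
function buildRequestOptions(userAgent: string): { headers: Record<string, string> } {
  return { headers: { 'User-Agent': userAgent } };
}

// Hypothetical variant of getPage that sends the given User-Agent.
function getPageAs(url: string, userAgent: string): Promise<string> {
  return new Promise((resolve, reject) => {
    let data = '';
    get(url, buildRequestOptions(userAgent), res => {
      res.on('data', chunk => {
        data += chunk;
      });
      res.on('error', reject);
      res.on('end', () => resolve(data));
    }).on('error', reject);
  });
}

// Same User-Agent string as in the wget command above.
const chromeUA =
  'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16';
console.log(buildRequestOptions(chromeUA).headers);
```

Calling `getPageAs('https://www.baidu.com/', chromeUA)` and feeding the result to either extraction approach would show whether the guess holds.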
Project address: https://github.com/ataola/play-baidu-midway-crawler
Online access: http://106.12.158.11:8090/