Another problem just hit production...
2022-07-07 22:53:00 【Yes' level training strategy】
Hello everyone, I'm yes.
I'm back with another production troubleshooting story!
Here's what happened: today a colleague reported a problem to me.

Our application needs to synchronize order information from a third party. If a user has not opened the order page for a while, then the next time they open it we automatically pull their orders from the third party. This keeps the order information up to date and prevents users from acting on stale orders.
Recently, this colleague noticed that every click on the order list triggered a full pull. That is obviously unreasonable and wastes a lot of back-end resources.
At first I thought it had nothing to do with me. It was probably a bug in the front-end code (ha ha, I thought the same thing last time).
So I told my front-end colleague. After investigating, he assured me that the code was fine: a pull is triggered only when a user who has not synchronized orders for more than an hour opens the order page again.
He sounded so certain that I believed him. Nothing for it, I had to dig into it myself.
And digging really did turn up the problem. Tracing it back to the source, it was caused by an issue I had run into before. One link really does lead to the next!
Start troubleshooting
I first logged in with a test account and could not reproduce what my colleague described: clicking the order list did not trigger a full pull every time.
Well, first battle lost.
Then I talked to him again and realized that this only affected isolated cases, so we tracked down the individual users who had the problem.
Simulating one of those users showed that the full order pull task was actually failing with an error: the accessToken had expired.
Our authorization with the third party uses OAuth2.
In other words, the token the third party had issued to us had expired, so our call to the order pull interface was rejected and the task failed.
That made me suspect the token code, because we have a scheduled task that looks at each token's expiration time and uses the refreshToken ahead of time to obtain a fresh token.
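The "refresh ahead of time" idea can be sketched roughly as follows. This is a minimal illustration, not our actual task: `TokenRecord`, `needs_refresh`, `select_for_refresh`, and the 10-minute lead time are all names and values assumed for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TokenRecord:
    """Hypothetical shape of a stored third-party authorization."""
    user_id: str
    access_token: str
    refresh_token: str
    expires_at: datetime


def needs_refresh(record: TokenRecord, now: datetime,
                  lead_time: timedelta = timedelta(minutes=10)) -> bool:
    """Return True once the token is within `lead_time` of expiring."""
    return now >= record.expires_at - lead_time


def select_for_refresh(records, now):
    """Pick the records the scheduled task should refresh on this run."""
    return [r for r in records if needs_refresh(r, now)]
```

With a task like this running on a schedule, a token should be replaced before it ever reaches its expiration time, which is why an "expired" error looked so suspicious.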
In theory, then, token expiration errors should never happen. My guess was that something was wrong with the token refresh task: the token expired, and that made the order pull task fail. The front end does not record a sync time for a failed task, so the next time the user opens the order page it sees that more than an hour has passed since the last sync and triggers another full pull.
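To make that loop concrete, here is a small sketch of the trigger logic as I understand it. The function names and the use of `RuntimeError` for a failed pull are assumptions for illustration; only the one-hour threshold comes from the story above.

```python
from datetime import datetime, timedelta
from typing import Optional

SYNC_INTERVAL = timedelta(hours=1)  # "more than an hour" from the text


def should_trigger_full_pull(last_success: Optional[datetime],
                             now: datetime) -> bool:
    """Trigger a pull if no successful sync was recorded within the last hour."""
    return last_success is None or now - last_success > SYNC_INTERVAL


def run_pull(pull_orders, last_success, now):
    """Run one pull; only a successful pull updates the recorded sync time."""
    try:
        pull_orders()
    except RuntimeError:
        return last_success  # failure: the recorded time stays stale
    return now
```

If `pull_orders` fails every time (for example because the accessToken has expired), the recorded time never advances, so `should_trigger_full_pull` keeps returning `True` on every visit to the order page.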
At that point I wanted to find the colleague responsible for the token refresh task. After some searching, I discovered that I had written it myself…

I checked, and the scheduled refresh task was indeed running, so the problem had to be in the token refresh request itself. I looked at the logs, and sure enough there was an error!

I had seen this error before.
It comes from calling the third party's token refresh interface, and the error is returned by the third party. It gave me no clue at all. Missing code? What code?

As you can see from the code above, the token refresh interface only needs those two parameters. There is no other logic involved.
Also, when I first saw this error, I took the refreshToken and made a test call myself. It worked without any error and returned an accessToken.
And after observing for several days, I found that some users could refresh successfully while others could not.
The token refresh interface is that simple, the error was returned by the other side, and the error message seemed to have nothing to do with me. So I took it for granted that the third party's interface was at fault. How could there be any room for a mistake on my side? (Remember this sentence.)
So back then I said I couldn't do anything about this problem and pushed the blame straight onto the third party (they had had problems many times before). Who knew it would come back now.
Nothing for it. The problem had happened again, so all I could do was take this user's refreshToken and try again locally.
As it happened, I had previously looked up the refreshToken in the database. This time I fetched it with the company's internal tool, and that is where I found the key clue!

The refreshToken was empty??? I immediately logged in to the database to check, and the data was there!!

I was numb. Numb again. What was going on??
I immediately checked the code of the token refresh task and confirmed that my SQL does select the refreshToken. Since the database had a value, I could "conclude" that the refreshToken could not possibly be empty when my refresh task ran!
And then, all of a sudden, I noticed that this lookup goes through a cache!

In a flash of insight, I checked the cache right away and found that the refreshToken in the cache was empty. I wondered which bastard had deleted the refreshToken from the cache.

I dismissed that idea immediately. We should not have any such requirement or implementation…
Out of ideas, I looked at the code the company's internal tool uses to fetch the token and found that it calls an RPC interface. I don't have the code for that service, so I asked a veteran colleague. He vaguely remembered something and said:

Well, that gave me a lead. I went straight to the colleague who had made the change, and all I got back was three words:

I replied with just one:
And with that, the case was solved…
My colleague's reasoning went like this: fetching the token does not require the refreshToken, so, following the rule of selecting only what you need, he left the refreshToken out. As a result, the record his code put into the cache had no refreshToken value.
The authorization service was written first, before this colleague's service A had been split out. At that time, reading and writing tokens was done by the authorization service operating on the database directly. That is why I was so sure my code got the refreshToken from the database, and it never occurred to me that the refreshToken could be empty.
The problem is that the two services share one cache key. Service A, for the sake of economy, does not put the refreshToken into the cached user authorization information. When the authorization service then loads the user authorization information, it hits the cache and takes the value straight from there. The cached value has no refreshToken, so the refreshToken passed to the third party's token refresh interface is empty!
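The interaction can be reproduced with a tiny in-memory model. Everything here is a stand-in assumed for illustration (the dictionaries, the `auth:` key format, and the function names are hypothetical); it only mirrors the shape of the problem, not the real services.

```python
# In-memory stand-ins for the shared cache and the database (hypothetical).
cache = {}
database = {
    "user-1": {"access_token": "at-1", "refresh_token": "rt-1"},
}


def cache_key(user_id):
    """Both services build the same key, which is the root of the problem."""
    return f"auth:{user_id}"


def service_a_load(user_id):
    """Service A only needs the access token, so it caches a partial record."""
    row = database[user_id]
    partial = {"access_token": row["access_token"]}  # no refresh_token
    cache[cache_key(user_id)] = partial
    return partial


def auth_service_load(user_id):
    """The authorization service treats any cache hit as a full record."""
    key = cache_key(user_id)
    if key in cache:
        return cache[key]
    row = dict(database[user_id])
    cache[key] = row
    return row


def refresh_token_for(user_id):
    """Value the authorization service would send to the third party."""
    return auth_service_load(user_id).get("refresh_token")
```

If the authorization service populates the cache first, `refresh_token_for("user-1")` returns the stored value. If `service_a_load("user-1")` runs first, the same call returns `None` even though the database row is intact, which matches the "some users work, some don't" pattern.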
So the third party returned an error :

Only then did I understand what "missing code" meant… Wouldn't an error message saying that the refreshToken parameter is empty have been nicer? All it gave me was "code", and I had no idea which code it meant!
For users whose authorization had been put into the cache by the authorization service before service A got to it, refreshing worked normally, because the authorization service does put the refreshToken into the cache.
Okay, investigation complete. The final fix was for service A to put the refreshToken into the cache as well.
Finally
As you can see, this investigation did not involve any advanced technology. It was an error caused by several parties interacting and by not thinking things through. Most errors in production are in fact about details, such as a misconfigured parameter or one extra condition.
Let's summarize this experience:
- Consider the correctness of the cache when fetching data. Don't look only at the database and forget about the cache.
- Converge the operations of a service. Service boundaries should be clear and independent, and a service should try not to implement another service's functionality internally. That way a requirement change does not have to be made in several places with one of them missed, and problems like the one above do not happen. Unified constraints are the most comfortable.
- Make error messages clear. If the error above had said that the refreshToken parameter was empty instead of that the code was missing, I could probably have finished the investigation the first time I saw it instead of waiting until now. (Trust matters too: after many failures, you gradually stop trusting the other party's service.)
- Global awareness is key. Even if you are responsible for only one service, take every chance to learn about other people's services, especially your own upstream and downstream. Then, when something goes wrong, you can picture the whole system and quickly locate where the problem might be. That is the difference between an expert and everyone else ("you can't figure it out, and I'm done in two minutes").
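The first and third points can be combined into a small defensive sketch. This is not the fix we shipped (that was simply having service A cache the refreshToken too); it is one possible extra safeguard. The function and exception names are hypothetical, and the request parameter names follow the OAuth 2.0 refresh grant, which may differ from what a particular third party expects.

```python
class MissingRefreshTokenError(ValueError):
    """Raised locally, before any request is sent to the third party."""


def load_authorization(user_id, cache, load_from_db):
    """Use the cached record only if it is complete; otherwise reload it."""
    record = cache.get(user_id)
    if record is None or not record.get("refresh_token"):
        record = load_from_db(user_id)
        cache[user_id] = record
    return record


def build_refresh_request(user_id, record):
    """Fail with an explicit message instead of sending an empty refreshToken."""
    token = record.get("refresh_token")
    if not token:
        raise MissingRefreshTokenError(
            f"refreshToken is empty for user {user_id}")
    return {"grant_type": "refresh_token", "refresh_token": token}
```

With a check like `build_refresh_request`, an empty refreshToken shows up in our own logs with a message that names the real cause, instead of surfacing as a vague error from the other side.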
That's about it. If you need it, feel free to use this story in an interview. Ha ha, no need to be polite!
I'm yes, from a little bit to a billion. See you at the next production troubleshooting story!