Another problem right after going live... I'm numb

2022-07-07 22:53:00 Yes' level training strategy

Hello everyone, I'm yes.

I'm back with another production troubleshooting story!

Here's what happened: today a colleague reported a problem to me.


Our application needs to synchronize order information from a third party. If a user has not opened the order page for a while, then the next time they open it we automatically pull their orders from the third party. This keeps order information up to date and prevents users from operating on expired orders.

Recently, this colleague found that every click on the order list triggered a full pull. That is obviously unreasonable and wastes a lot of resources on the back-end tasks.

At first I thought it had nothing to do with me; maybe the front-end code had a bug (ha ha, that's what I thought last time too).

So I told a front-end colleague. After investigating, he assured me that the code was definitely fine: a pull is triggered only when a user who has not synchronized orders for more than an hour enters the order page again.

He sounded so certain that I believed him. No way around it, I had to look into it myself.

This time I really did find the problem, and tracing it back to the source, it was caused by an issue I had run into before. One link really does lead to the next!

Start troubleshooting

I first logged in with a test account and found I could not reproduce what my colleague described, where every click on the order list triggers a full order pull.

Great, off to a bad start.

I talked to him again and wondered: is this an isolated case? So I tracked down the individual users who had this problem.

I simulated it for one of them: when the full order pull task ran, it actually failed with an error saying the accessToken had expired.

Our authorization with the third party goes through OAuth2.

In other words, the token the third party issued to us had expired, so our order pull interface got an error and the task failed.

So next I suspected the token code. We have a task that, based on each token's expiration time, uses the refreshToken ahead of time to exchange it for a fresh token.
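A minimal sketch of that kind of proactive refresh check. Everything here is an assumption for illustration: the `Authorization` record, the `needs_refresh` and `select_due` helpers, and the 10-minute lead time are hypothetical and are not taken from the real task.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical lead time: refresh tokens that expire within the next 10 minutes.
REFRESH_LEAD = timedelta(minutes=10)


@dataclass
class Authorization:
    """Hypothetical per-user authorization record."""
    user_id: str
    access_token: str
    refresh_token: str
    expires_at: datetime


def needs_refresh(auth: Authorization, now: datetime) -> bool:
    """Return True when the access token is expired or about to expire."""
    return auth.expires_at - now <= REFRESH_LEAD


def select_due(auths: list[Authorization], now: datetime) -> list[str]:
    """User ids whose tokens a scheduled run should refresh."""
    return [a.user_id for a in auths if needs_refresh(a, now)]
```

A scheduled job would call `select_due` on each run and then exchange each selected user's refreshToken for a new token.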

So in theory a token-expired error should be impossible. My guess was that something was wrong with the token refresh task, which let the token expire and caused the order pull task to fail. The front end does not record a sync time for a failed task, so each time the user enters the order page it sees that there has been no sync for more than an hour and triggers a full pull again.
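The loop described here can be modeled in a few lines. This is a simplified sketch, not the real front-end or back-end code: the `OrderSync` class and its fields are hypothetical, and only the one-hour threshold comes from the story.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

SYNC_INTERVAL = timedelta(hours=1)  # from the story: resync after more than one hour


class OrderSync:
    """Hypothetical model of the 'pull on page entry' rule."""

    def __init__(self, pull_orders: Callable[[], None]) -> None:
        self.pull_orders = pull_orders
        self.last_success: Optional[datetime] = None
        self.pull_count = 0

    def on_enter_order_page(self, now: datetime) -> None:
        stale = self.last_success is None or now - self.last_success > SYNC_INTERVAL
        if not stale:
            return
        self.pull_count += 1
        try:
            self.pull_orders()
        except RuntimeError:
            # Failure: last_success stays unchanged, so the next visit pulls again.
            return
        self.last_success = now
```

With a pull that always fails (for example because the accessToken has expired), every page entry counts as stale and triggers another full pull, which matches the behavior my colleague reported.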

At this point I wanted to find the colleague responsible for the token refresh task. After some searching, I discovered that I wrote it…


I checked that the scheduled refresh task was indeed running, so the problem had to be in the token refresh request itself. I checked the logs, and sure enough there was an error!


I had seen this error before.

It comes from calling the third party's token refresh interface, and the error is returned by the third party. I had no clue what it meant: missing code? What code?


As you can see from the code above, the token refresh interface only needs these two parameters; there is nothing else going on.

Also, when I first saw this error, I took the corresponding refreshToken and made a test call with it. There was no error at all, and it returned an accessToken successfully.

After many days of observation, I found that refresh succeeded for some users and failed for others.

Because the token refresh interface is so simple, the error is returned by the other party, and the error message seemed to have nothing to do with me, I took it for granted that something must be wrong with their interface. What could I possibly have gotten wrong on my side? (Remember this sentence.)

So back then I said I could not fix this problem and pushed the blame straight onto the third party (they had caused problems many times before). Who knew it would come back to me now.

No way around it. Since the problem had happened again, all I could do was take this user's refreshToken and try again locally.

As luck would have it, I used to look up the refreshToken in the database, but this time I fetched it with the company's internal tool, and that's when I spotted the key clue!


You can see the refreshToken is empty??? I immediately logged in to the replica database and checked, and the data was there!!


I was numb, numb all over again. So what was going on??

I immediately checked the code of the token refresh task and confirmed that my SQL does fetch the refreshToken. Since the database has a value, I could “conclude” that the refreshToken must not be empty when my refresh task runs!

And then it suddenly hit me: this lookup goes through a cache!


In a flash of insight I checked the cache right away and found that the refreshToken in the cache was empty. I wondered which bastard had deleted the refreshToken from the cache.


I immediately dismissed that idea; we should not have any such requirement or implementation…

Out of ideas, I went to look at the code the internal tool uses to get the token and found that it calls an RPC interface. I don't have the code for that service, so I asked a veteran colleague. He vaguely remembered something and said:

Great, got him. I went straight to the colleague who made the change to confirm, and he replied with just three words:

I fired back with one:

And with that, the case was solved…

My colleague's reasoning went like this: he figured that fetching the token does not require the refreshToken, so following the principle of selecting only what you need, he chose not to fetch the refreshToken. As a result, the refreshToken value was never put in when he populated the cache.

The authorization service was written first. Back then this colleague's service A had not been split out yet, so reading and writing tokens was done by the authorization service itself operating on the database. That is why I was so sure my code got the refreshToken from the database and never imagined the refreshToken could be empty.

The problem is that the two services share one cache key. Service A, being economical, does not put the refreshToken in when it caches the user authorization information. When the authorization service then looks up the user authorization information, it hits the cache and takes the value directly from there. The cached value has no refreshToken, so when the third-party token refresh interface is called, the refreshToken passed is empty!
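The shared-key collision can be reproduced with two plain dictionaries. This is a sketch under stated assumptions: the in-memory `database` and `cache`, the `auth:{user_id}` key format, and both function names are hypothetical stand-ins, not the real services.

```python
# In-memory stand-ins; the real system uses a database and a shared cache.
database = {"u1": {"access_token": "at-1", "refresh_token": "rt-1"}}
cache: dict[str, dict[str, str]] = {}


def cache_key(user_id: str) -> str:
    return f"auth:{user_id}"  # hypothetical key format shared by both services


def service_a_get_token(user_id: str) -> str:
    """Service A only needs the accessToken, so it caches only that field."""
    key = cache_key(user_id)
    if key not in cache:
        row = database[user_id]
        cache[key] = {"access_token": row["access_token"]}
    return cache[key]["access_token"]


def auth_service_get_refresh_token(user_id: str) -> str:
    """The authorization service trusts whatever sits under the shared key."""
    key = cache_key(user_id)
    if key not in cache:
        cache[key] = dict(database[user_id])  # caches the full row
    return cache[key].get("refresh_token", "")
```

Whichever service populates the key first decides what the other one sees. If service A goes first, the authorization service reads an empty refreshToken even though the database row is complete.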

So the third party returned an error:

Only then did I understand what “missing code” meant… Wouldn't it have been nicer if the error message had simply said the refreshToken parameter was empty? Instead it talked about some code, and I had no idea which code it meant!

As for users whose authorization information was put into the cache by the authorization service before service A got to it, their token refresh worked normally, because the authorization service does put the refreshToken in the cache.

OK, investigation complete. The final fix is for service A to put the refreshToken in the cache as well.

Finally

As you can see, this investigation did not involve any advanced technology. It was a mistake caused by several parties interacting and not thinking things through. In fact, most errors in production come down to details, such as a misconfigured parameter or one extra condition check.

Let's summarize the lessons from this experience:

  • Consider the correctness of the cache when fetching data. You cannot look only at the database; don't forget the cache.
  • Converge operations into the owning service. That is, keep service boundaries clear and independent, and try not to reimplement another service's functionality internally. That way, a requirement change does not have to be made in several places with the risk of missing one, and the problem above would not happen. Unified constraints are the most comfortable.
  • Make error messages clear. If the error above had said that the refreshToken parameter was empty instead of “missing code”, I might have finished the investigation the first time I saw it rather than waiting until now. (Trust matters too: after many failures, you gradually stop trusting the other party's service.)
  • A global view is key. Even if you are responsible for only one service, take the opportunity to learn about other people's services, especially your own upstream and downstream. Then when something goes wrong, you can scan the whole picture in your head and quickly locate where the problem is likely to be. This is the difference between an expert and an ordinary person (what you can't handle, I finish in two minutes).

That's about it. If you need to, feel free to use this experience in an interview. Ha ha, no need to be polite with me!

I'm yes, from a little bit to a billion bits. See you at the next production troubleshooting story.

Copyright notice
This article was written by [Yes' level training strategy]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202130602331917.html