2022-7-22 interview review + simple topic sorting

2022-07-23 23:15:00 【lyz_ fish】

listC = [('e', 4), ('o', 2), ('!', 5), ('v', 3), ('l', 1)] 
print(sorted(listC, key=lambda x: x[1]))

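from typing import List

class Solution:
    def intersectionSizeTwo(self, intervals: List[List[int]]) -> int:
        # Sort by right endpoint ascending; for equal right endpoints, by left endpoint descending.
        intervals.sort(key=lambda x: (x[1], -x[0]))
        # Chosen points in increasing order; the two -1 sentinels assume all starts are >= 0.
        li = [-1, -1]
        for x in intervals:
            # The two largest chosen points already lie inside this interval.
            if x[0] <= li[-2]:
                continue
            # No chosen point lies inside this interval, so a second point is needed as well.
            if x[0] > li[-1]:
                li.append(x[1] - 1)
            # In both remaining cases the right endpoint is added.
            li.append(x[1])
        return len(li) - 2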

Batch Normalization (BN) is inserted between each fully connected layer and its activation function.

Question group I

What are the shortcomings of Batch Normalization? (Baidu)

Reference for what BN is: "What is Batch Normalization" on Zhihu (zhihu.com)
When the batch is too small, the batch statistics fluctuate heavily. For text data, the effective lengths of the samples differ. BN is also unsuitable when the mean and variance of the training data and the test data differ greatly. Note: LN (Layer Normalization) subtracts the mean and divides by the standard deviation of the data at one time step of one sample, then rescales the result with learned parameters. For an ordinary linear layer, this corresponds to subtracting the mean and dividing by the standard deviation over the nodes of one layer.
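The difference between BN and LN comes down to which axis the statistics are computed over. The following is a minimal sketch using only the standard library; it leaves out the learned rescaling parameters, and the function names and the toy `batch` are illustrative assumptions rather than any library's API.

```python
from statistics import mean, pvariance

def normalize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation of one group of values."""
    mu = mean(values)
    var = pvariance(values)
    return [(v - mu) / (var + eps) ** 0.5 for v in values]

def batch_norm(batch):
    """BN: normalize each feature (column) across the samples in the batch."""
    columns = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

def layer_norm(batch):
    """LN: normalize each sample (row) across its own features."""
    return [normalize(row) for row in batch]

# Hypothetical batch: 2 samples with 3 features each.
batch = [[1.0, 2.0, 3.0],
         [2.0, 4.0, 6.0]]
print(batch_norm(batch))
print(layer_norm(batch))
```

Because `batch_norm` depends on the other samples in the batch, its output changes when the batch is small or unrepresentative, while `layer_norm` gives the same result for a sample regardless of what else is in the batch.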

How is word segmentation done? (Baidu)

Rule-based (using a very large vocabulary); statistics-based (the more often two characters appear together, the more likely they form a word); neural-network-based (LSTM + CRF sequence tagging, as used for part-of-speech tagging, can also perform segmentation).
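As one concrete instance of the rule-based approach, the sketch below uses forward maximum matching, a common dictionary-based heuristic. The answer above does not name a specific algorithm, so this choice, the toy `vocab`, and the `max_len` parameter are assumptions made for illustration.

```python
def forward_max_match(text, vocab, max_len=8):
    """Greedy left-to-right segmentation: take the longest vocabulary match at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a vocabulary word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop cannot stall.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical vocabulary; unspaced English stands in for text without word boundaries.
vocab = {"mach", "machine", "learning"}
print(forward_max_match("machinelearning", vocab))
```

The quality of this method depends almost entirely on vocabulary coverage, which is why rule-based segmenters rely on very large word lists.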

Why does word2vec raise word frequencies to the 3/4 power during negative sampling? (Baidu)

While preserving the general tendency that high-frequency words are drawn more easily, raising the weights to the 3/4 power moderately increases the probability that low-frequency and rare words are drawn. Without it, low-frequency and rare words would seldom be drawn, so their embeddings would rarely be updated.
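The effect of the 3/4 power can be checked numerically. This is a small sketch with made-up word counts (the `counts` dictionary is hypothetical); it compares sampling probabilities proportional to the raw frequency with those proportional to the frequency raised to 0.75.

```python
def sampling_probs(counts, power):
    """Return sampling probabilities proportional to count ** power."""
    weights = {word: c ** power for word, c in counts.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}

# Hypothetical corpus counts: one very frequent, one medium, one rare word.
counts = {"the": 1000, "model": 100, "huffman": 1}

raw = sampling_probs(counts, 1.0)        # proportional to raw frequency
smoothed = sampling_probs(counts, 0.75)  # proportional to frequency ** 3/4

for word in counts:
    print(word, round(raw[word], 5), round(smoothed[word], 5))
```

The ordering of the words is unchanged, but probability mass moves from the most frequent word toward the rarer ones.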

What are the two optimization methods of word2vec? (Car company)

The first is the model based on hierarchical softmax.
First, build a Huffman tree: take the word frequencies as the weights of the n word nodes and repeatedly merge the two nodes with the lowest weights until a single tree is formed. Leaf nodes with larger weights lie closer to the root, and leaf nodes with smaller weights lie farther from it. Then apply Huffman coding: for every node other than the root, the left subtree is encoded as 1 and the right subtree as 0. Finally, binary logistic regression is used at each branch: walking down the left subtree is the negative class, walking down the right subtree is the positive class, and the logistic regression parameters are learned from the training samples.
Advantages: the amount of computation drops from V (the total number of words) to log2 V; high-frequency words are near the root and need few steps, while low-frequency words are farther from the root.
The second is the model based on negative sampling.
A small number of negative samples is drawn. For the positive sample and these negative samples, a binary logistic regression model is trained by gradient ascent to obtain the model parameters for each word. The negative samples are drawn according to word frequency: the higher a word's frequency, the more likely it is to be drawn.
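The Huffman tree construction described above can be sketched with a heap. The code below only tracks how deep each word ends up in the tree, which equals the number of binary decisions needed to reach it; the word frequencies and the function name are illustrative assumptions.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {word: depth in the Huffman tree} by repeatedly merging the two lightest nodes."""
    tie = count()  # tie-breaker so equal weights never compare the dictionaries
    heap = [(f, next(tie), {word: 0}) for word, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, depths1 = heapq.heappop(heap)
        f2, _, depths2 = heapq.heappop(heap)
        # Every word under the merged node moves one level farther from the root.
        merged = {word: d + 1 for word, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# Hypothetical word frequencies.
freqs = {"the": 50, "of": 30, "model": 12, "tree": 6, "huffman": 2}
lengths = huffman_code_lengths(freqs)
print(lengths)
```

Frequent words receive short paths and rare words long ones, which is where the reduction in per-word computation comes from.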

What are the principle, advantages, and disadvantages of CNNs? (Car company)

A CNN is a feedforward neural network that usually contains five kinds of layers: input layer, convolution layer, activation layer, pooling layer, and fully connected (FC) layer. The core parts are the convolution layer and the pooling layer.
Advantages: convolution kernels are shared, so high-dimensional data can be processed without difficulty; features do not need to be selected manually.
Disadvantages: hyperparameters need to be tuned; a large number of samples is needed.
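To make the convolution, activation, and pooling stages concrete, here is a minimal pure-Python sketch of one pass through them. The 5x5 `image` and the 2x2 vertical-edge `kernel` are hypothetical values chosen for illustration; real networks learn their kernels and use optimized library routines.

```python
def conv2d(image, kernel):
    """Valid cross-correlation: slide one shared kernel over every position of the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Activation layer: clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Pooling layer: keep the maximum of each non-overlapping size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

# Hypothetical input: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

pooled = max_pool(relu(conv2d(image, kernel)))
print(pooled)
```

The same four kernel weights are reused at every image position, which is the parameter sharing named as an advantage above.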

Describe the CRF model and its applications. (Car company)

A CRF models the conditional probability distribution of one set of output random variables given another set of input random variables. Conditional random fields assume that the output variables form a Markov random field. The form usually encountered is the linear-chain conditional random field, a discriminative model that predicts the output from the input. The parameters are estimated by MLE or regularized MLE. CRF models are commonly used to improve named entity recognition.
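The conditional probability that a linear-chain CRF assigns to a tag sequence can be illustrated on a tiny example. This sketch assumes the common parameterization with per-position emission scores and tag-to-tag transition scores, and it normalizes by brute-force enumeration, which is only feasible for toy inputs (practical implementations use dynamic programming instead). All scores below are made up.

```python
from itertools import product
from math import exp

def sequence_score(tags, emissions, transitions):
    """Unnormalized log-score: emission score per position plus transition score per tag pair."""
    score = sum(emissions[t][tag] for t, tag in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score

def sequence_probability(tags, emissions, transitions):
    """P(tags | input) = exp(score(tags)) / sum of exp(score) over all tag sequences."""
    num_tags = len(transitions)
    z = sum(exp(sequence_score(path, emissions, transitions))
            for path in product(range(num_tags), repeat=len(emissions)))
    return exp(sequence_score(tags, emissions, transitions)) / z

# Hypothetical scores for a 3-token sentence and 2 tags (0 = "O", 1 = "ENT").
emissions = [[2.0, 0.5], [0.3, 1.8], [1.5, 0.2]]
transitions = [[0.5, 0.1], [0.2, 0.4]]

best = max(product(range(2), repeat=3),
           key=lambda path: sequence_probability(path, emissions, transitions))
print(best, sequence_probability(best, emissions, transitions))
```

Maximum likelihood training adjusts the emission and transition scores so that the probability of the annotated tag sequences is as high as possible.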

What is the structure of the Transformer? (Car company)

The Transformer is a typical encoder-decoder model. The Encoder and the Decoder each consist of 6 blocks. Each Encoder block contains two modules: a multi-head self-attention module and a feedforward neural network module. Each Decoder block contains three modules: a multi-head self-attention module, a multi-head Encoder-Decoder attention module, and a feedforward neural network module. Note that every module in both the Encoder and the Decoder has a residual connection and a Layer Normalization layer.
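The self-attention module together with its residual connection and Layer Normalization can be sketched in a few lines. To stay short, this version is a deliberate simplification: a single head, no learned projection matrices (queries, keys, and values are the input itself), and no learned scale or shift in the normalization. The function names and the toy input are assumptions for illustration.

```python
from math import exp, sqrt

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def layer_norm(vec, eps=1e-5):
    """Normalize one position's vector to zero mean and unit variance."""
    mu = sum(vec) / len(vec)
    var = sum((v - mu) ** 2 for v in vec) / len(vec)
    return [(v - mu) / sqrt(var + eps) for v in vec]

def attention_sublayer(x):
    """Residual connection around attention, followed by Layer Normalization."""
    attended = self_attention(x)
    return [layer_norm([a + b for a, b in zip(row, att)])
            for row, att in zip(x, attended)]

# Hypothetical input: 2 positions, 3 features each.
x = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
print(attention_sublayer(x))
```

The feedforward module in each block is wrapped in the same add-then-normalize pattern.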

What is the difference between ELMo and BERT? (Car company)

BERT uses the Encoder module of the Transformer architecture; GPT uses the Decoder module of the Transformer architecture; ELMo uses a two-layer bidirectional LSTM.

What is the difference between ELMo and word2vec? (Car company)

ELMo word vectors contain context information. They are not fixed; they change with the context, whereas word2vec assigns each word a single static vector.

What is the difference between LSTM and GRU? (Car company)

(1) LSTM and GRU perform comparably on many tasks;
(2) GRU has fewer parameters and therefore converges more easily, but LSTM performs better on large datasets;
(3) GRU has only two gates (update and reset), while LSTM has three (forget, input, output). GRU passes the hidden state directly to the next unit, whereas LSTM wraps the hidden state in a memory cell.
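The parameter difference in point (2) follows from counting weight blocks: besides its gates, each cell also has one candidate-state transform, so a GRU has 3 blocks and an LSTM has 4. The sketch below counts parameters under the assumption of one weight matrix over the concatenated input and hidden state plus one bias vector per block; some libraries store two bias vectors per block, which changes the totals but not the 3:4 ratio. The sizes are hypothetical.

```python
def recurrent_params(input_size, hidden_size, num_blocks):
    """Parameters of a recurrent cell with num_blocks gate/candidate transforms."""
    # Each block: weights over [input, hidden] plus one bias vector (assumed layout).
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block

# Hypothetical layer sizes.
input_size, hidden_size = 128, 256

gru_params = recurrent_params(input_size, hidden_size, num_blocks=3)   # update, reset, candidate
lstm_params = recurrent_params(input_size, hidden_size, num_blocks=4)  # forget, input, output, candidate
print(gru_params, lstm_params, gru_params / lstm_params)
```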

A static method can be called directly on the class, without instantiating it.

class C(object):
    @staticmethod
    def f():
        print('runoob')

C.f()           # Static methods do not require instantiation
cobj = C()
cobj.f()        # They can also be called on an instance

Copyright notice
This article was written by [lyz_ fish]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/204/202207231246203529.html
