Source | The Robot Brains Podcast
Translation | 胡燕君、沈佳丽、程浩源、许菡如、贾川
Among today's world-famous AI scientists, deep learning pioneer Geoffrey Hinton may have the most distinctive research mindset: he likes to act on instinct, prefers to reason by analogy, and his research career has been driven by sudden sparks of insight.
This is closely related to his educational background. He majored in physiology and physics as an undergraduate, also studied philosophy, and went on to earn a bachelor's degree in psychology and a PhD in AI. This interdisciplinary experience laid the foundation for his open-minded thinking, freeing him from the constraints of formal mathematical deduction and giving him sharp imagination and a unique research taste.
The most obvious model for artificial neural networks is the human brain, in which scientists seek the mysterious origins of intelligence. Hinton was likewise inspired by it: he started out with restricted Boltzmann machines in an attempt to figure out how the human brain works, then moved naturally to conventional backpropagation networks. In 2012, AlexNet, proposed with his students Alex Krizhevsky and Ilya Sutskever, became the pioneering work behind the rise of deep neural networks.
Over half a century of work on deep learning, he has almost single-handedly held up half the sky of AI research, though for long stretches his work went relatively unnoticed. In 2019, Geoffrey Hinton received the Turing Award together with Yoshua Bengio and Yann LeCun; his papers have been cited more than 500,000 times to date.
Today, Hinton believes that deep learning, a highly successful paradigm, will continue to flourish. However, what drives deep learning forward will no longer be backpropagation. Based on his ongoing research into how the human brain works, he has glimpsed the next big thing in deep learning: learning algorithms for spiking neural networks.
This time, will his research intuition be validated?
Recently, on The Robot Brains Podcast hosted by Pieter Abbeel, Hinton shared his in-depth views on how the brain works, spiking neural networks, large-scale models, Boltzmann machines, and t-SNE. What follows is the conversation, compiled by the OneFlow community.
1. The latest research on how the brain works
Pieter Abbeel: What are the three problems that have kept you up at night lately?
Geoffrey Hinton: First, when will the Attorney General finally do something? Time is running out, and that worries me the most. Second, how should we deal with people like Vladimir Putin having nuclear weapons? And finally, does the brain use backpropagation?
Pieter Abbeel: You've spent a long time studying how the brain works. How is it going?
Geoffrey Hinton: It's been productive. I always believe I'll figure it out within the next five years. We are getting closer to the answer, and at the same time I have become convinced that there is no backpropagation in the brain.
I think the underlying technical principles of existing AI are very different from how the brain works, though at a high level they are alike: both have many parameters, the weights between neurons, and we can tune those parameters with large numbers of training samples.
Both the brain and deep learning involve huge numbers of parameters. The question is how we obtain the gradients for tuning them. We need some criterion for judging whether the result is good; if it isn't, we adjust the parameters to improve the prediction of the target. At the moment I believe that although backpropagation is the mechanism deep learning generally uses, it is very different from what the brain does; the brain obtains gradients some other way.
Pieter Abbeel: Recently you also claimed that the brain's working mechanism is not backpropagation but something closer to a Boltzmann machine. Do you see the Boltzmann machine architecture both as a viable AI model and as a theoretical model of how the brain works?
Geoffrey Hinton: Ultimately, if the brain works like backpropagation, how does it get the gradient information? That is the core of the NGRAD theory (neural gradient representation by activity differences; https://brainscan.uwo.ca/research/cores/computational_core/uploads/11May2020-Lillicrap_NatNeuroRev_2020.pdf): it expresses error derivatives in terms of differences in neural activity, that is, temporal derivatives stand in for error derivatives. However, I no longer really believe that hypothesis.
The principle of the Boltzmann machine is very simple, and my opinion of it keeps changing; these days I partly approve of it.
Boltzmann machines involve Markov chains and require symmetric weights, which seems implausible. On the other hand, Boltzmann machines use contrastive learning, but in a way that is more like a generative adversarial network (GAN) than typical unsupervised contrastive learning.
In unsupervised contrastive learning, you require two crops from the same image to have similar representations, and crops from different images to have dissimilar representations. In a Boltzmann machine, you instead require positive data to produce low energy and negative data to produce high energy (the data here is a single image, not a pair of images or anything else). So if you want unsupervised learning of this kind to work, you need two phases, as in a Boltzmann machine.
In the first phase, you find the structure in the positive data: not the structure of pairs of crops, but the structure of a whole image; you look for what local feature extraction and contextual prediction have in common. The second phase is different: you first need negative data that is very close to the real images but subtly different. Then you require the structure found in the positive data to be absent from the negative data, that is, the structure must be unique to the positive data. The reason is that the front-end wiring of the neural network itself can impose structure that appears in both positive and negative data, and this procedure ensures that what you learn is not an artifact of the network's wiring.
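To make the two phases concrete, here is a minimal PyTorch sketch of the idea as described above. The names (`energy_net`, the hinge margin) are illustrative assumptions of mine, not anything Hinton specifies: the first phase lowers the energy of real (positive) images, the second raises the energy of near-miss negative images.

```python
import torch

def two_phase_contrastive_step(energy_net, positive, negative, optimizer, margin=1.0):
    """One optimization step of the two-phase idea sketched above: push the
    energy of positive (real) data down and the energy of nearby negative
    data up. `energy_net` maps a batch of images to one scalar energy each."""
    optimizer.zero_grad()
    e_pos = energy_net(positive).mean()   # phase 1: model the real data
    e_neg = energy_net(negative).mean()   # phase 2: reject the near-miss data
    # Hinge-style contrastive loss: low energy for positives, high for negatives.
    loss = e_pos + torch.relu(margin - e_neg)
    loss.backward()
    optimizer.step()
    return loss.item()
```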
That is the aspect of Boltzmann machines I agree with. But I think generating negative data with a Markov chain is too complicated and inefficient, so we need another way to generate it.
This is a lot like generative adversarial networks. In a GAN, real data is fed in, a generative model produces negative data, and a discriminator then judges whether the data has the structure unique to real data, thereby deciding whether it is real. I would like to use the discriminator's internal representations as the generative model to produce negative examples and thereby train the discriminator.
So my current idea sits between a GAN and a Boltzmann machine: generate data not with a Markov chain but directly with a generative model; after all, the latter is much simpler.
Moreover, I imagine there would also be a discriminator and another directly generative model used for learning, making the generated negative samples more realistic.
Pieter Abbeel: In principle that's no conflict, because a GAN can be rewritten as an energy-based model; the former is just one form of the latter.
Geoffrey Hinton: Exactly. However, in a GAN you generate new data from random noise at the top, which makes it hard to get full coverage, because there is a lot of data that will never be generated. But if you instead generate from the top level of the discriminator, you get good coverage.
In 2006, Simon Osindero, Yee-Whye Teh and I published a paper in Neural Computation on a wake-sleep algorithm (https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf). The algorithm does not use backpropagation, yet it learns well. It is a contrastive wake-sleep algorithm; it is called "contrastive" because it has two aspects. First comes recognition, the phase in which the weights are adjusted; second comes generation, which does not generate from random data but from the data obtained during recognition, and that gives good coverage.
Pieter Abbeel: You once showed in a paper that neural networks can be trained with backpropagation. Now almost everyone uses backpropagation, yet you're saying we should perhaps move closer to how the brain works. In a sense, do you think backpropagation might actually be better than what the brain does?
Geoffrey Hinton: First, a correction. David Rumelhart, Ronald Williams and I did write a highly cited paper on backpropagation, but the algorithm existed before that. We simply applied it and showed it could learn interesting representations, such as word embeddings. We did not invent backpropagation.
I think backpropagation may be more efficient than whatever similar mechanism the brain uses: it compresses huge amounts of information into a few billion connection weights. Mind you, the brain has as many as hundreds of trillions of neural connections, so connections are cheap for it, but its experience (training data) is scarce: the brain packs in enormous numbers of parameters with only a small amount of experience.
Artificial neural networks are just the opposite: they have abundant experience (training data) and relatively few parameters. We try to extract the information about the input-output relationship and put it into the parameters. So I think backpropagation is more efficient than what the brain does, but it may not be as good at abstracting rich structure from small amounts of data.
Pieter Abbeel: In that respect, can you imagine other ways of getting better performance?
Geoffrey Hinton: I've always thought this requires an unsupervised objective function; that is critical, especially for perceptual learning.
If you can abstract a model of the physical world, you can adjust your behavior based on that model rather than on the raw data, which makes it much easier to find the right approach.
I am sure the brain uses many small, local objective functions. So instead of optimizing one objective function by training an end-to-end chain, you rely on many small local objective functions, which lets far more bandwidth be used for learning.
For example, this method yields a good objective function, though it is harder to implement: first, look at a local image patch and try to extract some plausible representation; next, use the context of the neighboring patches to predict what that representation should be; then compare the two, predicting which representation the patch should contain.
Clearly, once the machine understands the image domain well, the features extracted from the local patch and those predicted from the neighboring patches will generally agree. When they disagree you are surprised, and it is precisely the cases with big gaps that you learn the most from. This is hard to implement, but I believe objective functions will be optimized along these lines.
If you split a large image into many small local patches, then by checking whether the local prediction and the contextual prediction of each patch's representation agree, or whether the features extracted at many layers agree, you get rich feedback. It is hard to realize, but I think research will go in this direction.
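As a rough illustration of these "many small local objective functions", here is a toy PyTorch sketch. The encoders and the cosine-agreement loss are my assumptions, chosen only to show the shape of the idea: each patch gets one representation from the patch itself and one predicted from its neighbors, and the objective rewards agreement.

```python
import torch
import torch.nn.functional as F

def local_agreement_loss(local_encoder, context_encoder, patches, neighbors):
    """Toy local objective: compare a patch's own representation with the
    representation predicted from its neighboring patches.

    patches:   (B, C, H, W) local image patches
    neighbors: (B, K, C, H, W) the K surrounding patches of each patch
    """
    z_local = local_encoder(patches)                  # (B, D) local extraction
    B, K = neighbors.shape[:2]
    z_ctx = context_encoder(neighbors.flatten(0, 1))  # encode all neighbors
    z_ctx = z_ctx.view(B, K, -1).mean(dim=1)          # (B, D) context prediction
    # Negative cosine similarity: minimized when local and context agree.
    return -F.cosine_similarity(z_local, z_ctx, dim=-1).mean()
```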
Pieter Abbeel: People are exploring objective functions that fill in masked parts of an image or complete a missing word. Many people, including you, have begun studying how to learn effectively from unlabeled data, which is at the research frontier. But you are still generally using backpropagation.
Geoffrey Hinton: The reason I don't like masked autoencoders (MAE) is that they take in image patches, extract representations at successive layers, and finally try to reconstruct the missing input patches at the network's output layer.
The brain works differently. It also extracts representations at different levels, but it reconstructs the layer below at each layer, not only at the output layer. The question is whether this can be done without backpropagation.
If, like a masked autoencoder, you extract representations at different layers and then reconstruct the missing patches at the output layer, you need information returned from all the layers. And since all our simulators have backpropagation built in, you might as well keep using it; but that is not how the brain works.
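A hedged sketch of the contrast Hinton draws: instead of one reconstruction loss at the output layer, each layer reconstructs its own input, so every objective is local and no end-to-end backpropagation is needed. The architecture and dimensions below are illustrative assumptions of mine.

```python
import torch
import torch.nn as nn

class LayerLocalAE(nn.Module):
    """Toy per-layer reconstruction: every layer tries to rebuild its own
    input from its own code, so each layer has a purely local objective."""
    def __init__(self, dims=(784, 256, 64)):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Linear(a, b) for a, b in zip(dims, dims[1:]))
        self.decoders = nn.ModuleList(
            nn.Linear(b, a) for a, b in zip(dims, dims[1:]))

    def local_losses(self, x):
        losses = []
        for enc, dec in zip(self.encoders, self.decoders):
            h = torch.relu(enc(x))
            # Each layer reconstructs the layer below from its own code.
            losses.append(((dec(h) - x) ** 2).mean())
            x = h.detach()   # detach: no gradient flows between layers
        return losses
```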
Pieter Abbeel: When the brain handles all these local objectives, does the engineering of the system matter? If it does, three things seem relevant: first, which local objectives do we want to optimize; second, which algorithm do we use to optimize them; third, what architecture connects the neurons that are learning? We don't seem to have the right answer to any of the three.
Geoffrey Hinton: If you are interested in perceptual learning, you definitely want a hierarchy of retinotopic maps whose architecture uses local connectivity. The advantage is that you can simply assume that what is at a given position in a retinotopic map is determined by what feeds into the corresponding position, which solves many credit assignment problems. You only need local interactions to know what is at a specific location.
People assume the network applies the same function at every local position; convolutional neural networks and Transformers both do this. But I don't think the brain does, because that requires shared weights and exactly identical computations in different places.
Compared with weight sharing, I think the brain has a more plausible implementation. Imagine many columns, each making local predictions and looking at neighboring columns for contextual predictions. If the contextual prediction agrees with the local prediction, you can infer the local content from the context, and vice versa.
So you can think of the contextual information as being distilled into all the local extractors; in fact they distill into each other, learning from one another. That is, if they agree, knowledge obtained in one place can be transferred to another. For example, you want the nose and the mouth to agree that they come from the same face, in which case they should produce the same representation. And if different local regions produce the same representation, knowledge can be distilled from one place to another.
This works much better than literal weight sharing. From a biological standpoint there are two big advantages: the exact architecture at different locations need not be the same, and neither does their front-end processing.
Take the human retina: different parts of the retina have receptive fields of different sizes, a fact convolutional networks try to ignore. Convolutional networks sometimes use multiple resolutions and convolve at each, but they cannot do different front-end processing. In the brain, if you want knowledge to be distilled from one position to another, you only need the two places to compute the same function. Even if the two locations preprocess the optic array differently, that is, their front-end processing differs, you can still distill the function from optic array to representation.
So although distillation extracts knowledge less efficiently than weight sharing, it is more flexible and makes far more sense neurologically. I realized about a year ago that although efficient weight sharing is desirable, if adjacent regions can agree on representations, local distillation suffices. But to get local representations to agree, the knowledge at one location has to supervise the knowledge at another.
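One way to picture distillation replacing weight sharing is the following toy loss; it is my formulation, not code from Hinton. Two local "columns" looking at different locations supervise each other through their softened predictions, so agreement moves knowledge between locations without any weights being copied.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(logits_a, logits_b, temperature=2.0):
    """Symmetric distillation between two local columns: each column's
    softened prediction serves as the target for the other."""
    log_p_a = F.log_softmax(logits_a / temperature, dim=-1)
    log_p_b = F.log_softmax(logits_b / temperature, dim=-1)
    # Column A learns from B's beliefs and vice versa; detach stops each
    # target from being dragged toward its student.
    kl_ab = F.kl_div(log_p_a, log_p_b.exp().detach(), reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a.exp().detach(), reduction="batchmean")
    return (kl_ab + kl_ba) * temperature ** 2
```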
Pieter Abbeel: You're describing two views. One says neural networks use weight sharing and the brain uses another route, but the two converge, so we should keep weight sharing. The other says that because the brain does it differently, we should stop insisting on weight sharing and try a different direction. Which do you hold?
Geoffrey Hinton: I don't think the brain shares weights, because it is hard for the brain to ship information around like that. In convolutional neural networks and Transformers, we should keep using convolution.
For neural networks, we should share knowledge by sharing weights. But remember that the brain does not share knowledge by sharing weights; it shares the function from input to output and transfers knowledge by distillation.
2. The next big thing in AI
Pieter Abbeel: You often draw inspiration from the working mechanisms of the human brain. Which technology do you think will ultimately prove key? Spiking neural network computation?
Geoffrey Hinton: Spiking neural networks matter a great deal. Early in the development of neural networks, Marvin Minsky and Seymour Papert showed that a single neuron cannot solve the XOR problem: it cannot tell whether its two inputs are different, which is equivalent to checking whether the two inputs are the same. Unfortunately, Minsky and Papert chose to study the "XOR" problem rather than the "same" problem.
If you study the "same" problem instead, you soon think of using spike timing to determine whether two spikes arrive at the same time: if they do, the two spikes inject a large amount of charge into the neuron simultaneously and push it over threshold. In particular, if each excitatory input is followed by an inhibitory input, the spikes must arrive within a narrow time window to have any effect.
So spiking neural networks are very good at detecting agreement, whereas an ordinary neural network needs several layers to do it. I think if we could develop a strong learning algorithm, we could find out how such networks learn to use this ability, for example learning to use spikes for sound-source localization.
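A minimal sketch of the coincidence detection being described; the window and threshold values are invented parameters. The unit "fires" only when spikes on its two inputs arrive within a narrow time window, which is how spike timing can test for agreement.

```python
def coincidence_detector(spike_times_a, spike_times_b, window_ms=0.1, threshold=1):
    """Fire only if enough spike pairs from the two inputs arrive within
    `window_ms` of each other; times are in milliseconds."""
    hits = sum(
        1
        for ta in spike_times_a
        for tb in spike_times_b
        if abs(ta - tb) <= window_ms
    )
    return hits >= threshold

# Spikes that arrive together fire the detector; offset spikes do not.
print(coincidence_detector([1.0, 5.0], [1.05, 9.0]))  # True
print(coincidence_detector([1.0, 5.0], [2.0, 9.0]))   # False
```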
Pieter Abbeel: That makes me think of the Transformer architecture, which was also designed to capture agreement, or correlations. Transformer architectures are much larger than spiking architectures, but there should be some connection between them.
Geoffrey Hinton: For years, neuroscientists have felt it would be crazy not to use spike timing. If we could develop the corresponding learning algorithms and show that, once they start learning from this kind of auditory sequence data, they really do make sensible use of spike timing, then we could use spiking cameras and get very satisfying results. Spiking cameras are very clever and can deliver a lot of information, but the problem is that nobody knows how to use them. Speech researchers have likewise proposed feeding neural networks spike-based representations of auditory input, but again nobody knows how to build, learn from, and apply such a representation.
Pieter Abbeel: Real neurons emit spike signals and artificial neurons don't. Is that just a structural difference? Do we need to better understand and learn from the advantages of biological neurons?
Geoffrey Hinton: There is more than a structural difference between them. Why is the brain so good? Why can the brain do so many things with spike signals at such low energy cost? Once we understand that, we will see how ingenious the brain's use of spiking units is. For example, the retina does not use spiking neurons, yet it has plenty of neurons that process visual information.
It also depends on what learning algorithm the network uses to compute gradients for spikes. A spiking neuron is concerned with two different questions: whether it will spike, and when.
People use various surrogate functions to optimize such systems, and they do make a difference, but none of these functions is necessarily right. So I think it would be great to have a proper learning algorithm.
Around 2000, Andrew Brown and I wrote a paper on a learning algorithm for spiking Boltzmann machines (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.830.2783&rep=rep1&type=pdf). We hoped to find a suitable learning algorithm for spiking neurons, which I think is the key to progress in spiking-neuron hardware. Many people, including Steve Furber at the University of Manchester, want to build more energy-efficient hardware, and larger systems, once a suitable learning algorithm for spiking neurons is found. Only with the right learning algorithm can we make full use of spiking neurons.
Detecting agreement in a convolutional neural network is not easy, but with spiking neurons it is. Can an ordinary artificial neuron judge whether the values of two inputs are the same? The answer is no; that is not easy for ordinary artificial neurons.
Spiking neurons make such systems easy to build. If two spikes arrive at the same time, they drive the cell; otherwise they don't. So spike timing seems a good way to measure agreement. The same mechanism exists in biological systems, which is how you can localize a sound directly from the time difference between your two ears. Over a distance of one foot, light travels in about a nanosecond, while sound takes about a millisecond.
The point is that if I move an object a few inches in front of you, the difference in path length to your two ears, and hence the time difference, changes only slightly. Yet humans can quickly detect that tiny difference, and owls are even more sensitive.
Humans do this with two axons. Spikes travel along the axons in opposite directions, one from each ear, and if the spikes arrive at a cell at the same time, the cell fires. That, simply put, is the principle. So spike timing makes this kind of sensitive discrimination possible.
I have always thought that if we could use spike timing to test agreement in supervised learning, we would get amazing results.
For example, extract representations of the mouth and of the nose separately; from each you can infer what the whole face looks like, and if the mouth and the nose agree in their representations, the two predictions of the face will be consistent. Using spike timing to check whether such predictions agree strikes me as a very good idea.
But it is hard to do, partly because we do not have good algorithms for training networks of spiking neurons. So I am now focusing on how to train spiking networks so that they produce good outputs; I think this will have a big impact on hardware.
Pieter Abbeel: You just mentioned that the retina does not use spiking neurons throughout. So are you saying the brain has two types of neurons, one more like artificial neurons and the other spiking?
Geoffrey Hinton: I am not sure whether the retina is more like artificial neurons, but what is certain is that cortical neurons communicate with spikes; that is their main means of communication: one cell sends spike signals to another cell. And I think only when we find out why the brain chooses to communicate with spikes will we really understand the brain.
I used to have a decent argument for it. Compared with the typical neural networks we use, the brain has far more parameters but far less data. In that situation, unless you apply strong regularization, you overfit. A good regularization technique is dropout: each time you use the network, you ignore a large fraction of the units. When neurons send spikes, perhaps what they really convey is an underlying Poisson rate.
Suppose it is a Poisson rate. That means the spikes are emitted at random, but the rate of the process varies with the neuron's input, and you could imagine sending that real-valued rate from one neuron to another. If you want heavy regularization, though, you can add noise to the transmitted real-valued rate, and one way to add noise is to use spikes. This is also why dropout works: in any time window you look at, most neurons are not participating in anything.
So you can think of a spike as a noisy representation of an underlying Poisson rate. That sounds awful, but once you understand regularization, and how many parameters we have, you can see it is a very good idea. I still have a soft spot for this view: the point is not to use precise spike timing at all, but to use the noise-laden Poisson-rate representation as a good regularizer.
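A toy illustration of the Poisson-rate view, with invented numbers: the real-valued firing rate is what a neuron "means", and the spikes actually transmitted are a noisy sample of that rate, acting as a regularizer in the spirit of dropout.

```python
import torch

def poisson_spike_forward(rates, dt=1e-3):
    """Emit a noisy 0/1 spike per time window: the probability of a spike
    in a small window dt is approximately rate * dt."""
    return torch.bernoulli((rates * dt).clamp(0.0, 1.0))

rates = torch.tensor([5.0, 50.0, 500.0])        # firing rates in spikes/second
spikes = poisson_spike_forward(rates, dt=1e-2)  # what actually gets transmitted
print(spikes)                                   # e.g. tensor([0., 1., 1.])
```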
I am a bit torn between the two views. In research you cannot focus on one idea and ignore all the others, but if you stay open to different ideas, you end up torn.
So in some years I have thought we should have deterministic neural networks, and in other years I have thought randomness is very important; with Boltzmann machines, randomness matters again.
Never get completely attached to one of these ideas; stay open to both.
Pieter Abbeel: Since spiking neurons consume so little energy, could we set aside the training of spiking neurons for now, pre-train an efficient inference-only network separately, and then compile it onto a spiking-neuron chip? That way we would get low energy consumption and strong inference at the same time.
Geoffrey Hinton: That is a very smart idea, and it may push neural networks forward, because inference done this way is very powerful. Many people are already doing it, it has been shown to be more energy efficient, and companies have built large spiking systems of this kind. And once you have built such a system for the sake of strong inference, you naturally start thinking about how to learn at lower energy.
Imagine a system in which you learn with backpropagation and then migrate the result onto low-energy hardware. That works, but what we would really like is to learn directly on the low-energy hardware.
3. Large-scale neural networks can understand
Pieter Abbeel: Large neural networks are all over the news right now, enormous in scale, though of course still no bigger than the human brain. Scale seems to have become the trend, and the performance of large networks really is dazzling. I'd also like to hear your view of small neural networks. Ants, for example, have much smaller brains than humans, yet the visual motor systems we have built so far don't even perform as well as a bee's.
Geoffrey Hinton: A bee may be small, but it probably has around a million neurons, so a bee is a fairly large neural network. I think if a model has a huge number of parameters, and you keep adjusting and optimizing those parameters by gradient descent on a reasonable objective function, you end up with a very good model. That is what GPT-3, and Google's GLaM, did. But my approach differs from theirs.
What we do in neural networks is more recursive. In February 2021 I published a paper, "How to represent part-whole hierarchies in a neural network" (https://arxiv.org/pdf/2102.12627.pdf), which introduced the concept of GLOM and also touches on symbolic computation. What most people call symbolic computation is computation at the level of symbols, where the rules depend on the form of the symbol string being processed; each symbol has a unique identity that may or may not match another symbol's, and a symbol can act as a pointer to something else. But the symbolic computation I mean here is in the service of part-whole structure.
I am not sure to what extent GPT-3 understands text, but I believe it has some understanding, unlike the early ELIZA, which just rearranged strings with no idea what the text said. Why am I so sure? The evidence: if you instruct GPT-3 in English, "Show me a picture of a hamster in a red hat", it will generate a picture of a hamster wearing a red hat. It has certainly never seen that instruction before, so it must first understand what the English instruction says and then generate the picture accordingly.
In the past this would have counted as very strong evidence, enough to persuade those who did not believe neural networks could understand. As Terry Winograd's 1971 thesis (https://dspace.mit.edu/handle/1721.1/7095) described, a machine that correctly executed the instruction "Put the blue block in the green box" was taken as good proof that the machine understood what a human was saying. But now the skeptics raise new objections. They feel that even if a machine can execute commands based on text, that proves nothing.
Pieter Abbeel: The skeptics keep raising the bar. In April 2022, Google published a paper showing that its PaLM model can accurately explain the punchlines of jokes. That is not easy; it demands strong language understanding.
Geoffrey Hinton: Very impressive. If the machine did not get the point of the joke, it certainly could not generate such detailed, precise explanations. Still, I have reservations about machine understanding, because these models are, after all, trained with backpropagation, which differs from how humans come to understand things, so the resulting understanding may differ from ours as well.
The way machines recognize objects is to compare textures and other surface structure in images and classify accordingly, which is very different from how we humans recognize objects. Adversarial images also demonstrate this difference (an adversarial image is an image with fine perturbations added; the changes are almost undetectable to a human, but they disturb the machine enough to produce a completely different recognition result).
Let me use "insects recognizing flowers" to explain adversarial examples. Insects can see ultraviolet light and humans cannot, so two flowers that look identical to a human may look very different to an insect. Can we say the insect is wrong? The insect distinguishes them as two different flowers from their different ultraviolet signals; clearly the insect is not wrong. Humans simply cannot see ultraviolet, so they never notice the difference.
The fact that the human eye cannot see a difference does not mean the two flowers are the same; in fact the insect's judgment is correct. Only with the help of instruments can humans see that the two flowers' color signals fall in different regions of the electromagnetic spectrum and confirm that the flowers really are different.
Pieter Abbeel: But the purpose of our image recognition networks may not be to analyze whether machines differ from humans; it is to have machines help humans solve real problems, for example telling cars and pedestrians apart reliably.
Geoffrey Hinton: Agreed. A major aim of my GLOM paper was to build perception systems that resemble human ones, so that even when machines make mistakes, they make them as far as possible according to human logic, rather than making mistakes no human ever would. By comparison, the former is more acceptable.
Pieter Abbeel: OpenAI has released the latest version of DALL-E. How important do you think embodiment is for intelligence?
Geoffrey Hinton: We need to separate the engineering question from the philosophical one. Philosophically: if a person sat in a room, unable to move, with nothing but radio and television, could he figure out how the world works from sensory input alone? I think yes. From that angle, intelligence does not require embodiment. In practice, though, once an intelligence is embodied, that changes how you build its perception and its behavior.
From the engineering angle: is learning how the world works only by listening to the radio and watching TV a good approach? I think not. From this angle, embodiment matters a great deal, but it also brings a lot of trouble.
I think we can make a lot of progress with video databases. Whenever someone makes a video, it can serve as data for research. Since there is no need to control the data collection, and no robot has to be moved around, there is still a lot of room for research.
Back in the 1980s, Dana H. Ballard already realized that perception for a mobile, animate robot is very different from standard computer vision, and I very much agree with him.
4. The role of sleep and Boltzmann machines
Pieter Abbeel: As I understand it, you have been researching sleep lately?
Geoffrey Hinton: Yes. When I can't sleep at night, I think about sleep. There are some very interesting facts about it. Animals sleep too, fruit flies for example, though their sleep may just be a way of shutting down at night, and they can survive without it. But if people don't sleep, the body reacts very abnormally. The CIA's sleep deprivation studies showed that after three days without sleep a person begins to hallucinate, and after seven days without sleep comes permanent mental illness.
Why is that? What does sleep do for the body? Since humans collapse without sleep, sleep must be extraordinarily important to the body. The current theory holds that during sleep, information in the hippocampus is transferred to the cerebral cortex to consolidate memory.
In the early 1980s, Terry Sejnowski and I came up with the theory of Boltzmann machines. Part of our inspiration came from the British biologist Francis Crick's view of Hopfield networks. Francis Crick and Graeme Mitchison had published a paper on sleep (http://www.rctn.org/cadieu/pdf/CrickMitchisonDreamSleep1983.pdf). (Translator's note: Crick's theory was that the brain takes in a lot of confusing, useless information during the day and clears it out during sleep to make room for memory, a process he called "reverse learning".)
Neural networks can work the same way. We give a network random inputs and hope it sorts order out of chaos. With a Hopfield network, we feed in vectors we want it to remember, and the network keeps adjusting its weights to minimize the energy of those vectors (note: the lower the energy, the more stable the system). But if you then start feeding the Hopfield network random vectors and ask it to raise their energy, this reversed operation actually makes the network run more efficiently.
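For concreteness, here is a small NumPy sketch of this "reverse learning" on a Hopfield network, my construction based on the description above: Hebbian updates lower the energy of the patterns to be stored, and anti-Hebbian updates raise the energy of random states.

```python
import numpy as np

def hopfield_energy(W, s):
    """Energy of state s (entries +/-1) in a Hopfield net with weights W."""
    return -0.5 * s @ W @ s

def train_with_unlearning(patterns, noise_states, lr=0.01):
    """Store patterns by lowering their energy (Hebbian) and 'unlearn'
    random states by raising theirs (anti-Hebbian)."""
    n = patterns[0].size
    W = np.zeros((n, n))
    for p in patterns:            # memorize: lower energy of real patterns
        W += lr * np.outer(p, p)
    for x in noise_states:        # unlearn: raise energy of spurious states
        W -= lr * np.outer(x, x)
    np.fill_diagonal(W, 0.0)      # no self-connections
    return W
```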
That is where the inspiration for the Boltzmann machine came from. We found that we could avoid feeding random inputs to the network and instead let the network generate data from its own internal Markov chain, then tell it in turn: "Make data like this more likely to be generated, and data like that a little less likely." That is essentially maximum likelihood learning.
We were very excited by this idea, because it resembles the "reverse learning" of sleep. The same reversal also applies to contrastive learning: when two crops come from the same image, you want them to produce similar representations; when two crops come from different images, you want their representations to differ markedly. And once they do produce different representations, the goal is no longer to push the two representations further apart, only to keep them from becoming similar.
Contrastive learning also involves positive and negative examples. With a Boltzmann machine, you cannot process the positive and negative examples separately; they have to be interleaved, or the whole system becomes hard to run. I tried separating them, processing a large batch of positive examples and then a large batch of negative examples, and found it very difficult.
But a few years ago I discovered that in contrastive learning, positive and negative examples can be processed separately: first analyze a large number of positive examples, then a large number of negative ones. Like the sleep story above, brain activity would split into a day state and a night state: take in information during the day, and clear out the useless "negative examples" at night.
That makes contrastive learning more plausible. Analyze positive and negative examples in phases: first extract the features of the positive examples and increase certain weights, then extract the features of the negative examples and decrease certain weights. Even the most basic use of contrastive learning can then work well. It takes a lot of momentum and tricks, but in the end it can be done.
So I think the role of sleep is probably to let you forget useless information, the "negative examples". That is why, although you dream for hours each night, on waking you remember only the last minute of dreaming: dreaming is the process of clearing information, and the brain does not want to remember dreams. You may hold a dream in fast weights (note: weight matrices used to store short-term memory, to gain extra capacity), because fast weights are only temporary storage.
I think this is the most plausible theory of sleep I know of, because it explains why, without sleep, the nervous system goes wrong: you make mistakes, hallucinate, and show all kinds of abnormal reactions.
Here I want to stress the importance of negative examples in contrastive learning. A neural network needs to optimize its internal objective functions; optimizing certain representations requires contextual predictions and local predictions to agree, in the hope that their agreement reflects properties of the real data.
But neural networks have a problem: neurons receive more than just the input data; there are also all kinds of correlations among the inputs, and some of those correlations have nothing to do with the properties of the real data. They are produced by the network's own wiring and by how the data flows through the network.
If two neurons are looking at the same pixel, they are correlated, but that correlation reflects nothing about the pixel itself. So you must find a way to extract the structural properties of the real data while avoiding the interference introduced by the network itself. The way to do that is to give the network positive examples and have it find structure that is present in the positive examples but absent from the negative ones. The differences between the positive and negative examples are what reflect the real data.
So a powerful learning algorithm should learn to ignore the network's own wiring and connection weights, which would otherwise interfere.
Pieter Abbeel: Do the hallucinations caused by sleep deprivation serve the same purpose as sleep?
Geoffrey Hinton: Hallucinations may well serve the same function as sleep. Everything I have tried suggests that it is best not to spend 16 hours awake and then 8 hours asleep each day; alternating waking and sleep works better, which is why many people, Einstein included, found naps very helpful.
5. t-SNE: a data visualization technique
Pieter Abbeel: In neural network learning, it is especially important that when you build a model, you know what it is and what it is learning. People often try to visualize the learning process, and one of the most common visualization techniques is one you invented: t-SNE (t-distributed stochastic neighbor embedding).
Geoffrey Hinton: When you try to map high-dimensional data into 2D or 3D, you can pick two principal components and plot those. A 2D map like that preserves the big distances between points, their biggest differences, and does not care about the small distances (the small differences). That also means the map does a poor job of preserving the high-dimensional similarities.
But what people tend to care about is not the big differences in the data but the similarities. If a map gets the relationships among the small distances right, people do not mind if the large distances are wrong.
Long ago, I wondered whether the distances between data points (their dissimilarities) could be converted into "pairing probabilities between data points": a pair at a small distance (a small difference) would be a "likely pair", and a pair at a large distance (a large difference) an "unlikely pair". In that scheme, "small distance" means "high pairing probability".
Our method was to place a Gaussian distribution around a data point and compute the density of another data point under that Gaussian. That gives an unnormalized probability, and all that remains is to normalize it.
Then you lay the points out in 2D so as to reproduce these "pairing probabilities between data points". If two points are far apart, you need not care much about their relative positions, because their pairing probability is very low; you concentrate on the pairs with high pairing probability. This produces a good map, and that is "stochastic neighbor embedding" (SNE): place a Gaussian, then pick a neighbor at random according to density under that Gaussian.
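A minimal NumPy sketch of exactly this step, with an assumed fixed Gaussian width `sigma` (the real SNE tunes the width per point via a perplexity parameter): distances become Gaussian densities, which are normalized into pairing probabilities.

```python
import numpy as np

def sne_pairing_probs(X, sigma=1.0):
    """Turn pairwise distances into 'pairing probabilities' via a Gaussian
    placed on each point, then normalize. X has shape (n_points, n_dims)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    p = np.exp(-sq_dists / (2 * sigma ** 2))    # unnormalized Gaussian density
    np.fill_diagonal(p, 0.0)                    # a point cannot pair with itself
    return p / p.sum(axis=1, keepdims=True)     # normalize per point

X = np.random.randn(5, 10)
P = sne_pairing_probs(X)
print(P.round(2))   # nearby points get high pairing probability
```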
The derivative we ended up with was very concise, which made me feel we had found the right approach, and I got nice maps, but the maps crowded all the points together. That, obviously, is the fundamental problem of squeezing high-dimensional data into low dimensions.
In a high-dimensional space, many points can be close to a given data point. A can have B, C, D, E and so on nearby, while B, C, D, E are not particularly close to one another.
But in a low-dimensional space, if B, C, D, E are all very close to A, then B, C, D, E must be close to each other, and that causes problems. Then I suddenly had an idea: since we were using "pairing probabilities between data points" as the intermediate currency, we could use a mixture model.
It would be a mixture version. In the high-dimensional space, the pairing probability of a pair is proportional to e raised to minus the squared distance under a Gaussian. In the low-dimensional space, suppose there are two different 2D maps; the pairing probability of the pair is then the sum of e raised to minus the squared Gaussian distance in the first 2D map and e raised to minus the squared Gaussian distance in the second.
That way we can avoid forcing B, C, D, E close to each other as above. For example, the word "bank" has different senses, such as the financial institution and the riverbank, and we want its relatives near it: when "bank" means the institution, it can sit near "greed" in one map, and when it means the riverbank, it can sit near "river" in another map, but "river" never has to go near "greed".
I thought it was a good idea, so I kept pushing it, trying to get mixture maps to work. James Cook and others got involved as well, but we never found a good way to implement the mixture model.
Disappointed, I started working on a simpler version I called UNI-SNE, a mixture of a Gaussian and a uniform distribution, and it worked quite well. Concretely, in UNI-SNE one component gives every pair of points the same pairing probability, forming a background probability.
In the other component, the pairing probability falls off as e raised to minus the squared distance, which means points on the map can sit far apart. Later I received a manuscript from Laurens van der Maaten, so properly formatted that I took it for a published paper, though in fact it was not.
Laurens wanted to study with me, and since I thought he had sent a published paper, he had to be very good, so I invited him to come and do research with me. He turned out to be excellent indeed. We started working on UNI-SNE together, and then I realized: since UNI-SNE is a mixture of a Gaussian and a uniform distribution, and a uniform distribution can be seen as an extremely wide Gaussian, why not use a combination of Gaussians at many different scales, a mixture of many Gaussians of different widths? That is what we call a t-distribution.
And so t-SNE was born. It works much better and has the nice property of displaying data at multiple scales: when points are far apart, it behaves somewhat like gravity acting between clusters, and you get structure at different levels, with both coarse and fine structure on display.
The objective function we now use involves the relative densities of Gaussians; it came out of earlier research I did with Alberto Paccanaro, which was hard to publish. When our paper was rejected by a conference, the feedback was: "Hinton has been working on this idea for seven years, and nobody is interested."
Looking on the bright side, that feedback told me my research was very original. The function we use now came out of that research; t-SNE is in fact a version of NCE, using a contrastive method to make maps.
In short, the birth of t-SNE was a long process: first the original SNE, then attempts at a mixture version that did not succeed, and finally, by a happy accident, the realization that we should use the t-distribution. I am very grateful that Laurens came along; he is very smart and very good at programming, and he helped drive the project to success.
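For readers who want to try the result of that history, scikit-learn ships an implementation of t-SNE; a minimal usage sketch on stand-in data:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.randn(200, 50)   # stand-in for high-dimensional data
# perplexity roughly sets the effective number of neighbors per point.
X_2d = TSNE(n_components=2, perplexity=30.0, init="pca").fit_transform(X)
print(X_2d.shape)              # (200, 2) coordinates ready for plotting
```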
Pieter Abbeel: Looking back, the grand idea matters, but you only succeed by getting the details right.
Geoffrey Hinton: You usually need to do two things. First, you have to have a big idea, which yields an interesting piece of original work. Then you have to get the details right, which is what graduate students do.
6. Mortal computers, psychology, and consciousness
Pieter Abbeel: In the field of neural networks, you went from nothing to today's remarkable achievements, which is no small feat. You once said that, in a sense, deep learning can do everything. Do you still think so?
Geoffrey Hinton: Sometimes I speak before thinking carefully, so I am not precise enough, and I end up saying things like "we won't need radiologists anymore". What I actually meant was: we can use stochastic gradients. When I say deep learning, what I picture is just a big bunch of parameters.
The way we compute gradients need not be backpropagation, and what the gradients optimize need not be the final performance measure; it can be a large number of local objective functions. That is basically how the brain works, and it can explain everything.
One more point. Today's computers are extremely precise; banking, aerospace and other high-accuracy fields must use precise systems. The human brain is different: it is not that precise. In fact, people have not fully appreciated this: because we fixed the way computation is done and computers are exact, the knowledge stored in a computer can be "immortal".
Take today's computers. Imagine you have a program, or a set of neural network weights (which you can also regard as a kind of program). If your hardware fails, you can run the same program on other hardware. That is the "immortality" of knowledge: the knowledge does not depend on any particular piece of hardware to exist. But as things stand, the cost of this "immortality" is huge, because it requires different pieces of hardware to behave exactly alike, with no deviation at all. That forces a digital implementation, performing operations such as multiplication exactly, which consumes a lot of energy and is not the ideal direction for hardware.
Once you decide your program or neural network should be "immortal", you pay a huge price in energy consumption and hardware manufacturing. To produce precise hardware, you may have to design it in 2D first and then stack the layers together.
What if you give up the pursuit of "immortality"? In novels, those who renounce immortality usually get "love" in return; in artificial intelligence, renouncing "immortality" lowers the energy consumption and production costs of computation. So instead of manufacturing the computer, we give it a "seed" and let it grow by itself. We could use nanotechnology to grow it in a 3D environment, though different components would call for different methods.
An analogy comes to mind. If you pull a plant out of its pot, you find its roots form a ball shaped like the pot. Different potted plants grow root balls with similar pot-shaped contours that look different in detail, yet they all perform the same function: drawing nutrients from the soil.
The human brain is like the root ball of a potted plant. It grows by itself, every brain differs from every other, yet they all perform similar functions. That is how I imagine mortal computers.
Mortal computers evolve by themselves instead of being manufactured. You cannot program them in advance; they have to learn on their own, so we need to equip them with a learning algorithm. A mortal computer could do most of its computation with analog signals, for example multiplying voltages by conductances to obtain charges and then adding the charges together. Chips along these lines already exist.
The question is what we should do next, and how learning would happen on such chips. Backpropagation and various versions of the Boltzmann machine have been proposed, but I do not think that is enough; we need other lines of research as well.
I think that in the near future we will be able to produce mortal computers at low cost. They will learn, consume very little energy, and when their life cycle ends, the knowledge they acquired dies with them. The weights of the old computer are useless elsewhere, because they only work on the old computer's hardware; the only way to transfer the old computer's "knowledge" to a new one is "knowledge distillation". It is as if the older generation of mortal computers recorded a lot of "podcast episodes" to pass its knowledge on to the new generation.
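A sketch of what that hardware-independent transfer could look like, using the familiar soft-target distillation loss (the temperature value is an arbitrary choice of mine): the student matches the teacher's softened outputs rather than its weights, so the transfer works even when the two machines' hardware differs completely.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Student matches the teacher's softened output distribution, not its
    parameters, so knowledge moves between incompatible hardware."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # T*T keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T
```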
Pieter Abbeel: Machine learning has its origins partly in human psychology. Do you think progress in machine learning can in turn help us understand human psychology better, for example by viewing people as neural networks or classifiers and their cognitive biases as overfitting?
Geoffrey Hinton: Yes, I firmly believe the development of machine learning can help people understand human psychology better. Once we figure out how the brain works, that will also give psychology some breakthroughs, just as learning chemistry on the foundation of atoms, and knowing what reactions occur when molecules collide, gave us a better understanding of the gas laws.
Understanding at the fine-grained level is important; it helps us understand what happens at higher levels. But I think many things at higher levels are hard to explain satisfactorily; the cause of schizophrenia, for example, is still unknown.
Pieter Abbeel: If neural networks can be conscious, how conscious are they now?
Geoffrey Hinton: A hundred years ago, if you asked people, "What is the difference between something living and something dead?", they would say, "Living things have vital force, and dead things have none." If you then asked, "What is vital force?", they would tell you, "Vital force is what all living things have in common."
Then humans invented biochemistry and figured out what is actually going on. Since then, we no longer talk about "vital force". It is not that vital force does not exist; it is just no longer a useful concept.
Now we understand at the biochemical level why things are alive. We know that when organs are undersupplied they fail, the person dies, and the body decays. So it is not that the body loses its vital force and goes to heaven; it is simply that the biochemistry of the body has changed.
The same goes for "consciousness". I think "consciousness" is a pre-scientific concept. People define it differently, and it is hard to give it a precise definition, yet many related notions run through our lives. For example: "Are you aware of what's going on around you?" If Muhammad Ali hit you on the chin and you no longer registered your surroundings, we would call you "unconscious"; that is one sense of the word. But if I am driving without being aware of what I am doing, that is "unconscious" in another sense. We understand the word in many different ways.
As I see it, "consciousness" is a primitive attempt to understand what goes on in the brain by giving the mind's contents a name and assuming there is some essence that explains everything.
Take cars as an example. If you want to understand a car, you might feel you must understand its "power". But once you actually start to understand the dynamics, how the engine works, how power is converted, and all the rest, you stop using the word "power" in that way.
(This article is compiled and published with permission. Original videos:
1. https://www.youtube.com/watch?v=4Otcau-C_Yc
2. https://www.youtube.com/watch?v=2EDP4v-9TUA)