
A Brief Introduction to Machine Learning Frameworks

2022-07-05 14:50:00 Full stack programmer webmaster

Hello everyone, we meet again. I'm your friend Quan Jun.

     A machine learning framework is a system or methodology that integrates all aspects of machine learning, including machine learning algorithms, so that users can apply them as effectively as possible. Specifically, this covers methods for representing and processing data, methods for expressing and building predictive models, and methods for evaluating and using modeling results.

     Among the available machine learning frameworks, those that focus on iterative algorithms and interactive processing are generally considered the best, because these characteristics help in estimating complex predictive models and support good interaction between researchers and their data. Today, a good machine learning framework also needs big data capabilities, fast processing at scale, and fault tolerance. Excellent frameworks usually include a large number of machine learning algorithms along with the relevant statistical tests.
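Iterative algorithms make repeated passes over the same data, which is why in-memory processing matters for them. The sketch below is plain Python, not any framework's API: it fits y = w * x + b by batch gradient descent on a small made-up dataset, and every iteration rescans the whole dataset held in memory. The learning rate and iteration count are illustrative assumptions.

```python
def fit_linear(xs, ys, lr=0.05, iterations=2000):
    """Fit y = w * x + b by batch gradient descent on in-memory lists."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Each iteration rescans the full dataset: cheap when it is cached in
        # memory, costly when it has to be re-read from disk every time.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```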

     Compared with Spark, Hadoop MapReduce (MR) can be more efficient for big data that does not fit in memory, and it suits experienced researchers who prioritize availability. Although Spark's in-memory processing gives it excellent interactive performance and good cost-effectiveness, Hadoop MR is the more mature platform, built from the start to solve batch processing problems. In addition, with more supporting projects, tools, and cloud services, Hadoop MR currently has a larger ecosystem.
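Hadoop MapReduce expresses a batch job as a map phase, a shuffle that groups intermediate key/value pairs by key, and a reduce phase. The single-process Python sketch below only imitates that flow to show the programming model; it does not use Hadoop's API, and the per-label mean job and its data are made up for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs; here each record is (label, feature)."""
    label, feature = record
    yield label, (feature, 1)


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a per-label mean."""
    total = sum(v for v, _ in values)
    count = sum(c for _, c in values)
    return key, total / count


def run_job(records):
    intermediate = (pair for r in records for pair in map_phase(r))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))


records = [("spam", 4.0), ("ham", 1.0), ("spam", 6.0), ("ham", 3.0)]
print(run_job(records))  # {'ham': 2.0, 'spam': 5.0}
```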

In short, a machine learning framework covers how data is processed, the analysis methods, the analytical computation, result evaluation, and how results are used. A good framework must handle large-scale data extraction and preprocessing, as well as fast computation, large-scale high-speed interactive evaluation, and easy-to-understand interpretation and deployment of results.
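The stages summarized above (preprocessing, model building, evaluation) can be shown end to end on toy data. In this plain-Python sketch the data is made up and the model is a nearest-centroid classifier, chosen only because it is short; the scaler is fitted on the training split alone so that no test information leaks into preprocessing.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Preprocessing: learn mean and standard deviation from training data."""
    mu, sigma = mean(values), pstdev(values)
    return lambda v: (v - mu) / sigma


def fit_centroids(xs, labels):
    """Model building: one centroid (mean scaled feature) per class."""
    classes = sorted(set(labels))
    return {c: mean(x for x, l in zip(xs, labels) if l == c) for c in classes}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def accuracy(y_true, y_pred):
    """Evaluation: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_x = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
train_y = ["low", "low", "low", "high", "high", "high"]
test_x, test_y = [1.2, 8.8, 5.5], ["low", "high", "high"]

scale = fit_scaler(train_x)
centroids = fit_centroids([scale(x) for x in train_x], train_y)
preds = [predict(centroids, scale(x)) for x in test_x]
print(preds, accuracy(test_y, preds))  # ['low', 'high', 'high'] 1.0
```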

Below is a brief introduction to some of the mainstream frameworks:

  1. Apache Spark MLlib: Apache Spark is best known as a member of the Hadoop family, but this in-memory data processing framework was born outside Hadoop and has also made a name for itself outside the Hadoop ecosystem. Spark has become a useful machine learning tool thanks to its growing library of algorithms that can be applied to in-memory data at high speed. Earlier versions of Spark strengthened support for MLlib, a platform aimed mainly at math and statistics users, which allows Spark machine learning jobs to be suspended and resumed by persisting pipelines. Spark 2.0, released in 2016, improved the Tungsten high-speed memory management system and the new DataFrames streaming API, both of which can boost the performance of machine learning applications.
  2. H2O: H2O, now in its third major version, provides access to machine learning algorithms through common development environments (Python, Java, Scala, R), big data systems (Hadoop, Spark), and data sources (HDFS, S3, SQL, NoSQL). H2O is an end-to-end solution for data collection, model building, and serving predictions. For example, models can be exported as Java code, so that predictions can be made on many platforms and in many environments. H2O can be used as a native Python library, through a Jupyter Notebook, or through the R language in RStudio. The platform also includes an open source, web-based environment specific to H2O, called Flow, which supports interacting with the dataset during training, not just before or after it.
  3. Apache Singa: “Deep learning” frameworks power heavy-duty machine learning tasks such as natural language processing and image recognition. Singa, an Apache Incubator project, is an open source framework intended to make it easier to train deep learning models on large datasets. Singa provides a simple programming model for training deep learning networks across a cluster of machines, and it supports many common types of training jobs: convolutional neural networks, restricted Boltzmann machines, and recurrent neural networks. Models can be trained synchronously (one after the other) or asynchronously (side by side), on both CPU and GPU clusters, with FPGA support coming soon. Singa also simplifies cluster setup through Apache ZooKeeper.
  4. Caffe2: The deep learning framework Caffe was developed around the idea of “expression, speed, and modularity.” It started in 2013 as a machine vision project and has since expanded to other applications, such as speech and multimedia. Because speed is a priority, Caffe is implemented entirely in C++, supports CUDA acceleration, and can switch between CPU and GPU processing. The distribution includes free open source reference models for common classification tasks, along with models created and shared by the Caffe user community. A new Facebook-backed iteration of Caffe, called Caffe2, is under development and heading toward a 1.0 release. Its goals are to simplify distributed training and mobile deployment, support new types of hardware such as FPGAs, and take advantage of advanced features such as 16-bit floating-point training.
  5. Google's TensorFlow: Much like Microsoft's DMTK, Google TensorFlow is a machine learning framework designed to scale across multiple nodes. As with Google's Kubernetes, it was built to solve problems inside Google, and Google eventually released it as an open source product. TensorFlow implements so-called data flow graphs, in which batches of data (“tensors”) are processed by a series of algorithms described by the graph. The movement of data through the system is called a “flow”, hence the name. Graphs can be assembled in C++ or Python and processed on CPUs or GPUs. Recent TensorFlow upgrades have improved Python compatibility and GPU operations, opened the door to running TensorFlow on a wider variety of hardware, and expanded the library of built-in classification and regression tools.
  6. Amazon Machine Learning: Amazon's approach to cloud services follows a pattern: provide the basics, attract a core audience that cares, let them build applications on top, find out what they really need, and then deliver it. The same is true of Amazon's machine-learning-as-a-service offering, Amazon Machine Learning. The service can connect to data stored in Amazon S3, Redshift, or RDS and run binary classification, multiclass classification, or regression on that data to build a model. Note, however, that the resulting models cannot be imported or exported, and training datasets cannot exceed 100GB. Still, Amazon Machine Learning shows that machine learning is becoming a practical tool rather than a luxury. For those who want to go further, or who want to be less tightly coupled to Amazon's cloud, Amazon's Deep Learning machine image includes many of the major deep learning frameworks, including Caffe2, CNTK, MXNet, and TensorFlow.
  7. Microsoft Azure ML Studio: Given the amount of data and computing power machine learning requires, the cloud is an ideal environment for machine learning applications. Microsoft has equipped Azure with its own pay-as-you-go machine learning service, Azure ML Studio, offered in monthly, hourly, and free versions. (The company's HowOldRobot project was built with this system.) You don't even need an account to try the service: you can log in anonymously and use Azure ML Studio for free for up to 8 hours. Azure ML Studio lets users create and train models, then turn them into APIs that other services can consume. Free-tier users get up to 10GB of storage per account for model data, and you can connect your own Azure storage for larger models. A wide range of algorithms is available, courtesy of both Microsoft and third parties. Recent improvements include batch management of training jobs through the Azure Batch service, better deployment management controls, and detailed web service usage statistics.
  8. Microsoft's Distributed Machine Learning Toolkit: Throwing more machines at a machine learning problem yields better results, but developing machine learning applications that run well across large numbers of computers can be a real headache. Microsoft's DMTK (Distributed Machine Learning Toolkit) framework tackles the problem of distributing various machine learning jobs across a cluster of systems. DMTK is billed as a framework rather than a complete, out-of-the-box solution, so it includes only a small number of algorithms. Still, you will find some key machine learning libraries, such as a gradient boosting framework (LightGBM), as well as support for deep learning frameworks such as Torch and Theano. DMTK is designed so that users can build the largest possible clusters from limited resources. For example, each node in the cluster keeps a local cache, which reduces traffic to the central server node that provides parameters for the job.
  9. Microsoft's Computational Network Toolkit: Following the release of DMTK, Microsoft launched another machine learning toolkit, the Computational Network Toolkit, or CNTK for short. CNTK is similar to Google TensorFlow in that it lets users build neural networks as directed graphs. Microsoft also considers CNTK comparable to projects such as Caffe, Theano, and Torch, except that CNTK can achieve greater speed by exploiting parallelism across multiple CPUs and GPUs. Microsoft claims that running CNTK on GPU clusters in Azure sped up speech recognition training for Cortana by an order of magnitude. The latest version, CNTK 2.0, turned up the heat on TensorFlow by improving accuracy, added a Java API for Spark compatibility, and supports code written for the Keras framework (commonly used with TensorFlow).
  10. Apache Mahout: Mahout was developed for scalable machine learning on Hadoop long before Spark went mainstream. After a long period of relative quiet, Mahout is being revived, for example with a new environment for mathematics called Samsara, which lets multiple algorithms run across a distributed Spark cluster and supports both CPU and GPU execution. The Mahout framework has long been tied to Hadoop, but many of its algorithms can also run outside Hadoop. This is useful for standalone applications that may eventually be migrated to Hadoop, or for Hadoop projects that could be spun off into standalone applications.
  11. Veles (Samsung): Veles (https://velesnet.ml/) is a distributed platform for deep learning applications. Like TensorFlow and DMTK, it is written in C++, although it uses Python to perform automation and coordination between nodes. Datasets can be analyzed and automatically normalized before being fed to the cluster, and a REST API makes the trained model available for use immediately (assuming your hardware is up to the task). Veles uses Python for more than glue code: the Python-based Jupyter Notebook can be used to visualize and publish results from a Veles cluster. Samsung hopes that releasing Veles as open source will stimulate further development, such as ports to Windows and MacOS.
  12. mlpack 2: mlpack, a C++-based machine learning library originally released in 2011, was designed for “scalability, speed, and ease of use,” according to the library's creators. mlpack can be used as a “black box” or, for more complex work, through its C++ API. mlpack 2 includes many new algorithms, along with refactored versions of existing ones that make them faster or leaner. For example, it dropped the Boost library's random number generator in favor of C++11's native random number facilities. One long-standing weakness of mlpack is the lack of bindings for languages other than C++, which means users of other languages need a third-party library, such as one for Python. Some work has been done to add MATLAB support, but projects like mlpack tend to see wider adoption when they are directly usable in the main environments where machine learning work happens.
  13. Neon: Nervana, a company that builds its own deep learning hardware and software platform (now part of Intel), offers a deep learning framework called Neon as an open source project. Neon uses pluggable modules so that the heavy lifting can be done on CPUs, GPUs, or Nervana's own custom chips. Neon is written mainly in Python, with some parts in C++ and assembly for speed. This makes the framework readily usable by anyone doing data science in Python or in any other framework with Python bindings. Many standard deep learning models, such as LSTM, AlexNet, and GoogLeNet, are available as pretrained models for Neon. The latest release, Neon 2.0, adds Intel's Math Kernel Library to improve CPU performance.
  14. Marvin: Another relatively recent product, the Marvin neural network framework, comes from the Princeton Vision Group. Marvin was “born to be hacked,” as its creators explain in the project documentation, and it depends only on a few files written in C++ and the CUDA GPU framework. Although the project contains very little code, it provides a large number of pretrained models, which, like the project's own code, can be reused where appropriate or shared as users need.
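The Spark MLlib entry mentions suspending and resuming work by persisting pipelines. In PySpark this is done with `PipelineModel` objects (a fitted model's `save` method and `PipelineModel.load`); the standard-library sketch below only mimics the idea. The `MeanCenterer` stage and the JSON layout are hypothetical: fitted parameters are written to disk, and a later run rebuilds the stage from them instead of re-fitting.

```python
import json
import os
import tempfile


class MeanCenterer:
    """Hypothetical pipeline stage: subtracts the mean learned during fit."""

    def __init__(self, mean=None):
        self.mean = mean

    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean for x in xs]


def save_pipeline(stages, path):
    """Persist each fitted stage's parameters so the job can stop here."""
    with open(path, "w") as f:
        json.dump([{"type": type(s).__name__, "params": vars(s)} for s in stages], f)


def load_pipeline(path):
    """Rebuild the fitted stages and resume without re-fitting."""
    registry = {"MeanCenterer": MeanCenterer}
    with open(path) as f:
        return [registry[d["type"]](**d["params"]) for d in json.load(f)]


stages = [MeanCenterer().fit([2.0, 4.0, 6.0])]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline.json")
    save_pipeline(stages, path)
    # ... later, possibly in another process ...
    restored = load_pipeline(path)
print(restored[0].transform([4.0, 10.0]))  # [0.0, 6.0]
```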
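The H2O entry notes that models can be exported as code so they run wherever that code runs. H2O itself generates Java artifacts; the sketch below swaps in Python to show the same idea. `export_as_source` is a hypothetical helper that turns the parameters of an already-trained logistic-regression model (the weights here are made up) into standalone source code, which is then loaded and called with no training library involved.

```python
def export_as_source(weights, bias, func_name="score"):
    """Emit standalone Python source for a trained logistic-regression model."""
    terms = " + ".join(f"{w!r} * x[{i}]" for i, w in enumerate(weights))
    return (
        "import math\n\n"
        f"def {func_name}(x):\n"
        f"    z = {terms} + {bias!r}\n"
        "    return 1.0 / (1.0 + math.exp(-z))\n"
    )


# Hypothetical parameters of an already-trained model.
source = export_as_source([0.8, -1.5], 0.2)
print(source)

# "Deploy" the generated code: run it in a fresh namespace, no ML library needed.
namespace = {}
exec(source, namespace)
score = namespace["score"]
print(round(score([1.0, 0.0]), 4))  # 0.7311
```

Generating plain source keeps the deployed artifact free of any dependency on the training environment, which is the property the H2O entry highlights.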
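The Caffe2 entry lists 16-bit floating-point training as a goal. The trade-off can be seen with Python's `struct` module, whose `e` format code packs IEEE 754 half-precision values: each value takes 2 bytes instead of 4, but precision drops sharply, so a small update added to a weight of 1.0 can round away entirely. This is a numeric illustration only, not Caffe2 code; mixed-precision schemes commonly keep a 32-bit master copy of the weights for this reason.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Storage cost per value: 2 bytes in half precision, 4 in single precision.
print(struct.calcsize("<e"), struct.calcsize("<f"))  # 2 4

# Precision cost: 0.1 is stored only approximately.
print(to_float16(0.1))  # 0.0999755859375

# Small updates vanish: near 1.0 the spacing between half-precision values is
# 2**-10 (about 0.001), so adding 0.0001 rounds straight back to 1.0.
w16 = 1.0
for _ in range(1000):
    w16 = to_float16(w16 + 0.0001)
print(w16)  # 1.0, whereas 1000 updates in double precision give about 1.1
```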
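The TensorFlow entry (and the CNTK entry, which describes networks as directed graphs) rests on the idea of data flowing through a graph of operations. The sketch below is a toy evaluator in plain Python, not TensorFlow's API: placeholder nodes receive fed data, every other node applies an element-wise operation to its inputs' results, and each node is computed once per run. The `Node` class and the operations are assumptions made for illustration.

```python
class Node:
    """One operation in a data flow graph; its inputs are other nodes."""

    def __init__(self, op=None, *inputs, name=None):
        self.op = op  # None marks an input (placeholder) node
        self.inputs = inputs
        self.name = name


def placeholder(name):
    return Node(None, name=name)


def evaluate(node, feed, cache=None):
    """Let fed data flow through the graph, computing each node only once."""
    if cache is None:
        cache = {}
    if node not in cache:
        if node.op is None:
            cache[node] = feed[node.name]
        else:
            args = [evaluate(n, feed, cache) for n in node.inputs]
            cache[node] = node.op(*args)
    return cache[node]


# Element-wise operations on 1-D "tensors" (plain lists here).
def add(a, b):
    return [x + y for x, y in zip(a, b)]


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


def relu(a):
    return [max(0.0, x) for x in a]


# Build the graph relu(x * w + b) once, then run it on fed data.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
out = Node(relu, Node(add, Node(mul, x, w), b))
feed = {"x": [1.0, -2.0, 3.0], "w": [2.0, 2.0, 2.0], "b": [0.5, 0.5, -7.0]}
print(evaluate(out, feed))  # [2.5, 0.0, 0.0]
```

Separating graph construction from execution is what lets a real framework decide where each node runs (CPU or GPU) before any data flows.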
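The Azure ML Studio entry describes turning trained models into APIs for other services, and the Veles entry mentions a REST API for trained models. Neither service's actual interface is shown here; as a hedged stand-in, the sketch below wraps a hypothetical linear model (weights made up) in a WSGI application using only the standard library, so a JSON request with a feature value gets a JSON prediction back.

```python
import io
import json

# Hypothetical parameters of an already-trained linear model y = w * x + b.
WEIGHTS = {"w": 2.0, "b": 1.0}


def predict_app(environ, start_response):
    """WSGI app: read {"x": ...} as JSON, answer with {"prediction": ...}."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    request = json.loads(environ["wsgi.input"].read(length) or b"{}")
    if "x" not in request:
        status, result = "400 Bad Request", {"error": "missing field 'x'"}
    else:
        y = WEIGHTS["w"] * float(request["x"]) + WEIGHTS["b"]
        status, result = "200 OK", {"prediction": y}
    body = json.dumps(result).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


# Call the app directly with a fake request; no network needed.
payload = json.dumps({"x": 3}).encode("utf-8")
environ = {"CONTENT_LENGTH": str(len(payload)), "wsgi.input": io.BytesIO(payload)}
statuses = []
response = predict_app(environ, lambda status, headers: statuses.append(status))
print(statuses[0], b"".join(response).decode("utf-8"))  # 200 OK {"prediction": 7.0}
```

To serve it over HTTP locally, the app could be passed to `wsgiref.simple_server.make_server("127.0.0.1", 8000, predict_app)` and `serve_forever()` called on the returned server; calling the app directly, as above, keeps the example runnable without a network.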
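The DMTK entry says each node keeps a local cache to cut traffic to the central parameter server. The sketch below is a hypothetical, single-process model of that design, not DMTK's API: a worker pulls a parameter from the server only on a cache miss, and a counter shows how many requests actually reach the server. When to refresh the cache is an assumption here (explicitly, after an update); real systems choose their own consistency policy.

```python
class ParameterServer:
    """Central node holding the model parameters (hypothetical sketch)."""

    def __init__(self, params):
        self.params = dict(params)
        self.requests = 0  # number of messages that reached the server

    def pull(self, key):
        self.requests += 1
        return self.params[key]

    def push(self, key, delta):
        self.requests += 1
        self.params[key] += delta


class Worker:
    """Worker node that caches the parameters it has already pulled."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def get(self, key):
        if key not in self.cache:  # contact the server only on a cache miss
            self.cache[key] = self.server.pull(key)
        return self.cache[key]

    def refresh(self):
        """Drop cached values so the next reads fetch fresh parameters."""
        self.cache.clear()


server = ParameterServer({"w0": 0.5, "w1": -1.0})
worker = Worker(server)
for _ in range(100):  # 200 parameter reads in total
    activation = worker.get("w0") * 2.0 + worker.get("w1")
print(server.requests)  # 2: one pull per key; the other 198 reads hit the cache

server.push("w0", 0.25)  # an update arrives at the central node
worker.refresh()         # e.g. at the start of the next round
value = worker.get("w0")
print(server.requests, value)  # 4 0.75
```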
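The Mahout entry describes Samsara running linear algebra across a distributed Spark cluster. A classic example of why such work distributes well is the Gram matrix A^T A: if the rows of A are split into partitions, A^T A is the sum of each partition's A_p^T A_p, so each node computes its piece locally and only small n x n results are combined. This plain-Python sketch shows the arithmetic only; it is not Samsara's DSL, and the matrix is made up.

```python
def partial_gram(rows):
    """A_p^T A_p for the block of rows stored on one node."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def add_matrices(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def distributed_gram(partitions):
    """A^T A as the sum of per-partition Gram matrices."""
    partials = [partial_gram(p) for p in partitions]  # computed locally per node
    total = partials[0]
    for p in partials[1:]:  # only small n x n matrices are combined
        total = add_matrices(total, p)
    return total


# Rows of A = [[1, 2], [3, 4], [5, 6]] split across two hypothetical nodes.
partitions = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
print(distributed_gram(partitions))  # [[35.0, 44.0], [44.0, 56.0]]
```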

Publisher: Full stack programmer webmaster. Please indicate the source when reprinting: https://javaforall.cn/149705.html Original link: https://javaforall.cn

Original site

Copyright notice
This article was created by [Full stack programmer webmaster]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/186/202207051444214224.html