
On the confirmation of rights to original data assets

2020-11-08 11:26:00 osc_vnopm

As the data element market develops, a large volume of data assets will naturally form. At the macro level, the process by which data elements create value and become data assets is shown below. For data assets to be entered on the balance sheet, the main problems to solve are rights confirmation, pricing, trading, and measurement. As the relevant policies and supporting laws and regulations are gradually implemented, research on these topics has become increasingly active.

This article offers a preliminary analysis of rights confirmation for data assets. The method is to construct a simple data element market and use it to establish some core concepts and an analytical framework, then apply these concepts and this framework to several key issues in rights confirmation, proposing some solutions and identifying problems that need further study.

Imagine a simple data element market made up of a primary market and a secondary market. There are two sellers, each holding an independent data source. Each seller's original data set, after simple processing, forms a data asset that enters the primary market for trading. Buyers enter the market with different strategies. Buyer C simply buys a data asset for its own consumption. Buyer D buys two data assets, processes them to output a new data set, and ultimately forms a new data asset that enters the secondary market for trading. The market is illustrated below:

 

The primary and secondary markets are set up separately mainly because confirming ownership of data assets generated from an original data set is quite different from confirming ownership of data assets formed by further processing. The primary market therefore trades data assets generated from original data sets, while the secondary market trades data assets produced by processing primary-market data assets.

Generating a data asset requires a series of processes and paths: which data source the original records (Records) are obtained from, which transport and communication network carries the records to the recording/storage facilities, and the cleaning, labeling, synthesis, and other processing steps that eventually turn them into a deliverable asset. For simplicity, this article calls this the data asset production chain.

The characteristics of the data asset production chain are determined by the “5V1P” characteristics of data. “5V1P” refers to volume (Volume), velocity (Velocity), variety (Variety), variability (Variability), veracity (Veracity), and provenance (Provenance). In general, delivery of a data asset is not a one-off event but a continuous, dynamic process.

Once data assets enter the market, unless they are consumed in a single use, how they are used is beyond the seller's control. To make better use of the data, downstream buyers will inevitably need to know more about its initial source and about how it has been processed over time.

As the diagram above shows, once data assets enter the market they go through an iterative cycle: they are processed, reprocessed, turned into new data assets, and re-enter the market. To maintain the value of each data asset throughout circulation, the necessary measures must be taken to guarantee the integrity, consistency, and accuracy of its production chain (hereinafter the “three properties”). Otherwise, the value of the data asset cannot be guaranteed to buyers.

The market will therefore require that data asset owners have not only static control but also dynamic control over the production chain, that is, the ability to direct and decide “the purpose, object, manner, method, and results of production activities”. For reasons of space, this article discusses only the primary market, namely rights confirmation for data assets generated from original data sets.

One 、 Confirming rights to data assets in the primary market

A data asset in the primary market is generated from an original data set. Its production chain can be expressed formally as the path from the data source, through the data set, to the data asset. In general, although a data asset and a data set are concepts at two different levels, the data in the data asset is a subset of the data set. Usually, the owner of the data set is also the owner of the data asset.

The data source is a node (device) in an IoT or sensor network. It encodes what the sensor or IoT device “perceives” (for simplicity, written as a function f) as a string of data bytes, which is recorded and stored on a medium. The data recorded and stored along this same path makes up a data set.

  • 1、 Data birthplace

The data in a data set is raw data. It does not exist naturally; it is generated. To support later rights confirmation, the place where raw data is first recorded and stored is called the data birthplace (DBP: Data Birth Place). It can be identified by a DBP-ID: a combination of the device information of the first recording/storage device, its geographic information, and its network address information. Every data set thus has a corresponding DBP-ID. The birthplace is a very important piece of evidence for confirming ownership of a data set and, as the later analysis shows, it is also a key foundation for building the entire data market.
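The article does not specify how the three components are combined into a DBP-ID. As a minimal sketch, assuming the ID is simply a SHA-256 hash over a canonical encoding of the three fields, it could look like this in Python. The `BirthPlace` class, its field names, and the hashing scheme are illustrative assumptions, not part of the original proposal.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BirthPlace:
    """Where raw data is first recorded and stored (hypothetical field names)."""
    device_info: str      # identifier of the first recording/storage device
    geo_info: str         # geographic information for that device
    network_address: str  # network address of that device


def dbp_id(place: BirthPlace) -> str:
    """Derive a DBP-ID by hashing the three components in a canonical order."""
    canonical = json.dumps(asdict(place), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Calling `dbp_id(BirthPlace("recorder-001", "39.9042,116.4074", "10.0.0.7"))` returns a 64-character hexadecimal string. The same three inputs always yield the same ID, and changing any one of them yields a different ID.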

  • 2、 Data birth certificate

To prove that a data set was generated by a particular data source and function f, a data birth certificate (DBC: Data Birth Certification) can be issued. This is an important measure for ensuring consistency in how the data set is generated, because if the data source or the function f changes, the data is no longer what it was.

The data birth certificate verifies the consistency and invariance of the path that generates the original data set. That is, it certifies that all data in the data set is generated by the same data source and the same function f.

The data birth certificate is issued by a third party. The certification body that issues it (Issuer) can be centralized or can be an alliance. In theory, whenever a new batch of raw data is added to the data set, a DBC should be requested from the certification body for that batch.

For simplicity, this article assumes that the data source and the function do not change during the entire lifecycle of the data set, so a single DBC certification is enough. Every data set therefore has at least one corresponding DBC. The process diagram is as follows:
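The article leaves the certificate format open. The sketch below assumes a DBC is a tag that the issuer computes over the birthplace ID, the data source ID, and an identifier for the function f. It uses an HMAC from the Python standard library as a stand-in for the issuer's signature; a real issuer would more likely use a public-key signature so that anyone can verify without holding the issuer's key. All function and parameter names are hypothetical.

```python
import hashlib
import hmac
import json


def issue_dbc(issuer_key: bytes, dbp_id: str, source_id: str, function_id: str) -> str:
    """Issuer binds birthplace, data source, and function f into one certificate tag."""
    message = json.dumps([dbp_id, source_id, function_id]).encode("utf-8")
    return hmac.new(issuer_key, message, hashlib.sha256).hexdigest()


def verify_dbc(issuer_key: bytes, dbc: str, dbp_id: str,
               source_id: str, function_id: str) -> bool:
    """Recompute the tag and compare; fails if the source or the function changed."""
    expected = issue_dbc(issuer_key, dbp_id, source_id, function_id)
    return hmac.compare_digest(expected, dbc)
```

If a data set later claims a different data source or function under the same certificate, `verify_dbc` returns `False`, which matches the point that changed data is no longer what it was.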

  • 3、 Production chain status

From the above, a data set has a corresponding DBP-ID and DBC, which means a description of the state of its production chain can be created:

{
datasetID: xxxxx
dataset name:
data birth place: DBP-ID;
data birth certificationID: xxxxxxxx
data source:
    sensor device ID: xxxxxxxxx
    sensor function: f
timestamp: xx-xx-xx
}
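The state description above maps naturally onto a typed record. The following is a minimal Python sketch, assuming that the sensor device ID and sensor function are nested under the data source and that the record is serialized as JSON. The class and field names are adapted from the listing and are otherwise illustrative.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataSource:
    sensor_device_id: str
    sensor_function: str  # identifier of the perception function f


@dataclass(frozen=True)
class ProductionChainState:
    dataset_id: str
    dataset_name: str
    data_birth_place: str             # the DBP-ID
    data_birth_certification_id: str  # ID of the DBC
    data_source: DataSource
    timestamp: str

    def to_json(self) -> str:
        """Serialize with sorted keys so equal states give identical text."""
        return json.dumps(asdict(self), sort_keys=True)
```

Sorting the keys keeps the serialized form stable, which matters if the state is later hashed or compared.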

  • 4、 Ownership confirmation

If A is the owner of the data source and of the equipment at the DBP, A can be identified as the owner of the data set and the data asset. According to the preceding analysis, however, this does not complete the confirmation of the owner's rights. As owner, A must also prove to the market that each time new data is generated it still comes from the same data source and function, and must commit to and guarantee the “three properties” (integrity, consistency, and accuracy) of the data asset's production chain. Otherwise there is no proof that A's control as owner is effective, and A cannot be confirmed as the owner.

Confirming ownership of a data asset in fact requires effectively verifying the owner's ability to control the purpose, object, manner, method, and results of production activities, that is, the “three properties” of the production chain. How, then, can this goal of rights confirmation be achieved?

Returning to the simple market above: A cannot prove its ownership of the data asset on its own; some supporting infrastructure is needed. To illustrate, the author constructs a simple rights confirmation infrastructure (the schematic diagram is as follows).

First, a trusted execution environment (TEE) is used between the data source and the birthplace, and a zero-knowledge proof (ZKP) is used at the data source to prove that the data written to the data set 1) all comes from the data source and 2) is being recorded and stored for the first time. This establishes a consistent correspondence between the data set and the DBP-ID.

Second, throughout the lifecycle of the data set, a birth certificate (DBC) is requested whenever new data is generated, so each block of the data set has a DBC. When data from the data set is mapped to the data asset, the DBC is mapped along with it.

Finally, the real-time status information of the data asset's production chain is written to a blockchain.
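The third piece of infrastructure, recording production chain state on a blockchain, can be illustrated with an append-only hash-chained log. This is only a single-process stand-in for a real blockchain, with no consensus or distribution, and the `StateLedger` class and its methods are hypothetical names introduced for illustration.

```python
import hashlib
import json


def entry_hash(prev_hash: str, state: dict) -> str:
    """Hash a state record together with the hash of the previous entry."""
    payload = json.dumps({"prev": prev_hash, "state": state}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StateLedger:
    """Append-only, hash-chained log of production chain states."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, state: dict) -> str:
        """Add a state record and return its hash."""
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = entry_hash(prev, state)
        self.entries.append({"prev": prev, "state": state, "hash": digest})
        return digest

    def is_intact(self) -> bool:
        """Recompute every link; any edited or reordered entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["state"]):
                return False
            prev = entry["hash"]
        return True
```

Because each entry commits to the one before it, altering an earlier state record after the fact is detectable, which is the property the rights confirmation step relies on.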

With these three pieces of infrastructure in place, confirming at a given moment that the data asset belongs to A requires only the following checks:

1) The DBCs of all data blocks in the data asset are consistent;

2) All data blocks come from the same data birthplace (DBP);

3) The production chain status is consistent;

4) The data source device and the DBP equipment and software are owned by A.
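The four checks above can be sketched as a single predicate. This is a minimal Python illustration under stated assumptions: each data block carries its DBP-ID, its DBC, and a digest of the production chain state; DBC validation is supplied by the caller; and the digests recorded on the blockchain are available as a set. The names `DataBlock` and `confirm_ownership` and all of their parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set


@dataclass(frozen=True)
class DataBlock:
    dbp_id: str        # birthplace the block claims
    dbc: str           # birth certificate attached to the block
    state_digest: str  # digest of the production chain state for this block


def confirm_ownership(
    blocks: Sequence[DataBlock],
    dbc_is_valid: Callable[[DataBlock], bool],
    ledger_digests: Set[str],
    equipment_owner: str,
    claimant: str,
) -> bool:
    """Return True only if all four checks pass for the claimant."""
    if not blocks:
        return False
    # 1) every block's DBC validates
    dbcs_agree = all(dbc_is_valid(block) for block in blocks)
    # 2) all blocks share one data birthplace
    same_birthplace = len({block.dbp_id for block in blocks}) == 1
    # 3) every block's state digest matches one recorded on the ledger
    state_consistent = all(block.state_digest in ledger_digests for block in blocks)
    # 4) the claimant owns the data source and DBP equipment
    owns_equipment = equipment_owner == claimant
    return dbcs_agree and same_birthplace and state_consistent and owns_equipment
```

Passing the DBC validator in as a callable keeps the check independent of how certificates are actually issued and verified.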

 

The simplified discussion above is mainly intended to establish the basic core concepts and the analytical framework. Next, we apply them to a brief discussion of rights confirmation for original data assets generated by application services.

Two 、 Raw data assets generated by application services

Raw data assets generated by an application service (hereinafter “App”) are those whose original data set was born in an App. Here the data source perceives subjects with civil rights, collectively referred to as users (User).

An App can be regarded as a collection of service components. For simplicity, we assume that the perception function covers only the behavioral data generated by a User using the different services of the App. The App is provided by a service provider (SP). The users' behavioral data forms a data set, and in turn a data asset; the formal representation and schematic diagram of its production chain are as follows:

Confirming rights to such data assets mainly requires examining the service agreement (Service agreement) between users and the service provider. In this scenario, the raw data assets can be understood as generated by users and the service provider in accordance with the agreement, so rights should be confirmed as the agreement stipulates. The service provider should therefore have these agreements notarized and disclose them to buyers of the data assets.

The notarization status therefore needs to be added to the state description of the production chain. Because the specific terms of each user's service agreement may differ, notarization has to handle a dynamic scenario. For reasons of efficiency, this kind of notarization would mostly rely on verifiable unilateral privacy computation; the traditional third-party notarization model is not feasible. Based on the framework above, we build a schematic diagram of rights confirmation, as shown in the figure, which allows rights to be confirmed effectively.

Conclusion

The author believes that confirming ownership of original data assets is the cornerstone of the entire data element market. If the property rights of data assets generated from raw data cannot be clearly defined in the primary market, then once the data is in circulation, subsequent rights confirmation will become complicated, inefficient, and chaotic, and the market will eventually become unsustainable. It is therefore necessary to build a primary market for data elements with clear property rights and effective operation, and to build an efficient infrastructure for rights confirmation that sorts out property rights relationships at the source.

At the same time, the 5V1P characteristics of data determine the importance of the “three properties” (integrity, consistency, and accuracy) of the data asset production chain. The core of ownership confirmation is therefore to verify the owner's ability to direct and decide “the purpose, object, manner, method, and results of production activities”. Achieving this cannot rely on refining theory and the legal system alone; it also requires supporting infrastructure.


Editor: Wang Jing

Proofreader: Lin Yilin

Copyright notice
This article was created by [osc_vnopm]. Please include a link to the original when reposting. Thank you.