On the growth of data technicians
2022-07-26 17:27:00 【Min Shu v587】
Introduction
A while ago, a classmate raised a question in a group chat: how do you evaluate a model? When preparing for an interview, people naturally search Baidu for answers. I like to ask this kind of question too, especially as an interviewer. What I actually hope for is that the candidate gives me some new ideas; I don't much like the stock answers found on Baidu. So I want to take this opportunity to share my own views.
Model vs. table
At first glance, a model seems to be just a table; there appears to be no difference. I lived through this transition in my own career. When I was a programmer, I built all sorts of tables. There was no barrier to entry, but at that time I built tables to produce a specific result, and having the result was enough. When an analyst came to me for data, I thought I could simply hand over a result. What I had not considered was that one day BI would change some conditions or need additional columns, and the table would become hard to use or would even need to be redeveloped.
Later I changed jobs and took a data warehouse position. There the difference became obvious: your tables are provided as a public data layer, and the downstream business parties they must satisfy are diverse. As a public middle layer, these tables have become data assets, something that actually generates value for downstream consumers. At this point there are data domain divisions, naming conventions, field type requirements, and strict data quality requirements. Work done under all of these constraints is what we call modeling. What I built earlier, purely in pursuit of a result, can only be called a table.
Good models and bad models
Have you ever considered that, for people and things alike, the only real criterion is whether they stand the test of time? I have watched the "Man and Nature" episode about ancient peoples many times. What impressed me most is that humans survived because they could still find food in harsh environments. More significant still, to the point of shaping human development, is that the Bushmen store water in ostrich eggs, bury them underground, and drink the water when none is available. This is preparing for the future.
The same is true for our models. Your model may support the current business, but as the business iterates and the data volume keeps growing, can it still be iterated cleanly and produce output efficiently? That is where good and bad models part ways. I have not seen many perfect models, because the business always iterates, and at some point a bad design gets introduced to meet a particular need. But on the whole, if a model still performs well and supports the business well after 3 to 5 years, it is a great model. There are many bad models. I often call them stones in a latrine: smelly and hard, occupying resources and running up cost. I think good and bad can be judged directly on the following aspects:
1. Model standardization: model naming and field naming follow conventions
2. Data quality: the data must be correct
3. Output performance: how efficiently the data is produced
4. Iterability: whether the model can be iterated, and whether iteration is efficient
5. Completeness: the historical data lifecycle, and whether dimension coverage meets common business needs
6. Clarity: whether facts and entities are clearly defined; a bad model mixes them together
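The first two criteria are the easiest to check automatically. The sketch below is only an illustration under stated assumptions: the `<layer>_<domain>_<subject>` naming rule, the layer prefixes `ods`/`dwd`/`dws`/`ads`, and the function names `check_naming` and `check_primary_key` are all hypothetical and do not come from the article or from any particular company standard.

```python
import re

# Hypothetical convention (an assumption, not from the article):
# table names are <layer>_<domain>_<subject> in lowercase snake_case,
# and field names are lowercase snake_case.
TABLE_NAME = re.compile(r"^(ods|dwd|dws|ads)_[a-z][a-z0-9]*(_[a-z0-9]+)+$")
FIELD_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_naming(table, fields):
    """Return a list of naming violations (empty if the model is compliant)."""
    problems = []
    if not TABLE_NAME.match(table):
        problems.append(f"table '{table}' does not match <layer>_<domain>_<subject>")
    for field in fields:
        if not FIELD_NAME.match(field):
            problems.append(f"field '{field}' is not lowercase snake_case")
    return problems


def check_primary_key(rows, key):
    """Count rows with a missing key and key values that occur more than once."""
    seen, duplicates, missing = set(), set(), 0
    for row in rows:
        value = row.get(key)
        if value is None:
            missing += 1
        elif value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return {"missing": missing, "duplicates": len(duplicates)}
```

A result table built in a hurry, say `tmp_result2` with a field `OrderId`, would fail both naming checks, while `dwd_trade_order` with `order_id` and `pay_amount` would pass. The remaining criteria (performance, iterability, completeness, clarity of facts and entities) depend on business context and are harder to reduce to a script.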
Growth of data personnel
In fact, most people have no real concept of what working with data involves, and some even feel there is no growth in it. Is there? In other words, how do we tell experts from novices in data work? It is actually simple: look at whether the models that person builds are good or bad. Every expert scores highly on business understanding, efficiency, design, and compute and storage cost. This ability shows up in how they design the data architecture and in the overall cost efficiency of the data domain.
This may be a little abstract, so here is a list. It comes from The Data Warehouse Toolkit, and I think it can serve as a guideline for data personnel:
Understand business users
  - Understand their job responsibilities, goals, and tasks
  - Determine which decisions business users need the DW/BI system's help to make
  - Identify the "best" users, those who make effective, high-impact decisions
  - Discover potential new users, and make them aware of what the DW/BI system can do for them
Deliver high-quality, relevant, and accessible information and analysis to business users
  - Choose the most robust, actionable data to put into the DW/BI system, carefully selected from the organization's many data sources
  - Keep user interfaces and applications simple and template-driven, matched to the users' cognitive profiles
  - Ensure the data is accurate and trusted, and label it consistently across the enterprise
  - Continuously monitor the accuracy of data and analysis
  - Adapt to users' changing ways of thinking, requirements, and business priorities, and to the availability of new data sources
Maintain the DW/BI environment
  - Use the successful business decisions made with the DW/BI system to justify staffing and ongoing spending
  - Update the DW/BI system on a regular basis
  - Maintain the trust of business users
  - Keep business users, executive sponsors, and IT management satisfied
Go through the items above one by one, score yourself on each when you actually put it into practice, and compare the results.
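One lightweight way to do this self-scoring is to record a score per item and compare the averages per area. The sketch below is only an illustration: the 1 to 5 scale, the item labels, and the `summarize` function are assumptions made for the example, not something prescribed by the book or the article.

```python
from statistics import mean


def summarize(scores):
    """Average the self-assigned scores per area and name the weakest area."""
    averages = {area: mean(items.values()) for area, items in scores.items()}
    weakest = min(averages, key=averages.get)
    return averages, weakest


# Hypothetical self-assessment on a 1-5 scale (illustrative values only).
my_scores = {
    "understand business users": {"know their goals": 4, "find new users": 2},
    "deliver information": {"data accuracy": 5, "monitoring": 4},
    "maintain the environment": {"regular updates": 3, "user trust": 4},
}

averages, weakest = summarize(my_scores)
print(averages)
print("focus next on:", weakest)
```

Repeating the same scoring every few months and comparing the averages gives a rough picture of where you have actually grown.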
Data personnel and technology
This is something I learned while recruiting for my company. In interviews we actually require candidates to be familiar with, or even proficient in, the internals of big data engines. Most people wonder why, since the job used to be just building tables. It seems that before you understand big data technology you want to understand it, yet once you have really mastered it you find that the technology alone rarely produces concrete results. Only when you apply those technical skills to modeling do you find where they pay off. On the other hand, if you go straight into modeling, it is simple, and your model may follow every convention and be accurate, but you will never find a way to improve its performance. This often feels like something just out of reach: others know that a few adjustments will make the output much faster, and you can only stare.
Afterword
We usually call ourselves technicians. Downstream of us there are also people who work with data and consume what we produce, and we call them data personnel. On the growth path in big data, using technical means to support your real data pipelines, and from that angle exploring new ETL patterns, bringing new design paradigms, and creating industry norms, is also a journey toward the sea of stars. I don't think "data person" is a bad name at all!