On the growth of data technicians
2022-07-26 17:27:00 【Min Shu v587】
Introduction
A while ago, someone in a group chat raised a topic: how do you evaluate a model? When preparing for an interview, most people search Baidu for this kind of question. I like asking it too, especially as an interviewer. What I actually hope for is that the candidate gives me some new ideas; I don't much like the stock answers from Baidu. So I'd like to take this opportunity to share my own views.
Model vs. table
These days a model can look like nothing more than a table, as if there were no difference. I went through this shift myself over my career. When I was a programmer I built all kinds of tables. There was no barrier to entry, and back then I built tables only to produce a result: once the result existed, I was done, so when an analyst came to me for data I could simply hand it over. What I never considered at the time was that one day BI would change a filter condition or need additional columns, and the table would become awkward to use or even have to be redeveloped.
Later I changed jobs and joined a data warehouse team. There the difference becomes obvious: your tables are published as a public data layer and have to serve many different downstream business teams. As part of a public middle layer, a table becomes a data asset, something that generates real value downstream. At that point you have data domains, naming conventions, field type requirements, and strict data quality requirements. That whole discipline is what we call a model. What I built before, purely in pursuit of a result, can only be called a table.
Good models and bad models
Whether for people or for things, the one criterion that really counts is whether they stand the test of time. I have watched the ancient-humans episodes of "Man and Nature" many times. What impressed me most is that humans survived because they could still find food in harsh environments. More striking still, and arguably significant for human development, is that the Bushmen store water in ostrich eggshells, bury them underground, and drink from them when water runs out. That is adapting to the future.
The same goes for our models. Does the model you built support the current business? As the business iterates and data volume keeps growing, can it still be iterated cleanly and produce output efficiently? That is where good and bad separate. I haven't seen many perfect models, because business iteration always reaches a point where some poor design is introduced to support a particular need. On the whole, though, if a model still performs well and supports the business well after three to five years, it is a great model. Bad models are plentiful. I often call them the stone in the latrine: smelly, hard, and a drain on resources and cost. I think a model's quality shows directly in the following aspects:
1. Standardization of the model: model naming and field naming
2. Data quality: the data must be correct
3. Output performance: how efficiently the data is produced
4. Iterability: whether the model can be iterated efficiently
5. Completeness: the historical data lifecycle, and whether dimension coverage meets common business needs
6. Clarity: whether facts and objects are clearly defined; a bad model mixes them together
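Some of the criteria above, naming standardization and basic data quality in particular, are easy to check mechanically. The following is a minimal Python sketch under stated assumptions, not a method described in this post: the layer prefixes (ods, dwd, dws, ads, dim), the allowed type set, and the helper names check_model and null_rate are all hypothetical choices for illustration. A real team would substitute its own conventions.

```python
import re
from dataclasses import dataclass

# Hypothetical convention: <layer>_<domain>_<subject>..., all lowercase snake_case.
TABLE_NAME = re.compile(r"^(ods|dwd|dws|ads|dim)_[a-z0-9]+(_[a-z0-9]+)*$")
FIELD_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Hypothetical whitelist of field types allowed in the public layer.
ALLOWED_TYPES = {"bigint", "string", "double", "decimal", "date", "timestamp"}


@dataclass
class Field:
    name: str
    dtype: str


def check_model(table_name, fields):
    """Return a list of human-readable convention violations."""
    problems = []
    if not TABLE_NAME.match(table_name):
        problems.append(f"table name '{table_name}' breaks the naming convention")
    for field in fields:
        if not FIELD_NAME.match(field.name):
            problems.append(f"field name '{field.name}' is not lowercase snake_case")
        if field.dtype.lower() not in ALLOWED_TYPES:
            problems.append(f"field '{field.name}' uses disallowed type '{field.dtype}'")
    return problems


def null_rate(rows, column):
    """Share of rows where the column is missing or None (a basic quality signal)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


if __name__ == "__main__":
    fields = [Field("order_id", "bigint"), Field("OrderAmt", "float")]
    for problem in check_model("dwd_trade_order_di", fields):
        print(problem)
    print(null_rate([{"order_id": 1}, {"order_id": None}], "order_id"))
```

Checks like these cover only the first two criteria. Output performance, iterability, completeness, and clear separation of facts and objects still need review by someone who knows the business.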
The growth of data people
Most people have no real concept of what growth means in data work, and some even feel there is none. Is there any? In other words, how do you tell an expert from a novice? It is actually simple: look at whether the models that person builds are good or bad. Every expert's designs score well on business support, efficiency, and compute and storage cost. That ability shows in how they architect the data and in the overall cost efficiency of the data domain.
That may be a little abstract, so here is a list. It comes from The Data Warehouse Toolkit, and I think it works as a guideline for data people:
Understand the business users
  Understand their job responsibilities, goals, and tasks
  Determine which decisions the business users want the DW/BI system to help them make
  Identify the "best" users, those who make effective, high-impact decisions
  Find potential new users and make them aware of what the DW/BI system can do for them
Deliver high-quality, relevant, and accessible information and analysis to the business users
  Choose the most robust, actionable data to put into the DW/BI system, carefully selected from the organization's many data sources
  Keep user interfaces and applications simple and template-driven, matched to the users' cognitive processes
  Make sure the data is accurate and trusted, and labeled consistently across the enterprise
  Continuously monitor the accuracy of the data and analysis
  Adapt to users' changing ways of thinking, requirements, and business priorities, and to the availability of new data sources
Sustain the DW/BI environment
  Use the successful business decisions made with the DW/BI system to justify staffing and ongoing spending
  Update the DW/BI system on a regular basis
  Maintain the trust of the business users
  Keep business users, executive sponsors, and IT management satisfied
Go through the items above one by one, score yourself against them, and compare the results when you put the work into practice.
Data people and technology
This is something I learned while recruiting for my company. In interviews we actually require candidates to be familiar with, or even proficient in, the internals of big data engines. Most people wonder why, since the job used to be just building tables. Before you understand big data technology, you want to understand it. Once you have really mastered it, you find that the technology alone rarely produces concrete results; only when you apply it to modeling do you find the way to make it work. Conversely, if you start with modeling alone, it is easy at first, and your models may look well specified and accurate, but you will never have a way to improve performance. That is often a frustrating, out-of-reach feeling: you know that other people could make a few adjustments and get the data delivered much faster, and all you can do is stare.
Postscript
We usually call ourselves technicians. Some of the people downstream of us also work with data and consume what we produce, and we call them data people. On the growth path in big data, using technical means to support real data pipelines, and from that vantage point exploring new ETL patterns, introducing new design paradigms, and setting industry norms, is a grand ambition in its own right. So I don't think "data person" is a bad name at all!