2022-11-17
To do well in this area, you also need the ability to design the overall system architecture, hold up well under pressure, solve problems, and marshal resources. Getting involved in the open source community helps, because it keeps you on top of the latest trends and technologies.
2. Data Warehouse ETL
Being a data warehouse engineer is genuinely hard; on-call duty alone is enough to scare people off. Many warehouse engineers are woken in the middle of the night by an on-call alert because a data pipeline has failed. They have to figure out which data source is broken and fix it immediately, otherwise the entire downstream pipeline is affected.
If the pipeline is affected, you may well be called into a senior leader's office to be asked: why isn't the data I need ready, and why wasn't my business report delivered today?
The scenario above shows why this is such an important position. The data pipeline determines whether messy source data becomes clean data. After ETL, the data is tidy and consistent, which makes it easy to compute statistics for every business line and, crucially, keeps metric definitions unified. Otherwise, as many departments as you have, that's how many versions of the numbers you will get: Department A says the business grew 5%, Department B says it grew 10%. Whom should anyone believe?
At a minimum, I think data warehouse engineers should do a good job in the following areas:
a. Keep the data dictionary complete. Users want to know exactly what the logic behind each field is. Field definitions should be consistent; the same field must not be defined differently in different tables.
b. Keep the core pipelines stable. The main order table should not land at an unpredictable time every day, sometimes very early and sometimes not until noon. If it is unstable, the people who consume the data will lose confidence in you.
c. Do not iterate warehouse versions too frequently, and keep versions compatible with each other. Don't finish warehouse 1.0 only to replace it with 2.0 right away. A data warehouse needs continuity: the main tables should not change too often, otherwise consumers suffer. They have only just gotten used to the 1.0 table structure and cannot switch that quickly. In short, backward compatibility is required.
d. Keep all business logic consistent. The same business logic should not produce different results for different people on the same team. The usual cause is that common logic has not been turned into shared, reusable components, so everyone writes their own version. This deserves special attention; see the sketch below.
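As a minimal illustration of point d, here is one way to centralize a shared definition so every report computes it the same way. The table and column names (status, refund_amount, is_test_account) are made up for the example; the point is that the filter lives in one shared module and is imported, not re-typed by each person.

```python
import pandas as pd

def valid_paid_orders(orders: pd.DataFrame) -> pd.DataFrame:
    """Canonical definition of an order that counts toward revenue:
    paid, not fully refunded, and not from a test account."""
    return orders[
        (orders["status"] == "PAID")
        & (orders["refund_amount"] < orders["order_amount"])
        & (~orders["is_test_account"])
    ]

def gmv(orders: pd.DataFrame) -> float:
    """Gross merchandise value under the shared order definition,
    so Department A and Department B cannot drift apart."""
    return valid_paid_orders(orders)["order_amount"].sum()
```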
Given the above, the skill requirement for this position is: do not be someone who can only write SQL. Tooling is very mature now, and if your skill set is that narrow you are highly replaceable and will get little sense of accomplishment. This is not to look down on people who write SQL, but to say they should pick up more skills, otherwise the position is precarious.
Warehouse engineers should constantly think about what the most reasonable architecture is: whether fields need to be redundantly denormalized, whether to use row or columnar storage, how to extend schemas most effectively, how to split hot and cold data, and so on. In other words, you need to think like an architect.
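For example, one common way to separate hot and cold data is to partition fact tables by date, so recent (hot) partitions can be scanned cheaply while old (cold) partitions are archived. A minimal PySpark sketch, with made-up table and path names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hot_cold_split").getOrCreate()

orders = spark.table("dwd.orders")  # hypothetical source table

# Columnar, date-partitioned storage: queries on recent days only read hot partitions.
(orders
    .withColumn("dt", F.to_date("create_time"))
    .write
    .mode("overwrite")
    .partitionBy("dt")
    .parquet("/warehouse/dwd/orders_partitioned"))

# Cold partitions (e.g. older than 180 days) can then be moved to cheaper storage
# partition by partition, without touching the hot partitions that daily jobs read.
```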
In terms of skills, besides being proficient in SQL, you should also know how to write TRANSFORM scripts and MapReduce jobs. A lot of business logic is very awkward to express in pure SQL, and knowing a scripting language makes life far easier and improves your efficiency a great deal. On top of that, a good warehouse engineer should be able to write Java or Scala, since writing UDTFs and UDAFs is often how you boost efficiency.
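As one example of the TRANSFORM approach: Hive can stream rows through an external script, so logic that is painful in SQL becomes an ordinary Python program reading tab-separated rows from stdin. A minimal sketch; the table and column layout are hypothetical.

```python
#!/usr/bin/env python3
# Used from Hive, e.g.:
#   SELECT TRANSFORM (user_id, event_list)
#   USING 'python3 explode_events.py' AS (user_id, event)
#   FROM dwd.user_events;
import sys

for line in sys.stdin:
    user_id, event_list = line.rstrip("\n").split("\t")
    # Logic that would be clumsy in SQL: split, clean, and de-duplicate events.
    seen = set()
    for event in event_list.split(","):
        event = event.strip().lower()
        if event and event not in seen:
            seen.add(event)
            print(f"{user_id}\t{event}")
```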
Warehouse engineers should also think regularly about automation and tooling. They need good abstraction skills to build automated tools and reusable modules that raise the efficiency of the whole team. And the data skew problems they run into constantly need to be located and optimized quickly.
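On data skew: a common trick when one key dominates an aggregation is to salt the key, aggregate in two stages, and then combine. A rough PySpark sketch, assuming an existing DataFrame df with a skewed key column shop_id and a numeric amount column:

```python
from pyspark.sql import functions as F

# Stage 0: attach a random salt so the hot key is spread across 16 buckets.
salted = df.withColumn("salt", (F.rand() * 16).cast("int"))

# Stage 1: partial aggregation on (key, salt) spreads the hot key over many tasks.
partial = salted.groupBy("shop_id", "salt").agg(F.sum("amount").alias("partial_sum"))

# Stage 2: final aggregation on the key alone, now over at most 16 small rows per key.
result = partial.groupBy("shop_id").agg(F.sum("partial_sum").alias("total_amount"))
```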
Having covered data storage, let's move on to the key data application roles. Before that, I want to stress the single most critical prerequisite of any data application: data quality, data quality, data quality!! Every time you present a view, draw an analytical conclusion, or apply an algorithm, check the correctness of the source data first; otherwise any conclusion is built on sand.
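A few cheap sanity checks catch most source-data problems before they poison a conclusion. A minimal sketch; the file and column names are illustrative only.

```python
import pandas as pd

df = pd.read_parquet("orders_snapshot.parquet")  # hypothetical extract

checks = {
    "row_count": len(df),
    "duplicate_order_ids": int(df["order_id"].duplicated().sum()),
    "null_amount_rate": float(df["order_amount"].isna().mean()),
    "negative_amounts": int((df["order_amount"] < 0).sum()),
    "latest_record": str(df["create_time"].max()),  # is the data fresh enough?
}
print(checks)

# Fail fast instead of analyzing bad data.
assert checks["duplicate_order_ids"] == 0, "source data has duplicate orders"
assert checks["null_amount_rate"] < 0.01, "too many missing amounts"
```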
3. Data visualization
This is a very cool job, and it helps to know some front-end, such as JavaScript. Visualization people need solid analytical thinking, and must not sacrifice usefulness to the business just to show off technique. Since I have only dabbled in this role a few times, my understanding isn't especially deep, but I do think good visualization requires analytical ability.
On the other hand, people doing data applications should also know a little about visualization. The order of preference for making a point is: charts > tables > words. Anything that can be shown with a chart should not be described in prose, because a chart is much easier for others to grasp. And when you explain something to senior leadership, assume they are a complete data novice; only then will you explain it vividly enough.
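As a trivial illustration of "charts > tables > words", the same weekly conversion numbers land much faster as a picture. A minimal matplotlib sketch with made-up numbers:

```python
import matplotlib.pyplot as plt

weeks = ["W1", "W2", "W3", "W4"]
conversion = [3.1, 3.0, 2.4, 2.2]  # hypothetical conversion rate, %

plt.plot(weeks, conversion, marker="o")
plt.title("Conversion rate is sliding week over week")  # the title carries the takeaway
plt.ylabel("Conversion rate (%)")
plt.ylim(0, 4)
plt.savefig("conversion_trend.png", dpi=150)
```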
4. Data analyst
Demand for data analysis is huge right now, because everyone is asking: we have the data, so what can we actually do with it? That takes data analysts to analyze and mine the data, and then turn it into data applications.
The complaints data analysts hear most often are: "everything you analyzed is just normal business logic, what do we need you for?", or "your conclusion is wrong, it doesn't match our business logic." In particular, when A/B test results don't match the original expectation, analysts get dragged in: "figure out why my A/B test isn't significant, there must be a reason."
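For reference, the basic significance check behind such an argument is just a two-proportion test. A minimal sketch with made-up counts, using statsmodels:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical A/B results: conversions and sample sizes for control vs. treatment.
conversions = [310, 335]
samples = [10000, 10000]

stat, p_value = proportions_ztest(count=conversions, nobs=samples)
print(f"z = {stat:.2f}, p = {p_value:.3f}")
# If p is large, the honest answer is "the lift is within noise at this sample size",
# not "find a reason why it must be significant".
```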
A lot of the time the analyst is suffering in silence. If the conversion rate dropped, the data can tell you which segment or channel it dropped in; but as for why a customer didn't place an order, you would have to go ask the user. In many cases the data cannot tell you why; it can only tell you what the current situation is.
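The "which channel dropped" part really is mechanical. A rough pandas drill-down sketch, with hypothetical column names:

```python
import pandas as pd

# Hypothetical: one row per visit with 'week', 'channel', and 'converted' (0/1).
visits = pd.read_parquet("visits.parquet")

by_channel = (visits
    .groupby(["week", "channel"])["converted"]
    .mean()
    .unstack("channel"))  # rows: weeks, columns: channels, values: conversion rate

# Week-over-week change per channel: the biggest drop shows where to dig,
# even though it still won't say why users stopped ordering.
print((by_channel.diff().iloc[-1] * 100).sort_values())
```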
If you have been writing report after report, handing over conclusions, going around in circles without any of it ever landing directly in the business, it is time to wake up: is this really the position you want?
On the positioning of data analysts: personally, I think becoming an excellent data analyst is very hard, and there are not many truly excellent ones on the market. Beyond analyzing data, distilling conclusions, and seeing the reasons behind the numbers, an analyst also needs to understand the business and understand algorithms. Only then, when facing a business problem, can they peel it apart layer by layer, pinpoint the root cause, and respond with a strategy: whether to trial a policy change first or optimize with an algorithm, which scenario the algorithm applies to, and whether an algorithm can solve the problem at all.
An excellent data analyst is an all-round data scientist, fluent in both business and algorithms, not someone who idly takes requests to pull data, build reports, and write analyses. We always say analysis must end in conclusions; the conclusions of an excellent analyst are a package of strategies and countermeasures that actually solve the problem. And many requirements are discovered by the analysts themselves, dug out of the data.
From the description above, the requirements for a data analyst are: able to write SQL to pull data, well versed in the business, strong at data insight and algorithms, highly self-driven, and holding themselves to a high standard.
If you are always buried in routine analysis requests and love producing glossy reports, remember that you are in a dangerous spot, because plenty of people will question the value of your existence, especially at small companies, where the salary of data staff is not a small expense.
Most analyses that never land are pseudo-analyses. Some exploratory feasibility studies may legitimately not consider landing, but analyses tied to concrete business needs must consider how they land, and then let practice feed back into your role. Only by doing this over and over can you gradually confirm your value and sharpen your analytical skills; only in this way can you prove your worth as an analyst and as someone who makes data land.
5. Data Mining/Algorithms
After three years in this role I have a lot of feelings about it. The complaints I know all too well mainly include the following:
A simple rule settles it; what do we need an algorithm for?
Why is your accuracy so low?!
Can you get the accuracy to 99%?
Is your recommendation even adding value? Even without it, the customer would have ordered that product anyway.
Help me do a big-data prediction of what the customer wants.
In many cases different scenarios have different accuracy requirements, so within reason you have to push back on the business side. Don't be afraid of getting roasted by them; more often than not, the job is managing their expectations.
In some scenarios the value of recommendation lies in the long-term repurchase rate, so don't talk only about A/B test conversion rates. Lowering the customer's cost of finding things is also worthwhile: a genuinely smart product makes customers love it, and even when the conversion difference is not obvious, the value only shows up when you track long-term repurchase. In particular, distinguish high-frequency from low-frequency products; low-frequency products have an especially hard time showing short-term value.
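A minimal sketch of the kind of long-horizon metric meant here: the share of buyers who come back within 90 days of their first order, compared across experiment groups. The file and column names are made up.

```python
import pandas as pd

orders = pd.read_parquet("orders.parquet")  # hypothetical columns: user_id, group, order_time

first = (orders.groupby(["group", "user_id"], as_index=False)["order_time"]
         .min()
         .rename(columns={"order_time": "first_order"}))
joined = orders.merge(first, on=["group", "user_id"])

# A user "repurchased" if any later order lands within 90 days of their first order.
window = pd.Timedelta(days=90)
joined["is_repurchase"] = (
    (joined["order_time"] > joined["first_order"])
    & (joined["order_time"] <= joined["first_order"] + window)
)

repurchase = joined.groupby(["group", "user_id"])["is_repurchase"].any()
print(repurchase.groupby(level="group").mean())  # long-window repurchase rate, treatment vs. control
```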
As for the skill requirements of this position: you are not required to implement every algorithm from scratch, since there are plenty of ready-made packages to call. The baseline is knowing which algorithm fits which scenario; for classification, say, common choices include LR, RF, XGBoost, and ExtraTrees. Beyond that, you should know which hyperparameters of each algorithm actually matter and how to tune when the model underperforms. You also need the ability to implement algorithms; Scala, Python, R, or Java will all do. As we often say: the tool is not what matters, what matters is that you wield the tool rather than the tool wielding you.
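A minimal sketch of the "know which algorithm and which knobs" baseline: comparing a few standard classifiers with cross-validation on a toy dataset. It assumes scikit-learn and the xgboost package are installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=30, n_informative=10, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, max_depth=8, random_state=0),
    "ET": ExtraTreesClassifier(n_estimators=200, max_depth=8, random_state=0),
    "XGB": XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, eval_metric="logloss"),
}

for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC {auc.mean():.3f} +/- {auc.std():.3f}")
# Tuning then focuses on the parameters that matter for each family, e.g. regularization
# strength for LR, tree depth / n_estimators / learning_rate for the ensembles.
```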
In addition, for supervised learning, algorithm engineers should have good business sense, so that features can be designed with the problem in mind; features designed that way are far more likely to carry strong predictive signal.
6. Deep learning (NLP, CNN, speech recognition)
I haven't shipped this commercially myself, only practiced with it. Personally, I think commercialization is the crux. Everyone is waiting to see whether your chatbot is actually useful; after all, Siri has been at it for years and the results are still middling.
Customer service bots are very popular right now, and everyone complains that they handle context poorly and that their semantic understanding is terrible. Who can blame them? Understanding Chinese semantics is much harder than for many other languages, because a negation can be phrased in so many different ways in Chinese that you never know which one the user will pick.
Another common complaint: your CNN is too heavy, we have to return within 100ms online, real-time calls are too complicated and will never make the deadline, so in the end we can only fall back to offline prediction. The people who say this usually won't go down and write the low-level code themselves. Much of the time it is not that the problem has no solution, but that no one is thinking about how to solve it. Your mindset determines your output.
Overall, this area demands a high level of all-round ability. That said, if you simply take a ready-made pretrained model, extract its intermediate-layer features, and feed them to a conventional machine learning model for prediction, you can already solve some real business applications, such as Yelp-style image classification.
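A minimal sketch of that "pretrained features plus a classic model" route, assuming PyTorch, a recent torchvision, and scikit-learn are available; the random tensors stand in for real, preprocessed images.

```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from torchvision import models

# Pretrained backbone with the classification head removed: outputs 512-d features.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1]).eval()

# Placeholder for real, preprocessed image batches (N, 3, 224, 224) and labels.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 2, (64,))

with torch.no_grad():
    feats = feature_extractor(images).flatten(1).numpy()  # shape (64, 512)

# A plain classifier on frozen deep features: no GPU training, often good enough.
clf = LogisticRegression(max_iter=1000).fit(feats, labels.numpy())
print("train accuracy:", clf.score(feats, labels.numpy()))
```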
Strictly speaking, though, that is not really doing deep learning. People who genuinely work in DL build the models, tune the parameters, and modify the internals themselves, so their programming ability is very strong, and I have always admired them for it. Startups in particular set a high bar for programming in this role. If you hear nothing back after interviewing at a startup, it often means "you are good, but not necessarily a fit for us, because we need someone with strong programming skills."
I'm not a specialist here, so I will only touch on it briefly. Personally, I believe this area calls for a fairly strong ability to adapt and optimize algorithms, keep pushing prediction speed up, and keep improving the model's generalization to raise accuracy. The whole industry is moving in a good direction. If the high salaries are what attracted you, remember to check yourself against the job requirements and work out which skills you still need to fill in; that is how you stand out from the crowd.
As for the future: it is bright, it is full of expectation, and anything is possible.
Summary
A lot has been said, but the core of it all is: how to create value with data. If you cannot create value with data, you can only wait to be buried by data, beaten down in the workplace, and hit your career ceiling early.
In terms of where data value shows up: the closer you are to the data application layer, the higher the demand that data actually generate value. People working there should regularly ask themselves whether they have good business sense. In industry, nobody cares that you beat the traditional baseline by a percentage point; what they care about is how much that percentage point is worth to the company.
Closer to the bottom of the stack, performance is not so rigidly tied to business outcomes; commitments are made more at the process level. Value there is mainly reflected in technical innovation: solve a real problem in the existing architecture and you become a big name. So keep learning to program, and don't box yourself in.
Thanks for reading.