2022-11-17
When people picture the ideal data scientist, they often think of renowned AI professors from famous universities. When companies compete to build machine learning models with the highest possible accuracy, recruiting such talent makes sense. When squeezing out the last percentage point of precision by any means necessary really matters, you need to pay attention to mathematical details, test the most complex methods, and even invent new statistical techniques optimized for specific use cases.
But in the real world, this is rarely necessary. For most companies, standard models with reasonable accuracy are good enough, and it is not worth spending time and resources to turn them into the most advanced models in the world. It matters more to build a model with acceptable accuracy quickly and to establish a feedback cycle as early as possible, so that you can start iterating and speed up the process of identifying the most valuable use cases. Small differences in accuracy are rarely what makes a data science project succeed or fail, which is why software engineering skills matter more than scientific skills in the business world.
The typical workflow of a data team looks like this: data scientists build prototypes of a solution using trial-and-error, spaghetti code. Once the results start to look promising, they hand them over to software engineers, who must then rewrite everything from scratch to make the solution scalable, efficient, and maintainable. Data scientists cannot be expected to deliver production code on par with that of full-time software engineers. However, if data scientists are more familiar with the principles of software engineering and have some understanding of potential architectural problems, the whole process becomes smoother and faster.
As more and more data science workflows are replaced by new software frameworks, solid engineering skills are among the most important skills a data scientist can have.
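To make the contrast with trial-and-error prototype code concrete, here is a minimal sketch of what "familiar with the principles of software engineering" might look like in practice: small functions with one job each, typed inputs, and explicit error handling. Every name in it (`Split`, `split_data`, `fit_baseline`, `mean_absolute_error`) is hypothetical and chosen only for illustration, and the "model" is deliberately the simplest possible baseline.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Split:
    """Hypothetical container for a train/test split."""
    train: list[float]
    test: list[float]


def split_data(values: list[float], test_ratio: float = 0.2) -> Split:
    """Deterministically hold out the last `test_ratio` share of the values."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    # Always keep at least one value for testing.
    cut = len(values) - max(1, round(len(values) * test_ratio))
    return Split(train=values[:cut], test=values[cut:])


def fit_baseline(train: list[float]) -> float:
    """Fit a trivial baseline model: always predict the training mean."""
    if not train:
        raise ValueError("training data must not be empty")
    return mean(train)


def mean_absolute_error(prediction: float, actual: list[float]) -> float:
    """Average absolute distance between one constant prediction and the data."""
    return mean(abs(prediction - y) for y in actual)


# Each step can be run, tested, and replaced on its own.
split = split_data([1.0, 2.0, 3.0, 4.0, 5.0])
model = fit_baseline(split.train)
error = mean_absolute_error(model, split.test)
```

Because each step is a separate function, an engineer taking over the prototype can test or swap out one piece without rewriting the rest.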
3. Focus on expectation management
From the outside, data science can look like a vague and confusing field. Is this just hype, or is the world really going through a revolutionary change? Is every data science project a machine learning project? Are these people scientists, engineers, or statisticians? Is their main output software, or dashboards and visualizations? Why does this model show me a wrong prediction? Can anyone fix this bug? If all they have now is these few lines of code, what have they been doing for the past month?
Much of this is unclear, and opinions about what data scientists should do can vary greatly from person to person within a company.
For data scientists, it is essential to communicate actively and continuously with stakeholders in order to set clear expectations, catch misunderstandings early, and get everyone on the same page.
The best data scientists understand how the different backgrounds and agendas of other teams shape their expectations, and they adjust their communication accordingly. They can explain complex methods in simple terms so that non-technical stakeholders can better understand the goals. They know when to temper overly optimistic expectations and when to win over overly pessimistic colleagues. Most importantly, they emphasize the inherently experimental nature of data science. When the success of a project is still uncertain, they do not overcommit.
4. Get familiar with cloud services
Cloud computing is at the core of the data science toolkit. In many cases, working in Jupyter notebooks on a local machine hits its limits and is not enough to complete the task. Cloud services are particularly useful when you need to train machine learning models on powerful GPUs, parallelize data preprocessing on distributed clusters, deploy REST APIs to expose machine learning models, manage and share datasets, or query databases for scalable analysis.
The largest vendors are Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP).
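As an illustration of what "deploying a REST API to expose a machine learning model" involves, the following is a minimal sketch using only the Python standard library. It is not tied to any of the vendors above. The `/predict` path, the JSON request shape, and the fixed linear `predict` function are all assumptions made for the example; a real service would load a trained model and would typically run behind a production-grade web server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: list[float]) -> float:
    """Placeholder model (an assumption): a fixed linear scoring rule."""
    weights = [0.5, -0.25]
    bias = 1.0
    if len(features) != len(weights):
        raise ValueError(f"expected {len(weights)} features")
    return bias + sum(w * x for w, x in zip(weights, features))


def handle_predict(body: bytes) -> tuple[int, dict]:
    """Turn a raw JSON request body into an HTTP status and a JSON payload."""
    try:
        features = json.loads(body)["features"]
        return 200, {"prediction": predict([float(x) for x in features])}
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, a missing key, or bad feature values are client errors.
        return 400, {"error": str(exc)}


class PredictHandler(BaseHTTPRequestHandler):
    """Hypothetical handler serving POST /predict."""

    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Build the server without starting it; call serve_forever() to run."""
    return HTTPServer((host, port), PredictHandler)
```

Calling `make_server().serve_forever()` would start the service locally, after which a POST to `/predict` with a body such as `{"features": [2, 4]}` should return a JSON prediction. Keeping the request logic in `handle_predict`, separate from the HTTP handler, means it can be tested without opening a network socket.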