CERTIFIED DATA ANALYST LEVEL II EXAMINATION OUTLINE
I. Overall Objectives
CDA (Certified Data Analyst), or "CDA Data Analyst", is a professional, authoritative international qualification for the entire industry, set against the background of the digital economy and the trend of the artificial intelligence era. It aims to improve public digital skills, help enterprises with digital transformation, and promote the digital development of industry. The Certified Data Analyst (CDA) Talent Industry Standard is a scientific, professional, international skill guideline for data-related positions. The CDA Exam Outline defines the examination scope and key points. When preparing for the exam, candidates can refer to the outline to acquire the skills and knowledge needed to become a professional data analyst.
II. Exam Format and Structure
Exam method: Offline written exam and computer-based exams
Exam question type: Objective choice questions (100 single choice questions, 20 multiple choice questions, 15 content related questions, 15 case analysis questions)
Exam duration: 150 minutes
Exam scores: The final exam scores are classified into four grades: A, B, C, and D. Passing grades include A, B, and C, while D is the failing grade.
Exam requirements: Closed-book, computer-based exam. Calculators and other exam-related supplies may not be brought into the exam.
III. Knowledge Requirements
Candidates need to reach one of three mastery levels, depending on the type of data analysis knowledge: comprehension, competency, and application. Candidates should plan their studies around these levels.
1. Comprehension: Candidates should understand and grasp the key points of data analysis principles, understand the connotation and extension of these key points, distinguish the differences and relationships among them, and explain each key point correctly.
2. Competency: Candidates should master important data analysis knowledge and understand and memorize the relevant theories and methods. They must be able to explain data analysis knowledge logically according to different requirements. Candidates' familiarity and competency with the different types of data analysis knowledge is the focus of this exam.
3. Application: Candidates should be able to apply data analysis theory in practice, combine it with related tools for commercial applications, and propose specific implementation procedures and solution strategies for problems under specific requirements or conditions.
IV. Exam subjects
PART 1 Data collection and processing (12%)
a. Data collection method (2%)
b. Market research and data entry
Market research process (1%)
Sample selection (2%)
Questionnaire design and entry (2%)
c. Data exploration and visualization (2%)
d. Data preprocessing method (3%)
PART 2 Data model management (3%)
a. Data classification (1%)
b. Relational model (1%)
c. Data warehouse system and ETL (1%)
PART 3 Label system and user portrait (5%)
a. Design principle of label system (3%)
b. Label processing method (1%)
c. User portrait (1%)
PART 4 Statistical analysis (25%)
a. Sampling estimation (5%)
b. Hypothesis testing (5%)
c. Variance analysis (5%)
d. Unary linear regression analysis (10%)
PART 5 Data analysis model (40%)
a. Principal component analysis (6%), factor analysis (4%)
b. Multiple regression analysis
Multiple linear regression (10%)
Logistic regression (10%)
c. Cluster analysis
Hierarchical (system) clustering method (3%)
K-Means clustering method (2%)
d. Time series (5%)
PART 6 Digital working methods (15%)
a. Business exploration and problem location (3%)
b. Problem diagnosis
Proximate analysis (5%)
Root cause analysis (2%)
c. Business strategy optimization and guidance
Business goal setting principles (1%)
Knowledge base, strategy database, process analysis (2%)
Linear and integer programming (1%)
Quadratic optimization (1%)
V. Subject contents
PART 1 Data collection and processing
1、Data collection method
[Comprehension]
Primary data and secondary data sources
Advantages and disadvantages analysis
Precautions for use
[Competency]
Candidates should understand the differences between probability sampling and non-probability sampling in primary data collection, and the advantages and disadvantages of each
[Application]
Probability sampling methods, including simple random sampling, stratified sampling, systematic sampling, and segmented sampling
Identify the advantages and disadvantages of each sampling
Choose the most feasible sampling method according to the given conditions
Calculate the sample size required for simple random sampling
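As a rough illustration of the sample size calculation listed above, the sketch below uses the common textbook formulas n = z²σ²/E² (estimating a mean) and n = z²p(1−p)/E² (estimating a proportion) for simple random sampling with replacement. The function names and the example numbers are illustrative assumptions, not part of the outline.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n so that the mean is estimated within +/- margin."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n so that a proportion is estimated within +/- margin.

    p = 0.5 is the most conservative guess when nothing is known in advance.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical requirements: sigma = 15, margin of error 2, 95% confidence.
n_mean = sample_size_for_mean(sigma=15, margin=2)
# Hypothetical requirement: proportion within 5 percentage points.
n_prop = sample_size_for_proportion(margin=0.05)
print(n_mean, n_prop)
```

Rounding up with `math.ceil` keeps the margin of error at or below the target; sampling without replacement from a small population would need an additional finite population adjustment.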
2、Market research and data entry
[Competency]
Familiar with the basic steps of market research (asking questions, theoretical deduction, collecting materials, building models, attribution analysis)
Familiar with the applicability, advantages, and disadvantages of sample selection methods
Familiar with questionnaire design principles, the setting of questionnaire question types, and the data coding and entry for each question type
3、Data exploration and visualization
[Comprehension]
Understand the purpose and significance of data exploration
Understand common data visualization software tools (Excel BI, SPSS, Python, etc.)
[Competency]
Familiar with the relationship between data exploration and data preprocessing
Familiar with common descriptive methods for data exploration: central tendency analysis, dispersion analysis, data distribution relationships, graphical analysis
Familiar with commonly used mathematical statistics methods for data exploration: hypothesis test, variance test, correlation analysis, regression analysis, factor analysis
[Application]
Candidates should be able to use data visualization tools (Excel BI, SPSS, Python, etc.) to complete the data exploration tasks of related data analysis projects. (Note: The use of these tools and software will not be assessed in the exam.)
4、Data preprocessing method
[Competency]
Candidates must be familiar with the basic steps of data preprocessing, including data integration (integration of different data sources), data exploration, data transformation (standardization), and data reduction (dimensionality reduction techniques, numerosity reduction techniques). This part does not involve calculation; candidates only need to identify which processing techniques are suitable for a given requirement.
[Application]
Candidates should be able to apply data cleaning, which includes filling in missing values (using constants, medians, modes, etc. according to the business scenario; multiple imputation is not covered), smoothing noisy data (moving average), identifying or removing outliers (single variables are standardized around the central value; multiple variables use fast clustering), resolving inconsistencies (familiarity with the concept only), and checking for duplicates (only SQL statements are assessed, no other languages).
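The cleaning steps named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names, the 2.5 standard deviation cut-off, and the `orders` table with its columns are all hypothetical, and Python's built-in `sqlite3` module is used only as a convenient way to run the SQL duplicate check.

```python
import sqlite3
from statistics import mean, median, pstdev


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]


def moving_average(values, window=3):
    """Smooth a noisy series with a simple moving average."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def flag_outliers(values, threshold=2.5):
    """Standardize around the mean and flag points beyond the threshold."""
    mu, sd = mean(values), pstdev(values)
    return [abs(v - mu) / sd > threshold for v in values]


# Duplicate check expressed as a SQL statement (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02"), (1, "2024-01-01")],
)
duplicates = conn.execute(
    "SELECT customer_id, order_date, COUNT(*) AS n "
    "FROM orders GROUP BY customer_id, order_date HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)
```

Grouping on every column that defines "the same record" and keeping groups with a count above one is the usual SQL pattern for this check; which columns define a duplicate is a business decision.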
PART 2 Data model management
[Comprehension]
Concept of data and information; the concepts of master data, transaction data and metadata in data classification
Relationship between conceptual, logical, and physical models in database modeling
Concept of database normal forms, data warehouse and data mart, ETL process
[Competency]
Familiar with usage scenarios of the relational model and the dimensional model
PART 3 Label system and user portrait
1、Design principle of label system
[Comprehension]
Distinguish the concept of labels and indicators
The concept of precision marketing and quantitative risk control
Consumer decision-making process
Core content of customer, product, and channel labels
[Competency]
Hierarchical labels and group labels
The relationship between Maslow's hierarchy of needs theory and precision marketing
2、Label processing method
[Comprehension]
Basic, statistical, and model labels
[Competency]
RFM model
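A minimal sketch of RFM labeling, assuming the simple variant in which each customer is marked as above or below the average on recency, frequency, and monetary value (giving up to eight segments). The customer data, field names, and label format are hypothetical.

```python
from statistics import mean

# Hypothetical customers: days since last purchase, purchase count, total spend.
customers = {
    "A": {"recency_days": 5, "frequency": 12, "monetary": 900.0},
    "B": {"recency_days": 40, "frequency": 2, "monetary": 150.0},
    "C": {"recency_days": 10, "frequency": 3, "monetary": 1200.0},
    "D": {"recency_days": 60, "frequency": 9, "monetary": 300.0},
}


def rfm_labels(data):
    """Label each customer as better (+) or worse (-) than average on R, F, M."""
    avg_r = mean(c["recency_days"] for c in data.values())
    avg_f = mean(c["frequency"] for c in data.values())
    avg_m = mean(c["monetary"] for c in data.values())
    labels = {}
    for name, c in data.items():
        # A smaller recency is better: the customer bought more recently.
        r = "R+" if c["recency_days"] < avg_r else "R-"
        f = "F+" if c["frequency"] > avg_f else "F-"
        m = "M+" if c["monetary"] > avg_m else "M-"
        labels[name] = r + f + m
    return labels


print(rfm_labels(customers))
```

Other RFM variants score each dimension on a 1 to 5 scale by quantile instead of splitting at the average; the choice depends on how many segments the business can act on.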
3、User portrait
[Comprehension]
User journey analysis
Standard user analysis and deviation analysis
[Competency]
Candidates must be familiar with the application of user portrait technology in marketing customer acquisition and in risk prevention and control
PART 4 Statistical analysis
1、Sampling estimation
[Comprehension]
Concept of random trials, random events, and random variables
Concept of population and sample
Theoretical basis of sampling estimation
Normal distribution, and the functional form and graph of the three major sampling distributions (chi-square, t, and F)
Multiple organization forms of sampling
Reasons for determining the necessary sample size
Significance and Application of Law of Large Numbers and Central Limit Theorem
[Competency]
Probability of random event
Concept and mathematical properties of average sampling error
Characteristics, advantages and disadvantages of point estimation and interval estimation methods
Total population and sample population
Parameters and statistics
Sampling with replacement (repeated sampling) and sampling without replacement (non-repeated sampling)
Concept of sampling error; interval estimation methods for the population mean, population proportion, and population variance
Factors affecting the necessary sample size
[Application]
Random variables and their probability distributions
Concept of the number of all possible samples and its determination under different sampling methods
Calculation of the average sampling error in actual data analysis
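The average sampling error and the interval estimate for a mean can be illustrated as follows. This is a sketch under assumptions: it uses the large-sample normal approximation, applies the finite population correction sqrt((N − n)/(N − 1)) for sampling without replacement (some textbooks use the approximation sqrt(1 − n/N)), and all names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def average_sampling_error(sample_sd, n, population_size=None):
    """Average sampling error (standard error) of the sample mean.

    With replacement: s / sqrt(n).
    Without replacement: the same value times sqrt((N - n) / (N - 1)).
    """
    error = sample_sd / math.sqrt(n)
    if population_size is not None:
        error *= math.sqrt((population_size - n) / (population_size - 1))
    return error


def mean_interval(sample_mean, error, confidence=0.95):
    """Interval estimate: sample mean plus or minus z times the error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return sample_mean - z * error, sample_mean + z * error


with_replacement = average_sampling_error(sample_sd=10, n=100)
without_replacement = average_sampling_error(sample_sd=10, n=100, population_size=1000)
low, high = mean_interval(sample_mean=50, error=with_replacement)
print(with_replacement, without_replacement, (low, high))
```

The error without replacement is always the smaller of the two, which is why the required sample size is also smaller under that scheme.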
2、Hypothesis testing
[Comprehension]
Basic concepts of hypothesis testing
Role of its basic ideas in data analysis
Basic steps of hypothesis testing
Connection between hypothesis testing and interval estimation
Two types of errors in hypothesis testing
[Competency]
Basic definitions of the test statistic, significance level, and corresponding critical value
Meaning and calculation of the P value
Familiar with how to use the P value in a test
z test statistic
t test statistic
F test statistic
Functional form and test steps of the χ² test statistic
[Application]
Implement one-sample t test
Apply the steps of the two-independent-sample t test, and identify the statistic and null hypothesis used in the test
Data analysis scenarios in which each of the two tests applies
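The two t tests listed above differ mainly in how the standard error is built. The sketch below computes only the test statistic and degrees of freedom; comparing against a critical value or computing a P value needs a t distribution table or a statistics package. The pooled-variance form assumes equal population variances, and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean equals mu0."""
    n = len(sample)
    standard_error = math.sqrt(variance(sample) / n)
    return (mean(sample) - mu0) / standard_error, n - 1


def two_sample_t_pooled(a, b):
    """t statistic for H0: the two population means are equal.

    Uses the pooled variance, so it assumes equal variances in both groups.
    """
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, n1 + n2 - 2


t1, df1 = one_sample_t([5.1, 4.9, 5.3, 5.2, 5.0], mu0=5.0)
t2, df2 = two_sample_t_pooled([1, 2, 3], [4, 5, 6])
print(t1, df1, t2, df2)
```

The one-sample test compares a single group against a fixed benchmark, while the two-sample test compares two independent groups against each other.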
3、Variance analysis
[Comprehension]
Related concepts of analysis of variance
Principle of one-way analysis of variance
Statistics construction process
[Competency]
Basic steps of one-way analysis of variance
Meaning and calculation of Sum of Squares for Total (SST)
Meaning and calculation of the sum of squared deviations (SSA) between groups
Meaning and calculation of the sum of squared deviations (SSE) within a group
Null hypothesis of one-way analysis of variance
[Application]
Steps to achieve one-way analysis of variance
Interpretation of the analysis of variance table and the multiple comparison table
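The decomposition SST = SSA + SSE and the resulting F statistic can be checked numerically. The following is a minimal sketch with hypothetical group data; the function name and dictionary keys are illustrative.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the sums of squares and the F statistic for one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # SSA: variation of group means around the grand mean (between groups).
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: variation of observations around their own group mean (within groups).
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # SST: total variation around the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    k, n = len(groups), len(all_values)
    f_stat = (ssa / (k - 1)) / (sse / (n - k))
    return {"SST": sst, "SSA": ssa, "SSE": sse, "F": f_stat, "df": (k - 1, n - k)}


result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

A large F means the between-group mean square is large relative to the within-group mean square, which is evidence against the null hypothesis that all group means are equal.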
4、Unary linear regression analysis
[Comprehension]
Drawing and function of correlation diagram
Compilation and function of correlation table
Meaning of the symbols in the defining formula of the correlation coefficient
Relationship between estimated standard error and correlation coefficient
[Competency]
Concept and characteristics of correlation
Difference and connection between correlation relationships and functional relationships
Types of correlation
Candidates must be familiar with the significance of the correlation coefficient and with using its specific value to grade the strength of correlation of a phenomenon
Concept of regression analysis
Main content and characteristics of regression analysis
Conditions for establishing a linear regression equation
Least squares estimation of the unary linear regression coefficients
Points to note when applying regression analysis
Meaning and calculation of the standard error of the estimate
[Application]
Candidates should be able to use the simple formulas to calculate the correlation coefficient and the regression coefficient
Candidates should be able to solve applied problems in correlation analysis
Candidates should be able to apply the differences and connections between regression analysis and correlation analysis
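The correlation coefficient, the least squares coefficients, and the standard error of the estimate all come from the same three sums of squares. The sketch below uses hypothetical data and an illustrative function name.

```python
import math


def simple_linear_regression(x, y):
    """Least squares fit of y = intercept + slope * x, with r and the
    standard error of the estimate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # n - 2 degrees of freedom: two coefficients were estimated.
    se_estimate = math.sqrt(residual_ss / (n - 2))
    return {"slope": slope, "intercept": intercept, "r": r,
            "se_estimate": se_estimate, "syy": syy}


fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)
```

The result also shows the link between the standard error of the estimate and the correlation coefficient: the residual sum of squares equals `syy * (1 - r**2)`, so a stronger correlation means a smaller standard error.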
PART 5 Data analysis model
General requirements
Candidates should understand the basic principles of each model, its numerical calculation process, and its application scenarios, and be able to complete data modeling analysis reports.
1、Principal component analysis
[Comprehension]
Candidates should understand the calculation steps of principal component analysis
Candidates should understand the assumptions that principal component analysis makes about the distribution of variables and the relationships among multiple variables, and the model settings
[Competency]
Candidates must be familiar with the variable measurement types suitable for principal component analysis. By analyzing the results, candidates must be able to select an appropriate number of principal components to retain, and should note that the two different analysis purposes (compressing the variables as much as possible, versus avoiding collinearity while retaining more information) lead to different numbers of retained components.
[Application]
Based on an in-depth understanding of the meaning of principal components, candidates should be able to decide whether principal component analysis suits a given business problem; decide when to use the correlation matrix and when to use the covariance matrix in the calculation; explain principal component scores; and apply function transformations according to the distribution of the variables.
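The choice between the correlation matrix and the covariance matrix matters most when variables are measured on very different scales. The sketch below is restricted to two variables so that the eigenvalues of the 2×2 matrix can be written in closed form without a linear algebra library; the data and function names are hypothetical.

```python
import math
from statistics import mean


def sample_cov(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pca_two_variables(x, y, use_correlation=True):
    """Eigenvalues (largest first) and explained variance ratios for 2-D PCA."""
    sxx, syy, sxy = sample_cov(x, x), sample_cov(y, y), sample_cov(x, y)
    if use_correlation:
        # Correlation matrix: unit diagonal, correlation coefficient off-diagonal.
        a, d, b = 1.0, 1.0, sxy / math.sqrt(sxx * syy)
    else:
        a, d, b = sxx, syy, sxy
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, d]].
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    eigenvalues = [centre + radius, centre - radius]
    return eigenvalues, [e / (a + d) for e in eigenvalues]


income = [1000, 2000, 3000, 4000, 5000]   # large scale
rating = [2, 4, 5, 4, 5]                  # small scale
eig_corr, ratio_corr = pca_two_variables(income, rating, use_correlation=True)
eig_cov, ratio_cov = pca_two_variables(income, rating, use_correlation=False)
print(ratio_corr, ratio_cov)
```

With the covariance matrix, the first component is almost entirely the large-scale variable; with the correlation matrix, both variables contribute on equal footing. That is the usual reason to prefer the correlation matrix when units differ.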
2、Factor analysis
[Comprehension]
Candidates should understand the factor analysis model settings; for calculation, only the steps of the principal component method need attention
[Competency]
Candidates must be familiar with the variable measurement types suitable for factor analysis and be able to select an appropriate number of factors from the analysis results.
Familiar with common factor rotation methods
[Application]
When facing a business problem, candidates should be able to decide whether to use factor analysis or principal component analysis; clarify the meaning of each factor from the weights (loadings) of the original variables on it; analyze a large number of variables by dimension, score each dimension, and compare the scores with expert scores (Delphi method); describe the data before clustering and find a suitable clustering method and number of clusters.
3、Regression analysis
[Comprehension]
Comprehensive application of linear regression
[Competency]
Clarify 6 classic assumptions of linear regression (linear model, no collinearity, residual expectation of 0 (no endogeneity), homoscedasticity, normality, random sampling)
Concept of independent and identical distribution
Clarify the problems that arise after violating the above assumptions
Methods of checking whether the model violates the classical assumptions, and methods of correcting the model
Variable screening method
Outliers and the calculation of their diagnostic indices
Clarify the difference in regression modeling between cross-sectional and time series data
[Application]
Combine business to build regression models and explain regression coefficients
Function conversion according to business scenarios and variable distribution
Processing method when explanatory variable is categorical
Distinguish the relationship between predictive modeling and explanatory modeling
Use results to make new sample predictions
Basic steps and precautions for customer value analysis
4、Classification analysis
[Comprehension]
Chi-square test calculation formula
Calculation formula of binary logistic regression
[Competency]
Candidates must be familiar with methods for describing and testing association between categorical variables, involving contingency table analysis and the chi-square test
Likelihood ratio and logit transformation
Binary logistic regression model construction and variable selection
Model evaluation methods, involving the confusion matrix and the ROC curve
[Application]
Candidates should be able to build regression models in a business context and explain the regression coefficients
Candidates should be able to apply function transformations according to business scenarios and variable distributions
Candidates should be able to use the results to make predictions for new samples
Candidates should be able to combine logistic regression and multiple linear regression models in application
Basic steps and precautions for models such as customer churn prediction, credit rating, and precision marketing
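The logit transformation and the confusion matrix can be illustrated without fitting a model. In the sketch below the coefficients, the feature values, the predicted probabilities, and the 0.5 threshold are all hypothetical; a real model would estimate the coefficients by maximum likelihood.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of the logit: maps a linear score back to a probability."""
    return 1 / (1 + math.exp(-z))


def predict_probability(intercept, coefficients, features):
    """Binary logistic regression prediction for one new sample."""
    score = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(score)


def confusion_matrix(actual, predicted):
    """Return (TP, FP, FN, TN) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


actual = [1, 1, 1, 0, 0, 0, 0, 1]
probabilities = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
true_positive_rate = tp / (tp + fn)    # one point on the ROC curve
false_positive_rate = fp / (fp + tn)
print((tp, fp, fn, tn), true_positive_rate, false_positive_rate)
```

Repeating the last step over many thresholds traces out the ROC curve. For coefficient interpretation, `math.exp(b)` is the multiplicative change in the odds for a one-unit increase in the corresponding feature.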
5、Cluster analysis
[Comprehension]
Features of multiple clustering algorithms
Concept and implementation of iteration
[Competency]
Basic logic of clustering method
Distance calculation
Basic algorithms, advantages, and disadvantages of hierarchical (system) clustering and K-Means clustering
Calculation steps of hierarchical clustering, including how to calculate the distance between two points and how to merge two clusters
Method of selecting the optimal number of clusters in hierarchical clustering
Basic algorithm of K-Means clustering
Candidates must be familiar with the reasons for, and calculation methods of, variable standardization in cluster analysis
Candidates must be familiar with the reason why variables may need principal component analysis
Candidates must be familiar with the reasons for, and calculation methods of, variable function transformation
[Application]
Select appropriate clustering methods and steps for business application scenarios such as customer portraits, customer segmentation, product clustering, and outlier inspection (fraud, anti-money laundering).
Candidates should be able to carry out post-clustering analysis and derive the characteristics of each cluster from the distribution of variables after clustering.
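The iteration at the heart of K-Means (assign each point to the nearest centre, recompute the centres, repeat until nothing changes) can be sketched as below, together with the standardization step that keeps a large-scale variable from dominating the distance. The data points and the choice of starting centres are hypothetical; in practice the starting centres are usually chosen at random or by a seeding rule.

```python
import math
from statistics import mean, pstdev


def standardize(points):
    """Z-score each variable so that all variables share a common scale."""
    columns = list(zip(*points))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]


def kmeans(points, initial_indices, max_iter=100):
    """Plain K-Means starting from the points at initial_indices."""
    centroids = [points[i] for i in initial_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


raw = [(1, 100), (2, 110), (1.5, 105), (8, 800), (9, 820), (8.5, 810)]
labels, centroids = kmeans(standardize(raw), initial_indices=(0, 3))
print(labels)
```

After clustering, the characteristics of each cluster are read off by comparing the distribution of the original (unstandardized) variables across the labels.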
6、Time series
[Comprehension]
Clarify differences and applicable scenarios of trend decomposition method, ARIMA method, and time series regression method
Clarify the calculation method of ARIMA method
[Competency]
Trend decomposition method, involving multiplication model and addition model
Specific steps of the ARIMA method; the method of time series regression
[Application]
Combine business (performance forecast, early warning), select appropriate analysis methods
Basic steps and precautions for models such as business time series forecasting
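For the trend decomposition method with a multiplicative model, a common first step is to estimate the trend with a centred moving average and then average the ratios of actual value to trend by season. The sketch below assumes quarterly data (an even period) and uses a hypothetical series built from a linear trend and fixed seasonal factors.

```python
from statistics import mean


def centered_moving_average(series, period=4):
    """Trend estimate for an even period (a 2 x period moving average)."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # The two end points get half weight so the window is centred on t.
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def seasonal_indices(series, period=4):
    """Multiplicative model: average of actual / trend for each season."""
    trend = centered_moving_average(series, period)
    ratios = [[] for _ in range(period)]
    for t, (actual, level) in enumerate(zip(series, trend)):
        if level is not None:
            ratios[t % period].append(actual / level)
    return [mean(r) for r in ratios]


# Hypothetical three years of quarterly sales: rising trend, weak Q1, strong Q3.
factors = [0.8, 1.0, 1.2, 1.0]
sales = [(100 + 10 * t) * factors[t % 4] for t in range(12)]
indices = seasonal_indices(sales)
print(indices)
```

For an additive model the same steps apply with differences (actual minus trend) in place of ratios. The raw indices are often rescaled afterwards so that they average exactly 1.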
PART 6 Digital working methods
1、Business exploration and problem location
[Comprehension]
Evaluation criteria for the severity of abnormal events.
Event restoration tools such as business processes.
[Competency]
Familiar with drawing business flowcharts
2、Problem diagnosis
[Comprehension]
Candidates should understand the brainstorming method in proximate analysis and the selection of quantitative analysis tools.
Candidates should understand the 5 Whys analysis method and the cause-and-countermeasure diagram in root cause analysis.
[Competency]
Familiar with identifying key points through Pareto analysis
Familiar with performing correlation analysis through scatter plots, correlation diagrams, and affinity diagrams
Familiar with exploring through funnel analysis, user profiles, retention analysis, and digital footprint tracking
Draw cause and effect diagrams
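Pareto analysis, as listed above, amounts to sorting causes by frequency and reading off the cumulative share. A minimal sketch follows; the complaint categories, counts, and the 80% cut-off are hypothetical.

```python
def pareto_table(counts):
    """Rows of (cause, count, cumulative share), largest cause first."""
    total = sum(counts.values())
    rows, cumulative = [], 0
    for cause, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        cumulative += n
        rows.append((cause, n, cumulative / total))
    return rows


def vital_few(counts, cutoff=0.8):
    """Causes needed to reach the cutoff share of all occurrences."""
    selected = []
    for cause, _, cumulative_share in pareto_table(counts):
        selected.append(cause)
        if cumulative_share >= cutoff:
            break
    return selected


complaints = {
    "late delivery": 50,
    "damaged item": 30,
    "wrong item": 10,
    "billing error": 6,
    "other": 4,
}
print(pareto_table(complaints))
print(vital_few(complaints))
```

The causes returned by `vital_few` are the key points to investigate first, for example with a cause and effect diagram.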
3、Business strategy optimization and guidance
[Comprehension]
Business goal setting principles
Components and standard form of linear programming.
Difference between integer programming and linear programming with truncated (rounded-down) solutions.
Components and standard form of quadratic programming.
Types and components of the knowledge base.
Types and components of the strategy library.
[Competency]
Modeling steps of linear programming.
Modeling steps of quadratic programming.
Analysis methods and tools for process optimization
[Application]
Formulate the objective function and constraints according to the requirements of the problem.
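A small example can show why an integer program is not the same as a linear program whose solution is rounded afterwards. The sketch below uses a hypothetical two-variable problem (maximize 5x + 4y subject to 6x + 4y ≤ 24, x + 2y ≤ 6, x, y ≥ 0). It solves the linear program by checking the corner points of the feasible region and the integer program by plain enumeration, which is practical only for tiny problems like this one; real problems would use a solver.

```python
from itertools import combinations

OBJECTIVE = (5, 4)                       # maximize 5x + 4y
CONSTRAINTS = [(6, 4, 24), (1, 2, 6)]    # each row means a*x + b*y <= c


def value(x, y):
    return OBJECTIVE[0] * x + OBJECTIVE[1] * y


def feasible(x, y, tol=1e-9):
    non_negative = x >= -tol and y >= -tol
    return non_negative and all(a * x + b * y <= c + tol for a, b, c in CONSTRAINTS)


def solve_lp_by_vertices():
    """An optimum of a bounded LP lies at a corner of the feasible region."""
    # Boundary lines: the constraints plus the axes x = 0 and y = 0.
    lines = CONSTRAINTS + [(1, 0, 0), (0, 1, 0)]
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines never intersect
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y) and (best is None or value(x, y) > value(*best)):
            best = (x, y)
    return best


def solve_integer_by_enumeration(x_max=4, y_max=3):
    """Check every integer point inside the bounding box."""
    candidates = [(x, y) for x in range(x_max + 1) for y in range(y_max + 1)
                  if feasible(x, y)]
    return max(candidates, key=lambda point: value(*point))


lp_solution = solve_lp_by_vertices()
truncated = (int(lp_solution[0]), int(lp_solution[1]))
integer_solution = solve_integer_by_enumeration()
print(lp_solution, truncated, integer_solution)
```

In this example the truncated linear programming solution is feasible but worse than the true integer optimum, and rounding the fractional value up instead would violate a constraint.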
VI. Recommended Reading
Note: Candidates can select reading material from the list of recommended books according to their needs. Candidates do not have to read all of the recommended books and can focus on the key points highlighted in the exam outline.
[1] Jia Junping, He Xiaoqun, Jin Yongjin. Statistics (7th Edition) [M]. China Renmin University Press, 2018. (Required)
[2] Jingguanzhijia, Ding Yajun. Statistical analysis: From small data to big data [M]. Electronic Industry Press, 2020. (Optional)
[3] He Xiaoqun. Multivariate statistical analysis (4th Edition) [M]. China Renmin University Press, 2015. (Optional)
[4] Li Zinai, Pan Wenqing. Econometrics (4th Edition) [M]. Higher Education Press, 2015. (Optional)
[5] Sheng Ju, Shi Shiqian, Pan Chengyi, etc. Probability Theory and Mathematical Statistics (4th Edition) [M]. Higher Education Press, 2018. (Optional)
[6] Zhang Hao. Probability Theory[M]. Higher Education Press, 2018. (Optional)
[7] Zhang Wentong. The Basic Course of SPSS Statistical Analysis [M]. Higher Education Press, 2017. (Optional)
[8] Wang Yan. Applied Time Series Analysis (4th Edition) [M]. China Renmin University Press, 2015. (Optional)
[9] James D. Hamilton. Time series analysis [M]. China Renmin University Press, 2015. (Optional)
[10] Data Management Association (DAMA International). DAMA Data Management Knowledge System Guide (Original Book 2nd Edition) [M]. Mechanical Industry Press, 2020. (Optional)
[11] Bjorn et al. Root cause analysis-simplified tools and techniques (2nd Edition) [M]. China Renmin University Press, 2011. (Optional)
CDA Certification Exam Committee
CDA Institute