From September 18 to 20, 2017, the CDA Data Analysis Institute ran a talent training program on the theme "Data Mining Applications with Python" at the Industrial and Commercial Bank of China Data Center (Shanghai). The class was held at the Jiading park, with the Xisanqi park, the Waigaoqiao park, and the Information Technology Department of the Shanghai Branch joining by remote video. Employees from every department of the center signed up enthusiastically, and 95 took part in the intensive course. The instructors and colleagues from the data analysis department exchanged ideas actively throughout, and participants came away with substantial learning results.
Company profile:
The Industrial and Commercial Bank of China Co., Ltd. Data Center (Shanghai), hereinafter referred to as the Data Center (Shanghai), is an organization directly under the head office and was officially established on November 10, 2000. The Data Center (Shanghai) is responsible for production operations and disaster recovery management of the bank's entire information system. It has built a world-leading core production environment, developed a production operation and maintenance system based on ITIL, provides data services to ICBC's domestic and overseas institutions, and connects with more than 500 third-party organizations.
To meet higher standards of business continuity and system availability, ICBC was the first in the domestic industry to launch a "two locations, three centers" project. Across its three parks at Waigaoqiao and Jiading in Shanghai and Xisanqi in Beijing, the Data Center (Shanghai) has built two same-city data centers that run in parallel and can take over from each other quickly, plus an off-site disaster recovery center, achieving the highest level of disaster recovery deployment. The information system operates 24 hours a day, 365 days a year.
More than ten years of hard work have not only forged a robust, stable information system at the Data Center (Shanghai) but also cultivated a team determined to forge ahead and pursue excellence. We gather talent and are even more committed to developing it. We lead change and are better still at managing it. We broaden our horizons and are more willing to share our vision. We pursue our dreams and dare to reach for them.
Course introduction:
The first stage: Python fundamentals
1. Basic syntax
2. Lists, strings and tuples
3. Sets and dictionaries
4. Conditional and looping statements
5. Application of several important built-in functions
6. File operations
7. Functions and their applications
8. Regular expressions
9. Databases and Python
10. Sorting, dynamic programming, recursion, and other algorithms
The second stage: data cleaning and wrangling with NumPy, pandas, etc.
1. Organize data (slicing, generating random numbers, copying, broadcasting, sorting, etc.)
2. Various methods of data indexing and selection
3. Grouping, dividing, merging and transforming data
4. Handling missing and null values
5. Time series data processing, modeling and forecasting (ARIMA)
6. Processing Chinese-language data
7. Data deduplication and outlier handling
8. Comparison of data wrangling and modeling in R and Python (pandas)
9. Descriptive statistics and inferential statistical analysis
The third stage: Python machine learning algorithms and data mining case studies
1. Logistic regression model for text classification
2. Image structure and analysis (K-means clustering of images)
3. Image recognition and classification: PCA modeling
4. Two-dimensional handwritten digit recognition (KNN method)
5. Recommendation system and precision marketing (nearest neighbor method, collaborative filtering)
6. Various scenarios of data visualization
7. News text classification (TF-IDF criterion, personalized recommendation of travel news)
8. Handwriting recognition
9. Naive Bayes decision
10. Wine quality classification prediction
11. Grid search and parameter optimization of machine learning
12. Penalized linear regression classifier
13. Recognition and classification with support vector machines
14. Financial time series forecasting (non-ARIMA method)
15. Ensemble learning algorithms
16. Stochastic simulation, user churn early warning, and hands-on quantitative investment
Student evaluation:
The instructors gave in-depth explanations built around typical data analysis and data mining cases encountered in practice. Even beginners could quickly grasp the ideas and methods of Python data analysis and data mining (including machine learning) and build a sound, effective framework of knowledge and skills.
Enterprise evaluation:
The program was rich in content and covered most of the commonly used machine learning algorithms and methods. After the intensive course, participants reported that they had benefited greatly, gained a deeper understanding of machine learning, and improved their hands-on skills. In subsequent data analysis work, they will apply the knowledge and methods learned to the needs of our bank's business and operation and maintenance scenarios in order to solve problems more effectively. We also hope to have more exchanges with CDA Data Analyst on special-topic courses, CDA certification, project consulting, and other areas to achieve deeper cooperation.