Working in a hands-on learning environment, led by our Data Science expert instructor, students will learn about and explore:

- Parsing webpages with the BeautifulSoup library.
- Storing and processing data with pandas DataFrames.
- Converting raw text to numeric features (TF-IDF vectors) with the sklearn (scikit-learn) library.
- Measuring text similarity with a cosine distance function from the sklearn library.
- Dimensionality reduction with singular value decomposition (SVD) using sklearn.
- K-means clustering using sklearn.
- Creating word clouds with the WordCloud library for text cluster visualization.
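To give a flavor of how two of the listed topics fit together, here is a minimal sketch of converting text to TF-IDF vectors and ranking documents by cosine similarity with scikit-learn. The postings and resume strings are hypothetical stand-ins, not data from the course:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for scraped job postings and a resume.
postings = [
    "data scientist python machine learning sql",
    "frontend developer javascript css react",
    "machine learning engineer python deep learning",
]
resume = "python machine learning data analysis"

# Fit one vocabulary over postings + resume so the vectors are comparable.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(postings + [resume])

# Compare the resume (last row) against every posting.
sims = cosine_similarity(doc_vectors[-1], doc_vectors[:-1]).ravel()
ranked = sims.argsort()[::-1]  # posting indices, most similar first
```

Here the frontend posting shares no vocabulary with the resume, so its cosine similarity is zero and it ranks last.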
July 29, 2021
In this course, you’re a budding data scientist who has drafted a resume. You want to apply for data science jobs, but you’d like to find the jobs you have the best shot at and optimize your resume to improve your chances of landing one. We will use NLP and text analytics to search online job postings for the most relevant data science jobs and tailor our resume to them. The job post HTML pages have already been web-scraped; we will load them into Python and process the text data from there. Because the collection is large (over one thousand postings), we will process it with data science methods in Python. We will use text similarity methods to find the most similar job postings and to identify key skills missing from our resume. We’ll summarize our findings by printing highlights of the text results and by displaying plots and word clouds of the data.
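Loading a scraped HTML page and pulling out its text is the kind of step BeautifulSoup handles. A minimal sketch, using a made-up HTML snippet rather than one of the course's actual job postings:

```python
from bs4 import BeautifulSoup

# Hypothetical job-posting HTML; the real pages were scraped in advance.
html = """
<html><body>
  <h1>Data Scientist</h1>
  <ul class="skills"><li>Python</li><li>SQL</li></ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").get_text()                          # job title text
skills = [li.get_text() for li in soup.select("ul.skills li")]  # skill list items
```

The same pattern (`find`, `select`, `get_text`) scales to whatever tags and classes the real postings use.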
- Our first step is to take the raw HTML job postings and extract relevant information from them, such as the skill requirements for each job.
- Next, we will find the jobs that are most similar to our resume using cosine similarity.
- After that, we’ll use the most similar job postings to analyze what type of skills are typically asked for by clustering the skill requirements from the job postings.
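The clustering step above can be sketched with scikit-learn's KMeans applied to TF-IDF vectors of skill requirements. The skill strings below are hypothetical examples, not the course data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical skill-requirement lines pulled from job postings.
skills = [
    "python pandas numpy",
    "python scikit-learn machine learning",
    "sql database queries",
    "sql postgres data warehouse",
]

# Vectorize the skill text, then group it into two clusters.
X = TfidfVectorizer().fit_transform(skills)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster assignment for each skill line
```

Inspecting the top-weighted terms in each cluster (or rendering each cluster as a word cloud) then shows which skill groups employers ask for most.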