Examining the correlation between company stock returns and employee satisfaction
In summer 2017, I developed a web-crawling tool to gather company reviews and employee salary information from online job-hunting platforms. Inspired by Dr. Edmans’s journal article, we wanted to examine the relationship between company stock returns and employee review ratings (a measure of satisfaction) using data from online recruiting platforms.
My tool takes a list of company names and designated geolocations from a CSV file and crawls three things on Glassdoor.com: the company overview page, the employee salary page for the desired locations, and all reviews (including sub-reviews, location, job position, employee status, date, pros, cons, advice and review URL). Click here to view the source code.
I couldn’t use the Glassdoor API to get this information, and Glassdoor’s page mechanics are complex and arbitrary, so I had to write three different scrapers: one each for Company Overview, Company Review, and Salary. Scraping employee salaries for a designated company name and location is especially difficult because Glassdoor sometimes redirects automatically to undesired URLs when the company name or location is ambiguous, or when no salary information is found in the area.
I used Selenium to navigate through the pages and BeautifulSoup to scrape the information. To avoid ambiguity and land on the right site, I used a complete list of Wilshire 5000 firms. Instead of selecting a location on the salary page, I made each company-location tuple a completely new search to get the designated information. I used a combination of class name, id, XPath and CSS selector to select the desired section of each page.
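The idea of treating each company-location tuple as its own search can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the column names `company` and `location`, the `SearchTask` type and the `load_search_tasks` helper are all assumptions made for the example, and the browser step is left out.

```python
import csv
import io
from typing import Iterator, NamedTuple


class SearchTask(NamedTuple):
    """One independent search: a single company in a single location."""
    company: str
    location: str


def load_search_tasks(csv_file) -> Iterator[SearchTask]:
    """Yield one task per (company, location) row, skipping blanks and duplicates."""
    seen = set()
    for row in csv.DictReader(csv_file):
        company = (row.get("company") or "").strip()
        location = (row.get("location") or "").strip()
        if not company or not location:
            continue  # an incomplete pair cannot form a search
        task = SearchTask(company, location)
        if task in seen:
            continue  # avoid crawling the same pair twice
        seen.add(task)
        yield task


# Hypothetical input with made-up companies; a real run would open a CSV file.
sample_csv = io.StringIO(
    "company,location\n"
    "Acme Corp,New York\n"
    "Acme Corp,Chicago\n"
    "Acme Corp,New York\n"
    "Globex,\n"
)
tasks = list(load_search_tasks(sample_csv))
for task in tasks:
    # Each pair would start a fresh search instead of reusing a loaded page.
    print(f"search: {task.company!r} in {task.location!r}")
```

Keeping every pair as a separate task means a redirect or an empty result for one pair does not affect the others.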
Here’s a sample analysis of salary information based on geolocation; you can check it out on Tableau Public.
Another idea I’m thinking about doing in the future is to use nltk to analyze the sentiment of each review and compare it to the overall user-rated score.
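A rough sketch of what that comparison might look like is below. To stay runnable without downloads, it uses a tiny made-up word list rather than nltk. It also assumes ratings are on a 1-5 star scale. The names `toy_sentiment`, `rating_to_unit_scale` and `sentiment_gap` are hypothetical.

```python
from typing import Callable

# Tiny made-up word lists, for illustration only (not a real lexicon).
POSITIVE = {"great", "good", "supportive", "flexible"}
NEGATIVE = {"bad", "poor", "stressful", "low"}


def toy_sentiment(text: str) -> float:
    """Return a score in [-1, 1] based on counts of listed words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def rating_to_unit_scale(stars: float) -> float:
    """Map an assumed 1-5 star rating linearly onto [-1, 1]."""
    return (stars - 3.0) / 2.0


def sentiment_gap(
    text: str,
    stars: float,
    scorer: Callable[[str], float] = toy_sentiment,
) -> float:
    """Text sentiment minus rescaled rating; negative means the text reads worse."""
    return scorer(text) - rating_to_unit_scale(stars)


# Made-up reviews, not real data.
reviews = [
    ("Great team and flexible hours", 5.0),
    ("Stressful culture, low pay", 4.0),
]
for text, stars in reviews:
    print(f"{stars} stars, gap {sentiment_gap(text, stars):+.2f}: {text}")
```

With nltk installed and its VADER lexicon downloaded, the `compound` value from `SentimentIntensityAnalyzer().polarity_scores(text)` also falls in [-1, 1], so a small wrapper around it could be passed as `scorer` in place of the toy function.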
October 2017 Update:
One major problem I am having right now is that the salary and review information we have lacks the timestamps necessary to compare it with fluctuations in the company’s stock price. I’m looking into whether using a proxy can solve this problem. Meanwhile, I’m working on a more macro view of the data I collected, comparing the distribution of employees in the data with the distribution of the U.S. labor force to see how well data from Glassdoor represents the actual labor market.
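One possible way to put a number on that comparison is to convert both sets of counts into shares and measure how far apart they are. The sketch below uses total variation distance, which is my choice for illustration rather than a method settled on in this project. The category labels and counts are placeholders, not collected data.

```python
def to_shares(counts: dict) -> dict:
    """Convert raw counts per category into shares that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {key: value / total for key, value in counts.items()}


def total_variation_distance(p: dict, q: dict) -> float:
    """Half the sum of absolute share differences: 0 = identical, 1 = disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Placeholder counts with hypothetical category labels (not real figures).
scraped_counts = {"sector_a": 50, "sector_b": 30, "sector_c": 20}
benchmark_counts = {"sector_a": 25, "sector_b": 25, "sector_c": 50}

distance = total_variation_distance(
    to_shares(scraped_counts), to_shares(benchmark_counts)
)
print(f"total variation distance: {distance:.2f}")
```

A value near 0 would suggest the scraped sample mirrors the benchmark for the chosen categories; a value near 1 would suggest it does not.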