Wuzzuf Visualisations

June 28th 2017, 7:14 am

WUZZUF is Egypt’s #1 Online Recruitment Jobs Site, especially in terms of quality job offers and candidates. More than 3,000 companies and recruiters in Egypt have been actively hiring through it since it was launched in 2012.

In addition, more than 160,000 job seekers, including Egypt’s top professionals and fresh graduates, visit WUZZUF to apply for jobs each month. Wuzzuf has recently published its data for exploration (https://www.kaggle.com/WUZZUF/wuzzuf-job-posts). The data includes job posts from 2014 to 2016, together with the applicants' ids and the timestamps of their applications. In this work, we visualize the data to give insights into the Egyptian market, its needs, its evolution and its facts.

An interactive visualization tool can be found here.

What are the Egyptian business Needs?

Some of the most important questions for students and fresh graduates are "What are the most required skills in my domain?", "What should I learn?" and "What gives me a competitive edge over other candidates?". Rather than speculating about the answers, it is better to go to the recruiters and find out what their needs are. As recruiters' time is limited, they try to state most of their needs in the job description to filter the applicants. In this work, we exploit this valuable information and analyze the job descriptions of the posts between 2014 and 2016 in each business domain to capture the business needs.

The first step towards our goal is to extract the useful entities from the description. We used a third-party API from https://www.meaningcloud.com to extract two kinds of tags: tags that represent named entities such as people, organizations and places (e.g., MS Office, Word, Excel, Weeks and Cairo), and tags that represent significant keywords (e.g., ability, system, software, code and computer science). Secondly, we group the tags by job title (e.g., Web Designer, Call Center, Sales Manager) and visualize them according to their frequency (number of occurrences) across all the posts. In the online tool, the user can select a job title and see a word cloud visualization of the most common entities in the posts for that title. We demonstrate some of the outputs of our tool here.
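The grouping and counting step can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample rows and the names posts and tag_frequencies are hypothetical, and in the real pipeline the tags would come from the tagging API rather than being typed in by hand.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (job_title, tags extracted from the description).
posts = [
    ("Web Designer", ["JavaScript", "HTML5", "Adobe Photoshop"]),
    ("Web Designer", ["JavaScript", "JQuery", "HTML5"]),
    ("Call Center", ["Cairo", "Excel", "MS Office"]),
    ("Web Designer", ["JavaScript", "Dreamweaver"]),
]


def tag_frequencies(posts):
    """Count how often each tag occurs across all posts of each job title."""
    freq = defaultdict(Counter)
    for title, tags in posts:
        freq[title].update(tags)
    return freq


frequencies = tag_frequencies(posts)
# The most frequent tags are the ones a word cloud would draw largest.
top_web_designer = frequencies["Web Designer"].most_common(2)
print(top_web_designer)
```

The per-title counters can be handed directly to any word cloud renderer that accepts a mapping from word to frequency.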

What are top job requirements for Senior Java Developers?

The following word cloud demonstrates the most relevant technologies, skills, and languages that a senior Java developer should have, based on the job post analysis. We see that J2EE, JavaScript and JPA come first, followed by JMS, MVC Capital, IBM Rational Rose, Linux, SQL and MySQL. The tool also surfaces other technologies such as JBoss, Websphere, Tomcat and Weblogic. We also see that Android and Birt are less common for this job. We consider this word cloud a helpful tool (or checklist) for Senior Java Developers to validate their knowledge.


What are top job requirements for Web Designers?

The following word cloud shows the most frequent entities in the job posts seeking web designers. We see that JavaScript, JQuery and HTML5 are the top requirements, while Dreamweaver, Adobe Photoshop and Adobe Creative Suite come next. MVC Capital, Visual Arts and ASP.net are less frequently required, yet still needed.


What are top job requirements for Call Center Agents?

As we rely on text analysis, it is not always guaranteed that we visualize meaningful tools or skills. When analyzing the Call Center Agent posts, we obtained the following word cloud. We can see that Cairo is the most frequent entity, which is expected as most call centers are located there. Speaking of places, we see (in order of frequency): Maadi, Heliopolis, Nasr City and Zayed. There also seems to be a trend towards hiring "Males" (it is more frequent than the word "Off Gender" in the word cloud), and we also see the words "Military Service", which seems to favor exempted candidates. Looking at the tools, we see that Excel, Word and MS Office are the required tools for this job.

Business Career: Opportunities & Salaries

An interesting career-related question is the trade-off between the experience level and the available opportunities, along with the salary offered for each experience level. In this section, we capture this information and plot, for each industry, the number of vacancies per career level against the average salaries for these vacancies. We used a bubble chart, where the bubble size is proportional to the number of vacancies, and the position of the bubble indicates the experience level and the average salary. An interactive tool that lets you display any industry is available at this link; we encourage you to try it out, check your industry and get back to us with your comments.
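The aggregation behind such a bubble chart can be sketched as follows. This is a minimal illustration, not the tool's actual code: the sample rows and the name bubble_points are hypothetical, and two choices are assumptions on our side, namely that a post's salary is taken as the midpoint of salary_minimum and salary_maximum, and that the average is taken per post rather than weighted by num_vacancies.

```python
from collections import defaultdict

# Hypothetical rows:
# (industry, career_level, salary_minimum, salary_maximum, num_vacancies)
rows = [
    ("Pharmaceuticals", "Entry Level", 2000, 4000, 3),
    ("Pharmaceuticals", "Entry Level", 3000, 3000, 1),
    ("Pharmaceuticals", "Manager", 12000, 16000, 1),
    ("Computer Software", "Entry Level", 2500, 3500, 2),
]


def bubble_points(rows, industry):
    """Return {career_level: (total_vacancies, average_salary)} for one industry."""
    vacancies = defaultdict(int)
    salary_sum = defaultdict(float)
    post_count = defaultdict(int)
    for ind, level, sal_min, sal_max, num_vacancies in rows:
        if ind != industry:
            continue
        vacancies[level] += num_vacancies
        # Assumption: a post's salary is the midpoint of its stated range.
        salary_sum[level] += (sal_min + sal_max) / 2
        post_count[level] += 1
    return {
        level: (vacancies[level], salary_sum[level] / post_count[level])
        for level in vacancies
    }


points = bubble_points(rows, "Pharmaceuticals")
```

Each entry of points then maps to one bubble: the career level and the average salary give its position, and the vacancy total gives its size.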

Pharmaceuticals Industry: Career Levels against Average Salaries

The demand for pharmaceuticals vacancies is high at the "Entry" and "Experienced (Non-Manager)" levels, with average wages of 3000 EGP for the entry level and 5000 EGP for "Experienced (Non-Manager)". "Manager" vacancy posts are few, and their salaries vary between 12000 and 16000 EGP. Senior management vacancy posts are rare, with salaries over 26000 EGP. There is no demand for the "Student" level, which matches the nature of the pharmaceuticals industry.


Computer Software Industry: Career Levels against Average Salaries

Interestingly, most of the demand in the computer software industry is at the "Experienced (Non-Manager)" level, with an average salary of 6000 EGP. Next comes the demand at the "Entry Level", with an average salary of 3000 EGP. According to the analyzed posts, the "Senior Management" salaries are not as high as expected, but this may be due to other compensation in the package (e.g., profit share).

Egyptian Job Demand Growth per Industry

Finally, in this part we analyze the growth of job vacancies over the past 2 years. This is important for investors and for online recruitment sites (like Wuzzuf), as it shows which industry sectors are worth approaching. Besides, it is intuitive that the number of applicants (site visitors) is proportional to the number of vacancies in their industry. As per our plot below, the most appealing sectors are Computer Software, IT Services, and Engineering Services, whose job demand doubled in the last 2 years. On the other hand, telecommunication services did not show such growth, and we see saturation in the job market demand. For other industries, the interactive tool can be used to plot their growth based on the 2014-2016 data.
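One way to compute such a growth figure is sketched below. The sample rows and the function names are hypothetical, and summing num_vacancies per industry and year (rather than counting posts) is our assumption about how demand is measured.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows: (job_industry, post_date, num_vacancies)
posts = [
    ("Computer Software", "2014-03-10 09:00:00", 10),
    ("Computer Software", "2016-05-02 11:30:00", 12),
    ("Computer Software", "2016-09-18 14:45:00", 8),
    ("Telecommunications Services", "2014-06-01 10:00:00", 6),
    ("Telecommunications Services", "2016-06-01 10:00:00", 6),
]


def vacancies_per_year(posts):
    """Sum num_vacancies for every (industry, year) pair."""
    totals = Counter()
    for industry, post_date, num_vacancies in posts:
        year = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S").year
        totals[(industry, year)] += num_vacancies
    return totals


def growth_ratio(totals, industry, start_year, end_year):
    """Vacancies in end_year divided by vacancies in start_year."""
    start = totals[(industry, start_year)]
    if start == 0:
        return None  # growth is undefined without a baseline
    return totals[(industry, end_year)] / start


totals = vacancies_per_year(posts)
software_growth = growth_ratio(totals, "Computer Software", 2014, 2016)
telecom_growth = growth_ratio(totals, "Telecommunications Services", 2014, 2016)
```

With this toy data, a ratio of 2.0 corresponds to demand that doubled, while a ratio of 1.0 corresponds to a saturated sector.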



 

Wuzzuf Dataset Cleaning

June 28th 2017, 5:44 am, Category: Big Data

Wuzzuf is a technology firm founded in 2009 and one of the very few companies in the MENA region specialized in developing innovative online recruitment solutions for top enterprises and organizations. They have successfully served 10,000+ top companies and employers in Egypt, 1.5 million CVs have been viewed on their platform, and 100,000+ job seekers have been hired directly through them. In total, 250,000+ open job vacancies have been advertised, and 500,000+ users now visit their website each month looking for jobs at top employers.

Wuzzuf has released a sample dataset named Wuzzuf Job Posts on Kaggle (a platform that provides data science competitions, datasets, and kernels). The dataset contains 2 CSV files:

  • Wuzzuf_Job_Posts_Sample.csv: contains Wuzzuf job posts with the following attributes:

    • id: post identifier 

    • city_name: the city of the job

    • job_title: the title of the job

    • job_category_1, job_category_2 and job_category_3: the 3 most relevant categories of the job post, e.g., Sales/Retail/Business Development

    • job_industry_1, job_industry_2 and job_industry_3: the 3 most relevant industries of the job post, e.g., Telecommunications Services

    • salary_minimum and salary_maximum: the salary limits.

    • num_vacancies: the number of open vacancies for this job post

    • career_level: enumeration of career levels e.g., Experienced (Non-Manager) and Entry Level

    • experience_years: number of years of experience.

    • post_date: publication timestamp of the post. e.g., 2014-01-02 16:01:26

    • views: count of views

    • job_description: detailed description for the job post.

    • job_requirements: main job requirements for the job post.

    • payment_period: salary payment interval e.g. Per Month

    • currency: salary currency e.g. Egyptian Pound

  • Wuzzuf_Applications_Sample.csv.zip: contains Wuzzuf job applications with the following attributes:

    • id : application identifier

    • user_id: applicant identifier

    • job_id: post identifier

    • app_date: application timestamp, e.g., 2014-01-01 07:27:52

Data Cleaning

The published data-set has many free-text fields: the Wuzzuf system does not enforce a fixed list of values to choose from for them, which makes processing and aggregation difficult. Common handling, such as lowercasing all values and removing trailing spaces, was applied to all of them. Additionally, some fields needed special handling:

  1. city_name:

    • this is a free-text attribute that represents Egyptian cities, but it has the following problems:

      • Misspelled words (e.g., cairo, ciro, ciaro)

      • Arabic names (e.g., القاهرة)

      • Cities outside Egypt (e.g., riyadh, doha)

      • General cities (e.g., all egypt cities, any location)

      • Groups of cities (e.g., "cairo, alexandria - damanhor")

    • All the above issues have been solved as follows:

      • Cities outside Egypt: a static list of outside cities has been mapped to the category "outside".

      • Arabic (non-ASCII) names: have been replaced statically by the corresponding English words.

      • General cities: a static list of general values has been mapped to the category "any".

      • Unneeded substrings such as "el" and "al" have been removed.

      • The substrings "and" and "or" have been replaced with "-" so that the value can be split in the next step.

      • Groups of cities: the attribute has been split on several delimiters.

      • Misspelled words: a static list of valid cities in Egypt and their states has been created. Each misspelled word has been mapped to the most similar valid city, and a threshold T has been used to accept only similarities above it; otherwise the city is mapped to the "any" category.

      • A new state attribute has been added by mapping each city_name to its state from the list of valid cities and states.

  2. job_category_1, job_category_2, job_category_3 attributes: cleaning was done by removing the placeholder text "Select" from all 3 attributes, and merging the 3 attributes into one attribute called job_categories.

  3. job_industry_1, job_industry_2, job_industry_3 attributes: cleaning was done by removing placeholder text "Select" from all 3 attributes, and merging the 3 attributes into one attribute called job_industries.

  4. experience_years attribute: we manually normalized the free text into one of 3 forms, 'x+', 'x-y' or 'x', and split the experience_years attribute into 2 new attributes, experience_years_min and experience_years_max, which contain the minimum and maximum years of experience needed for a job, respectively.

  5. post_date attribute: we generated a new attribute called "post_timestamp" which has the POSIX timestamp value of the post_date attribute (i.e., the number of seconds that have elapsed since January 1, 1970 midnight UTC/GMT)

  6. job_description and job_requirements attributes: we noticed that the job_requirements attribute is usually empty, so we added a new derived attribute called "description", which contains the concatenation of the job_requirements and job_description attributes.
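Several of the cleaning steps above can be sketched in Python. This is an illustration under stated assumptions, not the code that was actually used: the lookup tables are hypothetical and heavily shortened, the threshold value 0.7 is made up, difflib's similarity ratio stands in for whatever similarity measure was applied, the handling of the maximum for 'x+' and 'x' is our guess, and post_date is assumed to be in UTC.

```python
import difflib
import re
from datetime import datetime, timezone

# Hypothetical, heavily shortened lookup tables (valid city -> state, etc.).
VALID_CITIES = {"cairo": "cairo", "maadi": "cairo",
                "alexandria": "alexandria", "damanhour": "beheira"}
OUTSIDE = {"riyadh", "doha"}
GENERAL = {"all egypt cities", "any location"}
ARABIC = {"القاهرة": "cairo"}
THRESHOLD = 0.7  # assumed value for the similarity threshold T


def clean_city(raw):
    """Map one free-text city_name value to a list of normalized names."""
    value = raw.strip().lower()
    value = ARABIC.get(value, value)
    if value in OUTSIDE:
        return ["outside"]
    if value in GENERAL:
        return ["any"]
    # Turn the words "and"/"or" into a delimiter, then split groups of cities.
    value = re.sub(r"\b(and|or)\b", "-", value)
    cities = []
    for part in re.split(r"[,/&-]", value):
        # Drop standalone "el"/"al" before matching.
        part = re.sub(r"\b(el|al)\b", "", part).strip()
        if not part:
            continue
        # Closest valid city by difflib's similarity ratio, else "any".
        match = difflib.get_close_matches(part, list(VALID_CITIES), n=1,
                                          cutoff=THRESHOLD)
        cities.append(match[0] if match else "any")
    return cities


def merge_fields(*values):
    """Merge e.g. job_category_1..3, dropping empty and "Select" entries."""
    return [v.strip() for v in values if v and v.strip().lower() != "select"]


def parse_experience(text):
    """Split 'x+', 'x-y' or 'x' into (experience_years_min, experience_years_max)."""
    text = text.strip()
    if text.endswith("+"):
        return int(text[:-1]), None  # assumption: no upper bound for 'x+'
    if "-" in text:
        low, high = text.split("-", 1)
        return int(low), int(high)
    return int(text), int(text)  # assumption: 'x' means min == max


def to_posix(post_date):
    """Convert 'YYYY-MM-DD HH:MM:SS' to seconds since the epoch (UTC assumed)."""
    parsed = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())
```

For example, clean_city("cairo, alexandria - damanhor") yields ["cairo", "alexandria", "damanhour"], and the state attribute can then be looked up with VALID_CITIES[city] for every name that is not "any" or "outside".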

Derived Attributes

The next step was deriving some attributes from these data sets. We derived the following attributes:

  1. Tags attributes: we used a third party API from MeaningCloud to extract Tags from the "description" attribute (recall that it contains the data from job_description and job_requirements). Thus, we added to the data-set the following attributes: 

    • quotation_list: which represents quoted text. e.g., you take on the responsibility of growing the Academy by increasing business and handling operational and technical challenges that arise in the process.

    • entity_list: which represents named entities such as people, organizations, places, etc., e.g., MS Office, Word, Excel, Weeks and Cairo

    • concept_list: which represents significant keywords. e.g., ability, system, software, code and computer science.

    • relation_list: This attribute could be used to provide a summary for the description attribute as it highlights most of the important notes from the description part.

    • money_expression_list: which represents money expressions, e.g., 2000 EGP

    • time_expression_list: which represents time expressions, e.g., 6 Months at least and 8.5 hours

    • other_expression_list: which contains other expressions such as alphanumeric patterns. e.g., php5

  2. applications_count attribute for each post, which counts how many applicants have applied to this job post (derived from the applications data-set).

  3. first_applicant_timestamp and last_applicant_timestamp attributes for each post, which hold the POSIX timestamps of the first and last applications to this job post (derived from the applications data-set).
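The two attributes derived from the applications data-set can be computed in a single pass, as sketched below. The sample rows, the ids and the function names are hypothetical, and app_date is assumed to be in UTC.

```python
from datetime import datetime, timezone

# Hypothetical rows from the applications file: (id, user_id, job_id, app_date)
applications = [
    ("app1", "user1", "job_a", "2014-01-01 07:27:52"),
    ("app2", "user2", "job_a", "2014-01-03 10:00:00"),
    ("app3", "user1", "job_b", "2014-01-02 12:00:00"),
    ("app4", "user3", "job_a", "2014-01-01 07:00:00"),
]


def to_posix(app_date):
    """Convert 'YYYY-MM-DD HH:MM:SS' to a POSIX timestamp (UTC assumed)."""
    parsed = datetime.strptime(app_date, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def application_stats(applications):
    """Return per-post application count and first/last application timestamps."""
    stats = {}
    for _app_id, _user_id, job_id, app_date in applications:
        ts = to_posix(app_date)
        entry = stats.setdefault(job_id, {
            "applications_count": 0,
            "first_applicant_timestamp": ts,
            "last_applicant_timestamp": ts,
        })
        entry["applications_count"] += 1
        entry["first_applicant_timestamp"] = min(entry["first_applicant_timestamp"], ts)
        entry["last_applicant_timestamp"] = max(entry["last_applicant_timestamp"], ts)
    return stats


stats = application_stats(applications)
```

The resulting dictionary is keyed by job_id, so it can be joined to the job posts through their id attribute.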

Case Study: Jobs Recommendation

Extracting tags from the jobs opens the door to recommending jobs to applicants. On the job side, we use the entity_list and concept_list attributes of each post; on the applicant side, we build a keyword vector from the applicant's profile. The recommender then computes a matching score between the applicant and each post using a heuristic model and selects the 10 posts with the highest scores.
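The heuristic model itself is not spelled out here, so the following is only a sketch of one plausible scoring rule, not the model that was used: we assume the score is the fraction of a post's tags that also appear in the applicant's keywords. The names match_score and recommend and the sample data are hypothetical.

```python
import heapq


def match_score(applicant_keywords, job_tags):
    """Fraction of the job's tags that also appear in the applicant's keywords."""
    job = {tag.lower() for tag in job_tags}
    if not job:
        return 0.0
    applicant = {keyword.lower() for keyword in applicant_keywords}
    return len(job & applicant) / len(job)


def recommend(applicant_keywords, jobs, k=10):
    """Return the k (score, job_id) pairs with the highest scores, best first."""
    scored = (
        (match_score(applicant_keywords, tags), job_id)
        for job_id, tags in jobs.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical posts: job_id -> tags taken from entity_list and concept_list.
jobs = {
    "job_a": ["Windows Server", "Linux"],
    "job_b": ["Linux", "Java", "SQL"],
    "job_c": ["Sales"],
}
applicant = ["linux", "windows server", "sql"]
top = recommend(applicant, jobs, k=2)
```

Under this rule a score of 1.0 means that every tag of the post is covered by the applicant's profile.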

As a proof of concept, we analyzed some applicants' profiles (their private information, such as names, was removed for anonymity). The following is a sample of the analyzed profiles:


Our system recommends the following job posts to this applicant (ordered from best to lowest):

  • System Administrator, with job_id = "8c872132", with score = 1.0

  • System Administrator, with job_id = "a13539c", with score = 1.0

  • System Administrator, with job_id = "c820bb65", with score = 1.0

  • Technical Support Engineer French Speaker, with job_id = "6783a66f", with score = 1.0

  • Data Entry & IT Technician, with job_id = "8c872132", with score = 1.0

  • Software Developer SharePoint, with job_id = "990d3300", with score = 1.0

  • .Net Developer, with job_id = "22a298c7", with score = 1.0

  • Operations Support Engineer, with job_id = "69318c48", with score = 1.0

  • Microsoft Product Manager, with job_id = "eb59b18d", with score = 0.8571

For further details about these job posts, check the dataset using these IDs.

Case Study: Job Summary

In some use cases, it is useful to summarize a large body of text and extract its most relevant information. As mentioned before, we added the relation_list attribute, which highlights most of the important notes from the description part. Using this attribute, we can provide a short, yet descriptive, summary of the post. As an example, here are the original job post description and its summary for job post number 68417a3c.
 

Original job description (1317 characters)
Temporary Vacancy (4 Months)
Students/Undergraduates are acceptable.
Working as a promoter at Key Accounts' stores that sell OneCard Items like " Mobile and Electronic Chains in Egypt" required  :
Daily contacting with the sales staff  working at the store/s for:
Training them and handling their complaints.
Delivering all POS materials as much as possible “posters, flyers and danglers,,, etc” .
Following up the stock movement and sales volumes.
Updating our files with the dealers’ data base.
Getting feedback and requested info about the market and competitors.
Achieving the monthly targeted plan of performing successful No of presentation s for the end users at the store/s that is set by the Distribution Team leader / Supervisor/ Manager.
Sending reports of these presentations to the Distribution Team leader/Supervisor/ Manager on Daily basis.
Male/Female
Bachelor Degree
Good command and knowledge of Microsoft office (Word-Excel- Outlook)
Good writing and speaking English
highly Presentab
Having training and educating skills
Having selling Skills
Communication & Personal Effectiveness/ Interpersonal Skills
Building Relationships
Delivering Excellent Service / Service Orientation
Problem Solving
Marketing & Sales
Team Working
0 up to 2years experience in sales, distribution& marketing activities

Job Summary (544 characters)
Students/Undergraduates are acceptable.
Working as a promoter at Key Accounts' stores that sell OneCard Items like " Mobile and Electronic Chains in Egypt" required :
Updating our files with the dealers’ data base.
Getting feedback and requested info about the market and competitors.
Achieving the monthly targeted plan of performing successful No of presentation s for the end users at the store/s that is set by the Distribution Team leader / Supervisor/ Manager.
0 up to 2 years experience in sales, distribution& marketing activities


The original description contains 1317 characters while the summarized one contains only 544 characters, which is a reduction in size of approximately 59%.
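The reduction figure follows directly from the two character counts; the helper name size_reduction below is ours.

```python
def size_reduction(original_chars, summary_chars):
    """Relative size reduction achieved by the summary, as a fraction."""
    return 1 - summary_chars / original_chars


reduction = size_reduction(1317, 544)
print(f"{reduction:.1%}")  # prints 58.7%
```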