
Voting algorithms for higher employee diversity

As a company, we’re pretty confident in our recruitment process. It was built and honed over the years to provide us with better and better candidates. When hiring, we take multiple things into account, but when it comes to internship candidates we hire mostly for soft skills and potential rather than current technical knowledge. When you have as many candidates as we did this year (almost 50 for 2 positions), then at some point, after filtering out those who were not a good fit, it becomes nearly impossible to say with confidence which candidates are the best for Lunar. This is where diversity comes into play.

We try to hire people who will stretch our culture and bring cognitive diversity to our team. During our last internship recruitment process, I decided to test whether we’re really as good at picking diverse candidates as we think we are.

Universal values versus personal preferences

Of course, not all diversity is good. We don’t want to hire someone who stands in clear opposition to the values we represent as a company just because she is different. There are, however, some traits and viewpoints that are neutral when it comes to the value of a potential employee, and these are the areas where we look for diversity. We would rather hire a volcano of energy and emotions if our team is predominantly calm and stoic. We would rather hire a person raised in a different culture if our team is mostly Polish. We would rather hire a man if our team is mostly female. At least, that’s the theory.

In practice, every one of the people involved in the recruitment process has their own biases. Because of that, we do not always agree on which candidates to pick, and we need a way to gather all our opinions and collectively choose whom we want to hire.

Recruitment process

We start the hiring process by having most of us score all the candidates’ applications. Then we engage in a discussion to pick 10 candidates to invite to Happy Hours. The average score of an application serves as strong guidance, although it doesn’t definitively decide who will be invited. Finally, after all the Happy Hours are done, we pick the best candidates and invite them to the internship.

Trust the machine

While the last stage is based on internal communication and negotiation, choosing the 10 candidates to invite cannot be. Meeting to discuss in depth the applications of dozens of people would take days, if not weeks. That’s why we have to trust an algorithm to gather all of our opinions and help us choose.

The “traditional” way to choose the 10 candidates was to take the arithmetic mean of all of our votes and discuss which candidates to invite, starting with those with the highest score. To understand the implications of such an approach, let’s look at the example vote below:

             Lunar 1  Lunar 2  Lunar 3  Lunar 4  Lunar 5  AVG
Candidate A     5        5        5        1        1     3.4
Candidate B     1        1        5        5        5     3.4
Candidate C     4        4        4        3        3     3.6
Candidate D     3        3        4        4        4     3.6

If we were to choose 2 candidates from those above, we would choose candidates C and D, because they have the highest average score, even though nobody rated their applications as the best. Candidates A and B were either loved or hated, and even though more than half of the voters deemed each of them the best, their average scores dropped so much that they would not be invited.
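To make the mechanics concrete, here is a minimal Python sketch of the mean-based selection, using the scores from the table above (the variable names are mine, for illustration):

```python
from statistics import mean

# Scores from voters Lunar 1..5, copied from the example table above.
scores = {
    "A": [5, 5, 5, 1, 1],
    "B": [1, 1, 5, 5, 5],
    "C": [4, 4, 4, 3, 3],
    "D": [3, 3, 4, 4, 4],
}

# Rank candidates by their average score and keep the top 2.
by_average = sorted(scores, key=lambda c: mean(scores[c]), reverse=True)
print(by_average[:2])  # ['C', 'D'] - the "safe" candidates win
```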

Here, I will make one assumption: because all of the voters are looking at the same applications and share similar values, differences in votes will come either from the voters valuing various traits differently or from their internal biases towards or against some people.

For our example above, we can imagine that candidate A showed an amazing technical mind but didn’t really shine when it came to soft skills, while candidate B was truly empathetic and kind but couldn’t deal very well with the programming task. Or we can say that candidate A was really talkative and open, while candidate B was rather quiet and calm.

So, if we have 2 average candidates and 2 controversial candidates, whom should we choose to meet at the second stage of the recruitment process? Well, if we want to increase diversity, we should choose the controversial ones.

Based on some research, we decided to try a variation of the Bloc voting algorithm, which should provide us with more diverse candidates. In addition to scoring the candidates’ applications, each of us was asked a question: “If you were to choose 10 candidates, which ones would you personally choose?”. Then we counted how many times each candidate appeared on the ‘top ten’ lists and chose the 10 with the highest number of occurrences. Let’s see how it would work for the example above.

             Lunar 1  Lunar 2  Lunar 3  Lunar 4  Lunar 5  AVG  Top
Candidate A     5*       5*       5*       1        1     3.4   3
Candidate B     1        1        5*       5*       5*    3.4   3
Candidate C     4*       4*       4        3        3     3.6   2
Candidate D     3        3        4        4*       4*    3.6   2

The top 2 picks of each voter are marked with an asterisk. We can see that now candidates A and B have 3 points each, while C and D have 2. Using the Bloc algorithm, we would choose the more controversial, more diverse candidates.
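Here is a minimal sketch of that counting, continuing the Python example above. One assumption to flag: in our process, each voter named their top picks explicitly, whereas the sketch derives them from the scores, breaking ties by candidate order:

```python
# Same scores as in the example table above.
scores = {
    "A": [5, 5, 5, 1, 1],
    "B": [1, 1, 5, 5, 5],
    "C": [4, 4, 4, 3, 3],
    "D": [3, 3, 4, 4, 4],
}

def bloc_select(scores, seats):
    """Each voter names their `seats` favourite candidates;
    the candidates named by the most voters win."""
    tallies = dict.fromkeys(scores, 0)
    n_voters = len(next(iter(scores.values())))
    for voter in range(n_voters):
        # Assumption: a voter's picks are their highest-scored candidates.
        ballot = sorted(scores, key=lambda c: scores[c][voter], reverse=True)
        for candidate in ballot[:seats]:
            tallies[candidate] += 1
    return sorted(tallies, key=tallies.get, reverse=True)[:seats]

print(bloc_select(scores, seats=2))  # ['A', 'B'] - the controversial ones win
```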

Experiment

This year we decided to gather both the ‘average’ and the ‘top 10’ scores to compare the results of the first stage of the recruitment under both methods. There were 12 Lunar employees voting on the applications. Below you can see what the relevant scores (the top 10 candidates from each method) looked like. The names of the candidates have been changed.

Candidate   Avg. Score   Std. Deviation   Top 10 Votes
    A          4.83           0.55             11
    B          4.69           0.61             11
    C          4.46           0.75              9
    D          4.38           0.84              9
    E          4.23           0.80              7
    F          4.23           0.70              7
    G          4.08           0.64              6
    H          4.00           0.96              6
    I          3.92           0.62              3
    J          3.92           1.00              7
    K          3.92           0.83              5
    L          3.92           0.83              3
    M          3.77           0.80              5

As you can see, the top 6 candidates, both by average and by number of votes, were very consistent. Candidate J had a high number of votes but a low average, which is not surprising given that the standard deviation of his votes was very high. On the same note, candidate I had a fairly high average with a low number of votes and a low standard deviation. It appears to me that both methods agree on who the best candidates are and point to slightly different people when it is not that obvious.
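One way to see where the two methods diverge is to rank the candidates under each and look at the biggest rank gaps. Here is a quick sketch over the aggregates from the table above (tie-breaking is arbitrary here, so treat the exact ranks as illustrative):

```python
# Aggregate results copied from the experiment table above.
avg = {"A": 4.83, "B": 4.69, "C": 4.46, "D": 4.38, "E": 4.23, "F": 4.23,
       "G": 4.08, "H": 4.00, "I": 3.92, "J": 3.92, "K": 3.92, "L": 3.92,
       "M": 3.77}
votes = {"A": 11, "B": 11, "C": 9, "D": 9, "E": 7, "F": 7, "G": 6,
         "H": 6, "I": 3, "J": 7, "K": 5, "L": 3, "M": 5}

# Rank candidates under each method (0 = best) and report the biggest gaps.
rank_avg = {c: r for r, c in enumerate(sorted(avg, key=avg.get, reverse=True))}
rank_top = {c: r for r, c in enumerate(sorted(votes, key=votes.get, reverse=True))}
gaps = sorted(avg, key=lambda c: abs(rank_avg[c] - rank_top[c]), reverse=True)
for c in gaps[:2]:
    print(c, "average rank:", rank_avg[c], "top-10 rank:", rank_top[c])
# Candidates I and J surface as the biggest disagreements between the methods.
```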

The fact that both approaches gave such similar results gives me hope that we’re not that driven by our internal biases when it comes to hiring, and that we are able to set aside our personal preferences when judging how much value a person will bring to the company.

In the end, we had a fiery discussion about the candidates and selected 12 to meet with, not 100% in line with either set of results. In case you were wondering: we couldn’t settle on just 2 candidates, so we hired candidates B, C and D. And after a couple of months of working with them, we’re all very happy with our decision.

Judging who will be the best candidate for an internship is such a complex problem that I’m not confident that using the Bloc algorithm in the future will serve us better than a decision based on averages and discussion. It had one huge advantage, though: choosing your top 10 candidates made some of the people (including me) reconsider some of their initial judgements. If only for that, the experiment was worth it. The additional scoring method also enabled us to invite some people with high diversity potential, even though their average score was not that high. Even though this year we didn’t end up hiring any of those candidates, the potential payoff from hiring a candidate with really high diversity was worth the cost of investing one or two Happy Hours slots.

Who knows, maybe this year we’ll try something new?
