I think that the problem stated in one of my earlier posts is one of the most fascinating puzzles I came across recently. Many people that got confronted with it said bluntly that the problem simply has no solution, otherwise it would contradict common sense, information theory, etc. But surprisingly, it does have a solution.

For X% strictly bigger than 50% the solution is known as the MJRTY algorithm, discovered a while back by R.S. Boyer and J.S. Moore (also called the Majority Vote algorithm). Although the algorithm is deceptively simple, its proof is not. the In the PDF cited above, the authors present a complete proof and also some interesting history around this non-trivial result.

The correct algorithm was also found by Lailin Chen - also presented in the post comments, but without a proof:

Well, Since we already knew there is one and only one who has more votes than any others, we can find him by letting the votes "fight" each other:

Picked the first one, say X1, count 1 for X1, then move on,  if X1 repeats, add count to X1.

When X1 doesn't repeat, reduce the count by one, if it reaches 0, picked up the current one (say X2), and do the some counting to X2. 

Keep doing this until the end. The last one left with a counting bigger than 0 is the leader:

X3  X1  X1  X2  X1  X3  X3  X3  X4

   ^

X3, 1

X3  X1  X1  X2  X1  X3  X3  X3  X4

         ^

       X3, 0

X3  X1  X1  X2  X1  X3  X3  X3  X4

               ^

            X1, 1

X3  X1  X1  X2  X1  X3  X3  X3  X4

                     ^

                   X1, 0

X3  X1  X1  X2  X1  X3  X3  X3  X4

                           ^

                          X1, 1

X3  X1  X1  X2  X1  X3  X3  X3  X4

                                 ^

                               X1, 0

X3  X1  X1  X2  X1  X3  X3  X3  X4

                                       ^

                                    X3, 1

X3  X1  X1  X2  X1  X3  X3  X3  X4

                                             ^

                                           X3, 2

X3  X1  X1  X2  X1  X3  X3  X3  X4

                                                    ^

                                                  X3, 1

So X3 is the winner.

Here is a more formal description of the algorithm:

- You have with two registers: an “candidate register” containing the name of some candidate (candidate), and a counter (counter).

- Candidate_register = NULL;

- Counter = 0

- For each new candidate:

     If (Candidate_register is NULL)

            Candidate_register = candidate

            Counter = 1

     Else If (candidate == candidate_register)

            Counter++;

     Else if (counter > 0)

            Counter --;

            If (counter == 0)

                Candidate_register == NULL

- At the end you have the correct candidate in the register.

For X < 50% we can adapt a more complex variation of the same algorithm which requires two passes instead of a single one. Instead of keeping a single register (keeping the last candidate) and its associated counter, we keep a number of K registers (with their associated counters) and run the same algorithm. The integer K is chosen such that 100/(K+1) < X. After this pass, we discovered K candidates but of course the candidate with the biggest counter might not be the real candidate. To find the top candidate we simply reset all counters to zero (but keep the candidates found) and re-run the same algorithm.

- Instead of maintaining one “element” register and one counter as in the main algorithm, you need to use K registers, each with K counters.

- First pass

     Initialize all candidate registers with NULL and counters with zero

     For each candidate in the log file

          If the candidate is in one of the K registers, increase its counter

          Else if there is one candidate register which is NULL, put it there and increase its counter with 1

          Else, reduce all counters by 1

               If some counters become zero, set their corresponding candidate registers to NULL

     At the end, each of the candidate registers contains every element with frequency > 100/(K+1). One of them is the winner

- Second pass

     Initialize all counters with zero, but keep the candidate registers filled with the values above.

     For each candidate in the log file

          If the candidate is in one of the K registers, increase its counter

     At the end, you have all the correct counters for all the candidate registers.

     Select the one that has the maximum count.

A variation of this second algorithm is briefly mentioned in this presentation although I could not find a formal article describing it.

What all these algorithms have in common? The fact that you know ahead of time that there is a winner, but you don't know which one. While the algorithms are simple to implement, this is a pretty strong condition that is not likely to be encountered in our real world. Maybe on a planet called Xen :-)