Improving Steem’s rankings to cater to diverse content preferences

Would Steem fail if every blog post showing a lady putting on her makeup were rewarded $26,000?

Destroy Internet?

This non-meritorious aberration arises because Steem’s incentive system maximally rewards those who vote as a groupthink bloc. It is becoming a significant concern that threatens to destroy Steem’s utility and value, as evidenced by the parodies and rants against ‘circlejerks’.

The One-Size-Fits-All Problem

Community Disagreement

I explained in a comment that the fundamental technical flaw appears to be that Steem’s ranking model computes a one-size-fits-all ranking for all users.

It's getting harder and harder to find all the hidden gems on Steemit these days.

That is unavoidable in the current system design, due to the one-size-fits-all reputation system, i.e. each user will have different priorities but the design of the system doesn't accommodate such degrees-of-freedom.

If some of the voting power shares your preferences, that content will rank higher than content that interests none of the voting power. But assuming interests are reasonably diverse, the ranking will be more or less uniform and uncorrelated with individual preferences. Thus, for any content to rise to the top, it more or less has to be a groupthink effect.
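The claim that a single global ranking drifts toward being uncorrelated with any one user's taste can be checked with a toy simulation. This is a minimal sketch under loudly stated assumptions: every user's interest in every post is independent random noise, all users hold equal voting power, and the global score is simply the sum of everyone's interest. The helper names (`pearson`, `mean_alignment`) are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def mean_alignment(num_users, num_posts, seed=0):
    """Average correlation between each user's own preferences and the single
    global ranking obtained by summing everyone's preferences."""
    rng = random.Random(seed)
    # Assumption: each user's interest in each post is independent noise.
    prefs = [[rng.gauss(0.0, 1.0) for _ in range(num_posts)]
             for _ in range(num_users)]
    # One-size-fits-all score: every user's voting power counts equally.
    global_score = [sum(p[j] for p in prefs) for j in range(num_posts)]
    return statistics.fmean(pearson(p, global_score) for p in prefs)


for n in (1, 4, 16, 64, 256):
    print(n, round(mean_alignment(n, num_posts=200), 3))
```

Under these assumptions the average correlation falls roughly like 1/√n as the number of users n grows, which is the uniform, uncorrelated ranking described above. Real votes are not independent noise, so this illustrates the trend rather than measuring Steem itself.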

To maximize each voter’s reward, the optimal mathematical strategy is for all voters to vote on the same blog post, because Steem currently computes only a one-size-fits-all global ranking metric. I explained in another comment that the invested whales, whom Steem’s whitepaper expected to vote so as to curate and protect the long-term value of the Steem ecosystem, are mathematically incapable of fulfilling that expectation, because they are bound by a one-size-fits-all global ranking metric.

I suppose that one possible response is that whales have an incentive to preserve the value of their investments, and the best way to do that is to promote a system of fair voting and promote the integrity of the system.

The one-size-fits-all ranking system (cf. my other reply to @smooth below) makes it impossible for whales to act rationally, because they can't compute a set of votes that would reflect their individual preferences for quality, which might be shared with other like-minded users.

Thus, as far as I can see, the system disincentivizes the whales from voting, for they will come to see that either they become one dysfunctional groupthink monolith, or they more or less nullify each other's votes, leaving nothing but a uniform ranking, which is functionally equivalent to no ranking.

Proposed Solution of Clustering Multiple Rankings

In my prior comment I had alluded to a possible improvement and solution to the dilemma:

An improvement would be some algorithm which allows each grouping of like-minded interests to have their own separate ranking computation. The monetary reward algorithm would also need to change, so as to reward content that ranks highly in any grouping.

The inspiration is that votes should be automatically clustered (grouped) into coteries by an algorithm which can automagically detect the users’ shared preferences, so that a plurality of rankings are allowed: one for each coterie cluster detected. Each user will then see rankings customized to that user’s content preferences.
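To make ‘a plurality of rankings’ concrete: under the proposal, content can be rewarded if it ranks highly in any coterie, rather than only in one global list. This is a minimal sketch assuming each coterie's ranking is already available as a best-first list; the coterie names, post names and the function `best_rank_any_group` are hypothetical.

```python
from typing import Dict, List


def best_rank_any_group(rankings: Dict[str, List[str]]) -> Dict[str, int]:
    """Given one best-first ranking per coterie, return each content's best
    (lowest) position in any coterie, where 1 means top of that coterie."""
    best: Dict[str, int] = {}
    for order in rankings.values():
        for pos, content in enumerate(order, start=1):
            best[content] = min(pos, best.get(content, pos))
    return best


rankings = {
    "makeup fans": ["makeup", "crypto"],
    "crypto fans": ["crypto", "poetry"],
    "poets": ["poetry", "makeup"],
}
print(best_rank_any_group(rankings))  # {'makeup': 1, 'crypto': 1, 'poetry': 1}
```

Here every coterie's favourite reaches position 1 somewhere, whereas a single global ranking can put only one post at the top.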

Benefits

Hypothetically, this would not only eliminate the mindless strategy of voting only for the most globally popular posts, thus restricting blogging rewards to content whose quality the voters perceive as meritorious; it would also stop spamming users with content they aren’t interested in seeing first. Each user would see ranked highest only the content they prefer, and the content rankings would vary from user to user according to each user’s individual content preferences. This should reduce the animosity between people with different content preferences, who in the current system compete against each other by spamming each other.

No Troll Battles

And all of this would happen automatically, with no changes to the user interface. Users would continue voting; only their optimum voting strategy would change. Users browsing blogs would continue to do so, the only change being that the highest-ranked content they see would be the content most relevant to their preferences.

In theory, this would be a major innovation compared to Reddit's global one-size-fits-all ranking algorithm.

Technical Description

The technical description of the algorithm I propose may be a bit difficult for some readers to grasp, but I’ll try not to use more technobabble than absolutely necessary. Notice that above I used ‘voting strategy’ instead of ‘game theory’, and I avoided mentioning ‘Nash equilibrium’.

I propose we compute the like-minded distance Dₛₜ between each pair of voters s and t as the mean of the following value (Dₛₜ)ⱼ, taken over each pair of votes sⱼ and tⱼ for every instance j of content on which either (or both) of the paired users voted.

Alternatively, when computing the mean, we may wish to underweight the cases where the value is 1, since these carry less information: one of the paired voters has not voted on that instance of content. The term ‘content’ means, for example, a blog post.
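A minimal sketch of the like-minded distance, under a hypothetical encoding of the per-content value: with no vote coded as 0, an upvote as +1 and a downvote (flag) as −1, it assumes (Dₛₜ)ⱼ = |sⱼ − tⱼ|. That gives 0 for matching votes, 1 when only one of the pair voted (the case described above) and 2 for opposite votes. The `one_weight` parameter is a hypothetical knob for the optional underweighting; 1.0 gives the plain mean.

```python
from typing import Dict

UP, DOWN = 1, -1  # hypothetical encoding; a missing key means "did not vote"


def per_content_distance(s_vote: int, t_vote: int) -> int:
    """Assumed (D_st)_j = |s_j - t_j| with no-vote coded as 0:
    0 = same vote, 1 = only one of them voted, 2 = opposite votes."""
    return abs(s_vote - t_vote)


def like_minded_distance(s_votes: Dict[str, int], t_votes: Dict[str, int],
                         one_weight: float = 1.0) -> float:
    """Weighted mean of (D_st)_j over content j on which s or t (or both) voted.
    one_weight < 1 underweights the less informative cases where the value is 1."""
    if not 0 < one_weight <= 1:
        raise ValueError("one_weight must be in (0, 1]")
    contents = s_votes.keys() | t_votes.keys()
    if not contents:
        raise ValueError("neither voter has voted on any content")
    total = weight_sum = 0.0
    for j in contents:
        d = per_content_distance(s_votes.get(j, 0), t_votes.get(j, 0))
        w = one_weight if d == 1 else 1.0
        total += w * d
        weight_sum += w
    return total / weight_sum


alice = {"p1": UP, "p2": UP, "p3": DOWN}
carol = {"p1": UP, "p2": UP}
print(round(like_minded_distance(alice, carol), 3))              # 0.333
print(round(like_minded_distance(alice, carol, one_weight=0.5), 3))  # 0.2
```

Lower values mean more like-minded voters; halving the weight of the "only one voted" cases pulls alice and carol closer because they never disagreed outright.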

The distances Dₛₜ are grouped into k clusters with the Jenks natural breaks optimization, a.k.a. one-dimensional (univariate) k-means clustering.

Demonstration of k-means clustering algorithm

This clustering algorithm groups the distances Dₛₜ into the k clusters that minimize the sum of the squared deviations of each member distance Dₛₜ from the mean of its cluster. As depicted in the image above, this maximizes the separation between clusters: because the total sum of squared deviations from the overall mean is fixed, minimizing the within-cluster sum is equivalent to maximizing the sum of the squared deviations of the cluster means from the mean of all the distances Dₛₜ (each weighted by the number of members in its cluster).
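A minimal sketch of Jenks natural breaks as exact one-dimensional k-means: sort the distances, then a dynamic program considers every way of cutting the sorted list into k contiguous classes and keeps the one with the smallest total within-class sum of squared deviations. Prefix sums make each class cost O(1), so the search is O(k·n²). The function name `jenks_breaks` and the sample distances are hypothetical.

```python
from typing import List


def jenks_breaks(values: List[float], k: int) -> List[List[float]]:
    """Split values into k contiguous classes (after sorting) minimising the
    total within-class sum of squared deviations from each class mean."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    # Prefix sums of x and x^2 give the squared-deviation cost of any slice in O(1).
    ps, ps2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, x in enumerate(xs):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations of xs[i:j] from its own mean."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: best total for xs[:j] in c classes; back[c][j]: start of last class.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j], back[c][j] = cand, i
    # Walk the back-pointers to recover the classes.
    classes, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        classes.append(xs[i:j])
        j = i
    return classes[::-1]


distances = [0.1, 0.15, 0.2, 0.9, 1.0, 1.1, 1.8, 1.9]
print(jenks_breaks(distances, 3))  # [[0.1, 0.15, 0.2], [0.9, 1.0, 1.1], [1.8, 1.9]]
```

Because the data are one-dimensional, the optimal clusters are always contiguous runs of the sorted values, which is why an exact dynamic program works here instead of the iterative heuristics used for multi-dimensional k-means.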

We can either choose a value of k, or find the smallest k whose GVF (goodness of variance fit) reaches a minimum value we have chosen.
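The goodness of variance fit is GVF = 1 − SDCM/SDAM, where SDAM is the sum of squared deviations of all values from their overall mean and SDCM is the sum of the squared deviations within each class. The sketch below picks the smallest k whose optimal partition reaches a chosen minimum GVF. To stay self-contained it uses a brute-force partition search, only practical for a handful of values; the names and the threshold are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def sdm(xs: Sequence[float]) -> float:
    """Sum of squared deviations of xs from their mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def best_partition(xs: List[float], k: int) -> List[List[float]]:
    """Brute-force optimal split of sorted xs into k contiguous classes (demo only)."""
    n = len(xs)
    best_cost, best = float("inf"), []
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        classes = [xs[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sdm(c) for c in classes)
        if cost < best_cost:
            best_cost, best = cost, classes
    return best


def gvf(classes: List[List[float]]) -> float:
    """Goodness of variance fit: 1 - SDCM / SDAM."""
    sdam = sdm([x for c in classes for x in c])
    if sdam == 0:
        return 1.0  # all values identical: any partition fits perfectly
    return 1.0 - sum(sdm(c) for c in classes) / sdam


def choose_k(values: List[float], min_gvf: float) -> int:
    """Smallest k whose optimal partition reaches the chosen minimum GVF."""
    xs = sorted(values)
    for k in range(1, len(xs) + 1):
        if gvf(best_partition(xs, k)) >= min_gvf:
            return k
    return len(xs)


distances = [0.1, 0.15, 0.2, 0.9, 1.0, 1.1, 1.8, 1.9]
print(choose_k(distances, 0.9))  # 3
```

A higher threshold yields more, tighter clusters; with k equal to the number of values the GVF is always 1, so the search always terminates.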

This algorithm has then grouped the voters into k clusters which maximize the like-mindedness of content preferences for the members of each cluster.

An extra restriction is required on the algorithm: if a voter (whether appearing as s or as t) is a member of a cluster, then all instances of that voter (as s or as t) must be grouped in that same cluster.
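The text leaves open exactly how pairwise clusters become one cluster per voter. One possible way to satisfy the restriction, swapped in here as a hypothetical illustration rather than the proposal's own method, is to treat every pair whose distance fell into the most like-minded (lowest) class as a link, and take connected components with a union-find structure, so that each voter lands in exactly one group. The function name `voter_groups` and the sample voters are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def voter_groups(close_pairs: Iterable[Tuple[Hashable, Hashable]],
                 voters: Iterable[Hashable]) -> List[Set[Hashable]]:
    """Connected components of the 'like-minded' graph: every voter ends up
    in exactly one group, as the restriction requires."""
    parent: Dict[Hashable, Hashable] = {v: v for v in voters}

    def find(v: Hashable) -> Hashable:
        # Follow parent links to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for s, t in close_pairs:
        parent.setdefault(s, s)
        parent.setdefault(t, t)
        root_s, root_t = find(s), find(t)
        if root_s != root_t:
            parent[root_s] = root_t  # merge the two groups

    groups: Dict[Hashable, Set[Hashable]] = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


pairs = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
everyone = ["alice", "bob", "carol", "dave", "erin", "frank"]
print(sorted(sorted(g) for g in voter_groups(pairs, everyone)))
# [['alice', 'bob', 'carol'], ['dave', 'erin'], ['frank']]
```

One design caveat: connected components are transitive, so a chain of like-minded pairs can join two voters who are not close to each other; a stricter variant would need a different grouping rule.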

Application

The ranking within each cluster can be computed, weighted by Steem Power (SP), according to the current computation in the Steem white paper. These totals from the clusters are fed into the algorithm from the white paper to determine the relative rewards for each instance of the content.

Each voter will see the rankings for the cluster which he/she is automagically a member of.
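A minimal sketch of the application step, assuming each upvote is a `(voter, content)` pair, each voter's SP is known, and the cluster assignment has already been computed. The payout in `split_rewards` is a placeholder proportional split of a fixed pool, not the white paper's actual reward formula; all names and figures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rankings = Dict[int, Dict[str, float]]  # cluster id -> {content: SP-weighted total}


def cluster_rankings(votes: List[Tuple[str, str]], sp: Dict[str, float],
                     cluster_of: Dict[str, int]) -> Rankings:
    """Tally each (voter, content) upvote with the voter's SP, separately per cluster."""
    totals: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for voter, content in votes:
        totals[cluster_of[voter]][content] += sp[voter]
    return {c: dict(t) for c, t in totals.items()}


def ranking_for(voter: str, rankings: Rankings, cluster_of: Dict[str, int]) -> List[str]:
    """Best-first content list for the cluster this voter belongs to."""
    totals = rankings.get(cluster_of[voter], {})
    return sorted(totals, key=totals.get, reverse=True)


def split_rewards(rankings: Rankings, pool: float) -> Dict[str, float]:
    """Placeholder payout: add up each content's cluster totals and split the
    pool proportionally. The white paper's reward formula would replace this."""
    combined: Dict[str, float] = defaultdict(float)
    for totals in rankings.values():
        for content, weight in totals.items():
            combined[content] += weight
    grand = sum(combined.values())
    return {c: pool * w / grand for c, w in combined.items()} if grand else {}


votes = [("alice", "makeup"), ("bob", "makeup"),
         ("carol", "crypto"), ("dave", "crypto"), ("dave", "poetry")]
sp = {"alice": 100.0, "bob": 50.0, "carol": 30.0, "dave": 20.0}
cluster_of = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}
rankings = cluster_rankings(votes, sp, cluster_of)
print(ranking_for("carol", rankings, cluster_of))  # ['crypto', 'poetry']
```

Carol never sees the makeup post at the top of her list, even though it has the largest SP-weighted total overall, because her ranking comes only from her own cluster.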

The like-minded distances and clustering algorithm could also be applied orthogonally to each hashtag (a.k.a. tag or category), so that voters can be clustered differently for different hashtags. An individual may thus belong to different automatically computed groupings depending on which genre of content they are voting on.
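A minimal sketch of clustering per hashtag: split every voter's votes by the tag of each piece of content, then run a clusterer independently inside each tag, producing a `(voter, tag) → cluster` map. To keep it self-contained, the clusterer here is a hypothetical stand-in that only groups voters with identical votes within a tag; in the proposal, the like-minded distance plus Jenks clustering would take its place. All names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Tuple

Votes = Dict[str, Dict[str, int]]  # voter -> {content: vote}


def split_by_tag(votes: Votes, tag_of: Dict[str, str]) -> Dict[str, Votes]:
    """Restrict every voter's votes to one tag at a time."""
    per_tag: Dict[str, Votes] = defaultdict(lambda: defaultdict(dict))
    for voter, vs in votes.items():
        for content, v in vs.items():
            per_tag[tag_of[content]][voter][content] = v
    return per_tag


def identical_taste(votes: Votes) -> Dict[str, int]:
    """Stand-in clusterer for the demo only: voters with identical votes share
    a cluster id. The like-minded distance plus Jenks clustering would go here."""
    ids: Dict[FrozenSet, int] = {}
    return {voter: ids.setdefault(frozenset(vs.items()), len(ids))
            for voter, vs in votes.items()}


def cluster_per_tag(votes: Votes, tag_of: Dict[str, str],
                    clusterer: Callable[[Votes], Dict[str, int]] = identical_taste
                    ) -> Dict[Tuple[str, str], int]:
    """Map (voter, tag) -> cluster id, clustering each tag independently."""
    result: Dict[Tuple[str, str], int] = {}
    for tag, tag_votes in split_by_tag(votes, tag_of).items():
        for voter, cid in clusterer(tag_votes).items():
            result[(voter, tag)] = cid
    return result


votes = {"alice": {"p1": 1, "a1": 1}, "bob": {"p1": 1, "a2": 1}}
tag_of = {"p1": "crypto", "a1": "art", "a2": "art"}
groups = cluster_per_tag(votes, tag_of)
print(groups[("alice", "crypto")] == groups[("bob", "crypto")])  # True
print(groups[("alice", "art")] == groups[("bob", "art")])        # False
```

Alice and Bob share a grouping for crypto content but not for art, which is exactly the per-genre flexibility described above.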