Creating a "what's hot" Reddit algorithm with Laravel
Eduar Bastidas • January 9, 2022
This is a small guide on how to implement a Reddit-style "what's hot" algorithm with the Laravel framework.
There are two primary benefits to using a "what's hot" algorithm.
- It gives newer posts a chance. The problem with many "most popular" pages is that they give older posts and items a huge advantage over their younger counterparts. This leads to a situation where superior content is losing out to mediocre content that was created months beforehand.
- It keeps things "fresh" and prevents content stagnation. Most users will quickly lose interest in a website that only displays a "most popular" list. This is because the content does not change enough to warrant any kind of long-term attention.
For this example, we are going to assume that we are running a website that provides users with the functionality to "thumbs up" or "thumbs down" videos that have been submitted.
Our table design.
Note that I am going to keep our database design as simple as possible, lest other topics creep into the article. Here is the structure of an example table. The three columns that matter most to us are "thumbs_up", "thumbs_down" and "created_at". Without these columns, we cannot implement a "what's hot" page:
```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class CreateVideosTable extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        Schema::create('videos', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            $table->string('url');
            $table->smallInteger('thumbs_up');
            $table->smallInteger('thumbs_down');
            $table->timestamps();
        });
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        Schema::dropIfExists('videos');
    }
}
```
Our SQL query.
Personally, I think that it is better to implement the "what's hot" algorithm in SQL rather than in Laravel directly, i.e. let the database handle the heavy work. This is because:
- It allows us to avail of any indexes that we created.
- We can cache the results of the query in memory if we need to.
The query looks like this (based on the Reddit algorithm):
```php
Video::orderByRaw('
    LOG10(ABS(thumbs_up - thumbs_down) + 1) *
    SIGN(thumbs_up - thumbs_down) +
    (UNIX_TIMESTAMP(created_at) / 300000) DESC
')->limit(100)->get();
```
As you can see, the algorithm is implemented in the orderByRaw method and we've limited the results to 100.
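If you do want to cache the results, as mentioned above, a minimal sketch using Laravel's Cache facade might look like the following. The cache key videos.hot and the five-minute lifetime are arbitrary assumptions for illustration, not values taken from the example project:

```php
<?php

use App\Models\Video;
use Illuminate\Support\Facades\Cache;

// Run the ranking query at most once every five minutes.
// 'videos.hot' is a hypothetical cache key; pick whatever fits your app.
$hotVideos = Cache::remember('videos.hot', now()->addMinutes(5), function () {
    return Video::orderByRaw('
        LOG10(ABS(thumbs_up - thumbs_down) + 1) *
        SIGN(thumbs_up - thumbs_down) +
        (UNIX_TIMESTAMP(created_at) / 300000) DESC
    ')->limit(100)->get();
});
```

The trade-off is that new votes will not affect the order until the cached entry expires, so a short lifetime is probably the safer starting point.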
Let's go over how the sorting algorithm works.
- First, we use LOG10() to take the logarithm of the absolute value of the difference between thumbs_up and thumbs_down, plus 1. This is the vote part of the "hotness" score (i.e. how popular a video is). The logarithm dampens large counts, so going from 9 to 99 net votes adds as much to the score as going from 0 to 9, and the + 1 keeps us from taking the logarithm of zero when the votes are tied.
- Next, the SIGN() function tells us whether thumbs_up or thumbs_down is the most popular: it returns 1, 0 or -1. Multiplying the logarithm by it gives videos with more thumbs down than thumbs up a negative score.
- Finally, the UNIX_TIMESTAMP() function accounts for the age of the video. Since UNIX_TIMESTAMP(created_at) gives a very large number (seconds elapsed since January 1st, 1970), adding it as-is would drown out the vote part, so it is divided by 300000 to make the age of the video more meaningful in the equation. With this divisor, a video needs roughly ten times the net votes to keep up with one posted 300000 seconds (about three and a half days) later.
- The time term always favours newer videos, so it could also be useful to experiment with how much the votes count, for example by multiplying thumbs_up and thumbs_down by a large number such as 86400 (24 hours * 60 minutes * 60 seconds).
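To get a feel for the numbers, the same expression can be mirrored in plain PHP. This is only a sketch for experimenting outside the database: hotScore() is a hypothetical helper name, and the two sample videos are made up.

```php
<?php

/**
 * Plain PHP mirror of the SQL ordering expression.
 * hotScore() is a hypothetical helper, not part of Laravel.
 */
function hotScore(int $thumbsUp, int $thumbsDown, int $createdAt): float
{
    $net = $thumbsUp - $thumbsDown;

    // LOG10(ABS(net) + 1) * SIGN(net): dampened vote weight,
    // negative when the video has more thumbs down than thumbs up.
    $votes = log10(abs($net) + 1) * ($net <=> 0);

    // UNIX_TIMESTAMP(created_at) / 300000: newer videos get a larger time term.
    return $votes + $createdAt / 300000;
}

$now = strtotime('2022-01-09 12:00:00 UTC');

// Made-up sample data: an older video with many votes, a new one with few.
$videos = [
    'old, 99 net votes' => hotScore(100, 1, $now - 4 * 86400),
    'new, 9 net votes'  => hotScore(10, 1, $now),
];

arsort($videos); // highest score first
print_r(array_keys($videos));
```

Here the four-day-old video loses about 1.15 points on the time term, which is more than the single point it gains from having ten times the net votes, so the newer video is listed first.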
Let's quickly create a seeder together with a factory to fill our videos table:
```php
<?php

namespace Database\Factories;

use Illuminate\Database\Eloquent\Factories\Factory;

class VideoFactory extends Factory
{
    /**
     * Define the model's default state.
     *
     * @return array
     */
    public function definition()
    {
        return [
            'title' => $this->faker->sentence,
            'url' => $this->faker->url,
            'thumbs_up' => $this->faker->numberBetween(0, 120),
            'thumbs_down' => $this->faker->numberBetween(0, 120),
        ];
    }
}
```

```php
<?php

namespace Database\Seeders;

use App\Models\Video;
use Illuminate\Database\Seeder;

class VideoSeeder extends Seeder
{
    /**
     * Run the database seeds.
     *
     * @return void
     */
    public function run()
    {
        Video::factory()->count(50)->create();
    }
}
```
After adding a few videos to your table and running the "what's hot" query above, you'll see that a row's total score does not guarantee it the top spot. Instead, our algorithm takes both the record's total score and its creation time into account.
Here is the result I got in Tinker when running the "what's hot" query:
```
[!] Aliasing 'Video' to 'App\Models\Video' for this Tinker session.
=> Illuminate\Database\Eloquent\Collection {#1375
     all: [
       App\Models\Video {#1743
         id: 33,
         title: "Voluptatibus autem similique et accusantium eos.",
         url: "http://walter.com/sequi-ea-veritatis-est-laudantium-alias-numquam-voluptates",
         thumbs_up: 56,
         thumbs_down: 3,
         created_at: "2022-01-09 14:03:47",
         updated_at: "2022-01-09 22:13:55",
       },
       App\Models\Video {#1990
         id: 35,
         title: "Numquam molestias sapiente quo corrupti nemo fuga.",
         url: "http://littel.org/et-facilis-voluptas-rerum",
         thumbs_up: 76,
         thumbs_down: 28,
         created_at: "2022-01-09 23:13:14",
         updated_at: "2022-01-09 22:13:55",
       },
       App\Models\Video {#1991
         id: 2,
         title: "Rem sit explicabo harum.",
         url: "https://www.lang.com/ratione-quia-non-commodi-labore-occaecati-non",
         thumbs_up: 88,
         thumbs_down: 59,
         created_at: "2022-01-08 05:09:10",
         updated_at: "2022-01-09 22:13:55",
       },
       App\Models\Video {#1992
         id: 43,
         title: "Minus fugit culpa ut necessitatibus tenetur non.",
         url: "http://www.ernser.com/",
         thumbs_up: 51,
         thumbs_down: 36,
         created_at: "2022-01-08 04:17:35",
         updated_at: "2022-01-09 22:13:55",
       },
       App\Models\Video {#1993
         id: 23,
         title: "Tempora voluptatibus consequatur numquam sequi autem similique voluptatum.",
         url: "http://friesen.info/",
         thumbs_up: 93,
         thumbs_down: 8,
         created_at: "2022-01-04 23:29:15",
         updated_at: "2022-01-09 22:13:55",
       },
     ],
   }
```
If you want the source code for this example, you can find it here: https://github.com/mreduar/whats-hot-algorithm-with-laravel
Extra Pointers.
- Your vote buttons should send off an Ajax request instead of redirecting the user or reloading the page. Websites like YouTube and Reddit have conditioned users to believe that their vote is in “real time.” Sending Ajax requests is extremely easy with JavaScript libraries such as axios.
- If performance is a concern, be sure to read up on indexes. You might also want to look into the possibility of using an object caching daemon such as Memcached or Redis.
- To prevent the user from voting multiple times, you should store votes against the User's ID. There are other methods of preventing vote rigging, but they fall outside the scope of this article.
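As a rough illustration of the last pointer, votes could live in their own table with a unique index on the user and the video, so the database itself rejects a second vote. The votes table, its value column and the CreateVotesTable class name are assumptions made for this sketch; they are not part of the example project, and the sketch assumes the default users table exists:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// Hypothetical migration: one row per user per video.
class CreateVotesTable extends Migration
{
    public function up()
    {
        Schema::create('votes', function (Blueprint $table) {
            $table->id();
            $table->foreignId('user_id')->constrained()->cascadeOnDelete();
            $table->foreignId('video_id')->constrained()->cascadeOnDelete();
            $table->tinyInteger('value'); // 1 = thumbs up, -1 = thumbs down
            $table->timestamps();

            // A user can only have one vote on a given video.
            $table->unique(['user_id', 'video_id']);
        });
    }

    public function down()
    {
        Schema::dropIfExists('votes');
    }
}
```

With a table like this, the thumbs_up and thumbs_down columns on videos become counters that you update whenever a vote row is created or changed.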
If you have any questions or concerns, feel free to post them in the comment section below.