
dev-resources.site


Building a Developer-Focused Search Engine in Rust: Lessons Learned and Challenges Overcome 🚀

Published at
1/11/2025
Categories
rust
showdev
webdev
Author
lechat

As developers, we all know the struggle of wading through irrelevant search results to find that one golden line of code. So, I thought, why not build a search engine tailored for us devs? With Rust, Actix, Elasticsearch, React, and Next.js, I created a search engine for developers.

Here is what I made:
https://dev-search.com/

I am not a senior dev, so if I am doing something stupid, please let me know 😅

🎯 The Mission

The goal was simple: create a search engine focused on information for developers, built with:

Frontend: React + Next.js (SSG for speed and SEO)

Backend: Rust and Elasticsearch for robust, scalable search functionality

🚧 Challenges Faced

Elasticsearch Search Is Slow 😢

Because the index holds more than 10 million documents, Elasticsearch searches were slow.

I found that the setting slowing them down was:

"track_total_hits": {big number like 10000}

The Solution

In my case, keeping that number as large as 10000 was about as slow as actually fetching 10000 documents from Elasticsearch, because Elasticsearch has to keep counting matches up to that threshold. Changing it to

"track_total_hits": false

made the search a lot faster. However, this change disables the ability to track how many documents a search matched, so consider carefully whether it suits your use case.
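
Here is a minimal sketch of how such a request body might be assembled on the Rust side, using only the standard library. The function names, the `match` query, and the field name `body` are assumptions for illustration; they are not taken from the actual backend.

```rust
/// Escape a string so it can be embedded inside a JSON string literal.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON body for an Elasticsearch `_search` request.
/// `body` is a hypothetical field name; replace it with a field from your mapping.
fn build_search_body(query: &str, size: u32, track_total_hits: bool) -> String {
    format!(
        r#"{{"size":{},"track_total_hits":{},"query":{{"match":{{"body":"{}"}}}}}}"#,
        size,
        track_total_hits,
        escape_json(query)
    )
}
```

Calling `build_search_body("rust async", 10, false)` yields a body with `"track_total_hits":false`, which can then be sent to the index's `_search` endpoint with whatever HTTP client the backend already uses.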

Too Many Malicious Users Scanning the Website 👽

Ah, the joys of running a public-facing site! Within days of launching, I noticed strange requests hitting my server logs. From bots pretending to be browsers to outright weird payloads like \x00\x00SMB, my site became a playground for malicious users. Here's a gem from my logs:

35.203.211.8 - - [30/Dec/2024:05:15:37 +0000] "\x00\x00\x00\xAC\xFESMB..."

The Solution: Fail2Ban

Fail2Ban came to the rescue! This nifty tool monitors log files and dynamically bans IPs that show malicious behavior. Here's how I set it up:

Defined a Fail2Ban Jail for Nginx:

[nginx-malicious]
enabled = true
port = http,https
logpath = /var/log/nginx/access.log
maxretry = 5
findtime = 300
bantime = 600
action = iptables[name=nginx-malicious, port="http,https", protocol=tcp]
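
To make the thresholds concrete: with these values, an IP is banned for 600 seconds (`bantime`) once 5 matching log lines (`maxretry`) are seen within 300 seconds (`findtime`). The sketch below is a simplified model of that rule in Rust, not Fail2Ban's actual implementation; the `Jail` type and its methods are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified model of the jail thresholds (not Fail2Ban's real code).
struct Jail {
    max_retry: usize,
    find_time: u64,
    ban_time: u64,
    /// Timestamps (in seconds) of recent failures per IP.
    failures: HashMap<String, VecDeque<u64>>,
    /// Time (in seconds) at which each ban expires.
    banned_until: HashMap<String, u64>,
}

impl Jail {
    fn new(max_retry: usize, find_time: u64, ban_time: u64) -> Self {
        Jail {
            max_retry,
            find_time,
            ban_time,
            failures: HashMap::new(),
            banned_until: HashMap::new(),
        }
    }

    /// Record one matching log line for `ip` at time `now`.
    /// Returns true if this failure triggers a ban.
    fn record_failure(&mut self, ip: &str, now: u64) -> bool {
        let window = self.failures.entry(ip.to_string()).or_default();
        window.push_back(now);
        // Forget failures that fall outside the find_time window.
        while let Some(&oldest) = window.front() {
            if now.saturating_sub(oldest) > self.find_time {
                window.pop_front();
            } else {
                break;
            }
        }
        if window.len() >= self.max_retry {
            window.clear();
            self.banned_until.insert(ip.to_string(), now + self.ban_time);
            return true;
        }
        false
    }

    /// Check whether `ip` is still banned at time `now`.
    fn is_banned(&self, ip: &str, now: u64) -> bool {
        self.banned_until.get(ip).map_or(false, |&until| now < until)
    }
}
```

With `Jail::new(5, 300, 600)`, five failures from the same address within five minutes trigger a ban, while the same five failures spread over a longer period do not.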

Filter to Detect Malicious Patterns:

[Definition]
failregex = ^<HOST> - - .*SMB.*
ignoreregex =
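
To show what this filter catches, here is a rough Rust approximation of the `failregex` above, written with plain string operations instead of a regex engine. The function name `match_smb_probe` is hypothetical, and the check is only a sketch of the pattern `<HOST> - - ...SMB...`.

```rust
/// Return the client address if `line` looks like what the filter targets:
/// "<HOST> - - <anything containing SMB>".
/// This is a hand-rolled approximation of the failregex, for illustration only.
fn match_smb_probe(line: &str) -> Option<&str> {
    // Split into host, ident, user, and the rest of the line.
    let mut parts = line.splitn(4, ' ');
    let host = parts.next()?;
    let ident = parts.next()?;
    let user = parts.next()?;
    let rest = parts.next()?;
    if !host.is_empty() && ident == "-" && user == "-" && rest.contains("SMB") {
        Some(host)
    } else {
        None
    }
}
```

Note that a pattern this broad also matches any ordinary request whose path or user agent happens to contain "SMB", so a stricter pattern may be worth considering.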

Dynamic Blocking in Action:

When Fail2Ban detects malicious requests, it updates the firewall to block the offending IP:

sudo iptables -L -n | grep DROP

With Fail2Ban, malicious IPs were swiftly banned, and my server logs became much cleaner. Lesson learned: Bots will come, but so will the ban hammer. 🛠️

Please note that if you are using Docker or Docker Compose, you might need the following:
https://github.com/fail2ban/fail2ban/issues/2376#issuecomment-2565534465

AdSense Not Showing 😿

As you can see in the screenshot:

[Screenshot]

Even though AdSense is set up, the ad often doesn't show up.
I investigated why, and I suspect there are 2 reasons:

  1. My website's reputation is low
  2. Google cannot find an ad for the specified ad size

Well, I cannot change the first reason, but maybe I can do something about the second one. Here is what I did.

The Solution

At first, I tried a fixed-size ad because I didn't want the ad to be too large:

<GoogleAdUnit>
    <ins
        className="adsbygoogle"
        style={{ display: 'inline-block', width: '300px', height: '90px' }} // fixed size
        data-ad-client="ca-pub-{ad-client-id}"
        data-ad-slot="{slot id}">
    </ins>
</GoogleAdUnit>

But this often fails to show the ad.

  • Please note that I am using nextjs13_google_adsense because I am using Next.js.

So, after that, I tried a responsive ad. The default code of the responsive ad is:

<GoogleAdUnit>
    <ins
        className="adsbygoogle"
        style={{ display: 'block', width: '100%' }}
        data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
        data-ad-slot='{slot id}' // Replace with your Ad slot ID
        data-ad-format="auto"
        data-full-width-responsive="true"
    />
</GoogleAdUnit>

This is the most flexible option because the slot resizes to fit whichever ad is served. But to me, the auto-sized ad looked too big 😅

So I limited the height like this. Please note that I set data-ad-format to "horizontal" because I wanted a horizontal ad that is not too big.

<GoogleAdUnit>
    <ins
        className="adsbygoogle"
        style={{ display: 'block', width: '100%', height: '50px' }} // limit height
        data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
        data-ad-slot='{slot id}' // Replace with your Ad slot ID
        data-ad-format="horizontal" // horizontal
        data-full-width-responsive="true"
    />
</GoogleAdUnit>

It still sometimes fails to show an ad, but ads appear more often on my website now because there is no limit on the width 😀

Unsolved Problems

  • The website design is too simple
  • The search accuracy is low
  • The returned data is almost always from Stack Overflow because a large share of the database consists of Stack Overflow records. I'm not sure whether this is OK.

🙏 Thanks for Reading!
