Building a Developer-Focused Search Engine in Rust: Lessons Learned and Challenges Overcome 🚀
As developers, we all know the struggle of wading through irrelevant search results to find that one golden line of code. So, I thought, why not build a search engine tailored for us devs? With Rust, Actix, Elasticsearch, React, and Next.js, I created a search engine for developers.
Here is what I made:
https://dev-search.com/
I am not a senior dev, so if I am doing something stupid, please let me know 😅
🎯 The Mission
The goal was simple: build a search engine focused on information for developers, with:
Frontend: React + Next.js (SSG for speed and SEO)
Backend: Rust and Elasticsearch for robust, scalable search functionality
🚧 Challenges Faced
Search by Elasticsearch is slow 😢
With more than 10 million documents in the index, Elasticsearch searches were slow.
I found that the setting slowing it down was:
"track_total_hits": {big number like 10000}
The Solution
Keeping that number large, like 10000, forces Elasticsearch to count matching documents up to that threshold on every query, which in my case was about as slow as actually fetching 10000 documents. Changing this to
"track_total_hits": false
made the search a lot faster. However, this change disables the ability to track how many records a search hit, so consider carefully whether that is acceptable for your use case.
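A minimal sketch of building such a request body in Rust, using only the standard library. The field name `body` and both function names are made up for illustration and are not taken from the actual backend; a real project would more likely use a JSON library or an Elasticsearch client.

```rust
/// Escape a string so it can be embedded inside a JSON string literal.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a `_search` request body that turns hit counting off.
/// `field` is a hypothetical name for the indexed text field.
fn build_search_body(field: &str, query: &str, size: usize) -> String {
    format!(
        r#"{{"track_total_hits":false,"size":{},"query":{{"match":{{"{}":"{}"}}}}}}"#,
        size,
        escape_json(field),
        escape_json(query)
    )
}
```

Calling `build_search_body("body", "rust lifetime", 10)` yields a body that can be sent to the index's `_search` endpoint. Since the total is no longer tracked, the UI cannot show "N results found" from this response.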
Too Many Malicious Users Scanning the Website 👽
Ah, the joys of running a public-facing site! Within days of launching, I noticed strange requests hitting my server logs. From bots pretending to be browsers to outright weird payloads like \x00\x00SMB, my site became a playground for malicious users. Here's a gem from my logs:
35.203.211.8 - - [30/Dec/2024:05:15:37 +0000] "\x00\x00\x00\xAC\xFESMB..."
The Solution: Fail2Ban
Fail2Ban came to the rescue! This nifty tool monitors log files and dynamically bans IPs that show malicious behavior. Here's how I set it up:
Defined a Fail2Ban Jail for Nginx:
[nginx-malicious]
enabled = true
port = http,https
logpath = /var/log/nginx/access.log
maxretry = 5
findtime = 300
bantime = 600
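# iptables-multiport is used because two ports are listed; the plain iptables action handles a single port
action = iptables-multiport[name=nginx-malicious, port="http,https", protocol=tcp]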
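To make the `maxretry` / `findtime` pair above more concrete, here is a small Rust sketch of the counting rule as I understand it: a host is banned once it has at least `maxretry` matching lines within the last `findtime` seconds. This is only an illustration of the idea, not how Fail2Ban is implemented, and the function name is hypothetical.

```rust
/// Decide whether a host should be banned, mirroring `maxretry` and `findtime`.
/// `failure_times` holds Unix timestamps (in seconds) of matched log lines for one host.
fn should_ban(failure_times: &[u64], now: u64, maxretry: usize, findtime: u64) -> bool {
    // Only failures inside the sliding window [now - findtime, now] count.
    let window_start = now.saturating_sub(findtime);
    let recent = failure_times
        .iter()
        .filter(|&&t| t >= window_start && t <= now)
        .count();
    recent >= maxretry
}
```

With the jail settings above (`maxretry = 5`, `findtime = 300`), five matching requests within five minutes trigger a ban, which then lasts `bantime = 600` seconds.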
Filter to Detect Malicious Patterns:
[Definition]
failregex = ^<HOST> - - .*SMB.*
ignoreregex =
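Before trusting a filter like this, it helps to check which log lines it would actually catch. The sketch below mimics the `failregex` above in plain Rust, under one simplifying assumption: `<HOST>` is treated as an IP literal only, while Fail2Ban's own `<HOST>` handling is broader. The function name is hypothetical.

```rust
use std::net::IpAddr;

/// Return the client address if `line` looks like it would match
/// `^<HOST> - - .*SMB.*` (simplified: <HOST> must parse as an IP address).
fn offending_host(line: &str) -> Option<IpAddr> {
    // Everything before the first " - - " must be the client address.
    let (host, rest) = line.split_once(" - - ")?;
    let ip: IpAddr = host.parse().ok()?;
    if rest.contains("SMB") {
        Some(ip)
    } else {
        None
    }
}
```

Note that the pattern matches any request containing "SMB" anywhere after the address, including in a URL or user agent, so a legitimate request could in principle be caught too.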
Dynamic Blocking in Action:
When Fail2Ban detects malicious requests, it updates the firewall to block the offending IP:
sudo iptables -L -n | grep DROP
With Fail2Ban, malicious IPs were swiftly banned, and my server logs became much cleaner. Lesson learned: Bots will come, but so will the ban hammer. 🛠️
Note that if you are using Docker or Docker Compose, you might also need the following:
https://github.com/fail2ban/fail2ban/issues/2376#issuecomment-2565534465
Adsense not showing 😿
Even though AdSense is set up, the ads often don't show up.
I investigated why, and I suspect there are 2 reasons:
- My website's reputation is low
- Google cannot find an ad for the specified ad size
I cannot do much about the first reason, but I can do something about the second one. Here is what I did.
The Solution
At first, I tried a fixed-size ad because I did not want the ad to be too large:
<GoogleAdUnit>
  <ins
    className="adsbygoogle"
    style={{ display: 'inline-block', width: '300px', height: '90px' }}
    data-ad-client="ca-pub-{ad-client-id}"
    data-ad-slot="{slot id}"
  />
</GoogleAdUnit>
But this often fails to show an ad.
(Note that I am using nextjs13_google_adsense because the frontend is Next.js.)
So, after that, I tried a responsive ad. The default code of the responsive ad is:
<GoogleAdUnit>
  <ins
    className="adsbygoogle"
    style={{ display: 'block', width: '100%' }}
    data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
    data-ad-slot='{slot id}' // Replace with your Ad slot ID
    data-ad-format="auto"
    data-full-width-responsive="true"
  />
</GoogleAdUnit>
This works best because the slot resizes to fit whichever ad is served. But, to me, the auto-sized ad looked too big 😅
So I limited the height like this. Note that I set data-ad-format to "horizontal" because I wanted a horizontal ad that is not too big.
<GoogleAdUnit>
  <ins
    className="adsbygoogle"
    style={{ display: 'block', width: '100%', height: '50px' }} // limit height
    data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
    data-ad-slot='{slot id}' // Replace with your Ad slot ID
    data-ad-format="horizontal" // horizontal
    data-full-width-responsive="true"
  />
</GoogleAdUnit>
It still sometimes fails to show an ad, but ads now appear more often on my website because there is no limitation on the width 😀
Unsolved Problems
- The website design is too simple
- The search accuracy is low
- The returned data is almost always from Stack Overflow, because a large share of the database consists of Stack Overflow records. I am not sure whether this is OK.
🙏 Thanks for Reading!