PostgreSQL vs. SQLite: read & write in multithreaded environment
This post is from my old blog, written on March 7th, 2024.
https://codenested.blogspot.com/2024/03/postgresql-vs-sqlite-read-write-in.html
The start was humble. I needed to cache some data, and I thought I could just push it into a database table, add an index, and let the database do the rest. There were only 2 TEXT fields, and I only needed to look up rows by one of them (essentially a key-value store), so I assumed any database engine would do.
And yes. It was a BIG mistake.
First I tried SQLite, and I found that in a multithreaded environment some records simply evaporated when several threads wrote to the table simultaneously, even with the -DSQLITE_THREADSAFE=2 compile-time option. I pushed the same data under the same conditions, and sometimes I ended up with only 20 records, other times 40, and 26 on yet other runs... What drove me even crazier was that SQLite itself seemed to work fine, without any I/O errors. A good moment to shout "WHAT THE HELL?!" in real time.
So I switched the engine to PostgreSQL. Our trustworthy elephant friend saved all the records without any loss. I was satisfied with that, but... even though I had applied a B-tree index to the necessary field, just running SELECT field2 WHERE field1='something' took 100 milliseconds. And no, the table was not big: there were only 680 records, and the data was at most 30 characters long for field1 and only 4 characters for field2. I had configured the engine with some optimizations and it handled bigger tables fine, so I felt assured about its performance, but I never expected something like this, even in my dreams.
The elephant is tough, but the side effect is that it's too slow...
So, one last chance: I ran pg_dump to move the data from PostgreSQL to SQLite, and under the same conditions (same index, same table structure, ...), I turned on .timer in the SQLite shell: the query took less than 0.001 seconds. Hooray!
After some more experiments, my conclusion is that SQLite cannot fully protect against data loss on its own, even with the multithreading option enabled, and you need external help such as std::mutex. My guess is that its fread() calls are not fully serialized in a multithreaded environment, but I have neither the time nor the ability to investigate properly. :P
Anyway, I now use a combination of SQLite + WAL mode + a larger SQLite internal cache + std::mutex. The write performance still looks good, but if needed, I think I could split the data across more files and load-balance between them with a non-cryptographic hash.