The Struggle of Finding a Free Excel to PDF Converter: My Journey and Solution
Converting Excel files to PDF is a common task in many projects, whether for generating reports, sharing data, or creating documents. Like many developers, I initially believed this would be an easy task to automate. However, my search for a free, reliable solution turned into a frustrating journey filled with limitations, compatibility issues, and expensive tools.
Here’s how I overcame these challenges, built my own Excel-to-PDF converter, and made it available as an open-source tool for others who may be struggling like I did.
The Frustration
Commercial Tools
My initial search brought me to paid solutions like Aspose.Cells, Syncfusion, and others. While they offered robust features, they came with steep licensing costs—well beyond what I could justify for small or personal projects.
Online Services
Free online converters seemed like a promising alternative, but they were unsuitable for automation. These tools often raised privacy concerns (since files are uploaded to third-party servers), had file size limits, and didn’t provide programmatic APIs.
Open-Source Libraries
I also explored open-source libraries, but most lacked the ability to convert Excel files to PDF. Even those that did were either unreliable or didn’t support modern Microsoft Office formats.
Discovering LibreOffice in Headless Mode
After weeks of searching, I stumbled upon the idea of using LibreOffice in headless mode. LibreOffice is a free, open-source office suite that can convert various file formats, including Excel, to PDF. When run in headless mode, it operates via the command line, making it perfect for automation.
How My Solution Works
To make this approach developer-friendly, I built a lightweight Go-based HTTP server that acts as a REST API. This server wraps LibreOffice’s functionality and allows any programming language to interact with it via HTTP requests.
Key Features
- Multiple File Format Support: Supports .xlsx, .xls, .csv, .docx, .pptx, and more.
- Automatic Cleanup: Temporary files are automatically deleted after one hour to save disk space.
- Custom Fonts: Add custom fonts either by cloning the GitHub repository or by mounting them with Docker volumes.
- Cross-Language Integration: Works with any programming language that supports HTTP.
The Temporary Directory Approach
Instead of relying on the system’s temporary directory, I opted to use a custom ./tmp directory. This ensures consistent behavior, as system temp directories sometimes have unpredictable permissions.
Implementation Details
How It Works
- File Upload: Clients upload an Excel file via the /convert endpoint using a POST request.
- Temporary Storage: The server saves the file in the ./tmp directory with a timestamp-based filename.
- Conversion: LibreOffice is called in headless mode to convert the file to PDF and save the result in the same directory.
- File Cleanup: A background goroutine deletes files older than one hour.
- Response: The converted PDF is returned as the HTTP response.
Getting Started
GitHub Repository
You can find the source code at https://github.com/wteja/pdf-converter.
Docker Image
The project is also available as a Docker image: wteja/pdf-converter.
Running the Docker Container
docker pull wteja/pdf-converter
docker run -p 5000:5000 wteja/pdf-converter
Examples of Integrating with Other Languages
Since the service is exposed via HTTP, you can use any programming language to interact with it.
C#
using System.IO;
using System.Net.Http;
using var client = new HttpClient();
var fileContent = new ByteArrayContent(File.ReadAllBytes("example.xlsx"));
using var formData = new MultipartFormDataContent { { fileContent, "file", "example.xlsx" } };
var response = await client.PostAsync("http://localhost:5000/convert", formData);
response.EnsureSuccessStatusCode(); // throw if the conversion failed
var pdfBytes = await response.Content.ReadAsByteArrayAsync();
File.WriteAllBytes("output.pdf", pdfBytes);
Node.js
const axios = require("axios");
const FormData = require("form-data");
const fs = require("fs");
const form = new FormData();
form.append("file", fs.createReadStream("example.xlsx"));
axios.post("http://localhost:5000/convert", form, {
  headers: form.getHeaders(),
  responseType: "arraybuffer", // keep the PDF as binary instead of decoding it as text
})
  .then(response => fs.writeFileSync("output.pdf", response.data))
  .catch(console.error);
Python
import requests
with open("example.xlsx", "rb") as f:
    response = requests.post("http://localhost:5000/convert", files={"file": f})
response.raise_for_status()  # fail loudly if the conversion did not succeed
with open("output.pdf", "wb") as f:
    f.write(response.content)
Go
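package main
import (
	"bytes"
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
)
func main() {
	file, err := os.Open("example.xlsx")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	// Build a multipart/form-data body with the spreadsheet in the "file" field.
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)
	part, err := writer.CreateFormFile("file", "example.xlsx")
	if err != nil {
		log.Fatal(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		log.Fatal(err)
	}
	writer.Close() // writes the closing boundary
	resp, err := http.Post("http://localhost:5000/convert", writer.FormDataContentType(), body)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("conversion failed: %s", resp.Status)
	}
	out, err := os.Create("output.pdf")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
}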
Challenges and Trade-Offs
Image Size
The Docker image is 2.67 GB because of the dependencies LibreOffice requires. I tested smaller base images such as Alpine, but they shipped an older version of LibreOffice that wasn’t compatible with modern Microsoft Office formats. Debian offered the latest LibreOffice but produced an even larger image (~3 GB).
Why It’s Worth It
The large image size is a reasonable trade-off when compared to the cost of commercial solutions. Once set up, the image can be reused across multiple projects without any additional licensing fees.
Conclusion
The frustration of finding a free Excel-to-PDF converter led me to build my own solution using LibreOffice in headless mode. While it’s not perfect, it’s free, reliable, and flexible. If you’re facing the same challenge, I hope this project saves you time and effort.
Check out the project on GitHub or pull the Docker image from Docker Hub. Let me know how it works for you or if you have suggestions for improvement.