Fetch and compute first, write later

Hey guys,
An update after I posted the “Laravel Excel: How to append rows to an existing Excel file“ a while back. This post will give you another solution and probably a better way to export that fits every case.
Problems
Previously, when trying to export more than 30k records, the overall process was getting slow.
A diagram to show the previous implementation.

The bottleneck was between the “Write to Excel”. Where we have to:
Download the Excel file from S3
Write (append)
Upload to S3 again
After 30k records, the file was huge (IIRC, 100~150MB). Each iteration took a lot of time and surpassed $timeout = 600 lol.
Well, at least we spotted the bottleneck; let’s enhance it.
Solutions
Every exporter out there, we always do:
Fetch data
Compute data (transform into readable or accountable data)
Write
Let’s add some love for fetch & compute.
Fetch & Compute
Fetch & compute are the most important tasks. We should handle it with care, indeed. So I created a new table:
export_rows
id
export_id
data (json column)
After fetching & computing each record, we’ll write into the export_rows table. The data column will store an array of values, e.g. ['Seth', 'Vietnam', 'github.com/sethsandaru']
To ensure fetch runs fast, I believe you already know how to optimize your query and add enough indexes.
Write
Once we insert all the export_rows. It’s time to write. We create a new Excel/CSV file and simply:
Pull data by chunk
- Approx pulling 1k records would take around ~150ms (even faster with nowadays CPU)
Write it into the file
- Approx appending 1k records would take ~200ms (even faster with nowadays CPU, too)
Upload to S3
This will take faster since it’s purely reading simple things from the DB, no hard pressure. Then write & upload.
Using AWS Lambda, average processing time takes around ~20s for writing & uploading 100k records, it’s pretty fast, I’d say.
Note: after uploading the exported file to S3, we should delete all of export_rows to save space.
Result
From exporting in hours for thousands of records, it is now minutes. And of course, once it’s done, users will get notified via email.
Thanks for reading, and I hope it helps to improve your exporters!


