Fetch and compute first, write later


Hey guys,

This is an update to my earlier post, “Laravel Excel: How to append rows to an existing Excel file“. It covers another solution, and probably a better way to export that fits almost every case.

Problems

Previously, when trying to export more than 30k records, the overall process slowed to a crawl.

A diagram to show the previous implementation.

The bottleneck was in the “Write to Excel” step, where we have to:

  • Download the Excel file from S3

  • Write (append)

  • Upload to S3 again

After 30k records, the file was huge (IIRC, 100~150MB). Each iteration took a lot of time, and eventually the whole job surpassed the $timeout = 600 limit, lol.
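To see why this blows up, note that every appended chunk re-downloads and re-uploads the entire file, so total S3 traffic grows quadratically with the number of chunks. A rough back-of-the-envelope sketch (in Python; the chunk size and per-row byte count are made-up illustrative numbers, not measurements from the real system):

```python
# Model of the old append flow: each chunk re-downloads and
# re-uploads the whole file, so total transfer grows quadratically.
# CHUNK and BYTES_PER_ROW are assumed values for illustration only.

ROWS = 30_000
CHUNK = 1_000
BYTES_PER_ROW = 4_000  # assumed average bytes per row in the Excel file

total_transfer = 0
file_size = 0
for _ in range(ROWS // CHUNK):
    total_transfer += file_size          # download current file from S3
    file_size += CHUNK * BYTES_PER_ROW   # append the new chunk
    total_transfer += file_size          # upload the whole file again

print(f"final file: {file_size / 1e6:.0f} MB")
print(f"total transferred: {total_transfer / 1e6:.0f} MB")
```

With these assumed numbers the final file is ~120MB (in the same ballpark as the real one), but the total bytes moved to and from S3 is dozens of times that, which is where the time went.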

Well, at least we spotted the bottleneck; let’s enhance it.

Solutions

Every exporter out there does the same three steps:

  • Fetch data

  • Compute data (transform into readable or accountable data)

  • Write

Let’s add some love for fetch & compute.

Fetch & Compute

Fetch & compute are the most important tasks. We should handle them with care, indeed. So I created a new table:

  • export_rows

    • id

    • export_id

    • data (json column)

After fetching & computing each record, we’ll write into the export_rows table. The data column will store an array of values, e.g. ['Seth', 'Vietnam', 'github.com/sethsandaru']
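In the real project this would be an Eloquent model plus a migration; here is a minimal stand-in using Python and SQLite just to show the shape of the idea. The table name and columns come from the post; the compute step and sample records are invented for illustration:

```python
import json
import sqlite3

# In-memory stand-in for the real database; export_rows mirrors the
# table described above: id, export_id, data (JSON array of cell values).
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE export_rows (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        export_id INTEGER NOT NULL,
        data      TEXT NOT NULL  -- JSON-encoded array of values
    )
""")

def compute(record):
    # Placeholder for the real transform step: turn a source record
    # into the list of cell values for one export row.
    return [record["name"], record["country"], record["profile"]]

source_records = [
    {"name": "Seth", "country": "Vietnam", "profile": "github.com/sethsandaru"},
    {"name": "Jane", "country": "Japan", "profile": "example.com/jane"},
]

export_id = 1
for record in source_records:
    db.execute(
        "INSERT INTO export_rows (export_id, data) VALUES (?, ?)",
        (export_id, json.dumps(compute(record))),
    )
db.commit()
```

The point is that each fetched record is computed once and persisted as a cheap, pre-digested row, so the expensive part never has to be repeated.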

To ensure the fetch runs fast, I'll trust that you already know how to optimize your queries and add enough indexes.

Write

Once we have inserted all the export_rows, it's time to write. We create a new Excel/CSV file and simply:

  • Pull data by chunk

• Pulling 1k records takes roughly 150ms (even faster on modern CPUs)
  • Write it into the file

• Appending 1k records takes roughly 200ms (even faster on modern CPUs, too)
  • Upload to S3

This step is much faster since it's purely reading simple rows from the DB, with no heavy pressure. Then write & upload.
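The chunked read-and-write loop above can be sketched like this. Again a Python/SQLite/CSV stand-in rather than the actual Laravel Excel code; the chunk size matches the numbers above, and the S3 upload is left as a comment since it would go through the AWS SDK in the real pipeline:

```python
import csv
import json
import sqlite3

# Stand-in DB pre-filled with computed export_rows (see the table above).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE export_rows (id INTEGER PRIMARY KEY, export_id INTEGER, data TEXT)"
)
db.executemany(
    "INSERT INTO export_rows (export_id, data) VALUES (1, ?)",
    [(json.dumps([f"user{i}", "Vietnam"]),) for i in range(2500)],
)

CHUNK_SIZE = 1_000  # ~150ms to pull + ~200ms to append per chunk

with open("export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    cursor = db.execute(
        "SELECT data FROM export_rows WHERE export_id = ? ORDER BY id", (1,)
    )
    while True:
        chunk = cursor.fetchmany(CHUNK_SIZE)  # pull data by chunk
        if not chunk:
            break
        # Write the chunk into the file
        writer.writerows(json.loads(row[0]) for row in chunk)

# Finally: upload export.csv to S3 (via the AWS SDK) — omitted here.
```

Because each row is already computed, the loop is just decode-and-write; there is no per-row business logic left to slow it down.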

Using AWS Lambda, the average processing time is around 20s for writing & uploading 100k records. Pretty fast, I'd say.

Note: after uploading the exported file to S3, we should delete that export's export_rows to save space.
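The cleanup is a single delete keyed on the export (sketched below in the same SQLite stand-in; in Laravel it would be a model delete or a scheduled prune job):

```python
import sqlite3

# Stand-in table with staging rows for two different exports.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE export_rows (id INTEGER PRIMARY KEY, export_id INTEGER, data TEXT)"
)
db.executemany(
    "INSERT INTO export_rows (export_id, data) VALUES (?, '[]')",
    [(1,), (1,), (2,)],
)

# Once export 1's file is safely on S3, its staging rows are dead weight.
db.execute("DELETE FROM export_rows WHERE export_id = ?", (1,))
db.commit()

remaining = db.execute("SELECT COUNT(*) FROM export_rows").fetchone()[0]
```

Scoping the delete to one export_id means concurrent exports that are still writing keep their staging rows untouched.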

Result

Exports that used to take hours for thousands of records now finish in minutes. And of course, once it's done, users get notified via email.

Thanks for reading, and I hope it helps to improve your exporters!
