Programming

How to Efficiently Retrieve Multiple Objects from AWS S3

Handling multiple S3 objects can be a royal pain when you’re dealing with performance issues. I recently faced this exact problem while working on a project that required retrieving numerous media files from AWS S3, processing them to create thumbnails, and displaying them on a webpage.

It sounds simple enough, right? Wrong.

The performance bottleneck hit me like a ton of bricks. When you retrieve files one after another, the wait time becomes unbearable, especially when the number of files is completely dynamic. Your users will absolutely abandon your application if they have to stare at loading screens for ages.

The Parallel Request Solution

The solution is actually quite brilliant – make multiple requests in parallel instead of sequentially. This approach dramatically reduces total retrieval time to roughly the time needed for the slowest file. Instead of adding up all the wait times, you’re overlapping them!

We will be using a PHP-based solution in this article (the stack I had to use on my project). However, even if you are working in a different language, chances are you can achieve the same result by following a similar approach to the one shown in this guide.

Thankfully, the official Amazon PHP SDK uses the Guzzle library for HTTP requests, and since version 2.0, Guzzle has supported parallel requests. This made implementing my solution much easier than expected.
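
To see the underlying capability in isolation, here is a minimal sketch (assuming Guzzle ~3.x, the version bundled with AWS SDK for PHP v2) of how Guzzle sends several requests in parallel when you pass an array to send(). The example.com URLs are placeholders:

use Guzzle\Http\Client;

$client = new Client();

// Build the GET requests first; nothing is sent yet
$requests = array(
    $client->get('https://example.com/file-one.jpg'),
    $client->get('https://example.com/file-two.jpg'),
);

// Passing an array to send() transfers all requests in parallel
$responses = $client->send($requests);

foreach ($responses as $response) {
    echo $response->getEffectiveUrl() . ' -> ' . $response->getStatusCode() . "\n";
}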

Implementation: Creating a Custom S3 Client

Let’s dive right into the code. I’ll show you exactly how I extended the S3Client class to add parallel retrieval functionality:

<?php 
namespace S3;

use Aws\S3\S3Client;
use \Aws\Common\Exception\TransferException;

/**
 * Extended S3Client class for retrieving multiple objects in parallel
 * @author Your Name
 */
class MyS3Client extends S3Client {
    
    /**
     * Retrieves multiple S3 objects in parallel
     * 
     * @param array $configs Configuration array for each object
     * @param S3Client $client S3Client instance
     * @return void
     */
    public static function getObjects(array $configs, S3Client $client) {
        $requests = array();
        $savePaths = array();
        
        // Create request objects for each file
        foreach ($configs as $config) {
            $url = "https://" . $config["Bucket"] . ".s3.amazonaws.com/" . $config["Key"];
            $request = $client->get($url);
            $requests[] = $request;
            $savePaths[$url] = $config["saveAs"];
        }
        
        // Send all requests in parallel; bail out early if the transfer fails
        try {
            $responses = $client->send($requests);
        } catch (TransferException $e) {
            // getMessage() is available on every exception; without the early
            // return, $responses would be undefined below
            echo $e->getMessage();
            return;
        }
        
        // Process all responses and save files
        foreach ($responses as $res) {
            $localPath = $savePaths[$res->getEffectiveUrl()];
            file_put_contents($localPath, $res->getBody(true));
        }
    }
}

How to Use the Parallel S3 Client

Using this custom client is incredibly straightforward. Here’s how you’d implement it in your project:

// Create a new instance of our custom S3 client
$s3 = new \S3\MyS3Client([
    'version' => 'latest',
    'region'  => 'us-east-1', // Change to your region
    'credentials' => [
        'key'    => 'YOUR_AWS_ACCESS_KEY',
        'secret' => 'YOUR_AWS_SECRET_KEY',
    ]
]);

// Create configuration array for multiple objects
$configs = array();

// Add first object
$configs[] = array(
    'Bucket' => "my-test-bucket",
    'Key'    => "path/to/first-object.jpg",
    'saveAs' => "local/path/first-image.jpg"
);

// Add second object
$configs[] = array(
    'Bucket' => "my-test-bucket",
    'Key'    => "path/to/second-object.jpg",
    'saveAs' => "local/path/second-image.jpg"
);

// Add as many objects as needed following the same pattern

// Retrieve all objects in parallel
\S3\MyS3Client::getObjects($configs, $s3);

Understanding the Implementation in Detail

Let me break down exactly what’s happening in our implementation:

  1. We extend the original S3Client class provided by the AWS PHP SDK, which means you can use it exactly like the original client with all its methods.
  2. We add a static method called getObjects() that takes two parameters:
    • An array of configurations (one for each object to be retrieved)
    • An instance of the S3 client
  3. For each object in the configs array, we:
    • Construct the S3 URL
    • Create a GET request
    • Store the request in our requests array
    • Map the URL to the local save path for later use
  4. We send all requests simultaneously using the client’s send() method with our array of requests.
  5. Once we receive the responses, we iterate through them and save each object to its designated local path.

Why a Static Method?

You might be wondering why I implemented getObjects() as a static method. The original S3Client class is structured so that most methods map directly to AWS SDK REST API commands, with additional utility methods being static. I followed this pattern for consistency.

That said, if you have a better approach to implement this as a non-static method, I’d absolutely love to hear about it! Leave a comment below with your suggestions.
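
For comparison, a non-static variant could look roughly like the sketch below. It is the same logic with $this used as the client instead of accepting one as a parameter; the method name getObjectsParallel() is just an illustrative choice, not something from the SDK:

// Hypothetical instance-method variant inside MyS3Client
public function getObjectsParallel(array $configs) {
    $requests = array();
    $savePaths = array();

    foreach ($configs as $config) {
        $url = "https://" . $config["Bucket"] . ".s3.amazonaws.com/" . $config["Key"];
        $requests[] = $this->get($url);
        $savePaths[$url] = $config["saveAs"];
    }

    // $this is already a Guzzle-based client, so it can send the batch itself
    foreach ($this->send($requests) as $res) {
        file_put_contents($savePaths[$res->getEffectiveUrl()], $res->getBody(true));
    }
}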

Performance Benefits

The performance improvement from this approach is nothing short of impressive. Let’s put it into perspective:

  • Sequential approach: If you have 10 files taking 2 seconds each = 20 seconds total wait time
  • Parallel approach: The same 10 files = approximately 2 seconds (the time of the slowest file)

That’s a 90% reduction in wait time! Your users will definitely notice the difference, and your application will feel much more responsive.
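
If you want to verify the gain against your own bucket, a rough measurement with microtime() is enough. This sketch reuses the $s3 client and $configs array from the usage example above; note that the SDK’s own getObject() call expects the parameter name 'SaveAs', while our helper uses 'saveAs':

// Sequential baseline: one getObject() call per file
$start = microtime(true);
foreach ($configs as $config) {
    $s3->getObject(array(
        'Bucket' => $config['Bucket'],
        'Key'    => $config['Key'],
        'SaveAs' => $config['saveAs'],
    ));
}
printf("Sequential: %.2f seconds\n", microtime(true) - $start);

// Parallel version using our custom client
$start = microtime(true);
\S3\MyS3Client::getObjects($configs, $s3);
printf("Parallel:   %.2f seconds\n", microtime(true) - $start);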

Important Considerations

Before implementing this solution, keep these points in mind:

  1. Memory Usage: Processing multiple large files simultaneously requires more memory. Monitor your application’s memory consumption.
  2. AWS Rate Limits: Be mindful of AWS request limits. Very large numbers of parallel requests might trigger throttling; the batching sketch after this list shows one way to cap concurrency.
  3. Error Handling: Our example includes basic error handling, but you should enhance it for production use.
  4. The ‘saveAs’ Parameter: Unlike single object retrieval where ‘saveAs’ is optional, it’s required here to ensure each object is saved to the correct location.
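
One simple way to address the memory and rate-limit concerns is to split the config array into fixed-size batches with array_chunk(), so only a limited number of requests are in flight at a time. The batch size of 10 below is just an arbitrary starting point:

// Retrieve the files in batches to bound memory use and concurrency
$batchSize = 10;

foreach (array_chunk($configs, $batchSize) as $batch) {
    \S3\MyS3Client::getObjects($batch, $s3);
}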

Applying This Beyond S3

The parallel request pattern isn’t limited to S3. You can apply similar techniques to other scenarios requiring multiple HTTP requests, such as:

  • Fetching data from multiple API endpoints
  • Downloading files from various sources
  • Processing batches of database records
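
For instance, fetching several JSON API endpoints in parallel looks almost identical to the S3 code (a sketch assuming Guzzle ~3.x; the endpoint URLs are placeholders):

use Guzzle\Http\Client;

$client = new Client();

// Placeholder endpoints; replace with the APIs you actually need
$urls = array(
    'https://api.example.com/users',
    'https://api.example.com/orders',
);

$requests = array();
foreach ($urls as $url) {
    $requests[] = $client->get($url);
}

// All endpoints are queried in parallel; results are keyed by URL
$results = array();
foreach ($client->send($requests) as $response) {
    $results[$response->getEffectiveUrl()] = json_decode($response->getBody(true), true);
}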

Conclusion

Retrieving multiple S3 objects in parallel through PHP is an extremely effective way to optimize your application’s performance. By extending the AWS SDK’s S3Client class and leveraging Guzzle’s parallel request capabilities, you can dramatically reduce wait times for your users.

The code provided here is straightforward to implement and can be easily integrated into existing projects. If performance is important for your S3 operations – and let’s be honest, when isn’t it? – this approach is definitely worth implementing.

Have you tried similar optimization techniques with AWS services? I’d love to hear about your experiences in the comments!

Rana Ahsan

Rana Ahsan is a seasoned software engineer and technology leader specialized in distributed systems and software architecture. With a Master’s in Software Engineering from Concordia University, his experience spans leading scalable architecture at Coursera and TopHat and contributing to open-source projects. This blog, CodeSamplez.com, showcases his passion for sharing practical insights on programming and distributed systems concepts and helping educate others. Github | X | LinkedIn
