
Handling multiple S3 objects can be a royal pain when you're dealing with performance issues. I recently faced this exact problem while working on a project that required me to retrieve numerous media files from AWS S3, process them to create thumbnails, and display them on a webpage.
It sounds simple enough, right? Wrong.
The performance bottleneck hit me like a ton of bricks. When you’re retrieving files one after another, the wait time becomes unbearable – especially since the number of files was completely dynamic. Your users will absolutely abandon your application if they have to stare at loading screens for ages.
The Parallel Request Solution
The solution is actually quite brilliant – make multiple requests in parallel instead of sequentially. This approach dramatically reduces retrieval time to essentially the duration needed for the longest file. Instead of adding up all the wait times, you’re overlapping them!
We'll be using a PHP-based solution in this article (the stack I had to use on my project). However, even if you're working in a different language, chances are you can follow a similar approach to the one shown in this guide.
Thankfully, the official Amazon PHP SDK uses the Guzzle library for HTTP requests, and since version 2.0, Guzzle has supported parallel requests. This made implementing my solution much easier than expected.
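To see the mechanism in isolation, here's a minimal sketch of parallel requests with plain Guzzle 3. The URLs are just placeholders, so treat it as an illustration rather than production code:

<?php
require 'vendor/autoload.php';

use Guzzle\Http\Client;

$client = new Client();

// Build the requests first; nothing is sent yet
$requests = array(
    $client->get('https://example.com/first.jpg'),
    $client->get('https://example.com/second.jpg'),
);

// Passing an array to send() dispatches all requests in parallel
$responses = $client->send($requests);

foreach ($responses as $response) {
    echo $response->getEffectiveUrl() . ' => ' . $response->getStatusCode() . PHP_EOL;
}

The same idea carries straight over to the S3 client, since in SDK version 2 the client is itself a Guzzle client under the hood.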
Implementation: Creating a Custom S3 Client
Let’s dive right into the code. I’ll show you exactly how I extended the S3Client class to add parallel retrieval functionality:
<?php
namespace S3;

use Aws\S3\S3Client;
use Aws\Common\Exception\TransferException;

/**
 * Extended S3Client class for retrieving multiple objects in parallel
 * @author Your Name
 */
class MyS3Client extends S3Client
{
    /**
     * Retrieves multiple S3 objects in parallel and saves them locally
     *
     * @param array    $configs Configuration array for each object (Bucket, Key, saveAs)
     * @param S3Client $client  S3Client instance used to send the requests
     * @return void
     */
    public static function getObjects(array $configs, S3Client $client)
    {
        $requests  = array();
        $savePaths = array();

        // Create a request object for each file; nothing is sent yet
        foreach ($configs as $config) {
            $url = "https://" . $config["Bucket"] . ".s3.amazonaws.com/" . $config["Key"];
            $requests[] = $client->get($url);
            $savePaths[$url] = $config["saveAs"];
        }

        // Send all requests in parallel
        try {
            $responses = $client->send($requests);
        } catch (TransferException $e) {
            echo $e->getMessage();
            return; // nothing to save if the transfer failed
        }

        // Save each response body to its mapped local path
        foreach ($responses as $res) {
            $localPath = $savePaths[$res->getEffectiveUrl()];
            file_put_contents($localPath, $res->getBody(true));
        }
    }
}

How to Use the Parallel S3 Client
Using this custom client is incredibly straightforward. Here’s how you’d implement it in your project:
// Create an S3 client instance (AWS SDK v2 uses a static factory, which our subclass inherits)
$s3 = \S3\MyS3Client::factory(array(
    'key'    => 'YOUR_AWS_ACCESS_KEY',
    'secret' => 'YOUR_AWS_SECRET_KEY',
    'region' => 'us-east-1', // change to your region
));

// Build the configuration array for multiple objects
$configs = array();

// Add the first object
$configs[] = array(
    'Bucket' => "my-test-bucket",
    'Key'    => "path/to/first-object.jpg",
    'saveAs' => "local/path/first-image.jpg",
);

// Add the second object
$configs[] = array(
    'Bucket' => "my-test-bucket",
    'Key'    => "path/to/second-object.jpg",
    'saveAs' => "local/path/second-image.jpg",
);

// Add as many objects as needed following the same pattern

// Retrieve all objects in parallel
\S3\MyS3Client::getObjects($configs, $s3);

Understanding the Implementation in Detail
Let me break down exactly what’s happening in our implementation:
- We extend the original S3Client class provided by the AWS PHP SDK, which means you can use it exactly like the original client, with all its methods.
- We add a static method called getObjects() that takes two parameters:
  - An array of configurations (one for each object to be retrieved)
  - An instance of the S3 client
- For each object in the configs array, we:
  - Construct the S3 URL
  - Create a GET request
  - Store the request in our requests array
  - Map the URL to the local save path for later use
- We send all requests simultaneously using the client's send() method with our array of requests.
- Once we receive the responses, we iterate through them and save each object to its designated local path.
Why a Static Method?
You might be wondering why I implemented getObjects() as a static method. The original S3Client class is structured so that most methods map directly to AWS SDK REST API commands, with additional utility methods being static. I followed this pattern for consistency.
That said, if you have a better approach to implement this as a non-static method, I’d absolutely love to hear about it! Leave a comment below with your suggestions.
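For what it's worth, here is one possible non-static sketch. The method name downloadObjects() is something I made up for illustration, and it relies on the fact that in SDK v2 the client itself is a Guzzle client, so it can build and send its own request batch:

<?php
namespace S3;

use Aws\S3\S3Client;

class MyS3Client extends S3Client
{
    /**
     * Non-static variant: the client instance downloads the objects itself.
     * downloadObjects() is a hypothetical name, not part of the SDK.
     */
    public function downloadObjects(array $configs)
    {
        $requests  = array();
        $savePaths = array();

        foreach ($configs as $config) {
            $url = "https://" . $config["Bucket"] . ".s3.amazonaws.com/" . $config["Key"];
            $requests[] = $this->get($url);
            $savePaths[$url] = $config["saveAs"];
        }

        // The instance sends its own batch in parallel
        $responses = $this->send($requests);

        foreach ($responses as $res) {
            file_put_contents($savePaths[$res->getEffectiveUrl()], $res->getBody(true));
        }
    }
}

One caveat: MyS3Client::factory() may still hand you back a plain S3Client (the SDK's factory is wired to its own class), so you would likely have to construct the subclass yourself before calling an instance method, which is a big part of why the static helper felt like the path of least resistance.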
Performance Benefits
The performance improvement from this approach is nothing short of impressive. Let’s put it into perspective:
- Sequential approach: If you have 10 files taking 2 seconds each = 20 seconds total wait time
- Parallel approach: The same 10 files = approximately 2 seconds (the time of the slowest file)
That’s a 90% reduction in wait time! Your users will definitely notice the difference, and your application will feel much more responsive.
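If you want to verify the gain on your own data, a quick-and-dirty timing harness could look like this (just a sketch; it reuses the $configs and $s3 from the usage example above):

// Sequential: one SDK call per object
$start = microtime(true);
foreach ($configs as $config) {
    $s3->getObject(array(
        'Bucket' => $config['Bucket'],
        'Key'    => $config['Key'],
        'SaveAs' => $config['saveAs'], // the SDK's own SaveAs option
    ));
}
$sequential = microtime(true) - $start;

// Parallel: one batched call for all objects
$start = microtime(true);
\S3\MyS3Client::getObjects($configs, $s3);
$parallel = microtime(true) - $start;

printf("Sequential: %.2fs, Parallel: %.2fs\n", $sequential, $parallel);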
Important Considerations
Before implementing this solution, keep these points in mind:
- Memory Usage: Processing multiple large files simultaneously requires more memory, since each response body is held in memory before being written out. Monitor your application's memory consumption, or stream responses straight to disk (see the sketch after this list).
- AWS Rate Limits: Be mindful of AWS request limits. Very large numbers of parallel requests might trigger throttling.
- Error Handling: Our example includes only basic error handling; for production use you'll want to cope with partial failures (also covered in the sketch after this list).
- The 'saveAs' Parameter: Unlike single-object retrieval, where the SDK's SaveAs option is optional, the saveAs entry is required here so that each object is written to the correct local location.
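To address the memory and error-handling points above, here is a hedged sketch of a hardened download loop: it streams each response straight to disk instead of buffering it in memory, and it catches Guzzle's MultiTransferException so one failed object doesn't sink the whole batch. Treat the exact Guzzle 3 calls (setResponseBody(), getFailedRequests()) as my recollection of that API rather than gospel:

<?php
use Aws\S3\S3Client;
use Guzzle\Common\Exception\MultiTransferException;

function getObjectsSafely(array $configs, S3Client $client)
{
    $requests = array();

    foreach ($configs as $config) {
        $url = "https://" . $config["Bucket"] . ".s3.amazonaws.com/" . $config["Key"];
        $request = $client->get($url);
        // Stream the body directly into the target file instead of memory
        $request->setResponseBody($config["saveAs"]);
        $requests[] = $request;
    }

    try {
        $client->send($requests);
    } catch (MultiTransferException $e) {
        // Some requests may still have succeeded; log only the failures
        foreach ($e->getFailedRequests() as $failed) {
            error_log("Failed to fetch: " . $failed->getUrl());
        }
    }
}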
Applying This Beyond S3
The parallel request pattern isn’t limited to S3. You can apply similar techniques to other scenarios requiring multiple HTTP requests, such as:
- Fetching data from multiple API endpoints (see the sketch after this list)
- Downloading files from various sources
- Processing batches of database records
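As a rough illustration of the first case (again with Guzzle 3, and with made-up endpoint URLs), fetching several API endpoints in parallel looks almost identical:

<?php
use Guzzle\Http\Client;

$client = new Client('https://api.example.com');

// One GET request per endpoint; they are dispatched together
$requests = array(
    $client->get('/users/1'),
    $client->get('/users/2'),
    $client->get('/users/3'),
);

$responses = $client->send($requests);

foreach ($responses as $response) {
    // Assuming each endpoint returns JSON
    $data = json_decode($response->getBody(true), true);
    var_dump($data);
}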
Conclusion
Retrieving multiple S3 objects in parallel through PHP is an extremely effective way to optimize your application’s performance. By extending the AWS SDK’s S3Client class and leveraging Guzzle’s parallel request capabilities, you can dramatically reduce wait times for your users.
The code provided here is straightforward to implement and can be easily integrated into existing projects. If performance is important for your S3 operations – and let’s be honest, when isn’t it? – this approach is definitely worth implementing.
Have you tried similar optimization techniques with AWS services? I’d love to hear about your experiences in the comments!