Recently I was working on a task that required retrieving multiple media files/objects from the Amazon S3 service, processing them to create preview/thumbnail versions, and showing them on a web page. It is a pretty straightforward task except for one performance issue: when there are several files (the number of files is dynamic), retrieving them one after another takes a long time, even before any kind of processing begins.
To solve this, we came up with the idea of making the requests in parallel, which reduces the total time to a small fraction (roughly the time to retrieve the largest file). In case you didn't know, the official Amazon PHP SDK uses the Guzzle library to perform its HTTP requests, and Guzzle has supported parallel requests since version 2.0, which makes the job a lot easier for us. Below is the code I came up with to achieve this goal; if you are facing a similar issue, it may help you too.
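To illustrate the underlying mechanism first, here is a minimal sketch of Guzzle's parallel transfer on its own. This assumes the Guzzle 3 classes bundled with the v2 SDK, and the URLs are just placeholders:

use Guzzle\Http\Client;
use Guzzle\Common\Exception\MultiTransferException;

$client = new Client();

// Passing an array of requests to send() transfers them in parallel
$requests = array(
    $client->get('http://example.com/file-1.jpg'),
    $client->get('http://example.com/file-2.jpg'),
);

try {
    $responses = $client->send($requests);
} catch (MultiTransferException $e) {
    // One or more of the transfers failed
    echo $e->getMessage();
    $responses = array();
}

The S3 class below relies on this same send() behaviour, since the SDK v2 S3Client is itself built on a Guzzle client.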
Let’s see the main class code first (gist link):
<?php

namespace S3;

use Aws\S3\S3Client;
use Aws\Common\Exception\TransferException;

/**
 * MyS3Client: extends the SDK's S3Client to download multiple objects in parallel.
 * @author Rana
 * @link https://codesamplez.com/programming/amazon-s3-get-multiple-objects-php
 */
class MyS3Client extends S3Client
{
    public static function getObjects(array $configs, S3Client $client)
    {
        $requests = array();
        $savePaths = array();

        // Build one GET request per object and remember where to save it
        foreach ($configs as $config) {
            $url = "https://".$config["Bucket"].".s3.amazonaws.com/".$config["Key"];
            $requests[] = $client->get($url);
            $savePaths[$url] = $config["saveAs"];
        }

        // Passing an array of requests makes Guzzle send them in parallel
        try {
            $responses = $client->send($requests);
        } catch (TransferException $e) {
            echo $e->getMessage();
            return;
        }

        // Write each response body to its configured local path
        foreach ($responses as $res) {
            $localPath = $savePaths[$res->getEffectiveUrl()];
            file_put_contents($localPath, $res->getBody(true));
        }
    }
}
//gist: https://gist.github.com/ranacseruet/9167580
To use it, write something like the code below:
// factory() is inherited from the SDK v2 client; put your own credentials here
$s3 = \S3\MyS3Client::factory(array(
    'key'    => '{your-aws-access-key}',
    'secret' => '{your-aws-secret-key}',
));

$configs = array();
$configs[] = array(
    'Bucket' => "{test-bucket-name}",
    'Key'    => "{test-key-for-object}",
    'saveAs' => "{local-path-to-save}"
);
// add more entries as needed, similar to the one above
//$configs[] = ....

// retrieve all objects in parallel
\S3\MyS3Client::getObjects($configs, $s3);
See, it’s as simple as that. Let me explain this small code.
The main class extends the original S3Client class provided by the Amazon PHP SDK, so you can use it exactly as you would use the original class. If you are adding it to an existing project, just change the instantiation of the S3 client variable to this new class; everything else keeps working.
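For instance, existing single-object calls keep working on the subclass unchanged. Here is a minimal sketch assuming the SDK v2 getObject operation (note that the SDK's own parameter is spelled 'SaveAs'); the values are placeholders:

// A normal single-object download, inherited from the stock S3Client
$s3->getObject(array(
    'Bucket' => '{test-bucket-name}',
    'Key'    => '{test-key-for-object}',
    'SaveAs' => '{local-path-to-save}',
));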
Now, construct each config array just as you would to retrieve a single object, and collect them all into one outer array. That's it. For single-object retrieval, 'saveAs' is an optional parameter because the call returns the content; in our case, however, you must pass it so that every object gets saved to the given path/name.
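For example, here is a minimal sketch of building such a config list from a set of object keys; the bucket name, keys, and target directory are purely illustrative:

// Build one config entry per object key; 'saveAs' tells getObjects()
// where to write each downloaded object
$keys = array('photos/a.jpg', 'photos/b.jpg', 'photos/c.jpg');
$configs = array();
foreach ($keys as $key) {
    $configs[] = array(
        'Bucket' => '{test-bucket-name}',
        'Key'    => $key,
        'saveAs' => '/tmp/previews/' . basename($key),
    );
}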
The original S3Client class is built in such a way that its regular methods are mapped to AWS REST API commands, while its additional helper methods are static. Thus, we made this new method static as well. However, if you have a good suggestion on how to get it working as a normal instance method instead of a static one, I am open to it; one possible sketch follows below.
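For what it's worth, here is one way such an instance method might look. This is only a sketch that assumes the inherited Guzzle get()/send() methods behave the same when called on $this; the method name getObjectsParallel is my own, not part of the SDK:

// A hypothetical instance-method variant; call it as $s3->getObjectsParallel($configs)
public function getObjectsParallel(array $configs)
{
    $requests = array();
    $savePaths = array();
    foreach ($configs as $config) {
        $url = "https://".$config["Bucket"].".s3.amazonaws.com/".$config["Key"];
        $requests[] = $this->get($url);
        $savePaths[$url] = $config["saveAs"];
    }
    // send() transfers the whole batch in parallel
    foreach ($this->send($requests) as $res) {
        file_put_contents($savePaths[$res->getEffectiveUrl()], $res->getBody(true));
    }
}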
I hope this small tutorial on retrieving multiple objects from S3 in parallel helps you optimize your web app and make it perform better. Happy programming 🙂