Recently I was working on a task that required retrieving multiple media files/objects from the Amazon S3 service, processing them to create preview/thumbnail versions, and showing them on a web page. It is a pretty straightforward task except for the performance issue. The number of files is dynamic, so when there are several of them, retrieving them one after another takes a long time, even before any kind of processing.
To solve this, we came up with the idea of making the requests in parallel, which reduces the total time to a small fraction: roughly the time needed for the file that takes longest to retrieve. If you don't know, the official Amazon PHP SDK uses the Guzzle library to perform its HTTP requests. Luckily, Guzzle has supported parallel requests since version 2.0, which makes the job a lot easier for us. Here I am sharing the code I came up with to achieve this goal. If you are looking for a solution to a similar issue, you might get some help from it.
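To make the "small fraction" claim concrete, here is a tiny back-of-the-envelope sketch. The download times are made-up numbers, and the parallel figure is idealized: it assumes bandwidth is not the bottleneck and ignores connection overhead.

```php
<?php
// Hypothetical download times in seconds for four objects (made-up numbers).
$downloadSeconds = array(1.2, 0.8, 3.5, 2.0);

// One after another: every request waits for the previous one to finish.
$sequential = array_sum($downloadSeconds);

// In parallel: all requests are in flight together, so the slowest one dominates.
$parallel = max($downloadSeconds);

printf("sequential: %.1fs, parallel: about %.1fs\n", $sequential, $parallel);
```

With these numbers the sequential total is 7.5 seconds, while the parallel run is bounded by the 3.5-second download.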
Let's see the main class code first (gist link):
To use this, you can use code like the following:
$s3 = new \S3\MyS3Client();

$configs = array();
$configs[] = array(
    'Bucket' => "{test-bucket-name}",
    'Key'    => "{test-key-for-object}",
    'saveAs' => "{local-path-to-save}"
);
// add more entries as you need, similar to the one above
// $configs[] = ....

// retrieve all
\S3\MyS3Client::getObjects($configs, $s3);
See, it's as simple as that. Let me explain this small piece of code.
The main source code extends the original S3Client class provided by the Amazon PHP SDK, so you can use it exactly as if you were using the original class. Or, if you are adding it to an existing project, just change the instantiation of the S3 client variable to use this new class.
Now, construct each config as if you were retrieving a single object, and collect them all into another array. That's it. For single-object retrieval, 'saveAs' is an optional parameter and the call returns the content; in our case, however, we must pass it so that every object gets saved to its given path/name.
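Since a missing 'saveAs' is easy to overlook when the list is built dynamically, a small guard before the call can help. The following is only a sketch: `assertDownloadConfigs()` is a hypothetical helper name, not part of the AWS SDK or of the class described here, and it checks only the three keys used in the example above.

```php
<?php
/**
 * Verify that every config entry carries the keys used in this article.
 * Hypothetical helper: not part of the AWS SDK or of MyS3Client.
 */
function assertDownloadConfigs(array $configs)
{
    $required = array('Bucket', 'Key', 'saveAs');

    foreach ($configs as $index => $config) {
        foreach ($required as $name) {
            // Treat both a missing key and an empty string as an error.
            if (!isset($config[$name]) || $config[$name] === '') {
                throw new InvalidArgumentException(
                    "Config #$index is missing required key '$name'"
                );
            }
        }
    }

    // Return the list unchanged so the call can be chained.
    return $configs;
}
```

You could then pass `assertDownloadConfigs($configs)` instead of `$configs` to the retrieval call, so that a bad entry fails fast before any request is sent.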
The original S3Client class is built in such a way that all of its regular methods are mapped to AWS REST API commands, and its additional methods are static. Thus, we made this new method static as well. However, if you have a good suggestion for how to get it working as a normal method instead of a static one, I am open to it.
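One possible direction for a non-static variant, offered as a rough sketch rather than the code from the gist: in version 2 of the SDK, the client's `getCommand()` builds a command object and `execute()` accepts an array of commands, which it sends in parallel. The function name `downloadObjectsInParallel()` is hypothetical, and mapping this article's 'saveAs' key to the SDK's 'SaveAs' parameter is my assumption. The client is passed in as an argument here; inside a subclass method it would simply be `$this`.

```php
<?php
/**
 * Build one GetObject command per config entry and send them as one batch.
 *
 * $client is assumed to be an Aws\S3\S3Client from SDK v2 (or a subclass).
 * Hypothetical function name; the 'saveAs' => 'SaveAs' mapping is an assumption.
 */
function downloadObjectsInParallel($client, array $configs)
{
    $commands = array();

    foreach ($configs as $config) {
        $commands[] = $client->getCommand('GetObject', array(
            'Bucket' => $config['Bucket'],
            'Key'    => $config['Key'],
            'SaveAs' => $config['saveAs'], // local path the object is written to
        ));
    }

    // Handing over the whole array lets the client run the commands in parallel.
    $client->execute($commands);

    return $commands;
}
```

PHP only routes calls to undefined methods through `__call`, so a regular method defined on the subclass should not collide with the command mapping. I have not tested this against every SDK release, so treat it as a starting point.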
I hope this small tutorial on retrieving multiple objects from S3 helps you, to some extent, to optimize your web app so that it performs better. Happy programming 🙂