Drupal Storage API and AWS S3 tutorial
By default, Drupal supports local public and private file systems for storing user-uploaded files (images, PDFs, etc.). While that works well for most use cases, there are disadvantages. For example, it's nearly impossible to switch from public to private or vice versa: once you make your choice, you're stuck with it. It's also somewhat limiting in the modern cloud era of cheap third-party storage. If you have disk space constraints and are considering moving your files to the cloud, the two most popular actively maintained Drupal options are S3 File System (s3fs) and Storage API (storage).
s3fs module
Pros:
- Active development (three main developers)
- Uses the official AWS SDK (version 2 only, though)
- Easy to set up and use
- Provides a migrate mechanism
Cons:
- Vendor lock-in (only supports AWS S3)
- Name conflicts with other unrelated s3fs projects
- Doesn't support advanced CSS/JS aggregation
- Manual file metadata cache refreshing required (issues with modules like imce)
- AWS S3 has noticeable performance issues (thumbnail creation, initial page load, etc.)
- No documented recommendations for dev/stage/prod workflow (all sites sharing one bucket)
- Bucket tagging via the Drupal config not supported
storage module
Pros:
- Supports multiple vendors (AWS S3, Rackspace, FTP)
- Supports CSS/JS aggregation natively
- Provides a migrate mechanism
- More flexible, modular architecture allows for future growth and enhancements
- De-duplication saves disk space and money
- Smart cron workflow allows fast local thumbnail creation/image viewing and lazy uploading to AWS S3
- Support for AWS CloudFront (CDN) so no performance problems
Cons:
- Harder to initially set up (somewhat confusing terminology and outdated documentation)
- Semi-actively developed by only two main developers
- Custom module-coded implementation of the AWS S3/CloudFront API (instead of the official SDK) so future Amazon API updates may break the module until updated
- xmlsitemap support currently requires a patch
- Dynamic image styles (e.g. responsive images) require additional setup
- No imce support
- No documented recommendations for dev/stage/prod workflow (all sites sharing one bucket)
- Doesn't work with PostgreSQL
- Bucket tagging via the Drupal config not supported
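To get a rough feel for how much the de-duplication advantage listed above might matter for your site, you can count the files in your existing files directory whose content duplicates another file. This is only a sketch for estimating potential savings, not how Storage API itself de-duplicates. The function name is made up for illustration, and it assumes GNU coreutils (sha256sum) is installed.

```shell
#!/bin/sh
# Hypothetical helper: print how many files under a directory have the
# same content as a file already seen (i.e. are redundant copies).
# Assumes sha256sum from GNU coreutils is available.
count_duplicate_files() {
  find "$1" -type f -exec sha256sum {} + \
    | awk '{ if (seen[$1]++) dupes++ } END { print dupes + 0 }'
}

# Example (the path is an assumption, adjust it to your site):
#   count_duplicate_files sites/default/files
```

A result close to zero suggests de-duplication won't save you much; a large number suggests it will.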
After weighing the pros and cons, I eventually decided to go with Storage API. Here's how to migrate an existing site file system to AWS S3 using that module:
1. Download the necessary modules: drush dl imageinfo_cache storage_api-7.x-1.x-dev storage_api_stream_wrapper-7.x-1.x-dev storage_api_populate-7.x-1.x-dev
2. If needed, apply patch for xmlsitemap
3. Optionally apply this fix to suppress a false positive nag error
4. Enable the modules: drush en storage storage_stream_wrapper storage_api_populate imageinfo_cache
5. Go to /admin/config/media/file-system and change the default download method to Storage API (public or private depending on your site needs)
6. Now, for the somewhat labor-intensive step: update all your content type fields that rely on the file system to use Storage API. For example, edit the Article Drupal content type (/admin/structure/types/manage/article/fields) and edit the Image field and change the upload destination to Storage API (public or private depending on your site needs)
7. Once all your content types are updated to use Storage API, you're ready to have your existing files managed by Storage API. Go to /admin/structure/storage/populate and check Migrate all local files and Confirm and then click Start
8. After the update process completes, you can disable the populate module: drush dis storage_api_populate
9. Now that all your static files are managed by Storage API, you need to migrate your dynamic image styles: drush image-generate (choose "all" for any prompts)
10. Once the image styles have been generated (which may take a while to complete), you're ready to verify the migration. Move everything in the site's files directory except the storage folder and the .htaccess file to a temporary backup location and then run drush cc all && drush cron
11. Now, verify the site functions normally.
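The file move in step 10 is easy to get wrong by hand, so here is a minimal sketch of it as a shell function. The function name and the example paths are assumptions (your files directory may not be sites/default/files), and the backup directory must live outside the files directory.

```shell
#!/bin/sh
# Hypothetical helper: move every entry in FILES_DIR except "storage" and
# ".htaccess" into BACKUP_DIR. BACKUP_DIR must be outside FILES_DIR.
backup_local_files() {
  files_dir="$1"
  backup_dir="$2"
  mkdir -p "$backup_dir" || return 1
  # The second glob picks up hidden entries; a glob that matches nothing
  # stays literal and is skipped by the existence test below.
  for entry in "$files_dir"/* "$files_dir"/.[!.]*; do
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    case "$(basename "$entry")" in
      storage|.htaccess) continue ;;
    esac
    mv "$entry" "$backup_dir"/ || return 1
  done
}

# Example (paths are assumptions), followed by the commands from step 10:
#   backup_local_files sites/default/files /tmp/files-backup
#   drush cc all && drush cron
```

If the site breaks afterwards, moving the contents of the backup directory back restores the previous state.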
Congratulations, you've updated your site to use Storage API!
...But you're probably thinking, "Okay, so what's the big deal? The site looks the same and it just seems like all the files moved into a new folder called storage. So what?!"
Well, get ready to experience the awesome power of Storage API by migrating your file system to AWS S3! (or you could just as easily move them to Rackspace, etc. using the same process...)
1. Create an AWS IAM account for the site with the following S3 and CloudFront permissions:
{
  "Statement": [{
    "Sid": "ModifyAssets",
    "Action": [
      "s3:DeleteObject",
      "s3:DeleteObjectVersion",
      "s3:GetObject",
      "s3:GetObjectAcl",
      "s3:GetObjectVersion",
      "s3:GetObjectVersionAcl",
      "s3:PutObject",
      "s3:PutObjectAcl",
      "s3:PutObjectVersionAcl"
    ],
    "Effect": "Allow",
    "Resource": [
      "arn:aws:s3:::yourbucketname/*"
    ]
  }, {
    "Sid": "BucketRights",
    "Action": [
      "s3:ListBucket",
      "s3:ListAllMyBuckets"
    ],
    "Effect": "Allow",
    "Resource": [
      "arn:aws:s3:::*"
    ]
  }]
}
{
  "Sid": "Stmt1450391402000",
  "Effect": "Allow",
  "Action": [
    "cloudfront:CreateDistribution",
    "cloudfront:CreateInvalidation",
    "cloudfront:DeleteDistribution",
    "cloudfront:GetDistribution",
    "cloudfront:ListDistributions",
    "cloudfront:UpdateDistribution",
    "cloudfront:ListInvalidations",
    "cloudfront:ListStreamingDistributions"
  ],
  "Resource": [
    "*"
  ]
}
2. Once the account is created with the necessary IAM permissions, you'll need to create an access key for it.
3. Once you have your access key ID and Secret, go to your Drupal site and browse to /admin/structure/storage/create-container
4. Choose Amazon S3 from the service dropdown and click Next
5. Provide your access key ID, Secret, and a globally unique bucket name (I recommend a name that does NOT include a dot [.] since that's interpreted as a subdomain). In addition, select the AWS region you want to create the bucket in. Finally, make sure to check the Serve with CloudFront checkbox (note: streaming with CloudFront is out of scope for this tutorial). You can optionally select the Reduced redundancy checkbox for cheaper 99.99% durability. Then click Create.
Note: it may take up to 20 minutes for the CloudFront processing to complete on the AWS backend but you can continue the setup process below immediately:
6. Go to /admin/structure/storage/create-class and give it a descriptive name like "Cloud" (keep Initial container Filesystem for performance reasons) and then click Create class
Note: like others, I have no idea what the other checkboxes do so leave them unchecked.
7. On the subsequent screen, choose Amazon S3 (the container you created in the step above) from the dropdown and then click Add
8. Now, go to /admin/structure/storage/stream-wrappers and click edit for Public, Private, or both (depending on your use case) and change the Storage class to Cloud
9. Finally, run drush cron to actually push your local files to the AWS S3 bucket. This may take a while so I strongly recommend using drush instead of the Drupal web interface to run cron.
10. Verify the site functions as expected. The images should now be served from amazonaws.com or cloudfront.net
11. Celebrate faster page load times and more file system redundancy! Also, now that your files are in S3, you can even set up a backup strategy for Infrequent Access or Glacier.
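For the check in step 10, rather than inspecting images one by one, you can list the hosts a page's images are loaded from. The following is a rough sketch: the function name is made up, it only looks at img tags with double-quoted absolute or protocol-relative src URLs, and it ignores relative paths (such as theme images).

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print each distinct host
# that <img> tags load from. Relative src paths are skipped.
list_image_hosts() {
  grep -o '<img[^>]*src="[^"]*"' \
    | sed -n 's|.*src="\(https\{0,1\}:\)\{0,1\}//\([^/"]*\).*|\2|p' \
    | sort -u
}

# Example (the URL is a placeholder for your own site):
#   curl -s https://www.example.com/ | list_image_hosts
```

After a successful migration, the hosts printed for your uploaded images should end in amazonaws.com or cloudfront.net.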
Hi Ramesh, can you clarify what that video tutorial has to do with my post? Spam?
thanks for the article.
What about mounting S3 with https://linux.die.net/man/1/s3fs as a local file and using it for public or private? sounds much simpler but i don't know the cons.
In the stage of migrating existing site, when trying to change the field's settings 'upload destination' it won't let me if the settings was 'private'
Mounting S3 would work but you're still stuck with Drupal public or private and not being able to switch between them or a hybrid option. Regarding your other question, are you saying you see the Storage API radio button options but they're disabled somehow?
Regarding mounting S3: I will leave the fields on "private", just divert the private folder to the one mounted with S3 so there's no problem. Is using Storage API worth all the hassle?
Regarding changing an existing field - yes, take a look here: http://jmp.sh/O8qZutD
So I started to look for solutions such as https://www.drupal.org/node/1806220#comment-11164717
but it's very complicated and not scalable for multiple fields.
By the way, choosing the migration process as you describe it causes this error which I don't know how to handle:
An AJAX HTTP error occurred. HTTP Result Code: 500 Debugging information follows. Path: /batch?id=633&op=do StatusText: Service unavailable (with message) ResponseText:
That happens after the progress bar of the migration reaches 100%.
Another thing that might be missing in your recipe is adding the "storage" container - I installed the modules and had no such storage related to "everything"
FYI, mounting from S3 has noticeable latency issues so images will initially load slower, especially computed image styles like dynamic responsive images. Storage API is probably overkill for one or two small sites but very useful for sites with tons of images and you're paying for storage or tons of sites where you want to manage the disk usage centrally.
Regarding your errors, this guide is a bit old so make sure to install the latest dev version of all the modules for bug fixes and check the Drupal module page forum for patches. ....ah, the joy of working with Drupal!
Is there a way to attach a file to a file field that is already located on S3?
how to invoke this module in custom module and utilize the same. give some sample code for get and put files in s3bucket.