Image processing using AWS Lamba actors

Run one after another, independently

Overview

Roadmap
Pre-signed S3 URLs
Add List in Dynamo
Custom image collection
Add Watermark
Image Processing: Nudity Check
Store image in AWS CloudFront
Update Dynamo
Dynamo Triggers of actors
- Alternative chaining via SNS
- Resize images to various sizes
Compression service
Extract geotags from photos
- Geo names from coordinates
Facial recognition
Archival to AWS Glacier
Resources

Here is a design to process files containing pictures using AWS Lambda functions calling various 3rd party APIs.

This roadmap is designed for an agile “minimul viable product” approach.

The advantage of doing this on AWS S3 rather than Gulp.js is that it is publicly accessible to many more people than only those who can setup a Gulp server.

https://medium.freecodecamp.org/how-to-post-process-user-images-programmatically-with-rails-amazon-s3-including-testing-c72645536b54

Roadmap

The first three steps are described in qwikLabs:

Setup Lambda function to invoke upon S3 bucket file upload
Manually drop a file into an S3 bucket (or Amazon Drive)
Trigger invokes Lambda function
Add to list in DynamoDB
Apply initial image recognition task (extent image has nudity, etc.)
Update DynamoDB with results and timings

If acceptable, continue:
Obtain images from alternate source
DynamoDB change triggers additional processing
Detect and add invisible watermark
Generate thumbnails and other size images
Compress image file (using TinyPng API)
Store image in AWS CloudFront
Generate HTML with img sizes in amp-img format for return

More image processing:
Facial recognition (does the image contain one or more faces?)
Extract geotags from photos

Automate input:
Generate pre-signed S3 URLs
Desktop website to accept drops of files to store in S3
Mobile app to accept upload of files to store in S3

Add capacity management features:
Automate saving of multiple files into S3 by another (test) program
Update setup DNS across two Availability Zones
Setup replication of DynamoDB across availability zones

Add auto-discovery features:
Scan through HTML to extract local images to convert
Replace image file URL in HTML
Commit change into GitHub

Add management features:
Filter log group/stream to generate metric
Display metrics from DynamoDB on Tableau
Setup alarms on CloudWatch of metrics
Email once a day with metrics summary (cron)

Add archival features:
DynamoDB change triggers additional processing
Archive file to AWS Glacier
Update DynamoDB about archival and file deletion
Remove file from S3

Process files in other clouds:
Process image files in Amazon Cloud Drive (https://www.amazon.com/clouddrive/)
Process image files in Dropbox
Process image files in Microsoft Azure
Process image files in Google Drive

Additional notification features:
SNS to IFTTT to invoke IoT https://github.com/danilop/SNS2IFTTT
Zapier notification

Let’s work on these together! Contact me.

Pre-signed S3 URLs

This Amazon doc notes that normally, S3 buckets are considered private. Permissions need to be assigned to those who want to upload.

This Amazon doc notes that S3 URLs can be pre-signed with permissions to upload files by providing:

security credentials,
a bucket name object key,
the HTTP PUT method (uploading objects), and
an expiration date and time

BTW: Visual Studio users can manually obtain a pre-signed object URL without writing any code by using the Visual Studio AWS Explorer.

NOTE: Amazon allows pre-signed S3 URLs to be valid for only 10 minutes.

Code for pre-signing are provided in .NET, Ruby, and Java. TODO: write a CreateFunction function to pre-sign. The AmazonS3.generatePresignedUrl method of the AmazonS3 class within the AWS SDK for Java. GeneratePresignedUrlRequest class.

TODO: Embed the pre-signed S3 URL in the HTML for presentation in a webpage where visitors can drag and drop files.

But the website should ask for (and validate) email addresses.

Add List in Dynamo

Custom image collection

From Lithium

Fetching the latest photos posted to the community
Fetching the latest photos posted to a node
Fetching photos with most kudos
Fetching top photo contributors
Fetching photos uploaded to a thread
Fetching photos posted by a user

Add Watermark

A digital watermark is embedded in the picture file some code used to prove copyright. The code is usually cryptographically signed.

Neutrinoapi offers an on-line API to add a visible watermark (using Alpha blending). It converts various image formats automatically (GIF, ICO, JPEG, PNG, TIFF).

A repeatable automated batch process (AP) is needed just to iterate when some programs don’t work. The watermarkng feature of OpenStego (a Java program) is still in beta and one persona> found that watermarked file sizes to be much smaller than the original and there is a noticeable quality loss.

Several organizations provide manual GUI to add invisible watermarks, one file at a time.

http://www.adptools.com/en/signmyimage-description.html
http://www.phibit.com/icemark/ $50
https://www.digimarc.com/products/guardian/images/photoshop-plug-in Digimarc Guardian in Adobe Photoshop
http://louisem.com/1912/free-watermark-software-watermark-online

Image Processing: Nudity Check

https://algorithmia.com/algorithms/sfw/NudityDetection Algorithmia.com

Swagger for service:

NOTE: Does not work in black and white though.

Training cases:

1) nude: True, confidence: 0.93
https://s3.amazonaws.com/www.isitnude.com/assets/images/sample/obama.jpg

2) nude: false, confidence: 0.95
http://www.isitnude.com.s3-website-us-east-1.amazonaws.com/assets/images/sample/young-man-by-the-sea.jpg

Store image in AWS CloudFront

An example of a resource in CloudFront is

http://d36cz9buwru1tt.cloudfront.net/AWS_Disaster_Recovery.pdf

Update Dynamo

DynamoDB is a NoSQL database containing key-value pairs.

Store in DynamoDB:

Picture URL
GUID to associate with other data
Date/Time of entry

For each picture:

Size of picture before compression
Size of picture after compression
Width of picture
Height of picture
Method of scaling
ContainsNudity: true/false
ContainsNudityConfidence: 0 to 100%.

Other information on all AWS activity logs:

instance-id,
region,
availability-zone
environment (staging, production, etc),

Having logs outside each server makes it unnecessary to SSH into individual servers and enables trends across servers and other attributes to be analyzed.

This blogger says:

“If you have to SSH into your servers, then your automation has failed. This is both the most frightening and yet most useful thing I’ve learned.”

Discussion on this on Hacker News.

Dynamo Triggers of actors

The workflow design concern here is with so many steps, we want to avoid a “master” program hanging around waiting for steps to complete just to hand-off to another function.

How about we have each Lambda function kill itself after invoking a new function to run independently, possibly defined in another programming language.

This would also enable fine-grained management of individual functions.

See: http://stackoverflow.com/questions/31714788/can-an-aws-lambda-function-call-another suggested chaining function calls via SNS topics https://mobile.awsblog.com/post/Tx1VE917Z8J4UDY/Invoking-AWS-Lambda-functions-via-Amazon-SNS

https://java.awsblog.com/post/Tx2J2LPKTTVU93H/Invoking-AWS-Lambda-Functions-from-Java

Documentation:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html

Guided walkthrough:
https://aws.amazon.com/blogs/aws/dynamodb-update-triggers-streams-lambda-cross-region-replication-app/

CAUTION: There is a concurrent Lambda invocation limit of 100 at a time per account.

API Gateway has a maximum limit of 1000 RPS (requests per second), but can be adjusted by request.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html

Alternative chaining via SNS

An alternative is via subscription to SNS topics.

Lambda publishes message to a SNS Topic</a>
Other Lambdas subscribe to this topic so as soon as messages arrive in the topic, second Lambda gets executed with the message as it’s input parameter.

Lambda publishes message to a SNS Topic
Other Lambdas subscribe to this topic so as soon as messages arrive in the topic, second Lambda gets executed with the message as it’s input parameter.

Resize images to various sizes

Pictures need to be re-sized for (pixels):

150x? fixed width, height is scaled as needed
50x50 scale image best into box
?x150 fixed height, width is scaled as needed
108x108 PNG (with transparency) or JPG for display in the Alexa App.
512x512 PNG (with transparency) or JPG for display in the Alexa App on larger screens.
2560x1440 for YouTube Channel Art

ImageMagick is used.

PROTIP: Do image compression AFTER resize?

Compression service

https://tinypng.com/developers

Extract geotags from photos

There are several tools to read the metadata devices embed in photo files.

From JPEG images Phil Harvey’s EXIF command extracts:

   North or South Latit|N                                                         
   Latitude            |42.00, 39.00, 24.67                                       
   East or West Longitu|E                                                         
   Longitude           |23.00, 21.00, 28.45                                       
   Altitude reference  |0x00                                                      
   Altitude            |625.50
    
   Manufacturer        |Nokia                                                     
   Model               |E71                                                       
   Orientation         |top - left                                                
   x-Resolution        |300.00                                                    
   y-Resolution        |300.00                                                    
   Resolution Unit     |Inch                                                      
   YCbCr Positioning   |centered                                                  
   Compression         |JPEG compression                                          
   x-Resolution        |72.00                                                     
   y-Resolution        |72.00                                                     
   Resolution Unit     |Inch                                                      
   FNumber             |f/3.2                                                     
   Exif Version        |Exif Version 2.2                                          
   Date and Time (origi|2011:01:24 12:15:01                                       
   Date and Time (digit|2011:01:24 12:15:01                                       
   ComponentsConfigurat|Y Cb Cr -                                                 
   Aperture            |3.20 EV (f/3.0)                                           
   Light Source        |0                                                         
   Flash               |Flash did not fire, auto mode.                            
   Focal Length        |4.9 mm                                                    
   FlashPixVersion     |FlashPix Version 1.0                                      
   Color Space         |sRGB                                                      
   PixelXDimension     |2048                                                      
   PixelYDimension     |1536                                                      
   Custom Rendered     |Normal process                                            
   Exposure Mode       |Auto exposure                                             
   White Balance       |Auto white balance                                        
   Digital Zoom Ratio  |1.00                                                      
   Scene Capture Type  |Standard                                                  
   GPS tag version     |0x02, 0x02, 0x00, 0x00

For videos there is exiftool at http://www.sno.phy.queensu.ca/~phil/exiftool/ Perl library.

There is also https://github.com/DIA-NZ/Metadata-Extraction-Tool the National Library of New Zealand wrote in Java and XML since 2003 at http://meta-extractor.sourceforge.net/

On-line from a browser are:

http://readexifdata.com/
http://regex.info/exif.cgi displays the day for a file URL specified. See http://regex.info/blog/other-writings/online-exif-image-data-viewer

BTW, http://www.dpreview.com/forums/1005 has photos.

Geo names from coordinates

Obtain from Nominatim OpenStreetMap Web Service

https://exposingtheinvisible.org/resources/obtaining-evidence/image-digging provides a Ruby script to process a list of image files.

Next, display the tag in a Google Map

Facial recognition

Does the image contain one or more faces?

https://github.com/danilop/docker-opencv

Archival to AWS Glacier

Amazon Glacier provides extremely low-cost storage for data archiving and backup. Objects (or archives, as they are known in Amazon Glacier) are optimized for infrequent access, for which retrieval times of several hours are adequate.

Resources

https://speakerdeck.com/michaelwittig/the-life-of-a-serverless-microservice-on-aws Michael Wittig (@hellomichibye, mwittig@tecracer.de):

https://cloudonaut.io/serverless-image-resizing-at-any-scale/
https://github.com/michaelwittig/devopscon16-auth-service
https://github.com/michaelwittig/devopscon16-profile-service
https://github.com/michaelwittig/devopscon16-location-service
CodePileline/CodeCommit available in us-east-1 only (as of June 2016).

(save 39% using code ctwdevopstw) manning.com/wittig/

Wilson Mar