Wilson Mar

Run one after another, independently


Overview

Here is a design to process files containing pictures using AWS Lambda functions calling various 3rd party APIs.

This roadmap is designed for an agile “minimum viable product” approach.

The advantage of doing this on AWS S3 rather than with Gulp.js is that it is accessible to many more people than only those who can set up a Gulp server.

Roadmap

The first three steps are described in qwikLabs:

  1. Set up a Lambda function to be invoked upon S3 bucket file upload
  2. Manually drop a file into an S3 bucket (or Amazon Drive)
  3. Trigger invokes Lambda function
  4. Add to list in DynamoDB

  5. Apply initial image recognition task (the extent to which the image contains nudity, etc.)
  6. Update DynamoDB with results and timings

    If acceptable, continue:

  7. Obtain images from alternate source

  8. DynamoDB change triggers additional processing
  9. Detect and add invisible watermark
  10. Generate thumbnails and other size images
  11. Compress image file (using TinyPng API)
  12. Store image in AWS CloudFront
  13. Generate HTML with img sizes in amp-img format for return

    More image processing:

  14. Facial recognition (does the image contain one or more faces?)
  15. Extract geotags from photos

    Automate input:

  16. Generate pre-signed S3 URLs
  17. Desktop website to accept drops of files to store in S3
  18. Mobile app to accept upload of files to store in S3

    Add capacity management features:

  19. Automate saving of multiple files into S3 by another (test) program
  20. Set up DNS across two Availability Zones
  21. Set up replication of DynamoDB across Availability Zones

    Add auto-discovery features:

  22. Scan through HTML to extract local images to convert
  23. Replace image file URL in HTML
  24. Commit change into GitHub

    Add management features:

  25. Filter log group/stream to generate metric
  26. Display metrics from DynamoDB on Tableau
  27. Set up CloudWatch alarms on metrics
  28. Email once a day with metrics summary (cron)

    Add archival features:

  29. DynamoDB change triggers additional processing
  30. Archive file to AWS Glacier
  31. Update DynamoDB about archival and file deletion
  32. Remove file from S3

    Process files in other clouds:

  33. Process image files in Amazon Cloud Drive (https://www.amazon.com/clouddrive/)
  34. Process image files in Dropbox
  35. Process image files in Microsoft Azure
  36. Process image files in Google Drive

    Additional notification features:

  37. SNS to IFTTT to invoke IoT https://github.com/danilop/SNS2IFTTT
  38. Zapier notification
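
Steps 1 through 4 of the roadmap can be sketched in Python as below. This is only a minimal sketch, not a finished function: the table name "Pictures", its key "PictureUrl", and the attribute "SizeBefore" are names I made up for illustration. The shape of the event is what S3 sends to Lambda when a file lands in the bucket.

```python
import urllib.parse


def uploaded_objects(event):
    """Return (bucket, key, size) for each object named in an S3 event."""
    found = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue  # not an S3 notification record
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (a space becomes "+").
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        size = s3["object"].get("size", 0)
        found.append((bucket, key, size))
    return found


def handler(event, context):
    """Lambda entry point: add each uploaded picture to a list in DynamoDB."""
    import boto3  # imported here so the parsing above runs without the SDK

    # "Pictures" is a hypothetical table, assumed to be keyed on PictureUrl.
    table = boto3.resource("dynamodb").Table("Pictures")
    objects = uploaded_objects(event)
    for bucket, key, size in objects:
        table.put_item(Item={"PictureUrl": f"s3://{bucket}/{key}",
                             "SizeBefore": size})
    return {"listed": len(objects)}
```

Keeping the event parsing in its own function means it can be checked locally with a sample event before anything is deployed.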

Let’s work on these together! Contact me.


Pre-signed S3 URLs

This Amazon doc notes that normally, S3 buckets are considered private. Permissions need to be assigned to those who want to upload.

This Amazon doc notes that S3 URLs can be pre-signed with permissions to upload files by providing:

  • security credentials,
  • a bucket name and object key,
  • the HTTP PUT method (uploading objects), and
  • an expiration date and time
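
The four inputs above map onto a single SDK call. Here is a minimal sketch using boto3 (the AWS SDK for Python), which is my choice and not one of the languages in the Amazon doc. The function names and the 10-minute default are assumptions for illustration. The security credentials are not passed in: boto3 picks them up from the environment or the IAM role.

```python
def presign_arguments(bucket, key, minutes=10):
    """Build the arguments for a pre-signed upload (HTTP PUT) URL."""
    seconds = int(minutes * 60)
    if seconds <= 0:
        raise ValueError("expiration must be in the future")
    return {
        "ClientMethod": "put_object",              # the S3 operation being allowed
        "Params": {"Bucket": bucket, "Key": key},  # bucket name and object key
        "ExpiresIn": seconds,                      # lifetime of the URL in seconds
        "HttpMethod": "PUT",                       # uploads use HTTP PUT
    }


def presigned_upload_url(bucket, key, minutes=10):
    """Return a URL that lets its holder upload one object until it expires."""
    import boto3  # imported here so presign_arguments runs without the SDK

    s3 = boto3.client("s3")  # credentials come from the environment or role
    return s3.generate_presigned_url(**presign_arguments(bucket, key, minutes))
```

Whoever holds the returned URL uploads by sending the file bytes in an HTTP PUT request to it. No AWS credentials are needed on their side.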

BTW: Visual Studio users can manually obtain a pre-signed object URL without writing any code by using the Visual Studio AWS Explorer.

NOTE: Pre-signed S3 URLs expire. A short lifetime such as 10 minutes is typical. With Signature Version 4 the longest lifetime allowed is seven days.

Code for pre-signing is provided in .NET, Ruby, and Java. In the AWS SDK for Java, the generatePresignedUrl method of the AmazonS3 class takes an object of the GeneratePresignedUrlRequest class. TODO: write a CreateFunction function to pre-sign.

TODO: Embed the pre-signed S3 URL in the HTML for presentation in a webpage where visitors can drag and drop files.

But the website should ask for (and validate) email addresses.

Add List in Dynamo

Custom image collection

From Lithium

  • Fetching the latest photos posted to the community
  • Fetching the latest photos posted to a node
  • Fetching photos with most kudos
  • Fetching top photo contributors
  • Fetching photos uploaded to a thread
  • Fetching photos posted by a user

Add Watermark

A digital watermark is a code embedded in the picture file that is used to prove copyright. The code is usually cryptographically signed.

Neutrinoapi offers an on-line API to add a visible watermark (using Alpha blending). It converts various image formats automatically (GIF, ICO, JPEG, PNG, TIFF).

A repeatable automated batch process (AP) is needed, if only to iterate when some programs don’t work. The watermarking feature of OpenStego (a Java program) is still in beta, and one person found watermarked files to be much smaller than the original, with a noticeable quality loss.

Several organizations provide manual GUI to add invisible watermarks, one file at a time.

  • http://www.adptools.com/en/signmyimage-description.html

  • http://www.phibit.com/icemark/ $50

  • https://www.digimarc.com/products/guardian/images/photoshop-plug-in Digimarc Guardian in Adobe Photoshop

  • http://louisem.com/1912/free-watermark-software-watermark-online

Image Processing: Nudity Check

https://algorithmia.com/algorithms/sfw/NudityDetection Algorithmia.com

Swagger for service:

NOTE: Does not work in black and white though.

Training cases:

1) nude: True, confidence: 0.93
https://s3.amazonaws.com/www.isitnude.com/assets/images/sample/obama.jpg

2) nude: false, confidence: 0.95
http://www.isitnude.com.s3-website-us-east-1.amazonaws.com/assets/images/sample/young-man-by-the-sea.jpg
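
Whatever service does the check, its answer needs to be turned into the fields stored for each picture. Below is a minimal sketch that assumes the result arrives as a dictionary shaped like the two cases above (nude, confidence). The 0.90 threshold and the "Acceptable" field are my own assumptions, not part of the Algorithmia service.

```python
def nudity_fields(result, threshold=0.90):
    """Convert a {"nude": ..., "confidence": ...} result into stored fields."""
    # Accept both booleans and the strings "True"/"false" seen in samples.
    nude = str(result["nude"]).strip().lower() == "true"
    confidence = float(result["confidence"])
    return {
        "ContainsNudity": nude,
        # Stored as a whole-number percentage (0 to 100).
        "ContainsNudityConfidence": round(confidence * 100),
        # Hypothetical rule: reject only when nudity is reported confidently.
        "Acceptable": not (nude and confidence >= threshold),
    }
```

A picture reported as nude with low confidence passes this rule, so the threshold deserves tuning against real samples before relying on it.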

Store image in AWS CloudFront

An example of a resource in CloudFront is

http://d36cz9buwru1tt.cloudfront.net/AWS_Disaster_Recovery.pdf

Update Dynamo

DynamoDB is a NoSQL database containing key-value pairs.

Store in DynamoDB:

  • Picture URL
  • GUID to associate with other data

  • Date/Time of entry

For each picture:

  • Size of picture before compression
  • Size of picture after compression
  • Width of picture
  • Height of picture
  • Method of scaling

  • ContainsNudity: true/false
  • ContainsNudityConfidence: 0 to 100%.
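
As a rough sketch, the fields listed above could be assembled into one DynamoDB item like this. The attribute names and the "Pictures" table (assumed to be keyed on PictureUrl) are hypothetical. One thing to watch: boto3 refuses Python float values in items, so this sketch keeps every number as an integer.

```python
import uuid
from datetime import datetime, timezone


def picture_item(url, size_before, size_after, width, height,
                 scaling, contains_nudity, confidence_pct):
    """Assemble one item holding the facts recorded about a picture."""
    return {
        "PictureUrl": url,
        "PictureId": str(uuid.uuid4()),  # GUID to associate with other data
        "EnteredAt": datetime.now(timezone.utc).isoformat(),
        "SizeBefore": int(size_before),  # bytes before compression
        "SizeAfter": int(size_after),    # bytes after compression
        "Width": int(width),
        "Height": int(height),
        "ScalingMethod": scaling,
        "ContainsNudity": bool(contains_nudity),
        "ContainsNudityConfidence": int(confidence_pct),  # 0 to 100
    }


def save_picture(item, table_name="Pictures"):
    """Write the item to DynamoDB (the table name is a placeholder)."""
    import boto3  # imported here so picture_item runs without the SDK

    boto3.resource("dynamodb").Table(table_name).put_item(Item=item)
```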

Other information on all AWS activity logs:

  • instance-id,
  • region,
  • availability-zone
  • environment (staging, production, etc),

Having logs outside each server makes it unnecessary to SSH into individual servers and enables trends across servers and other attributes to be analyzed.

This blogger says:

“If you have to SSH into your servers, then your automation has failed. This is both the most frightening and yet most useful thing I’ve learned.”

  • Discussion on this on Hacker News: https://news.ycombinator.com/item?id=7173361

Dynamo Triggers of actors

The workflow design concern here is that, with so many steps, we want to avoid a “master” program hanging around waiting for steps to complete just to hand off to another function.

One option is to have each Lambda function exit right after invoking a new function that runs independently, possibly written in another programming language.

This would also enable fine-grained management of individual functions.
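
One way to hand off without waiting is an asynchronous invocation, which the Lambda API calls the "Event" invocation type. This is only a sketch under that assumption. The function name in the usage note is made up.

```python
import json


def next_invocation(function_name, payload):
    """Arguments for a fire-and-forget invocation of the next function."""
    return {
        "FunctionName": function_name,
        "InvocationType": "Event",  # queue the call and return at once
        "Payload": json.dumps(payload).encode("utf-8"),
    }


def hand_off(function_name, payload):
    """Invoke the next step, then let the current function finish."""
    import boto3  # imported here so next_invocation runs without the SDK

    response = boto3.client("lambda").invoke(
        **next_invocation(function_name, payload))
    return response["StatusCode"]  # 202 means accepted for asynchronous run
```

A step would end with something like hand_off("resize-image", {"key": "cat.jpg"}) and then return. The calling function's role needs permission to invoke the next one.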

See: http://stackoverflow.com/questions/31714788/can-an-aws-lambda-function-call-another which suggests chaining function calls via SNS topics, as described in https://mobile.awsblog.com/post/Tx1VE917Z8J4UDY/Invoking-AWS-Lambda-functions-via-Amazon-SNS

https://java.awsblog.com/post/Tx2J2LPKTTVU93H/Invoking-AWS-Lambda-Functions-from-Java

Documentation:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html

Guided walkthrough:
https://aws.amazon.com/blogs/aws/dynamodb-update-triggers-streams-lambda-cross-region-replication-app/
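
A Lambda function attached to a DynamoDB stream receives a batch of change records, each with an event name and typed attribute values. Here is a minimal sketch of picking out new and changed items. It assumes the stream is configured to include new images, and it only flattens simple types (strings, numbers, booleans). The attribute names in any real table will differ.

```python
def simple_value(typed):
    """Unwrap one typed attribute such as {"S": "abc"} or {"N": "42"}."""
    kind, value = next(iter(typed.items()))
    if kind == "N":  # numbers travel as strings in stream records
        return int(value) if value.lstrip("-").isdigit() else float(value)
    return value  # strings and booleans pass through; nested types stay raw


def changed_pictures(event):
    """Return the new image of every inserted or modified item."""
    changes = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # skip REMOVE records
        image = record["dynamodb"].get("NewImage", {})
        changes.append({name: simple_value(v) for name, v in image.items()})
    return changes
```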

CAUTION: Lambda limits the number of concurrent invocations per account (100 at a time when this was written; AWS has since raised the default, and an increase can be requested).

API Gateway has a default limit of 1000 RPS (requests per second), which can be raised by request.


Alternative chaining via SNS

An alternative is via subscription to SNS topics.

  1. A Lambda function publishes a message to an SNS topic.
  2. Other Lambda functions subscribe to this topic, so as soon as a message arrives in the topic, the second Lambda function is executed with the message as its input parameter.
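
Both halves of this chain can be sketched in a few lines. The publisher sends a JSON message to the topic, and the subscriber unpacks it from the event SNS delivers. This is a sketch only. The choice of JSON for the message body is my assumption, and the topic ARN is supplied by the caller.

```python
import json


def publish_step_done(topic_arn, detail):
    """First function: announce that a step finished."""
    import boto3  # imported here so the parsing below runs without the SDK

    boto3.client("sns").publish(TopicArn=topic_arn, Message=json.dumps(detail))


def messages_from_sns(event):
    """Second function: decode every message in the delivered event."""
    return [json.loads(record["Sns"]["Message"])
            for record in event.get("Records", [])
            if "Sns" in record]


def handler(event, context):
    """Lambda entry point for the subscribing function."""
    for message in messages_from_sns(event):
        print("next step for", message)  # replace with the real processing
```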

Resize images to various sizes

Pictures need to be re-sized for (pixels):

  • 150x? fixed width, height is scaled as needed
  • 50x50 image is scaled to best fit within the box
  • ?x150 fixed height, width is scaled as needed

  • 108x108 PNG (with transparency) or JPG for display in the Alexa App.
  • 512x512 PNG (with transparency) or JPG for display in the Alexa App on larger screens.

  • 2560x1440 for YouTube Channel Art

ImageMagick is used.
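
The three kinds of target above (fixed width, fixed height, fit in a box) come down to picking one scale factor. The helper below is my own illustration of that arithmetic, handy for recording expected widths and heights. ImageMagick itself takes the same three cases as the geometry arguments 150x, x150, and 50x50, and its rounding may differ by a pixel from this sketch.

```python
def scaled_size(width, height, target_width=None, target_height=None):
    """Return (new_width, new_height), keeping the aspect ratio."""
    if target_width is None and target_height is None:
        raise ValueError("give a target width, a target height, or both")
    if target_width is not None and target_height is not None:
        # Fit inside the box: the tighter of the two constraints wins.
        scale = min(target_width / width, target_height / height)
    elif target_width is not None:
        scale = target_width / width    # fixed width, height follows
    else:
        scale = target_height / height  # fixed height, width follows
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 2000x1000 picture, scaled_size(2000, 1000, target_width=150) gives (150, 75), and fitting into a 50x50 box gives (50, 25).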

PROTIP: Do image compression AFTER resize?

Compression service

https://tinypng.com/developers
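
As I understand the TinyPNG developer documentation, an image is compressed by posting its bytes to the shrink endpoint, using HTTP basic authentication with the user name "api" and the API key as password. The reply carries a Location header pointing to the compressed result. Treat this as a sketch under those assumptions and check the current documentation before relying on it. The function names are mine.

```python
import base64
import urllib.request

SHRINK_URL = "https://api.tinify.com/shrink"


def shrink_request(image_bytes, api_key):
    """Build the POST request that uploads an image for compression."""
    token = base64.b64encode(f"api:{api_key}".encode("ascii")).decode("ascii")
    return urllib.request.Request(
        SHRINK_URL,
        data=image_bytes,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def compressed_location(image_bytes, api_key):
    """Send the image and return the URL of the compressed version."""
    with urllib.request.urlopen(shrink_request(image_bytes, api_key)) as reply:
        return reply.headers["Location"]
```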

Extract geotags from photos

There are several tools to read the metadata devices embed in photo files.

From JPEG images Phil Harvey’s EXIF command extracts:

   North or South Latit|N                                                         
   Latitude            |42.00, 39.00, 24.67                                       
   East or West Longitu|E                                                         
   Longitude           |23.00, 21.00, 28.45                                       
   Altitude reference  |0x00                                                      
   Altitude            |625.50
    
   Manufacturer        |Nokia                                                     
   Model               |E71                                                       
   Orientation         |top - left                                                
   x-Resolution        |300.00                                                    
   y-Resolution        |300.00                                                    
   Resolution Unit     |Inch                                                      
   YCbCr Positioning   |centered                                                  
   Compression         |JPEG compression                                          
   x-Resolution        |72.00                                                     
   y-Resolution        |72.00                                                     
   Resolution Unit     |Inch                                                      
   FNumber             |f/3.2                                                     
   Exif Version        |Exif Version 2.2                                          
   Date and Time (origi|2011:01:24 12:15:01                                       
   Date and Time (digit|2011:01:24 12:15:01                                       
   ComponentsConfigurat|Y Cb Cr -                                                 
   Aperture            |3.20 EV (f/3.0)                                           
   Light Source        |0                                                         
   Flash               |Flash did not fire, auto mode.                            
   Focal Length        |4.9 mm                                                    
   FlashPixVersion     |FlashPix Version 1.0                                      
   Color Space         |sRGB                                                      
   PixelXDimension     |2048                                                      
   PixelYDimension     |1536                                                      
   Custom Rendered     |Normal process                                            
   Exposure Mode       |Auto exposure                                             
   White Balance       |Auto white balance                                        
   Digital Zoom Ratio  |1.00                                                      
   Scene Capture Type  |Standard                                                  
   GPS tag version     |0x02, 0x02, 0x00, 0x00                                    
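
The latitude and longitude in this output are given as degrees, minutes, and seconds plus a hemisphere letter. Most mapping services want signed decimal degrees instead. Here is a small sketch of that conversion. The function name is mine.

```python
def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) and N/S/E/W to decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    return -value if ref.upper() in ("S", "W") else value


latitude = dms_to_decimal((42.00, 39.00, 24.67), "N")   # about 42.6569
longitude = dms_to_decimal((23.00, 21.00, 28.45), "E")  # about 23.3579
```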
   

For videos there is exiftool, a Perl library, at http://www.sno.phy.queensu.ca/~phil/exiftool/

There is also the Metadata Extraction Tool at https://github.com/DIA-NZ/Metadata-Extraction-Tool which the National Library of New Zealand has written in Java and XML since 2003. See http://meta-extractor.sourceforge.net/

On-line from a browser are:

  • http://readexifdata.com/

  • http://regex.info/exif.cgi displays the data for a file URL specified. See http://regex.info/blog/other-writings/online-exif-image-data-viewer

BTW, http://www.dpreview.com/forums/1005 has photos.

Geo names from coordinates

Obtain from Nominatim OpenStreetMap Web Service
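
A reverse lookup against the public Nominatim service is a single HTTP GET. The sketch below assumes the public endpoint at nominatim.openstreetmap.org and its jsonv2 output format. The User-Agent value is a placeholder, and the public service expects a real one that identifies your application, along with light usage.

```python
import json
import urllib.parse
import urllib.request


def reverse_geocode_url(lat, lon):
    """Build the reverse-geocoding URL for decimal-degree coordinates."""
    query = urllib.parse.urlencode({"format": "jsonv2", "lat": lat, "lon": lon})
    return "https://nominatim.openstreetmap.org/reverse?" + query


def place_name(lat, lon, user_agent="picture-pipeline-example"):
    """Fetch the human-readable name for a coordinate pair."""
    request = urllib.request.Request(reverse_geocode_url(lat, lon),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("display_name")
```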

https://exposingtheinvisible.org/resources/obtaining-evidence/image-digging provides a Ruby script to process a list of image files.

Next, display the tag in a Google Map

Facial recognition

Does the image contain one or more faces?

https://github.com/danilop/docker-opencv

Archival to AWS Glacier

Amazon Glacier provides extremely low-cost storage for data archiving and backup. Objects (or archives, as they are known in Amazon Glacier) are optimized for infrequent access, for which retrieval times of several hours are adequate.
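
One way to archive without calling the Glacier API directly is an S3 lifecycle rule, which has S3 move objects to the Glacier storage class once they reach a given age. This is a swapped-in technique, not necessarily what the roadmap's archival steps would end up using. The prefix, the 30-day age, and the rule ID are assumptions for illustration.

```python
def glacier_rule(prefix, days):
    """Lifecycle configuration that archives objects under a prefix."""
    return {"Rules": [{
        "ID": "archive-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},  # which objects the rule covers
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }]}


def apply_rule(bucket, prefix="processed/", days=30):
    """Install the rule. This replaces any lifecycle rules already set."""
    import boto3  # imported here so glacier_rule runs without the SDK

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=glacier_rule(prefix, days))
```

With a rule like this the object keeps its S3 key, so the DynamoDB record only needs to note that the file has been archived.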

Resources

https://speakerdeck.com/michaelwittig/the-life-of-a-serverless-microservice-on-aws Michael Wittig (@hellomichibye, mwittig@tecracer.de):

  • https://cloudonaut.io/serverless-image-resizing-at-any-scale/

  • https://github.com/michaelwittig/devopscon16-auth-service
  • https://github.com/michaelwittig/devopscon16-profile-service
  • https://github.com/michaelwittig/devopscon16-location-service

  • CodePipeline/CodeCommit available in us-east-1 only (as of June 2016).

(save 39% using code ctwdevopstw) manning.com/wittig/