The goal of Salty Wet Man is to flag inappropriate online content to make the internet a safer and more inclusive space for everyone.
- Motivation
- Technical Solution
- Convolutional Neural Networks
- Object Recognition
- NSFW Object Recognition: Content-Based Retrieval via Localization
- NSFW Object Recognition: Image Cropping
- Neural Network: Classifier Model
- Neural Network: Errors and Overfitting
- Technical Installations
- Technical Visualizations
- Technical User Privacy Considerations
- References
In chess, each player starts with 16 pieces of 6 types. Each type has its own moves, and the goal is to trap the opponent's King in "checkmate". What is the most powerful piece on the chessboard? Many people will say the King or the Queen because they hold the highest rank. However, I believe the most powerful are the eight Pawns (the lowest rank). Through pawn promotion, each Pawn can become a Queen, Rook, Bishop, or Knight. We therefore need to nurture and protect them throughout the game, as they are the seeds of the future.
Being online can astronomically magnify threats and risks that vulnerable children already face offline.
Children are increasingly exposed to digital media and online technology at an early age. They go online to do schoolwork, play games, and socialize, joining the more than 4 billion people connected to the internet (1 in 3 of them children). Around 60% of fourth to eighth graders have access to phones or tablets, and almost half of them have access to a computer in their bedrooms.
Internet access exposes children to online predators, sexual abuse and exploitation, cyberbullying, harmful or inappropriate content, and the use and sharing of their personal data. The COVID-19 pandemic and its lockdown measures led to widespread school closures and physical distancing, increasing our dependence on technology to connect. Law enforcement authorities and reporting agencies have seen a statistically significant increase in the amount of child sexual abuse material shared online, an ever-increasing percentage of which is self-generated.
Innovation at UNICEF is about doing new things to solve problems and improve the lives of children around the world. Technological solutions like Online Protection Tools are key to responding efficiently to the digital risks children face. UNICEF defines four categories of digital risk: Content, Contact, Conduct, and Contract. This project focuses on Content Risks, defined as exposure to harmful or age-inappropriate content such as pornography, child sexual abuse material, hate speech and extremism, discriminatory or hateful content, disinformation, online games, gambling, content that endorses risky or unhealthy behaviours, and violent content that may be upsetting or show criminal activity.
Defining NSFW material is subjective, and the task of identifying these images is non-trivial.
Salty Wet Man treats image identification as a binary classification (success/failure) problem:
[SFW] positive class, trained on neutral images that are safe for work
[NSFW] negative class, trained on inappropriate images that are not safe for work
Image Datasets
- In theory, a CNN is the best fit because of its large learning capacity and complexity
- Stationarity of statistics
- Locality of pixel dependencies
NSFW Images
- Static images
- Uncontrolled backgrounds
- Multiple people and partial figures
- Different camera angles
GPU Implementation
- Heavy computation required - Size of the CNN is limited by the GPU memory available
- Highly optimized implementation of 2D convolutions
- Solution: spread the network over multiple GPUs via parallel processing
Deep Learning's Impact on Computer Vision
Labeled Image-Training Datasets
- Small image datasets (order of tens of thousands of images) - MNIST digit recognition, where the best error rates approach human performance
- Large image datasets (hundreds of thousands to millions of images) - ImageNet
ImageNet used for Large Scale Object Recognition
- Dataset of over 15 million labeled images
- Variable-resolution images, down-sampled to a fixed 256x256 resolution
- Training, validation, and testing images
- Benchmark - ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
Image Location with Large Areas of Skin-colored Regions
Skin region properties - image, color, and texture
Input RGB values (skin spatial pixels) are converted to a log-opponent representation
- L(x) = 105 * log10(x + 1 + n), where n is a random noise value in [0, 1)
- I = L(G)
- Rg = L(R) - L(G)
- By = L(B) - (L(G) + L(R)) / 2
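The log-opponent conversion above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: the function names and the optional `noise` argument (which pins n to a fixed value so results are reproducible) are assumptions made for the example.

```python
import math
import random


def log_transform(x, noise=None):
    """L(x) = 105 * log10(x + 1 + n).

    `noise` is a hypothetical convenience argument: pass a fixed n,
    or leave it as None to draw n uniformly from [0, 1).
    """
    n = random.random() if noise is None else noise
    return 105.0 * math.log10(x + 1.0 + n)


def log_opponent(r, g, b, noise=None):
    """Map one RGB pixel (0-255 per channel) to (I, Rg, By)."""
    l_r = log_transform(r, noise)
    l_g = log_transform(g, noise)
    l_b = log_transform(b, noise)
    intensity = l_g                   # I  = L(G)
    rg = l_r - l_g                    # Rg = L(R) - L(G)
    by = l_b - (l_g + l_r) / 2.0      # By = L(B) - (L(G) + L(R)) / 2
    return intensity, rg, by


if __name__ == "__main__":
    # A neutral grey pixel has no colour opponency: Rg and By are both 0.
    print(log_opponent(9, 9, 9, noise=0.0))
```

For a grey pixel all three channels map to the same L value, so only the intensity term is non-zero; skin tones show up as positive Rg values.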
The intensity image (texture) is smoothed with a median filter, and the smoothed result is then subtracted from the original image
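As a rough sketch of that texture step, the snippet below median-filters a small intensity image held as a list of lists and subtracts the result from the original. The window radius, the cropping of windows at the image border, and taking the absolute value of the difference are all assumptions for this example, and the function names are hypothetical.

```python
from statistics import median


def median_filter(image, radius=1):
    """Median-filter a 2D list with a (2*radius+1)-wide square window.

    Windows are cropped at the image border (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median(window))
        out.append(row)
    return out


def texture_amplitude(intensity, radius=1):
    """Absolute difference between the image and its median-smoothed copy."""
    smoothed = median_filter(intensity, radius)
    return [
        [abs(orig - smooth) for orig, smooth in zip(orig_row, smooth_row)]
        for orig_row, smooth_row in zip(intensity, smoothed)
    ]


if __name__ == "__main__":
    spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    print(texture_amplitude(spike))
```

Smooth regions such as skin give values near zero, while isolated detail (the single bright pixel in the example) survives the subtraction.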
Query By Image Content (QBIC)
- Abstraction of an image to search for colored textured regions
- Uses image decomposition, pattern matching, and clustering algorithms
- Find a set of images similar to a query image
Elongated Regions Grouping
- Group 2D and 3D constraints on body/skin regions
- Model the human body as cylindrical parts within a skeleton geometry
- Identify region outline
Classify Regions into Human Limbs
- Geometric grouping algorithms - matching view to collection of images of an object
- Hypothesize that an object is present and estimate its appearance via a feature vector computed from the compressed image
- Minimum distance classifier to match feature vectors
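A minimum distance classifier of the kind listed above can be sketched as follows: each class is summarized by the mean of its training feature vectors, and a new vector is assigned to the class with the nearest mean. The Euclidean metric, the two-dimensional features, and the labels `limb` and `background` are illustrative assumptions, not details from the project.

```python
import math


def class_means(samples):
    """Compute one mean feature vector per class.

    `samples` maps a label to a list of equal-length feature vectors.
    """
    means = {}
    for label, vectors in samples.items():
        dim = len(vectors[0])
        means[label] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return means


def classify(vector, means):
    """Return the label whose mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(vector, means[label]))


if __name__ == "__main__":
    # Hypothetical 2D features, e.g. (elongation, texture amplitude).
    training = {
        "limb": [[4.0, 1.0], [6.0, 1.0]],
        "background": [[1.0, 1.0], [1.0, 3.0]],
    }
    means = class_means(training)
    print(classify([4.5, 1.2], means))
```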
Object Image Segmentation
- Group together skin pixels
- Normalized cut
Label each pixel of the input image with a category
- For every pixel - Check whether the pixel is [skin] or [not-skin]
If at least 30% of the image area is skin, the image passes the skin filter
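Given a per-pixel skin mask, the 30% rule reduces to a simple ratio test. The sketch below assumes the mask is a 2D list of booleans (True for skin); how the mask itself is produced is outside this example, and the function names are hypothetical.

```python
def skin_fraction(mask):
    """Fraction of pixels marked as skin in a 2D boolean mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    skin = sum(1 for row in mask for pixel in row if pixel)
    return skin / total


def passes_skin_filter(mask, threshold=0.30):
    """True when at least `threshold` of the image area is skin."""
    return skin_fraction(mask) >= threshold


if __name__ == "__main__":
    mask = [
        [True, True, False, False, False],
        [True, True, False, False, False],
    ]
    print(skin_fraction(mask), passes_skin_filter(mask))
```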
Training data for this is very expensive - every pixel of every training image must be labeled
How would Salty Wet Man choose the image crops?
Brute force image cropping - sliding window approach (Bad)
Region proposals
- Look for edges and draw boxes around them
Region detection without proposals
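To see why the brute-force sliding window approach is considered bad, it helps to count the crops it produces. The sketch below enumerates square crop boxes over an image; the window size, the stride, and the `(left, top, right, bottom)` box convention are assumptions chosen for illustration.

```python
def sliding_windows(width, height, window, stride):
    """Yield (left, top, right, bottom) boxes for a square sliding window."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield (left, top, left + window, top + window)


if __name__ == "__main__":
    boxes = list(sliding_windows(640, 480, window=224, stride=32))
    print(len(boxes))
```

Even at a single scale, a 640x480 image with a 224-pixel window and a stride of 32 yields 126 crops, each of which would need its own pass through the classifier. Adding more scales or a finer stride multiplies that cost, which is the motivation for region proposals.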
- VGG16 is a CNN for large-scale image recognition
- Model achieves 92.7% top-5 test accuracy on ImageNet
- Implemented with Keras and a TensorFlow backend in this project
- Fixed input of 224 x 224 RGB image
- Three fully-connected (FC) layers
- 4096, 4096, and 1000 channels respectively
- Max pooling layers
- Hidden layers use ReLU (rectified linear unit) activations
- Final layer is a softmax layer
- 16 weight layers in total
- Very slow to train from scratch (takes weeks)
- Large disk/bandwidth footprint: the network architecture weighs in at over 533MB
- Consider the VGG19 variant as a classifier
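The size of the model follows from simple arithmetic over its layers. The sketch below counts weights and biases, assuming the standard VGG16 layout from the Simonyan and Zisserman paper: 13 convolutional layers with 3x3 kernels, five 2x2 max-pooling stages that shrink the 224x224 input to 7x7, and the three fully-connected layers listed above. The channel widths come from that paper, not from this project's code.

```python
# (input channels, output channels) for the 13 convolutional layers, all 3x3.
CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

# (input units, output units) for the three fully-connected layers.
# Five 2x2 max-pools reduce 224 to 7, so the first FC layer sees 7 * 7 * 512 values.
FC_SIZES = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def vgg16_parameter_count():
    """Total number of weights and biases across all 16 weight layers."""
    conv = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in CONV_CHANNELS)
    fc = sum(f_in * f_out + f_out for f_in, f_out in FC_SIZES)
    return conv + fc


if __name__ == "__main__":
    params = vgg16_parameter_count()
    print(params)                         # number of parameters
    print(round(params * 4 / 2**20))      # size in MiB at 4 bytes per float32
```

Roughly 138 million float32 parameters come to about half a gigabyte, which is the same order as the size quoted above. Most of that weight sits in the first fully-connected layer.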
Keras Implementation
keras.applications.vgg16.VGG16(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)
Data Augmentation
Label-preserving transformations
- Transformed images are generated on the fly, so they do not need to be stored on disk
- Image translation and horizontal reflections
- Image captioning using PyTorch
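A minimal sketch of translation and horizontal reflection on an image stored as a list of pixel rows is shown below. Translation is implemented here as a random crop, which is an assumption about how the shift is applied; the function names, the 50% flip probability, and the `rng` argument are likewise illustrative rather than taken from the project.

```python
import random


def horizontal_flip(image):
    """Mirror every row of the image left to right."""
    return [row[::-1] for row in image]


def crop(image, left, top, size):
    """Extract a size x size patch whose top-left corner is (left, top)."""
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, size, rng):
    """Return a randomly translated (cropped) and possibly mirrored patch.

    The label of the image is unchanged by either transformation.
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    patch = crop(image, left, top, size)
    return horizontal_flip(patch) if rng.random() < 0.5 else patch


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(augment(image, 2, random.Random(0)))
```

Because each patch is computed when it is needed, no augmented copy has to be written to disk.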
RGB channel intensities
- Add multiples of the principal components of the RGB covariance matrix to each image pixel
- Object identity is invariant to changes in the intensity/colour of images
Dropout Rates
- ReLU neurons
- Dropout is used for first two fully-connected (FC) layers (4096 and 4096)
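Dropout on a layer's activations can be sketched as below. This example uses the "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at inference time; that choice, the 0.5 rate used in the demo, and the function signature are assumptions for illustration rather than the project's configuration.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` while training.

    Kept activations are divided by (1 - rate) so the expected sum is
    unchanged (inverted dropout). At inference the input passes through.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    fc_output = [1.0] * 8           # stand-in for a 4096-wide FC layer
    print(dropout(fc_output, 0.5, rng))
    print(dropout(fc_output, 0.5, rng, training=False))
```

Randomly silencing neurons in the two 4096-wide layers prevents them from co-adapting, which is why dropout helps against overfitting in the largest part of the network.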
Requires heavy computation
Install Python dependencies and packages (Keras, TensorFlow, and TensorFlow.js) - best to run from virtualenv
Download and convert the VGG16 model to TensorFlow.js format
Launch Node.js script to load converted model and compute maximally-activating input images for convnet's filters using gradient ascent in the input space. Save image files under
directory -
Launch Node.js script to calculate internal convolutional layers' activations and gradient-based Class Activation Map (CAM). Save image files under
directory -
Compile. Launch web view at
yarn visualize
Increase the number of filters to visualize per convolutional layer from the default 8 to a larger value (e.g. 18):
yarn visualize --gpu --filters 18
Default image used for internal-activation and CAM visualization is "nsfw.jpg". Switch to another image by using the "--image waifu-pic.jpeg" 👀
yarn visualize --image waifu-pic.jpeg
HTML5 Local Storage Data
- Salty Wet Man cache stores data on user's local device
- Data.js information is removed when user clears cache
- localStorage.setItem('game_state', JSON.stringify(gameState));
User.js File
- User.js file added for user privacy
- Template for configuring privacy and security
- Reduce exposure to web analytics, tracking, fingerprinting, and shoulder surfers
- Harden browser settings against data disclosure or code execution vulnerabilities
- Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition.
- Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks.
- Yahoo Engineering's Caffe DL library and CaffeOnSpark model.
- CS231n Computer Vision at Stanford University School of Engineering. Fei-Fei Li.
- Gabriel Goh. Image Synthesis from Yahoo's open_nsfw.
- Client-Side NSFW Classification.
- Ring-Filter image processing algorithm for Order Statistics.
- Mask R-CNN framework for object instance segmentation.
- Margaret M. Fleck, David A. Forsyth, Chris Bregler. Finding Naked People. 1996.
- PyTorch tutorial review.
- ImageNet training in PyTorch.
- PyTorch Image Models.
- Image Captioning PyTorch.
- PyTorch Visualizations. Implementation of convolutional neural network.
- Facebook AI Research. "Detectron2". Object detection and segmentation using PyTorch.
- Facebook AI Research. "Faster R-CNN and Mask R-CNN in PyTorch 1.0". Creating detection and segmentation models using PyTorch.