Securing Images in Python With the Imagga NSFW Categorization API

Images are a common part of the content in web and mobile applications, as well as most other digital media. With images being so ubiquitous, there comes a need to ensure that the images posted are appropriate for the medium they appear on. This is especially true for any medium accepting user-generated content. Even with clear rules about what can and cannot be posted, you can never trust users to adhere to them. Whenever you run a website or other medium that accepts user-generated content, you will find that there is a need to moderate the content.

Why Moderate Content?

There are various reasons why content moderation might be in your best interest as the owner/maintainer of a digital medium. Some common ones are:

  • Legal obligations - If your application accommodates underage users, then you are obligated to protect them from adult content.
  • Brand protection - How your brand is perceived by users is important, so you might want to block some content that may negatively affect your image.
  • Protect your users - You might want to protect your users against harassment from other users. The harassment can take the form of users attacking others by posting offensive content. An example of this is Facebook’s recent techniques for combating revenge p0rn on their platform.
  • Financial - It might be in your best interest financially to moderate the content shown on your applications. For instance, if your content is somewhat problematic, other businesses might not want to associate with you by advertising on your platform or accepting you as an affiliate. For some ad networks, keeping your content clean is a rule you have to comply with if you want to use them. Google AdSense is an example of this: it strictly forbids users of the service from placing its ads on pages with adult content.
  • Platform rules - You might be forced to implement some form of content moderation if the platform your application runs on requires it. For instance, Apple requires applications to have a way of moderating and restricting user-generated content before they can be placed on the App Store, and Google also restricts apps that contain sexually explicit content.

As you can see, if your application accepts user-generated content, moderation might be a requirement that you can’t ignore. There are different ways moderation can be carried out:

  • Individual driven - an example of this is a website with admins who moderate the content. The website might either hold back any uploaded content until it has been approved by an admin, or it might display uploaded content immediately but have admins who constantly check posted content. This method tends to be very accurate in identifying inappropriate content, as the admins will most likely be clear about what is appropriate or inappropriate for the medium. The obvious problem with it is the human labour needed. Hiring moderators can get costly, especially as the application’s usage grows. Relying on human moderators can also affect the app’s user experience: a human response will always be slower than an automated one. Even if you have people working on moderation at all times, there will still be a delay in identifying and removing problematic content, and by the time it is removed, a lot of users could have seen it. On systems that hold back uploaded content until it has been approved by an admin, this delay can become annoying to users.
  • Community driven - with this type of moderation, the owner of the application puts in place features that enable the app’s users to report any inappropriate content, e.g. flagging the content. After a user flags a post, an admin is notified. This also suffers from a delay in identifying inappropriate content, both from the community (who might not act immediately after the content is posted) and from the administrators (who might be slow to respond to flagged content). Leaving moderation up to the community might also result in false positives, as content that is safe may be seen by some users as inappropriate. With a large community, you will always have differing opinions, and because many people will probably not have read the Terms and Conditions of the medium, they will not have clear-cut rules about what is and isn’t okay.
  • Automated - with this, a computer system usually using some machine learning algorithm is used to classify and identify problematic content. It can then act by removing the content or flagging it and notifying an admin. With this, there is a decreased need for human labour, but the downside is that it might be less accurate than a human moderator.
  • A mix of some or all the above methods - Each of the methods described above comes with a shortcoming. The best outcome might be achieved from combining some or all of them e.g. you might have in place an automated system that flags suspicious content while at the same time enabling the community to also flag content. An admin can then come in to determine what to do with the content.

A Look at the Imagga NSFW Categorization API

Imagga makes available the NSFW (not safe for work) Categorization API that you can use to build a system that can detect adult content. The API works by categorizing images into three categories:

  • nsfw - these are images considered not safe. Chances are high that they contain pornographic content and/or display nude bodies or inappropriate body parts.
  • underwear - this categorizes medium safe images. These might be images displaying lingerie, underwear, swimwear, etc.
  • safe - these are completely safe images with no nudity.

The API returns a confidence level for each category of a submitted image. The confidence is a percentage that indicates the probability of the image belonging to that category.

To see the NSFW API in action, we’ll create two simple programs that will process some images using the API. The first program will demonstrate how to categorize a single image while the second will batch process several images.

Setting up the Environment

Before writing any code, we’ll first set up a virtual environment. This isn’t necessary, but is recommended as it prevents package clutter and version conflicts in your system’s global Python interpreter.

First, create a directory where you’ll put your code files.

[cc lang="bash"]$ mkdir nsfw_test[/cc]

Then navigate to that directory with your Terminal application.

[cc lang="bash"]$ cd nsfw_test[/cc]

Create the virtual environment by running:

[cc lang="bash"]$ python3 -m venv venv[/cc]

We’ll use Python 3 in our code. The command above creates the virtual environment with Python 3, so the default Python version inside the environment will be version 3.

Activate the environment with (on MacOS and Linux):

[cc lang="bash"]$ source venv/bin/activate[/cc]

On Windows:

[cc lang="bash"]$ venv\Scripts\activate[/cc]
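
Once the environment is active, you can confirm that the [cci]python[/cci] command inside it points at Python 3 by running:

[cc lang="bash"]$ python --version[/cc]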

Categorizing Images

To classify an image with the NSFW API, you can either send a GET request with the image URL to the [cci]/categorizations/[/cci] endpoint or you can upload the image to [cci]/content[/cci], get back a [cci]content_id[/cci] value which you will then use in the call to the [cci]/categorizations/[/cci] endpoint. We’ll create two applications that demonstrate these two scenarios.

Processing a Single Image

The first app we’ll create is a simple web application that can be used to check if an image is safe or not. We’ll create the app with Flask.

To start off, install the following dependencies.

[cc lang="bash"]$ pip install flask flask-bootstrap requests[/cc]

Then create a folder named [cci]templates[/cci] and inside that folder, create a file named [cci]index.html[/cci] and add the following code to it.

[cc lang="html"]
{% extends "bootstrap/base.html" %}

{% block title %}Imagga NSFW API Test{% endblock %}

{% block navbar %}
{% endblock %}

{% block content %}
<div class="container">
    <form method="POST">
        <div class="form-group">
            <label for="image_url">Image URL</label>
            <input type="text" class="form-control" id="image_url" name="image_url"
                   placeholder="Paste an image URL">
        </div>
        <button type="submit" class="btn btn-default">Check Image</button>
    </form>

    {% if image_url %}
        <img src="{{ image_url }}" alt="Submitted image" width="300">
        <pre>{{ res }}</pre>
    {% endif %}
</div>
{% endblock %}
[/cc]

In the above code, we create an HTML template containing a form that the user can use to submit an image URL to the Imagga API. When the response comes back from the server, it is shown next to the processed image.

Next, create a file named [cci]app.py[/cci] in the root directory of your project and add the following code to it. Be sure to replace [cci]INSERT_API_KEY[/cci] and [cci]INSERT_API_SECRET[/cci] with your Imagga API Key and Secret. You can sign up for a free account to get these credentials. After creating an account, you’ll find these values on your dashboard:

[cc lang="python"]
from flask import Flask, render_template, request
from flask_bootstrap import Bootstrap
import os
import requests
from requests.auth import HTTPBasicAuth

app = Flask(__name__)
Bootstrap(app)

# API Credentials. Set your API Key and Secret here
API_KEY = os.getenv('IMAGGA_API_KEY', 'INSERT_API_KEY')
API_SECRET = os.getenv('IMAGGA_API_SECRET', 'INSERT_API_SECRET')

API_ENDPOINT = 'https://api.imagga.com/v1'

auth = HTTPBasicAuth(API_KEY, API_SECRET)


@app.route('/', methods=['GET', 'POST'])
def index():
    image_url = None
    res = None
    if request.method == 'POST' and 'image_url' in request.form:
        image_url = request.form['image_url']

        response = requests.get(
            '%s/categorizations/nsfw_beta?url=%s' % (API_ENDPOINT, image_url),
            auth=auth)

        res = response.json()
    return render_template('index.html', image_url=image_url, res=res)


if __name__ == '__main__':
    app.run(debug=True)
[/cc]

Every call to the Imagga API must be authenticated. Currently the only supported method for authentication is Basic. With Basic Auth, credentials are transmitted as user ID/password pairs, encoded using base64. In the above code, we achieve this with a call to [cci]HTTPBasicAuth()[/cci].
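
To make the encoding concrete, here is a minimal sketch of what [cci]HTTPBasicAuth[/cci] does under the hood. Building the header manually like this is just for illustration; in practice you should keep using [cci]HTTPBasicAuth()[/cci] as in the code above:

[cc lang="python"]
import base64

API_KEY = 'INSERT_API_KEY'
API_SECRET = 'INSERT_API_SECRET'

# Basic Auth joins the key and secret with a colon and base64-encodes the pair
credentials = base64.b64encode(
    ('%s:%s' % (API_KEY, API_SECRET)).encode('utf-8')).decode('utf-8')

# The resulting header is equivalent to what HTTPBasicAuth(API_KEY, API_SECRET) sends
headers = {'Authorization': 'Basic %s' % credentials}
[/cc]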

We then create a function that will be triggered by GET and POST requests to the [cci]/[/cci] route. If the request is a POST, we get the data submitted by the form and send it to the Imagga API for classification.

The NSFW Categorizer is one of a few categorizers made available by the Imagga API. A categorizer is used to recognize various objects and concepts. There are a couple of predefined ones available (Personal Photos and NSFW Beta), but if none of them fits your needs, we can build a custom one for you.

As mentioned previously, to send an image for classification, you send a GET request to the [cci]/categorizations/[/cci] endpoint. The [cci]categorizer_id[/cci] for the NSFW API is [cci]nsfw_beta[/cci]. You can send the following parameters with the request:

  • url: URL of an image to submit for categorization. You can provide up to 10 URLs for processing by sending multiple url parameters (e.g. [cci]?url=&url=…&url=[/cci])
  • content: You can also directly send image files for categorization by uploading the images to our [cci]/content[/cci] endpoint and then providing the received content identifiers via this parameter. As with the url parameter, you can send more than one image - up to 10 content ids by sending multiple [cci]content[/cci] parameters.
  • language: If you’d like to get a translation of the tags in other languages, you should use the language parameter. Its value should be the code of the language you’d like to receive tags in. You can apply this parameter multiple times to request tags translated into several languages. See all available languages here.
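
As an illustration, here is a minimal sketch of a request that submits several image URLs and asks for tags in a specific language. The image URLs below are placeholders:

[cc lang="python"]
import requests
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('INSERT_API_KEY', 'INSERT_API_SECRET')

# Passing a list makes requests repeat the parameter: ?url=...&url=...
params = {
    'url': [
        'https://example.com/first-image.jpg',   # placeholder URL
        'https://example.com/second-image.jpg',  # placeholder URL
    ],
    'language': 'en',
}

response = requests.get(
    'https://api.imagga.com/v1/categorizations/nsfw_beta',
    params=params,
    auth=auth)

print(response.json())
[/cc]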

After processing the request, the API sends back a JSON object holding the image’s categorization data in case of successful processing, or an error message in case there was a problem processing the image.

Below you can see the response of a successful categorization:

[cc lang="javascript"]
{
    "results": [{
        "image": "https://auto.ndtvimg.com/car-images/big/dc/avanti/dc-avanti.jpg",
        "categories": [{
            "name": "safe",
            "confidence": 99.22
        }, {
            "name": "underwear",
            "confidence": 0.71
        }, {
            "name": "nsfw",
            "confidence": 0.07
        }]
    }]
}
[/cc]

Note that you might not always get JSON with the three categories displayed. If the confidence of a category is [cci]0[/cci], this category will not be included in the JSON object.

Below you can see the response of a failed categorization.

[cc lang="javascript"]
{
    "results": [],
    "unsuccessful": [{
        "reason": "An error prevented image from being categorized. Please try again.",
        "image": "http://www.axmag.com/download/pdfurl-guide.pdf"
    }]
}
[/cc]

Back to our app, you can save your code and run it with:

[cc lang="bash"]
$ python app.py
[/cc]

If you navigate to http://127.0.0.1:5000/ you should see a form with one input field. Paste in the URL of an image and submit it. The image will be processed and you will get back a page displaying the image and the JSON returned from the server. To keep it simple, we just display the raw JSON, but in a more sophisticated app, it would be parsed and used to make some decision.

Below, you can see the results of some images we tested the API with.

As you can see, the images have been categorized quite accurately. The first two have [cci]safe[/cci] confidence scores of [cci]99.22[/cci] and [cci]99.23[/cci] respectively while the last one has an [cci]underwear[/cci] score of [cci]96.21[/cci]. Of course, we can’t show an [cci]nsfw[/cci] image here on this blog, but you are free to test that on your own.

To determine the right confidence threshold for your app, you should first test the API with several images. Once you look at the results for a variety of images, you will be able to judge which value to check against in your code when separating acceptable from unacceptable images. If you are still not sure, our suggestion is to set the confidence threshold at 15-20%. However, if you’d like to be stricter about the results, setting the confidence threshold at 30% might do the trick.
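
As an example, here is a minimal sketch of how such a threshold check could look in code. The helper function and the 20% threshold are our own illustrative choices, not part of the API:

[cc lang="python"]
# Illustrative threshold -- tune it against your own test images
NSFW_THRESHOLD = 20.0

def is_image_safe(api_response, threshold=NSFW_THRESHOLD):
    """Return False if any result's nsfw confidence meets the threshold."""
    for result in api_response.get('results', []):
        for category in result.get('categories', []):
            # Categories with a confidence of 0 are omitted from the response,
            # so only check the ones that are present
            if category['name'] == 'nsfw' and category['confidence'] >= threshold:
                return False
    return True
[/cc]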

You should know that the technology is far from perfect and that the NSFW API is still in beta. From time to time, you might get an incorrect classification.

Note that the API has a limit of 5 seconds for downloading the image. If the limit is exceeded with the URL you send, the analysis will be unsuccessful. If you find that most of your requests are unsuccessful due to a timeout error, we suggest uploading the images to our [cci]/content[/cci] endpoint first (which is free and not counted towards your usage) and then using the returned content ids to submit the images for processing via the [cci]content[/cci] parameter. We’ll see this in action in the next section.

Batch Processing Several Images

The last app we created allowed the user to process one image at a time. In this section, we are going to create a program that can batch process several images. This won’t be a web app; it will be a simple script that you can run from the command line.

Create a file named [cci]upload.py[/cci] and add the code below to it. If you are still using the virtual environment created earlier, then the needed dependencies have already been installed, otherwise install them with [cci]pip install requests[/cci].

[cc lang="python"]
import os
import requests
from requests.auth import HTTPBasicAuth

# API Credentials. Set your API Key and Secret here
API_KEY = os.getenv('IMAGGA_API_KEY', 'INSERT_API_KEY')
API_SECRET = os.getenv('IMAGGA_API_SECRET', 'INSERT_API_SECRET')

API_ENDPOINT = 'https://api.imagga.com/v1'
FILE_TYPES = ['png', 'jpg', 'jpeg', 'gif']


class ArgumentException(Exception):
    pass


if API_KEY == 'INSERT_API_KEY' or \
        API_SECRET == 'INSERT_API_SECRET':
    raise ArgumentException('You haven\'t set your API credentials. '
                            'Edit the script and set them.')

auth = HTTPBasicAuth(API_KEY, API_SECRET)


def upload_image(image_path):
    if not os.path.isfile(image_path):
        raise ArgumentException('Invalid image path')

    # Open the desired file
    with open(image_path, 'rb') as image_file:
        filename = image_file.name

        # Upload the multipart-encoded image with a POST
        # request to the /content endpoint
        content_response = requests.post(
            '%s/content' % API_ENDPOINT,
            auth=auth,
            files={filename: image_file})

        # Example /content response:
        # {'status': 'success',
        #  'uploaded': [{'id': '8aa6e7f083c628407895eb55320ac5ad',
        #                'filename': 'example_image.jpg'}]}
        uploaded_files = content_response.json()['uploaded']

        # Get the content id of the uploaded file
        content_id = uploaded_files[0]['id']

    return content_id


def check_image(content_id):
    # Using the content id, make a GET request to the
    # /categorizations/nsfw_beta endpoint to check if the image is safe
    params = {
        'content': content_id
    }
    response = requests.get(
        '%s/categorizations/nsfw_beta' % API_ENDPOINT,
        auth=auth,
        params=params)

    return response.json()


def parse_arguments():
    import argparse
    parser = argparse.ArgumentParser(
        description='Checks images in a folder with the NSFW API')

    parser.add_argument(
        'input',
        metavar='<input folder>',
        type=str,
        nargs=1,
        help='The input - a folder containing images')

    parser.add_argument(
        'output',
        metavar='<output folder>',
        type=str,
        nargs=1,
        help='The output - a folder to output the results')

    args = parser.parse_args()
    return args


def main():
    import json
    args = parse_arguments()

    tag_input = args.input[0]
    tag_output = args.output[0]

    results = {}
    if os.path.isdir(tag_input):
        images = [filename for filename in os.listdir(tag_input)
                  if os.path.isfile(os.path.join(tag_input, filename)) and
                  filename.split('.')[-1].lower() in FILE_TYPES]

        images_count = len(images)
        for iterator, image_file in enumerate(images):
            image_path = os.path.join(tag_input, image_file)
            print('[%s / %s] %s uploading' %
                  (iterator + 1, images_count, image_path))
            try:
                content_id = upload_image(image_path)
            except IndexError:
                continue
            except KeyError:
                continue
            except ArgumentException:
                continue

            nsfw_result = check_image(content_id)
            results[image_file] = nsfw_result
            print('[%s / %s] %s checked' %
                  (iterator + 1, images_count, image_path))
    else:
        raise ArgumentException(
            'The input directory does not exist: %s' % tag_input)

    if not os.path.exists(tag_output):
        os.makedirs(tag_output)
    elif not os.path.isdir(tag_output):
        raise ArgumentException(
            'The output folder must be a directory')

    for image, result in results.items():
        with open(
                os.path.join(tag_output, 'result_%s.json' % image),
                'wb') as results_file:
            results_file.write(
                json.dumps(
                    result, ensure_ascii=False, indent=4).encode('utf-8'))

    print('Done. Check your selected output directory for the results')


if __name__ == '__main__':
    main()
[/cc]

We use the [cci]argparse[/cci] module to parse arguments from the command line. The first argument passed in will be the path to a folder containing images to be processed while the second argument is a path to a folder where the results will be saved.

For each image in the input folder, the script uploads it with a POST request to the [cci]/content[/cci] endpoint. After getting a content id back, it makes another call to the [cci]/categorizations/[/cci] endpoint. It then writes the response of that request to a file in the output folder.

Note that all files uploaded to [cci]/content[/cci] remain available for 24 hours. After this period, they are automatically deleted. If you need a file after that, you have to upload it again. You can also manually delete an image by making a DELETE request to [cci]https://api.imagga.com/v1/content/[/cci] followed by the content id of the image.
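
For example, a minimal sketch of deleting an uploaded image with the [cci]requests[/cci] library could look like this (the content id below is the example value from the [cci]/content[/cci] response shown in the script above):

[cc lang="python"]
import requests
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('INSERT_API_KEY', 'INSERT_API_SECRET')

# A content id previously returned by the /content endpoint
content_id = '8aa6e7f083c628407895eb55320ac5ad'

# Delete the uploaded image before its 24-hour expiry
response = requests.delete(
    'https://api.imagga.com/v1/content/%s' % content_id,
    auth=auth)

print(response.json())
[/cc]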

Add some images to a folder and test the script with:

[cc lang="bash"]$ python upload.py path/to/input/folder path/to/output/folder[/cc]

If you look at the output folder you selected, you should see a JSON file for each processed image.

Feel free to test out the Imagga NSFW Categorization API. If you have any suggestions on ways to improve it or just general comments on the API, you can post them in the Comment Section below or get in touch with us directly. We are always happy to get feedback on our products.


Moderating adult content, our new NSFW categorizer

The internet provides an amazing opportunity to connect with each other and share information. But like every great invention, it has a dark side. Explicit content is lurking around every corner, and it’s not uncommon to stumble upon it while innocently browsing the web. We call that content Not Safe For Work (NSFW) - not safe for minors and inappropriate for work.


We have been working hard to offer an excellent solution for detecting adult content: a highly effective API for distinguishing between photos that are not safe for work (nudity), semi-safe (underwear and swimwear), and completely safe (no nudity whatsoever).

The NSFW (not safe for work) categorizer can be of extreme help to almost any online business that deals with user-generated photos or aggregates them from third parties. Various countries have quite strict restrictions on what content can be publicly available to minors, so we can help those businesses comply with the requirements as well. Not to mention Apple’s restrictive App Store rules on nudity, which are enforced strictly and have caused problems for apps, especially in the dating vertical.

Up to now, user-generated photo content has been moderated manually. This is a time-consuming and expensive process. It also has privacy problems - your sensitive content might end up being checked by somebody you personally know (quite embarrassing; the world is small and this happens more often than you think).

Our adult content moderation categorizer can automate this process, so you can get a lot more content moderated in no time. The technology is in beta stage and there might be some flaws, but we are still able to cover you and your app!

Currently our NSFW (adult content image moderation) categorizer puts images into three categories:

  • nsfw - not safe at all - expect p0rn images, nudes, body parts to be put into this category.
  • underwear - medium safe images such as lingerie, underwear, pants.
  • safe - completely safe images without nudity.


Here are a couple of use cases that illustrate how our NSFW Categorizer can be used to speed up the process of content moderation:

Marketplaces - awesome for online shops/marketplaces where people upload products (along with some photos) and you provide the infrastructure. The NSFW filter can be moderately restrictive here - underwear and swimwear photos can be allowed, while nude photos are kept off the public pages.

Kids websites/communities - a perfect candidate for aggressive filtering of adult content. Anything related to nudity can be filtered out. Imagga’s NSFW categorizer can be the first step of automated elimination of problematic adult content, and then the rest can be manually moderated to make sure inappropriate photos are kept out of kids’ sight.

Dating websites - there are lots of issues with adult content here, especially when it comes to iPhone apps. Dating sites employ lots of people to deal with the problem. Sometimes you need to upload a new profile photo to impress somebody special, and it’s very frustrating when moderation takes forever even though you uploaded just a facial photo.

There are probably tons of cases where the NSFW categorizer can be of very practical use. Why don’t you give it a try and share your impressions?