Identifying Vehicle Specs and Value with Custom Categorizers

Building Your Own Car Specs and Valuation App

While Imagga’s built-in image tagging system is already extremely powerful, sometimes your business requires something unique to its own use cases. For that, Imagga provides custom categorizers. To help you understand when you might utilize custom categorizers and how you can implement them, let’s take a look at building a mobile application that lets users easily retrieve the specs and an estimated value of a vehicle just by snapping a few quick photos.

The basic flow of our app is pretty simple. The user will take two photos: front and back. Our custom categorizer will then identify the make and model of the car based on those photos and optionally prompt the user for the year. Once the user confirms the year (or inputs it manually), we’ll reach out to a third-party API to retrieve the specs and value for that specific vehicle.
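To make that flow concrete, here is a minimal sketch of how it could be laid out in Swift. Every type and function name below is our own placeholder for illustration; none of it is part of Imagga’s API or any real valuation service.

import UIKit

// Hypothetical outline of the flow; all names are placeholders for illustration.
struct VehicleID {
    var make: String    // e.g. "Toyota", from the custom categorizer
    var model: String   // e.g. "Corolla", from the custom categorizer
    var year: Int?      // confirmed or typed in by the user
}

// 1. Send both photos to the custom categorizer and merge the results.
func identifyVehicle(front: UIImage, back: UIImage, completion: @escaping (VehicleID) -> Void) {}

// 2. Once the year is confirmed, fetch specs and an estimated value from the third-party API.
func fetchSpecsAndValue(for vehicle: VehicleID, completion: @escaping ([String: Any]) -> Void) {}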

Important Notes: Car valuation APIs tend to differ by country, and are nearly all locked away behind paid subscriptions, so rather than using a specific actual API here, we’ll rely on a mocked API that’s similar to most of the leading options. Additionally, since this example requires a custom categorizer, the Imagga tagging responses we show below will also be mocked examples.

What Are Custom Categorizers?

Before we dive into our car specs app itself, let’s take a moment to understand how custom categorizers work. In short, custom categorizers allow you to submit a list of images that are similar to the ones you would use within your own app, along with the categories for each. Imagga then takes those images and “trains” a categorizer based on them, learning from your supplied examples for each category, and creates a custom endpoint for your account that can be used to tag future images.
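Conceptually, the training input is nothing more than a mapping from category names to example images. The snippet below is purely illustrative (Imagga builds and hosts the categorizer for you; there is no client-side training call here), but it shows the shape of the data you would hand over:

import Foundation

// Illustrative only: the shape of the training data you would supply to Imagga.
struct TrainingCategory {
    let name: String             // e.g. "toyota corolla"
    let exampleImageURLs: [URL]  // many varied photos of that exact model
}

let trainingSet: [TrainingCategory] = [
    TrainingCategory(name: "toyota corolla", exampleImageURLs: [/* dozens of Corolla photos */]),
    TrainingCategory(name: "bmw z3", exampleImageURLs: [/* dozens of Z3 photos */])
]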

At this point, you may be asking yourself, "why would I pay for a custom categorizer when I can just use the generic one for free (or far more cheaply)?!" And, truthfully, for many everyday use cases, the tagging by Imagga's standard categorizer is more than capable. That said, there are also many scenarios where you need something far more laser-focused. In our example here, we don't want the image just categorized as "car" or "sedan," but instead narrowed down to "Toyota Corolla" or "BMW Z3."

With that in mind, to begin training our custom categorizer, we start by submitting a list with the models of the cars as our categories (Corolla, Accord, Z3, F-150, etc.) along with images of each of those models. Once our custom categorizer has been created and trained, we can then send new images, not included within our original training set, to that endpoint (e.g. /categories/custom-car-categorizer) and Imagga will automatically identify the model. If we wanted Imagga to try to identify the year as well, we would need to include that in our categories (e.g. “Corolla 2005”).
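If the app were written in Swift, a minimal sketch of that call might look like the following. It assumes the custom endpoint accepts an image_url parameter like Imagga’s standard categorization endpoints; the exact categorizer name and response shape will depend on how Imagga sets up your account.

import Foundation

// Hedged sketch: categorize one photo against the custom categorizer by URL.
let authorizationHeader = "Basic YOUR_AUTHORIZATION_HEADER"  // from your Imagga dashboard

func categorize(carPhotoURL: String, completion: @escaping (Data?) -> Void) {
    var components = URLComponents(string: "https://api.imagga.com/v2/categories/custom-car-categorizer")!
    components.queryItems = [URLQueryItem(name: "image_url", value: carPhotoURL)]

    var request = URLRequest(url: components.url!)
    request.setValue(authorizationHeader, forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Accept")

    URLSession.shared.dataTask(with: request) { data, _, _ in
        completion(data)  // the JSON payload containing the "categories" array
    }.resume()
}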

Training Notes: When training a custom categorizer, it's important to make sure that you provide a good collection of images that cover the range of images your users might input. For example, we can't just upload a bunch of images of Corollas and expect the categorizer to correctly identify an Audi. In addition, as you likely know, users rarely take perfectly aligned and cropped photos, so our training photos shouldn't be perfect either. Beyond including a wide range of models, you should also make sure that each model is shown from a variety of angles, with different levels of lighting, and, for extra credit, with occasional objects occluding part of the car or with part of the vehicle excluded from the frame of the image itself.

Practical Example of Building A Custom Categorizer App

So what does this actually look like? Well, imagine that you want to identify a Toyota Corolla. First you'll take a couple of photos, like the two below:

Then our custom app will upload them to our custom categorizer and get a response that looks something like the following:


{
"result": {
"categories": [
{
"confidence": 99.9991073608398,
"name": {
"en": "toyota corolla"
}
}
]
},
"status": {
"text": "",
"type": "success"
}
}

Depending on how clear the image is, and how well we trained our categorizer, there might be a few other categories returned as well, but, in general, the top option should be a match for the actual model we’re looking for. We may also need to merge the results from the two photos if multiple categories come back for one or both.
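One simple way to merge the two responses, assuming each has already been parsed into a [category name: confidence] dictionary, is to keep the highest confidence seen for each model and sort. A minimal sketch, not the only reasonable strategy:

// Merge the category results from the front and back photos by keeping the
// highest confidence reported for each model name.
func mergeCategories(_ front: [String: Double], _ back: [String: Double]) -> [(name: String, confidence: Double)] {
    var merged = front
    for (name, confidence) in back {
        merged[name] = max(merged[name] ?? 0, confidence)
    }
    return merged.sorted { $0.value > $1.value }
                 .map { (name: $0.key, confidence: $0.value) }
}

// The first element is our best guess at the model, e.g. ("toyota corolla", 99.99).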

Once we have that data, all we need is the year, which we get from the user, and then we can pass that along to our car specs API. As mentioned above, the calls and responses below are just examples, but we’ve listed a few possible APIs in the notes at the end of the article if you want to build this yourself. So let’s take a look at how we might go about this. First, we send a POST request to our car specs API endpoint, https://carspecs.example.com/specs:


{
"make": "Toyota",
"model": "Corolla",
"year": 2010,
"upgrades": []
}

Which returns a response similar to:


{
"model_id": "45963",
"model_name": "Corolla",
"model_trim": "LE",
"model_year": "2010",
"model_body": "Sedan",
"model_engine_position": "Front",
"model_engine_cc": "1800",
"model_engine_cyl": "4",
"model_engine_fuel": "Gasoline",
"model_drive": "Front",
"model_transmission_type": "Automatic",
"model_seats": "5",
"model_doors": "4",
"model_weight_kg": "1245",
"model_length_mm": "4539",
"model_width_mm": "1760",
"model_height_mm": "1466",
"model_wheelbase_mm": "2601",
"model_lkm_hwy": "6.9",
"model_lkm_city": "9.0",
"model_fuel_cap_l": "50",
"model_mpg_hwy": "34",
"model_mpg_city": "26",
"model_mpg_mixed": null,
"make_display": "Toyota",
"make_country": "Japan"
}
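If you were wiring this up in Swift, the request and a slice of the response could be modeled like this. The endpoint and field names simply mirror the mocked example above; they are not a real public API.

import Foundation

struct SpecsRequest: Encodable {
    let make: String
    let model: String
    let year: Int
    let upgrades: [String]
}

struct SpecsResponse: Decodable {
    let model_id: String
    let model_trim: String
    let model_engine_fuel: String
    let model_mpg_city: String?
    // ...remaining fields omitted for brevity
}

func fetchSpecs(_ specs: SpecsRequest, completion: @escaping (SpecsResponse?) -> Void) {
    var request = URLRequest(url: URL(string: "https://carspecs.example.com/specs")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try? JSONEncoder().encode(specs)

    URLSession.shared.dataTask(with: request) { data, _, _ in
        completion(data.flatMap { try? JSONDecoder().decode(SpecsResponse.self, from: $0) })
    }.resume()
}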

Once we have that model_id, we can pass it along to our valuation API with the mileage and condition of this specific model with a request to https://carspecs.example.com/valuation:


{
"model_id": "45963",
"mileage": 58000,
"condition": "very good"
}

That request gives us the following information (all amounts in USD):


{
"min": 1795,
"median": 7995,
"max": 849995,
"mean": 8940
}
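Modeled the same way, the valuation call needs nothing more than the two small types below; the POST itself looks exactly like the specs call sketched earlier, just pointed at /valuation. Again, the field names simply mirror the mocked example.

struct ValuationRequest: Encodable {
    let model_id: String
    let mileage: Int
    let condition: String
}

struct ValuationResponse: Decodable {
    let min: Int
    let median: Int
    let max: Int
    let mean: Int
}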

Putting all of this together gives us a clean user experience for snapping a couple of photos of a car and retrieving accurate pricing and specs within just a few seconds.

Final Thoughts

Since we utilized mocked APIs and categorizers this time, we don’t have a complete project for you to download and run, but hopefully this gives you a sense of the power of custom categorizers and how they can be utilized to make interacting with other data sources as simple as the snap of a few photos for your users.

Suggested Resources for Further Development

If you’d like to build upon the ideas we discussed here and actually build out a car spec app or something similar, here are a few good resources to use as you get started:

  • Custom Categorizers - More information about how custom categorizers work and how to request one for your business
  • CarQuery - Used as the basis for the specs API call
    • Important: their data does not include 2018 yet, and it appears they may have stopped updating their database
  • MarketCheck - Used as the basis for the market/value information
    • Important: They allow up to 300 calls/mo for testing, but further usage requires a paid subscription


Recognizing Objects in House Interior and Recommending Products with Image Recognition

There is a saying that if you want to become an expert in a field, you have to be a master of classification. Photographers can tell if their latest picture is beautiful or not. Musicians can classify what sounds great. Good developers can smell the difference between a good code snippet and a bad one.

Categorization can take many hours of training for humans. Luckily, in the age of machine learning, machines can help and save a ton of labor time for us. Today, we are going to get our feet wet by creating a photo categorizer app from scratch.

The mobile app we’re building is a demo. You can use it as a foundation for a more complex photo organization application or as functionality within another specific application. It will take one image selected by the user, upload it to the Imagga API and categorize it. This is the final result:

 

 

Source Code

If you want to look at the code directly, feel free to skip ahead and download it from the sidebar on the right side. To get the code running, you will need to sign up for an Imagga Hacker Plan (free), get the authorization header and replace ‘YOUR_AUTHORIZATION_HEADER’ in the code. In ViewController.swift, change the ‘authorizationHeader’ constant to the value of your own key. The key looks like “Basic xxxx”, as shown in the red square in the screenshot.
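In practice that constant is just a string near the top of ViewController.swift, something like this (the value shown is a placeholder, not a real key):

// Replace the placeholder with the "Basic ..." value from your Imagga dashboard.
let authorizationHeader = "Basic YOUR_AUTHORIZATION_HEADER"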

The Imagga API

To start, let’s test our Imagga account by running this curl request from the command line. This way, we can make sure that our setup is ready for the app. Here is the test image.

curl --request GET \
--url 'https://api.imagga.com/v2/categories/personal_photos?image_url=https%3A%2F%2Fimagga.com%2Fblog%2Fwp-content%2Fuploads%2F2019%2F01%2Fentrywayheroscaled.jpg' \
--header 'accept: application/json' \
--header 'authorization: YOUR_AUTHORIZATION_HEADER'

After running the command above, we should get a result like this:

{"result":{"categories":[{"confidence":99.9999542236328,"name":{"en":"interior objects"}}]},"status":{"text":"","type":"success"}}

Now we’ve confirmed that the API key works for us. Notice that in the example above, we used a URL for the image. However, the app we are going to build will upload an image file instead of an image URL. So let’s see what the API docs have to say about that.

Navigate to the developer’s page, the API Endpoints section, categories subsection: Imagga REST API Documentation. It asks us to send a POST request to this endpoint with a parameter named “image” containing the image file content. Keep this in mind, because we’re going to use it later. It’s time to code!

iOS Project Setup

Create an Xcode project, name it ‘Imagga’, select ‘Single View App’ and choose Swift as the language. To make the process faster, we are going to use two libraries: ‘Alamofire’ and ‘SwiftyJSON’. Make sure you have a Podfile, set the platform to iOS 12 and add the pods like this:

platform :ios, '12.0'

target 'Imagga' do
  # Pods for Imagga
  pod 'Alamofire', '~> 4.8'
  pod 'SwiftyJSON', '~> 4.0'
end

We are going to use ‘Alamofire’ for uploading and ’SwiftyJSON’ for parsing the JSON responses. To manage the libraries, we use CocoaPods. Make sure you have run ‘pod install’ and both pods are installed before moving to the next section.

A Simple Upload in Swift

First things first: upload an image to the API. Open ‘ViewController.swift’ and add the following imports at the top:

import SwiftyJSON
import Alamofire

This lets you use the Alamofire and SwiftyJSON modules in your code. Next, create a function ‘sendImageToServer(image: UIImage)’ and add the following to it:

func sendImageToServer(image: UIImage) {
    guard let imageData = image.jpegData(compressionQuality: 0.5) else {
        print("no image data find")
        return
    }
    
    upload(multipartFormData: { (multipartFormData) in
        multipartFormData.append(imageData,
                                 withName: "image",
                                 fileName: "image.jpg",
                                 mimeType: "image/jpeg")
    },
           to: "https://api.imagga.com/v2/categories/personal_photos",
           headers: ["Authorization": authorizationHeader,
                     "accept": "application/json"],
           encodingCompletion: {encodingResult in
            switch encodingResult {
            case .success(let upload, _, _):
                upload.responseJSON { response in
                    debugPrint(response)
                }
            case .failure(let encodingError):
                print(encodingError)
            }})
}
  1. First, we reduce the compression quality of the image to 0.5, so the file is smaller when we upload it to the server.
  2. We upload the image as “image.jpg” with the parameter name ‘image’; this is very important. If you recall from the Imagga API section, the parameter name we have to use when uploading a file is “image”.
  3. We send the file to the endpoint https://api.imagga.com/v2/categories/personal_photos; make sure the authorizationHeader is set up correctly in the headers.
  4. We print the response.

In order to test this upload function, let’s grab an image file for the project. You can download it here and add it to your Xcode project. After it’s added, let’s do a test to make sure our upload function works. Add the following to ‘viewDidLoad()’:

override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view, typically from a nib.
    
    guard let image = UIImage(named: "entrywayheroscaled.jpg") else {
        print("no image found")
        return
    }
    sendImageToServer(image: image)
}

and we should see this response from the server

[Result]: SUCCESS: {
    result =     {
        categories =         (
                        {
                confidence = "99.9999542236328";
                name =                 {
                    en = "interior objects";
                };
            }
        );
    };
    status =     {
        text = "";
        type = success;
    };
}

On raywenderlich.com, there’s a more detailed tutorial on how to use Alamofire and SwiftyJSON with the Imagga API: https://www.raywenderlich.com/35-alamofire-tutorial-getting-started. Since we’re more focused on using the API, I’ll leave the deeper Swift digging to you. :)

Handle Return Value

The next step after we have printed the response is to handle it and save it to our data model. Add the following function to ViewController.swift:

func handleResponse(response: DataResponse<Any>) {
    guard response.result.isSuccess,
        let value = response.result.value else {
            print("Error while uploading file: \(String(describing: response.result.error))")
            return
    }
    
    // .arrayValue returns an empty array instead of crashing if "categories" is missing.
    self.personalPhotos = JSON(value)["result"]["categories"].arrayValue.map { json in
        PersonalPhoto(name: json["name"]["en"].stringValue,
                      confidence: json["confidence"].doubleValue)
    }
    
}

and in ‘sendImageToServer(image: UIImage)’ function, add

self.handleResponse(response: response)

after ‘debugPrint(response)’.

Inside the ViewController class, add

var personalPhotos: [PersonalPhoto] = []

before ‘viewDidLoad()’ function.

Here’s a step-by-step explanation of the code above:

  1. Check that the response was successful. If not, print the error.
  2. Use ‘SwiftyJSON’ to retrieve the ‘categories’ array from the response. It iterates through each of the categories and transforms it to a PersonalPhoto struct.
  3. Save the array into ‘self.personalPhotos’.
  4. Put it inside a call back block after sending the image.
  5. Add a variable in ViewController to save the categories.

We don’t have a PersonalPhoto struct yet, so let’s create one. Create a new file named: PersonalPhoto.swift. Add the following to it:

struct PersonalPhoto {
    var name: String
    var confidence: Double
}

Run the project, and set a breakpoint right after we assign the value to self.personalPhotos. We should see one PersonalPhoto object in the array.

That means we can upload an image and parse the response. Now it’s time to show the results in the UI.

Connect UI in Xcode

Let’s see our app UI in the Main.storyboard file.

There are three main items: 1. Image view, 2. Table view, 3. Select button.

Here are my constraints:

  • Image view: Height 200, leading & trailing space 10, top space 20, horizontal center. Default image: ‘entrywayheroscaled.jpg’, set contentMode to ‘Aspect Fit’
  • Select button: Height 50, Width 100, horizontal center, top to Image View 20
  • Table view: Height 200, leading & trailing space 10, top to Select button 20

Drag a prototype UITableViewCell into the tableview, and give it a container view, a confidence label on the left and a name label on the right. Tweak the label size, color and position as you would like. Make sure you connect the IBOutlets into ViewController.

@IBOutlet weak var imageView: UIImageView!
@IBOutlet weak var tableView: UITableView!

Now our UI is good to go.

Select Photo from User Library

Since we are going to select the photo from the user’s photo library, we connect the ‘Select Button’ IBAction event to a buttonPressed function:

@IBAction func buttonPressed(_ sender: Any) {
    openPhotoLibrary()
}

and call the ‘openPhotoLibrary()’ function inside it.

In order to open the system photo library, add ‘UIImagePickerControllerDelegate’ and ‘UINavigationControllerDelegate’ to the ViewController class.

class ViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate, UITableViewDataSource {

Calling the system photo library is straightforward:

func openPhotoLibrary() {
    if UIImagePickerController.isSourceTypeAvailable(.photoLibrary) {
        let picker = UIImagePickerController()
        picker.delegate = self
        picker.sourceType = .photoLibrary
        self.present(picker, animated: true)
    }
}

func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
    picker.dismiss(animated: true)
    
    guard let image = info[.originalImage] as? UIImage else {
        print("No image found")
        return
    }
    
    imageView.image = image
}

Now if you build and run the project, the “Select Photo” button will open the built-in photo library and you can choose a photo. The image view will display the photo you selected.

Connect Two Parts Together

We now have two parts:

  1. Upload an image to server and handle the responses.
  2. Select an image from the user photos library.

Let’s remove all the testing code inside ‘viewDidLoad()’, and in the last line of

func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any])

add ‘sendImageToServer(image: image)’ right after ‘imageView.image = image’. Build and run the project, select a photo from your library, and you should see the responses in the logs.
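One thing worth double-checking at this point: the table view needs a data source. If you didn’t wire the dataSource outlet in the storyboard, a minimal viewDidLoad() that sets it in code would look like this:

override func viewDidLoad() {
    super.viewDidLoad()
    // Only needed if the table view's dataSource isn't already connected in the storyboard.
    tableView.dataSource = self
}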

UI Tweaks

It is time to display the response in the UI. We are going to show the result like below:

It is a table view with multiple cells. In each cell, we display the category name, the confidence, and a color indicator in the background.

To get the table view cell to work, create a subclass of UITableViewCell, name it ‘PersonalPhotoCell’, and connect all three IBOutlets:

class PersonalPhotoCell: UITableViewCell {
    @IBOutlet weak var nameLabel: UILabel!
    @IBOutlet weak var confidenceLabel: UILabel!
    @IBOutlet weak var containerView: ConfidenceView!
}
and ConfidenceView.swift as follows:

import UIKit
@IBDesignable
class ConfidenceView: UIView {
    @IBInspectable var confidence: Double = 0 {
        didSet {
            setNeedsDisplay()
        }
    }
    
    override func draw(_ rect: CGRect) {
        // Drawing code
        let color = UIColor.init(red: 123.0/255.0, green: 184.0/255.0, blue: 183.0/255.0, alpha: 1.0)
        
        let rectToDraw = CGRect(x: 0, y: 0, width: CGFloat(rect.size.width) * CGFloat(confidence), height: rect.size.height)
        let path : UIBezierPath = UIBezierPath(rect: rectToDraw)
        color.set()
        path.fill()
    }
}

ConfidenceView is a subclass of UIView, and we use it as a container. The view fills a portion of its background, proportional to the confidence, with color, which gives a good visual impression of the confidence level. In the table view data source methods:

func tableView(_ tableView: UITableView, numberOfRowsInSection section: Int) -> Int {
    return personalPhotos.count
}

func tableView(_ tableView: UITableView, cellForRowAt indexPath: IndexPath) -> UITableViewCell {
    let cell = tableView.dequeueReusableCell(withIdentifier: "Cell", for: indexPath) as! PersonalPhotoCell
    let personalPhoto = personalPhotos[indexPath.row] as PersonalPhoto
    cell.nameLabel.text = personalPhoto.name
    cell.confidenceLabel.text = String(format: "%.2f%%", personalPhoto.confidence)
    cell.containerView.confidence = personalPhoto.confidence / 100.0
    return cell
}

We use our personalPhotos variable as the data source and configure each cell from it. Notice these two lines:

cell.confidenceLabel.text = String(format: "%.2f%%", personalPhoto.confidence)
cell.containerView.confidence = personalPhoto.confidence / 100.0

We format the confidence with two decimal places and also scale it to a value between 0 and 1 for the ConfidenceView.

Now if you run the project and select a photo, nothing is going to happen. But why?

This is because we haven’t reloaded the table view after receiving the response, so let’s do that now. In the upload function, add

self.tableView.reloadData()

right after ‘self.handleResponse(response: response)’. This way the table view refreshes once the response has been handled.

Run the project, select a photo you like, and the server will return the categories it fits. And now, we’re good!

Best API Practice

Before we wrap up the tutorial, I want to emphasize some best practices from the Imagga REST API Documentation. For the image size:

The API doesn't need more than 300px on the shortest side to provide you with the same great results

And for a good confidence level:

Our suggestion is setting the confidence threshold at 15-20%.

So with those two points in mind, if you want to make the upload faster while keeping the same quality, resize the image so its shortest side is 300px. Also, by filtering out categories with low confidence (start from the suggested 15-20% threshold, or use a stricter cut-off like 80% if you only want near-certain matches), you will get more accurate results.
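Putting those two recommendations into code, here is a hedged sketch: the 300px figure comes from the documentation quoted above, while the confidence threshold is whatever cut-off you settle on for your own data.

import UIKit

// Scale the image so its shortest side is 300px before uploading.
func downscaleForUpload(_ image: UIImage, shortestSide: CGFloat = 300) -> UIImage {
    let scale = shortestSide / min(image.size.width, image.size.height)
    guard scale < 1 else { return image }  // already small enough
    let newSize = CGSize(width: image.size.width * scale, height: image.size.height * scale)
    return UIGraphicsImageRenderer(size: newSize).image { _ in
        image.draw(in: CGRect(origin: .zero, size: newSize))
    }
}

// Drop low-confidence categories before showing them in the table.
// The docs suggest 15-20%; raise the threshold if you only want near-certain matches.
func filterByConfidence(_ photos: [PersonalPhoto], threshold: Double = 20) -> [PersonalPhoto] {
    return photos.filter { $0.confidence >= threshold }
}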

Conclusion

Today we’ve gone through how to use the categorization API and create an iOS app from scratch using Swift. I hope you enjoyed the read and had fun. Feel free to leave any questions, and keep me updated on what you’re going to build next. Cheers!

References

Imagga REST API Documentation

Categorization API for Image Classification | Imagga Solutions

Alamofire Tutorial: Getting Started | raywenderlich.com

Create Autotagging Uploads with NodeJS using Image Recognition API – Imagga Blog

How to Make an Image Classifier – Intro to Deep Learning #6 – YouTube







Start Learning AI Agents Used in Games: Teach AI to Play Pong

Knowledge published on the internet is free, but in the ocean of opinions and points of view, there is no clear chronological pathway for learning about the history of AI in games.

Here we will introduce some basic technical instructions required to begin training artificially intelligent agents with games before discussing how this could be used in potential applications. Games and gameplay are an integral part of the history and development of Artificial Intelligence (AI) and can provide mathematically programmable environments that can be used to develop AI programs. As part of this article, we will provide a tutorial on how you can begin experimenting with training your own artificially intelligent system on your computer with some open-source arcade games.

OpenAI Gym

The long history of gameplay between humans and artificially intelligent computers demonstrates that games provide a reliable framework for computer scientists to test computational intelligence against human players. But games also provide an additional environment for machine learning to develop intelligence by repeatedly playing and interacting with a game by itself. In 2015 an organization was formed by Elon Musk and Sam Altman to develop machine learning algorithms through the use of games. Their organization, OpenAI, has launched a platform to develop machine learning algorithms with arcade games. This is done to simulate environments for AI models in order to improve and develop ways in which AI systems can interact ‘in the real world’. These systems make use of image recognition technology so that the AI agent can interpret the pixels on the screen and interact with the game. Rather than identifying objects like Imagga’s image recognition API, the OpenAI Gym modules use computer vision libraries (such as OpenCV) to interpret key elements such as the user's score or incoming pixels. In 2015 a series of videos was published showing how neural networks could be used to train an AI agent to successfully play a number of classic arcade games, including an AI agent that can learn to complete Mario in under 24 hours. This has led to hundreds of similar attempts to train AI on Super Mario Bros, many of which are continuously live streamed and you can watch here.

The above picture displays a breakdown and explainer from creator Seth Bling on how Mario is learning to play the arcade game through a combination of computer vision, neural networks and machine learning. The box on the left indicates the computer vision interpretation of the pixels in the arcade game, while the red and green lines indicate the weighting the neural network assigns to different inputs from the controller (A, B, X, Y, Z, etc.). The bar at the top indicates the status of Mario and the AI agent playing: fitness refers to how far the agent has advanced through the game, and gen, species and genome all indicate the progress of the neural networks calibrating to make the best possible Mario player. People are building upon the original code written by Seth Bling and competing to see who can engineer an AI agent to complete Mario in the quickest time.

What is the benefit of an AI agent that can complete Super Mario in less than 24 hours?

The use of games to train AI agents is most obvious in developing software for self-driving cars; a small team even set out to turn GTA into a machine learning system after removing all the violence from the game. Games offer a framework for an AI agent to initiate and direct behavior: through gameplay, AI agents can develop strategies for interacting with different environments, which can then be applied to real-world situations. But before we get too far ahead, let's look at building one simple AI agent to learn the classic arcade game Pong. In this tutorial, we are going to take a look at how to get started running your own machine learning algorithm on it!

For this tutorial, we are going to be working with OpenAI's module 'Gym', an open-source tool that provides a library for running your own reinforcement learning algorithms. It has good online documentation and the biggest repository of classic arcade games, so if Pong is not your game and you are more of a Space Invaders or Pac-Man fan, getting that set up once you have Gym installed is just as easy.

Pong Tutorial

For this tutorial, you will need to be able to use the command line interface (terminal) to install packages and to execute small Python scripts.

To get started, you’ll need to have Python 3.5+ installed. If you have a package manager such as Homebrew (brew.sh), you can install Python 3 with the command:

brew install python3

Simply install gym using pip:

pip install gym

Once gym is installed, we can install all of the environments so we can load them into our Python scripts later on. Clone the gym repository (https://github.com/openai/gym), cd into the gym directory and install the extras:

cd gym
pip install -e '.[all]'

To run a simple test, you can run a bare-bones instance copied from the Gym website. Save the code below in a file with the extension .py:

import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action

For structure, place the file in your gym directory and from the terminal enter:

python yourfile.py

Atari

Now that we have a demo up and running, it’s time to move on to some classic arcade games. The actual script is quite similar; just make sure to change the ROM that we are loading when we call gym.make(‘arcade_game’).

In a new text document place the following:

import gym
env = gym.make('Pong-v0')
env.reset()
env.render()

This will generate an instance of Pong in a viewing window. But wait… nothing is happening? That is because we have only rendered an instance of the game without providing any instructions to the computer script on how we want the computer to interact with the game. For example, we could tell the computer to perform a random action 1000 times:
import gym
env = gym.make('Pong-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action

The script above tells the application to take 1,000 random actions, so you might see the most chaotic, fastest version of Pong you have ever witnessed. That is because our script performs random actions without processing or storing the result of each action, playing blindly rather than learning. To create an algorithm that learns, we need to be able to respond to the environment in different ways and program aims and objectives for our agent. This is where things get a little more advanced. For now, try playing around with different games by checking out the list of environments and exploring the sample code to get the computer to play the game in different ways. This is the basic framework we are going to use to build more advanced interactions with arcade games in the next tutorial.

In the next tutorial, we will look at implementing AI agents to play the games and developing our reinforcement learning algorithms so that they begin to become real machine learning applications. Share what you want to see in the comment section below.

Machine Learning Toolkits 

https://github.com/openai/gym

https://blog.openai.com/universe/

https://experiments.withgoogle.com/ai

https://teachablemachine.withgoogle.com/

https://ml4a.github.io/

https://github.com/tabacof/adversarial


GTC Conference 2018 Munich - Georgi Kadrev Part of AI Panel

This year’s European edition of NVIDIA’s GTC was held October 9-12th in the beautiful town of Munich, Germany. GTC has established itself as one of the leading events covering the latest trends, not just in GPU hardware but also in the many areas where that computational power enables practical A.I. use cases. This naturally includes Imagga’s specialty: image recognition and its applications in various industries.

Hardware advancement, of course, is the practical enabler of modern A.I., so it was exciting to witness the announcement of the DGX-2, with its impressive 2 PFLOPS throughput and significant memory optimizations.

As one of the early adopters of DGX in Europe, we were invited to join a panel of industry and academic experts taking advantage of the DGX family of machines. Our CEO Georgi Kadrev had the chance to share how image-oriented and/or image-dependent businesses can solve problems by taking advantage of the automation that ML-based image recognition brings. He specifically addressed how the DGX Station has helped us train one of the biggest CNN classifiers in the world, PlantSnap, which recognizes 300K+ plants, in just a matter of days.

Georgi was also invited to join the private DGX user group luncheon where the object of discussion was the present and the future of the DGX hardware family and the software and support ecosystems starting to shape around it.

Being an Inception alumnus ourselves, it was inspiring to also see the new generation of early-stage startups taking advantage of GPUs and A.I. at the Inception Awards ceremony. Well-crafted pitches and interesting vertical applications were presented during this startup-oriented track of GTC. Already turning into a tradition, NVIDIA’s rock-star CEO Jensen Huang spent almost an hour afterward talking with entrepreneurs about the ups and downs and the visionary drive lying behind every successful technology venture.

With a lot of positive impressions and new friendships, we are looking forward to returning to Munich for the next GTC Europe at the latest. In the meantime, if you haven’t had the chance to meet us there, please don’t be a stranger. We are looking forward to learning more about your business and brainstorming how image recognition can help make it even more successful.



How to Efficiently Detect Sex Chatbots in Kik Messenger and Build it for Other Platforms

Ever since 2017, Kik Messenger has been plagued by irrelevant messages distributed by automated bots. In particular, its public groups have been assailed by a large number of these bots, which flood them with spam-related content. Jaap, a computer science and engineering master’s student, annoyed by these relentless and unwelcome interruptions, decided to tackle the issue by engineering a counter-bot called Rage that would identify and remove these bots.

Alongside proprietary algorithms that identify known spambot name patterns, Rage also uses Imagga’s adult content filtering (NSFW) API to scan profile pictures. The result worked so well that friends soon wanted it. Through word of mouth alone, his bot was installed in over 2,000 groups in just three days. Now his bot is used in more than 3,000 chat rooms containing over 100,000 people, and some 20,000 images are processed every month.

Major project issues: Neural networks are expensive to run; the cheapest option is the AWS g2.2 series, which costs $0.65/hour. For a student, that is a hefty sum to invest in GPU instances. Therefore Jaap looked into using a third-party company that could provide a more affordable out-of-the-box solution. While Google came up first in a search, he selected Imagga because of its already tested accuracy compared to other solutions on the market.

Putting it all together: Since spambots use racy profile pictures, the Rage bot’s detection algorithm works a lot better using the Imagga NSFW API than it would by just applying name matching. When someone joins a chat, their name is scanned for known spambot name patterns while Imagga’s NSFW content moderation analyzes their profile picture. If the safety confidence level is less than 40%, the user is considered a spambot and is removed from the chat. Since Kik profile pictures are public, the Rage bot only needs to send image links directly to the Imagga Content Moderation API, which returns a result like:

"confidence": 100,
"name": "safe"

As of August 2018, Jaap and his bot had stopped over 20K spambots from plaguing Kik Messenger. Detected bots below a certain level of “confidence” are removed automatically, but the Rage bot is not stopping there. One of the most recent features is the 48 mode, which tracks the number of people in a chat and removes inactive users.

Building and deploying: Imagga's NSFW classification was set up and running in a day. Using the demo tier of 2,000 free API calls a month, Jaap was able to quickly implement, test and judge whether this was the right tool for him. As a content moderation solution, it can be installed and running the same day, with no downtime, while delivering accurate content moderation. Then, if you need more API calls, Imagga has very affordable pricing.


Why Using Image Recognition Software Can Save Your Cloud Platform a Ton of Resources

In recent years, we have seen significant growth in artificial intelligence technology and its use in different industries such as automotive, healthcare, e-commerce and gaming. Image recognition, one of the flagship applications of AI, has had wide adoption across industries. It is estimated that the worldwide market for image recognition will grow to $29.98 billion by 2020.

A major factor in the growing demand for image recognition technology has been the increased use of the internet and the move of small and medium enterprises (SMEs) to the cloud. With this move, businesses have benefited from some of the advantages a cloud platform offers, such as widespread reach, scalability, flexible billing options, rapid deployment and constant availability. With the move to the cloud, businesses have also found it necessary to adopt technology that helps them better navigate this smarter, more connected platform, and image recognition is one of those technologies.

Image recognition (sometimes called computer vision) is the ability of software to analyze an image or video and identify its content, e.g. people, objects, places and text. It is widely used across industries, e.g. in self-driving cars, facial and optical character recognition software, disease diagnosis and more. For businesses that operate in the cloud, image recognition can offer numerous benefits, as outlined below.

Automating Tasks with Image Recognition Software Saves Time

Unlike other resources that you can create or acquire more of, time is a finite resource that, to stay competitive, you most likely can't afford to waste.

Without a doubt, computers are faster than humans at some particular tasks, and so for those tasks, it makes sense to automate the job using software, leaving your employees free to work on other urgent tasks. Image recognition software can be used to automate such tasks as categorizing and tagging media content, content moderation and editing images (e.g. cropping or background removal).

Use of Image Recognition Software can Help Keep your Team Lean and Thus Save Costs

Use of image recognition software can reduce or eliminate required labour. Without image recognition, you would have to put people on the job to do such tasks as tagging and categorizing your digital assets, moderating user-generated content and individually editing images. In some cases, such a feat might be annoying and frustrating at best, but in other cases, it might be outright impossible. Take, for instance, a firm offering Digital Asset Management services. The firm might have several clients, each with millions of digital assets that need to be processed. It would be very difficult, if not impossible, to run such a service on manual labour alone. To keep its clients happy, the business would have to keep its asset processing time to a minimum, which means it would have to keep a lot of people on board to do the work. With time, as its client list grows or as the content each client maintains increases, the business's labour costs will also skyrocket. Running such a business on manual labour alone isn't sustainable. By automating some tasks with image recognition software, you can maintain a lean and cost-effective team.

Image Recognition can Reduce Human Error

To err is human, to forgive divine so the saying goes; but when you are running a business that depends on the accuracy of its operations, you might not be so lax about errors that might occur.

Human labour is susceptible to making errors. When tasked with entering a large amount of data, it is probable that some data will be recorded incorrectly. Human labour is also prone to tiring. When one has to process thousands of images or videos, they might not be as keen after the first few thousand. With exhaustion and waning focus, errors might creep in here and there.

For some use cases, image recognition has been shown to give better results than humans. In the medical field, for instance, there is visual recognition software with a higher success rate in diagnosing a particular type of cancer. In the still-infant field of self-driving cars, it has been said that driverless cars are safer than human drivers.

Image recognition can help eliminate or at least reduce the inaccuracies of human intervention. This will, in turn, save the business resources that would have been lost due to the errors, whether in the form of revenue, labour or time.

Image Recognition can Help you Innovate According to Market Trends

One advantage of running an online business is that a lot of your customers are also online. In this connected ecosystem, it is easier to monitor the market by observing what people share online. By analyzing visual content that is shared online, you might be able to recognize a trend that you can piggyback on when it comes to product release. With image recognition, you can also gain some insights into your competitors by detecting their online visual presence. You can observe how the market engages with the competitor's visual content and determine if their reaction to it is positive or not. This can, in turn, inform your product design decisions.

Instead of using tedious questionnaires and discovery processes to find out what users want, you can use data to determine this. You can determine what users gravitate towards online by observing what they share and how they react to different content. An example of this in use is Netflix which uses data to determine what shows to create. This can save you the effort and cost of creating something that won't be profitable once it hits the market.

Image Recognition can Improve your Marketing Efforts

Other than using image recognition to predict products that will be popular amongst your target market, you can also use it to determine how best to market the products to consumers. Using image recognition, you can mine your target market's visual content and monitor market trends in real time. In this way, you can gain insights on how visual posts spread online, what type of visuals get the most attention, the type of people engaging most with your content, the individual influencers driving most of the traffic and the best platform to post your content on. This can, in turn, help you launch marketing campaigns that are most likely to succeed. Your marketers don't have to waste their budget guessing at what will work, they can use data to decide on the way forward.

How something is presented can have a huge impact on the level of engagement people will have with it. Netflix discovered from consumer research that the artwork on their website was not only the biggest influence on a member's decision to watch content, but also constituted over 82% of their focus while browsing. This is why they go through so much effort to determine the best artwork to display on their website, a feat that would be impossible without image recognition and machine learning. If you are running an online business, you should pay attention to how you present your product or service. In a world where consumers are spoilt for choice when searching for a product or service, you should ensure that your website communicates the value of what you are trying to sell in the best way possible.

Image Recognition can Help Online MarketPlaces Fight Counterfeit Goods

According to the Organization for Economic Co-operation and Development (OECD), counterfeit products may cost the global economy up to $250 billion a year. Businesses running online platforms that allow sellers to sell goods always run the risk of having some sellers selling counterfeit products. This can damage the marketplace's reputation when consumers get products that are subpar to their genuine counterparts.

To counter this, marketplace websites have started turning to image recognition technology to help identify legit and counterfeit products. Using software, the platforms put uploaded product images through some checks to ensure their authenticity.

In General, Image Recognition Makes for Better Apps

Overall, incorporating image recognition improves the user experience of cloud applications and makes their operation more effective and efficient. Better apps are good for any business's bottom line, as they reduce overall overhead costs.

With numerous competitors around, most companies compete primarily on the basis of customer experience. Poor user experience can lead to customer churn, and in an interconnected world, it is very easy for disgruntled customers to spread the word about the terrible service they had at your hands; so it is always in your best interest to employ any technology you can to produce the best possible product for your target market.

Do you use image recognition in your product? If yes, let us know how you use it and how it has improved your business. If you would like to find out more about the Imagga Image Recognition API, please contact us and we'll get back to you promptly.


Are Machines Already Smarter Than Us?

Intelligence has always been an amazing topic for conversations: whether it’s about discussing what it is precisely or other people’s lack of it, it never fails to provide food for thought. Now with the rise of artificial intelligence, we have one more topic to debate, make predictions about and feel excited (or threatened) by. So far we have taught machines to draw, drive cars, write poems, beat humans playing Go, and even chat with us. AI is obviously getting smarter, but is it already smarter than us?

In 2002, Mitchell Kapor, co-founder of the Electronic Frontier Foundation and the first chair at Mozilla, and Ray Kurzweil, author, computer scientist, inventor and futurist who works for Google, established a $20,000 wager. The bet was over whether a computer would pass the Turing Test by 2029. They called it “A Long Bet.” Kapor bet against a computer passing the Turing Test by 2029, while Kurzweil believed it would happen. Has the bet been resolved as of 2018? Let’s take a deeper look.

AI: The Origins

Let’s go all the way back to ancient history. Just think about all the myths and stories about artificial beings who get their consciousness from a divine power. The seeds of AI were planted by philosophers who tried to describe the process of human thinking as the mechanical manipulation of symbols. In 1308, the Catalan poet and theologian Ramon Llull published Ars generalis ultima (The Ultimate General Art), which perfected his method of using paper-based mechanical means to create new knowledge from combinations of concepts. Following that, in 1666, mathematician and philosopher Gottfried Leibniz published On the Combinatorial Art, which proposed an alphabet of human thought and argued that all ideas are nothing but combinations of a relatively small number of simple concepts. All of this culminated with the invention of the programmable digital computer in the 1940s, which gave scientists the basis to start discussing the possibility of building an electronic brain.

The term “artificial intelligence” was coined in a proposal for a “2 month, 10 man study of artificial intelligence” in August 1955 at Dartmouth College. The workshop involved John McCarthy (Dartmouth College), Marvin Minsky (Harvard University), Claude Shannon (Bell Telephone Laboratories) and Nathaniel Rochester (IBM). The workshop took place in 1956 and is considered the official birth of the new field. In 1959 Arthur Samuel coined the term “machine learning” when he was trying to program a computer to learn to play checkers better than the person who wrote the program.

AI: The Test

Back in 1950, Alan Turing developed an actual test, which would help determine a machine’s ability to exhibit intelligent behavior comparable to that of a human. The test involved a human evaluator who would judge natural language conversations between a human and a machine designed to generate human-like responses. The judge would be aware that a machine is involved. The conversation would be limited to a text-only channel, such as a computer keyboard and screen. If the evaluator cannot reliably tell the machine from the human, the computer passes the test. In this test there are no right or wrong answers, just answers close to human speech.

The test was introduced in Turing’s paper “Computing Machinery and Intelligence.” Its first sentence states: “I propose to consider the question, 'Can machines think?’” But thinking is too difficult to define, so Turing replaces this question with another: "Are there imaginable digital computers which would do well in the imitation game?" Turing believed that the new question could be answered.

AI: The Bet

In 2014 a computer successfully convinced a panel of judges that it was human, and thus it passed the Turing Test. The test was held by the University of Reading, and the organization announced that for the first time a computer had passed. The computer’s name was Eugene Goostman and it tricked the judges 33% of the time. But did it really help Kurzweil win the bet, so that Kapor owes him $20,000?

Yes, Eugene Goostman passed the Turing Test and fooled the judges more than 30% of the time in their five-minute conversations. No, Kapor doesn’t owe Kurzweil $20,000. Yet. The bet had explicit rules, and the experiment at the University of Reading didn’t meet all of the listed criteria. For example, for Kurzweil to win, a computer needs to hold a conversation of at least eight hours and convince two out of three judges.

But why should Kapor be worried?

Machines are getting better at everything we are teaching them to do. What makes machines smarter? Seth Shostak, the former director of the Search for Extraterrestrial Intelligence Institute (SETI), believes that we can build computers that can beat humans at specific tasks (like winning the game of Go). The machines can’t do everything better, but he thinks that eventually we will design AI that is as complex and intelligent as a human brain.

"But the assumption is that that will happen in this century. And if it does happen, the first thing you ask that computer is: Design something smarter than you are," says Shostak. "Very quickly, you have a machine that's smarter than a human. And within 20 years, thanks to this improvement in technology, you have one computer that's smarter than all humans put together."

AI is learning quickly. Just one recent example is the AI hacker: in 2016 the DARPA Cyber Grand Challenge hosted the first hacking contest to pit bot against bot. Designed by seven teams of security researchers from across academia and industry, the bots were asked to play offense and defense, fixing security holes in their own machines while exploiting holes in the machines of others.

Not to mention the infamous story which the more dramatic among us (or the Black Mirror fans) saw as the beginning of the reign of AI over humans: that time when Facebook had to shut down two chatbots, just because no one understood what they were talking about. The researchers didn’t seem too worried about it. "There was no reward to sticking to English language," Dhruv Batra, a Facebook researcher, told FastCo. "Agents will drift off understandable language and invent codewords for themselves.”

In the meantime Google is feeding its AI with unpublished books and, in return, the AI is composing mournful poems. And if you’ve played with the AI-powered tool that Google released in 2016, you’ve actually helped it learn how to draw. The program is called Sketch-RNN and it draws pretty well...for a machine. The drawings are basic, but they are not what is important. The method used to create them can be quite useful. It is paving the way for AI programs which can be used as creative aids for designers, architects and artists.

We, on the other hand, have focused on the image recognition abilities of AI. A while ago we asked you to play the Clash of Tags. Players were presented with two sets of images for a given text tag and had to vote on which set described the tag better. It turned out that machines were almost as good as humans. So for now, the result is even. But the battle is not over.

Human: Intelligence?

So what is intelligence? According to Einstein, “The true sign of intelligence is not knowledge but imagination.” Socrates said, “I know that I am intelligent, because I know that I know nothing.” Philosophers got creative in the ancient search for the true measure of intelligence and meaning. Today neuroscientists try to answer questions about intelligence from a scientific perspective. It is widely accepted that there are different types of intelligence—analytic, linguistic, emotional, to name a few—but psychologists and neuroscientists disagree over whether these intelligences are linked or whether they exist independently from one another.

In the meantime, computers will be getting smarter. Yes, they can process certain kinds of information much faster than any of us can. Computers learn more quickly and narrow complex choices to the most optimal ones. They have better memories and can analyze huge amounts of information. Computers can calculate and perform tasks without stopping. On the other hand, humans are better at making decisions and solving problems. Humans are capable of experiencing life. We have creativity, imagination and inspiration. Computers replicate tasks, but they can’t create. Yet.



Image Recognition Revolutionizes the Online Experience for the Visually Impaired

People take seeing and technology for granted. For a specific group of internet users, the online experience is not so straightforward. The visually impaired need special assistance to experience the digital world. There are a few diverse low-vision aids, but generally they can be divided into two categories: those that translate visual information into alternative sensory information (sound or touch) and those that adapt the visual presentation to make it more visible. However, the bigger problem remains how to help people who are blind. The emerging technology for assistance in this category uses image processing techniques to optimize the visual experience. Today we will be looking at how image recognition is revolutionizing the online experience for the visually impaired.

Blind Users Interacting with Visual Content

Let’s stop for a second to consider the whole online experience for the visually impaired. What happens when a sighted person sees a webpage? They scan it, click links or fill in page information. For the visually impaired, the experience is different. They use a screen reader: software that interprets what is on the screen and reads it to the user. However, listening to each page element narrated in a fixed order, with no easy way to skip around, is not easy. Sometimes there is a vast difference between the visual page elements (buttons, banners, etc.) and the alt-text read by the screen reader. Social networking service (SNS) pages, with unstructured visual elements, an abundance of links and both horizontally and vertically organized content, make listening to the screen reader even more confusing.

Interacting with Social Visual Content

SNSs make it easy to communicate through various types of visual content. To fully engage with images, visually impaired people need to overcome accessibility challenges associated with the visual content through workarounds or with outside help.

Advancements in artificial intelligence are allowing blind people to identify and understand the visual content. Some of them include image recognition, tactile graphics, and crowd-powered systems.

Facebook has already algorithmically generated useful and accurate descriptions of photos at a larger scale without latency in the user experience. It provides the description of visuals as image alt-text, an HTML attribute designed for content managers to provide a text alternative for images.

Web Accessibility Today

We might think that web accessibility is a universal thing, but web designers do not always have the resources to devote to accessibility or do not see the value in making sites accessible. A 2-dimensional web page translated into a 1-dimensional speech stream is not easy to decipher. One of the most annoying things is that the majority of websites have insufficient text labeling of graphic content, concurrent events, dynamic elements, or infinitely scrolling pages (i.e. a stream of feeds). Thus, many websites continue to be inaccessible through screen readers, even the ones that are intended for universal access: library websites, university websites and SNSs.

The World Wide Web Consortium (W3C), an international community where Member organizations and the public work together to develop Web standards, created accessibility standards.  Led by Web inventor Tim Berners-Lee and CEO Jeffrey Jaffe, W3C's mission is to lead the Web to its full potential.

Solutions Helping Visually Impaired Users

Aipoly
Aipoly is a new iPhone app that uses machine learning to identify objects for visually impaired people without an Internet connection. The free image-recognition app makes it easier for people to recognize their surroundings. How does it work? You simply point the phone’s rear camera at whatever you want to identify and the app speaks what it sees. It can identify one object after another as the user moves the phone around, and it doesn’t require taking pictures. The app can be helpful not only to people with impaired vision but also to those trying to learn a new language.

Aipoly cofounder Simon Edwardsson says the app recognizes images using deep learning, a machine-learning technique inspired by studies of the brain, and the same technology Facebook uses to recognize faces and Google uses to search images. The app breaks an image down into characteristics like lines, patterns and curves, and uses them to determine the likelihood that the image shows a specific object. The app works well for objects around the office. So far it can recognize around 1,000 objects, which is more than enough for most everyday situations.

Banknote-reader (b-reader)
The banknote reader is a device that helps the visually impaired recognize money. The banknote goes into the b-note holder for scanning and recognition (orientation doesn’t really matter), gets photographed, and is sent securely to the cloud. There, an Imagga-trained custom classifier recognizes the nominal value and returns the information to the b-note device, which then plays a pre-recorded .mp3 file with the recognized value. The project is part of TOM (Tikkun Olam Makers), a global movement of communities connecting makers, designers, engineers and developers with people with disabilities to develop technological solutions for everyday challenges. On the TOM web platform you can find full specs of the b-note prototype, including building instructions and the camera code used for calling the Imagga API, so you can build a device like it for around 100 Euro (about 115 USD).
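
To give a rough idea of how a device like this might call a custom-trained classifier, here is a minimal sketch in Python. The banknote_reader categorizer id, the category names and the credentials are hypothetical, made up for illustration only; the actual camera code and API calls are the ones published on the TOM platform.

[cc lang="python"]
import requests
from requests.auth import HTTPBasicAuth

# Hypothetical example: 'banknote_reader' and the category names are made up
# for illustration; the real project publishes its own camera and API code.
API_ENDPOINT = 'https://api.imagga.com/v1'
auth = HTTPBasicAuth('INSERT_API_KEY', 'INSERT_API_SECRET')

def recognize_banknote(photo_url):
    # Ask the custom-trained classifier which banknote is in the photo
    response = requests.get(
        '%s/categorizations/banknote_reader' % API_ENDPOINT,
        params={'url': photo_url},
        auth=auth)
    categories = response.json()['results'][0]['categories']
    # Pick the category with the highest confidence, e.g. '20_euro'
    best = max(categories, key=lambda c: c['confidence'])
    return best['name']

# The device would then map the returned value to a pre-recorded .mp3
# file and play it back to the user.
[/cc]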

LookTel
LookTel combines a smartphone with advanced “artificial vision” software to create a helpful electronic assistant for anyone who is visually impaired or blind. It can automatically scan and identify objects like money, packaged goods, DVDs, CDs, medication bottles, and even landmarks. All it takes is pointing the device’s video camera at the object, and the device pronounces the name quickly and clearly. It can also be taught to identify the objects and landmarks around you. With a little extra help, LookTel can be a handy assistant. It also incorporates a text reader, giving users access to print media.

Seeing AI
Seeing AI is a smartphone app created by Microsoft that uses computer vision to describe the world. Once the app is downloaded, the user can point the camera at a person and it will announce who the person is and how they are feeling. The app also works with products. All of this is done by artificial intelligence running locally on the phone. So far the app is available for free in the US for iOS; it is unclear when the rest of the world and Android users will be able to download it.

The app works well for recognizing familiar people and household products (by scanning barcodes). It can also read and scan documents and recognize US currency. This is no small feat, because dollar bills are basically the same size and color regardless of their value, so spotting the difference can be difficult for the visually impaired. The app uses neural networks to identify objects, the same kind of technology used in self-driving cars, drones, and more. The most basic functions take place on the phone itself; however, most features require a connection.

Next Challenges for Full Adoption

Facebook users upload more than 350 million photos a day. Websites rely more and more on images and less on text, and sharing visuals has become a major part of the online experience. Screen readers and screen magnifiers on mobile and desktop platforms help the visually impaired, but more effort needs to go into making the web accessible through design guidelines, designer awareness, and evaluation techniques.

The most difficult challenge ahead is evaluating the effectiveness of image processing. It ultimately needs to be held to the same standards as other clinical research in low vision. Image processing algorithms need to be tailored to specific disease entities and be available on a variety of displays, including tablets. This field of research has the potential to deliver great benefits to a large number of people in a short period of time.


Securing Images in Python With the Imagga NSFW Categorization API

Images are everywhere in web and mobile applications, as well as in most other digital media. With images being so ubiquitous, there comes a need to ensure that the images posted are appropriate for the medium they appear on. This is especially true for any medium accepting user-generated content: even with clear rules about what can and cannot be posted, you can never trust every user to adhere to them. Whenever you run a website or service that accepts user-generated content, you will find that you need to moderate that content.

Why Moderate Content?

There are various reasons why content moderation might be in your best interest as the owner/maintainer of a digital medium. Some common ones are:

  • Legal obligations - If your application accommodates underage users, then you are obligated to protect them from adult content.
  • Brand protection - How your brand is perceived by users is important, so you might want to block some content that may negatively affect your image.
  • Protect your users - You might want to protect your users against harassment from other users. The harassment can be in the form of users attacking others by posting offensive content. An example of this is Facebook’s recent techniques of combating revenge p0rn on their platform.
  • Financial - It might be in your best interest financially to moderate the content shown in your applications. For instance, if your content is problematic, other businesses might not want to advertise on your platform or accept you as an affiliate. For some ad networks, keeping your content clean is a rule you have to comply with if you want to use them. Google AdSense is an example of this: it strictly forbids users of the service from placing its ads on pages with adult content.
  • Platform rules - You might be forced to implement some form of content moderation if the platform your application runs on requires it. For instance, Apple requires applications to have a way of moderating and restricting user-generated content before they can be placed on the App Store, and Google also restricts apps that contain sexually explicit content.

As you can see, if your application accepts user-generated content, moderation might be a requirement that you can’t ignore. There are different ways moderation can be carried out:

  • Individual driven - an example of this is a website with admins who moderate the content. The website might either restrict the display of uploaded content until it has been approved by an admin, or display uploaded content immediately but have admins constantly checking what has been posted. This method tends to be very accurate in identifying inappropriate content, as the admins will most likely be clear about what is appropriate for the medium. The obvious problem is the human labor needed: hiring moderators can get costly, especially as the application’s usage grows. Relying on human moderators also affects the app’s user experience. A human response will always be slower than an automated one, so even if you have people moderating at all times, there will still be a delay in identifying and removing problematic content, and by the time it is removed a lot of users could have seen it. On systems that hold uploaded content until it has been approved by an admin, this delay can become annoying to users.
  • Community driven - with this type of moderation, the owner of the application puts in place features that enable the app’s users to report inappropriate content, e.g. by flagging it. After a user flags a post, an admin is notified. This also suffers from delays in identifying inappropriate content, both from the community (who might not act immediately after the content is posted) and from the administrators (who might be slow to respond to flagged content). Leaving moderation up to the community can also produce false positives, as content that is safe may be seen by some users as inappropriate. With a large community you will always have differing opinions, and because many people will probably not have read the medium’s Terms and Conditions, they will not have clear-cut rules of what is and isn’t okay.
  • Automated - with this, a computer system, usually based on some machine learning algorithm, classifies and identifies problematic content. It can then act by removing the content or by flagging it and notifying an admin. This decreases the need for human labor, but the downside is that it might be less accurate than a human moderator.
  • A mix of some or all of the above methods - each of the methods described above comes with a shortcoming, so the best outcome might be achieved by combining some or all of them. For example, you might have an automated system that flags suspicious content while also enabling the community to flag content, with an admin then deciding what to do with it (a minimal sketch of such a pipeline follows this list).
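
To make the combination concrete, here is a minimal, hypothetical sketch of such a mixed pipeline in Python. The threshold, the flag limit and the hide_post/notify_admins helpers are invented for illustration; they are not part of any specific platform’s API.

[cc lang="python"]
# Hypothetical sketch of a mixed moderation pipeline. The threshold, the flag
# limit and the helper functions are made up for illustration only.
NSFW_THRESHOLD = 20.0       # confidence (%) above which a post is suspicious
COMMUNITY_FLAG_LIMIT = 3    # number of user flags that triggers a review

def hide_post(post):
    # Placeholder: in a real app this would update the database
    post['visible'] = False

def notify_admins(post, reason):
    # Placeholder: in a real app this would go to an admin review queue
    print('Review needed for post %s: %s' % (post['id'], reason))

def moderate(post, nsfw_confidence, community_flags):
    """Combine the automated NSFW score with community flags."""
    if nsfw_confidence >= NSFW_THRESHOLD:
        hide_post(post)  # the automated system hides it immediately
        notify_admins(post, 'auto-flagged: nsfw %.2f%%' % nsfw_confidence)
    elif community_flags >= COMMUNITY_FLAG_LIMIT:
        notify_admins(post, 'flagged by %d users' % community_flags)
    # otherwise the post stays visible; admins can still review it later

# Example: a post with an nsfw confidence of 35% gets hidden immediately
moderate({'id': 42, 'visible': True}, nsfw_confidence=35.0, community_flags=0)
[/cc]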

A Look at the Imagga NSFW Categorization API

Imagga makes available the NSFW (not safe for work) Categorization API that you can use to build a system that can detect adult content. The API works by categorizing images into three categories:

  • nsfw - these are images considered not safe. Chances are high that they contain pornographic content and/or display nude bodies or inappropriate body parts.
  • underwear - this categorizes medium-safe images. These might be images displaying lingerie, underwear, swimwear, etc.
  • safe - these are completely safe images with no nudity.

For each submitted image, the API returns a confidence level per category: a percentage indicating the probability that the image belongs to that category.

To see the NSFW API in action, we’ll create two simple programs that will process some images using the API. The first program will demonstrate how to categorize a single image while the second will batch process several images.

Setting up the Environment

Before writing any code, we’ll first set up a virtual environment. This isn’t necessary but is recommended as it prevents package clutter and version conflicts in your system’s global Python interpreter.

First, create a directory where you’ll put your code files.

[cc lang="bash"]$ mkdir nsfw_test[/cc]

Then navigate to that directory with your Terminal application.

[cc lang="bash"]$ cd nsfw_test[/cc]

Create the virtual environment by running:

[cc lang="bash"]$ python3 -m venv venv[/cc]

We’ll use Python 3 in our code. The command above creates a virtual environment whose default Python version is 3.

Activate the environment with (on MacOS and Linux):

[cc lang="bash"]$ source venv/bin/activate[/cc]

On Windows:

[cc lang="bash"]$ venv\Scripts\activate[/cc]

Categorizing Images

To classify an image with the NSFW API, you can either send a GET request with the image URL to the [cci]/categorizations/[/cci] endpoint, or you can upload the image to [cci]/content[/cci], get back a [cci]content_id[/cci] value, and then use that id in the call to the [cci]/categorizations/[/cci] endpoint. We’ll create two applications that demonstrate these two scenarios.
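
To make the two options concrete before we build the apps, here is a minimal sketch of both request styles using the requests library. The credentials, URL and file path are placeholders.

[cc lang="python"]
import requests
from requests.auth import HTTPBasicAuth

API_ENDPOINT = 'https://api.imagga.com/v1'
auth = HTTPBasicAuth('INSERT_API_KEY', 'INSERT_API_SECRET')

# Option 1: categorize an image directly by URL (placeholder URL)
res = requests.get(
    '%s/categorizations/nsfw_beta' % API_ENDPOINT,
    params={'url': 'https://example.com/image.jpg'},
    auth=auth)
print(res.json())

# Option 2: upload the image to /content first, then categorize by content id
with open('image.jpg', 'rb') as image_file:
    upload = requests.post(
        '%s/content' % API_ENDPOINT,
        auth=auth,
        files={image_file.name: image_file})
content_id = upload.json()['uploaded'][0]['id']
res = requests.get(
    '%s/categorizations/nsfw_beta' % API_ENDPOINT,
    params={'content': content_id},
    auth=auth)
print(res.json())
[/cc]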

Processing a Single Image

The first app we’ll create is a simple web application that can be used to check if an image is safe or not. We’ll create the app with Flask.

To start off, install the following dependencies.

[cc lang="bash"]$ pip install flask flask-bootstrap requests[/cc]

Then create a folder named [cci]templates[/cci] and inside that folder, create a file named [cci]index.html[/cci] and add the following code to it.

[cc lang="html"]
{% extends "bootstrap/base.html" %}

{% block title %}Imagga NSFW API Test{% endblock %}

{% block navbar %}
<nav class="navbar navbar-default">
  <div class="container"><a class="navbar-brand" href="/">Imagga NSFW API Test</a></div>
</nav>
{% endblock %}

{% block content %}
<div class="container">
  <!-- The field name must match request.form['image_url'] in app.py -->
  <form method="POST">
    <div class="form-group">
      <label for="image_url">Image URL</label>
      <input type="text" class="form-control" id="image_url" name="image_url"
             placeholder="https://example.com/image.jpg">
    </div>
    <button type="submit" class="btn btn-primary">Check image</button>
  </form>

  {% if image_url %}
  <!-- Show the submitted image next to the raw JSON response -->
  <img src="{{ image_url }}" alt="Submitted image" width="400">
  <pre>{{ res }}</pre>
  {% endif %}
</div>
{% endblock %}
[/cc]

In the above code, we create an HTML template containing a form that the user can use to submit an image URL to the Imagga API. When the response comes back from the server, it will be shown next to the processed image.

Next, create a file named [cci]app.py[/cci] in the root directory of your project and add the following code to it. Be sure to replace [cci]INSERT_API_KEY[/cci] and [cci]INSERT_API_SECRET[/cci] with your Imagga API Key and Secret. You can sign up for a free account to get these credentials. After creating an account, you’ll find these values on your dashboard:

[cc lang="python"]
from flask import Flask, render_template, request
from flask_bootstrap import Bootstrap
import os
import requests
from requests.auth import HTTPBasicAuth

app = Flask(__name__)
Bootstrap(app)

# API Credentials. Set your API Key and Secret here
API_KEY = os.getenv('IMAGGA_API_KEY', 'INSERT_API_KEY')
API_SECRET = os.getenv('IMAGGA_API_SECRET', 'INSERT_API_SECRET')

API_ENDPOINT = 'https://api.imagga.com/v1'

auth = HTTPBasicAuth(API_KEY, API_SECRET)


@app.route('/', methods=['GET', 'POST'])
def index():
    image_url = None
    res = None
    if request.method == 'POST' and 'image_url' in request.form:
        image_url = request.form['image_url']

        # Send the image URL to the NSFW categorizer
        response = requests.get(
            '%s/categorizations/nsfw_beta?url=%s' % (API_ENDPOINT, image_url),
            auth=auth)

        res = response.json()
    return render_template('index.html', image_url=image_url, res=res)


if __name__ == '__main__':
    app.run(debug=True)
[/cc]

Every call to the Imagga API must be authenticated. Currently, the only supported method for authentication is Basic. With Basic Auth, credentials are transmitted as user ID/password pairs, encoded using base64. In the above code, we achieve this with a call to [cci]HTTPBasicAuth()[/cci].

We then create a function that will be triggered by GET and POST requests to the [cci]/[/cci] route. If the request is a POST, we get the data submitted by the form and send it to the Imagga API for classification.

The NSFW Categorizer is one of several categorizers made available by the Imagga API. A categorizer is used to recognize various objects and concepts. There are a couple of predefined ones available (Personal Photos and NSFW Beta), but if none of them fits your needs, we can build a custom one for you.

As mentioned previously, to send an image for classification, you send a GET request to the [cci]/categorizations/[/cci] endpoint. The [cci]categorizer_id[/cci] for the NSFW API is [cci]nsfw_beta[/cci]. You can send the following parameters with the request:

  • url: URL of an image to submit for categorization. You can provide up to 10 URLs for processing by sending multiple url parameters (e.g. [cci]?url=&url=…&url=[/cci])
  • content: You can also send image files directly for categorization by uploading them to our [cci]/content[/cci] endpoint and then providing the received content identifiers via this parameter. As with the url parameter, you can send more than one image - up to 10 content ids - by sending multiple [cci]content[/cci] parameters.
  • language: If you’d like to get a translation of the tags in other languages, use the language parameter. Its value should be the code of the language you’d like to receive tags in. You can apply this parameter multiple times to request tags translated into several languages. See all available languages here. (A short example of sending these parameters follows below.)
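
As a quick illustration, here is a minimal sketch of a request with several url parameters and a language parameter, using the requests library. The URLs and the 'es' language code are placeholders.

[cc lang="python"]
import requests
from requests.auth import HTTPBasicAuth

API_ENDPOINT = 'https://api.imagga.com/v1'
auth = HTTPBasicAuth('INSERT_API_KEY', 'INSERT_API_SECRET')

# A list of tuples produces repeated query parameters:
# ?url=...&url=...&language=es
params = [
    ('url', 'https://example.com/photo1.jpg'),  # placeholder URLs
    ('url', 'https://example.com/photo2.jpg'),
    ('language', 'es'),
]
response = requests.get(
    '%s/categorizations/nsfw_beta' % API_ENDPOINT,
    params=params,
    auth=auth)
print(response.json())
[/cc]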

After processing the request, the API sends back a JSON object holding the image’s categorization data if processing was successful, and an error message in case there was a problem processing the image.

Below you can see the response of a successful categorization:

[cc lang="javascript"]
{
  "results": [{
    "image": "https://auto.ndtvimg.com/car-images/big/dc/avanti/dc-avanti.jpg",
    "categories": [{
      "name": "safe",
      "confidence": 99.22
    }, {
      "name": "underwear",
      "confidence": 0.71
    }, {
      "name": "nsfw",
      "confidence": 0.07
    }]
  }]
}
[/cc]

Note that you might not always get JSON with the three categories displayed. If the confidence of a category is [cci]0[/cci], this category will not be included in the JSON object.
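
One way to guard against a missing category is to treat an absent entry as a confidence of 0 when reading the response, as in this small helper (a sketch, not part of the apps in this post):

[cc lang="python"]
def category_confidence(result, category_name):
    """Return the confidence for a category, or 0 if it is absent from the response."""
    for category in result['results'][0]['categories']:
        if category['name'] == category_name:
            return category['confidence']
    return 0.0

# For the successful response above:
# category_confidence(res, 'nsfw')     -> 0.07
# category_confidence(res, 'missing')  -> 0.0
[/cc]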

Below you can see the response of a failed categorization.

[cc lang="javascript"]
{
  "results": [],
  "unsuccessful": [{
    "reason": "An error prevented image from being categorized. Please try again.",
    "image": "http://www.axmag.com/download/pdfurl-guide.pdf"
  }]
}
[/cc]

Back to our app, you can save your code and run it with:

[cc lang="bash"]
$ python app.py
[/cc]

If you navigate to http://127.0.0.1:5000/ you should see a form with one input field. Paste in the URL of an image and submit it. The image will be processed and you will get back a page displaying the image and the JSON returned from the server. To keep it simple, we just display the raw JSON, but in a more sophisticated app, it would be parsed and used to make some decision.

Below, you can see the results of some images we tested the API with.

As you can see, the images have been categorized quite accurately. The first two have [cci]safe[/cci] confidence scores of [cci]99.22[/cci] and [cci]99.23[/cci] respectively while the last one has an [cci]underwear[/cci] score of [cci]96.21[/cci]. Of course, we can’t show an [cci]nsfw[/cci] image here on this blog, but you are free to test that on your own.

To decide on the exact confidence threshold to use in your app, you should first test the API with a range of images. Once you have looked at the results across several images, you will be better able to judge which number to check for in your code when filtering acceptable and unacceptable images. If you are still not sure, our suggestion is to set the confidence threshold at 15-20%. If you’d like to be stricter about the results, a threshold of around 30% might do the trick.
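
As a sketch of what that filtering might look like in code, here is a hypothetical helper built around the suggested threshold; tune the number against your own test results:

[cc lang="python"]
NSFW_THRESHOLD = 20.0  # suggested starting point; adjust after your own tests

def is_image_safe(result, threshold=NSFW_THRESHOLD):
    """Return True when the nsfw confidence is below the chosen threshold."""
    categories = result['results'][0]['categories']
    nsfw_confidence = next(
        (c['confidence'] for c in categories if c['name'] == 'nsfw'), 0.0)
    return nsfw_confidence < threshold
[/cc]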

You should know that the technology is far from perfect and that the NSFW API is still in beta. From time to time, you might get an incorrect classification.

Note that the API has a limit of 5 seconds for downloading an image. If this limit is exceeded for a URL you send, the analysis will be unsuccessful. If you find that most of your requests fail due to timeout errors, we suggest uploading the images to our [cci]/content[/cci] endpoint first (which is free and not counted towards your usage) and then using the returned content id to submit the images for processing via the [cci]content[/cci] parameter. We’ll see this in action in the next section.

Batch Processing Several Images

The last app we created allowed the user to process one image at a time. In this section, we are going to create a program that can batch process several images. This won’t be a web app; it will be a simple script that you can run from the command line.

Create a file named [cci]upload.py[/cci] and add the code below to it. If you are still using the virtual environment created earlier, then the needed dependencies have already been installed, otherwise, install them with [cci]pip install requests[/cci].

[cc lang="python"]
import os
import requests
from requests.auth import HTTPBasicAuth

# API Credentials. Set your API Key and Secret here
API_KEY = os.getenv('IMAGGA_API_KEY', 'INSERT_API_KEY')
API_SECRET = os.getenv('IMAGGA_API_SECRET', 'INSERT_API_SECRET')

API_ENDPOINT = 'https://api.imagga.com/v1'
FILE_TYPES = ['png', 'jpg', 'jpeg', 'gif']


class ArgumentException(Exception):
    pass


if API_KEY == 'INSERT_API_KEY' or \
        API_SECRET == 'INSERT_API_SECRET':
    raise ArgumentException('You haven\'t set your API credentials. '
                            'Edit the script and set them.')

auth = HTTPBasicAuth(API_KEY, API_SECRET)


def upload_image(image_path):
    if not os.path.isfile(image_path):
        raise ArgumentException('Invalid image path')

    # Open the desired file
    with open(image_path, 'rb') as image_file:
        filename = image_file.name

        # Upload the multipart-encoded image with a POST
        # request to the /content endpoint
        content_response = requests.post(
            '%s/content' % API_ENDPOINT,
            auth=auth,
            files={filename: image_file})

        # Example /content response:
        # {'status': 'success',
        #  'uploaded': [{'id': '8aa6e7f083c628407895eb55320ac5ad',
        #                'filename': 'example_image.jpg'}]}
        uploaded_files = content_response.json()['uploaded']

        # Get the content id of the uploaded file
        content_id = uploaded_files[0]['id']

    return content_id


def check_image(content_id):
    # Using the content id, make a GET request to the
    # /categorizations/nsfw_beta endpoint to check if the image is safe
    params = {
        'content': content_id
    }
    response = requests.get(
        '%s/categorizations/nsfw_beta' % API_ENDPOINT,
        auth=auth,
        params=params)

    return response.json()


def parse_arguments():
    import argparse
    parser = argparse.ArgumentParser(
        description='Tags images in a folder')

    parser.add_argument(
        'input',
        metavar='<input folder>',
        type=str,
        nargs=1,
        help='The input - a folder containing images')

    parser.add_argument(
        'output',
        metavar='<output folder>',
        type=str,
        nargs=1,
        help='The output - a folder to output the results')

    args = parser.parse_args()
    return args


def main():
    import json
    args = parse_arguments()

    tag_input = args.input[0]
    tag_output = args.output[0]

    results = {}
    if os.path.isdir(tag_input):
        images = [filename for filename in os.listdir(tag_input)
                  if os.path.isfile(os.path.join(tag_input, filename)) and
                  filename.split('.')[-1].lower() in FILE_TYPES]

        images_count = len(images)
        for iterator, image_file in enumerate(images):
            image_path = os.path.join(tag_input, image_file)
            print('[%s / %s] %s uploading' %
                  (iterator + 1, images_count, image_path))
            try:
                content_id = upload_image(image_path)
            except IndexError:
                continue
            except KeyError:
                continue
            except ArgumentException:
                continue

            nsfw_result = check_image(content_id)
            results[image_file] = nsfw_result
            print('[%s / %s] %s checked' %
                  (iterator + 1, images_count, image_path))
    else:
        raise ArgumentException(
            'The input directory does not exist: %s' % tag_input)

    if not os.path.exists(tag_output):
        os.makedirs(tag_output)
    elif not os.path.isdir(tag_output):
        raise ArgumentException(
            'The output folder must be a directory')

    for image, result in results.items():
        with open(
                os.path.join(tag_output, 'result_%s.json' % image),
                'wb') as results_file:
            results_file.write(
                json.dumps(
                    result, ensure_ascii=False, indent=4).encode('utf-8'))

    print('Done. Check your selected output directory for the results')


if __name__ == '__main__':
    main()
[/cc]

We use the [cci]argparse[/cci] module to parse arguments from the command line. The first argument passed in will be the path to a folder containing images to be processed while the second argument is a path to a folder where the results will be saved.

For each image in the input folder, the script uploads it with a POST request to the [cci]/content[/cci] endpoint. After getting a content id back, it makes another call to the [cci]/categorizations/[/cci] endpoint. It then writes the response of that request to a file in the output folder.

Note that all uploaded files sent to [cci]/content[/cci] remain available for 24 hours. After this period, they are automatically deleted. If you need the file, you have to upload it again. You can also manually delete an image by making a DELETE request to [cci]https://api.imagga.com/v1/content/[/cci].
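
Such a deletion might look like the following sketch, which assumes the content id is appended to the /content URL and reuses the API_ENDPOINT, auth and content_id values from the script above:

[cc lang="python"]
# Delete an uploaded image before the 24-hour expiry. content_id, auth and
# API_ENDPOINT are the ones defined in the upload script above.
delete_response = requests.delete(
    '%s/content/%s' % (API_ENDPOINT, content_id),
    auth=auth)
print(delete_response.json())
[/cc]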

Add some images to a folder and test the script with:

[cc lang="bash"]$ python upload.py path/to/input/folder path/to/output/folder[/cc]

If you look at the output folder you selected, you should see a JSON file for each processed image.

Feel free to test out the Imagga NSFW Categorization API. If you have any suggestions on ways to improve it or just general comments on the API, you can post them in the Comment Section below or get in touch with us directly. We are always happy to get feedback on our products.


Artificial Intelligence Becoming Human. Is That Good or Bad?

The term “artificial intelligence” has been firing people’s imaginations since even before 1955, when it was coined to describe an emerging computer science discipline. Today the term covers an ever-growing variety of technologies meant to improve human life, from Alexa and self-driving cars to love robots, and your newsfeed is constantly full of AI updates. That newsfeed is itself the product of a (somewhat) well-implemented algorithm. The good news? Just like the rest of these AI technologies, your newsfeed is self-learning and constantly changing, trying to improve your experience. The bad news? Nobody can really explain why the most advanced algorithms work the way they do. And that’s where things can go wrong.

The Good AI

The AI market is booming. A profitable mix of media attention, hype, startups and enterprise adoption is making sure that AI is a household topic. A Narrative Science survey found that 38% of enterprises are already using AI, and Forrester Research predicted that investments in AI in 2017 would grow by 300% compared with 2016.

But what good can artificial intelligence do today?

Natural language generation

AI can produce text from data, a capability used to generate reports, summarize business intelligence insights and automate customer service.

Speech recognition

Interactive voice response systems and mobile applications rely on AI’s ability to recognize speech, transcribing and transforming human speech into a form usable by computer applications.

Image recognition

Image recognition has already been used successfully to detect persons of interest at airports, in retail, and elsewhere.

Virtual agents/chatbots

Virtual agents are used in customer service and support and as smart home managers; these chatbot systems and more advanced AIs can interact with humans directly. There are also machine learning platforms that can design, train and deploy models into applications, processes and other machines by providing algorithms, APIs, development tools and training data.

Decision management for enterprise

Engines that apply rules and logic to AI systems, used for initial setup/training as well as ongoing maintenance and tuning? Check. This technology has been used for a while now for decision management in enterprise applications and for assisting automated decision-making. There is also AI-optimized hardware designed to process graphics and run AI computational jobs.

AI for biometrics

On a more personal level, the use of AI in biometrics enables more natural interactions between humans and machines, relying on image and touch recognition, speech, and body language. Using scripts and other ways of automating human actions to support efficient business processes, robots can execute tasks or processes in place of humans.

Fraud detection and security

Natural language processing (NLP) uses and supports text analytics by understanding sentence structure and meaning, sentiment and intent through statistical and machine learning methods. It is currently used in fraud detection and security.

The “Black Box” of AI

At the beginning, AI branched out in two directions: machines should reason according to rules and logic (with everything visible in the code), or machines should take a cue from biology and learn by observing and experiencing (a program generates an algorithm based on example data). Today machines largely program themselves based on the latter approach. Since there is no hand-coded system that can be observed and examined, deep learning in particular is a “black box.”

It is crucial that we know when failures in AI occur, because they will. To do that, we need to understand how techniques like deep learning work, for instance how they recognize abstract things: in simple systems, recognition is based on physical attributes like outlines and colour; the next level handles more complex features like basic shapes and textures; and the top level recognizes the whole, not just as a sum of its parts.

The expectation is that these techniques will be used to diagnose diseases, make trading decisions and transform whole industries. But that shouldn’t happen before we manage to make deep learning more understandable, especially to its creators, and accountable for its uses. Otherwise there is no way to predict failures.

Today mathematical models are already being used to find out who is approved for a loan and who gets a job. But deep learning represents a different way to program computers.  “It is a problem that is already relevant, and it’s going to be much more relevant in the future,” says Tommi Jaakkola, a professor at MIT who works on applications of machine learning. “Whether it’s an investment decision, a medical decision, or maybe a military decision, you don’t want to just rely on a ‘black box’ method.”

Starting in the summer of 2018, the European Union will probably require companies to be able to explain decisions made by automated systems. Easy, right? Not really: the task may be impossible if the apps and websites use deep learning, even for something as simple as recommending products or playing songs. Those services are run by computers that have programmed themselves, and even the engineers who built them cannot fully explain how the computers reach their results.

“It might be part of the nature of intelligence that only part of it is exposed to rational explanation. Some of it is just instinctual.”

With the advance of technology, logic and reason might need to step down and leave some room for faith. Just like human reasoning, we can’t always explain why we’ve made a decision. However, this is the first time we are dealing with machines that are not understandable even to the people who engineered them. How will this influence our relationship with technology? A hand-coded system is pretty straightforward, but any machine-learning technology is far more convoluted. Not all AI will be this difficult to understand, but deep learning is a black box by design.

Deep learning works a bit like a neural network modeled on the brain: you can’t look inside it to find out how it works, because a network’s reasoning is embedded in the behaviour of thousands of simulated neurons. These neurons are arranged into dozens or even hundreds of intricately interconnected layers. The first layer receives input and performs calculations before passing a new signal as output; the results are fed to the neurons in the next layer, and so on.

Because a deep network has many layers, it is able to recognize things at different levels of abstraction. If you want to build an app, let’s say “Not a HotDog” (“Silicon Valley,” anyone?), it needs to know what a hot dog looks like. Lower layers might recognize hot dogs based on outlines or color, while higher layers recognize more complex things like texture and details such as condiments.

But just as many aspects of human behavior can’t be explained in detail, it might be the case that we won’t be able to explain everything AI does.  “Even if somebody can give you a reasonable-sounding explanation [for his or her actions], it probably is incomplete, and the same could very well be true for AI,” says Clune, of the University of Wyoming. “It might just be part of the nature of intelligence that only part of it is exposed to rational explanation. Some of it is just instinctual, or subconscious, or inscrutable.”

Just as civilizations have been built on a contract of expected behaviour, we might need to design AI systems to respect and fit into our social norms. Whatever robot or system we create, it is important that its decision-making is consistent with our ethical judgements.

The AI Future

Participants in a recent survey were asked what worries them most about AI. The results were as expected: participants were most worried by the notion of a robot that could cause them physical harm, so machines involving close physical contact, like self-driving cars and home managers, were viewed as risky. However, when it comes to statistics, languages and personal assistants, people are more than willing to use AI in everyday tasks. The many potential social and economic benefits of the technology depend on the environment in which it evolves, says the Royal Society.

An AI that animates a robot body is said to be “embodied,” and applications involving embodiment were viewed as risky. As data scientist Cathy O’Neil has written, algorithms are dangerous if they possess scale, their workings are secret, and their effects are destructive. Alison Powell, an assistant professor at the London School of Economics, believes this mismatch between perceived and potential risk is common with new technologies. “This is part of the overall problem of the communication of technological promise: new technologies are so often positioned as “personal” that perception of systematic risk is impeded.”

Philosophers, computer scientists and technologists distinguish between “soft” and “hard” AI. The main difference? Hard AI aims to mimic the human mind. As Wall Street Journal contributor and MIT lecturer Irving Wladawsky-Berger explained, soft AI is statistically oriented, using computational intelligence methods to address complex problems based on the analysis of vast amounts of information with sophisticated algorithms. For most of us, soft AI is already part of the daily routine, from GPS to ordering food online. According to Wladawsky-Berger, hard AI is “a kind of artificial general intelligence that can successfully match or exceed human intelligence in cognitive tasks such as reasoning, planning, learning, vision and natural language conversations on any subject.”

AI is already used to build devices that cheat and deceive, and to outsmart human hackers. It is quickly learning from our behavior, and people are building robots so humanlike they might become our lovers. AI is also learning right from wrong: Mark Riedl and Brent Harrison from the School of Interactive Computing at the Georgia Institute of Technology lead a team trying to instill human ethics in AIs by using stories. Just as we teach human values to children by reading them stories, AI can learn to distinguish wrong from right and bad from good.
