Microsoft Azure is a cloud computing service with its own machine learning offering, known as Cognitive Services. Cognitive Services splits into five categories: Vision, Speech, Language, Knowledge, and Search. Each category contains several tools, such as Computer Vision, Content Moderator, Custom Vision Service, Emotion API, Face API, and Video Indexer. Face API has two main functions:
Detect human faces in an image and return face rectangles, optionally with face IDs, landmarks, and attributes.
- Optional parameters include face IDs, landmarks, and attributes. Attributes include age, gender, head pose, smile, facial hair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure, and noise.
- The face ID is used in Face – Identify, Face – Verify, and Face – Find Similar, and expires 24 hours after the detection call.
- Higher face image quality means better detection and recognition precision. For best results, use high-quality faces: frontal, clear, and 200×200 pixels (100 pixels between the eyes) or larger.
- JPEG, PNG, GIF (the first frame), and BMP format are supported. The allowed image file size is from 1KB to 6MB.
- Faces are detectable when their size is between 36×36 and 4096×4096 pixels. To detect very small but clear faces, try enlarging the input image.
- Up to 64 faces can be returned for an image. Faces are ranked by face rectangle size from large to small.
- The face detector prefers frontal and near-frontal faces. In some cases faces may not be detected, e.g. with exceptionally large face angles (head pose), heavy occlusion, or wrong image orientation.
- Attributes (age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure, and noise) may not be perfectly accurate. HeadPose's pitch value is a reserved field and always returns 0.
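To make the detection call concrete, here is a minimal sketch of how a Face – Detect request is assembled, using only Python's standard library. The endpoint host and subscription key are placeholders (your region and key will differ); the query parameters and headers follow the Face API reference.

```python
from urllib.parse import urlencode

# Placeholder values -- substitute your own Azure region and subscription key.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/face/v1.0/detect"
SUBSCRIPTION_KEY = "<your-subscription-key>"

def build_detect_request(image_bytes):
    """Build the URL, headers, and body for a Face - Detect call."""
    params = urlencode({
        "returnFaceId": "true",                      # needed later for Identify/Verify/Find Similar
        "returnFaceLandmarks": "false",
        "returnFaceAttributes": "age,gender,smile",  # any subset of the attribute list above
    })
    headers = {
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        "Content-Type": "application/octet-stream",  # raw image bytes; use application/json for an image URL
    }
    return f"{ENDPOINT}?{params}", headers, image_bytes

url, headers, body = build_detect_request(b"\xff\xd8...")  # JPEG bytes, truncated here
# An HTTP POST of `body` to `url` with `headers` returns a JSON array of faces,
# ranked by face rectangle size from large to small (up to 64 entries).
```

The actual POST is left out so the sketch runs offline; any HTTP client can send it.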
Take faces and compare them to determine how well they match. This function has four categories:
- Face Verification – takes two detected faces and attempts to verify that they match
- Finding Similar Face – takes candidate faces in a set and orders their similarity to a detected face from most similar to least similar
- Face Grouping – takes a set of unknown faces and divides them into subsets based on similarity. Each face within a given subset is considered to be the same person object (based on a threshold value).
- Face Identification – identifies people based on a detected face and a people database (defined as a LargePersonGroup/PersonGroup). Create this database in advance; it can be edited over time.
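The grouping behavior is easy to illustrate locally. The sketch below is a simplified stand-in, not the service's actual algorithm: given pairwise similarity scores, faces whose similarity meets a threshold are merged into the same subset (here via union-find).

```python
def group_faces(similarity, face_ids, threshold=0.5):
    """Greedy union of faces whose pairwise similarity meets the threshold.
    `similarity` maps a frozenset({a, b}) of face IDs to a score in [0, 1].
    A simplified simulation of Face - Group, not the service's real algorithm."""
    parent = {f: f for f in face_ids}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path compression
            f = parent[f]
        return f

    for pair, score in similarity.items():
        if score >= threshold:
            a, b = pair
            parent[find(a)] = find(b)      # merge the two subsets

    groups = {}
    for f in face_ids:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())

sims = {frozenset({"f1", "f2"}): 0.9,   # likely the same person
        frozenset({"f1", "f3"}): 0.2,   # likely different people
        frozenset({"f2", "f3"}): 0.1}
print(group_faces(sims, ["f1", "f2", "f3"]))  # [['f1', 'f2'], ['f3']]
```

Raising the threshold splits groups apart; lowering it merges them, mirroring the permissive-to-restrictive scale discussed below for identification.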
Face Storage – allows a Standard subscription to store additional persisted faces when using LargePersonGroup/PersonGroup Person objects (PersonGroup Person – Add Face/LargePersonGroup Person – Add Face) or FaceLists/LargeFaceLists (FaceList – Add Face/LargeFaceList – Add Face) for identification or similarity matching with the Face API.
With Face Identification, you must first create a PersonGroup object. That PersonGroup object contains one or more person objects, and each person object contains one or more face images representing that person. As the number of face images a person object contains increases, so does identification accuracy.
For example, let’s say that you create a PersonGroup object called “co-workers.” Within co-workers, you create person objects; you might create two, “Alice” and “Bob,” and assign face images to their respective person objects. You have now created a database against which to compare a detected face image. An attempt will be made to determine whether the detected image is Alice, Bob, or neither, based on a numerical threshold.
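The setup steps above map onto a fixed sequence of REST calls. The sketch below lists that sequence for the "co-workers" example; the paths follow the Face API v1.0 reference, and the `{personId}` placeholders are filled in at run time from the service's responses.

```python
def identification_setup_calls(group_id, people):
    """The sequence of Face API REST calls needed before Identify can run.
    Paths follow the Face API v1.0 reference; the {personId} placeholder is
    returned by the service when each person object is created."""
    calls = [("PUT", f"/persongroups/{group_id}")]  # create the PersonGroup
    for _name in people:
        calls.append(("POST", f"/persongroups/{group_id}/persons"))  # create a person object
        calls.append(("POST", f"/persongroups/{group_id}/persons/{{personId}}/persistedFaces"))  # add face image(s)
    calls.append(("POST", f"/persongroups/{group_id}/train"))  # train before identifying
    return calls

for method, path in identification_setup_calls("co-workers", ["Alice", "Bob"]):
    print(method, path)
```

Once training completes, Face – Identify can be called with a face ID from detection and the PersonGroup ID.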
This threshold is on a scale that is most permissive at 0 and most restrictive at 1. At 1, faces must be perfect matches; by perfect, I mean that even two identical images at different compression rates will not be recognized as a match. In contrast, at 0 a match will be returned for the person object with the highest confidence score, regardless of how low that score is. In my experiments, a threshold somewhere between 0.3 and 0.35 tended to strike a good balance. To reiterate an earlier point, more images per person object increases identification accuracy, decreasing both false positives and false negatives.
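The thresholding decision itself is simple to express in code. This sketch applies a client-side cutoff to a candidate list shaped like one entry of a Face – Identify response; the 0.32 default is just a value from the 0.3–0.35 band found above, not a recommendation from Microsoft.

```python
def pick_identity(candidates, threshold=0.32):
    """Return the best candidate's personId at or above `threshold`, else None ("neither").
    `candidates` mimics one entry of a Face - Identify response: a list of
    {"personId": ..., "confidence": ...} dicts. The 0.32 default falls in the
    0.3-0.35 band the text above found to balance false positives and negatives."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["confidence"])
    return best["personId"] if best["confidence"] >= threshold else None

print(pick_identity([{"personId": "alice", "confidence": 0.61},
                     {"personId": "bob", "confidence": 0.12}]))   # alice
print(pick_identity([{"personId": "alice", "confidence": 0.18},
                     {"personId": "bob", "confidence": 0.12}]))   # None: neither clears the threshold
```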
The Emotion API beta takes an image as input and returns the confidence across a set of emotions for each face in the image, along with a bounding box for the face from the Face API. The emotions detected are happiness, sadness, surprise, anger, fear, contempt, disgust, and neutral. These emotions are understood to be communicated cross-culturally and universally via the same basic facial expressions, which the Emotion API identifies.
In interpreting results from the Emotion API, the emotion detected should be interpreted as the emotion with the highest score, as scores are normalized to sum to one. Users may choose to set a higher confidence threshold within their application, depending on their needs.
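That interpretation rule can be sketched directly. The score dict below mimics the per-face scores object of an Emotion API response (values normalized to sum to one); the `min_confidence` parameter is the application-level threshold mentioned above, not part of the API itself.

```python
def top_emotion(scores, min_confidence=0.0):
    """Pick the highest-scoring emotion from a normalized score dict.
    `scores` mimics the per-face scores of an Emotion API response, where
    values sum to 1. Returns None if the winner is below `min_confidence`
    (an application-side threshold, not an API parameter)."""
    emotion, score = max(scores.items(), key=lambda kv: kv[1])
    return emotion if score >= min_confidence else None

scores = {"happiness": 0.82, "neutral": 0.10, "surprise": 0.05, "sadness": 0.01,
          "anger": 0.01, "fear": 0.005, "contempt": 0.003, "disgust": 0.002}
print(top_emotion(scores))                       # happiness
print(top_emotion(scores, min_confidence=0.9))   # None: the application demands more certainty
```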
For more information about emotion detection, see the API Reference:
- Basic: If a user has already called the Face API, they can submit the face rectangle as an input and use the basic tier. API Reference
- Standard: If a user does not submit a face rectangle, they should use the standard tier. API Reference
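The choice between the two tiers reduces to a single condition. This tiny helper is hypothetical, written only to illustrate the decision rule above; it is not part of any Microsoft SDK.

```python
def emotion_tier(face_rectangle=None):
    """Choose the Emotion API tier: Basic when a face rectangle from a prior
    Face API call is supplied, Standard when the service must find faces itself.
    A hypothetical helper for illustration, not part of any SDK."""
    return "basic" if face_rectangle is not None else "standard"

print(emotion_tier({"left": 10, "top": 20, "width": 100, "height": 100}))  # basic
print(emotion_tier())                                                      # standard
```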