Abstract
Gesture recognition is a perennially popular topic in computer science, because it not only enables humans to communicate with machines (human-machine interaction, or HMI) but is also the first step toward machines understanding human body language. This project addresses hand tracking and the recognition of single-stroke hand gestures using the Microsoft Kinect sensor and machine learning.

Introduction
Hand gestures are expressive human body motions that are irregular and imprecise. To recognize them properly, an appropriate representation must be chosen. A large amount of work on gesture recognition has been done in the area of computer vision. After reviewing those reports, we decided to work on single-stroke hand gesture recognition. Considerable effort has gone into single-stroke hand gesture recognition, but no perfect algorithm has been designed yet. The problem with basic hand gesture recognition was that the user could input only a single letter or a single gesture at a time, which was time-consuming and very frustrating for the user. The work done so far can be divided into two categories:
* Trajectory-based analysis
* Dynamic model-based approach
The trajectory-based approach matches curves in configuration space to recognize gestures. The dynamic model-based approach learns a parametric model of gestures.

Gesture recognition systems are, in general, composed of three main components:
* Image pre-processing
* Tracking
* Gesture recognition
In individual systems, some of these components may be merged or missing, but their basic functionality will normally be present. Image pre-processing is the task of preparing the video frames for further analysis by suppressing noise, extracting important clues about the position of the object of interest (for example, the hands), and bringing these into a symbolic form.
This step is often referred to as feature extraction. In this project, it is done with the depth camera of the Microsoft Kinect v2.
Tracking: on the basis of the pre-processing, the position and possibly other attributes of the object (the hands) must be tracked from frame to frame. This is done to distinguish the moving object of interest from the background and from other moving objects, and to extract the motion information needed to recognize dynamic gestures. Motion can be detected with several kinds of sensors:
* Infrared (passive and active sensors)
* Optics (video and camera systems)
* Radio frequency energy (radar, microwave, and tomographic motion detection)
* Sound (microphones and acoustic sensors)
* Vibration (triboelectric, seismic, and inertia-switch sensors)
* Magnetism (magnetic sensors and magnetometers)
Accessing the Microsoft Kinect from within OpenCV is not much different from accessing a computer's webcam or camera device. The easiest way to integrate a Kinect sensor with OpenCV is by using an OpenKinect module called freenect. To run our app, we execute the main function routine, which accesses the Kinect, generates the GUI, and runs the main loop of the application. The layout chosen for the current project (Kinect Layout) is as plain as it gets: it simply displays the live stream of the Kinect depth sensor at a comfortable frame rate. The bulk of the work is done by the Hand Gesture Recognition class, especially by its recognize method. This class starts off with a few parameter initializations. The recognize method is where most of the processing takes place: it handles the entire process flow, from the raw grayscale image all the way to a recognized hand gesture. We discuss the whole process in detail in the following sections. Once we (roughly) know where the hand is located, we aim to learn something about its shape. What remains to be done is classifying the hand gesture based on the trajectory of the hand.
For example, if the trajectory of the hand forms the sign of the number 1, the recognition function returns the value '1'. This is an example of single-stroke gesture recognition, which is our main objective. Similarly, if we make a gesture for "Hello", the recognition function takes its trajectory as input and, through template matching and machine learning, produces the output 'Hello'.

Processing
Processing is an open-source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach the fundamentals of computer programming within a visual context, Processing has also evolved into a tool for generating finished professional work. Today, tens of thousands of students, artists, designers, researchers, and hobbyists use Processing for learning, prototyping, and production. The project was initiated in 2001 by Casey Reas and Benjamin Fry, both formerly of the Aesthetics and Computation Group at the MIT Media Lab. In 2012, they started the Processing Foundation along with Daniel Shiffman, who joined as a third project lead. Johanna Hedva joined the Foundation in 2014 as Director of Advocacy. One of the aims of Processing is to allow non-programmers to start programming aided by visual feedback. The Processing language builds on the Java language but uses a simplified syntax and a graphical user interface.
In order to access the Kinect from Processing, we used the following libraries:
* OpenNI, or Open Natural Interaction, is an industry-led non-profit organization and open-source software project focused on certifying and improving the interoperability of natural user interfaces and organic user interfaces for Natural Interaction (NI) devices, the applications that use those devices, and the middleware that facilitates access to and use of such devices.[1] The OpenNI framework provides a set of open-source APIs.
These APIs are intended to become a standard for applications to access natural interaction devices. The API framework itself is also sometimes referred to as the OpenNI SDK. The APIs provide support for:
  * Voice and voice command recognition
  * Hand gestures
  * Body motion tracking
* Kinect for Windows SDK 2.0 includes PC drivers for the Kinect device, compatible with Windows 7 and above. It lets developers build Kinect applications in C++ or C#, or in Java by using Processing, and includes the following features:
  * Raw sensor streams: access to low-level streams from the depth sensor, the color camera sensor, and the four-element microphone array.
  * Skeletal tracking: the capability to track the skeleton image of one or two people moving within the Kinect's field of view, for gesture-driven applications.
  * Advanced audio capabilities: audio processing capabilities include sophisticated acoustic noise suppression and echo cancellation, beam formation to identify the current sound source, and integration with the Windows speech recognition API.
  * Sample code and documentation.
* libusbK is a complete driver/library solution for vendor-class USB device interfaces. libusbK encompasses a 100% WinUSB-compatible API/function set, and all WinUSB power/pipe policies are fully supported by the libusbK driver. In addition, libusbK has full support for isochronous endpoints and an extensive set of additional modules to simplify development.

Problem Formulation
Our purpose is to recognize hand gestures by tracking the movement of the hand and finding its trajectory. To do so, we need to discriminate the hand from the other parts of the body. The algorithm to be designed must therefore address the following issues.
It must be capable of discriminating between the hand and other body parts, recording the trajectory of the hand movement, recognizing the hand gesture, and giving the desired result. We worked on designing an algorithm that recognizes single-stroke hand gestures accurately and in real time. To obtain accurate and fast results, we apply machine learning: we designed a learning algorithm that helps the gesture recognition algorithm learn and produce quick, accurate results.

Problem Solution
The method we propose, presented in the figure below, uses an average-point hand tracking algorithm to track the user's hand and saves a trajectory obtained from the centers of the tracked region. The saved trajectory is then segmented into strokes. Since all the gestures used are composed of a small number of strokes, features such as the number of strokes, the average angles with the horizontal axis, the angles with neighboring segments, and the segment proportions can easily be derived from the segmented trajectory. Finally, the gesture is uniquely identified from the extracted features.

Tracking
The region containing the hand to track must first be selected. Then, a mask is applied to the HSV image in order to eliminate the pixels which have too small […]. We track the user's hand using the Kinect depth camera. We defined minimum and maximum depth thresholds that bound the region in which the user makes hand gestures. The algorithm then finds the pixels closest to the camera, which are taken to be the pixels representing the user's palm. By averaging the coordinates of these pixels (the average point calculation), we obtain the centroid of the palm.
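The depth-threshold and average-point steps can be illustrated with a minimal sketch using only the Python standard library. It assumes the depth frame is already available as rows of distances in millimetres (for example, converted from a Kinect depth stream). The function name `palm_centroid`, the zone limits, and the `band` parameter (how far behind the nearest pixel a pixel may lie and still count as palm) are hypothetical choices for illustration, not values from this project.

```python
from typing import List, Optional, Tuple

# Assumption: a depth frame is a list of rows of distances in millimetres (0 = no reading).
DepthFrame = List[List[int]]


def palm_centroid(
    depth: DepthFrame,
    min_depth: int = 500,   # hypothetical near limit of the gesture zone (mm)
    max_depth: int = 1000,  # hypothetical far limit of the gesture zone (mm)
    band: int = 50,         # hypothetical thickness (mm) of the slice kept as "palm"
) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of the palm pixels, or None if no hand is in range."""
    # 1. Keep only pixels inside the [min_depth, max_depth] interaction zone.
    in_zone = [
        (x, y, d)
        for y, row in enumerate(depth)
        for x, d in enumerate(row)
        if min_depth <= d <= max_depth
    ]
    if not in_zone:
        return None

    # 2. The palm is assumed to be the part closest to the sensor:
    #    keep pixels within `band` mm of the nearest in-zone depth.
    nearest = min(d for _, _, d in in_zone)
    palm = [(x, y) for x, y, d in in_zone if d <= nearest + band]

    # 3. Average point: the centroid is the mean pixel coordinate.
    cx = sum(x for x, _ in palm) / len(palm)
    cy = sum(y for _, y in palm) / len(palm)
    return cx, cy
```

Calling `palm_centroid` once per frame and appending the result to a list yields the raw trajectory of centroids that the following steps smooth and segment. The `band` slice is one simple way to separate the palm from a forearm that is also inside the zone but slightly farther from the sensor.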
Once that is done, the program tracks the motion of the centroid until it becomes static. While tracking, we record the static part of the trajectory in one directory and the dynamic part in another, in order to distinguish the starting point and end point from the gesture itself.

Trajectory Segmentation
The consecutive centers of the tracked region define a relatively rough (noisy) trajectory, which makes stroke detection more difficult. Therefore, the trajectory is smoothed so that each new recorded trajectory point $t_i$ is obtained as a weighted average of the new measured point $m_i$ and the previous trajectory point $t_{i-1}$:

$$t_i = \alpha\, m_i + (1 - \alpha)\, t_{i-1}, \qquad 0 < \alpha \le 1,$$

where $\alpha$ is the smoothing weight. The recording of a new gesture trajectory is triggered by a movement of the user's hand that occurs after a short interval (1-2 seconds) of static position. Minimum thresholds are imposed on the amplitude and speed of the movement in order to avoid false triggering due to tracking noise or hand trembling. The recording of the gesture trajectory ends when the movement speed stays below the imposed threshold for at least 2 seconds.
A set of angles with the horizontal axis is computed over the recorded trajectory. Computing the angle for each small segment determined by two consecutive points of the trajectory may result in a very noisy angle set, with many false angle discontinuities. This noise arises because the image is sampled on a rectangular grid: two trajectory points that lie close to each other can form only a few coarse angles (for adjacent pixels, only multiples of 45°). Selecting a reduced number of trajectory points with a fixed step (e.g. every second or third point of the trajectory) yields a smoother angle set. Better results can be obtained by adaptively selecting trajectory points based on a distance threshold.
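The smoothing formula, the adaptive point selection, and the angle computation can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the default $\alpha = 0.5$, taking $t_0$ equal to the first measurement, and the `min_dist` parameter are all assumptions. Angles are measured with the image y axis pointing downwards, so the y difference is flipped to obtain the usual counter-clockwise convention.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def smooth(measured: List[Point], alpha: float = 0.5) -> List[Point]:
    """Exponential smoothing: t_i = alpha * m_i + (1 - alpha) * t_{i-1}."""
    if not measured:
        return []
    smoothed = [measured[0]]  # assumption: t_0 is the first measured point
    for mx, my in measured[1:]:
        px, py = smoothed[-1]
        smoothed.append((alpha * mx + (1 - alpha) * px,
                         alpha * my + (1 - alpha) * py))
    return smoothed


def select_points(traj: List[Point], min_dist: float) -> List[Point]:
    """Adaptive selection: keep a point only once it lies at least `min_dist`
    from the last kept point. This also guarantees that no zero-length
    segment (two points at the same position) reaches the angle step."""
    if not traj:
        return []
    kept = [traj[0]]
    for p in traj[1:]:
        if math.dist(p, kept[-1]) >= min_dist:
            kept.append(p)
    return kept


def segment_angles(points: List[Point]) -> List[float]:
    """Angle in degrees, in [-180, 180], of each segment with the horizontal axis.
    Image rows grow downwards, so (y0 - y1) is used as the vertical component."""
    return [
        math.degrees(math.atan2(y0 - y1, x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
```

A typical chain would be `segment_angles(select_points(smooth(raw), min_dist))`. Choosing `min_dist` of a few pixels is what removes the coarse 45° quantization noise described above.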
Even with fixed-step selection, a distance threshold must be imposed so that no angle is computed when two points have the same position.

Trajectory Segmentation
In order to split the trajectory into strokes, the ends of these strokes must be detected. The starting point of the trajectory is also the starting point of the first segment, and the end point of the trajectory is the end point of the last segment. All other stroke endpoints are detected as angle discontinuity points; the starting point of each segment is the end of the previous one. A stroke discontinuity is detected as a point between two small segments that have significantly different angles with the horizontal axis. For this purpose, a derivative over the set of angles is computed:

$$\left.\frac{d\theta}{di}\right|_i = \frac{\theta_i - \theta_{i-1}}{i - (i-1)} = \theta_i - \theta_{i-1}$$

All maxima of the absolute value of this derivative that exceed an imposed threshold indicate an important angle discontinuity and correspond to stroke ends. Usually, a threshold of 30° is enough to reject the angle noise of small segments. When computing the derivatives, special care must be taken because angles are defined on a circle, so the angle difference must also be computed on a circular domain, i.e. wrapped into the interval (-180°, 180°]. For example, with $\theta_i = 10°$ and $\theta_{i-1} = 350°$ the difference is 20°, not -340°. Another threshold must be imposed on the minimum length of a stroke, in order to avoid detecting false strokes. A reasonable value for this threshold is 1/10 of the image height.
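The segmentation rule can be sketched in Python as follows. It is a minimal illustration under stated assumptions: `angle_diff` and `split_strokes` are hypothetical names, the angles are assumed to come from the previous step (one angle in degrees per segment between consecutive selected points), and stroke length is measured as the straight-line distance between stroke ends, which is one possible reading of the minimum-length rule. The 30° default matches the threshold quoted above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def angle_diff(a: float, b: float) -> float:
    """Circular difference a - b in degrees, wrapped into (-180, 180]."""
    d = (a - b) % 360.0  # Python's % with a positive divisor is non-negative
    return d - 360.0 if d > 180.0 else d


def split_strokes(points: List[Point], angles: List[float],
                  angle_thr: float = 30.0,
                  min_len: float = 0.0) -> List[List[Point]]:
    """Split a trajectory into strokes at angle discontinuities.

    angles[k] is the angle of the segment points[k] -> points[k + 1],
    so len(angles) == len(points) - 1.
    """
    if len(points) < 2:
        return [points] if points else []

    # Derivative at interior point k: angle of the outgoing segment minus the
    # angle of the incoming one (the index step is 1, so no division needed).
    deriv = {k: abs(angle_diff(angles[k], angles[k - 1]))
             for k in range(1, len(angles))}

    # Candidate stroke ends: local maxima of |derivative| above the threshold.
    candidates = [k for k, v in deriv.items()
                  if v > angle_thr
                  and v >= deriv.get(k - 1, 0.0)
                  and v >= deriv.get(k + 1, 0.0)]

    # Minimum stroke length (assumption: straight-line distance between ends).
    ends = [0]
    for k in candidates:
        if math.dist(points[ends[-1]], points[k]) >= min_len:
            ends.append(k)
    last = len(points) - 1
    if len(ends) > 1 and math.dist(points[ends[-1]], points[last]) < min_len:
        ends.pop()  # last stroke too short: merge it into the previous one
    ends.append(last)

    # Consecutive strokes share their boundary point.
    return [points[a:b + 1] for a, b in zip(ends, ends[1:])]
```

For an "L"-shaped trajectory (a horizontal run followed by a vertical run), the 90° jump at the corner produces two strokes. For a 480-pixel-high image, the rule of thumb above would correspond to calling `split_strokes(points, angles, min_len=48)`.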