Open-source software to facilitate seamless translation from sign language to spoken word.
The software is based on a two-part system split between Unity and Python.
The software uses an artificial neural network to create a highly accurate model for each individual user.
The software uses the Leap Motion Sensor to allow for fast, accurate, and simple-to-use capture of hand data.
The software is intended as a platform for future development on the subject of sign language translation.
There is a need for an easy-to-use, accurate, and portable system to translate American Sign Language (ASL) into English. The proposed system combines Unity's usability with Python's machine learning capabilities to create a platform for real-time translation of ASL. The Leap Motion Controller is used to capture hand movements and information to create supervised machine learning models. The system works adaptively, learning from the signer how they sign individual words, allowing for more robust accuracy than a general model for sign language recognition.
1. Introduction
Sign language is a critical method of communication for many people who are hearing or speech impaired, allowing for swift and efficient communication between those who know it. However, there are few easy ways for those who are deaf or hearing impaired to communicate with those who do not know sign language. There have been many past attempts to create a system that maintains high recognition accuracy while also remaining portable, and most methods fail at one or both of these requirements [
]. LeapASL is intended as an easy-to-use platform for the simple and instant creation of a highly accurate, personalized machine learning model that translates signed American Sign Language (ASL) into spoken English using the Leap Motion Sensor [
The Leap Motion Sensor was chosen for this project due to its high portability, low overhead, and ease of setup. The software currently expects the sensor to rest on a fixed surface in front of the user, but other research has shown success with the sensor draped around the neck, using the signer's chest as the fixed position from which hand tracking can occur [
The program is split into two parts: model creation and model use. In model creation, the user teaches an artificial neural network how they perform each individual sign. In model use, the user can sign any word within the program's dictionary for instant translation, which is then displayed on screen and spoken aloud. The user interacts with the program through the Leap Motion Sensor, which provides the speed of hand recognition necessary for ASL interpretation, a wide array of captured data points, and the portability needed for easy everyday use [
LeapASL is primarily based in Unity, as that is the standard method of communicating with the Leap Motion Sensor. The software captures hand data in Unity and then uses that data to create supervised machine learning models in Python, taking advantage of the robust and malleable machine learning models available in Python libraries. The current iteration of the project uses a standard artificial neural network to create a model that is as accurate as possible. LeapASL is also intended as a platform for future work: different machine learning models could be used, different data points could be collected from the Leap Motion Sensor, and in general the project could serve as a jumping-off point. Due to the software's two-part design, it is relatively simple to alter one half and observe the changes within the other.
2. Design and implementation
2.1 User interface and Leap Motion Controller interaction
Within the Unity portion of the project, the user is immediately greeted with three options: 'Guessing', wherein the user can sign words that will be guessed and spoken; 'Training', wherein the user can select one of the available words and let the system collect data on how they sign it; and finally 'Create Model', wherein pressing the button launches a Python script that creates a new model from the current set of training data. Each of these submenus is built on simple Unity functionality, allowing for an intuitive user interface (see Fig. 1).
When the option to train the model is chosen, a new scene loads. Here the user is shown thirteen possible words to train on, although this number is arbitrary and could easily be increased to test the accuracy of the model against a larger vocabulary. When the user selects a word, another scene is loaded containing the objects that allow interaction between the Leap Motion Sensor and Unity. These objects are created whenever the sensor recognizes hands, so the scripts use their presence to determine when to begin collecting hand data. As a snapshot of the current hand position is taken, each attribute of the two hand objects is assigned to a particular variable; all data points are then concatenated in the correct order and written as a new line in the HandData csv file. If the hand data about to be written is identical to the previously written data, the sensor is not currently reading that hand. In that case it is assumed the user intends the sign to use only one hand, as many signs in ASL do, and that hand's data is recorded as empty values to show the hand is out of scope. In addition, during data collection two red bars at the bottom of the screen denote whether data is currently being collected for each hand: while data is being collected, the bars turn green (see Fig. 2, Fig. 3).
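The duplicate-check rule above can be sketched in Python (the actual recording logic runs in Unity's C# scripts; the function name, argument layout, and per-hand value lists here are illustrative assumptions, only the rule itself comes from the text):

```python
import csv

def record_snapshot(writer, left, right, prev_left, prev_right):
    """Append one snapshot of both hands to the HandData csv.

    A hand whose values are identical to the previous snapshot is assumed
    to be out of the sensor's view, so its slots are written as empty
    values, marking a one-handed sign.
    """
    row_left = [""] * len(left) if left == prev_left else list(left)
    row_right = [""] * len(right) if right == prev_right else list(right)
    writer.writerow(row_left + row_right)
    # Return the raw readings so the caller can pass them in next time.
    return left, right
```

Writing empty cells rather than skipping the row keeps every line the same width, which the later fillna(0) step relies on.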
Once the user has collected data for all signs currently in the vocabulary, they should select 'Create Model'. This option simply calls a function that runs the Python script 'save.py', in which the model is created and saved. At this point the user can begin using the program as intended by selecting 'Guessing'. This loads the same scene as selecting a word within 'Training', but in guessing mode, and a call to start the 'load.py' script is made. Once this script loads, the Unity script and the Python script begin communicating through a local server plugin (see Fig. 4). Unity takes a snapshot of a hand position, concatenates the collected data into a string, and sends that string to Python, where it is decoded, normalized, and run through the previously created machine learning model. The guess generated by the Python script is sent back as the reply to Unity, which adds it to the next spot in an array that is continuously looped over. Whenever a new word becomes the majority within the array, that word is shown on screen and spoken aloud using text to speech. This allows for continuous sign language recognition, as the system automatically recognizes when a new word has been signed. Variables such as the size of the array can be altered to change the performance of this mode and could be exposed as a user setting for choosing how fast signing occurs: a smaller array would be more appropriate for fast signing, while a larger array would suit slower signing.
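The majority-vote smoothing described above can be sketched as follows; the class name and the default buffer size are illustrative assumptions, and only the mechanism (a fixed-size array of recent guesses, with a word announced when it becomes the new majority) comes from the text:

```python
from collections import Counter, deque

class GuessSmoother:
    """Fixed-size buffer of recent per-snapshot guesses from the model.

    A word is announced only when it holds a strict majority of the buffer
    and differs from the word currently shown; the buffer size plays the
    role of the signing-speed setting discussed above.
    """

    def __init__(self, size=10):
        self.buffer = deque(maxlen=size)
        self.current = None  # word currently displayed / spoken

    def push(self, guess):
        """Record one per-snapshot guess; return a word to announce, or None."""
        self.buffer.append(guess)
        word, count = Counter(self.buffer).most_common(1)[0]
        if count > self.buffer.maxlen // 2 and word != self.current:
            self.current = word
            return word
        return None
```

A smaller `size` makes the displayed word flip sooner (fast signing); a larger one demands more consistent guesses before switching (slow signing).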
2.2 Machine learning implementation using Python libraries
In its current form, the Python portion of the program serves two primary purposes: creating a machine learning model from a set of data, and identifying new data points based on that model. The current algorithm is an artificial neural network, which boasts high accuracy at the cost of slow model creation; since in this use case the model only needs to be created once, after the user has recorded all of their input data, it is a good fit for the job. Supervised machine learning is the natural choice here, since there is a specific set of output words with which to identify the data (see Table 1).
Both scripts use a similar set of Python libraries. The model creation script uses Pandas and NumPy for table creation and alteration; Scikit-Learn for train-test splitting, label encoding, and standard scaling; and Keras to create the artificial neural network and transform encoded labels into categorical data [
Within the Python script dedicated to creating the model, several steps are taken to increase the accuracy of the final model. Artificial neural networks require the data to be normalized and free of empty values to function properly, and several functions are used to ensure this. To remove any empty spots we simply use 'data.fillna(0)', a built-in Pandas function [
]. Next, the words "True" and "False" must be converted into ones and zeros to allow numerical normalization; for this we create a dictionary that can easily be applied to each column requiring the conversion. We then use StandardScaler to normalize all the data and create a normalization model. After this we create the artificial neural network, with an input layer of 37 nodes, an output layer of 13 nodes, and currently no hidden layers, although the effect of adding hidden layers on accuracy could be examined in future work. Once the model is created, we save the StandardScaler model, the label encoder (so we know which numerical output corresponds to which word), and the artificial neural network itself, using the pickle library for the first two and the built-in Keras save function for the third.
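A minimal sketch of these preprocessing steps follows, using a tiny stand-in table (three feature columns instead of 37) since the real HandData csv is user-generated; the column names are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Stand-in for the HandData csv: numeric hand attributes, a True/False
# attribute, and the word being signed.
frame = pd.DataFrame({
    "palm_x": [0.1, 0.4, np.nan, 0.2],
    "palm_y": [1.0, 1.2, 0.9, np.nan],
    "extended": ["True", "False", "True", "False"],
    "label": ["hello", "hello", "thanks", "thanks"],
})

frame = frame.fillna(0)  # empty slots (out-of-scope hand) become zeros
frame["extended"] = frame["extended"].map({"True": 1, "False": 0})

scaler = StandardScaler()
X = scaler.fit_transform(frame.drop(columns="label"))

encoder = LabelEncoder()
y = encoder.fit_transform(frame["label"])  # words -> integer classes

# At this point the Keras network (37 inputs, 13 softmax outputs, no hidden
# layers) would be trained on X and y, then saved alongside the pickled
# scaler and encoder.
```

Persisting the fitted scaler and encoder matters because the guessing script must apply exactly the same normalization and label mapping at inference time.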
Within the Python script dedicated to conversing with Unity and identifying new data, most of the same libraries are used. Pandas, NumPy, and Keras together allow conversion of the incoming string into a usable table of data [
]. Once the connection has been established, the artificial neural network is loaded and prepared to run over each piece of incoming data. Each incoming line has the same true/false dictionary and StandardScaler model applied. Once a guess has been generated, it is simply sent back to Unity as a string to be processed there.
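The per-message path can be sketched as follows; the comma delimiter, the stub predictor standing in for the loaded Keras model, and all names are assumptions for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-ins for the artifacts loaded from disk at startup: a fitted
# normalization model and the decoded label vocabulary.
scaler = StandardScaler().fit(np.array([[0.0, 0.0], [1.0, 2.0]]))
vocabulary = ["hello", "thanks"]

def predict_stub(row):
    # Placeholder for model.predict + argmax: picks the larger feature.
    return int(row[0, 0] < row[0, 1])

def handle_message(message):
    """Decode one snapshot string from Unity and reply with a word."""
    values = [{"True": 1.0, "False": 0.0}.get(v, v) for v in message.split(",")]
    row = np.array(values, dtype=float).reshape(1, -1)
    row = scaler.transform(row)  # same normalization model as training
    return vocabulary[predict_stub(row)]
```

The key point the sketch illustrates is that the training-time scaler and true/false dictionary are reapplied verbatim to every incoming snapshot before prediction.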
3. Impact
LeapASL has the potential to be impactful in two important ways. The first is its intended use as personalized software for ASL recognition and translation. The Leap Motion Sensor gives excellent performance in many environments and situations, requires minimal setup, and provides highly accurate hand data. Experimental results using data collected from the Leap Motion Sensor have shown a high level of accuracy at identifying different words across multiple machine learning techniques [
]. The use of powerful machine learning allows the speech or hearing-impaired user to create their own personal model for how they sign each word, allowing for high levels of personal accuracy and the ability for the program to adapt to different regions' unique ways of signing the same words.
The second major way LeapASL can make an impact is as a platform for future work. Because the program is split between Python and Unity, someone familiar with only one of these environments could make drastic changes and improvements without needing to make many, if any, changes to the other part of the program. The software is simple enough to serve as a jumping-off point for testing different machine learning methods or different data points to collect, making it an excellent starting point for both future improvements and future research.
4. Conclusion and future work
There are many aspects of the software that could be improved and many directions it could be taken. One possibility for future work is sentence-level recognition, where instead of guessing word by word the system would also predictively guide a sentence or predict sequences of words. Another feature that could be added is support for different user profiles, not only to allow multiple people to use the same setup without their models interfering, but also to make model testing easier so the real efficacy of the current model can be assessed. A further possibility is the ability to test and record the accuracy of a given word: the user would choose a specific word and repeatedly sign it to test how accurate the model is, with the accuracy displayed to the user and a cutoff point below which the program asks the user to record more data for that sign. Another candidate feature is the ability to change the speed of word prediction; as the program stands, it predicts at a constant speed, with a certain number of guesses required before a word is displayed, but in the future those who sign faster could increase the speed of the program, and it could of course be slowed down as well. There are many possible features and directions one could take with this software, and its potential as a platform for future work is high.
In conclusion, LeapASL is a Unity program that allows for seamless communication with both the Leap Motion Sensor for hand tracking and Python for powerful machine learning techniques. An artificial neural network, combined with each user creating their own personal model, produces a system with very high personal accuracy. More information about methodology, sensor choice, results, models, and experimental results can be found in the prior publication of this research [