Our take on human-machine interfaces: Bio-potentials, Deep Learning and implementation on the Apple Watch
Human-machine interfaces evolve at a slower pace than the other technologies we are used to. The mouse and keyboard, introduced in the Mother of All Demos, were one such landmark. Another came when Steve Jobs revealed the iPhone. At Wearable Devices we are taking a shot at the next evolutionary step in interfaces. I know this sounds a bit grandiose, but I tell it as it is 😀
The touchscreen is not a good interface for a smartwatch. We would like to control the smartwatch single-handed and effortlessly, using only our fingers.
To this end, we have developed a complete system centered around bio-potential sensing. Whenever we move our fingers, μVolt-level signals are emitted from our nerves, pass through the layers of our skin and are converted into electrical current via a procedure called ionic exchange. Our wearable sensing system is built around this phenomenon:
Mudra Band
In this blog, I would like to highlight our SW development approach and procedures. Our goal is to optimize a deep meta-learner that generalizes well on streams of real-time sensor data, and if it does not, can be tuned to do so. So first, we acquire the data. This procedure is done in-house: we bring different users, with different “profiles” (age, gender, wrist circumference, etc.), to our office. We ask these users to perform a wide range of gestures, in different positions and postures and at varying intensities.
Second, we annotate the data itself. This step is similar to the manual annotation of bounding boxes for a detector such as YOLO¹. For bio-potentials we need a “bounding line”, since we only have a single x-axis (time). The manual annotation looks like so:
Thumb Gesture — Raw Data (with orange annotation)
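To make the annotation format concrete, here is a minimal sketch of what a “bounding line” record could look like; the field names are hypothetical and not our actual label schema.

from dataclasses import dataclass

@dataclass
class GestureAnnotation:
    """A 'bounding line' for one gesture: a bounding box with a single (time) axis."""
    start_sample: int   # index where the gesture begins in the window
    end_sample: int     # index where the gesture ends
    label: str          # e.g. 'thumb', 'index', 'twist'

# Example record for a thumb gesture (values are made up).
example = GestureAnnotation(start_sample=1200, end_sample=1750, label='thumb')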
At the end of the day (this takes a few weeks, actually 😄), we have a train-validation set with manually annotated values. We also measure the amount of pressure between the fingers, relative to an external sensor. We then proceed to pre-process the data. Why? Because we don’t want to rely only on deep learning (on first-order gradient descent, to be precise). Another way of looking at it: we want to present the classifier with a sub-group of the data, instead of having it learn basis functions from scratch. This philosophy is similar to Stéphane Mallat’s Scattering Net² approach.
Thumb Gesture — Wavelet pre-processing
This approach enables us to introduce some prior knowledge in the form of basis functions and to increase the SNR. In practice this means cancelling artifacts, such as motion/friction artifacts, which usually contain a mixture of low and high frequencies and are notoriously difficult to remove with standard (Fourier-based) filters.
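As a rough illustration of this kind of pre-processing, here is a minimal sketch of a multi-level wavelet decomposition using PyWavelets; the wavelet family, number of levels and sampling rate are placeholder choices, not the ones used in our pipeline.

import numpy as np
import pywt

def wavelet_sub_bands(window, wavelet='db4', level=4):
    """Decompose a 1-D sensor window into wavelet sub-bands (coarse to fine)."""
    coeffs = pywt.wavedec(window, wavelet, level=level)
    # Suppressing the coarsest approximation band is one simple way to
    # attenuate slow drift before the bands are handed to the classifier.
    coeffs[0] = np.zeros_like(coeffs[0])
    return coeffs

# Example: a one-second window at a placeholder 1 kHz sampling rate.
window = np.random.randn(1000).astype(np.float32)
print([band.shape for band in wavelet_sub_bands(window)])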
Twist Gesture — Note the (red) motion artifact at the end
Note the difference in the wavelets between the gestures and the motion artifact following the “twist” gesture (red sensor / red wavelet).
The next step is where the fun begins… We design our neural net. I cannot go into depth on the exact architecture, but note that we have multiple inputs (from multiple sensors) and multiple outputs (classification for the left and right hand, pressure regression, etc.).
Multi Task Learning Architecture
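As an illustration of the multi-task idea (not our actual training code), a weighted sum of per-head losses could look like the sketch below; the head names and weights are placeholders.

import tensorflow as tf

# Hypothetical task heads: gesture classification per hand + pressure regression.
TASK_LOSSES = {
    'gesture_left': tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    'gesture_right': tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    'pressure': tf.keras.losses.MeanSquaredError(),
}
TASK_WEIGHTS = {'gesture_left': 1.0, 'gesture_right': 1.0, 'pressure': 0.5}

def multi_task_loss(y_true, y_pred):
    """Weighted sum of the per-task losses; y_true/y_pred are dicts keyed by head name."""
    return tf.add_n([TASK_WEIGHTS[name] * loss_fn(y_true[name], y_pred[name])
                     for name, loss_fn in TASK_LOSSES.items()])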
This neural net is then trained and tested. Tests are performed on a set from a “different distribution”: we adopt a worst-case approach and test our neural net on subjects who experience Mudra for the first time. Such users are one-time subjects; they perform the gestures following an on-boarding procedure.
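A minimal sketch of what such a subject-wise split could look like, assuming each recording carries a subject ID; this is illustrative, not our actual evaluation harness.

import numpy as np

def split_by_subject(subject_ids, held_out_subjects):
    """Return boolean train/test masks that keep whole subjects out of training."""
    subject_ids = np.asarray(subject_ids)
    test_mask = np.isin(subject_ids, held_out_subjects)
    return ~test_mask, test_mask

# Recordings from subjects 0-3; subject 3 plays the 'first-time user'.
train_mask, test_mask = split_by_subject([0, 0, 1, 2, 2, 3], held_out_subjects=[3])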
Classifier feature space evolution
The above figure illustrates the challenge in optimizing a neural net on the data presented above (each color represents a single gesture observation; gray denotes noise). The net undergoes abrupt changes at the beginning of training. This effect can be mitigated with different optimizers, such as AdamW³, and with different architectures. However, as the Edge of Chaos theory⁴ suggests, training neural networks is difficult enough that these fixes do not necessarily yield a superior model.
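For reference, switching to AdamW in TensorFlow is a one-liner; recent releases ship it as tf.keras.optimizers.AdamW (older ones expose it via TensorFlow Addons), and the hyper-parameters below are illustrative, not the ones we ship.

import tensorflow as tf

# AdamW decouples weight decay from the adaptive gradient update
# (Loshchilov & Hutter³); the values below are placeholders.
optimizer = tf.keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-4)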
The final phase of building a model, arguably the most important one, is testing. We test our final model against multiple criteria, such as performance and robustness to noise. We test performance on entire streams:
Index Gesture stream
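To give a feel for what stream-level testing involves, here is a minimal sliding-window sketch; the window and hop sizes are placeholders and pre-processing is omitted.

import numpy as np

def stream_predictions(stream, model, window_len=1000, hop=100):
    """Slide a fixed window over a continuous stream and classify each window."""
    predictions = []
    for start in range(0, len(stream) - window_len + 1, hop):
        window = stream[start:start + window_len]
        # Pre-processing (e.g. the wavelet step above) would be applied here.
        predictions.append(model(window[np.newaxis, :]))
    return predictions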
We also visualize the internals of the net (still proprietary) and visualize the sensitivity of the net:
Sensitivity visualization
Note the sensitivity around the gesture. The model is sensitive to both low and high frequencies of the wavelet. Near the beginning and termination of the gesture, the model is sensitive to high frequencies only. Also take heed of the lack of sensitivity to the friction artifact near the end of the window.
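The visualization itself is proprietary, but a generic gradient-based sensitivity map can be sketched with tf.GradientTape; the input and output names below follow the pseudocode further down and do not reflect our internal tooling.

import tensorflow as tf

def input_sensitivity(model, snc, imu, euler, class_index=0):
    """|d logit / d input| for one class of the first task head."""
    snc = tf.convert_to_tensor(snc)
    with tf.GradientTape() as tape:
        tape.watch(snc)
        outputs = model(snc, imu, euler)
        score = outputs[0][:, class_index]
    # Large absolute gradients mark the samples the model reacts to most.
    return tf.abs(tape.gradient(score, snc))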
Once the model meets all the above criteria, we move forward to production. We convert our model to TFLite using TensorFlow 2.x, which saves us the trouble of coding all the mathematical operations ourselves; TFLite excels at this. Inference mechanisms are coded using TensorFlow's model subclassing API. The idea is summarized in the following pseudocode:
import tensorflow as tf
from tensorflow.keras import Model, layers

class GestureMetaLearner(Model):
    def __init__(self, batch_size, spectral_lvls, num_sensors,
                 window_len_snc, window_len_imu, is_training=True):
        super(GestureMetaLearner, self).__init__()
        # NN high-level params (placeholders)
        self.params1 = 1
        self.params2 = 2
        self.params3 = 1
        # Layer definitions
        self.conv1 = layers.Conv2D(self.params1,
                                   activation=tf.nn.elu,
                                   kernel_size=(self.params2, self.params3),
                                   name='conv1')

    # Inference. The fixed input signature gives the exported SavedModel
    # named inputs; N1, N2 and N3 stand for the per-sensor window lengths.
    @tf.function(input_signature=[tf.TensorSpec(shape=[None, N1], dtype=tf.float32, name='snc'),
                                  tf.TensorSpec(shape=[None, N2], dtype=tf.float32, name='imu'),
                                  tf.TensorSpec(shape=[None, N3], dtype=tf.float32, name='euler')],
                 autograph=False)
    def call(self, snc, imu, euler):
        with tf.name_scope('input'):
            snc = self.conv1(snc)
        # ...
        return task1, task2, task3  # ... one tensor per task head
The above model can be saved to a tflite file like so:
import os
import tensorflow as tf

# Save model to pb file (SavedModel format)
tf.saved_model.save(meta_learner, model_dir, signatures=meta_learner.call)
print('pb file saved')

# Optimize the SavedModel with tflite; SELECT_TF_OPS falls back to full
# TensorFlow ops for anything the tflite builtins do not cover.
converter = tf.lite.TFLiteConverter.from_saved_model(model_dir)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
with open(os.path.join(model_dir, 'gesture_meta_learner.tflite'), 'wb') as f:
    f.write(tflite_model)
print('tflite file saved')
Once our model is saved, the best method (currently) to have it running on the Apple Watch is to convert the model to Core ML; see here for how we do this. I will update the blog when the TF Core ML delegate is production-ready. On a side note, all the rest of the pipeline (pre-processing, calibration, etc.) is written in modern C++; you can check out another blog here for more details.
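As a rough sketch of that conversion (the linked post covers our actual flow), coremltools' unified converter can ingest the SavedModel directory produced above:

import coremltools as ct

# Convert the TensorFlow SavedModel to Core ML; model_dir is the
# directory written by tf.saved_model.save above.
mlmodel = ct.convert(model_dir)
mlmodel.save('GestureMetaLearner.mlmodel')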
Mudra Band for Apple Watch - Tech Demo
[1]: Joseph Redmon et al. You Only Look Once: Unified, Real-Time Object Detection
[2]: Joan Bruna and Stéphane Mallat. Invariant Scattering Convolution Networks
[3]: Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization
[4]: Greg Yang and Samuel S. Schoenholz. Mean Field Residual Networks: On the Edge of Chaos
The original blog post was published on Medium.