Convolutional Neural Net used for orientation estimation

Jaroslaw Goslinski
Apr 9, 2020 · 5 min read


There is an infinite number of possible ways in which machine learning can be used. For a data scientist, it might be interesting to use an ML model for estimating an object’s orientation based on data gathered from an IMU (Inertial Measurement Unit). The basic tool used by most researchers and practitioners is a Kalman Filter, e.g. the Extended Kalman Filter¹. Although there is no need to improve the Kalman filter itself, as it operates optimally by minimizing the estimate’s variance (ok, there is, but we are not going to do that in this short post), some complicated problems can be solved with deep learning techniques.

Let’s assume that we try to estimate an orientation using biased sensors. In a classical approach, to do this correctly, we would first need to calibrate these sensors and then use an estimator. With ML methods, we can instead rely on a trained model that handles sensor-data imperfections such as gains, offsets, etc. To estimate orientation in 3D we need at least two sensors, i.e. an accelerometer and a magnetometer. But to get results that are smoother and less prone to linear motion and magnetic field distortions, we also need a gyroscope. The first two sensors can be used statically, as their readings are enough to estimate the orientation (this optimization problem is known as Wahba’s problem²). The gyroscope provides angular velocity expressed in the local coordinate system, which is used along with timestamps. In a classic solution, we would integrate the velocity to get the residual between the previous and the current angle or quaternion.
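For reference, that classical gyroscope-integration step can be sketched as follows. This is only a minimal numpy illustration, not the author’s code; the [w, x, y, z] quaternion convention and the function names are my assumptions:

import numpy as np

def quat_mult(q, r):
    # Hamilton product of two quaternions given as [w, x, y, z]
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def integrate_gyro(q, omega, dt):
    # q_dot = 0.5 * q ⊗ [0, omega]; omega is the angular velocity in the body frame (rad/s)
    q_dot = 0.5 * quat_mult(q, np.concatenate(([0.0], omega)))
    q_new = q + dt * q_dot
    return q_new / np.linalg.norm(q_new)   # keep ||q|| = 1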

Ok, that’s it for the basics of estimators; now it’s time for the main part. To show how to use ML in state estimation (the state here being a quaternion) I did the following:

  1. I put an IMU on a 3-axis robotic arm,
  2. I gathered data (velocity and angle) from the robotic arm axes,
  3. I used forward kinematics to get the orientation of the robotic arm’s end effector (expressed in the coordinate reference system used by the IMU sensor),
  4. Based on the orientation, I generated “virtual sensor” data, i.e. acceleration, angular velocity, and magnetic field (a minimal sketch of this step follows the list). This data can be directly compared to the real data from the IMU, as in the picture below, where the real data was filtered with a low-pass filter.
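As an illustration of step 4, the “virtual sensor” readings can be obtained by rotating reference vectors from the world frame into the sensor frame using the forward-kinematics quaternion. This is only a sketch under assumed conventions; the gravity and magnetic-field reference vectors, the [w, x, y, z] order, and the function names are not from the original post:

import numpy as np

def quat_to_rotmat(q):
    # rotation matrix of a unit quaternion q = [w, x, y, z] (body -> world)
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def virtual_imu(q, gyro_body):
    # generate noise-free "virtual sensor" readings for a given orientation q
    g_world = np.array([0.0, 0.0, 9.81])    # gravity in the world frame (assumed)
    m_world = np.array([0.35, 0.0, 0.45])   # local magnetic field in the world frame (assumed)
    R = quat_to_rotmat(q)
    acc_body = R.T @ g_world                # world -> body: transpose of the rotation
    mag_body = R.T @ m_world
    return np.concatenate([acc_body, gyro_body, mag_body])   # 9 features per timestep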

Finally, I generated training data in the following format: input: [acceleration, angular velocity, magnetic field], output: [quaternion]. Only the generated data was used to train the model (TensorFlow + Keras).

I use a CNN architecture with a few layers and a custom loss function. You may wonder why not use a predefined loss function; the answer is simple: the output of the regression is a quaternion (q), and among its features are that q=-q and ||q||=1 (note that q=[q_w q_x q_y q_z]). Here is the model:

import tensorflow as tf
from tensorflow import keras

timesteps = 64    # window length (see the training settings below)
n_features = 9    # accelerometer, gyroscope and magnetometer axes
m_features = 4    # quaternion components [q_w, q_x, q_y, q_z]

def build_model():
    inputs1 = keras.Input(shape=(timesteps, n_features), name='sens')
    x1 = keras.layers.Conv1D(filters=128, kernel_size=3, activation='relu')(inputs1)
    x1 = keras.layers.Dropout(0.05)(x1)
    x1 = keras.layers.Conv1D(filters=128, kernel_size=3, activation='relu')(x1)
    x1 = keras.layers.Dropout(0.05)(x1)
    x1 = keras.layers.MaxPooling1D(pool_size=2)(x1)
    x1 = keras.layers.Conv1D(filters=128, kernel_size=3, activation='relu')(x1)
    x1 = keras.layers.MaxPooling1D(pool_size=2)(x1)
    x1 = keras.layers.Conv1D(filters=128, kernel_size=3, activation='relu')(x1)
    x1 = keras.layers.Flatten()(x1)
    x1 = keras.layers.Dense(20)(x1)  # optionally: activation='relu'
    x1 = keras.layers.Dense(m_features)(x1)

    # project the raw output onto the unit sphere so that ||q|| = 1
    outputs = tf.math.l2_normalize(x1, axis=-1)

    model_out = keras.Model(inputs=inputs1, outputs=outputs, name='quat_model')

    return model_out

And the custom loss function:

def loss(y_true, y_pred):

    eps = 1e-8
    # normalize both quaternions so that ||q|| = 1
    y_pred_norm = tf.math.l2_normalize(y_pred, axis=-1, epsilon=eps)
    y_true_norm = tf.math.l2_normalize(y_true, axis=-1, epsilon=eps)

    # conjugate of the ground-truth quaternion: [w, -x, -y, -z]
    w, xyz = tf.split(y_true_norm, (1, 3), axis=-1)
    y_true_norm_conj = tf.concat((w, -xyz), axis=-1)

    # quaternion multiplication q_true* ⊗ q_pred
    w1, x1, y1, z1 = tf.unstack(y_true_norm_conj, num=4, axis=-1)
    w2, x2, y2, z2 = tf.unstack(y_pred_norm, num=4, axis=-1)
    x = x1 * w2 + y1 * z2 - z1 * y2 + w1 * x2
    y = -x1 * z2 + y1 * w2 + z1 * x2 + w1 * y2
    z = x1 * y2 - y1 * x2 + z1 * w2 + w1 * z2
    w = -x1 * x2 - y1 * y2 - z1 * z2 + w1 * w2

    res = tf.stack((w, x, y, z), axis=-1)
    res_norm = tf.math.l2_normalize(res, axis=-1, epsilon=eps)

    # the product should be the identity quaternion up to sign: [±1, 0, 0, 0]
    wr, xr, yr, zr = tf.unstack(res_norm, num=4, axis=-1)

    dw = tf.abs(1 - tf.abs(wr))
    dx = tf.abs(xr)
    dy = tf.abs(yr)
    dz = tf.abs(zr)

    res = tf.stack((dw, dx, dy, dz), axis=-1)
    loss_inner = tf.reduce_mean(res, -1)

    return loss_inner

The loss function realizes the following operations:

q_res = q_t* ⊗ q_p,
loss = mean(|1 − |w_res||, |x_res|, |y_res|, |z_res|),

where q_res = [w_res x_res y_res z_res]. Note that q_t stands for a generated quaternion and q_p stands for a predicted quaternion. In the above equations, two operations are used, i.e. the quaternion conjugate (*) and the quaternion multiplication (⊗).

Before training the model, we need to temporalize our sensory data. In simple words, we need to add a third dimension to our data, which is “timesteps”. The picture below depicts how to accomplish this operation.

A structure of the final tensor used during training. Note that it has three dimensions, i.e. samples, timesteps, and features

Looking at the picture, we can clearly see that from the original file, which consisted of timesteps and features (sensory data), we created a tensor whose first dimension holds samples from 1 to m. Each sample has n timesteps (second dimension) and each timestep has 9 features (third dimension). Consecutive samples overlap by n-1 timesteps (but this is not a rule; you can use a 50% overlap instead, or not overlap them at all). A minimal sketch of this windowing is shown below.
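Such temporalization can be done with a short numpy helper like the one below. Pairing each window with the quaternion of its last timestep is my assumption, and the function name is illustrative:

import numpy as np

def temporalize(data, labels, timesteps):
    # data: (T, n_features) sensor array, labels: (T, 4) quaternions
    # returns X: (m, timesteps, n_features), y: (m, 4), windows overlapping by timesteps-1
    X, y = [], []
    for i in range(len(data) - timesteps + 1):
        X.append(data[i:i + timesteps])
        y.append(labels[i + timesteps - 1])   # label = quaternion at the window's last timestep
    return np.array(X), np.array(y)

# e.g. sensors.shape == (60000, 9), quats.shape == (60000, 4)
# X, y = temporalize(sensors, quats, timesteps=64)   # -> X.shape == (59937, 64, 9)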

I trained the model on 60k timesteps of generated sensory data, with the following settings:

epochs = 20
batch = 128
timesteps = 64
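With these settings, the training and evaluation calls might look roughly like the snippet below. The optimizer choice and the variable names X_train, y_train, X_test, y_test are assumptions; the arrays come from the temporalization step above:

model = build_model()
model.compile(optimizer=keras.optimizers.Adam(), loss=loss)

model.fit(X_train, y_train, epochs=20, batch_size=128)

print(model.evaluate(X_test, y_test))   # -> test loss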

I got the following result:

test loss: [0.005214414036389666]

The result is pretty good, but to show that this approach makes sense, I replaced the accelerometer and gyroscope data with actual data from the real sensor, with a bias of 0.8 m/s² added to all axes of the accelerometer. The results are as follows:

test loss: [0.012366948514218958]
The estimation results of the ML model on real sensory data. Note that the first part is still correct, since q=-q is a natural feature of quaternions.
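For completeness, the bias-injection check could be reproduced with something like the snippet below. X_real, y_real and the acc/gyro/mag ordering of the last axis are assumptions; 0.8 m/s² is the offset quoted above:

X_real_biased = X_real.copy()
X_real_biased[..., 0:3] += 0.8   # add 0.8 m/s^2 to all accelerometer axes

print(model.evaluate(X_real_biased, y_real))   # -> test loss on biased real data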

Conclusion

In this post, I’ve shown how an ML model can be used for the orientation estimation problem. As I mentioned above, the solution is not better than a classical one with a Kalman Filter in terms of computational complexity. But it is a different approach that might be interesting for those who try to address multidimensional problems with sensory data or process data that is strongly biased. I hope you found this post interesting. Comments are welcome!

References:

[1]: J. L. Marins, X. Yun, E. R. Bachmann, R. B. McGhee, and M. J. Zyda, “An extended Kalman filter for quaternion-based orientation estimation using MARG sensors,” Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Maui, HI, USA, 2001, pp. 2003–2011, vol. 4.

[2]: “Quaternion Attitude Estimation Using Vector Observations,” https://www.researchgate.net/publication/253223833_Quaternion_Attitude_Estimation_Using_Vector_Observations
