AI already outperforms humans at many tasks, such as diagnosing disease, playing board games, and detecting fraud. However, consider a question that requires some real thinking, like this one:
What size is the cylinder that is left of the brown metal
thing that is left of the big sphere?
A child could answer this question easily, yet traditional deep learning models cannot give you a reliable answer.
Why deep learning isn’t enough
Deep learning models deliver fantastic results at mapping inputs to outputs, which is enough for problems like classification and perception. However, we also want AI to make decisions using the kind of human reasoning we call "common sense".
Deep reasoning
Deep reasoning is the field concerned with teaching machines to understand complex relationships between different ideas. For example: "All animals eat." "Dogs are animals."
A human can quickly work out the implicit relationship: all dogs eat. For a machine, however, understanding how different things relate to one another is not so easy. So how do we teach AI the ability to reason?
Implementing Deep Reasoning
To handle this type of question, DeepMind researchers came up with a solution in three steps:
- Process the question with a Long Short-Term Memory (LSTM) network.
- Process the image with a Convolutional Neural Network (CNN).
- Understand how the different objects relate to each other with a Relation Network (RN).
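The three steps above combine into a single expression. In the DeepMind paper, the Relation Network over a set of objects \(O = \{o_1, \dots, o_n\}\) (from the CNN) conditioned on a question embedding \(q\) (from the LSTM) is written roughly as:

```latex
RN(O, q) = f_{\phi}\Big( \sum_{i,j} g_{\theta}(o_i,\, o_j,\, q) \Big)
```

Here \(g_{\theta}\) scores how each pair of objects relates, given the question, and \(f_{\phi}\) turns the summed pair scores into the final answer; both are small MLPs.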
Language Processing
LSTM networks are pretty good at understanding sequences because they can remember earlier parts of a sequence. This makes them work well with questions and language, where the beginning of a sentence often has a great influence on the meaning of its end.
Besides that, the LSTM produces a question embedding that is easier for the RN to work with than raw text.
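A minimal sketch of this step in PyTorch (the vocabulary and layer sizes here are illustrative, not the paper's exact hyperparameters): the question is tokenised into word indices, embedded, and run through an LSTM whose final hidden state serves as the question embedding.

```python
import torch
import torch.nn as nn

# Illustrative sizes -- the paper's hyperparameters may differ.
vocab_size, embed_dim, hidden_dim = 100, 32, 128

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

# A tokenised question, e.g. "what size is the cylinder ..." as word indices.
question = torch.randint(0, vocab_size, (1, 8))   # (batch, seq_len)

tokens = embedding(question)                      # (1, 8, 32)
_, (h_n, _) = lstm(tokens)
q_embedding = h_n[-1]                             # (1, 128) final hidden state

print(q_embedding.shape)  # torch.Size([1, 128])
```

The final hidden state summarises the whole question in one fixed-size vector, which is exactly the compact form the RN needs.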
Image Processing
Since CNNs are good at identifying features in an image, a CNN is used to extract the objects from the image in the form of feature-map vectors. Just like the embeddings the LSTM produces, feature-map vectors are a more efficient representation of the objects than raw pixels, making them easier for the RN to work with.
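A sketch of this step (again with illustrative sizes): a small convolutional stack turns the image into a feature map, and each spatial cell of that map is treated as one "object" vector for the RN.

```python
import torch
import torch.nn as nn

# A small conv stack; the paper uses four conv layers (sizes here are illustrative).
cnn = nn.Sequential(
    nn.Conv2d(3, 24, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(24, 24, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)

image = torch.randn(1, 3, 32, 32)         # (batch, channels, height, width)
feature_map = cnn(image)                  # (1, 24, 8, 8)

# Treat each spatial cell of the feature map as one "object" vector.
b, c, h, w = feature_map.shape
objects = feature_map.view(b, c, h * w).permute(0, 2, 1)  # (1, 64, 24)

print(objects.shape)  # torch.Size([1, 64, 24])
```

Note that the CNN is never told what an "object" is; the RN simply treats every feature-map cell as a candidate object and learns which relations matter.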
Relational Understanding
Once the model has processed the question and the image, it can start to understand the relationships between the objects in the image. The RN learns to use these relationships to answer the question: each pair of objects, together with the question embedding, is fed into a multilayer perceptron (MLP), a kind of feedforward neural network; those outputs are summed and fed through a final MLP, which produces the answer.
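This pairwise step can be sketched as follows, continuing with the illustrative sizes from above (64 objects of dimension 24, a 128-dimensional question embedding, and a hypothetical 10-way answer vocabulary). `g` plays the role of the per-pair MLP and `f` the final MLP.

```python
import torch
import torch.nn as nn

n_objects, obj_dim, q_dim, n_answers = 64, 24, 128, 10

# g scores one (object_i, object_j, question) triple; f maps the sum to an answer.
g = nn.Sequential(nn.Linear(2 * obj_dim + q_dim, 256), nn.ReLU())
f = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, n_answers))

objects = torch.randn(1, n_objects, obj_dim)   # from the CNN
q = torch.randn(1, q_dim)                      # from the LSTM

# Build all object pairs and append the question embedding to each pair.
o_i = objects.unsqueeze(2).expand(-1, n_objects, n_objects, obj_dim)
o_j = objects.unsqueeze(1).expand(-1, n_objects, n_objects, obj_dim)
q_rep = q.view(1, 1, 1, q_dim).expand(-1, n_objects, n_objects, q_dim)
pairs = torch.cat([o_i, o_j, q_rep], dim=-1)   # (1, 64, 64, 176)

relations = g(pairs).sum(dim=(1, 2))           # sum over all object pairs
answer_logits = f(relations)                   # (1, 10)

print(answer_logits.shape)  # torch.Size([1, 10])
```

Summing over pairs before the final MLP is what makes the module order-invariant: the answer cannot depend on which object happened to be listed first.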
Deep reasoning allows AI to understand complicated relationships between different "things". A Relation Network module can easily be combined with a deep learning model to add reasoning capabilities. Deep reasoning is the next step for AI: it narrows the gap between AI and the human brain.
