Why (c, h) tuple? #2

Open
soloice opened this issue Mar 28, 2018 · 1 comment


soloice commented Mar 28, 2018

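For an LSTM, the state should be a (c, h) tuple, because the next step's computation needs both c and h from the previous step.
For SRU, however, the next step only needs the previous step's c, so the call function can simply return new_h, new_c; there is no need to return new_h, (new_c, new_h).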
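
To make the dependency concrete, here is a minimal framework-free sketch of a single SRU step without the highway connection, written for one unit. It assumes the candidate `z` and the gate pre-activations `f_pre` and `r_pre` have already been computed from the input alone (biases included); the function name `sru_step` and the sample numbers are hypothetical and only for illustration. Note that the previous `h` never appears as an argument.

```python
import math


def sigmoid(v):
    """Logistic function for a single float."""
    return 1.0 / (1.0 + math.exp(-v))


def sru_step(z, f_pre, r_pre, c_prev):
    """One SRU step (no highway) for a single unit.

    z, f_pre and r_pre are assumed to come from the input x only,
    so the recurrence needs nothing from the previous step except c_prev.
    """
    f = sigmoid(f_pre)                # forget gate
    r = sigmoid(r_pre)                # reset gate
    c = f * c_prev + (1.0 - f) * z    # new cell state
    h = r * math.tanh(c)              # output
    return h, c


# Unroll over a short, made-up sequence; the carried state is just c.
steps = [(0.5, 0.1, -0.2), (-1.0, 0.3, 0.4), (0.25, -0.5, 0.0)]
c = 0.0
outputs = []
for z, f_pre, r_pre in steps:
    h, c = sru_step(z, f_pre, r_pre, c)
    outputs.append(h)
```

The loop carries only `c` from one iteration to the next, which is the point of the question: `h` is an output, not part of the recurrent state.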


soloice commented Mar 28, 2018

In other words, this:

def call_without_highway(self, x, state, scope=None):
    with tf.variable_scope(scope or type(self).__name__):
        c, _ = state  # only c is used; the h half of the tuple is discarded
        x_size = x.get_shape().as_list()[1]

        W_u = tf.get_variable('W_u', [x_size, 3 * self.output_size])
        b_f = tf.get_variable('b_f', [self._num_units])
        b_r = tf.get_variable('b_r', [self._num_units])

        # One matmul on x alone yields the candidate z and the gate pre-activations f, r.
        xh = tf.matmul(x, W_u)
        z, f, r = tf.split(xh, 3, 1)

        f = tf.sigmoid(f + b_f)  # forget gate
        r = tf.sigmoid(r + b_r)  # reset gate

        new_c = f * c + (1 - f) * z
        new_h = r * tf.tanh(new_c)

        return new_h, (new_c, new_h)

could be changed to:

def call_without_highway(self, x, state, scope=None):
    with tf.variable_scope(scope or type(self).__name__):
        c = state  # first modification: the state is just c
        x_size = x.get_shape().as_list()[1]

        W_u = tf.get_variable('W_u', [x_size, 3 * self.output_size])
        b_f = tf.get_variable('b_f', [self._num_units])
        b_r = tf.get_variable('b_r', [self._num_units])

        # One matmul on x alone yields the candidate z and the gate pre-activations f, r.
        xh = tf.matmul(x, W_u)
        z, f, r = tf.split(xh, 3, 1)

        f = tf.sigmoid(f + b_f)  # forget gate
        r = tf.sigmoid(r + b_r)  # reset gate

        new_c = f * c + (1 - f) * z
        new_h = r * tf.tanh(new_c)

        return new_h, new_c  # second modification: return only c as the state
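
One consequence worth spelling out: in a TensorFlow 1.x `RNNCell`, the `state_size` property has to describe the same structure that `call` returns as its state, so switching to a single-tensor state would presumably also mean `state_size` reports one size rather than a pair. The sketch below illustrates that contract in plain Python, without TensorFlow. `TinySRUCell`, its `zero_state` helper, and the way gate pre-activations are passed in are all hypothetical stand-ins, not the repository's code.

```python
import math


def _sigmoid(v):
    """Logistic function for a single float."""
    return 1.0 / (1.0 + math.exp(-v))


class TinySRUCell:
    """Hypothetical stand-in for the cell, using plain lists instead of tensors."""

    def __init__(self, num_units):
        self._num_units = num_units

    @property
    def state_size(self):
        # A single size, matching the single-value state returned by call
        # (a (c, h) state would need a pair of sizes here instead).
        return self._num_units

    @property
    def output_size(self):
        return self._num_units

    def zero_state(self):
        # Initial cell state: one zero per unit.
        return [0.0] * self.state_size

    def call(self, z, f_pre, r_pre, state):
        # z, f_pre, r_pre are assumed to be precomputed from the input only.
        c = state
        new_c, new_h = [], []
        for z_i, f_i, r_i, c_i in zip(z, f_pre, r_pre, c):
            f = _sigmoid(f_i)
            c_new = f * c_i + (1.0 - f) * z_i
            new_c.append(c_new)
            new_h.append(_sigmoid(r_i) * math.tanh(c_new))
        return new_h, new_c


cell = TinySRUCell(num_units=2)
state = cell.zero_state()
h, state = cell.call([0.5, -0.5], [0.0, 0.0], [0.0, 0.0], state)
```

After the call, `state` has exactly `cell.state_size` entries, which is the consistency that any unrolling loop would rely on.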
