
[Layer] add tanh-based approximate gelu activation function #2658

Open
baek2sm wants to merge 1 commit into base: main

Conversation

@baek2sm (Contributor) commented Jul 1, 2024

  • add a tanh-based approximate gelu (tanh gelu) for the vision transformer.
  • rename quick gelu to sigmoid gelu (it is a sigmoid-based approximate gelu); a standalone sketch of both formulas follows the self-evaluation checklist below.

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped
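For reference, here is a minimal standalone sketch of the two formulas this PR adds and renames, written as plain scalar C++. The function names tanh_gelu and sigmoid_gelu are illustrative only; the PR's actual implementations are the Tensor-based tanhGelu and sigmoidGelu in nntrainer/layers/acti_func.h.

#include <cmath>

// Tanh-based approximate GELU (the approximation from the GELU paper,
// commonly used in vision transformers):
//   0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
float tanh_gelu(float x) {
  const float kSqrt2OverPi = 0.7978845608f; // sqrt(2 / pi)
  return 0.5f * x *
         (1.0f + std::tanh(kSqrt2OverPi * (x + 0.044715f * x * x * x)));
}

// Sigmoid-based approximate GELU (previously named quick gelu in this code):
//   x * sigmoid(1.702 * x)
float sigmoid_gelu(float x) {
  return x / (1.0f + std::exp(-1.702f * x));
}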

@taos-ci (Collaborator) commented Jul 1, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2658. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before reviewers start their review. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci (Collaborator) left a comment

@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.

@skykongkong8 (Member) left a comment

Not an issue directly related to this PR, but it seems the activation functions in this implementation are not using SIMD implementations for half-precision at all (for example, swish and softmax).
As you may already know, purely mathematical implementations are not really desirable to be left here. I think we need to discuss the implementation structure of the activation function computation in the near future.

@DonghakPark (Member) left a comment

LGTM!

@EunjuYang (Contributor) left a comment

LGTM

@djeong20 (Contributor) left a comment

This PR is currently from a branch in the upstream repository, not from a forked repo!

@SeoHyungjun (Member) left a comment

LGTM!

ACT_MISH, /**< Mish */
ACT_NONE, /**< no op */
ACT_UNKNOWN /**< unknown */
ACT_TANH, /**< tanh */
A Member left a comment

There is a conflict. I think it will be fine if you just fix this.

I added PR 2665, which adds the missing activation type, after this PR.

@baek2sm (Contributor, Author) left a comment

Thanks. I rebased it.

@lhs8928 (Contributor) left a comment

I know there are many GELU approximations, but this one is representative because it is mentioned in the GELU paper.
So please consider renaming this activation to approximate gelu.

@github-actions (bot) left a comment

Cpp-linter Review

Full clang-format patch:
diff --git a/nntrainer/layers/acti_func.h b/nntrainer/layers/acti_func.h
index 8988a75..9e43219 100644
--- a/nntrainer/layers/acti_func.h
+++ b/nntrainer/layers/acti_func.h
@@ -476,2 +476,7 @@ public:
-      [&](T x) { return static_cast<T>(
-        0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);
+      [&](T x) {
+        return static_cast<T>(
+          0.5 * x *
+          (1 + tanhFloat<T>(
+                 static_cast<T>(sqrt(2 / M_PI) * (x + 0.044715 * pow(x, 3))))));
+      },
+      t_out);
@@ -490,2 +495,2 @@ public:
-                           Tensor &outgoing_derivative,
-                           Tensor const &incoming_derivative = Tensor()) {
+                               Tensor &outgoing_derivative,
+                               Tensor const &incoming_derivative = Tensor()) {
@@ -493 +498,2 @@ public:
-    ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented");
+    ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is "
+            "not yet implemented");
@@ -505 +511,4 @@ public:
-      [&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);
+      [&](T x) {
+        return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x))));
+      },
+      t_out);
@@ -517,3 +526,4 @@ public:
-  static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
-                           Tensor &outgoing_derivative,
-                           Tensor const &incoming_derivative = Tensor()) {
+  static Tensor &
+  sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
+                   Tensor &outgoing_derivative,
+                   Tensor const &incoming_derivative = Tensor()) {
@@ -521 +531,2 @@ public:
-    ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented");
+    ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu "
+            "function is not yet implemented");
diff --git a/nntrainer/layers/common_properties.h b/nntrainer/layers/common_properties.h
index 9a9bb0c..4a94f84 100644
--- a/nntrainer/layers/common_properties.h
+++ b/nntrainer/layers/common_properties.h
@@ -917,4 +917,3 @@ struct ActivationTypeInfo {
-  static constexpr const char *EnumStr[] = {"tanh",    "sigmoid",    "relu",
-                                            "softmax", "leaky_relu", "swish",
-                                            "gelu",    "tanh_gelu",  "sigmoid_gelu",
-                                            "none",    "unknown"};
+  static constexpr const char *EnumStr[] = {
+    "tanh", "sigmoid",   "relu",         "softmax", "leaky_relu", "swish",
+    "gelu", "tanh_gelu", "sigmoid_gelu", "none",    "unknown"};


Comment on lines 476 to 477
[&](T x) { return static_cast<T>(
0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);

clang-format suggestions

Suggested change
[&](T x) { return static_cast<T>(
0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);
      [&](T x) {
        return static_cast<T>(
          0.5 * x *
          (1 + tanhFloat<T>(
                 static_cast<T>(sqrt(2 / M_PI) * (x + 0.044715 * pow(x, 3))))));
      },
      t_out);

Comment on lines 490 to 491
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {

clang-format suggestions

Suggested change
                           Tensor &outgoing_derivative,
                           Tensor const &incoming_derivative = Tensor()) {
                               Tensor &outgoing_derivative,
                               Tensor const &incoming_derivative = Tensor()) {

Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
// NYI
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented");

clang-format suggestions

Suggested change
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented");
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is "
"not yet implemented");

* @param[in] t_in input tensor
* @param[in] t_out output tensor
*/
template <typename T = float>
static Tensor &quickGelu(Tensor const &t_in, Tensor &t_out) {
static Tensor &sigmoidGelu(Tensor const &t_in, Tensor &t_out) {
t_in.apply<T>(
[&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);

clang-format suggestions

Suggested change
[&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);
      [&](T x) {
        return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x))));
      },
      t_out);

Comment on lines 517 to 519
static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {

clang-format suggestions

Suggested change
  static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
                           Tensor &outgoing_derivative,
                           Tensor const &incoming_derivative = Tensor()) {
  static Tensor &
  sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
                   Tensor &outgoing_derivative,
                   Tensor const &incoming_derivative = Tensor()) {

Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
// NYI
ml_logw("quickGeluPrime which is calculate derivate of quickGelu function is not yet implemented");
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented");

clang-format suggestions

Suggested change
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented");
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu "
"function is not yet implemented");

Comment on lines 917 to 920
static constexpr const char *EnumStr[] = {"tanh", "sigmoid", "relu",
"softmax", "leaky_relu", "swish",
"gelu", "tanh_gelu", "sigmoid_gelu",
"none", "unknown"};

clang-format suggestions

Suggested change
  static constexpr const char *EnumStr[] = {"tanh",    "sigmoid",    "relu",
                                            "softmax", "leaky_relu", "swish",
                                            "gelu",    "tanh_gelu",  "sigmoid_gelu",
                                            "none",    "unknown"};
  static constexpr const char *EnumStr[] = {
    "tanh", "sigmoid",   "relu",         "softmax", "leaky_relu", "swish",
    "gelu", "tanh_gelu", "sigmoid_gelu", "none",    "unknown"};

@taos-ci (Collaborator) left a comment

@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.

@baek2sm baek2sm changed the title [Layer] add tanh-based approximate gelu activation function [WIP][Layer] add tanh-based approximate gelu activation function Jul 3, 2024
- add tanh-based approximate gelu(tanh gelu) for vision transformer.
- rename quick gelu to sigmoid gelu(it's a sigmoid-based approximate gelu)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
@github-actions (bot) left a comment

Cpp-linter Review

No concerns from clang-format.

Great job! 🎉


@taos-ci (Collaborator) left a comment

@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.

@baek2sm baek2sm changed the title [WIP][Layer] add tanh-based approximate gelu activation function [Layer] add tanh-based approximate gelu activation function Jul 4, 2024
@djeong20 (Contributor) left a comment

LGTM
