Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AVX] Added VCVT (between floating-point and integer) AVX support for x86 architecture #2480

Merged
merged 1 commit into from
Feb 27, 2024

Conversation

djeong20
Copy link
Contributor

This pull request adds VCVT (between floating-point and integer) AVX support for the x86 architecture.
This feature allows efficient conversion between single-precision floating-point and half-precision floating-point data types using the AVX instruction set.

Changes proposed in this PR:

  • Added new VCVT instructions to the blas_interface
  • Fix incorrect meson.build logic in checking gcc version and AVX support
  • Add enable-avx option to enable AVX hardware acceleration

Self-evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

@taos-ci
Copy link
Collaborator

taos-ci commented Feb 20, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2480. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeong20, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Member

@skykongkong8 skykongkong8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

} else {
for (unsigned int i = 0; i < N; ++i)
Y[i * incy] = static_cast<_FP16>(X[i * incx]);
}
#else
for (unsigned int i = 0; i < N; ++i)
Y[i * incy] = static_cast<_FP16>(X[i * incx]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not for this PR or this month, but for later versions (you are now having technical debts here):

maintain the same code in general cpp files across NEON/AVX/None.
centralize architectural dependencies to a corresponding header and its implementation only.

You may have a single header for all three cases and choose a corresponding implementation (.c/.cpp file) at build time, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try the better approach from the next implementation (and fix this when you are ready)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean something like:

// new file : blas_raw.cpp
...
void foo( ... ) {
...
} else {
    for (unsigned int i = 0; i < N; ++i)
      Y[i * incy] = static_cast<_FP16>(X[i * incx]);
  }
...
}
...

and use it for all blas_interface.cpp, blas_neon.cpp, blas_avx.cpp ?
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bringing this up! I will keep it in mind for future improvement.

Copy link
Member

@myungjoo myungjoo Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or.. you may have a base class as a set of operators/functions/methods for general CPU operations compatible for ALL systems with virtual functions and a derived class as a set of operators/functions/methods for architecture-dependent (SIMD) operations overriding the virtual functions.

Then, if you don't have an implementation in the derived class (SIMD), the fallback method in the base class will be automatically chosen. If you have an implementation in the derived class and the class is chosen at run-time/build-time, it will be automatically chosen WITHOUT having #if/#endif or even general if statements. You can control that in the base class initializer or "get_instance" method of the singleton design.

Because you still have a tight deadline for the release, don't start refactoring, yet. Try to refactor when you have completed the immenent release or when you start writing a new class or function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such refactoring needs planning and design before implementation. Don't just start without enough discussion and I don't want this refacotring effort hinder the immenent release.

Anyway, for your reference: #2482

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept suggestion : https://github.com/myungjoo/nntrainer/tree/suggestion/refactoring/archdep
(this is not for actual PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both approaches seem great. I'll also research how other frameworks manage this. let's keep the discussion going

meson.build Outdated
avx512_code = '''#include <immintrin.h>
int main (int argc, char *argv[]) { const __m256i vec = {0, 1, 2, 3}; __m512 v = _mm512_cvtph_ps(vec); return 0; }
'''
has_avx512 = cc.compiles(avx512_code, args : '-march=native', name: 'avx512 support')
Copy link
Member

@myungjoo myungjoo Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may proceed and fix it later, but...

Are you sure that the build machine is AVX512 capabile while the target machine is also AVX512 capable? As we build ARM binaries in X64 machines, we may build X64/AVX512 binaries in X64/non-AVX512 machines. We may need to force-enable them (and you will be required to apply proper arch name, in this case, -march=native won't work.) Note that a few instances in our OBS or QB might be not ready for AVX512.

The current meson script is fine for developers and most users.

And, the current script is fine for this release because the automated build system is not applied for x64 targets. (we do it only for arm targets)

But, you will need to consider this afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I thought that if AVX512 is not supported on the build machine, it should not be enabled in the first place. Since there are no AVX512 instructions in our code, I think we can modify it now without any problems.

… x86 architecture

This pull request adds VCVT (between floating-point and integer) AVX support for the x86 architecture.
This feature allows efficient conversion between single-precision floating-point and half-precision floating-point data types using the AVX instruction set.

**Changes proposed in this PR:**
- Added new VCVT instructions to the blas_interface
- Fix incorrect meson.build logic in checking gcc version and AVX support
- Add enable-avx option to enable AVX hardware acceleration

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeong20, 💯 All CI checkers are successfully verified. Thanks.

@myungjoo myungjoo merged commit 3d68b89 into nnstreamer:main Feb 27, 2024
30 checks passed
@djeong20 djeong20 deleted the blas/avx/vcvt branch March 20, 2024 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants