Issue617 refactor service with history 2 #652

Open
wants to merge 237 commits into
base: master

Conversation

dhblum
Collaborator

@dhblum dhblum commented Jul 1, 2024

This is for #617 and #671 and in place of #622. It does everything from #622, but merges the Service architecture in a way to include the nrel/boptest-service git history.

kbenne and others added 30 commits June 26, 2020 23:36
This commit introduces the concept of separate worker and web server processes. The worker is a distinct Docker container that is responsible for carrying out a simulation. The web server runs in a separate container and is only responsible for serving HTTP requests. A Redis database is used to communicate between the worker and the web server. The web server and worker may scale independently, and a deployment is expected to involve many workers to manage multiple simultaneous simulations/tests.

This commit is organized such that it adds to the existing boptest project layout, but it does not disturb any of the existing project layout. The worker depends on and makes extensive use of the existing boptest libraries, especially those APIs in testcase.py as well as the forecast and kpi source code directories.

Currently the web API presented by the web server is conceptually equivalent to, but not API-compatible with, the existing boptest/master API. A future commit will resolve this difference.

To test this commit:

cd <project-root>
docker-compose build
docker-compose up web worker

After starting the services defined by docker-compose, a web application should be viewable in the browser at http://localhost

* A GraphQL (https://graphql.org) API is viewable at http://localhost/graphql
* This architecture depends on bulk file storage such as Amazon S3. In development mode, a MinIO (https://min.io) container stands in for S3 and is available at http://localhost:9000. From this interface it is possible to view raw file (test case) uploads and simulation files. The default authentication credentials can be found in the .env file that is included in this commit
* A forecast API has been added, which was not previously available in Alfalfa
Working (although incomplete) twoday example
Update API and add example controller
There is global attribution at the project root
Use redis messages to generate results as they are requested, instead of
the very time consuming approach of computing them on every step
Remove some unwanted Alfalfa isms
@dhblum
Collaborator Author

dhblum commented Sep 3, 2024

@EttoreZ @javiarrobas @terrancelu92 This PR is ready for beta testing. Can you check out the branch issue617_refactorService_with_history2 (https://github.com/ibpsa/project1-boptest/tree/issue617_refactorService_with_history2) and test the deployment and use of BOPTEST with the migrated service architecture? You should be able to start with the README and go from there :) Let me know if you're able to do some testing, and report back any issues you find, preferably within the next 1.5 weeks.

FYI @kbenne

dhblum and others added 10 commits September 4, 2024 08:28
* Within worker, add logs when a message is received and immediately
  after a response has been sent.

* Include HOSTNAME in the test metadata that is stored in Redis. For a
  K8s deployment, HOSTNAME will correspond to the name of the pod
  running the test. This will make it possible to retrieve worker logs
  for a misbehaving test.

* When a test is complete, the worker and its associated logs may no
  longer exist; however, the logs will still be available in the log file
  contained within the test payload that is pushed to long-term storage.

* These changes pertain to the worker; however, there is an existing log
  message within the web implementation that is logged when a message is sent to
  the worker but no response is received. Additionally, each message
  between web and worker is given a unique ID, so with all of
  this together there will be breadcrumbs if a message is dropped.
* The default message timeout is now 20 minutes, and the value is configurable using the
  BOPTEST_MESSAGE_TIMEOUT environment variable.
* The message subscription timeout is now configured as
  BOPTEST_MESSAGE_TIMEOUT + 60 seconds.
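
The relationship between the two timeouts in these commit messages can be sketched as follows. This is a minimal illustration, not the service's actual code: it assumes the environment variable is expressed in seconds (the commit messages do not state the unit), and the helper names `message_timeout` and `subscription_timeout` are hypothetical.

```python
import os

# Assumption: the variable is expressed in seconds; the source does not say.
DEFAULT_MESSAGE_TIMEOUT = 20 * 60  # the 20-minute default from the commit message
SUBSCRIPTION_MARGIN = 60           # the extra 60 seconds for the subscription


def message_timeout(env=None):
    """Return the message timeout, falling back to the 20-minute default."""
    env = os.environ if env is None else env
    raw = env.get("BOPTEST_MESSAGE_TIMEOUT")
    return int(raw) if raw else DEFAULT_MESSAGE_TIMEOUT


def subscription_timeout(env=None):
    """Return the subscription timeout: the message timeout plus 60 seconds."""
    return message_timeout(env) + SUBSCRIPTION_MARGIN
```

Deriving the subscription timeout from the message timeout, rather than configuring it separately, keeps the subscription alive slightly longer than any message it is waiting on.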
Contributor

@javiarrobas javiarrobas left a comment

Nice work, it's looking good to me. I just went through it and tried to understand as much as possible. While reading I added a couple of minor comments/suggestions.

docker-compose.yml (outdated, resolved)
service/.gitignore (outdated, resolved)
Contributor

I suggest merging this one with the README.md at the root directory.

Contributor

No argument from me.

Collaborator Author

OK, but what do you think about separating, in the API request table, the requests for user/namespace-specific test cases from those for selecting the public IBPSA test cases and the other functional API requests (e.g. /advance, /results, etc.)? I think of the former as more advanced features, and I don't want people to get confused by them when they are just starting to try BOPTEST with our public test cases.

I could propose something.

Contributor

I think we don't need to track the service version anymore after it is merged.

Contributor

Agreed. I can make a commit to remove this. I think we might need to coordinate some changes in the web server where the /version endpoint is implemented.

Collaborator Author

@kbenne Ok, please make a commit to take care of this.

testing/makefile (outdated, resolved)
@javiarrobas
Contributor

I've been trying to deploy it myself locally. It seems that upon calling docker compose up web worker provision it provisions only for bestest_hydronic, multizone_office_simple_hydronic, and test_multizone_residential_hydronic testcases. Any idea why?

image

Would it be possible to specify just one testcase instead of provisioning multiple? When working locally, we typically want to interact with just one test case. If we select just one test case to deploy we could save some time and resources.

@javiarrobas
Contributor

javiarrobas commented Oct 4, 2024

Another suggestion is to pin all Dockerfile images to linux/x86_64 as we did here to enable cross-platform builds. This is particularly relevant for those running on Apple silicon.

EDIT: Now that I think of it, this may not be needed for images other than worker (which needs pyfmi but is already pinned) since the dependencies already distribute the binaries for Apple Silicon platforms.

@kbenne
Contributor

kbenne commented Oct 4, 2024

> I've been trying to deploy it myself locally. It seems that upon calling docker compose up web worker provision it provisions only for bestest_hydronic, multizone_office_simple_hydronic, and test_multizone_residential_hydronic testcases. Any idea why?
>
> image
>
> Would it be possible to specify just one testcase instead of provisioning multiple? When working locally, we typically want to interact with just one test case. If we select just one test case to deploy we could save some time and resources.

I'm not sure why some of the testcases are not provisioned. From your log it does appear to be true, but I would double check by calling the GET /testcases endpoint, which will tell you for sure which test cases were provisioned.

The default provision command which is invoked by docker compose up provision is CMD [ "python3", "-m", "boptest_submit", "--shared", "--path", "./testcases/"]. Note the --path argument. The script will glob for FMUs at or below the given path. It is possible to provision a single test case, although it isn't convenient and maybe we should do something to make it better. To provision a single test case you would need to bring up the service with docker compose up web worker and then from a separate terminal run docker compose run --no-deps provision python3 -m boptest_submit --shared --path ./testcases/multizone_office_simple_hydronic/. Note that you can reprovision a test case, so for instance if you are in a development loop working on a test case, you could make your changes, recompile the FMU, and call the provision command again, which will replace the existing test case without needing to bring the service down and back up.
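
The path-based discovery described here can be illustrated as below. This is not the boptest_submit implementation, which is not shown in this thread. It is a minimal sketch that assumes FMUs are identified by a `.fmu` file extension, and `find_fmus` is a hypothetical name.

```python
from pathlib import Path


def find_fmus(path):
    """Return all .fmu files at or below `path`, sorted for stable output."""
    root = Path(path)
    if root.is_file():
        # A path pointing directly at a file selects it only if it is an FMU.
        return [root] if root.suffix == ".fmu" else []
    # Recursive glob: matches FMUs in `root` and in every subdirectory.
    return sorted(root.rglob("*.fmu"))
```

Pointing the function at a narrower directory, such as ./testcases/multizone_office_simple_hydronic/, returns only the FMUs under it, which mirrors why a narrower --path provisions a single test case.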

@javiarrobas
Contributor

> I'm not sure why some of the testcases are not provisioned. From your log it does appear to be true, but I would double check by calling the GET /testcases endpoint, which will tell you for sure which test cases were provisioned.

I'm double checking with the GET /testcases endpoint and I can confirm that other test cases are not provisioned:
image

Notice that it gets stuck with test_multizone_residential_hydronic (not with multizone_residential_hydronic). I'm not sure that has anything to do with it, but I'm mentioning it just in case:

image

My guess is that it gets stuck in the for loop when provisioning the test_multizone_residential_hydronic test case, probably because I'm running on a Mac M1 laptop with the worker configured for linux/x86_64 and thus running under emulation. I've encountered issues with that in the past. If that's the case, I'm not sure there is much we can do about it. It seems there are now pyfmi binaries for linux-aarch64 (see here), so unpinning the worker image might work, but honestly I don't think it will, because the FMUs still ship linux/x86_64 binaries.

@dhblum
Collaborator Author

dhblum commented Oct 9, 2024

> > I'm not sure why some of the testcases are not provisioned. From your log it does appear to be true, but I would double check by calling the GET /testcases endpoint, which will tell you for sure which test cases were provisioned.
>
> I'm double checking with the GET /testcases endpoint and I can confirm that other test cases are not provisioned: image
>
> Notice that it gets stuck with test_multizone_residential_hydronic (not with multizone_residential_hydronic). Not sure if that has to do with anything, but I'm mentioning just in case:
>
> image
>
> My guess is that it does get stuck in the for loop when provisioning the test_multizone_residential_hydronic test case, probably because of me running in a Mac M1 laptop with the worker configured with linux/x86_64 and thus running under emulation. I've encountered issues with that in the past. If that's the case, I'm not sure there is much we can do about it. It seems there are now pyfmi binaries for linux-aarch64 see here, so unpinning the worker image might work, but honestly I don't think it will because the FMUs are still shipping linux/x86_64 binaries.

@javiarrobas Hm, I will need to look into this. Provisioning doesn't actually start running (or initializing) any of the test cases. All it does is copy the FMU files into the Service's storage. When a request to start a test case is received, the FMU is loaded into the worker, and only then does initialization happen, as in the previous method of local deployment (e.g. with testcase.py).

EDIT, I feel like I ran into this problem before and I need to look back in my notes.

EDIT2, actually, I read the issue more closely: for some reason it's prepending "test_" to multizone_residential_hydronic. I'm not sure I've seen this before. I still need to look.

6 participants