KJ Install Notes on HPE Betakit #69

Closed
kjw3 opened this issue Aug 13, 2020 · 2 comments


kjw3 commented Aug 13, 2020

NOTES: The beta kit has 4 blades, plus an external Mellanox switch and the 1G internal backplane switch.

I chose 2 networks:
External: 192.168.8.0/24 (VLAN on my home network)
Internal: 192.168.1.0/24 (isolated VLAN 1 on the Mellanox switch). NOTE: This is the VLAN that was already in use when delivered.
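
A quick way to sanity-check a pair of networks like these before starting is sketched below. It only uses Python's standard `ipaddress` module; the helper name `networks_are_disjoint` is made up for illustration and is not part of farosctl.

```python
import ipaddress

# The two networks named in the notes above.
external = ipaddress.ip_network("192.168.8.0/24")
internal = ipaddress.ip_network("192.168.1.0/24")


def networks_are_disjoint(net_a, net_b):
    """Return True when the two networks share no addresses.

    Hypothetical helper for illustration only.
    """
    return not net_a.overlaps(net_b)


if __name__ == "__main__":
    if networks_are_disjoint(external, internal):
        print("external and internal networks do not overlap")
    else:
        print("WARNING: external and internal networks overlap")
```

If the two ranges overlapped (for example, if the external network were a /16 that contained the internal /24), the warning branch would fire instead.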

I had to reconfigure my nodes to use UEFI to match the HPE admins' preference. I switched all blades to UEFI mode, set the first NVMe drive as the boot drive, and set the first 10G NIC for PXE.

Overall Feedback:

  • Getting the BIOS and network configured the expected way is time-consuming and leaves room for human error
  • Having to obtain MAC addresses for the switch, chassis, iLO interfaces, and blade NICs also leaves much room for human error
  • Getting used to the mechanisms in the TUI to add, edit, and remove the configurations that were dictionaries took a little exploration and trial and error
  • Need to document or provide a way to reset the generated IP address assignments. Could have created some nice IP conflicts for myself
  • Leaving the iLOs in DHCP mode during the first parts of installation leaves them inaccessible, because the internal network does not have a DHCP server until the bastion is configured with dhcpd. This makes rebuilding the bastion a painful exercise of plugging and unplugging cables, switching networks, etc.
  • Because of the bug with not being able to run farosctl cmds in different sessions and the issue of DNS not listening on the internal network, I had to wait for bootstrap timeout to access the bootstrap node via farosctl to troubleshoot (I'm sure there was probably another way but in the little looking I did, I couldn't find the right key pair to gain access directly)
  • We should probably try to document or recommend a good way to get all the drives wiped and clean. Booting an ISO on each blade to do this was a time-consuming process. Not sure why it was so slow. But it was. :)
  • Providing a way to clean up the bastion would save a lot of time when you screw up
  • The existing documentation is a good start, but I think it will have to get much more finely detailed throughout the process
  • Detailed documentation of the preconfigs for the EL8K would be very helpful
  • Wipefs and OCS deployment command should not offer root drives in selection
  • OCS deployment went very smoothly though
  • Nvidia GPU operator deployment failed
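
As one idea for cutting down on the MAC address mistakes mentioned above, hand-collected addresses could be checked before they go into the config. This is only a sketch: `normalize_mac` and `find_duplicates` are hypothetical helpers, not part of farosctl, and the addresses are made-up examples.

```python
import re

# Six colon-separated pairs of lowercase hex digits.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def normalize_mac(raw):
    """Lowercase a MAC address and convert '-' separators to ':'.

    Raises ValueError if the result is not a well-formed MAC address.
    """
    mac = raw.strip().lower().replace("-", ":")
    if not MAC_RE.match(mac):
        raise ValueError(f"not a valid MAC address: {raw!r}")
    return mac


def find_duplicates(macs):
    """Return the normalized MAC addresses that appear more than once."""
    seen = set()
    duplicates = []
    for mac in map(normalize_mac, macs):
        if mac in seen and mac not in duplicates:
            duplicates.append(mac)
        seen.add(mac)
    return duplicates


if __name__ == "__main__":
    # Made-up example addresses, e.g. as copied by hand from iLO pages.
    collected = ["AA-BB-CC-00-11-22", "aa:bb:cc:00:11:23", "aa:bb:cc:00:11:22"]
    print(find_duplicates(collected))
```

A check like this catches typos and copy/paste duplicates, but it cannot tell whether an address belongs to the right blade or interface.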
@rmkraus rmkraus added this to Upcoming in Roadmap Aug 13, 2020
@rmkraus rmkraus moved this from Upcoming to In Work in Roadmap Aug 13, 2020
Member

rmkraus commented Aug 13, 2020

  • Getting the BIOS and network configured the expected way is time-consuming and leaves room for human error

BIOS config issues are tracked in #51.
Network config issues are tracked in #74.

  • Having to obtain MAC addresses for the switch, chassis, iLO interfaces, and blade NICs also leaves much room for human error

Acknowledged and agreed. No remedy at this time. I don't have a good solution, but this is now tracked in #75.

  • Getting used to the mechanisms in the TUI to add, edit, and remove the configurations that were dictionaries took a little exploration and trial and error

Acknowledged. I also don't have a good remedy for this at the moment. Tracked in #76

  • Need to document or provide a way to reset the generated IP address assignments. Could have created some nice IP conflicts for myself

Tracked in #72

  • Leaving the iLOs in DHCP mode during the first parts of installation leaves them inaccessible, because the internal network does not have a DHCP server until the bastion is configured with dhcpd. This makes rebuilding the bastion a painful exercise of plugging and unplugging cables, switching networks, etc.

Tracked in #47

  • Because of the bug with not being able to run farosctl cmds in different sessions and the issue of DNS not listening on the internal network, I had to wait for bootstrap timeout to access the bootstrap node via farosctl to troubleshoot (I'm sure there was probably another way but in the little looking I did, I couldn't find the right key pair to gain access directly)

Tracked in #65

  • We should probably try to document or recommend a good way to get all the drives wiped and clean. Booting an ISO on each blade to do this was a time-consuming process. Not sure why it was so slow. But it was. :)

Honestly, I'm not sure of a good way to do that. I have issue #50 to track known pain points. I'd like to bring this up to engineering and have the issue addressed at a RHCOS level.

  • Providing a way to clean up the bastion would save a lot of time when you screw up
  • The existing documentation is a good start, but I think it will have to get much more finely detailed throughout the process

Tracked in #72

  • Detailed documentation of the preconfigs for the EL8K would be very helpful

Tracked in #74

  • Wipefs and OCS deployment command should not offer root drives in selection

Tracked in #77

  • OCS deployment went very smoothly though

  • Nvidia GPU operator deployment failed

Tracked in #68

Member

rmkraus commented Aug 13, 2020

Thank you for the detailed feedback. I've spun off issues for everything. Some have already been addressed. Some will be addressed soon. Some we may just have to live with. I'll close this ticket as all of the issues are being tracked elsewhere.

@rmkraus rmkraus closed this as completed Aug 13, 2020
Roadmap automation moved this from In Work to Released Aug 13, 2020
@rmkraus rmkraus removed this from Released in Roadmap Aug 13, 2020