Wednesday, September 6, 2023

ChatGPT actually gave some useful suggestions!



Thought I'd share them here.

Physical impacts can cause a range of issues, from visible damage to subtle problems that only manifest under specific conditions. Here's a systematic approach to surveying the motherboard for damage:


1. **Visual Inspection**:

   - **Magnifying Glass or Loupe**: Use a magnifying glass or a loupe to inspect the motherboard closely. Look for any visible cracks, broken traces, or damaged components.

   - **Check Connectors**: Ensure that all connectors, including the PCIe slots, are intact and not bent or broken.

   - **Inspect Solder Joints**: Look for any solder joints that might have cracked or become loose due to the impact.

   - **Check for Debris**: Ensure there's no debris or foreign objects lodged between components or traces.


2. **Physical Inspection**:

   - **Press Gently**: Gently press down on various parts of the motherboard to see if there's any unusual movement, which could indicate a broken stand-off or mounting point.

   - **Check the Back**: Turn the motherboard over and inspect the back for any signs of damage or unusual marks.


3. **Functional Testing**:

   - **Power On**: If the visual and physical inspections don't reveal any issues, consider powering on the server. Listen for any unusual sounds and observe any irregular LED patterns.

   - **BIOS/UEFI Check**: Access the BIOS/UEFI and check for any hardware errors or warnings. Some systems run a self-test during startup and might report issues.

   - **PCIe Slot Test**: Since the impact dislodged a card from the PCIe slot, test that specific slot with a known working card to ensure it's still functional. Also, test the previously dislodged card in another slot or system to ensure the card itself isn't damaged.

   - **Stress Testing**: Use software tools to stress-test the system. Tools like Prime95, Memtest86, and FurMark can help identify issues with the CPU, memory, and GPU, respectively.


4. **Check Peripheral Components**:

   - **RAM**: Ensure that all RAM modules are seated correctly. Consider running a memory test to ensure they're functioning correctly.

   - **Storage**: Check that all storage devices (HDDs, SSDs) are securely connected and are recognized by the system.

   - **Cooling**: Ensure that all fans are operational and that heatsinks are securely attached.


5. **External Inspection**:

   - **Case and Chassis**: Check the server case and chassis for any signs of deformation or damage. Ensure that the motherboard is securely and correctly mounted within the case.


6. **Seek Expert Assistance**:

   - If you're unsure about any findings or if the server behaves erratically, consider seeking assistance from a professional or a specialized repair service. They might have specialized equipment, like multimeters or oscilloscopes, to diagnose issues more accurately.


7. **Document Everything**:

   - Take photos of any visible damage or irregularities. This can be useful for insurance claims, future reference, or if you decide to seek professional assistance.


Remember, even if the server appears to function normally after the impact, there could be latent issues that might manifest later. It's essential to be thorough in your inspection and testing to ensure the server's reliability and longevity.
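To make the PCIe slot test from the list above repeatable, a small script can confirm that a reseated card still shows up on the bus. This is only a minimal sketch, assuming a Linux system with `lspci` (pciutils) installed; the sample output, the device names and the keyword are made up for illustration and are not from the actual server.

```python
import subprocess


def find_devices(lspci_output: str, keyword: str) -> list[str]:
    """Return the lspci lines whose text contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [line for line in lspci_output.splitlines() if keyword in line.lower()]


def read_lspci() -> str:
    """Run lspci and return its text output (needs pciutils on Linux)."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical sample output, used here so the sketch runs anywhere.
SAMPLE = """\
00:00.0 Host bridge: Example Corp Host Bridge
03:00.0 Serial Attached SCSI controller: Example Corp SAS HBA
04:00.0 Ethernet controller: Example Corp 10G NIC
"""

matches = find_devices(SAMPLE, "sas")
print(matches if matches else "card not detected")
```

On the real machine, `find_devices(read_lspci(), "sas")` would do the same check against the live bus.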


Tuesday, September 5, 2023

ML350 G9, the continuing saga.



Part I: I received the server, and the box was pretty 'bashed up'.

The case was pretty 'bashed up' too: it had taken a hard impact on the power supplies (the delivery guys probably rested the case on them on the ground).


Also, the PCIe storage card (for the tape drive and the CD-ROM drive) had 'jumped' out of its PCIe slot. Not good signs. I repaired the power board, because the power supplies would not be recognised; in the meantime I had a new power board on the way ($20).

I've since replaced the power board too, with no luck so far: the same error keeps popping up. It's about an EFUSE (20h), but I have no idea where that is. I suspect it might be protecting the PCIe slots (maybe some of the pins have shorted?), but I don't know where to look.
A new motherboard is now on order (~$100, these older parts are getting quite cheap).

According to this post, it could be the PSUs, but they give a 'green light' when plugged in: https://community.hpe.com/t5/proliant-servers-ml-dl-sl/error-power-on-fault-system-board-aux-main-efuse-regulator-1-20h/td-p/7181745

So: Motherboard first, then some 'flex' power supplies. Let's see where this goes.

In the meantime, I also have a storj.io node now. I've already 'made' $0.07.

In other news, I also expanded my NAS by 8Tbyte, as I am now running overseerr and people can request stuff.

Just to get it all linked back to one place, here is the link for the HPE forums with the same problem (no resolution): https://community.hpe.com/t5/proliant-servers-ml-dl-sl/ml350-gen9-not-booting-with-critical-error-aux-main-efuse/m-p/7180208/thread-id/180199 
And my own post on Reddit describing my 'pains' with the server board: https://www.reddit.com/r/homelab/comments/168o7ib/help_me_resurrect_my_ml350_g9/




Monday, August 21, 2023

ML350G9, the 'final' server.

I've finally found my 'dream' server.

HPE ML350G9, capable of carrying 6 modules containing 4x 3.5" or 8x 2.5" drives, so 24x LFF or 48x SFF drives. I'm using zfs with SLOG/ZIL for data security. I have 5x 8Tbyte Samsung QVO disks.


It can take 2x E5 v4 processors. In my case this would be 2x 2630L v4 10 core processors at 55W TDP. For now I intend to add:

  1. 128Gbytes of LRDIMM ECC memory (power saving and error correcting)
  2. 8/16 Gbytes of NVDIMM for the zfs SLOG/ZIL

  1. 2 port 10Gbit SFP+ ethernet card PCIe x8
  2. 2 port 56Gbit QSFP+ infiniband card PCIe x8
  3. Fujitsu IB mode SAS 12G card PCIe x8
  4. PCIe switch card with 4x 1Tbyte NVMe storage (cache) PCIe x8
  5. PCIe2NVMe card for the boot drive (2tbyte NVMe), hoping I can boot from it. PCIe x4
  6. SAS Expander (12G)  (no PCIe lanes, just power)
  7. Nvidia Quadro P2000 (for transcoding) multiple streams possible. No external supply needed, limit 75W. PCIe x16
It will be running Debian 11 (because the Infiniband drivers are not available for Debian 12), along with docker or k3s (not decided yet).
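Whether all of those cards fit comes down to PCIe lanes. Here is a rough sketch of the budget, assuming 40 PCIe 3.0 lanes per E5 v4 CPU with both sockets populated. How the ML350 Gen9 actually wires lanes to its physical slots is not checked here, and onboard devices take lanes too, so treat the result as an upper bound.

```python
# Lane widths as listed in the post; the SAS expander only draws power from its slot.
cards = {
    "10Gbit SFP+ NIC": 8,
    "56Gbit InfiniBand": 8,
    "SAS 12G card": 8,
    "PCIe switch (4x NVMe)": 8,
    "NVMe boot adapter": 4,
    "SAS expander": 0,
    "Quadro P2000": 16,
}

# Assumption: 40 PCIe lanes per CPU, two CPUs fitted.
LANES_PER_CPU = 40
CPUS = 2

needed = sum(cards.values())
available = LANES_PER_CPU * CPUS
print(f"needed {needed} of {available} lanes, {available - needed} spare")
```

With a single CPU the same list would not fit (52 lanes needed against 40), which is one reason to populate both sockets.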





Tuesday, June 20, 2023

The challenge of charging Tesla packs.

The problem:

When connecting LiPo cells in parallel, massive equalisation currents can flow, especially if the voltages differ significantly: https://www.rcgroups.com/forums/showthread.php?2297114-Equalizing-two-LiPo-s
It's been a bit of a 'pain' to charge 16 packs to the same voltage. Once I got to the 16th pack, the 1st one was out of whack again. 
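To get a feel for the size of those equalisation currents, here is a minimal sketch using the simplest model: each pack is an ideal voltage source in series with its internal resistance, and wiring resistance is ignored. The voltages and resistances are hypothetical, not measured on my packs.

```python
def equalisation_current(v_a: float, v_b: float, r_a: float, r_b: float) -> float:
    """Initial current in amps from pack A into pack B when they are paralleled.

    Model: ideal source plus series internal resistance per pack.
    A negative result means the current flows from B into A.
    """
    return (v_a - v_b) / (r_a + r_b)


# Hypothetical values: 0.4 V difference, 20 milliohm internal resistance each.
current = equalisation_current(4.10, 3.70, 0.02, 0.02)
print(f"initial equalisation current: {current:.1f} A")
```

Even a few tenths of a volt across a few tens of milliohms works out to roughly 10 A in this model, which is why the packs need to be at nearly the same voltage before they are connected.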

The solution:

Instead of using resistors, I've come up with a different idea, using diodes.

The diodes will stop the cells discharging into each other, but will allow charge to flow in from the charger. There will be a slight voltage drop across the diodes, but I can turn the power supply up to match it. To keep that drop small, I have ordered Schottky diodes, which should have a lower drop across them.

I'll let you know how it goes. There have been several suggestions that diodes are not very reliable, and might fail shorted. If they do, they will most likely burn out, like a fuse.
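The compensation is simple arithmetic, sketched here with assumed numbers: roughly 0.7 V forward drop for a silicon diode and roughly 0.3 V for a Schottky, a hypothetical 24 V pack target and a hypothetical 5 A charge current. Real forward drops depend on the part, the current and the temperature.

```python
def supply_setpoint(target_v: float, diode_drop_v: float) -> float:
    """Supply voltage needed so the pack sees target_v behind a series diode."""
    return target_v + diode_drop_v


def diode_loss_w(diode_drop_v: float, current_a: float) -> float:
    """Heat dissipated in the diode at a given charge current."""
    return diode_drop_v * current_a


# Assumed typical forward drops, not datasheet values for any specific part.
SILICON_DROP_V = 0.7
SCHOTTKY_DROP_V = 0.3

TARGET_V = 24.0   # hypothetical pack target voltage
CHARGE_A = 5.0    # hypothetical charge current per pack

for name, drop in (("silicon", SILICON_DROP_V), ("Schottky", SCHOTTKY_DROP_V)):
    setpoint = supply_setpoint(TARGET_V, drop)
    loss = diode_loss_w(drop, CHARGE_A)
    print(f"{name}: set supply to {setpoint:.1f} V, {loss:.1f} W lost per diode")
```

The forward drop shrinks as the charge current tapers off, so a fixed offset overshoots the target slightly near the end of the charge.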

Friday, April 14, 2023

The storage 'endgame'.

 I've been 'playing' with my NAS options. So far:

  • The original Xeon E5-2630L v4 NAS. 200W, multiple 10G interfaces (built in 2017/2018)
  • Zimaboard - 2x 1Gbit Network ports, 2 SATA ports, PCIe x4
  • TMM Lenovo Thinkcentre M900 1xSATA, 2xNVMe PCIe x16
  • Topton NAS board, 6xSATA, 2xNVMe, 4x 2.5Gbit ethernet
  • Framework laptop motherboard with a+e 2230 to m.2 2280 converter and 2280 m2. 8x SATA ports: https://nl.aliexpress.com/item/1005004417694518.html 
All of these have some kind of 'severe' limitation in one way or another, be it form factor, the lack of a real 'case' or just being plain inefficient. Mostly, on everything except the E5, I'm 'missing' bandwidth (PCIe lanes).

Back to the original NAS, but with a 'twist'. 
Storage is going to be SATA SSDs, and for performance some Intel P4510 SSDs attached directly to a PCIe switch. That should hopefully keep power low, as I do not need a real SAS controller for those disks. It should also keep performance really high.

I'll be using a backplane from Supermicro to house all the SATA drives, hopefully. But using a SAS controller with only 4/8 ports should keep power-usage down.


The SAS expander in more detail:




I'm going to be making something similar to this, but then for 2.5" drives and specifically for this setup, I hope: https://www.thingiverse.com/thing:5803558

Maybe I can even 'reduce' the backplane to a 10" format, making it suitable for a 10" rack. Or maybe it can be used 'sideways'

Wednesday, February 16, 2022

The M900 experience,

To begin, a couple of links:
https://forums.servethehome.com/index.php?threads/tiny-mini-micro-pc-experiences.30230/post-326460
https://forums.servethehome.com/index.php?threads/tiny-mini-micro-pc-experiences.30230/post-327084
TL;DR:
  • I have installed an A+E-key PCIe ethernet card which fits beautifully in the VGA/Serial option slot. It's an RTL8111 card, so not 'great'.



  • I have already ordered an A+E-key to miniPCIe adapter from DeLock, and I've ordered a dual-port miniPCIe card based on the I350 chipset. It's dual-port, so that should be 'exciting' in terms of performance/manageability. I'm not sure it will fit in the case though!
Still waiting on the Tiny P340 from Lenovo, to be delivered in April. Still a lot of waiting to do. It's basically a completely standard one, but with a serial port (maybe for a UPS, we'll see).

That's the update so far for my TMM project, except that I'm now using the P320 with the Nvidia P600 as a graphics card. Works great for my limited use so far.


Monday, January 31, 2022

Upgrading the Homelab.


Why:

I've always had a homelab of sorts; until recently it consisted of a beefy NAS server and nothing else. The NAS server uses around 150W of power, much-too-much in my opinion.

How:

'Upgrading' to an M900 Lenovo mini-server as here: https://www.servethehome.com/lenovo-thinkcentre-m900-tiny-project-tinyminimicro-guide/

Hardware:

Processor:

i5-6500T processor with 4 cores. I'm coming from an Intel® Xeon® Processor E5-2630L v4 with 10 cores and 20 threads. Needless to say, I don't use this server processor to its fullest capabilities. So going from a TDP of ~55W to 35W should make a huge difference. 

Storage controller:

Coming from an Adaptec RAID controller with 16 ports, and going to an internal M.2 2Tbyte disk and a SATA-connected 8Tbyte disk, plus 5x 4Tbyte external rotating disks (USB connected) with software RAID, is going to be quite a step down, but should be very manageable. It should also bring down the power usage.

Disks:

Coming from a large number of rotating disks (8x 4Tbyte) and going to 1x 8Tbyte SSD should save a lot of power. The external rotating disks will be mounted when needed, and will thus not add much to the power consumption.

Memory:

Going from 48Gbytes of server-based memory to 64Gbytes of laptop memory (both DDR-4) should make quite a dent in the power consumption, hopefully.

Power:

The current power usage of the server is ~200W at idle and around 300W under full load. I hope to reduce that to around 20W during idle or about 50W during full load. I think that will be a challenge with full retention.

The six USB ports will be used by 5x 4Tbyte HDs, the internal 2.5" bay will be populated by an 8Tbyte SSD and the mini PCIe port will be used by a mini PCIe to Ethernet interface (it will also do duty as a firewall/router). As you can see in the pictures, there is not much space for the mini PCIe -> Ethernet card.
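To put the idle figures into perspective, here is a quick sketch of the yearly energy difference, assuming the machine runs 24/7 at idle. The electricity price is a hypothetical placeholder, not my actual tariff.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts: float) -> float:
    """Energy used in a year at a constant power draw."""
    return watts * HOURS_PER_YEAR / 1000


OLD_IDLE_W = 200        # current server at idle, from the post
NEW_IDLE_W = 20         # target for the M900 at idle, from the post
PRICE_PER_KWH = 0.30    # hypothetical price, adjust to your own tariff

saved_kwh = annual_kwh(OLD_IDLE_W) - annual_kwh(NEW_IDLE_W)
print(f"{saved_kwh:.0f} kWh saved per year at idle")
print(f"about {saved_kwh * PRICE_PER_KWH:.0f} per year at {PRICE_PER_KWH:.2f} per kWh")
```

That is a saving of roughly 1,577 kWh a year at idle alone, before counting any time spent under load.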

Some pictures: