Graph Compute, Day 12 -- Deeper Into The Machine Learning Sandboxes
100 Days of learning about graph compute – Day 12
It is BEST to re-use what one has and MAYBE extend skills on a newly acquired CHEAP salvage re-tread … DO NOT PAY for the privilege of buying new.
In our particular case, we will be adding a DELL Alienware Aurora R12 with a liquid-cooled i7-11700KF processor, 32GB of DDR4 memory and a GeForce RTX 3080 GPU card … to our large, cast-off but still functional inventory of old junk to learn new lessons from.
Reuse, Reuse, REUSE, Repair, Relearn, but mostly LEARN, LEARN, LEARN … old IT gear will still get you from point A to point B in the virtual realm of learning, in topics like content creation, data science, artificial intelligence or computational chemistry.
“Why DELL?” one should ask … the first answer is the initial cost of a USED mass-market system which someone else has given up on. It’s all about the low cost of a salvage retread … a RECENTLY popular system will be old enough to have spawned plenty of newish internet content and, because of the disappointment factor, to be cheap, but we can still reclaim 99% of the valid reasons for the initial hype which drove the big anticipation a year or two ago.
ANYBODY who wants to dream about the latest technology should just pick their own parts and configure something like an air- and liquid-cooled, open-case Extended ATX test bench workstation … the point is not to build this bill of materials – the point is to use the parts picker to consider what could be built – hopefully, people consider and re-consider their parts lists every day for at least a month before ordering ANYTHING.
The second reason for going with DELL is the cost of adding future parts machines. Salvage parts mean that the cost of extended future maintenance and potential upgrades will be low BECAUSE of the volume of cannibalizable shit that is available.
The beauty of a mass-market pre-built is in the VOLUME of the mass market … and the legitimate tech lust which drove that volume. There’s no end to the selection available on eBay or Craigslist as long as one sticks with mass-market pre-builts … it’s all about the VOLUME of salvage, cast-off, still functional but no longer loved DELL systems that can be cannibalized for parts. That argument will not sell with someone addicted to gear acquisition. A tech, gamer or builder should just chill and let consumers throw money at brand-new DELL pre-builts … and instead focus on building dual-EPYC workstations for paying customers like GKH, who might have extreme demands but do have builds that people can learn from … especially as the weaknesses of the new tech lusts become apparent and the corresponding workarounds emerge.
We should never forget that the DELL pre-builts are FULL of common, open, standard-issue lessons that EVERYONE can learn from … when it comes to LEARNING, it’s the quantity of sandboxes, or the sample size of one’s lessons, that really matters … and it’s BEST when we can learn from lots of other people’s lessons, rather than acquiring extra stuff ourselves.
Crowds of commenters can teach us more than single individuals demonstrating a concept. It’s not just that the mass market of pre-builts is a target-rich environment for [ALMOST] tech-savvy people. Of course, people want to raise their hand and point out the obvious by criticizing the common problems of the lowest-common-denominator design approach found in mass-market pre-builts. This sort of techie piling-on is a lot like a nutritionist or heirloom vegetable gardener pointing out that fries are unhealthy or that ketchup is not a vegetable. The beauty of the flattening out of Moore’s law is that before you know it, two or three years will pass and the tech geniuses will be able to find now-latent problems with the 192-core Titanium pre-builts being built now for mass consumption in data center markets. All of this stuff will be available soon enough on the salvage markets – because, like the car business, the OEMs will need to keep finding new innovations [if not in raw CPU processing power] to sell new models and keep their companies afloat.
Those of us who have worked in some sort of IT role in companies learned to use DELL stuff for its consistency or replaceability … not because it was all that spectacular technically. Supportability arises out of EXTREME uniformity … rather than superiority of the technical specs.
Massive volumes of pre-builts have been, and will always be, sold … which is AWESOME … because it is the VOLUME which makes for great internet bottom-feeder content, ie lots of people are familiar with and discuss their mods/optimizations on IDENTICAL, apples-to-apples training labs. To reiterate, the primary reason to use DELL is not that we buy the debatable DELL marketing hype – rather, it is the undebatable VOLUME of DELL stuff driving the internet fora and YouTube with decent-quality, painfully honest, personal content about DELL systems from legit human beings, eg https://youtube.com/playlist?list=PL2Ksi8qJcnFIFZ2hm3HbxQFtCdi57nzjY … the not-entirely-bad DELL discards from people are always available on eBay for cheap … if you can’t find the spare parts you need now, someone else will sell one for parts in the next week or so. VOLUME matters. Go with DELL to get the benefits of VOLUME.
Why Alienware Aurora? Any explanation will necessarily sound like marketing speak from DELL … except that there is actually an element of truth to DELL marketing speak, because companies like DELL have to depend upon their reputation … but mostly, the reason for the gaming-rig flagship comes down to the kind of engineering and maintainability that comes from a HIGH-VOLUME entity going hard after an extremely lucrative market niche of affluent customers, ie TEAMS of DELL engineers developed this system and then took advantage of employee discounts to actually take the stuff home or give it as gifts … these are not just TEAMS of independent entrepreneurs; these systems come from capable, mature processes, in the full CMMI sense, that develop predictable products … it’s the CMMI, not just the LARGE teams of thoroughly vetted, specialized, professional engineers vying for a career-defining résumé line with a successful DELL product. So … YES … it is also true that the components will be exhaustively value-engineered to ensure that they are as cheap as possible – but this kind of stuff is going to be pretty much obsolete in five, or certainly ten, years anyway, right?
Why R12? As a general rule, it’s not just that all of the components in the system are new and improved [and maybe cheapened up, hopefully where it doesn’t matter]; the R12 is also not the Alienware Aurora team’s first rodeo with this particular design approach … there are improvements in the R11 over the R10, and improvements in the R12 over the R11. The R12 is not the R13 or R14, but it can be modified to accomplish much of what the more expensive systems do.
For example, the R13 allowed customers to specify a 1000W power supply, but the R12 [under a service contract] also had access to after-sale, DELL-factory-service-approved improvements like a 1000W power supply … even though you might not be able to put 1000W of heat into that case without running into problems, more power does present options. For example, it might become necessary to remove the mechanical hard drive and add a BIGGER fan … yes, that fan will be noisier, but nothing is as quiet as compute that has shut down. At least a 1000W supply ensures that you actually can power a BIG FAN in that case.
Why liquid-cooled? As good as the R12 might be, airflow is still going to be an inescapable flaw in this design … it will be necessary to use every trick in the book to keep this rascal cool … dissipating heat is a REALLY BIG problem … but when you know exactly what the problem is, you can work around it, ie this project is FULL of little life lessons.
Why the Intel i7-11700KF? … first of all, it’s a chance to get reacquainted with and really EXPLORE the Intel ecosystem, especially the Rocket Lake architecture https://en.wikipedia.org/wiki/Rocket_Lake – which will be important to understand, particularly when we get into the tracing and performance monitoring stuff … the VOLUME lesson, which applies to Intel [more so than NVIDIA], matters here as well – Intel is not perfect, but its VOLUMES mean that problems are well known … and when you know exactly what the problem is, you can work around it, ie this life lesson gets re-iterated, so it must be an important one.
The i7-11700KF in our cheapo system is certainly not an entirely bad CPU … yes, it’s affordable, so it has limitations; the 16 MB L3 cache, for example, can become a bottleneck … BUT you are in control of the software, so you gotta find ways to get AROUND those bottlenecks OR it will be necessary to get a new CPU … but the i7-11700KF has 8 cores, runs 16 threads … and has neato technologies like the Intel® Gaussian & Neural Accelerator (GNA), an ultra-low-power accelerator block designed to run audio and speech-centric AI workloads. The Intel® GNA is actually designed to run audio-based neural networks at ultra-low power while simultaneously relieving the CPU of this workload … this kind of thing is nothing short of amazing, if only because this magic Easter egg of a feature illustrates how HUGE a role software now has inside a hardware product. https://www.techreviewer.com/tech-specs/intel-11700kf-for-gaming/
https://ark.intel.com/content/www/us/en/ark/products/212048/intel-core-i711700kf-processor-16m-cache-up-to-5-00-ghz.html
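Before working AROUND the L3 bottleneck, it helps to confirm what the machine actually reports. A minimal sketch, assuming the box runs Linux with sysfs mounted (the post does not say which OS the R12 will run); `parse_size_kib` and `l3_size_kib` are hypothetical helper names, not part of any library. It scans the cache entries for cpu0, picks the one whose `level` file reads 3, and prints that size next to the logical CPU count, which should come out to 16 threads on an i7-11700KF with Hyper-Threading enabled.

```python
import os
from pathlib import Path

# Standard Linux sysfs location for cpu0's cache hierarchy (assumes Linux).
CACHE_DIR = Path("/sys/devices/system/cpu/cpu0/cache")


def parse_size_kib(text):
    """Convert a sysfs cache size string such as '16384K' into KiB."""
    text = text.strip()
    if text.endswith("K"):
        return int(text[:-1])
    if text.endswith("M"):
        return int(text[:-1]) * 1024
    return int(text) // 1024  # treat a bare number as bytes


def l3_size_kib(cache_dir=CACHE_DIR):
    """Return the L3 size in KiB reported by sysfs, or None if unavailable."""
    if not cache_dir.is_dir():
        return None
    # index0..indexN each describe one cache; the 'level' file says which one.
    for index in sorted(cache_dir.glob("index*")):
        if (index / "level").read_text().strip() == "3":
            return parse_size_kib((index / "size").read_text())
    return None


if __name__ == "__main__":
    print("logical CPUs (threads):", os.cpu_count())
    l3 = l3_size_kib()
    print("L3 cache:", f"{l3 // 1024} MiB" if l3 else "not reported")
```

Looking the level up, rather than hard-coding `index3`, is a deliberate choice: the index numbering is not guaranteed to match the cache level on every CPU.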
Why 32 GB (2 X 16) of DDR4 memory in two of the four slots? Why not just 16, or 4 X 16 for 64? Why not the full 128?
32 GB is probably going to be enough for most tasks; if it isn’t … that might be the time to add 2 X 32 for 96 GB; and if that’s still not enough, replace the 2 X 16 with 32s for the full 128 GB.
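The upgrade path above is just slot arithmetic on a four-slot board. A minimal Python sketch, assuming four DIMM slots and the 128 GB maximum that Intel lists for the i7-11700KF; `total_gb` and `upgrade_path` are hypothetical names used only for illustration. It checks capacity only. It says nothing about speed or channel balancing when DIMM sizes are mixed.

```python
# Assumed platform ceiling: Intel's listed maximum memory for the i7-11700KF.
MAX_GB = 128


def total_gb(slots):
    """Sum installed DIMM sizes in GB; None marks an empty slot."""
    return sum(size for size in slots if size is not None)


# Hypothetical four-slot upgrade path from the paragraph above.
upgrade_path = {
    "as purchased (2 x 16)": [16, 16, None, None],
    "add 2 x 32 in the empty slots": [16, 16, 32, 32],
    "replace the 2 x 16 with 32s": [32, 32, 32, 32],
}

for step, slots in upgrade_path.items():
    total = total_gb(slots)
    # Guard against a plan that exceeds the assumed ceiling.
    assert total <= MAX_GB, f"{step} exceeds the assumed {MAX_GB} GB ceiling"
    print(f"{step}: {total} GB")
```

Running it prints 32 GB, 96 GB and 128 GB for the three steps. Each step reuses the DIMMs already installed, which is the whole point of starting with two of the four slots filled.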
Why the GeForce RTX 3080 gpu card?
We cannot afford the DGX H100 SuperPod … yet … https://www.nvidia.com/en-us/data-center/dgx-superpod/