This is not the selling point they think it is.
The problem I see with the AM5 socket is simply that the DDR5 RAM needed to support it is just too expensive. So this will not really make the big impact they were hoping for.
DDR4 is basically just as expensive. At least DDR5 gives you on-die error correction (not as good as full ECC).
For a general computer, there's not that much difference between AM4 and AM5 unless you really want the extra speed of DDR5, PCIe 5.0, and the newest processors. You can build a very capable AM4 machine for slightly less money, but that savings is found on the CPU and motherboard, not on the RAM.
I think your point still stands overall for AMD's business, though; I assume the vast majority of CPUs are purchased in new desktops?
I posted it here the same day I found and started using it, to almost no reaction.
[0] https://github.com/FastFlowLM https://fastflowlm.com/ https://huggingface.co/FastFlowLM
HN is overloaded with AI stuff; it's hard to break through all the noise. I say this as someone very interested in AI. Even I skip some links because it's just too much.
Some older, pre-395 AMD articles suggested it'd be possible to use the NPU for prefill and the GPU for decoding, and that this would be faster than using either alone, but we have yet to see that (even on Windows) for any usefully sized models, just toys like LLaMA-8B.
Trying to take the plunge on that now sounds like a nightmare.
Your machine is sweet and probably runs just as fast on most tasks. I wouldn't be in a hurry to upgrade.
I upgraded from an old Intel i7.
Mass adoption won't happen until these get cheap, because until they're cheap there's no mass of prosumers making massively popular software for them.
I am wondering if this is so true. What resources and time would be needed to increase supply N times to catch up with demand?
It's the same way nobody wanted to invest in mask supply in the US or Western Europe during COVID: producers knew the demand spike wouldn't last, and they'd be left with useless equipment to pay for after the crisis passed.
If they're wrong and this is actually a permanent spike in demand, then it'll take the industry a while to realize it but eventually they'll collectively figure it out and increase supply. The ones who figure it out soonest and increase supply fastest will profit the most. The ones who figure it out slowest will lose market share.
I'm one of those who think there's a chance it will be sustained. AI is a useful tool; Opus has unlocked an N-times productivity gain for devs since Opus 4.5, which has only been available for three months.
This means adoption has only just started. It could expand into all kinds of use cases, niches, problems, and products, which could mean N times more demand for compute than we have now.
If it doesn't decline, then anyone who took that risk and increased their production capacity will benefit greatly, and those who didn't will lose market share.
The incentives here are naturally very well aligned with solving the shortage. If doing nothing is likely to solve the shortage, then they'll do nothing. If increasing supply is likely to solve the shortage, then they'll increase supply. If there's a 50/50 chance of both, then some will increase supply and some will do nothing, and the market will reward whichever group was right and punish the other.
Let me describe this in the simplest terms possible: you have speculators speculating about AI products. The speculators are not very smart when it comes to technology, and think RAM is RAM. There are at least three kinds of RAM that matter here: DDR for system RAM, GDDR for GPUs, and HBM for high-density enterprise products. They are not interchangeable; there is no one-die-fits-all solution.
So these speculators go "oh no, more GPUs require more RAM!" and then just start speculating on all RAM. Which of these RAMs do they actually need to worry about? Exclusively HBM, which is a minority of production; DDR and GDDR dominate production.
If you're into inference, and have older machines, you're buying Hxxx or Bxxx cards that use HBM, fit into dual slot x16 configurations, and you're jamming (optimally) 8 of them in. If you're into hardware that is newer, somewhere in the middle of the inference boom, you're using MXM cards. In either situation, the host machine has DDR, but if you're OpenAI, Anthropic, Microsoft, or Google, you're not building (more) inference machines like this.
The first two are buying Nvidia's all-in-one SBC solution: unified HBM, an onboard ARM CPU to babysit the dual GPUs, its own dual-QSFP network controller that can do RDMA, etc. No DDR or GDDR involved. Any machines built before this platform are being phased out entirely.
Microsoft is doing the same, but with AMD's products, the MI series that co-locates Epyc-grade Zen 4/5 CCDs with CDNA compute chiplets, running the entire thing off HBM, thus also unified and no DDR/GDDR needed. They, too, are phasing out machines older than this.
Google has a mix: they offer Nvidia all-in-one SBCs as part of GCP for legacy inference tasks (so a stack that can't yet run on AMD can still run), they also offer the same MI products that Microsoft offers via Azure's inference product, and they have their own TPUs that some of Gemini runs on; the TPUs run on HBM, AFAICT. No DDR or GDDR here.
So, what do AMD or Intel do here? Let's say they waste fab time making their own dies on the wrong process (TSMC and Intel Foundry do not have RAM-optimized processes)... they would be producing DDR and GDDR for a market that already has almost its entire demand met. Intel lacks the die-stacking technology required to build HBM, and TSMC, I think, can't do it for that many layers (HBM has 8 to 16 layers in current-gen stuff, IIRC).
Micron, for example, already is bringing two large factories online here in the US to meet the projected growth in demand for the next 20+ years. When these factories finally start producing, it will not change the minds of speculators: they still seem to think AI datacenters need RAM, of any kind, and refuse to understand even the most basics of nuance. Also, when they come online, HBM will be a minority product; the AI inference boom is still just a bump in the road for them.
Nvidia kinda screwed their consumer partners, btw: they no longer bundle the GDDR required for the card with the purchase of the die. There is a slight short term bump in GDDR spot prices as partners are building up warchests to push series 60 GPUs into production, and once that is done, spot prices return to normal (outside of the wild speculation manipulation).
One last thing: what about LPDDR, used by AMD Strix Halo and Apple's stuff? Speculation seems to have not actually affected it. I consider it a sub-category of DDR (and some dies seem to work as either DDR or LPDDR as of DDR5, due to the merger of the specs by JEDEC), but since it isn't something you find in datacenters, it seems to have avoided speculation.
The Ryzen Max CPUs mentioned in the linked article? They use LPDDR. Doubling down on the Ryzen Max product line might be a brilliant move.
The commenter is also not very smart and does not realize that the companies making the RAM can trade capacity from one type to another, and that any retooling at current prices is still profitable.
The commenter also does not realize the same is true for lines currently making SSDs.
Flash chips haven't been speculated on nearly as hard, and are suffering from the same sort of weird lack-of-nuance. Samsung, for example, isn't reassigning capacity to meet some sort of phantom datacenter demand that isn't already there, generically, across all datacenters, AI or not.
A lot of the SSD price skyrocketing is largely "SSDs have RAM on them for cache", not "SSDs have flash chips, and they're both made at the same fabs"... which oddly affects low-end SSDs that don't have external cache.
To make it worse for the speculators who do understand this (they aren't some universal homogeneous group): the flash chips that go into enterprise SSDs aren't the same ones that go into consumer SSDs.
The Big Three still aren't doing some major re-tasking of capacity, as actual global demand isn't outstripping supply any more than normal. There is no short-term problem to fix; speculators are just gonna have to stop hoarding toilet paper like it's the start of COVID.
Edit: Oh, and if you want to ask how AMD/TSMC or Intel solve this? They can't, same reason why making their own in-house HBM isn't happening.
Micron killed Crucial to focus on AI.
Micron killed Crucial because Crucial was a weird offering that competed with their own partners. This was always a weird problem, and it just didn't make financial sense to continue with it. One of the analyses I read said Crucial was less than 12% of sales.
Like, don't get me wrong, I've liked many Crucial products over the years, and even recommended some of them, but it was always weird they were trying to out-compete companies like Adata and other major ODMs.
The counterexample of this is Nvidia absolutely trying to kill their partners, and going to first party assembly and sales of products. Nvidia isn't even going to PNY anymore for ODM needs, but going directly to Foxconn.
Micron execs claiming it's because of AI is a bit weird and revisionist, because they had been working on exiting the Crucial brand since long before they publicly announced it. The public didn't learn of any such plans until right before the Ballistix brand sunsetting was announced in 2021, but the work started years before that. Like, I know they're just playing to their shareholders, but it's still a bit weird.
Also, with the FEs, their partners are disallowed from making their own FEs, even if they make their own PCB from scratch and not based on any existing Nvidia design. Doesn't matter who makes the FE, it immediately puts partners at a great disadvantage if they can't make one too.
Micron, Samsung, and Hynix just basically sell you chips that comply with the JEDEC spec, and the DIMM manufacturers further bin them according to purpose. The highest end chips (that are stable at high clocks and acceptable voltages) end up in enthusiast performance products, the ones that don't work well at all but still meet JEDEC spec are sold to Dell/HP/Lenovo/etc for Grandma's Facebook machine, and the ones that are exceptionally stable at thermal design limits are plunked onto ECC DIMMs and sold to servers.
Also, as others have mentioned, it's just a fab, and it can make any of the dies they're able to make. Whatever needs to be made to meet demand, they make; they just can't turn on a dime and react to quarterly concerns, and are locked into cycles that may range from 6 to 18 months.
Side note that is also worth mentioning, sometimes you can order special bins of parts with features that wouldn't normally be available if you're willing to order enough. Recent example being Nvidia buying overclocked GDDR6 chips from Micron with additional features enabled; Micron was more than happy to become Nvidia's exclusive supplier for the custom GDDR chip if Nvidia was willing to buy out the entire run. Stuff like this happens every so often, but isn't the norm.
https://investors.micron.com/news-releases/news-release-deta...
There's nothing in that press release that implies that the memory was somehow different (or "consumer-grade"). The _only_ thing they're saying is that they're ending their B2C business and focusing on B2B.
You just need an additional chip to move from "consumer grade" (i.e., no parity) to "server grade" (i.e., with parity). ECC support actually lives in the memory controller, which has been in the CPU for the last 15 years. No magic.
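(For intuition, here's a toy Python sketch of the kind of math those extra check bits enable. Real DDR ECC is SECDED over a 64-bit word with 8 check bits; this simplified Hamming code only corrects single-bit flips, and everything in it is illustrative.)

```python
# Toy single-error-correcting Hamming code, to show what the extra chip's
# check bits buy you: the controller can detect AND fix a flipped bit.
# Real DDR ECC is SECDED over 64 data bits with 8 check bits; this is a sketch.

def hamming_encode(data_bits):
    """Interleave parity bits at power-of-two positions (1-indexed)."""
    n_parity = 0
    while (1 << n_parity) < len(data_bits) + n_parity + 1:
        n_parity += 1
    total = len(data_bits) + n_parity
    code, data = {}, iter(data_bits)
    for pos in range(1, total + 1):
        if pos & (pos - 1):               # not a power of two -> data bit
            code[pos] = next(data)
    for p in range(n_parity):             # parity bit 2^p covers positions with bit p set
        mask = 1 << p
        code[mask] = sum(v for pos, v in code.items() if pos & mask and pos != mask) % 2
    return [code[i] for i in range(1, total + 1)]

def hamming_correct(code_bits):
    """Return (corrected codeword, position of the flipped bit, or 0 if clean)."""
    syndrome = 0
    for pos, bit in enumerate(code_bits, start=1):
        if bit:
            syndrome ^= pos               # XOR of set positions points at the error
    if syndrome:
        code_bits = code_bits[:]
        code_bits[syndrome - 1] ^= 1
    return code_bits, syndrome

word = [1, 0, 1, 1, 0, 0, 1, 0]
stored = hamming_encode(word)
stored[4] ^= 1                            # simulate a single-bit upset in DRAM
fixed, where = hamming_correct(stored)
print(f"flipped bit at position {where}, recovered: {fixed == hamming_encode(word)}")
```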
Good luck actually finding them in stock with 128GB+ RAM. I got a Strix laptop a while ago; now the price in the EU is technically the same, but there's no stock. Maybe in a month or three.
There is also the claw hype. And large Qwen3.5 models can run very well on DDR5 CPUs or Mac minis...
This only affects a very narrow slice of highly budget-conscious consumers trying to build high-end PCs at razor-thin margins.
Did you adjust for inflation?
https://web.archive.org/web/20240805053759/https://jcmit.net...
https://thememoryguy.com/dram-prices-hit-historic-low/
Inflation applied manually; https://www.bls.gov/cpi/
https://www.neowin.net/forum/topic/983036-latest-steam-hardw...
Steam hardware survey GPU history: https://www.youtube.com/watch?v=wHTdnIviZTE
But that was not my point _whatsoever_. What I said is - every time you bring the explicit numbers (like in GP "$500 for 32GB is about $15/GB which is a high we haven't seen since the mid-2000s") you _absolutely_ have to adjust for inflation to have a meaningful conversation. This is it.
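To make that concrete, a quick back-of-envelope sketch (the CPI-U figures are approximate annual averages; pull exact values from the BLS link above for real numbers):

```python
# Back-of-envelope CPI adjustment for the "$15/GB is a mid-2000s high" claim.
# CPI-U annual averages below are approximate; see https://www.bls.gov/cpi/
CPI = {2005: 195.3, 2024: 313.7}

def adjust(nominal_usd, from_year, to_year):
    """Convert a price in from_year dollars into to_year dollars."""
    return nominal_usd * CPI[to_year] / CPI[from_year]

# $15/GB paid in 2005 is roughly $24/GB in 2024 dollars, so today's $15/GB
# is still cheaper in real terms than the mid-2000s peak.
print(f"${adjust(15, 2005, 2024):.2f}/GB in 2024 dollars")
```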
Apples to oranges. Why are you comparing RAM prices to CPU prices? It's different hardware.
$500 for 32 GB is insane. Just 18 months ago, I bought 128 GB of DDR5 for only $480.
Edit: I don’t get your math. If we’re using a very generous definition of “top end,” even neglecting Nvidia and going AMD (which some would argue makes it not top end), you’re talking conservatively: $600 for a GPU, $500 for 32 GB of RAM, and $500 for a CPU. That’s $1600 before PSU, case, SSD, fan(s), mobo… there’s no world in which you’re coming in under $2k. The SSD and board will put you over immediately.
You’re talking 3/2025 prices, not 3/2026. A compromise, mid-range computer is $1500 to build now.
All 4 of my "top end PCs" have 128GB RAM. My server (I self-host everything) has 512GB. Lucky for me, all were bought before that insanity.
I don't know much about it but my mental model is that for transformers you need random access to billions of parameters.
Is this a fast set of GPU-like instructions that anyone can use, in any operating system, to run any open-source LLM on the CPU with all your RAM, rather than on a discrete GPU with its own limited amount of VRAM?
Or is this a proprietary thing that only works in Windows for some specific use cases and irrelevant for Linux users?
lol no it can't - there's a small (40MB) SRAM that can DMA to DRAM and then each of the tiles is another DMA away from that SRAM.
Your true-believer convictions don't matter here. Those AI accelerators are merely marketing stunts. They won't help your local inference: they are not general-purpose enough for that, they are too weak to be impactful, most people won't ever run local inference because it sucks and is a resource hog most can't afford, and it goes against the interests of those scammy, unprofitable corporations who are selling us LLMs as the AI silver bullet for every problem and got us here in the first place (they have already succeeded in making computing unaffordable). There's little to no economic or functional meaning to those NPUs.
You have fallen headfirst into the "Not now, so never" fallacy. As if consumer hardware won't get more powerful, or models more economical.
Perhaps. Though we have empirical evidence of how far we can quantize and distill models before they become practically useless. That sets a bar for how large a local model needs to be for general use if it is to compete with the cloud ones. We are talking in the area of 60GB for GPT-OSS/Qwen3.5, which is what enthusiasts are running on 32GB DDR5 + a 24GB-VRAM RTX 3090.
> As if consumer hardware won't get more powerful
Now I will let you, with that last fact in hand, plot a chart of how much it's been costing to provision that over the past 2 years and use it to prove me wrong about the affordability of local models.
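For what it's worth, the sizing claim above is easy to sanity-check with napkin math (the parameter count and quantization levels here are illustrative assumptions):

```python
# Napkin math for local-model footprints: weights ~= params * bits / 8.
# The 120B parameter count and quant levels are illustrative assumptions.
PARAMS = 120e9  # a GPT-OSS-120B-class MoE, as a stand-in

for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4/MXFP4")]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{label}: ~{gb:.0f} GB of weights")

# fp16 ~240 GB, int8 ~120 GB, int4 ~60 GB: the ~60 GB figure quoted above is
# what a 4-bit quant costs, which is why it sits right at the edge of a
# 32 GB DDR5 + 24 GB VRAM box (KV cache and offloading overhead not included).
```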
a) Local inference for chats sucks. Using LLMs for chatting is stupid though.
b) Local inference is cheap if you're not selling a general-purpose chatbot.
There's lots of fun stuff you can get with a local LLM that previously wasn't economically possible.
Two big ones are gaming (for example, text adventure games or complex board games like Magic the Gathering) and office automation (word processors, excel tables).
If you can use the NPU to process embeddings quickly, you get some incredible functionality — from photo search by subject to near match email search.
For consumer applications that’s what I’m most excited for. It takes something that used to require large teams, data, and bespoke models into commodity that any app can use.
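For the curious, the core of that flow is tiny. A minimal sketch using sentence-transformers as a stand-in for whatever NPU-accelerated runtime a vendor would actually ship (model name and sample data are just placeholders):

```python
# Minimal semantic-search sketch: embed items once in the background, then
# match queries by cosine similarity. sentence-transformers stands in for
# whatever NPU-accelerated runtime a vendor would ship.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small example model

# Imagine these are photo captions or email subjects indexed by the OS.
items = [
    "beach sunset with palm trees",
    "invoice for March consulting work",
    "birthday party at grandma's house",
]
item_vecs = model.encode(items, normalize_embeddings=True)

def search(query, top_k=2):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = item_vecs @ q                # cosine similarity (vectors are unit length)
    for i in np.argsort(-scores)[:top_k]:
        print(f"{scores[i]:.3f}  {items[i]}")

search("vacation photos by the ocean")   # near-match, no exact keywords needed
```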
Ask your friends or a small business owner if they are going to spend $1k on a new laptop because "there's lots of fun stuff".
For office automation, you'll get a lot more mileage with Claude and similar.
Do people not buy gaming PCs and game consoles? Isn't that buying something because "there's lots of fun stuff?"
And while, sure, a business owner wouldn't be buying it for "fun stuff", if it were about being able to run the AI tools they want without the business risk of sending their most important data and intellectual property to an AI provider, wouldn't some think about it?
/r/SillyTavernAI would disagree with you.
We are talking about NPUs here.
> If you intend to do LLM inference on your local machine, we recommend a 3000-series NVIDIA graphics card with at least 6GB of VRAM, but actual requirements may vary depending on the model and backend you choose to use.
Also, please be respectful when discussing technical matters.
P.S. I didn't say "local chat sucks".
...which is not by any means a powerful GPU. Besides, the AMD Ryzen AI CPUs in question have plenty of capacity to run local LLMs, especially MoE ones; with ~3B active MoE parameters, a mini PC equipped with these CPUs dramatically outperforms any "3000-series NVIDIA graphics card with at least 6GB of VRAM".
> please be respectful when discussing technical matters.
That is more applicable to your inappropriately righteous attitude than to mine.
Do you have a real argument, especially a technical one, that you can contribute?
In what way, precisely? That local LLMs "suck"? Is that a technical argument? Or the statement "there's little to no economical and functional meaning to those NPUs" - is that an actual factual statement, or emotionally charged verbal flatulence? And what does "they won't help your local inference because they are not general purpose enough for that" even mean? People successfully run large-ish MoE LLMs on AMD Ryzen AI mini PCs.
> Do you have a real argument, especially a technical one, that you can contribute?
What kind of argument do you want me to "contribute" wrt the ideological rant the "parent comment" had managed to produce?
- I shouldn't be paying more for my next CPU because it has an NPU that I won't ever use. Give me the freedom of choice.
- Given that freedom of choice, it would seem that a majority would opt out (as seen recently by Dell), so the morals of all that are dubious.
- NPUs may not be completely stupid as a concept, in theory, but at this point in time they are proprietary black boxes purpose-built for marketing and micro-benchmarks. Give me something more general-purpose and open, and I will change my mind.
- …but the problem is, you can only build so much general-purpose computing into a bespoke processor. That's kind of its defining trait. So I won't hold my breath.
- Re: local inference for the masses, putting aside the NPU shortcomings above: how large do you think an LLM needs to be to be deemed useful by your average laptop user? What would the inference story look like, in your honest opinion (in terms of downloading the model, loading it into memory, round-trip times)? And how often would the user realistically want to suffer through all that, versus just hopping over to ${favorite_llm.ai} from their browser?
Anyhow, if that makes me "antiai", please, sign me up!
There is plenty to choose from.
> - NPUs may not be completely stupid as a concept, in theory, but at this point in time they are proprietary black boxes purpose-built for marketing and micro-benchmarks. Give me something more general-purpose and open, and I will change my mind.
In fact, the linked article is not talking about NPUs in particular, but about Ryzen AI CPUs. These have unified memory and more compute compared to normal ones, which makes them very useful for inference.
> how large do you think a LLM needs to be so it's deemed useful by your average laptop user?
Depends what they need it for. Useful autocomplete in an IDE starts at around 4B weights.
> loading it in memory
Happens only once; usually takes around 10 seconds.
> roundtrip times
Negligible? It is local, after all.
> And how often would the user realistically want to suffer
No suffering involved; see the sketch below.
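A minimal sketch of that flow with llama-cpp-python, assuming some small quantized GGUF model (the file name is a placeholder): load once, then every completion is a purely local call.

```python
# Load once (the ~10 second cost), then every completion is a local call
# with no network round trip. The model path is a placeholder for any
# small quantized GGUF model.
from llama_cpp import Llama

llm = Llama(
    model_path="./some-4b-model-q4.gguf",  # hypothetical ~4B model, 4-bit quant
    n_ctx=4096,                            # context window
)

# e.g. an IDE autocomplete request: complete the started function body
out = llm(
    "def parse_config(path):\n    ",
    max_tokens=64,
    stop=["\n\n"],
)
print(out["choices"][0]["text"])
```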
No one is running LLMs on current-gen NPUs, so if we ever do, it's a long time coming. Unless they can demonstrate some real (and not marketing) wins, I remain skeptical that a large NPU for LLMs is the future.
I can totally see NPUs accelerating simple tasks, but to be worth the silicon they have a ways to go, IMO.
99% of people don't need or want a dev workstation. My travel laptop is 7+ years old and I couldn't tell you the difference between it and a current flagship in terms of browsing and everyday tasks.
I will not lie, I find LLMs useful but the desktop experience is pretty polished already. NPUs seem to be an attempt to ride the AI bandwagon with very little to show for it so far.
It's similar to performance/efficiency cores. I don't need power efficiency, and I'd actually buy a CPU that doesn't make that distinction.
You don't do anything involving realtime image, video, or sound processing? You don't want ML-powered denoising and other enhancements for your webcam, live captions/transcription for video, OCR allowing you to select and copy text out of any image, object and face recognition for your photo library enabling semantic search? I can agree that local LLMs aren't for everybody—especially the kind of models you can fit on a consumer machine that isn't very high-end—but NPUs aren't really meant for LLMs, anyways, and there are still other kinds of ML tasks.
> It's similar to performance/efficiency cores. I don't need power efficiency, and I'd actually buy a CPU that doesn't make that distinction.
Do you insist that your CPU cores must be completely homogeneous? AMD, Intel, Qualcomm and Apple are all making at least some processors where the smaller CPU cores aren't optimized for power efficiency so much as maximizing total multi-core throughput with the available die area. It's a pretty straightforward consequence of Amdahl's Law that only a few of your CPU cores need the absolute highest single-thread performance, and if you have the option of replacing the rest with a significantly larger number of smaller cores that individually have most of the performance of the larger cores, you'll come out ahead.
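A toy model of that tradeoff (core counts, area ratios, and the 0.7x little-core speed are made-up numbers, just to show the shape of the argument):

```python
# Toy Amdahl's Law comparison: a few big cores vs. the same die area spent
# on a big+little mix. Speeds and area ratios are made-up numbers.

def speedup(serial_frac, big_cores, little_cores, little_speed=0.7):
    """Serial part runs on one big core; parallel part uses everything."""
    capacity = big_cores + little_cores * little_speed
    return 1 / (serial_frac + (1 - serial_frac) / capacity)

# Same hypothetical area budget: 8 big cores, or 4 big + 12 little
# (a little core at ~70% of big-core speed but a third of the area).
for label, big, little in [("8 big", 8, 0), ("4 big + 12 little", 4, 12)]:
    print(f"{label}: {speedup(0.05, big, little):.1f}x over one big core")
# -> ~5.9x vs ~7.9x: the heterogeneous mix wins on throughput.
```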
> You don't do anything involving realtime image, video, or sound processing?
I don't.
> You don't want ML-powered denoising and other enhancements for your webcam,
Maybe, but I don't care. The webcam is good or bad as it stands.
> live captions/transcription for video,
YouTube has them. I don't need it for live calls.
> OCR allowing you to select and copy text out of any image,
Maybe.
> object and face recognition for your photo library enabling semantic search?
Maybe, but I think most people have their photo library on a cloud service that does the AI in the cloud. My photos are on an SSD attached to a little single-board ARM machine at home, so no AI.
What I would like to be able to do is running the latest Sonnet locally.
In general I think every provider will do their best to centralize AI on their servers, much like Adobe did for their suite and Microsoft did for Office, so local AI will be marginal: maybe OCR, maybe not even blurring the room behind my back (the server could do it).
There are alternatives to Adobe and Office because I don't care about more than the very basic functionality: running GIMP and LibreOffice on my laptop costs zero. How much would it cost to run Sonnet locally with the same performance as Claude's free tier? Start with a new machine and add every piece of hardware. I bet it's not a trivial amount of money.
Nothing that's not already hardware accelerated by the GPU or trivial to do on CPU.
> You don't want ML-powered denoising and other enhancements for your webcam
Not really.
> live captions/transcription for video
Not really, since they're always bad. Maybe if it's really good, but I haven't seen that yet.
> OCR allowing you to select and copy text out of any image
I have yet to see this implemented well. It would be a nice QOL feature, but not one I'd care all that much about being absent.
> object and face recognition for your photo library enabling semantic search?
Maybe for my old vacation photos, but that's a solid 'eh'. Nice to have, wouldn't care if it wasn't there.
Besides, most of what you mentioned doesn't run on the NPU anyway. Those are usually standard GPU workloads.
And on the platforms that have a NPU with a usable programming model and good vendor support, the NPU absolutely does get used for those tasks. More fragmented platforms like Windows PCs are least likely to make good use of their NPUs, but it's still common to see laptop OEMs shipping the right software components to get some of those tasks running on the NPU. (And Microsoft does still seem to want to promote that; their AI PC branding efforts aren't pure marketing BS.)
That is fine. Most ordinary users can benefit from these very basic use cases which can be accelerated.
I guess people said this about video encoding acceleration too, and now they use it on a daily basis for video conferencing, for example.
This will only come if Windows 12 requires an NPU and most of the old hardware is decommissioned.
I think the overall trend is now moving somewhat away from having the CPU and GPU on one die. Intel's been splitting things up into several chiplets for most of their recent generations of processors, AMD's desktop processors have been putting the iGPU on a different die than the CPU cores for both of the generations that have an iGPU, their high-end mobile part does the same, even NVIDIA has done it that way.
Where we still see monolithic SoCs as a single die is mostly smaller, low-power parts used in devices that wouldn't have the power budget for a discrete GPU. But as this article shows, sometimes those mobile parts get packaged for a desktop socket to fill a hole in the product line without designing an entirely new piece of silicon.
Toilets also changed everything we do and are helpful in unobtrusive ways, but that won't make the "Ryzen Crapper" a customer favorite.
But does this extrapolate to the current way of doing AI entering normal life in a good way that ends up being popular? The way Microsoft etc. is trying to put AI in everything kind of says no, it isn't actually what users want.
I’d like voice control on my PC or phone. That’s a use for these NPUs. But I imagine it’s like AR: what we all want, until it arrives and it’s meh.
Pretty much every hardware vendor has an NPU.
Additionally, GPUs are going back to the early days, by becoming general purpose parallel compute devices, where you can use the old software rendering techniques, now hardware accelerated.
I can see separate cards for datacenter use, but for consumers they will probably come on the same SoC as the CPU.
Even the latest NVIDIA Blackwell GPUs are general purpose, albeit with negligible "graphics" capabilities. They can run fairly arbitrary C/C++ code with only some limitations, and the area of the chip dedicated to matrix products (the "tensor units") is relatively small: less than 20% of the area!
Conversely, the Google TPUs dedicate a large area of each chip to pure tensor ops, hence the name.
This is partly why Google's Gemini is 4x cheaper than OpenAI's GPT5 models to serve.
Jensen Huang has said in recent interviews that he stands by the decision to keep the NVIDIA GPUs more general purpose, because this makes them flexible and able to be adapted to future AI designs, not just the current architectures.
That may or may not pan out.
I strongly suspect that the winning chip architecture will have about 80% of its area dedicated to tensor units, very little onboard cache, and model weights streamed in from High Bandwidth Flash (HBF). This would be dramatically lower power and cost compared to the current hardware that's typically used.
Something to consider is that as the size of the matrices in a model scales up, the compute needed to perform the matrix multiplications goes up as the cube of their size, but the other miscellaneous operations such as softmax, ReLU, etc. scale up linearly with the size of the vectors being multiplied.
Hence, as models scale into the trillions of parameters, the matrix multiplications ("tensor" ops) dominate everything else.
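A quick illustration of that gap, counting FLOPs for a naive n x n matmul against one elementwise pass over its output:

```python
# FLOPs for a naive n x n matmul (~2n^3) vs. one elementwise pass over the
# same n x n output (~n^2): the ratio grows as 2n, so tensor ops dominate.
for n in [1_024, 8_192, 65_536]:
    matmul = 2 * n**3          # multiply-adds in the naive matmul
    elementwise = n**2         # e.g. a ReLU/softmax pass over the output
    print(f"n={n:>6}: matmul is ~{matmul // elementwise:,}x the elementwise work")
```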
> the compute needed to perform matrix multiplications goes up as the cube of their size,
Are they really not using even Strassen multiplication?
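(For reference, Strassen is easy to write down; a minimal numpy sketch is below. But accelerators generally skip it: tensor units are built around the naive schedule's regular tiling, and Strassen trades that regularity, plus some numerical stability, for a modest asymptotic win.)

```python
# Minimal recursive Strassen: 7 sub-multiplies instead of 8 per split,
# giving O(n^2.807). Assumes square matrices with n a power of two.
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                      # fall back to BLAS below the cutoff
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
print(np.allclose(strassen(A, B), A @ B))  # True, up to float rounding
```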
>First wave of Ryzen AI desktop CPUs targets business PCs rather than DIYers.
I've put it in quotes as the effort required from these chips for streaming transcoding is so low these days that brute force makes it sound like more effort than it really is.
What's your source for this? Transcoding without acceleration is incredibly expensive, especially for 4K content, and especially for 4K HDR content.
Even a single 4K HDR -> 1080p transcode takes a huge amount of resources.
The Asustor Lockerstor 4 Gen3 has a quad-core Ryzen Embedded V3C14 and cannot transcode 4K content.
Meanwhile, an old Kaby Lake Intel chip does so just fine but only because its QSV can handle h265.
Quick Sync is invaluable for low-powered processors; my old embedded Intel Wyze can do several streams.
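For reference, a Quick Sync transcode is usually driven through ffmpeg along these lines (a sketch; file names are placeholders, the flags are the commonly used QSV ones, and 4K HDR to SDR would need additional tone-mapping filters on top):

```python
# Hardware-accelerated 4K HEVC -> 1080p H.264 transcode via Quick Sync,
# shelling out to ffmpeg. File names are placeholders; HDR->SDR would
# need additional tone-mapping filter work on top of this.
import subprocess

subprocess.run([
    "ffmpeg",
    "-hwaccel", "qsv",                  # decode on the iGPU's fixed-function block
    "-c:v", "hevc_qsv",                 # QSV HEVC decoder (Kaby Lake or newer)
    "-i", "input_4k_hevc.mkv",
    "-vf", "scale_qsv=w=1920:h=1080",   # scale on the GPU, no CPU round trip
    "-c:v", "h264_qsv",                 # QSV H.264 encoder
    "-c:a", "copy",                     # pass audio through untouched
    "output_1080p.mp4",
], check=True)
```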
Even on modern chips, transcoding is quite expensive.
People who are running Plex generally are running on servers also serving files, web apps, and who knows what else. These devices are often running 24/7, so both overall cost and power efficiency are big concerns. I wouldn't want to rely on my server being at high CPU usage most of the time - for power, heat, and overall reliability concerns.
My homelab setup runs out of memory much faster than it does CPU cores.
So no matter how much compute you stuff in there it's going to be shit for AI ?
PC architecture is not adapting to AI workloads at all, and there are no signs of that changing in the years to come. I would not be surprised if your phone were more capable of running AI models than the average desktop, especially given GPU pricing.
I wanted a better Strix Halo (which has 128GB unified RAM and 40 CUs on its Radeon 8060S iGPU).
This looks like normal Ryzen mobile chips, but with fewer CUs.
Yeah... ironic, I guess. It's as if they've realized that it's only a matter of time until we get a "good enough" FOSS model that runs on consumer hardware. The fact that such a thing would demolish their entire business of getting VC hype while giving their service away at a loss surely got lost on them. Surely they and Nvidia have not realized that the only thing that could stop this is to make good hardware unreachable for anything smaller than a massive corp.
Mark my words: in less than a year, we'll probably get something akin to a FOSS Opus 4.6. China is putting as much money into that as they can, because they know it would crash the US economy, which is in the green only thanks to big tech pumping up AI. China wants Trump either gone or neutered as soon as possible, which they know they can achieve by making Republicans as unelectable as possible - something that will probably happen if the economy crashes and a recession hits.
Microsoft: "Friendship ended with Intel, now AMD is my best friend"