The blog post cites the concern that malloc could block. However, when Rust's standard library is compiled with atomics support enabled, the Rust allocator's locking implementation busy-loops instead of waiting on the main thread.
See the comment in the Rust source here: https://github.com/rust-lang/rust/blob/77a4fb62f70c6ea05e182...
This means that, if care is taken to avoid any other code that makes the main thread wait, it should be possible to use a single shared binary instead of the more convoluted approach presented in the blog post.
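For anyone unfamiliar with the distinction, here is roughly what a busy-looping lock looks like. This is a minimal sketch, not the actual code in Rust's standard library; `SpinLock` and `with` are made-up names for illustration. The point is that acquiring the lock only retries an atomic compare-and-swap and never parks the thread.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical spin lock: acquiring it never parks the thread, it only retries.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Safety: access to `value` is serialized by the `locked` flag.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        Self {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Runs `f` with exclusive access to the protected value.
    /// Note: if `f` panics, the lock stays held (acceptable for a sketch).
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Busy loop until we manage to flip `locked` from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Tell the CPU we are spinning; this does not block or sleep.
            hint::spin_loop();
        }
        let result = f(unsafe { &mut *self.value.get() });
        // Release the lock so another spinner can take it.
        self.locked.store(false, Ordering::Release);
        result
    }
}
```

The trade-off is that a contended lock burns CPU instead of sleeping, which is only reasonable when the critical section is very short.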
And yes, you're right again: it's only in the allocator.
https://github.com/WebAssembly/component-model/blob/main/des...
Threading is actively being worked on right now (to be released in 0.3.x, soon), and some changes just made their way into LLVM as well:
Stackless coroutines are currently supported for p3; stackful coroutines, AKA “virtual threads”, are coming (which I assume is what you mean by green threads); and “actual threads”, as in OS threads, are not currently a goal for the ABI AFAIK. Would you mind explaining some uses you were thinking of?
It simulates x86 (win32 and win16), implements Windows APIs in JavaScript, and renders window frames with the DOM and contents with canvas (e.g. GDI translates to browser canvas operations). A lot of programs run already, but a lot of APIs are not yet implemented.
I successfully spent a few days extending it to run a Click & Create based game from my childhood.
[0]: https://neugierig.org/software/blog/2026/04/theseus.html
https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCa...
The doc you linked has two forms of use, sync and async.
For sync: it seems the idea is for the worker to render into an OffscreenCanvas, then postMessage an ImageBitmap created with transferToImageBitmap from the worker to the main thread for drawing. It seems like that would need to allocate a new bitmap for each frame. Currently Theseus puts the pixel data in shared memory and the main thread copies it out (a copy is required to create an ImageData). At least in principle that could reuse the copy buffer (though it currently doesn't), which seems better? https://github.com/evmar/theseus/blob/a5a849dbcf8046a2d1837a...
For async: here the idea is to have the worker render into an OffscreenCanvas linked to the on-screen one. But it seems that to get an OffscreenCanvas in a worker, the main thread must call .transferControlToOffscreen() on its canvas and send the result to the worker. Under the current synchronization model[1] the only time the worker can receive a message is during startup, because the rest of the time it's deep in its own wasm call stacks. This means that if the worker needs to resize its canvas and then paint to it, it's stuck.
[1] I wrote "current" because after writing this post I learned about JSPI which might help with this.
IIRC the transferToImageBitmap path is efficient and doesn't necessarily copy anything. The APIs are designed to allow the ImageBitmap you get to be a reference to the GPU texture that is the backbuffer of the canvas, not a copy. When you transfer it to the main thread you are supposed to draw it using an ImageBitmapRenderingContext, which doesn't need to do an extra blit. It's just directly composited with the rest of the page, all staying on the GPU. In theory if it's a full screen canvas with no DOM on top the browser could even skip compositing entirely though I don't know if that's implemented anywhere.
Of course there are probably a lot of ways to fall off the fast path. And I guess you are doing software rendering for GDI so you're not starting out with pixels on the GPU in the first place. I'm not sure what the best path is for you but I think you can probably benefit from OffscreenCanvas in some way.
I've gone in circles a few times with how to think about image buffer management because I also support DirectDraw, which is designed to be backed by accelerated graphics, with operations like scaling bitblit. (Currently the Theseus implementation uses a shared "Surface" type as the backing store for both GDI Windows and DirectX Surfaces.)
It's a bit complicated by a few things. (1) DirectX surfaces can be "locked" to access as pixel buffers, so any accelerated surface indirection would, I guess, need to be able to copy pixels back down into emulator memory. Which I guess I could just implement. (2) There are a bunch of different modes for operations like bitblit, such as setting a color key for transparency, that I can't implement with the canvas API, so I think I'd need to use GL shaders if I want acceleration, not just canvas.
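To make the color key case concrete, here is a minimal software sketch of what such a blit has to do per pixel. This is not Theseus's code: the function name, the packed 32-bit pixel format, and the flat row-major buffers are all assumptions for illustration. A shader-based version would need to reproduce the same "skip pixels equal to the key" test.

```rust
/// Hypothetical color-keyed blit: copies `src` onto `dst` at (`dst_x`, `dst_y`),
/// skipping every source pixel equal to `color_key`.
/// Both buffers are assumed to be row-major with packed 32-bit pixels.
pub fn blit_color_key(
    dst: &mut [u32],
    dst_width: usize,
    src: &[u32],
    src_width: usize,
    dst_x: usize,
    dst_y: usize,
    color_key: u32,
) {
    if src_width == 0 || dst_width == 0 {
        return;
    }
    let dst_height = dst.len() / dst_width;
    for (sy, row) in src.chunks_exact(src_width).enumerate() {
        let dy = dst_y + sy;
        if dy >= dst_height {
            break; // clip rows that fall below the destination
        }
        for (sx, &pixel) in row.iter().enumerate() {
            let dx = dst_x + sx;
            if dx >= dst_width {
                break; // clip columns that fall past the right edge
            }
            // The color key acts as "transparent": leave the destination untouched.
            if pixel != color_key {
                dst[dy * dst_width + dx] = pixel;
            }
        }
    }
}
```

The per-pixel comparison against an exact color is the part that has no direct equivalent in the canvas 2D drawing operations.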
It too uses WASM, but for running non-Rust programs in sandboxes. Everything else is Rust. Hmm.. Last updated in 2024 though.
These branded projects become difficult to remember when everything has a random non-mnemonic name.
It's just a matter of time before we'll have a universal executable/binary translator that can run any program from any OS. These are artificial constructs when you think about it.
People have been working on these layers for a long time. It's understood that programs are almost always fundamentally portable and only unable to run due to peculiarities. Smoothing over those differences in format, whether they be at the machine code layer, API layer, whatever, is doable but difficult. Props to qemu, Wine, DXVK, etc.
The pain points about threading in the browser and debugging wasm are the two problems I ran into on another project. I hope we can get some improvements in both areas because wasm would be a lot easier to work with if developers didn't have to fight both of those topics.
Looks like just enough was supported to run minesweeper. Impressive though.
https://github.com/danoon2/Boxedwine looks interesting in this space, but unfortunately it can't really run anything remotely modern in practice (though if you're looking at 20th century Windows software it will likely be capable of running it).