marcan changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
<chrisf> DarkShadow44, where's the packed format field?
<chrisf> the `u` fields may just be `unknown field X`
<DarkShadow44> Probably
<DarkShadow44> chrisf: Packed format field? You mean to specify what format the data has? AFAIK that's all Fx:F
<chrisf> ah
<chrisf> i had mapped out the bottom 2 bits of F as `size` but it makes sense if the packed formats use the rest
<chrisf> aha
<chrisf> yeah, i agree that's what it is then :)
<DarkShadow44> I assume it's split into Fx:D because of the different instruction length, since all bytes that are cut off are assumed 0
<chrisf> yep
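A minimal sketch of the "cut off bytes are assumed 0" idea discussed above: a field is stored in two pieces, and the upper piece lives in bytes that a shorter encoding omits. The bit positions, widths, and the 8-byte full length are hypothetical values chosen for illustration, not the real G13 layout.

```python
def read_bits(raw: bytes, full_len: int, lo: int, width: int) -> int:
    """Read `width` bits starting at bit `lo` of a little-endian instruction.

    Bytes missing from a shortened encoding are treated as zero.
    Assumes len(raw) <= full_len.
    """
    padded = raw + bytes(full_len - len(raw))
    word = int.from_bytes(padded, "little")
    return (word >> lo) & ((1 << width) - 1)


def decode_split_field(raw, full_len, lo_pos, lo_width, hi_pos, hi_width):
    """Reassemble a field whose low and high parts sit at different bit offsets."""
    lo = read_bits(raw, full_len, lo_pos, lo_width)
    hi = read_bits(raw, full_len, hi_pos, hi_width)
    return (hi << lo_width) | lo


# Hypothetical layout: low part at bits 8..9, high part at bits 48..49.
long_form = ((0b10 << 48) | (0b01 << 8)).to_bytes(8, "little")
short_form = long_form[:6]  # last two bytes cut off, so the high part reads as 0

print(decode_split_field(long_form, 8, 8, 2, 48, 2))   # 9
print(decode_split_field(short_form, 8, 8, 2, 48, 2))  # 1
```

Under this model the short form can only express field values whose high part is zero, which would match a compiler emitting the long form only when it needs the extra bits.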
<DarkShadow44> although, it raises a question: Since registers are specified as Rx:R, maybe bit 31 is for another "cut off"?
<DarkShadow44> just a theory
<chrisf> it's an interesting theory
<chrisf> ive never seen the metal compiler do that though
<DarkShadow44> that doesn't mean we can't manually generate bytecode like that >:D
<chrisf> what is this `L` bit?
<DarkShadow44> should cut off the last two bytes
<chrisf> actually i dont think ive seen a non-8-byte load or store
<DarkShadow44> neither have I, but maybe they planned it into the hardware
<DarkShadow44> most instructions support such a cut off
<chrisf> ah, `L` is always that, i see the note up the top now.
<chrisf> i would be careful about things that you never see the metal compiler do -- they may well not actually work in silicon
<chrisf> there's always stuff that doesnt work
<DarkShadow44> Sure thing, but that's where tests come in, no?
<chrisf> if you can show it does work then great
<DarkShadow44> I mean, I wouldn't necessarily use it, but it'd be good to know, IMHO
<DarkShadow44> no unknown bits is best
<DarkShadow44> although I gotta admit, I don't understand the benefit of having variable length instructions like that
<DarkShadow44> I thought an advantage of, for example, ARM was easier decoding due to fixed lengths
<chrisf> tradeoffs
<Yuzu> variable length ops = higher density (typically, if done right)
<chrisf> apple does appear willing to spend area making things not suck
<DarkShadow44> yeah, easier on the cache
<chrisf> oh, i guess the reason i might not have seen short load/store is i dont have examples of the packed cases
<chrisf> and so `mask` is always in play in the examples i have
<DarkShadow44> mh, it's a bit odd that the mask is one of the parts that are cut off
<DarkShadow44> need to make tests for that
<bloom> that especially matters since G13 is a pure scalar arch, yet it's designed for graphics (vector) workloads
<bloom> If you're not careful about code density, you'll blow through your i-cache budget
<bloom> (Since everything is 4x worse over in graphics land, since you repeat the same instruction a bunch of times.)
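To make the "4x worse" point concrete, here is a small sketch of how a single vec4 operation turns into four scalar instructions on a scalar architecture. The mnemonic `fadd` and the `r0.x` register syntax are made up for illustration and are not actual G13 assembly.

```python
def scalarize(op: str, dst: str, srcs: list[str], width: int = 4) -> list[str]:
    """Expand one vector operation into one scalar instruction per component."""
    lanes = "xyzw"[:width]
    return [
        f"{op} {dst}.{c}, " + ", ".join(f"{s}.{c}" for s in srcs)
        for c in lanes
    ]


code = scalarize("fadd", "r0", ["r1", "r2"])
for line in code:
    print(line)
print(len(code), "scalar instructions for one vec4 add")
```

Every byte saved per instruction is therefore saved once per component, which is why encoding density matters more here than on a true vector ISA.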
<bloom> Different vendors cope with this in different ways.
<bloom> Older GPU arches were genuinely vector, some newer ones (GCN, Bifrost) have a fp16vec2 thing going on, special oddball handling goes to Adreno which has a special "repeat N times" modifier :-p
<DarkShadow44> huh, interesting
<DarkShadow44> what exactly is that "fp16vec2" thing?
<bloom> It's.. cute
<bloom> Some AMD and Mali GPUs are scalar for 32-bit instructions, but allow 2 channel 16-bit vectors (what AMD markets as rapid packed math or something)
<bloom> Mali even has 4 channel 8-bit vectors targeting ML workloads
<bloom> It follows naturally from 32-bit arithmetic. I.e. packed 2x16-bit add is the same as 32-bit add up to handling of carry bits.
<bloom> ^ integer
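A short sketch of the integer claim above: a packed 2x16-bit add gives the same result as a plain 32-bit add unless the low half overflows, because the only difference is whether the carry out of bit 15 is allowed to reach bit 16. The function names are illustrative.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def add_u32(a: int, b: int) -> int:
    """Ordinary 32-bit wrapping add."""
    return (a + b) & MASK32


def add_2x16(a: int, b: int) -> int:
    """Two independent 16-bit wrapping adds packed in one 32-bit word."""
    lo = ((a & MASK16) + (b & MASK16)) & MASK16        # carry out of bit 15 is dropped
    hi = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (hi << 16) | lo


# No overflow in the low half: both adds agree.
print(hex(add_u32(0x00010002, 0x00030004)), hex(add_2x16(0x00010002, 0x00030004)))
# Low half overflows: the 32-bit add carries into the high half, the packed add does not.
print(hex(add_u32(0x0001FFFF, 0x00000001)), hex(add_2x16(0x0001FFFF, 0x00000001)))
```

In hardware terms, the packed version is the same adder with the carry chain broken between bits 15 and 16.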
<bloom> and fp16 is so much cheaper in hw than fp32 that it again works out
<bloom> So in theoretical benchmarks, it means a 2x reduction in cycle count for 16-bit vs 32-bit workloads
<bloom> ...in theory. In practice it's a pain for compilers, sometimes more so than plain old vec4 hardware.
<bloom> Code like `a.xy + b.xz` can't be vectorized (exercise: why not?)
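One plausible answer to the exercise, sketched under an assumed layout: if 16-bit components are packed in pairs, with (x, y) in one 32-bit register and (z, w) in the next, then `a.xy` is exactly one packed register but `b.xz` pulls one half from each of two registers, so it needs extra shuffling before a packed add can use it. The packing table is an assumption for illustration, and real hardware may offer half-swizzle modifiers that this simple model ignores.

```python
# Assumed packing: component -> (register index, half within that register)
PACKING = {"x": (0, 0), "y": (0, 1), "z": (1, 0), "w": (1, 1)}


def is_native_pair(swizzle: str) -> bool:
    """True if a 2-component swizzle is one packed register in its natural order."""
    (r0, h0), (r1, h1) = PACKING[swizzle[0]], PACKING[swizzle[1]]
    return r0 == r1 and (h0, h1) == (0, 1)


for sw in ("xy", "zw", "xz"):
    print(sw, is_native_pair(sw))
```

With this model `xy` and `zw` are usable directly, while `xz` straddles two registers, so the compiler would have to repack it or fall back to two scalar operations.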
<DarkShadow44> mh, I see
<bloom> Anyway, G13 does *not* work this way.
<DarkShadow44> heh, should make things easier
<bloom> Indeed.
<bloom> I really hate how inactive I've been
<bloom> Dealing with personal stuff but still :<
<DarkShadow44> I know that feeling
<bloom> What are your relevant interests ?
<DarkShadow44> What do you mean?