marcan changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
<bloom>
fadd16/fmul16/fmadd16 seem fine?
<bloom>
if you want to know what my reference point for regularity is i'll link you the bifrost encoding :p
<dougall>
yeah, they could definitely be worse - i'm just thinking of moving the Am/Bm/Cm field relative to their 32-bit equivalents (why did they do that?), and the fact that 32-bit ops can have 16-bit sources and destinations too
<bloom>
"and the fact that 32-bit ops can have 16-bit sources and destinations too"
<bloom>
This part makes a ton of sense.
<bloom>
The 32-bit ops are heavier weight. Yes, you _can_ run a fadd.32 with all operands 16-bit, but that will (depending on uarch details that are not ISA visible) be slower or higher power.
<bloom>
It's fundamentally a different operation. Convert, fp32 multiply, convert, versus fp16 multiply. The latter is much cheaper. (The converts are cheap regardless.)
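(For multiply specifically, the convert/fp32-multiply/convert path even produces bit-identical results to a native fp16 multiply, since a product of two 11-bit significands fits exactly in fp32's 24-bit significand; the difference really is cost, not semantics. A stdlib-only check, using `struct`'s binary16/binary32 codecs to stand in for the hardware rounding:)

```python
import random
import struct

def f16(x):
    """Round a Python float to the nearest fp16 value (binary16 round trip)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def f32(x):
    """Round a Python float to the nearest fp32 value (binary32 round trip)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

random.seed(0)
for _ in range(10000):
    a = f16(random.uniform(-8.0, 8.0))
    b = f16(random.uniform(-8.0, 8.0))
    # a * b is exact in double, so f16(a * b) is the correctly rounded
    # fp16 multiply; f16(f32(a * b)) is the convert/fp32-op/convert path.
    assert f16(a * b) == f16(f32(a * b))
```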
<dougall>
ah, good point, yeah... that makes sense
<bloom>
on some arches, fp16 multiply is even vectorized (where fp32 is scalar, conversions be damned)
<bloom>
it's that much cheaper :>
<bloom>
Honestly the most annoying part of the encoding is the presence of >64-bit instructions
<bloom>
Makes the bit arithmetic awful.
<dougall>
yeah, C is particularly painful for that... i'd probably use __int128, and i'd probably end up regretting it :p
<bloom>
lol
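(Python's arbitrary-precision ints sidestep the problem entirely, which is presumably why the RE tooling gets away with it. A sketch of generic field extraction from an over-64-bit instruction; the byte values and field positions here are invented:)

```python
def extract(insn: bytes, lo: int, size: int) -> int:
    """Pull a size-bit field starting at bit `lo` out of a
    little-endian instruction of any byte length."""
    word = int.from_bytes(insn, "little")
    return (word >> lo) & ((1 << size) - 1)

# e.g. an 80-bit (10-byte) instruction, made-up contents
insn = bytes.fromhex("3e1122334455667788ff")
assert extract(insn, 0, 7) == 0x3E    # low 7 bits
assert extract(insn, 72, 8) == 0xFF   # top byte, past bit 64
```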
<bloom>
ok, added some generic ALU packing code
<bloom>
2 lines shorter than I was before :-p
<bloom>
...and with the generic stuff, it was a cinch to add support for all the funops
<bloom>
dougall: " if sx and source.thread_bit_size >= 16:"
<bloom>
I suspect s/>= 16/< 64/ was intended.
<bloom>
I do wonder, if there's native 64-bit adds, why I see the blob lowering to a pair of adds
<bloom>
Oh, maybe because there's no 64-bit access to uniform registers.
<dougall>
hmm, yeah, i think you're right about < 64...
<bloom>
not a r/e thing, just a "what is sign-extension?" thing ;)
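(The `< 64` reading fits the usual idiom: sources narrower than the 64-bit register width get their top bit replicated, while a full-width source needs nothing. A generic sketch using the xor/subtract trick:)

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend an unsigned `bits`-wide field to a signed integer."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

assert sign_extend(0xFFFF, 16) == -1
assert sign_extend(0x7FFF, 16) == 0x7FFF
assert sign_extend(0x8000, 16) == -0x8000
```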
<bloom>
also, the encoding for iadd seems really odd. this is probably the weirdest of the ISA.
<bloom>
It's like it's supposed to be a 48-bit instruction and they added an extra 2 bytes of padding for no reason? what?
<dougall>
fwiw i saw apple's compiler emit 64-bit subtracts and 64-bit add+shift, but (as far as i can recall) not 64-bit adds
<bloom>
...Interesting.
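(Whatever the reason for the lowering, the pair of adds would compute the standard add/add-with-carry split; a sketch of what that pair evaluates, register details omitted:)

```python
MASK32 = 0xFFFFFFFF

def add64_as_pair(a: int, b: int) -> int:
    """Lower a 64-bit add to two 32-bit adds, propagating the carry."""
    lo = (a & MASK32) + (b & MASK32)
    carry = lo >> 32
    hi = ((a >> 32) + (b >> 32) + carry) & MASK32
    return (hi << 32) | (lo & MASK32)

assert add64_as_pair(0xFFFFFFFF, 1) == 0x1_00000000
assert add64_as_pair(2**64 - 1, 1) == 0  # wraps like hardware
```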
<dougall>
yeah, not sure what's up with that encoding... i do think there's _something_ in the high couple of bits in most/all instructions that i haven't figured out, which might make it make a tiny bit more sense
<bloom>
I'm not worried about 2 unknown bits in the extended encoding
<bloom>
it's iadd specifically (imadd is fine) that's all weird..
<dougall>
(or maybe i was trying to say that immediates don't get sign extended? not really the best way to represent that... hmm)
<bloom>
dougall: ok, my curiosity got the best of me, poked at sin_pt_1/2
<bloom>
The heavylifting is done by sin_pt_2. However, the function it computes is *not* sin(x), rather it's sin(x)/x
<bloom>
(This is standard, there are numeric advantages here.)
<bloom>
But it only computes in a single quadrant. So given 0 <= x < 1, it'll spit back sin(x * (pi/2)) / x
<bloom>
Notice that's an even function. So sin_pt_2 is in fact defined over [-1, 1], but it ignores the sign bit of its input.
<bloom>
This is a useful property: it lets sin_pt_1 pass the sign of the output past the sin_pt_2 call, to be recombined with a later multiplication.
<bloom>
So what is sin_pt_1? It's just a quadrant fixup.
<bloom>
For x in the first quadrant, it's simply the identity. sin_pt_2 is defined as such, so when we compute sin_pt_2(sin_pt_1(x)) * sin_pt_1(x) we're just computing sine.
<bloom>
For x in the third quadrant, recall sin(x + pi) = -sin(x). So sin_pt_1 will just flip the sign, so we can compute in the first quadrant (pt_2), and then the sign gets restored with the multiply.
<bloom>
For x in the second quadrant, recall sin(x + pi/2) = cos(x) = sin(pi/2 - x). So rather than flip the sign, we take the arithmetic complement.
<bloom>
Likewise for the fourth quadrant, where we both complement and flip the sign.
<bloom>
The last detail I glossed over is the units. sin_pt_2 wants its angle as [-1, 1] but sin_pt_1 takes in a rotation [0, 4]. This doesn't affect any of the math, but it means the constants work out to nice integers.
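(The whole walkthrough, as a sketch: angles in quarter-turns as described, `math.sin(x)/x` standing in for whatever polynomial sin_pt_2 actually uses, and the quadrant logic following the explanation above:)

```python
import math

def sin_pt_1(t: float) -> float:
    """Quadrant fixup. `t` is an angle in quarter-turns; returns r in
    [-1, 1] whose magnitude is the first-quadrant angle and whose sign
    carries the sign of the result past sin_pt_2."""
    t = t % 4.0
    q = int(t)
    if q == 0:
        return t          # first quadrant: identity
    if q in (1, 2):
        return 2.0 - t    # complement (negative, i.e. sign flip, for q == 2)
    return t - 4.0        # fourth quadrant: complement and sign flip

def sin_pt_2(r: float) -> float:
    """sin(|r| * pi/2) / |r|: first-quadrant sine over x, sign ignored."""
    a = abs(r)
    if a == 0.0:
        return math.pi / 2.0   # limit of sin(a*pi/2)/a as a -> 0
    return math.sin(a * math.pi / 2.0) / a

def sin_quarter_turns(t: float) -> float:
    r = sin_pt_1(t)
    return sin_pt_2(r) * r     # multiply recombines the sign

for t in (0.1, 0.5, 1.3, 2.5, 3.0, 3.9):
    assert abs(sin_quarter_turns(t) - math.sin(t * math.pi / 2.0)) < 1e-12
```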