02:27
klickverbot has quit [Ping timeout: 260 seconds]
02:50
klickverbot has joined #m-labs
02:55
klickverbot has quit [Ping timeout: 268 seconds]
03:52
<
sb0 >
for running all the tests, elapsedTime=83.350688 for master, elapsedTime=64.582824 for release-1. is the compiler getting slower, or is this something else or just noise?
05:09
sandeepkr has quit [Ping timeout: 268 seconds]
05:12
klickverbot has joined #m-labs
05:17
klickverbot has quit [Ping timeout: 244 seconds]
05:39
klickverbot has joined #m-labs
05:44
klickverbot has quit [Ping timeout: 248 seconds]
05:47
cr1901_modern has quit [Ping timeout: 268 seconds]
06:07
klickverbot has joined #m-labs
06:12
klickverbot has quit [Ping timeout: 276 seconds]
06:27
evilspirit has joined #m-labs
06:34
klickverbot has joined #m-labs
06:38
klickverbot has quit [Ping timeout: 240 seconds]
06:46
sandeepkr has joined #m-labs
07:01
klickverbot has joined #m-labs
07:05
klickverbot has quit [Ping timeout: 268 seconds]
07:28
klickverbot has joined #m-labs
07:33
klickverbot has quit [Ping timeout: 260 seconds]
07:36
cr1901_modern has joined #m-labs
07:55
klickverbot has joined #m-labs
08:00
klickverbot has quit [Ping timeout: 276 seconds]
08:22
klickverbot has joined #m-labs
08:26
evilspirit has quit [Ping timeout: 264 seconds]
08:27
klickverbot has quit [Ping timeout: 246 seconds]
08:44
<
mithro >
sb0: What do you think about misoc generating the device tree fragment needed for booting linux on a misoc platform?
08:45
<
whitequark >
sb0: dunno
08:50
klickverbot has joined #m-labs
08:54
klickverbot has quit [Ping timeout: 246 seconds]
09:17
klickverbot has joined #m-labs
09:21
klickverbot has quit [Ping timeout: 246 seconds]
09:44
klickverbot has joined #m-labs
09:50
klickverbot has quit [Ping timeout: 246 seconds]
10:25
sandeepkr has quit [Ping timeout: 244 seconds]
11:06
_rht has joined #m-labs
11:21
sandeepkr has joined #m-labs
11:36
<
whitequark >
rjo: pulse_rate_dds down to 100us
11:40
<
whitequark >
rjo: I'm not sure what else can be done about it
11:40
<
whitequark >
there's a load of phase_mode and a load/store of now in the inner loop
11:40
<
whitequark >
everything else has been eliminated
11:41
<
whitequark >
the load of phase_mode *might* be eliminable with aggressive TBAA
11:42
key2 has joined #m-labs
11:45
<
whitequark >
rjo: oh wait lol
11:46
<
whitequark >
the *IR* doesn't have any soft-FP in the loop
11:46
<
whitequark >
however, the *assembly* does
11:46
<
whitequark >
it looks like instruction selector decided to fuse the FP comparison with the branch. except it's dumb and doesn't understand that it shouldn't do that with soft-FP
11:46
<
whitequark >
I can address this, but post-1.0.
12:00
klickverbot has joined #m-labs
12:05
klickverbot has quit [Ping timeout: 244 seconds]
12:05
<
GitHub67 >
conda-recipes/master b9c7c95 whitequark: llvm-or1k: move to artiq branch.
12:05
<
GitHub63 >
conda-recipes/master 7823364 whitequark: llvmlite-artiq: bump.
12:06
<
whitequark >
bb-m-labs: force build --props=package=llvm-or1k conda-all
12:06
<
bb-m-labs >
build forced [ETA 5m59s]
12:06
<
bb-m-labs >
I'll give a shout when the build finishes
12:12
key2 has quit [Ping timeout: 276 seconds]
12:27
klickverbot has joined #m-labs
12:32
klickverbot has quit [Ping timeout: 268 seconds]
12:36
klickverbot has joined #m-labs
13:08
klickverbot has quit [Ping timeout: 276 seconds]
13:16
<
rjo >
whitequark: i wonder whether we could generally bias the artiq-python code much more away from FP towards integers.
13:17
<
rjo >
something that isn't there in the first place does not need to be optimized away.
13:33
_rht has quit [Quit: Connection closed for inactivity]
13:36
klickverbot has joined #m-labs
13:41
<
whitequark >
bb-m-labs: force build --props=package=llvm-or1k conda-lin64
13:41
<
bb-m-labs >
build forced [ETA 1m00s]
13:41
<
bb-m-labs >
I'll give a shout when the build finishes
14:58
klickverbot has quit [Ping timeout: 248 seconds]
15:17
<
whitequark >
oh wait
15:17
<
whitequark >
that wasn't 100us
15:17
<
whitequark >
that's just the starting value
15:19
<
whitequark >
okay so before D18744, pulse_rate_dds is 36us
15:21
<
whitequark >
after D18744, pulse_rate_dds is 26us
15:32
<
rjo >
whitequark: yay.
15:32
<
rjo >
whitequark: no fp anymore?
15:33
<
rjo >
see my 20 us estimate was right on the money.
15:33
<
whitequark >
yeah. no fp.
15:33
<
whitequark >
the rest is... spills, reloads, doubtful register allocator decisions, manipulation of `now`
15:34
<
whitequark >
some bizarre jumps on the hot path
15:35
<
rjo >
does llvm-or1k 3.8.1-10 have all those?
15:36
<
GitHub87 >
conda-recipes/master 9073893 whitequark: llvm-or1k: bump.
15:36
<
whitequark >
bb-m-labs: force build --props=package=llvm-or1k conda-all
15:36
<
bb-m-labs >
build forced [ETA 5m59s]
15:36
<
bb-m-labs >
I'll give a shout when the build finishes
15:47
sandeepkr_ has joined #m-labs
15:50
sandeepkr has quit [Ping timeout: 240 seconds]
15:50
kuldeep has quit [Ping timeout: 244 seconds]
15:51
klickverbot has joined #m-labs
15:52
<
whitequark >
sb0: you said the desired number is 10us/ch. we are at 13us/ch.
15:55
sandeepkr__ has joined #m-labs
15:55
kuldeep has joined #m-labs
15:58
sandeepkr_ has quit [Ping timeout: 246 seconds]
16:00
<
sb0 >
whitequark, nice. should be fine for now.
16:00
<
sb0 >
there is the option of putting a delay between runs so that the fifo has time to refill
16:02
<
whitequark >
sb0: what's the ns/instruction ratio for mor1kx?
16:02
<
whitequark >
well, clock/instruction and ns/clock
16:03
<
sb0 >
CPI is 1 for non-branch instructions
16:03
<
whitequark >
what about loads?
16:04
<
whitequark >
loads from cache specifically, I think everything should be in cache
16:04
<
sb0 >
I think 1 if they hit the cache
16:04
<
rjo >
that would be something interesting to verify
16:04
<
whitequark >
that's odd
16:04
<
whitequark >
why do we take ~812 instructions to set one DDS channel?
16:05
<
whitequark >
well, 812 cycles
16:05
<
sb0 >
there are a lot of things in there. look at the C source
16:09
<
whitequark >
okay so
16:10
<
whitequark >
the inner loop has 69 instructions. I can shave off, optimistically, ten
16:10
<
whitequark >
so, 80ns. not worth bothering with.
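[Editor's note: the 80ns figure follows from the numbers quoted in the log if one clock cycle is 8 ns. That clock period is an inference from "ten instructions" and "80ns" at a CPI of 1, not something stated in the log. A minimal sketch of the arithmetic, with a hypothetical helper name:]

```python
def time_saved_ns(instructions_removed, cycles_per_instruction, ns_per_cycle):
    """Estimate the time saved by removing instructions from a loop body."""
    return instructions_removed * cycles_per_instruction * ns_per_cycle

# 10 instructions and CPI = 1 come from the log; 8 ns/cycle is an assumption
# chosen so that the result matches the quoted 80ns.
saved = time_saved_ns(10, 1, 8)
```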
16:13
sandeepkr_ has joined #m-labs
16:15
<
rjo >
64bit manipulation is not that cheap
16:17
kuldeep has quit [Ping timeout: 248 seconds]
16:17
sandeepkr__ has quit [Ping timeout: 276 seconds]
16:21
<
whitequark >
actually, it is not that expensive, when addc is actually used
16:22
<
whitequark >
but someone stubbed that out in or1k...
16:22
<
whitequark >
yeah. in the LLVM backend.
16:23
<
sb0 >
I remember seeing very short code produced by llvm for a 64-bit add
16:23
<
whitequark >
yes. but it is longer than two instructions.
16:23
<
sb0 >
can that be fixed?
16:23
<
whitequark >
okay, I can fix that
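[Editor's note: the two-instruction 64-bit add discussed above is a plain add on the low words followed by an add-with-carry on the high words. The sketch below models that idea on 32-bit words in Python. The helper names are hypothetical and it is not the code the LLVM backend emits.]

```python
MASK32 = 0xFFFFFFFF

def add32(a, b, carry_in=0):
    """Add two 32-bit words plus a carry-in; return (sum, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit add from two word adds: add on the low words, add-with-carry on the high words."""
    lo, carry = add32(a_lo, b_lo)      # models a plain add that sets the carry flag
    hi, _ = add32(a_hi, b_hi, carry)   # models an add that consumes the carry flag
    return hi, lo
```

[For example, add64(0, 0xFFFFFFFF, 0, 1) carries out of the low word and returns (1, 0).]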
16:25
<
whitequark >
sb0: can I express a "subc" with l.addc?
16:27
<
whitequark >
I dunno
16:27
<
whitequark >
but they seem related
16:28
<
rjo >
sorry: -EARCH
16:28
<
whitequark >
hastebin doesn't have or1k.
16:28
<
whitequark >
... and anyway the highlighting is close enough
16:29
<
sb0 >
whitequark, can you take the opposite of a 64-bit number in a fast manner?
16:30
<
sb0 >
I think it's invert all the bits (xor each word with 0xffffffff) and then add 1, isn't it?
16:31
<
whitequark >
LLVM generates a pretty obnoxious sequence for negation...
16:31
<
sb0 >
so yeah, you can do subc with two xors and then two 64-bit additions
16:32
<
sb0 >
s/subc/64-bit sub
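[Editor's note: sb0's scheme is two's-complement negation (invert both words, add 1) followed by an ordinary 64-bit add. A sketch on 32-bit words, with hypothetical helper names, assuming values are held as (high, low) word pairs:]

```python
MASK32 = 0xFFFFFFFF

def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit add on (high, low) 32-bit word pairs, wrapping modulo 2**64."""
    lo = a_lo + b_lo
    hi = a_hi + b_hi + (lo >> 32)  # propagate the carry out of the low word
    return hi & MASK32, lo & MASK32

def neg64(hi, lo):
    """Negate: invert all bits (two XORs), then add 1 (first 64-bit addition)."""
    return add64(hi ^ MASK32, lo ^ MASK32, 0, 1)

def sub64_slow(a_hi, a_lo, b_hi, b_lo):
    """a - b computed as a + (-b) (second 64-bit addition)."""
    n_hi, n_lo = neg64(b_hi, b_lo)
    return add64(a_hi, a_lo, n_hi, n_lo)
```

[This costs two XORs plus two full 64-bit additions, which is the overhead whitequark objects to in the next message.]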
16:33
<
whitequark >
that seems way too slow.
16:34
kuldeep has joined #m-labs
16:40
<
sb0 >
there doesn't seem to be any support for 64-bit subtraction in or1k
16:40
<
sb0 >
I mean, multiprecision with carry flags
16:41
<
sb0 >
artiq code should rarely use 64-bit subtraction anyway
16:41
<
sb0 >
there are just a few
16:43
<
rjo >
yes. time is mostly increasing.
16:47
<
rjo >
but is that asm for the 64 bit add() good? i remember seeing that style as well months ago.
16:48
<
whitequark >
no, it's not
16:48
<
rjo >
it is also longer than the sub()
17:23
<
whitequark >
sb0: rjo: we can subtract quickly.
17:23
<
whitequark >
invert all the bits, *set carry*, and then add.
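[Editor's note: the trick relies on the identity a - b = a + ~b + 1, where the "+ 1" is supplied as the initial carry-in instead of a separate addition. A sketch on 32-bit words, with hypothetical helper names. How the carry flag actually gets set on or1k is left open here, as it is in the log.]

```python
MASK32 = 0xFFFFFFFF

def addc32(a, b, carry):
    """Add two 32-bit words plus a carry-in; return (sum, carry_out)."""
    total = a + b + carry
    return total & MASK32, total >> 32

def sub64(a_hi, a_lo, b_hi, b_lo):
    """a - b as a + ~b with the carry preset to 1, one add-with-carry per word."""
    carry = 1                                        # "set carry"
    lo, carry = addc32(a_lo, b_lo ^ MASK32, carry)   # low word: a_lo + ~b_lo + 1
    hi, _ = addc32(a_hi, b_hi ^ MASK32, carry)       # high word: a_hi + ~b_hi + carry
    return hi, lo
```

[Compared with negate-then-add, this needs only one pass of add-with-carry over the two words.]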
17:28
<
whitequark >
there's no easy way to set carry.
17:30
<
whitequark >
oh, I can do an l.addi rX, r0, -1
17:33
<
whitequark >
oh, I got a suggestion for an even more optimal way
17:38
<
rjo >
shouldn't llvm be able to come up with those?
17:40
<
whitequark >
LLVM's codegen is a pattern matcher. it reduces a graph.
17:40
<
whitequark >
it understands almost nothing past basic semantics, e.g. it knows which operations are commutative
17:44
<
whitequark >
LLVM does have hardcoded canonical expansions, but they can be quite inefficient
18:03
<
stekern >
sb0: yes, loads from cache are one cycle
18:11
klickverbot has quit [Ping timeout: 246 seconds]
18:13
klickverbot has joined #m-labs
19:04
sandeepkr__ has joined #m-labs
19:04
klickverbot has quit [Ping timeout: 244 seconds]
19:07
kuldeep has quit [Ping timeout: 248 seconds]
19:08
sandeepkr_ has quit [Ping timeout: 252 seconds]
19:15
klickverbot has joined #m-labs
19:23
kuldeep has joined #m-labs
19:27
sandeepkr_ has joined #m-labs
19:27
kuldeep has quit [Max SendQ exceeded]
19:28
sandeepkr__ has quit [Read error: No route to host]
19:28
kuldeep has joined #m-labs
19:29
sandeepkr has joined #m-labs
19:32
sandeepkr_ has quit [Ping timeout: 244 seconds]
20:05
sandeepkr has quit [Quit: Leaving]
20:27
mumptai has joined #m-labs
20:41
kuldeep has quit [Ping timeout: 246 seconds]
20:48
kuldeep has joined #m-labs
20:48
kuldeep has quit [Changing host]
20:48
kuldeep has joined #m-labs
21:53
mumptai has quit [Quit: Verlassend]