<sb0>
whitequark, the l.lwa and l.swa instructions do appear in the final runtime binary
<sb0>
why do we need atomic RMWs? we don't have interrupts
<whitequark>
I've explained that in the comment...
<whitequark>
we don't. Rust libcore still provides them.
<whitequark>
sure, it's possible to patch libcore *and* patch the crates we depend on, but that would make updating the Rust version even more complicated than it already is, so I choose not to.
<sb0>
and the corresponding functions in libcore cannot be easily removed?
<whitequark>
that would break the interface of libcore
<whitequark>
the fewer ARTIQ-specific changes the toolchain needs, the better
<whitequark>
I guess if you want to leave libcore alone and handle it in the compiler instead, the proper solution is an LLVM flag that lowers l.lwa/l.swa to l.lwz/l.sw when interrupts are disabled
<GitHub-m-labs>
migen/master f4180e9 Sebastien Bourdeauducq: vivado: print short timing info after phys_opt_design
<sb0>
whitequark, ^ the buildbot should use the output of this: 1. timing is no longer final after the routing step 2. the Xilinx docs explicitly say not to use the routing message for timing sign-off (I guess routing doesn't take certain things into account, maybe tPWS on UltraScale)
<whitequark>
that doesn't mean it won't be inferred
<whitequark>
so you need to change target-feature as well
<sb0>
but it means we can ditch it without much of a user-visible impact
<whitequark>
it might be used in the kernels though
<whitequark>
we have it enabled in artiq/compiler/targets.py too
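For context, a hedged sketch of what dropping the feature from the kernel compiler's target description could look like; the class name and attributes are illustrative, not the actual contents of artiq/compiler/targets.py.

    class OR1KTarget:
        # hypothetical sketch; the real target description may differ
        triple = "or1k-linux"
        features = ["mul", "div", "cmov", "addc"]   # "ffl1" dropped here

        @property
        def llvm_features(self):
            # feature string handed to the LLVM target machine, e.g. "+mul,+div,..."
            return ",".join("+" + f for f in self.features)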
<sb0>
okay, I'll double-check if the slow xilinx stuff is able to meet timing with it still enabled ...
<sb0>
_florent_, any progress on serwb?
ncl has quit [Read error: Connection reset by peer]
ncl has joined #m-labs
<sb0>
whitequark, the message "All user specified timing constraints are met." is in the log now
<sb0>
for good measure, phys_opt_design still uses 21 seconds of CPU time even when it does nothing
<_florent_>
sb0: i'm working on serwb, i'll try to get it done today
qinfengling has quit [Read error: Connection reset by peer]
<sb0>
INFO: [Physopt 32-716] Net sys_clk has constraints that cannot be copied, and hence, it cannot be cloned. The constraint blocking the replication is create_clock @ /home/sb/artiq_drtio/artiq_kasli/sysu/gateware/top.xdc:649
<sb0>
whitequark, vivado is having none of it; timing breaks when FFL1 is enabled
<sb0>
whitequark, where do you think these instructions would be used in kernels?
<sb0>
whitequark, maybe we can keep it enabled for the kernel CPU only, though
<whitequark>
hmm should I update the LLVM and Rust versions while I'm at it...
<whitequark>
smoltcp will require a newer Rust soon
<sb0>
again? what artiq problem does that fix?
<whitequark>
"not being stuck with an ancient LLVM version that requires weeks of work when an upgrade is absolutely unavoidable at some point in the future"
<whitequark>
in practical terms, I think the closest thing we'll need is the new metadata that makes constant propagation work better
<whitequark>
in kernels
<whitequark>
#655 that is
<sb0>
okay, but there are higher priority things, e.g. sayma bugs, camera driver
<whitequark>
if I'm going to spend several hours rebuilding packages, why not fold a version update into it?
<whitequark>
the last three times I've done that it didn't break anything iirc
<sb0>
ok
<whitequark>
oh, looks like I already updated LLVM to 5.0
<whitequark>
that makes things even easier
<sb0>
this timing problem is quite nasty. almost sayma class.
FabM has joined #m-labs
FabM is now known as FabM_cave
<whitequark>
clock can't be lowered because kasli has to work with sayma right?
<whitequark>
and kasli can't get a faster fpga because?
<sb0>
it could be lowered if needed, but that's not straightforward either, and it doesn't help with performance
<sb0>
upgrading the fpga speed grade doesn't really help for some reason
<whitequark>
neither does upgrading to 75T or 100T?
<sb0>
it's already 100t
<whitequark>
oh
<sb0>
the CPUs are asynchronous to (D)RTIO, so they're not the problem with lowering the clock frequency
<sb0>
the problems are: generating the correct frequencies (e.g. IDELAYCTRL still needs 200MHz), though that's relatively straightforward; DDR3 breakage (DDR3 is synchronous to the CPU for latency reasons); and whether Vivado will pass timing at the lower frequency, since the constraints have changed
<sb0>
and the lower performance of the system
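A minimal migen sketch of the IDELAYCTRL point above, assuming a dedicated 200MHz clock domain is kept around even if the system clock is lowered (illustrative only, not the actual MiSoC/ARTIQ code):

    from migen import *

    class IdelayCtrl(Module):
        def __init__(self):
            # "idelay" must be driven at 200MHz, e.g. from a spare MMCM output
            self.clock_domains.cd_idelay = ClockDomain()
            self.specials += Instance("IDELAYCTRL",
                i_REFCLK=ClockSignal("idelay"),
                i_RST=ResetSignal("idelay"))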
<whitequark>
hm I wonder if lm32 would have passed timing more easily
<whitequark>
it's probably too late to do an lm32 port now though
<sb0>
CERN are going to move their mock-turtle control system to it
<whitequark>
have you tried to synthesize it?
<sb0>
no
<whitequark>
that would instantly move us to upstream rust and llvm which is great even by itself
<sb0>
feel free to have a look at it and try adding it to misoc
<whitequark>
hm it doesn't have caches
<sb0>
ah, yeah
<sb0>
afaik they're putting everything in BRAM
<whitequark>
yes, that's what it says
<whitequark>
let me see their latest code
<sb0>
does risc-v llvm work correctly now?
<whitequark>
it didn't before?
<sb0>
well you said so
<sb0>
but years ago
<sb0>
also it would seem that fpga implementations are still unusable ...
* whitequark
searches IRC logs
<whitequark>
2016-04-04 22:32 <whitequark> sb0: from discussion on #llvm: "like, moving from OR1K to RISC-V would be like moving from a trash can fire to a larger, dumpster-sized fire"
<whitequark>
I don't remember the context of that and it was two years ago though
<sb0>
yes
<whitequark>
sb0: I looked at uRV
<whitequark>
I am definitely not going to use it.
<whitequark>
there's no wishbone interface, the divider is broken, and overall the code looks like a typical student project
<whitequark>
zero confidence in that it actually works
<whitequark>
there's only a tiny testsuite too
<whitequark>
now I think Clifford's core would be much better
rohitksingh_work has quit [Read error: Connection reset by peer]
<whitequark>
does it make any sense to run picorv32 with a far higher clock to compensate for its average of 4 CPI?
<rqou>
whitequark: i dare you to use my risc-v core lol :P
<rqou>
it has no wishbone interface and the code was indeed a typical student project
<whitequark>
rqou: thanks but no
<rqou>
also technically non-free because you're not allowed to use it to publish complete project solutions for the course
<rqou>
:P
<whitequark>
sb0: also you'll need to pass -mno-ffl1 to clang
<whitequark>
since clang has ffl1 enabled by default I believe
rohitksingh_wor1 has joined #m-labs
rohitksingh_work has quit [Ping timeout: 240 seconds]
<whitequark>
nevermind the above, it isn't
<sb0>
whitequark, what is worse for kernels, no ffl1 or no store buffer?
<sb0>
whitequark, 4 CPI puts this CPU into the unusable category. nothing new under the sun.
<whitequark>
sb0: even at 500 MHz?
<sb0>
Without using the look-ahead memory interface (usually required for max clock speed), the results drop to 0.305 DMIPS/MHz and 5.232 CPI.
<sb0>
and does it achieve such speeds consistently?
<whitequark>
that's for our kintex-7
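Rough back-of-envelope numbers from the figures quoted above (assumptions: the 500MHz Kintex-7 estimate, the no-look-ahead CPI from the README quote, and a nominal 1-CPI core at a 125MHz sys_clk for comparison):

    clk_mhz = 500                        # whitequark's Kintex-7 estimate
    picorv32_minstr = clk_mhz / 5.232    # ~95.6 Minstr/s at 5.232 CPI
    picorv32_dmips = 0.305 * clk_mhz     # ~152.5 DMIPS
    baseline_minstr = 125 / 1.0          # ~125 Minstr/s for an assumed 1-CPI core
    print(picorv32_minstr, picorv32_dmips, baseline_minstr)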
<whitequark>
regarding what's worse
<whitequark>
I'm not sure
<whitequark>
sb0: any chance of hardfloat?
<whitequark>
if we enabled hardfloat we wouldn't need ffl1
<sb0>
because of course, enabling both on the kernel CPU breaks timing
<sb0>
well I think hardfloat may land at some point. so let me try with store buffer and no ffl1
<sb0>
though, we'll have to be careful not to break timing again when introducing hardfloat
<sb0>
I don't know why everyone insists on making rv softcores that suck, instead of copying lm32 or mor1kx ...
<whitequark>
well PicoRV32 is explicitly size-optimized
<sb0>
700 to 2000 LUTs?
<sb0>
that's LM32 ballpark
<sb0>
the high clock speed is interesting, but i don't know how it performs in a real-world situation
<sb0>
fpga routing is very slow, you can't go far on the chip at that clock rate
<sb0>
you can contain the cpu in a small area and use a clock multiplier and CDCs, but that's complicated
<sb0>
LM32 is easier to use
<sb0>
this gives me an idea though: a NoC using C-slowed multithreaded CPUs running at very high frequencies like that
<sb0>
C-slowing is much better than wasting the 3-4 other cycles per instruction, and it doesn't make control more complicated or slower
<whitequark>
hm
<sb0>
(I don't know if that has any practical application for artiq or anything)
<whitequark>
[789/2165] Building CXX object
<whitequark>
like I mentioned
<whitequark>
the slowest part is rebuilding LLVM. I've already updated everything to 6.0....
<sb0>
ffs, vivado is still crapping out with ffl1 disabled and the store buffer enabled on the kernel CPU
<sb0>
so it's going to be ffl1 and no store buffer, if that meets timing at all
<sb0>
at least that keeps binary compatibility between slow and less-slow FPGAs
<whitequark>
hm yes, if you made binaries incompatible that would kill my plans of unifying ksupport with runtime
<sb0>
if only zynq didn't suck
<sb0>
but those FPGA companies can almost never get anything right
<sb0>
well, the PPC cores in Virtex seemed acceptable, though I never actually used them
<sb0>
they could simply put several CPU cores, with built-in caches and clock multiplier, under complete FPGA fabric control instead of this "FPGA-SoC" bullshit
<sb0>
if they do the CDC right, you only get a few cycles of latency with the fabric, instead of many dozens with the zynq garbage, and then you can put things like SDRAM controllers etc. in the fabric and still get good performance
<sb0>
instead of fixing vivado and their IP, they think cramming all sorts of buggy hardwired cores that require obscure and trashy "wizards" to work around the poor design and silicon bugs is going to make things right
<sb0>
it's like, they don't want people to program FPGAs
<whitequark>
captive audience
<whitequark>
they don't want people to program FPGAs, they want people to give them money
<sb0>
stekern, are there any plans for a write-back cache for mor1kx? in uniprocessor systems, that generally has higher performance and better timing than a store buffer
<sb0>
though that doesn't work so well for mmio
<whitequark>
sb0: can't you bypass cache for mmio?
<whitequark>
you'd typically have everything over 0x80000000 as noncacheable or something
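A minimal migen sketch of that convention (an assumption about how the bypass could be decoded, not mor1kx's actual logic):

    from migen import *

    class CacheBypass(Module):
        def __init__(self, adr_width=32):
            self.adr = Signal(adr_width)
            self.bypass = Signal()
            # anything at or above 0x80000000 is treated as uncacheable MMIO
            # and goes straight to the bus, skipping the cache
            self.comb += self.bypass.eq(self.adr[adr_width - 1])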
<stekern>
I don't have any plans on implementing one (due to lack of time :()
<sb0>
yes, but that needs a cache bypass circuit that can result in timing problems
<whitequark>
sb0: maybe I could take a stab at it
<whitequark>
a write-back cache isn't very complicated
<sb0>
whitequark, ok
rohitksingh_wor1 has quit [Read error: Connection reset by peer]
<sb0>
okay, with FFL1=REGISTERED on the kernel CPU it meets timing
<sb0>
so we have to ditch the store buffer, or replace it with the wb cache
<sb0>
let me try FFL1=REGISTERED on the comms CPU as well
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #636: > The other thing we could do is merge that address into the channel number. Then the (compound) channel number would be {u8 core, u16 channel, u8 address}... https://github.com/m-labs/artiq/issues/636#issuecomment-377233408
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #636: @whitequark What do you think of ``now`` pinning to the CSR with atomicity implemented in gateware by committing the value when the 32 least significant bits are written? https://github.com/m-labs/artiq/issues/636#issuecomment-377233864
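A hedged migen sketch of the gateware-side commit described in that comment (signal names and the bus interface are illustrative): the high word is buffered, and the full 64-bit value is committed when the low word is written.

    from migen import *

    class NowRegister(Module):
        def __init__(self):
            self.we_hi = Signal()    # write strobe for the upper 32 bits
            self.we_lo = Signal()    # write strobe for the lower 32 bits (commits)
            self.dat_w = Signal(32)
            self.now = Signal(64)

            hi_buffer = Signal(32)
            self.sync += [
                If(self.we_hi, hi_buffer.eq(self.dat_w)),
                If(self.we_lo, self.now.eq(Cat(self.dat_w, hi_buffer)))
            ]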
<sb0>
okay I think we can keep FFL1
X-Scale has quit [Read error: Connection reset by peer]
<sb0>
whitequark, I'm ready to commit the fixes to the kasli timing issue after you remove the dependency on the atomic instructions
rohitksingh has joined #m-labs
<sb0>
stekern, is there a performance penalty between FFL1=ENABLED and FFL1=REGISTERED?
<sb0>
well sure, but is that absorbed by the CPU pipeline or does it insert a bubble?
<whitequark>
I think it inserts a bubble
rohitksingh has quit [Quit: Leaving.]
<GitHub-m-labs>
[artiq] jbqubit commented on issue #919: @sbourdeauducq Did you try the trick that @gkasprow suggested on Feb 9? What is the scope of the IO error for Slots 3 and 4? A single IO line or all the IO lines? https://github.com/m-labs/artiq/issues/919#issuecomment-377275901
<GitHub-m-labs>
[artiq] whitequark commented on issue #636: IIRC, bus errors work properly but only on writes, on reads error cycles are silently ignored.
<GitHub-m-labs>
[artiq] whitequark commented on issue #919: @jbqubit I'm working on this. I will check that tomorrow or the next day; so far I've not been able to get any output from DACs whatsoever because of serwb hangs. https://github.com/m-labs/artiq/issues/919#issuecomment-377289756
<GitHub-m-labs>
[artiq] whitequark commented on issue #636: On second thought, if we pin `now` in the linker and simply rely on whatever word write order LLVM already uses for `i64` (it's consistent), then we won't lose anything. More importantly, this won't affect code motion across `now`, which is important for things like lowering FP delay operations to integer delay_mu ones. https://github.com/m-labs/artiq/issues/636#i
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #636: Also we do not need the now CSR to be always up to date. It must be up-to-date only when a RTIO write or read is done; the rest of the time, containing some valid value (i.e. one that has been set at some point by the user) that is atomically updated is sufficient. https://github.com/m-labs/artiq/issues/636#issuecomment-377291124
<GitHub-m-labs>
[artiq] whitequark commented on issue #636: If we never treat that location as anything other than `i64*` then LLVM will never write only one half of it. Of course, if you manually cast either half to an `i32*` (which is only possible in Rust) without marking it as volatile then you will get a "miscompilation". The solution is to, well, not do that. https://github.com/m-labs/artiq/issues/636#issuecomment-37
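To illustrate the delay lowering mentioned in that thread, a small kernel using the standard ARTIQ kernel API (the 1ns machine time unit is an assumption; with a different reference period the delay_mu value changes):

    from artiq.experiment import *

    class DelayExample(EnvExperiment):
        def build(self):
            self.setattr_device("core")

        @kernel
        def run(self):
            self.core.reset()
            delay(1.5*us)      # floating-point delay, in seconds
            delay_mu(1500)     # the same delay in integer machine units, at 1ns/mu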
<GitHub-m-labs>
[artiq] sbourdeauducq commented on issue #919: I'm not sure if it's really a source of problems, just something to check - sometimes, problems disappeared when reconnecting the rtm, but it could have been just chance. https://github.com/m-labs/artiq/issues/919#issuecomment-377304062
rohitksingh has quit [Read error: Connection reset by peer]
rohitksingh has joined #m-labs
<whitequark>
sb0: ok, LLVM 6.0 update is complete
<whitequark>
now it's #891 and #667
dhs has joined #m-labs
cr1901 has joined #m-labs
dhs has quit [Quit: Page closed]
<GitHub-m-labs>
[artiq] dhslichter commented on issue #636: @jordens just to confirm, the 8-bit address limitation will mean we get max 256 channels on any given DDS or SPI bus? Does address correspond to branches for SAWG? I just want to confirm the meanings of core (indicating DRTIO device - e.g. a single Kasli or Sayma), channel (a single RTIO channel on that core, e.g. a TTL or a DDS bus or an SPI or a SAWG channel), and
FabM_cave has quit [Quit: ChatZilla 0.9.93 [Firefox 52.7.2/20180316222652]]
<cr1901>
rjo: For the proxy bitstreams, how did you modify openocd when introducing support for the new proxy bitstreams (the ones on the master branch of your bscan_spi_bitstreams repo)? Is openocd at the tip of master today still capable of programming the flash using the old proxy bitstreams?
<cr1901>
I got some weird breakage in a repo that uses an old proxy bitstream when paired w/ an openocd I just compiled from the tip of master today. >>
<cr1901>
But I seem to be the only person having these issues (and the conda openocd provided also uses the tip of master... must be a patch issue)
cr19011 has joined #m-labs
cr1901 has quit [Read error: Connection reset by peer]
cr19011 has quit [Read error: Connection reset by peer]