<sb0>
well you cannot, gcc needs a different compiler build for each architecture
<sb0>
so I'm not sure what this adds
<sb0>
you'll need to compile another toolchain anyway
<mithro>
sb0: yes, I've already done that bit - I have conda recipes for lm32 and or1k gcc which seem to work okay. I needed the gcc compiler for or1k to compile linux / rtems as that is what the openrisc guys are developing with anyway
<whitequark>
so, a 64-bit sub is a sub+xor+add-with-carry
<whitequark>
you know, this is really stupid, because the l.addc opcode has a reserved bit and the ALU already has all the necessary combinatory logic for subtraction
<whitequark>
they could have added l.subc but did not :/
<whitequark>
could've also used the 0x38,0x1 ALU subrange
<whitequark>
sb0: do you see any use for the MAC module?
<sb0>
no
<whitequark>
64-bit multiplier?
<sb0>
how does this work? is l.sub touching the carry flag?
<whitequark>
why... why does or1k have separate instructions for extending byte and half-word to register size?!
<whitequark>
well, zero-extending at least, that's just a waste of opcode space, since they're all representable via l.andi
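[Editor's note] A minimal sketch of the point above, modelling 32-bit registers as Python integers. The helper names are hypothetical. It shows that zero-extension is a single AND with a mask that fits in a 16-bit immediate, whereas sign-extension needs more than one mask operation.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register

def zext8(x):
    # Zero-extend the low byte: a single AND with 0xFF.
    return x & 0xFF

def zext16(x):
    # Zero-extend the low half-word: a single AND with 0xFFFF.
    return x & 0xFFFF

def sext8(x):
    # Sign-extend the low byte; this cannot be done with one AND.
    x &= 0xFF
    return ((x ^ 0x80) - 0x80) & MASK32
```

Both zero-extension masks (0xFF and 0xFFFF) fit in 16 bits, which is why a dedicated zero-extend opcode adds nothing over an AND-immediate.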
<whitequark>
this is a bizarre architecture
<sb0>
yes
<sb0>
lm32 doesn't have such problems afaik...
<whitequark>
so, about that
<rjo>
whitequark: nice. but how do you teach this to llvm if you say it can't learn to do this?
<whitequark>
with what I learned while fixing OR1K in the last few days, I'm confident I can quickly implement a decent LM32 backend as well as upstream OR1K
<whitequark>
I understand pretty much all the moving parts necessary for implementing a backend of this complexity now
<whitequark>
rjo: with C++ code.
<whitequark>
it has a SUBE instruction (sub-using-carry) and it has built-in legalization code that translates the 64-bit SUB into SUBE+SUBC
<whitequark>
I lower SUBC to l.sub which does the right thing, and then manually lower SUBE to l.xor+l.addc
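[Editor's note] A rough sketch of how a 64-bit subtraction can be built from a 32-bit subtract plus complement and add-with-carry, modelled in Python. This is not the backend's actual instruction sequence, which the log does not give. It assumes the flag left by the low-word subtraction is a borrow (1 when the low word of a is smaller than the low word of b), and the helper names are hypothetical. Under that assumption the high word uses the identity a - b - borrow == ~(~a + b + borrow) (mod 2**32). If the flag instead held the inverted borrow, complementing only the subtrahend would be enough.

```python
MASK32 = 0xFFFFFFFF

def sub32(a, b):
    """32-bit subtract; returns (result, borrow)."""
    return (a - b) & MASK32, 1 if a < b else 0

def addc32(a, b, carry):
    """32-bit add-with-carry; returns (result, carry_out)."""
    total = a + b + carry
    return total & MASK32, total >> 32

def sub64(a, b):
    """64-bit a - b (mod 2**64) using only 32-bit operations."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    # Low word: plain subtract, remembering the borrow.
    lo, borrow = sub32(a_lo, b_lo)
    # High word: complement a_hi, add b_hi plus borrow, complement again.
    t, _ = addc32(a_hi ^ MASK32, b_hi, borrow)
    hi = t ^ MASK32
    return (hi << 32) | lo
```

For example, sub64(0x1_0000_0000, 1) borrows from the high word and gives 0xFFFFFFFF.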
<rjo>
by the way. soon there will be many 64 bit subtractions because of latency compensation.
<whitequark>
you will be pleased with their speed, then.
<whitequark>
(and I will be pleased that I didn't waste this time)
<sb0>
whitequark, but then there will be libunwind and all
<whitequark>
(well, not like it would have gone to waste anyway, with all the things I learned...)
<whitequark>
sb0: what about libunwind?
<sb0>
I don't trust it will be bug-free for lm32, if available at all
<whitequark>
you do remember that libunwind wasn't available at all for OR1K?
<whitequark>
OR1K had no exceptions, no DWARF, no debug information whatsoever
<whitequark>
libunwind basically needs setcontext+getcontext and a little bit of boilerplate. and it was bug-free from the start, because that code is just too dumb to have bugs
<whitequark>
there *were* a few bugs in the OR1K frame lowering code, but they would have manifested even without exceptions or DWARF, that just made them manifest earlier, and in easier to debug ways, for that matter
<sb0>
didn't you use something from BSD?
<whitequark>
nope
<sb0>
I remember seeing some OR1K DWARF/unwind support from there
<whitequark>
I have never even heard about that
evilspirit has quit [Ping timeout: 260 seconds]
<cr1901_modern>
They probably reimplemented something due to licensing concerns and/or the GNU equivalent being crap
<whitequark>
well they sure as hell used binutils, there's no alternative for or1k
<cr1901_modern>
Fair. (Though tbh, I'm a little surprised a binutils alt never came to fruition.)
<whitequark>
sure it did
<whitequark>
LLVM has its own assembler since ages (because it's stupid to fork, serialize and deserialize just to emit machine code)
<whitequark>
now LLVM has its own linker too, and it slowly gains all the loose parts ie ar dwarfdump objdump et cetera
* sb0
notices that QFileDialog with QFileDialog::DontUseNativeDialog also has table column layout issues
<cr1901_modern>
I guess it's just slow to adopt then. I actually didn't know LLVM had an assembler. Presumably you can write one for any backend you want if motivated?
<sb0>
whitequark, so you're motivated to port everything to lm32?
<whitequark>
sb0: sure, why not? you're saying it provides concrete advantages, and I see that it's not a lot of work
<sb0>
well the architecture is cleaner. but there are no user-visible advantages ...
<whitequark>
we also need to decide something about upstreaming the backends. or1k, lm32, both
<whitequark>
I'm tempted to try it with or1k because it's already there and in a good state, and see how painful it is
<sb0>
on the other hand, a file selector that would not suck clearly would be a user-visible advantage
<sb0>
the kde one is okay, but probably hell to integrate
<sb0>
on windows and all
<whitequark>
might not be that bad actually, but what's wrong with the system one?
<sb0>
I want to customize it in two ways: 1) it should not be a dialog but a permanent part of the application window 2) large icons used as previews (rendered by my application)
<sb0>
the system one supports neither
<whitequark>
I don't think you should base it off the file selector at all, then
<whitequark>
mor1kx doesn't even bother to implement the extension instructions
evilspirit has joined #m-labs
<whitequark>
well, four out of six
sb0 has joined #m-labs
evilspirit has quit [Ping timeout: 244 seconds]
<whitequark>
rjo: sb0: wow.
<whitequark>
the 64-bit addc changes have had a *massive* effect, far more than I have anticipated
<whitequark>
specifically, PulseRateDDS is down to 20us
<whitequark>
so... 10us per channel? that's actually better than what the Oxford group wants, isn't it?
<whitequark>
uh
<whitequark>
what
<whitequark>
*enabling* addc while building the runtime makes the test faster, but *disabling* addc while building the kernel *also* makes the test faster?
<whitequark>
a little bit, but it does
<whitequark>
yeah, there's a pretty large amount of l.addic's in dds.o, and a few in rtio.o, i think most of them are dead though
<whitequark>
i wonder what's up with addc slowing down the kernel though
key2 has joined #m-labs
<rjo>
whitequark: it is what i suggested they can get with drtio. 10us is a useful number for a pulse (dds set and ttl pulse combined). but there will be overhead when actually doing them.
<whitequark>
yes, if you change phase you will immediately have FP in the loop
<whitequark>
(do you change phase?)
<rjo>
all the time
<whitequark>
or if you set phase mode to not continuous, there will be a bunch of 64-bit multiplications in dds_set
<rjo>
not only that but also the overhead of retrieving the pulse data etc. this is not just repeating the same pulse over and over again.
<rjo>
but we really need to leave that for later imho.
<rjo>
now we should prioritize and say that 10us for repeating the same pulse without phase tracking is good.
<whitequark>
there's the 64-bit multiplier in the ISA but not in mor1kx...
<rjo>
out of curiosity. how did that help for pulse rate ttl?
<rjo>
a 64 bit multiplier would need to be either multi-cycle or bring down the clock speed a lot.
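[Editor's note] For context, a sketch of how a 64-bit multiply is commonly decomposed when only 32-bit multiplies are available. This is an illustration in Python, not the code the runtime or the compiler's support library actually uses. The function name is hypothetical. It assumes a 32x32 multiply that yields a full 64-bit product for the low halves.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mul64_from_32(a, b):
    """Low 64 bits of a * b, built from 32-bit halves."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    # Full 64-bit product of the two low halves.
    low = a_lo * b_lo
    # Cross terms land in the upper word, so only their low 32 bits matter.
    cross = (a_hi * b_lo + a_lo * b_hi) & MASK32
    # a_hi * b_hi would be shifted by 64 bits and is dropped entirely.
    return (low + (cross << 32)) & MASK64
```

That is three multiplies and a few additions per 64-bit product, which suggests why several such multiplications in a hot path like dds_set are noticeable on a core without a wide multiplier.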
<whitequark>
it didn't. the ttl pulse rate is 1484ns
<whitequark>
pretty much what it was before I started messing with FP, LICM, etc
<whitequark>
this, on the other hand, I actually expected
<rjo>
hmm. there should be heavy 64 bit stuff in there as well.
<whitequark>
hrm, RCA is inconclusive for that addc slowdown, but probably register pressure
<whitequark>
in any case it's 40ns
kuldeep has quit [Ping timeout: 248 seconds]
key2 has quit [Ping timeout: 244 seconds]
kuldeep has joined #m-labs
kuldeep has quit [Client Quit]
kuldeep has joined #m-labs
_rht has quit [Quit: Connection closed for inactivity]
klickverbot has joined #m-labs
klickverbot has quit [Ping timeout: 250 seconds]
<whitequark>
sb0: from discussion on #llvm: "like, moving from OR1K to RISC-V would be like moving from a trash can fire to a larger, dumpster-sized fire"
klickverbot has joined #m-labs
<cr1901_modern>
Surprised. I was under the impression that RISC-V was the most popular out of LM32,OR1K,and RISC-V. But then again, most popular != best.
<cr1901_modern>
(I've been told that "one reason LM32 is ignored is that it's 32-bit only")
<cr1901_modern>
although I seem to recall that data width is adjustable? *checks*
<whitequark>
datapath width is not really the same as register width
klickverbot has quit [Quit: No Ping reply in 180 seconds.]
klickverbot has joined #m-labs
<cr1901_modern>
Yea, I'm not sure where I was going with that in retrospect.
<cr1901_modern>
sb0: Ping.
<whitequark>
rjo: sb0: we can't use overflows in OR1K.
<whitequark>
none of the OR1K shifts set overflow (or carry, for that matter) bits
<whitequark>
LLVM will transform *2 into <<1 in instcombine (and do other similar things)
<whitequark>
which is, of course, not only legal but desirable.
<whitequark>
not only will this make code *much* slower, but I also don't think that optimization can even *be* turned off; it's considered target-independent
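[Editor's note] A small Python model of the problem described above. It is not LLVM code, and the helper names are hypothetical. A signed 32-bit multiply by 2 can overflow, but once the multiply has been rewritten as a left shift, an instruction that produces only a wrapped result and no flag gives a flag-based check nothing to test.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def mul_overflows(a, b):
    """True if a * b does not fit in a signed 32-bit integer."""
    return not (INT32_MIN <= a * b <= INT32_MAX)

def shl32(a, n):
    """Model a 32-bit shift left: wrapped result only, no overflow flag."""
    r = (a << n) & 0xFFFFFFFF
    return r - 2**32 if r & 0x80000000 else r
```

Here mul_overflows(0x40000000, 2) reports an overflow, while shl32(0x40000000, 1) silently wraps to the most negative 32-bit value.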