clifford changed the topic of #yosys to: Yosys Open SYnthesis Suite: http://www.clifford.at/yosys/ -- Channel Logs: https://irclog.whitequark.org/yosys
seldridge has quit [Ping timeout: 240 seconds]
mbock has quit [Quit: Leaving.]
GuzTech has quit [Ping timeout: 240 seconds]
<Exaeta> if I made my CPU only able to read and write memory by load and store instructions, would that be silly?
<Exaeta> that is to say, the vast majority of instructions could only operate on registers and not memory, except for some atomics perhaps
<sorear> that's how the vast majority of ISAs work, x86 is an outlier
<sorear> exactly how much research have you done?
<Exaeta> sorear: on non-x86 architectures?
<Exaeta> not very much
<sorear> ISAs, microarchitectures, electronic aspects
<Exaeta> I've made a virtual machine for a made up architecture before, but never done it using a FPGA.
<Exaeta> FPGA-wise I'm nooby. But I understand the fundamentals of logic gates pretty well.
<Exaeta> other than maybe timing issues
<sorear> i'm talking about more stuff like, pipelining
<sorear> MMUs and caches
<Exaeta> I know what they are, but haven't thought much about how to implement them in a FPGA
<Exaeta> I think I need a better handle on what they can do before I decide that.
<awygle> reading is always good
<Exaeta> I was thinking of a hybrid of sorts
<Exaeta> mostly, operations use registers, but a few will treat the input registers as pointers.
<sorear> don't underestimate the amount of work involved in porting software
dys has quit [Ping timeout: 264 seconds]
<Exaeta> sorear: yeah. I'd have to write an LLVM backend probably.
<sorear> what about gcc
<sorear> there's a lot of stuff that doesn't like gcc and a lot of stuff that doesn't like clang, you'll wind up needing both
<Exaeta> I don't really like GCC.
<sorear> do you feel like porting random projects to clang?
<Exaeta> First, writing a GCC backend would be harder than writing a clang backend.
<Exaeta> Clang is pretty decently documented.
<Exaeta> Also, I prefer clang's license.
<Exaeta> I used to be a fan of copyleft GPL, but then I realized... it doesn't matter.
<Exaeta> Projects like Mozilla firefox can be copyleft and still terrible.
<Exaeta> And stuff like Team Fortress 2 can respect your freedom and privacy even when they are closed source.
<awygle> wow so many things in the last four messages i don't agree with! i don't intend to argue the points but be aware your opinions are by no means universal lol
<Exaeta> Sure I can't edit the source code, but it's not bombarding me with ads and sending every web link I visit to Google.
<Exaeta> That's why I think Mozilla Firefox has become evil.
<sorear> as someone who has spent most of the past two years dealing with all of the shit that turns out to need small or large changes for __riscv, you do not want to repeat this work, especially as a one-person project
<Exaeta> First, I'm going to develop a new VM with a new architecture, and at the same time I will maybe experiment with FPGAs and see what they do well and don't do well.
<Exaeta> The VM will also let me test if my FPGA is giving the same results
<Exaeta> (this will help me figure out if bugs are porting bugs or verilog bugs)
<sorear> see, if you adopted riscv you'd eliminate most of the work :p
<Exaeta> sorear: Developing an instruction set is easy! Implementing it is the hard part.
<Exaeta> It would ruin the point if I don't have custom instructions.
<Exaeta> From time to time, I've encountered situations where data-structure X would be so much faster and better than data structure Y if only the CPU did this one thing quickly, but it doesn't.
<Exaeta> And also where's the fun in that? If I wanted a RISC-V CPU... I could buy one.
<sorear> that's the point, start with riscv and then add whatever suits your fancy
<sorear> as long as you don't remove anything, stuff that's been ported to riscv will still work
<sorear> and since there's so little, there's hardly any reason to remove anything
AlexDaniel has joined #yosys
<Exaeta> sorear: I'm also implementing a kind of odd ISA that's designed to be able to run quickly in a VM via JIT
<Exaeta> There is strong determinism, all instructions have fully defined behavior in all cases.
<Exaeta> Also, outputs don't conditionally write. They always write a deterministic output, with a few exceptions.
<sorear> let me know when you've read and can make sense of everything linked from http://inst.eecs.berkeley.edu/~cs152/fa16/
ZipCPU|Laptop has joined #yosys
<Exaeta> sorear: I am not that far into the design yet. But I know what most of that is already, with some exceptions.
<sorear> your question 53 minutes ago makes me not believe you at all
<Exaeta> sorear: there is no 53 minute old question
<sorear> 43
<Exaeta> yeah well I know a lot about intel chips in particular
<Exaeta> so I understand the general architecture of chips
<Exaeta> but not much about other chips
<Exaeta> hence why I didn't know if load/store architecture would be weird.
<Exaeta> I spent some time doing some very tiny optimizations.
<Exaeta> a certain amount of knowledge about how intel cpus worked came with that
<Exaeta> not necessarily implementation details, but what pipelining, cache, is, and how it behaves, etc.
<Exaeta> I don't think using RISC-V will make that part any easier though
SpaceCoaster has joined #yosys
<Exaeta> A course taught by a professor where topics are taught in a specific order isn't the only way to learn about CPUs
<sorear> of course, there are a lot of ways to learn, but until you admit that you have a lot left to learn you won't
<Exaeta> Of course I have a lot to learn, but there's not only one order to do things in.
<sorear> and I didn't say you have to use those slides, just that you have to be able to understand them. this is a test
<Exaeta> sorear: from what I can tell so far I understand them. I didn't know what a "snoopy cache" was in specific, but it's kind of obvious after reading the slides.
<ZipCPU|Laptop> sorear: As someone who has spent time making sure the compiler works for a new architecture, please allow me to underscore your opinions from earlier.
<ZipCPU|Laptop> Building a CPU is a lot of work. There are a *lot* of pieces to it. The compiler/toolchain is only one of them.
<ZipCPU|Laptop> There is something to be said for working on a project where others have already plowed the field for you. (i.e. RISC-V)
<Exaeta> I'm aware. I want experience with LLVM anyway though.
<sorear> by all means do it if you want to, but don't act like it's going to be the easy part
<ZipCPU|Laptop> I personally did not realize the amount of work involved when I started. After four months, I was excited to have a "working" CPU. Since I didn't have a compiler or proper assembler at that time ... you might argue about how "working" it ever was at the time.
<ZipCPU|Laptop> I'll also add on to sorear's comments by saying that, having read many articles and taken courses on CPU design, there were many parts I never really understood until I tried to build them.
<ZipCPU|Laptop> You sort of have to learn to be aware of the clock, to almost have an intuition for how much can be done in one clock tick, or how many logic elements are on a board (the 1k doesn't have much), and what you can do with them ...
<ZipCPU|Laptop> These don't come from a course.
ZipCPU|Laptop has quit [Ping timeout: 248 seconds]
Exaeta-mobile2 has joined #yosys
<Exaeta-mobile2> ZipCPU: figures. I can write an assembler in C++ in a couple hours at most though :P
<Exaeta-mobile2> LLVM backend might be a bit harder, but we'll see.
Exaeta-mobile2 is now known as Exaeta-mobile
m_w has quit [Quit: Leaving]
m_w has joined #yosys
promach has joined #yosys
<promach> ZipCPU: For https://github.com/promach/UART/blob/development/rtl/test_UART.v#L150 , why am I having the following warnings ?
<promach> Warning: Identifier `\i_data_index[7]' is implicitly declared at ../rtl/test_UART.v:150.
<promach> Warning: Identifier `\i_data_index[6]' is implicitly declared at ../rtl/test_UART.v:150.
<promach> Warning: Identifier `\i_data_index[5]' is implicitly declared at ../rtl/test_UART.v:150.
<cr1901_modern> >It would ruin the point if I don't have custom instructions.
<cr1901_modern> Back- okay, there really is little point in designing your own ISA in 2018*. All it would be is "yet another load store arch with bit patterns swapped around and possibly different widths for pc-rel jumps".
<cr1901_modern> Just use the unallocated opcode part of riscv if you want to test custom insns. You still need to teach gcc/binutils about your new insns if you want high level code to generate them, but >>
<cr1901_modern> You already have that problem anyway if you make your own ISA.
<cr1901_modern> *ZipCPU, none of this applies to you, considering you actually did all the work required :P
<cr1901_modern> (That being said, I'd actually be interested in benchmarking a RISCV impl w/ custom instructions for things such as, e.g. msp430's or ARM's constant generation to see if this has a meaningful effect on performance.)
<sorear> it comes down to are you doing boring things with your ISA (different addressing modes, bit patterns swapped around) or ~exciting things~ (it runs in ternary and doesn't have registers)
<cr1901_modern> ternary doesn't excite me. Nor do stack machines
<sorear> exciting in the sense of " 'can I write a compiler for this' is actually an open question"
<cr1901_modern> "Readable assembly language that doesn't take 10 insns to add 2 numbers from memory and store the result back in a third memory location" actually excites me
<cr1901_modern> Could be done in 1 line in 68k
<sorear> not in the sense of "this is actually going to advance the state of the art"
<sorear> er, vax can do that in 1, but doesn't m68k need at least 2
<cr1901_modern> m68k has a memory indirect mode
<sorear> m68k instructions only have two addresses
<cr1901_modern> Wait... yea I'm thinking something else, nevermind. Yea it requires 2 insns
<cr1901_modern> which is 5 times better than 10
<cr1901_modern> Hell even in 6502 it could be done in 3 insns, and that only has 1 addr per insn?
<cr1901_modern> lda $00
<cr1901_modern> adc $01
<cr1901_modern> sta $02
Exaeta-mobile has quit [Ping timeout: 240 seconds]
<awygle> since we're talking arches, what's the deal with POWER?
<awygle> esp. POWER9, i feel like people were psyched about it not that long ago
* cr1901_modern has no idea about power
<sorear> there's nothing inspiring about ppc64 per se, it's kind of interesting that IBM has a design team and fabrication resources for server CPUs without DRM bits
Exaeta-mobile has joined #yosys
mwk has quit [Ping timeout: 240 seconds]
mwk has joined #yosys
<Exaeta> So far this is what I have... https://www.docdroid.net/vY1FOEA/va64-1.pdf#page=3
Exaeta-mobile2 has joined #yosys
Exaeta-mobile has quit [Ping timeout: 276 seconds]
<Exaeta> just basic stuff... nothing on supervisor mode or such like that yet
<Exaeta> I guess I'll just wait until the 8K is delivered in the mail
<sorear> you've got a lot of undefined terms in here
<sorear> might be nice to have something somewhere that says what a "compression xor" is
ar3itrary has quit [Quit: No Ping reply in 180 seconds.]
ar3itrary has joined #yosys
<Exaeta> yeah I need to define that
<Exaeta> it's for finding square roots quickly
<Exaeta> and stuff like that
<Exaeta> basically
<Exaeta> should be compression or and not xor too >_>
<Exaeta> but
<Exaeta> 111000 compression-or -> 110
<Exaeta> 111000 compression-and -> 100
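A minimal sketch of the two operations, inferred only from the two examples above: each adjacent pair of bits is reduced to a single bit with OR or AND. The function names and the `width` parameter are hypothetical, not taken from the va64 document.

```python
def compress(value: int, width: int, op) -> int:
    """Reduce each adjacent bit pair of `value` to one bit using `op`.

    Assumption: pairs are (bit 2k+1, bit 2k), and the result for pair k
    lands in bit k of the output. This matches the two examples in the log.
    """
    if width % 2:
        raise ValueError("width must be even")
    result = 0
    for k in range(width // 2):
        lo = (value >> (2 * k)) & 1
        hi = (value >> (2 * k + 1)) & 1
        result |= op(hi, lo) << k
    return result


def compression_or(value: int, width: int = 64) -> int:
    # OR together the two bits of every pair
    return compress(value, width, lambda a, b: a | b)


def compression_and(value: int, width: int = 64) -> int:
    # AND together the two bits of every pair
    return compress(value, width, lambda a, b: a & b)


print(bin(compression_or(0b111000, 6)))   # 0b110
print(bin(compression_and(0b111000, 6)))  # 0b100
```

The loop form is only for readability; in hardware each output bit is a single two-input gate.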
<Exaeta> sorear: are there any other terms I didn't define?
<sorear> lea rax, [rbx*2]; or rax, rbx; pext rax, rax, qword ptr [mask]; ...; mask: dq 0xAAAAAAAAAAAAAAAA
<Exaeta> sorear: what
<sorear> it would be nice to know the actual size in bits of your registers and instructions
<Exaeta> Well. The instruction opcodes are unsized
<sorear> Exaeta: how to do a 'compression or' in 3 x86_64+BMI2 instructions
<Exaeta> Basically, the input/outputs are sized
<Exaeta> But there are 64 64-bit registers, I think? Not sure if I'll be able to fit that many on the FPGA though
<Exaeta> Address space might be smaller if it wont fit
<Exaeta> and/or registers
<Exaeta> 64 registers, probably 64 bits each.
<Exaeta> Hum. I think bitfield extract and write seem useful though.
<Exaeta> I should add something along those lines
<Exaeta> sorear: the instruction opcodes are 8-bit for now
<Exaeta> sorear: then the arguments (I/O) are 1 byte each, 2 bits for size, and 6 bits to choose between the 64 bit registers, except for instructions that take something other than %i and %o arguments
<Exaeta> 00 - 8 bit, 01 - 16 bit, 10 - 32 bit, and 11 - 64 bit
<sorear> how do sizes work
<Exaeta> then the next 6 bits choose the register
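A rough sketch of decoding one argument byte as described: 2 bits of size followed by 6 bits of register number. The log does not say which end of the byte holds the size field, so placing it in the top two bits is an assumption, and the helper names are hypothetical.

```python
# Size codes as listed in the log: 00 = 8, 01 = 16, 10 = 32, 11 = 64 bits
SIZE_BITS = {0b00: 8, 0b01: 16, 0b10: 32, 0b11: 64}
SIZE_CODE = {bits: code for code, bits in SIZE_BITS.items()}


def decode_operand(byte: int) -> tuple[int, int]:
    """Split an argument byte into (operand size in bits, register index)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("operand must fit in one byte")
    size_code = byte >> 6   # assumed: size lives in the top 2 bits
    reg = byte & 0x3F       # low 6 bits select one of 64 registers
    return SIZE_BITS[size_code], reg


def encode_operand(size_bits: int, reg: int) -> int:
    """Inverse of decode_operand."""
    if not 0 <= reg < 64:
        raise ValueError("register index out of range")
    return (SIZE_CODE[size_bits] << 6) | reg


print(decode_operand(0b10000101))  # (32, 5)
```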
<sorear> what is an "ordinary input channel"
<Exaeta> sorear: the argument opcode. not sure how to phrase that
<Exaeta> it only takes 1 argument but that's the mask of which registers to push
<Exaeta> so push %r1, %r2... wouldn't work
<sorear> what's a "page"
<Exaeta> sorear: haven't decided on page size yet.
<Exaeta> probably 4096 bytes
<sorear> are you familiar with the term "immediate operand"?
Exaeta-mobile2 has quit [Read error: Connection reset by peer]
<sorear> in at&t syntax % always denotes a register, immediates are $
<sorear> if push takes an immediate, you shouldn't describe it with a %
Exaeta-mobile has joined #yosys
<Exaeta> sorear: it takes a register
<sorear> it uses a register to select registers to push? eeew
<Exaeta> currently, the only instruction that can take an immediate is "load constant"
<Exaeta> sorear: the push instruction also writes the popmask to the stack, so you don't have to specify which registers to pop later.
<Exaeta> or wait no
<Exaeta> that's call
<Exaeta> Right now I am using 2 bits for sizes: 8, 16, 32, 64, but if I wanted to take immediates, I'd need 3...
<Exaeta> unless maybe there was a special register that was the immediate register
<Exaeta> that might work actually
<Exaeta> I can make r0 special and that will load an immediate.
<sorear> Exaeta: which registers are call-saved and call-used
<Exaeta> sorear: you have to pass a bitmask to the call instruction which saves whatever registers are in the bitmask
<Exaeta> so, everything is caller saved
<Exaeta> though, if someone wanted to use a different ABI, they could
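One way the mask-driven call could behave, as a sketch only: the call pushes every register named in the bitmask, then pushes the mask itself, so the matching return knows what to pop without being told. Return-address handling is left out, and the function names and stack layout are assumptions rather than anything from the va64 document.

```python
def call_save(regs: list[int], stack: list[int], save_mask: int) -> None:
    """Push each register whose bit is set in save_mask, then the mask."""
    for i in range(len(regs)):
        if (save_mask >> i) & 1:
            stack.append(regs[i])
    stack.append(save_mask)  # the "popmask" read back on return


def ret_restore(regs: list[int], stack: list[int]) -> None:
    """Pop the mask, then restore the saved registers in reverse order."""
    mask = stack.pop()
    for i in reversed(range(len(regs))):
        if (mask >> i) & 1:
            regs[i] = stack.pop()


regs = list(range(64))
stack: list[int] = []
call_save(regs, stack, 0b1010)   # save r1 and r3
regs[1], regs[2], regs[3] = 99, 55, 77
ret_restore(regs, stack)
print(regs[1], regs[2], regs[3])  # 1 55 3
```

Registers outside the mask (r2 here) keep whatever the callee left in them, which is what "everything is caller saved" implies.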
<sorear> can ordinary instructions write %rip
<Exaeta> sorear: yes
<Exaeta> but it would probably cause a segfault
<Exaeta> or maybe they shouldn't be able to. hum
<Exaeta> wait why do I even have jump instructions
<Exaeta> Yeah, they definitely shouldn't be able to write to it.
<Exaeta> CPU will rely on jump instructions for optimization hints
<Exaeta> You know what. I shouldn't use a stack pointer
<Exaeta> Instruction pointer and stack pointer should not be general purpose registers.
<sorear> is this thing supposed to run C
_whitelogger has joined #yosys
<Exaeta> sorear: well yes?
<Exaeta> is there a problem with it
<sorear> what does "stack pointer should not be general purpose" mean
<Exaeta> I decided to remove the stack pointer from the register set
<Exaeta> there's now a stack descriptor
<Exaeta> and you have an instruction specifically for reading/writing to the stack
<sorear> extern void B(char *Y); void A() { char X[200]; B(X); }
<sorear> how is that compiled. where do you allocate X
<Exaeta> I added ssa - stack size adjust
<Exaeta> that would be
<Exaeta> a: tgt; ssa $200; call $B, $0; ret;
<Exaeta> how you pass Y is up to you
<Exaeta> but it could be like
<Exaeta> hum
<Exaeta> a: tgt; ssa $200, %r0; sub %r0, $200, %r0; call $B, $0; ret;
<Exaeta> added an argument after ssa
<Exaeta> to output the new size
<Exaeta> this assumes the callee reads %r0 for your argument
<sorear> but %r0 doesn't exist anymore, you just redefined it as an immediate escape :p
<Exaeta> well, sorta. I decided to make %r60-%r63 the immediates
<sorear> superscalar decode is going to be a challenge with the widely varying instruction lengths and superscalar issue a challenge with your indirect register accesses
<Exaeta> actually you know what
<Exaeta> that'd actually require a memory allocation or semi-complex code, since the stack will be segmented
<Exaeta> sorear: and now you know why there is a different opcode for a same-page jump and a cross page jump
<Exaeta> and also why there is a jump target instruction
<Exaeta> instructions can't cross page boundaries
dys has joined #yosys
<Exaeta> the ultimate goal is that when instructions hit L1 instruction cache it can optimize an instruction page in microcode to do a large number of instructions per cycle
<sorear> ...
<Exaeta> it sounds crazy?
<Exaeta> the idea is to chain several multiplexers together and if a dependency graph can be found then it would be able to run them together in the same cycle
<Exaeta> the idea there being that many simple bitwise operations could be chained together to implement stuff like floats in hardware
<Exaeta> software, I mean
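To make the dependency-graph idea concrete, here is a small sketch that assigns each instruction a level in the graph: instructions on the same level do not depend on each other, and the highest level is the length of the longest chain that would have to settle within one cycle. It tracks only read-after-write dependencies and assumes registers can be renamed freely; the `(dest, sources)` tuple format is hypothetical.

```python
def schedule_levels(instrs: list[tuple[str, list[str]]]) -> list[int]:
    """Return the dependency level of each instruction, in program order.

    instrs: (destination register, [source registers]) per instruction.
    Level 1 means all inputs are available before the block starts.
    """
    ready: dict[str, int] = {}  # register -> level that produces its latest value
    levels = []
    for dest, srcs in instrs:
        level = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        levels.append(level)
        ready[dest] = level
    return levels


program = [
    ("r1", ["r2", "r3"]),  # r1 = r2 op r3
    ("r4", ["r1"]),        # depends on the first instruction
    ("r5", ["r2"]),        # independent of the first two
    ("r6", ["r4", "r5"]),  # joins both chains
]
print(schedule_levels(program))  # [1, 2, 1, 3]
```

The maximum level is what bounds the combinational depth of a chained group, so it is the number to watch against the clock period.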
<sorear> it sounds like a very weirdly scoped project
<sorear> what is your timeline for this
<sorear> also, uh, multiplexers eat power
<Exaeta> hum
<Exaeta> how much power?
<Exaeta> wait
<Exaeta> the yosys compiler doesn't implement latched logic
<Exaeta> I'm assuming that has something to do with it
<Exaeta> sorear: I don't have a timeline atm
<Exaeta> sorear: improved version
<Exaeta> there's one issue in that I want the stack to be segmented
<Exaeta> Hum
Exaeta-mobile2 has joined #yosys
Exaeta-mobile has quit [Ping timeout: 252 seconds]
GuzTech has joined #yosys
Exaeta-mobile has joined #yosys
Exaeta-mobile2 has quit [Ping timeout: 276 seconds]
GuzTech has quit [Ping timeout: 260 seconds]
cemerick_ has joined #yosys
pie__ has joined #yosys
ravenexp has quit [Quit: WeeChat 2.0.1]
ravenexp has joined #yosys
ravenexp has quit [Quit: WeeChat 2.0.1]
Exaeta-mobile has quit [Read error: Connection reset by peer]
Exaeta-mobile has joined #yosys
ravenexp has joined #yosys
ralu has quit [Ping timeout: 265 seconds]
mbock has joined #yosys
dys has quit [Ping timeout: 264 seconds]
dys has joined #yosys
ralu has joined #yosys
cemerick has joined #yosys
cemerick_ has quit [Ping timeout: 240 seconds]
cemerick_ has joined #yosys
cemerick has quit [Ping timeout: 264 seconds]
kc8apf__ has joined #yosys
TFKyle_ has joined #yosys
lansiir has joined #yosys
TFKyle has quit [Ping timeout: 264 seconds]
mbock has quit [Ping timeout: 264 seconds]
oldtopman has quit [Ping timeout: 264 seconds]
kc8apf has quit [Ping timeout: 264 seconds]
lansiir has joined #yosys
lansiir has quit [Changing host]
mbock has joined #yosys
Exaeta-mobile has quit [Read error: Connection reset by peer]
Exaeta-mobile has joined #yosys
cemerick_ has quit [Ping timeout: 256 seconds]
GuzTech has joined #yosys
pie__ has quit [Ping timeout: 260 seconds]
AlexDaniel has quit [Changing host]
AlexDaniel has joined #yosys
promach has quit [Ping timeout: 248 seconds]
Marex_ is now known as Marex
cfelton has joined #yosys
mbock has quit [Quit: Leaving.]
pie__ has joined #yosys
leviathan has quit [Remote host closed the connection]
leviathan has joined #yosys
pie__ is now known as pie_
<Exaeta> maybe I'll give up on segmented stacks
<Exaeta> will make it easier to write code for
AlexDaniel has quit [Ping timeout: 252 seconds]
cemerick_ has joined #yosys
m_t has joined #yosys
pie_ has quit [Read error: Connection reset by peer]
pie__ has joined #yosys
sklv has quit [Remote host closed the connection]
sklv has joined #yosys
sklv has quit [Remote host closed the connection]
sklv has joined #yosys
cemerick_ has quit [Ping timeout: 260 seconds]
leviathan has quit [Remote host closed the connection]
Marex has quit [Remote host closed the connection]
Marex has joined #yosys
Exaeta-mobile2 has joined #yosys
Exaeta-mobile has quit [Ping timeout: 265 seconds]
eduardo__ has joined #yosys
eduardo_ has quit [Ping timeout: 252 seconds]
maartenBE has quit [Ping timeout: 276 seconds]
maartenBE has joined #yosys
cemerick_ has joined #yosys
GuzTech has quit [Ping timeout: 240 seconds]
cemerick has joined #yosys
cemerick_ has quit [Ping timeout: 256 seconds]
cemerick has quit [Ping timeout: 240 seconds]
sklv has quit [Ping timeout: 255 seconds]
sklv has joined #yosys