#yosys on 2017-10-20 — irc logs at freenode.irclog.whitequark.org

2017-10-15 10:00 clifford changed the topic of #yosys to: Yosys Open SYnthesis Suite: http://www.clifford.at/yosys/ -- Channel Logs: https://irclog.whitequark.org/yosys

00:03 beefok has quit [Client Quit]

00:17 beefok has joined #yosys

00:18 <beefok> is there a specific ice40 irc channel?

00:19 <ZipCPU|Laptop> beefok: Not that I know of. Is there something specific you are looking for?

00:19 <cr1901_modern> ZipCPU|Laptop: At least in the context of _temporal_ induction I think it's a better explanation. Reconciling temporal induction w/ "the induction I (was supposed to) learned in HS" >>

00:20 <cr1901_modern> is a question I've been meaning to ask on MathOverflow, tbh

00:20 <beefok> I'm using the iceCube2 tools, so I'm dealing with the Synplify Pro tools and I'm getting these odd errors

00:20 <beefok> (Thanks for the quick response)

00:20 <ZipCPU|Laptop> Can you post the errors in a gist and post the link to the gist?

00:21 <beefok> Yeah, sec! It's one error multiple times over, so that simplifies it lol

00:21 <beefok> For instance: :FX689 : cpu_next.vhd(64) | Unbuffered I/O u3.ma[12] which could cause problems in P&R

00:22 <ZipCPU|Laptop> Hmm ... ok ... can you post your code at all?

00:22 <beefok> there's 332 of these errors, for literally every in and out of each entity that aren't top level in my module

00:23 <beefok> Yeah that is pretty vague isn't it, sorry, sec

00:23 <ZipCPU|Laptop> Is your code relatively simple, or extremely complex?

00:23 <beefok> It's a full CPU design, but it's really not that complex

00:23 <ZipCPU|Laptop> Really? I'm interested --- which CPU?

00:23 <beefok> my own design :)

00:23 <ZipCPU|Laptop> Even better!

00:24 <beefok> I'm designing a video game console and every portion of it is FPGA based

00:24 <ZipCPU|Laptop> I love it!

00:24 <ZipCPU|Laptop> Let me know if there's anything I can do to encourage you.

00:24 <cr1901_modern> thoughtpolice: Ahhh, so _that's_ who you are... https://twitter.com/stdlib

00:24 <ZipCPU|Laptop> <shameless plug> Have you met my blog, zipcpu.com? </shameless plug>

00:25 <beefok> awesome, I think I have heard of zipcpu :D

00:25 <beefok> I could push everything up to my bitbucket in a sec

00:25 <beefok> I've been doing everything from scratch as a learning process

00:25 <thoughtpolice> cr1901_modern: I've been found!!!

00:26 <beefok> even did my own boards :D

00:26 <beefok> <shameless plug two> https://bitbucket.org/beefok/gameseed/wiki/

00:26 <beefok> the cpu design is old -- I've changed it since then

00:27 <beefok> it's all in deep WIP mode

00:27 <cr1901_modern> Saw your ABC tweet on my feed, thought "jeez, frequency illusion is _really_ hitting hard this week wrt "ppl doing formal verification""... then I realize "wait, doesn't someone in #yosys use Clash?"

00:28 <ZipCPU|Laptop> A 12-bit register machine?

00:28 <thoughtpolice> Well, I've been doing semi-formal methods for a while as a Haskeller. Just in a fairly different domain. (I used ABC for model checking functional cryptographic code previously, for example, so I'm somewhat familiar with it.)

00:28 <ZipCPU|Laptop> You're brave, beefok.

00:29 <beefok> yeah, haha

00:29 <beefok> It's a good middle between 8-bit and 16-bit

00:29 <beefok> it's odd, I know

00:29 <cr1901_modern> I don't even know what ABC really does other than "AIG means And-Inverter Graph"

00:30 <beefok> zip - https://bytebucket.org/beefok/gameseed/raw/7f5107ab78af552a3d336296d212d83faaed1ba4/gameseed_rev3_pcb.png

00:30 <beefok> assembled :D

00:30 <beefok> I wrote a little ntsc video generator for color - https://bytebucket.org/beefok/gameseed/raw/7f5107ab78af552a3d336296d212d83faaed1ba4/video_ntsc_color2.jpg

00:31 <beefok> anyway this is just playing around until I get the cpu done

00:31 <cr1901_modern> "Doing NTSC from discrete analog components" is a bucket list idea of mine

00:31 <beefok> it uses a cp2130 usb->spi interface for programming, I didn't realize how easy it would be

00:31 <ZipCPU|Laptop> You're not as odd as you might think. The ZipCPU originally was a 32-bit byte machine.

00:31 <ZipCPU|Laptop> I had no end of fighting with GCC over that issue.

00:31 <beefok> I think that makes more sense anyway

00:32 <ZipCPU|Laptop> Nice pictures, though.

00:32 <beefok> cr1901 -- it's oddly not too bad, though mine is all digital except a r2r dac + ntsc burst rate

00:32 <beefok> thanks!

00:33 <cr1901_modern> By burst rate, you just mean "the clock that generates colorburst isn't on the FPGA"?

00:33 <beefok> I have been scared of the idea of creating a gcc backend or llvm.. or

00:33 <thoughtpolice> cr1901_modern: That's mostly what you need to know about it! :D AIGs are simply an efficient representation for representing circuits in some problems, especially for boolean functions ("Any function that just outputs a 1 or a 0 for some input"). And that's good for formal verification, for example, to check if two circuits are equivalent: just create a boolean function f(x) = g(x) == h(x), which checks if two functions 'h'

00:33 <thoughtpolice> and 'g' give the same output for every input.

00:34 <beefok> yeah cr1901, or at least my fpga is clocked by the NTSC burst rate x 8, and then a pll doubles that so I can get 16 colors lol

00:34 <cr1901_modern> So basically like the NES

00:34 <beefok> exactly!

00:34 <beefok> the color generation is a johnson counter exactly like the NES

00:34 <thoughtpolice> You can represent that as a SAT problem, i.e. "is there any assignment to `x` where h(x) != g(x)". Connecting the output of two circuits and seeing if they're equivalent is a "miter" circuit (so miter circuits represent SAT problems, in a sense.)

00:34 <cr1901_modern> Each of the 16 colors is a different clock phase in the 3.58MHz signal

00:35 <beefok> yep

00:35 <cr1901_modern> And presumably you mux them to choose which one to show

00:35 <beefok> exactly

00:35 <beefok> works pretty cleanly
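
[Note] The color scheme beefok and cr1901_modern describe (a Johnson counter producing phase-shifted copies of the colorburst, with a mux picking one per color) can be sketched roughly as below. This is only an illustration under assumptions not stated in the log: an 8-stage counter clocked at 16x the colorburst frequency, and the function names are hypothetical. The actual design may differ.

```python
def johnson_states(n_stages):
    """Yield the 2*n_stages states of a Johnson (twisted-ring) counter."""
    state = [0] * n_stages
    for _ in range(2 * n_stages):
        yield tuple(state)
        # Shift by one stage, feeding the inverted last bit back into stage 0.
        state = [1 - state[-1]] + state[:-1]


def phase_waveforms(n_stages):
    """Return 2*n_stages square waves: every tap plus its inverse, one period each."""
    states = list(johnson_states(n_stages))
    taps = [tuple(s[i] for s in states) for i in range(n_stages)]
    inverted = [tuple(1 - b for b in w) for w in taps]
    return taps + inverted


def select_phase(waves, color_index):
    """Mux: pick one phase-shifted square wave by color index."""
    return waves[color_index]


# Assumed 8 stages, giving 16 phases of one colorburst period.
waves = phase_waveforms(8)
for k in range(len(waves)):
    print(k, "".join(str(b) for b in select_phase(waves, k)))
```

Each printed row is one period of the subcarrier, delayed by one fast-clock tick relative to the row before it, so choosing a row amounts to choosing a hue.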

00:35 <thoughtpolice> (That's how I used ABC for cryptography: I compile two different specifications, one in C and another that is very high level, to circuits, and then connect them with a miter and throw it in a solver)

00:35 <cr1901_modern> miter?

00:37 <beefok> zipcpu - I'm looking at SmallerC by alexfru as a way to get a system language on my system

00:37 <thoughtpolice> You can think of two circuits F and G, and a miter is just a circuit that does 'G(x) == F(x)'. So it's a circuit that just takes the output of two other circuits, and outputs 1 if they're the same and 0 if they differ.

00:38 <cr1901_modern> Ahhh... there's prob a way to combine that with temporal induction to do equivalence checking

00:43 <thoughtpolice> (FWIW, In the cryptography case there's not really a notion of sequential logic I had to deal with, really, you can formulate *that* problem purely as one of combinational logic, so in that case equivalence checking does not require lots of tricky stuff. So if you have combinational circuits, you can just take two, make a miter of them, and throw that directly to a SAT solver with very little effort, mostly)
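
[Note] The miter idea thoughtpolice describes for combinational circuits can be sketched as follows. This is a minimal illustration, not how ABC or Yosys do it: it enumerates every input instead of calling a SAT solver, and the three example circuits (`maj_a`, `maj_b`, `not_maj`) are made up for the demonstration.

```python
from itertools import product


def miter(f, g):
    """Build a circuit that outputs 1 when f and g agree on an input, else 0."""
    return lambda *bits: int(f(*bits) == g(*bits))


def find_counterexample(f, g, n_inputs):
    """Search all inputs for one where the miter outputs 0 (f and g differ).

    A SAT solver answers the same question without enumerating every input;
    brute force is used here only to keep the sketch self-contained.
    """
    m = miter(f, g)
    for bits in product((0, 1), repeat=n_inputs):
        if m(*bits) == 0:
            return bits
    return None


# Hypothetical circuits: a 3-input majority gate written two different ways.
def maj_a(a, b, c):
    return (a & b) | (a & c) | (b & c)


def maj_b(a, b, c):
    return (a & b) | (c & (a ^ b))


# A deliberately broken variant that forgets the (b & c) term.
def not_maj(a, b, c):
    return (a & b) | (a & c)


print(find_counterexample(maj_a, maj_b, 3))    # no differing input exists
print(find_counterexample(maj_a, not_maj, 3))  # an input where they differ
```

If no counterexample exists, the two circuits are equivalent; otherwise the returned input is a concrete witness of the difference.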

00:43 <cr1901_modern> That's why I was thinking of temporal induction :P

00:44 * cr1901_modern wishes he had 1642 followers, but that would imply being a FP wizard, which is _not_ going to happen

00:44 <thoughtpolice> Pretty useful trick in practice, though. I used it for equivalence checking but also doing things like finding hash collisions...

00:44 <thoughtpolice> (For reduced-round hash functions)

00:44 <qu1j0t3> cr1901_modern: there are other ways to get there. Do you already post cat/dog pics?

00:45 <qu1j0t3> cr1901_modern: i'd give you a pity follow but i think i already do follow you.

00:45 <cr1901_modern> qu1j0t3: https://www.youtube.com/watch?v=-PMOJwsH2pU

00:46 pie_ has joined #yosys

00:46 <cr1901_modern> And yes you already do

00:46 * pie_ lurks

00:46 <cr1901_modern> (btw, I don't follow more ppl b/c 286 is approx my Dunbar's Number on Twitter)

00:46 <qu1j0t3> awww cr1901_modern

00:47 <qu1j0t3> the audio on that sounds like a moth caught in a lightshade

00:47 <cr1901_modern> She's the Cutest Cat In The World :3. She just "showed up" one day, Dad fed her, and she never left :D

00:47 <qu1j0t3> she's lovely

00:47 <cr1901_modern> She's purring ._.

00:47 <qu1j0t3> <3.<3

00:47 <qu1j0t3> pie_: welcome to the cat video channel today we are reviewing cr1901_modern | qu1j0t3: https://www.youtube.com/watch?v=-PMOJwsH2pU

00:48 * pie_ ogles

00:48 <cr1901_modern> 10/10 for Cuteness

00:48 <pie_> https://www.youtube.com/watch?v=SaA_cs4WZHM

00:48 <cr1901_modern> 8/10 for Demeanor (she swats)

00:49 <qu1j0t3> nobody's perfect!

00:49 <qu1j0t3> i'm like a 3/10 for demeanor myself.

00:49 <cr1901_modern> But do you swat and bite?

00:50 * qu1j0t3 snerks

00:51 uelen has joined #yosys

00:57 beefok has quit [Quit: Page closed]

01:08 pie___ has joined #yosys

01:08 pie_ has quit [Remote host closed the connection]

01:10 <awygle> I am currently waiting to pick up my two new cats in the next like, five minutes

01:22 <promach> for yosys-smtbmc, what is the difference between smtc file and tpl file ?

01:22 <promach> both are used to describe assertions from what I can observe

01:40 ZipCPU|Laptop has quit [Ping timeout: 248 seconds]

03:09 eduardo__ has quit [Remote host closed the connection]

03:39 captain_morgan has quit [Remote host closed the connection]

03:39 captain_morgan has joined #yosys

03:39 mbuf has joined #yosys

04:18 uelen is now known as uelenbot

04:51 m_w has quit [Quit: leaving]

05:29 proteusguy has quit [Remote host closed the connection]

05:57 pie___ has quit [Read error: Connection reset by peer]

05:57 pie_ has joined #yosys

06:07 aw- has joined #yosys

06:19 m_t has joined #yosys

06:46 pie_ has quit [Ping timeout: 260 seconds]

07:18 leviathanch has joined #yosys

07:34 FabM has quit [Ping timeout: 240 seconds]

07:39 <promach> ZipCPU: what do you understand about the assertion input mechanism for cycle3 example ?

08:26 dys has joined #yosys

08:57 kmehall has quit [K-Lined]

08:57 _whitelogger has quit [K-Lined]

09:04 _whitelogger has joined #yosys

10:02 nrossi has joined #yosys

10:23 <promach> I have added "-wires" to the makefile command, yet I could not see the internal wires in the vcd file. What is wrong ?

10:46 aw- has quit [Quit: Leaving.]

10:49 proteusguy has joined #yosys

10:58 FabM has joined #yosys

11:21 sunxi_fan has quit [Read error: Connection reset by peer]

11:41 mbock has joined #yosys

12:36 mbuf has quit [Quit: Leaving]

12:55 pie_ has joined #yosys

13:05 <ZipCPU> promach: While I've taken the tools out for a drive, I haven't lifted the hood.

13:17 mbuf has joined #yosys

13:21 <promach> ZipCPU: huh ?

14:23 azonenberg_work has quit [Ping timeout: 240 seconds]

14:34 m_t has quit [Quit: Leaving]

15:30 mbock has quit [Quit: Leaving.]

15:42 nrossi has quit [Quit: Connection closed for inactivity]

15:52 mbuf has quit [Quit: Leaving]

16:05 <cr1901_modern> It might still be worth writing my blog post, but taking a more concrete approach...

16:06 <cr1901_modern> The most important "wall" for me getting started was I wasn't sure if I could trust the results until I understood what was going on under the hood

16:07 <ZipCPU> cr1901_modern: Does this mean that you can answer promach's question? I have no idea where to start.

16:08 <cr1901_modern> I'd have to see the example code to answer

16:28 <ZipCPU> I'm moving on to "proving" that the ZipCPU prefetches work. (I've got three ...) These will have to prove that the WB bus acts in a "reasonable" manner.

16:46 <cr1901_modern> thoughtpolice: Very strongly disagree w/ this: https://twitter.com/stdlib/status/921405564638375936 The ppl who think Haskell is simple are the ones who already know it

16:46 <ZipCPU> lol

16:47 <cr1901_modern> I think the perception should change from "Haskell is easy" to "Haskell _is_ in fact difficult, I'm willing to help you work through it"

16:47 <thoughtpolice> cr1901_modern: this is not the place to discuss it. But I do elaborate further in any case; I view this mostly as a failure on our part for numerous reasons

16:47 <thoughtpolice> You’re more than free to @ me of course :)

16:48 <cr1901_modern> Not in the mood to get 10 billion replies

16:49 <thoughtpolice> (Also to be fair, my twitter is like 95% shitposting/dumb jokes like that so you should read everything there with a large grain of salt. Or a tall drink.)

16:49 <cr1901_modern> I'm currently trying to _prune_ my following list, tbh :P

16:50 <thoughtpolice> cr1901_modern: But anyway, the joke was more that "If smart people fail at it but dumb people like me succeed at it, there's clearly some structural barriers inhibiting them, which is mostly our fault". I'd actually agree being more up-front about some difficulty is a good, honest thing to do! We sugar coat it a bit.

16:50 <thoughtpolice> I've been in the community for nearly a decade so I'm fairly frank and in tune with a lot of those vibes...

16:51 <cr1901_modern> I don't know if, for instance, formal verification is hard or I just had to see it presented a different way. I do know that it took me a while before I found the correct resources to start getting it.

16:51 <thoughtpolice> But it's a multi-faceted thing. Some pedagogy, some structural issues, perception, some technical issues, etc etc. Small cuts add up.

16:53 <cr1901_modern> So, my main oversimplified reason for not being a huge FP fan is I don't care for its reliance on a huge RT (read: GC)

16:53 <thoughtpolice> Formal verification is largely a field that in some ways is awakening from a long age of "near-complete irrelevance". I mean, it wasn't entirely irrelevant. But it wasn't nearly as accessible as it is now. Which is still pretty bad.

16:53 <thoughtpolice> Really bad, in fact.

16:53 <cr1901_modern> (This is not a reflection on clifford's and others' previous presentations or blog posts, btw. In retrospect, I understand his presentations fine.)

16:53 azonenberg_work has joined #yosys

16:54 <cr1901_modern> But I definitely _did_ need to see it a different way before it clicked

16:54 <cr1901_modern> And that "different way" was essentially "take clifford's examples from his 2016 talk on yosys-smtbmc, examine the smt2 output, and change some asserts/assumes, and see what happens"

16:55 <cr1901_modern> Essentially, this: https://twitter.com/thepracticaldev/status/720257210161311744?lang=en

16:56 <cr1901_modern> (Everybody should have a copy of the Kitten Book)

17:11 <awygle> cr1901_modern: this is one reason I'd like to see your blog post as well as ZipCPU's. You clearly have a different perspective (see the total lack of smt2 in Dan's post), and I'd like to see both (or ideally more than two)

17:13 <awygle> I too have a hard time learning anything without first convincing myself that the lower stack layers are sound

17:14 <cr1901_modern> awygle: https://gist.github.com/cr1901/a445ef31281e67a0cf286e149deaac41 This should help you get started then. It's most of my notes for the blog post

17:16 <qu1j0t3> cr1901_modern: it need not rely on a 'huge RT'. see Feeley's work on Scheme, for instance

17:17 <qu1j0t3> cr1901_modern: or TIL/ML

17:20 dys has quit [Ping timeout: 246 seconds]

17:21 <cr1901_modern> qu1j0t3: Which flavor of ML?

17:21 <thoughtpolice> qu1j0t3: Anything by Marc Feeley is worth looking at, to be fair.

17:21 <cr1901_modern> I mean, OCaml's pretty daunting

17:22 <cr1901_modern> (and what is TIL?)

17:22 <qu1j0t3> thoughtpolice: yeah

17:23 <qu1j0t3> cr1901_modern: TIL/ML was a research project on safe low level functional programming. It's a little hard to find, but worth reading. It's not a mature off the shelf thing, unfortunately. but i don't think there's anything inherent in FP that prevents it pushing to low levels. (that's the message of most research, it seems to me)

17:24 <qu1j0t3> and that research is ongoing, of course. Linear types for example

17:25 <cr1901_modern> Basically I want a "C alternative" that's portable to my vintage machines. I'm willing to relax the "no GC" requirement, but >>

17:25 <thoughtpolice> If you restrict your domain enough you can even do without things like garbage collection entirely. If you design it right.

17:25 <cr1901_modern> AFAIK, Haskell inherently requires a GC for its more powerful features

17:25 <cr1901_modern> though a region-based memory manager is in the works

17:26 <thoughtpolice> (e.g. Ur/Web, which can generate a GC-less web server from high level ML programs that use less RAM than bash with no GC, due to its restricted domain)

17:26 <qu1j0t3> i think over time we will see better solutions in this area, because we're always going to be able to do better static analyses in future, which benefits all targets. (plus the continual invention)

17:27 <cr1901_modern> I'm willing to relax the "no GC" requirement, but it needs to be something I can implement myself if no impls exist on my desired target

17:27 <cr1901_modern> (No Forth)

17:27 <thoughtpolice> cr1901_modern: There is no region based memory manager in the works. Such a change would be fairly invasive, but also, "region based" or "stack based" allocation makes a lot less sense in that context anyway (I used to work on the major Haskell compiler, so I'm quite familiar with it.)

17:28 <thoughtpolice> Mostly because the notions of "stack" and where it exists are different. (The stack actually exists... on the heap!)

17:28 <cr1901_modern> thoughtpolice: Well I mean, not all archs have a "stack". C would have to emulate them using linked lists on those archs

17:29 <cr1901_modern> thoughtpolice: Also: https://twitter.com/mrkgnaow/status/904000942315511809

17:29 <thoughtpolice> Linear types are something very different than region based memory management.

17:30 <cr1901_modern> I managed to combine them into one concept through a large amount of confusion ._.

17:30 <cr1901_modern> I thought "oh cool, linear types, that must imply region based"

17:31 <cr1901_modern> thoughtpolice: ""region based" or "stack based" allocation makes a lot less sense in that context anyway" Sorry, in what context?

17:32 <thoughtpolice> The context of something like GHC's implementation, I mean.

17:32 <thoughtpolice> In some other Haskell compiler it would make more sense, maybe. In GHC's design it would be... weird, I think.

17:33 <cr1901_modern> I see... well in any case, I'm willing to relax the GC requirement (though I would need a "don't GC here I know better than you!" command).

17:33 <thoughtpolice> (In fact there was a Haskell compiler that relied entirely on a region based memory management system at first, but they eventually reneged and added a GC for the general cases it couldn't handle. The compilation model was very very different, so this kind of decision was possible)

17:34 <cr1901_modern> hrm

17:34 <thoughtpolice> (This Haskell compiler also, coincidentally, produced kick-ass pure ISO C99 programs, that in some cases outperformed handwritten C benchmarks :)

17:34 <thoughtpolice> (When it worked. Which it did not always do.)

17:34 <cr1901_modern> Yea, I know... you can also write OCaml that outperforms C b/c of the assumptions OCaml is allowed to make that C can't

17:35 <cr1901_modern> I'm most interested in portability and "ease of rolling your own impl or porting a compiler if a compiler doesn't already exist"

17:35 <thoughtpolice> That's the theory but doing it in practice is immensely difficult! It's quite a different thing to see it actually work. :)

17:36 <cr1901_modern> And that includes my "vintage machines where LLVM is prob a poor fit"

17:36 <thoughtpolice> You can take some other tricks though, like metaprogramming to generate code, which is what some FP people do alternatively.

17:36 <thoughtpolice> ("Why write a fast FFT when I can write a program to generate a fast FFT for my specific case?")

17:36 <cr1901_modern> B/c I'm lazy and I don't like coding

17:36 <thoughtpolice> That's a popular approach that's been used several times in OCaml for example, like FFTW

17:38 m_t has joined #yosys

17:40 m_w has joined #yosys

17:42 <cr1901_modern> thoughtpolice: So my issue with GC is that 1. It becomes nearly impossible to reason temporally about how long a hot code snippet will take, since the working set

17:42 <cr1901_modern> of memory is always in flux. At least w/ cache/manual allocation after a few loops the working set will become reasonably stable.

17:43 <cr1901_modern> And 2. It's just not easy to write a good one

17:50 <awygle> cr1901_modern: curious, what makes you think llvm is a bad fit for vintage platforms?

17:51 <cr1901_modern> awygle: LLVM wants code to be swapped around in registers; on, say, 6502, you have... exactly three of those

17:51 <cr1901_modern> awygle: Actually give me a minute, had a long discussion about this some months back

17:54 <cr1901_modern> awygle: https://twitter.com/cr1901/status/858770699946795008

18:07 <awygle> cr1901_modern: interesting.

18:09 <qu1j0t3> yeah register poverty is a challenge for some compilers. i definitely had this issue on lcc.

18:10 <qu1j0t3> cr1901_modern: done much with sdcc (or vbcc)?

18:10 <awygle> sdcc maybe?

18:10 <awygle> Lol

18:10 <qu1j0t3> :)

18:11 <awygle> I guess you'd need sdllvm

18:15 <ravenexp> register poverty is not an issue for forth compilers

18:15 <ravenexp> they only need 2 or 3 of those

18:15 <ravenexp> let's rewrite the world in forth

18:34 <thoughtpolice> I think I used 4 or 5 in mine. Horribly bloated.

18:34 <cr1901_modern> qu1j0t3: vbcc doesn't support a feature I frequently use (multiline strings to be concatenated)

18:34 <cr1901_modern> It is thus a non-compliant ANSI C compiler and ANSI C is the bare minimum I support in my code :D

18:35 <ZipCPU> Did anyone ever read my count of how many registers other processors had? Most officially have 32, of which they can use about 24. RISC-V has another 66+ special purpose registers. OpenRISC has 65+ special purpose registers. LM32 has 10 special purpose registers, microblaze 25, NiOS 6.

18:35 <ZipCPU> Wow.

18:36 <cr1901_modern> sdcc doesn't support "structs as input args", though AFAICT, that's not at the parser level

18:39 <cr1901_modern> "RISC-V has another 66+ special purpose registers" Oh ffs

18:39 <cr1901_modern> ZipCPU: In my case I specifically meant just registers used as part of calculations

18:40 <ZipCPU> For that, most of the CPU's I examined declared that they had 32, and then artificially restricted the number they actually used to something closer to 24.

18:41 <thoughtpolice> ZipCPU: I noticed ZipCPU is suspiciously missing in that list :P

18:41 <cr1901_modern> ZipCPU would have no registers if he could feasibly implement it :)

18:41 <ZipCPU> No, it was presented. The ZipCPU has 1 special purpose register per mode, or two total.

18:41 <cr1901_modern> And 0 instructions

18:41 <thoughtpolice> Oh, I meant just in IRC just then. :)

18:41 <ZipCPU> Meh ... not quite. I've seen smaller forth/stack based CPU's.

18:42 <ZipCPU> As for general purpose registers, the ZipCPU has 14 per mode, for a total of 28. In reality, you can only use about 14 at a time.

18:43 <cr1901_modern> I'm not a huge fan of the "shadow register" approach to interrupts; A. what happens during nested exceptions? B. In practice, I find code using them (z80) confusing

18:45 <ZipCPU> Understood completely. The ZipCPU solves A by not allowing nested interrupts, and B, because of the shadow registers you don't really need assembly code to create an interrupt handler.

18:45 <ZipCPU> You can do it all in C.

18:46 <cr1901_modern> Doesn't matter if a *nix port isn't in your plans, but I can't imagine general purpose OSes would like that restriction

18:46 <ZipCPU> Well, reading the Linux device drivers manual *was* one of the reasons for only allowing a single interrupt level.

18:47 <cr1901_modern> lol

18:48 <ZipCPU> I mean ... seriously ... if no one's going to use it, then why support it?

18:49 <cr1901_modern> Probably breaks a device driver or two? And yes, I don't use nested ints either except to indicate something went horribly wrong

18:49 <cr1901_modern> still think it's a good idea to indicate that condition

18:52 <awygle> Huh, what are these 66+ special purpose registers in risc v? I haven't made it all the way through my book yet but so far I've only seen one (the pc)

18:53 <ZipCPU> I wasn't counting the PC.

18:53 <ZipCPU> Keep digging.

18:53 <ZipCPU> Oh, and as I recall, the manual allows for 500+ special registers, they just don't all need to be implemented.

18:53 <awygle> I am reading a chapter a night so I'll get back to you in about two weeks :-P

18:53 <ZipCPU> That's why I just say 66+.

18:56 <cr1901_modern> ZipCPU would be interesting to add to MiSoC/LiteX (I plan to do my own RISCV impl first)

18:57 <ZipCPU> Yeah, I think it would.

18:57 <ZipCPU> Be aware, though, the ZipCPU only supports the wishbone pipeline mode used in the B4 spec, not the more common B3 spec protocols.

18:58 <cr1901_modern> Oh, well I believe MiSoC peripherals have to support standard mode and that's it

18:59 * cr1901_modern would have to see a block diagram of ZipCPU's caches and WB bus to figure out whether it's feasible

18:59 <ZipCPU> Oh I'm sure it's feasible ...

18:59 <ZipCPU> The caches don't change the interface any either, so they don't show up on any of my diagrams.

19:00 <cr1901_modern> Ahhh, is it like lm32 where "caches are internal to the core", and "the WB bus connects to the caches"?

19:00 <awygle> Needs a WB 3-to-4 bridge

19:01 <cr1901_modern> (lm32 reserves the higher 2GB for "uncached access to I/O")

19:01 <ZipCPU> Not quite.

19:01 <ZipCPU> Caches are internal to the core, yes.

19:01 <ZipCPU> But the CPU has only one external port to the external wishbone bus.

19:02 <ZipCPU> This needs to run through a bridge, if you wish to connect it to WB-3 components.

19:02 <cr1901_modern> I see...

19:02 <cr1901_modern> ZipCPU: A LiteX port is more likely to occur, b/c the maintainer of MiSoC has not been open to changes recently

19:02 <ZipCPU> On the other hand, WB B3 runs at least 3x slower than B4, *and* it slows your overall clock down as well.

19:02 <cr1901_modern> well open to changes much*

19:02 <ZipCPU> So ... I wouldn't connect to B3 components if I didn't have to.

19:03 <cr1901_modern> I don't know what spec MiSoC implements. I don't even think lm32 implements pipelined mode b/c ideally it's "talking to the cache" most of the time

19:03 <thoughtpolice> awygle: There are just a lot of RISC-V CSRs, basically. The base set has only a few but the extensions add a bunch

19:04 <cr1901_modern> Wishbone is... not my favorite. I find the spec very difficult to grok and ambiguous in some places (particularly wrt widths/granularity)

19:04 <ZipCPU> Yeah, see ... it makes no sense to connect a "cache" to the outside separate from the CPU. The cache should be integrated into the CPU, leaving only a standard bus to connect.

19:04 <thoughtpolice> The variable width Vector Extensions ("V" extension) add a ton of CSRs just on their own, because the vector unit has to be configured properly.

19:04 <ZipCPU> cr1901_modern: When comparing WB to other busses, AXI, Avalon, etc., I've always found the WB easier to understand. However, the WB lm32 implements is ... a different WB beast from what I'm doing.

19:05 <cr1901_modern> I don't understand...

19:05 <ZipCPU> Go on.

19:05 <rqou> "the maintainer of MiSoC has not been open to changes recently"

19:05 <rqou> you mean sb0?

19:06 <cr1901_modern> Yes, but not everyone knows who he is

19:06 <rqou> hmm, i didn't expect that he wouldn't want changes

19:07 <cr1901_modern> What I'm getting at is "sb0 doesn't really want to extend MiSoC or Migen unless it's a new platform or something that is essentially zero maintenance burden or zero invasive changes"

19:07 <rqou> ah that makes a lot more sense

19:07 <cr1901_modern> ZipCPU: I mean, lm32 and ZipCPU both implement the wb spec

19:07 <cr1901_modern> why would they be totally different beasts? Does lm32 completely violate the spec?

19:07 <ZipCPU> Yes, but lm32 depends on a large number of "optional" registers.

19:08 <ZipCPU> ZipCPU stripped the bus back down to the bare minimum.

19:08 <cr1901_modern> by registers you mean "optional bus signals?"

19:08 <ZipCPU> The spec allows for such things as (IIRC) a cycle type indicator (CTI), a burst length indicator, etc.

19:08 <ZipCPU> That's one difference.

19:09 <cr1901_modern> Idk if those signals are used in practice in MiSoC

19:09 <ZipCPU> The other difference I've wrestled with has to do with the bus reset line.

19:09 <ZipCPU> ZipCPU allows the reset to be asserted at any time during a transaction. lm32 insists on one ACK or one RESET response to every request.

19:10 <ZipCPU> My problem with the RESET line is that in my implementations, RESET is asserted by the bus interconnect--not the peripheral.

19:10 <ZipCPU> The interconnect often doesn't know how many requests are outstanding.

19:10 <ZipCPU> You get the picture.

19:11 <ZipCPU> I either need to upgrade my interconnects, or ... continue to do things subtly different.

19:11 <cr1901_modern> But... in MiSoC, the peripherals _don't_ assert the reset line

19:11 <ZipCPU> On the other hand ... I doubt the CPU cares if the bus interconnect is done "better", such as the lm32 would use.

19:11 <cr1901_modern> there is a clock-reset-generator IP specifically for this

19:11 <ZipCPU> cr1901_modern: That's a good start, but consider the following scenarios:

19:12 <ZipCPU> 1) You access the last address of the flash. The controller accepts the request, and starts working.

19:12 <ZipCPU> 2) on the next clock, you access non-existent memory. The interconnect creates and returns an error to the CPU before the flash returns the error.

19:12 <ZipCPU> 3) Sometime later the flash generates a response ...

19:13 <ZipCPU> Currently, I deal with that by terminating the entire bus transaction--hence the "flash read" request would fail.

19:13 <ZipCPU> I leave it to the programmer not to cross devices--something which is a subtle bug in the WB spec.

19:14 <cr1901_modern> Don't you have to _wait_ for the flash to generate a response before accessing the next memory location?

19:14 <cr1901_modern> it should be holding ACK low until the flash is ready

19:15 <cr1901_modern> (or is that the point of pipelined mode that "bus transactions can happen out of order"?)

19:15 <ZipCPU> See ... that's what I like about B4 ... you can keep making requests even if the peripheral is still working on the first one.

19:15 <ZipCPU> Bus transactions are not supposed to be able to happen "out of order".

19:16 <ZipCPU> It just works in a pipeline fashion ... requests go into a pipeline at one speed, acks come out later after (potentially) being delayed by many clocks.

19:16 <cr1901_modern> Then why bother making transactions if the first one isn't finished? Sure, you can store them in a queue, but unless that queue gets flushed, it will eventually get full

19:16 * cr1901_modern is afk for now, sorry

19:17 <ZipCPU> In the middle of a conversation?

19:17 <ZipCPU> Yes, they get stored in a queue--spread throughout your device as timing demands.

19:17 <rqou> does your CPU take a precise exception when a bus transaction errors?

19:17 <ZipCPU> If the queue ever fills, you then need to stall the bus master so it doesn't make any more requests.

19:18 <ZipCPU> rqou: Excellent questions! The answer: Not currently.

19:18 <ZipCPU> On the other hand, I don't support virtual memory (yet).

19:18 <rqou> I figured :P

19:18 <ZipCPU> With virtual memory, I'll have to go back and implement precise exceptions.

19:19 <rqou> so precise page faults, imprecise bus aborts?

19:19 <ZipCPU> I'll probably make them both precise.

19:19 <ZipCPU> That means I'm going to need to upgrade my interconnect too ... doable, just haven't done it.

19:21 <rqou> hmm, is there a wishbone b3->b4 changelog?

19:21 <ZipCPU> The B4 spec discusses the differences between what it calls "classic" mode and "pipelined" mode. They are easy enough to convert between.

19:22 <ZipCPU> The real performance hit takes place when accessing, say, DDR3-SDRAM.

19:22 <ZipCPU> The MIG gives you a latency of about 20 clocks or so.

19:22 <ZipCPU> Now, if you use WB B3, your cost will be 20 clocks *for* *every* *word* *read*.

19:23 <ZipCPU> Using WB B4, your cost can be 20 clocks plus the number of items read, N+20 vs 20N

19:23 <ZipCPU> That's a *BIG* performance difference.
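
The N+20 vs 20N comparison works out as follows. This is a minimal sketch of the arithmetic only: the function names are made up for illustration, and the 20-clock figure is the approximate MIG latency quoted above, not a measured value.

```python
def classic_read_cycles(n_words, latency=20):
    """B3-style cost as described above: every word pays the full latency."""
    return latency * n_words


def pipelined_read_cycles(n_words, latency=20):
    """B4 pipelined cost as described above: pay the latency once,
    then one clock per word read."""
    return latency + n_words if n_words > 0 else 0


# Compare the two costs for a few burst lengths.
for n in (1, 8, 64):
    print(n, classic_read_cycles(n), pipelined_read_cycles(n))
```

For 8 words that is 160 clocks against 28, and the gap widens as N grows.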

19:23 <rqou> hmm, i thought wb b3 also had a burst mode?

19:24 * ZipCPU opens up his spec ...

19:25 <rqou> there's a block transfer mode

19:25 <ZipCPU> It has a block read/write mode ... is that what you are talking about? That mode has the problem I just described--you can't move on to the next request until the last one is returned.

19:26 <rqou> right, but the next request can return immediately if it was indeed for the next address

19:26 <ZipCPU> No.

19:26 <rqou> why not?

19:26 <ZipCPU> Check out the diagrams on pages 26 and 27 of https://github.com/ZipCPU/zipcpu/blob/master/doc/orconf.pdf

19:27 <ZipCPU> P26 matches the diagrams in B3 spec, P27 shows the pipeline difference.

19:28 <ZipCPU> Ok, P26 doesn't *quite* match ... Under B3, the address and data lines need to be held until the ack is also true.

19:29 <rqou> i don't see how P27 violates the B3 spec?

19:29 <ZipCPU> See page 51 of the B3 spec. The ACK comes back while the strobe is still high. The strobe then needs to be dropped for a cycle, before being raised again. Once raised, it will take a minimum of one clock to ack the next cycle.

19:30 <rqou> but page 54 shows two back-to-back reads

19:31 * ZipCPU turns to page 54

19:32 <ZipCPU> Yeah, I see what you are talking about ...

19:32 * ZipCPU strokes his beard ...

19:33 <ZipCPU> Here's the thing ... what if the ACK doesn't come back for many cycles? The bus is stalled.

19:33 <rqou> yes

19:33 <ZipCPU> How would this work for a DDR3 SDRAM, which can't accomplish the read for 20+ cycles?

19:34 <ZipCPU> BTW ... I see what you are talking about, and sit here at least partially corrected ... ;)

19:34 <rqou> so the first ack is delayed 20+ cycles, and subsequent acks appear instantly

19:34 <rqou> (assuming the address matches)

19:34 <ZipCPU> While that might work for a write, it won't work for a read.

19:35 <rqou> why not?

19:35 <ZipCPU> The master can't change the address until the ACK returns valid.

19:35 <ZipCPU> The ACK can't return valid, until the returned data is valid.

19:35 <ZipCPU> Hence, you're stuck doing each read individually.

19:36 <ZipCPU> There's another problem as well: fanout.

19:36 <ZipCPU> A bus tends to have very high fanout, and so it can easily slow down the speed of your circuitry.

19:36 <rqou> so the first address is held for 20+ cycles, and then the master changes the address. the memory controller has already prefetched the next address, and it checks if the new address from the master matches what it prefetched

19:36 <ZipCPU> If you can place delay stages within that, you can keep your speed up.

19:37 <rqou> i don't see why this doesn't work

19:37 <ZipCPU> Because some slaves have consequences when an item is read. You don't want to pre-read from those slaves before you know that's what you'll need.

19:37 <ZipCPU> Ahm ... side-effects is a better term than consequences.

19:38 <rqou> right, but the memory controller can

19:38 sklv1 has joined #yosys

19:38 <ZipCPU> Then you'd need two separate controllers.

19:38 <rqou> why?

19:39 <ZipCPU> Let's slow down for a moment ... I can see how memory or flash could pre-read. Cool.

19:39 <ZipCPU> But what about reading from a peripheral? Say an A/D buffer, where everything's at the same address?

19:39 <ZipCPU> You can't pre-read if you don't know you are going to remain at that address ...

19:39 <rqou> you can't pre-read in that case

19:39 <ZipCPU> Besides, that also places "address update" circuitry into multiple places within your design.

19:40 <ZipCPU> But ... what about the fanout issue?

19:40 <rqou> these two issues still exist :)

19:40 <ZipCPU> (Oh, and to deal with pre-reading, the lm32 IIRC implements several TAG lines ... so it knows ahead of time what to read.)

19:40 <ZipCPU> You don't need to do that with B4/pipeline.

19:41 sklv has quit [Ping timeout: 248 seconds]

19:42 <rqou> hmm, the only difference i see is whether address needs to remain valid?

19:43 <ZipCPU> If the address needs to remain valid, that implies to me a round-trip between the master and slave.

19:43 <ZipCPU> See, this is what the cycle-type indicator was meant to handle within B3.

19:44 <ZipCPU> It gives you the ability to support several different cycle types, but it also complicates the processing within the slave.

19:45 <rqou> hmm, the spec seems very unclear, but it seems to me that pipelined back-to-back reads are still possible on b3

19:48 <ZipCPU> If I read the spec properly, that's discussed in chapter 4: WB registered feedback.

19:48 <rqou> hmm, wishbone seems to conflate "burst" and "atomic"

19:48 <ZipCPU> Registered feedback requires a cycle-type indicator.

19:49 <ZipCPU> This means you have to know, before accessing whatever, that you'll be doing a multi-read cycle.

19:49 <ZipCPU> While this may make sense for a cache, it's an arbitrary requirement otherwise.

19:51 <rqou> huh, i missed that

19:51 <rqou> yeah, that seems somewhat unnecessary

19:52 <ZipCPU> The cool thing is ... the ZipCPU, when I last evaluated Dhrystone, had a *really* awesome performance--for a data-cacheless CPU.

19:52 <ZipCPU> Why? Because it exploited the pipeline bus access capabilities of B4.

19:54 <rqou> hmm, that just sounds like "i used prefetching"

19:54 <ZipCPU> Heheh ... no, it's more than that ;)

19:54 <ZipCPU> Consider write delays. How fast can you write using WB/B3?

19:54 <ZipCPU> How many cycles will each write take?

19:55 <rqou> ah, you have write buffering too

19:55 <ZipCPU> Let's suppose you have a write-through cache ...

19:55 <ZipCPU> So every write to memory goes to the bus.

19:55 <ZipCPU> How many cycles will it take to write N consecutive items to the bus?

19:57 <rqou> so you basically have a one-line cache :P

19:57 <rqou> (of course the implementation looks nothing like a cache)

19:59 <rqou> ZipCPU: are there hazards when writing to an address and then immediately reading from it?

19:59 <ZipCPU> Probably. I don't usually do that, though.

20:00 <rqou> when i was doing a cpu, hazards and branching made up 90+% of the bugs

20:00 <ZipCPU> "usually"? Actually, I don't do that at all.

20:00 <ZipCPU> ^^: +1

20:00 <ZipCPU> Same here.

20:01 <rqou> the first opcode i implemented was actually "jump and link" for this reason :P

20:01 <ZipCPU> Really? Gosh, I think I started with the ALU, and only learned the lesson the hard way.

20:03 <rqou> ok, i had "add" as well because that's pretty trivial

20:03 <rqou> one step away from turing completeness i believe (needs compare)

20:03 <rqou> :P

20:04 <ZipCPU> Add + compare + LR = completeness? No requirement for memory reading/writing?

20:07 <rqou> supposedly "subtract and branch if less" is Turing complete by itself

20:08 <ZipCPU> Sigh ... I'm losing my respect for turing completeness the longer we chat on this topic. ;)

20:08 <rqou> actually you're right, this has to modify memory

20:08 <rqou> but it doesn't require registers :P
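
The one-instruction machine rqou mentions is usually known as "subleq" (subtract and branch if less than or equal to zero); assuming that is the variant meant here, a minimal interpreter sketch looks like this. The function name, the memory layout, and the convention of halting on a negative program counter are choices made for the illustration.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Run a subleq program in place; halt when pc goes negative.

    Each instruction is three memory cells a, b, c:
        mem[b] -= mem[a]; if mem[b] <= 0: jump to c, else fall through.
    Code and data share one memory, and there are no registers.
    """
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# Hypothetical program: add A (cell 9) into B (cell 10) using scratch Z (cell 11).
program = [
    9, 11, 3,     # Z -= A   (Z becomes -A)
    11, 10, 6,    # B -= Z   (B becomes B + A)
    11, 11, -1,   # Z -= Z   (Z becomes 0, then jump to -1 to halt)
    7, 5, 0,      # data: A = 7, B = 5, Z = 0
]
result = run_subleq(program)
```

Every step reads and writes memory, which matches the point that the instruction has to modify memory but needs no registers.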

20:09 <ZipCPU> "hazards and branching made up 90% of the bugs" ... I like that. I'm going to try to remember it, so that I might quote you later on it. It's just so true.

21:23 <qu1j0t3> cr1901_modern: it seems you could submit a patch? they respond to contacts

21:28 ekiwi has joined #yosys

21:37 m_t has quit [Quit: Leaving]

21:42 <rqou> o/ ekiwi

21:42 <rqou> i see your berkeley.edu reverse-dns

21:48 oldtopman has quit [Ping timeout: 258 seconds]

21:54 oldtopman has joined #yosys

22:22 <cr1901_modern> qu1j0t3: I've also asked him privately "would you accept X" and most of the time the answer is "it depends"

22:23 <cr1901_modern> Also, back, but kinda don't really have spoons for convo right now :(

22:27 leviathanch has quit [Remote host closed the connection]

22:53 <ZipCPU> YES!! It's suppertime here, but I just managed to finish proving all three of my ZipCPU pre-fetch modules!

22:55 <qu1j0t3> cr1901_modern: *nod*

22:55 <qu1j0t3> ZipCPU: nice

22:55 eduardo has quit [Quit: Ex-Chat]

23:30 azonenberg_work has quit [Ping timeout: 248 seconds]