hackerfoo has quit [Remote host closed the connection]
m_hackerfoo has quit [Remote host closed the connection]
Miyu has joined ##openfpga
Miyu has quit [Ping timeout: 246 seconds]
rohitksingh has quit [Ping timeout: 245 seconds]
hackerfoo has joined ##openfpga
m_hackerfoo has joined ##openfpga
<whitequark>
daveshah: poke
<daveshah>
whitequark: hi!
<whitequark>
daveshah: currently thinking about the best way to map decision trees in processes to lookup tables that output one-hot signals
<whitequark>
i feel like BDDs are definitely related in that their reduced form is good for eliminating irrelevant terms, but also i don't see any form of BDDs so far that would directly be a good fit
<whitequark>
for several reasons:
<whitequark>
1. in yosys processes you can end up with *several* cases in the tree simultaneously active. what's worse is that not only the decision trees are priority-encoded in that the first case wins, but they are *also* priority-encoded is that the last switch assigning a specific signal wins
<whitequark>
2. it would be nice to share one-hot functions between several parallel muxes (with some preprocessing/creative wiring), but that requires turning the usual BDD structure "inside-out" in a way
<whitequark>
so for example, in a typical BDD you would have two leaf nodes, 0 and 1, and you build up functions by branching on variables, where a more complex function can be split into several smaller ones that are effectively subtrees
<whitequark>
on the other hand, in the hypothetical structure i want, each node can potentially drive a signal, and indeed if an entire switch subtree chooses some particular value for a particular mux, you want to get the selection signal as close to the root as possible
<whitequark>
on the other hand, canonicity isn't that important because combining these structures isn't that useful
<whitequark>
any thoughts?
<daveshah>
Not really close to anything I've looked at before but I'll think about it
<whitequark>
I'm not even sure how to *build* this structure tbh
<whitequark>
but I suspect what might work is "pushing from the top"
<whitequark>
that is, there is a shared decision tree for all muxes, and each node can have *both* a set of mux selections *and* a set of edges
<daveshah>
Yes, I think that is starting to make sense
<whitequark>
each time you add a new node you look at whether the sub-switches have a variety of selections for any particular mux, or only one
<whitequark>
and if it's one, you pin it there, or if it's many, you push the decision to sub-nodes
cr1901 has quit [Ping timeout: 245 seconds]
cr1901 has joined ##openfpga
<mithro>
whitequark: A while back you had a page which discussed some of your theory behind nmigen but I can't find it right now...
<mithro>
whitequark: I felt like there was something else to that...
<daveshah>
Just to make sure I understand what is going on, what will the ultimate final inputs to the BDD be?
<daveshah>
e.g. if a case was selecting on a==5 would the input be a or a==5?
<daveshah>
With a being some kind of top level input
<whitequark>
daveshah: hmmm, do you mean like `case (a==5) 1: ... end` or `case (a) 5: ... end` ?
<daveshah>
Let's say the first
ZipCPU has quit [Ping timeout: 250 seconds]
ZipCPU has joined ##openfpga
<whitequark>
then the process would only ever see the result of a==5, which is a 1-bit $eq.\Y signal
<daveshah>
Which it seems like Yosys will create if you have a chain of if/else instead of a case
<whitequark>
indeed, and nmigen does that as well
<whitequark>
what I'm really interested in though is things like instruction decoders
<whitequark>
where for example you can have nested switches branching on the *same bits*
<whitequark>
boneless' decoder branches up to three times on the same \i_insn[15:0]
<whitequark>
it's not hard at all to handle if/else chains because it pretty much just becomes a tree of muxes and all downstream passes should handle that more or less optimally
<whitequark>
but the moment you get an FSM it becomes bad already
<whitequark>
and decoders are the worst
<whitequark>
daveshah: so I think I'm going to write some code that builds *some* decision diagram (not necessarily a good one), and then go from there, because,
<whitequark>
with BDDs, choosing variable order is very important, but how do you choose variable order? for example by looking at which choices affect the most primary outputs
<whitequark>
but to do this, I need to be able to quickly assess the impact of a particular decision on the primary outputs... which means I need to expand the priority encodings somehow, which means I need basically the same strcture I'm building
<whitequark>
or at least, that would be an easy way to do it, since, given this tree makes every decision explicit, evaluating the choices is as simple as cutting off some branches
<sorear>
it sounds like you’re hitting the “synthesizing decoders requires specialized tools” problem that, as you have observed, other vendors solve with spreadsheets and bespoke logic optimization scripts
<whitequark>
sorear: yeah but it's not just decoders
Asu has quit [Quit: Konversation terminated!]
<whitequark>
like, FSMs with rich conditionals inside them have the same problem
<whitequark>
also this is a stupid ass problem to have. you have a logic optimizer *right here*
<tnt>
I'm curious what you'll end up with and how much better than the naive/brute-force approach to build a giant truth table and minimizing that.
<whitequark>
tnt: "giant truth table" might work for 16-bit instructions but probably not for 32-bit ones
<tnt>
yeah, of course, it's not a universal approach, that's just the one I'm using atm :p
<tnt>
Really need to get back to it and put it online somewhere.
<whitequark>
tnt: strongly reminiscent of boneless, heh
<tnt>
whitequark: IIRC I took the register window concept from it, but other than that, it was actually mostly defined on paper before boneless. I guess the design space isn't that big :) Although there are quite a bit of fundamental differences like different data/program/io spaces.
<tnt>
(at least AFAIK, I haven't actually dug much into boneless)
<whitequark>
tnt: yeah, not saying you even borrowed anything
<whitequark>
it's just there are not so many ways to make a small cpu
<whitequark>
i'm curious if you would consider using boneless v3
<tnt>
I was definitely planning to look at it a bit closer. Especially because chances are you'll make a much better toolchain for it than I would for my softcore :)
<whitequark>
tnt: there is already the assembler and disassembler (both into text and structured data aka python code)
<whitequark>
the core, in theory, is functional, the FSM-driven one, but it does not have a formal spec yet, because it pisses me off how inefficient the yosys synthesis results are
<whitequark>
and I will not drive a pipelined core without a formal spec for sure
<tnt>
tbh when I saw you were working on one, I considered not even pursuing mine at all, but then I always wanted to try and design and implement one ever since I used and tweaked the picoblaze like 20y ago (damn I'm old) so I figured I would see it through.
<whitequark>
oh yeah, designing one really gives you a lot of insight into that
<whitequark>
for example... on a LUT architecture you don't really care about a prefix encoding
<tnt>
mostly because I'm targetting fmax of 48 MHz (to be synchronous with usb core) on a up5k and that's ... tricky and requires hacks that I didn't think you'd go for :)
<whitequark>
that's actually something I want to do.
<whitequark>
for the same reason, amusingly
<whitequark>
the entire instruction set is carefully designed such that the path between any two registers is at most a 16-bit adder + a 2-way 16-bit mux
<whitequark>
well, in theory, it doesn't seem to synthesize to that yet
<whitequark>
that's why boneless has so many multicycles, among other things