<mithro>
whitequark: What do you need the "returns precise location information for every token" stuff for?
sb0_ has joined #m-labs
<sb0_>
larsc, are you familiar with the way sdio device probe works in the linux kernel?
<sb0_>
I'm trying to backport brcmfmac to 3.13, which doesn't have some show-stopper acpi/emmc bugs that later kernels have
<sb0_>
got it to compile, now that stupid thing won't attach to the sdio device
<sb0_>
this sdio stuff is particularly obscure (and buggy). there isn't even a 'lssdio' command like there is for pci and usb...
sj_mackenzie has quit [Remote host closed the connection]
* sb0_
is eagerly waiting for the day operating systems stop sucking
<mithro>
sb0_: That will happen shortly after hardware stops sucking :P
<whitequark>
mithro: py2llvm error reporting
<mithro>
whitequark: what is py2llvm? (what is it being used for?)
<whitequark>
mithro: compiling computational kernels from python to LM32 assembly, which is then uploaded to a softcore on an FPGA
<whitequark>
in essence python is used as a DSL
<whitequark>
or more a subset of python
<sb0_>
whitequark, it's compiling to or1k right now (the lm32 llvm backend needs some work...)
<whitequark>
oh
<whitequark>
well, no big difference
<sb0_>
yeah, and mor1kx has lost some weight. the or1k ISA/ABI isn't very clean and I'd still prefer LM32, but it's not an important detail.
<sb0_>
there's also *ahem* risc-v, but to date no one has published a usable implementation, just a lot of hot air...
<sb0_>
and some ridiculously unusable code
<sb0_>
...seriously, who'd use a FSM to schedule a CPU with a CPI of >3, for example
<larsc>
sb0_: you can see all detected devices in /sys/bus/sdio/devices/
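A rough stand-in for the missing 'lssdio', assuming the sysfs layout larsc describes: each entry under /sys/bus/sdio/devices/ is one SDIO function. The `vendor`, `device` and `class` attribute file names are an assumption about what the kernel exposes there, and `list_sdio_functions` is a hypothetical helper name. This is a sketch, not a tested tool.

```python
from pathlib import Path


def _read_attr(entry, attr):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return (entry / attr).read_text().strip()
    except OSError:
        return None


def list_sdio_functions(root="/sys/bus/sdio/devices"):
    """Return (name, vendor, device, class) for every entry under `root`.

    An empty list means the MMC host never enumerated anything on the bus,
    which is the situation described in the log.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return [
        (entry.name,
         _read_attr(entry, "vendor"),   # assumed attribute name
         _read_attr(entry, "device"),   # assumed attribute name
         _read_attr(entry, "class"))    # assumed attribute name
        for entry in sorted(base.iterdir())
    ]


if __name__ == "__main__":
    rows = list_sdio_functions()
    if not rows:
        print("no SDIO functions detected")
    for row in rows:
        print(*row)
```

If the list comes back empty, the SDIO function driver never gets a chance to match, so the place to look is the host controller side.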
<mithro>
Well, bed time for me
<mithro>
whitequark: the pypy project didn't have anything you could steal?
<sb0_>
they even managed to make that FSM implementation larger than lm32
<sb0_>
larsc, well it's empty...
<sb0_>
how does the detection work? parsing acpi tables I guess?
<sb0_>
well, "work". acpi never works.
<larsc>
I think SDIO devices have an internal ID that the driver is supposed to read
<larsc>
the MMC host driver
<cr1901_modern>
sb0_: Reminds me of a project I had to do at uni to create a toy CPU using an FSM. Tbh, I don't really know a better way of doing it, though I have some guesses
<larsc>
and matching is done based on the ID
<sb0_>
cr1901_modern, make a pipeline
<larsc>
If there is no device, it's a problem with the host driver, not the SDIO driver
<sb0_>
larsc, yes, it's VID/PID like USB
<cr1901_modern>
A pipeline doesn't require a FSM for control information re: hazards?
<sb0_>
no
<sb0_>
cr1901_modern, a potentially useful project would be to take lm32 and modify its instruction decoder to read risc-v
<cr1901_modern>
Do all the instructions of risc-v map cleanly to lm32?
<sb0_>
maybe among all the people who follow the hype, one or two good developers will write useful software for risc-v
<sb0_>
yes.
<cr1901_modern>
(Of course, I've never actually had to write a pipelined CPU in verilog. That's part of the reason I decided to take a look at lm32 after seeing whitequark's RPI article)
<cr1901_modern>
Maybe if I see the code myself, things will be more obvious
<larsc>
but if there is no device in the first place it will never be called
<sb0_>
larsc, and the device list is retrieved dynamically from the host driver?
<sb0_>
*not* acpi?
<larsc>
Which list?
<sb0_>
the list of all devices on the sdio bus
<larsc>
yes
<larsc>
have a look at mmc_attach_sdio()
<larsc>
that's where the magic happens
<sb0_>
oh, great, so 3.13 has a mmc host controller bug
<sb0_>
blergh
<sb0_>
well, thanks for the info
<cr1901_modern>
sb0_: I can understand why a pipelined CPU free of hazards can be implemented without an FSM. As long as control flow isn't interrupted, there really isn't any state to keep.
<cr1901_modern>
But how does one get around keeping state when there are hazards?
<sb0_>
cr1901_modern, pipelined cpus typically aren't free of hazards
<cr1901_modern>
(state "besides the pipeline registers between each stage" :P)
<cr1901_modern>
I know. I guess I just have trouble visualizing how hazards are handled.
<sb0_>
2-stage ones, maybe. but then the max frequency is rather low (though if done well, higher than most crappy cpus from opencores or riscv that have longer pipelines)
<cr1901_modern>
SuperFX is a RISC CPU (that had a limited use case) that didn't have hazards- it was up to the programmer to ensure they didn't try anything clever (stupid) during the 1 instruction delay before a jump :P
<sb0_>
you also have register-related hazards. and yes, you can leave hazard management all to the compiler if you wish.
<sb0_>
that may impact performance and code density, though
<sb0_>
plus the asm listings become painful to debug
<cr1901_modern>
Well, in the case of SFX, pretty sure that was all assembly. But again, maybe 10 development teams in the entire world used it
<cr1901_modern>
I'll just look at LM32's code and see how hazards are handled
<cr1901_modern>
Speaking of "innovative CPUs", what do you think of the Mill Architecture :P?
<sb0_>
cr1901_modern, you could probably get a textbook on pipelines and hazards. it's a pretty common topic...
<cr1901_modern>
I have one. I know what pipelines and hazards do. My issue is that if you told me, right now, to make a pipelined CPU in Verilog, I would have trouble doing it b/c of the hazard logic
<sb0_>
they teach it in many unis
<cr1901_modern>
I took comp arch a while back. The final project was a toy, hardcoded logic (no microcode), non-pipelined CPU.
<sb0_>
non-pipelined CPUs are not comp arch, they're just messing around
<cr1901_modern>
Gotta start somewhere. I had no clue how a CPU read data before I took that class, so I'm hesitant to write off the class like that XD
<cr1901_modern>
Thanks, noted. I'll take a look after I take a nap.
<sb0_>
seriously, in ACPI you can write "Zero" and "One" instead of 0x00 and 0x01
<cr1901_modern>
TLDR version: My initial attempt to try to make a pipelined CPU in verilog, if I was told to do it right now, would in fact have BEEN to use an FSM to handle the hazard logic. I'll take sb0's word for it that that's a bad way to do it.
<ysionneau>
you can insert bubbles or put forwarding logic etc
<sb0_>
cr1901_modern, what exactly would that FSM do?
<sb0_>
you can totally e.g. have a FSM as one pipeline stage, btw. just make sure it processes most information in one cycle...
<cr1901_modern>
Example off the top of my head (so if it's bad, poorly thought-out, I'll take the blame):
<cr1901_modern>
00- Pipeline-okay
<cr1901_modern>
01- RAW- Hazard- stop pipeline for x cycles
<cr1901_modern>
when x cycles pass 01=>00
<cr1901_modern>
which requires a two bit counter connected to the master clock :P
<cr1901_modern>
ysionneau: Those ARE some nice slides- has a nice diagram for how to implement stalls
<cr1901_modern>
and... no FSM XD
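The "bubbles, no FSM" idea can be sketched as a small simulation. It assumes a classic IF/ID/EX/MEM/WB pipeline with no forwarding, and a register file that is written in WB early enough to be read by ID in the same cycle. The names `Instr`, `needs_stall` and `run` are made up for illustration; this is not how LM32 or any particular core does it. The stall decision is a pure function of the current pipeline registers, so there is no counter and no extra state.

```python
from collections import namedtuple

# dest is a register name or None; srcs is a tuple of register names.
Instr = namedtuple("Instr", "name dest srcs")


def needs_stall(decode, in_flight):
    """Combinational RAW check: compare ID's sources against pending writes."""
    if decode is None:
        return False
    pending = {i.dest for i in in_flight if i is not None and i.dest is not None}
    return any(src in pending for src in decode.srcs)


def run(program):
    """Return (cycles, bubbles) needed to retire every instruction."""
    fetch_idx = 0
    dec = ex = mem = wb = None   # pipeline registers; None is a bubble
    cycles = bubbles = retired = 0
    while retired < len(program):
        cycles += 1
        if wb is not None:       # the instruction in WB completes this cycle
            retired += 1
        stall = needs_stall(dec, (ex, mem))
        wb, mem = mem, ex        # the back end always advances
        if stall:
            ex = None            # insert a bubble, hold ID and IF
            bubbles += 1
        else:
            ex = dec
            dec = program[fetch_idx] if fetch_idx < len(program) else None
            fetch_idx += 1
    return cycles, bubbles


if __name__ == "__main__":
    dependent = [Instr("add", "r1", ("r2", "r3")),
                 Instr("sub", "r4", ("r1", "r5"))]
    print(run(dependent))  # (8, 2)
```

The "stop for x cycles" behaviour from the 00/01 example falls out on its own: the stall condition stays true for as long as the producing instruction is still in EX or MEM.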
<ysionneau>
the teacher has a very annoying voice, but I really enjoyed this course :)
<whitequark>
mithro: nope
<whitequark>
also wow, apparently I have a highlight on "pipeline"
<larsc>
while we are at asking questions: somebody is trying to tell me that if I have a clock that is fed to multiple different clock buffers, and I want to transfer data from logic clocked by one clock buffer to logic clocked by another clock buffer, I always have to use proper CDC circuits
<larsc>
and can't rely on the fixed phase relationship between the clocks and that the tools will complain if the data can't be transferred safely
<larsc>
thoughts?
<sb0_>
if they are static clock buffers, then if you account for PVT in the clock buffer skew, you don't need CDC circuits - it stays synchronous
<larsc>
yea, that's what I've been saying
<sb0_>
you may need CDC if they are complicated clock buffers that e.g. contain a PLL or similar
<larsc>
just a BUFIO and a BUFG
<sb0_>
one BUFIO driving several BUFGs?
<larsc>
just one
<larsc>
for source synchronous capture
<sb0_>
well obviously you don't need CDC... how could you use synchronous off-chip devices then?
<larsc>
The reasoning was 'regardless of what you may read elsewhere'
<sb0_>
what are the two clock domains? inside the chip sending the data, and inside the FPGA after it's been through BUFIO+BUFG?
<larsc>
the current setup is a BUFG driven by an IBUFGDS which clocks the whole logic.
<larsc>
including the IDDRs used for capturing the incoming data
<larsc>
the problem is the skew introduced by sending the clock from the IO bank to the BUFG and back to the IDDRs in the IO bank is massive
<larsc>
larger than what you can compensate for with an IDELAY
<sb0_>
is that slowtan6?
<larsc>
kintex
<sb0_>
are you using IBUFDS or IBUFGDS?
<larsc>
GDS I think
<sb0_>
you should use IBUFDS, otherwise you are chaining two BUFGs which is useless, wastes BUFGs and increases skew
<larsc>
looking at the final result I think there is only one BUFG
<sb0_>
ok...maybe the synthesizer removes one automatically then
<sb0_>
you can use a PLL to absorb the skew
<larsc>
looks like in series7 both are the same
<sb0_>
or IDELAY. I'm surprised that would not work. they have pretty long range (nanoseconds) and I assume that your data rate is high if you have this skew problem...
<sb0_>
also, unlike in slowtan6, you can have multiple data edges traveling in a xDELAY
<larsc>
there is a window of about 1ns where the data is good
<larsc>
skew is 3.5ns or something and IDELAY gives 2.8ns when fully turned up
<sb0_>
use a PLL then
<larsc>
but what's wrong with a BUFIO?
<sb0_>
what's the clock frequency?
<larsc>
200MHz
<sb0_>
BUFIO should work as well, yes
<larsc>
It worked in my test setup, but as I said, I was told we can't do that
<sb0_>
*shrug* isn't the BUFIO designed for doing exactly that?
<larsc>
because of the CDC logic that would be required for it
<sb0_>
AFAIK there's no CDC logic required and using BUFIO to clock the IDDR and BUFG for the fabric is exactly what the kintex7 architecture is designed for
<larsc>
k, thanks.
<sb0_>
but if your boss becomes excessively annoying about it, you can use a PLL...
<larsc>
we've settled for launching and capturing on the same edge instead of the opposite edge. that means we start off with a negative skew at the device, which gives us the extra slack we need to compensate for the delay inside the FPGA
<larsc>
about IBUF vs IBUFG: 'Synthesis will automatically insert a BUFG on clock nets that it detects, but if it already has a BUFG instantiated in the code it will not add another BUFG.'
<whitequark>
that has nothing to do with engineering an OS
<whitequark>
and everything to do with, I dunno, being able to read a datasheet?
<cr1901_modern>
Maybe I just don't see the point in writing an OS if you're protected from the details by a hypervisor that is unfathomably more complicated...
<cr1901_modern>
There will never be another Linux, so all toy OSes are basically to "learn what goes on under the hood"
<cr1901_modern>
(More generally- there will never be another successful hobbyist OS)
<cr1901_modern>
whitequark: After thinking about this more, I think I see your point a bit more clearly. One can always decide if they want to go further after making an OS in a hypervisor or within Linux
<whitequark>
an OS is something that allocates resources and, usually, manages sandboxes
<whitequark>
and communicates between those sandboxes
<whitequark>
that an OS communicates with hardware is an implementation detail
<cr1901_modern>
Thinking about using a hypervisor makes me wish more than two privilege levels had taken off. The kernel itself could be one entire privilege level, which can be tested under a hypervisor or virtual environment. And when the kernel is installed on real hardware, the drivers are given a lower privilege level. This way if the driver is buggy, it doesn't crash the system
<whitequark>
congrats, you have invented microkernels
<cr1901_modern>
What CPU besides x86 has more than two privilege levels :P?
<whitequark>
you don't need that
<whitequark>
run drivers in userspace
<whitequark>
mmap the device IO memory into their address space
<whitequark>
(map the device IO ports into their address space on x86, using TSS IOPM field)
<cr1901_modern>
right, that's what iopl() does on Linux/NetBSD, but you have to be root to do it. Which makes sense. So drivers can only be started as superuser or admin
<whitequark>
screw the whole unix access control system
<whitequark>
make it from scratch based on capabilities
<whitequark>
(capabilities in the sense of unforgeable tokens granting access to resources)
<whitequark>
Mach does that, it makes way more sense
<cr1901_modern>
Uses IPC to notify the driver: "Hey, some process needs to use your capabilities, are you okay/ready for use?" or something like that
<whitequark>
no
<whitequark>
the capability IS the resource, for the process
<whitequark>
capabilities are a bit like fd's. you can duplicate them, you can inherit them, you can send them via a socket
<whitequark>
(no open though)
<whitequark>
init "inherits" all the capabilities that exist
<cr1901_modern>
I think of fd's typically as "a handle that another function uses to determine what function to ACTUALLY call or what data to manipulate"- is that an accurate analogy in your example?
<whitequark>
well, sorta, the kernel-mode code is what actually does the access
<whitequark>
(or mapping)
mumptai has quit [Quit: Leaving]
<cr1901_modern>
yes, that makes sense. Capabilities are a resource, and kernel handles resources. As long as the code talking to the hardware itself is user mode, all is good XD. Inevitably, if I ever write an OS, I expect to crash most of the drivers in this manner at least once.
<cr1901_modern>
or ten times
<whitequark>
the good thing about being able to send capabilities is that it doesn't really matter who ran you
<whitequark>
or which group you run under
<whitequark>
etc
<whitequark>
also since there is no way to "acquire" a capability from thin air? no escalation either
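A toy model of the fd-like capability scheme described above, to make the "no open, no escalation" point concrete. The `Kernel` class and its method names are invented for illustration and are not Mach's API. Processes only ever hold integer handles; the kernel keeps the per-process tables, and the only ways to get a handle are inheriting it at spawn, duplicating one you hold, or being sent one.

```python
import itertools


class Kernel:
    """Holds every capability table; processes see only integer handles."""

    def __init__(self, resources):
        self._tables = {}                      # pid -> {handle: resource}
        self._pids = itertools.count(1)
        # init "inherits" every capability that exists.
        self.init_pid = self._new_process(dict(enumerate(resources)))

    def _new_process(self, table):
        pid = next(self._pids)
        self._tables[pid] = table
        return pid

    def _lookup(self, pid, handle):
        try:
            return self._tables[pid][handle]
        except KeyError:
            raise PermissionError(
                f"process {pid} holds no capability {handle}") from None

    def _insert(self, pid, resource):
        table = self._tables[pid]
        handle = max(table, default=-1) + 1
        table[handle] = resource
        return handle

    def dup(self, pid, handle):
        """Duplicate a capability the process already holds."""
        return self._insert(pid, self._lookup(pid, handle))

    def send(self, src_pid, handle, dst_pid):
        """Hand a held capability to another process; returns its new handle."""
        return self._insert(dst_pid, self._lookup(src_pid, handle))

    def spawn(self, parent_pid, inherit):
        """Create a child that starts with only the listed parent handles."""
        table = {i: self._lookup(parent_pid, h) for i, h in enumerate(inherit)}
        return self._new_process(table)

    def use(self, pid, handle):
        """Kernel-side access check: the handle must be in the caller's table."""
        return self._lookup(pid, handle)


if __name__ == "__main__":
    kernel = Kernel(["uart0 registers", "eth0 registers"])
    driver = kernel.spawn(kernel.init_pid, inherit=[0])
    print(kernel.use(driver, 0))               # uart0 registers
```

There is deliberately no `open(name)`: a process that was never given a handle has nothing to ask for, whoever started it and whatever group it runs under.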
<cr1901_modern>
Idk enough about privilege escalation, so I'll take your word for it. Thinking about how to do an OS is certainly more fun than actually writing it XD.