sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
fengling has joined #m-labs
aeris has quit [Ping timeout: 252 seconds]
<whitequark> rjo: oh, it's definitely a bug
<whitequark> except it's one of these things that really should not happen in a release *shrug*
aeris has joined #m-labs
<GitHub198> [pyparser] whitequark pushed 1 new commit to master: http://git.io/vUZsh
<GitHub198> pyparser/master d3396d5 whitequark: Documentation fixes.
fengling has quit [Quit: WeeChat 1.1.1]
acathla has quit [Quit: Coyote finally caught me]
acathla has joined #m-labs
<GitHub63> [pyparser] whitequark pushed 2 new commits to master: http://git.io/vUZdn
<GitHub63> pyparser/master 1e0947e whitequark: Fix scoping error in coverage.
<GitHub63> pyparser/master 1a49d4d whitequark: Add Python 3.2 support.
<GitHub153> [pyparser] whitequark pushed 2 new commits to master: http://git.io/vUneU
<GitHub153> pyparser/master 83bd532 whitequark: Add Python 3.3-3.4 support.
<GitHub153> pyparser/master 35749e2 whitequark: Add Python 3.5 support.
travis-ci has joined #m-labs
<travis-ci> fallen/artiq#133 (flash_storage - c03638f : Yann Sionneau): The build has errored.
travis-ci has left #m-labs [#m-labs]
<GitHub120> [pyparser] whitequark pushed 2 new commits to master: http://git.io/vUnu6
<GitHub120> pyparser/master 11ec157 whitequark: Add diagnostic.Engine.
<GitHub120> pyparser/master 6f0a179 whitequark: Normalize ast to 3.4.
mumptai has joined #m-labs
<GitHub172> [pyparser] whitequark pushed 2 new commits to master: http://git.io/vUnMf
<GitHub172> pyparser/master cb25dc3 whitequark: Update README.
<GitHub172> pyparser/master 71bc9c1 whitequark: Add algorithm module.
<whitequark> sb0: it is Done.
<GitHub144> [pyparser] whitequark pushed 1 new commit to master: http://git.io/vUnMr
<GitHub144> pyparser/master 5688d04 whitequark: Add license info.
<GitHub177> [pyparser] whitequark pushed 1 new commit to master: http://git.io/vUnD0
<GitHub177> pyparser/master 9ffba43 whitequark: Doc fixes.
sj_mackenzie has joined #m-labs
<mithro> pyparser?
<whitequark> yes
<mithro> whitequark: What do you need the "returns precise location information for every token" stuff for?
sb0_ has joined #m-labs
<sb0_> larsc, are you familiar with the way sdio device probe works in the linux kernel?
<sb0_> I'm trying to backport brcmfmac to 3.13, which doesn't have some show-stopper acpi/emmc bugs that later kernels have
<sb0_> got it to compile, now that stupid thing won't attach to the sdio device
<sb0_> this sdio stuff is particularly obscure (and buggy). there isn't even a 'lssdio' command like there is for pci and usb...
sj_mackenzie has quit [Remote host closed the connection]
* sb0_ is eagerly waiting for the day operating systems stop sucking
<mithro> sb0_: That will happen shortly after hardware stops sucking :P
<whitequark> mithro: py2llvm error reporting
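A minimal sketch of why per-token locations matter for error reporting. It uses Python's standard `tokenize` module, not pyparser's own API, and the helper names `token_locations` and `render_diagnostic` are made up for illustration.

```python
import io
import tokenize

# Token kinds that carry no source text worth pointing at.
_SKIPPED = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT)


def token_locations(source):
    """Return (token_text, (line, column)) for each meaningful token.

    Lines are 1-based and columns are 0-based, as reported by tokenize.
    """
    readline = io.StringIO(source).readline
    return [(tok.string, tok.start)
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in _SKIPPED]


def render_diagnostic(source, line, column, message):
    """Format a message with a caret under the given source column."""
    text = source.splitlines()[line - 1]
    return "{}:{}: error: {}\n{}\n{}^".format(
        line, column + 1, message, text, " " * column)


if __name__ == "__main__":
    src = "x = 1 + y\n"
    text, (line, column) = token_locations(src)[3]
    print(render_diagnostic(src, line, column, "example message for " + repr(text)))
```

With a location attached to every token, a compiler can point at the exact operator or name it rejects instead of reporting only a line number.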
<mithro> whitequark: what is py2llvm? (what is it being used for?)
<whitequark> mithro: compiling computational kernels from python to LM32 assembly, which is then uploaded to a softcore on an FPGA
<whitequark> in essence python is used as a DSL
<whitequark> or more a subset of python
<sb0_> whitequark, it's compiling to or1k right now (the lm32 llvm backend needs some work...)
<whitequark> oh
<whitequark> well, no big difference
<sb0_> yeah, and mor1kx has lost some weight. the or1k ISA/ABI isn't very clean and I'd still prefer LM32, but it's not an important detail.
<sb0_> there's also *ahem* risc-v, but to date no one has published a usable implementation, just a lot of hot air...
<sb0_> and some ridiculously unusable code
<sb0_> ...seriously, who'd use a FSM to schedule a CPU with a CPI of >3, for example
<larsc> sb0_: you can see all detected devices in /sys/bus/sdio/devices/
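In the absence of an 'lssdio' command, the sysfs directory larsc mentions can be walked by hand. The sketch below assumes each device entry exposes `class`, `vendor` and `device` attribute files; attributes that are missing are reported as `None`. The function names are hypothetical.

```python
import os

SDIO_SYSFS = "/sys/bus/sdio/devices"


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def list_sdio_devices(root=SDIO_SYSFS):
    """List SDIO functions known to the kernel; empty if none were detected."""
    if not os.path.isdir(root):
        return []
    devices = []
    for name in sorted(os.listdir(root)):
        base = os.path.join(root, name)
        devices.append({
            "name": name,
            # Assumed attribute names; adjust if your kernel differs.
            "class": read_attr(os.path.join(base, "class")),
            "vendor": read_attr(os.path.join(base, "vendor")),
            "device": read_attr(os.path.join(base, "device")),
        })
    return devices


if __name__ == "__main__":
    for dev in list_sdio_devices():
        print(dev["name"], dev["class"], dev["vendor"], dev["device"])
```

An empty result matches the situation described in the log: the host controller never enumerated anything, so there is nothing for a driver to bind to.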
<mithro> Well, bed time for me
<mithro> whitequark: the pypy project didn't have anything you could steal?
<sb0_> they even managed to make that FSM implementation larger than lm32
<sb0_> larsc, well it's empty...
<sb0_> how does the detection work? parsing acpi tables I guess?
<sb0_> well, "work". acpi never works.
<larsc> I think SDIO devices have an internal ID that the driver is supposed to read
<larsc> the MMC host driver
<cr1901_modern> sb0_: Reminds me of a project I had to do at uni to create a toy CPU using an FSM. Tbh, I don't really know a better way of doing it, though I have some guesses
<larsc> and matching is done based on the ID
<sb0_> cr1901_modern, make a pipeline
<larsc> If there is no device it's a problem with the host driver not the SDIO driver
<sb0_> larsc, yes, it's VID/PID like USB
<cr1901_modern> A pipeline doesn't require a FSM for control information re: hazards?
<sb0_> no
<sb0_> cr1901_modern, a potentially useful project would be to take lm32 and modify its instruction decoder to read risc-v
<cr1901_modern> Do all the instructions of risc-v map cleanly to lm32?
<sb0_> maybe among all the people who follow the hype, one or two good developers will write useful software for risc-v
<sb0_> yes.
<cr1901_modern> (Of course, I've never actually had to write a pipelined CPU in verilog. That's part of the reason I decided to take a look at lm32 after seeing whitequark's RPI article)
<cr1901_modern> Maybe if I see the code myself, things will be more obvious
<larsc> sb0_: this is how the matching of the device to the driver is done: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/mmc/core/sdio_bus.c#n67
<larsc> but if there is no device in the first place it will never be called
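The ID-based matching described here can be paraphrased roughly as follows. This is a Python sketch of the idea, not the kernel's C code: the wildcard is modelled as `None`, and the ID values in the usage lines are invented.

```python
from collections import namedtuple

# Stand-in wildcard for this sketch; it means "match any value in this field".
SDIO_ANY_ID = None

# (class, vendor, device) triple, used both for a detected function
# and for the entries in a driver's ID table.
SdioId = namedtuple("SdioId", ["cls", "vendor", "device"])


def match_one(func, entry):
    """True if every non-wildcard field of entry equals the function's field."""
    for want, have in zip(entry, func):
        if want is not SDIO_ANY_ID and want != have:
            return False
    return True


def match_device(func, id_table):
    """Return the first table entry that matches the function, else None."""
    for entry in id_table:
        if match_one(func, entry):
            return entry
    return None


if __name__ == "__main__":
    table = [SdioId(SDIO_ANY_ID, 0x1234, 0x0001)]  # made-up IDs
    print(match_device(SdioId(0x07, 0x1234, 0x0001), table))
    print(match_device(SdioId(0x07, 0x1234, 0x0002), table))
```

None of this runs unless the host driver has already produced a device to match against, which is the point larsc makes.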
<sb0_> larsc, and the device list is retrieved dynamically from the host driver?
<sb0_> *not* acpi?
<larsc> Which list?
<sb0_> the list of all devices on the sdio bus
<larsc> yes
<larsc> have a look at mmc_attach_sdio()
<larsc> that's where the magic happens
<sb0_> oh, great, so 3.13 has a mmc host controller bug
<sb0_> blergh
<sb0_> well, thanks for the info
<cr1901_modern> sb0_: I can understand why a pipelined CPU free of hazards can be implemented without an FSM. As long as control flow isn't interrupted, there really isn't any state to keep.
<cr1901_modern> But how does one get around keeping state when there are hazards?
<sb0_> cr1901_modern, pipelined cpus typically aren't free of hazards
<cr1901_modern> (state "besides the pipeline registers between each stage" :P)
<cr1901_modern> I know. I guess I just have trouble visualizing how hazards are handled.
<sb0_> 2-stage ones, maybe. but then the max frequency is rather low (though if done well, higher than most crappy cpus from opencores or riscv that have longer pipelines)
<cr1901_modern> SuperFX is a RISC CPU (that had a limited use case) that didn't have hazards- it was up to the programmer to ensure they didn't try anything clever (stupid) during the 1 instruction delay before a jump :P
<sb0_> you also have register-related hazards. and yes, you can leave hazard management all to the compiler if you wish.
<sb0_> that may impact performance and code density, though
<sb0_> plus the asm listings become painful to debug
<cr1901_modern> Well, in the case of SFX, pretty sure that was all assembly. But again, maybe 10 development teams in the entire world used it
<cr1901_modern> I'll just look at LM32's code and see how hazards are handled
<cr1901_modern> Speaking of "innovative CPUs", what do you think of the Mill Architecture :P?
<sb0_> cr1901_modern, you could probably get a textbook on pipelines and hazards. it's a pretty common topic...
<cr1901_modern> I have one. I know what pipelines and hazards do. My issue is that if you told me, right now, to make a pipelined CPU in Verilog, I would have trouble doing it b/c of the hazard logic
<sb0_> they teach it in many unis
<cr1901_modern> I took comp arch a while back. The final project was a toy, hardcoded logic (no microcode), non-pipelined CPU.
<sb0_> non-pipelined CPUs are not comp arch, they're just messing around
<cr1901_modern> Gotta start somewhere. I had no clue how a CPU read data before I took that class, so I'm hesitant to write off the class like that XD
<cr1901_modern> (This was over 4 years ago, btw)
<ysionneau> cr1901_modern: a good coursera (MOOC) I took on comp arch, I dumped all the videos: http://sionneau.net/edgebsd/comparch-002/
<ysionneau> it explained well how to deal with hazards
<ysionneau> (it was this MOOC: https://www.coursera.org/course/comparch )
<cr1901_modern> Thanks, noted. I'll take a look after I take a nap.
<sb0_> seriously, in ACPI you can write "Zero" and "One" instead of 0x00 and 0x01
<cr1901_modern> TLDR version: My initial attempt to try to make a pipelined CPU in verilog, if I was told to do it right now, would in fact have BEEN to use an FSM to handle the hazard logic. I'll take sb0's word for it that that's a bad way to do it.
<ysionneau> you can insert bubbles or put forwarding logic etc
<sb0_> cr1901_modern, what exactly would that FSM do?
<sb0_> you can totally e.g. have a FSM as one pipeline stage, btw. just make sure it processes most information in one cycle...
<cr1901_modern> Example off the top of my head (so if it's bad, poorly thought-out, I'll take the blame):
<cr1901_modern> 00- Pipeline-okay
<cr1901_modern> 01- RAW- Hazard- stop pipeline for x cycles
<cr1901_modern> when x cycles pass 01=>00
<cr1901_modern> which requires a two bit counter connected to the master clock :P
<cr1901_modern> ysionneau: Those ARE some nice slides- as a nice diagram for how to implement stalls
<cr1901_modern> and... no FSM XD
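The "no FSM" point can be sketched as two pure functions of the current pipeline register contents, following the textbook five-stage pipeline rather than LM32's actual logic. The `Stage` record, the stage names, and the assumption that register 0 never needs forwarding are all illustrative.

```python
from collections import namedtuple

# Hypothetical summary of the instruction held in one pipeline register:
# destination register, whether it writes that register, and whether it is
# a load (whose result only exists after the memory stage).
Stage = namedtuple("Stage", ["rd", "writes_reg", "is_load"])
BUBBLE = Stage(rd=0, writes_reg=False, is_load=False)


def forward_select(src, ex_mem, mem_wb):
    """Choose the source for an EX-stage operand; the newest producer wins."""
    if src != 0 and ex_mem.writes_reg and ex_mem.rd == src:
        return "EX/MEM"
    if src != 0 and mem_wb.writes_reg and mem_wb.rd == src:
        return "MEM/WB"
    return "REGFILE"


def must_stall(decode_srcs, id_ex):
    """Load-use hazard: the load in EX feeds the instruction being decoded.

    Forwarding cannot help here, so one bubble is inserted.
    """
    return (id_ex.is_load and id_ex.writes_reg
            and id_ex.rd != 0 and id_ex.rd in decode_srcs)


if __name__ == "__main__":
    load_r2 = Stage(rd=2, writes_reg=True, is_load=True)
    print(must_stall((1, 2), load_r2))         # stall needed
    print(forward_select(3, BUBBLE, BUBBLE))   # plain register file read
```

Both decisions are recomputed every cycle from what is already in the pipeline registers, so no counter or separate state machine is required for stalls and forwarding.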
<ysionneau> the teacher has a very annoying voice, but I really enjoyed this course :)
<whitequark> mithro: nope
<whitequark> also wow, apparently I have a highlight on "pipeline"
<larsc> while we are at asking questions. Somebody is trying to tell me that if I have a clock that is fed to multiple different clock buffers and I want to transfer data from logic clocked by the one clockbuffer to logic clocked by another clockbuffer I always have to use proper CDC circuits
<larsc> and can't rely on the fixed phase relationship between the clocks and that the tools will complain if the data can't be transferred safely
<larsc> thoughts?
<sb0_> if they are static clock buffers, then if you account for PVT in the clock buffer skew, you don't need CDC circuits - it stays synchronous
<larsc> yea, that's what I've been saying
<sb0_> you may need CDC if they are complicated clock buffers that e.g. contain a PLL or similar
<larsc> just a BUFIO and a BUFG
<sb0_> one BUFIO driving several BUFGs?
<larsc> just one
<larsc> for source synchronous capture
<sb0_> well obviously you don't need CDC... how could you use synchronous off-chip devices then?
<larsc> The reasoning was 'regardless of what you may read elsewhere'
<sb0_> what are the two clock domains? inside the chip sending the data, and inside the FPGA after it's been through BUFIO+BUFG?
<larsc> the current setup is a BUFG driven by an IBUFGDS which clocks the whole logic.
<larsc> including the IDDRs used for capturing the incoming data
<larsc> the problem is the skew introduced by sending the clock from the IO bank to the BUFG and back to the IDDRs in the IO bank is massive
<larsc> larger than what you can compensate for with an IDELAY
<sb0_> is that slowtan6?
<larsc> kintex
<sb0_> are you using IBUFDS or IBUFGDS?
<larsc> GDS I think
<sb0_> you should use IBUFDS, otherwise you are chaining two BUFGs which is useless, wastes BUFGs and increases skew
<larsc> looking at the final result I think there is only one BUFG
<sb0_> ok...maybe the synthesizer removes one automatically then
<sb0_> you can use a PLL to absorb the skew
<larsc> looks like in series7 both are the same
<sb0_> or IDELAY. I'm surprised that would not work. they have pretty long range (nanoseconds) and I assume that your data rate is high if you have this skew problem...
<sb0_> also, unlike in slowtan6, you can have multiple data edges traveling in an xDELAY
<larsc> there is a window of about 1ns where the data is good
<larsc> skew is 3.5ns or something and IDELAY gives 2.8ns when fully turned up
<sb0_> use a PLL then
<larsc> but what's wrong with a BUFIO?
<sb0_> what's the clock frequency?
<larsc> 200MHz
<sb0_> BUFIO should work as well, yes
<larsc> It worked in my test setup, but as I said I was told we can't do that
<sb0_> *shrug* isn't the BUFIO designed for doing exactly that?
<larsc> because of the CDC logic that would be required for it
<sb0_> AFAIK there's no CDC logic required and using BUFIO to clock the IDDR and BUFG for the fabric is exactly what the kintex7 architecture is designed for
<larsc> k, thanks.
<sb0_> but if your boss becomes excessively annoying about it, you can use a PLL...
<larsc> we've settled for launching and capturing on the same edge instead of the opposite edge, that means we start off with a negative skew at the device which gives us the extra slack we need to compensate for the delay inside the FPGA
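The numbers quoted in this exchange can be put side by side. The sketch below only restates the figures from the log (the skew is approximate, "3.5ns or something"), and it assumes that moving the launch edge shifts the data by half a clock period, which the log does not state explicitly.

```python
# All times in picoseconds so the arithmetic stays exact.
CLOCK_HZ = 200_000_000   # "200MHz"
SKEW_PS = 3500           # "3.5ns or something" (approximate figure)
IDELAY_MAX_PS = 2800     # "2.8ns when fully turned up"

period_ps = 10**12 // CLOCK_HZ
half_period_ps = period_ps // 2

# Skew that remains after IDELAY is at its maximum setting.
shortfall_ps = SKEW_PS - IDELAY_MAX_PS

# Assumption: changing the launch edge moves the data by half a period.
edge_change_covers_shortfall = half_period_ps >= shortfall_ps

print("period:", period_ps, "ps")
print("half period:", half_period_ps, "ps")
print("IDELAY shortfall:", shortfall_ps, "ps")
print("half-period shift covers shortfall:", edge_change_covers_shortfall)
```

Under that assumption the half-period shift is larger than the IDELAY shortfall, and IDELAY can then be used to trim the remainder into the roughly 1ns valid window.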
<larsc> about IBUF vs IBUFG: 'Synthesis will automatically insert a BUFG on clock nets that it detects, but if it already has a BUFG instantiated in the code it will not add another BUFG.'
<cr1901_modern> whitequark: https://twitter.com/whitequark/status/597459054475276288 I think the point is to show it's so freaking involved to get bare metal code running on modern non-micro CPUs
<whitequark> that has nothing to do with engineering an OS
<whitequark> and everything to do with, I dunno, being able to read a datasheet?
<cr1901_modern> Maybe I just don't see the point in writing an OS if you're protected from the details by a hypervisor that is unfathomably more complicated...
<cr1901_modern> There will never be another Linux, so all toy OSes are basically to "learn what goes on under the hood"
<cr1901_modern> (More generally- there will never be another successful hobbyist OS)
<cr1901_modern> whitequark: After thinking about this more, I think I see your point a bit more clearly. One can always decide if they want to go further after making an OS in a hypervisor or within Linux
<whitequark> an OS is something that allocates resources and, usually, manages sandboxes
<whitequark> and communicates between those sandboxes
<whitequark> that an OS communicates with hardware is an implementation detail
<cr1901_modern> Thinking about using a hypervisor makes me wish more than two privilege levels had taken off. The kernel itself could be one entire privilege level, which can be tested under a hypervisor or virtual environment. And when the kernel is installed on real hardware, the drivers are given a lower privilege level. This way if the driver is buggy, it doesn't crash the system
<whitequark> congrats, you have invented microkernels
<cr1901_modern> What CPU besides x86 has more than two privilege levels :P?
<whitequark> you don't need that
<whitequark> run drivers in userspace
<whitequark> mmap the device IO memory into their address space
<whitequark> (map the device IO ports into their address space on x86, using TSS IOPM field)
<cr1901_modern> right, that's what iopl() does on Linux/NetBSD, but you have to be root to do it. Which makes sense. So drivers can only be started as superuser or admin
<whitequark> screw the whole unix access control system
<whitequark> make it from scratch based on capabilities
<whitequark> (capabilities in the sense of unforgeable tokens granting access to resources)
<whitequark> Mach does that, it makes way more sense
<cr1901_modern> Uses IPC to notify the driver: "Hey, some process needs to use your capabilities, are you okay/ready for use?" or something like that
<whitequark> no
<whitequark> the capability IS the resource, for the process
<whitequark> capabilities are a bit like fd's. you can duplicate them, you can inherit them, you can send them via a socket
<whitequark> (no open though)
<whitequark> init "inherits" all the capabilities that exist
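The fd analogy can be tried directly on a Unix system: a file descriptor sent over a Unix-domain socket grants the receiver access to the underlying resource without any open() call. This is a sketch of ordinary POSIX fd passing, not of Mach ports; it assumes Python 3.9 or later for `socket.send_fds` and `socket.recv_fds`, and the function name is made up.

```python
import os
import socket


def pass_read_end():
    """Send the read end of a pipe over a socket and read through the copy."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()
    try:
        # The descriptor travels as ancillary data next to a small payload.
        socket.send_fds(sender, [b"cap"], [read_fd])
        _msg, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
        received_fd = fds[0]

        # The sender gives up its own copy; the received one still works.
        os.close(read_fd)

        os.write(write_fd, b"hello")
        data = os.read(received_fd, 5)
        os.close(received_fd)
        return data
    finally:
        os.close(write_fd)
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(pass_read_end())
```

Here both ends live in one process for brevity; the same calls work across processes, which is what makes the "it doesn't matter who ran you" property possible.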
<cr1901_modern> I think of fd's typically as "a handle that another function uses to determine what function to ACTUALLY call or what data to manipulate"- is that an accurate analogy in your example?
<whitequark> well, sorta, the kernel-mode code is what actually does the access
<whitequark> (or mapping)
mumptai has quit [Quit: Verlassend]
<cr1901_modern> yes, that makes sense. Capabilities are a resource, and kernel handles resources. As long as the code talking to the hardware itself is user mode, all is good XD. Inevitably, if I ever write an OS, I expect to crash most of the drivers in this manner at least once.
<cr1901_modern> or ten times
<whitequark> the good thing about being able to send capabilities is that it doesn't really matter who ran you
<whitequark> or which group you run under
<whitequark> etc
<whitequark> also since there is no way to "acquire" a capability from thin air? no escalation either
<cr1901_modern> Idk enough about privilege escalation, so I'll take your word for it. Thinking about how to do an OS is certainly more fun than actually writing it XD.