#m-labs on 2015-05-10 — irc logs at freenode.irclog.whitequark.org

2015-03-04 14:45 sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs

02:10 fengling has joined #m-labs

04:47 aeris has quit [Ping timeout: 252 seconds]

07:26 <whitequark> rjo: oh, it's definitely a bug

07:27 <whitequark> except it's one of these things that really should not happen in a release *shrug*

08:17 aeris has joined #m-labs

08:51 <GitHub198> [pyparser] whitequark pushed 1 new commit to master: http://git.io/vUZsh

08:51 <GitHub198> pyparser/master d3396d5 whitequark: Documentation fixes.

09:17 fengling has quit [Quit: WeeChat 1.1.1]

10:02 acathla has quit [Quit: Coyote finally caught me]

10:03 acathla has joined #m-labs

11:26 <GitHub63> [pyparser] whitequark pushed 2 new commits to master: http://git.io/vUZdn

11:26 <GitHub63> pyparser/master 1e0947e whitequark: Fix scoping error in coverage.

11:26 <GitHub63> pyparser/master 1a49d4d whitequark: Add Python 3.2 support.

11:54 <GitHub153> [pyparser] whitequark pushed 2 new commits to master: http://git.io/vUneU

11:54 <GitHub153> pyparser/master 83bd532 whitequark: Add Python 3.3-3.4 support.

11:54 <GitHub153> pyparser/master 35749e2 whitequark: Add Python 3.5 support.

13:28 travis-ci has joined #m-labs

13:28 <travis-ci> fallen/artiq#133 (flash_storage - c03638f : Yann Sionneau): The build has errored.

13:28 <travis-ci> Build details : http://travis-ci.org/fallen/artiq/builds/61968284

13:28 travis-ci has left #m-labs [#m-labs]

13:28 <GitHub120> [pyparser] whitequark pushed 2 new commits to master: http://git.io/vUnu6

13:28 <GitHub120> pyparser/master 11ec157 whitequark: Add diagnostic.Engine.

13:28 <GitHub120> pyparser/master 6f0a179 whitequark: Normalize ast to 3.4.

13:43 mumptai has joined #m-labs

14:13 <GitHub172> [pyparser] whitequark pushed 2 new commits to master: http://git.io/vUnMf

14:13 <GitHub172> pyparser/master cb25dc3 whitequark: Update README.

14:13 <GitHub172> pyparser/master 71bc9c1 whitequark: Add algorithm module.

14:13 <whitequark> sb0: it is Done.

14:15 <GitHub144> [pyparser] whitequark pushed 1 new commit to master: http://git.io/vUnMr

14:15 <GitHub144> pyparser/master 5688d04 whitequark: Add license info.

14:17 <GitHub177> [pyparser] whitequark pushed 1 new commit to master: http://git.io/vUnD0

14:17 <GitHub177> pyparser/master 9ffba43 whitequark: Doc fixes.

15:24 sj_mackenzie has joined #m-labs

15:33 <mithro> pyparser?

15:35 <whitequark> yes

15:36 <mithro> whitequark: What do you need the "returns precise location information for every token" stuff for?

15:44 sb0_ has joined #m-labs

15:44 <sb0_> larsc, are you familiar with the way sdio device probe works in the linux kernel?

15:45 <sb0_> I'm trying to backport brcmfmac to 3.13, which doesn't have some show-stopper acpi/emmc bugs that later kernels have

15:46 <sb0_> got it to compile, now that stupid thing won't attach to the sdio device

15:47 <sb0_> this sdio stuff is particularly obscure (and buggy). there isn't even a 'lssdio' command like there is for pci and usb...

15:48 sj_mackenzie has quit [Remote host closed the connection]

15:51 * sb0_ is eagerly waiting for the day operating systems stop sucking

15:52 <mithro> sb0_: That will happen shortly after hardware stops sucking :P

15:53 <whitequark> mithro: py2llvm error reporting
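Precise token locations let a compiler point at the exact offending span when it rejects a kernel. The sketch below shows the idea using the standard library's `ast` module as a stand-in. It is not pyparser, whose API does not appear in this log. The `<input>` filename and the message format are assumptions modelled loosely on clang-style diagnostics.

```python
import ast

def caret_diagnostic(source: str, node: ast.AST, message: str) -> str:
    """Format a clang-style error pointing at `node` with a caret range.

    Assumes Python 3.8+ (for end_lineno/end_col_offset). Column offsets
    from `ast` count UTF-8 bytes, so this is only exact for ASCII source.
    """
    line = source.splitlines()[node.lineno - 1]
    start = node.col_offset
    # Underline the whole node if it fits on one line, else just its start.
    end = node.end_col_offset if node.end_lineno == node.lineno else start + 1
    underline = " " * start + "^" + "~" * max(0, end - start - 1)
    return f"<input>:{node.lineno}:{start + 1}: error: {message}\n{line}\n{underline}"

source = "def kernel(x):\n    return x + undefined_name\n"
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Name) and node.id == "undefined_name":
        print(caret_diagnostic(source, node, "name 'undefined_name' is not defined"))
```

The stdlib only attaches ranges to nodes. An operator such as the `+` inside a `BinOp` has no location of its own, and per-token locations fill that gap.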

15:54 <mithro> whitequark: what is py2llvm? (what is it being used for?)

15:55 <whitequark> mithro: compiling computational kernels from python to LM32 assembly, which is then uploaded to a softcore on an FPGA

15:55 <whitequark> in essence python is used as a DSL

15:56 <whitequark> or more a subset of python

15:56 <sb0_> whitequark, it's compiling to or1k right now (the lm32 llvm backend needs some work...)

15:56 <whitequark> oh

15:56 <whitequark> well, no big difference

15:58 <sb0_> yeah, and mor1kx has lost some weight. the or1k ISA/ABI isn't very clean and I'd still prefer LM32, but it's not an important detail.

15:59 <sb0_> there's also *ahem* risc-v, but to date no one has published a usable implementation, just a lot of hot air...

15:59 <sb0_> and some ridiculously unusable code

16:00 <sb0_> ...seriously, who'd use a FSM to schedule a CPU with a CPI of >3, for example

16:00 <larsc> sb0_: you can see all detected devices in /sys/bus/sdio/devices/

16:00 <mithro> Well, bed time for me

16:00 <mithro> whitequark: the pypy project didn't have anything you could steal?

16:01 <sb0_> they even managed to make that FSM implementation larger than lm32

16:03 <sb0_> larsc, well it's empty...

16:03 <sb0_> how does the detection work? parsing acpi tables I guess?

16:03 <sb0_> well, "work". acpi never works.

16:05 <larsc> I think SDIO devices have an internal ID that the driver is supposed to read

16:05 <larsc> the MMC host driver

16:05 <cr1901_modern> sb0_: Reminds me of a project I had to do at uni to create a toy CPU using an FSM. Tbh, I don't really know a better way of doing it, though I have some guesses

16:05 <larsc> and matching is done based on the ID

16:05 <sb0_> cr1901_modern, make a pipeline

16:05 <larsc> If there is no device it's a problem with the host driver, not the SDIO driver

16:06 <sb0_> larsc, yes, it's VID/PID like USB

16:06 <cr1901_modern> A pipeline doesn't require a FSM for control information re: hazards?

16:07 <sb0_> no

16:07 <sb0_> cr1901_modern, a potentially useful project would be to take lm32 and modify its instruction decoder to read risc-v

16:08 <cr1901_modern> Do all the instructions of risc-v map cleanly to lm32?

16:08 <sb0_> maybe among all the people who follow the hype, one or two good developers will write useful software for risc-v

16:09 <sb0_> yes.
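sb0_'s suggestion is to keep lm32's pipeline and replace only the decoder. Any such port would start by pulling out the RISC-V base-ISA fields, which sit at fixed bit positions. The Python sketch below decodes the R-type fields. The helper and type names are hypothetical, and a real decoder would be written in Verilog or Migen rather than Python.

```python
from typing import NamedTuple

class RType(NamedTuple):
    opcode: int
    rd: int
    funct3: int
    rs1: int
    rs2: int
    funct7: int

def decode_rtype(insn: int) -> RType:
    """Split a 32-bit RISC-V R-type instruction word into its fields."""
    return RType(
        opcode=insn & 0x7F,          # bits 6:0
        rd=(insn >> 7) & 0x1F,       # bits 11:7
        funct3=(insn >> 12) & 0x7,   # bits 14:12
        rs1=(insn >> 15) & 0x1F,     # bits 19:15
        rs2=(insn >> 20) & 0x1F,     # bits 24:20
        funct7=(insn >> 25) & 0x7F,  # bits 31:25
    )

# add x1, x2, x3 encodes as 0x003100B3
print(decode_rtype(0x003100B3))
```

rd, rs1 and rs2 occupy the same bits in every format that has them. Register-file read addresses can therefore be taken from the instruction word before the format is fully decoded, which is convenient when retrofitting an existing pipeline.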

16:09 <cr1901_modern> (Of course, I've never actually had to write a pipelined CPU in Verilog. That's part of the reason I decided to take a look at lm32 after seeing whitequark's RPi article)

16:09 <cr1901_modern> Maybe if I see the code myself, things will be more obvious

16:09 <larsc> sb0_: this is how the matching of the device to the driver is done: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/mmc/core/sdio_bus.c#n67

16:09 <larsc> but if there is no device in the first place it will never be called
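larsc's point is that the bus core only compares a driver's ID table against SDIO functions that the host driver has already enumerated. The following is a simplified Python model of that comparison, loosely following `sdio_match_one()` in the linked `sdio_bus.c`, where each of class, vendor and device must either be the wildcard or equal the function's value. It is an illustration, not kernel code, and the IDs are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ANY_ID = None  # wildcard, playing the role of the kernel's SDIO_ANY_ID

@dataclass(frozen=True)
class SdioFunc:
    """One function on an SDIO card, as enumerated by the host driver."""
    sdio_class: int
    vendor: int
    device: int

@dataclass(frozen=True)
class SdioDeviceId:
    """One entry in a driver's ID table; ANY_ID fields match anything."""
    sdio_class: Optional[int] = ANY_ID
    vendor: Optional[int] = ANY_ID
    device: Optional[int] = ANY_ID

def match_one(entry: SdioDeviceId, func: SdioFunc) -> bool:
    for want, have in ((entry.sdio_class, func.sdio_class),
                       (entry.vendor, func.vendor),
                       (entry.device, func.device)):
        if want is not ANY_ID and want != have:
            return False
    return True

def match_table(table: Sequence[SdioDeviceId], func: SdioFunc) -> Optional[SdioDeviceId]:
    """Return the first matching entry, or None if the driver won't bind."""
    for entry in table:
        if match_one(entry, func):
            return entry
    return None

# Placeholder IDs for illustration only.
driver_table = [SdioDeviceId(vendor=0xABCD, device=0x0001),
                SdioDeviceId(vendor=0xABCD, device=0x0002)]
card = SdioFunc(sdio_class=0, vendor=0xABCD, device=0x0002)
print(match_table(driver_table, card))
```

If the host controller never enumerates the card, no `SdioFunc` exists and matching never runs. That is consistent with the empty `/sys/bus/sdio/devices/` sb0_ sees.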

16:10 <sb0_> larsc, and the device list is retrieved dynamically from the host driver?

16:10 <sb0_> *not* acpi?

16:11 <larsc> Which list?

16:11 <sb0_> the list of all devices on the sdio bus

16:11 <larsc> yes

16:11 <larsc> have a look at mmc_attach_sdio()

16:12 <larsc> that's where the magic happens

16:12 <sb0_> oh, great, so 3.13 has a mmc host controller bug

16:12 <sb0_> blergh

16:13 <sb0_> well, thanks for the info

16:13 <cr1901_modern> sb0_: I can understand why a pipelined CPU free of hazards can be implemented without an FSM. As long as control flow isn't interrupted, there really isn't any state to keep.

16:13 <cr1901_modern> But how does one get around keeping state when there are hazards?

16:14 <sb0_> cr1901_modern, pipelined cpus typically aren't free of hazards

16:14 <cr1901_modern> (state "besides the pipeline registers between each stage" :P)

16:15 <cr1901_modern> I know. I guess I just have trouble visualizing how hazards are handled.

16:15 <sb0_> 2-stage ones, maybe. but then the max frequency is rather low (though if done well, higher than most crappy cpus from opencores or riscv that have longer pipelines)

16:16 <cr1901_modern> SuperFX is a RISC CPU (that had a limited use case) that didn't have hazards- it was up to the programmer to ensure they didn't try anything clever (stupid) during the 1 instruction delay before a jump :P

16:17 <sb0_> you also have register-related hazards. and yes, you can leave hazard management all to the compiler if you wish.

16:17 <sb0_> that may impact performance and code density, though

16:18 <sb0_> plus the asm listings become painful to debug
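If the hardware has no interlocks, the compiler has to pad dependent instructions itself. The toy Python pass below shows where the code-density cost comes from. The instruction tuple format and the single fixed result latency are made-up assumptions, not any real toolchain's IR.

```python
from typing import Dict, List, Optional, Tuple

# (mnemonic, destination register or None, source registers); a made-up IR.
Insn = Tuple[str, Optional[str], Tuple[str, ...]]
NOP: Insn = ("nop", None, ())

def insert_nops(program: List[Insn], latency: int) -> List[Insn]:
    """Pad with NOPs so an instruction never reads a register fewer than
    `latency` issue slots after the instruction that writes it."""
    out: List[Insn] = []
    ready_at: Dict[str, int] = {}  # register -> first slot where its value is readable
    for insn in program:
        _, dest, srcs = insn
        need = max((ready_at.get(r, 0) for r in srcs), default=0)
        while len(out) < need:
            out.append(NOP)
        if dest is not None:
            ready_at[dest] = len(out) + latency
        out.append(insn)
    return out

prog = [("lw", "r1", ("r2",)),
        ("add", "r3", ("r1", "r4")),
        ("sub", "r5", ("r6", "r7"))]
for op in insert_nops(prog, latency=3):
    print(op[0])
```

A real scheduler would try to move the independent `sub` into one of the NOP slots. That recovers density, but it also makes the listing harder to follow.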

16:18 <cr1901_modern> Well, in the case of SFX, pretty sure that was all assembly. But again, maybe 10 development teams in the entire world used it

16:18 <cr1901_modern> I'll just look at LM32's code and see how hazards are handled

16:19 <cr1901_modern> Speaking of "innovative CPUs", what do you think of the Mill Architecture :P?

16:20 <sb0_> cr1901_modern, you could probably get a textbook on pipelines and hazards. it's a pretty common topic...

16:21 <cr1901_modern> I have one. I know what pipelines and hazards do. My issue is that if you told me, right now, to make a pipelined CPU in Verilog, I would have trouble doing it b/c of the hazard logic

16:21 <sb0_> they teach it in many unis

16:21 <cr1901_modern> I took comp arch a while back. The final project was a toy, hardcoded logic (no microcode), non-pipelined CPU.

16:22 <sb0_> non-pipelined CPUs are not comp arch, they're just messing around

16:23 <cr1901_modern> Gotta start somewhere. I had no clue how a CPU read data before I took that class, so I'm hesitant to write off the class like that XD

16:24 <cr1901_modern> (This was over 4 years ago, btw)

16:26 <ysionneau> cr1901_modern: a good coursera (MOOC) I took on comp arch, I dumped all the videos: http://sionneau.net/edgebsd/comparch-002/

16:26 <ysionneau> it explained well how to deal with hazards

16:27 <ysionneau> plus you have a bit of information there http://en.wikipedia.org/wiki/Hazard_%28computer_architecture%29#Eliminating_hazards

16:27 <ysionneau> (it was this MOOC: https://www.coursera.org/course/comparch )

16:27 <cr1901_modern> Thanks, noted. I'll take a look after I take a nap.

16:27 <sb0_> seriously, in ACPI you can write "Zero" and "One" instead of 0x00 and 0x01

16:28 <cr1901_modern> TLDR version: My initial attempt to try to make a pipelined CPU in verilog, if I was told to do it right now, would in fact have BEEN to use an FSM to handle the hazard logic. I'll take sb0's word for it that that's a bad way to do it.

16:29 <ysionneau> you can insert bubbles or put forwarding logic etc

16:31 <sb0_> cr1901_modern, what exactly would that FSM do?

16:32 <sb0_> you can totally e.g. have a FSM as one pipeline stage, btw. just make sure it processes most information in one cycle...

16:33 <cr1901_modern> Example off the top of my head (so if it's bad, poorly thought-out, I'll take the blame):

16:33 <cr1901_modern> 00- Pipeline-okay

16:34 <cr1901_modern> 01- RAW- Hazard- stop pipeline for x cycles

16:34 <cr1901_modern> when x cycles pass 01=>00

16:34 <cr1901_modern> which requires a two bit counter connected to the master clock :P

16:38 <cr1901_modern> ysionneau: Those ARE some nice slides- as a nice diagram for how to implement stalls

16:38 <cr1901_modern> and... no FSM XD
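Bubbles and forwarding, as ysionneau put it, are normally combinational decisions made fresh every cycle from the pipeline registers. They do not need a separate state machine. The Python model below shows that per-cycle decision for a textbook five-stage pipeline. The stage naming, the one-bubble load-use rule and the assumption that register 0 is hardwired to zero are illustrative assumptions, not lm32's actual logic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class InFlight:
    """Destination of an instruction further down the pipeline.
    is_load marks results that only exist after the memory stage."""
    dest: Optional[int] = None
    is_load: bool = False

def forward_select(src: int, ahead1: InFlight, ahead2: InFlight) -> str:
    """Choose where operand `src` comes from as the reader enters execute.

    ahead1 is one stage ahead of the reader, ahead2 two stages ahead.
    Register 0 is assumed hardwired to zero, so it is never forwarded.
    """
    if src != 0 and ahead1.dest == src:
        return "forward_ahead1"   # the newest write wins
    if src != 0 and ahead2.dest == src:
        return "forward_ahead2"
    return "regfile"

def must_stall(srcs: Tuple[int, ...], ahead1: InFlight) -> bool:
    """Load-use hazard: a load one stage ahead has no data yet, so hold the
    reader for one cycle (a bubble); forwarding covers it afterwards."""
    return ahead1.is_load and ahead1.dest not in (None, 0) and ahead1.dest in srcs
```

Each cycle, the control logic evaluates `must_stall`. If it returns false, it evaluates `forward_select` for each operand. The only storage involved is the pipeline registers, so no extra FSM or cycle counter is needed.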

16:39 <ysionneau> the teacher has a very annoying voice, but I really enjoyed this course :)

16:42 <whitequark> mithro: nope

16:43 <whitequark> also wow, apparently I have a highlight on "pipeline"

16:47 <larsc> while we are at asking questions. Somebody is trying to tell me that if I have a clock that is fed to multiple different clock buffers and I want to transfer data from logic clocked by the one clockbuffer to logic clocked by another clockbuffer I always have to use proper CDC circuits

16:47 <larsc> and can't rely on the fixed phase relationship between the clocks and that the tools will complain if the data can't be transferred safely

16:47 <larsc> thoughts?

16:50 <sb0_> if they are static clock buffers, then if you account for PVT in the clock buffer skew, you don't need CDC circuits - it stays synchronous

16:50 <larsc> yea, that's what I've been saying

16:51 <sb0_> you may need CDC if they are complicated clock buffers that e.g. contain a PLL or similar

16:51 <larsc> just a BUFIO and a BUFG

16:52 <sb0_> one BUFIO driving several BUFGs?

16:52 <larsc> just one

16:53 <larsc> for source synchronous capture

16:53 <sb0_> well obviously you don't need CDC... how could you use synchronous off-chip devices then?

16:54 <larsc> The reasoning was 'regardless of what you may read elsewhere'

16:55 <sb0_> what are the two clock domains? inside the chip sending the data, and inside the FPGA after it's been through BUFIO+BUFG?

16:57 <larsc> the current setup is a BUFG driven by an IBUFGDS which clocks the whole logic.

16:58 <larsc> including the IDDRs used for capturing the incoming data

16:58 <larsc> the problem is the skew introduced by sending the clock from the IO bank to the BUFG and back to the IDDRs in the IO bank is massive

16:58 <larsc> larger than what you can compensate for with an IDELAY

16:59 <sb0_> is that slowtan6?

16:59 <larsc> kintex

16:59 <sb0_> are you using IBUFDS or IBUFGDS?

16:59 <larsc> GDS I think

17:00 <sb0_> you should use IBUFDS, otherwise you are chaining two BUFGs which is useless, wastes BUFGs and increases skew

17:01 <larsc> looking at the final result I think there is only one BUFG

17:01 <sb0_> ok...maybe the synthesizer removes one automatically then

17:02 <sb0_> you can use a PLL to absorb the skew

17:03 <larsc> looks like in series7 both are the same

17:03 <sb0_> or IDELAY. I'm surprised that would not work. they have pretty long range (nanoseconds) and I assume that your data rate is high if you have this skew problem...

17:03 <sb0_> also, unlike in slowtan6, you can have multiple data edges traveling in a xDELAY

17:04 <larsc> there is a window of about 1ns where the data is good

17:04 <larsc> skew is 3.5ns or something and IDELAY gives 2.8ns when fully turned up

17:05 <sb0_> use a PLL then

17:06 <larsc> but what's wrong with a BUFIO?

17:06 <sb0_> what's the clock frequency?

17:06 <larsc> 200MHz

17:06 <sb0_> BUFIO should work as well, yes

17:06 <larsc> It worked in my test setup, but as I said I was told we can't do that

17:07 <sb0_> *shrug* isn't the BUFIO designed for doing exactly that?

17:07 <larsc> because of the CDC logic that would be required for it

17:08 <sb0_> AFAIK there's no CDC logic required and using BUFIO to clock the IDDR and BUFG for the fabric is exactly what the kintex7 architecture is designed for

17:09 <larsc> k, thanks.

17:09 <sb0_> but if your boss becomes excessively annoying about it, you can use a PLL...

17:13 <larsc> we've settled for launching and capturing on the same edge instead of the opposite edge; that means we start off with a negative skew at the device, which gives us the extra slack we need to compensate for the delay inside the FPGA
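larsc's numbers can be checked with simple arithmetic. The minimal Python sketch below uses only the approximate figures quoted in this exchange: a 200MHz clock, about 3.5ns of clock-path skew, 2.8ns of maximum IDELAY and a data window of about 1ns. The helper names are hypothetical.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz

def residual_skew_ns(skew_ns: float, idelay_max_ns: float) -> float:
    """Portion of the clock-path skew a fully turned-up IDELAY cannot absorb."""
    return max(0.0, skew_ns - idelay_max_ns)

t = period_ns(200.0)                 # 5.0 ns
left = residual_skew_ns(3.5, 2.8)    # about 0.7 ns
print(f"period {t:.1f} ns, half period {t / 2:.1f} ns, uncompensated skew {left:.1f} ns")
```

Roughly 0.7ns of skew that IDELAY cannot absorb is most of a 1ns valid-data window. That explains why the options discussed are a BUFIO (whose clock path stays local to the IO bank), a PLL to shift the phase, or moving the capture to the other clock edge. At 200MHz, switching edges shifts the capture point by half a period, 2.5ns.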

17:21 <larsc> about IBUF vs IBUFG: 'Synthesis will automatically insert a BUFG on clock nets that it detects, but if it already has a BUFG instantiated in the code it will not add another BUFG.'

17:22 <larsc> http://forums.xilinx.com/t5/General-Technical-Discussion/The-difference-between-IBUF-IBUFDS-and-IBUFG-IBUFGDS/td-p/310949

20:32 <cr1901_modern> whitequark: https://twitter.com/whitequark/status/597459054475276288 I think the point is to show it's so freaking involved to get bare metal code running on modern non-micro CPUs

20:50 <whitequark> that has nothing to do with engineering an OS

20:50 <whitequark> and everything to do with, I dunno, being able to read a datasheet?

21:00 <cr1901_modern> Maybe I just don't see the point in writing an OS if you're protected from the details by a hypervisor that is unfathomably more complicated...

21:00 <cr1901_modern> There will never be another Linux, so all toy OSes are basically to "learn what goes on under the hood"

21:01 <cr1901_modern> (More generally- there will never be another successful hobbyist OS)

21:13 <cr1901_modern> whitequark: After thinking about this more, I think I see your point a bit more clearly. One can always decide if they want to go further after making an OS in a hypervisor or within Linux

21:25 <whitequark> an OS is something that allocates resources and, usually, manages sandboxes

21:25 <whitequark> and communicates between those sandboxes

21:25 <whitequark> that an OS communicates with hardware is an implementation detail

22:42 <cr1901_modern> Thinking about using a hypervisor makes me wish CPUs with more than two privilege levels had taken off. The kernel itself could be one entire privilege level, which can be tested under a hypervisor or virtual environment. And when the kernel is installed on real hardware, the drivers are given a lower privilege level. This way if the driver is buggy, it doesn't crash the system

22:44 <whitequark> congrats, you have invented microkernels

22:45 <cr1901_modern> What CPU besides x86 has more than two privilege levels :P?

22:51 <whitequark> you don't need that

22:51 <whitequark> run drivers in userspace

22:51 <whitequark> mmap the device IO memory into their address space

22:52 <whitequark> (map the device IO ports into their address space on x86, using TSS IOPM field)
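On x86, the I/O permission bitmap in the TSS holds one bit per port. A clear bit lets less-privileged code use that port, and a set bit makes IN/OUT to it fault. The Python sketch below builds such a bitmap for a userspace driver and appends the all-ones byte the CPU expects after the map. It does not show how a kernel installs the bitmap in the TSS, and the COM1 range is just an example.

```python
NUM_PORTS = 65536

def build_iopm(allowed_ports) -> bytes:
    """Return an x86 I/O permission bitmap: bit n == 0 grants access to port n.

    Every port starts denied; bits for allowed ports are cleared, and the
    all-ones terminator byte is appended after the map.
    """
    bitmap = bytearray(b"\xff" * (NUM_PORTS // 8))
    for port in allowed_ports:
        if not 0 <= port < NUM_PORTS:
            raise ValueError(f"port {port:#x} out of range")
        bitmap[port // 8] &= ~(1 << (port % 8)) & 0xFF
    return bytes(bitmap) + b"\xff"

def port_allowed(iopm: bytes, port: int) -> bool:
    """True if the bitmap lets unprivileged code access `port`."""
    return not (iopm[port // 8] >> (port % 8)) & 1

# e.g. grant a driver the eight ports of the legacy COM1 UART (0x3F8-0x3FF)
iopm = build_iopm(range(0x3F8, 0x400))
```

Each driver process can get its own bitmap, so a UART driver can reach its eight ports without being able to touch the disk controller's.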

22:52 <cr1901_modern> right, that's what iopl() does on Linux/NetBSD, but you have to be root to do it. Which makes sense. So drivers can only be started as superuser or admin

22:54 <whitequark> screw the whole unix access control system

22:54 <whitequark> make it from scratch based on capabilities

22:54 <whitequark> (capabilities in the sense of unforgeable tokens granting access to resources)

22:54 <whitequark> Mach does that, it makes way more sense

22:59 <cr1901_modern> Uses IPC to notify the driver: "Hey, some process needs to use your capabilities, are you okay/ready for use?" or something like that

23:00 <whitequark> no

23:00 <whitequark> the capability IS the resource, for the process

23:00 <whitequark> capabilities are a bit like fd's. you can duplicate them, you can inherit them, you can send them via a socket

23:00 <whitequark> (no open though)

23:01 <whitequark> init "inherits" all the capabilities that exist

23:02 <cr1901_modern> I think of fd's typically as "a handle that another function uses to determine what function to ACTUALLY call or what data to manipulate"- is that an accurate analogy in your example?

23:03 <whitequark> well, sorta, the kernel-mode code is what actually does the access

23:03 <whitequark> (or mapping)

23:05 mumptai has quit [Quit: Verlassend]

23:05 <cr1901_modern> yes, that makes sense. Capabilities are a resource, and the kernel handles resources. As long as the code talking to the hardware itself is user mode, all is good XD. Inevitably, if I ever write an OS, I expect to crash most of the drivers in this manner at least once.

23:06 <cr1901_modern> or ten times

23:08 <whitequark> the good thing about being able to send capabilities is that it doesn't really matter who ran you

23:09 <whitequark> or which group you run under

23:09 <whitequark> etc

23:09 <whitequark> also since there is no way to "acquire" a capability from thin air? no escalation either
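whitequark describes a per-process table of unforgeable handles. Handles can be duplicated, inherited and sent, but nothing can be opened by name, and init starts with everything. The toy Python model below illustrates that. The class and method names are hypothetical, and this is not Mach's actual port-rights API.

```python
from typing import Dict, List

class Kernel:
    """Toy capability kernel. Resources live here; each process holds only
    indices into its own table, much like file descriptors."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[object]] = {}

    def _lookup(self, proc: str, cap: int) -> object:
        table = self.tables[proc]
        # Reject anything the process was never given (incl. negative indices).
        if not 0 <= cap < len(table):
            raise PermissionError(f"{proc} holds no capability #{cap}")
        return table[cap]

    def spawn_init(self, resources: List[object]) -> None:
        # init starts out holding every capability that exists.
        self.tables["init"] = list(resources)

    def spawn(self, parent: str, child: str, inherit: List[int]) -> None:
        # A child receives only what its parent explicitly passes down.
        self.tables[child] = [self._lookup(parent, i) for i in inherit]

    def dup(self, proc: str, cap: int) -> int:
        res = self._lookup(proc, cap)
        self.tables[proc].append(res)
        return len(self.tables[proc]) - 1

    def send(self, sender: str, cap: int, receiver: str) -> int:
        # Like passing an fd over a Unix socket: the receiver gets a new index.
        res = self._lookup(sender, cap)
        self.tables[receiver].append(res)
        return len(self.tables[receiver]) - 1

    def access(self, proc: str, cap: int) -> object:
        # There is no open(name): a held index is the only path to a resource.
        return self._lookup(proc, cap)

k = Kernel()
k.spawn_init(["uart0 registers", "disk0"])
k.spawn("init", "uart_driver", inherit=[0])  # the driver gets only the UART
```

The model never asks who started `uart_driver` or which group it runs under. Access depends only on which indices it holds, and there is no call that mints a new one.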

23:12 <cr1901_modern> Idk enough about privilege escalation, so I'll take your word for it. Thinking about how to do an OS is certainly more fun than actually writing it XD.