#m-labs on 2014-03-08 — irc logs at freenode.irclog.whitequark.org

2013-12-11 12:34 lekernel changed the topic of #m-labs to: Mixxeo, Migen, MiSoC & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs

00:17 rjo_ is now known as rjo

00:21 mumptai has quit [Ping timeout: 240 seconds]

02:17 xiangfu has joined #m-labs

03:43 xiangfu has quit [Ping timeout: 265 seconds]

03:57 sh4rm4 has quit [Remote host closed the connection]

03:57 sh4rm4 has joined #m-labs

04:02 sh[4]rm4 has joined #m-labs

04:04 sh4rm4 has quit [Ping timeout: 252 seconds]

04:11 sh[4]rm4 is now known as sh4rm4

08:06 sb0 has quit [Quit: Leaving]

09:08 Alain has joined #m-labs

09:10 Alain has quit [Read error: Connection reset by peer]

09:10 Alain has joined #m-labs

09:13 kflux has joined #m-labs

09:36 mumptai has joined #m-labs

11:04 xiangfu has joined #m-labs

11:45 kiritan has joined #m-labs

11:45 kflux has quit [Ping timeout: 244 seconds]

11:53 xiangfu has quit [Ping timeout: 265 seconds]

13:05 Alain has quit [Remote host closed the connection]

13:08 Alain_ has joined #m-labs

13:36 sh4rm4 has quit [Ping timeout: 252 seconds]

13:41 Gurty has quit [Ping timeout: 240 seconds]

13:43 Gurty has joined #m-labs

14:09 xiangfu has joined #m-labs

14:30 kiritan has quit [Read error: Operation timed out]

14:48 sh4rm4 has joined #m-labs

15:01 xiangfu has quit [Quit: leaving]

15:20 Alain_ has quit [Remote host closed the connection]

18:04 Alain has joined #m-labs

18:10 awallin_ has joined #m-labs

18:14 <awallin_> hi all, was wondering if anyone has worked with the tdc-core? in particular it would be interesting to put it on a Papilio Pro spartan 6 dev-board. I think I saw a github repo with something like that..

18:17 sb0 has joined #m-labs

18:17 <sb0> awallin_, which TDC core? SERDES or delay line based?

18:18 kflux has joined #m-labs

18:19 <awallin_> sb0: hi, I did not know there were two!? I just read about the one on ohwr.org

18:19 <awallin_> how do they differ in architecture and performance?

18:19 <sb0> that's the delay line based then

18:20 <sb0> delay line has ~25ps resolution but it'll be difficult to fit in lx9 (though possible if you multiply the clock and use a shorter delay line, which might also improve resolution, but will take work)

18:21 <sb0> serdes is easier to use and much simpler and smaller, but has a bit less than 1ns resolution

18:21 <sb0> rjo has a migen version of the serdes tdc

18:22 <awallin_> that is all python code which is converted to vhdl?

18:22 <sb0> the serdes tdc I wrote is vhdl

18:23 <sb0> https://github.com/sbourdeauducq/serdes-tdc

18:24 <sb0> rjo wrote a similar one with migen

18:24 <awallin_> hm, I am looking at http://www.xilinx.com/publications/prod_mktg/Spartan6_Product_Table.pdf it seems the SPEC has an LX45 chip with maybe 5x more resources than LX9? is that roughly correct?

18:24 <sb0> with the delay line tdc, the problem is the height of the device column where you'll have to fit the delay line

18:25 <sb0> I don't think it'll fit as-is in anything smaller than lx45

18:25 <sb0> not because the device is full, but because the carry chains are not long enough

18:25 <sb0> so you'd need to shorten it, and multiply the clock

18:26 <awallin_> ok. is there a description of the serdes approach somewhere? is it in principle possible to get a resolution similar to delay-line with serdes?

18:26 <sb0> delay line will always be more precise than serdes by orders of magnitude

18:27 <awallin_> what about the xilinx tools for large FPGAs? I heard some large devices require an expensive paid version of xilinx ISE?

18:28 <mumptai> only for ones bigger than LX75 for spartan6

18:29 <awallin_> ah, ok, so LX45 is still ok. I guess ohwr/SPEC users would not be happy otherwise :)

18:30 <awallin_> so maybe I need an LX45 dev-board then if I want to play with the delay-line tdc..

18:31 <sb0> serdes is just sampling the incoming signal at ~1GHz and detecting edges
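
The SERDES approach described above (sample the input at a fixed rate, then look for transitions) can be sketched in software. This is only an illustrative model, not the code from the serdes-tdc repository: the function name `find_rising_edges` and the 1 ns sample period are assumptions chosen to match the "~1GHz" figure in the conversation.

```python
def find_rising_edges(samples, sample_period_ns=1.0):
    """Return timestamps (in ns) of 0 -> 1 transitions in a sampled bit stream.

    The timestamp is that of the first sample seen high, so the resolution
    is one sample period (hypothetical 1 ns here, i.e. ~1 GHz sampling).
    """
    edges = []
    previous = samples[0] if samples else 0
    for index, bit in enumerate(samples[1:], start=1):
        if previous == 0 and bit == 1:
            edges.append(index * sample_period_ns)
        previous = bit
    return edges


# Ten samples taken 1 ns apart; the input goes high at sample 3 and sample 8.
print(find_rising_edges([0, 0, 0, 1, 1, 1, 0, 0, 1, 1]))  # [3.0, 8.0]
```

Because an edge is only ever located to the nearest sample, the resolution cannot be better than the sample period, which is consistent with the "a bit less than 1ns" figure quoted for the SERDES TDC.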

18:32 <sb0> awallin_, what's your application by the way?

18:33 <sb0> the delay line might fit in lx9, if you multiply the clock

18:34 <sb0> and note that the slowtan6 PLLs have a lot of jitter (>100ps) so for high resolution you'll need to add an external, better PLL chip and deal yourself with the phase-alignment of the clocks

18:35 <awallin_> just think it would be nice to build an open-hardware/software time-interval/frequency counter..

18:35 <sb0> though you might still have luck with the internal PLLs...

18:35 <awallin_> PLLs: you mean you generate a 100/200 MHz external clock and feed it as input on some FPGA pin to clock parts of the fpga-circuit?

18:36 <sb0> in lx9, you'll need a short carry chain

18:36 * awallin_ away for a while..

18:37 <sb0> which you will sample at a high frequency, typically all that the slowtan6 clock network will give you

18:38 <sb0> but your whole circuit will not run at that frequency, so you'll need to deserialize that sampled data with a lower frequency phase-aligned clock

18:38 <sb0> and you'll need a PLL for those clocks

18:39 <sb0> you can also not use a carry chain and implement the delays with LUTs/routing

18:39 <sb0> but you'll need a lot of difficult low-level work

18:40 <sb0> (in spartan6, the carry chains need to be vertically stacked and are relatively fast, so the device has to be high enough to accommodate one with a total delay longer than the system clock period)
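
The constraint in the last message (the chain's total delay must exceed the system clock period) gives a rough way to estimate chain length. The sketch below assumes each tap adds about 25 ps, taking the "~25ps resolution" figure quoted earlier as the per-tap delay; that equivalence and both clock frequencies are assumptions for illustration, not measured values.

```python
import math


def taps_needed(clock_mhz, tap_delay_ps):
    """Smallest number of taps whose total delay covers one clock period."""
    period_ps = 1e6 / clock_mhz
    return math.ceil(period_ps / tap_delay_ps)


# Hypothetical clocks: multiplying the clock by 4 shortens the chain by 4.
print(taps_needed(125, 25))  # 320
print(taps_needed(500, 25))  # 80
```

This is why multiplying the clock is suggested for small devices: a shorter period needs fewer vertically stacked carry elements, so the chain can fit in a shorter device column.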

18:44 <awallin_> that sounds complicated :)

18:44 <awallin_> maybe best to get started with LX45 and the current code as-is then?

18:44 <sb0> yeah, maybe it's better for you to use the serdes-tdc first

18:45 <sb0> the delay line tdc code is also not all that simple to use, even on lx45

18:47 <awallin_> have you tried it on anything else than a SPEC ?

18:48 <sb0> hmm, maybe the synchrotron soleil people did

18:48 <sb0> http://www.ohwr.org/documents/206

18:50 <awallin_> so is there a well-supported cheap LX45 dev-board around?

18:51 <sb0> sp601

18:51 <sb0> or mixxeo ;)

18:51 <sb0> though the latter won't be "cheap"

18:52 <awallin_> the sp601 I find in digikey is listed as having an LX16

18:53 <sb0> sp605 sorry

18:54 <awallin_> hmh that costs as much as a SPEC :)

18:55 <awallin_> sb0: what's the MIXXEO going to be used for primarily? real-time mixing on live tv?

18:57 <sb0> more for events

19:56 kflux has quit [Ping timeout: 244 seconds]

20:29 kflux has joined #m-labs

21:47 littlebab has joined #m-labs

21:58 Alain has quit [Quit: ChatZilla 0.9.90.1 [Firefox 27.0.1/20140212131424]]

21:59 furan has joined #m-labs

21:59 <furan> hi any fpga people around?

22:18 <sb0> yes

22:19 <furan> do you know much about the hardware graphics (scaling/etc) code?

22:19 kflux has quit [Ping timeout: 240 seconds]

22:25 <sb0> in milkymist soc? since I wrote most of it, yes

22:26 <furan> cool

22:28 <furan> I'm self taught in FPGA stuff, have made several things, but graphics hardware logic kind of stumps me.

22:30 <furan> like for the scaling filter, i would expect there to be a module that takes access to the bus with some parameters and walks through doing its thing, but instead you seem to have modules for memory traversal that call into simpler modules that do the kernel.

22:30 <furan> can you tell me why it was implemented that way?

22:30 <sb0> huh?

22:31 <sb0> are you talking about milkymist soc, from http://m-labs.hk/m1.html ?

22:31 <furan> yeah

22:31 <sb0> there's no scaling filter, there's a texture mapping unit

22:32 <sb0> there's a FSM that fetches vertex data from the memory and pushes it into the pipeline

22:32 <furan> I thought there was some filter that came with a cool VPI for testbench

22:32 <sb0> you're probably talking about the TMU test bench, yes

22:33 <sb0> you can do scaling with the TMU

22:33 <sb0> but it's not a 'kernel', just texture coordinates

22:33 <furan> gotcha

22:33 <furan> can you explain to me your graphics pipeline flow?

22:34 <sb0> have you read this? http://m-labs.hk/thesis/thesis.pdf

22:34 <sb0> most of the TMU is explained there

22:34 <furan> thanks

22:35 <furan> is tmg texture management unit?

22:35 <sb0> texture mapping unit

22:42 <furan> I agree with you about open research papers; they're how I've learned my whole adult life.

22:43 <furan> I've made my own organic light emitting diodes and I think that says a lot for the value of open research papers.

22:51 ramzes has quit [Ping timeout: 264 seconds]

22:57 ramzes has joined #m-labs

22:59 <sb0> furan, do you have a web page about that?

22:59 <furan> yeah

22:59 <furan> http://escapehatchlabs.com/blog/oled-manufacturing-hobby-post-mortem/

23:00 <furan> your sdram explanation is really good

23:03 <furan> and is making me think I could optimize read-modify-write operations by keeping a row open for the duration

23:04 <furan> that's another thing, I see a lot of fifo usage with memory controllers, where the controller itself is not so close to the modules which manipulate it

23:06 <furan> whereas doing the kind of optimization above would block other modules on the bus from accessing memory during the duration

23:11 <furan> sorry if I'm bugging you I just don't know many people who do this stuff

23:12 <sb0> hpdmc is already keeping rows open

23:12 <sb0> lasmicon still does

23:12 <furan> oh so that is a thing

23:13 <sb0> the problem with back-to-back rmw is that the read and write operations have different latencies and there's also an IO bus turnaround time

23:13 <sb0> plus a write recovery time

23:13 <furan> recovery time = time to precharge state?

23:13 <sb0> so you typically want to read a lot, then write a lot

23:13 <sb0> yes

23:14 <furan> ah, hence the streaming/fifo designs

23:14 <sb0> or to read

23:14 <sb0> you cannot read data you've just written

23:14 <furan> yeah it would require a flush
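
A toy cost model shows why "read a lot, then write a lot" pays off. It charges one cycle per access plus a fixed penalty every time the bus changes direction. The 4-cycle penalty is a made-up placeholder standing in for turnaround and write recovery; real values depend on the SDRAM part and the controller.

```python
def bus_cycles(operations, turnaround=4):
    """Count cycles: one per access, plus a penalty on each direction change.

    `operations` is a sequence of "R" and "W"; `turnaround` is a
    hypothetical penalty, not a figure from any datasheet.
    """
    cycles = 0
    previous = None
    for operation in operations:
        if previous is not None and operation != previous:
            cycles += turnaround
        cycles += 1
        previous = operation
    return cycles


interleaved = ["R", "W"] * 8         # read-modify-write one word at a time
batched = ["R"] * 8 + ["W"] * 8      # read a lot, then write a lot

print(bus_cycles(interleaved))  # 76
print(bus_cycles(batched))      # 20
```

Both sequences perform the same 16 accesses; only the number of direction changes differs (15 versus 1).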

23:17 <furan> so with sdram it really makes sense to do a sort of tiling thing where you read into a tile in fpga bram, do the operations, and then copy that tile back to sdram

23:17 <sb0> yeah, caching basically

23:18 <sb0> when the access pattern isn't fully predictable

23:18 <sb0> or you read/write the data multiple times in a short interval
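
The tiling idea from this exchange can be sketched as follows. A Python list stands in for SDRAM and a temporary list for the on-chip BRAM tile; the names `process_in_tiles`, `tile_size` and `operation` are hypothetical.

```python
def process_in_tiles(sdram, tile_size, operation):
    """Apply `operation` to every word, one tile at a time.

    Each iteration models one burst read, local processing in on-chip
    memory, and one burst write back to external memory.
    """
    for start in range(0, len(sdram), tile_size):
        tile = sdram[start:start + tile_size]        # burst read into the tile
        tile = [operation(word) for word in tile]    # work on the local copy
        sdram[start:start + tile_size] = tile        # burst write back
    return sdram


memory = list(range(10))
print(process_in_tiles(memory, 4, lambda word: word + 1))
```

External memory then sees a few long reads and writes instead of many single-word read-modify-write cycles.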

23:19 <sb0> nice oled hack. maybe you'll like http://ehsm.eu :)

23:19 <furan> does the milkymist soc's sdram controller optimize keeping a row open for multiple operations when the word size used is larger than the word size in the sdram?

23:19 <furan> thanks :)

23:20 <furan> wow that looks cool

23:20 <sb0> sdram rows are big - much bigger than the words

23:21 <furan> I (somehow?) got invited to a conference called "Hackers" that was started by the folks written about in steven levy's hackers book

23:22 <furan> super high bandwidth conversations

23:24 <furan> but I had to speak my first year and I made a fool of myself by talking about how I was going to make OLEDs but not knowing enough yet in front of people like donald knuth, guy who invented the furby (can't remember his name), and the woot.com founder :P

23:25 <sb0> you should bring some OLEDs to EHSM :)

23:25 <sb0> now that they work

23:26 <sb0> last year Ben Krasnow was one of the speakers - he's doing ITO deposition in his home lab now

23:26 <furan> yeah I think I like sdram now. for a while I've been thinking about it as slow and taking many clocks but now I'm thinking it enforces good design, and a lot of optimization can be done with the rows.

23:26 <sb0> he might come again this year

23:26 <furan> yeah I've told him I'll send him some alq3 to do resistive oled emitter deposition but I haven't sent it yet(oops)

23:27 <furan> I hate shipping stuff and he's in the same city I am every work week.

23:27 <sb0> yeah, making SDRAM go fast is learning the hard way :-)

23:27 <furan> lol

23:33 <sb0> if you want to make your open source GPU, the Milkymist SoC TMU would be a good starting point

23:34 <sb0> though I really recommend you use migen for that, because a lot of structures are recurring, and you could copy them with a few lines of python code instead of massive verilog/vhdl copy and paste

23:35 <sb0> you'd need to use triangle interpolation instead of the squares it uses atm, add perspective correction, mipmapping, etc.

23:35 <sb0> blending of colors (not only texture coordinates)

23:35 <furan> well I have made some kind of retarded ones so far, that just do arbitrary walks and 2d/rop operations based on a bunch of parameter registers. I used it to control a LED matrix core I wrote.

23:36 <furan> I'd like to rebuild that and get it working, and then graduate to a 3d pipeline

23:36 <sb0> a Z-buffer too

23:37 <sb0> all those things are built on the same principles as the current TMU

23:37 <furan> http://www.flickr.com/photos/oohshiny/6372599273/ :D

23:37 <furan> nods

23:37 <sb0> the most painful thing is the triangle interpolation, and maybe mipmapping, as you'll need to process 8 cache accesses per cycle

23:37 <sb0> for trilinear filtering

23:37 <furan> well I like to build things from scratch but I'll take a lot of knowledge out of looking at your designs

23:38 <sb0> current cache is only 4 accesses

23:38 <sb0> now that I think of it, upgrading to 8 wouldn't be too much of a hassle, just use more BRAM

23:38 <furan> does that mean you have an 8-way ported memory?

23:39 <sb0> yeah

23:39 <sb0> but there's a trick

23:39 <sb0> since you need multiport for read only

23:39 <sb0> you can have several memories and duplicate the contents
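
The trick just described (several memories with duplicated contents, so that each copy serves one read port) can be modelled in a few lines. This is a behavioural sketch only, not the TMU's actual cache; the class and method names are hypothetical.

```python
class ReplicatedReadMemory:
    """Multi-read-port memory built from identical single-port copies."""

    def __init__(self, depth, read_ports):
        # One full copy of the contents per read port.
        self.banks = [[0] * depth for _ in range(read_ports)]

    def write(self, address, value):
        # Every write goes to all copies so they stay identical.
        for bank in self.banks:
            bank[address] = value

    def read_all(self, addresses):
        # Each address is served by its own copy, all in the same "cycle".
        if len(addresses) > len(self.banks):
            raise ValueError("more reads requested than read ports")
        return [bank[address] for bank, address in zip(self.banks, addresses)]


memory = ReplicatedReadMemory(depth=16, read_ports=4)
for address in range(16):
    memory.write(address, address * 10)
print(memory.read_all([0, 5, 5, 15]))  # [0, 50, 50, 150]
```

The cost is storage: N read ports need N copies of the data, which is why going from 4 to 8 accesses is described as "just use more BRAM".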

23:40 <furan> that's nuts

23:42 <sb0> well, the current TMU design already does that, since I need a 4-port memory

23:44 <furan> I was thinking if your walks were aligned towards row/column you could have 8 horizontal or vertical scan buffers

23:44 <furan> instead of duplication across brams

23:46 <sb0> hmm, the mipmap levels might change during the scanning of a single triangle

23:48 <furan> what would that cause?

23:52 <sb0> only a bit of further annoyance :) you already cannot escape the 4-port memory for bilinear filtering

23:53 <sb0> mipmapping is about fetching 2 textures instead of one

23:53 <sb0> so you'd have two of those 4-port caches operating in parallel, or a single 8-port cache

23:54 <furan> nods

23:55 <sb0> both will use the same amount of BRAM at roughly the same performance level. the single cache might give slightly better performance when the mipmap levels change and what used to be the mipmap level of one way becomes that of the other.

23:56 <sb0> unless, of course, the cache data of one mipmap way gets trashed by the other way, which you need to be careful to avoid

23:56 <furan> if you're talking about mipmap level changing that means you keep these caches for a long time, not just for the duration of a frame

23:56 <sb0> (if possible at all. I never actually implemented mipmapping. maybe 2 separate caches is the best option for this reason)

23:57 <sb0> as I said mipmap levels can change during the texturing of a single triangle

23:58 <furan> because part of it could be 'nearer'?

23:58 <sb0> yes

23:58 <furan> I didn't realize mip mapping had that granularity, got it

23:58 <sb0> you're also averaging the output of the two mipmap levels

23:59 <furan> alright I am gonna get back to the reading about the sdram controller before I page too much out

23:59 <sb0> you do an average weighted by the distance between the ideal (fractional) mipmap level and the two discrete mipmap levels you have

23:59 <sb0> that's why it's called trilinear filtering
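
The weighted average described above can be written out as a small reference model. It assumes textures are 2D lists of floats, `u` and `v` are normalized to [0, 1], and `level` is a non-negative fractional mipmap level; these conventions and the function names are assumptions for illustration, not taken from the Milkymist TMU.

```python
def bilinear(texture, x, y):
    """Blend the 4 texels around (x, y), given in texel coordinates."""
    height, width = len(texture), len(texture[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def trilinear(mipmaps, u, v, level):
    """Blend bilinear samples from the two mipmap levels around `level`."""
    last = len(mipmaps) - 1
    lower = min(int(level), last)
    upper = min(lower + 1, last)
    weight = level - int(level)  # distance from the lower discrete level
    samples = []
    for index in (lower, upper):
        texture = mipmaps[index]
        x = u * (len(texture[0]) - 1)
        y = v * (len(texture) - 1)
        samples.append(bilinear(texture, x, y))  # 4 texel fetches per level
    return samples[0] * (1 - weight) + samples[1] * weight


# Level 0 is 2x2, level 1 is 1x1 (values are arbitrary test data).
mipmaps = [[[0.0, 2.0], [4.0, 6.0]], [[10.0]]]
print(trilinear(mipmaps, 0.5, 0.5, 0.25))  # 4.75
```

Each call reads 4 texels from each of 2 levels, which is where the 8 cache accesses per output pixel mentioned earlier come from.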