<kc8apf>
those are technically part of the INTL tile but their meaning is part of the adjacent CLBLM
<kc8apf>
so the bits are physically in the INTL (and their addresses in the bitstream reflect that)
<rqou>
oh, i already encountered that
<rqou>
that's not the problem i'm having right now
<rqou>
new investigation seems to suggest there are 7(?!) up/down wires?
<daveshah>
kc8apf: that reminds me of the ice40 ultraplus, where one of the 8 DSPs swaps 2 config bits with an adjacent IPConnect tile
<rqou>
ok, wat
<rqou>
i have a particular set of mux bits
<rqou>
and in rows 1 and 4 a certain bit pattern selects a certain wire
<rqou>
and in rows 2/3 it does a different wire
<rqou>
but these wires are all logically named "I0"
<rqou>
ping azonenberg
<azonenberg>
ack
<rqou>
i'm seeing some behavior that totally doesn't make sense to me
<rqou>
i'm pretty sure tile inputs work the way i hypothesized last time
<rqou>
namely 3x 4-to-1 one-hot muxes sharing control bits followed by a 4-to-1 one-hot mux
<rqou>
so 8 control bits total, 13 unique inputs
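rqou's hypothesis above can be sketched as a toy model. The structure and indexing here are my guesses from the description (three level-1 muxes sharing one 4-bit one-hot field, and a level-2 mux whose fourth position is a direct input), not anything Altera documents:

```python
# Model of the hypothesized two-level one-hot mux for a tile input.
# Level 1: three 4-to-1 one-hot muxes sharing a single 4-bit one-hot field.
# Level 2: one 4-to-1 one-hot mux choosing among the three level-1 outputs
# plus one extra direct input -> 8 control bits, 3*4 + 1 = 13 unique inputs.

def one_hot_index(bits):
    """Return the index of the single set bit, or None if not one-hot."""
    if sum(bits) != 1:
        return None
    return bits.index(1)

def tile_input_mux(inputs, level1_bits, level2_bits):
    """inputs: 13 wires; inputs[0:12] feed level 1, inputs[12] is direct."""
    assert len(inputs) == 13 and len(level1_bits) == 4 and len(level2_bits) == 4
    s1 = one_hot_index(level1_bits)
    s2 = one_hot_index(level2_bits)
    if s2 == 3:                       # guessed: last position bypasses to the direct input
        return inputs[12]
    level1_out = [inputs[4 * m + s1] for m in range(3)]
    return level1_out[s2]

# Example: level-1 position 2, level-2 mux 1 -> input 4*1 + 2 = 6
wires = list(range(13))
assert tile_input_mux(wires, [0, 0, 1, 0], [0, 1, 0, 0]) == 6
```

This reproduces the bit count in the hypothesis (8 control bits, 13 unique inputs); which physical wire lands on each of the 13 positions is exactly the open question in the rest of the discussion.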
<rqou>
anyways, so i tried to fuzz e.g. the mux that controls LAB track number 0
<rqou>
and for a certain pair of set bits (the "same" input), as i move up/down the column
<rqou>
the _logical_ index of the source wire is the same
<rqou>
but the _physical_ location of which mux bits are getting set to drive that source wire don't make any sense
<rqou>
any guesses as to wtf is going on?
<azonenberg>
first guess: tiles are not arranged the same way logically and physically
<azonenberg>
Say, a tile is 2x as high as it is wide physically but is logically square
<rqou>
um, that definitely does not seem to be the problem
<rqou>
the LUT bits all show up in the expected place
<azonenberg>
next possibility, tiles are in groups of (say) 2x2 and mirrored?
<azonenberg>
or possibly just interconnect?
<rqou>
not in any consistent way
<rqou>
so e.g. for LAB line 0
<rqou>
i have a pattern
<rqou>
0111
<rqou>
1110
<rqou>
this seems to select "a right-going wire two tiles to the left"
<rqou>
but which right-going wire?
<rqou>
in rows 1 and 4 it selects the third one
<rqou>
in rows 2 and 3 it selects the first one
<rqou>
but i have a different pattern
<rqou>
1011
<rqou>
1110
<rqou>
this selects "a right-going wire in this tile"
<rqou>
but in row 2 it selects the first one and in all other rows it selects the third one
<rqou>
but ALL OF THESE HAVE INDEX 0?!
<azonenberg>
??
<rqou>
i know, right?
<azonenberg>
talk about non-orthogonality
<rqou>
so wtf do you think is happening?
<rqou>
also, afaict there are _definitely_ wires originating out of the io tiles
<rqou>
but their coordinates are not in the io tile
<rqou>
the coordinates get forced into some other tile
<rqou>
also also, i still cannot get the numbers to add up to the routing resources that quartus will report
<rqou>
maybe the report is bogus?!
<rqou>
azonenberg: so, afaict based on the bits
<rqou>
each tile has 8 left wires, 8 right wires, 7 up wires, and 7 down wires
<rqou>
io tiles have an unknown number of wires
<azonenberg>
The bits don't lie
<azonenberg>
reports are useful but should not be considered absolute truth
<rqou>
hmm, the "direct links" count in the report seems busted too
<rqou>
it claims 888, which depending on how you interpret it is either way too many or way too few
<rqou>
waaaait a sec
<rqou>
the "direct links" number is more plausible if you use the restricted LE count
<rqou>
um... could the R4/C4 numbers be like that too?
<azonenberg>
i guess?
<rqou>
what
<rqou>
no, it reports the same number for the real 240LE
<rqou>
i think i'm going to start ignoring the report; it makes no sense
<rqou>
btw 8 l/r and 7 u/d is consistent with the "routing channel width" numbers that we had disregarded earlier
<rqou>
so azonenberg, what next?
<rqou>
i'm pretty stuck on "wtf is this crazy coordinates/numbering scheme"
<rqou>
but in general the mux patterns kinda make sense?
<rqou>
it's just not clear exactly which wires go into them
<rqou>
azonenberg: what would you focus on fuzzing next?
<rqou>
i'm pretty stuck right now on the coordinates issue
<rqou>
but it seems to be blocking getting a deeper understanding?
<rqou>
ok, the column bits seem to be much more consistent
<rqou>
it seems like the row wires are just shuffled somehow
<rqou>
hmm, based on the data i have here i wonder if N3/N8 neighbor connections are several ps slower than the others
<rqou>
since it seems to be one more mux level
<rqou>
so afaict only the row wire numbering is fucked
<rqou>
the column wire numbering seems to make some amount of sense?
<rqou>
i'm pretty sure i've marked all of the control bits correctly except at the edges
<rqou>
so i think the next step will be to fix my own 2d coordinates
<rqou>
and then mark the bits that are involved in IO cell muxes (not necessarily decoding them yet)
<rqou>
and then actually try to decode mux values
<rqou>
which should be much easier once i know what bits control what rather than going in blind
<q3k>
shadowdancer: i know nothing about the ps4..?
<q3k>
not sure where you got that impression from
<rqou>
huh, I've been pinged multiple times now on GitHub regarding Rust and SVD/device support crates
<rqou>
i should probably allocate some time to deal with it
<awygle>
is it possible to run svd2rust as part of a build script? so that you don't have to include a device-specific crate every time?
<rqou>
there was a comment somewhere that japaric doesn't like that idea
<rqou>
but yes, of course it's possible
<rqou>
since build scripts can run arbitrary subprocesses
<cr1901_modern>
You have svd crates?
<rqou>
(expecting whitequark to jump in at any moment now and call these "typical rqou hacks")
<rqou>
cr1901_modern: i was trying to maintain some unified stm32/efm32 crates earlier
<rqou>
but this requires a ton of effort that i haven't fully invested into it
<rqou>
and overall the ecosystem for this kinda sucks
<cr1901_modern>
I like the core idea of the structs svd2rust generates, even if svd files are of varying quality (*cough* NXP)
<rqou>
a bunch of people have ideas that i disagree with, so I should probably find some time to jump into the conversation that they've pinged me on
<rqou>
also, overall I've been finding japaric's rtfm framework itself pretty great, but embedded-hal seems pretty unusable
<rqou>
which is disappointing because i love the idea of embedded-hal
<rqou>
it just doesn't seem to actually be very usable
<rqou>
also japaric has been going around fucking everything up recently so I'm still stuck on a two-month-old nightly until i have time to make everything work again
<awygle>
rqou: can you point me at this discussion? i'm at least an interested observer
<rqou>
awygle: i like the essence of that idea, but that post has a bunch of "extraneous" comments that make me wary
<rqou>
awygle: specifically the "only support 7 modules (that covers 99% of Adafruit stuff)" comment
<awygle>
yeah lol
<rqou>
this is often a red flag for me that this will become a useless Ardui-noob api that isn't actually powerful enough for real use
<awygle>
I don't actually agree even with the proposal as written but at least that person seems to be looking at slicing the right way
<rqou>
(embedded-hal _already_ has this problem)
<awygle>
I don't even really know why SVD is so prevalent in the discussion
<rqou>
yeah I'm not really tied to the idea of svd
<awygle>
I really like the way chibios is arranged
<rqou>
see for example my svd fragments that have to run through the c preprocessor first
<rqou>
hmm i should look into that
<rqou>
I've seen several people recommend it
<rqou>
in general though my opinion is that i hate HALs
<awygle>
A chip as a collection of peripheral drivers, a board as a mapping from pins to peripherals (in short, glossing over a lot)
<rqou>
i just don't get the point of "board" abstraction
<awygle>
Well, your opinion is wrong :-P HALs are hugely useful for all those cases where you're not pushing the envelope, as long as they're reasonably sane
<awygle>
You can always beat a hal, but that's not the point
<awygle>
I'm not super married to a board level abstraction but somebody somewhere has to know what pins go to what, and it's nice if that's all in one place for purposes of porting
<cr1901_modern>
Also, rqou/awygle, you both idle in #rust-embedded. Why not discuss embedded rust in there where you could actually get help?
<rqou>
because there's never activity there
<cr1901_modern>
ppl will get back to you, you just need to be patient
<awygle>
because I'm not actually trying to do anything. I just enjoy shooting the breeze. I'm not writing any kind of embedded code, currently.
<awygle>
If I needed help I'd go there
<rqou>
in my (one, so not very representative) attempt to contact japaric it didn't go particularly well
<cr1901_modern>
If you DM him he will eventually get back to you. I understand OT in #openfpga, but doing so for embedded rust seems incredibly redundant (when not everyone in here uses Rust like that in the first place, if at all).
<rqou>
well, my one attempt to contact japaric went like this: "plz 2 comment out this one line of code. it doesn't break anything and fixes cortex-m0. <crickets, time passes> oh, i fixed it in the latest release (which also broke a whole bunch of other stuff)"
<awygle>
cr1901_modern: my concern is that if I complain, idly and without research, in #rust-embedded, it will sound like I'm asking for a change. That will either burn social capital with the community as they explain all the ways my uninformed complaints are uninformed, or cause a bunch of people to do a bunch of work based on my idle musings. The first is bad for me, the second bad for others. Complaining here is safe.
<awygle>
When/if I actually want to engage the community, I'll do a lot more homework.
<cr1901_modern>
awygle: Social capital? You're one of the reasons msp430 works in the first place! :P
<rqou>
in general i find "large" communities not really worth the effort to interact with
<awygle>
cr1901_modern: well, yes :) but that was at a substantially lower level of the stack (and i have a pile of TODOs in that area that realistically i won't get back to for maybe as much as a year)
<Ultrasauce>
to throw a little more on the OT pile, today a technical rep from a major camera vendor told me to reverse engineer their product to avoid having to go through the nda/partnership/knowledge transfer process
<Ultrasauce>
I am a little weirded out!
<Bike>
that sounds pretty shady.
<shapr>
yikes
<balrog>
Ultrasauce: haaaah
<balrog>
because they refuse to provide reasonable documentation?
<awygle>
wow seriously?
<awygle>
i can't decide if that's horrifying or awesome. probably both.
<Bike>
If they have one of those "if you use this you can't RE it" agreements couldn't they be huge assholes and pursue you?
<Ultrasauce>
it certainly does feel like a potential trap, not that I'd ascribe any explicit ill intent to the suggestion
<Bike>
yeah, they probably wouldn't actually do that, but an advantage of going through the legal gibberish is that they can't change their minds
<azonenberg_work>
awygle: So i re-ran the numbers for the mac table w/ more significant digits
<azonenberg_work>
If we have minimum length packets at full line rate on all interfaces
<azonenberg_work>
We have a max of 95.23 Mpps
<azonenberg_work>
Which means we need to average 1.64 clocks per packet if the MAC table is running at 156.25 MHz
<azonenberg_work>
that's more margin than i thought, but i still want to try and pipeline it to do one lookup per clock
<azonenberg_work>
That would allow me to process 156.25 Mpps, or 107.9 Gbps, of min-sized packets without blocking in the mac table
<azonenberg_work>
Still only ~half the performance I need for LATENTORANGE though, i will probably have to do a dual-panel table and/or upclock to 312.5 MHz for that
<azonenberg_work>
(targeting 280 Gbps max throughput there)
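The quoted packet-rate figures fall out of the arithmetic if you assume LATENTRED's 24x1G + 4x10G interfaces at line rate and count each minimum frame as 64 bytes plus 8 bytes of preamble and 12 bytes of inter-frame gap on the wire (these framing assumptions are mine; azonenberg may be counting slightly differently):

```python
# Back-of-the-envelope check of the MAC table numbers, assuming
# 24x1G + 4x10G at line rate and 64B frames + 8B preamble + 12B IFG.
GBPS = 1e9
aggregate_bps = 24 * 1 * GBPS + 4 * 10 * GBPS      # 64 Gbps total
wire_bits_per_min_frame = (64 + 8 + 12) * 8        # 672 bits per min frame
max_pps = aggregate_bps / wire_bits_per_min_frame  # ~95.24 Mpps
clk_hz = 156.25e6
clocks_per_packet = clk_hz / max_pps               # ~1.64 clocks/packet

print(round(max_pps / 1e6, 2))      # 95.24
print(round(clocks_per_packet, 2))  # 1.64
```

Up to rounding this matches the 95.23 Mpps and 1.64 clocks-per-packet figures above.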
<q3k>
i wonder if that's something commercial switches actually handle well
<q3k>
i wouldn't be surprised if they just start flooding
<azonenberg_work>
Don't know
<azonenberg_work>
As long as you never send >1 Gbps per gig port and >10 Gbps per 10G port, LATENTRED should not drop anything or flood ever
<azonenberg_work>
If you have bursts of faster data, the buffers will cover it up to a point
<azonenberg_work>
in particular, the 72 Mb of QDR-II+ can handle up to 7.2 ms of full line rate 10G traffic going to a single 1G interface
<azonenberg_work>
before filling up
<azonenberg_work>
at which point it'll start to drop 90% of the traffic
<azonenberg_work>
Actually the 7.2 ms assumes i'm not emptying the buffer as i fill it
<azonenberg_work>
So actually i think it comes out to 8 ms
<azonenberg_work>
in any case, that is kind of an unavoidable problem if you are rate-matching interfaces, all you can do is make the buffer bigger but dropping is inevitable in that situation
<azonenberg_work>
Not something i can fix architecturally
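The two buffering numbers above check out under the stated assumptions (72 Mb of QDR-II+, a full-rate 10 Gbps flow destined for a single 1 Gbps port):

```python
# Time until the 72 Mb QDR-II+ buffer fills with 10G traffic headed
# to one 1G port, with and without draining the buffer while filling.
buffer_bits = 72e6
fill_bps = 10e9
drain_bps = 1e9

t_no_drain = buffer_bits / fill_bps                  # ignore the 1G drain
t_with_drain = buffer_bits / (fill_bps - drain_bps)  # net fill rate 9 Gbps

print(t_no_drain * 1e3)    # 7.2 ms
print(t_with_drain * 1e3)  # 8.0 ms
```

Which is exactly the 7.2 ms vs 8 ms correction azonenberg makes: accounting for the 1 Gbps drain stretches the time-to-full from 7.2 ms to 8 ms.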
<q3k>
i'm still disgusted by how expensive commercial 10GbE switches are
<q3k>
fucking broadcom monopoly
<azonenberg_work>
lol
<azonenberg_work>
how expensive are you complaining about?
<q3k>
well, expensive for a hackerspace/home lab
<azonenberg_work>
give me a number
<q3k>
i think the arista I want is around $2k
<q3k>
second-hand
<azonenberg_work>
So, the BOM cost for LATENTRED right now (incomplete, for example i dont have all of the passives for the brain board yet)
<azonenberg_work>
is about 1.2k in components alone
<azonenberg_work>
For single unit volume
<q3k>
sure
<azonenberg_work>
then several hundred in PCBs
<q3k>
i don't mind paying that for low-volume hardware
<azonenberg_work>
and then the custom 1U case
<q3k>
i mind paying that for mass-produced second-hand hardware
<azonenberg_work>
also keep in mind that LATENTRED is 24x 1G / 4x 10G interfaces
<q3k>
i know
<q3k>
still worth it when it comes to experimental hw
<azonenberg_work>
The thing you linked is closer to LATENTORANGE, which will be tentatively 28 10G lanes, with a TBD mix of 10G and 40G ports
<q3k>
might end up going with a juniper qfx3500
<q3k>
but then I don't have access to firmware downloads
<q3k>
and need a license for bgp (!)
<q3k>
ugh
<rqou>
aaaaaargh i just spent ages hunting down a bizarro hardware bug
<rqou>
*firmware bug
<rqou>
turns out I got bit by store/reorder buffers
<rqou>
in a cortex *m*
<awygle>
arm's memory model is bonkers
<whitequark>
what
<awygle>
well okay. bonkers is not fair. but it's much looser wrt ordering than x86 or x86-64
<pie_>
single stack was a mistake
<rqou>
if you clear an interrupt pending flag too close to the end of the isr handler, the write can get buffered and cause the handler to get entered again
<q3k>
>arm's memory model is bonkers
* q3k
[laughs in MIPS]
<awygle>
mips is not in the list i'm looking at
<whitequark>
rqou: oh
<whitequark>
wow, I'll need to keep that in mind
<rqou>
yeah, so if your timers ever appear to be firing twice as quickly, this is one possible reason :P
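The race rqou describes can be sketched with a toy model. This deliberately simplifies real Cortex-M store-buffer and NVIC behavior (the names `ToyCore`, `dsb`, etc. are mine), but it captures why clearing the pending flag right before returning re-enters the handler unless a barrier drains the write first:

```python
# Toy model of the ISR re-entry race: the write clearing the pending
# flag sits in a store buffer, so on ISR return the "NVIC" still sees
# the flag set. A dsb() drains the buffer before returning.

class ToyCore:
    def __init__(self):
        self.pending = True      # peripheral interrupt pending flag
        self.store_buffer = []   # buffered writes, not yet visible
        self.isr_entries = 0

    def write_clear_pending(self):
        # The write is posted to the store buffer, not yet visible.
        self.store_buffer.append(("pending", False))

    def dsb(self):
        # Barrier: force all buffered writes to complete.
        for reg, val in self.store_buffer:
            setattr(self, reg, val)
        self.store_buffer = []

    def return_from_isr(self):
        # If the pending flag is still visible, the handler is re-entered.
        if self.pending:
            self.isr_entries += 1

def isr(core, use_dsb):
    core.isr_entries += 1
    core.write_clear_pending()   # clear the flag at the very end of the ISR
    if use_dsb:
        core.dsb()
    core.return_from_isr()

buggy, fixed = ToyCore(), ToyCore()
isr(buggy, use_dsb=False)
isr(fixed, use_dsb=True)
print(buggy.isr_entries)  # 2 -- handler taken twice
print(fixed.isr_entries)  # 1
```

In real firmware the equivalent fix is the `DSB` azonenberg mentions next: execute a data synchronization barrier after the flag-clearing store, before exiting the handler.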
<azonenberg_work>
rqou: did you not do a dsb before the end of the ISR?
<q3k>
i tend to always treat interrupts as possibly spurious in my systems code
<rqou>
no, you usually don't need one
<q3k>
is not pretty but protects you against weird races like that
<azonenberg_work>
q3k: and i prefer to not use interrupts and write event-driven code where hardware does all of the hard-realtime stuff and you just pop an event queue as you get around to things :P
<q3k>
well, you're not supposed to do hard work in ISRs anyway
<azonenberg_work>
??
<q3k>
drop that event into a queue, schedule a bottom half, run the bottom half when your system is idle
<azonenberg_work>
how do you normally handle things like "this has to be done within 5 clocks of pin X going high"
<q3k>
you wanna do that in software? :P
<azonenberg_work>
No :p
<whitequark>
you're absolutely supposed to do work in ISRs
<azonenberg_work>
Which is why i use FPGAs for almost all of my embedded work these days
<whitequark>
cortex-m-rtfm is built entirely around that
<azonenberg_work>
my point is, i prefer the event-driven model
<q3k>
whitequark: and I'll argue this is poor practice
<azonenberg_work>
so why have your CPU be interrupted at all?
<azonenberg_work>
Why not just design the architecture so the hardware puts events in the queue for you, then you just pop when idle?
<whitequark>
q3k: do you have a non-cargo-cult reason?
<azonenberg_work>
This is why in antikernel most of my CPUs don't even support interrupts
<q3k>
whitequark: starvation of non-interrupt-driven logic
<azonenberg_work>
It's much more deterministic this way
<whitequark>
q3k: you don't have to have any.
<whitequark>
for one.
<q3k>
whitequark: and oftentimes I end up having code have to synchronize data from multiple sources, so I prefer getting them accessible from a single thread safely as fast as possible
<whitequark>
and for another, if you do event-driven, you're just exchanging that for losing events *and* you can still starve other logic if event-driven logic takes too much
<whitequark>
that's a better reason
<q3k>
right, but with event driven it's much easier to apply backpressure on different parts of the system to limit starvation
<q3k>
and to actually measure your system load by different event types
<whitequark>
you can't apply backpressure to interrupts, just like you can't block in them...
<azonenberg_work>
q3k: in antikernel i planned to implement that by having ulimits per event source
<azonenberg_work>
For example, once the NIC has more than 32 pages allocated to it, future mallocs will fail with "you're using too much ram"
<azonenberg_work>
and ethernet frames will be dropped until the IP stack (whether SW or HW) catches up and frees some of the pages the NIC is using
<awygle>
i would have expected reti to act as a barrier, 'parently not
<awygle>
i usually try to avoid substantial work in interrupts, but i've mostly worked on systems where the main interrupts are "DMA complete" interrupts that just need to wake up a thread to deal with the new buffers
<rqou>
arm doesn't have reti
<rqou>
especially not in cortex-m
<rqou>
it's just a normal bx lr
<whitequark>
um, no
<whitequark>
the instruction encoding is normal but the return isn't
<whitequark>
you have a special value in lr
<rqou>
well yes
<whitequark>
so the core can do whatever
<whitequark>
for that matter, it *does* whatever
<rqou>
but it's not a separate opcode like x86
<rqou>
i guess `bx lr with a magic value` could have been made to be a barrier
<awygle>
huh. i've never written arm asm, but i googled "arm reti" before i said that and got results that implied to me it existed. is it an alias?
<awygle>
oh, no, i see
<awygle>
nvm, reading comprehension failure
<awygle>
wow, that seems kind of ugly actually, vis a vis the hardware
<whitequark>
awygle: I think the idea is that interrupt handlers are just C functions
<whitequark>
NVIC knows the C ABI, reusing bx lr was the last missing part
<awygle>
i can see it being useful for the software. kind of weird for hardware to be reading return addresses and doing atypical things though. seems like a big comparator for one thing.