<traeak>
lkcl: i still want to see that 8 core mips proc with the open media instructions
<traeak>
iCube that is
<lkcl>
traeak: yeah. me too. gotta find around $USD 8m though, because 40nm ain't gonna cut it - have to go to 28nm in 2014 when SMIC is set up and ready
<traeak>
yeah ouch
<traeak>
currently on 65nm
<lkcl>
40nm (with a $USD 5m budget) ain't gonna cut it
<traeak>
just looking at their us website
<lkcl>
yes - via the MVP "shared silicon" programme which is about $50k, very low yields, and done about every 3 months.
<traeak>
also i notice it's dual core, 4 threads per core
<lkcl>
all the fabs do that - it's a good way for people to test their ICs.
<lkcl>
that varies. they can do more, they can do less.
<traeak>
so basically this isn't going to happen from what it looks like
<traeak>
?
<lkcl>
it'll fucking well happen all right.
<traeak>
if they want traction
<traeak>
i can't tell how they are doing opengles
<traeak>
looks like a separate block
<traeak>
so probably proprietary again?
<lkcl>
the moment i find $8m to spare i'm going to phone them up and say "here you go"
<lkcl>
ahh no - you've missed how f*****g smart these guys are, entirely :)
<traeak>
pocket change for you?
<lkcl>
traeak: not a chance :)
<traeak>
well i spent the last 10 mins looking at their page
<traeak>
so that tells you how educated i am :-p
<lkcl>
but if this project goes as well as i planned, then yes it'll be pocket change. but not right now :)
<lkcl>
ok, the way it works is: it's *pure* software.
<lkcl>
there *are* no proprietary hardware blocks. at all. absolutely none.
<traeak>
which is exactly what's required to catch a part of the market that is absolutely not being served right now
<lkcl>
so the parallelism comes from the threads, which are nominally absolute hell to program.
<traeak>
hopefully the market is big enough
<lkcl>
yes, exactly.
eFfeM has quit [Quit: Leaving.]
<traeak>
nah, i do multithreaded stuff for a living
<lkcl>
well that depends on price-power-performance
<traeak>
i just need lots of double precision throughput
<lkcl>
ok, but this is VLIW multi-threading.
<lkcl>
so it's a bit different.
<lkcl>
and that's why they use the open64 compiler ****NOT***** gcc.
<traeak>
ie: i code at a higher level, not at machine code level
<lkcl>
i repeat: they DO NOT use gcc.
<traeak>
i don't care :-p
<lkcl>
open64 diverged from gcc by using its front-end **ONLY**.
<traeak>
our stuff works on 3 compilers, probably also open64 and intel
<traeak>
i hate msvc by the way, but it's a good check for compatibility
<lkcl>
so at the "high level", the open64 VLIW compiler simply takes *STRAIGHT* c-code
<lkcl>
and if it's parallelisable across multiple VLIW threads, it will parallelise it for you... automatically.
<lkcl>
that's it.
<traeak>
ahh
<lkcl>
done.
<lkcl>
end of story.
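A rough illustration of the kind of plain C a parallelising compiler can and cannot spread across threads. This is only a sketch: the function names are made up here, and nothing below is taken from icube's or open64's sources. The first loop has independent iterations; the second carries a dependency from one iteration to the next.

```c
#include <stddef.h>

/* Independent iterations: out[i] depends only on x[i] and y[i], and
 * `restrict` promises the compiler that the arrays do not overlap, so
 * the iterations may run in any order, or in parallel. */
void scale_add(size_t n, float a, const float *restrict x,
               const float *restrict y, float *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a * x[i] + y[i];
}

/* Loop-carried dependency: each element needs the previous result,
 * so as written this loop has to run in order. */
void running_sum(size_t n, const float *x, float *out)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
        out[i] = acc;
    }
}
```

Whether a given compiler actually parallelises the first loop depends on its flags and target, which are not covered here.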
<traeak>
so it's sort of lowish level parallelization
<traeak>
whereas i've been relying on task level parallelization
<lkcl>
as a result, they LLLLIITTTERRRALLLY just take the standard mesagl - not the clunky opengles - libraries
<lkcl>
and do "CC=open64 ./configure".
<lkcl>
make
<lkcl>
make install
<lkcl>
and that's.... it.
<lkcl>
they have an automatically-parallelised version of mesagl, capable of performing at the level of the big boys.
<traeak>
good
<traeak>
for application level stuff then it should be good things
<lkcl>
they have a guy with over 20 years of compiler experience, working for SGI, prior to being on-board with icube.
<lkcl>
yes!
<traeak>
i'm just currently pissed at intel for high performance
<traeak>
no competition anymore they just raise their prices on their "old shit"
<lkcl>
now, here's the thing though: one single CPU, even with multiple hardware threads, is *still* not really fast enough to do H264 decode
<traeak>
ie: a 3930k processor costs more now than when it was released
<lkcl>
arse!!
<traeak>
hmm?
<lkcl>
well, they're going to get a bit of a shock in this market
<traeak>
intel or icube?
<lkcl>
why their processor is not fast at doing H264 decode is because the CABAC decode of H264 is inherently non-parallelisable
<lkcl>
so you have to decode multiple frames, add in a multi-frame buffer, allocate one per core (4 cores say)
<traeak>
hmm
<traeak>
what's their power envelope?
<lkcl>
then have the remaining 4 do the parallelisable bits such as the YUV-to-RGB and the DCT bits, that's all fine
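To show why the YUV-to-RGB step counts as one of the parallelisable bits, here is a minimal sketch of a per-pixel conversion. It assumes full-range BT.601 (JFIF-style) coefficients and unsubsampled 4:4:4 planes, which is a simplification; real H264 streams are usually subsampled and may use other ranges or matrices. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round to nearest and clamp into the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(float v)
{
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Convert n pixels from planar Y, U (Cb), V (Cr) to packed RGB.
 * Each output pixel depends only on the input pixel at the same index,
 * so the loop can be split across any number of threads. */
void yuv444_to_rgb(size_t n, const uint8_t *y, const uint8_t *u,
                   const uint8_t *v, uint8_t *rgb)
{
    for (size_t i = 0; i < n; i++) {
        float yy = (float)y[i];
        float cb = (float)u[i] - 128.0f;
        float cr = (float)v[i] - 128.0f;
        rgb[3 * i + 0] = clamp_u8(yy + 1.402f * cr);
        rgb[3 * i + 1] = clamp_u8(yy - 0.344136f * cb - 0.714136f * cr);
        rgb[3 * i + 2] = clamp_u8(yy + 1.772f * cb);
    }
}
```

The CABAC stage is different in kind: each decoded symbol updates state that the next symbol depends on, which is why it gets a whole frame per core instead of being split up like this loop.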
<lkcl>
about the same as any other GPU
<lkcl>
not the embedded ones, the desktop ones.
<traeak>
so the gpu runs on the cores
<lkcl>
for performance/watt that is
<lkcl>
no, the gpu *is* the core.
<lkcl>
the core *is* the gpu.
<traeak>
okay
<lkcl>
there *are* no separate cores.
<traeak>
so the fun is how well it runs under various loads
<lkcl>
there are only instruction set extensions that happen to run on separate hardware blocks, but there are no separate cores.
<traeak>
sure
<lkcl>
and the more "threads" in the GPU-style instructions, the faster those go
<lkcl>
and the more "threads" in the video-decode-style instructions, the faster _those_ go.
<traeak>
hmmm
<lkcl>
but those instructions are issued by the *main CPU* to those pipelined thread engines
<lkcl>
there *is* no separate "GPU".
<traeak>
so its geared mostly towards float32 type? or how is the float64 performance?
<lkcl>
they call it "Unified Processing Unit"
<lkcl>
i think it's mainly geared towards 32-bit
<lkcl>
but there are 64-bit floating point instructions as well - they just take longer
<traeak>
makes sense if they want to compete at the mobile level
<lkcl>
and keep the power down, yes.
<traeak>
i honestly don't know what aarm64 is going to look like
<lkcl>
it turns out that in many cases, only 12-bit accuracy in the last stages of floating-point 3D operations is perfectly good enough!
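One way to get a feel for what "12-bit accuracy" costs is to throw away the low mantissa bits of an ordinary 32-bit float and look at the error. This is a sketch under the assumption that "12-bit" refers to mantissa precision; how the hardware actually implements reduced precision is not stated in the conversation, and `keep_mantissa_bits` is a made-up helper.

```c
#include <stdint.h>
#include <string.h>

/* Keep only the top `bits` of the 23 explicit mantissa bits of an
 * IEEE 754 single-precision float, truncating the rest to zero.
 * `bits` must be in the range 0..23. */
float keep_mantissa_bits(float x, int bits)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);                 /* reinterpret float as bits */
    uint32_t mask = ~((UINT32_C(1) << (23 - bits)) - 1u);
    u &= mask;                                /* clear the low mantissa bits */
    memcpy(&x, &u, sizeof x);
    return x;
}
```

With 12 bits kept, the relative error of a normal value stays below 1/4096, which is well under one step of an 8-bit colour channel.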
<lkcl>
yeah arm are swamped :)
<traeak>
for consumer stuff sure
<traeak>
hopefully you could play with raytracing stuff on this
<traeak>
but the other question is how much bandwidth is available memory-wise
<lkcl>
no they're swamped right now with SoC vendors for aarm64 :)
<lkcl>
on the ic2?
<traeak>
yeah
<lkcl>
yeah we spec'd that as 64-bit wide, up to 1333mhz. approx.
<lkcl>
maybe by 2014 it'll be possible to go for LPDDR3 or maybe LPDDR4 - whatever.
<lkcl>
but, the NREs on licensing those hard macros get insanely expensive.
<lkcl>
i can't quite recall but i think it was either $500,000 or $600,000 for the DDR3 interface hard macros
<lkcl>
USB3 was $300,000. etc. etc.
<lkcl>
it's why we needed $5m for a 40nm chip. about $1m was for the production masks. about $2.5m for hard macro licensing.
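The arithmetic behind those figures, as a small sketch. All amounts are the approximate ones quoted above, in thousands of US dollars; treating the DDR3 and USB3 macros as part of the $2.5m licensing figure is an assumption, and the function names are made up.

```c
/* Approximate figures from the conversation, in thousands of USD. */
enum {
    TOTAL_40NM_K      = 5000,  /* overall 40nm budget */
    MASKS_K           = 1000,  /* production masks */
    HARD_MACROS_K     = 2500,  /* hard macro licensing, all macros */
    DDR3_MACRO_LOW_K  = 500,   /* DDR3 interface, lower recollection */
    DDR3_MACRO_HIGH_K = 600,   /* DDR3 interface, upper recollection */
    USB3_MACRO_K      = 300    /* USB3 */
};

/* Part of the budget not itemised in the conversation. */
int unitemised_k(void)
{
    return TOTAL_40NM_K - MASKS_K - HARD_MACROS_K;
}

/* Licensing left for the other macros, assuming DDR3 and USB3
 * are both counted inside the $2.5m licensing figure. */
int other_macros_k(int ddr3_k)
{
    return HARD_MACROS_K - ddr3_k - USB3_MACRO_K;
}
```

On these numbers about $1.5m of the $5m is not broken down, and $1.6m to $1.7m of the licensing goes on macros other than DDR3 and USB3.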
<traeak>
yeah ouch
<lkcl>
28nm will be another jump on that.
<traeak>
caches seem a bit low, but again this is competing with arm stuff i gather
<lkcl>
but, you see, if it works, you sell $400m worth of chips in about 6 months.
<lkcl>
so it's a high risk that can pay off *big*.
<lkcl>
all you need is someone who doesn't mind blowing $5m (or $9m for 28nm)
<eebrah>
lkcl, traeak: what are we talking about? who's iCube?
<lkcl>
and as it's only money, if i had $9m i'd definitely put it down - cash up front - to make $100m+ and satisfy free software requirements as well
<lkcl>
traeak: if you get it right, then yes. and have the sales channels, yes.
<lkcl>
if it compiles android including 3D graphics and pisses over the competition on price, then what do they care what the CPU is, underneath?
<traeak>
yup, that's what i was wondering
<traeak>
number of transistors, cost per unit, profit, etc
<traeak>
and if we aren't stuck on just android
<traeak>
that will make lots of people very happy
<traeak>
although the "big companies" don't give a rat's ass
<lkcl>
traeak: it's a general-purpose processor with instruction extensions that happen to be parallel and happen to be suitable for 3D and video decode.
<lkcl>
so you can do what you like!
<traeak>
you won't have a problem selling to me if it's capable of being a light desktop etc
<eebrah>
interesting stuff
<lkcl>
and the advantage for the people shipping it, they are no longer critically dependent on a complex proprietary GPU which is hell to integrate into the build.
<lkcl>
and goes wrong.
<traeak>
uh yeah
<traeak>
the GPU shit is exactly what can kill off arm socs
<lkcl>
and requires insane support calls and costs to get it fixed.
<lkcl>
yeah tell me about it.
<traeak>
i would really like to see imageon and nvidia bite the dust hard
<lkcl>
well... yeah.. imgtec that wouldn't surprise me, people are getting fed up with their overcomplex architecture
<traeak>
so us nerds tend to be excited by this, just as long as we can use it :-p
<lkcl>
nvidia ... ha ha
<traeak>
or the fun that is vidcore (hehe)
<lkcl>
nvidia are well ahead of the game, and so have the funds to *stay* ahead.
<traeak>
if you don't care about being 100% tied to android
<lkcl>
because they're owned by toshiba. first ever with a commercially successful ARM Cortex A9 for example
<lkcl>
NEC's Emma was only 600mhz and didn't really count.
manuel254 has joined #arm-netbook
<lkcl>
yeah. i don't care about android, you might have guessed :)
<lkcl>
ls -altr
<lkcl>
fail
<lkcl>
find . -name "*.txt"
<lkcl>
fail
<lkcl>
grep
<lkcl>
fail
<lkcl>
du
<lkcl>
fail
<lkcl>
wtf
<lkcl>
fail
<lkcl>
deeply unimpressive :)
<lkcl>
ok. enough
<lkcl>
night all
<traeak>
have fun and thanks for the update
<traeak>
if this chip could serve nicely as a lightweight desktop with some reliable media capabilities i'd be happy
<traeak>
i'll still be stuck with intel chips for serious development
KoH_ has joined #arm-netbook
KoH__ has quit [Ping timeout: 272 seconds]
anunnaki has quit [Ping timeout: 256 seconds]
anunnaki has joined #arm-netbook
Undertasker has left #arm-netbook [#arm-netbook]
gsilvis has quit [Read error: Connection reset by peer]