JohnDoe_71Rus has quit [Ping timeout: 240 seconds]
lkcl has quit [Ping timeout: 264 seconds]
IgorPec11 has quit [Ping timeout: 260 seconds]
lkcl has joined #linux-rockchip
geekerlw has quit [Quit: Page closed]
<LongChair>
stdint: i made some further investigations
<LongChair>
and i'm wondering something
<LongChair>
if vpu decodes 4K frames in NV12
<LongChair>
that makes roughly 3840*2160 * 1.5 bytes per frame, right ?
<stdint>
right
<LongChair>
which is roughly 12.4 Megs per frame
<LongChair>
at 60 fps that would make 746 Megs /s
<LongChair>
and the Tinker mem bandwidth is roughly 850 megs /s
<LongChair>
that doesn't include the input data bandwidth
<LongChair>
given that i would say that there is no chance it can decode & render that as VOP would also consume about the same bandwidth
<LongChair>
is that valid reasoning ?
wadim_ has joined #linux-rockchip
<stdint>
LongChair, you forget the MV data
<LongChair>
MV data ?
<stdint>
it is not 12.4Megs per frame
<LongChair>
it's more you mean ?
<stdint>
it is about 16.5 Megs per frame
<LongChair>
that exceeds the mem bandwidth ... 990 M/s
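The per-frame and per-second figures above can be reproduced with a short back-of-the-envelope script. This is only a sketch: the 16.5 MB figure is taken as quoted by stdint (pixel data plus MV data), and the function names are made up for illustration.

```python
# Rough decoder write bandwidth for 4K NV12 output.
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL_NV12 = 1.5  # full-size 8-bit luma plane + half-size chroma plane


def nv12_frame_bytes(width, height):
    """Size in bytes of one NV12 frame, pixel data only."""
    return int(width * height * BYTES_PER_PIXEL_NV12)


def bandwidth_mb_per_s(frame_mb, fps):
    """Memory traffic in MB/s needed to write frame_mb megabytes fps times a second."""
    return frame_mb * fps


pixels_only_mb = nv12_frame_bytes(WIDTH, HEIGHT) / 1e6  # about 12.4 MB
with_mv_mb = 16.5  # figure quoted in the discussion, not derived here

for label, frame_mb in (("pixels only", pixels_only_mb), ("with MV data", with_mv_mb)):
    rate = bandwidth_mb_per_s(frame_mb, 60)
    print(f"{label}: {frame_mb:.1f} MB/frame, {rate:.0f} MB/s at 60 fps")
```

This only counts what the decoder writes out; it ignores the compressed input stream and any reference-frame reads.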
<stdint>
also the vop won't consume so much, it depends on the resolution of the target screen
<LongChair>
so i have two possibilities
<LongChair>
1 - find a way to OC the ram ... i looked yesterday, but could not find where that clock is set
<LongChair>
2 - decode to 1080p frames ... but dunno if vpu output frame size can be specified
<LongChair>
stdint : any clues on any of those two topics
<stdint>
LongChair, about the topic 2, yes it certainly is
paulk-collins has joined #linux-rockchip
<stdint>
you could set the output resolution in drm
<LongChair>
well that won't change the memory bandwidth
_whitelogger has joined #linux-rockchip
<LongChair>
that would reduce the bandwidth by like 4
<LongChair>
because i don't see how drm would influence the frame size that vpu will output ...
<stdint>
LongChair, no, the vop scaling won't read all the stride
<stdint>
so the bandwidth would decrease
<LongChair>
at this point i am not using output, i still have problem with decoder and the memory bandwidth it will use
<LongChair>
so VOP is not in the picture yet
<LongChair>
i'm talking VPU & mpp
<stdint>
"render that as VOP would also consume about the same bandwidth"
<LongChair>
yeah i understand
<LongChair>
but if vpu already consumes all the memory bandwidth that will not change anything
<LongChair>
and i think that is what happens
mac-l1 has joined #linux-rockchip
<stdint>
that is why there is an output queue
<stdint>
the display and decoding are async
<LongChair>
so can i with mpp specify the size of the frames that vpu will generate ?
<stdint>
LongChair, you can't
<stdint>
or you can't decode all the MB at once
<LongChair>
MB ?
<LongChair>
do you agree that VPU will generate 16.5 MB frames in memory and will then be limited by the memory bandwidth ? (even without doing any display, so no VOP)
<LongChair>
the only way to overcome this would then be to allow to specify the size of the frames that are coming out of VPU. you could read a 4K stream and decode it into 1080p frames
_whitelogger has joined #linux-rockchip
<stdint>
LongChair, Macroblock
<LongChair>
i have no idea what that is
<stdint>
LongChair, you can't do that
<LongChair>
ok then 4K playback will be limited in terms of fps
<LongChair>
roughly to 24/30 fps
<LongChair>
it's not possible to get over this because of the memory bandwidth limitation
_whitelogger has joined #linux-rockchip
<mac-l1>
stdint: hi guys. i believe the vpu hw has postprocessing capabilities that can downscale the output frame.
<LongChair>
like can you run `hdparm -T /dev/mmcblk0` on RK3399 ?
<stdint>
I don't have rk3399 running Linux currently
<LongChair>
ok
paulk-collins has quit [Remote host closed the connection]
<stdint>
you may ask paulk-collins later
<stdint>
LongChair, also all the rk3399 are using a 32 bit userspace now
<amstan>
stdint: archlinuxarm uses 64 bit userspace on 3399
<amstan>
works pretty well
<stdint>
anyway, what I got at rk3288 tinker is Timing cached reads: 1794 MB in 2.00 seconds = 897.21 MB/sec
<amstan>
haven't tried much graphics, though i hear it works too
<stdint>
amstan, we just don't use it internally
<LongChair>
stdint: yeah i'm getting the same roughly here
<LongChair>
but 897M / 16,5 = 54 fps
<LongChair>
only decoding
wzyy2 has quit [Ping timeout: 240 seconds]
<stdint>
LongChair, 1190.81 MB/sec in 32 bit system
<LongChair>
pretty nice
<LongChair>
stdint: by the way, could you ask someone if there is a way to try OCing the RAM. I found some stuff on older boards in the forums, but the tinker dts files don't seem to have the same stuff
<stdint>
LongChair, the clock parameters are in a dts file
<stdint>
in the u-boot
<LongChair>
i looked both at the miniarm dts and the rk3288.dtsi but didn't find anything
<phh>
well there is no board with 4k60 decoding there ;)
<phh>
wzyy2: but according to the scheme you gave, CPU, VPU and VOP have the same bandwidth. it could indeed not be shared, so it would be ok. still, for 4k60, it needs ~ 950MB/s on the bus, and CPU on the same bus has 880
<phh>
LongChair: can you check with mbw?
<phh>
possibly actual bandwidth is higher, just hdparm is not reliable enough
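For a quick cross-check without installing mbw, a plain memory-to-memory copy can be timed from Python. This is a rough stand-in for what a memory bandwidth tool measures, not mbw itself: the function name, buffer size and repeat count are arbitrary choices, and the result is still CPU-side bandwidth, so it says nothing directly about what the VPU or VOP can reach.

```python
import time


def copy_bandwidth_mb_per_s(size_bytes=64 * 1024 * 1024, repeats=5):
    """Best observed throughput, in MB/s, for copying one large buffer into another."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # single bulk copy, no per-byte Python loop
        elapsed = time.perf_counter() - start
        best = min(best, max(elapsed, 1e-9))  # guard against a zero reading
    return size_bytes / best / 1e6


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_mb_per_s():.0f} MB/s")
```

The buffer should be much larger than the CPU caches, otherwise the number reflects cache speed rather than RAM.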
<wzyy2>
It's not rk3288...
<wzyy2>
It's arm's a73 reference design.
<phh>
ok
<phh>
(when do we get firefly-rk3400 with cortex-a73, mali g71? ;) )
LongWork has joined #linux-rockchip
<wzyy2>
The CPU has more stages to access memory than other IPs, which limits its speed.
<wzyy2>
It seems rk3399 is not selling well, so i guess rockchip will not be interested in high performance chips for now.. ; )
<phh>
pfff, I've ordered two rk3399 (firefly and chromebook plus) and still haven't received anything
<wzyy2>
Market capacity is too small, high-performance chips are mostly used for mobile phones, not tv boxes, tablets, industry.
<LongWork>
@wzyy2 : i'm interested in investigating all those performance issues
<LongWork>
i suppose it's very late where you are
<wzyy2>
not late, 9:30
<LongWork>
ok, so you're saying that RAM has a BW of 8 GB/s
<LongWork>
but when decoding a 4K video, the only thing that goes from/to cpu is pushing the packets so that should not be significant
<wzyy2>
yep
<LongWork>
when i do just pure VPU decoding with my code
<LongWork>
VPU will generate 4K frames which ayaka said were roughly 16.5 M / frame
<LongWork>
i'm trying to reach 60 fps on a few videos
<LongWork>
so that would be about 990M/s coming out of VPU->RAM
<LongWork>
I have clocked the VPU for HEVC to 600Mhz
<LongWork>
and i would expect this to allow to reach that framerate on HEVC/main10/at about 60 mbits
<LongWork>
and i have a hard time reaching that
<LongWork>
for some reason in my setup with last release kernel, there is still the same dw_hdmi issue not being able to set the 4K frequencies
<wzyy2>
just revert it - -...
<LongWork>
yeah ... but the issue is still there :)
<wzyy2>
rockchip-linux kernel doesn't support 4k 60hz for rk3288..
<LongWork>
the weird part is that i don't set any mode, and i have a 1080p AVR in between so i wouldn't expect it to pick a 4K mode as AVR should not return an EDID with 4K modes
<LongWork>
most of the other devices i have will remain in 1080p mode when i go thru the avr
<LongWork>
they will show 4K only if i plug the device directly to tv
<LongWork>
anyways, not sure what happens there, but if i have the hdmi cable plugged in, the hdparm measurement will be 650M/s
<LongWork>
if i unplug the cable, it goes up to 850 M/s
<wzyy2>
Have you tested it with xserver closed?
<LongWork>
yes
<LongWork>
we don't have even xserver on our LE build
<wzyy2>
ok
<LongWork>
then when i do pure decoding, i get a better decoding perf with cable unplugged than with cable plugged
<LongWork>
so more likely something is eating the bandwidth, or it's lower than what we think
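To see what the measured figures would imply if they really were the limit for the decoder, the frame rate ceiling is just bandwidth divided by frame size. A minimal sketch, assuming the 16.5 MB per frame quoted earlier and treating the CPU-side hdparm readings as if they applied to the VPU, which may well not hold:

```python
FRAME_MB = 16.5  # per-frame output size quoted earlier in the discussion


def fps_ceiling(bandwidth_mb_per_s, frame_mb=FRAME_MB):
    """Upper bound on frames per second if all bandwidth went to writing frames."""
    return bandwidth_mb_per_s / frame_mb


# hdparm readings mentioned in the discussion, in MB/s
readings = {
    "hdmi cable plugged": 650,
    "hdmi cable unplugged": 850,
    "stdint's tinker reading": 897.21,
}

for label, mb_per_s in readings.items():
    print(f"{label}: {mb_per_s} MB/s -> at most {fps_ceiling(mb_per_s):.1f} fps")
```

Under these assumptions none of the three readings leaves room for 60 fps, and the plugged-in case lands near 39 fps.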
<wzyy2>
...I think i found the reason.
<wzyy2>
check this patch.
<LongWork>
i will tonite
<LongWork>
but if we did have 8 GB/s 4K@60 wouldn't be a problem
<LongWork>
at least for decoding
<wzyy2>
It seems nickey set a high qos level for vop.
<LongWork>
it would be limited by VPU performance
<wzyy2>
8 GB/s is not that much.
<LongWork>
is that GigaBit or GigaBytes ?
<wzyy2>
GigaBytes
<LongWork>
4K@60 is anyways 990M/s .. so that is what VPU will use to write the frames into memory, then VOP would need to read those with zero copy, so would be roughly 2GB/s of bandwidth
<LongWork>
which is 1/4th of what we should have available so should be easy ;)
<LongWork>
what would this patch do btw ?
<LongWork>
i mean it seems to set some priority to VOP
<LongWork>
but what would that change
<LongWork>
in my case i don't even do any display yet
<phh>
new
<phh>
fail
<wzyy2>
It's more complex than you think... I'm not familiar with those bandwidth things, it's the SOC architect's job, when they design the SOC, they will consider it.
<phh>
wzyy2: yes but perhaps we're missing some information that might show that something is not done properly somewhere
<wzyy2>
VOP might use a fixed bandwidth even if you don't do any display.
cnxsoft has quit [Quit: cnxsoft]
<ayaka>
the future SoCs used for tv boxes have better performance in video
<phh>
rk3328? +10?
<ayaka>
not like wzyy2 said, they just have a weaker cpu and fewer interfaces
<wzyy2>
But less performance in gpu and cpu.
<ayaka>
rk3228
<ayaka>
video doesn't need powerful GPU or VPU
<ayaka>
cup
<ayaka>
cpu
<phh>
yes but it's better to have both ;)
<phh>
(ok, not really for a tvbox)
<ayaka>
if you don't play game, dynamic drawing capability is not necessary
<LongWork>
wzyy2: thanks i'll start with removing that
<LongWork>
wzyy2: like phh said, we might be missing something. From what you say with such bandwidth that should be no problem.
<phh>
perhaps it's not ram :s
<LongWork>
and ayaka said that the VPU was more powerful than i though :)
<LongWork>
thought
<LongWork>
so if ram is alright and vpu is powerful that should be no problem ... but that's not exactly what i'm seeing
<phh>
well, I think VPU has a mode where it just outputs a test target, perhaps using it would help there?
<ayaka>
I don't know what you mean
<phh>
I mean a mode where it takes no input at all, but does output to video buffer.
<phh>
well i don't see anything like this in the trm
<phh>
LongChair: perhaps you can try to make a file from a black screen?
<LongWork>
i dunno ... but there are probably some already existing test files
<LongWork>
i wish there was a way to measure VPU usage
<LongWork>
or some profiling tools to see what's happening
<ayaka>
it is not possible
<ayaka>
there are two kinds of things that would affect the VPU
<ayaka>
Macroblock prediction and memory
<ayaka>
increase the frequency of the VPU would speed up the logic work
<LongWork>
i'm already at 600M there
<LongWork>
that's the last enum value ... not sure if i can go above that :)
<ayaka>
but now the memory becomes the other problem
<ayaka>
if the MBs, or I would say the sequence of frames, are not too complex
<LongWork>
yeah what makes me wonder is that even if we measure a lower bandwidth from CPU, it's not meaningful from what wzyy2 said. CPU has a lower bandwidth to ram than VPU
<ayaka>
increasing the frequency won't help a lot
<LongWork>
if VPU has 8GB/s roughly that shouldn't be a problem
<LongWork>
unless something else on the BUS uses the memory bandwidth
<ayaka>
who said that
<LongWork>
who said what ?
<LongWork>
the 8GB/s ?
<LongWork>
wzyy2 did
<LongWork>
he said 533Mhz * 2 * 8. (2 channels, 8 bytes 64 bits data width)
<LongWork>
he also said that the CPU bandwidth to ram was lower .. around 990 M/s
<LongWork>
but that shouldn't matter for decoding
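The 8 GB/s figure and the "1/4th" estimate can be written out as follows. This is only a sketch of the arithmetic as stated in the discussion: the constants are the ones attributed to wzyy2, the VOP read rate assumes a zero-copy 4K scanout at the same rate as the decoder output, and the result is a theoretical peak that ignores latency and arbitration.

```python
CLOCK_MHZ = 533          # memory clock quoted in the discussion
CHANNELS = 2             # two channels
BYTES_PER_TRANSFER = 8   # 64-bit data width

peak_mb_per_s = CLOCK_MHZ * CHANNELS * BYTES_PER_TRANSFER  # theoretical peak, MB/s

vpu_write_mb_per_s = 990  # 16.5 MB/frame at 60 fps
vop_read_mb_per_s = 990   # assumption: VOP reads every decoded frame once

used = vpu_write_mb_per_s + vop_read_mb_per_s
share = used / peak_mb_per_s

print(f"theoretical peak: {peak_mb_per_s} MB/s")
print(f"decode + display: {used} MB/s, about {share:.0%} of peak")
```

On paper that leaves a lot of headroom, which is why the observed decoding limits point at something other than raw peak bandwidth.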
<ayaka>
that number is not correct
lkcl has joined #linux-rockchip
<ayaka>
it is the bandwidth to ram controller
<ayaka>
but there are some latency coefficient
<LongWork>
i'm not a hardware expert
<LongWork>
regarding VPU, is rockchip making the code that goes into it to decode or is it like built in the hardware ?
ganbold has quit [Ping timeout: 246 seconds]
ganbold has joined #linux-rockchip
<ayaka>
which code
<LongWork>
ayaka: i'm wondering how the guys in charge of that deal with such things
<LongWork>
the vpu run some code to decode the streams right ?
<LongWork>
like the codes that handles HEVC / H264 stream decoding
<ayaka>
oh, it is completely run on the cpu
<ayaka>
I also write a part of them
<ayaka>
you would see those commit in mpp
<ayaka>
the rest is hardware
<ayaka>
there is no firmware, all the thing is open source
<phh>
LongChair: yes rockchip is amongst the last people to do real hw decoding, not dsp decoding
<ayaka>
no, not just us
<LongWork>
so how you guys check the vpu performance then ?
<ayaka>
nothing beyond what we have done, but that is in android
<LongWork>
hmmm
<LongWork>
i dunno really where to look at ....
<LongWork>
phh: you did mention another tool earlier ?
<phh>
LongChair: well, that's mbw, but that's to better measure the actual cpu bandwidth
wzyy2_ has joined #linux-rockchip
wzyy2 has quit [Ping timeout: 258 seconds]
wzyy2_ has quit [Read error: Connection reset by peer]
bertje__ has quit [Quit: bertje__]
wzyy2_ has joined #linux-rockchip
wzyy2_ has quit [Read error: Connection reset by peer]
lkcl has quit [Ping timeout: 260 seconds]
wadim_ has quit [Remote host closed the connection]
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
wzyy2 has joined #linux-rockchip
fireglow has quit [Quit: Gnothi seauton; Veritas vos liberabit]
fireglow has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
Mine has joined #linux-rockchip
Mine has quit [Client Quit]
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]