kloczek has quit [Remote host closed the connection]
Wizzup has joined #linux-exynos
kloczek has joined #linux-exynos
nighty- has joined #linux-exynos
TheSeven has quit [Ping timeout: 255 seconds]
TheSeven has joined #linux-exynos
mszyprow has joined #linux-exynos
aballier has joined #linux-exynos
snawrocki has quit [Remote host closed the connection]
snawrocki has joined #linux-exynos
nighty- has quit [Quit: Disappears in a puff of smoke]
<memeka>
mszyprow: ping
<mszyprow>
memeka: pong
<memeka>
mszyprow: so any idea why playing video is so much faster on kernel 3.10 than on 4.x?
<memeka>
same rootfs
<memeka>
is it the MFC driver, or the V4L2 subsystem?
<mszyprow>
memeka: have you checked cpu usage?
<mszyprow>
memeka: maybe for some unknown reason the kernel is doing something completely insane like cache flushing on every frame
<mszyprow>
memeka: there was such an issue some time ago
<memeka>
mszyprow: on 4.x kernel?
<mszyprow>
when the V4L2 userptr mode was used
<mszyprow>
yea
<memeka>
ok, so... decoding is FAST
<memeka>
i can't argue with that... it's not the decoding but the copying of buffers
<mszyprow>
the other idea I have is to check if cpu freq is configured to the same values
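A minimal sketch of the cpufreq comparison suggested here, assuming the standard cpufreq sysfs attributes are exposed on both kernels. The `read_cpufreq` helper and its `root` parameter are illustrative names, not something from the discussion; the idea is to dump the settings once per kernel and diff the two outputs.

```python
from pathlib import Path

# Standard cpufreq sysfs attribute names; a given kernel may expose only some.
CPUFREQ_FILES = (
    "scaling_governor",
    "scaling_min_freq",
    "scaling_max_freq",
    "scaling_cur_freq",
)


def read_cpufreq(root="/sys/devices/system/cpu"):
    """Return {cpu_name: {attribute: value}} for every core that has a cpufreq dir.

    `root` is a parameter (hypothetical helper design) so the function can be
    pointed at a saved copy of the sysfs tree as well as the live one.
    """
    settings = {}
    for freq_dir in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        values = {}
        for name in CPUFREQ_FILES:
            attr = freq_dir / name
            if attr.is_file():
                values[name] = attr.read_text().strip()
        settings[freq_dir.parent.name] = values
    return settings
```

Calling `read_cpufreq()` with no arguments on each kernel and comparing the resulting dictionaries would show whether the governor or the frequency limits differ.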
<memeka>
I traced gstreamer, and it was taking 20ms to decode 1 frame, and 24ms to copy
<memeka>
so CPU usage is huge
<memeka>
in 4.x
<memeka>
single thread CPU 100% with 1080p frame
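A rough worked example of why those numbers lead to dropped frames, assuming decode and copy run back to back in the same thread. The 30 fps target is an assumption for illustration; the frame rate of the video is not stated in the discussion.

```python
DECODE_MS = 20.0  # per-frame decode time reported from the gstreamer trace
COPY_MS = 24.0    # per-frame copy time reported from the gstreamer trace


def max_fps(decode_ms, copy_ms):
    """Frames per second one thread can sustain if decode and copy are serial."""
    return 1000.0 / (decode_ms + copy_ms)


def frame_budget_ms(target_fps):
    """Time available per frame at the given playback rate."""
    return 1000.0 / target_fps
```

With these figures each frame costs 44 ms, while a 30 fps stream (assumed) allows about 33 ms per frame, so a single thread tops out below 23 fps.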
<mszyprow>
okay, so you lose time mainly on copying the frames?
<memeka>
and dropping frames, can't cope
<memeka>
yup
<memeka>
then i found some optimized memcpy in arm asm
<mszyprow>
maybe the 3.10 kernel used some tricks to enable cache on the buffers
<memeka>
LD_PRELOAD that, and it's ok
<mszyprow>
this heavily improves cpu copying
<memeka>
hm
<memeka>
probably that's what memcpy does
<memeka>
that optimized one i mean
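A small sketch of the LD_PRELOAD approach mentioned above, assuming the optimized memcpy has been built as a shared library. The `preload_env` helper is an illustrative name and the library path is hypothetical; the discussion does not say which implementation was used.

```python
import os


def preload_env(library, base_env=None):
    """Return a copy of the environment with `library` placed first in LD_PRELOAD.

    `library` is the path to the shared object providing the replacement
    memcpy (hypothetical here). Existing LD_PRELOAD entries are kept after it.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the player, so only that process picks up the replacement memcpy.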
<memeka>
is it possible to enable that cache on 4.x?
<memeka>
mszyprow: and i found another interesting thing
<memeka>
profiling gstreamer, kodi and ffmpeg+mpv
<memeka>
so both gstreamer & kodi were losing time on memcpy
<mszyprow>
memeka: frankly I would first like to fix the zero copy path (dma buf issues) instead of hacking for enabling cpu cache on dma buffers
<memeka>
like 80% CPU time = memcpy
<memeka>
mszyprow: yeah assuming arm will ever publish a wayland driver with dmabuf :((
<mszyprow>
copying data to uncached buffer IS time consuming
<memeka>
so most of the other time, like 7% or something, was spent by some tiling function in the mali driver
<memeka>
something like cobjp_neon_linear_to_block_8b_8x8
<memeka>
now, profiling ffmpeg .... the results were the opposite
<memeka>
70% CPU time in cobjp_neon_linear_to_block_8b_8x8
<memeka>
10% CPU time in cobjp_neon_linear_to_block_16b_8x8
<memeka>
so that's 80% CPU time in the mali driver
<memeka>
then just under 10% in memcpy
<mszyprow>
it really depends on how the memory is mapped to userspace
<memeka>
so this means that the mali driver was importing the buffer, as opposed to gstreamer exporting the buffer?
<memeka>
something like that?
<mszyprow>
and different drivers / kernel versions might use different flags
<memeka>
well here is the same kernel, same drivers, different userspace programs...
<memeka>
i mean gst vs ffmpeg
<mszyprow>
it looks then that ffmpeg is doing de-tiling internally, while gst does it by mali
<mszyprow>
if I got it right
<memeka>
the overall result being that using that optimized memcpy helped gstreamer, because it was copying the buffer with the "memcpy" function, and ffmpeg was not optimized because it was somehow delegating the memcpy to the driver
<memeka>
i think the other way?
<mszyprow>
gsc was copying tiled buffer to mali texture
<mszyprow>
while ffmpeg was de-tiling it (during the copying?)
<memeka>
looks like
<mszyprow>
that's why memcpy replacement had no effect
<memeka>
yup
<memeka>
ok makes sense
<memeka>
so basically you think that the reason 3.10 is better with videos is because it caches the buffers, and basically that's the same thing done by that optimized memcpy?
<memeka>
CPU usage is a bit lower in 3.10 i think, but not by much
<mszyprow>
nope optimized memcpy != using cache
<mszyprow>
that's something completely different
<mszyprow>
probably both can even be used together for an even higher boost
<mszyprow>
I assume that you have compared the hardkernel's v3.10 kernel?
<mszyprow>
memeka: but you might easily check if it helps to reduce cpu usage
<mszyprow>
memeka: arm maintainer rejected this approach many times
<memeka>
thanks
<mszyprow>
btw, it might be a good idea to register on tizen.org
<mszyprow>
there is quite a lot of our exynos related stuff there
<memeka>
i actually have an account, it seems
<memeka>
from a few years back :))
<mszyprow>
and it looks like anonymous git access is working, git clone git://git.tizen.org/platform/kernel/linux-exynos
<memeka>
but initially i got "Not Found"
<memeka>
now it works
<mszyprow>
maybe it needs login for the first access
<mszyprow>
to set cookies, etc
<memeka>
it works now.... it took a while to load all the repos
<memeka>
testing now if it reduces cpu :D
<mszyprow>
which gst plugin provides "v4l2video0dec" element?
<mszyprow>
I got ERROR GST_PIPELINE grammar.y:816:priv_gst_parse_yyparse: no element "v4l2video0dec"
<memeka>
YAY
<memeka>
it's gst-plugins-good
<memeka>
60% CPU usage on 1080p video, and what looks like good framerate (well, reported via SSH)
<memeka>
yay yay
<memeka>
thanks mszyprow! I'll check tomorrow to see if stuff is actually displayed :) but it looks like it's working!
<mszyprow>
:)
<memeka>
time to sleep now, it's almost tomorrow
<mszyprow>
have a good night then! now you can sleep peacefully ;)
<memeka>
yeah :D
genii has joined #linux-exynos
nighty- has joined #linux-exynos
_whitelogger_ has joined #linux-exynos
_whitelogger_ has quit [Ping timeout: 246 seconds]
_whitelogger has joined #linux-exynos
_whitelogger has joined #linux-exynos
_whitelogger has quit [Ping timeout: 258 seconds]
_whitelogger_ has joined #linux-exynos
mszyprow has quit [Ping timeout: 248 seconds]
willmore has quit [Read error: Connection reset by peer]
willmore has joined #linux-exynos
Putti has joined #linux-exynos
<Putti>
Hi, I thought I'd come here to ask whether anyone knows how to get a serial console showing up on the Samsung Galaxy I9305, which has the Exynos 4412 SoC? I'm using the linux-next tree. And I actually tried searching this channel's IRC logs but found just me asking this very same question one year ago :D
<Putti>
If I understood right, wiewo got the UART working with this git tree: https://code.fossencdi.org/kernel_i9300_mainline.git but I have the I9305 instead of the I9300 and it doesn't seem to work on it.
Putti has quit [Remote host closed the connection]
Putti has joined #linux-exynos
putti_ has joined #linux-exynos
Putti has quit [Ping timeout: 248 seconds]
putti_ has quit [Remote host closed the connection]
putti_ has joined #linux-exynos
putti_ is now known as Putti
Putti has quit [Ping timeout: 248 seconds]
Putti has joined #linux-exynos
Vasco_O is now known as Vasco
prahal_odroid has quit [Remote host closed the connection]
Putti has quit [Ping timeout: 248 seconds]
Putti has joined #linux-exynos
Putti has quit [Ping timeout: 248 seconds]
Putti has joined #linux-exynos
Putti has quit [Remote host closed the connection]