narmstrong changed the topic of #linux-amlogic to: Amlogic mainline kernel development discussion - our wiki http://linux-meson.com/ - ml linux-amlogic@lists.infradead.org - Publicly Logged on https://irclog.whitequark.org/linux-amlogic
default__ has joined #linux-amlogic
cthugha has quit [Ping timeout: 256 seconds]
<ndufresne> Ely, no idea for the cpu cache, isn't CMA memory always non-cacheable ?
<Ely> Yeah I guess so
<Ely> I should try that hack to see how much of a performance improvement there is
<ndufresne> with virtual memory, I can decode in software and render 1080p with 48fps average
<ndufresne> hmm, sorry, that was just the decoder, so it averages 21fps
<ndufresne> Ely, wow, ok it's really the color conversion, I've limited the sw decoder to 1 thread, and I get steady 25fps, all in sw
<ndufresne> I measured the HW decoder at about 58fps raw speed on that stream, so clearly the cache has a drastic impact
<ndufresne> Ely, have you played with the 4 bits called "endian" in the canvas ?
<ndufresne> Ely, was a good guess, with this patch https://paste.fedoraproject.org/paste/7aw9nH2ZB1s8VOP8pKhkDQ
<ndufresne> Ely, I now have non-swapped images
<ndufresne> I picked the value 7 for endian from some code in 3.14
<ndufresne> Ely, even better, I got NV12 instead of NV21 !!!
<ndufresne> the doc I found from a comment in the 3.14 kernel, https://paste.fedoraproject.org/paste/85DDK1SfDDK0sSS5HRGa9g
<ndufresne> Ely, a little further, https://paste.fedoraproject.org/paste/6o3SnnFyu-glnthyduqswg this is proper NV21, the first bit of the endian parameter will cause each 16bit to be swapped
vagrantc has quit [Quit: leaving]
<ndufresne> the following bit will swap the two 16bit halves of each 32bit, the third the two 32bit halves of each 64bit, and the fourth the two 64bit halves of each 128bit
<ndufresne> So as we have the first 8 bytes reversed, applying the first 3 swaps (7) will put them back in order
<ndufresne> and skipping the 8byte swap on the luma plane will change UV into VU (NV12 into NV21)
<ndufresne> narmstrong, ^ that's good news, we can get normal NV12, and can even choose between NV12 and NV21, that makes this decoder much much nicer !!!
<ndufresne> it also means we don't need new V4L2 and DRM formats, at least not in the short term
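A minimal sketch of the endian bits as described in the messages above, assuming bit 0 swaps the two bytes of each 16-bit word and each following bit swaps the two halves of a unit twice as large (32, 64, 128 bits). The function name `apply_endian` is hypothetical, and this only models the description given in the chat, not the actual canvas hardware.

```python
def apply_endian(data: bytes, endian: int) -> bytes:
    """Model the 4-bit canvas "endian" field as a series of half swaps.

    Assumption (from the chat, not from hardware docs):
      bit 0 -> swap the two bytes of every 16-bit word
      bit 1 -> swap the two 16-bit halves of every 32-bit word
      bit 2 -> swap the two 32-bit halves of every 64-bit word
      bit 3 -> swap the two 64-bit halves of every 128-bit word
    """
    out = bytearray(data)
    for bit in range(4):
        if not endian & (1 << bit):
            continue
        unit = 1 << bit  # size in bytes of each half: 1, 2, 4, 8
        # Walk over complete groups of two halves and exchange them.
        for base in range(0, len(out) - 2 * unit + 1, 2 * unit):
            first = bytes(out[base:base + unit])
            second = bytes(out[base + unit:base + 2 * unit])
            out[base:base + unit] = second
            out[base + unit:base + 2 * unit] = first
    return bytes(out)


if __name__ == "__main__":
    # Eight bytes stored in reversed order, as described for the raw output.
    reversed_block = bytes([7, 6, 5, 4, 3, 2, 1, 0])
    print(list(apply_endian(reversed_block, 7)))
    # Toggling only bit 0 on interleaved chroma exchanges U and V.
    print(apply_endian(b"UVUVUVUV", 1))
```

Under this model, endian == 7 restores a block of 8 reversed bytes to its natural order, while leaving out bit 0 keeps each byte pair swapped, which is what would turn interleaved UV into VU.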
chewitt_ has joined #linux-amlogic
<ndufresne> it's a bit unbelievable that dsd and Jasper missed that on the endless side ...
<ndufresne> good night !
chewitt has quit [Ping timeout: 248 seconds]
<narmstrong> ndufresne: I suspected the canvas params were involved, great findings !!
<TobiasTh1Viking> xdarklight: https://pastebin.com/KDnudu0i <- dmesg with "&meson8_pmx_ops,", i see no difference. serial console seems to work nominally with this patch.
trem has joined #linux-amlogic
<xdarklight> TobiasTh1Viking: ok, maybe it doesn't disable unused pins
<narmstrong> ndufresne: so it will be the same on the display side then !
Elpaulo has quit [Quit: Elpaulo]
<xdarklight> TobiasTh1Viking: I guess the next step is to get meson_bank right so you can test whether the GPIOs are working (then we can keep playing with the pinmux ops)
<xdarklight> let me know once you have time so we can figure it out
Elpaulo has joined #linux-amlogic
libv_ has joined #linux-amlogic
libv has quit [Ping timeout: 260 seconds]
The_Coolest has joined #linux-amlogic
<The_Coolest> Hey guys, how can I set a pull up on a gpio pin?
<The_Coolest> Will this work?
<The_Coolest> gpio_direction_output(dev->dat_pin, 1);
<The_Coolest> gpio_direction_input(dev->dat_pin);
yann has joined #linux-amlogic
<TobiasTh1Viking> xdarklight: ok, i'll ping at some point when i feel i have an hour. How do we verify GPIO's when i don't actually have access to any GPIO? (just by sd card working? )
<ndufresne> narmstrong, yes, I guess, that would likely allow swizzling various formats, BGRA <-> ARGB, or NV12/NV21, etc.
<xdarklight> TobiasTh1Viking: SD card detection pin is typically wired up, so you can plug/unplug an SD card and see if the driver can detect that. also if you have an LED that can be controlled via GPIO you can make it blink ;)
<TobiasTh1Viking> ah. cool
<The_Coolest> xdarklight>> Do you know whether I can somehow use the i2c bit bang driver without the pins being declared in the device tree?
<The_Coolest> If not, how can I set a pin into a open drain mode?
<xdarklight> The_Coolest: regarding i2c without devicetree: no idea, sorry - open drain can be set in device tree using the "drive-open-drain" property (see https://www.kernel.org/doc/Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt), but I'm not sure how to use this without device-tree either
<The_Coolest> :( ok, thanks.
<The_Coolest> Well, let's just hope there are built in pull ups in the devices I plan to use :P
<The_Coolest> Oh looks like there are :]
trem_ has joined #linux-amlogic
trem has quit [Ping timeout: 264 seconds]
<Ely> ndufresne: holy crap, great findings!
<Ely> ndufresne: I'm currently poking into the HEVC decoder, and on GXL it looks like they dropped canvas usage and instead went for an IOMMU (that seems bypassable but then the decoder writes to phy directly). Hopefully we'll be able to get a proper pixfmt as well :/ .
trem_ has quit [Remote host closed the connection]
trem_ has joined #linux-amlogic
<Ely> Finally, nice looking video on my TV.. Thanks again ndufresne
The_Coolest has quit [Ping timeout: 256 seconds]
The_Coolest has joined #linux-amlogic
distemper has quit [Ping timeout: 256 seconds]
chewitt_ has quit [Quit: Adios!]
chewitt has joined #linux-amlogic
<narmstrong> Yes, pushed the necessary
<narmstrong> And yuq the developer has a potato
<Ely> cool
<Ely> so if I compile his driver as a module I can get 3D ?
<Ely> (+ the appropriate mesa config)
<ndufresne> Ely, so the canvas thingy is some sort of frame allocator ?
<ndufresne> I didn't know there was any iommu in there
* ndufresne not sure how usable lima is, but it's probably worth trying
<Ely> ndufresne: I'm still not 100% sure. Other parts of the code actually write in canvas indexes to hevc registers so I might have spoken too soon.
<Ely> But yeah starting with GXL they have an iommu in there
<Ely> that seems - again, based on code reading - bypassable
<ndufresne> it's quite an important delta, on DB410c, the iommu makes a huge difference
<Ely> or maybe I'm 100% wrong and not reading the code right.
<Ely> There's a crap ton of "if (hevc->mmu_enable)" and "decoder_mmu_box_alloc_idx"
sputnik_ has quit [Remote host closed the connection]
<Ely> but the code for the mmu alloc thing is so big and terrible that I'm not even sure what it does
<ndufresne> one thing is sure, iommu can be bypassed, we should probably not worry for now
<ndufresne> this way we can get both 805 and 905 up to start with
<Ely> yup indeed
<Ely> I don't think I'd have the courage to write an iommu driver anyway
<TobiasTh1Viking> wait, the mali/lima driver is in that good state?
* TobiasTh1Viking wonders how hard it would be to bring up hdmi on the meson6...
<ndufresne> TobiasTh1Viking, it's much further along than it has ever been, and this new gen of lima is public at least
<Ely> TobiasTh1Viking: Wouldn't you need to write code for the HDMI link anyway ? I thought it was unrelated to mali
<xdarklight> TobiasTh1Viking: we don't even have CVBS (AV video out) on Meson8/Meson8b yet and there's no HDMI driver (it's not even clear from which vendor the HDMI IP is, I heard it's not Amlogic's design, ...)
<ndufresne> it's unfortunate that Utgard and the newer Mali generations are completely incompatible ...
<TobiasTh1Viking> Ely: exactly my point, there is NO hdmi support for meson6 at this juncture. so that would be something i would have to do.
<ndufresne> let's hope we find out it's a known chip ;-P
<TobiasTh1Viking> xdarklight: we will get to it :þ
<xdarklight> hopefully at some point :P
<ndufresne> Ely, btw, we should default to NV12 (endian == 7), it's more common
<Ely> ndufresne: Sure. I wonder if we'll still need to keep tiled support for the display stack though (e.g can it use regular NV12/NV21..)
<TobiasTh1Viking> bg
<TobiasTh1Viking> btw, i have no CVBS on my machine. (at least not without soldering)
<Ely> TobiasTh1Viking: respect if you can get hdmi up on meson6/8/8b :P
<ndufresne> Ely, on my todo, was to measure the decoder throughput in the two tiled mode
<TobiasTh1Viking> Ely: definitely not by myself. first i have to do pinctrl, and clock, and other stuff. HDMI is definitely last.
<TobiasTh1Viking> also, i don't really care much(personally). but is always nice to make something complete.
* TobiasTh1Viking doesn't know how to do those things yet, learning.
<ndufresne> Ely, but we probably need to integrate the 4K fw to get real measurement, not sure why they use specific fw for 4K/FHD ....
<Ely> ndufresne: Beware of the clocks having an impact as well. Right now I'm defaulting to 318.75MHz which was the max. for S805, but S905/S905X can go up to 648MHz
<ndufresne> ah, that might explain why I don't reach 60fps in FHD, while it should work in practice
<xdarklight> Ely: ndufresne: only partially related: did you know that GXM (S912) comes with a Chips&Media WAVE420L (which only does H.265: https://en.chipsnmedia.com/videocodecboard/view/335 ) in addition to the Amlogic en/decoders?
<Ely> 318.75 should be enough for 1080p60 tho.. I remember getting 80fps with ffmpeg -f null -
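A rough way to reason about the clock figures above is a per-macroblock cycle budget. This is only a sketch under loud assumptions: 16x16 macroblocks as in H.264, and the decoder clock as the sole limit, which the chat does not confirm (memory bandwidth and stream complexity matter too). The function name `cycles_per_macroblock` is hypothetical.

```python
import math


def cycles_per_macroblock(clock_hz: float, width: int, height: int, fps: float) -> float:
    """Clock cycles available per 16x16 macroblock at a target frame rate.

    Assumption: the frame is padded up to whole macroblocks, so 1080
    lines are counted as 68 macroblock rows (1088 lines).
    """
    mb_cols = math.ceil(width / 16)
    mb_rows = math.ceil(height / 16)
    macroblocks_per_second = mb_cols * mb_rows * fps
    return clock_hz / macroblocks_per_second


if __name__ == "__main__":
    # Clock values quoted in the chat: 318.75MHz (S805 max) and 648MHz (S905/S905X).
    for clock_hz in (318_750_000, 648_000_000):
        budget = cycles_per_macroblock(clock_hz, 1920, 1080, 60)
        print(f"{clock_hz / 1e6:.2f} MHz: {budget:.0f} cycles per macroblock at 1080p60")
```

For 1080p60 this works out to about 651 cycles per macroblock at 318.75MHz and about 1323 at 648MHz. Whether either budget is enough depends on the decoder's real per-macroblock cost, which is not stated here.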
<ndufresne> xdarklight, nop, that's new to me
<Ely> xdarklight: had no idea. It's in addition to the AML HEVC decoder ?
<ndufresne> Ely, hmm, maybe we trigger some contention then, I reached 58 here
<ndufresne> btw, I've moved to gst master now, and use fpsdisplaysink video-sink=fakevideosink sync=0 text-overlay=0 -v
<ndufresne> to measure
<xdarklight> Ely: yes, I believe it's in addition to the Amlogic stuff
<ndufresne> Ely, note that the scene complexity impacts the speed
<ndufresne> so we have to test against the same encoded streams to compare
<ndufresne> xdarklight, oh, that's for encoding, I'm not surprised they needed a little side help, it's quite complex to encode HEVC
<narmstrong> Ely: lima is still in basic shape, only basic opengl es is implemented but kmscube and some glmark2 tests pass
<ndufresne> well, for us, if kmscube works, it's quite interesting
<ndufresne> e.g. add dmabuf import and texture mapping, we could run kmscube + gst in zero-copy, without needing HW planes
<Ely> xdarklight: I have a feeling the driver for that will be hidden in userspace tho.. What's nice with AML stuff is that 100% of the relevant stuff is in their kernel, but I don't know for third party.
<ndufresne> narmstrong, basic GLSL works ?
<xdarklight> Ely: yes, I also believe that some of that is hidden in userspace (they expose clock control IOCTLs, *shrug*) - but the biggest problem is that I couldn't find the wave420l firmware anywhere (which is required to operate that thing)
<ndufresne> Ely, you mean like Allwinner and Rockchip, where userspace has to parse and fill HW specific tables .... (well v4l2 will standardise the table, kernel translates)
<Ely> ndufresne: I mean any kind of driver that has ioctls for clk/power/regread/regwrite :D
<narmstrong> It’s not a good sign
<ndufresne> like an uio driver ;-P
<narmstrong> Maybe dmabuf import is part of mesa ? I don’t have a clue about how these opengl drivers work
<ndufresne> well, it's quite the same as dumb prime
<ndufresne> normally that should all be handled by gallium itself
<ndufresne> but enablement for NV12, or RG88 + R8 texture pair is likely missing at this stage
<narmstrong> It would be worth trying
<narmstrong> The devs think basic Weston rendering should work
<narmstrong> I should make a Kodi build with lima and Ely ‘s hw decoder !
<ndufresne> gst and Kodi will import each plane as RG88 / R8 (depending on the number of 8 bit comp), and use shaders to produce RGBA
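What such a shader computes per pixel can be sketched on the CPU. This is an illustration only, assuming BT.601 limited-range video and the common integer approximation of its matrix. Real shaders sample the R8 (luma) and RG88 (chroma) textures as normalized floats, and the colorimetry actually used depends on the stream. The names `nv12_to_rgba` and `clamp8` are hypothetical, and the sketch assumes an even width with no row padding.

```python
def clamp8(value: int) -> int:
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, value))


def nv12_to_rgba(y_plane: bytes, uv_plane: bytes, width: int, height: int) -> bytes:
    """Convert an NV12 frame (Y plane + interleaved UV plane) to RGBA.

    Assumptions: BT.601 limited range, even width, stride == width.
    """
    rgba = bytearray(width * height * 4)
    for row in range(height):
        for col in range(width):
            y = y_plane[row * width + col]
            # One UV pair is shared by each 2x2 block of luma samples.
            uv_index = (row // 2) * width + (col // 2) * 2
            u = uv_plane[uv_index]
            v = uv_plane[uv_index + 1]  # for NV21, u and v would be exchanged
            c, d, e = y - 16, u - 128, v - 128
            out = (row * width + col) * 4
            rgba[out] = clamp8((298 * c + 409 * e + 128) >> 8)
            rgba[out + 1] = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
            rgba[out + 2] = clamp8((298 * c + 516 * d + 128) >> 8)
            rgba[out + 3] = 255  # opaque alpha
    return bytes(rgba)


if __name__ == "__main__":
    # 2x2 frame: black and white luma columns, neutral chroma.
    frame = nv12_to_rgba(bytes([16, 235, 16, 235]), bytes([128, 128]), 2, 2)
    print(list(frame))
```

Reading the chroma pair in the opposite order is the only difference between handling NV12 and NV21 in this sketch.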
<narmstrong> Yep but kodi could support rendering on an overlay plane with dmabuf from ffmpeg
<narmstrong> So if Lima has enough support, it could work !
<ndufresne> some drivers have a "direct upload" feature, notably IMX.6 can import YUYV as RGBA, and vivante NV12 to RGBA, without shaders, they have special cscp HW
<narmstrong> Ok
<ndufresne> yeah, Kodi also has overlay support
<ndufresne> narmstrong, it's like if we called in ge2d right before sampling in the textures, using a cached image in EGLImage object
<narmstrong> ndufresne: can gst generate some dmabuf frames to test overlay ?
<ndufresne> narmstrong, sure, on 1.12, pass capture-io-mode=dmabuf to the decoder
<ndufresne> on 1.14+, it's already dmabuf
<narmstrong> ndufresne: good to know
<ndufresne> see kmscube code on how to access the dmabuf FD
libv_ is now known as libv
<Ely> ndufresne: Sorry to bother you for generic gstreamer questions but I can't find any info on that one.. I'm getting "Missing element: Timed Text decoder" on many files which seems to kill the pipeline ( Internal data stream error). Do you know how to remedy that ?
<Ely> I installed every plugin as well as gst-libav :/
<ndufresne> Ely, subtitles, which files ?
<ndufresne> and which pipeline ?
<Ely> gst-launch-1.0 souphttpsrc location=<url> ! parsebin ! v4l2video0dec ! videoconvert n-threads=4 ! kmssink driver-name=meson force-modesetting=true connector-id=31 max_lateness=-1 sync=false
<Ely> it's an MKV with subtitles indeed
<Ely> in srt format
<ndufresne> hmm, didn't know parsebin would give up if a parser wasn't found ....
<ndufresne> might be a bug in 1.12
<ndufresne> Ely, just replace parsebin with matroskademux ! <something>parse
<ndufresne> e.g. h264parse, or h265parse atm
<ndufresne> otherwise it should just work, the parser for that is called subparse, and is always included with plugins-base
<ndufresne> it parses about 20 variations of so called SRT
distemper has joined #linux-amlogic
<Ely> Thanks. Now I'm getting matroskademux0: Delayed linking failed :<.
<Ely> Maybe I should just wait for gst 1.14 to hit poky :D
<chewitt> ^ quitter ;)
<Ely> hahaha
<Ely> ndufresne: I pushed a bump in vdec clk. I can decode 1080p at ~110fps on a low-bitrate file, but there are many buffer decode errors (they happen also with the lower clock, and only with ffmpeg -f null -, gst-launch is OK. Most likely because ffmpeg really queues the buffers fast because of null).
<Ely> canvas configuration doesn't seem to have an impact on performance
edcragg has quit [Quit: ZNC - http://znc.in]
edcragg has joined #linux-amlogic
<TobiasTh1Viking> xdarklight: if it's enough time, i have like 15-30 min to make a patch for meson_bank, if you feel that is enough time.
<xdarklight> it just stopped raining outside so I'll go for a quick run. we can have a look at it later
<TobiasTh1Viking> oki. enjoy
<Ely> xdarklight: To followup on the C&M HEVC encoder, I don't think work for that is gonna start anytime soon, those ioctls are scary... Even firmware loading seems to be done via ioctl
<Ely> good find though
<narmstrong> Ely: did you test perf between tiled and linear ?
<Ely> I tried 32x32 without ndufresne's patch, and linear with ndufresne's patch, both were ~110fps
<chewitt> i'm itching to try this out on Kodi
<Ely> hehe
<Ely> as ndufresne was saying though, you're likely to find similar performance as SW decoding since the SW rendering takes a huge chunk out of the perf.
Ntemis has joined #linux-amlogic
<xdarklight> Ely: I thought I'd mention it anyways, because sometimes it's an unexpected "oh, I've seen that elsewhere <link to other vendor which completes the picture>" response from someone else :)
trem has joined #linux-amlogic
trem has quit [Client Quit]
trem_ is now known as trem
sputnik_ has joined #linux-amlogic
Ntemis has quit [Read error: Connection reset by peer]
vagrantc has joined #linux-amlogic
<ndufresne> Ely, same impression as you, looking at the codec_mm code, allocation of canvas seems to use a custom scatter gather memory allocator
<ndufresne> so long term we might not be stuck with CMA
vagrantc has quit [Quit: leaving]
vagrantc has joined #linux-amlogic
trem has quit [Quit: Leaving]
_whitelogger has joined #linux-amlogic