<_whitenotifier-3>
[scopehal] fsedano opened pull request #436: Add ATT and OFFSET for RS RTM3000 - https://git.io/J3mpo
fsedano43 has joined #scopehal
<fsedano43>
Thoughts about adding timestamp to trace/debug output?
<azonenberg>
fsedano43: hmmm
<azonenberg>
i've generally tried to make those calls fairly lightweight, that would add a fair bit of overhead. Maybe as an option
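One way the optional-timestamp idea could stay cheap when disabled, sketched in C++ (a hypothetical sketch; `g_logTimestamps` and `LogPrefix` are illustrative names, not scopehal's actual logging API):

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical flag, e.g. set from a --log-timestamps command line option.
// When it is off, the prefix is an empty string and the log call stays cheap.
bool g_logTimestamps = false;

std::string LogPrefix()
{
    if(!g_logTimestamps)
        return "";

    // steady_clock gives a monotonic time since an arbitrary epoch,
    // which is fine for relative timestamps in trace/debug output
    using namespace std::chrono;
    auto now = duration_cast<microseconds>(
        steady_clock::now().time_since_epoch()).count();

    char tmp[32];
    snprintf(tmp, sizeof(tmp), "[%lld.%06lld] ",
        (long long)(now / 1000000), (long long)(now % 1000000));
    return tmp;
}
```

The branch on a single bool keeps the disabled path essentially free, which matches the "maybe as an option" concern about overhead.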
<azonenberg>
I'll have a look at your PR in a bit
Degi_ has joined #scopehal
<fsedano43>
Tnx. I'm working on another PR to fix some dangerous things on the RS driver that also cause hangs on the scope
Degi has quit [Ping timeout: 240 seconds]
Degi_ is now known as Degi
<azonenberg>
Great
<_whitenotifier-3>
[scopehal] fsedano opened pull request #437: Avoid asking for data if not needed - https://git.io/J3Yea
<GenTooMan>
azonenberg, first, not really; it doesn't have to be slow, and I have made code similar to that work, however I can't say it's EASY to do. On the other hand, your suggestion can be made to work. I'm not sure "my way" is the "best way" to be honest. :D
<_whitenotifier-3>
[scopehal] fsedano edited pull request #437: Avoid asking for data if not needed on RS driver - https://git.io/J3Yea
<GenTooMan>
azonenberg, I think the best course of action (i.e. instead of "fixing" code) is to first toss a //FIXME - <COMMENT> into each function I find causing issues with the SDS1 scope I have
<GenTooMan>
@mubes I found 4 functions in the SiglentSCPIOscilloscope module that time out on the SDS1104X-E. They are (in module order): GetChannelDisplayName, PollTrigger, GetChannelVoltageRange, and PullTrigger. Would it be a good idea to commit the code I have that ends up with a "Halt Conditions" window?
<_whitenotifier-3>
[scopehal] azonenberg pushed 2 commits to master [+0/-0/±4] https://git.io/J3Y1H
<_whitenotifier-3>
[scopehal] fsedano f96efc0 - Add ATT and OFFSET for RS
<_whitenotifier-3>
[scopehal] azonenberg 540e779 - Merge pull request #436 from fsedano/add_att_offset Add ATT and OFFSET for RS RTM3000
<_whitenotifier-3>
[scopehal] azonenberg closed pull request #436: Add ATT and OFFSET for RS RTM3000 - https://git.io/J3mpo
<_whitenotifier-3>
[scopehal] azonenberg pushed 2 commits to master [+0/-0/±2] https://git.io/J3YMw
<_whitenotifier-3>
[scopehal] fsedano 6cdb09d - Avoid asking for data if not needed
<_whitenotifier-3>
[scopehal] azonenberg abbf63c - Merge pull request #437 from fsedano/fix_data Avoid asking for data if not needed on RS driver
<_whitenotifier-3>
[scopehal] azonenberg closed pull request #437: Avoid asking for data if not needed on RS driver - https://git.io/J3Yea
<_whitenotifier-3>
[scopehal] azonenberg pushed 2 commits to master [+0/-0/±29] https://git.io/J3YAw
<_whitenotifier-3>
[scopehal] azonenberg 2612042 - Added GetAvailableCouplings() API to Oscilloscope class. Fixes #67.
<_whitenotifier-3>
[scopehal] azonenberg 256252f - Merge branch 'master' of github.com:azonenberg/scopehal
<_whitenotifier-3>
[scopehal] azonenberg closed issue #67: Add API for querying supported input coupling modes - https://git.io/fjlMF
<_whitenotifier-3>
[scopehal] azonenberg pushed 1 commit to master [+0/-0/±2] https://git.io/J3YpY
<_whitenotifier-3>
[scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±4] https://git.io/J3Ypd
<_whitenotifier-3>
[scopehal-apps] azonenberg d2e017f - Added 50 ohm AC coupling option to context menu for supported scopes. Coupling menu now hides items which don't apply to the active instrument.
<d1b2>
<mubes> @GenTooMan a couple of those functions are pretty fundamental :-) you can see what the semantics are so I would suggest you guard using the typeid of the scope and figure out new incantations to get the behaviour you want. Personally I'd try it in a telnet session or similar first as that saves you compile cycles etc. If you want to put up a PR for comment feel free to do it against my repository at https://github.com/mubes/scopehal if you like.
<d1b2>
There's no problem with doing it against azonenberg's repository but I'm trying to figure out a scalable approach...come the happy day that the project has a hundred contributors, PRing development work against the main repository isn't going to work too well :-)
Tost has joined #scopehal
<azonenberg>
Also hmmmmm
<azonenberg>
I'm beginning to wonder about the possibility of rewriting the renderer using OpenCL instead of compute shaders at some point
<monochroma>
azonenberg: oh?
<azonenberg>
So the immediate use case is my work laptop
<azonenberg>
Where i have everything rendered on the discrete intel card
<azonenberg>
but there's an nvidia card too
<azonenberg>
i could plausibly use that as a headless compute accelerator w/o using it for graphics
<azonenberg>
But if we had a software opencl backend it could allow running in VMs
<azonenberg>
might work on mac too
<azonenberg>
Thinking a bit more, it would only be a kloc or so of stuff that had to be rewritten
<monochroma>
oooo that would be nice
<d1b2>
<mubes> It opens the possibility of a web based front end in future too?
<azonenberg>
The opencl stuff needs a bit of fixup before i can consider that though
<azonenberg>
So basically we would end up with a fully OpenCL based data processing and waveform rendering backend
<azonenberg>
which produced bitmaps that a very thin OpenGL compositor would combine with the software rendered overlays and display
<d1b2>
<mubes> yum yum
<d1b2>
<mubes> Architecturally separating processing from display makes a huge amount of sense...it also lets developers play to their strengths (I'm allergic to UIs, for example).
<azonenberg>
Well we have that already for the most part
<azonenberg>
in terms of glscopeclient vs libscopehal
<azonenberg>
The issue is more, right now the rendering of waveforms is done in compute shaders which have proved to be a big headache
<azonenberg>
Not that opencl isnt
<sorear>
when I last looked into the matter a few years ago WebCL was DOA and the browser people wanted to do opengl 4 compute shaders exclusively instead
<azonenberg>
Web stuff is not even remotely on my radar for glscopeclient
<azonenberg>
i'm thinking about desktop/laptop use cases here
<azonenberg>
if it turns out that opencl, even if software based, can be made to work on mac or VM platforms
<azonenberg>
that's a strong argument in favor of moving waveform rendering over to that
<d1b2>
<mubes> Is this a pre-release item?
<azonenberg>
converting to opencl? this is a pre v1.0 item if we decide to do it at all
<azonenberg>
i would not say it's a priority for v0.1 although if i find the time to play with it, i might try
<azonenberg>
Not even going to make a ticket just yet. but i did want to make some other improvements to the rendering logic in that area and it might be nicer to do in CL than in shaders
<azonenberg>
Before i think about any of that i need to fix some other CL issues
<azonenberg>
in particular, really large opencl FFTs seem to hit limitations of some sort and fail to run
<azonenberg>
and i need to fall back to software compute in those cases rather than aborting or giving garbage results etc
<azonenberg>
also if we're going to use it for rendering rather than just compute acceleration, the noopencl switch has to be removed as it will now be a mandatory feature
<azonenberg>
and we can also clean up some other processing code by not needing to maintain software and accelerator based paths
<azonenberg>
It might actually simplify things a fair bit
<azonenberg>
i guess the question is... we know there are platforms with at least software based opencl support that don't have compute shaders
<azonenberg>
Is there anything that has compute shaders that can't do opencl?
<d1b2>
<mubes> I'm seeing some issues in rendering with large (for me, 10M/channel) datasets that I need to investigate. Xorg goes to 100% load for 2-3 seconds and the whole UI locks out. Haven't had the chance to dig at it yet, and might not for a while.
<azonenberg>
Most likely you're seeing scaling issues in the current renderer. Are you zoomed out pretty far and does it get better when you zoom in?
<azonenberg>
Right now we run one GPU thread per X coordinate value. So if you have too many points per pixel, you have a fairly large loop on a single GPU thread
<azonenberg>
Which leads to long delays
<azonenberg>
That's one of the things i wanted to fix with a fairly significant rewrite of the shaders
<azonenberg>
which is why i'm thinking it might be a good excuse to migrate to opencl at the same time if that's gonna happen
<d1b2>
<mubes> Yes, that seems to explain the symptoms
<d1b2>
<mubes> It's generally when I'm 'finding' the waveform. It would be nice if a capture snapped to the full waveform when it's landed, but perhaps that won't work well for multi-device captures.
<azonenberg>
Yeah i have plans to figure out something about autoscaling
<azonenberg>
in the early days it wasnt practical because Reasons (tm)
<azonenberg>
but those limitations are no longer present
<d1b2>
<mubes> You're in that horrible bit of a project where you're far enough along to know what you want to do, but not so far along that it feels like you've got lots of it done 🙂
<azonenberg>
lol
<azonenberg>
Also it sounds like there's lots of issues on opencl + AMD
<azonenberg>
in particular the combination of using opengl and opencl at the same time is buggy
<azonenberg>
So we might want to stick with compute shaders for now. Fixing those scaling issues will definitely be good though
<azonenberg>
It's just nontrivial to figure out how
<xzcvczx>
opencl is the devil :P
<azonenberg>
well CUDA is always an option :p
<azonenberg>
CUDA + graphics APIs will work great
<azonenberg>
and CUDA provides a FFT library
<azonenberg>
but that only works on nvidia
<azonenberg>
If we had enough development resources to support multiple implementations of various stuff
<azonenberg>
we could plausibly provide cuda, opencl, and compute shader backends and pick whatever is most convenient at build/run time
<azonenberg>
but right now, it's just me doing all of that
<xzcvczx>
one azonenberg for sale
<xzcvczx>
actually no nvm
<azonenberg>
and i have to pick something that works a) for me and b) for as many other people as reasonably practical
<xzcvczx>
it improved at the end there
* xzcvczx
gives azonenberg another cookie for getting it working on his machine
<azonenberg>
lol
<azonenberg>
i still have bugs with CL stuff on my system
<azonenberg>
It's not ideal
<azonenberg>
And it's hard to optimize because nvidia's profiler doesnt support opencl, only cuda
<xzcvczx>
yeah i setup all the opencl stuff and it just crashed
<xzcvczx>
because 2+2 apparently != 5
<xzcvczx>
azonenberg: out of curiosity have you tried going *bsd for os?
<azonenberg>
No
<xzcvczx>
fair enough
<azonenberg>
I use too many binary blob tools like sonnet and vivado
<azonenberg>
i'm not going to try to get those running under bsd
<azonenberg>
its enough of a pain on linux
<xzcvczx>
ah true, you use nvidia, which works like crap on bsd. nvm then :)
<azonenberg>
As far as i can tell, there is no option for gpu compute that runs on linux/windows/osx, nvidia/amd/intel, and plays well with opengl
<azonenberg>
opencl is the closest there is and it's a giant pile of garbage
<xzcvczx>
i thought there was another attempt in the early stages
<xzcvczx>
oh i might have just been thinking of opencl 3.0
<_whitenotifier-3>
[scopehal-apps] azonenberg labeled issue #327: Waveform compute shaders get really slow if too many points per X coordinate - https://git.io/J3OzC
<_whitenotifier-3>
[scopehal-apps] azonenberg opened issue #327: Waveform compute shaders get really slow if too many points per X coordinate - https://git.io/J3OzC
<d1b2>
<bob_twinkles> if you're willing to require relatively modern hardware and drivers, there are OpenGL/VK compute shaders
<d1b2>
<bob_twinkles> though i think you've already put quite a bit of work in to the opencl stuff, so it may not be worth switching
<azonenberg>
bob_twinkles: Right now we use compute shaders for rendering
<azonenberg>
That's a core requirement
<azonenberg>
the problem is that OpenGL 4 is not supported on OSX or in any hypervisor
<xzcvczx>
relatively modern == intel 4000 series cpus :)
<azonenberg>
Hence considering the possibility of moving elsewhere
<d1b2>
<bob_twinkles> classic apple drivers =/
<azonenberg>
Apple considers opengl deprecated
<azonenberg>
they just stopped updating their implementation because they want everyone to use Metal
<xzcvczx>
azonenberg: can llvmpipe be used on macos?
<azonenberg>
Don't know
<azonenberg>
Vulkan might become an option in the future but right now there is no good interop between vulkan and GTK. At least on any stable linux distro
<azonenberg>
gtk4 might be adding something along those lines, i know they did a lot of improvements to GL performance etc
<d1b2>
<bob_twinkles> I say relatively modern because IIRC the stuff that's core in early versions of GL is pretty useless
<azonenberg>
bob_twinkles: Yeah. The current minimum requirement for glscopeclient is gl 4.2 plus GL_ARB_compute_shader and a few other extensions
<azonenberg>
most of which are in 4.3
<azonenberg>
We also optionally use OpenCL for accelerating a bunch of waveform processing
<azonenberg>
but from what i can tell CL+GL interop is a nightmare on AMD
<azonenberg>
and even on my nvidia platform it's a pain
<xzcvczx>
you use nvidia rather than nv i assume?
<azonenberg>
I use the blob drivers, yes. I consider nouveau in the same class as internet explorer
<azonenberg>
a tool you use to install something better
<xzcvczx>
lol
<xzcvczx>
how windows xp of you :)
<azonenberg>
it's worse than useless because it gets in the way of the blob driver
<azonenberg>
which is the only way to get any actual work done
<d1b2>
<bob_twinkles> (reading more backlog) if you move your big loops to compute shaders/CL, that will solve your hangs
<d1b2>
<bob_twinkles> on NV, graphics work (including frag shaders) cannot be interrupted to ctxsw but compute can
<azonenberg>
No
<azonenberg>
The loops are in compute shaders now
<d1b2>
<bob_twinkles> huh
<d1b2>
<bob_twinkles> that probably shouldn't be hanging the whole system then
<azonenberg>
on nvidia? it doesnt
<azonenberg>
it just slows down to a crawl
<azonenberg>
it does however hang the whole thing on intel integrated gfx afaik
<azonenberg>
the other thing is, it's making inefficient use of the gpu by doing so much work in a few threads
<azonenberg>
I need to retool it to use a 2D thread array of some sort
<xzcvczx>
is this if you scroll out massively?
<azonenberg>
Yes
<xzcvczx>
ah yeah that was fun
<azonenberg>
basically you loop over potentially a 50M point waveform in one gpu thread
<xzcvczx>
woops zoom not scroll out
<d1b2>
<bob_twinkles> ah, ok. Sorry for the noise
<d1b2>
<bob_twinkles> if you don't have a burning desire to work on this particular issue, I might have some bandwidth to take a look at it this weekend
<azonenberg>
actually wait a minute it looks like i did actually start cleaning things up as far as 2D multithreading
<azonenberg>
so that improved it a bit
<azonenberg>
now it uses 16 threads per X coordinate
<azonenberg>
But still you end up bottlenecking on those 16
<azonenberg>
bob_twinkles: sure if you wanna look at it, go for it
<azonenberg>
fundamentally the algorithm is as follows
<azonenberg>
Preprocess (on the CPU) the waveform so we know which ranges of X coordinates map to which pixel locations
<azonenberg>
Each group of 16 threads starts fetching from its offset and loops over the points in its bin
<azonenberg>
then it finds the min/max Y values for the sample segment within this pixel, interpolating if needed
<azonenberg>
then at the very end, it fills things
<azonenberg>
The main interesting bit is in waveform-compute-core.glsl
<azonenberg>
this block is compiled multiple times into different shaders for analog, digital, and histogram rendering
<azonenberg>
it's basically a simple rasterizer
<azonenberg>
that does intensity grading
<azonenberg>
the output is a fp32 monochrome texture which is then colorized in a fragment shader at the final point of rendering
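A rough CPU-side sketch of the per-column min/max pass described above (illustrative names and an illustrative intensity increment; the real logic lives in waveform-compute-core.glsl as a GLSL compute shader and differs in detail):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// For one X pixel column: scan the samples that fall into this column's bin,
// track the min/max Y, then bump the intensity of every pixel in that span.
// "column" stands in for one column of the fp32 monochrome texture that the
// fragment shader later colorizes.
void RasterizeColumn(
    const std::vector<float>& samples,  // Y values binned to this column
    std::vector<float>& column,         // one texture column, size = height
    float yscale, float yoff, float alpha)
{
    if(samples.empty())
        return;

    // Min/max instead of per-sample rasterization, so thousands of points
    // in one pixel column cost one vertical span, not thousands of lines
    float ymin = samples[0], ymax = samples[0];
    for(float y : samples)
    {
        ymin = std::min(ymin, y);
        ymax = std::max(ymax, y);
    }

    // Convert volts to pixel rows and clamp to the texture
    int lo = std::max(0, (int)std::floor(ymin * yscale + yoff));
    int hi = std::min((int)column.size() - 1, (int)std::ceil(ymax * yscale + yoff));

    // Intensity grading: overlapping waveforms accumulate brightness
    for(int i = lo; i <= hi; i++)
        column[i] += alpha;
}
```

In the actual shader this loop body is split across the 16 threads assigned to each X coordinate; the sketch collapses that to one thread for clarity.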
<azonenberg>
This is the third generation of renderer
<azonenberg>
First round used GL_LINES and looked gross
<azonenberg>
second round tesselated to GL triangles and had its own set of problems
<azonenberg>
Then i switched to compute
<azonenberg>
I think sticking with compute is the way to go but the current algorithm is probably inefficient
<azonenberg>
but something like bresenham rasterization is a bad idea too because of the very common scenario of having hundreds or thousands of waveform points in a small area
<azonenberg>
you dont want to keep rasterizing and stacking them
<azonenberg>
so as you can see here i just store min/max Y values
<azonenberg>
then at the end of the inner loop i bump the alpha values
<d1b2>
<bob_twinkles> makes sense
<d1b2>
<bob_twinkles> is there somewhere I can grab some trace samples?
<azonenberg>
Best for this would probably just be using the "demo" driver
<azonenberg>
which has a bunch of test signals - 8b10b, sinewave, sum of sweeping sines, etc
<azonenberg>
you can control sample rate and depth to a range of values which should provide a reasonable testbed
<d1b2>
<bob_twinkles> ah nice, thanks!
<azonenberg>
oh i almost forgot to mention the other variant
<azonenberg>
Waveforms have X coordinates too, since they can be sparse/irregularly sampled
<azonenberg>
These are 64 bit ints but not all cards support GL_ARB_gpu_shader_int64
<azonenberg>
so i have one variant of the shader that does bignum int32 math and one that uses native int64s
<azonenberg>
Longer term there will probably be another one that's optimized for uniform spacing as that's the common case
<azonenberg>
and it will eliminate all the memory fetches of X coordinates
<azonenberg>
but i imagine the rasterizer will be basically the same, you'd just have one version use memory fetches and the other just index*spacing + offset or something
<azonenberg>
Anyway if you wanna have a look at it, definitely let me know :)
<d1b2>
<bob_twinkles> yeah, taking a look through the shaders
<azonenberg>
one other FYI is that the shaders are in src/glscopeclient/shaders/ but the makefile copies them to the build directory
<azonenberg>
so despite the fact that they're not compiled per se, you do need to rerun make to see updates
<azonenberg>
this will be fixed later on once we get proper data file path resolution taken care of
<azonenberg>
(another v0.1 pending item)
<d1b2>
<bob_twinkles> I think maybe an adaptation of the techniques used for light batching in deferred rendering pipelines could work here
<_whitenotifier-3>
[scopehal-apps] azonenberg opened issue #328: Add support for dense packed waveforms to rendering shaders - https://git.io/J3OMm
<_whitenotifier-3>
[scopehal-apps] azonenberg labeled issue #328: Add support for dense packed waveforms to rendering shaders - https://git.io/J3OMm
<azonenberg>
also see that one. that would probably be fairly easy and give a 3x reduction in GPU memory bandwidth usage for non-sparse waveforms
<azonenberg>
i.e. fetching a float per sample rather than a float plus an int64
<d1b2>
<bob_twinkles> makes sense. if I'm reworking the rendering to start with I probably won't be able to do that optimization as well, but I'll try to leave the option open
<azonenberg>
With the current architecture that optimization is actually a very clean fit i think
<azonenberg>
there's one function FetchX() that would have to be replaced
<azonenberg>
and then probably catching some stuff host side to not bother pushing the x coordinates to the gpu at all if they're not going to be used
<azonenberg>
and adding a flag to the header struct to specify which mode is active
<azonenberg>
actually no flag
<azonenberg>
just an ifdef and two builds of the shader
<azonenberg>
so you'd need to have two shader objects in WaveformArea since it's possible that we could switch a given view from sparse to non-sparse as conditions change
<azonenberg>
e.g. consider displaying the output of a math function
<azonenberg>
if you swap the input from dense to sparse the output will swap too
<_whitenotifier-3>
[scopehal-apps] azonenberg commented on issue #325: GPU hang on iris Plus driver - https://git.io/J3OSh
<_whitenotifier-3>
[scopehal] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/J3O5V
<_whitenotifier-3>
[scopehal] azonenberg closed issue #400: LeCroy: Support for SDA 8Zi memory depth/sample rate tables - https://git.io/JYKDT
<_whitenotifier-3>
[scopehal] fsedano opened issue #438: Support digital channels on RS RTM3000 scope - https://git.io/J33WH
fsedano43 has quit [Quit: Connection closed]
<d1b2>
<fsedano> re: hang on Iris Plus - It might be something different than we were thinking. It just happened now to me with a stopped capture, just by playing with the menus (no zoom etc) - It went into a state where the popup menu was being shown and hidden in a cycle without me doing anything, then complete lockup
<d1b2>
<fsedano> At this stage scopehal is pretty much unusable for me - It crashes my laptop every few minutes at most
<d1b2>
<mubes> Well, that's the first time I've overflowed an int64_t in regular code 🙂
<d1b2>
<mubes> @azonenberg Even 'reasonable' sample rates don't fit in a uint64_t when measured in fs, unless I'm doing something silly. Biggest int into one is 1.84x10^19, and 1GS/sec in Samples/fs is 1x10^24. Umm...how do we get out of this one?
<d1b2>
<mubes> Ah...GetSampleRate is in seconds, not fS. Phew.
<azonenberg>
mubes: lol
<azonenberg>
yeah Hz is the base unit for frequency
<azonenberg>
however when working with sample *periods* we use fs
<azonenberg>
fs provides a reasonable range there
<azonenberg>
actually since we multiply by the timebase unit this puts a ceiling on the upper length of a scopehal capture
<azonenberg>
Approximately 5 hours 7 minutes, or +/- half that since time units are signed
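A quick back-of-envelope check of that ceiling (a standalone sketch; `FS_PER_SECOND` here is just the 1e15 unit conversion, hedged as not necessarily the exact constant scopehal defines):

```cpp
// With time stored as a 64-bit count of femtoseconds, the full unsigned
// span is 2^64 - 1 fs. Dividing by 1e15 fs/s gives the capture-length
// ceiling in seconds; a double is exact enough for this sanity check.
double CaptureCeilingSeconds()
{
    const double FS_PER_SECOND = 1e15;
    return 18446744073709551615.0 / FS_PER_SECOND;   // ~18447 s
}
```

18447 seconds is about 5 hours 7 minutes, matching the figure above, with +/- half of that usable when the 64-bit time type is signed.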
<d1b2>
<mubes> Just trying to sort out the mess with trigger points, it's easy once we've got a sample set 'cos it's in the wavedesc, but we need some default for the startup case before any waveform has arrived. I'll be back at it again tomorrow, I think.
<azonenberg>
I was using ps before which had a much longer min duration, on the order of a month, but just lacked the resolution needed for high speed serial stuff
<azonenberg>
I figure you probably arent going to have a single acquisition longer than two hours anyway
<azonenberg>
we can break anything longer than that up into multiple waveforms since we use 128-bit timestamps for the beginning of a waveform (64-bit time_t plus 64-bit fs since the last whole second)
<d1b2>
<mubes> Yeah, I'd got an extra SECONDS_PER_FS in there and it was frying my brain 😦
<d1b2>
<mubes> I may have been slightly too negative about the sampling speed on this thing. With one channel and up to 500Kpoints I get 2.37 frames/sec, which actually feels vaguely interactive.
<d1b2>
<mubes> 200Kpoints on 4 channels is about 0.8 frames/sec though.
<d1b2>
<mubes> Right, off to file Zzs. Will try and get this pushed out tomorrow, with a following wind.
<azonenberg>
I mean, i'm spoiled with lecroy and pico stuff getting double digit WFM/s even on deep memory. But low end gear isnt really optimized for this use case so it's never performed well
<azonenberg>
Some entry level keysight/agilent stuff is the only other good performer we've seen so far IIRC. miek has got good results on his 3000 series i think
<d1b2>
<mubes> I ought to try this with usb tbh...often the different transports behave differently, although generally ethernet is faster than usb in my experience.