<_whitenotifier-c>
[scopehal-apps] azonenberg pushed 2 commits to master [+0/-0/±2] https://git.io/JfBxL
<_whitenotifier-c>
[scopehal-apps] azonenberg 1c8894a - Set thread pool size for scope drivers
<_whitenotifier-c>
[scopehal-apps] azonenberg a710522 - Don't specify thread pool size for rendering prep
<_whitenotifier-c>
[scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±2] https://git.io/JfBxO
<_whitenotifier-c>
[scopehal-apps] azonenberg ca8fee7 - OscilloscopeWindow: don't run event loop when polling scopes
<azonenberg>
well, that massively reduced cpu usage
<azonenberg>
this was the first time i've actually run a profiler on glscopeclient lol
<lain>
lol
<azonenberg>
and i see a bunch more spots to improve things
<azonenberg>
WFM/s i don't think is up by much, but cpu usage in operation is definitely down a lot
<azonenberg>
and i think general responsiveness is up too but i need to try some bigger waveforms to have better data to see that
<azonenberg>
just found a bunch of time wasted calling a virtual function that could have been just a simple member variable access
<azonenberg>
etc
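The kind of fix being described here can be sketched roughly as follows. This is an illustrative example with made-up names, not the actual glscopeclient code: hoist the virtual call out of the hot loop so each iteration is a plain member access instead of a vtable dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical channel class, for illustration only
struct Channel
{
	virtual ~Channel() = default;
	virtual double GetVoltageRange() const { return m_range; }
	double m_range = 1.0;
};

// Before: one virtual dispatch per sample
double SumSlow(const Channel& ch, const std::vector<double>& codes)
{
	double sum = 0;
	for(size_t i = 0; i < codes.size(); i++)
		sum += codes[i] * ch.GetVoltageRange();	// vtable lookup every iteration
	return sum;
}

// After: read the value once, outside the loop
double SumFast(const Channel& ch, const std::vector<double>& codes)
{
	double range = ch.m_range;			// plain member access, hoisted
	double sum = 0;
	for(size_t i = 0; i < codes.size(); i++)
		sum += codes[i] * range;
	return sum;
}
```

Both versions compute the same result; the second just avoids re-dispatching a call whose answer cannot change mid-loop.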
<azonenberg>
ooooh just got a huge boost from that
Degi has quit [Ping timeout: 258 seconds]
Degi has joined #scopehal
<azonenberg>
I now see some other red flags like OscilloscopeSampleBase::OscilloscopeSampleBase() taking 8.5 SECONDS of cpu time in what really should be a no-op
<monochroma>
:o
<azonenberg>
and that's now fixed yay
<azonenberg>
monochroma: so i'm continuing to optimize glscopeclient core stuff
<azonenberg>
the test procedure is to load a save file that's preconfigured for my waverunner, analog channels only, both legs of two diffpairs for 100baseTX
<azonenberg>
subtract them, CDR, eye pattern, bathtub for the upper eye, and eth protocol decode
<azonenberg>
run that for 1 minute
<azonenberg>
there's a loop in LeCroyOscilloscope::AcquireData() that took 22.3 sec of cpu time, i now have it down to 9.9
<azonenberg>
that's 1 min of triggering as fast as i can on 1M points per waveform
<azonenberg>
average cpu load is actually only about 1.3 cores active of 32
<azonenberg>
the loop in question takes the raw adc samples that came off the scope and does a bunch of repacking and floating point math to convert adc codes to volts and scopehal sample objects
<_whitenotifier-c>
[scopehal] azonenberg pushed 2 commits to master [+0/-0/±2] https://git.io/JfRvx
<_whitenotifier-c>
[scopehal] azonenberg 9fd5f0a - OscilloscopeSample: added empty default constructor for STL to use
<_whitenotifier-c>
[scopehal] azonenberg 6b52304 - LeCroyOscilloscope: massive speedup of loop that converts ADC codes to volts
<_whitenotifier-c>
[scopehal-apps] azonenberg pushed 3 commits to master [+0/-0/±8] https://git.io/JfRvp
<_whitenotifier-c>
[scopehal-apps] azonenberg 9617a3c - WaveformArea: no longer call lots of virtual functions in inner loop of waveform preparation
<_whitenotifier-c>
[scopehal-apps] azonenberg 8e85aa1 - Added --nodata argument to load saved UI config without data on the command line
<_whitenotifier-c>
[scopehal-apps] azonenberg d691d03 - Added --retrigger argument to start the trigger immediately upon loading a save file on the command line
<azonenberg>
let's see, in my 1 minute test EyeDecoder2::Refresh spends 6.285 seconds in floor()
<azonenberg>
wonder if i can do something about that
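One common way to cut floor() time, assuming the values involved are known nonnegative (as sample indices within an eye typically are): a plain integer cast truncates toward zero, which matches floor() for nonnegative inputs without the libm call. Sketch only; whether this applies in EyeDecoder2 depends on the value ranges there.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// For x >= 0, truncation toward zero equals floor(x)
inline int64_t FastFloorNonneg(double x)
{
	return static_cast<int64_t>(x);
}
```

For values that can go negative, the cast rounds toward zero instead of toward negative infinity, so this shortcut is only valid on the nonnegative path.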
<azonenberg>
Just the last few changes i've done have brought CPU time in this 1-minute test down from 98.16 to 80.88 sec
<azonenberg>
i really wish i could multithread the eye processing but since it's locked to a clock that can vary in frequency that would be tricky
<azonenberg>
i might get back to that later
<azonenberg>
i do at least multithread if you have more than one eye to process
<_whitenotifier-c>
[scopehal] azonenberg opened issue #114: Add control for eye pattern saturation - https://git.io/JfRIo
<_whitenotifier-c>
[scopehal] azonenberg labeled issue #114: Add control for eye pattern saturation - https://git.io/JfRIo
<_whitenotifier-c>
[scopehal] azonenberg pushed 3 commits to master [+0/-0/±3] https://git.io/JfRtL
<_whitenotifier-c>
[scopehal-apps] azonenberg opened issue #98: When reconnecting to a scope via a save file, channels are not added to the "add channel" menu - https://git.io/JfROv
<_whitenotifier-c>
[scopehal-apps] azonenberg labeled issue #98: When reconnecting to a scope via a save file, channels are not added to the "add channel" menu - https://git.io/JfROv
<_whitenotifier-c>
[scopehal] azonenberg pushed 3 commits to master [+0/-0/±3] https://git.io/JfROD
<_whitenotifier-c>
[scopehal] azonenberg d29f824 - Minor performance tweaks in FindZeroCrossings, fixed some warnings
<_whitenotifier-c>
[scopehal] azonenberg 2cfa2b2 - Performance tuning to Ethernet100BaseTDecoder
<_whitenotifier-c>
[scopehal] azonenberg f8f2231 - LeCroyOscilloscope: optimized waveform downloading for digital channels
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
<_whitenotifier-c>
[scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/JfR3C
<_whitenotifier-c>
[scopehal-apps] azonenberg de8b4e4 - WaveformArea: optimized index generation loop
<_whitenotifier-c>
[scopehal-cmake] azonenberg pushed 1 commit to master [+0/-0/±2] https://git.io/JfR34
<azonenberg>
So i did a bunch more poking around in vtune. i'm seeing a lot of NUMA accesses that i think can be optimized if i lock stuff to run on one package
<azonenberg>
But that didn't pan out
<azonenberg>
I'm also starting to wonder about retooling the waveform structure to be vector start, vector len, vector voltage
<azonenberg>
rather than vector<start, len, voltage>
<azonenberg>
this would allow much more efficient memory accesses i think
<azonenberg>
but would also be a very nontrivial refactoring
<azonenberg>
sounds like a weekend project perhaps :p
<azonenberg>
The other possible optimization is non-sparse waveforms
<azonenberg>
but that would be a lot more work
<azonenberg>
in either case i think the refactoring to separate arrays makes more sense to do first
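The layout change being discussed is the classic array-of-structs to struct-of-arrays refactoring. A rough sketch with illustrative names (not the actual scopehal classes): a field-only pass over an SoA layout touches contiguous memory instead of striding past the other fields.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Before: array-of-structs. Iterating just the voltages also drags the
// offset/duration fields through the cache.
struct SampleAoS
{
	int64_t m_offset;	// start time, in ticks
	int64_t m_duration;	// length, in ticks
	float m_voltage;
};

// After: struct-of-arrays. Each field is contiguous, so a voltage-only
// pass (rendering prep, protocol decode, SIMD math) reads only what it needs.
struct WaveformSoA
{
	std::vector<int64_t> m_offsets;
	std::vector<int64_t> m_durations;
	std::vector<float> m_voltages;
};

WaveformSoA Repack(const std::vector<SampleAoS>& samples)
{
	WaveformSoA w;
	w.m_offsets.reserve(samples.size());
	w.m_durations.reserve(samples.size());
	w.m_voltages.reserve(samples.size());
	for(const auto& s : samples)
	{
		w.m_offsets.push_back(s.m_offset);
		w.m_durations.push_back(s.m_duration);
		w.m_voltages.push_back(s.m_voltage);
	}
	return w;
}
```

The SoA form also makes the dense-waveform special case natural later: a dense waveform can simply leave the offset/duration vectors implicit.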
<_whitenotifier-c>
[scopehal] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/JfRMw
<_whitenotifier-c>
[scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±3] https://git.io/JfRMr
<_whitenotifier-c>
[scopehal-apps] azonenberg c0749a9 - Status bar now shows total number of trigger events for performance analysis. Removed a bunch of sleeps from ScopeThread.
<azonenberg>
ok soooo let's see how much i can break with this refactoring
<azonenberg>
deleting one class i use through literally the entire codebase (OscilloscopeSample), then completely retooling another (CaptureChannel)
<azonenberg>
Which is now known as Waveform because i've changed the interface so much it's not even recognizable as the same class by this point :P
<azonenberg>
(and that was a more logical name anyway)
<funkylab[m]>
azonenberg: thanks for the twitter-dm heads up
<azonenberg>
funkylab[m]: yeah so basically the end goal is to avoid strided access to all the waveform data
<azonenberg>
because that's not SIMD friendly
<azonenberg>
but this is going to be one heck of a massive diff and there's not really going to be any way to break it up into anything smaller
<funkylab[m]>
honestly, if the stride is small enough, it's not that big of a deal – modern SIMD extensions do have slightly more elegant loading instructions
<azonenberg>
there's some other advantages re GPU compute etc on this
<azonenberg>
it also will eventually allow special-casing sparse vs dense waveforms
<azonenberg>
without much of a code change to the datapath
<funkylab[m]>
yep, serialization of uniform sample data gets more compact and everything
<funkylab[m]>
basically, all PC-connected SDR devices I can think of just deliver plain contiguous IQ samples, within relatively large packets
<azonenberg>
Yeah
<funkylab[m]>
(some do have ... interesting wire formats; signed int 12, anyone?)
<azonenberg>
Lol
<azonenberg>
aka raw adc samples
<azonenberg>
The reason i have this architecture is to allow for things that aren't directly digitized waveforms. Like "I2C sensor readings" or the instantaneous frequency of a waveform
<azonenberg>
etc
<azonenberg>
in which case you are very likely to have irregular sample intervals
<funkylab[m]>
not even that; it's just that if you have a << 16 ENOB ADC, and don't do too much decimation in the DDC on-device, you can fit more through the USB link
<funkylab[m]>
(that's Ettus B2xx btw)
<funkylab[m]>
(without losing any significant digits)
<funkylab[m]>
yep, fully understand why the more flexible format makes sense to something that essentially is meant to abstract all kinds of scopes
<azonenberg>
yeah
<azonenberg>
or sampling scopes, or specans, or really any sampled analog or digital data of some sort lol
<funkylab[m]>
yep
<azonenberg>
btw if you havent already, in just the past couple of days there have been massive (>20% in a few commits) performance boosts from various profiling and tweaking i've done
<azonenberg>
as well as a great reduction in idle CPU
<funkylab[m]>
You do NOT want your DDR5 development-enabling scope to deliver full-rate sampling continuously
<azonenberg>
lol
<azonenberg>
what, you don't have petabit ethernet on your workstation?
<funkylab[m]>
talking of optical comms testers: same!
<funkylab[m]>
(honestly, whenever I talk to people working on high-rate optical links, I get ADC envy. Like: anything I ever did in radio is totally baseband to these ADCs)
<azonenberg>
lol i know the feeling
<funkylab[m]>
and you've definitely worked larger BWs than I did!
<funkylab[m]>
anyways, I've got to get some rest
<azonenberg>
btw not sure if i mentioned on twitter or whatever
<azonenberg>
But my long term plan is to push as much of the DSP as possible to compute shaders
<azonenberg>
and maybe even look into stuff like RDMA
<funkylab[m]>
:+1:
<funkylab[m]>
that'd be extremely nice
<azonenberg>
Because my vision for some of my longer term scopes (think 8 channels 1 GHz bw, one AD9213 and a SODIMM of DDR4 per channel)
<azonenberg>
involves 40G or 100GbE as the backhaul to the host system
<azonenberg>
and i want to be able to do analysis on that data in real time
<azonenberg>
in my recent testing, the fastest performance i've got was 363 triggers in 1 minute pulling four channel 1M point 8-bit waveforms off a LeCroy WaveRunner 8104-MS
<funkylab[m]>
Nice, bit of background on that: GNU Radio as a project is currently (since fosdem) trying to figure out how to come up with an architecture for incorporating accelerators (currently: FPGAs, GPUs, and there's stakeholders with domain-specific ASIC accelerators) into a signal processing workflow
<azonenberg>
if you do the math that comes out to only 193.6 Mbps of actual waveform data hitting the PC, and glscopeclient spent 47 seconds of that minute waiting in Socket::RecvLooped()
<azonenberg>
i.e. the majority of my time was waiting for the DSO to send me data
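As a quick sanity check on the figure above, the arithmetic works out as follows (363 triggers of 4 channels at 1M samples of 8 bits each, over 60 seconds):

```cpp
#include <cassert>
#include <cmath>

// Throughput of waveform data in Mbps, given triggers over a test window
double WaveformMbps(double triggers, double channels, double samplesPerWaveform,
	double bitsPerSample, double seconds)
{
	return triggers * channels * samplesPerWaveform * bitsPerSample / seconds / 1e6;
}
// WaveformMbps(363, 4, 1e6, 8, 60) -> 193.6
```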
<azonenberg>
BTW, that was not me just downloading waveforms and throwing them away
<funkylab[m]>
that's a good sign!
<azonenberg>
i was subtracting two differential inputs for tx and rx of a 100 Mbit ethernet waveform
<azonenberg>
doing full 100baseTX protocol decode on both lanes
<funkylab[m]>
niiice
<azonenberg>
then doing a separate CDR PLL filter
<azonenberg>
rendering separate eye patterns
<azonenberg>
AND generating a BER bathtub curve for the top eye of each lane
<azonenberg>
(it's 3-level signaling, so there are two openings in the eye)
<funkylab[m]>
:D enough getting my mouth watery! I'm off to bed!
<azonenberg>
and cpu usage was near zero because i was spending most of my time waiting for samples lol
<azonenberg>
of course "near zero" on a dual socket xeon 6144 workstation is still a fair bit of compute, but still
<azonenberg>
anyway, my eventual goal is to be able to process >10 Gbps of realtime waveform data