2016-10-17

Getting faster transfer speeds when reading measurement results from Rigol DS1054Z

After reading countless reviews, issue discussions, comparisons, price tags and all, I finally bought a Rigol DS1054Z oscilloscope. It was some time ago, somewhere in June/July this year. Until then, I used a Hantek DSO-2090 "PC oscilloscope", or rather a kind of data-logger shield connected over USB, which means you can't actually do a thing unless it's connected all the time to a computer running the proper software. No display, no knobs. Just a box with three BNCs for CH1/CH2/EXT. It has its issues, but I learned a ton. Since it came with its own proprietary software (not that bad actually) and some API with almost no docs, it was obvious that the device can be fully controlled programmatically from the PC. I researched the API on my own and started writing my own software for it. But that's another story, maybe I will have time to write it all down some day.

Anyways, I got really used to it over time and I wanted to check if I could do the same with a "real oscilloscope". That was one of the reasons for selecting the DS1054Z. Its price is relatively low and it has quite well documented programmatic access. When I started to play with the DS1054Z, the first thing which surprised me was .. I couldn't reliably download the measurement results from it.

Note: I use Windows 10 and NI-VISA/IVI drivers. I know it's bloat for my use case. I'm not using LabVIEW, I'm not building an automated lab (maybe later). I have this one oscilloscope, a PSU (Axiomet, the one I wrote about earlier), and that's all. I took these drivers because .. they seemed to be the default ones suggested by the manufacturer. I would really like to strip that 1GB software package down to the actual few megabytes of device drivers needed. I even lost a few hours trying to rip them out, but failed, and I don't care about the hard disk space enough to spend more time trying. I also didn't want to start researching raw device protocols like I had to with the Hantek.

IMPORTANT: I actually have not verified the data read from the device during these tests; in some or all cases I could have got total garbage, or a mixture of all channels, instead of the channel I wanted to read from. The "fun fact" from part two indicates such a possibility. I was focused on communications and simply didn't have time to verify the data yet. When I read in 1-chan or 2-chan modes and then looked at the raw binary data, it looked fine. But that doesn't tell anything. I need to generate some images and compare them to on-screen data to be sure. Verifying the structure of received data and determining which modes are usable and in what way is the next thing I'm going to investigate. I will remove this warning text afterwards.

Intro: basic setup

My current DS1054Z firmware, 00.04.03.01.05 "SP1", built 2015-05-26 08:38:06, has a really interesting way of determining how much data you can fetch from it in one go.

The device has a memory of ~24M samples. All of it can be used for a single 1-channel measurement. I started with that and tried to download all 24M samples. According to the "MSO1000Z/DS1000Z Series Programming Guide 2014", it is not possible right away.

First of all, as noted in the Guide for the :WAVeform:DATA? command, there are various "areas" the data can be downloaded from. Namely, at least two: screen and memory. By default, the device will return samples from the "screen" (:WAVeform:MODE NORMal). You can always get them, even while the device is working, but the data returned is .. the data you currently see on the screen. It's fast, it's easy to fetch, but I wanted to get the whole 24M of measured data, not just a ~1000pts (1K) post-processed fragment of it.

To access the other buffer, called "waveform data from internal memory", you have to switch to :WAVeform:MODE RAW first. I'll skip the MAX option since it doesn't change much. Also, the device cannot be measuring at the same time; it has to be in the "STOP" state. Basically, if the device was left in the 'auto' or 'waiting for trigger' state, you can assume that the data of the previous measurement has already been (partially or fully) overwritten by new samples, and you can safely get only the screen data.

Another interesting thing is that :WAVeform:DATA? usually won't return the whole data. It actually returns only the part of the data within the range set up by a pair of :WAVeform:STARt and :WAVeform:STOP commands. For example, here's an attempt to read two consecutive chunks of 1000 (1K) samples:

:WAVeform:STARt 1
:WAVeform:STOP 1000
:WAVeform:DATA?
> #9000001000........
:WAV:STAR 1001
:WAV:STOP 2000
:WAV:DATA?
> #9000001000........
...

As you see, the 'addresses' start at 1 (not zero), and the START-STOP range is inclusive on both ends. As you may guess, the last address is either 12000000 (12M) or 24000000 (24M), depending on your device feature state.
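Since the addresses are 1-based and the range is inclusive on both ends, the chunking math is easy to get wrong. Here's a minimal sketch (plain Python, no instrument I/O; names are mine) of how the START/STOP pairs for block-by-block reads can be computed:

```python
def chunk_ranges(total_samples, block_size):
    """Yield inclusive 1-based (start, stop) pairs for
    :WAVeform:STARt / :WAVeform:STOP.

    A block of N samples spans start .. start + N - 1,
    because both ends of the range are inclusive.
    """
    start = 1
    while start <= total_samples:
        stop = min(start + block_size - 1, total_samples)
        yield start, stop
        start = stop + 1
```

For example, `chunk_ranges(2000, 1000)` produces exactly the two ranges from the listing above, and 24M samples in 1K blocks works out to 24000 pairs.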

There is a reason I started with this example. You cannot just set START=1 STOP=24000000 and have fun:

:WAV:STAR 1
:WAV:STOP 12000000
:WAV:DATA?
> #9000000000\n

Here, I tried to read too much in one go. The device didn't raise an error. It simply responded "OK, here's zero bytes" - and indeed included no measurement data in the response.

There is a limit to the amount of data that can be read in one block. The device communicates over USB/etc. and it seems to have a limited communication buffer which simply cannot be exceeded. If you want more than that, you need to read block-by-block, like in the first example.
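The responses above use the usual definite-length block format: '#', one digit giving the width of the length field, then the payload length in ASCII, then the payload itself. A small parser, sketched in Python (function name is mine, not from any API), makes the "requested too much" case easy to detect:

```python
def parse_block(response: bytes):
    """Parse a '#<d><len><payload>' definite-length block response.

    Returns the payload bytes, or None for the empty '#9000000000'
    style response the scope sends when the requested range is too big.
    """
    if not response.startswith(b'#'):
        raise ValueError("not a definite-length block")
    ndigits = int(response[1:2])            # e.g. b'9' -> 9
    length = int(response[2:2 + ndigits])   # e.g. b'000001000' -> 1000
    if length == 0:
        return None                         # request was too large
    payload = response[2 + ndigits:2 + ndigits + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload
```

A trailing newline after the payload (as in the #9000000000 example) is simply ignored by the slicing.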

Also, it's good to note the :WAVeform:FORMat option. As I saw in various articles, many folks out there use the ASCii option so they can easily process the data in scripts, spreadsheets, etc. But this means that the device has to print out text instead of raw data, and the text eats the buffer much faster. I wanted to download as much data as possible per query, so I use :WAVeform:FORMat BYTE. The measurements are 8-bit anyway and this format saves a lot of transfer time.
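A side note on BYTE mode: the raw 8-bit samples have to be scaled into volts afterwards. If I read the Programming Guide right, the scaling uses the :WAVeform:YINCrement?, :WAVeform:YORigin? and :WAVeform:YREFerence? preamble values; here's a sketch (double-check the formula against your guide version, and the example values below are made up for illustration):

```python
def byte_to_volts(raw, y_increment, y_origin, y_reference):
    """Convert one raw BYTE-format sample to volts.

    y_increment, y_origin and y_reference come from the
    :WAVeform:YINCrement?, :WAVeform:YORigin? and :WAVeform:YREFerence?
    queries (formula as I understand the DS1000Z Programming Guide).
    """
    return (raw - y_origin - y_reference) * y_increment
```

With, say, y_increment=0.04, y_origin=0 and y_reference=127, the mid-scale sample 127 maps to 0 V and 152 maps to 1 V.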

I also want to use blocks as large as possible. For example, I could read using blocks of 1000 (1K) samples as in the first example, but then downloading the whole 24M memory would need .. 24000 "start-stop-data?" queries, which would mean a ridiculously long (really) waiting time. Something like .. 2 hours .. probably. That's not what I would like to wait after each measurement :) Fortunately I don't need to, since that 1K window was just an example.

Sadly, the "MSO1000Z/DS1000Z Series Programming Guide 2014" (the one I used at first) doesn't give any hints on the buffer limits. I later found the "MSO1000Z/DS1000Z Series Programming Guide 2015", and there you can find a table (page 2-219):

BYTE  -> 250000
WORD  -> 125000
ASCii ->  15625

Even from this table you can see that picking ASCii mode means lowering the transfer rate to 15.6 kilo-samples per query. Compared to RAW/BYTE mode and its 250K, this means a 16x slower transfer. For those who don't "feel" the multipliers: instead of 1 minute, you would wait 16 minutes.

As I said, I found the 2015 guide much later, and by then I had already discovered this myself, as well as the fact that these numbers are not accurate. These numbers are safe, meaning you can (probably) use them at any time.

For a single channel, I am able to successfully use ~1180K-sample blocks. That's over 4x the number mentioned in the docs. I have not hacked the firmware or hardware. It's just that the docs didn't want to delve into the really detailed details of how to get that speed.

Comparing things to the value suggested in docs:

    Channels: CH1
        Mode: RAW
      Format: BYTE

      Memory:   24.0M   12.0M    7.5M*   6.0M*   2.4M*   1.2M
Block=250K
    Queries*:     95      47      29      23      10       5
        Time:   56.6s   27.2s   17.1s   13.2s    5.7s    2.7s
Block=1180K
    Queries*:     21      11       7       6       3       2**
        Time:   16.5s    8.5s    5.3s    4.3s    1.8s   0.9s**

*) these acquisition memory depths are not available "by the knobs" on the device front panel, however you can get them using the "AUTO" memory depth
**) sadly, the best transfer I got was 1180K and here the source memory is a tiny bit larger than that, so two queries were needed

I'm not including results for memory sizes smaller than 1.2M simply because they would all fit into one query. I'm not including results for smaller block sizes: since the programming guide gave such a suggested value, I see no point in limiting the block size to less than that. Also, I'm not including comparisons to the ASCII format, since .. I'm not a masochist. I want the data, I don't want to sit and grow old waiting for it.

My test code isn't super-optimized so you may be able to squeeze out better transfer times. You could probably remove some assertions and some noncritical commands sent to the device during a single "query", but the point is that the larger the block size you can get, the lower your waiting time will be. It may not look like much for the 1.2M depth, but for the higher ones it really makes a difference.

However, to get 1180K blocks in a reliable way, you need to prepare both yourself and the device for that first.
No hardware hacks needed.

What's the problem?

My current DS1054Z firmware, 00.04.03.01.05 "SP1", built 2015-05-26 08:38:06, has a really interesting way of determining how much data you can fetch from it in one go. It consists of two really non-obvious parts.

First of all, it's not a nice round value of 1180K; actually, it's a value between 1179584..1179647 samples per block. Not a random value: you can see that the range is exactly 64 bytes long, that the lower bound is divisible by 64, and that the upper bound + 1 is divisible by 64 as well. In hex that's 0x0011FFC0 and 0x0011FFFF. I suppose it comes from the device's internal memory architecture, grouped in segments of 64 bytes. I have really no idea though, just guessing.

Now, that latter value, the upper bound, looks very tempting! (The prett-y shin-y round-y 0x120000 tempts too, but it is not available at all.) However, using 1179647 requires some very precise conditions to be met, so even I, after researching it and finding out the rule, decided to drop it and stick to the lower bound, 1179584/0x0011FFC0, which is almost always available. (That 'almost' is the second big part of the mystery; it's covered in the next section of this article.)

When I first investigated this, I didn't know the limits, nor the window size. From a few manual attempts I knew what a successful read looks like, and also that if the requested block size is too large, the device will respond with the empty #9000000000 response instead of some helpful kind of range error. I wrote a small application to scan various setups and bisect the block size. It would start at some position, with blocksize=1 and blocksize=24M as bounds, and try intermediate values until blocksize=N is OK and blocksize=N+1 is not.
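The bisection itself is ordinary; here's a sketch with the device access abstracted into a `read_ok(blocksize)` predicate (in the real tool that would issue the STARt/STOP/DATA? sequence and check for the empty response; the code below is my illustration, not the actual scanner):

```python
def find_max_blocksize(read_ok, lo=1, hi=24_000_000):
    """Find the largest N with read_ok(N) true and read_ok(N+1) false.

    read_ok must be monotonic: true up to some limit, false above it.
    """
    assert read_ok(lo), "even the smallest block fails"
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if read_ok(mid):
            best = mid          # mid works; try larger
            lo = mid + 1
        else:
            hi = mid - 1        # mid fails; try smaller
    return best
```

Simulating a device that accepts blocks up to some limit shows it needs only about 25 probes over the 1..24M range, though as noted, each successful probe costs a ~1M-byte transfer.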

Here's an example of results:

measurement 01, reading at START=1:       found best STOP=1179587      => max read block size=1179587
measurement 01, reading at START=1000:    found best STOP=1180611      => max read block size=1179612
measurement 01, reading at START=1000000: found best STOP=2179587      => max read block size=1179588

Surprise! I would have expected the transfer capabilities to be the same regardless of position. I left the scanner overnight and let it scan positions at random. It found that the most probable min/max blocksize values were 1179584 and 1179647. (Initially I was sure that the max was 1179648, but that was an off-by-one error due to :WAVeform:STARt beginning at 1.)

I also made some finer-grained sequential scans and they showed that the numbers in this range are not random at all; they form a sawtooth shape:

...
reading at pos=1000:    bsmax=1179628        |
reading at pos=1001:    bsmax=1179627        |  you can see the available blocksize falling by one
reading at pos=1002:    bsmax=1179626        |  for each increase in position; all on 1-by-1 basis
reading at pos=1003:    bsmax=1179625        |
...                                          |  but then, suddenly the value jumps
reading at pos=1042:    bsmax=1179586  (-61) |  from 1179584 to 1179647, and then falls again on 1-by-1 basis
reading at pos=1043:    bsmax=1179585  (-62) |  then the cycle repeats
reading at pos=1044:    bsmax=1179584  (-63) /
reading at pos=1045:    bsmax=1179647  (-00) \
reading at pos=1046:    bsmax=1179646  (-01) |  the length of such a 'monotonic window' is 64 samples (bytes)
reading at pos=1047:    bsmax=1179645  (-02) |  starts with highest possible blocklength=1179647
...                                          |  ends with lowest possible blocklength=1179584
reading at pos=1107:    bsmax=1179585  (-62) |  
reading at pos=1108:    bsmax=1179584  (-63) /
reading at pos=1109:    bsmax=1179647  (-00) \
reading at pos=1110:    bsmax=1179646  (-01) |  ..window of next 64 samples/bytes, and so on
...

I thought: wow, so the memory really is segmented, and a single query just can never cross a segment boundary. Or something like that. Or, more likely, something different than that, since this description doesn't fit at all: I was reading over a million samples in one go, and the segments ('windows') seem to be 64 bytes long, so each 1M query crosses a ton of them. Well, nevermind ;)

Anyways.. the sawtooth pattern holds across the whole 24M memory.
But that's not all there is to it.

I played a little with the DS1054Z, made a few measurements and tried to fetch the results using the same windows and offsets, and I couldn't get it working. Just to be sure I had noted the values correctly, I ran the scanner again, and here's what I saw:

measurement 02, reading at pos=1:       bsmax=1179603
measurement 02, reading at pos=2:       bsmax=1179602
...
measurement 02, reading at pos=999:     bsmax=1179629
measurement 02, reading at pos=1000:    bsmax=1179628
measurement 02, reading at pos=1001:    bsmax=1179627
...
measurement 02, reading at pos=999999:  bsmax=1179605
measurement 02, reading at pos=1000000: bsmax=1179604
measurement 02, reading at pos=1000001: bsmax=1179603

Please compare it with the first measurement and positions 1/1000/1000000 above. Surprise #2! Although the values are still within the same bounds, they are different. Fortunately, the sawtooth pattern is still preserved. It "just" "starts" in a different place.

Actually, it turns out that the sawtooth pattern gets a new offset each time the device triggers and captures another waveform.

I think that, maybe, while waiting for a trigger condition, the device constantly samples and writes to the sample memory in a ring-buffer scheme, and when the trigger condition is detected, the device simply runs forward and stops so as not to overwrite the trigger position (minus the configured pre-trigger amount to be left visible). That's a pure guess, of course. That's how my old Hantek DSO-2090 worked, but it didn't have any 64-byte windows; it just returned its tiny 10K or 64K of samples.

Whatever is inside the device, the facts are: after each measurement, a number between 0-63 is selected (seemingly at random) as the 'monotonic window offset', the 64-byte window at :WAVeform:STARt 1 is shortened by this value, and then a normal sawtooth pattern follows, with full 64-byte windows, till the end of the memory (where, obviously, the very last window will also be shorter).
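Putting those observations into a formula: with a per-acquisition offset c in 0..63, the largest accepted block at 1-based position pos seems to follow (this is my model fitted to the scans, nothing documented):

```python
BS_MIN, BS_MAX = 1179584, 1179647   # 0x0011FFC0 .. 0x0011FFFF

def bsmax(pos, c):
    """Modelled max block size readable at 1-based position `pos`,
    given the per-acquisition sawtooth offset `c` (0..63)."""
    return BS_MAX - ((pos + c) % 64)
```

c=59 reproduces measurement 01 above (bsmax(1)=1179587, bsmax(1000)=1179612, bsmax(1000000)=1179588), and c=43 reproduces the sequential scan (bsmax(1044)=1179584, bsmax(1045)=1179647).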

Knowing all that, we can sketch the following readout procedure:

  • we know the min/max blocksize (constants)
  • we can learn the current offset by simply trying to read at START=1 and checking which blocksize will be accepted; we can find it by bisecting in about 6 attempts (may be time-consuming if we often hit good sizes and get ~1M-byte responses), or we can start with the max blocksize and try-fail-retry, decreasing the blocksize by 1 each time (at most 64 fast "zero-length" responses)
  • knowing the window offset, we can easily determine all the sawtooth peak points
  • keep the first partial block that was read at the time of determining the offset
  • read all blocks from second to next-to-last at max blocksize=1179647
  • calculate how much data is left, and read the last block

Assuming the typical case where the random offset is not zero (with a flat distribution, that's 98% of cases), we get one initial partial read, then floor((MEM-firstblock)/maxblocksize) full reads, then one final partial read [a 'read' = set start, set stop, query data]. Plus some queries to learn the current offset value. So, for the full 24M that's X+19+2 reads. Nice!
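The steps above can be sketched as a planning function (pure Python, no I/O; `first_block` is whatever length the offset probe ended up accepting, and the names are mine):

```python
BS_MAX = 1179647  # absolute max block size, from the scans above

def plan_reads(mem, first_block):
    """Plan inclusive 1-based (start, stop) pairs: the partial first
    block found while probing the offset, then full BS_MAX-sized
    blocks, then whatever remains at the end of memory."""
    plan = [(1, first_block)]
    start = first_block + 1
    while start <= mem:
        stop = min(start + BS_MAX - 1, mem)
        plan.append((start, stop))
        start = stop + 1
    return plan
```

For mem=24000000 and a hypothetical probed first block of 1179600 samples this yields 21 contiguous ranges, plus however many probe queries it took to find the offset.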

However, as you might notice, the table from the Intro section claimed that my code was able to read 24M in 21 queries. How come?

That's simple: I was lazy and didn't implement it. What I did was take the lower bound of the sawtooth pattern, 1179584. This value allows you to completely ignore the sawtooth - just because the lower-bound value is valid throughout the whole memory - and read the data as it goes, in blocks of 1179584 bytes, right from START=1 till the end, where one partial block read will occur. That means floor(MEM/minblocksize) full reads followed by +1 partial read, which for 24M gives .. 20+1. And we drop the 'X' since we don't care about the offset. Well, children, please don't listen now: sometimes laziness pays off!

Availability of 1179K blocksize buffer

If you now pick the magic 1179584 number and try it out yourself, you have a high chance of failure. That's because it's the "max value" - it does not mean that your device will handle it right away. We still have the second part of the mystery to see.

Since I used the 2014 Programming Guide (the one with no hints), I had to guess the blocksize. I remember I tried 12M at first, failed, then 5M, failed, then 2M, failed, then 1M, failed, then succeeded and got data in a stable way at some blocksize around 500K. A few days later I returned to this topic, only to find out that now I could use 1.0M blocks as well. I was surprised, but hey, I got bigger blocks now, great.

However, I remembered seeing "500K" before, so when, after some time, my device started rejecting 1180K and began to claim that the highest transferable blocksize was 580K, I felt like "I knew it would return".

Here begins a long story of trial and error, many overnight scans and trying out different setups, which led me to the following table:

------- no channel*---------    Marker [+] shows trigger setup
[+] ::: ::: ::: ::: -> 1179k    <- TRIG=CH1
::: ::: ::: ::: [+] -> 1179k    <- TRIG=AC
------ one channel ---------    If trigger is on active channel, it can be ignored
CH1 ::: ::: ::: ::: -> 1179k    <- TRIG=CH1
::: ::: CH3 ::: ::: -> 1179k    <- TRIG=CH3
[+] CH2 ::: ::: ::: ->  580k    <-- **
[+] ::: CH3 ::: ::: ->  580k    <-- **
[+] ::: ::: CH4 ::: ->  580k
------ two channels --------
CH1 CH2 ::: ::: ::: ->  580k
CH1 ::: CH3 ::: ::: ->  580k
CH1 ::: ::: CH4 ::: ->  580k
[+] CH2 CH3 ::: ::: ->  290k    <-- **
[+] CH2 ::: CH4 ::: ->  ....    not tested yet
[+] ::: CH3 CH4 ::: ->  290k    <-- **
---- three channels --------
CH1 CH2 CH3 ::: ::: ->  290k
CH1 CH2 ::: CH4 ::: ->  ....    not tested yet
CH1 ::: CH3 CH4 ::: ->  ....    not tested yet
[+] CH2 CH3 CH4 ::: ->  ....    not tested yet
----- four channels --------
CH1 CH2 CH3 CH4 ::: ->  290k
----------------------------

Notes to the table:

- effects of 'REF' and 'MATH' were not checked
- effects of various memory and timebase modes were not checked
- (*) actually, when you turn off all channels, most things behave as if CH1 were active but just not displayed; I mean, even the [SINGLE] button works and actually TRIGGERs and refreshes the waveform
- (**) fun fact: the tool I built for testing had the data-reading queries hardcoded to read from CH1. As you can see in the table, it successfully read the data even when the active channel was e.g. CH3.. I wonder what was actually read there? :)

As you can see, most of the time I had the trigger on CH1. It had me confused for some time, and the relation between active channels and max blocksize was unclear, until I remembered the trigger and included it in the table.

To sum up the table, the rule for determining the blocksize is quite simple: count the active channels, including the trigger source; then, if the result is:

1 channel  -> max blocksize = 1179K / 1 = 1179K
2 channels -> max blocksize = 1179K / 2 =  580K
3+channels -> max blocksize = 1179K / 4 =  290K
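The same rule as a tiny lookup (the values are my measured approximations from the table, in kilo-samples; the channel count includes the trigger source channel):

```python
def max_block_k(channel_count):
    """Approximate max transferable block size in kilo-samples, per
    the measured table: 1 channel -> 1179K, 2 -> 580K, 3 or more -> 290K.
    channel_count includes the trigger source channel."""
    if channel_count <= 1:
        return 1179
    return 580 if channel_count == 2 else 290
```

Note that a trigger on an already-active channel adds nothing, so e.g. CH1 active with TRIG=CH1 still counts as one channel.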

If you compare the 2014 Programming Guide to the 2015 version, you can find that in the latter, in the comments for :TIMebase:DELay:SCALe, there's a set of rules to calculate a so-called amplification factor. Its channel-counting rules seem to be exactly the same as seen here, with the small addition that TRIG=AC counts as ZERO (IIRC that is not mentioned in the guide's channel-sum rules).

But if you think you can just switch some channels on or off to get a higher blocksize, you're wrong!

During the runtime of the device, the blocksize limit seems constant. I tried various things, including disconnecting USB, resetting with the *RST command, resetting with the [CLEAR] or [AUTO] buttons - nothing seems to change the blocksize limit once it is set.

It seems that the blocksize limit is determined .. at BOOT TIME.

You can only guess how long it took me to figure that out. Sadly, I haven't written it down.

As you know, the device, by default, remembers its last-used settings. I think you can change that behavior somewhere in Utility and set it to revert to some preset config instead. Anyway, what counts here is how many channels are active at boot time. If you turned off your DS1054Z with 3 channels active (or 2 channels and the trigger on a third), then your next session tomorrow will have max blocksize=290K. Whoo.

An interesting thing is that this works both ways. Once you turn off the device in a zero- or single-channel state, it then boots in 1179K mode and seems to keep that mode until shut down; even turning on all four channels during the session doesn't degrade the max transfer size. No need to adjust any other options, just turn off all channels before turning off the device.

TL;DR

0) all detailed information contained here applies to firmware 00.04.03.01.05 "SP1", built 2015-05-26 08:38:06; I have not tried other versions yet

1) use RAW/BYTE mode to save bandwidth; BYTE is not very convenient, but it is not that hard to calculate the actual values from it

2) magic numbers for transfer size limits:
- absolute max data payload size: 1179647 samples (bytes), but please DON'T USE IT; the explanation is at the end of the "What's the problem?" part
- easier-to-use max data payload size: 1179584; it's almost as high as the absolute max, and with it you can ignore many irritating things

3) if your acquisition memory depth is larger than the blocksize, you have to issue several batches of 'start-stop-data?' commands

4) when reading, memory addresses start at 1 (ONE), not zero. Watch out for off-by-ones. START and STOP set the data range, and both values are inclusive. When reading at START=XXX, you have to set STOP=XXX+blocksize-1. Watch out for off-by-ones again. Really. I lost several hours tracking +/-1 errors.

5) if your device rejects the 1180K blocksize, turn off all channels, then power off the device. After you turn it back on, it should be good to go at 1180K.

IMPORTANT: I actually have not verified the data read from the device during these tests; in some or all cases I could have got total garbage, or a mixture of all channels, instead of the channel I wanted to read from. The "fun fact" from part two indicates such a possibility. I was focused on communications and simply didn't have time to verify the data yet. When I read in 1-chan or 2-chan modes and then looked at the raw binary data, it looked fine. But that doesn't tell anything. I need to generate some images and compare them to on-screen data to be sure. Verifying the structure of received data and determining which modes are usable and in what way is the next thing I'm going to investigate. I will remove this warning text afterwards.