International Federation of Digital Seismograph Networks

Thread: Next generation miniSEED - 2016-3-30 straw man change proposal 12 - Reduce record length field from 4 bytes to 2 bytes

Started: 2016-08-11 19:58:23
Last activity: 2016-08-25 01:48:01

Hi all,

Change proposal #12 to the 2016-3-30 straw man (iteration 1) is attached: Reduce record length field from 4 bytes to 2 bytes.

Please use this thread to provide your feedback on this proposal by Wednesday August 24th.

thanks,
Chad




  • Hi

    I think there should be a separation from what a datacenter permits in
    its ingestion systems and what is allowed in the file format. I have
    no problem with a datacenter saying "we only take records less than X
    bytes" and it probably also makes sense for datacenters to give out
    only small sized records. However, there is an advantage for client
    software to be able to save a single continuous timespan of data as a
    single array of floats, and 65k is kind of small for that. I know
    there is an argument that miniseed is not for post processing, but
    that seems to me to be a poor reason as it can handle it and it is
    really nice to be able to save without switching file formats just
    because you have done some processing. And for the most part,
    processing means to take records that are continuous and turn them
    into a single big float array, do something, and then save the array
    out. Having to undo that combining process just to be able to save in
    the file format is not ideal. And keep in mind that if some of the
    other changes, like network code length, happen, the existing post
    processing file formats like SAC will no longer be capable of holding
    new data.

    And in this case, the save would likely not compress the data, nor
    would it need to do the CRC. I would also observe that the current
    miniseed allows records of up to 2 to the 255th power bytes (the
    record length is stored as a power-of-two exponent in a single byte),
    and datacenters have not been swamped by huge records.

    It is true that big records are bad in certain cases, but that doesn't
    mean that they are bad in all cases. I feel the file format should not
    be designed to prevent those other uses. The extra 2 bytes of storage
    to allow up to 4 GB records seems well worth it to me.
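
    To put rough numbers on that (my arithmetic, not part of the proposal):
    a 2-byte length field caps a record at 65,535 bytes, which for
    uncompressed 4-byte floats is only about 16,000 samples, under three
    minutes at 100 sps. A quick sketch in C, assuming a nominal 64-byte
    header:

        /* Illustrative only: record capacity under 2-byte vs 4-byte
           length fields, assuming uncompressed 4-byte float samples
           and an assumed 64-byte header (not from the straw man). */
        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            uint32_t hdr    = 64;                     /* assumed header size */
            uint32_t samp16 = (UINT16_MAX - hdr) / 4; /* ~16k samples */
            double   samp32 = (UINT32_MAX - (double)hdr) / 4.0;
            printf("2-byte field: %u samples (%.0f s at 100 sps)\n",
                   samp16, samp16 / 100.0);
            printf("4-byte field: ~%.0f million samples\n", samp32 / 1e6);
            return 0;
        }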

    thanks
    Philip


    • Hi,

      My two cents is that the permitted length should be kept fairly small, so 65k should be fine. I do not know how many times I have dealt with formats like SAC, which can store a large time series segment with only a single timestamp for the first sample, where the time of the last sample ends up inaccurate because the digitizing rate is either not constant or is “slightly off”. Smaller record sizes force more frequent recording of timestamps and improve timing quality.
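
      To illustrate that drift with made-up numbers (a hypothetical example,
      not from any real station): if a stream declared as 100 sps actually
      digitizes at 100.001 sps, the predicted time of sample n is off by
      roughly n * 1e-7 seconds, about a full second after ten million samples:

          /* Hypothetical timing drift: predicted vs. actual time of
             sample n when the true rate is slightly off nominal. */
          #include <stdio.h>

          int main(void) {
              double nominal = 100.0;     /* declared rate (sps) */
              double actual  = 100.001;   /* true digitizer rate (sps) */
              long   n       = 10000000;  /* samples since last timestamp */
              double err = n / nominal - n / actual;  /* seconds of drift */
              printf("after %ld samples: ~%.2f s of timing error\n", n, err);
              return 0;
          }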

      I also think variable length records are a really bad idea. I prefer fixed length records on power-of-two boundaries for a variety of reasons. Mostly, they permit more rapid access to the data without having to build extensive indices for each data block.
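
      For example (record size is illustrative, not a proposal), with fixed
      512-byte records the k-th record is reachable by arithmetic alone:

          /* Sketch: direct seek to record k in a file of fixed-size
             records; no per-record index is needed. */
          #include <stdio.h>

          int main(void) {
              const long reclen = 512;             /* assumed record size */
              const long k      = 1000;            /* record to fetch */
              FILE *f = fopen("data.mseed", "rb"); /* hypothetical file */
              if (!f) return 1;
              fseek(f, k * reclen, SEEK_SET);      /* offset = k * reclen */
              /* ... read and decode one record here ... */
              fclose(f);
              return 0;
          }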

      Dave



      • One alternative, which would be better suited for real-time, would be
        using fixed-size "frames" instead of records. Think of a record
        consisting of a header frame followed by a variable number of data frames.
        A frame might include timecode (sequence no.), channel index (for
        multiplexing) and possibly CRC. Due to fixed size, finding the start of
        a frame would be unambiguous. Compared to a 512-byte mseed 2.x record
        (header + 7 data frames), latency would be 7 times smaller, because each
        data frame could be sent separately. And by using more data frames one
        could reduce overall bandwidth without increasing latency.

        Transmitting data in 64-byte chunks was already attempted with mseed
        2.4, but unfortunately the total number of samples and the last sample
        value must be sent before any data. In the new format I would put such
        values, if needed, into a "summary" frame that would be sent after data
        frames.
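
        Purely to illustrate (the field names and sizes below are my guesses,
        not a worked-out proposal), such a fixed-size frame might look like:

            /* Hypothetical 64-byte frame: sequence number for ordering,
               channel index for multiplexing, a CRC, fixed payload. */
            #include <stdint.h>

            #define FRAME_PAYLOAD 52        /* 64 total - 12 header bytes */

            struct frame {
                uint32_t sequence;          /* timecode / sequence number */
                uint16_t channel_index;     /* for multiplexed streams */
                uint16_t payload_used;      /* valid bytes in payload */
                uint32_t crc;               /* integrity check */
                uint8_t  payload[FRAME_PAYLOAD];
            };  /* fixed size keeps frame boundaries unambiguous */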

        Regards,
        Andres.


        • I like this idea. I've been considering similar concepts, dubbed microSEED, with frames that are not necessarily fixed length. The idea was left out of the straw man because it's a pretty radical change from current miniSEED, where each record is independently usable. Lots of existing software would require significant redesign to read such data. But if this concept could be developed in such a way that multiple frames could be easily reassembled into a next generation miniSEED record, it might be a nice way to satisfy both archiving and real-time transmission needs.

          Chad



          • All existing software would require significant modifications even with
            the current straw man (especially if variable length records are
            allowed). SeedLink, Web Services, all user software. The overall cost of
            the transition would be huge.

            If we want to design a format for the next 30 years, we should not
            restrict ourselves with limitations imposed by the current miniSEED
            format. On the other hand, if compatibility with the current miniSEED
            format is desired, just add another blockette to miniSEED 2.x (as
            suggested by Angelo Strollo earlier) and that's it.

            Back to the idea of "frames" -- indeed, some info that is needed for
            real-time transfer could be stripped in an offline format. If records could
            be easily converted to frames and vice versa, it would be great.
            Currently the main problem is forward references (number of samples,
            detection flags, anything that refers to data that is not yet known when
            sending the header), so we need a "footer" in addition to the header.

            Regards,
            Andres.


            • On Aug 19, 2016, at 5:55 AM, andres<at>gfz-potsdam.de wrote:

              All existing software would require significant modifications even with
              the current straw man (especially if variable length records are
              allowed). SeedLink, Web Services, all user software. The overall cost of
              the transition would be huge.

              If we want to design a format for the next 30 years, we should not
              restrict ourselves with limitations imposed by the current miniSEED
              format. On the other hand, if compatibility with the current miniSEED
              format is desired, just add another blockette to miniSEED 2.x (as
              suggested by Angelo Strollo earlier) and that's it.

              Hi Andres,

              You have a point that we should not be limiting our thinking. I do think there is a sweet spot in the balance between a small patch on the current miniSEED (in particular, one that could be very detrimental to data identification) and something radically different. The very first straw man was created with that particular balance in mind as a place to start discussion, with the full expectation that it would evolve. My feeling is that non-independent records, a la headers plus frames transmitted independently, is a more radical change than anything in the straw man from the perspective of code reading the data.

              As for the concept of "just" adding a blockette to extend the network code: all of the software you mentioned (SeedLink, Web Services, all user software), in addition to data center schemas, data center software and, very importantly, data generation systems, would need to be updated in order to not lose network identifiers. The libraries that do this parsing at the data center and user levels are the easy part; pushing updates out to all the places that use them will simply take a lot of time. As D. Ketchum wrote, updates will not be overnight. You can easily imagine there will be old versions of slink2ew, chain_plugin and many, many more pieces of middleware running for a very long time. In some cases they will be transforming the data from miniSEED to something else and silently stripping the network identifiers out. In other cases the new blockette(s) may be retained, but all miniSEED3 data will need to be referred to as network "99" (or whatever) because the old system doesn't know any better. The overall cost of this transition would be huge, even for just adding a blockette.

              Surely we can address other fundamental issues, such as record byte order identification, which cannot be fixed with a simple blockette, if we are going to effectively go through a full software stack update. Much of the planning, such as getting systems and software updated well before any new-style data flows, would be the same.

              Back to the idea of "frames" -- indeed, some info that is needed for
              real-time transfer could be stripped in offline format. If records could
              be easily converted to frames and vice versa, it would be great.
              Currently the main problem is forward references (number of samples,
              detection flags, anything that refers to data that is not yet known when
              sending the header), so we need a "footer" in addition to the header.

              A footer would work. Alternatively, the "micro" header on each frame could contain: the start time of the primary header (for sequencing), the start time of the first sample in the frame, the number of samples in the frame and any optional headers relevant for the frame (detection). Reassembly to a full record would require summing up the sample counts, combining the optional headers and stripping the micro/frame headers. Some care would be needed with the details. If we created such a telemetry framing for otherwise complete "next generation" miniSEED, it would have the advantage of limiting the telemetry complexity to those systems that need it, allowing some degree of separation between the use cases of telemetry, archiving, etc. It's certainly an intriguing line of thought.
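
              As a sketch of that reassembly (all names and types below are hypothetical,
              just to make the steps concrete): keep the first frame's start time, sum the
              per-frame sample counts, and concatenate the payloads:

                  /* Hypothetical frame reassembly following the outline
                     above; not an actual format definition. */
                  #include <stdint.h>
                  #include <string.h>

                  struct micro_frame {
                      int64_t  starttime;       /* time of first sample in frame */
                      uint32_t sample_count;    /* samples in this frame */
                      uint32_t payload_len;     /* bytes in payload */
                      const uint8_t *payload;   /* encoded samples */
                  };

                  /* Concatenate payloads into buf; return total sample count. */
                  uint32_t reassemble(const struct micro_frame *fr, int n,
                                      uint8_t *buf, uint32_t buflen,
                                      int64_t *rec_start) {
                      uint32_t total = 0, used = 0;
                      *rec_start = n > 0 ? fr[0].starttime : 0;
                      for (int i = 0; i < n; i++) {
                          if (used + fr[i].payload_len > buflen)
                              break;            /* record is full */
                          memcpy(buf + used, fr[i].payload, fr[i].payload_len);
                          used  += fr[i].payload_len;
                          total += fr[i].sample_count;
                      }
                      return total;
                  }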

              regards,
              Chad


            • Chad Trabant wrote on 19.08.2016 at 08:58:
            The idea was left out of the straw man because it's a pretty radical change from current miniSEED where each record is independently usable. Lots of existing software would require significant redesign to read such data.

            Thank you, Chad, for addressing an important point: the costs of the new
            format!

            Do you have a rough idea about what the costs of the transition to an
            incompatible new data format would be? Reading this discussion one might
            get the impression that the transition would be a piece of cake. A
            version change, a few modified headers, an extended network code plus a
            few other improvements like microsecond time resolution. Hitherto
            stubborn network operators will be forced not to use empty location
            codes. But all these benefits will come with a price tag because of the
            incompatibility of the new format with MiniSEED.

            So what will be the cost of the transition? Who will pay the bill? Will
            the costs be spread across the community or will the data centers have
            to cover the costs alone?

            There are quite a few tasks ahead of "us". "Us" means a whole community
            of data providers, data management centers, data users, software
            developers, hardware manufacturers. World-wide! I.e., everyone who is
            now working with MiniSEED and has got used to it. Everyone!

            Tasks will include:

            * Recoding of entire data archives

            * Software updates. In some cases redesign will be necessary, while
            legacy software will just cease to work with the new format.

            * Migrate data streaming and exchange between institutions world-wide.
            It is easy to foresee that real-time data exchange, which was pretty
            hard to establish in the first place with many partners world-wide, will
            be heavily affected by migrating to the new format.

            * Request tools: will there be a deadline like "by August 1st, 2017,
            00:00:00 UTC, all fdsnws's have to support the new format"? Or will
            there be a transition period? If so, how will this be organized? Either
            access to two archives (one for each format) will be required, or the
            fdsnws's will have to be able to deliver both formats by conversion on
            the fly.

            * Hardware manufacturers will have to support the new format.

            * Station network operators will have to bear the costs of adopting the
            new format even though it may not yield any benefit to them.

            I could probably add more items to this list but thinking of the above
            tasks causes me enough headaches already. That's the reason why I am
            publicly raising the cost question now because the proponents of the new
            format must have been thinking about this and probably have some idea
            about how costly the transition would be.

            Speaking of costs I would like to remind you of the alternative proposal
            presented on July 8th by Angelo Strollo on behalf of the major European
            data centers. They propose to simply introduce a new blockette 1002 to
            accommodate longer network codes but with enough space for additional
            attributes such as extended location id's etc. This light-weight
            solution is backward compatible with the existing MiniSEED. It is
            therefore the least disruptive solution and minimizes the costs of the
            transition.

            Regards
            Joachim

              • Just want to point out that a new blockette with an extended network code
                is NOT backwards compatible. Old software that does not recognize the
                new blockette (and therefore likely ignores it) will report that it
                successfully read the data, but will attribute new data records to the
                wrong network. It may appear that this is the lower-cost option, however
                this would generate a new class of bugs that would likely be subtle and
                would persist for decades to come. There is pain both ways, but I
                would much prefer a system that fails obviously when it fails to one
                that seems to work but is actually wrong, infrequently and in a way
                that is hard to notice.

              A failure that looks like a failure gets fixed quickly, a failure that
              looks like a success can easily persist for a long time, causing much
              more damage in the long run.

              Philip


              • A special 2-letter network code can be reserved. AFAIK there are even
                some obvious network codes, such as "99" or "XX" that have never been
                used. If data records are attributed to network "99", it is quite
                obvious what is going on. Yet, if I use my old PQLX to quickly look at
                the data, I don't care about the network code.

                Wasn't the network code added in SEED 2.3 in the first place? Any issues
                known?

                Regards,
                Andres.


                • I agree with Philip: the proposed network extension blockette has a fundamental problem regarding backwards compatibility. It is only backwards compatible in that the data can still be read, but critical information will be quietly lost until a large number of legacy readers are replaced (which will take a very long time). Until then, when using legacy readers, all of the functions of a network code (ownership identification, logical station grouping) are lost, with many implications. You can easily imagine older data converters being used for a long time and the expanded network code going missing right away. I predict it wouldn't take very long before network 99 shows up in publications.

                  I do not believe assertions that all users of SEED will think it obvious what is going on with network 99. The grad student doing their work with an old version of PQLX is simply not going to know.

                  As Philip says, it'd be better to break things than quietly continue to work while losing network identifiers.

                  Furthermore, even this small update would require modifications to all software chains, from data generation to data centers to users, along with database schemas, protocols, etc., etc. That is a huge amount of work for such a small change. If we are going to go through all of that we should at least fix some of the other issues with miniSEED. And now we are back at the beginning of this conversation that started in ~2013.

                  Chad


                  • Chad Trabant wrote on 20.08.2016 at 02:01:
                    I agree with Philip: the proposed network extension blockette has a fundamental problem regarding backwards compatibility. It is only backwards compatible in that the data can still be read, but critical information will be quietly lost until a large number of legacy readers are replaced (which will take a very long time). Until then, when using legacy readers, all of the functions of a network code (ownership identification, logical station grouping) are lost, with many implications.

                    Hello Chad,

                    what would be "a very long time"?

                    First of all note that most of the current infrastructures world-wide
                    will not be affected by the blockette-1002 extension at all. The reason
                    for this is that most institutions will simply not produce any data with
                    1002 blockettes because they don't need the extended attributes. They
                    will continue to produce and exchange 2.4 MiniSEED just as they have
                    been for many years. They will not have to upgrade their station
                    hardware/software in order to produce up-to-date, valid MiniSEED. NO CHANGE!

                    Of course, "most institutions" is not necessarily all and sooner or
                    later data with blockette 1002 will start to circulate. This will
                    require blockette-1002 aware decoders to make use of the extended
                    attributes.

                    The obvious question is now: How much time would it take to update
                    libmseed, qlib, seedlink et al. to support blockette 1002? A week? A
                    month? A year? A very long time?

                    As soon as blockette-1002 aware versions of said libraries are
                    available, the software using them needs to be re-compiled and linked
                    against them. A lot of software, if not most, is going to be
                    blockette-1002 enabled that way, without the need for further modifications.
                    And, very importantly, the software can be made blockette-1002-ready
                    WELL IN ADVANCE of the actual circulation of blockette-1002 data!

                    This means specifically: If a consensus about the blockette 1002
                    structure can be found, say, by December (e.g. AGU), then the work to
                    make libmseed, qlib, seedlink et al. blockette-1002 ready and
                    subsequently the software that uses them will take at most a few more
                    months. With an updated libmseed, software like ObsPy and SeisComP will
                    support at least the extended attributes out of the box. I haven't
                    looked at the PQLX details but since it also uses libmseed to read
                    MiniSEED, a blockette-1002-ready libmseed should allow the transition
                    with very little (if any) further effort. I am therefore sure that most
                    relevant, actively maintained software can likewise be made
                    blockette-1002 ready before the Kobe meeting.

                    There are, of course, details that need to be addressed. For instance,
                    the proposed 4-character location identifier and how it is converted to
                    Earthworm's tracebuf format, as pointed out by Dave. But these problems
                    would be the same for blockette-1002 MiniSEED and the proposed new format.

                    You can easily imagine older data converters being used for a long time and the expanded network code going missing right away.

                    Older data converters WILL continue to work fine with all currently
                    existing MiniSEED streams. Whereas NO older data converters will work
                    with ANY data converted to the proposed new and entirely incompatible
                    format!

                    I predict it wouldn't take very long before network 99 shows up in publications.

                    This implies authors who don't have a clue about what a network code is.
                    How would they be able to correctly use a network code? That's not an
                    issue of data formats but of channel naming in general.

                    I do not believe assertions that all users of SEED will think it obvious what is going on with network 99. The grad student doing their work with an old version of PQLX is simply not going to know.

                    Why not inform the grad student? What does it take for the grad student
                    to learn that in an FDSN network code context "IU" doesn't stand for
                    "Indiana University"?

                    http://www.fdsn.org/networks/detail/IU

                    That's all! In case that grad student happens to stumble upon "99" then
                    probably an explanation on http://www.fdsn.org/networks/detail/99 would
                    help him or her.

                    As Philip says, it'd be better to break things than quietly continue to work while losing network identifiers.

                    What do you mean by "things"? The proposed new format and its
                    implementation would not just break the grad student's PQLX but it would
                    break ENTIRE INFRASTRUCTURES. World-wide and from bottom to top!

                    Do you want to disrupt the entire FDSN data exchange to protect the grad
                    student using an old PQLX from getting a "99" network code? Is that what
                    you are saying?

                    Furthermore, even this small update would require modifications to all software chains,

                    You have a position and are trying your best to defend it. This is
                    legitimate of course. But you are exaggerating minor problems in order to
                    discredit an approach that you cannot deny would be a lot less
                    disruptive and expensive than the proposed new format.

                    from data generation

                    No modifications are needed at the stations. Stations continue to
                    produce 2.4 MiniSEED, which remains valid. There is no need to
                    produce blockette 1002 except for stations that, e.g., have extended
                    network or location codes. There will not be many (if any) in currently
                    existing networks.

                    to data centers

                    Data centers are the ones that benefit most from the continuity that the
                    blockette-1002 approach would allow because they neither need to recode
                    entire archives nor have to provide "old" and "new" data formats in
                    parallel.

                    to users

                    Only users that actually use blockette-1002 data. If these users use
                    up-to-date versions of actively maintained software such as ObsPy,
                    SeisComP or MiniSEED-to-SAC converters they will not notice any
                    difference. Legacy software will continue to work, with the exception of
                    the network code, which will show up as "99".

                    along with database schemas, protocols, etc., etc.

                    There are some cases where updates will require further efforts. We
                    already read about Earthworm and the limited space for the location
                    identifier in the current Tracebuf2 format. But the effort at the
                    Earthworm end to accommodate a longer location identifier would be the
                    same for blockette-1002 data as for the proposed new format. It is
                    therefore understandable that the Earthworm community has reservations
                    against an extended location code because it would have to pay the price
                    for something it probably doesn't need.

                    In general, chances are high that most database schemas will remain
                    unaffected, as will most protocols.

                    But I am curious to hear about specific database schemas that would be
                    more difficult to update to blockette-1002 MiniSEED than to the proposed
                    new format.

                    That is a huge amount of work for such a small change.

                    I hope to have pointed out by now that the work required to implement
                    blockette 1002 would in fact be dramatically less than the work
                    required to upgrade entire infrastructures (indeed from the data loggers
                    all the way to data users) to a fully incompatible new format.

                    And now we are back at the beginning of this conversation that started in ~2013.

                    What conversation are you referring to?

                    Cheers
                    Joachim

                    • Hi

                      Just like to point out that merely upgrading a library, like libmseed,
                      to parse a new blockette does not suddenly make older software
                      compatible with a longer network code. If the software itself is not
                      also upgraded to use the information in the new blockette, then the new
                      information is effectively ignored. I feel that this idea that there
                      is a non-disruptive, easy "fix" to expanding the network code is
                      unrealistic.

                      Philip


                      • Philip Crotwell wrote on 24.08.2016 at 15:55:
                        Just like to point out that merely upgrading a library, like libmseed,
                        to parse a new blockette does not suddenly make older software
                        compatible with a longer network code.

                        The structure in libmseed that holds the record header attributes is 'MSRecord'. If the decoder of an updated libmseed sees a blockette 1002, it will have to take the information about the network code etc. from there and populate the MSRecord accordingly. That's all. The software will then use or copy the content of MSRecord.network, which by the way is already large enough (10 characters plus '\0') to accommodate the extended network code.

                        Mission accomplished! Well... mostly.

                        If the software itself is not
                        also upgraded to use the information in the new blockette then the new
                        information is effectively ignored.

                        There will of course be target data structures in which the network code is hard-coded to be only two characters long. In such cases (hopefully) only two characters are copied. I haven't found any software in which this would be an actual issue. There *is* a similar issue, though, with the extended location code and the Earthworm Tracebuf2 structure. This will be a pain to solve within the Earthworm community, but neither blockette 1002 nor the proposed new format can be blamed for it. It's a limitation of Earthworm that is due to the current SEED channel naming conventions.
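
                        For illustration, such a defensive copy into a hard-coded two-character target might look like the sketch below; the struct is only a stand-in with the same field width as such legacy headers, not the actual Earthworm definition:

                        #include <string.h>

                        typedef struct {          /* stand-in only, not a real Earthworm header */
                            char net[3];          /* two characters plus terminating '\0' */
                        } LegacyHeader;

                        /* At most two characters survive: an extended code such as "ABCD"
                           is silently truncated to "AB", which is exactly the hazard being
                           discussed. */
                        void set_legacy_network (LegacyHeader *hdr, const char *network)
                        {
                            strncpy (hdr->net, network, sizeof (hdr->net) - 1);
                            hdr->net[sizeof (hdr->net) - 1] = '\0';
                        }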

                        ObsPy, SeisComP and SAC, to name a few, would have no problem at all accommodating the extended attributes. This is probably true for most other actively maintained software that uses either libmseed or qlib.

                        I feel that this idea that there
                        is a non-disruptive, easy "fix" to expanding the network code is
                        unrealistic.

                        There will never be a solution involving zero effort.

                        The question is how much effort each of the proposals would require. The blockette-1002 solution would be by far the easiest to adopt. But most importantly, existing infrastructures not requiring extended headers will not be disrupted at all. In other words: all existing real-time data exchange world-wide can continue to work as it does now. This allows enough time to upgrade software to support blockette 1002, and once blockette-1002 data actually start to circulate, most software infrastructures should be able to handle them properly.

                        Cheers
                        Joachim

                • Hello,

                  I've read most of the topics and I'm inclined to follow Joachim's and
                  Andres's comments about the change cost, especially regarding the
                  network operators and the people who use data operationally (as David
                  also pointed out).
                  I'm not worried about PhD students and those who work on off-line data.

                  And I understand the concerns about a reserved network code like "99"
                  being incorrectly used in publications because legacy software does not
                  read the extended one.

                  So, all this preamble to simply ask the following question.
                  Will the extended network code be reserved for temporary networks?
                  Or will it also be available for new permanent networks as soon as it
                  is adopted?

                  I ask this because if it's only for temporary networks, then we have
                  more time to migrate all the operational software.
                  If, on the contrary, it's also available for permanent networks, we
                  would very soon see new permanent stations not being used by most
                  operational entities (I'm thinking right now of Tsunami Service
                  Providers and global location and/or CMT providers) because their
                  software doesn't support network codes longer than two letters.
                  Consider that some Earthworm modules were made location-code compatible
                  only one or two years ago, and some of their users still haven't
                  migrated to those new modules!

                  Regards.

                  Jean-Marie SAUREL.

                  On 19.08.2016 15:15, andres<at>gfz-potsdam.de wrote:
                  On 08/19/2016 04:37 PM, Philip Crotwell wrote:
                  Just want to point out that a new blockette with extended network code
                  is NOT backwards compatible. Old software that does not recognize the
                  new blockette (and therefore likely ignores it) will report that it
                  successfully read the data, but will attribute new data records to the
                  wrong network. It may appear that this is a lower cost, however this
                  would generate a new class of bugs that would likely be subtle and
                  would persist for decades to come. There is pain in both ways, but I
                  would much prefer a system that fails obviously when it fails to one
                  that seems to work but actually is wrong infrequently and in a way
                  that is hard to notice.

                  A failure that looks like a failure gets fixed quickly; a failure that
                  looks like a success can easily persist for a long time, causing much
                  more damage in the long run.

                  A special 2-letter network code can be reserved. AFAIK there are even
                  some obvious network codes, such as "99" or "XX", that have never been
                  used. If data records are attributed to network "99", it is quite
                  obvious what is going on. Yet, if I use my old PQLX to quickly look at
                  the data, I don't care about the network code.

                  Wasn't the network code added in SEED 2.3 in the first place? Any
                  issues known?

                  Regards,
                  Andres.


                  --
                  --------------------------------------
                  ICG-CARIBE EWS WG1 chair
                  Institut de Physique du Globe de Paris
                  Observatoire Volcanologique et Sismologique
                  1 rue Jussieu
                  75005 Paris

              • Philip Crotwell wrote on 19.08.2016 16:37:
                Just want to point out that a new blockette with extended network code
                is NOT backwards compatible.

                As I wrote before, it *is* backward compatible with the *existing* MiniSEED, which is *all* MiniSEED currently existing in *all* archives. I didn't write "blockette-1002 MiniSEED", because it is obvious that attributes specific to blockette 1002 need to be retrieved from there.

                The *only* compromise w.r.t. backward compatibility occurs if blockette-1002-unaware software reads blockette-1002 MiniSEED. That is the price tag of the alternative solution: a minimal cost compared to, e.g., recoding entire data archives and disrupting complex data infrastructures. And as soon as that previously unaware software is linked against an updated libmseed or qlib, the problem is gone anyway. In fact, for many data centers and infrastructures, the cost will be close to zero in practice.

                Actually an updated libmseed or qlib would be made available long before the first blockette-1002 MiniSEED data actually start circulating publicly. Therefore all actively maintained software can be made 1002-ready well in advance.

                Regards
                Joachim


            • Hi Joachim,

              At the IRIS DMC we have thought quite a bit about the costs of a transition to a newer generation of miniSEED. In many respects I think the DMC has more at stake in terms of operational change than any other single group in the FDSN. This is a discussion intended to develop a proposal for the FDSN to consider in 2017, and only after that can an adoption plan be finalized. Personally, depending on the transition discussions, I would be surprised if we have much traction on adoption by 2018; it could easily take longer.

              The transition of SEED data to the new identifiers outlined in the alternative proposal presented on July 8th by Angelo Strollo would also require most of the same data systems (data producers, middleware, data centers, user software) to be updated, which would take a long time. Also, until most software has been updated, we risk losing extended network identifiers. The implication that we could simply add a new blockette, update a few libraries, and the transition would be over seems very unrealistic to me. Furthermore, that is a lot of cost to address a single issue in SEED.

              Chad

              Chad Trabant wrote on 19.08.2016 at 08:58:
              The idea was left out of the straw man because it's a pretty radical change from current miniSEED where each record is independently usable. Lots of existing software would require significant redesign to read such data.

              Thank you, Chad, for addressing an important point: the costs of the new
              format!

              Do you have a rough idea about what the costs of the transition to an
              incompatible new data format would be? Reading this discussion one might
              get the impression that the transition would be a piece of cake. A
              version change, a few modified headers, an extended network code plus a
              few other improvements like microsecond time resolution. Hitherto
              stubborn network operators will be forced not to use empty location
              codes. But all these benefits will come with a price tag because of the
              incompatibility of the new format with MiniSEED.

              So what will be the cost of the transition? Who will pay the bill? Will
              the costs be spread across the community or will the data centers have
              to cover the costs alone?

              There are quite a few tasks ahead of "us". "Us" means a whole community
              of data providers, data management centers, data users, software
              developers, hardware manufacturers. World-wide! I.e., everyone who is
              now working with MiniSEED and has got used to it. Everyone!

              Tasks will include:

              * Recoding of entire data archives

              * Software updates. In some cases redesign will be necessary, while
              legacy software will just cease to work with the new format.

              * Migrate data streaming and exchange between institutions world-wide.
              It is easy to foresee that real-time data exchange, which was pretty
              hard to establish in the first place with many partners world-wide, will
              be heavily affected by migrating to the new format.

              * Request tools: will there be a deadline like "by August 1st, 2017,
              00:00:00 UTC" by which all fdsnws's have to support the new format? Or
              will there be a transition period? If so, how will it be organized?
              Either access to two archives (one per format) will be required, or the
              fdsnws's will have to be able to deliver both formats by converting on
              the fly.

              * Hardware manufacturers will have to support the new format.

              * Station network operators will have to bear the costs of adopting the
              new format even though it may not yield any benefit to them.

              I could probably add more items to this list, but thinking of the above
              tasks causes me enough headaches already. That is why I am publicly
              raising the cost question now: the proponents of the new format must
              have thought about this and probably have some idea of how costly the
              transition would be.

              Speaking of costs, I would like to remind you of the alternative
              proposal presented on July 8th by Angelo Strollo on behalf of the major
              European data centers. It proposes simply introducing a new blockette
              1002 to accommodate longer network codes, with enough space for
              additional attributes such as extended location identifiers. This
              light-weight solution is backward compatible with the existing
              MiniSEED. It is therefore the least disruptive solution and minimizes
              the costs of the transition.

              Regards
              Joachim



      • Hi Dave,

        I also think variable length records is a really bad idea. I prefer fixed length records on power of two boundaries for a variety of reasons. Mostly it permits more rapid accessing of the data without having to build extensive indices for each data block.


        Can you share some of the other reasons?

        I think I get the rapid-access reasoning. As I've heard it described, one makes some educated guesses about where the data are in a file and skips around until zeroing in on the correct record(s).
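
        For what it's worth, here is a rough sketch of that kind of lookup over fixed-length records. The toy header layout (start time stored as a leading double) is purely an assumption to keep the seek arithmetic concrete; real miniSEED headers of course differ:

        #include <stdio.h>

        /* Toy assumption: each record begins with its start time stored as a
           double. The payoff of fixed-length records is the seek itself: the
           offset of record `recnum` is simply recnum * reclen. */
        static double record_start_time (FILE *fp, long recnum, long reclen)
        {
            double t = 0.0;
            fseek (fp, recnum * reclen, SEEK_SET);
            if (fread (&t, sizeof t, 1, fp) != 1)
                t = 0.0;
            return t;
        }

        /* Return the last record whose start time is <= target, assuming
           records in the file are sorted by time. */
        long find_record (FILE *fp, long nrecs, long reclen, double target)
        {
            long lo = 0, hi = nrecs - 1;
            while (lo < hi)
            {
                long mid = lo + (hi - lo + 1) / 2;
                if (record_start_time (fp, mid, reclen) <= target)
                    lo = mid;
                else
                    hi = mid - 1;
            }
            return lo;
        }

        With variable-length records the recnum * reclen arithmetic disappears and some form of index becomes necessary.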

        The notion of a variable record length has been raised a number of times in the past; we finally added it to the straw man for these reasons:
        a) In many ways it is a better fit for real-time streams. There is no more waiting to "fill a record" or transmitting unfilled records; latency is much more controllable without waste. Also, data are usually generated at a regular rate, but if one would like to package and transmit them at a regular rate with compression, the output size is not readily predictable.

        b) Adjustments to records, such as adding optional headers, become much easier. In 2.x miniSEED, if you want to add a blockette, for example, but there is not enough room, you are stuck with either re-encoding the data into unfilled records or reprocessing a lot of data to pack it efficiently.

        I'm on the fence with this one and would appreciate hearing about any other pros and cons regarding variable versus fixed record lengths.

        thanks,
        Chad




        • Chad,

          For instance, the Edge/CWB software takes advantage of fixed-length records, or at least power-of-two lengths, to store all channels of miniSEED in one file, where regions of the file are reserved for each channel (generally extents of 64 512-byte blocks). It can accommodate 2.4 miniSEED of any size, but the blocks fit nicely into their extents, and the index used to find a channel and time range only has to cover the extents. This greatly speeds up access to queried data. I know a lot of installations use "file per channel per day", but we found that pretty inefficient. I know that many use the binary search method you mentioned, which also works better on fixed-length blocks.
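
          As a minimal sketch of the extent arithmetic this implies (the 64-block, 512-byte figures come from the description above; the function name and layout are mine):

          #define BLOCK_SIZE     512L   /* bytes per miniSEED record */
          #define EXTENT_BLOCKS   64L   /* blocks reserved per channel extent */
          #define EXTENT_SIZE    (BLOCK_SIZE * EXTENT_BLOCKS)   /* 32 KiB */

          /* Byte offset of block `blk` within extent number `extnum` of the
             big multi-channel file. The index only needs one entry per 32 KiB
             extent, not one per record. */
          long block_offset (long extnum, long blk)
          {
              return extnum * EXTENT_SIZE + blk * BLOCK_SIZE;
          }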

          I do not particularly think miniSEED is a very good choice for telemetry when short latency is desired, like for earthquake early warning. The fixed part of the header is so big relative to the payload that it is not bandwidth efficient. If variable-length records are desired for this, I think the alternative of using another telemetry format that is more efficient should win out. Note that the current Q330 one-second packets are not in miniSEED form, but they are fairly efficient and variable length. The receiving software takes this format and generates miniSEED. The one-second packets are available for EEW, and the miniSEED is generated for later use and archival. My take is that miniSEED 3 should not try to be a telemetry format, as it would be a bad one; it is a standard format used at data centers and after the real-time processing is done. Further, the telemetry format is a competitive function best left to the digitizer vendors. We should insist that their telemetry formats produce good miniSEED 3, including all of the mandatory and optional flags etc., but how they achieve that should be left to them.

          Dave


