Last modified on 08/02/12 14:55:37

ANSS Centralized Metadata Storage and Management (CMSM) Working Group

Summary Findings and Recommendations

February 2012

Recommendation Summary

It is the majority opinion of the Working Group (WG) that the advantages of Centralized Metadata Storage and Management (CMSM) outweigh the disadvantages, and that SIS in its current form largely provides the resources needed from a CMSM. There is real demand for metadata management resources that are useful to participants, funding agencies, and management, so a CMSM should be pursued. If the ANSS accepts our endorsement of SIS as the foundation of a CMSM for the Regional Seismic Networks (RSNs), we recommend that SIS first be developed further, following the SIS-Central Functional Specification as revised by the WG, so that the resulting product is as useful a CMSM as possible for the RSNs. The WG further recommends that, should this path be taken, a Technical Implementation Committee (TIC) be formed to help direct the next stage of SIS development (setting specifications and priorities, assisting with implementation strategies, etc.). A more detailed response to our working group's charge, including disadvantages of a CMSM and unresolved issues, follows.

Question 1. Does the Working Group recommend that ANSS adopt a Centralized Metadata Storage and Management (CMSM) system?

Answer: Yes, the working group recommends using a CMSM.

Question 1 Comments and Recommendations

The WG believes a centralized system for metadata management would be very helpful to the ANSS and recommends that a CMSM be used. Currently, the ANSS has metadata requirements, and hence a de facto CMSM exists, but it is unwieldy and inefficient. A single, well-designed, unified system for RSNs to control and produce metadata could greatly reduce duplication of effort and aid in the full and proper production of metadata to satisfy the ANSS requirements. Small RSNs would likely benefit the most, as they often lack the resources that larger networks have to create systems for full metadata management. A single system also simplifies access for those who wish to make use of the metadata, especially from multiple RSNs, by providing a single point of access. This would benefit both non-RSN users and the RSNs themselves, who would have much improved access to the most up-to-date metadata of neighboring RSNs.

However, the system must work well, be easy to use, and have sufficiently powerful input and retrieval capabilities in order to be truly useful. A CMSM that is difficult to use would not be an improvement. The need is also broader than the minimum level of information available from, for example, dataless SEED.

There are disadvantages of a CMSM, especially to RSNs that currently have operational systems for metadata management. For RSNs with an existing system, change could be expensive and disruptive. For networks operating data centers or real time systems, a remote CMSM, with unavoidable internet delays, may force the continued upkeep of a local system. If existing metadata systems at RSNs are maintained in parallel with a CMSM, there is an issue of duplication of information and the attendant problem of which data is authoritative. Synchronizing such parallel systems will likely be complicated and may require increased effort on the part of RSNs.

Nonetheless, it is the opinion of the working group that the advantages of a CMSM outweigh the disadvantages, and so it should be pursued.

Question 2. Should usage of the CMSM system be mandatory for all ANSS networks?

Answer: Usage should be optional, not mandatory.

Question 2 Comments and Recommendations

We encourage all ANSS networks to share network information in a unified, collaborative way, so that important and relevant information (equipment cataloging, ownership, operating dates, etc.) can be collected, managed, kept up to date, and polled by RSNs on demand. We realize, however, that there could be resource barriers for some networks. SIS adoption would provide a relatively easy-to-use, centralized mechanism that would eliminate these barriers. If ANSS management and the RSNs agree on which metadata or parametric information is needed, it is hoped that a centralized RDBMS (Relational Database Management System) will both encourage and reward participation.

Question 3. Should the CMSM system be available to non-ANSS networks that contribute to ANSS?

Answer: Yes.

Question 3 Comments and Recommendations

The CMSM should be available to non-ANSS networks that contribute data to ANSS. This will facilitate metadata exchange between ANSS RSNs and non-ANSS networks, and help reduce the need for multiple metadata databases. Although we cannot require any network’s participation in the CMSM, we recognize that participation has great potential benefits. It could be helpful if the ANSS depot participated at least in the inventory portion of SIS, as equipment owned by the ANSS depot now has the potential to move from one RSN to another. Some advantage might also be gained if, for example, the GSN stations used by various RSNs were in the CMSM. Other candidates might include USGS temporary networks operated under the GS FDSN network code or, if needed, USArray/TA stations deployed within RSN footprints.

Question 4. Would the Station Information System (SIS) fulfill the need as a CMSM?

Answer: SIS in its current form will not meet all of the needs of ANSS networks, but it does represent a framework that we believe could be developed to meet current needs and to evolve as new technologies and requirements develop.

Question 4 Comments and Recommendations

The WG recommends that further development of SIS should incorporate this WG’s modifications of the SIS Functional Specification (Appendix 1). This revised Functional Specification captures many of the thoughts of the WG, but is not exhaustive. Further development of SIS should be guided by a Technical Implementation Committee (TIC) using the revised Functional Spec as a starting point, with particular attention to the following priorities:

  1. RSN-specific usage should be supported, including RSN-specific views of data and appropriate control over read/write permissions for a particular RSN’s data.
  2. The initial population of the SIS database needs to be as automated as possible. Users should be able to import and export metadata in standard formats, including dataless SEED, StationXML, and CSS3.0. Information not covered by these standard formats should be uploaded through spreadsheets.
  3. Error and consistency checking of the metadata, in particular of the instrument response, should be done upon entry and afterwards. Users should have the option to ignore warning messages.
  4. SIS should allow for certain types of equipment to be shared between stations, such as Earthworm or KMI Granite digitizers.
  5. All station information and configurations need to have epochs, i.e., they must be trackable, searchable, and exportable for a given time period or for a range of time periods.
  6. SIS should provide RSNs with a high level of control of their data, including direct access to the database schema. SIS should provide an interface for direct read access to the database, so that RSNs’ automated systems do not have to complete an export/import cycle for up-to-date information. RSNs should have the ability to maintain “private” information, such as site owner contact information, that is “off-limits” for users from other RSNs.
  7. The interface should be usable by the technicians who maintain the stations. Powerful search capabilities for the station logs should be available. Ideally, SIS would tie into a ticketing system to prioritize station maintenance work.
  8. When station information is being updated in SIS, SIS queries should provide a consistent view of station information. Users should get either an initial or final view of station information, not intermediate states.
  9. Because of the need to interface properly with the strong motion community, a new representation of offsets (N, E, vertical, azimuth), or a more general representation of an offset in an orthogonal basis, is needed. There also needs to be either the ability to compute the standard location (latitude, longitude, elevation, orientation) on the fly, or a robust procedure that automatically updates the derived latitude, longitude, elevation, and orientation of a channel whenever the reference site’s location is changed. In addition, other metadata such as geotechnical properties (e.g., Vs30) and COSMOS site codes must be incorporated.
  10. Documentation should be excellent and readily available, and should include technical descriptions of the software and schema, examples, and tutorials.
  11. The WG strongly recommends that scope creep be avoided so that a viable product is available in a reasonable amount of time. Additional proposed features should be vetted for consumer demand, reasonableness, implementation effort, and priority.
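
As an illustration of priority 9, the derived location of a channel could be computed on the fly from the reference site's location and the channel's offsets. The following is a minimal sketch only and is not part of SIS: the function name, the argument names, and the spherical small-offset approximation are assumptions made for this example. A production system would more likely use a geodetic library and the WGS84 ellipsoid.

```python
import math

# Mean Earth radius in meters; a spherical approximation chosen for this sketch.
EARTH_RADIUS_M = 6371000.0


def derive_channel_location(site_lat, site_lon, site_elev_m,
                            north_m, east_m, up_m):
    """Return (latitude, longitude, elevation) of a sensor that is offset
    from a reference site by north/east/vertical distances in meters.

    Uses a local flat-earth approximation, which is reasonable for offsets
    of up to a few kilometers and is not valid near the poles.
    """
    # A northward distance maps directly to a change in latitude.
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    # An eastward distance spans more longitude at higher latitudes.
    dlon = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(site_lat)))
    )
    return site_lat + dlat, site_lon + dlon, site_elev_m + up_m
```

Computing the location on demand, as here, avoids storing a derived value that must be refreshed whenever the reference site is moved. The alternative named in priority 9 is to store the derived value and update it automatically.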

Question 5. Would SIS meet future metadata needs in areas such as documentation of uncertainties of Class C instrumentation?

Answer: Perhaps, with more development. However, this WG does not have enough information about Class C instrumentation and would like to forward this question to the Class C WG.

Question 6. Would SIS meet metadata needs in areas such as documentation of equipment information unrelated to instrument response (e.g., vendor type, radio configurations, firmware versions, etc.), site owner information, funding information, and relative coordinates in monitored structures?

Answer: Yes, probably.

Question 6 Comments and Recommendations

This would have to be referred to the Technical Implementation Committee to make sure that the needs of the RSNs are met. SIS currently appears to handle some of these metadata needs, but not others.

Unresolved Issues

It should be noted that this committee was not charged to examine or evaluate any metadata system other than SIS, and it has not done so. An alternative option is to develop an RFP and solicit budgetary proposals for developing a CMSM.

In addition, there are technical hurdles to be overcome: SIS will have to allow users to make complex transactions (e.g., multi-station digitizers) without the intervention of SIS developers, and without presenting intermediate states to users not involved in the transaction. SIS will need to be migrated to a full multi-RSN platform, which will require a significant level of effort.
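
The requirement that a complex change never expose intermediate states is what a database transaction provides. The sketch below is illustrative only and does not describe SIS: it uses Python's standard `sqlite3` module as a stand-in for whatever RDBMS is chosen, and the table `digitizer_station`, its columns, and the function names are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical table that records
    which stations a (multi-station) digitizer serves."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE digitizer_station ("
        " serial TEXT NOT NULL,"
        " station TEXT NOT NULL)"
    )
    return conn


def reassign_digitizer(conn, serial, stations):
    """Replace the full set of stations served by one digitizer.

    The delete and the inserts run in a single transaction: the connection
    context manager commits if the block succeeds and rolls back if any
    statement raises, so the old assignment is never partially removed.
    """
    with conn:
        conn.execute(
            "DELETE FROM digitizer_station WHERE serial = ?", (serial,)
        )
        conn.executemany(
            "INSERT INTO digitizer_station (serial, station) VALUES (?, ?)",
            [(serial, station) for station in stations],
        )


def stations_for(conn, serial):
    """Return the stations currently assigned to a digitizer, sorted."""
    rows = conn.execute(
        "SELECT station FROM digitizer_station"
        " WHERE serial = ? ORDER BY station",
        (serial,),
    )
    return [row[0] for row in rows]
```

If one of the inserts fails, for example because a station code is missing, the earlier delete is rolled back as well and the previous assignment remains in place. Users reading through other connections would see either the old or the new assignment, which is the behavior asked for in priority 8 above.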

A CMSM by its very nature will be reasonably complex, yet it needs to be kept as simple as possible in order to be usable; this balance must be carefully kept lest either consideration take over and destroy the whole concept. A desire has been expressed for the CMSM to have a comprehensive tracking system, with a full suite of inventory tools, but all desired features will have to be balanced against the need to keep the CMSM from becoming overly complex in design and usage. Segregating RSNs within the CMSM to give each RSN complete control over its own metadata will have to be done in such a way that inter-RSN queries are still easy to do. If the CMSM is not properly developed, it could force RSNs to maintain two sets of metadata: one in the CMSM and one locally. The complexity associated with the design, implementation, and operation of a shared metadata facility may severely limit its functionality and therefore impact the viability of a shared ANSS metadata system.