A High-Level Overview of SIF

December 3rd, 2011

Overview

In any educational organization, many computer applications are used to support the process of educating students, in one way or another. The majority of these applications know who attends the school and who teaches there. Some of them know the students’ schedules, while others know details such as the grades they received or health-related information such as immunizations or records of injuries. The point is that much of this information is common to many of them.

So, what is the “best” approach to managing the information that is shared by these applications?

  • Some say it is best to have a single, centralized database containing all the information and one large application that does everything. (This is referred to as the “integrated approach”.)  This has its efficiencies, but the application must be:
    • able to do everything and do everything well
    • able to scale well, because all school users will be using it regularly
  • Others say that it is best to use individual applications from those companies with expertise in the application subject areas (and find a way to resolve the data issues). This is referred to as the “best of breed” approach.

Most of the time, what ends up being adopted is either a “best of breed” or a mixture of the two: an organization starts off wanting to get an integrated system, but then realizes that some other systems are needed as well.

Regardless of what’s in place, these organizations will have multiple systems that will, by necessity, contain the same data. The big question is: how does that data get into those systems and who keeps it up to date?

How can this problem be managed?

Before we introduce SIF, we’ll speak to three other alternatives that have been used in the past and are being used to address this issue; one by default (manual data entry), another because it was a simple technology that was available (exports and imports), and a third that is being experimented with in other industries (enterprise service buses). These are only being presented here to give some perspective.

Manual data entry

The most obvious method (and what gets done by default) is to enter data manually into each system using the application’s own user interface. Besides being the most obvious, this is also the most expensive and the most error-prone.

We had a customer district once follow some sample stacks of paperwork around their district to see how much time was spent entering data from those pieces of paper into various systems and how much of that data entry was redundant. With a student population of about 7,000 students, they estimated that they spent about two staff positions worth of time in a given year simply doing the redundant part of the data entry.

Exports and Imports

This method of synchronizing data works by having the provider application create an “export” file on a regular basis for each type of data (student, teacher, parent, school, enrollment, etc.) and store it in a pre-determined location. Then, each of the “receiving” applications looks for files in that location, opens them and reads the information. If they see any changes in the files, they refresh their databases with the new information.

A setup like this can be built by an IT department if all the applications have appropriate interfaces.  This assumes that:

  • there is an “authority” application for each type of data (the SIS would normally be the authority for student data, an HR application might be the authority for teacher data, etc.)
  • those applications have the ability to create “extract” files (typically in CSV (Comma-Separated-Values) format) – the design (and maintenance of that design) of these files is often the responsibility of the IT staff member
  • those applications can generate these extract files on a regular basis
  • the other applications that need the data have matching interfaces that can “import” these extract files on a scheduled basis
  • the applications that import the extract files do reasonable things when they encounter errors in the source data
  • all the clocks on all these servers stay synchronized so that the “imports” always follow the “exports”

Many organizations use this, but it is costly and error-prone. Costly because whether the IT staff is managing the CSV import/export process or the software suppliers are, the end user ends up paying the cost.  There are no standards for the layouts of these files and most application suppliers typically spend a considerable effort maintaining a library of CSV-interfaces.

These interfaces are error-prone because the CSV format doesn’t have the capability to recover from errors like other transmission methods do. Typically, if there is an error in a field, the entire record is rejected – or perhaps even the remainder of the file.
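To make the all-or-nothing behaviour concrete, here is a minimal sketch of what a receiving application's import step might look like. The column names and the sample rows are our own assumptions for illustration; real extract layouts differ from supplier to supplier, which is exactly the problem.

```python
import csv
import io

# Hypothetical extract layout; there is no standard, so every pair of
# applications has to agree on something like this.
EXPECTED_COLUMNS = ["student_id", "last_name", "first_name", "birth_date"]


def import_extract(text):
    """Read a CSV extract and return (accepted_rows, rejected_rows).

    A row is rejected as a whole when any field is missing, empty, or when
    the row has extra fields. A changed header rejects the entire file.
    """
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    # Line numbers assume one record per physical line (no embedded newlines).
    for line_number, row in enumerate(reader, start=2):
        values = [row.get(col) for col in EXPECTED_COLUMNS]
        has_gap = any(v is None or v.strip() == "" for v in values)
        has_extra = None in row  # DictReader stores surplus fields under None
        if has_gap or has_extra:
            rejected.append((line_number, row))
        else:
            accepted.append(row)
    return accepted, rejected


SAMPLE = (
    "student_id,last_name,first_name,birth_date\n"
    "1001,Ng,Amy,2001-04-12\n"
    "1002,Ortiz,,2000-11-30\n"
    "1003,Patel,Ravi,2001-01-05\n"
)
accepted, rejected = import_extract(SAMPLE)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Note that the sending application never learns that the second student was rejected; someone has to read a log on the receiving side and fix the record by hand.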

Enterprise Service Bus

There appear to be many definitions of this term (we will not include the airport car rental company shuttles in this discussion), but most of them center around the offerings of particular companies, such as Microsoft, Oracle or IBM. They allow connections from different operating system types, they support web services interfaces and use XML and SOAP to transfer data.

Because of the way they are structured, they handle errors better than the older CSV import and export, and because of their design, information is transferred right away instead of in batches. This means that when a student enrolls, his or her information is immediately sent to all connected systems and is available in a few seconds, not in the next day or two.

The problem with ESB is that it is a technology – it defines the transportation mechanism through which data can pass. It is a very good and well thought-out one, but it doesn’t do anything for making life simpler or more cost-effective for a school organization. Those who develop around it will still need to re-do much of the work that all of the companies participating in SIF have done for the last 10 years. This is why these projects tend to be enormously expensive.

Then came SIF…

The biggest difference between SIF and these other ways of solving this problem is that it is an industry standard. It wasn’t imposed by a government or invented by a single company, but was created by industry experts who amazingly agreed on a number of key concepts, including:

  • What things needed to be shared between applications (these became SIF objects)
  • What characteristics those things had (these became the elements and attributes in those objects)
  • How those things would be passed between applications (this became the definition of the Zone Integration Server):
    • Automatically sending data as soon as it changes – the Publish/Subscribe model
    • Allowing applications to ask for data when they need it – the Request/Response model
  • How those things would be protected:
    • Encryption, HTTPS, Certificates…
    • Object-level protection – determining who has the right to see which objects
    • Element-level protection – determining who has the right to see which data elements in those objects

This didn’t happen overnight; this has been an ongoing process since the late 1990s and has involved hundreds of companies and thousands of school organizations (people who use and support these applications every day). It has grown to support three regional specifications in the US, UK and Australia and is currently being used in tens of thousands of schools worldwide.

SIF is standards-based; it is fast and it is secure. It can economically meet the needs of a very small installation, yet scale to meet the needs of a large state with millions of students.

How Does SIF Work?

In a SIF environment, applications share data through web connections that send and receive XML language messages in a standardized format. All of the applications communicating connect to a common hub (called a Zone Integration Server (ZIS)) that is responsible for making sure that all messages are delivered properly. So, at the highest level, the collection of systems looks like the picture to the right.

The Zone Integration Server is software that runs either at the school organization or at a hosted location.

For all of these applications to be able to communicate properly, they must “use a common language” as follows:

  • The XML messages must have the same format
  • If there are encoded fields (such as “Telephone Number Type” or “Citizenship Status”) in the data, all parties must agree to use a common set of values for these codes when the data is being transferred (they may use whatever they want in their own applications, but as the data is being transferred it must be in this standardized form).
  • All parties must agree to the same method for exchanging these messages (“the protocol”)

The Zone Integration Server (ZIS)

The ZIS is a piece of software that runs as a web site and routes messages between applications. It does not hold permanent copies of school information; the only time it keeps much of anything is when one of the receiving applications is offline or slow, in which case it temporarily holds messages until the receiving application drains its message queue. (Another exception is our ZIServer product: we store a complete audit trail of all messages that pass through the ZIS, maintained only for auditing purposes.)

A ZIS can either be hosted and shared between many organizations or can be installed locally at a school or school district.

Besides routing messages, it is also responsible for maintaining the infrastructure’s security.

  • Before an application can connect to a SIF infrastructure, the ZIS administrator must “create” a connection for it and authorize the application for certain types of operations. For example, an administrator may create a connection for a library application and authorize it to receive student and teacher demographic information, and student enrollment information, but not financial information.
  • Before one location (perhaps a school) connects to another location (perhaps a hosting provider), typically digital certificates are exchanged, so that each party can be assured that the connection is safe.
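The routing, queuing and authorization duties described above can be sketched in a few lines. This is a toy, in-memory illustration under our own assumptions (the class name, method names and the "Financial" object type are invented for the example); it is not how ZIServer or any real ZIS is implemented, and it leaves out HTTPS, certificates and the SIF message format entirely.

```python
from collections import defaultdict, deque


class ToyZIS:
    """In-memory stand-in for a Zone Integration Server (illustration only)."""

    def __init__(self):
        self.permissions = {}                  # agent -> object types it may receive
        self.subscriptions = defaultdict(set)  # object type -> subscribed agents
        self.queues = defaultdict(deque)       # agent -> messages awaiting pickup

    def register(self, agent, allowed_objects):
        """Administrator step: create a connection and authorize object types."""
        self.permissions[agent] = set(allowed_objects)

    def subscribe(self, agent, object_type):
        """Refuse subscriptions the administrator has not authorized."""
        if object_type not in self.permissions.get(agent, set()):
            raise PermissionError(f"{agent} may not receive {object_type}")
        self.subscriptions[object_type].add(agent)

    def publish(self, object_type, payload):
        """Queue an event for every subscriber of this object type."""
        for agent in self.subscriptions[object_type]:
            self.queues[agent].append((object_type, payload))

    def drain(self, agent):
        """Hand an agent its pending messages, then forget them."""
        pending = list(self.queues[agent])
        self.queues[agent].clear()
        return pending


zis = ToyZIS()
zis.register("library", {"StudentPersonal"})
zis.subscribe("library", "StudentPersonal")
zis.publish("StudentPersonal", {"RefId": "A1", "Name": "Amy Ng"})
print(zis.drain("library"))
```

The library agent can be offline when the event is published; the message simply waits in its queue. An attempt to subscribe the library to "Financial" would raise `PermissionError`, because the administrator never authorized it.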

Visual Software’s ZIS is ZIServer™. You can find more information about it here.

Applications and Agents

When SIF was first envisioned, it was assumed that most applications would be adapted to work with SIF and that they wouldn’t be built from the ground up knowing how to directly communicate with a Zone Integration Server.  These adapters are called “SIF Agents”.

When SIF was in its infancy, several companies set out to address the need to build SIF Agents. We at Visual Software went in one direction; everyone else went in the other. The others began with the assumption that these agents were going to be different enough that each of them should be built mostly from the ground up (with a little help from an ADK, a programmer’s library).

We at Visual Software, however, had spent the previous eight years refining techniques and building technology allowing “new services to be attached to existing technologies without disrupting them.”  So, instead of starting fresh, we used these principles and this code base to create the first version of our configurable SIF agent, ZIAgent, which provides SIF services to existing applications without requiring programming and without requiring modifications to the target application.

In years since, we’ve continued to refine this model, making it more and more efficient, flexible, reliable and scalable. Moreover, we’ve made the agent do “the difficult stuff” by default; the things many leave out; the things that separate the good from the excellent agents. To learn more about our configurable agent, see ZIAgent.

How Software Suppliers See SIF

Almost universally, software suppliers would prefer to spend as much effort as possible focused on their own application, but instead find themselves getting sidetracked with multiple interfaces to other applications. They need to stay aware of when the other applications change, then change their interfaces to match the changing designs of the other applications. This is an expensive process that takes away from their ability to address issues within their own product.

If it isn’t a waste of their time, then they’re charging a significant price for these interfaces, because they take significant work to maintain.

What SIF allows them to do is to create a single interface that can be used to connect to many other applications. Unlike Import/Export interfaces (which most of the existing ones are), the SIF interface can handle errors and properly recover from them, can protect private information at levels that meet or exceed government standards and can even simplify the entire process for their customers (if done properly).

How School Organizations See SIF

Schools need to solve a number of data-related problems:

  • Student, teacher and contact information appears in many different applications
  • If there isn’t an automatic update feed between them, the data becomes very inaccurate
  • If the time-to-update is long, there will be certain operational inefficiencies (for example, day-old attendance data is not useful in the cafeteria/canteen)

SIF is a good way to solve those problems because it has good error handling characteristics, it scales well and data is passed between applications as soon as the data event happens.

We have found in many installations that once the system has been set up, it doesn’t need much care beyond normal backups.

Because it is a standard, a school organization can look to an application supplier to provide such an interface without that request being considered unusual, nor seen by the vendor as something useful for only one customer.

Is it Perfect? (time to be honest…)

Well, like anything human-made, SIF has its issues. From our experience installing and supporting many SIF agents over the past ten years, we have found that if all participants use the specification the way it was intended to be used, then things work remarkably well and the users and suppliers are very pleased with the results.

Issues arise, however, whenever one party or another begins to compromise the specification. The certification process catches many of these errors, but it isn’t perfect either.

A good rule for evaluation is if you ever hear anyone saying that a particular agent only works when paired with another agent or when paired with a specific Zone Integration Server, then even if it is SIF certified, don’t even consider it to be a SIF agent. This means that they have compromised the SIF specification and found a way to fool the certification test harness.

Would you like to know more?

Here are some other links that might be of interest:

  • Basic SIF Concepts - this page contains links to several articles about basic SIF-related topics
  • Planning a SIF Implementation - this page contains links to several articles about planning a SIF implementation
  • Contact Us - this page contains information about how to contact us for more information or to schedule a demonstration

 


Supporting SIF 1.5 and 2.* in the Same Zone

August 30th, 2011

We are often asked by existing and prospective customers whether our ZIServer product supports both SIF 1.5 and 2.0 agents running in the same zone at the same time.  They say they’ve been told that others do it and “everything works just fine”. The problem is that while it may appear to work, it can’t work reliably. Here’s why.

Element Position Changes

Between the two versions, some of the locations of elements inside SIF objects were changed – they moved.  This is something that could possibly be handled dynamically by the ZIS.  It could read each message, decode the message in the original format and re-construct it in the SIF version of the agent receiving the message. There is nothing preventing a ZIS from reliably doing this part.

Data Format Changes

The format of most dates changed between these two versions of the SIF specification as well.  In SIF 1.5, a date might appear like “20100603” in “YYYYMMDD” format. The same date in SIF 2.0r1 would appear as “2010-06-03”. As with the reconstruction of a message, the ZIS could also change the format of the dates in the messages.
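This is the kind of translation that is purely mechanical. A minimal sketch, assuming the 1.5 value is always eight digits in YYYYMMDD order (the function name is ours, not part of any SIF toolkit):

```python
from datetime import datetime


def sif15_date_to_sif20(value):
    """Convert a 'YYYYMMDD' date string to 'YYYY-MM-DD'.

    Raises ValueError if the input is not exactly eight digits or is not a
    real calendar date.
    """
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"not a YYYYMMDD date: {value!r}")
    parsed = datetime.strptime(value, "%Y%m%d")  # rejects e.g. month 13
    return parsed.strftime("%Y-%m-%d")


print(sif15_date_to_sif20("20100603"))
```

No information is lost or invented here, which is why this part of the problem is solvable.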

Header Changes

One of the basic changes would need to be in the header, the part of the message that identifies the version of the message and carries other control information. Since the ZIS has already changed the rest of the message, it could also change the header.

Code Mappings

This is where the unsolvable problem exists. To illustrate this problem, we will look at one of the simplest of objects, StudentPersonal, and some of the elements contained in it. At a high level, the problems with code values are as follows:

  • Some code values were deprecated
  • Some encoded elements were split into multiple encoded elements
  • Some deprecated values had meanings in newly introduced code sets

Example: Deprecated Code Values

Please put yourself in the position of the ZIS, trying to translate a message from 1.5 to 2.0 and choosing a new 2.0 value for the existing 1.5 value it sees in a message.

The first example is a StudentPersonal record where multiple addresses may be stored. Each address has an address type and the following table shows the acceptable values for the two SIF versions:

US SIF 1.5                    US SIF 2.0
AC  City and State            123   Mailing address
CC  Country                   124   Shipping address
CI  City                      765   Physical location address
CY  County or Parish          1073  Other home address
DR  District of Residence     1074  Employer’s address
F   Current Address           1075  Employment address
H   Home Address              2382  Other organization address
L   Local Address
M   Mailing Address
O   Office Address
P   Permanent Address
PT  3 Digit Canadian Postal Code
PU  6 Digit Canadian Postal Code
SB  Suburban
SD  School District
SH  School Campus Code
SP  State/Province
SS  School
TN  Township
UR  Urban
ZZ  Mutually Defined
1   Permanent home address–physical location of home
2   Other home address
3   Mailing address–other address or P.O. Box address
4   Campus address
5   Employer’s address
6   Employment address
7   Organization’s address

Some of these values can be mapped easily, but for others, there is no equivalent mapping. This is but one of many cases where there is no equivalent value in the 2.0 spec for a code value in the 1.5 spec.
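To see the gap from the ZIS's point of view, consider a lookup table built only from the rows whose descriptions match word for word in the two columns above. This crosswalk is our own illustration, not an official SIF mapping, and the function names are hypothetical.

```python
# Illustrative only: 1.5 address type -> 2.0 address type, limited to codes
# whose descriptions are the same in both versions of the table.
ADDRESS_TYPE_15_TO_20 = {
    "M": "123",   # Mailing Address    -> Mailing address
    "2": "1073",  # Other home address -> Other home address
    "5": "1074",  # Employer's address -> Employer's address
    "6": "1075",  # Employment address -> Employment address
}


def translate_address_type(code_15):
    """Return the 2.0 code, or None when there is no equivalent."""
    return ADDRESS_TYPE_15_TO_20.get(code_15)


def untranslatable(codes_15):
    """List the 1.5 codes a translator would have to guess at or drop."""
    return sorted(c for c in codes_15 if translate_address_type(c) is None)


print(untranslatable(["M", "TN", "SD", "2", "UR"]))
```

Whatever a translator does with "TN" (Township) or "SD" (School District), it is either inventing a value or discarding one.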

Example: Split Data Element

In the SIF 1.5 specification, there was a field called “Ethnicity” which could have values defined as follows:

SIF 1.5
7 Not Provided
A Asian or Pacific Islander
B Black
C Caucasian
D Subcontinent Asian American
E Other Race or Ethnicity
F Asian Pacific American
G Native American
H Hispanic
I American Indian or Alaskan Native
J Native Hawaiian
N Black (Non-Hispanic) – A person having origins in any of the black racial groups of Africa who is not of Mexican, Puerto Rican, Cuban, or South or Central American origin or of any other Spanish culture or origin regardless of race
O White (Non-Hispanic) – A person having origins in any of the original peoples of Europe, North Africa, or the Middle East who is not of Mexican, Puerto Rican, Cuban, or South or Central American origin or of any other Spanish culture or origin regardless of race
P Pacific Islander
Z Mutually Defined

Now, in SIF 2.0r1, this single element was (for good reasons) split into two separate elements:

  • Race - this has several coded values
  • HispanicLatino - this is a Yes/No field indicating Hispanic origins

The new codes acceptable for the Race element are:

0998 American Indian or Alaska Native
0999 Asian
1000 Black or African American
1001 Native Hawaiian or Other Pacific Islander
1002 White

So, for the ZIS to translate a 1.5 message with an Ethnicity data element equal to, for example, “H” (Hispanic), it would know how to set the HispanicLatino element (for some other values it wouldn’t), but it would have no clue as to how to set the Race element.

Having any sort of automated procedure for this translation would yield incorrect data. The only way to provide correct data would be to have a SIF 2.0 agent that accesses the correct raw data in the SIS database and loads it properly into the correct 2.0 fields. Anything else is alchemy.
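The split can be illustrated with a small lookup in which `None` marks a value that cannot be derived from the 1.5 code alone. The Race codes come from the list above; the choice of which 1.5 codes to include, and how to read them, is our own assumption for the sake of the example.

```python
from typing import NamedTuple, Optional


class RaceAndOrigin(NamedTuple):
    race: Optional[str]               # 2.0 Race code, None if not derivable
    hispanic_latino: Optional[bool]   # None if not derivable


# Illustrative reading of a few 1.5 Ethnicity codes.
ETHNICITY_15 = {
    "H": RaceAndOrigin(None, True),     # Hispanic: origin known, race unknown
    "N": RaceAndOrigin("1000", False),  # Black (Non-Hispanic)
    "O": RaceAndOrigin("1002", False),  # White (Non-Hispanic)
    "B": RaceAndOrigin("1000", None),   # Black: origin unknown
    "A": RaceAndOrigin(None, None),     # Asian or Pacific Islander: 0999 or 1001?
}


def translate_ethnicity(code_15):
    """Return what can be known about Race and HispanicLatino from one code."""
    return ETHNICITY_15.get(code_15, RaceAndOrigin(None, None))


def is_complete(result):
    """True only when both 2.0 elements can be filled in without guessing."""
    return result.race is not None and result.hispanic_latino is not None


for code in ("H", "N", "B", "A"):
    print(code, translate_ethnicity(code), is_complete(translate_ethnicity(code)))
```

Only the codes that already state both facts (such as “N” and “O”) translate completely. For everything else, at least one of the two 2.0 elements would have to be guessed.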

This is a start – We will build more on this article in the coming weeks.


SIF Reliability and Privacy

July 28th, 2011

Implementing SIF – Reliability and Privacy

When building a software product, there are at least two approaches (from a high level, at least) to testing it.  A while back, I learned that the way we do this at our company is a bit out of the ordinary, although I never knew that we were that different. Let me illustrate…

The typical approach to building a software product is to design it, write it, and then run it to see if the results match the results you expected when you designed it. In testing it, you use test data that someone else created to the best of his or her ability.

Our approach, on the other hand, has been to design the software, write it, and then run the software in “single step mode” in the development environment and prove that it works correctly each step of the way, using test data specifically designed to find every possible type of error by those writing the code (they know every junction in the code). It takes far longer to do this the first time, but in the long run, we end up spending far less time than the “typical” approach mentioned above.

We feel that the best way to test code is to prove that it works instead of testing to see if it doesn’t work and then going back and patching it up where needed.

prove that it works correctly
vs
fix it when it doesn’t work properly

In the typical approach, the error is detected by looking at the results, noticing they do not meet expectations, then working backwards to find the source of the problem.

An example

This situation is illustrated in the following example where the source of the problem was only found by examining “post-process 6” results.

Figure 1 - Finding the error

When we identify the problem in Process 3, we might be tempted to do whatever was needed to fix it there. While it might fix the issue found in this particular test, we may overlook the subtle problem in Process 2 that could emerge at some other time.

Using the “prove that it works” method, you never make either of these errors in the first place. While still imperfect, this method ensures that when development is complete the code is in sync with the programmer’s understanding of the requirements. This method cannot completely eliminate human factors such as:

  1. The programmer didn’t understand the specification/requirements;
  2. Deficiencies in tools on which we depend;
  3. The specification/requirements were insufficient or inaccurate.

Despite these imperfections, this method consistently reduces code errors and provides a framework for the programmer to verify that his or her understanding of the specification/requirements is reflected in his or her code.

What does this have to do with SIF?

You might be reading this wondering how this could possibly relate to a SIF implementation. Take for example, a typical US SIF implementation (we will NOT be recommending something like this):

  1. Install a Zone Integration Server and configure it
  2. Install HTTPS certificates as needed
  3. Install a SIF agent for the MIS (or SIS, as it is called in the US)
  4. Install an agent for a subscriber (let’s say a VLE)
  5. Run the conversion utilities in the VLE and see if all the data makes it there as it should

In this example, you would discover problems with your SIF implementation if some teachers, parents or learners never made it into the VLE.

Figure 2 - Similar errors in a SIF Environment

If there were a problem, you would never quite know where the problem originated without significant work (was there a problem with the MIS, with the MIS SIF agent, with the ZIS, with the VLE SIF agent or with the VLE?). Furthermore, you may have purchased your MIS from one supplier, your ZIS from another and your VLE from still another. You could also have purchased the SIF agents from someone other than the MIS or VLE suppliers.

What we will be suggesting here is something analogous to our software design approach, where the implementation is built incrementally and is proven to work properly at each juncture so that at the end of the process there is a much reduced risk of having something happening like the above. What we are suggesting is to not use your subscribing applications to “debug” your SIF implementation – there is a better way.

Things to Consider Beforehand

Hearing about SIF for the first time is exciting, especially if you’ve endured the headaches of export and import, rejected batches, resubmission, and all that goes with it. Before we get into connecting applications and exchanging data, there are two commonly overlooked areas that should be considered: data quality and data privacy.  Properly addressing these two issues before data starts moving is critical to any successful SIF implementation.

Data Quality

Data Quality is an issue that will likely catch many by surprise. In years past, maintaining “less than clean” data in a school’s MIS/SIS had far less impact than it will when that system’s data becomes the authority for most of the applications in the school, the local authority (district), the RBC (state) and eventually as the basis for reporting to federal authorities.

Before SIF, if some words were typed into a “phone number” field instead of a real phone number (for example, the word “UNLISTED” was typed where the number should go), that might be acceptable because everyone who used the MIS knew what it meant. But when the MIS is used as the supplier of data for everyone who wants learner data, only the number itself will be acceptable in the phone number field.

Why then could SIF be implemented in the US as described earlier? Rejected messages were never an issue in the US, because the US SIF schema (the mechanism used to programmatically check the correctness of data) validates very few fields where the UK schema validates many. In the US, almost any value is passed through by the ZIS with no complaints where in the UK, the ZIS ends up rejecting a much higher percentage of messages. Is one good and one bad? They’re different. The important thing is that the principles that work well in one environment may not work so well in the other.

Types of Errors

Errors can be introduced in the source data; for the sake of the remainder of this document, I’ll classify them as follows:

  • Type 1: The difference between the validation that your MIS system performs on input and that which the ZIS does when it considers a message as “acceptable” for transmission to another application. For example, an MIS may allow a user to type anything for a teacher’s National Insurance Number (like a Social Security Number), but if it isn’t exactly in the form <2 letters followed by 6 numbers followed by 1 letter>, the WorkforcePersonal record will not be published.
  • Type 2: This is where a data element is optional in a MIS, is not typically entered, but is required by a subscribing application for it to work properly. For example, a VLE subscribes to a LearnerSchoolEnrolment object because it needs to know about the level at which a student speaks English. Specifically, what it needs from this object is an optional element named “EnglishProficiency”. If delivered, the VLE works as it should. The gap between what the MIS publishes and what the application needs represents data entry policy that needs to be changed and/or data that needs to be cleaned up at the school.
  • Type 3: Elements that the MIS does not support at all, the SIF specification specifies as optional (hence the ZIS allows messages to pass without them present), but the subscribing application may not work properly without them present. This happens more than you might expect. For example, when an MIS publishes SchoolGroup (similar to SectionInfo) objects, the timetable information is optional. There are many subscribing applications that may not work properly if SchoolGroup objects are supplied without timetable information.
  • Type 4: Data that meets or doesn’t meet local standards – these may be standards defined to reflect locally or regionally defined policies. Examples may include: every student must have a birth date that is reasonable for his or her NCYearGroup  (similar to a GradeLevel); if a student is registered in “English as a Second Language” as a course, then he or she shouldn’t have English specified as his or her primary language. We’ll also throw into this category other random but common data entry errors.
  • Type 5: Data that may or may not meet national census (or state reporting) standards. Schools and local authorities spend enormous time and effort annually preparing data for these reports and in most places, spend too much time fixing the same errors over and over. If an error can be fixed at the source with the same level of effort (or less) than fixing it “the old way”, then the savings will start to multiply since most of the same data is repeated from reporting cycle to cycle.

Figure 3 - Types of errors in a SIF Environment
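Type 1 and Type 2 errors lend themselves to simple checks that can be run against MIS data before anything is published. The sketch below is our own illustration: the pattern follows the form given above (2 letters, 6 numbers, 1 letter), and it accepts either letter case, which a real schema may not. The function names and the sample records are hypothetical.

```python
import re

# Type 1: two letters, six digits, one letter.
NI_NUMBER = re.compile(r"[A-Za-z]{2}[0-9]{6}[A-Za-z]")


def is_valid_ni_number(value):
    """True when the whole value has the expected National Insurance shape."""
    return NI_NUMBER.fullmatch(value) is not None


def missing_elements(record, needed):
    """Type 2: list the elements a subscriber needs that are absent or empty."""
    return [name for name in needed if not record.get(name)]


print(is_valid_ni_number("AB123456C"))
print(is_valid_ni_number("UNLISTED"))

enrolment = {"LearnerPersonalRefId": "A1", "EnglishProficiency": ""}
print(missing_elements(enrolment, ["EnglishProficiency"]))
```

Running checks like these at the source tells a school which records would be rejected, or would leave a subscriber short of data, before the first message is sent.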

Data Pollution

With SIF in place, the MIS gains new authority – it officially becomes the new “digital authority” for learner, contact and teacher information at the school. Before SIF it was only responsible for its own data (unless it generated extract files that were sent elsewhere). What this means is that as soon as an error is entered in the MIS it will propagate throughout the school, and sometimes even overwrite correct information in other applications. I like to refer to this as “data pollution”.

Errors vs. Unmet Expectations

Figure 4 - Errors vs. Unmet Expectations

Unmet expectations shouldn’t really be considered an error, except perhaps that if a project goes to production with them still unmet, it would be an error in pre-planning.  I’ve seen more than my share of implementations where managers insisted that a subscribing agent should somehow be able to “invent” data for students that doesn’t exist anywhere and that isn’t published by the MIS/SIS.

If the only expectation of the entire project depends on a subscribing application being able to do something based on information that your MIS doesn’t even store (and never will), then perhaps the project is more than simply buying a ZIS and a few SIF agents. Something will need to change in the MIS, so that the data can be stored there first.

Data Privacy

Data privacy is one area where there are many legal differences between the US and the UK that must be understood before starting a SIF implementation. This is important because the SIF specification was originally designed by people with a good understanding of US laws and customs and sometimes these designs bump up against UK requirements.

For example, in the US, the predominant authority for privacy of student data is the Family Educational Rights and Privacy Act (FERPA). FERPA doesn’t really deal with passing of information between applications in the same school or between schools in the same community – it mostly addresses a person’s right to know where his or her data has been and who has the ability to access it. In the UK, the laws govern who has access to it and how it is controlled.

In the UK, schools are required to operate under the requirements of the Data Protection Act (DPA) of 1998, the Education Act of 1996 and others which have specific guidelines that govern the transmission of what the DPA refers to as “Educational Records” (SCHEDULE 11, Section 68 (1)(6)), between departments in schools as well as between applications in the same school. For more information, see Data Protection Good Practice Note.

Given these differences, SIF was designed without “fine-tuned security” when connecting applications horizontally, because in the US, it was never an issue. The MIS publishes the information to all who are listening – it is up to the subscriber to take what it wants and ignore what is not appropriate. The design relies on trusting the subscribers not to do “bad things” with sensitive information that has been passed to them that they didn’t need.

Resolving the Differences

There are really two issues: the existing SIF objects that were designed for the UK using the US “way of thinking” and “what to do going forward”. I believe both have answers; the first took some development and the second may take some changes in the way we design new objects in the future. I’ll address the second one first…

Going Forward

This is strictly opinion, but I believe that people should consider data sensitivity when designing new objects. Designers should differentiate sensitive data from non-sensitive data by splitting it into separate objects. In a way, this was done with the LearnerSpecialNeeds object – most, if not all, of the information in this object is sensitive, so SIF object-level permissions should be sufficient to safeguard access to this information.

For example, if the DPA had been considered in the original design of LearnerPersonal, it might have been split into two objects: one containing basic student information and a second containing information covered by the DPA, both containing the same LearnerPersonalRefId.
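As a rough illustration of that split, the sketch below divides one flat learner record into a basic object and a sensitive object that share the same LearnerPersonalRefId. The field names and the set of "sensitive" fields are hypothetical and chosen only for illustration; they are not taken from the SIF UK specification or from the DPA.

```python
# Hypothetical field names, for illustration only; not the SIF UK schema.
SENSITIVE_FIELDS = {"MedicalNotes", "Ethnicity"}

def split_learner(record):
    """Split one flat learner record into a basic and a sensitive object.

    Both objects carry the same LearnerPersonalRefId, so a subscriber that
    is permitted to see both can join them again.
    """
    ref_id = record["LearnerPersonalRefId"]
    basic = {"LearnerPersonalRefId": ref_id}
    sensitive = {"LearnerPersonalRefId": ref_id}
    for field, value in record.items():
        if field == "LearnerPersonalRefId":
            continue
        target = sensitive if field in SENSITIVE_FIELDS else basic
        target[field] = value
    return basic, sensitive

learner = {
    "LearnerPersonalRefId": "A1B2C3",
    "FamilyName": "Smith",
    "GivenName": "Jo",
    "MedicalNotes": "Asthma",
}
basic, sensitive = split_learner(learner)
```

With objects split this way, ordinary object-level permissions would be enough: an application that has no need for medical information simply never subscribes to the sensitive object.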

Working with Existing Objects

To handle the “mixed objects”, we’ve added a feature called “PrivacyPlus™” to two of our products: ZIServer™ and Envoy™. PrivacyPlus essentially puts a “strainer” in front of the “fire hose” or, more technically, allows an administrator to set up a series of filters that determine exactly which information is allowed to go where.

For ZIServer (the Zone Integration Server), the PrivacyPlus feature can restrict data from being sent to a particular application.

[Screenshot: ZIServer Configuration Page]

When the subscribing agent is set up in the ZIS, the administrator can choose which elements in a message would be filtered out before sending a message to this agent. This is useful for restricting information between applications within a school.

In this example, the administrator selects which elements (or group of elements) will not be sent to a subscribing application. When ZIServer receives a message for distribution from the provider, it will strip out this information before passing the information on to this application. Since there can be several different “versions” of the same message being distributed, each different version will be audited separately.
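The general idea of per-subscriber element filtering can be sketched in a few lines. This is not how PrivacyPlus is implemented; the agent names, element names and the FILTERS table below are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: element names withheld from each subscriber.
FILTERS = {
    "LibraryAgent": {"Medical", "Ethnicity"},
    "NurseAgent": set(),
}

def filter_message(xml_text, subscriber):
    """Return a copy of the message with the blocked elements stripped out.

    An unknown subscriber raises KeyError, so nothing is sent to an agent
    that has no filter entry at all.
    """
    blocked = FILTERS[subscriber]
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree is not changed while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in blocked:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

message = (
    "<LearnerPersonal RefId='A1B2C3'>"
    "<FamilyName>Smith</FamilyName>"
    "<Medical><Note>Asthma</Note></Medical>"
    "</LearnerPersonal>"
)
for_library = filter_message(message, "LibraryAgent")
for_nurse = filter_message(message, "NurseAgent")
```

Here the library agent receives the learner's name without the Medical element, while the nurse's agent receives the message unchanged, which is why each "version" of a message would need its own audit entry.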

For Envoy, PrivacyPlus provides blanket-level filtering for implementing policies concerning what information can be transmitted from one type of organization to another. For example, if a policy states that only the basic learner (no medical or sensitive personal) information can be transferred from a school to a local authority, a set of filters (similar to those in ZIServer) can be placed on the entire Local Authority Zone in Envoy. This would cause all LearnerPersonal objects published for subscribing applications at the Local Authority level to be missing medical and personally sensitive information.

Auditing

In the UK, there are typically two roles within each school (to paraphrase their document):

  • Senior Information Risk Owner (SIRO): This person is the policy maker (typically the head teacher) and is familiar with the information risks and procedures in place to enforce them. This person has appointed Information Asset Owners to enforce these policies.
  • Information Asset Owners (IAO): These are the owners of the different types of data: student personal, medically-related, special education, etc. They need to be able to “ensure that information handling both complies with legal requirements and is used to the full to support the delivery of education”.

Holding these responsibilities is daunting enough, but not having the tools needed to carry them out or to verify that things are “as they should be” is unsettling at best.

So, to address this, Visual Software has added tools to its products that allow people with different levels of responsibility to see audits for the applications over which they have responsibility.

  • Internal audit: this tool allows those responsible for the protection of data assets to make sure that the school’s data is only being distributed as it should be from settings in PrivacyPlus. It also gives them access to audits that have been keeping track of all traffic going through the Zone Integration Server.
  • External audit: Using the Veracity Data Integrity Manager, tests can be run to validate that no sensitive data has “leaked out” from any of the providers of information.
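To make the external audit idea concrete, here is a minimal sketch of a "leak check" that scans messages seen in a zone and reports any that still contain sensitive elements. It does not describe how Veracity works; the element names and the sample traffic are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names that should never appear in this zone.
SENSITIVE_TAGS = {"Medical", "Ethnicity"}

def find_leaks(messages):
    """Return (message index, element name) for every sensitive element found."""
    leaks = []
    for index, xml_text in enumerate(messages):
        root = ET.fromstring(xml_text)
        for element in root.iter():
            if element.tag in SENSITIVE_TAGS:
                leaks.append((index, element.tag))
    return leaks

zone_traffic = [
    "<LearnerPersonal><FamilyName>Smith</FamilyName></LearnerPersonal>",
    "<LearnerPersonal><FamilyName>Jones</FamilyName>"
    "<Medical>Asthma</Medical></LearnerPersonal>",
]
leaks = find_leaks(zone_traffic)
```

In this sample the second message would be flagged, telling the Information Asset Owner which provider's filters need attention.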

Designing for the UK

After spending a short time in the UK, we soon realized that a product line designed for the US market would only partially meet the needs of the UK market and that the differences between the two went far beyond post code formats or the way that contacts were stored. We couldn’t just take a new database schema, update code sets and pretend that our products met the needs of the UK market, because they wouldn’t. Some of the differences between the two environments run as deep as the differences between the two countries’ views on data privacy.

So, in our feeble attempt to buck the American tradition to force our way of doing things on everyone else (you are supposed to be at least smiling here), we took the path of redesigning our products to meet the needs and requirements of the UK market, from the enhanced data privacy requirements to the multi-level regional zoning requirements.

PrivacyPlus

There are three core products in a typical Visual Software SIF implementation: ZIServer (the Zone Integration Server), Envoy (the Multi-Zone manager) and Veracity (the Data Quality manager). All three products are designed to work on a Microsoft platform and can be installed in a small location using a freely downloadable SQL Server Express edition or can be spanned across multiple large servers forming load balanced clusters and using failover clustered databases and all the things you would expect in a large city implementation.

Each of the Visual Software products has components of our PrivacyPlus technology, developed to meet or exceed the UK’s data privacy requirements. We’ve added it in more than one place to make administration more reasonable. For example, putting privacy protection only in the ZIServer product would provide the protection required by the DPA, but since there are so many individual endpoints there, it would become very difficult to manage if the ZIS were hosted.

To make management more reasonable and to give administrators a fail-safe “blanket-filter”, we also added it to Envoy. This is added protection, allowing a regional administrator to set default policies for all schools in the region as a starting place, then allowing the ZIS administrator to “take it from there”. Of course, all of these are policy settings and are at the choice of those who set up the software.

Planning a Successful Implementation

This section outlines a recommendation for the implementation of the foundational SIF layer (the part that gets things going). Implementing these parts successfully makes success much more likely as other applications are added later on.

  1. Set up a test environment. If funds are tight, we would suggest using virtual servers and test editions of software – the test editions can usually be obtained at a fraction of the full cost and the environment for virtual servers (at least for Microsoft products) can be downloaded free of charge from their web site and can be run from a desktop or laptop computer. If you’re running in a large organization that has hosted, shared facilities, you can ask to have test zones created on the shared servers but you should still set up test copies of your applications.
  2. Install the Zone Integration Server software, Envoy and Veracity.
  3. Create a “Raw” zone. Since very little data would pass validation in initial testing, we recommend turning validation off at this point – this will be the zone that connects the MIS and Envoy. Other agents will never connect to this zone because this data will not be privacy protected nor will its data be passed through any validation testing.
  4. Either the MIS or Mimic becomes provider in this zone – this “Raw” zone approach allows connection of a feeding agent by any SIF-enabled application – whether it is a supplier-provided SIF agent or Mimic generating events on behalf of the MIS. Then import MIS data into Envoy and let it analyze the MIS data.
  5. Envoy will separate MIS data into “good data” and “bad data”, depending on whether the data would be able to pass current SIF validation rules. “Good data” will be marked as “OK to publish” and “bad data” will be presented to school users through a Veracity-like web-based user interface. In this way, Envoy is behaving like a “quality gateway”, only letting through objects that meet basic quality tests.
  6. Have school users clean up data in MIS – as school users clean up data in the MIS, the SIF agent for the MIS will automatically send through SIF “Change” events and the data will move from the “bad data” to the “good data” categories and will no longer show up in the school user’s user interface (no further user action will be required other than simply correcting the mistake).
  7. Create “Clean” zones for other applications to subscribe to – two or three zones will be created, depending on whether Envoy is implemented at the Local Authority or Regional Broadband Consortium level. Once the data for the school is sufficiently clean, the school’s Envoy connection can register as a provider in its corresponding “Clean” zones.
  8. Before it registers as the provider to other applications, the school SIRO reviews the privacy settings for the school in Envoy to make sure that all DPA-defined sensitive information is properly protected. Note that this is all done before any other applications are connected, even within the same school.
  9. After Envoy is connected as the provider, the School, LA and RBC copies of Veracity will be loaded with initial copies of the school’s data and initial Veracity rules for Type 2-5 errors will be run. For LAs and RBCs these will include data privacy checks to make sure that no DPA-defined sensitive information has “leaked” into one of the higher level zones.
  10. Connect other applications, convert data into them and check to make sure all data has made it successfully into them.
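The “quality gateway” behaviour in steps 5 and 6 can be sketched as follows. This is only an illustration of the idea, not Envoy's implementation: the required fields stand in for real SIF validation rules, and every record is assumed to arrive with a RefId.

```python
# Hypothetical stand-in for SIF validation: these fields must be non-empty.
REQUIRED_FIELDS = ("FamilyName", "BirthDate")

def validation_errors(record):
    """Return a list of problems; an empty list means the record passes."""
    return [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]

class QualityGateway:
    """Holds back records that fail validation until the MIS corrects them."""

    def __init__(self):
        self.ok_to_publish = {}  # RefId -> record ("good data")
        self.needs_review = {}   # RefId -> (record, errors) ("bad data")

    def receive(self, record):
        """Handle an Add or Change event by (re)classifying the record."""
        ref_id = record["RefId"]
        # Forget any earlier classification of this record.
        self.ok_to_publish.pop(ref_id, None)
        self.needs_review.pop(ref_id, None)
        errors = validation_errors(record)
        if errors:
            self.needs_review[ref_id] = (record, errors)
        else:
            self.ok_to_publish[ref_id] = record

gateway = QualityGateway()
# Initial load: the birth date is missing, so the record is held for review.
gateway.receive({"RefId": "A1", "FamilyName": "Smith", "BirthDate": ""})
# A later Change event from the MIS agent carries the corrected record.
gateway.receive({"RefId": "A1", "FamilyName": "Smith", "BirthDate": "2001-04-09"})
```

After the second event the record has moved from the review list to the publishable set without any action in the gateway itself, which mirrors the "no further user action" point in step 6.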

Compare this to a typical US installation:

  1. Install the ZIS
  2. Install MIS agent and subscribing agents, most likely in a test environment first
  3. Convert the data into subscribing applications and use a lack of information in the subscribing application to detect problems in the quality of the data in the MIS.

Although the second approach sounds much simpler on the surface, it has a few problems:

  • It leaves the users guessing why the subscribing application wasn’t able to create a student or other object. The cause could have originated in one of many SIF objects it subscribes to and it will now be the user’s responsibility to track down which one was the cause of the problem in order to solve the problem. A significant amount of time will be spent tracking down every individual problem.
  • It doesn’t address data privacy issues at all, which usually aren’t as important in the US as they are in the UK. This model allows all data to go everywhere and trusts receiving agents to ignore the information that is not supposed to be delivered to them.
  • Some records that look as if they are good are actually “almost good” and some of the information required for them to work efficiently in the subscribing application is missing. These types of errors are not likely to surface until the subscribing applications are in production and are being used every day.

This approach is like debugging an application only by looking at the output – you never quite know where a problem originated if one exists. You can take guesses by going backwards through the process, but in this case every time you move backwards it might involve a different supplier. Consider the scenario from before:

Figure 2 - Similar errors in a SIF Environment

…where each of these pieces was obtained from a different supplier. If the error was discovered in the VLE, the supplier of the VLE SIF agent could be the source of the problem. Or, the problem could be in the ZIS, or in the MIS SIF agent, or in the MIS. With the methodology we suggest, errors can be caught at each point in the process as soon as they happen, avoiding the “snowball at the bottom of the hill” syndrome.

All in all, we believe that addressing data quality and privacy issues before any data is distributed to other applications automatically will save time and avoid data security problems in the long run, thus becoming the far lower cost solution after everything is taken into consideration.


Australia Tri-Borders Project wins IMS People’s Choice Award

May 23rd, 2011

Over the past year, we at Visual Software have been working with staff in South Australia, Western Australia and the Northern Territory on an implementation called the Tri-Borders project.

The purpose of this program has been described as follows:

To track the attendance of students who move between schools and state borders in WA, SA and NT, a cross-jurisdiction project used SIF to gather data from each jurisdiction, then matched them centrally to provide a view of attendance across all three jurisdictions.

This project was recently awarded the “People’s Choice” Award at the Australian IMS Regional Learning Impact Awards Competition. More details about the award can be found at:  http://www.sifassociation.org/au/upload/press/BZ6EAZ_SIF%20Association%20AU%20Tri-Borders%20Project%20wins%20IMS%20Award.pdf.

More details about the project may be found at:  http://www.sifassociation.org/us/Upload/story/8E3BFF_SIF%20Association%20AU%20Pilot%202.1%20Tri-Borders.pdf.

The software provided by Visual Software for this project includes the ZIAgent for the SA connection and the ZIServer Zone Integration Server that was used to connect all the agents.

Congratulations, friends in Australia!


The need for Basic Identity Management

May 13th, 2011

When we configure applications that subscribe to school information through SIF agents, we often find organizations that implement independent Student Information Systems at each school and have a single subscribing application set up to receive input from many of these systems simultaneously.

Duplicates

The problem arises with duplicate records. For example, a teacher in a US school district may teach at more than one school, so his or her record appears in more than one of these systems. Since each of the systems has an independent SIF agent, each generates its own set of SIF identifiers and, in their native form, the two records look like two different people to the subscribing application.

Basic Identity Management

So, there arises a need for the “identity management” function to be done somewhere. This can be done either by:

  1. A separate Identity Management system – this is what they do in the UK, for example, and in larger districts. Its main job is to reconcile cases where the same person’s record shows up in more than one system. It assigns a common, universal ID to the pair of records and, in the SIF world, publishes the Authentication object (or in the UK, the Identity object). The advantage to having this done in a separate system is that the decisions on how to match records are made in a single place (it sometimes gets complicated – think of identical twins with the same first, middle and last names with id numbers mistyped).
  2. Having every application build in all of this functionality.

To be able to properly do the identity management function, the application writer and the end user would need to agree on how the “matching up” process would work (this is why they have separate, configurable systems for doing this). For example, consider how the rules might be formed if you get records from two schools for the same “teacher record”:

  1. If last name, first name, teacher number, birth date and gender all match, then these two records are a match (a perfect world)
  2. If first name, teacher number, birth date, gender and phone number all match, then these two records are a match (one of the last names could be misspelled)
  3. If last name, teacher number, birth date and gender all match, then these two records are a match (one of the first names could be misspelled)
  4. If last name, first name, teacher number, phone number and gender all match, then these two records are a match (one of the birth dates could be missing or mistyped)
  5. If last name, first name, birth date, gender and phone number all match, then these two records are a match (the teacher number could be wrong in one of the systems)
  6. If none of these combinations match, then assume that these are two different records
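The rule cascade above can be sketched in a few lines of Python. This is only an illustration of the idea, not the DMATCH or Ipseity implementation: the field names are hypothetical, and the choices to compare values case-insensitively and to treat a blank value as "no match" are assumptions.

```python
# Each tuple lists the fields that must all agree for that rule to fire.
# Field names are hypothetical; the five rules mirror the list above.
RULES = [
    ("last_name", "first_name", "teacher_number", "birth_date", "gender"),
    ("first_name", "teacher_number", "birth_date", "gender", "phone"),
    ("last_name", "teacher_number", "birth_date", "gender"),
    ("last_name", "first_name", "teacher_number", "phone", "gender"),
    ("last_name", "first_name", "birth_date", "gender", "phone"),
]

def normalise(value):
    """Ignore case and surrounding spaces when comparing (an assumption)."""
    return str(value or "").strip().casefold()

def fields_agree(a, b, fields):
    """True when every listed field is non-blank in both records and equal."""
    for name in fields:
        left, right = normalise(a.get(name)), normalise(b.get(name))
        if not left or left != right:
            return False
    return True

def matching_rule(a, b):
    """Return the 1-based number of the first rule that matches, else None."""
    for number, fields in enumerate(RULES, start=1):
        if fields_agree(a, b, fields):
            return number
    return None

school_a = {
    "last_name": "Smith", "first_name": "Ann", "teacher_number": "T100",
    "birth_date": "1975-02-01", "gender": "F", "phone": "555-0100",
}
# Same teacher at a second school, with the last name misspelled.
school_b = dict(school_a, last_name="Smyth")
rule = matching_rule(school_a, school_b)
```

Keeping the rules in a plain table like this is what makes them configurable: a site that wants different criteria changes the table rather than the code.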

Having this functionality put in the application is OK, but not really a very good idea. We put the DMATCH functionality into our ZIAgent configurable SIF agent to allow you to implement these rules quickly. The problem is that if the functionality is put in the application, it will most likely need to be customized for each installation, because one site will want one set of criteria while another will want a different set.

This is why an Identity Management system such as Ipseity™ is most useful – it allows these sets of rules to be defined independent of any application, the matching process to be done once, consistently and it reduces the complexity of the rest of the SIF agents in the zone.

Going Beyond the Basics – Record Virtualization…

What was described here is “the basics”.  It makes the SIF agents all more reliable by standardizing some of the functionality and centralizing it.  Now, take this another few steps and add an “infrastructure helper” to the SIF zone that actually merges these multiple records into a single record, so that the receiving applications never even see the duplicates. This is what we refer to as “record virtualization” and support through “managed virtual zones”.  For more information on this, see http://www.sifsupport.com/wiki/index.php?title=Managed_Virtual_Zones.
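As a very rough sketch of what merging duplicates into a single record could look like, the example below combines records that have already been judged to belong to the same person. It is not a description of how managed virtual zones work; the field names, the "first non-blank value wins" policy and the way the merged identifier is derived are all assumptions for illustration.

```python
import uuid

def merge_records(records):
    """Merge records already matched as the same person into one record.

    Earlier records win; later ones only fill in fields that are still blank.
    """
    merged = {}
    source_ids = []
    for record in records:
        source_ids.append(record["RefId"])
        for field, value in record.items():
            if field != "RefId" and value and not merged.get(field):
                merged[field] = value
    # Derive a stable identifier from the source IDs so that the same set of
    # duplicates always maps to the same merged record (an assumption).
    merged["RefId"] = uuid.uuid5(uuid.NAMESPACE_URL, "|".join(sorted(source_ids))).hex
    merged["SourceRefIds"] = source_ids
    return merged

duplicates = [
    {"RefId": "AAA", "FamilyName": "Smith", "Phone": ""},
    {"RefId": "BBB", "FamilyName": "Smith", "Phone": "555-0100"},
]
merged = merge_records(duplicates)
```

A receiving application would then see only the merged record, while the list of source identifiers keeps the link back to the original systems.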

For more information on Ipseity, see http://www.visualsi.com/Ipseity.htm.
