Model Building 101

 

This tutorial introduces some basic principles behind model building. You will learn how to build models that represent a single process, called a “Cell Type” in GemStone. We will start with the simplest pieces and add a little complexity with each step. A follow-up tutorial, Model Building 102, will continue the model development process where this tutorial leaves off.

Strategy

Here is a cartoon of how we are going to approach model building.

 

images\modelbuildingflowchart.gif

 

We start by reading some synthesized listmode data. We create a very simple model for it, and analyze the data with our model. Then, we will add another parameter, enhance the model, and analyze the data. We will repeat the cycle until we have fully developed a model for the data.

We start with a synthesized data file because we want to define what “truth” is for our data. This file was generated by GemStone, and so we expect an almost-perfect analysis result. “Real” data is easier to understand after we have designed this first model with “toy” data. We will be building the skills and understanding to analyze an actual normal bone marrow file. Our model will help us understand the B-cell lineage and prepare us for “real” data files later on.

 

Getting Started

To get started, launch GemStone and pass through the start-up dialogs.

Click the Select FCS Files button on the main toolbar and navigate to the Sample Files folder. Select the file 101_CD19_SSC_TDT_CD20_CD10.fcs.

Click Open. The program will add the file to the File Database panel and read its data.

 

images\model101selectfile.gif

 

Choosing CD19 Parameter

Click the Cell Type shrink tool to expose some of the important editors we will be working with. We’ll learn a lot more about these editors as we proceed through this tutorial.

 

images\model101celltypeshrinktool.gif

 

Click the last button in the Cell Type toolbar to display a list of parameters in this file. Find CD19 in the list and select it. (The name might be CD19_CD19.)

 

images\model101selectcd19.gif

 

The parameter is added to the Cell Type widget and displayed as a band of dots on a plot.

Model Building Tip: Start with Selection Parameters

Start building a model with the simplest parameters first, and work toward parameters that have more complicated transitions. Parameters that are constant throughout the progression and that select the populations of interest should be defined first. We refer to these as selection parameters.

 

Modeling CD19

The CD19 plot is now the “active” plot. Notice that its label is highlighted. We also see that CD19 appears in the Parameter Profile properties panel. In that panel, click the dropdown control next to Parameter Profile in the property list to display a list of choices. Choose Constant from the list.

 

images\model101cd19constant.gif

Model Building Tip: Another Way to Select a Profile

You can also select a parameter profile from the context menu of the parameter plot. To do so, right-click the mouse on the parameter plot, select the Choose Parameter Profile option, and choose a profile from the list.

 

You should now see two Control Points in the CD19 plot. The vertical placement of the Control Points should be in the center of the band of dots. Click and drag either point to identify the center of the CD19+ events at around 10^3.

 

images\model101cd19profile.gif

 

Model Building Tip: Control Points

Control Points are the objects that allow you to manipulate a Parameter Profile. They are also referred to as “Control Definition Points” because they define the important transitions in the profile.

At this point, we have set up a very simple GemStone model that will select CD19 positive events. Let’s tell GemStone to analyze the data with this model. Our expectation is that we will “classify” most of the events as matching our CellType1 model definition.

Click the Classify Data button on the main toolbar.

 

images\model101classifydata.gif

 

We now see some dots are darker than others, and there are some additional decorations on the plot. The darker dots are those that have been classified by our model as being CD19 positive. Light gray dots are considered “unclassified” – not belonging to a cell type at this point. The graphics show the 95% confidence limits, as well as the mean of the data and the mean of the model.

 

images\model101cd19classified.gif
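To give a feel for what the 95% confidence limits on the plot mean, here is a minimal Python sketch. It assumes intensities are roughly normally distributed on a log10 scale around the model's level, and it treats an event as classified when it falls within about 1.96 standard deviations of that level. The function names, the standard deviation, and the sample values are illustrative assumptions; GemStone's actual classification is probabilistic and is not reproduced here.

```python
def confidence_limits(center, sd, z=1.96):
    """Return the (lower, upper) band holding about 95% of a normal distribution."""
    return center - z * sd, center + z * sd


def classify(values, center, sd):
    """Flag each log10 intensity as inside (True) or outside (False) the band."""
    lower, upper = confidence_limits(center, sd)
    return [lower <= v <= upper for v in values]


# Hypothetical log10 CD19 intensities; the Constant profile sits at 10^3.
cd19_log10 = [2.9, 3.1, 3.0, 1.2, 3.05, 0.8]

# Assumed spread of 0.15 decades around the profile level.
flags = classify(cd19_log10, center=3.0, sd=0.15)
```

In this toy example, the events near 10^3 are flagged as classified, and the two dim events stay unclassified, which mirrors the dark and light gray dots in the plot.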

 

Good so far. We’re ready to add SSC to our model.

 

Modeling the SSC Parameter

Click the parameter selection tool button and choose SSC. A new plot is added to the Cell Type widget, and SSC appears in the Parameter Profile properties panel.

 

images\model101choosessc.gif

 

B-cells have a constant, low side scatter measurement, so we will again select a Constant profile. Right-click on the dots in the SSC plot and select the Choose Parameter Profile option, then choose Constant from the list. Position the Control Points in the vertical center of the SSC population at around 10^2. Then click the Classify Data button on the main toolbar.

 

images\model101sscprofile.gif

 

Model Building Tip: Start with Simple, Dramatic Transitions

Once the selection parameters are defined in the model, you are ready to move on to parameters that transition during the progression. Start with parameters that have the simplest and most obvious changes. These will help create a backbone for the model. They will provide clues about how the more complex parameters behave.

 

Modeling the TdT Parameter

Click the parameter selection tool button and choose TDT. A new plot is added to the Cell Type widget, and TDT appears in the Parameter Profile properties panel.

 

images\model101choosetdt.gif

 

TdT expression in B-cells is a little more complicated. It starts with elevated expression and then transitions to a low intensity as a function of B-cell progression. We see two bands of dots for TdT, one at about 10^3 and another heavier band at about 10^1. We need to use a stair-step profile for this.

Choose a Step Down profile from the dropdown control next to Parameter Profile.

 

images\model101tdtstepdown.gif

 

With this profile, the first two points move together in the Y direction, and the last two points move together. We need to position the first level on the bright TdT events and the second level on the dim events.

Move one of the first two Control Points so that it is located at approximately 10^3 on the Y-axis – the center of the band of bright events. Since there are fewer events in the upper band, we can position this point near the left end of the X axis.

Now move the third Control Point to approximately 10^1 on the Y-axis. It should look approximately as shown here.

 

images\model101tdtprofile.gif
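One way to picture a Step Down profile is as four control points joined by straight lines: two at the bright level, two at the dim level, with a sloped segment between the second and third points. The Python sketch below is only an illustration of that idea. The function name, the percent-progression X scale, and the X positions 10 and 20 are assumptions, not values taken from GemStone.

```python
def profile_level(control_points, x):
    """Linearly interpolate the expected level at progression position x.

    control_points is a list of (x, y) pairs sorted by x.
    """
    first_x, first_y = control_points[0]
    if x <= first_x:
        return first_y
    for (x0, y0), (x1, y1) in zip(control_points, control_points[1:]):
        if x <= x1:
            if x1 == x0:
                return y1  # vertical step: take the level after the step
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return control_points[-1][1]  # beyond the last point: hold the last level


# Hypothetical Step Down profile: x is percent progression, y is log10 intensity.
# The second and third points play the role of the transition's start and end.
step_down = [(0, 3.0), (10, 3.0), (20, 1.0), (100, 1.0)]

bright = profile_level(step_down, 5)    # on the first plateau
midway = profile_level(step_down, 15)   # halfway down the transition
dim = profile_level(step_down, 50)      # on the second plateau
```

Because the first two points share one Y value and the last two share another, dragging either point of a pair changes the level of its whole plateau.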

 

Click the Classify Data button on the toolbar to analyze the data. You can also press Ctrl-A on a PC keyboard, or Apple-A on a Mac keyboard, to run the Classify Data command. The plot should look similar to this:

 

images\model101tdtmodeled.gif

 

GemStone has reordered the events along the progression axis (the X axis), because our model now tells it that the bright TdT events are early in the progression and dim TdT events are later in the progression. This new ordering of events is applied to all of the plots, including CD19 and SSC.
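The effect of this reordering can be pictured with a rough Python sketch. It is not GemStone's algorithm: it simply sorts hypothetical events from bright to dim TdT, which gives the same early-to-late order that a Step Down profile implies, and then reads the other parameters in that shared order. The event values and field names are invented for illustration.

```python
# Hypothetical events with log10 intensities (values invented for illustration).
events = [
    {"id": "a", "CD19": 3.0, "SSC": 2.1, "TDT": 1.0},
    {"id": "b", "CD19": 2.9, "SSC": 2.0, "TDT": 3.1},
    {"id": "c", "CD19": 3.1, "SSC": 1.9, "TDT": 2.0},
    {"id": "d", "CD19": 3.0, "SSC": 2.0, "TDT": 0.9},
]

# Bright TdT is early in the progression, so sort from bright to dim.
ordered = sorted(events, key=lambda e: e["TDT"], reverse=True)

# Every parameter is then read off in the same event order.
order_ids = [e["id"] for e in ordered]
cd19_in_order = [e["CD19"] for e in ordered]
```

The constant parameters look much the same after sorting, since their level does not depend on where an event sits in the progression.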

 

Using Estimate X Positions

The initial placement of control points for TdT is probably close, but not perfect. GemStone can help us optimize the X positions with the Estimate X Positions tool in the toolbar on the right of the TdT profile plot.

 

images\model101estimatex.gif

 

Locate the Estimate X Positions button and click it. GemStone will move the definition points on the TDT profile so that the model better matches the data. You can click this button more than once to refine the model further.

 

Adding a 1P Plot

We can create some conventional histograms to help us visualize the data. Click the 1P histogram tool in the Workspace toolbar.

 

images\model101histtool.gif

 

In the empty space to the right of the Parameter Plots, click and drag to draw a box for the histogram. You can also hold down the Shift key while clicking to create a fixed-size histogram. When you release the mouse button, the histogram is displayed.

Click the X-Axis label on the 1P histogram and choose TdT.

 

images\model101add1phist.gif

 

Getting Statistics

Let’s create three “zones” to measure TdT positive, TdT in transition, and TdT negative populations. To do this, we will use the Control Points on the TdT profile plot.

Double-click the second Control Point on the TdT plot. The Edit BeginDown dialog appears. “BeginDown” is the name of this point, and this dialog is used to edit its properties.

Click the <--- Create Zone button, and notice that events to the left of the BeginDown point are now colored differently. These events fall into the new zone we created for TdT positives. Click OK to close the dialog.

Double-click the third Control Point. The Edit EndDown dialog appears. Click <--- Create Zone to create the intermediate TdT zone.

Finally, click Create Zone ---> to create the TdT negative zone to the right of the definition point. Click OK to close the dialog.

 

images\model101tdtzones.gif

 

The zone coloring is applied to dots in all of the Parameter Profiles. Because of the way Probability State Models work, you will notice that the system accounts for the overlap between adjacent populations. This is especially evident in the 1P histogram of TdT, where we can easily see the overlap of the three zones.

Now that our statistical zones are created, let’s name them. Locate the Zone Statistics Table on the canvas and drag it to a convenient location.

To rename a zone, click the zone label and edit the text. Label the zones “TdT+”, “TdT trans”, and “TdT-”, respectively. Notice that the percentages are approximately 10%, 10%, and 80%.

 

images\model101dragzoneedited.gif
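The percentages in the Zone Statistics Table can be reproduced in spirit with a short Python sketch. It assumes each event already has a position on the progression axis (0 to 100) and that the zone boundaries sit at the X positions of the BeginDown and EndDown points. The boundary values 10 and 20, the event positions, and the function name are illustrative assumptions, not values read from GemStone. This simple count uses hard boundaries and ignores the overlap between adjacent populations that GemStone accounts for.

```python
from bisect import bisect_right


def zone_percentages(positions, boundaries, labels):
    """Percentage of events falling in each zone along the progression axis.

    boundaries must be sorted, and labels needs one more entry than boundaries.
    """
    counts = [0] * len(labels)
    for x in positions:
        # bisect_right gives the index of the zone that contains x.
        counts[bisect_right(boundaries, x)] += 1
    total = len(positions)
    return {label: 100.0 * n / total for label, n in zip(labels, counts)}


# Hypothetical progression positions for 20 events, evenly spread over 0-100.
positions = [2.5 + 5 * i for i in range(20)]

stats = zone_percentages(positions, [10, 20], ["TdT+", "TdT trans", "TdT-"])
```

With these made-up positions, the three zones hold 10%, 10%, and 80% of the events, the same split reported in the tutorial.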

 

Saving our Model

As you build models, it is always a good idea to save them as you add complexity. Keep a history of the model’s revisions by adding a number or letter to the end of the name each time you save a new version. Let’s save our CD19 TdT model now.

Click the Save Document button on the main toolbar. The Save GemStone Document dialog is displayed. Type “Model101” for the file name and click Save.

 

Conclusions

In this tutorial, we introduced many of the principles of model building with GemStone. Our strategy was to create a simple model, and then to analyze synthesized data with the model. We gradually added complexity and enhanced the model.

This approach is a sort of “self-fulfilling prophecy”. We would be very surprised if our model were not able to analyze the generated data properly. However, there are a number of real benefits to this technique. It forces us to describe the characteristics that our real data files will contain. As we add more parameters to our model and synthesize new data, the distributions in our generated data should also become more “realistic”. We can take a very complex analysis and break it into incremental steps that can be easily tested.
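To make the idea of defining “truth” concrete, here is a minimal Python sketch of how toy data with known answers could be generated. It is not how GemStone synthesizes files. The zone weights, the spread, the field names, and the function name are all assumptions, loosely based on the levels used in this tutorial (CD19 near 10^3, SSC near 10^2, TdT stepping from 10^3 down to 10^1).

```python
import random

# Assumed zone weights and log10 TdT ranges, loosely following the tutorial's
# 10% / 10% / 80% split and the bands near 10^3 and 10^1.
ZONES = {
    "TdT+": (0.10, (3.0, 3.0)),
    "TdT trans": (0.10, (1.0, 3.0)),
    "TdT-": (0.80, (1.0, 1.0)),
}


def synthesize_events(n_events, sd=0.15, seed=0):
    """Generate toy events whose 'truth' (the zone) is known by construction."""
    rng = random.Random(seed)
    names = list(ZONES)
    weights = [ZONES[name][0] for name in names]
    events = []
    for _ in range(n_events):
        zone = rng.choices(names, weights=weights)[0]
        low, high = ZONES[zone][1]
        events.append({
            "zone": zone,
            "CD19": rng.gauss(3.0, sd),                    # constant near 10^3
            "SSC": rng.gauss(2.0, sd),                     # constant near 10^2
            "TDT": rng.gauss(rng.uniform(low, high), sd),  # depends on the zone
        })
    return events


events = synthesize_events(10_000)
share_negative = sum(e["zone"] == "TdT-" for e in events) / len(events)
```

Because each event carries the zone it was drawn from, an analysis of this data can be checked against a known answer, which is the benefit described above.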

In our follow-up tutorial, we will add complexity to build a model that can be used with a real B-cell sample.

 

See also:

Model Building 102

Cell Types

Cell Type widget

Parameter Profile Descriptions