Users Manual for VSS-Alpha-ICLUST

Very Simple Structure-ALPHA-Item Cluster Analysis
VSS-ALPHA-ICLUST is a set of routines meant for scale construction and data manipulation. It is designed to identify the optimal number of factors (VSS) or extract item-clusters that are as independent and reliable as possible (ICLUST). It will find scale scores as unweighted linear sums of items and report basic scale and item statistics (ALPHA). Additional routines for basic data manipulation are included.

Other programs are more appropriate for complex analyses such as factor analysis (SYSTAT) or structural equation modeling (EQS). Excel is recommended for complex data manipulation.

This program is semi "bullet-proof" in that some error conditions will lead to a graceful exit with a brief explanation. However, unexpected errors will just cause a system crash rather than a graceful request for help. Important files should be saved before using this program.

The purpose of this package of routines is to facilitate the construction of personality inventories for the typical problem encountered by most personality researchers: a set of items that are meant to measure a number of independent constructs. Although standard factor analytic techniques are available to solve these problems, factor analysis of items is problematic. VSS and ICLUST are designed for the practical problem of deciding how many factors/clusters to extract from a set of items. ALPHA scores scales and reports the standard item and scale statistics.

For help using these programs, email William Revelle at Northwestern University. The interested user may download a Mac version of VSS-Alpha-ICLUST.

Alpha

Alpha combines a set of items into one or more scales. Keying information (items to select, direction of items) is read from the disk. Scale scores are saved to disk. Coefficient alpha, average within scale inter-item correlations, scale means, and variances are also reported. Scale correlations and item-scale correlations are reported. All summary statistics are reported on the screen and saved to disk for later processing. Alpha will score multiple scales (currently 50 maximum) from items (currently 600 max) written on a single line or multiple lines in a text file.
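As a rough illustration of the statistics listed above, the following Python sketch computes unweighted sum scores and coefficient alpha for a small subjects-by-items table. It uses the standard textbook formula for alpha. The function names and the toy data are assumptions made for this example and are not taken from the program's source.

```python
from statistics import variance

def scale_scores(rows):
    """Unweighted sum of the item responses for each subject."""
    return [sum(row) for row in rows]

def coefficient_alpha(rows):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of the total)."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance(scale_scores(rows))
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 4 subjects answering 3 items on a 1-4 scale.
responses = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 4]]
print(scale_scores(responses))
print(round(coefficient_alpha(responses), 3))
```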

ICLUST

Item Cluster analysis using the ICLUST algorithm (Revelle, 1979). Item clusters are found by a hierarchical algorithm that combines items until a) coefficient alpha fails to increase or b) coefficient beta fails to increase. (Beta is the estimate of the general factor saturation and is based upon the cluster's worst split half reliability.) Current limits are for 600 variables, but with a recompilation up to 1,000 variables may be processed.

VSS

VSS (Very Simple Structure) finds the VSS criterion for the optimal number of interpretable factors for a particular data set. A correlation matrix (found from the raw data) is fitted by factor structure matrices (previously found in SYSTAT/SPSS) of various rank and various complexities. The algorithm has been described by Revelle and Rocklin (1979).

Data entry

Each subject is assumed to have several fields of identification data followed by item data. Use a standard word processor (e.g., WORD or BBEDIT) and enter the subject ID, a space, and then up to 250 numbers (typically, 1 for Yes, 2 for No, 0 for missing, or Likert type items with responses of 1 ... 4). This file should be saved as a text file. If you have more than 250 items then you need to use multiple lines per subject and then check the multiple lines option.

Sample lines:

01 1 1211213414231411131332211312111144121111112112111433
01 2 1111411411331111111111112414114144141121124111311441

Two important points: every subject's data should have a carriage return at the end and there should be no blank lines.

It is useful to enter the data in a fixed pitch font (e.g., MONACO 9). That way, the columns will line up and it is possible to do some checking of the data while you are entering them.

Note that the program is column sensitive. Blanks are treated as informative (they take up one column). It is possible to include free field input (rather than string oriented input) by checking the free-field option. In other words, if your data are more than a single digit per item you should have spaces (or tabs) between your fields.

Note that the data files to be read should not include extraneous blank lines.

Data can also be entered as free fields. Such a case might be if data were entered in Excel or some other spreadsheet program and then saved as either space or tab delimited text. Use the "free field" option in the options menu.
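The difference between column-sensitive and free-field input can be sketched in Python as follows. This is only an illustration of the two layouts described above: the function names are hypothetical, and treating both "0" and a blank column as missing is an assumption of the example.

```python
def parse_fixed_line(line, id_columns):
    """Column-sensitive input: every character after the ID is one item."""
    line = line.rstrip("\r\n")
    subject_id = line[:id_columns]
    # Assumption: "0" and a blank column both mean "missing" (None).
    items = [None if ch in "0 " else int(ch) for ch in line[id_columns:]]
    return subject_id, items

def parse_free_line(line, id_fields=1):
    """Free-field input: fields are separated by spaces or tabs."""
    fields = line.split()
    subject_id = " ".join(fields[:id_fields])
    items = [int(field) for field in fields[id_fields:]]
    return subject_id, items

# Hypothetical lines: a 5-column ID followed by single-digit items,
# and a tab-delimited line with a multi-digit response.
print(parse_fixed_line("01 1 1203\n", id_columns=5))
print(parse_free_line("17\t3\t12\t4\n"))
```

Free-field parsing is the only safe choice once any response takes more than one digit, because the fixed layout reads each character as a separate item.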

Save the file as a TEXT file: (SAVE AS)

Data checking can be done using the Compare file option which compares two files line by line for identical input (i.e., if the data have been entered twice).
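A line-by-line comparison of this kind can be sketched in a few lines of Python. The function name is hypothetical, and the program's own report format is not described in this manual, so the sketch simply returns the numbers of the lines that differ.

```python
from itertools import zip_longest

def differing_lines(first_lines, second_lines):
    """Return the 1-based numbers of lines that are not identical."""
    mismatches = []
    pairs = zip_longest(first_lines, second_lines)
    for number, (first, second) in enumerate(pairs, start=1):
        # A line present in only one file is paired with None and so differs.
        if first != second:
            mismatches.append(number)
    return mismatches

# Hypothetical double entry of the same two subjects, plus an extra line.
entry_one = ["01 123", "02 321"]
entry_two = ["01 123", "02 331", "03 111"]
print(differing_lines(entry_one, entry_two))
```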


SCORING scales using ALPHA

To run the scoring routine, select Score_Alpha from the FILE menu. A series of dialog boxes will follow.

In general, names for the following files are requested:

1) Results file name. This file will save a copy of all the output going to the screen. This includes means, standard deviations, and ranges for the items, inter-scale correlations, scale reliabilities, means, and variances, as well as item-scale correlations.

2) Input (item) file name. This is the file containing the items to be scored.

3) Scale file name. This is the file that will store the scores for each subject for each scale.

4) Item key file. This is a file containing the keying information for each scale. It is in the form of:

Scale title (any string of up to 255 characters describing what the scale is)
Key information. Number of items in the scale followed by the specific items. For example, to score a scale with 5 items (1 2 -3 4 and 5), with one item keyed in the opposite direction and all meant to measure impulsivity, you would enter:

impulsivity scale (with five items, one scored negatively)
5 1 2 -3 4 5

Up to 30 scales can be scored at the same time. Sample keys are:
Norman's surgency
10 1 2 3 4 5 6 7 8 9 10
Norman's agreeableness
10 11 12 13 14 15 16 17 18 19 20
Norman's Conscientiousness
10 21 22 23 24 25 26 27 28 29 30
Norman's Emotional Stability
10 31 32 33 34 35 36 37 38 39 40
Norman's Culture
10 41 42 43 44 45 46 47 48 49 50
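The key format above (a title line, then a count followed by signed item numbers) can be read and applied with a short Python sketch. The function names are hypothetical. Reversing a negatively keyed item as (lowest response + highest response) minus the observed response is an assumption of this example; the manual does not state how the program reverses items.

```python
def parse_key_file(text):
    """Read alternating title / key lines into (title, signed item list) pairs."""
    lines = [line for line in text.splitlines() if line.strip()]
    keys = []
    for title, key_line in zip(lines[0::2], lines[1::2]):
        numbers = [int(n) for n in key_line.split()]
        count, items = numbers[0], numbers[1:]
        if count != len(items):
            raise ValueError(f"{title!r}: expected {count} items, found {len(items)}")
        keys.append((title.strip(), items))
    return keys

def score_scale(responses, items, low=1, high=4):
    """Unweighted sum; a negative item number means the item is reverse keyed."""
    total = 0
    for item in items:
        value = responses[abs(item) - 1]  # item numbers count from 1
        # Assumption: reversal maps low<->high via (low + high - value).
        total += (low + high - value) if item < 0 else value
    return total

key_text = "impulsivity scale (with five items, one scored negatively)\n5 1 2 -3 4 5\n"
title, items = parse_key_file(key_text)[0]
print(title, items, score_scale([1, 2, 4, 3, 1], items))
```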

Each subject is assumed to have an Identification field preceding the actual items. This field may be any length and will be copied to the output file. You will be prompted for how many columns are devoted to ID.

To allow for convenient keying of multiple inventories from the same data file, the first column for a set of items is requested as a parameter.

Thus, in a file containing 3 columns of identification, 36 columns of one questionnaire, and 50 columns of another questionnaire, it is possible to ask to score the second questionnaire by specifying that the first item starts at item (column) 40. To score items that are written over several lines, select the multi-line option in the options menu. Note that this is only appropriate if carriage returns were entered within one subject's data.
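The column arithmetic in this example (3 + 36 = 39 columns precede the second questionnaire, so it starts at column 40) can be checked with a small Python sketch. The function name and the made-up data line are assumptions for illustration only.

```python
def items_from_column(line, first_column, n_items):
    """Extract n_items single-digit items starting at a 1-based column."""
    start = first_column - 1  # the manual counts columns from 1
    return [int(ch) for ch in line[start:start + n_items]]

# Hypothetical line: 3 ID columns, 36 items of one inventory, 50 of another.
line = "001" + "1" * 36 + "2" * 50
first_inventory = items_from_column(line, first_column=4, n_items=36)
second_inventory = items_from_column(line, first_column=40, n_items=50)
print(len(first_inventory), len(second_inventory))
```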

Missing data are replaced with the mean for the item. This can lead to strange estimates if subjects have a great deal of missing data. A revision that reports the number of missing values/scale/subject is being developed. Options include having data spread across several lines (multi-lines), reverse scoring of the items (reverse items), and free field input. Output from the scoring program is stored in a file in a tab delimited format compatible with Excel or Word for further processing (e.g., to sort item-scale correlations into rank order).
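Mean replacement of missing data can be sketched as follows. This is an illustration only: the function name is hypothetical, missing responses are represented as None, and the sketch assumes every item has at least one observed response. The per-subject count of missing values at the end shows the kind of check that helps spot subjects whose scores rest mostly on item means.

```python
def replace_missing_with_item_means(rows):
    """Replace None in a subjects-by-items table with that item's mean."""
    means = []
    for column in zip(*rows):
        observed = [value for value in column if value is not None]
        # Assumption: at least one subject answered every item.
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]

# Hypothetical data: 3 subjects, 2 items, two missing responses.
raw = [[1, None], [3, 4], [None, 2]]
completed = replace_missing_with_item_means(raw)
missing_per_subject = [row.count(None) for row in raw]
print(completed, missing_per_subject)
```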

To just describe a set of data without bothering to find scale scores, select DESCRIBE_ITEMS in the ALPHA menu. If you then want to find scores, select the Score option in the ALPHA menu. (Score_Alpha both describes and scores.)


Data manipulation routines

In addition, the program has several utilities to manipulate the data.

Utility programs allow you to:


Also note that if item-scale correlations are corrected for scale unreliability, poor scales will sometimes yield corrected item-scale correlations greater than 1.0.
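To see how a corrected correlation can exceed 1.0, consider the usual correction for attenuation applied to the scale only: the observed item-scale correlation is divided by the square root of the scale's reliability. The manual does not give the program's exact formula, so the Python sketch below is an assumption about the form of the correction, with hypothetical names and values.

```python
import math

def corrected_item_scale_r(item_scale_r, scale_alpha):
    """Observed correlation divided by the square root of the scale's alpha."""
    # Assumption: only the scale's unreliability is corrected for.
    return item_scale_r / math.sqrt(scale_alpha)

# A reasonably reliable scale leaves the correlation below 1.0 ...
print(round(corrected_item_scale_r(0.60, 0.81), 3))
# ... but a poor scale (alpha = 0.30) pushes the same correlation above 1.0.
print(round(corrected_item_scale_r(0.60, 0.30), 3))
```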


Very Simple Structure

The VSS criterion is calculated based upon the inter-item correlation matrix and a set of factor pattern matrices. The input procedures are similar to those for Score_Alpha. (That is, the basic data files and data structures are the same.)

To find the VSS criterion, it is first necessary to run Systat or some other equivalently powerful stats program to generate a factor pattern matrix. This matrix should be stored in an Editor File (e.g., a Word file saved as text). Several different Systat Factor output files can be combined into one file, in the following format:

      COL1              0.538      -0.196      -0.403       0.354
      COL2             -0.145      -0.050       0.548      -0.333
      COL3             -0.098       0.168       0.014      -0.835
      COL4              0.191       0.216      -0.351      -0.148
      COL5              0.645      -0.127      -0.284       0.292
....
      COL35             0.106       0.617      -0.317      -0.182
      COL36             0.726       0.186      -0.046       0.019
      COL1              0.644      -0.148      -0.407
      COL2             -0.299      -0.020       0.571
      COL3             -0.534       0.380       0.116
...
      COL34            -0.491
      COL35             0.100
      COL36            -0.495

(This is taken directly from Systat. Using Word as the text editor, each output file was edited down until just the factor patterns were left.)
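A combined file of this kind can be split back into separate solutions with a short Python sketch. This is not the program's own reader (which prompts for the number of factors to read); it is an illustration that assumes each row is a label followed by its loadings, and that a new solution starts whenever the number of loadings changes or a label repeats. The function name is hypothetical.

```python
def read_factor_patterns(text):
    """Split stacked factor patterns into a list of solutions.

    Each solution is a list of (label, loadings) rows.
    """
    solutions, current, seen, width = [], [], set(), None
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) < 2:
            continue  # blank lines and "...." separators carry no loadings
        try:
            loadings = [float(x) for x in fields[1:]]
        except ValueError:
            continue  # skip any leftover non-numeric output
        label = fields[0]
        # Assumption: a change in width or a repeated label starts a new solution.
        if current and (len(loadings) != width or label in seen):
            solutions.append(current)
            current, seen = [], set()
        width = len(loadings)
        seen.add(label)
        current.append((label, loadings))
    if current:
        solutions.append(current)
    return solutions

# Hypothetical two-item file holding a 2-factor and then a 1-factor solution.
sample = "COL1 0.5 -0.2\nCOL2 0.1 0.6\n....\nCOL1 0.7\nCOL2 -0.3\n"
for solution in read_factor_patterns(sample):
    print(len(solution[0][1]), "factor(s):", solution)
```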

VSS will first prompt for the Items file. This is the text file containing the items to be correlated and to which the VSS criterion will then be applied. Note that it is also possible to save the correlation matrix as a text file from Systat and base the analyses on that correlation matrix. In this case, specify matrix input in the options menu.

It will also prompt for the output file (see Score Alpha).

Finally, it will prompt for the factor pattern file.

ALL FILES should be saved in TEXT mode.

The program will read the first set of factor patterns from the file, after prompting for how many factors to read. VSS will continue reading more patterns until told to stop by entering 0 factors. This allows for a comparison of different solutions and/or different rotations.
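For readers who want to see what is computed for each pattern, here is a minimal Python sketch of a VSS-style fit, assuming orthogonal factors and the commonly described form of the criterion: keep only the largest loadings (in absolute value) for each item, reproduce the correlations from that simplified pattern, and compare the squared off-diagonal residuals with the squared off-diagonal correlations. See Revelle and Rocklin (1979) for the authoritative definition. The function name and the toy data are assumptions.

```python
def vss_fit(R, loadings, complexity):
    """1 - (sum of squared off-diagonal residuals / sum of squared correlations)."""
    simplified = []
    for row in loadings:
        order = sorted(range(len(row)), key=lambda f: abs(row[f]), reverse=True)
        keep = order[:complexity]  # the largest loadings for this item
        simplified.append([row[f] if f in keep else 0.0 for f in range(len(row))])
    residual_ss = 0.0
    original_ss = 0.0
    n = len(R)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the diagonal is ignored
            # Assumption: orthogonal factors, so the model is S times S transposed.
            model = sum(a * b for a, b in zip(simplified[i], simplified[j]))
            residual_ss += (R[i][j] - model) ** 2
            original_ss += R[i][j] ** 2
    return 1.0 - residual_ss / original_ss

# Hypothetical one-factor data: correlations implied by loadings .8, .7, .6.
lam = [0.8, 0.7, 0.6]
R = [[1.0 if i == j else lam[i] * lam[j] for j in range(3)] for i in range(3)]
print(vss_fit(R, [[x] for x in lam], complexity=1))
```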


ICLUST: Item Cluster Analysis

Item Cluster Analysis is a hierarchically based clustering algorithm that identifies clusters that are relatively independent and that maximize either coefficient alpha or coefficient beta. Alpha is, of course, the mean of all split half reliabilities and is a lower bound for the common factor variance of a test. Beta is defined as the worst split half reliability and is an estimate of the general factor saturation of a test. Given the logic of hierarchical cluster analysis, ICLUST provides a straightforward estimate of beta. Either raw data (subjects x items) or (square) correlation matrices (check the matrix input option) may be used. The input procedures are similar to those for Score_Alpha. (That is, the basic data files and data structures are the same.)
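The two definitions can be illustrated by brute force on a tiny correlation matrix. The Python sketch below enumerates every split of an even number of standardized items into two equal halves and computes each split half reliability as 4 times the covariance between the halves divided by the variance of the total. Alpha is then the mean of these values and beta the smallest. This exhaustive search is only practical for a handful of items and is not how ICLUST estimates beta; the function name and the matrix are assumptions.

```python
from itertools import combinations

def split_half_reliabilities(R):
    """Split half reliability for every split of the items into equal halves."""
    k = len(R)  # assumed to be even
    total_variance = sum(sum(row) for row in R)
    values = []
    # Item 0 always goes in the first half so that each split is counted once.
    for rest in combinations(range(1, k), k // 2 - 1):
        half_a = (0,) + rest
        half_b = [i for i in range(k) if i not in half_a]
        between = sum(R[i][j] for i in half_a for j in half_b)
        values.append(4 * between / total_variance)
    return values

# Hypothetical 4-item test made of two correlated pairs.
R = [
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
]
halves = split_half_reliabilities(R)
alpha = sum(halves) / len(halves)  # mean of all split halves
beta = min(halves)                 # worst split half
print(round(alpha, 3), round(beta, 3))
```

In this example the worst split separates the two pairs, which is why beta falls well below alpha when a test is made of distinct subclusters.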

The algorithm is the conventional hierarchical algorithm, with the addition of a stopping rule:

  1. find the inter item proximity matrix (defaults to correlation matrix).
  2. find the most similar pair of items.
  3. if the alpha and beta of the combined pair are greater than those of the separate elements, then combine them.
  4. repeat steps 2 and 3 until no more pairs meet the criterion.
This procedure will identify a certain number of clusters. The VSS criterion is then applied to this solution. Alternative solutions may be found by varying the minimum number of items needed to define a cluster or the total number of clusters to be extracted.
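The four steps above can be sketched in Python as follows. This is a simplified illustration and not the program's source. It assumes a correlation matrix of positively correlated items, takes the similarity of two clusters to be the correlation between their unit-weighted sums, gives single items an alpha and beta of 0 so that they can always start a cluster, and estimates the beta of a new cluster from the two subclusters being joined. All names are hypothetical.

```python
import itertools
import math

def block_sum(R, a, b):
    return sum(R[i][j] for i in a for j in b)

def alpha(R, items):
    k = len(items)
    return (k / (k - 1)) * (1 - k / block_sum(R, items, items))

def beta_of_merge(R, a, b):
    """Split half reliability when the two subclusters are the halves."""
    k = len(a) + len(b)
    mean_between = block_sum(R, a, b) / (len(a) * len(b))
    return k * k * mean_between / block_sum(R, a + b, a + b)

def similarity(R, a, b):
    return block_sum(R, a, b) / math.sqrt(block_sum(R, a, a) * block_sum(R, b, b))

def iclust_sketch(R):
    # Step 1: the proximity matrix is the correlation matrix R itself.
    # Assumption: single items start with alpha = beta = 0.
    clusters = [{"items": [i], "alpha": 0.0, "beta": 0.0} for i in range(len(R))]
    while True:
        # Step 2: order candidate pairs from most to least similar.
        pairs = sorted(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda p: similarity(R, clusters[p[0]]["items"], clusters[p[1]]["items"]),
            reverse=True,
        )
        for i, j in pairs:
            a, b = clusters[i], clusters[j]
            new_alpha = alpha(R, a["items"] + b["items"])
            new_beta = beta_of_merge(R, a["items"], b["items"])
            # Step 3: combine only if both alpha and beta increase.
            if new_alpha > max(a["alpha"], b["alpha"]) and new_beta > max(a["beta"], b["beta"]):
                merged = {"items": a["items"] + b["items"], "alpha": new_alpha, "beta": new_beta}
                clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
                break
        else:
            # Step 4: stop when no remaining pair meets the criterion.
            return sorted(sorted(c["items"]) for c in clusters)

# Hypothetical data: two blocks of three items (r = .6 within, .1 between).
R = [[1.0 if i == j else (0.6 if i // 3 == j // 3 else 0.1) for j in range(6)]
     for i in range(6)]
print(iclust_sketch(R))
```

The sketch omits several things a full implementation would need, such as reversing negatively correlated items and the options for a minimum cluster size or a fixed number of clusters.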
The program works under System 7.0 and comes in two versions: one for Mac+/SEs and one for 68030 machines such as the SE30 or Mac IIci. Although not PPC native, it will run on Power Macs. (Note that the version on the server is called VSS-Alpha-Iclust and is meant for Mac IIs, SE30s, and newer machines.)

Comments and additions to this set of brief notes are appreciated. Comments and suggestions for improvements to the program are also appreciated.

Source code is available to the interested user.


References

Revelle, W. (1978). ICLUST: A cluster analytic approach for exploratory and confirmatory scale construction. Behavior Research Methods & Instrumentation, 10, 739-742. (Mac version available)

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14, 57-74.

Revelle, W., & Rocklin, T. (1979). Very Simple Structure: an alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14, 403-414.


William Revelle
Department of Psychology
Northwestern University