Other programs are more appropriate for complex analyses such as factor analysis (SYSTAT) or structural equation modeling (EQS). Excel is recommended for complex data manipulation.
This program is semi-"bullet-proof" in that some error conditions will lead to a graceful exit with a brief explanation. However, unexpected errors will simply cause a system crash rather than a graceful request for help. Important files should be saved before using this program.
The purpose of this package of routines is to facilitate the construction of personality inventories for the typical problem encountered by most personality researchers: a set of items that are meant to measure a number of independent constructs. Although standard factor analytic techniques are available to solve these problems, factor analysis of items is problematic. VSS and ICLUST are designed for the practical problem of deciding how many factors/clusters to extract from a set of items. ALPHA scores scales and reports the standard item and scale statistics.
For help using these programs, email William Revelle at Northwestern University. The interested user may download a Mac version of VSS-Alpha-ICLUST.
Sample lines:
01 1 1211213414231411131332211312111144121111112112111433
01 2 1111411411331111111111112414114144141121124111311441
Two important points: every subject's data should end with a carriage return, and there should be no blank lines.
It is useful to enter the data in a fixed-pitch font (e.g., Monaco 9). That way, the columns will line up and it is possible to check the data as you enter them.
Note that the program is column-sensitive: blanks are treated as informative (each takes up one column). Free-field input (rather than string-oriented input) is available by checking the free-field option. If your data have more than a single digit per item, use this option and put spaces (or tabs) between your fields.
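The difference between the two input modes can be sketched in Python. This is an illustration, not the program's own code. It assumes that in column-sensitive mode each item is a single digit and a blank column marks a missing response, and that in free-field mode the first few fields hold the ID (`id_fields` is a hypothetical parameter).

```python
def parse_fixed(line, id_cols):
    """Column-sensitive mode: the first id_cols characters are the ID and
    every later column is one single-digit item. A blank (or other non-digit)
    column is kept as a missing value (None), so it still occupies its column."""
    line = line.rstrip("\r\n")
    subject_id = line[:id_cols]
    items = [int(ch) if ch in "0123456789" else None for ch in line[id_cols:]]
    return subject_id, items


def parse_free(line, id_fields):
    """Free-field mode: fields are separated by runs of spaces or tabs,
    so an item may have more than one digit. id_fields is a hypothetical
    count of leading ID fields."""
    fields = line.split()
    return fields[:id_fields], [float(f) for f in fields[id_fields:]]


# Usage: a column-mode line with one blank item, and a tab-delimited line
# such as a spreadsheet would save.
print(parse_fixed("01 1 12 3", 5))        # ('01 1 ', [1, 2, None, 3])
print(parse_free("017\t12\t3\t10", 1))    # (['017'], [12.0, 3.0, 10.0])
```

In column mode the blank is not skipped, which is why a stray space shifts every later item by one column.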
Note that the data files to be read should not include extraneous blank lines.
Data can also be entered as free fields. This would be the case, for example, if the data were entered in Excel or some other spreadsheet program and then saved as space- or tab-delimited text. Use the "free field" option in the options menu.
Save the file as a TEXT file (using SAVE AS).
Data checking can be done using the Compare file option, which compares two files line by line for identical input (e.g., when the data have been entered twice).
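A line-by-line comparison of two entries of the same data can be sketched as follows. This is a minimal Python illustration, not the Compare file routine itself, and the function names are hypothetical. Trailing carriage returns are ignored, and a line missing from the shorter file shows up as None.

```python
from itertools import zip_longest


def compare_lines(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for every pair of lines that differ.
    Line numbers are 1-based; a line missing from the shorter file appears as None."""
    differences = []
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        a = None if a is None else a.rstrip("\r\n")
        b = None if b is None else b.rstrip("\r\n")
        if a != b:
            differences.append((number, a, b))
    return differences


def compare_files(path_a, path_b):
    """Compare two text files, e.g. a first and a second entry of the same data."""
    with open(path_a) as file_a, open(path_b) as file_b:
        return compare_lines(file_a, file_b)


# Usage: the second subject's third item was keyed as 1 instead of 2.
first = ["01 1 1211\n", "01 2 1121\n"]
second = ["01 1 1211\n", "01 2 1111\n"]
print(compare_lines(first, second))   # [(2, '01 2 1121', '01 2 1111')]
```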
To run the scoring routine, select Score_Alpha from the FILE menu. A series of dialog boxes will follow.
In general, names for the following files are requested:
1) Results file name. This file will save a copy of all the output going to the screen. This includes means, standard deviations, and ranges for the items, interscale correlations, scale reliabilities, means, and variances, as well as item-scale correlations.
2) Input (item) file name. This is the file containing the items to be scored.
3) Scale file name. This is the file that will store the scores for each subject for each scale.
4) Item key file. This is a file containing the keying information for each scale. It is in the form of:
Scale title (any string of up to 255 characters describing what the scale is)
Key information. The number of items in the scale followed by the specific items. For example, to score a five-item scale (items 1, 2, -3, 4, and 5) meant to measure impulsivity, with item 3 keyed in the opposite direction, you would enter:
impulsivity scale (with five items, one scored negatively)
5 1 2 -3 4 5
Up to 30 scales can be scored at the same time. Sample keys are:
Norman's surgency
10 1 2 3 4 5 6 7 8 9 10
Norman's agreeableness
10 11 12 13 14 15 16 17 18 19 20
Norman's Conscientiousness
10 21 22 23 24 25 26 27 28 29 30
Norman's Emotional Stability
10 31 32 33 34 35 36 37 38 39 40
Norman's Culture
10 41 42 43 44 45 46 47 48 49 50
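A key file in this format could be read roughly as follows. This sketch assumes that title and key lines strictly alternate and that blank lines may be skipped; the `parse_keys` name and its error checks are illustrative, not part of the program.

```python
def parse_keys(text, max_scales=30):
    """Parse an item key file of alternating title and key lines.

    A key line holds the number of items followed by that many item numbers;
    a negative item number marks an item keyed in the opposite direction.
    Returns a list of (title, [item numbers]) pairs."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("every scale needs a title line and a key line")
    scales = []
    for title, key in zip(lines[0::2], lines[1::2]):
        numbers = [int(token) for token in key.split()]
        count, items = numbers[0], numbers[1:]
        if count != len(items):
            raise ValueError(f"{title}: expected {count} items, found {len(items)}")
        scales.append((title, items))
    if len(scales) > max_scales:
        raise ValueError(f"at most {max_scales} scales can be scored at once")
    return scales


keys = parse_keys("""impulsivity scale (with five items, one scored negatively)
5 1 2 -3 4 5
Norman's Culture
10 41 42 43 44 45 46 47 48 49 50
""")
for title, items in keys:
    print(title, items)
```

Checking the count against the number of items listed catches the most common keying slip, a count that no longer matches after an item is added or dropped.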
Each subject is assumed to have an identification field preceding the actual items. This field may be of any length and will be copied to the output file. You will be prompted for how many columns are devoted to the ID.
To allow for convenient keying of multiple inventories from the same data file, the first column for a set of items is requested as a parameter.
Thus, in a file containing 3 columns of identification, 36 columns of one questionnaire, and 50 columns of another questionnaire, the second questionnaire can be scored by specifying that its first item starts at column 40 (3 + 36 + 1). To score items that are written over several lines, select the multi-line option in the options menu. Note that this is only appropriate if carriage returns were entered within one subject's data.
Missing data are replaced with the mean for the item. This can lead to strange estimates if subjects have a great deal of missing data. A revision that reports the number of missing values per scale per subject is being developed. Options include having data spread across several lines (multi-lines), reverse scoring of the items (reverse items), and free-field input. Output from the scoring program is stored in a tab-delimited file compatible with Excel or Word for further processing (e.g., to sort item-scale correlations into rank order).
To just describe a set of data without finding scale scores, select DESCRIBE_ITEMS in the ALPHA menu. If you then want to find scores, select the Score option in the ALPHA menu. (Score_Alpha both describes and scores.)
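The column arithmetic in this example can be sketched as follows, assuming single-digit items and 1-based column numbers counted from the start of the line, ID columns included. The helper names are hypothetical.

```python
def first_item_column(id_columns, *preceding_widths):
    """1-based column where a questionnaire starts, given the ID width and the
    widths of any questionnaires that come before it on the line."""
    return id_columns + sum(preceding_widths) + 1


def questionnaire_items(line, first_column, n_items):
    """Read n_items single-digit items starting at 1-based column first_column.
    Blank columns become None (missing)."""
    start = first_column - 1                       # 1-based column -> 0-based index
    block = line.rstrip("\r\n")[start:start + n_items]
    if len(block) < n_items:
        raise ValueError("line is shorter than the requested block of items")
    return [int(ch) if ch in "0123456789" else None for ch in block]


# 3 ID columns, a 36-item questionnaire, then a 50-item questionnaire.
line = "007" + "1" * 36 + "2" * 50
start = first_item_column(3, 36)
print(start)                                              # 40
print(questionnaire_items(line, start, 50) == [2] * 50)   # True
```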
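Mean replacement and reverse keying can be sketched as below. Two details are assumptions rather than documented behavior: a reversed item is scored as (low + high) - x, where low and high are the smallest and largest possible responses, and a scale score is taken as the mean of its items (the program may report sums instead). Item means are computed from the observed responses before any reversal.

```python
def item_means(data):
    """Mean of each item across subjects, skipping missing (None) responses."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


def score_scales(data, scales, low, high):
    """Score every subject on every (title, keyed_items) scale.

    Missing responses are first replaced by the item mean; an item with a
    negative key is then reversed as (low + high) - x. Item numbers are 1-based.
    Assumes every item has at least one non-missing response."""
    means = item_means(data)
    scores = []
    for row in data:
        filled = [means[j] if x is None else x for j, x in enumerate(row)]
        subject_scores = []
        for _title, keyed_items in scales:
            values = []
            for k in keyed_items:
                x = filled[abs(k) - 1]
                values.append(low + high - x if k < 0 else x)
            subject_scores.append(sum(values) / len(values))  # scale mean
        scores.append(subject_scores)
    return scores


# Toy data: three subjects, three items on a 1-4 scale, two missing responses.
data = [[1, 4, 2], [3, None, 4], [2, 2, None]]
print(score_scales(data, [("demo", [1, -2, 3])], low=1, high=4))
```

The toy example also shows why heavy missingness is a problem: a subject missing most items ends up with a score close to the sample's average, whatever that subject's actual standing.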
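A rough equivalent of describing items without scoring them is sketched below. It assumes missing responses are simply skipped and that the sample (n - 1) standard deviation is reported; both are assumptions about the program's output.

```python
import statistics


def describe_items(data):
    """Per-item n, mean, standard deviation (n - 1 denominator), minimum and
    maximum, skipping missing (None) responses."""
    summary = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        summary.append({
            "item": j + 1,
            "n": len(observed),
            "mean": statistics.mean(observed),
            "sd": statistics.stdev(observed) if len(observed) > 1 else 0.0,
            "min": min(observed),
            "max": max(observed),
        })
    return summary


for row in describe_items([[1, 4], [3, None], [2, 2]]):
    print(row)
```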
Utility programs allow you to:
Also note that if item-scale correlations are corrected for scale unreliability, poor scales will sometimes produce item-scale correlations greater than 1.0.
The VSS criterion is calculated from the inter-item correlation matrix and a set of factor pattern matrices. The input procedures are similar to those for Score_Alpha (i.e., the basic data files and data structures are the same).
To find the VSS criterion, it is first necessary to run Systat or another comparably powerful statistics program to generate a factor pattern matrix. This matrix should be stored in an editor file (e.g., a Word file saved as text). Several different Systat factor output files can be combined into one file, in the following format:
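Why corrected values can exceed 1.0 is easiest to see numerically. The sketch below computes coefficient alpha for complete data and applies one common form of the correction, dividing the item-scale correlation by the square root of the scale's reliability. Whether the program uses exactly this formula is an assumption, and the function names are hypothetical.

```python
import math
import statistics


def cronbach_alpha(data):
    """Coefficient alpha for complete data: rows are subjects, columns are items."""
    k = len(data[0])
    item_vars = [statistics.variance([row[j] for row in data]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in data])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_scale_r(r_item_scale, scale_reliability):
    """Correct an item-scale correlation for scale unreliability only.
    Nothing bounds the result at 1.0, so a low reliability can push it above 1."""
    return r_item_scale / math.sqrt(scale_reliability)


data = [[1, 2], [2, 1], [3, 3]]
print(cronbach_alpha(data))               # about 0.667
print(corrected_item_scale_r(0.6, 0.3))   # above 1.0 for a poor scale
```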
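For readers who want to see what is being computed, here is a sketch of the VSS criterion as we understand it from Revelle and Rocklin (1979): each item's pattern row is simplified to its c largest absolute loadings (c being the complexity), the simplified matrix reproduces the correlations, and the fit is one minus the ratio of squared residuals to squared correlations. Excluding the diagonal is an assumption here, as are the function names.

```python
def simplify(pattern, complexity=1):
    """Keep each item's `complexity` largest absolute loadings and zero the rest."""
    simple = []
    for row in pattern:
        order = sorted(range(len(row)), key=lambda f: abs(row[f]), reverse=True)
        keep = set(order[:complexity])
        simple.append([row[f] if f in keep else 0.0 for f in range(len(row))])
    return simple


def vss(r, pattern, complexity=1):
    """1 - (sum of squared residuals) / (sum of squared correlations),
    summed over the off-diagonal elements of the correlation matrix r."""
    s = simplify(pattern, complexity)
    resid_ss = total_ss = 0.0
    for i in range(len(r)):
        for j in range(len(r)):
            if i == j:
                continue                    # diagonal excluded (assumption)
            implied = sum(a * b for a, b in zip(s[i], s[j]))
            resid_ss += (r[i][j] - implied) ** 2
            total_ss += r[i][j] ** 2
    return 1.0 - resid_ss / total_ss


# Toy two-factor pattern with small cross-loadings; r is built from it exactly.
p = [[0.7, 0.3], [0.6, 0.2], [0.2, 0.7], [0.1, 0.6]]
r = [[1.0 if i == j else sum(a * b for a, b in zip(p[i], p[j])) for j in range(4)]
     for i in range(4)]
print(vss(r, p, complexity=1), vss(r, p, complexity=2))
```

Because the simplified pattern ignores the cross-loadings, complexity 1 fits this toy matrix less well than complexity 2, which reproduces it exactly.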
COL1 0.538 -0.196 -0.403 0.354
COL2 -0.145 -0.050 0.548 -0.333
COL3 -0.098 0.168 0.014 -0.835
COL4 0.191 0.216 -0.351 -0.148
COL5 0.645 -0.127 -0.284 0.292
....
COL35 0.106 0.617 -0.317 -0.182
COL36 0.726 0.186 -0.046 0.019
COL1 0.644 -0.148 -0.407
COL2 -0.299 -0.020 0.571
COL3 -0.534 0.380 0.116
...
COL34 -0.491
COL35 0.100
COL36 -0.495
(This is taken directly from Systat, using Word as the text editor; each output file was edited down until just the factor patterns were left.)
VSS will first prompt for the Items file. This is the text file containing the items to be correlated and to which the VSS criterion is applied. Note that it is also possible to save the correlation matrix as a text file from Systat and base the analyses on this correlation matrix. In this case, specify matrix input in the options menu.
It will also prompt for the output file (see Score_Alpha).
Finally, it will prompt for the factor pattern file.
ALL FILES should be saved in TEXT mode.
The program will read the first set of factor patterns from the file, after prompting for how many factors to read. VSS will continue reading more patterns until told to stop by entering 0 factors. This allows for a comparison of different solutions and/or different rotations.
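This reading loop can be sketched as follows, assuming the pattern file is read as a stream of whitespace-separated tokens (so rows may run together on one line) and that each row is a label followed by one loading per factor. The `factor_counts` argument stands in for the answers typed at the prompt, the sample loadings are toy values, and all names are hypothetical.

```python
def read_patterns(text, n_items, factor_counts):
    """Read successive factor patterns from one text file.

    The file is treated as a stream of whitespace-separated tokens; each row
    is a label (e.g. COL1) followed by one loading per factor. factor_counts
    stands in for the answers given at the 'how many factors' prompt, and a
    0 stops reading."""
    tokens = iter(text.split())

    def take():
        token = next(tokens, None)
        if token is None:
            raise ValueError("pattern file ended before the pattern was complete")
        return token

    patterns = []
    for n_factors in factor_counts:
        if n_factors == 0:
            break
        pattern = []
        for _ in range(n_items):
            take()                                   # skip the row label, e.g. COL1
            pattern.append([float(take()) for _ in range(n_factors)])
        patterns.append(pattern)
    return patterns


sample = "COL1 0.5 0.1 COL2 0.4 0.2  COL1 0.6 COL2 0.3"
print(read_patterns(sample, n_items=2, factor_counts=[2, 1, 0]))
# [[[0.5, 0.1], [0.4, 0.2]], [[0.6], [0.3]]]
```

Because the number of items is fixed by the data file, the only thing the user needs to supply for each pattern is its number of factors; a wrong count shows up either as a parse error or as an early end of file.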
The algorithm is the conventional hierarchical algorithm, with the addition of a stopping rule:
Comments on and additions to this set of brief notes are appreciated, as are suggestions for improving the program.
Source code is available to the interested user.
Revelle, W. (1978). ICLUST: A cluster analytic approach for exploratory and confirmatory scale construction. Behavior Research Methods & Instrumentation, 10, 739-742. (Mac version available)
Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14, 57-74.
Revelle, W., & Rocklin, T. (1979). Very Simple Structure: An alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14, 403-414.