Category Archives: Measurement
How Is Your Manager Doing at Performance Management? (Human Resource Management)
Performance management (PM) is the natural reaction to and extension of the hit-or-miss traditional performance appraisal. Instead of being a one-and-done event, PM is a set of behaviors that managers exhibit daily to identify, motivate, and develop their subordinates’ performance. Due to its effectiveness, putting a finger on the “right” PM system has recently become a popular pursuit for many organizations. Unfortunately, this is often difficult for organizations as the literature has struggled to define and measure managerial PM behaviors.
In an effort to better understand the PM construct, Kinicki and colleagues recently developed a reliable and valid measure of the managerial behaviors that make up effective PM. After a lengthy development process, their final scale included 27 items and measured the following six behavioral areas:
- Goal setting
- Communication
- Feedback
- Coaching
- Providing consequences
- Establishing/monitoring performance expectations
The scale overall displayed content, construct, and criterion-related validity. It also had the added benefit of accounting for unique criterion variance over existing leadership measures, supporting the idea that leadership and PM do not completely overlap conceptually.
Of course, having a reliable and valid measure is important for further research efforts on PM, but perhaps the greatest benefit lies in the measure’s practical implications. Specifically, using this measure dovetails with other recent calls in the literature to hold managers accountable for PM, and to train them on proper execution of PM, instead of revamping the entire PM system when it doesn’t function like it should. Ultimately, having a manager-focused PM system, equipped with a viable means for evaluating managers’ PM success, will save organizations money, time, and hassle in getting the best out of their employees.
A new rating scale for multisource feedback (IO Psychology)
Topic: Feedback, Job Performance, Measurement
Publication: Personnel Psychology (AUTUMN 2012)
Article: Evidence for the effectiveness of an alternative multisource performance rating methodology
Authors: B. J. Hoffman, C. A. Gorman, C. A. Blair, J. P. Meriac, B. Overstreet, & E. K. Atchley
Reviewed by: Alexandra Rechlin
Do you receive multisource feedback (also called 360-degree feedback) at work? Based on its extreme popularity, my guess is that you do. An important question, therefore, is how to make the ratings more accurate and thus more informative for development. Brian Hoffman and his colleagues recently conducted two studies in which they developed and evaluated the efficacy of a new type of scale, called frame-of-reference scales (FORS), to use in multisource feedback systems.
FORS start with a definition of the dimension being rated, as well as examples of both effective and ineffective behavior for that dimension. The actual items are then presented. FORS differ from normal rating scales in that a definition and behavioral examples are provided. In this way, FORS are similar to frame-of-reference training, except that the information is presented in written format along with the items (as opposed to being provided in a training program). FORS are different from behaviorally anchored rating scales (BARS) in that BARS link specific behaviors with each scale point (whereas FORS provide examples of effective and ineffective behavior).
The authors compared FORS with standard rating scales (which were similar to behavioral observation scales) and found that using FORS increased accuracy and led to greater differentiation among dimensions. When compared to frame-of-reference training, using FORS led to similar levels of accuracy. FORS therefore seem to be an important development in the improvement of multisource feedback ratings; FORS lead to increased accuracy and are not as expensive or time-consuming as BARS and frame-of-reference training.
Integrity Tests May Have Lower Performance Validity (IO Psychology)
Topic: Selection, Measurement
Publication: Journal of Applied Psychology
Article: The Criterion-Related Validity of Integrity Tests: An Updated Meta-Analysis
Authors: Van Iddekinge, C.H., Roth, P.L., Raymark, P.H., & Odle-Dusseau, H.N.
Reviewer: Neil Morelli
According to a recent meta-analysis by Van Iddekinge and colleagues, integrity tests may not be as predictive of job performance as once thought. Integrity tests have become popular with organizations and practitioners due to their high correlations with job performance and few differences between groups (based on race, gender, etc.). However, Van Iddekinge et al. were concerned that past meta-analytic results drew too heavily on unpublished studies authored by test publishers. In fact, only 10% of one meta-analysis’s sample was made up of studies published in peer-reviewed journals (pro-tip: we like things that are peer reviewed).
The authors used 104 studies (42 were published and 62 were unpublished) to investigate if including more “neutral” primary studies whose methodology has been more rigorously vetted would change the test’s validity. They reported that the overall job performance validity is .13 to .16 depending on whether it’s corrected for unreliability. In other words, this validity coefficient is much lower than originally reported and indicates that the integrity test is not as predictive of job performance as once thought. Although the test has a higher validity coefficient for counterproductive work behaviors (.26 to .32), this is still lower than originally reported in previous meta-analyses.
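To make the "corrected for unreliability" step concrete, here is a minimal Python sketch of the classic correction for attenuation. It assumes the correction was applied only for unreliability in the job performance criterion, which the summary above does not spell out, and the function names are ours rather than the authors'. Under that assumption, the two reported values (.13 uncorrected, .16 corrected) imply a criterion reliability of roughly .66; the counterproductive work behavior pair (.26 and .32) has the same ratio.

```python
from math import sqrt


def correct_for_criterion_unreliability(r_observed, criterion_reliability):
    """Classic disattenuation: divide the observed validity by the
    square root of the criterion's reliability (assumed in (0, 1])."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / sqrt(criterion_reliability)


def implied_criterion_reliability(r_observed, r_corrected):
    """Back out the reliability implied by an observed/corrected pair,
    ASSUMING criterion unreliability was the only correction applied."""
    return (r_observed / r_corrected) ** 2


# Values reported in the summary: .13 (uncorrected) and .16 (corrected).
implied = implied_criterion_reliability(0.13, 0.16)
print(f"Implied criterion reliability: {implied:.2f}")
print(f"Re-corrected validity: {correct_for_criterion_unreliability(0.13, implied):.2f}")
```

The takeaway is that the corrected figure is only as trustworthy as the reliability estimate that goes into the denominator.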
So what does this all mean? On the one hand, integrity tests are still viable options for predicting counterproductive work behaviors while maintaining low sub-group differences. On the other hand, integrity tests’ predictive validity is weaker than previously thought and practitioners may not be able to rely on meta-analytic results in lieu of a local validation study. The authors pointed out that one could argue test publisher data is overly optimistic while data from independent researchers is overly pessimistic. Regardless of your position, the authors suggest that practitioners should consider the source when reporting integrity test validity and researchers may need to develop more primary studies on the standard integrity test’s true capability to predict future job performance.
Van Iddekinge, C.H., Roth, P.L., Raymark, P.H., & Odle-Dusseau, H.N. (2012). The criterion-related validity of integrity tests: An updated meta-analysis. Journal of Applied Psychology, 97, 499-530.
Predicting Job Performance with Implicit Words Games?
Topic: Personality, Measurement, Job Performance
Publication: Personnel Psychology (SPRING 2010)
Article: We (sometimes) know not how we feel: Predicting job performance with an implicit measure of trait affectivity
Authors: R.E. Johnson, A.L. Tolentino, O.B. Rodopman, and E. Cho
Reviewed By: Benjamin Granger
In the world of emotions, trait affect refers to the predisposition some people have to generally experience positive or negative emotions.
Trait affect is often broken up into Negative Affect (NA) and Positive Affect (PA). While high levels of NA are associated with negative emotions such as fear and anxiety, high levels of PA are associated with positive emotions such as excitement and joy. It should not come as a surprise that PA tends to relate favorably to work performance whereas the opposite is true for NA.
Recently, Johnson, Tolentino, Rodopman, and Cho (2010) suggested that because trait affect (e.g., PA & NA) operates outside of employees’ conscious awareness, it is more appropriate to measure it at the unconscious or implicit level. This is in stark contrast to the self-report, explicit measurement of trait affect that is typically used when explicitly asking people to rate the extent to which they feel certain emotions across many different situations.
But how in the heck would you measure trait affect implicitly? Johnson et al. used a word completion task that presented employees with word fragments, which they were required to complete to create a meaningful English word. The following are actual examples of word fragments used by Johnson and colleagues:
F E _ _ (NA = FEAR, or neutral = FEEL, FEED)
S M _ _ _ (PA = SMILE, or neutral = SMART, SMOKE)
A person’s levels of trait NA and PA were determined by the relative number of word fragments that he or she completed with NA-related and PA-related words, respectively. But don’t worry if you are a bit skeptical; this is not exactly your everyday personnel survey!
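For readers who like to see the mechanics, here is a rough Python sketch of how such a word fragment task might be scored. The scoring key holds only the two example fragments quoted above, and scoring NA and PA as the proportion of completed fragments is our assumption, not necessarily the exact formula Johnson and colleagues used. The names `FRAGMENT_KEY` and `score_implicit_affect` are hypothetical.

```python
# Hypothetical scoring key built from the two example fragments in the post.
FRAGMENT_KEY = {
    "FE__": {"NA": {"FEAR"}, "PA": set()},
    "SM___": {"NA": set(), "PA": {"SMILE"}},
}


def score_implicit_affect(responses):
    """Return the share of completed fragments filled in with NA- or
    PA-related words. `responses` maps a fragment to the word typed.

    Assumption: scores are simple proportions of completed fragments.
    """
    completed = {f: w.strip().upper() for f, w in responses.items() if w.strip()}
    if not completed:
        return {"NA": 0.0, "PA": 0.0}
    na_hits = sum(word in FRAGMENT_KEY[f]["NA"] for f, word in completed.items())
    pa_hits = sum(word in FRAGMENT_KEY[f]["PA"] for f, word in completed.items())
    return {"NA": na_hits / len(completed), "PA": pa_hits / len(completed)}


# One employee completes FE__ as "fear" and SM___ as the neutral "smart".
print(score_implicit_affect({"FE__": "fear", "SM___": "smart"}))
```

A real instrument would of course use many more fragments than two, so that the proportions are stable.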
Nevertheless, Johnson and colleagues conducted two independent pilot studies that supported the validity of their word fragment approach. Ultimately, Johnson and colleagues demonstrated that implicit measures of trait affect are important predictors of task performance, organizational citizenship behaviors (OCBs) and counterproductive work behaviors (CWBs), even more so than the conscious/explicit measures that we are more accustomed to. Johnson et al.’s study highlights an interesting way to measure employees’ predispositions to experience positive and negative emotions.
Moreover, while employees can easily misrepresent themselves on explicit personality measures, this is likely not possible for implicit measures.
Johnson, R.E., Tolentino, A.L., Rodopman, O.B., & Cho, E. (2010). We (sometimes) know not how we feel: Predicting job performance with an implicit measure of trait affectivity. Personnel Psychology, 63 (1), 197-219.
Making the Most Out of Multiple-Choice Testing
Topic: Measurement
Publication: International Journal of Selection and Assessment
Article: On minimizing guessing effects on multiple-choice items: Superiority of a two solutions and three distractors item format to a one solution and five distractors item format
Authors: K.D. Kubinger, S. Holocher-Ertl, M. Reif, C. Hohensinn, and M. Frebort
Reviewed By: Benjamin Granger
In addition to being popular among test takers, the multiple-choice format test is nearly ubiquitous in employee selection and assessment contexts and offers many advantages (e.g., easily quantified, easily scored, etc.) to organizations.
The most common multiple-choice item format includes a single correct answer and several (perhaps 3 or 4) wrong answers or “distractors.” However, this format leaves the door open to what we may call the “guessing effect.” The basic idea is that, in theory, a person with absolutely no knowledge of the content area can endorse some items correctly by luck (i.e., guessing correctly). In fact, many standardized multiple-choice tests have instruction books that discuss guessing strategies (e.g., ACT, GRE).
Acknowledging the utility of the format itself, Kubinger and colleagues (2010) explored a multiple-choice format with two correct answers as opposed to the single correct (or best) answer that is most commonly used. In order to correctly answer such an item, test takers must endorse BOTH correct answers and cannot endorse any of the distractors. Needless to say, this manipulation makes multiple-choice items substantially more difficult, which is indeed what the authors found. In fact, the difficulty of this format was comparable to that of a free response format test of the same content (i.e., math).
However, compared to the traditional multiple-choice format with a single correct answer and five distractors, the two correct answer format drastically reduced the “guessing effect.”
Kubinger et al.’s study presents an interesting alternative to the multiple-choice response formats that we are accustomed to. Although they are significantly more difficult, items that require recognition of two correct answers among three distractors can dramatically reduce the occurrence of lucky guesses that can potentially impact important employment decisions.
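A quick back-of-the-envelope calculation shows why the two-solution format is harder to guess. The Python sketch below assumes completely blind guessing; it also considers two scenarios for the two-solution format (the test taker knows exactly two options are correct, or may tick any combination of options), neither of which is specified in this summary. The function names are ours.

```python
from math import comb


def p_guess_single_answer(n_options):
    """Chance of blindly picking the one correct option."""
    return 1 / n_options


def p_guess_known_count(n_options, n_correct):
    """Chance of blindly picking exactly the right subset when the test
    taker knows how many options are correct (an assumption)."""
    return 1 / comb(n_options, n_correct)


def p_guess_any_subset(n_options):
    """Chance of blindly ticking exactly the right subset when any
    combination of options may be ticked (also an assumption)."""
    return 1 / 2 ** n_options


# One solution + five distractors = 6 options.
print(f"1 of 6:              {p_guess_single_answer(6):.3f}")
# Two solutions + three distractors = 5 options.
print(f"2 of 5, count known: {p_guess_known_count(5, 2):.3f}")
print(f"2 of 5, any subset:  {p_guess_any_subset(5):.3f}")
```

Even with fewer options on the page, requiring both correct answers cuts the chance of a lucky guess from about 17% to 10% or lower under these assumptions.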
Kubinger, K.D., Holocher-Ertl, S., Reif, M., Hohensinn, C., & Frebort, M. (2010). On minimizing guessing effects on multiple-choice items: Superiority of a two solutions and three distractors item format to a one solution and five distractors item format. International Journal of Selection and Assessment, 18(1), 111-115.
The Muddy Waters of Measuring Executive Coaching
Topic: Coaching, Measurement, Training
Publication: Consulting Psychology Journal (JUN 2009)
Article: Measuring and Maximizing the Business Impact of Executive Coaching
Author: A. Levenson
Reviewed by: Lit Digger
Given the amount of money organizations invest in executive coaching programs, it would be refreshing if someone could come up with a reliable and fool-proof way to measure their effectiveness.
Organizations are complex entities, so developing a measurement tool like this would be a notable challenge. Levenson (2009) explored a dozen coach-coachee pairs to contribute to this ongoing conversation and shed some light on this measurement puzzle. Given the constraints of the study, Levenson cautioned that we should interpret his findings with care.
To recap, studies already exist measuring coaching’s effect on:
· The executive’s actual changes in behavior
· The degree to which those around the executive perceive increased effectiveness of the executive
· Changes in what Levenson calls “hard” performance measures (e.g., unit productivity, number of tasks completed, ability to meet goals, etc.)
But how can we measure the business impact of executive coaching? Levenson suggests that we should “start with the organization’s strategy” (p.110). He recommends that we determine whether the business impact we care to measure most is strategic or financial. For example, if a company has a strategic aim to increase sales to a certain demographic group, then the outcome should be designed to target that strategy – not a more distal, less-related financial goal.
Levenson also warns that we should consider the complexity of the executive’s job in relation to the functioning of the organization. Take the above sales example for instance. If the executive’s primary role is to make decisions and cultivate a productive working environment, then he/she may not actually have all that much impact on increasing sales to the target demographic group. It would be difficult to evaluate the business impact of coaching if the executive’s role has little business impact to begin with.
Levenson reminds us that if other needed training programs or selection systems are being implemented around the time that executive coaching takes place, then you will be much more likely to see organizational changes in the direction desired. Systemic changes often will have more business impact than executive coaching alone.
Finally, is executive coaching always the answer to our organizational problems? No! Levenson cautions that the intervention needed will depend on the issue at hand. An executive might be better off gaining critical skills from a stretch assignment if the key issue is professional development. Or if team performance is slacking, perhaps a team building activity would be best.
You’re more likely to see bang for your buck if the interventions you select are targeted appropriately. Now we just need to figure out how to effectively measure the “bang”.
Book Review
Topic: Book Reviews, Strategic HR, Measurement
Book Title: Investing in what matters: linking employees to business outcomes
Authors: Scott Mondore, Ph.D. and Shane Douthitt, Ph.D.
In SHRM’s recently published book, “Investing in What Matters,” Scott Mondore, Ph.D. and Shane Douthitt, Ph.D. offer a process to understand the links between HR strategy and business outcomes. Below is Dr. Mondore’s overview of the book.
Organizations collect vast amounts of data, from operations to people, but rarely do they bring these data together to discover how they relate to each other. In addition, current economic conditions are demanding deep budget cuts, leaving HR departments with few tools to figure out where to cut and where to invest. “Investing in What Matters” provides HR leaders with a straightforward six-step process that they can implement immediately to create an HR strategy that is business-focused and based on expected ROI. With these steps, HR leaders will learn how to discover key business outcomes, show the link between HR data (training, surveys, competencies, etc.) and those business outcomes, execute programs that have an expected ROI, and create a culture of measurement, analysis, and adjustment going forward.
The six steps in the Business Partner RoadMap process are:
1. Determine Critical Outcomes (conduct stakeholder interviews with senior/functional leaders, examine the organization’s scorecards)
2. Create a Cross-Functional Data Team (bring together data owners from across the organization to set up the analyses)
3. Assess Outcomes Measures (make sure the data being looked at is measured at the same frequency and level within the organization)
4. Analyze Data (use advanced statistics to show cause-and-effect relationships between HR data and business outcomes)
5. Build Programs & Execute (create initiatives around the drivers of business outcomes, based on expected ROI calculations)
6. Measure and Adjust (re-analyze data on a regular basis to discover new drivers of business outcomes or tweak current programs)
In addition, the book provides ten key principles for HR leaders to adopt during this process as it is not always easy and it needs to stay completely focused on business outcomes:
1. Organizations already spend significant amounts of money on their people; they just don’t spend it on the right things.
2. Organizations make investments in people without any data or with the wrong data.
3. Employee engagement in itself is not a business outcome.
4. People and organizations are complex. The linkages between attitudes and outcomes have to be understood within your organization using your data.
5. The people data and outcome data do exist; you just have to go and get them.
6. The organization’s data exist in silos.
7. There will be obstacles and barriers to obtaining the data (e.g., politics, turf battles).
8. Once a connection/linkage is made with the data, accountability is unavoidable (and that’s a good thing).
9. Don’t assume a link between employee data and business outcomes; define it and understand why or why not.
10. Perceptions alone do not show up on the profit and loss statement.
Mondore, S.P. & Douthitt, S.S. (2009). Investing in what matters: Linking employees to business outcomes. Society for Human Resource Management, Alexandria, Virginia.
Click to learn more: http://shrm.org/Publications/Books/Pages/InvestinginWhatMatters.aspx
Internet-based Data Collection: Just Do It Already!
Topic: Measurement, Statistics
Publication: Computers in Human Behavior
Article: From paper to pixels: A comparison of paper and computer formats in psychological assessment.
Authors: M.J. Naus, L.M. Philipp, M. Samsi
Featured by: Benjamin Granger
Although many organizations have jumped onto the internet-data collection bandwagon, several issues still need to be addressed. For example, are paper-pencil and internet-based tests of the same trait (e.g., personality questionnaire) or ability (e.g., cognitive ability test) really equivalent? Similarly, are there any reasons to believe that employees respond to internet-based tests differently than they would a paper-pencil test of the same trait or ability?
Naus, Philipp, and Samsi (2008) set out to investigate these questions using three commonly used psychological scales (Beck Depression Inventory, Short Form Health Survey, and the NEO Five-Factor Inventory).
Although Naus et al. found that the paper-pencil and internet-based survey formats performed equivalently for the Beck Depression Inventory and the Short Form Health Survey, there were differences for the NEO Five-Factor Inventory (a commonly used personality assessment tool). What’s going on here?
One possibility is that responses were more socially desirable for the paper-pencil format, since a researcher was present at the time. That is, in the presence of an authority figure (i.e., researcher) participants may have responded in order to appear more self-controlled and self-focused. This is likely much less of a concern when completing the same survey on a computer at home (in PJs!).
Overall, respondents perceived the internet-based format to be convenient, user-friendly, comfortable and secure (All great things!). So what can we conclude about these findings? Although internet-based data collection methods have some advantages over paper-pencil methods, there are some caveats to their use. In some cases, the tests may operate differently due to the particular format. Unfortunately, not much is known about how they might differ. However, Naus et al.’s findings suggest internet-based methods receive good reactions from employees and can save an organization time and money!
Is interrater correlation really a proper measurement of reliability?
Topic: Measurement, Research Methodology, Statistics
Publication: Human Performance
Article: Exploring the relationship between interrater correlations and validity of peer ratings
Blogger: Rob Stilson
Interrater reliability (still with me? OK, good) is often used as the main reliability estimate for the correction of validity coefficients when the criterion is job performance. Issues arise with this practice when one considers that the errors present between raters may not be random, but due to bias, while agreement between raters may also stem from bias instead of actual consistency. In this study, the authors’ main goal was to explore the relationship between interrater correlations and validity and also to explore the relationship between the number of raters and validity.
In order to do this, the authors gathered information from 3072 Israeli policemen from 281 work teams who took part in peer rating. These work teams averaged about 12 people and ranged in size from 5 all the way to 33. The measure used was overall performance (on a 7-point Likert scale). The predictor employed in this study was the ICC(C,k) model, which is equivalent to Cronbach’s alpha. Measurement indices were computed at the team level, as rating only took place within work teams.
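If the claim that ICC(C,k) is equivalent to Cronbach’s alpha sounds like hand-waving, the short Python sketch below computes both on a small made-up team and shows that they match. The ratings are hypothetical 7-point scores (rows are the people being rated, columns are raters), the layout is simplified to a complete ratee-by-rater table, and none of it comes from the study’s data.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Alpha with raters treated as 'items'. Rows = ratees, columns = raters."""
    k = len(ratings[0])
    sum_rater_variances = sum(variance(col) for col in zip(*ratings))
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum_rater_variances / total_variance)


def icc_c_k(ratings):
    """Consistency ICC for the average of k raters, from two-way ANOVA
    mean squares: (MS_rows - MS_error) / MS_rows."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in ratings)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in zip(*ratings))
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# Hypothetical team: 5 ratees, each scored by the same 3 raters (7-point scale).
team_ratings = [
    [5, 6, 5],
    [3, 4, 4],
    [6, 7, 6],
    [2, 3, 2],
    [4, 4, 5],
]
print(f"Cronbach's alpha: {cronbach_alpha(team_ratings):.4f}")
print(f"ICC(C,k):         {icc_c_k(team_ratings):.4f}")
```

Both functions return the same number because the two formulas are algebraically identical; only the bookkeeping differs.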
The predicted variable for the study was the validity coefficient for each work team. This is the part of the study where you could really feel the sweat involved. Here the authors gathered information on supervisor evaluations, absenteeism data, and discipline data collected over several years (for over 3000 policemen)! The authors then converted this information into z scores with higher scores indicating better performance.
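As a small illustration of that last step, the Python sketch below standardizes a criterion and flips the sign when higher raw values mean worse performance (as with absences), so that higher z scores always indicate better performance. The numbers are invented, and the use of the sample standard deviation is our assumption; the summary does not say exactly how the authors standardized.

```python
from statistics import mean, stdev


def to_z_scores(values, higher_is_better=True):
    """Standardize values; reverse the sign when higher raw values mean
    worse performance, so higher z always means better performance."""
    center, spread = mean(values), stdev(values)
    sign = 1 if higher_is_better else -1
    return [sign * (v - center) / spread for v in values]


# Hypothetical days absent for three officers: fewer absences = better.
absence_z = to_z_scores([0, 2, 4], higher_is_better=False)
print(absence_z)
```

The officer with no absences ends up with the highest z score, which puts absenteeism on the same footing as supervisor evaluations.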
Results showed a weak positive linear relationship between interrater correlations and the various validity indexes. This is not what you want to hear if you are doing peer rated performance evaluations. The authors posit that the correlation between raters is a conglomeration of factors having different theoretical relationships with validity (i.e., bias and other idiosyncrasies).
Practical implications from the information gleaned here include the adjustment of validity due to attenuation. If the reliability estimates used in the calculation include nonrandom error, the ensuing corrections will be off. A positive finding for the work world was that validity in small units (fewer than 10 people) was about the same as that in larger units. The authors believe this finding may be due to the opportunity to observe, which is seemingly greater in smaller work units.