Measurement
1. Operationalization
-
Variable
- operationalized construct
- still somewhat abstract
-
Operationalization
-
Specific, concrete method to measure/manipulate a construct.
-
Operationalization means the selection or creation of a specific procedure to measure or manipulate the construct of interest.
-
An operationalization makes it possible to assign people an actual score on the variable of interest.
-
An operationalization doesn’t necessarily capture or represent the construct in its entirety.
-
Keep in mind what aspect of the construct the operationalization actually measures or manipulates.
-
2. Measurement
-
Measurement Structure
- What information is (not) captured with numbers?
-
Definition
- measurement is the representation of relations between objects, persons or groups on a property by relations between numbers.
- the relation:
- differentiate
- order
- compare differences
- compare ratios
-
Measurement levels
-
Nominal (subjective classification)
- categorize the values
- distinguish between values(inequality)
- only differentiate values
- eg.: nationality, sex, pet preference
-
Ordinal (subjective ordering)
- ordering of values
- **a difference in order does not determine the quantitative difference.**
- eg.: math ability
- differences or ratios of scores don’t reflect differences or ratios of math ability
-
Interval variable (subjective quantification)
- not only distinguish and order values, but also interpret differences between values.
- eg.: temperature
- we cannot say water at 80F is twice as hot as water at 40F, because the zero point of temperature is arbitrarily defined.
- Celsius 0 = fresh water freezes vs. Fahrenheit 0 = brine freezes
- The value zero doesn’t correspond to the absence of temperature.
-
Ratio variable (objective quantification)
- The zero length is the same whether you measure in inches or in centimeters.
- rare in social science
- the structure of a property doesn’t have to be fully captured by a measurement instrument.
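A minimal Python sketch of the interval vs. ratio distinction above. The helper functions and the example values (80F/40F, 20in/10in) are illustrative assumptions. It shows that a ratio of scores survives a change of unit on a ratio scale but not on an interval scale, while a ratio of differences survives on both.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9


def inches_to_cm(inches):
    """Convert a length from inches to centimeters."""
    return inches * 2.54


# Interval scale: the ratio of two scores depends on the unit (arbitrary zero).
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: the ratio of two scores is the same in every unit (true zero).
ratio_in = 20 / 10
ratio_cm = inches_to_cm(20) / inches_to_cm(10)

# Differences are interpretable on an interval scale:
# the ratio of two differences does not depend on the unit.
diff_ratio_f = (80 - 40) / (60 - 40)
diff_ratio_c = (fahrenheit_to_celsius(80) - fahrenheit_to_celsius(40)) / (
    fahrenheit_to_celsius(60) - fahrenheit_to_celsius(40)
)

print(ratio_f, round(ratio_c, 2))            # ratios of scores disagree
print(ratio_in, round(ratio_cm, 2))          # ratios of scores agree
print(diff_ratio_f, round(diff_ratio_c, 2))  # ratios of differences agree
```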
-
3. Variable Types
-
Categorical
-
unordered(nominal)/ordered(ordinal)
-
number of categories
-
dichotomous/binary: 2
- male/female
- under 20 / 20 and above
-
polytomous: more than 2
-
-
distance between values uninterpretable
-
Interpretation: frequency; mode; median (ordered categories only)
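A small sketch of these summaries for a categorical variable, in Python. The category labels, their assumed order (`LEVELS`) and the responses are made up for illustration. The median is taken by rank and therefore only makes sense because the categories are ordered.

```python
from collections import Counter

# Hypothetical ordered categories, from lowest to highest.
LEVELS = ["low", "medium", "high"]
responses = ["low", "high", "medium", "medium", "high", "medium", "low"]

# Frequency: how often each category occurs.
frequencies = Counter(responses)

# Mode: the most frequent category (meaningful for nominal and ordinal data).
mode = frequencies.most_common(1)[0][0]

# Median: sort by rank and take the middle observation (odd number of cases).
ranked = sorted(responses, key=LEVELS.index)
median = ranked[len(ranked) // 2]

print(dict(frequencies), mode, median)
```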
-
-
Quantitative
- reflects inequality, order and extent of differences
- interval/ratio
-
Continuous/Discrete variable
-
Continuous
- can always find a value between any two other values
-
Discrete
- limited set of values
- nominal/ordinal
- always discrete by nature
- quantitative
- can also be discrete
-
4. Measurement Validity
-
Face validity
- expert assessment
- experts can be wrong
-
Predictive/Criterion validity
- Instrument predicts relevant property
- Something is measured consistently
- not necessarily intended construct
- the ability to predict something doesn’t mean the scores used for prediction accurately reflect the intended construct.
-
Gold standard
- a valid instrument for the property of interest already exists.
- administer both instruments and see whether the scores on the new scale agree with those on the already validated scale.
- there aren’t many gold standard instruments for social and psychological constructs.
-
Direct empirical verification
- For social and psychological constructs, there is no undisputed, direct way to determine whether one person is more intelligent than another.
-
Convergent/discriminant validity
-
seeing whether the scores relate to similar and different variables in the way we expect.
-
multiple checks + reasoning by contradiction
-
Multitrait-multimethod matrix approach (MTMM)
- different instruments to measure different traits
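The convergent/discriminant check can be sketched in Python as a pair of correlations. All three score lists are invented for illustration, and `pearson` is a hand-written helper, not a library call. The expectation is a high correlation with a measure of a similar construct and a correlation near zero with an unrelated one.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of six respondents.
new_scale = [2, 4, 5, 7, 8, 10]   # the instrument being validated
similar = [1, 3, 6, 6, 9, 9]      # measure of a similar construct
unrelated = [5, 2, 7, 3, 6, 4]    # measure of an unrelated construct

r_convergent = pearson(new_scale, similar)
r_discriminant = pearson(new_scale, unrelated)

print(round(r_convergent, 2), round(r_discriminant, 2))
```

A full MTMM matrix would repeat this for every combination of trait and method; the sketch covers only one trait and one method.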
-
5. Measurement Reliability
-
the instrument’s consistency, stability or precision
-
types:
-
test-retest
- not applicable to, e.g., a memory test: respondents may remember the items from the first administration
-
internal consistency
- look at the consistency between different parts of the instrument at one time
-
split-halves reliability
- randomly split the test in half and assess the association between the two halves.
- there are also statistics that are equivalent to the average of all possible ways to split the test.
- if measurement consists of observation instead of self-report:
-
intra-observer reliability: the same observer rates the behavior twice, and the association between the two assessments is assessed.
- the memory of the observer can inflate the association
- inter-rater reliability: two different people observe and rate the behavior, and the association between the two raters' scores is assessed.
-
-
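A rough Python sketch of split-half reliability as described above. The item scores are invented, and the fixed random seed is only there to make the split reproducible. Cronbach's alpha is added as one widely used internal-consistency statistic; that it is the statistic these notes have in mind is an assumption.

```python
import random
from math import sqrt
from statistics import variance


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one row per respondent, one column per item.
scores = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [5, 4, 5, 5],
    [5, 5, 4, 5],
]
n_items = len(scores[0])

# Randomly split the items into two halves.
rng = random.Random(0)
items = list(range(n_items))
rng.shuffle(items)
half_a, half_b = items[: n_items // 2], items[n_items // 2:]

# Correlate the sum scores of the two halves.
sum_a = [sum(row[i] for i in half_a) for row in scores]
sum_b = [sum(row[i] for i in half_b) for row in scores]
split_half_r = pearson(sum_a, sum_b)

# Cronbach's alpha: compares item variances with the variance of the total.
item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
total_var = variance([sum(row) for row in scores])
alpha = n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

print(round(split_half_r, 2), round(alpha, 2))
```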
-
Systematic error
- systematically measure an additional construct
- eg.: a cat fondness scale that also measures a general positive attitude
- less valid but not less reliable
-
Random error
- error that’s entirely due to chance: random fluctuations / noise
-
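Reliability & Validity
- reliability required for validity
- validity not required for reliability
- a measurement method must have some degree of reliability before it can be valid; the reverse does not hold.
- a very reliable measurement method can also be completely invalid; this happens when what it measures perfectly is not the construct it was supposed to measure.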
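A small simulation of the difference between systematic and random error, in Python. Everything here is assumed for illustration: the normally distributed scores, the error sizes, the `measure` helper, and the use of a test-retest correlation as the reliability estimate. An instrument that consistently picks up a second construct stays reliable but loses validity, while a noisy instrument loses reliability.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


rng = random.Random(1)
n = 2000
construct = [rng.gauss(0, 1) for _ in range(n)]  # intended construct
other = [rng.gauss(0, 1) for _ in range(n)]      # additional, unintended construct


def measure(bias_weight, noise_sd):
    """One administration: construct + systematic error + random error."""
    return [
        c + bias_weight * o + rng.gauss(0, noise_sd)
        for c, o in zip(construct, other)
    ]


# Systematic error: the same extra construct contaminates both administrations.
biased_t1, biased_t2 = measure(1.0, 0.2), measure(1.0, 0.2)
# Random error: large chance fluctuations, no extra construct.
noisy_t1, noisy_t2 = measure(0.0, 1.5), measure(0.0, 1.5)

reliability_biased = pearson(biased_t1, biased_t2)  # test-retest, stays high
validity_biased = pearson(biased_t1, construct)     # clearly lower
reliability_noisy = pearson(noisy_t1, noisy_t2)     # low

print(round(reliability_biased, 2), round(validity_biased, 2),
      round(reliability_noisy, 2))
```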
6. Survey, Questionnaire, Test
-
Surveys
- ask for different types of information
- cover different topics
-
Questionnaire
- measures one/related constructs
- psychological traits, emotional states or attitudes
-
Test
- measures ability
-
Components
-
a clear instruction
- cover story
-
Interviewers/ on-line application
-
items
- a series of questions
-
stem
- questions, statements or words that a participant has to respond to
- response options: discrete options or a continuous range
-
scale
- items measure the same construct or the same aspect of a construct
- subscale
-
sum score
- indicates a person’s value on the property
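As a minimal illustration in Python, a sum score is the total of a respondent's item scores, per subscale and for the whole scale. The item ids, subscale names and answers below are hypothetical.

```python
# Hypothetical questionnaire: subscale name -> the items that belong to it.
SUBSCALES = {
    "affection": ["q1", "q2"],
    "care": ["q3", "q4"],
}

# One respondent's answers on a 1-5 response range.
answers = {"q1": 4, "q2": 5, "q3": 2, "q4": 3}


def sum_score(answers, items):
    """Add up the respondent's scores on the given items."""
    return sum(answers[item] for item in items)


subscale_scores = {
    name: sum_score(answers, items) for name, items in SUBSCALES.items()
}
total_score = sum(subscale_scores.values())

print(subscale_scores, total_score)
```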
-
7. Scales and Response Options
-
Likert scale
-
summative scale
-
items measure the same property
-
monotone (single dimension): higher score = higher value on property
-
scale construction
-
items should be
- well formulated = short and simple
- unambiguous: avoid double-barrelled questions
-
not suggestive (neutral)
- avoid leading stems such as “don’t you think…?”
-
answerable
- use a filter question in advance
-
avoid extreme wording
- words like never or always
-
response options should be
- unambiguous
- consistent
-
exhaustive
- all respondents should be able to reflect their position on the property
- mutually exclusive
-
-
other, rarer types of scales
- differential scale
- allow for non-monotone items
- cumulative scale
- items themselves show consistent ordering
- each item expressing the property more strongly than the previous one
8. Response and Rater Bias
-
There is always some degree of systematic error or bias.
-
Response sets / response styles
-
Self-report bias
-
acquiescence
- the tendency to agree with all statements regardless of their content
- solution: include some negatively phrased items
-
social desirability
- the responses of people who tend to present themselves more favorably or in more socially acceptable ways.
- occur if a scale measures a property that’s considered socially sensitive or relevant to someone’s self image.
- **solution: add social desirability items such as “I’ve never stolen anything in my life” or “I’ve never lied to anyone”.**
- If people strongly agree with these items, there’s a fair chance that their responses to other questions are biased towards responses that are more socially acceptable.
-
extreme response style
- respondents don’t want to think about exactly how strongly they agree or disagree with an item. They’ll choose the most extreme options.
- unlike acquiescence bias, participants’ responses are consistent, but just more extreme than their true value.
-
bias towards the middle
- respondents tend to choose a less extreme response option.
-
solution: include some extremely strong items such as cats are purely evil creatures.
- if they respond with a middle category to all items, including these extremely worded items, their response pattern is inconsistent.
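A rough Python sketch of how the checks above could be applied to a short 1-5 agreement scale. The item ids, the set of negatively phrased items, the cut-off "4 or higher counts as agreeing" and the three example respondents are all assumptions. The flags only hint at a response style: a respondent with a truly extreme position is flagged as well.

```python
SCALE_MIN, SCALE_MAX, MIDPOINT = 1, 5, 3
NEGATIVE_ITEMS = {"q2", "q4"}  # hypothetical negatively phrased items


def reverse(score):
    """Reverse-code a score so that higher always means more of the property."""
    return SCALE_MIN + SCALE_MAX - score


def sum_score(answers):
    """Sum score after reverse-coding the negatively phrased items."""
    return sum(
        reverse(v) if item in NEGATIVE_ITEMS else v
        for item, v in answers.items()
    )


def response_flags(answers):
    """Flag response patterns that may indicate a response style."""
    positive = [v for item, v in answers.items() if item not in NEGATIVE_ITEMS]
    negative = [v for item, v in answers.items() if item in NEGATIVE_ITEMS]
    values = list(answers.values())
    return {
        # Agreeing with both a statement and its opposite is inconsistent.
        "acquiescence": all(v >= 4 for v in positive) and all(v >= 4 for v in negative),
        "extreme": all(v in (SCALE_MIN, SCALE_MAX) for v in values),
        "middle": all(v == MIDPOINT for v in values),
    }


yes_sayer = {"q1": 5, "q2": 5, "q3": 4, "q4": 4}
consistent_fan = {"q1": 5, "q2": 1, "q3": 5, "q4": 1}
fence_sitter = {"q1": 3, "q2": 3, "q3": 3, "q4": 3}

for respondent in (yes_sayer, consistent_fan, fence_sitter):
    print(sum_score(respondent), response_flags(respondent))
```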
-
-
Observer Rating
-
halo effect
- positivity/negativity on one dimension spills over to other dimensions
- eg. more attractive people are rated more intelligent or better at their job.
-
generosity errors
-
severity errors
-
-
9. Other Measurement Types
-
Physical measurements
- medicine/biology/psychology
- skin conductance –> arousal
- eye tracking –> focus of attention
- EEG/fMRI –> brain/cognitive activity
-
Observation
- sociology/psychology/educational sciences
- careful registration of specific behavior
- employ coding schemes that specify categories of behavior and their criteria
- what the behavior in each category looks like
- how long it should be displayed
- under what circumstances it should occur
- time frame for coding
- training of observers
- until they show enough agreement when coding the same material
-
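One way to put a number on "enough agreement" between two observers coding the same material, sketched in Python. The behavior categories, the two observers' codes and the `THRESHOLD` value are hypothetical. Cohen's kappa is used as a common chance-corrected agreement statistic; the notes do not name a specific one.

```python
from collections import Counter

# Hypothetical codes assigned by two observers to the same ten fragments.
observer_a = ["play", "play", "fight", "idle", "play",
              "fight", "idle", "idle", "play", "fight"]
observer_b = ["play", "play", "fight", "idle", "fight",
              "fight", "idle", "play", "play", "fight"]

n = len(observer_a)

# Observed agreement: share of fragments that received the same code.
agreement = sum(a == b for a, b in zip(observer_a, observer_b)) / n

# Agreement expected by chance, from each observer's category frequencies.
freq_a, freq_b = Counter(observer_a), Counter(observer_b)
expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2

# Cohen's kappa: agreement corrected for chance.
kappa = (agreement - expected) / (1 - expected)

THRESHOLD = 0.6  # assumed cut-off, not taken from the notes
enough_agreement = kappa >= THRESHOLD

print(agreement, round(kappa, 2), enough_agreement)
```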
Trace measurement
- assess behavior indirectly through physical trace evidence
- eg.: counting the number of used tissues after a therapy session to represent how depressed a client is.
-
Archival data
- a property can be represented with measurements that were already collected by others
- eg.: census data
-
Content analysis
- structured coding of elements in a text
- Computer software can code very complicated schemes.
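A toy Python sketch of structured coding: each word of a text is checked against a coding scheme and counted per category. The scheme, its keywords and the example sentence are invented. Real coding schemes and software are far more elaborate.

```python
import re
from collections import Counter

# Hypothetical coding scheme: category -> keywords that count as that category.
CODING_SCHEME = {
    "positive": {"love", "happy", "great"},
    "negative": {"hate", "sad", "awful"},
}


def code_text(text):
    """Count how many words in the text fall into each coding category."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        for category, keywords in CODING_SCHEME.items():
            if word in keywords:
                counts[category] += 1
    return counts


codes = code_text("I love my cat. She makes me happy, although vet bills are awful.")
print(dict(codes))
```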
-
Interviewing
- Structured
- questions/ question order/ response options are predetermined
- similar to survey
- hard to get unbiased answers to sensitive questions
- Unstructured/open
- a qualitative method
- procedure:
- start off with a general topic
- a set of points to be addressed
- but the interview is not limited to these points
- questions are open ended
- disadvantages:
- the conversation can lead anywhere
- differ per respondent –> aggregation more difficult
- other qualitative methods
- case study
- focus groups
- oral histories
- participatory observation
- …
10. Interview
- 3 most important things:
- more theory about the constructs.
- more up-to-date norm studies, so that test scores can be interpreted meaningfully.
- scores are changing.
- people are complicated.