A statistical approach to address the problem of heaping in self-reported income data

Self-reported income data are particularly prone to intentional coarsening by respondents, known as heaping or rounding. Heaping and rounding usually do not occur completely at random, and in that case they have detrimental effects on the results of statistical analysis. Conventional statistical methods do not account for this kind of reporting bias and may therefore produce invalid inference. We describe a novel statistical modeling approach that handles self-reported heaped income data in an adequate and flexible way. We suggest modeling the heaping mechanism and the true underlying income distribution jointly. To describe the true net income distribution, we use the zero-inflated log-normal distribution. Heaping points are identified from the data by a heuristic procedure that compares a hypothetical income distribution with the empirical one. To describe heaping behavior, we employ two distinct models: either heaping probabilities are assumed to be piecewise constant, or they are assumed to increase steadily with proximity to a heaping point. We validate our approach using examples. To illustrate the capacity of the proposed method, we conduct a case study using income data from the German National Educational Panel Study.
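
The two heaping mechanisms named in the abstract can be illustrated with a small simulation. The following is only a minimal sketch and not the authors' implementation: the function names, the grid of heaping points (multiples of 500), the parameter values, and the linear form of the proximity-dependent probability are all assumptions made for illustration. The abstract does not specify any of them.

```python
import random


def sample_true_income(rng, p_zero, mu, sigma):
    """Draw one 'true' income from a zero-inflated log-normal distribution.

    With probability p_zero the income is exactly 0; otherwise it is
    log-normal with log-scale mean mu and log-scale std. dev. sigma.
    """
    if rng.random() < p_zero:
        return 0.0
    return rng.lognormvariate(mu, sigma)


def nearest_heaping_point(income, grid):
    """Nearest positive multiple of grid (assumed heaping points)."""
    return max(grid, round(income / grid) * grid)


def heap_piecewise_constant(rng, income, grid, p_heap):
    """Assumed model 1: every positive income is reported as its nearest
    heaping point with the same constant probability p_heap."""
    if income > 0 and rng.random() < p_heap:
        return float(nearest_heaping_point(income, grid))
    return income


def heap_proximity(rng, income, grid, p_max):
    """Assumed model 2: the heaping probability rises linearly from 0 at
    the midpoint between two heaping points to p_max at a heaping point."""
    if income <= 0:
        return income
    nearest = nearest_heaping_point(income, grid)
    # Relative distance: 0 at the heaping point, 1 at half a grid step away.
    distance = abs(income - nearest) / (grid / 2)
    p = max(0.0, p_max * (1.0 - distance))
    if rng.random() < p:
        return float(nearest)
    return income


def simulate(n, seed, heap_fn, **heap_args):
    """Return (true incomes, reported incomes) for n simulated respondents."""
    rng = random.Random(seed)
    # Hypothetical parameters: 10% zeros, median positive income near 2000.
    true = [sample_true_income(rng, 0.1, 7.6, 0.6) for _ in range(n)]
    reported = [heap_fn(rng, x, **heap_args) for x in true]
    return true, reported


if __name__ == "__main__":
    true, reported = simulate(5000, 1, heap_piecewise_constant,
                              grid=500, p_heap=0.4)
    positive = [r for r in reported if r > 0]
    heaped = sum(1 for r in positive if r % 500 == 0)
    print(f"share of positive reports on a heaping point: "
          f"{heaped / len(positive):.3f}")
```

In this sketch the reported values show spikes at multiples of 500 that the true zero-inflated log-normal values do not have, which is the distortion a joint model of heaping and true income is meant to correct.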

Keywords: 62D99; 62F10; 62F25; 62F30; 62P25; German National Educational Panel Study; heaping; self-reported income data; zero-inflated log-normal distribution

Document Type: Research Article

Affiliations: Leibniz Institute for Educational Trajectories (LIfBI), National Educational Panel Study (NEPS), Bamberg, Germany

Publication date: 11 March 2016
