Large-Scale Bayesian Logistic Regression for Text Categorization

Authors: Genkin, Alexander (1); Lewis, David D. (2); Madigan, David (3)

Source: Technometrics, Volume 49, Number 3, August 2007, pp. 291-304 (14 pages)

Publisher: American Statistical Association

Abstract:

Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data. We apply this approach to a range of document classification problems and show that it produces compact predictive models at least as effective as those produced by support vector machine classifiers or ridge logistic regression combined with feature selection. We describe our model fitting algorithm, our open source implementations (BBR and BMR), and experimental results.
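The abstract says the Laplace prior both prevents overfitting and yields sparse models. The reason is that the posterior mode under independent Laplace priors, with density (λ/2)exp(-λ|β_j|) on each coefficient, is the lasso (L1-penalized) logistic regression estimate. With labels y_i in {-1, +1}, it minimizes sum_i log(1 + exp(-y_i β·x_i)) + λ sum_j |β_j|. The L1 term sets weakly supported coefficients exactly to zero. The following is a minimal sketch under stated assumptions. It uses proximal gradient descent (ISTA), which is a swapped-in solver and not the fitting algorithm implemented in BBR or BMR. The function names, the unpenalized intercept, and the toy data are hypothetical illustrations.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrink v toward zero, exactly zero if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_laplace_logreg(X, y, lam, n_iter=2000):
    """Hypothetical helper: MAP logistic regression under a Laplace prior (L1 penalty).

    X: list of feature rows; y: labels in {-1, +1}; lam: Laplace rate (penalty strength).
    Solver is proximal gradient descent (ISTA), assumed here for illustration only.
    Returns (intercept, weights); the intercept is assumed unpenalized.
    """
    n, d = len(X), len(X[0])
    # Safe step size: the loss gradient is Lipschitz with constant at most
    # 0.25 * ||[1, X]||_F^2, so 4 / ||[1, X]||_F^2 <= 1 / L.
    frob_sq = n + sum(v * v for row in X for v in row)
    step = 4.0 / frob_sq
    b0, w = 0.0, [0.0] * d
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * (b0 + sum(wj * xj for wj, xj in zip(w, xi)))
            # Derivative of log(1 + exp(-margin)) w.r.t. the linear score.
            coef = -yi * sigmoid(-margin)
            g0 += coef
            for j in range(d):
                g[j] += coef * xi[j]
        b0 -= step * g0  # plain gradient step on the intercept
        # Gradient step followed by the L1 proximal step on the weights.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, g)]
    return b0, w

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 0.3], [2.0, -0.2], [1.5, 0.1], [-1.0, 0.2], [-2.0, -0.1], [-1.5, -0.3]]
y = [1, 1, 1, -1, -1, -1]
b0, w = fit_laplace_logreg(X, y, lam=2.0)
print(b0, w)  # the noise feature's weight is exactly 0.0
```

Here the noise feature's total absolute value (1.2) is below λ = 2, so its gradient can never exceed the threshold and its weight stays exactly zero. This is the sparsity effect the abstract describes. A ridge (Gaussian) prior would instead leave that weight small but nonzero.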

Keywords: INFORMATION RETRIEVAL; LASSO; PENALIZATION; RIDGE REGRESSION; SUPPORT VECTOR CLASSIFIER; VARIABLE SELECTION

Document Type: Research article

DOI: 10.1198/004017007000000245

Affiliations:
1: DIMACS, Rutgers University, Piscataway, NJ 08854
2: David D. Lewis Consulting, Chicago, IL 60614
3: Dept. of Statistics, Rutgers University, Piscataway, NJ 08854
