Utility function security in artificially intelligent agents

Source: Journal of Experimental & Theoretical Artificial Intelligence, Volume 26, Number 3, 3 July 2014, pp. 373-389(17)

Publisher: Taylor and Francis Ltd

DOI: https://doi.org/10.1080/0952813X.2014.895114

The notion of ‘wireheading’, or direct stimulation of the brain's reward centres, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead. The related problem of literalness in goal setting also remains largely unsolved, and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction.

Keywords: counterfeit utility; literalness; reward function; wireheading

Document Type: Research Article

Affiliations: Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY, USA

Publication date: 03 July 2014
