My recent post on testing for negative 0 in JavaScript created a lot of interest. So today, I’m going to talk about another bit of JavaScript obscurity that was also inspired by a Twitter thread.
I recently noticed this tweet go by:
This was obviously a trick question. Presumably some programmer expected this expression to produce an array like [1, 2, 3]
and it doesn’t. Why not? What does it actually produce? I didn’t cheat by immediately typing it into a browser, but I did quickly look up something in my copy of the ECMAScript 5 specification. From the spec, it appeared clear that the answer would be:
[1, NaN, NaN]
I then typed the expression into a browser and that was exactly what I got. Before I explain why, you may want to stop here and see if you can figure it out.
OK, here is the explanation. parseInt
is the built-in function that attempts to parse a string as a numeric literal and return the resulting number value. So, a function call like:
var n = parseInt("123");
should assign the numeric value 123
to the local variable n
.
You might also know that if the string can’t be parsed as a numeric literal, parseInt
will return the value NaN
. NaN
, which is an abbreviation for “Not a Number”, is a value that generally indicates that some sort of numeric computation error has occurred. So, a statement like:
var x = parseInt("xyz");
assigns NaN
to x
.
map
is a built-in Array method that is in ECMAScript 5 and has been available in many browsers for a while. map
takes a function object as its argument. It iterates over each element of an array, calling the argument function once for each element and passing the element value as an argument. It accumulates the results of these function calls into a new array. Consider this example:
[1,2,3].map(function (value) {return value+1})
it will return a new array [2,3,4]
. It is probably most common to see a function expression such as this passed to map
but it is perfectly valid to pass an already existing function object such as parseInt
.
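To make the two styles concrete, here is a small sketch (the addOne name is just for illustration):

```javascript
// A function expression passed inline, and an existing function object passed directly.
function addOne(value) { return value + 1; }

var viaExpression = [1, 2, 3].map(function (value) { return value + 1; });
var viaExisting = [1, 2, 3].map(addOne);
// Both produce [2, 3, 4].
```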
So, knowing the basics of parseInt
and map
it is pretty clear that the original expression was intended to take an array of numeric strings and to return a corresponding array containing the numeric value of each string. Why doesn’t it work? To find the answer we will need to look more closely at the definition of both parseInt
and map
.
Looking at the specification of parseInt, you should notice that it is defined as accepting two arguments. The first argument is the string to be parsed and the second specifies the radix of the number to be parsed. So, parseInt("ffff", 16)
will return 65535
while parseInt("ffff", 8)
will return NaN
because "ffff"
doesn’t parse as an octal number. If the second argument is missing or 0
it defaults to 10
so parseInt("12",10)
, parseInt("12")
, and parseInt("12",0)
all produce the number 12.
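These cases can be checked directly; a quick sketch of the examples above:

```javascript
// Explicit radixes: 16 parses hexadecimal, 8 rejects the non-octal digit "f".
var hex = parseInt("ffff", 16); // 65535
var oct = parseInt("ffff", 8);  // NaN

// For a plain decimal string, radix 10, an omitted radix, and radix 0 agree.
var a = parseInt("12", 10);
var b = parseInt("12");
var c = parseInt("12", 0);
// a, b, and c are all 12
```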
Now look carefully at the specification of the map method. It refers to the function that is passed as the first argument to map
as the callbackfn. The specification says, “the callbackfn is called with three arguments: the value of the element, the index of the element, and the object that is being traversed.” Read that carefully. It means that rather than three calls to parseInt
that look like:
parseInt("1")
parseInt("2")
parseInt("3")
we are actually going to have three calls that look like:
parseInt("1", 0, theArray)
parseInt("2", 1, theArray)
parseInt("3", 2, theArray)
where theArray
is the original array ["1","2","3"]
.
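One way to see this is to interpose a spy function that records what map actually passes before delegating to parseInt (the spy name is just illustrative):

```javascript
// A spy that records the arguments map supplies on each call, then delegates
// to parseInt exactly as map(parseInt) would.
var calls = [];
function spy() {
  calls.push([].slice.call(arguments));
  return parseInt.apply(null, arguments);
}

var result = ["1", "2", "3"].map(spy);
// calls[0] is ["1", 0, theArray], calls[1] is ["2", 1, theArray], and so on.
```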
JavaScript functions generally ignore extra arguments and parseInt
only expects two arguments so we don’t have to worry about the effect of the theArray
argument in these calls. But what about the second argument? In the first call the second argument is 0
which we know defaults the radix to 10 so parseInt("1",0)
will return 1
. The second call passes 1
as the radix argument. The specification is quite clear what happens in that case. If the radix is non-zero and less than 2 the function returns NaN
without even looking at the string.
The third call passes 2
as the radix argument. This means that the string to convert is supposed to be a binary number consisting only of the digit characters "0"
and "1"
. The parseInt
specification (step 11) says it only tries to parse the substring to the left of the first character that is not a valid digit of the requested radix. The first character of the string is "3"
which is not a valid base 2 digit so the substring to parse is the empty string. Step 12 says that if the substring to parse is the empty string, the function returns NaN
. So, the result of the three calls will be 1
, NaN
, and NaN
.
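Running the original expression confirms the walkthrough:

```javascript
var result = ["1", "2", "3"].map(parseInt);
// result is [1, NaN, NaN]
```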
The programmer of the original expression made at least one of two possible mistakes that caused this bug. The first possibility is that they either forgot or never knew that parseInt
accepts an optional second argument. The second possibility is that they forgot or never knew that map
calls its callbackfn with three arguments. Most likely, it was a combination of both mistakes. The most common usage of parseInt
passes only a single argument and most functions passed to map
only use the first argument so it would be easy to forget that additional arguments are possible in both cases.
There is a straightforward way to rewrite the original expression to avoid the problem. Use:
["1","2","3"].map(function(value) {return parseInt(value)})
instead of:
["1","2","3"].map(parseInt)
This makes it clear that the callbackfn only cares about a single argument and it explicitly calls parseInt
with only one argument. However, as you can see it is much more verbose and arguably less elegant.
After I tweeted about this, there was an exchange about how JavaScript might be extended to avoid this problem or to at least make the fix less verbose. Angus Croll (@angusTweets) suggested the problem could be avoided simply by using the Number
constructor as the callbackfn instead of parseInt
. Number
called in this manner will also parse a string argument as a decimal number and it only looks at one argument.
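Because Number, called as a function, converts its first argument and ignores the rest, it composes safely with map:

```javascript
// The extra index and array arguments from map are harmless to Number.
var nums = ["1", "2", "3"].map(Number);
// nums is [1, 2, 3]
```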
@__DavidFlanagan suggested that perhaps a mapValues
method should be added which only passes a single argument to the callbackfn. However, ECMAScript 5 has seven distinct Array methods that operate similarly to map
, so we would really have to add seven such methods.
I suggested the possibility of adding a method that might be defined like:
Function.prototype.only = function (numberOfArgs) {
  var self = this; //the original function
  return function () {
    return self.apply(this, [].slice.call(arguments, 0, numberOfArgs));
  };
};
This is a higher order function that takes a function as an argument and returns a new function that calls the original function but with an explicitly limited number of arguments. Using only
, the original expression could have been written as:
["1","2","3"].map(parseInt.only(1))
which is only slightly more verbose and arguably retains a degree of elegance.
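Putting the definition of only together with the rewritten expression gives a runnable snippet:

```javascript
// Higher-order wrapper: returns a version of the function that sees at most
// numberOfArgs of the arguments it is called with.
Function.prototype.only = function (numberOfArgs) {
  var self = this; // the original function
  return function () {
    return self.apply(this, [].slice.call(arguments, 0, numberOfArgs));
  };
};

var nums = ["1", "2", "3"].map(parseInt.only(1));
// nums is [1, 2, 3]
```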
This led to a further discussion of curry functions (really partial function application) in JavaScript. Partial function application takes a function that requires a certain number of arguments and produces a new function that takes fewer arguments. My only
method is an example of a function that performs partial function application. So is the Function.prototype.bind
method that was added to ES5. Does JavaScript need such additional methods? For example, a bindRight
method that fixes the rightmost arguments rather than the leftmost. Perhaps, but what does rightmost even mean when a variable number of arguments are allowed? Probably a bindStartingAt
method that took an argument position would be a better match for JavaScript.
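A minimal sketch of what such a bindStartingAt might look like (hypothetical — this is not a proposed or standard method, just an illustration of the idea):

```javascript
// Hypothetical: fix arguments beginning at the given position. Anything the
// caller supplies at or past that position is replaced by the fixed values.
Function.prototype.bindStartingAt = function (position) {
  var self = this;
  var fixed = [].slice.call(arguments, 1);
  return function () {
    var args = [].slice.call(arguments, 0, position).concat(fixed);
    return self.apply(this, args);
  };
};

// Fix the radix (argument position 1) to 10; map's index argument is discarded.
var nums = ["10", "11", "12"].map(parseInt.bindStartingAt(1, 10));
// nums is [10, 11, 12]
```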
However, all this discussion of extensions really misses the key issue with the original problem. In order to use any of them, you first have to be aware of the optional argument mismatch between map
and parseInt
. If you are aware of the problem there are many ways to work around it. If you aren’t aware then none of the proposed solutions help at all. This really seems to be mostly an API design problem and raises some fundamental questions about the appropriate use of optional arguments in JavaScript.
Supporting optional arguments can simplify the design of an API by reducing the total number of API functions and by allowing many users to only have to think about the details of the most common use cases. But as we see above, this simplification can cause problems when the functions are naively combined in unexpected ways. What we are seeing in this example is that there are really two quite different use cases for optional arguments.
One use case looks at optional arguments from the perspective of the caller. The other use case is from the perspective of the callee. In the case of parseInt
, its design assumes that the caller knows that it is calling parseInt
and has chosen actual argument values appropriately. The second argument is optional from the perspective of the caller. If it wants to use the default radix it can ignore that argument. However, the actual specification of parseInt
carefully defines what it (the callee) will do when called with either one or two arguments and with various argument values.
The other use case is more from the perspective of a different kind of function caller: a caller that doesn’t know what function it is actually calling and that always passes a fixed-size set of arguments. The specification of map
clearly defines that it will always pass three arguments to any callbackfn it is provided. Because the caller doesn’t actually know the identity of the callee or what actual information the callee will need, map
passes all available information as arguments. The assumption is that an actual callee will ignore any arguments that it doesn’t need. In this use case the second and third arguments are optional from the perspective of the callee.
Both of these are valid optional argument use cases, but when we combine them we get a software “impedance mismatch”. Callee optional arguments will seldom match with caller optional arguments. Higher order functions such as the bind
or only
methods can be used to fix such mismatches but are only useful if the programmer is aware that the mismatch exists. JavaScript API designers need to keep this in mind, and every JavaScript programmer needs to take extra care to understand what exactly will be passed to a function used as a “call back”.
Update 1: Correctly credit Angus Croll for map(Number) suggestion.
Great article. A variation on this problem exists in Firefox, with the setTimeout() function. It chooses to force-pass an argument to the function that it calls, and that forced parameter is numeric and represents the number of ms that the function call is “late”. If your function accepts its one and only parameter as optional, and you have setTimeout call it, you will get some interesting results.
I first ran across this where my function took a boolean variable (which of course defaults to false), so i would get seemingly random results of it being passed a “true” or “false” value. Of course, the real problem is that I didn’t want anything passed to it, and the API creators of setTimeout passed something anyway. In that case, I eventually fixed it with a strict `=== true` check.
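The hazard described here can be sketched without a browser (the handler name is illustrative; the number stands in for the “lateness” value a host API might force-pass):

```javascript
// A callback written with an optional boolean parameter. If a host API
// force-passes a number, a truthiness test misreads it; === true does not.
function handler(flag) {
  return {
    naive: !!flag,         // a nonzero "lateness" number reads as true
    strict: flag === true  // only an explicit boolean true counts
  };
}
// handler(5) yields naive true but strict false; handler(true) yields strict true.
```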
But the caution to how API’s are designed is also important for native JavaScript implementations in browsers… if they add non-standard parameters to the call signature, they will trip up developers.
I do occasionally wish that if a function is defined with a specific number of arguments but called with a different number, that a warning was emitted, just so you know it’s happened…
Yeah, that Firefox issue hit me below the belt a while back..
Great post, Allen. It’s good to not only remember that the Array#map callback accepts 3 arguments, but also that not every native JavaScript function is “map-friendly.” Not that I actually remembered that, mind you… I made the same incorrect assumption until I read through the article and chastised myself for forgetting!
FWIW, I’ve written an article on Partial Application in JavaScript that might be interesting to some of your readers.
A nice approach to explaining a thorny issue
By the way the Number suggestion was actually my tweet 🙂 https://twitter.com/#!/angusTweets/status/35774944293953537
On a related note I’m exploring making high order iterator calls more succinct by leveraging built in or predefined functions wherever possible. A case in point is string trimming:
You could do this (in most browsers):
var a = [" a ", " b ", " c "];
a.map(function(e) {return e.trim();})
but it leaves me a little unsatisfied. The problem is that map and friends accept array elements as args, not ‘this’ objects. Maybe we could pass in a ‘globalized’ version of trim:
a.map($g("".trim));
where $g is defined like this:
var $g = function(fn) {
var args = arguments;
return function() {
return fn.apply(arguments[0], [].slice.call(args, 1).concat([].slice.call(arguments, 1)));
}
}
…though its performance is not stellar http://jsperf.com/globalizer
I corrected the attribution.
SpiderMonkey defines a generic String.trim method. You could define it in other environments as well:
String.trim = function(str) { return str.trim(); };
That way, you’d be able to do:
var a = [" a ", " b ", " c "];
a.map(String.trim);
You could even write your generic function in a way that tried to accept other data types (e.g., anything that implements a toString method). SpiderMonkey’s array generics work this way so that they can be used with DOM NodeLists and whatnot. I’m not sure if their String generics do the same thing.
For calling an instance method on each item in a collection, Prototype defines [].invoke:
Array.prototype.invoke = function(method) {
var args = Array.prototype.slice.call(arguments, 1);
return this.map(function(value) {
return value[method].apply(value, args);
});
};
Which lets you do:
a.invoke('trim');
It even allows you to pass extra arguments to the instance method if needed.
Andrew
Yeah, sure, I could rewrite trim, but I was messing around with a more generic way to make any instance method behave like a global function – specifically with the ECMA 5 iterators in mind
The globalize function does that for me – though not sure I’d ever use it in real code – just exploring 🙂
Prototype’s Array.prototype.invoke is a nice approach too btw
You suggested using:
["1","2","3"].map(function(value) {return parseInt(value)})
That works consistently for those numbers, but if the numbers are changed to something that starts with a leading
0
then it won’t be interoperable. If the goal is to convert to a number, then the unary +
operator works great for that.
Well, there’s that thing called types that really might help in this situation (sarcasm intended ;-)). ML-style:
map : 'a array -> ('a -> int -> 'a array -> 'b) -> 'b array
parseInt : string -> int -> int
The error is obvious with types. I like to see this as one more argument *against* the “types hinder programmers” theory that prevented the typed JS proposal (Ecmascript 4) from succeeding :-).
Note that if the second parameter to parseInt is omitted or zero then it tries to parse the number as hexadecimal if it begins with 0x or octal if it begins with 0. Otherwise it parses the number as decimal.
> If the second argument is missing or 0 it defaults to 10 so praseInt(“12”,10),
> parseInt(“12”), and parseInt(“12”,0) all produce the number 12.
This actually isn’t correct (aside from the typo – praseInt => parseInt). People are often unaware of it, but the value 0 means “guess it”. If your string is not "12" but "012" then parseInt without a second parameter, or with 0 as the second parameter, will interpret it as an octal number. So instead of 12 you will get 10. And parseInt("0x12") will not return zero but 18. Which is why in general omitting the second parameter isn’t a good idea – you pretty much always want to interpret the string as a decimal number.
@Kyle Simpson, Ben Alman: The setTimeout() issue is actually relatively easy to avoid. The SpiderMonkey implementation of setTimeout() has another unusual feature – you can pass more parameters to the setTimeout() call, these parameters will be passed to the function in question before the numerical parameter. So you can force the first parameter to be undefined:
setTimeout(function(a){alert(a)}, 0, undefined);
Please don’t encourage people to do this:
[].slice.call(…)
Creating a new array every time you want to slice some arguments is horribly inefficient.
Do this instead:
Array.prototype.slice.call(…)
“Premature optimization is the root of all evil. ”
Note that it is actually easier for an implementation to optimize [].slice than it is to optimize Array.prototype.slice because the current value of Array is not necessarily the built-in Array constructor. [ ] doesn’t have that issue. The current value of slice is another matter…
But the implementation would still have to construct an Array object in order to access its slice method, so I doubt that it is better optimised.
Regardless, people copy code samples like this so it’s better not to introduce dodgy patterns.
Actually, the creation of the instance probably could be avoided because the instance that would be created is both a “dead” value and is known not to have an own property named “slice”. So code could be generated to access the property starting with the internally known Array prototype object. Something similar is done in the ES5 spec for property access on primitive values.
Dodgyness is a matter of opinion.
The slice property of an Array object is just as mutable as the Array property of the global object.
Here is how Array.prototype.slice translates to essential primitive semantic operations:
t1 = _globalObject.[[Get]]('Array') //we don't know the current global binding for 'Array'
t2 = t1.[[Get]]('prototype') //we don't know t1's 'prototype' value (t1 may not be built-in Array)
t2 = t2.[[Get]]('slice') //we don't know t2's 'slice' value
//the above requires three dynamic property lookups
//It also assumes that the translator was able to determine that this code was not within any dynamic with scopes
Here is how [].slice could be translated to essential primitive semantic operations, without requiring any global analysis:
t1 = __builtin_Array_prototype //essentially a constant
t2 = t1.[[Get]]('slice')
// the above requires one dynamic property lookup and can be more simply expressed as:
t2 = __builtin_Array_prototype.[[Get]]('slice')
@allen But in fact “Array.prototype.slice.call” is faster than “[].slice.call” on strings (7-200%) and looks more correct.
My Benchmark: http://jsfiddle.net/azproduction/RD9Nz/
what I always do is to “trap” in a private scope a slice reference in order to have “the best from both worlds”
var slice = [].slice;
one single object rather than a new Array per method, as Dean said, plus it’s easier and *safer* to access, as Allen says
use slice.call() as much as you want and avoid empty objects as bridge for their native methods …
var bridge = [], slice = bridge.slice, pop = bridge.pop, … ; bridge = null;
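A minimal sketch of the trapping pattern described (the toArray function is just an example use):

```javascript
// Capture the built-in methods once, then drop the bridge array.
var bridge = [], slice = bridge.slice, pop = bridge.pop;
bridge = null;

function toArray() {
  // Works even if the global Array binding is later replaced.
  return slice.call(arguments);
}
// toArray(1, 2, 3) returns [1, 2, 3]
```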
@Mikhail Davydov, why not create a jsPerf test case?
I would argue that Array.prototype.slice.call(…) better points out what you intend to do. With [].slice.call(…), you are using the instance to access the prototype. Hence, I would see the latter as the premature optimization.
The examples are inconsistent: one passes a built-in directly to map, the other passes a lambda. parseInt’s optional second argument is not news, so I guess the moral of the story is, wrap built-ins with a lambda before passing to map?
["1","2","3"].map(function(value) {return parseInt(value, 10)})
The root issue is about the pitfalls of APIs that use optional arguments. The problems can occur with framework or user-written functions as easily as with built-ins, so any rule that only applies to built-ins misses the point.
Also note that “lambda” has no formal definition in the context of ECMAScript. When you say “lambda” I don’t know if you mean FunctionExpression or something subtly different. From map’s perspective there is no difference between being passed the value of a built-in function object or being passed the value of a FunctionExpression.
“The root issue is about the pitfalls of APIs that use optional arguments”
Agreed.
Another angle on this is that this definition of the “map” function is different from most other functional languages. This one should be called something else, and the one called “map” should have been defined to be like the more familiar one that takes a function of one argument rather than a callback.
Then programmers using, say, “map_callback” or some such would expect the argument to be a more cumbersome and less typical callback, and programmers using “map” would expect the usual one-arg function.
this does it:
["1", "2", "3"].map(Number);
Makes me wish for auto-lambdification: ["1", "2", "3"].map(parseInt(_1)) where
foo(x, y, _1, z, _2)
would get magically converted to
function (arg1,arg2) foo(x,y,arg1,z,arg2)
and maybe optimized away the lambda if possible. Then you’d just have to get into the habit of always using it when passing in functions.
It seems that you are right and that modern JavaScript implementations have learned to optimise [].slice. Older implementations are not so well optimised, in IE6 it is five times slower than Array.prototype.slice.
Sorry for my nitpicking, especially when it is a couple of years out of date. 🙂
Very interesting read – I’ve never (knowingly) run into this problem, and most people aren’t likely to, but it definitely pays to be aware of limitations or quirks like this. I’ve run into similar issues writing functionality in PHP, where writing certain functions’ params as optional led to confusion down the line, though only when the code began to get spaghettified 😛
I hope this isn’t taken as a troll, but this is the sort of thing that separates JS/ES from a real programming language. Real programming languages don’t tend to have these weird, gray-area behaviours. Situations where “you’d think this should happen, but…”.
Watching the Crockford lectures, it’s easy to get swept up into the idea that JS is a great thing. But things like this and my experiences battling against it in my work regularly remind me how half baked JS is.
Sorry, this wasn’t a very productive comment.