This cool puzzle (and solution) comes from my friend Mike.
Alice secretly picks two different real numbers by an unknown process and puts them in two (abstract) envelopes. Bob chooses one of the two envelopes randomly (with a fair coin toss), and shows you the number in that envelope. You must now guess whether the number in the other, closed envelope is larger or smaller than the one you’ve seen.
Is there a strategy which gives you a better than 50% chance of guessing correctly, no matter what procedure Alice used to pick her numbers?
I initially thought there wasn’t, or that the problem was paradoxically defined, but it turns out that it’s perfectly valid and the answer is “Yes.” See my first comment for an example of one such winning strategy.
This puzzle is similar to, but not the same as, the two envelopes paradox. For more time-devouring reading, see Wikipedia’s List of Paradoxes.
Owww noes Geeks attack!!!
LikeLike
The answer was intuitively ‘yes’ for me.
We already have the first real number. My intuition is that if you generated MANY random reals then the average would tend to 0 (because you’d have something like a uniform distribution… wouldn’t you?). So the ‘expected value’ of the 2nd value is 0.
So if your first number is negative, you bet the second number is bigger. If it’s positive you bet the second is smaller.
Or am I being an idiot?
LikeLike
I like kittens.
LikeLike
In the problem as stated, you’ve got an envelope. Now, an envelope will hold a finite amount of typographic characters, perhaps twenty thousand (you may want to wash this number off before picking it up). Then, decide what characters go into a number. If Alice is restricted to digits, this establishes a range; if not, the analysis gets more complicated, but there still is a finite number of possible numbers, and it’s easily determinable (although very large).
Infinities are sometimes very awkward things to deal with, and I’ve become allergic to the use of them without getting specific.
LikeLike
Seems to me that if the number you do get to see is negative, then you have a statistically better chance if you guess that the other one is higher (“more positive”). And if the number you get to see is positive, then you have a statistically better chance if you guess that the other one is lower.
Also, there is an element of human capacity to this, which might be relevant, or might not, as humans (and the computational devices they’ve invented) have limitations. Alice is just not likely to be giving us an especially large number simply due to time and space and communication constraints. But then that’s probably not too much of a factor, as the same limits are true for small numbers, right?
LikeLike
The envelopes are explicitly stated to be abstract. Abstract envelopes can hold abstract real numbers.
LikeLike
Also, why not just use a step function? Then, if x<0 guess that the second number is larger and vice versa.
In that case you're basically guessing that the numbers are, averaged over all processes Alice might choose, distributed symmetrically. Which is plausible to me, considering the Central Limit Theorem.
LikeLike
@Matthias:
Took myself a while to figure out, but using a step function won’t work. If Alice restricts herself to only using positive or negative numbers, a step function gives only 50% winning chance. If we use a monotonic function, we will have a winning chance > 50% no matter which procedure Alice uses to chose her numbers.
The step function is not monotonic, so it can’t distinguish between 5 and 10, for example: your chance of switching upon receiving 5 is the same as when receiving 10, whereas if you use the logistic function, your chance of switching is higher for 5 than it is for 10.
LikeLike
@Aaron: In probability theory you have to make a careful distinction between sets of zero probability and empty sets. They are not the same in general. Your ‘proof’ (which does not require a uniform distribution btw.) mixes up these two concepts.
LikeLike
Matthias, the Central Limit Theorem only says that when you have n independent identically distributed variables X, the distribution of their mean will approach a normal distribution around the mean of X. It doesn’t say that the mean will go to 0.
LikeLike
Alon: I know, I know. Still; in the complete set of processes Alice may choose from there must for each process yielding a number y another process yielding -y. But this should be the case around any chosen point z (such there is a prozess yielding z-y for any process yielding z-y) implying a distribution symmetric about every real number … I’m not sure what to make of this.
Anyway, I still don’t see the rationale of specifically using the logistic function.
LikeLike
You can make the answer look a little more elegant by scaling the problem to the interval [0,1]. If Alice’s numbers must both be chosen from [0,1] then guessing “lower” with a probability equal to the number you see (and “higher” otherwise) is a good strategy.
LikeLike
>An important thing to keep in mind here is that there’s no such thing as a >random sample of all the numbers.
But then how can I pick a real between 0 and 1 at random?
I guess I can choose randomly only amongst a finite set of numbers between 0 and 1.
Then Alice can choose two numbers big enough (small enough) so that p(x) will be always bigger (smaller) than my randomly picked number.
LikeLike
@Matthias: You do not need the logistic function specifically. You can instead choose any strictly monotone growing function p: R -> [0,1], e.g. 1/2 + arctan(x+c)/Pi with an arbitrary constant c.
Strange is that a slight modification of the game makes the strategy useless: Assume that inside the envelopes are random number generators that generate a number in the moment you open the envelope instead of fixed random numbers.
LikeLike
@Warll: If there *were* a single conventional definition of “larger”, I wouldn’t have asked the question. My whole point is it was ambiguous. I can infer from your answer what you think it should be (closer to infinity), even though you didn’t explicitly say so.
@Alon: Nitpick in the formulation: Note that the CLT does not hold for all RV types, Cauchy/Lorentzian, for instance.
LikeLike
I guess what inherently bothers me about the solution, and led to my question, is that the logistic function, as formulated, inherently treats “0” as special. This is the same issue that leads one to specify (unrelated, but analogous) that one should use the principal value of an indefinite integral.
If the spec is not a larger absolute value, but, rather, just closer to +inf, then there’s nothing special about 0 in the problem statement, but there is in the solution, and that bugs me. Now, the whole human factor that others have brought up is an alternative way to resolve this: the desire, all other things being equal, not to pick numbers with an arbitrarily large number of digits. Then 0 becomes important in the problem statement and my symmetry argument becomes void. This is also true with computers, for the reason xkcd mentioned of not being able to pick numbers with a uniform distribution over all reals…even with variable-precision arithmetic, I might add. It would be interesting to run the simulation with a PRNG that chooses over a range *asymmetric* wrt zero, and see if this is still a winning strategy.
LikeLike
@xkcd: OT: WordPress seems to be turning all quotes into “smart quotes” (read: incompatible quotes). It seems to have happened even to a portion of your post (though not the fixed-width portion at the top). I doubt everyone here is using an MS browser. I’ve read this is a common issue with WordPress and that there are fixes for this (not necessarily official).
LikeLike
Funny, my answer was: Yes, ask Alice. and I guess you should tack on use a polygraph.
LikeLike
Math is hard, let’s go shopping!
LikeLike
@Michael
Not sure what you mean by 0 being a special value in the solution.
The only special values in the solution are positive and negative infinity.
A more abstract way of stating this solution would be to say
If you got infinity then the other number must be less
If you got infinity then the other number must be greater
Rank all of the numbers that exist from -infinity to infinity and the probability that the other number is less than the one that you drew is related to how high the rank is.
In mathematical terms,
We need a function which ranks all of the numbers in existence by size and is 1 at infinity and 0 and negative infinity.
IE: Monotonic increasing function which approaches 1 at infinity and -1 at negative infinity
So you could just as well use a function like 1/(1+e^(-x+1))
now when p(0) = 1/(1+e^(-1))
but you’ll notice the same relationship happens because this function also increases monotonically
N = 0.5 * (1-p(A)) + 0.5 * p(B)
as the solution states
“Since p(A) is always smaller than p(B), N is always greater than 50%. If Alice picked 1.0 and pi, your chances of guessing correctly are 61.37%. If Alice picked 10 and 11, your chances of guessing correctly are 50.00053%. No matter what two numbers Alice picked, the chance of guessing right is always better than 50%. Sometimes it’s only better by a tiny, tiny amount, but it’s always better.”
Hope this makes it more clear
LikeLike
So, if you pick any non-decreasing function p:reals -> [0,1], a similar thing happens. For example, if you pick the step function with a step at zero, you get “switch envelopes if the number you see is negative”. Of course, zero isn’t special, any step function with a step at x gives the strategy “switch if the number you see is less than x”.
Now, any strategy like this will guarantee you get it right at least 50% of the time, no matter how Alice picks numbers. If Alice is picking numbers independently with some always positive distribution, then you will be right more than 50% of the time. The issue is that Alice can exploit the portions where p is constant in order to make you worse off. For example, if she only picks positive numbers (by whatever method), then you will be right exactly 50% of the time, since your strategy (using the step at zero) becomes “always keep what you’re given”. For any non-decreasing function p which is constant somewhere, Alice can foil your strategy (make you win only 50% of the time) by picking numbers exclusively in the constant part.
But that’s kind of her only option. As long as p is increasing (strictly), you’ll always win strictly more than 50% of the time. There’s nothing she can do: as long as you keep larger numbers strictly more often than smaller numbers, you win. (Of course this relies on the fact that Alice doesn’t control Bob.)
Unfortunately, by only picking numbers in a region where p’ is bounded by delta (which exists for any function p), she can guarantee you win less than (50 + epsilon)% of the time. So in some sense, this strategy is only infinitesimally better than “always keep what you got”.
LikeLike
@Michael: The function p(x) as provided is not a unique solution. There are many other similar functions that do not treat 0 as special and that still work. For example, a logistic function centered around 300000 would work. The only requirements are that the function
(a) has a domain from negative infinity to positive infinity
(b) is monotonically increasing throughout the entire domain
(c) has a finite range from which you can choose a number (i.e. 0 to 1).
LikeLike
This blew my mind. On one hand it really seems like it should be a 50-50 chance. On the other hand, it makes sense to switch less when you have a larger number.
LikeLike
A simple solution to the two-envelopes problem: feel for coins.
LikeLike
Mmmm Math puzzles
LikeLike
No. They did not use PGP encryption so their game is vulnerable to man-in-the-middle attacks.
LikeLike
There is nothing special about zero. The strategy Randall gives is just an example of one among uncountably many possible such strategies.
The general solution can be summarized as, “Pick any real number at random using any suitable* distribution, and make your guess as if your random number is in the other envelope.”
Or, to make a parallel with Randall’s strategy, let f: R->R be any real-valued function that is positive everywhere, with integral over the whole real line equal to 1. That is, any normalized density function on the reals. Now choose a random number between 0 and 1, call it M. Find the real number N such that the integral of f(x) from -∞ to N equals M. Assume the number in the unopened envelope is N and act accordingly.
* A suitable distribution over the reals for these purposes is any in which the probability density equals zero on at most a set of measure zero.
LikeLike
It was a very nice problem, thanks.
LikeLike
I’m so glad you posted this. Someone told this to me a while ago. I like logic puzzles with clever answers but I’m no mathematician, so I had an inkling the answer was something like that but I wanted to look up the exact solution with the function I wasn’t going to figure out myself. When Googling all I could find were pages about the other two envelope paradox you mentioned.
LikeLike
Matthias: You can’t use a step function with the step at a deterministic point. If you chose a function with a step at 3, then you wouldn’t beat 50% if Alice’s strategy was “I like 4 and 5, so let’s write those numbers down”.
If you used a step function with a step at a random point: I think that’s equivalent to Randall’s solution, but I can’t fit the proof in the margin.
Also: nothing special about the logistic (the solution I came up with used a Gaussian), any probability density strictly positive on all R should work, even if asymmetric. I don’t see why there’s anything special about the zero point.
LikeLike
bradluen wrote:
>> Also: nothing special about the logistic (the solution I came
>> up with used a Gaussian), any probability density strictly
>> positive on all R should work, even if asymmetric.
Did you mean cumulative distribution function, rather than probability density?
>> I don’t
>> see why there’s anything special about the zero point.
Your meaning is not quite clear here. Do you mean to ask why the logistic function given in the first post makes 0 special, or do you mean to simply repeat what I said earlier about the problem statement *not* making 0 special (in most cases)? If the former, it’s simply that there’s odd symmetry about that point (up to a vertical offset), and that’s the 50-50 point for guessing “higher” or “lower”. That’s pretty special.
LikeLike
It seems that your solution doesn’t take into account the possibility that Alice will chose the numbers based on the probability described by some function f(x). f(x) can be chosen in a way that will give your strategy exactly 50% chance of winning.
LikeLike
Putting the logistic function in there was a total red herring, as bradluen said. The start of the solution is just “Pick a real number according to a probability distribution such that every interval of positive length has positive probability (i.e., such that the only closed set with probability 1 is the entire real line)”. Any distribution with that property will work fine.
LikeLike
Er … Ask Alice if the number you picked is the higher one or the lower one, then based on your familiarity with Alice, decide if she’s likely to be lying to you.
It’ll give you better than 50%, although depending on how good a judge of human nature you are, the actual value will vary.
LikeLike
> WordPress seems to be turning all quotes into “smart quotes” (read: incompatible quotes).
No, they’re not incompatible quotes, it is using U+8220 and U+8221.
Incompatible quotes is using Windows-1252’s codepoints 0x93 and 0x94 (or worse, using those and then claiming the document is ISO-8859-1 instead of Windows-1252).
Any computer which has a font which has glyphs for those two Unicode codepoints (most of them) will show the quotes on this page correctly. And even many that don’t will still show it correctly (by falling back to normal quotes).
Back on-topic:
> If you used a step function with a step at a random point: I think that’s equivalent to Randall’s solution, but I can’t fit the proof in the margin.
Provided that:
(a) Either you pick the random point after Alice picks her numbers, or you ensure she doesn’t know the point.
(b) You have a non-zero probability of picking a point in any interval of real numbers (ie, there’s no interval of numbers you definitely won’t pick).
In that case, yes, it is equivalent… with the p(x) described by Randall being the CDF for your distribution for choosing the threshold. Point b ensures that the CDF is strictly increasing, rather than merely non-decreasing.
LikeLike
forgive me if i am wrong. i have lost academic touch with maths for years now. But isn’t simplest way to choose a strategy is to call it lower if you see a +ve number or call it higher if you see a -ve number(if u see zero you can toss a coin and call)? explanation being if you see a +ve number the set of lower numbers(lower than the one seen) is greater than the set of higher numbers(higher than the one seen) so higher probability of lower number turning up(as for us the numbers are random).
the function you quote is not special any symmetric monotonically increasing function which is zero at -ve infinity and 1 at +ve infinity should be used to call it lower.
then maybe i have not understood either the problem or solution.. or possibly both 🙂
LikeLike
0 isn’t special in this solution, even though the logistic function treats it as such. This just means that this method will work better if she happens to chose numbers closer to zero. But you could use a “shifted” logistic function which treats -93402384793285329.123 as special and it would still work as a solution.
In fact, as I was saying before any monotonic function with bounded range and unbounded domain should work for this (possibly with minor alterations, such as you could use a monotonically decreasing function, but then if your randomly chosen number is great than f(x) you would guess lower instead of higher).
Basically the solution is “use any rule by which the percentage of the time which you’ll say higher when presented with a given number x increases as x decreases.” It doesn’t have to be the logistic function. You could use arctan and pick a number between -1 and 1.
Also as to aaron meurer’s question as to why somebody couldn’t randomly select an infinite sequence of digits and then randomly select a place to put the decimal:
I could be wrong, but I think the issue is it’s not really possible to pick randomly from an unbounded series. So when randomly picking a number between 0 and 1 you can sequentially randomly pick between the digits 1-9. But picking out of the infinite possible places for the decimal is an impossible task, so turning this into a random number over the entire range of real numbers becomes impossible.
Correct me if I’m wrong but I suspect it’s also actually impossible to randomly select a number with infinite digits between 0 and 1, without taking an infinite amount of time. If that’s true then the method provided has it’s limits. The possible numbers you generate would only be to a certain precision and thus the chances of randomly picking a number which is larger than some p( some really big number) but smaller than p(some really big number + some really small number) would drop to 0. And you wouldn’t have any better than 50% chance at making the right guess in that case.
I might be wrong though.
LikeLike
Hi,
What i don’t understand is why the logistic function has to be:
p(x)=1/(1+e^-x)
Why not:
p(x)=1/(1+e^(-x/5)) or any other variant?
Is there a specific reason to consider that there is a high probability that one of the two numbers alice will pick will be between -3 and 3? (which by using the original logistic function we are considering to be 90% probability that she will pick a number from.)
Choosing one or another function will change:
http://www.wolframalpha.com/input/?i=+plot+1/(1%2Be(-x))+and+plot+1/(1%2Be(-x/5))
In the end my interpretation is:
If the number is greater than 0, say lower, if its less than 0, say higher, if it’s 0, take a will guess cause you’ll be 50-50….
LikeLike
Here is a simple way::
If the number you saw is +ve, chose a number less than it. If you saw is -ve, chose any number greater than this. If it is zero, u can choose any thing (you will have a 50% chance of getting the thing right in this case).
Explanation::
Lets look at the real-axis. It is divided into two equal halves (0 ~ oo) and (0 ~ -oo). If the number you saw is +ve (say, +x), the length of the line-segment left of the x is longer than the line-segment right of it. So, u’ve better chance if u choose something left-side of x. Similarly, for the -ve case too.
Well, u can say that length of both the segments are infinite, but it’s not necessary that both infinities have to be equal. This is same as if u have to choose a point on a 2D-plane and then estimate the probability if the point belongs to the 1st quadrant.
This also gets justified if you take a look at the frequency-distribution function of a hypothetical experiment, in which u choose a random real-number. The distribution curve would be an even Gaussian function centred around zero.
LikeLike
@michael: no he did mean pdf not cdf.
@bradluen: yes any pdf that is strictly positive will work. The proof only requires that p (which is the cdf) be strictly increasing, strictly positive and strictly less than 1. It doesn’t have to be continuous. For example
p(x) = 1/(4-x) for x = 1
would work, even though there’s a step from (1/3 to 1/2 at 1).
But yes a step function from 0 to 1 would *not* work, you can’t have a step from 0 to 1, you can’t have it be flat anywhere, it has to go asymptotically to 0 and 1 at -inf and +inf respectively.
LikeLike
Oops it looks like WordPress interpreted my greater than and less than signs as HTML and one line got lost in the above. I meant
p(x) = 1/(4-x) for x less than 1
p(x) = 1-1/(2x) for x greater than or equal to 1
LikeLike
@xkcd:
You’re making a trick here by introducing an unnecessary assumption.
If you take away the assumption “… if you call the smaller one A and the larger one B …” and treat A and B equally, what happens will be:
P(A>B) = 0.5
P(A<B) = 0.5
And your overall chances of guessing correctly — given that there is a 50% chance you’re seeing A and a 50% chance you’re seeing B — are:
N = 0.5 * N|(AB)
where N|(A<B) means N given condition (A<B) and,
N|(AB) = 0.5 * (1-p(B)) + 0.5 * p(A)
Then we can see N = 1 and p(x) doesn’t work at all.
The tricky part in your original proof is to replace “N|(A<B)" with "N" where extra knowledge(information) is introduced into this system which breaks the 50:50 balance, just like the Monty Hall paradox.
LikeLike
@xkcd:
You’re making a trick here by introducing an unnecessary assumption.
If you take away the assumption “… if you call the smaller one A and the larger one B …” and treat A and B equally, what happens will be:
P(A>B) = 0.5
P(A<B) = 0.5
And your overall chances of guessing correctly — given that there is a 50% chance you’re seeing A and a 50% chance you’re seeing B — are:
N = 0.5 * N|(A less than B) + 0.5 * N|(A greater than B)
where N|(A less than B) means N given condition (A less than B) and,
N|(A less than B) = 0.5 * (1-p(A)) + 0.5 * p(B)
N|(A greater than B) = 0.5 * (1-p(B)) + 0.5 * p(A)
Then we can see N = 1 and p(x) doesn't work at all.
The tricky part in your original proof is to replace "N|(A less than B)" with "N" where extra knowledge(information) is introduced into this system which breaks the 50:50 balance, just like the Monty Hall paradox.
LikeLike
Any real number can be mapped to the interval (-1,1) in a way that respects order (i.e. if a>b then map(a)>map(b)).
See arctangents for an example of a mapping with this property.
So the problem is equivalent to this: Alice chooses 2 numbers between -1 and 1, etc…
It’s clear at this point that if the first number is X, then the second number Y could either be:
between -1 and X with probability (X+1)/2
or between X and 1 with probability (1-X)/2
Let’s be clear:
in plain English this means (both in this version and the original):
If X>0 then X is more likely to be the larger
If X<0 then Y is more likely to be larger.
No surprises there…
LikeLike
This sounds like a complicated way of saying If the number you see is positive, then there are more infinite real numbers less than this number than there are greater than it and so you should guess the other number is lower, and vice-versa. It’s just that your equation actually gives the chance that you’re correct, rather than just telling you which way to guess.
LikeLike
Mmm, my first comment didn’t go through, either it was too stupid or I needed cookies…
Anyway, I don’t understand how in the proposed solution I could pick a real between 0 and 1 at random, if
“there’s no such thing as a random sample of all the numbers. There’s no way to choose one integer at random, where every integer has an equal chance of being chosen. ”
I could pick at random amongst a finite subset of [0,1], but then Alice would just have to write two numbers big enough that whichever envelope I choose, x will always be bigger than my randomly picked one.
LikeLike
Michael,
I believe the point is that the exact details of the chosen p(x) function are unimportant, as long as it is strictly increasing from 0 to 1.
So, for example,
p(x)=1/(1+e^(100,000,000-x)
should work just as well (and is certainly *not* symmetric around 0)
LikeLike
Correct me if I’m wrong, but in the end your solution basically states that if the number you saw is lower than zero, guess “higher”. If it is higher than zero, guess “lower”. Am I right?
LikeLike
Obviously you cannot have positive density for more than a countable number of points, but you only need some distribution giving positive probability for any non-empty interval.
This way, your guess is correct with probability (1 + P([a,b]))/2 > 1/2 for every two numbers a>b.
LikeLike
Oh perfect, a link to Wikipedia, now I’m going to be late.
LikeLike