Game moderation reloaded

Russ

RHP Code Monkey

Announcements

06 Mar 08 02:09

Yuga

Renaissance

OnceInALifetime

Joined: 24 Sep 05
Moves: 30579

07 Mar 08 01:52

Originally posted by HomerJSimpson
arriakis is rumored to be steve exeter. I would like proof that he isn't steve before voting for him.

Nonsense, here is the proof (bottom of page 1): Thread 82784

GalaxyShield

Mr. Shield

Joined: 02 Sep 04
Moves: 174290

07 Mar 08 02:07

My votes would go to:

Tony
Gatecrasher

I think at least 2 or 3 others would be needed to ensure proper analysis, and a little bit of speed (spreading the load out). Perhaps CMSmaster would be a good choice, too?

Gatecrasher

Whale watching

33°36'S 26°53'E

Joined: 05 Feb 04
Moves: 41150

07 Mar 08 02:59

Originally posted by gezza
What is the workload?
Gatecrasher:"After a week or so [...] I was about 60%-70% through the workload [...]"
How many hours/days work are we talking about?

I want to be stamped as clean at this time. I have completed 54 games over the last 2 1/2 years - If the last game mods had some sort of pretty tool to analyse, run that. I would rather deal with the ...[text shortened]... on. If I decide to give my name, I am sure there will be others who recognise me.

The workload is variable. cmsmaster suggests that the previous team could not cope with the number of complaints, but in the second half of 2007 we received very few complaints and every complaint received was investigated. There was seldom a backlog. I do agree that more team members would help reduce the workload, but too many can slow down effective decision-making.

I worked hard on the case in question because it was so critical, and because so much evidence had been submitted. There were many analysed games, comments and observations on specific moves, and all of these had to be verified, replicated and evaluated for objectivity. Normally, a complaint is along the lines of "so-and-so's graph looks dodgy" and we would simply run our own checks.

A game mod should be prepared to spend several hours a week modding.

Automated tools have been developed which make life easier. Manually analysing games with a specific engine is time-consuming, but sometimes unavoidable.

For the record, and without going into specifics that could help cheats avoid detection, I want to clarify how game mods use statistics.

The misuse of statistics in that blog entry (and subsequent discussion in an RHP forum thread) is indeed appalling. The test statistic produced by the tool cannot be converted into a "% chance of cheating" simply by subtracting it from 1. Rather, it is the probability that a genuine human player could innocently equal or exceed the match-up stats produced from a suspect's batch of games.
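A minimal sketch of that definition in Python. It assumes a reference sample of scores from players known to be human and treats "match-up rate" as the fraction of a player's moves that agree with an engine's choice. The function name `empirical_p_value` and all numbers are hypothetical, and this is not a description of how the RHP tool actually works.

```python
def empirical_p_value(suspect_score: float, human_scores: list[float]) -> float:
    """Share of known-human reference scores that equal or exceed the suspect's score.

    This estimates the probability that a genuine human player could innocently
    match or beat the suspect's result. It is not P(cheating).
    """
    if not human_scores:
        raise ValueError("need at least one reference score")
    at_least_as_high = sum(1 for s in human_scores if s >= suspect_score)
    return at_least_as_high / len(human_scores)


# Hypothetical match-up rates from ten players assumed to be human
reference = [0.52, 0.55, 0.58, 0.60, 0.61, 0.63, 0.64, 0.66, 0.70, 0.74]

p = empirical_p_value(0.70, reference)
print(p)  # 0.2: two of the ten reference scores are >= 0.70
```

Reading `1 - p` (0.8 here) as an "80% chance of cheating" is exactly the misreading described above.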

This can be likened to tossing a coin. Toss a fair coin 10 times, and the single most likely outcome is 5 heads. A test statistic using a cumulative binomial distribution gives a 62.3% chance of getting 5 or more heads, a 37.7% chance of 6 or more, 17.2% of 7 or more, 5.5% of 8 or more, 1.1% of 9 or more, and a 0.1% chance of getting 10 heads out of 10. But if we toss the coin 10 times and actually get 10 heads, that cannot be converted into a 99.9% chance that the coin was not fair: we have already established that the coin is fair. Similarly, to say there is a 75% chance that Tal used an engine to make his moves is absurd. We know he didn't.
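The coin figures above can be checked with a short Python sketch of the cumulative (upper-tail) binomial probability; `tail_probability` is an illustrative name, not part of any RHP tool.

```python
from math import comb


def tail_probability(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more heads in n tosses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


for k in range(5, 11):
    print(f"{k} or more heads out of 10: {tail_probability(10, k):.1%}")
# 62.3%, 37.7%, 17.2%, 5.5%, 1.1%, 0.1%
```

Ten heads has probability 1/1024 under a fair coin. That small number says how surprising the result would be if the coin is fair; it is not a 99.9% probability that the coin is unfair.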

This is hypothesis testing. The null hypothesis is that the suspect is a strong human player. So how well do the findings fit the possibility that chance factors alone might be responsible for the outcome? What the tool measures is the probability of a type I error, i.e. a false positive, if we reject the null hypothesis on the strength of that result. Rejecting the null hypothesis means identifying the suspect as an engine user, so a type I error means identifying an innocent player as one. To apply this we need to choose an acceptable significance level. If we used 5%, then on average one in every 20 tests on innocent players would lead to a false ban. A 1% significance level would produce 1 false positive in 100, 0.1% gives 1 in 1,000, and 0.01% gives 1 in 10,000. What level of error is acceptable? What other evidence exists? What constitutes overwhelming evidence beyond reasonable doubt?
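A minimal Python sketch of the decision rule and the false-positive arithmetic above. It assumes the usual convention of rejecting the null hypothesis when the p-value is at or below the chosen significance level, and that tests on different innocent players are independent; the function names are illustrative only.

```python
def reject_null(p_value: float, alpha: float) -> bool:
    """Reject "the suspect is a strong human player" only if p_value <= alpha."""
    return p_value <= alpha


def expected_false_positives(alpha: float, innocent_tests: int) -> float:
    """Average number of innocent players wrongly flagged when testing at level alpha."""
    return alpha * innocent_tests


for alpha in (0.05, 0.01, 0.001, 0.0001):
    print(f"alpha = {alpha:.2%}: about 1 false positive per "
          f"{round(1 / alpha):,} innocent players tested")

print(reject_null(0.001, 0.01))    # True: flagged at the 1% level
print(reject_null(0.001, 0.0001))  # False: not flagged at the 0.01% level
```

A smaller alpha protects innocent players but also lets more genuine engine users through, which is why other evidence matters.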

Lastly, whether the writer of the blog has been rightly or wrongly accused of cheating is immaterial. He has every right to play here, and if innocent, he has nothing to fear. No circumstances excuse the fact that he has taken privileged game mod information and the contents of private messages and pasted them on a site external to this one. He is also using a tool designed exclusively for RHP game mods, for the sole purpose of objective game modding here at RHP, as a means to defend himself and as a weapon to attack others. That is a betrayal of trust. As the author of the tool, I resent its misuse.

As with any statistical tool, data selection is critical. Biased data selection will give biased results. The tool can be used to "prove" anything as it relies heavily on the objectivity of the operator.
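A small Python illustration of how selection alone can "prove" either conclusion from the same player's games. The per-game match-up rates are made-up numbers and `pick` is a hypothetical helper; the point is only that choosing games by their scores biases the summary.

```python
from statistics import mean

# Hypothetical per-game match-up rates for one player (illustrative numbers only)
games = [0.48, 0.55, 0.61, 0.52, 0.79, 0.58, 0.83, 0.50, 0.57, 0.76]


def pick(rates: list[float], k: int, highest: bool = True) -> list[float]:
    """Biased selection: keep only the k highest (or lowest) scoring games."""
    return sorted(rates, reverse=highest)[:k]


print(f"all 10 games:  {mean(games):.3f}")                          # 0.619
print(f"best 3 games:  {mean(pick(games, 3)):.3f}")                 # 0.793
print(f"worst 3 games: {mean(pick(games, 3, highest=False)):.3f}")  # 0.500
```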

!~TONY~!

1...c5!

Your Kingside

Joined: 28 Sep 01
Moves: 40665

07 Mar 08 03:13

This is quite interesting. As an engineer familiar with inferential statistics and mathematics, I have always wondered how the game mods decided that there was overwhelming evidence of cheating. A simple hypothesis test. Awesome! 😀

caissad4

Child of the Novelty

San Antonio, Texas

Joined: 08 Mar 04
Moves: 619040

07 Mar 08 05:46

Originally posted by HomerJSimpson
arriakis is rumored to be steve exeter. He used to have an image in his profile, linked to an image hosting service, that was the exact image he has now, with the name steve exeter in it. He did not answer the question when I asked him about it, and he has since changed where the picture is hosted. I would like proof that he isn't steve before voting for him.

Arrakis is not steve exeter.

leisurelysloth

Man of Steel

rushing to and fro

Joined: 13 Aug 05
Moves: 5930

07 Mar 08 06:36

Originally posted by Gatecrasher
...The misuse of statistics in that blog entry (and subsequent discussion in an RHP forum thread) is indeed appalling. The test statistic produced by the tool cannot be converted into a "% chance of cheating" simply by subtracting it from 1. Rather, it is the probability that a genuine human player could innocently equal or exceed the match-up stats produced from a suspect's batch of games....

LOL! If you were talking with a non-mathematically inclined individual and watched their eyes glaze over after you said, "the probability that a genuine human player could innocently equal or exceed the match-up stats produced from a suspect's batch of games....", how would you then explain all this gobbledygook? While it may not be mathematically correct, "% chance of cheating" basically expresses a layman's "equivalent" to what you are trying to say....

leisurelysloth

Man of Steel

rushing to and fro

Joined: 13 Aug 05
Moves: 5930

07 Mar 08 06:58

Originally posted by Gatecrasher
Lastly, whether the writer of the blog has been rightly or wrongly accused of cheating is immaterial. He has every right to play here, and if innocent, he has nothing to fear. No circumstances excuse the fact that he has taken privileged game mod information, and the contents of private messages and pasted them on a site external to this one. And that he ...[text shortened]... n to attack others. That is a betrayal of trust. As the author of the tool I resent its misuse.

Not knowing any of the players here, I don't particularly feel like I have a "side" to take. But I can see where the accused is coming from here. You are objecting to his use of "your" tool in defending himself. But the fact of the matter is that the system is set up such that he would otherwise never have had an opportunity to defend himself. And to make matters worse, in this particular case he'll never even have the opportunity to have what passes for a "fair trial" since the trial was suspended.

Instead he's being tried in the forums by people who are using the very same tools for offense which you object to him using for defense. If the shoe were on the other foot, how long would you tolerate having your name dragged through the mud in the forums?

Mahout

London

Joined: 04 Nov 05
Moves: 12606

07 Mar 08 09:53

Originally posted by GalaxyShield
My votes would go to:

Tony
Gatecrasher

I think at least 2 or 3 others would be needed to ensure proper analysis, and a little bit of speed (spreading the load out). Perhaps CMSmaster would be a good choice, too?

I agree that Tony and Gatecrasher would be worth voting for. I also consider Dragon Fire a good potential candidate, although he hasn't put his name forward and is on holiday. He'll have some catching up to do when he gets back, I reckon!

Russ

RHP Code Monkey

RHP HQ

Joined: 21 Feb 01
Moves: 2808

07 Mar 08 10:05

I'll let this run until early next week before closing the thread and moving the process on.

-Russ

Marinkatomb

wotagr8game

tbc

Joined: 18 Feb 04
Moves: 61941

07 Mar 08 11:27

I would volunteer if my computer wasn't so damned unstable! Fritz overheats my computer and causes it to crash; I have yet to complete a full analysis of a game since I bought it. 🙁

Tatarana Crocodilo

Joined: 12 Aug 04
Moves: 30813

07 Mar 08 13:15

Originally posted by Gatecrasher
The workload is variable. cmsmaster suggests that the previous team could not cope with the number of complaints, but in the second half of 2007 we received very few complaints and every complaint received was investigated. There was seldom a backlog. I do agree more team members would be better to reduce workload, but too many can slow down effective de ...[text shortened]... an be used to "prove" anything as it relies heavily on the objectivity of the operator.

Rec'd, and thanks for your clarifications.

Phlabibit

Mystic Meg

tinyurl.com/3sbbwd4

Joined: 27 Mar 03
Moves: 17242

07 Mar 08 14:20

Originally posted by Russ
I'll let this run until early next week before closing the thread and moving the process on.

-Russ

Russ, I'm in for feedback, getting coffee, and answering PM's and public outcry over decisions. The basic stuff I did before.

P-

no1marauder

Naturally Right

Back in the Saddle

Joined: 22 Jun 04
Moves: 43468

07 Mar 08 14:38

Originally posted by Phlabibit
Russ, I'm in for feedback, getting coffee, and answering PM's and public outcry over decisions. The basic stuff I did before.

P-

Can't we get someone with bigger breasts to do that?

Red Night

RHP Prophet

pursuing happiness

Joined: 22 Feb 06
Moves: 13669

07 Mar 08 15:26

Originally posted by GalaxyShield
My votes would go to:

Tony
Gatecrasher

I think at least 2 or 3 others would be needed to ensure proper analysis, and a little bit of speed (spreading the load out). Perhaps CMSmaster would be a good choice, too?

I think Tony and Gatecrasher would be fantastic choices.

Red Night

RHP Prophet

pursuing happiness

Joined: 22 Feb 06
Moves: 13669

07 Mar 08 15:30

Originally posted by Gatecrasher
The workload is variable. cmsmaster suggests that the previous team could not cope with the number of complaints, but in the second half of 2007 we received very few complaints and every complaint received was investigated. There was seldom a backlog. I do agree more team members would be better to reduce workload, but too many can slow down effective de ...[text shortened]... an be used to "prove" anything as it relies heavily on the objectivity of the operator.

I think that since Gatecrasher wrote the program that does the testing, he HAS to be a game mod.

What about David?