Validation of user input for quality

Taking Stack Overflow as an example, how can one set out to keep bad questions from popping up? For instance, questions that contain only:

  1. Spaces
  2. Gibberish
  3. Repetitive characters
  4. Combination of characters such as 'fsdgh' etc
  5. Other combinations as explained in Jeff's post on Meta.

If I want to do this in JavaScript, is there a way to programmatically find such combinations of characters in a post? Jeff's post on Meta explains some heuristics and patterns for such a system.

I understand that there might be other ways to combat this, such as user registration and rate-limiting, but purely from a validation and UX viewpoint, how can we make it happen, either server-side or client-side? It would be a great way to tell the good users that the system will not accept bad posts either way.

This could use a combination of RegExps and loops, for example. We do not even have to implement all of the requirements; a subset might do just fine.


Questions that only contain:

  1. Spaces
  2. Gibberish
  3. Repetitive characters
  4. Combination of characters such as 'fsdgh' etc

Taking these in order:

  1. Get the text of the post and call trim() on it to remove the surrounding white space. If the length of the text is now zero, the post contained only spaces. It is bad.
  2. This is the hard one. Maybe search for common words like "the", "of", "if" and "it"; if many or all of them are absent, the post might be bad.
  3. I don't think there are any words in the English language that have more than two repeated characters in a row. So as you parse the text, remember the last character you saw. If the next character matches it, set a flag; if the flag is already set and the next character matches again, it is a bad post. Clear the flag whenever the character changes.
  4. Same idea as above, but search for words with long runs of characters without vowels. Do some research to figure out which words have the longest chains of non-vowels; if you ever find anything longer, it's a bad post.
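
The four checks above could be sketched in JavaScript roughly as follows. This is only an illustration under assumptions: every function name, the word list and the thresholds are made up for the example, "y" is counted as a vowel, and a run counter is used in place of the flag described in step 3.

```javascript
// Hypothetical word list; a real one would be much longer.
const COMMON_WORDS = new Set(["the", "of", "if", "it", "a", "an", "is", "to", "in", "and", "i"]);

function isLetter(ch) {
  return ch >= "a" && ch <= "z";
}

// 1. Only spaces: trim and check the remaining length.
function isBlank(text) {
  return text.trim().length === 0;
}

// 2. Fraction of the words in the text that are common English words.
function commonWordRatio(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return 0;
  return words.filter((w) => COMMON_WORDS.has(w)).length / words.length;
}

// 3. Length of the longest run of one repeated letter (non-letters reset the run).
function longestRepeat(text) {
  let longest = 0;
  let run = 0;
  let last = null;
  for (const ch of text.toLowerCase()) {
    if (!isLetter(ch)) {
      run = 0;
    } else if (ch === last) {
      run += 1;
    } else {
      run = 1;
    }
    last = ch;
    if (run > longest) longest = run;
  }
  return longest;
}

// 4. Length of the longest run of consecutive non-vowel letters.
function longestConsonantRun(text) {
  let longest = 0;
  let run = 0;
  for (const ch of text.toLowerCase()) {
    run = isLetter(ch) && !"aeiouy".includes(ch) ? run + 1 : 0;
    if (run > longest) longest = run;
  }
  return longest;
}

// Combine the checks. maxConsonants = 6 is an assumed limit, chosen so that
// a word like "catchphrase" (six consonants in a row) still passes.
function looksLikeBadPost(text, maxConsonants = 6) {
  if (isBlank(text)) return true;
  if (longestRepeat(text) > 2) return true;
  if (longestConsonantRun(text) > maxConsonants) return true;
  return commonWordRatio(text) === 0; // not a single common word
}
```

With these assumptions, looksLikeBadPost("fsdgh") is flagged because it contains no common word, while an ordinary sentence passes. Very short but legitimate posts can also be flagged by the common-word check, so it is probably better used as a warning than as a hard rejection.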

Most of these are hard to do because, as computer scientists know, English (and pretty much any spoken language) is backwards compared to anything a computer can easily understand or parse.

As you pointed out, the only good way to take care of these problems is mods, registration, etc.

Oops, didn't see you wanted to use regexes... But it's still hard to do from any standpoint (I think).
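
For what it's worth, the whitespace, repeated-character and vowel-less checks can also be written as regular expressions. The following is a rough sketch, not a tested filter: the names are hypothetical, the limit of seven consonants is an assumption, and "y" is treated as a vowel. The gibberish check does not really reduce to a single pattern, so it is left out.

```javascript
// Empty or whitespace-only text.
const ONLY_WHITESPACE = /^\s*$/;

// The same letter three or more times in a row, e.g. "heeeelp".
// \1 refers back to whatever letter the first group matched.
const REPEATED_LETTER = /([a-z])\1{2,}/i;

// Seven or more consonants in a row (assumed threshold).
const CONSONANT_RUN = /[bcdfghjklmnpqrstvwxz]{7,}/i;

// Returns the names of the checks that the text fails.
function failedRegexChecks(text) {
  const failures = [];
  if (ONLY_WHITESPACE.test(text)) failures.push("blank");
  if (REPEATED_LETTER.test(text)) failures.push("repeated");
  if (CONSONANT_RUN.test(text)) failures.push("consonants");
  return failures;
}
```

Returning the list of failed checks instead of a single boolean makes it possible to tell the user what looks wrong with the post.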
