
SUBMISSION REQUIREMENTS: Please submit ONLY a single R script file, named “First_Last Name.R” (using your own first and last name). Your R script must calculate the effectiveness of your classification as described below.
As in the classification example, process and classify the newsgroup document data. Download the data and save it on your computer in your R packages folder under “tm/text/”. Your code MUST access it from there!
Note that the data is separated into one test folder and one train folder, each containing 20 subfolders on different subjects. Choose these two subjects to analyze (sci.space and rec.autos) and 100 documents from each.
Treat “rec.autos” as the positive event and “sci.space” as the negative event. Note that the kNN syntax expects the positive class first and the negative class second.
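Because factor() orders its levels alphabetically by default, tags built from the strings “Positive” and “Negative” would list “Negative” first. A minimal base-R sketch, assuming the 100 rec.autos training rows come before the 100 sci.space training rows, sets the level order explicitly so that table(Tags) shows Positive before Negative:

```r
# Training tags, assuming the 100 positive (rec.autos) training rows come
# first and the 100 negative (sci.space) training rows second.
Tags <- factor(c(rep("Positive", 100), rep("Negative", 100)),
               levels = c("Positive", "Negative"))  # explicit level order

# Without 'levels', factor() would sort alphabetically: "Negative", "Positive"
table(Tags)
```

The explicit levels argument is what controls the printed order; the order of the values alone does not.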
Classify the Newsgroups data (the “by date” version of the data set, available on Blackboard):
•        Save the data in your “tm/text/” folder so that you can specify its path using system.file()
•        For each subject select:
–       100 documents for training from the train folder
–       100 documents for testing from the test folder
•        Obtain the merged corpus of 400 documents, keeping this order:
–       Doc1.Train from the “sci.space” newsgroup train data
–       Doc1.Test from the “sci.space” newsgroup test data
–       Doc2.Train from the “rec.autos” newsgroup train data
–       Doc2.Test from the “rec.autos” newsgroup test data
•        Implement preprocessing (clearly indicate which steps you used)
•        Create the Document-Term Matrix using the following arguments (minimum word length of 2; keep only terms that appear in at least 5 documents)
–      use: control=list(wordLengths=c(2,Inf), bounds=list(global=c(5,Inf)))
•        Split the Document-Term Matrix into proper test/train row ranges
–       train range containing rows (1:100) and (201:300)
–       test range containing rows (101:200) and (301:400)
–       Note that knn expects the positive (“rec.autos”) event first, so re-adjust your train/test ranges if necessary.
•        Use the labels “Positive” and “Negative” as the tag factor levels in your classification.
–       Check that the tag order is correct using table(Tags)
–       You should get:
            Tags
            Positive Negative
                 100      100
–       If the order is not right, make the necessary changes.
•        Classify the text using the knn() function from the class package
•        Display the classification results as an R data frame with these column names:
–       “Doc”
–       “Predict” – Tag factors of the predicted subject (Positive or Negative)
–       “Prob” – The classification probability
–       “Correct” – TRUE/FALSE
•        What is the percentage of correct (TRUE) classifications?
•        Estimate the effectiveness of your classification:
–       Calculate and clearly mark the values TP, TN, FP, and FN
–       Create the confusion matrix and label its rows and columns with the Positive/Negative events
–       Calculate Precision
–       Calculate Recall
–       Calculate F-score
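The preprocessing and Document-Term Matrix steps might look like the sketch below. It runs on a hypothetical six-document toy corpus rather than the newsgroup data, and the preprocessing choices (lower-casing; removing punctuation, numbers, and English stop words; collapsing whitespace) are one reasonable option, not a required set. Note that bounds$global counts the number of documents a term appears in, so with this toy data only “car” (present in 5 of 6 documents) survives the c(5, Inf) bound.

```r
library(tm)

# Hypothetical toy documents standing in for the 400 newsgroup posts
docs <- c("The car engine needs new oil.",
          "My car has a loud engine!",
          "Space shuttle parked next to a car.",
          "NASA launched 2 rockets into space.",
          "A car can not reach space orbit.",
          "The car and the space station.")
corpus <- VCorpus(VectorSource(docs))

# Preprocessing (one possible choice of steps)
corpus <- tm_map(corpus, content_transformer(tolower))      # lower-case text
corpus <- tm_map(corpus, removePunctuation)                 # drop punctuation
corpus <- tm_map(corpus, removeNumbers)                     # drop digits
corpus <- tm_map(corpus, removeWords, stopwords("english")) # drop stop words
corpus <- tm_map(corpus, stripWhitespace)                   # collapse spaces

# Keep terms of length >= 2 that occur in at least 5 documents
dtm <- DocumentTermMatrix(corpus,
                          control = list(wordLengths = c(2, Inf),
                                         bounds = list(global = c(5, Inf))))
inspect(dtm)
```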
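A base-R sketch of the positive-first row split follows. A random count matrix stands in for as.matrix() of the real 400-row Document-Term Matrix (an assumption for illustration only), and the tags are derived from the row numbers so they stay aligned with the training rows.

```r
# Hypothetical stand-in for as.matrix(dtm): 400 rows in corpus order
# (sci.space train 1:100, sci.space test 101:200,
#  rec.autos train 201:300, rec.autos test 301:400)
set.seed(1)
mat <- matrix(rpois(400 * 5, lambda = 1), nrow = 400)

# Positive (rec.autos) rows first, negative (sci.space) rows second
train_rows <- c(201:300, 1:100)
test_rows  <- c(301:400, 101:200)

train <- mat[train_rows, ]
test  <- mat[test_rows, ]

# Tags derived from the row numbers, so they stay aligned with 'train'
Tags <- factor(ifelse(train_rows > 200, "Positive", "Negative"),
               levels = c("Positive", "Negative"))
dim(train)
dim(test)
```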
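The classification and results table can be sketched with knn() from the class package (a recommended package included with standard R installations). Two well-separated random point clouds stand in for the train/test rows of the Document-Term Matrix, and k = 3 is an arbitrary illustrative choice; knn() defaults to k = 1, which makes every Prob equal to 1. With prob = TRUE, knn() attaches the proportion of votes for the winning class as the "prob" attribute of its result.

```r
library(class)  # provides knn()

# Toy, clearly separable data standing in for the DTM train/test rows
set.seed(42)
train <- rbind(matrix(rnorm(20, mean = 5), ncol = 2),   # 10 positive rows
               matrix(rnorm(20, mean = 0), ncol = 2))   # 10 negative rows
test  <- rbind(matrix(rnorm(20, mean = 5), ncol = 2),
               matrix(rnorm(20, mean = 0), ncol = 2))
lev    <- c("Positive", "Negative")
Tags   <- factor(rep(lev, each = 10), levels = lev)  # training labels
Actual <- factor(rep(lev, each = 10), levels = lev)  # true test labels

# prob = TRUE stores the winning-vote proportion in attr(pred, "prob")
pred <- knn(train, test, cl = Tags, k = 3, prob = TRUE)

results <- data.frame(
  Doc     = seq_len(nrow(test)),
  Predict = pred,
  Prob    = attr(pred, "prob"),
  Correct = pred == Actual
)
head(results)

# Percentage of correct (TRUE) classifications
pct_correct <- 100 * mean(results$Correct)
pct_correct
```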
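The effectiveness measures can be computed from a confusion matrix built with table(), using Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F-score = 2 × Precision × Recall / (Precision + Recall). The ten predictions below are hypothetical; with them, TP = 4, FP = 2, FN = 1, and TN = 3, so Precision = 4/6, Recall = 4/5, and the F-score is 8/11 (about 0.727).

```r
lev <- c("Positive", "Negative")

# Hypothetical true labels and predictions for 10 test documents
Actual  <- factor(rep(lev, each = 5), levels = lev)
Predict <- factor(c(rep("Positive", 4), "Negative",        # actual Positive docs
                    rep("Negative", 3), rep("Positive", 2)), # actual Negative docs
                  levels = lev)

# Confusion matrix: rows = predicted class, columns = actual class
conf_mat <- table(Predicted = Predict, Actual = Actual)
print(conf_mat)

TP <- conf_mat["Positive", "Positive"]  # predicted Positive, actually Positive
FP <- conf_mat["Positive", "Negative"]  # predicted Positive, actually Negative
FN <- conf_mat["Negative", "Positive"]  # predicted Negative, actually Positive
TN <- conf_mat["Negative", "Negative"]  # predicted Negative, actually Negative

precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)
f_score   <- 2 * precision * recall / (precision + recall)

cat("TP =", TP, " TN =", TN, " FP =", FP, " FN =", FN, "\n")
cat("Precision =", round(precision, 3),
    " Recall =", round(recall, 3),
    " F-score =", round(f_score, 3), "\n")
```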
Note that one way to select only 100 documents is:
> library(tm)
> Temp1 <- DirSource(Doc1.TrainPath)  # Doc1.TrainPath: the sci.space train folder
> Doc1.Train <- Corpus(URISource(Temp1$filelist[1:100]), readerControl = list(reader = readPlain))
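Extending that snippet, two hypothetical helpers can load each folder and combine the four pieces in the required order with c(). The folder names 20news-bydate-train and 20news-bydate-test are assumptions about how the downloaded archive unpacks; adjust them to match your copy. VCorpus() is called directly here because Corpus() given a URISource returns a VCorpus anyway.

```r
library(tm)

# Hypothetical helper: read the first n files of one newsgroup folder
read_first_n <- function(dir, n = 100) {
  files <- DirSource(dir)$filelist[seq_len(n)]   # full paths, sorted by name
  VCorpus(URISource(files), readerControl = list(reader = readPlain))
}

# Hypothetical helper: merge the four n-document pieces in the required order.
# The train/test folder names below are assumptions; rename them if needed.
build_corpus <- function(base, n = 100) {
  train <- file.path(base, "20news-bydate-train")
  test  <- file.path(base, "20news-bydate-test")
  c(read_first_n(file.path(train, "sci.space"), n),   # Doc1.Train
    read_first_n(file.path(test,  "sci.space"), n),   # Doc1.Test
    read_first_n(file.path(train, "rec.autos"), n),   # Doc2.Train
    read_first_n(file.path(test,  "rec.autos"), n))   # Doc2.Test
}
```

With the data saved under “tm/text/”, a call such as build_corpus(system.file("text", package = "tm")) would then return the 400-document corpus. If a folder holds fewer than n files, the seq_len(n) index produces NA paths, so check the folder contents first.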
