JSON.parse error: cannot see the plot #94

sameiali opened this issue Oct 17, 2018 · 0 comments

Hi,

As a beginner in R and in topic modelling with R, I am following what other people have suggested: first fit an LDA model on a corpus of corporate annual reports, then visualize the results through LDAvis. Everything works fine until the very last step, when I open the output directory in the browser and get the following error:

"SyntaxError: JSON.parse: bad control character in string literal at line 10 column 16177 of the JSON data"

Here is my code:

#load text mining library
library(tm)
#load the CEO letters from a CSV file
ceoletters <- read.csv("ceoletters.csv")
corpus <- iconv(ceoletters$ceoletter, to = "ASCII", sub = "")
#create corpus from vector
letters <- Corpus(VectorSource(corpus))
#start preprocessing
letters <-tm_map(letters,content_transformer(tolower))
letters <- tm_map(letters, removePunctuation)
letters <- tm_map(letters, removeNumbers)
letters <- tm_map(letters,removeWords,stopwords("english"))
letters <- tm_map(letters, stripWhitespace)
#Stem document
letters <- tm_map(letters,stemDocument)
#Create document-term matrix
dtm <- DocumentTermMatrix(letters)
#set rownames to the letter IDs
rownames(dtm) <- ceoletters$letter_id
#collapse matrix by summing over columns
freq <- colSums(as.matrix(dtm))
#length should be total number of terms
length(freq)
#create sort order (descending)
ord <- order(freq,decreasing=TRUE)
#List all terms in decreasing order of freq and write to disk
freq[ord]
write.csv(freq[ord],"word_freq.csv")

##fitting LDA
#load topic models library
library(topicmodels)
library(doParallel)
#Set parameters for Gibbs sampling
burnin <- 2000
iter <- 2000
thin <- 500
seed <-list(2003,5,63,100001,765)
nstart <- 5
best <- TRUE
registerDoParallel(4)
#Number of topics
k <- 100
#Run LDA using Gibbs sampling
ldaOut <-LDA(dtm,k, method="Gibbs", control=list(nstart=nstart, seed = seed, best=best, burnin = burnin, iter = iter, thin=thin))
#write out results
#docs to topics
ldaOut.topics <- as.matrix(topics(ldaOut))
write.csv(ldaOut.topics,file=paste("LDAGibbs",k,"DocsToTopics.csv"))
#top 10 terms in each topic
ldaOut.terms <- as.matrix(terms(ldaOut,10))
write.csv(ldaOut.terms,file=paste("LDAGibbs",k,"TopicsToTerms.csv"))
#probabilities associated with each topic assignment
topicProbabilities <- as.data.frame(ldaOut@gamma)
write.csv(topicProbabilities,file=paste("LDAGibbs",k,"TopicProbabilities.csv"))

And here is my code to visualize the results:

library(LDAvis)
library(servr)
topicmodels2LDAvis <- function(x, ...){
  post <- topicmodels::posterior(x)
  if (ncol(post[["topics"]]) < 3) stop("The model must contain > 2 topics")
  mat <- x@wordassignments
  LDAvis::createJSON(
    phi = post[["terms"]], 
    theta = post[["topics"]],
    vocab = colnames(post[["terms"]]),
    doc.length = slam::row_sums(mat, na.rm = TRUE),
    term.frequency = slam::col_sums(mat, na.rm = TRUE)
  )
}
serVis(topicmodels2LDAvis(ldaOut))
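
Since the error message mentions a bad control character in a string literal, one diagnostic that may help is to check whether any term in the vocabulary contains an ASCII control character (these survive the iconv() call above because they are valid ASCII). The following is only a minimal sketch under that assumption, not a confirmed fix: find_control_terms and strip_control_chars are hypothetical helper names, not part of LDAvis or topicmodels, and toy_vocab is a made-up stand-in for colnames(posterior(ldaOut)[["terms"]]).

```r
# Hypothetical helper: return the terms that contain ASCII control characters.
# JSON string literals may not contain such characters unescaped.
find_control_terms <- function(vocab) {
  vocab[grepl("[[:cntrl:]]", vocab)]
}

# Hypothetical helper: replace control characters with a space, intended to be
# applied to the raw text before the corpus is built.
strip_control_chars <- function(text) {
  gsub("[[:cntrl:]]", " ", text)
}

# Toy vocabulary (assumption) standing in for the real model vocabulary.
# "\001" is a single control character embedded in the second term.
toy_vocab <- c("revenue", "share\001holder", "growth")

bad_terms <- find_control_terms(toy_vocab)
cleaned   <- strip_control_chars(toy_vocab)
```

If the real vocabulary does turn out to contain such terms, applying something like strip_control_chars to the text right after the iconv() step, and then rebuilding the corpus and refitting, would be one thing to try.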

Any idea how to solve this problem?
Thanks.
