Hey all, I ran into an issue but also found a fix! I was passing a sparse matrix into guidedLDA and got a ValueError raised by the marked if statement in utils.py:
def matrix_to_lists(doc_word):
    """Convert a (sparse) matrix of counts into arrays of word and doc indices

    Parameters
    ----------
    doc_word : array or sparse matrix (D, V)
        document-term matrix of counts

    Returns
    -------
    (WS, DS) : tuple of two arrays
        WS[k] contains the kth word in the corpus
        DS[k] contains the document index for the kth word

    """
    if np.count_nonzero(doc_word.sum(axis=1)) != doc_word.shape[0]:
        logger.warning("all zero row in document-term matrix found")
    if np.count_nonzero(doc_word.sum(axis=0)) != doc_word.shape[1]:
        logger.warning("all zero column in document-term matrix found")
    sparse = True
    try:
        # if doc_word is a scipy sparse matrix
        doc_word = doc_word.copy().tolil()
    except AttributeError:
        sparse = False
    if sparse and not np.issubdtype(doc_word.dtype, int):
        raise ValueError("expected sparse matrix with integer values, found float values")  # <-- error raised here
    ii, jj = np.nonzero(doc_word)
    if sparse:
        ss = tuple(doc_word[i, j] for i, j in zip(ii, jj))
    else:
        ss = doc_word[ii, jj]
    n_tokens = int(doc_word.sum())
    DS = np.repeat(ii, ss).astype(np.intc)
    WS = np.empty(n_tokens, dtype=np.intc)
    startidx = 0
    for i, cnt in enumerate(ss):
        cnt = int(cnt)
        WS[startidx:startidx + cnt] = jj[i]
        startidx += cnt
    return WS, DS
The reason is that the sparse matrix going in gets converted to a LIL (list-of-lists) matrix with an np.int64 dtype, which np.issubdtype does not treat as a subtype of the built-in "int" on my setup. I changed the check to np.int64 to get around this, so the only change to the function is:
    if sparse and not np.issubdtype(doc_word.dtype, np.int64):
        raise ValueError("expected sparse matrix with integer values, found float values")
Everything now works as usual. Let me know how I can open a pull request if needed, as I have not done one before. I believe a better workaround would be a catch-all, such as checking whether the dtype is one of the integer types, because they should all work with LDA.
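A minimal sketch of that catch-all idea, assuming only NumPy: np.integer is the abstract parent of every signed and unsigned NumPy integer type, so a single np.issubdtype call covers int32, int64, uint8 and so on without listing them. The helper name is_integer_dtype is hypothetical and not part of guidedLDA. A likely explanation for the original failure (an assumption about this setup) is that on Windows with NumPy versions before 2.0 the built-in int maps to a 32-bit NumPy integer, so int64 is not considered a subtype of it.

```python
import numpy as np


def is_integer_dtype(dtype):
    """Return True for any NumPy integer dtype (hypothetical helper, not in guidedLDA)."""
    # np.integer is the abstract base of all signed and unsigned integer scalar types
    return np.issubdtype(dtype, np.integer)


# Every integer width passes the check, regardless of platform
for dt in (np.int32, np.int64, np.uint8):
    assert is_integer_dtype(dt)

# Float dtypes are still rejected, as the original check intended
assert not is_integer_dtype(np.float64)
```

Under these assumptions, the check in matrix_to_lists could read `if sparse and not np.issubdtype(doc_word.dtype, np.integer):` and the error message would stay accurate for float input.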
Environment: Windows 10, Python 3.8.5
Would love to see this implemented. It sounds like only the faulty ValueError is stopping the use of sparse matrices, while the underlying code can handle them perfectly well.