jwalk package
Submodules
jwalk.corpus module
Generate a text corpus from random walks on a graph.
jwalk.corpus.walk_graph(csr_matrix, labels, walk_length=40, num_walks=1, n_jobs=1)
Perform random walks on an adjacency matrix.
Parameters: - csr_matrix – adjacency matrix
- labels – list of node labels whose indices align with the CSR matrix
- walk_length – maximum length of each random walk (default=40)
- num_walks – number of walks to start from each node (default=1)
- n_jobs – number of cores to use (default=1)
Returns: list of random walks
Return type: np.ndarray
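To make the parameters above concrete, here is a minimal sketch of uniform random walks over a graph held in CSR form. It is not jwalk's implementation: the function name random_walks, the seed argument, and the use of plain Python lists for the CSR arrays (indptr, indices) are assumptions made for illustration, and edge weights and n_jobs are ignored.

```python
import random


def random_walks(indptr, indices, labels, walk_length=40, num_walks=1, seed=None):
    """Hypothetical helper: uniform random walks over a CSR-style graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in range(len(labels)):
            walk = [start]
            while len(walk) < walk_length:
                node = walk[-1]
                # The neighbors of `node` are the column indices in its CSR row.
                neighbors = indices[indptr[node]:indptr[node + 1]]
                if not neighbors:
                    break  # dead end, so the walk is shorter than walk_length
                walk.append(rng.choice(neighbors))
            # Translate node indices back to their labels.
            walks.append([labels[i] for i in walk])
    return walks


# Path graph A - B - C as CSR row pointers and column indices.
indptr = [0, 1, 3, 4]
indices = [1, 0, 2, 1]
labels = ["A", "B", "C"]
walks = random_walks(indptr, indices, labels, walk_length=5, num_walks=2, seed=0)
```

The early break is why walk_length is described as a maximum: a walk that reaches a node with no outgoing edges stops there.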
jwalk.corpus.build_corpus(walks, outpath)
Build a corpus by shuffling the walks and then saving them as a text file.
Parameters: - walks – random walks
- outpath – file to write to
Returns: file path of corpus
Return type: str
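The shuffle-then-save step can be sketched as follows. The on-disk layout (one walk per line, nodes separated by spaces) is an assumption, since the reference above does not specify it; write_corpus and its seed argument are hypothetical names, not part of jwalk.

```python
import os
import random
import tempfile


def write_corpus(walks, outpath, seed=None):
    """Hypothetical helper: shuffle walks and write one walk per line."""
    rng = random.Random(seed)
    shuffled = list(walks)  # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    with open(outpath, "w", encoding="utf-8") as f:
        for walk in shuffled:
            # Assumed format: node labels separated by single spaces.
            f.write(" ".join(str(node) for node in walk) + "\n")
    return outpath


walks = [["A", "B", "A"], ["B", "C", "B"], ["C", "B", "A"]]
outpath = os.path.join(tempfile.mkdtemp(), "corpus.txt")
corpus_path = write_corpus(walks, outpath, seed=0)
```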
jwalk.graph module
Build an encoded sparse CSR matrix.
jwalk.graph.build_adjacency_matrix(edges, undirected=False)
Build adjacency matrix.
Parameters: - edges (np.ndarray) – array whose rows have 2 or 3 columns, of the form [src, tgt, [weight]]
- undirected (bool) – if True, add the matrix to its transpose (default=False)
Returns: adjacency matrix and node labels
Return type: scipy.sparse.csr_matrix, np.ndarray
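The effect of the undirected flag is easiest to see on a small example. The sketch below uses a dense list-of-lists matrix instead of scipy.sparse.csr_matrix so that it runs with the standard library alone; adjacency_from_edges, the sorted label order, and the default weight of 1.0 for two-column edges are all assumptions, not jwalk behavior confirmed by this reference.

```python
def adjacency_from_edges(edges, undirected=False):
    """Hypothetical helper: dense adjacency matrix from [src, tgt, [weight]] rows."""
    # Assumed labeling: sorted unique node names, indexed from 0.
    labels = sorted({node for edge in edges for node in edge[:2]})
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for edge in edges:
        src, tgt = edge[0], edge[1]
        # Assumed default: an edge without a weight column counts as 1.0.
        weight = float(edge[2]) if len(edge) > 2 else 1.0
        matrix[index[src]][index[tgt]] += weight
    if undirected:
        # Add the matrix to its transpose so each edge appears in both directions.
        matrix = [[matrix[i][j] + matrix[j][i] for j in range(n)] for i in range(n)]
    return matrix, labels


edges = [("A", "B"), ("A", "C", "2.5")]
directed, labels = adjacency_from_edges(edges)
symmetric, _ = adjacency_from_edges(edges, undirected=True)
```

Note that adding the transpose doubles the weight of any edge already listed in both directions, and of any self-loop.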
jwalk.graph.encode_edges(edges, nodes)
Encode edges by replacing each node label with its index in nodes.
Parameters: - edges (np.ndarray) – array of the form [node1, node2]
- nodes (np.ndarray) – array of unique nodes
Returns: relabeled edges
Return type: np.ndarray
Examples
>>> import numpy as np
>>> from jwalk.graph import encode_edges
>>> edges = np.array([['A', 'B'], ['A', 'C']])
>>> nodes = np.array(['C', 'B', 'A'])
>>> print(encode_edges(edges, nodes))
[[2 1]
 [2 0]]
jwalk.io module
Load and save data.
jwalk.io.load_edges(fpath, delimiter=None, has_header=False)
Load edges in CSV format as a numpy ndarray of strings.
Parameters: - fpath (str) – path to the edges file
- delimiter (str) – alternative argument name for sep (default=None)
- has_header (bool) – True if the file has a header row (default=False)
Returns: array of edges
Return type: np.ndarray
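As an illustration of how delimiter and has_header interact, here is a standard-library sketch that reads an edge file into rows of strings. It is not how jwalk loads files: read_edges is a hypothetical name, it returns nested lists instead of an ndarray, and treating delimiter=None as a comma is an assumption.

```python
import csv
import os
import tempfile


def read_edges(fpath, delimiter=None, has_header=False):
    """Hypothetical helper: read an edge list as rows of strings."""
    with open(fpath, newline="", encoding="utf-8") as f:
        # Assumption: fall back to a comma when no delimiter is given.
        reader = csv.reader(f, delimiter=delimiter or ",")
        rows = [row for row in reader if row]  # skip blank lines
    # Drop the first row when the file starts with column names.
    return rows[1:] if has_header else rows


tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "edges.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("src,tgt\nA,B\nA,C\n")

tsv_path = os.path.join(tmpdir, "edges.tsv")
with open(tsv_path, "w", encoding="utf-8") as f:
    f.write("A\tB\t0.5\nB\tC\t2\n")

csv_edges = read_edges(csv_path, has_header=True)
tsv_edges = read_edges(tsv_path, delimiter="\t")
```

Every field stays a string, including weights, which matches the "ndarray of strings" wording above; numeric conversion is left to the caller.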
jwalk.io.load_graph(filename)
jwalk.io.save_graph(filename, csr_matrix, labels=None)
jwalk.skipgram module
Build a word2vec model.
jwalk.skipgram.train_model(corpus, size=200, window=5, workers=3, model_path=None, word_freq=None, corpus_count=None)
Train a word2vec model using skip-gram.
Parameters: - corpus (str) – file path of corpus
- size (int) – embedding size (default=200)
- window (int) – window size (default=5)
- workers (int) – number of workers (default=3)
- model_path (str) – file path of an existing model to update (default=None)
- word_freq (dict) – dictionary of word frequencies (default=None)
- corpus_count (int) – corpus size (default=None)
Returns: word2vec model
Return type: Word2Vec
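The window parameter controls which nodes in a walk count as context for each other. The sketch below only enumerates the (center, context) pairs that skip-gram training would draw from one walk; it does no training, and skipgram_pairs is a hypothetical name. Production word2vec trainers usually add further steps that are omitted here.

```python
def skipgram_pairs(tokens, window=5):
    """Hypothetical helper: list (center, context) pairs within a fixed window."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context positions lie at most `window` steps to either side of i.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


walk = ["A", "B", "C", "D"]
pairs = skipgram_pairs(walk, window=1)
```

With window=1 each node is paired only with its immediate neighbors in the walk; a larger window relates nodes that are several steps apart.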
Module contents
jwalk library.
copyright:
license: Apache 2.0, see LICENSE for more details.
jwalk.build_adjacency_matrix(edges, undirected=False)
Build adjacency matrix.
Parameters: - edges (np.ndarray) – array whose rows have 2 or 3 columns, of the form [src, tgt, [weight]]
- undirected (bool) – if True, add the matrix to its transpose (default=False)
Returns: adjacency matrix and node labels
Return type: scipy.sparse.csr_matrix, np.ndarray
jwalk.encode_edges(edges, nodes)
Encode edges by replacing each node label with its index in nodes.
Parameters: - edges (np.ndarray) – array of the form [node1, node2]
- nodes (np.ndarray) – array of unique nodes
Returns: relabeled edges
Return type: np.ndarray
Examples
>>> import numpy as np
>>> from jwalk import encode_edges
>>> edges = np.array([['A', 'B'], ['A', 'C']])
>>> nodes = np.array(['C', 'B', 'A'])
>>> print(encode_edges(edges, nodes))
[[2 1]
 [2 0]]
jwalk.walk_graph(csr_matrix, labels, walk_length=40, num_walks=1, n_jobs=1)
Perform random walks on an adjacency matrix.
Parameters: - csr_matrix – adjacency matrix
- labels – list of node labels whose indices align with the CSR matrix
- walk_length – maximum length of each random walk (default=40)
- num_walks – number of walks to start from each node (default=1)
- n_jobs – number of cores to use (default=1)
Returns: list of random walks
Return type: np.ndarray
jwalk.build_corpus(walks, outpath)
Build a corpus by shuffling the walks and then saving them as a text file.
Parameters: - walks – random walks
- outpath – file to write to
Returns: file path of corpus
Return type: str
jwalk.train_model(corpus, size=200, window=5, workers=3, model_path=None, word_freq=None, corpus_count=None)
Train a word2vec model using skip-gram.
Parameters: - corpus (str) – file path of corpus
- size (int) – embedding size (default=200)
- window (int) – window size (default=5)
- workers (int) – number of workers (default=3)
- model_path (str) – file path of an existing model to update (default=None)
- word_freq (dict) – dictionary of word frequencies (default=None)
- corpus_count (int) – corpus size (default=None)
Returns: word2vec model
Return type: Word2Vec
jwalk.load_edges(fpath, delimiter=None, has_header=False)
Load edges in CSV format as a numpy ndarray of strings.
Parameters: - fpath (str) – path to the edges file
- delimiter (str) – alternative argument name for sep (default=None)
- has_header (bool) – True if the file has a header row (default=False)
Returns: array of edges
Return type: np.ndarray
jwalk.load_graph(filename)
jwalk.save_graph(filename, csr_matrix, labels=None)