jwalk package

Submodules

jwalk.corpus module

Generate text corpus from random walks on graph.

jwalk.corpus.walk_graph(csr_matrix, labels, walk_length=40, num_walks=1, n_jobs=1)

Perform random walks on adjacency matrix.

Parameters:
    csr_matrix – adjacency matrix
    labels – list of node labels whose indices align with the CSR matrix
    walk_length – maximum length of each random walk (default=40)
    num_walks – number of walks to do for each node (default=1)
    n_jobs – number of cores to use (default=1)
Returns:	list of random walks
Return type:	np.ndarray
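
As an illustration of the technique only, here is a minimal pure-Python sketch of uniform random walks over an adjacency list. It is not jwalk's implementation: the names random_walks, adjacency and seed are hypothetical, and it assumes that every neighbour is equally likely and that a walk stops early at a node with no outgoing edges. The page does not say how jwalk treats edge weights or dead ends.

```python
import random


def random_walks(adjacency, walk_length=40, num_walks=1, seed=None):
    """Return num_walks uniform random walks starting from every node.

    adjacency maps each node label to a list of its neighbours. A walk
    stops early when it reaches a node without outgoing edges, so
    walk_length is a maximum, not a guarantee.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency.get(walk[-1], [])
                if not neighbours:
                    break  # dead end: no outgoing edges
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


graph = {"A": ["B", "C"], "B": ["A"], "C": []}
walks = random_walks(graph, walk_length=5, num_walks=2, seed=0)
```

With num_walks=2 and three nodes, the sketch produces six walks, each at most five nodes long.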

jwalk.corpus.build_corpus(walks, outpath)

Build a corpus by shuffling the walks and saving them as a text file.

Parameters:
    walks – random walks
    outpath – file to write to
Returns:	file path of corpus
Return type:	str
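
The shuffle-then-save step can be sketched with the standard library alone. The file layout here (one walk per line, node labels separated by spaces) is an assumption, since the page does not specify the format, and write_corpus and seed are hypothetical names rather than part of jwalk.

```python
import os
import random
import tempfile


def write_corpus(walks, outpath, seed=None):
    """Shuffle walks and write them one per line, tokens space-separated."""
    shuffled = list(walks)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    with open(outpath, "w", encoding="utf-8") as handle:
        for walk in shuffled:
            handle.write(" ".join(str(node) for node in walk) + "\n")
    return outpath


walks = [["A", "B", "A"], ["B", "A", "C"], ["C"]]
outpath = os.path.join(tempfile.mkdtemp(), "walks.txt")
corpus_path = write_corpus(walks, outpath, seed=0)
```

Shuffling breaks up the node-by-node order in which walks are generated, so consecutive lines of the corpus do not all start from the same region of the graph.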

jwalk.graph module

Build an encoded sparse CSR matrix.

jwalk.graph.build_adjacency_matrix(edges, undirected=False)

Build adjacency matrix.

Parameters:
    edges (np.ndarray) – array with 2 or 3 columns of the form [src, tgt, [weight]]
    undirected (bool) – if True, add the matrix to its transpose
Returns:	adjacency matrix and node labels
Return type:	(scipy.sparse.csr_matrix, np.ndarray)
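
To make the CSR layout and the undirected option concrete, the following pure-Python sketch builds the three arrays that a CSR matrix stores (data, indices, indptr) from an edge list. It is a sketch under stated assumptions, not the library's code: adjacency_csr is a hypothetical name, labels are assumed to be the sorted unique nodes, a missing weight is assumed to be 1.0, and repeated edges are assumed to be summed.

```python
def adjacency_csr(edges, undirected=False):
    """Return (data, indices, indptr, labels) for a list of edges.

    Each edge is (src, tgt) or (src, tgt, weight); a missing weight
    counts as 1.0. With undirected=True the matrix is summed with its
    transpose, so every edge is also stored in the reverse direction.
    """
    labels = sorted({node for edge in edges for node in edge[:2]})
    index = {label: i for i, label in enumerate(labels)}

    rows = [{} for _ in labels]  # rows[i][j] = weight of edge i -> j
    for edge in edges:
        src, tgt = index[edge[0]], index[edge[1]]
        weight = float(edge[2]) if len(edge) > 2 else 1.0
        rows[src][tgt] = rows[src].get(tgt, 0.0) + weight
        if undirected:
            rows[tgt][src] = rows[tgt].get(src, 0.0) + weight

    # Flatten the per-row dictionaries into the three CSR arrays.
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col in sorted(row):
            indices.append(col)
            data.append(row[col])
        indptr.append(len(indices))
    return data, indices, indptr, labels


edges = [("A", "B", 2.0), ("A", "C"), ("C", "A")]
data, indices, indptr, labels = adjacency_csr(edges, undirected=True)
```

Because A -> C and C -> A both appear in the input, adding the transpose gives each direction a weight of 2.0 in this example.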

jwalk.graph.encode_edges(edges, nodes)

Encode edges by mapping each node label to its index in nodes.

Parameters:
    edges (np.ndarray) – array of the form [node1, node2]
    nodes (np.ndarray) – array of unique nodes
Returns:	relabeled edges
Return type:	np.ndarray

Examples

>>> import numpy as np
>>> from jwalk.graph import encode_edges
>>> edges = np.array([['A', 'B'], ['A', 'C']])
>>> nodes = np.array(['C', 'B', 'A'])
>>> print(encode_edges(edges, nodes))
[[2 1]
 [2 0]]

jwalk.io module

Load and save data.

jwalk.io.load_edges(fpath, delimiter=None, has_header=False)

Load edges in CSV format as a NumPy ndarray of strings.

Parameters:
    fpath (str) – edges file
    delimiter (str) – field delimiter; alternative argument name for sep (default=None)
    has_header (bool) – True if the file has a header row (default=False)
Returns:	array of edges
Return type:	np.ndarray
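
The effect of delimiter and has_header can be illustrated with the standard csv module. This is only a sketch: read_edges is a hypothetical helper, it returns a list of lists rather than an ndarray, and it defaults to a comma delimiter, whereas the page does not say what load_edges does when delimiter is None.

```python
import csv
import os
import tempfile


def read_edges(fpath, delimiter=",", has_header=False):
    """Read an edge list into a list of string rows, skipping blank lines."""
    with open(fpath, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if has_header:
            next(reader, None)  # discard the header row
        return [row for row in reader if row]


# Write a small tab-separated edge file with a header row.
fpath = os.path.join(tempfile.mkdtemp(), "edges.tsv")
with open(fpath, "w", encoding="utf-8") as handle:
    handle.write("src\ttgt\tweight\nA\tB\t2.0\nA\tC\t1.0\n")

edges = read_edges(fpath, delimiter="\t", has_header=True)
```

Every field, including the weight, stays a string, which matches the documented "ndarray of strings" return value.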

jwalk.io.load_graph(filename)

jwalk.io.save_graph(filename, csr_matrix, labels=None)

jwalk.skipgram module

Build word2vec model.

jwalk.skipgram.train_model(corpus, size=200, window=5, workers=3, model_path=None, word_freq=None, corpus_count=None)

Train using Skipgram model.

Parameters:
    corpus (str) – file path of corpus
    size (int) – embedding size (default=200)
    window (int) – window size (default=5)
    workers (int) – number of workers (default=3)
    model_path (str) – file path of the model we want to update
    word_freq (dict) – dictionary of word frequencies
    corpus_count (int) – corpus size
Returns:	word2vec model
Return type:	Word2Vec
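
To show what the window parameter controls, the sketch below enumerates the (center, context) pairs that a skip-gram model would learn from on a single walk. It covers the pairing step only and is not how train_model is implemented. The skipgram_pairs name is hypothetical, and real word2vec implementations typically also shrink the window at random and subsample frequent tokens, which this sketch omits.

```python
def skipgram_pairs(walk, window=5):
    """Return (center, context) pairs for every token within window steps."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, walk[j]))
    return pairs


pairs = skipgram_pairs(["A", "B", "C", "D"], window=1)
# pairs == [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("C", "D"), ("D", "C")]
```

A larger window pairs each node with nodes further along the walk, so the embeddings reflect a wider neighbourhood of the graph.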

Module contents

jwalk library.

copyright:	2017 by JW Player.
license:	Apache 2.0, see LICENSE for more details.

jwalk.build_adjacency_matrix(edges, undirected=False)

Build adjacency matrix.

Parameters:
    edges (np.ndarray) – array with 2 or 3 columns of the form [src, tgt, [weight]]
    undirected (bool) – if True, add the matrix to its transpose
Returns:	adjacency matrix and node labels
Return type:	(scipy.sparse.csr_matrix, np.ndarray)

jwalk.encode_edges(edges, nodes)

Encode edges by mapping each node label to its index in nodes.

Parameters:
    edges (np.ndarray) – array of the form [node1, node2]
    nodes (np.ndarray) – array of unique nodes
Returns:	relabeled edges
Return type:	np.ndarray

Examples

>>> import numpy as np
>>> from jwalk import encode_edges
>>> edges = np.array([['A', 'B'], ['A', 'C']])
>>> nodes = np.array(['C', 'B', 'A'])
>>> print(encode_edges(edges, nodes))
[[2 1]
 [2 0]]

jwalk.walk_graph(csr_matrix, labels, walk_length=40, num_walks=1, n_jobs=1)

Perform random walks on adjacency matrix.

Parameters:
    csr_matrix – adjacency matrix
    labels – list of node labels whose indices align with the CSR matrix
    walk_length – maximum length of each random walk (default=40)
    num_walks – number of walks to do for each node (default=1)
    n_jobs – number of cores to use (default=1)
Returns:	list of random walks
Return type:	np.ndarray

jwalk.build_corpus(walks, outpath)

Build a corpus by shuffling the walks and saving them as a text file.

Parameters:
    walks – random walks
    outpath – file to write to
Returns:	file path of corpus
Return type:	str

jwalk.train_model(corpus, size=200, window=5, workers=3, model_path=None, word_freq=None, corpus_count=None)

Train using Skipgram model.

Parameters:
    corpus (str) – file path of corpus
    size (int) – embedding size (default=200)
    window (int) – window size (default=5)
    workers (int) – number of workers (default=3)
    model_path (str) – file path of the model we want to update
    word_freq (dict) – dictionary of word frequencies
    corpus_count (int) – corpus size
Returns:	word2vec model
Return type:	Word2Vec

jwalk.load_edges(fpath, delimiter=None, has_header=False)

Load edges in CSV format as a NumPy ndarray of strings.

Parameters:
    fpath (str) – edges file
    delimiter (str) – field delimiter; alternative argument name for sep (default=None)
    has_header (bool) – True if the file has a header row (default=False)
Returns:	array of edges
Return type:	np.ndarray

jwalk.load_graph(filename)

jwalk.save_graph(filename, csr_matrix, labels=None)