jwalk package

Submodules

jwalk.corpus module

Generate text corpus from random walks on graph.

jwalk.corpus.walk_graph(csr_matrix, labels, walk_length=40, num_walks=1, n_jobs=1)

Perform random walks on adjacency matrix.

Parameters:
  • csr_matrix – adjacency matrix
  • labels – list of node labels whose indices align with the CSR matrix
  • walk_length – maximum length of each random walk (default=40)
  • num_walks – number of walks to start from each node (default=1)
  • n_jobs – number of cores to use (default=1)
Returns:

list of random walks

Return type:

np.ndarray
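
The idea behind the parameters above can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper names random_walk and walk_all are hypothetical, the graph is stored as CSR-style indptr/indices lists rather than a scipy matrix, and neighbors are chosen uniformly (edge weights are ignored).

```python
import random

def random_walk(indptr, indices, labels, start, walk_length, rng):
    """Walk at most walk_length nodes from start over a CSR-style graph."""
    walk = [labels[start]]
    node = start
    while len(walk) < walk_length:
        # Neighbors of `node` are stored in indices[indptr[node]:indptr[node + 1]].
        neighbors = indices[indptr[node]:indptr[node + 1]]
        if not neighbors:
            break  # dead end: the walk stops early, so walk_length is a maximum
        node = rng.choice(neighbors)  # uniform choice (assumption: no weights)
        walk.append(labels[node])
    return walk

def walk_all(indptr, indices, labels, walk_length=40, num_walks=1, seed=0):
    """Start num_walks walks from every node and collect them in a list."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in range(len(labels)):
            walks.append(random_walk(indptr, indices, labels, start, walk_length, rng))
    return walks

# Directed graph A -> B, A -> C, B -> C; C has no outgoing edges.
labels = ["A", "B", "C"]
indptr = [0, 2, 3, 3]
indices = [1, 2, 2]
walks = walk_all(indptr, indices, labels, walk_length=5, num_walks=2)
```

With num_walks=2 and three nodes the sketch produces six walks; walks that start at C contain only C because it has no outgoing edges.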

jwalk.corpus.build_corpus(walks, outpath)

Build a corpus by shuffling the walks and saving them as a text file.

Parameters:
  • walks – random walks
  • outpath – file to write to
Returns:

file path of corpus

Return type:

str
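
A minimal sketch of the shuffle-then-save step, under stated assumptions: the helper write_corpus is hypothetical, and the file layout (one walk per line, nodes separated by spaces) is assumed rather than taken from the library.

```python
import os
import random
import tempfile

def write_corpus(walks, outpath, seed=0):
    """Shuffle walks and write one space-separated walk per line."""
    shuffled = list(walks)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    with open(outpath, "w") as f:
        for walk in shuffled:
            f.write(" ".join(walk) + "\n")
    return outpath  # mirror the documented behavior of returning the path

walks = [["A", "B", "C"], ["B", "C"], ["C"]]
outpath = os.path.join(tempfile.mkdtemp(), "corpus.txt")
corpus_path = write_corpus(walks, outpath)
with open(corpus_path) as f:
    lines = f.read().splitlines()
```

Returning the path makes it easy to pass the result straight to a training step that expects a corpus file.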

jwalk.graph module

Build an encoded sparse CSR matrix.

jwalk.graph.build_adjacency_matrix(edges, undirected=False)

Build adjacency matrix.

Parameters:
  • edges (np.ndarray) – an array with 2 or 3 columns of the form [src, tgt, [weight]]
  • undirected (bool) – if True, add the matrix to its transpose (default=False)
Returns:

adjacency matrix and an array of node labels

Return type:

(scipy.sparse.csr_matrix, np.ndarray)
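
The effect of the undirected flag can be illustrated with a small sketch. It is not the library's code: the helper adjacency_from_edges is hypothetical, it uses dense nested lists instead of a scipy sparse matrix for readability, and both the sorted label order and the default weight of 1.0 are assumptions.

```python
def adjacency_from_edges(edges, undirected=False):
    """Return (dense matrix as nested lists, sorted node labels)."""
    labels = sorted({node for edge in edges for node in edge[:2]})
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for edge in edges:
        src, tgt = edge[0], edge[1]
        # The third column is optional; assume weight 1.0 when it is absent.
        weight = float(edge[2]) if len(edge) > 2 else 1.0
        matrix[index[src]][index[tgt]] += weight
    if undirected:
        # A + A^T: every edge also counts in the reverse direction.
        matrix = [[matrix[i][j] + matrix[j][i] for j in range(n)] for i in range(n)]
    return matrix, labels

edges = [("A", "B", "2"), ("A", "C", "1")]
directed, labels = adjacency_from_edges(edges)
symmetric, _ = adjacency_from_edges(edges, undirected=True)
```

In the directed result only row A has nonzero entries; in the symmetric result the same weights also appear in column A.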

jwalk.graph.encode_edges(edges, nodes)

Encode edges with a dictionary that maps each node to its index in nodes.

Parameters:
  • edges (np.ndarray) – array of edges of the form [node1, node2]
  • nodes (np.ndarray) – array of unique nodes
Returns:

relabeled edges

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from jwalk.graph import encode_edges
>>> edges = np.array([['A', 'B'], ['A', 'C']])
>>> nodes = np.array(['C', 'B', 'A'])
>>> print(encode_edges(edges, nodes))
[[2 1]
 [2 0]]
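
The relabeling shown in the example can be reproduced with a plain dictionary lookup. The helper encode_with_dict below is a hypothetical stand-in that works on Python lists; it is a sketch of the technique, not the library's function.

```python
def encode_with_dict(edges, nodes):
    """Replace each node label by its position in `nodes`."""
    lookup = {label: i for i, label in enumerate(nodes)}
    return [[lookup[src], lookup[tgt]] for src, tgt in edges]

edges = [["A", "B"], ["A", "C"]]
nodes = ["C", "B", "A"]
encoded = encode_with_dict(edges, nodes)
print(encoded)  # [[2, 1], [2, 0]]
```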

jwalk.io module

Load and save data.

jwalk.io.load_edges(fpath, delimiter=None, has_header=False)

Load edges from a CSV file as a NumPy ndarray of strings.

Parameters:
  • fpath (str) – path to the edges file
  • delimiter (str) – alternative argument name for sep (default=None)
  • has_header (bool) – True if the file has a header row (default=False)
Returns:

array of edges

Return type:

np.ndarray
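
How delimiter and has_header interact can be sketched with the standard csv module. The helper read_edges is hypothetical, it reads from an open text handle rather than a path, and falling back to a comma when delimiter is None is an assumption, not documented behavior.

```python
import csv
import io

def read_edges(handle, delimiter=None, has_header=False):
    """Read edge rows as lists of strings from an open text file."""
    # Assumption: treat delimiter=None as a plain comma-separated file.
    reader = csv.reader(handle, delimiter=delimiter or ",")
    if has_header:
        next(reader, None)  # skip the header row
    return [row for row in reader if row]  # drop blank lines

text = "source\ttarget\nA\tB\nA\tC\n"
edges = read_edges(io.StringIO(text), delimiter="\t", has_header=True)
print(edges)  # [['A', 'B'], ['A', 'C']]
```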

jwalk.io.load_graph(filename)
jwalk.io.save_graph(filename, csr_matrix, labels=None)

jwalk.skipgram module

Build word2vec model.

jwalk.skipgram.train_model(corpus, size=200, window=5, workers=3, model_path=None, word_freq=None, corpus_count=None)

Train using Skipgram model.

Parameters:
  • corpus (str) – file path of corpus
  • size (int) – embedding size (default=200)
  • window (int) – window size (default=5)
  • workers (int) – number of workers (default=3)
  • model_path (str) – file path of an existing model to update (default=None)
  • word_freq (dict) – dictionary of word frequencies
  • corpus_count (int) – corpus size
Returns:

word2vec model

Return type:

Word2Vec
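
To make the window parameter concrete, the sketch below lists the (center, context) pairs a skip-gram model would see for one walk. The helper skipgram_pairs is hypothetical and only illustrates the windowing idea; it is not how the library trains, and real word2vec implementations add details such as sampling that are ignored here.

```python
def skipgram_pairs(walk, window=5):
    """Return (center, context) pairs within `window` steps of each node."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                pairs.append((center, walk[j]))
    return pairs

pairs = skipgram_pairs(["A", "B", "C", "D"], window=1)
print(pairs)
# [('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B'), ('C', 'D'), ('D', 'C')]
```

A larger window pairs each node with more distant neighbors on the walk.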

Module contents

jwalk library.

copyright:

(c) 2017 by JW Player.

license:

Apache 2.0, see LICENSE for more details.

jwalk.build_adjacency_matrix(edges, undirected=False)

Build adjacency matrix.

Parameters:
  • edges (np.ndarray) – an array with 2 or 3 columns of the form [src, tgt, [weight]]
  • undirected (bool) – if True, add the matrix to its transpose (default=False)
Returns:

adjacency matrix and an array of node labels

Return type:

(scipy.sparse.csr_matrix, np.ndarray)

jwalk.encode_edges(edges, nodes)

Encode edges with a dictionary that maps each node to its index in nodes.

Parameters:
  • edges (np.ndarray) – array of edges of the form [node1, node2]
  • nodes (np.ndarray) – array of unique nodes
Returns:

relabeled edges

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from jwalk import encode_edges
>>> edges = np.array([['A', 'B'], ['A', 'C']])
>>> nodes = np.array(['C', 'B', 'A'])
>>> print(encode_edges(edges, nodes))
[[2 1]
 [2 0]]

jwalk.walk_graph(csr_matrix, labels, walk_length=40, num_walks=1, n_jobs=1)

Perform random walks on adjacency matrix.

Parameters:
  • csr_matrix – adjacency matrix
  • labels – list of node labels whose indices align with the CSR matrix
  • walk_length – maximum length of each random walk (default=40)
  • num_walks – number of walks to start from each node (default=1)
  • n_jobs – number of cores to use (default=1)
Returns:

list of random walks

Return type:

np.ndarray

jwalk.build_corpus(walks, outpath)

Build a corpus by shuffling the walks and saving them as a text file.

Parameters:
  • walks – random walks
  • outpath – file to write to
Returns:

file path of corpus

Return type:

str

jwalk.train_model(corpus, size=200, window=5, workers=3, model_path=None, word_freq=None, corpus_count=None)

Train using Skipgram model.

Parameters:
  • corpus (str) – file path of corpus
  • size (int) – embedding size (default=200)
  • window (int) – window size (default=5)
  • workers (int) – number of workers (default=3)
  • model_path (str) – file path of an existing model to update (default=None)
  • word_freq (dict) – dictionary of word frequencies
  • corpus_count (int) – corpus size
Returns:

word2vec model

Return type:

Word2Vec

jwalk.load_edges(fpath, delimiter=None, has_header=False)

Load edges from a CSV file as a NumPy ndarray of strings.

Parameters:
  • fpath (str) – path to the edges file
  • delimiter (str) – alternative argument name for sep (default=None)
  • has_header (bool) – True if the file has a header row (default=False)
Returns:

array of edges

Return type:

np.ndarray

jwalk.load_graph(filename)
jwalk.save_graph(filename, csr_matrix, labels=None)