We use GitHub Enterprise (GHE) and find it convenient to extract user information from it, since users can easily adjust that information themselves.
We extract user information through the API, wrapped by the octokit Ruby gem. Here we preserve the API's terminology to maintain correspondence with its docs. See Extract to Files
require 'octokit'
require 'json'

Octokit.auto_paginate = true
Octokit.configure do |c|
  c.access_token = '0501 ...'
  c.api_endpoint = "https://source.datanerd.us/api/v3"
  c.web_endpoint = "https://source.datanerd.us/api/v3"
end

users = Octokit.all_users().map { |e|
  STDERR.print '.'
  u = Octokit.user(e.login)
  {
    login: e.login,
    id: e.id,
    type: e.type,
    site_admin: e.site_admin,
    name: u.name,
    blog: u.blog,
    location: u.location,
    email: u.email,
    bio: u.bio
  }
}

puts JSON.pretty_generate users
STDERR.puts
This runs from a bash script that saves standard output to a file and then writes an explain.yml file in the same directory with source details of the extraction.
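The bash script itself is not shown here. As a rough illustration of the idea only, the following Ruby sketch writes an explain.yml next to the extracted data. The method name write_explain, the field names (source, command, extracted_at) and the script name extract.rb are all assumptions made for this example, not what our script actually records.

```ruby
require 'yaml'
require 'time'
require 'tmpdir'

# Hypothetical helper: writes explain.yml into dir and returns its path.
# The recorded fields are illustrative assumptions.
def write_explain(dir, source:, command:)
  details = {
    'source' => source,
    'command' => command,
    'extracted_at' => Time.now.utc.iso8601
  }
  path = File.join(dir, 'explain.yml')
  File.write(path, details.to_yaml)
  path
end

# Usage: a temporary directory stands in for the extraction directory.
Dir.mktmpdir do |dir|
  path = write_explain(dir,
                       source: 'GHE users via octokit',
                       command: 'ruby extract.rb > users.json')
  puts File.read(path)
end
```

Keeping the explanation in a file beside the data means the provenance travels with the extract when the directory is copied or archived.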
We merge this with other sources of employee information using a purpose-built helper that transforms the available user information into a unique employee identifier. The API mixes user and org information, which we choose to represent distinctly. See Merge and Transform
json('github-users/users.json').each do |user|
  props = {
    ghe_id: user['id'],
    ghe_blog: user['blog'],
    ghe_location: user['location'],
    ghe_bio: user['bio']
  }
  if user['type'] == 'User'
    props[:name] = user['name']
    props[:id] = find_employee_id user
    node 'EMPLOYEE', props
  else
    props[:name] = user['login']
    node 'ORG', props
  end
end
The find_employee_id helper has become rather complex in our implementation. We are careful to read and merge the more authoritative sources first, and then use a variety of heuristics to match other sources to the identities already found.
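The real helper is considerably more involved, but the ordered-heuristics idea can be sketched roughly as follows. Everything here is an assumption for illustration: the employee records, the two heuristics (email_match, name_match), and the find_employee_id_sketch name, which returns both the id and the name of the heuristic that produced it.

```ruby
# Normalize a name for loose comparison: lowercase, ASCII letters only.
def normalize(text)
  text.to_s.downcase.gsub(/[^a-z]/, '')
end

# Hypothetical heuristics, tried in insertion order, most trusted first.
# Each takes a GHE user hash and returns an employee id or nil.
def heuristics(employees)
  {
    email_match: lambda { |user|
      email = user['email'].to_s.downcase
      hit = employees.find { |e| !email.empty? && e[:email].to_s.downcase == email }
      hit && hit[:id]
    },
    name_match: lambda { |user|
      name = normalize(user['name'])
      hit = employees.find { |e| !name.empty? && normalize(e[:name]) == name }
      hit && hit[:id]
    }
  }
end

# Returns [employee_id, heuristic_name]; the id is nil and the name is
# :no_match_found when every heuristic fails.
def find_employee_id_sketch(user, employees)
  heuristics(employees).each do |name, rule|
    id = rule.call(user)
    return [id, name] if id
  end
  [nil, :no_match_found]
end

# Illustrative records standing in for a more authoritative source.
employees = [
  { id: 'E100', email: 'ada@example.com', name: 'Ada Lovelace' },
  { id: 'E200', email: 'grace@example.com', name: 'Grace Hopper' }
]

user = { 'login' => 'ghopper', 'name' => 'Grace  Hopper', 'email' => nil }
p find_employee_id_sketch(user, employees)  # prints ["E200", :name_match]
```

Returning the heuristic name alongside the id keeps the matching decision inspectable later, at the cost of a slightly wider return value than a bare identifier.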
We record in the database which heuristics worked, including no_match_found when all fail. The create-or-retrieve logic of the node helper then makes a node that doesn't relate well to the others but can be explored further with queries to understand what has gone wrong.
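A minimal sketch of create-or-retrieve behaviour, using a plain Hash as a stand-in for the database. The create_or_retrieve_node name, the matched_by property, and the placeholder id given to unmatched users are assumptions for this example; the actual node helper and its storage are not shown on this page.

```ruby
# Hypothetical stand-in for the node helper: look a node up by label and id,
# create it if absent, then merge the new properties into what is stored.
def create_or_retrieve_node(store, label, props)
  key = [label, props[:id]]
  store[key] ||= { label: label }
  store[key].merge!(props)
end

store = {}

# An authoritative source creates the node first.
create_or_retrieve_node(store, 'EMPLOYEE', { id: 'E200', name: 'Grace Hopper' })

# The GHE pass retrieves the same node and adds its own properties.
create_or_retrieve_node(store, 'EMPLOYEE', { id: 'E200', ghe_id: 7, matched_by: 'name_match' })

# With no match, the placeholder id (an assumption) yields a fresh, isolated node.
create_or_retrieve_node(store, 'EMPLOYEE',
                        { id: 'unmatched-mystery', ghe_id: 9, matched_by: 'no_match_found' })

# Query for the nodes that need a closer look.
unmatched = store.values.select { |n| n[:matched_by] == 'no_match_found' }
puts unmatched.map { |n| n[:id] }
```

Because a failed match still produces a node, nothing is silently dropped; the isolated nodes become a worklist for improving the heuristics.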