offthemesh

Software Engineering

Data Virtual

July 13, 2015 offthemesh

how to handle data at scale?

That’s the most pertinent question these days. Unfortunately, no single answer suffices. Here’s what we do at my workplace:

  1. identify available sources of data
  2. categorize by type, pricing and other metadata
  3. determine scope of data
  4. configure a virtual link to sources
  5. expose via web-API
  6. build custom APIs as needed

identify data

Many rush through this part or skip it altogether. I feel that’s a mistake, unless there are only a few categories or sources. For those able, this is the best non-techie task for many techies, so why not start there?

The company or its management usually has a wish-list. First, sort it into what is available and what is not, and note the reason in each case.

Another good starting task is cataloguing how each data source is accessed.
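To make the cataloguing concrete, here is a minimal sketch in Python. The wish-list entries, field names and access methods are all made up for illustration; a real catalogue would likely live in a spreadsheet or a database.

```python
from dataclasses import dataclass


@dataclass
class WishListItem:
    name: str
    available: bool
    reason: str         # why it is (or is not) available
    access_method: str  # hypothetical labels such as "rest-api" or "sftp"


# Hypothetical wish-list, for illustration only.
wish_list = [
    WishListItem("public_weather", True, "open data portal", "rest-api"),
    WishListItem("partner_sales", True, "contract in place", "sftp"),
    WishListItem("competitor_prices", False, "no license", "none"),
]


def split_by_availability(items):
    """Return (available, unavailable) lists, preserving order."""
    available = [i for i in items if i.available]
    unavailable = [i for i in items if not i.available]
    return available, unavailable


def group_by_access(items):
    """Map each access method to the names of the sources that use it."""
    groups = {}
    for item in items:
        groups.setdefault(item.access_method, []).append(item.name)
    return groups


available, unavailable = split_by_availability(wish_list)
access_groups = group_by_access(available)
```

Keeping the reason next to each unavailable item makes it easier to revisit the list when a license or contract changes.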

categorize by metadata

How can the data be found or searched for? If criteria for locating the data do not already exist, brainstorm as many as you can. Maybe it’s the flavor of social media the data comes from or relates to. Contacts, or other means of getting access to it, should definitely be part of this.

If the data is publicly available, it should be included.

easiest, best-case scenario: include the URI of the public data API
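As a rough sketch of what searchable metadata could look like, assume each source is described by a small dictionary. The keys (type, platform, pricing, contact, public_uri), the sample entries and the example.org URI are assumptions for illustration, not a fixed schema.

```python
# Hypothetical metadata records; the keys are illustrative, not a standard.
sources = [
    {
        "name": "tweets_sample",
        "type": "social",
        "platform": "twitter",
        "pricing": "paid",
        "contact": "vendor account manager",
        "public_uri": None,
    },
    {
        "name": "city_open_data",
        "type": "civic",
        "platform": None,
        "pricing": "free",
        "contact": None,
        "public_uri": "https://example.org/api/open-data",
    },
]


def find_sources(catalog, **criteria):
    """Return the sources whose metadata matches every given criterion."""
    return [
        s for s in catalog
        if all(s.get(key) == value for key, value in criteria.items())
    ]


def public_uris(catalog):
    """Map source name to its public API URI, where one is recorded."""
    return {s["name"]: s["public_uri"] for s in catalog if s["public_uri"]}


free_sources = find_sources(sources, pricing="free")
social_sources = find_sources(sources, type="social", platform="twitter")
```

Any criterion that comes out of the brainstorm becomes one more key, and the search function does not need to change.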

determine what to include

“Everything” is not an option. Use the metadata to set priorities. The smaller or more finely divided the scope, the better.

Select no more than a handful of sources. How many depends on both the business and the technology. Is there a set that makes the aggregate meaningful? In what context?

tech might want to test a volume threshold on one of the sources, given the supposedly high volume of data
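One way to turn the metadata into a shortlist is sketched below. The priority scores, the row estimates and the tie-breaking rule (smaller first) are assumptions for illustration; the real ranking depends on the business.

```python
# Hypothetical candidates: priority comes from the business,
# estimated_rows from whatever the source owner can tell us.
candidates = [
    {"name": "orders", "priority": 3, "estimated_rows": 2_000_000},
    {"name": "customers", "priority": 3, "estimated_rows": 50_000},
    {"name": "clickstream", "priority": 2, "estimated_rows": 900_000_000},
    {"name": "weather", "priority": 1, "estimated_rows": 10_000},
]


def pick_scope(items, max_sources=5):
    """Highest priority first; among equals, the smaller source first."""
    ranked = sorted(items, key=lambda c: (-c["priority"], c["estimated_rows"]))
    return [c["name"] for c in ranked[:max_sources]]


def volume_test_candidate(items):
    """The largest source, which tech may want for a threshold test."""
    return max(items, key=lambda c: c["estimated_rows"])["name"]


first_scope = pick_scope(candidates, max_sources=2)
stress_source = volume_test_candidate(candidates)
```

Here the first scope is kept small on purpose, while the largest source is singled out separately for the volume test.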

setup the virtual

Finally we get to the actual implementation. For the first run, pick the simplest source, or maybe the smallest. Once at least that first one is exposed via a web service, continue adding source definitions and mappings.

The sheer number of sources will take some time to configure, depending on how organized the endpoints need to be.
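The idea of source definitions and mappings can be sketched as follows. This is not how any particular data virtualization product works; the VirtualLayer class, its register and query methods, and the sample fetch function are hypothetical names, and the fetch function stands in for a real HTTP or database call.

```python
class VirtualLayer:
    """Keeps source definitions; the data itself stays at the source."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch, mapping):
        """fetch: callable returning rows; mapping: source field -> common field."""
        self._sources[name] = (fetch, mapping)

    def names(self):
        return sorted(self._sources)

    def query(self, name):
        """Fetch on demand and rename fields to the common schema."""
        fetch, mapping = self._sources[name]
        return [
            {common: row[original] for original, common in mapping.items()}
            for row in fetch()
        ]


def fetch_crm():
    # Stand-in for a remote call; returns rows in the source's own shape.
    return [{"cust_id": 1, "nm": "Ada", "internal_flag": True}]


layer = VirtualLayer()
# Start with one small source; add more definitions later.
layer.register("crm", fetch_crm, {"cust_id": "customer_id", "nm": "full_name"})

crm_rows = layer.query("crm")
```

Adding a source is then one more register call, which is where most of the configuration time goes as the number of sources grows. A web service would sit in front of query and serialize its result.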

customize as needed

No first release candidate is going to be perfect. Different clients need different things. They may need it as a SQL service, not a web service. Different representations fit varying modes of consumption.
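To illustrate one data set served in two representations, the sketch below renders the same records as JSON (what a web service would return) and loads them into an in-memory SQLite table (a stand-in for a SQL service). The records, the table name and the column names are made up.

```python
import json
import sqlite3

# Hypothetical records, already mapped to a common schema.
records = [
    {"id": 1, "city": "Oslo", "temp_c": 4.5},
    {"id": 2, "city": "Lima", "temp_c": 19.0},
]


def as_json(rows):
    """Representation for web-service clients."""
    return json.dumps(rows, sort_keys=True)


def as_sql(rows):
    """Representation for SQL clients: an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER, city TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (:id, :city, :temp_c)", rows
    )
    conn.commit()
    return conn


payload = as_json(records)
conn = as_sql(records)
warm_cities = conn.execute(
    "SELECT city FROM readings WHERE temp_c > 10"
).fetchall()
```

The records are defined once; only the last step differs per client.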

reap the benefits

This is so open-ended that I can only say it depends on how the implementation went, as well as on the throughput the sources allow.

There is another side, however. We need usage and performance metrics. How else will we tell useless skeletons from data with some meat, or see how attractive a data source becomes over time and when it loses its utility?
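A minimal sketch of such metrics, assuming each source is reached through a fetch function: wrap the function, then count calls and time spent. The UsageMetrics class and the source names are hypothetical; a real setup would more likely rely on the web server's or gateway's own logging.

```python
import time
from collections import defaultdict


class UsageMetrics:
    """Counts calls and accumulates elapsed time per source."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def track(self, source_name, fetch):
        """Return a wrapper around fetch that records each call."""
        def wrapped(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fetch(*args, **kwargs)
            finally:
                self.calls[source_name] += 1
                self.total_seconds[source_name] += time.perf_counter() - start
        return wrapped

    def unused(self, known_sources):
        """Sources nobody has asked for: candidates for 'useless skeletons'."""
        return [n for n in known_sources if self.calls.get(n, 0) == 0]


metrics = UsageMetrics()
get_orders = metrics.track("orders", lambda: [{"order_id": 1}])

get_orders()
get_orders()

never_used = metrics.unused(["orders", "legacy_inventory"])
```

Sampling these counters at regular intervals would show how a source's popularity rises and fades over time.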
