End of Year Reflection

Over the last six months, VIA has shared 24 posts about our technology and partnerships. For our last post of 2020, we wanted to share a more personal note with our readers.

Looking back at our 2019 wrap-up blog, we had big ideas for 2020. Little did we know what that might entail.

Like every individual and company around the world, the pandemic re-routed us multiple times. However, we can’t help but be grateful for how our colleagues responded, adapted, and delivered in a year full of unforeseen speed bumps. Below are the ways our team found its way during 2020.

Transition to remote

Unbeknownst to us, Pi Day (3/14) would be the last day that we were all together in our offices.

Just as quickly as we demolished our pies, our team was up and running, fully remote. With some tweaks to home working spaces and more frequent touch points (like more one-to-one meetings and daily Scrum of Scrums), the team found its new groove. Having already worked across Montreal and Somerville for three years, the team remained productive while working from home.

Team growth and engagement

From a hiring standpoint, VIA just kept on growing! 

Our technology headquarters moved from Griffintown to Downtown Montreal. We can’t wait to be all together in the new space and host some exciting events for the local McGill and greater Montreal tech communities.

Our office move was motivated by our growth: since June, we have brought nine new team members on board and quickly refined the art of fully virtual recruitment and onboarding. We may not know the “in real life” height of our newest colleagues, but thanks to well-organized planning, frequent virtual check-ins, and the tremendous patience and generosity of our onboarding advisors and mentors, each one has been integrated into our team with overwhelming success.

Always putting our people first

As a family-friendly company and one where “being each other’s biggest fan” is a core value, we doubled down on our efforts to consider the needs and wellbeing of our team. For some, that meant cameos from children in meetings or adjusted schedules. For others, it was an encouraging reminder to keep going during one of our virtual workout sessions.

Above all, we maintained that there is no one way for people to respond in a year such as this. Whether team members needed to disconnect from the virtual world after work hours or join virtual “pub club”, paint night, or our first globally coordinated VIA Thanksgiving, all were equally fine.

Initiatives 2.0

With an expanding, distributed team and a new style of working, it was important that VIA remain responsive to the rapidly changing environment. We introduced several wellness initiatives and upped our dosage of flexibility and compassion. From virtual yoga to frequent check-in surveys, we always had our team’s wellbeing in mind.

As COVID-19 re-surged in the fall, we introduced the “VIA Vault”, where team members earned points for things like technology fails (e.g., frozen video conference screens), unexpected doorbells, or pets joining meetings. The points will go towards a company-wide event, activity, or whatever it might be when we are all back together.

Lastly, this year was the first time in our company’s history that we did not have an in-person company offsite. That didn’t stop us, however, from having a multi-week mini-series to reflect on the year and unite and excite the team around VIA’s clean energy mission. 

We feel incredibly grateful for the resiliency and flexibility our team has shown during this challenging year. We thank them, and we extend our thanks to our customers and investors for their continued support as well. No matter where you are working from in 2021, we hope you have a happy, prosperous, and, most of all, “return to normal” year.

Let’s TAC™ About It: The Value of TAC™

For the twelfth installment of our blog series, we will cover how utilities quantify the value of using TAC™. So, are you ready to TAC™ about it?

In our last Let’s TAC™ About It post, we provided a few examples of customer use cases for TAC™.

A follow-on question that we often get is, “How do you quantify the value of TAC™ in those use cases?”

There are multiple ways to approach this; the most common method is to compare what TAC™ does with “the next best alternative.” For most customers, that would mean custom-building their own data privacy and analysis platform or integrating packages from multiple vendors.

One of VIA’s largest customers considered doing just this. They had evaluated more than 20 vendors and IT integration firms for their needs before we met them. This customer articulated these three big benefits when choosing TAC™ over their next best alternative:

  1. Speed: Because TAC™ has a wide array of built-in features, they could count on being up and running much faster than with custom applications. In this instance, other vendors estimated as much as a two-year timeframe to build and test an IT platform for their needs. VIA was able to do this in less than four months from contract signing. Even better, within three weeks of using the new platform, they were able to make significant decisions, including upgrading $10 million in equipment and planning an equipment replacement, which they estimate will save them $1.2 million per year.
  2. Risk reduction: Custom IT solutions and integration are infamous for delays, budget overruns, and not meeting specifications. Since TAC™ has already been tested and deployment is automated, timelines for future deployment are highly reliable. Prospective TAC™ customers can also see and test features in advance to help make configuration decisions.
  3. Support: Because other customers are already using TAC™, VIA has a strong customer support team and process in place. In addition, upgrades and enhancements are rolled out roughly every month. These upgrades are for all users regardless of their implementation.

Interested in learning more? Write to info@solvewithvia.com for more details on the value of TAC™ for our customers.

Let’s TAC™ About It: TAC™: So, what’s it good for?

For the eleventh installment of our blog series, we will cover how utilities are using TAC™ and all of its new features. So, are you ready to TAC™ about it?

Careful readers of our Let’s TAC™ About It blog series will note that VIA has announced nine new features in the past 14 weeks. 

A valid question: what are utilities doing with all of these features?

For starters, utilities are increasing clean energy adoption.

The transition to reliable clean energy requires utilities to share more data with third parties such as other utilities, distributed energy resource providers, and electric vehicle (EV) infrastructure companies.

Many utilities, however, have yet to find efficient processes to make this data sharing possible. Current processes are often slow and manual, or deny third parties access to data outright.

TAC™ features such as k-anonymity enable utilities to meet regulators’ data privacy and security requirements while working efficiently with:

  • battery storage providers
  • distributed energy resource companies
  • EV infrastructure providers 
  • microgrid developers

The result is faster and better asset planning and increased readiness for clean energy technologies. In fact, some of VIA’s customers report time savings ranging from one person-month to a full year.

Interested in learning more? Write to info@solvewithvia.com for a more detailed case study regarding the benefits and use cases of TAC™.

TAC™ White Paper Release: Balancing Data Owner Protection and Analyst Flexibility

This year, we have published 10 blog posts about TAC™’s data privacy and security features. These are high-level overviews of updates and enhancements that we have been rolling out as a result of customer feedback.

Some of our more technical customers, however, have requested more detail. We’re pleased to release a new 10-page white paper that provides a more comprehensive overview and code examples of TAC™’s current data privacy and security features. The focus of this paper is how we balance data privacy while also providing analysts access and flexibility in their queries.

You can request your copy by sending us an email at info@solvewithvia.com.

Our original 2018 white paper with a general overview of TAC™ and its architecture is also available on request.

Preview of VIA’s recently released white paper, “TAC™: Balancing Data Protection and Query Flexibility”.

Reversing the Curse: The 80 / 20 Split In AI Projects

The USAF (a current VIA customer), as referenced in this 2020 congressional report, has prioritized funds to automate analyst workflows to “directly tackle the challenge of analysts spending 80% of their time searching for data and 20% making sense of the data.”

Interviews with analysts and our earliest experiences working with customers’ data have confirmed this 80/20 split. We even created a chart in our earliest VIA introductory materials.

To address this common challenge, VIA has been investing heavily in using our own AI and software development skills to reverse this ratio. We may never eliminate data gathering and cleaning tasks, but we can strive to reduce this to 20% of a project and thereby boost the productivity of data scientists. SWEET is the most recent example of how our TAC™ platform accomplishes this. 

We’re keen to hear from you about whether the 80/20 split is still the norm or if you are seeing it decrease (or *gasp* increase). Suggestions on tools or solutions that you would like to see are genuinely welcome.

Let’s TAC™ About It: Verified Templates: Balancing Security and Flexibility

For the tenth installment of our blog series, we will cover how TAC™ contracts can control what kinds of queries an analyst can ask. So, are you ready to TAC™ about it?

In our last Let’s TAC™ About It blog post, we gave a brief overview of how smart contracts work in TAC™. In addition to timing and user role, TAC™ contracts can also control what kinds of queries an authorized analyst can ask of the data. Arguably, this is TAC™’s most distinctive and valuable feature.

We have three methods. At one extreme, TAC™ can do this through Verified Scripts. These provide strong data control but limited analyst flexibility: all analysis code is reviewed and pre-approved. The limitation is that each script requires a manual review and approval process. This is most useful for common, frequently performed analyses. VIA provides a number of Verified Scripts (e.g., descriptive statistics like max, min, etc.) by default.

At the other extreme, we have Free Form Python (FFPy). In this scenario, anything an analyst can ask in Python code (including “send me all the data” kinds of queries) is permitted. This is terrific for analyst flexibility but provides limited data protection for Data Owners (DOs). Internal power users are the main audience for this method.

What if you need something in between? This is the most common scenario and the hardest to solve for.

Here, TAC™ uses Verified Templates (VTs). DOs use VTs to set boundaries on what analysts can and cannot do; within those boundaries, analysts have great flexibility. In short, each VT has two parts. The first, the end of the Python module, is written or agreed to by the DO and sets the boundaries; it is immutable. The second part, the header of the module, is filled in by the analyst. The VT can even reference variables that it leaves undefined; these provide entry points that must be fully defined to control the workflow before running. Any file where the end of the Python module matches the VT exactly (other than newlines and comments) constitutes a validated instance of the template.

This is an example VT:

def main(db, tables, results_path):
   print(hello_world)
   return []

The variable hello_world is not defined by the template, thus providing an entry point for analyst flexibility.

Then, the following query is a valid instance of the template:

hello_world='Welcome to TAC!'
def main(db, tables, results_path):
   print(hello_world)
   return []

and the following query is an invalid instance of the template:

def main(db, tables, results_path):
   print('Welcome to TAC!')
   return []

As this example shows, the VT must include the main function. If an analyst changes the body of the main function, or of any other function defined in the template, their query will be rejected by TAC™ and never sent to a DO’s data to run.
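
To make the matching rule concrete, here is a minimal sketch of how such a check could work. This is an illustration only, not VIA’s actual implementation: it treats “matching” as a line-by-line comparison after dropping blank lines and full-line comments, per the rule above.

def normalize(source):
    # Drop blank lines and full-line comments; keep all other lines.
    return [line.rstrip() for line in source.splitlines()
            if line.strip() and not line.strip().startswith("#")]

def is_valid_instance(submitted, template):
    # A submission is valid if the end of its module matches the VT exactly.
    sub, tmpl = normalize(submitted), normalize(template)
    return len(sub) >= len(tmpl) and sub[-len(tmpl):] == tmpl

Run against the two queries above, this check would accept the first and reject the second.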

Sophisticated DOs can create any number of VTs to suit their specific needs. VIA also provides predefined VTs for specific use cases, such as 15/15 for utility meter data analysis.

A hash of the DO’s VT is stored in the smart contract and must be verified before execution. The VTs themselves can be stored on or off the blockchain. A combination of off-chain VT storage and an on-chain hash is a common practice for privacy reasons.
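
For illustration, the hash check itself could be as simple as the sketch below (the choice of SHA-256 is our assumption here, not a statement of TAC™’s internals):

import hashlib

def template_hash(template_text):
    # Only this digest needs to live on-chain; the VT itself can stay off-chain.
    return hashlib.sha256(template_text.encode("utf-8")).hexdigest()

def verify_template(template_text, on_chain_hash):
    # Accept the off-chain VT only if it hashes to the digest in the contract.
    return template_hash(template_text) == on_chain_hash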

Customers have been testing VTs on TAC™ for the past few months, and we have been very pleased with the results so far. Balancing security and flexibility is a hard but universal challenge. We’re excited to have found a relatively high-performance solution for the utility industry.

Let’s TAC™ About It: What’s a Smart Contract?

For the ninth installment of our blog series, we will cover how TAC™ uses smart contracts to govern data and analysis workflows. So, are you ready to TAC™ about it?

You may recall from previous posts that TAC™ uses smart contracts as the key control mechanism to govern data and analysis workflows. We have written before about the origins of, and need for, smart contracts in our blog post, Rock Science: How Van Halen Invented Smart Contracts.

But, how exactly do we implement these controls?

To recap, a smart contract is a set of software-enforced rules agreed to by two or more parties:

  • Rules are enforced through code (e.g., IF x happens, THEN allow (or don’t allow) y)
  • Parties agree to contract terms using digital signatures

Smart contracts are stored on a blockchain, an auditable and immutable ledger. The rules in the contract set specific restrictions around how data can be accessed.
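
As a toy illustration of what “rules enforced through code” means (the field names below are hypothetical, not TAC™’s actual contract schema):

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ContractTerms:
    allowed_roles: set    # who may query
    not_before: datetime  # when access opens
    not_after: datetime   # when access closes

def allow_query(terms, user_role, now):
    # IF the user holds an authorized role AND the request falls inside the
    # contract window, THEN allow the query; otherwise deny it.
    return (user_role in terms.allowed_roles
            and terms.not_before <= now <= terms.not_after)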

In addition to a contract name and description (e.g., AI goal), there are five required terms for each TAC™ smart contract.

In a TAC™ release earlier this year, VIA also implemented a sub-contracting function. Sometimes customers need to change or update the terms of a contract. By definition, information stored on a blockchain is “immutable,” so we technically can’t change an original contract. To accommodate updates, we allow sub-contracts that are linked to each other in a parent-child relationship and act as a kind of version control.
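
Conceptually, each sub-contract simply records which contract it updates, so the full version history can be reconstructed without mutating anything already on the ledger. A sketch with hypothetical field names:

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)  # immutable once created, like an on-chain record
class Contract:
    contract_id: str
    terms: str
    parent_id: Optional[str] = None  # the contract this one updates, if any

def version_history(contract_id, ledger):
    # Walk parent links back from the newest sub-contract to the original.
    history = []
    while contract_id is not None:
        contract = ledger[contract_id]
        history.append(contract)
        contract_id = contract.parent_id
    return history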

One of the most important and unique terms of a TAC™ smart contract is the disclosure restriction. We will discuss this in more detail in our next blog post.

Meet the Team: Natalie Winger, Data Specialist

Through a Q&A-style interview, you will hear from VIA team members about things like a typical day at the office and favorite foods.


You are one of the first team members to join entirely remote during COVID-19. Can you tell us what that process was like?

It was of course a bit nerve-wracking and different than any other onboarding I’ve done before, but VIA was very organized about the entire process. My first 1-2 weeks were completely planned out with “Get to Know VIA” sessions, technical onboarding, first day lunch, and other events. I was pleasantly surprised that I never felt lost or didn’t know what I was supposed to do next or who to turn to for help. It has also been really nice to have many video calls because it has made it so much easier to get to know everyone when you can see their faces!

Have there been any funny “remote team” moments that stand out to you since joining the team?

My favorite was on my first day when Colin’s cat was climbing up the blinds behind him and playing. Colin just paused and asked “can you see my cat right now…?” and then laughed a little while proceeding to peel the cat off the curtains. It was a good moment to break from the more formal presentation and remember that everyone exists beyond the screen – and a lot of us have pets or family members or outside noises that we can’t control but are still a fun part of our lives.

What does a typical day at VIA look like for you?

So far I have had such a variety of days that it’s hard to answer this question! Before standup, I usually check my Slack and continue/follow up on any work from the previous day. Then, the rest of the day I could spend cleaning and wrangling new data, prioritizing the next steps for the future of GDAC™ data, or summarizing and verifying partner data. 

I look forward to Demo days most because we get to see in detail what everyone else has been working on and it’s always impressive to see how all the details I’ve heard about in the days and weeks before all come together. I also look forward to All Hands meetings because it’s a chance to learn more about the vision for VIA, any new initiatives, and even company-wide training. Finally, on Fridays I often attend (virtual) Pub Club which has been an especially useful resource for me. Since I did remote onboarding, it’s been nice to get to know other people who I haven’t had a chance to work with yet or learn more about the people I see every day, but don’t get a chance to chat with.

What motivates you to come to the office (or work remotely) each day?

I love seeing the rest of the GDAC™ team in stand-up every morning and starting out the day with a quick chat or some stories between us! Beyond seeing other friendly faces at VIA, just thinking about the amount of transformer data that is available to us and the possible insights we can get from it is very motivating – you never know what you’ll find! I especially like that even people who are on other epics or who have worked in the GDAC™ epic in the past are still so interested in what new updates we’re making and are very supportive and invested in helping to make our team a success.

What’s something you have worked on at VIA that you are most proud of?

I am most proud of the first time I took the lead on bringing data from raw files to ready-to-use on the GDAC™ portal. Through many iterations of cleaning and wrangling, I gained an appreciation of all the steps that go into helping the partners gain value from their data. I also had many opportunities to discuss future ideas for wrangling, predictions, and process improvements, which was motivating to see the possibilities for the future of my work and of GDAC™.

If you were given an extra hour in your day, what would you spend it doing?

Hopefully exercising – I like to bike or walk around new areas of Montreal, or swim in the outdoor pool in the summer – but I also might spend more time on making a nice dinner or reading my book.

What is your go-to food?

Cucumbers and hummus is my #1 default snack.

What’s something everyone may not, but should, know about working at VIA?

I am continuously impressed by the speed at which changes are made at VIA. Challenging and important questions (from process changes to technical issues) are constantly being asked, and I find that multiple people often immediately chip in to help solve them. VIA is definitely a place where questions are highly encouraged, and the whole company is willing to jump in to help give you the tools and support you need to put your ideas into action right away.

Let’s TAC™ About It: 15/15

For the eighth installment of our blog series, we will cover TAC™’s ability to meet the 15/15 rule. So, are you ready to TAC™ about it?

Utilities have a core mission to provide reliable power at fair rates to their customers.

A recent article in Energy News Network discussed the use of AI algorithms on utility meter data to assess rate fairness, particularly for low- and moderate-income customers.

David Kolata, Executive Director of the Illinois Citizens Utility Board, commented after the analysis that “Potentially, lower-income consumers are subsidizing higher-income consumers.” Clustering of actual load curves showed that “Because these [low income] customers use less electricity and contribute less to the grid’s peak load than others, and because peak load drives overall system costs higher, low-income customers could be paying more than their fair share for electricity.”

Given the seriousness of the issue, why haven’t more utilities performed similar analysis? 

The article continues to explain that data privacy and data confidentiality are the major barriers to this kind of analysis. In this specific instance, the Illinois Commerce Commission enacted the “15 by 15” privacy rule to help enable these findings.

As we wrote about in the fifth installment of our Let’s TAC™ About It series, 15/15 is a specific example of the k-anonymity privacy-preserving analytics feature that VIA recently released. For utilities interested in replicating the Illinois analytics example, VIA’s TAC™ platform automates the process of privacy-preserving analytics and provides an audited record for regulator review.

Meter data has been used for clean energy analytics and load curve predictions for some time. We’re excited to see privacy-protected AI enabling new kinds of analysis, in particular around equity for low- and moderate-income communities.

Let’s TAC™ About It: Ease of Use Upgrade

For the seventh installment of our blog series, we will cover the latest release of TAC™, an easy-to-use upgrade to our Models for Private Data (MPD) service. So, are you ready to TAC™ about it?

Open source AI algorithms are rapidly improving and becoming the standard for numerous types of data analyses. Unfortunately, as we touched on in our Models for Private Data post, many companies can’t take advantage of these models because they are concerned about compromising the confidentiality of their data when uploading it to cloud-based analytics platforms. 

The latest release of VIA’s TAC™ platform is a leap forward in terms of simplicity for data owners. TAC™ now makes testing containerized models (e.g., TensorFlow, PyTorch, MLflow) on private data as easy as four clicks.

This new functionality on TAC™ also provides an even easier-to-use interface for analysts to catalog and share their trained containerized models with each other. These models can be used on private data even if they were pre-trained elsewhere on different data sources.

There are at least three different analysis scenarios where TAC™ customers find this functionality of value:

  1. Distributed collaborations: Many of our customers are working remotely or with external (e.g., academic) experts. This is for those who want to collaborate and share different ways of solving a problem.
  2. Large teams of data scientists: In some instances, customers may have large teams (e.g., 50+ data scientists). This helps keep track of who did what, where the results are, and more.
  3. Rotating teams: For at least one of our customers, data science analysts rotate on a six-month basis. Rotating teams can cause knowledge to be lost (e.g., in the US military). Analysis may be done differently each time, causing variability in quality, and is slower due to the learning curve for each new person.

What these three scenarios have in common is the challenge of knowledge sharing across analysts about what models work, expected results, and comparing new and old models. 

Data science skills are proliferating at utility companies. In addition to keeping their data private, we’re excited to help analysts collaborate more effectively. Our hope is that our latest release will create a virtuous circle. With greater collaboration, analysts will find insights to support utility reliability and cost savings exponentially faster.

How VIA TAC™-kles Sensor Data

The electric grid is not just filled with sensors. It is a giant sensor.

Smart meters collect data from millions of customers, and in some countries like Spain and Denmark, from over 90% of homes. In addition, every utility collects load data in near real-time. 

Load, SCADA, smart meter, and other sensor data on the grid are extremely powerful datasets for advanced analytics. However, to paraphrase the Spider-Man franchise, with great power comes great responsibility.

Many utilities are highly concerned about the responsibility that comes with these datasets. As a result, the data gets trapped for one or more of these four reasons:

  1. Data is too big: After you install a sensor, you might go from a reading once a month to 30 times a second. Data volume becomes an issue.
  2. Data is too confidential: Something in the sensor reveals critical infrastructure or personally identifiable information (PII). Smart meters can tell you vacation schedules and household size, for example. Many of the analytics insights can be considered protected information by consumer privacy laws (think CCPA and GDPR).
  3. Data is too distributed: Data from sensors by definition is going to be in a lot of locations, perhaps even across regional jurisdictional boundaries. This can make it expensive or impossible to aggregate.
  4. Data is too messy: Different sensors or meters or subsidiaries of a utility may have different data formats. Cleaning data is a time intensive process when performed manually.

VIA’s software platform, TAC™, was designed to address all four of these issues in power sector data. TAC™ reduces risk and improves IT efficiency for data access and data analysis projects. Follow the latest section of our website to learn about TAC™ features (in our Let’s TAC™ About It blog series) as well as new customer and project announcements.

Let’s TAC™ About It: Homomorphic Encryption Algorithms on TAC™ (HEAT)

For the sixth installment of our blog series, we will cover homomorphic encryption algorithms on TAC™, known as HEAT. So, are you ready to TAC™ about it?

It’s standard practice to encrypt data at rest (e.g., data in a database) and in transit (e.g., data sent over the internet). When an analyst has to work with data, however, the data is usually unencrypted to perform any calculation. Homomorphic encryption (HE) is cryptography that holds the promise of allowing computations on data while it remains encrypted. 

Despite the increasing need for data security and privacy, HE is not widely used. There are at least two barriers to adoption:

  • Impracticality: NIST-compliant encryption schemes that power widely-used protocols such as RSA have well-known properties that make it possible to perform a subset of arithmetic operations on encrypted numbers. Many of these methods, however, require brute-force computation that limits their ability to add large numbers.
  • Lack of standardization: To overcome the limitations of NIST-compliant encryption schemes, many new HE protocols have been developed to perform operations on larger numbers with some powerful optimizations. Unfortunately, none have been proven reliable enough to be standardized. Even large projects such as Microsoft SEAL have recognized security challenges.

VIA’s HE algorithm overcomes both of these barriers for addition. Our protocol uses NIST-compliant standard elliptic curve cryptography (ECC) algorithms to sum arbitrarily large numbers. The key innovation is to represent integers in a base of 2^n and sum the numbers for each column in separate batches, up to an overflow limit of 2^32. The resulting number (a vector of numbers less than 2^32) can thus be decoded using standard brute-force decoding algorithms.

To avoid overflow limits, the protocol is limited to summing 2^(32-n) numbers at a time. If, however, there is a need to sum more than 2^(32-n) numbers at a time, we can increase the overflow limit by increasing the computation power available for decoding. We distribute the computation to multiple machines to execute the decoding in parallel. In practice, HE is normally used on numbers that are already aggregated, so the 2^(32-n) limit is not a significant barrier.

As an example, imagine a decrypting system that could only handle numbers up to 1,000 in base 10. How would you add the numbers 236 and 56,798? While such a decrypting system can only handle numbers up to 1,000, we are not hampered by this limitation. The number 236 can be represented as a vector of (6,3,2,0,0). The number 56,798 is represented as (8,9,7,6,5). Adding the two numbers, we get (14, 12, 9, 6, 5). We can decode each of the components of that vector because they are all less than 1,000. We can then re-express this result as (4,3,0,7,5) and finally decode it as 57,034. With this approach, we can add up to 100 numbers at a time and be sure that none of the individual columns exceeds 1,000, and thus our cryptosystem will work.
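
The same arithmetic is easy to reproduce in code. The sketch below works on plain (unencrypted) numbers, leaving out the ECC layer entirely, just to show the encode, column-sum, and decode steps:

BASE = 10      # digit base for this example; HEAT uses a base of 2^n
LIMIT = 1000   # the largest value our imagined decoder can handle

def encode(x, width=5):
    # Represent x as a little-endian digit vector: 236 -> [6, 3, 2, 0, 0].
    digits = []
    for _ in range(width):
        digits.append(x % BASE)
        x //= BASE
    return digits

def column_sum(vectors):
    # Add vectors column by column; every column sum must stay below LIMIT.
    sums = [sum(column) for column in zip(*vectors)]
    assert all(s < LIMIT for s in sums), "overflow: too many addends"
    return sums

def decode(sums):
    # Propagate carries and read off the total.
    return sum(s * BASE**i for i, s in enumerate(sums))

print(column_sum([encode(236), encode(56798)]))          # [14, 12, 9, 6, 5]
print(decode(column_sum([encode(236), encode(56798)])))  # 57034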

What if we need to add 200 numbers? In this case, we can choose a smaller base (e.g., going from 2^16 down to 2^4) and raise the limit beyond 1,000 by distributing the decoding in parallel. The size of the base determines the amount of brute-force computation required. By breaking the addition into a series of smaller problems, it becomes possible to use standardized encryption algorithms and brute-force computation to solve the problem. We also leverage existing standards to make it practical to perform arbitrarily complex sum operations.

VIA has incorporated this HE addition algorithm, known as Homomorphic Encryption Algorithms on TAC™ (HEAT), into its TAC™ platform.

To benchmark the system, we compared execution times for HEAT versus Microsoft SEAL, a popular open source HE library. The simple benchmark consists of recording the execution time to encrypt, add, and decrypt up to 80,000 integers. The cryptographic parameters for the ECC were chosen to match the same level of protection as SEAL.

The graph above shows that HEAT is roughly twice as fast as SEAL.

At VIA, we’re excited to have found a “no-tradeoff” solution for HE addition that has wide applications. We are already using HEAT to enable encrypted benchmarking of data across utilities as part of our GDAC™ program. We are also looking into using HEAT for training a federated deep learning model.

HE is a rapidly developing field. VIA is increasing its dedicated resources to improve its HE implementation including exploring lattice cryptography to meet post-quantum computing requirements and extending HEAT to enable homomorphic multiplication.

Let’s TAC™ About It: K-Anonymity

For the fifth installment of our blog series, we will cover TAC™’s k-anonymity functionality. So, are you ready to TAC™ about it?

Regulators are increasingly demanding that utilities release their data to third parties to support a wide array of clean energy initiatives. At the same time, regulators are also mandating increased information privacy and security requirements, such as CCPA and GDPR.

Data about a consumer’s energy behavior can provide enormous insights for efficiency but, of course, it can also reveal private details like vacation habits, income levels, and family size.

How can utilities balance these two reasonable but competing requirements? 

VIA has implemented a k-anonymity function to handle this use case. The essence of k-anonymity is to segment the data in such a way that similar consumers are in groups that are both big enough to hide an individual consumer’s behavior and small enough to be meaningful and useful for analysis and creating energy efficiency programs.

The above images show two normally distributed variables with 1,000 points. Each cross indicates one point. There can be several points at each location on top of each other. The second image shows the points grouped with at least 15 points within each group. The groups are smaller and more frequent in the middle because there are more points located in the middle due to the normal distributions.

K-anonymity algorithms are sometimes referred to as Mondrian algorithms due to the groupings resembling the paintings of Piet Mondrian.

Choosing an optimal grouping size is not just a challenge, it is actually an NP-hard mathematical problem. TAC™ now provides a simple function to allow utilities to implement k-anonymity groupings that meet both regulatory requirements. When a utility chooses this function, any request for data must meet specific group size constraints. If not enough data exists to create a set of data that would maintain privacy (e.g., only one consumer meets the specific request), then the utility does not provide the data. Similarly, if a data request is made where there is more than enough data to maintain an individual’s privacy, smaller groups of data will be created to allow for more targeted analysis.
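
To make the grouping idea concrete, below is a simplified Mondrian-style sketch (an illustration only, not TAC™’s implementation): recursively split the points on their widest dimension until a split can no longer leave at least k points on each side.

import random

K = 15  # minimum group size, as in the 15/15 rule

def mondrian(points, k=K):
    # Split on the widest dimension; stop before any group drops below k.
    if len(points) < 2 * k:
        return [points]  # splitting further would break k-anonymity
    dim = max((0, 1), key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return mondrian(ordered[:mid], k) + mondrian(ordered[mid:], k)

data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
groups = mondrian(data)
assert all(len(g) >= K for g in groups)  # every group hides at least 15 consumers

Denser regions end up split into more, smaller groups, which matches the behavior described above: where there is more than enough data, smaller and more targeted groups are created.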

Utilities facing regulatory constraints are excited by the opportunity to have an automated means of ensuring data privacy. As a sign of growing interest in the area, the utility non-profit EPRI (Electric Power Research Institute) is also facilitating a working group to test new solutions to the regulator dilemma.

Let’s TAC™ About It: Digital Watermarking

For the fourth installment of our blog series, we will cover TAC™’s digital watermarking functionality. So, are you ready to TAC™ about it?

For centuries, watermarks have been used to verify currencies and important documents and to discourage counterfeiting. Today, watermarks are used for the same purpose to protect music, photos, videos, software, and other digital data.

The main goal of our TAC™ platform is to keep data private by not transferring data. Having said that, there are times when it is necessary (e.g., for audit or regulatory purposes) to transfer some data between trusted parties. 

At VIA, we use watermarking to enable data auditability when data owners choose to transfer data. For example, two organizations can receive seemingly identical datasets. If it is later found that data has been leaked, the data owner can review the leaked data and trace it back to the original analyst who requested it.
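
One common fingerprinting approach, sketched below purely as an illustration (this is not a description of VIA’s actual watermarking scheme), derives a recipient-specific, analytically negligible perturbation from the recipient’s ID, so a leaked copy can be matched back to whoever received it:

import hashlib

def watermark(readings, recipient_id):
    # Nudge least-significant digits in a pattern seeded by the recipient ID.
    digest = hashlib.sha256(recipient_id.encode()).digest()
    return [round(value + ((digest[i % 32] >> (i % 8)) & 1) * 1e-6, 9)
            for i, value in enumerate(readings)]

def identify_leak(leaked, original, recipients):
    # Re-derive each recipient's watermark and see which copy matches.
    for recipient in recipients:
        if watermark(original, recipient) == leaked:
            return recipient
    return None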

TAC™ automates the process of maintaining the provenance of data. As long as the data was transferred using TAC™, no additional steps are required by the data owner and there is no noticeable difference to analysts.

Watermarking is a simple and automatic method to provide auditability to data.

Introducing SWEET: AI for Data Wrangling

For the third installment of our blog series, Let’s TAC™ About It, we’d like to introduce you to SWEET, a built-in function on VIA’s TAC™ platform. So, are you ready to TAC™ about it?


SWEET stands for Smart Wrangling Engine for Extraction and Transformation. Check out the video (or the transcription that follows) to learn how we use AI to wrangle data!

Below is a transcription of the “Introduction to SWEET: AI for Data Wrangling” video.

Analysts often need to transfer data from a spreadsheet to a database, a task known as data wrangling. The analyst usually identifies a rule or multiple rules (such as: column x is data, row 5 is a header and can be discarded, and so on). The analyst then writes code to execute those rules. This works well when rules are easily identifiable. In most cases, however, it is incredibly time consuming.

Data scientists spend more time wrangling and cleaning data than on analysis and AI. The problem with that is analytics insights are of the highest value, but get the least amount of resources. The big leap in AI is being able to process information without humans writing all the rules.

As an example, computer vision is used to identify a dog. Think of all the varieties of dogs and all the possible variations in context that those dogs could be in. There isn’t an army of people large enough to write rules to identify a random dog in a random photo.

And yet, AI can.

VIA’s approach to data wrangling is to use some of the exact same AI algorithms used in image recognition. This works across a much wider variety of contexts and spreadsheet or file formats. Let’s take a look at how SWEET works.

Here’s a spreadsheet. What SWEET is going to do is use a number of different machine learning algorithms to automate the process of getting the information into a database format.

The first model uses machine vision to map out the spreadsheet. Purple represents blank space, green is headings, yellow is actual data. Once that model has run, there’s a second algorithm that takes a look at the content.

The second algorithm skips over the purple. It looks at the green (which is the headers) to know which column to write where in the database. Finally, it would take a look at the yellow area to write the data to the database.

A third machine learning algorithm determines which columns are derived from other parts of the sheet. For example, a total column is just the sum of the other columns and may not be necessary to write to the database. The third model separates these derived columns from the raw data.

In this example, the ACCOUNT column turns out to be the total of the other columns added up. It could be difficult for a human to understand immediately, but one of our models does this instantly.

SWEET’s approach works irrespective of the format. The model doesn’t have to be re-trained when it comes across spreadsheets that are new or in different formats.

So, what’s new and different here? AI algorithms have been evolving quickly. Many of the models that we implemented didn’t exist just a few years ago.

The other insight is that we broke the “convert this spreadsheet into a db” problem into multiple steps and have a different AI algorithm for each step.

Combined, SWEET, a built-in function in VIA’s TAC™ ingestion engine, takes processes that used to cost analysts days of manual work and makes them instantaneous.
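
To illustrate that multi-step decomposition, here is a toy pipeline with trivial stand-ins for SWEET’s trained models (the real system uses machine vision and other ML algorithms at each step, as described above):

def classify_layout(sheet):
    # Model 1 stand-in: treat the first row as headers and the rest as data.
    return {"headers": sheet[0], "data": sheet[1:]}

def map_columns(layout):
    # Model 2 stand-in: key each data cell by its column header.
    return [dict(zip(layout["headers"], row)) for row in layout["data"]]

def drop_derived(records):
    # Model 3 stand-in: drop any column whose value is the sum of the others.
    derived = [h for h in records[0]
               if all(r[h] == sum(v for k, v in r.items() if k != h) for r in records)]
    return [{k: v for k, v in r.items() if k not in derived} for r in records]

sheet = [["JAN", "FEB", "ACCOUNT"], [10, 20, 30], [5, 5, 10]]
print(drop_derived(map_columns(classify_layout(sheet))))
# [{'JAN': 10, 'FEB': 20}, {'JAN': 5, 'FEB': 5}]

As in the video’s ACCOUNT example, the derived total column is detected and separated from the raw data.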


Update: July 31, 2020

The image below shows the steps SWEET takes to transform raw files into a standard format.