This year, we have published 10 blog posts about TAC™’s data privacy and security features. These are high level overviews of updates and enhancements that we have been rolling out as a result of customer feedback.
Some of our more technical customers, however, have requested more detail. We’re pleased to release a new 10-page white paper that provides a more comprehensive overview and code examples of TAC™’s current data privacy and security features. The focus of this paper is how we balance data privacy while also providing analysts access and flexibility in their queries.
Our original 2018 white paper with a general overview of TAC™ and its architecture is also available on request.
Preview of VIA’s recently released white paper, “TAC™: Balancing Data Protection and Query Flexibility”.
TAC™ White Paper Release: Balancing Data Owner Protection and Analyst Flexibility (posted 2020-11-05)
The USAF (a current VIA customer), as referenced in this 2020 congressional report, has prioritized funds to automate analyst workflows to, “directly tackle the challenge of analysts spending 80% of their time searching for data and 20% making sense of the data.”
Interviews with analysts and our earliest experiences working with customers’ data have confirmed this 80/20 split. We even created a chart in our earliest VIA introductory materials:
To address this common challenge, VIA has been investing heavily in using our own AI and software development skills to reverse this ratio. We may never eliminate data gathering and cleaning tasks, but we can strive to reduce this to 20% of a project and thereby boost the productivity of data scientists. SWEET is the most recent example of how our TAC™ platform accomplishes this.
We’re keen to hear from you about whether the 80/20 split is still the norm or if you are seeing it decrease (or *gasp* increase). Suggestions on tools or solutions that you would like to see are genuinely welcome.
Reversing the Curse: The 80 / 20 Split In AI Projects (posted 2020-10-29)
For the tenth installment of our blog series, we will cover how TAC™ contracts can control what kinds of queries an analyst can ask. So, are you ready to TAC™ about it?
In our last Let’s TAC™ About It blog post, we gave a brief overview of how smart contracts work in TAC™. In addition to timing and user role, TAC™ contracts can also control what kinds of queries an authorized analyst can ask of the data. Arguably, this is the most unique and valuable feature in TAC™.
We have three methods. On one extreme, TAC™ can do this through Verified Scripts. This provides strong data control but limited analyst flexibility. Any analysis code is reviewed and pre-approved. The limitation is that this requires a manual review and approval process for each script. This is most useful for common and frequently performed analysis. VIA provides a number of Verified Scripts (e.g., descriptive statistics like max, min, etc.) by default.
On the other extreme, we have Free Form Python (FFPy). In this scenario, anything an analyst can ask in Python code (including “send me all the data” kind of queries) is permitted. This is terrific for analyst flexibility but has limited data protection for Data Owners (DOs). Internal power users are the main audience for this method.
What if you need something in-between? This is the most common scenario and hardest to solve for.
Here, TAC™ uses Verified Templates (VTs). DOs use VTs to set boundaries about what analysts can and cannot do. Within those boundaries, analysts have great flexibility. In short, each VT has two parts. The first part, the “end of a Python module”, is written or agreed to by the DO and sets the boundaries. It is immutable. The second part, the header of the module, is filled in by the analyst. The VT can even include variables that are not defined. These provide entry points that the analyst must fully define before the workflow can run. Any file where the end of the Python module matches the VT exactly (other than newlines and comments) constitutes a validated instance of the template.
The variable hello_world is highlighted because it is not defined by the template, thus providing flexibility to an analyst.
Then, the following query is a valid instance of the template:
hello_world = 'Welcome to TAC!'

def main(db, tables, results_path):
    print(hello_world)
    return
and the following query is an invalid instance of the template:
def main(db, tables, results_path):
    print('Welcome to TAC!')
    return
As this example shows, the VT must include the main function. Once an analyst changes the body of the main function or any other functions defined in the template, their query will be rejected by TAC™ and not sent to a DO’s data to run.
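The matching rule can be sketched in a few lines of Python. This is a minimal illustration, not TAC™'s implementation. The template text is reconstructed from the two queries above, and the function name `is_valid_instance` is hypothetical. The sketch compares parsed syntax trees with the standard `ast` module, which ignores comments and blank lines but is also slightly more lenient than an exact text match (for example, it treats single and double quotes alike).

```python
import ast

# Assumed template: the DO-controlled "end of the module".
TEMPLATE = '''
def main(db, tables, results_path):
    print(hello_world)
    return
'''


def is_valid_instance(template_src: str, query_src: str) -> bool:
    """Return True if the query's trailing statements match the template."""
    template_body = ast.parse(template_src).body
    query_body = ast.parse(query_src).body
    if len(query_body) < len(template_body):
        return False
    # Everything before the tail is the analyst-written header.
    tail = query_body[len(query_body) - len(template_body):]
    # ast.dump omits line and column info by default, so comments and
    # blank lines do not affect the comparison.
    return all(ast.dump(t) == ast.dump(q) for t, q in zip(template_body, tail))


valid_query = "hello_world = 'Welcome to TAC!'\n" + TEMPLATE
invalid_query = '''
def main(db, tables, results_path):
    print('Welcome to TAC!')
    return
'''
```

Here `is_valid_instance(TEMPLATE, valid_query)` is true and `is_valid_instance(TEMPLATE, invalid_query)` is false. The sketch does not check that the header actually defines every variable the template leaves open, which a real validator would also need to do.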
Sophisticated DOs can create an infinite number of VTs to suit their specific needs. VIA also has predefined VTs for specific use cases such as 15/15 for utility meter data analysis.
A hash of the DO’s VT is stored in the smart contract and must be verified before a query can execute. The VTs themselves can be stored on or off the blockchain. A combination of off-chain VT storage and on-chain hash is a common practice for privacy reasons.
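As a rough sketch of the off-chain template plus on-chain hash pattern, the check below recomputes a digest of the template text and compares it with the value recorded in the contract. The choice of SHA-256 and the function names `template_digest` and `matches_contract` are assumptions for illustration; the post does not say which hash function TAC™ uses.

```python
import hashlib
import hmac


def template_digest(template_src: str) -> str:
    """Hex SHA-256 digest of the template text (the hash choice is an assumption)."""
    return hashlib.sha256(template_src.encode("utf-8")).hexdigest()


def matches_contract(template_src: str, contract_digest: str) -> bool:
    """Compare an off-chain template against the digest recorded in the contract."""
    return hmac.compare_digest(template_digest(template_src), contract_digest)


# The digest would be written to the contract once, when the DO agrees to the VT.
recorded = template_digest("def main(db, tables, results_path):\n    return\n")
```

Because only the digest needs to be on-chain, the template text itself can stay private while any later change to it is still detectable.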
Customers have been testing VTs on TAC™ for the past few months and we have been very pleased with the results so far. Balancing security and flexibility is a hard but universal challenge. We’re excited to have found a relatively high performance solution for the utility industry.
Let’s TAC™ About It: Verified Templates: Balancing Security and Flexibility (posted 2020-10-22)
For the ninth installment of our blog series, we will cover how TAC™ uses smart contracts to govern data and analysis workflows. So, are you ready to TAC™ about it?
You may recall from previous posts that TAC™ uses smart contracts as the key control mechanism to govern data and analysis workflows. We have spoken before about the origins and need for smart contracts, in our blog post, Rock Science: How Van Halen Invented Smart Contracts.
But, how exactly do we implement these controls?
To recap, a smart contract is a set of software-enforced rules agreed by two or more parties:
Rules are enforced through code (e.g., IF x happens, THEN allow (or don’t allow) y)
Parties agree to contract terms using digital signatures
Smart contracts are stored on a blockchain, an auditable and immutable ledger. The rules in the contract set specific restrictions around how data can be accessed.
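To make the IF/THEN idea concrete, here is a minimal sketch of a rule check on two controls mentioned elsewhere in this series, timing and user role. The class `ContractRules`, its field names, and `is_request_allowed` are hypothetical and do not reflect the actual TAC™ contract schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ContractRules:
    """Illustrative terms only; not the actual TAC contract schema."""
    valid_from: datetime
    valid_until: datetime
    allowed_roles: frozenset


def is_request_allowed(rules: ContractRules, role: str, when: datetime) -> bool:
    # IF the request falls inside the agreed window AND the role is authorized,
    # THEN allow it; otherwise deny.
    in_window = rules.valid_from <= when <= rules.valid_until
    return in_window and role in rules.allowed_roles


rules = ContractRules(
    valid_from=datetime(2020, 10, 1),
    valid_until=datetime(2020, 12, 31),
    allowed_roles=frozenset({"analyst"}),
)
```

With these example terms, an `analyst` request dated 2020-11-01 is allowed, while the same request from a `guest` role or dated 2021-01-15 is denied.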
In addition to a contract name and description (e.g., AI goal), there are five required terms for each TAC™ smart contract.
In a TAC™ release earlier this year, VIA also implemented a sub-contracting function. Sometimes customers need to change or update the terms of the contract. By definition, information stored on a blockchain is “immutable” so we technically can’t make a change to an original contract. To accommodate updates, we do allow sub-contracts that are linked to each other in a parent-child relationship and act as a kind of version control.
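The parent-child linkage can be pictured as a chain of immutable records, where the newest sub-contract is found by following links from the original. The sketch below is an assumption-laden illustration: the `Contract` record, its fields, and `latest_version` are hypothetical, and it assumes each contract has at most one child.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contract:
    contract_id: str
    terms: str
    parent_id: Optional[str] = None  # None marks the original contract


def latest_version(contracts, root_id):
    """Follow parent -> child links from root_id to the newest sub-contract."""
    child_of = {c.parent_id: c for c in contracts if c.parent_id is not None}
    current = root_id
    while current in child_of:
        current = child_of[current].contract_id
    return current


history = [
    Contract("c1", "original terms"),
    Contract("c2", "extended end date", parent_id="c1"),
    Contract("c3", "added analyst role", parent_id="c2"),
]
```

Nothing in `history` is ever modified: an update appends a new record that points at its parent, which is what lets the chain act as version control.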
One of the most important and unique terms of a TAC™ smart contract is the disclosure restriction. We will discuss this in more detail in our next blog post.
Let’s TAC™ About It: What’s a Smart Contract? (posted 2020-10-15)
Through a Q&A-style interview, you will hear from VIA team members about things like a typical day at the office and favorite foods.
You are one of the first team members to join entirely remote during COVID-19. Can you tell us what that process was like?
It was of course a bit nerve-wracking and different than any other onboarding I’ve done before, but VIA was very organized about the entire process. My first 1-2 weeks were completely planned out with “Get to Know VIA” sessions, technical onboarding, first day lunch, and other events. I was pleasantly surprised that I never felt lost or didn’t know what I was supposed to do next or who to turn to for help. It has also been really nice to have many video calls because it has made it so much easier to get to know everyone when you can see their faces!
Have there been any funny “remote team” moments that stand out to you since joining the team?
My favorite was on my first day when Colin’s cat was climbing up the blinds behind him and playing. Colin just paused and asked “can you see my cat right now…?” and then laughed a little while proceeding to peel the cat off the curtains. It was a good moment to break from the more formal presentation and remember that everyone exists beyond the screen – and a lot of us have pets or family members or outside noises that we can’t control but are still a fun part of our lives.
What does a typical day at VIA look like for you?
So far I have had such a variety of days that it’s hard to answer this question! Before standup, I usually check my Slack and continue/follow up on any work from the previous day. Then, the rest of the day I could spend cleaning and wrangling new data, prioritizing the next steps for the future of GDAC™ data, or summarizing and verifying partner data.
I look forward to Demo days most because we get to see in detail what everyone else has been working on and it’s always impressive to see how all the details I’ve heard about in the days and weeks before all come together. I also look forward to All Hands meetings because it’s a chance to learn more about the vision for VIA, any new initiatives, and even company-wide training. Finally, on Fridays I often attend (virtual) Pub Club which has been an especially useful resource for me. Since I did remote onboarding, it’s been nice to get to know other people who I haven’t had a chance to work with yet or learn more about the people I see every day, but don’t get a chance to chat with.
What motivates you to come to the office (or work remotely) each day?
I love seeing the rest of the GDAC™ team in stand-up every morning and starting out the day with a quick chat or some stories between us! Beyond seeing other friendly faces at VIA, just thinking about the amount of transformer data that is available to us and the possible insights we can get from it is very motivating – you never know what you’ll find! I especially like that even people who are on other epics or who have worked in the GDAC™ epic in the past are still so interested in what new updates we’re making and are very supportive and invested in helping to make our team a success.
What’s something you have worked on at VIA that you are most proud of?
I am most proud of the first time I took the lead on bringing data from raw files to ready-to-use on the GDAC™ portal. Through many iterations of cleaning and wrangling, I gained an appreciation of all the steps that go into helping the partners gain value from their data. I also had many opportunities to discuss future ideas for wrangling, predictions, and process improvements, which was motivating to see the possibilities for the future of my work and of GDAC™.
If you were given an extra hour in your day, what would you spend it doing?
Hopefully exercising – I like to bike or walk around new areas of Montreal, or swim in the outdoor pool in the summer – but I also might spend more time on making a nice dinner or reading my book.
What is your go-to food?
Cucumbers and hummus is my #1 default snack.
What’s something everyone may not, but should, know about working at VIA?
I am continuously impressed by the speed at which changes are made at VIA. Challenging and important questions (from process changes to technical issues) are constantly being asked, and I find that multiple people often immediately chip in to help solve them. VIA is definitely a place where questions are highly encouraged, and the whole company is willing to jump in to help give you the tools and support you need to put your ideas into action right away.
Meet the Team: Natalie Winger, Data Specialist (posted 2020-10-12)
For the eighth installment of our blog series, we will cover TAC™’s ability to meet the 15/15 rule. So, are you ready to TAC™ about it?
Utilities have a core mission to provide reliable power at fair rates to their customers.
A recent article in Energy News Network discussed the use of AI algorithms on utility meter data to assess rate fairness, particularly for low and moderate income customers.
David Kolata, Executive Director of the Illinois Citizens Utility Board, commented after the analysis that “Potentially, lower-income consumers are subsidizing higher-income consumers.” Clustering of actual load curves showed that “Because these [low income] customers use less electricity and contribute less to the grid’s peak load than others, and because peak load drives overall system costs higher, low-income customers could be paying more than their fair share for electricity.”
Given the seriousness of the issue, why haven’t more utilities performed similar analysis?
The article continues to explain that data privacy and data confidentiality are the major barriers to this kind of analysis. In this specific instance, the Illinois Commerce Commission enacted the “15 by 15” privacy rule to help enable these findings.
As we wrote about in the fifth installment of our Let’s TAC About It series, 15/15 is a specific example of the k-anonymity privacy-preserving analytics feature that VIA recently released. For utilities interested in replicating the Illinois analytics example, VIA’s TAC™ platform automates the process of privacy-preserving analytics and provides an audited record for regulator review.
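For readers unfamiliar with the rule, a 15/15 check is commonly described as two conditions on an aggregated group: it must contain at least 15 customers, and no single customer may account for more than 15% of the group's total load. The sketch below encodes that common reading; the function name `passes_15_15` and its parameters are hypothetical, and this is not TAC™'s implementation.

```python
def passes_15_15(customer_loads, min_customers=15, max_share=0.15):
    """Check an aggregated group against the commonly cited 15/15 conditions."""
    total = sum(customer_loads)
    if len(customer_loads) < min_customers or total <= 0:
        return False
    # No single customer may dominate the aggregate.
    return max(customer_loads) <= max_share * total


balanced_group = [100] * 15          # 15 customers, each about 6.7% of the load
too_small = [100] * 14               # only 14 customers
dominated = [100] * 14 + [1000]      # one customer is about 42% of the load
```

Only `balanced_group` passes; the other two groups would need to be merged with more customers before any aggregate could be released.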
Meter data has been used for clean energy analytics and load curve predictions for some time. We’re excited to see privacy-protected AI analysis enabling new kinds of analysis, in particular, equity for low and moderate income communities.
Let’s TAC™ About It: 15/15 (posted 2020-10-08)
For the seventh installment of our blog series, we will cover the latest release of TAC™ which is an easy to use upgrade to our Models for Private Data (MPD) service. So, are you ready to TAC™ about it?
Open source AI algorithms are rapidly improving and becoming the standard for numerous types of data analyses. Unfortunately, as we touched on in our Models for Private Data post, many companies can’t take advantage of these models because they are concerned about compromising the confidentiality of their data when uploading it to cloud-based analytics platforms.
The latest release of VIA’s TAC™ platform is a leap forward in terms of simplicity for data owners. TAC™ now makes testing containerized models (e.g., TensorFlow, PyTorch, MLflow) on private data as easy as four clicks.
This new functionality on TAC™ also provides an even easier to use interface for analysts to catalog and share their trained containerized models with each other. These models can be used on private data even if the models were pre-trained elsewhere on different data sources.
There are at least three different analysis scenarios where TAC™ customers find this functionality of value:
Distributed collaborations: Many of our customers are working remotely or with external (e.g., academic) experts. This is for those who want to collaborate and share different ways of solving a problem.
Large teams of data scientists: In some instances, customers may have large teams (e.g., 50+ data scientists). This helps keep track of who did what, where the results are, and more.
Rotating teams: For at least one of our customers, data science analysts rotate on a six month basis. Rotating teams (e.g., in the US military) can cause knowledge to be lost. Analysis may be done differently each time, causing variability in quality, and is slower because of the learning curve for each new person.
What these three scenarios have in common is the challenge of knowledge sharing across analysts about what models work, expected results, and comparing new and old models.
Data science skills are proliferating at utility companies. In addition to keeping their data private, we’re excited to help analysts collaborate more effectively. Our hope is that our latest release will create a virtuous circle. With greater collaboration, analysts will find insights to support utility reliability and cost savings exponentially faster.
Let’s TAC™ About It: Ease of Use Upgrade (posted 2020-09-24)
The electric grid is not just filled with sensors. It is a giant sensor.
Smart meters collect data from millions of customers, and in some countries like Spain and Denmark, from over 90% of homes. In addition, every utility collects load data in near real-time.
Load, SCADA, smart meter, and other sensor data on the grid are extremely powerful datasets for advanced analytics. However, to paraphrase the Spider-Man franchise, with great power comes great responsibility.
Many utilities are highly concerned about the responsibility that comes with these datasets. As a result, the data gets trapped for one or more of these four reasons:
Data is too big: After you install a sensor, you might go from a reading once a month to 30 times a second. Data volume becomes an issue.
Data is too confidential: Something in the sensor reveals critical infrastructure or personally identifiable information (PII). Smart meters can tell you vacation schedules and household size, for example. Many of the analytics insights can be considered protected information by consumer privacy laws (think CCPA and GDPR).
Data is too distributed: Data from sensors by definition is going to be in a lot of locations, perhaps even across regional jurisdictional boundaries. This can make it expensive or impossible to aggregate.
Data is too messy: Different sensors or meters or subsidiaries of a utility may have different data formats. Cleaning data is a time intensive process when performed manually.
VIA’s software platform, TAC™, was designed to address all four of these issues in power sector data. TAC™ reduces risk and improves IT efficiency for data access and data analysis projects. Follow the latest section of our website to learn about TAC™ features (in our Let’s TAC™ About It blog series) as well as new customer and project announcements.
How VIA TAC™-kles Sensor Data (posted 2020-09-09)
For the sixth installment of our blog series, we will cover homomorphic encryption algorithms on TAC™, known as HEAT. So, are you ready to TAC™ about it?
It’s standard practice to encrypt data at rest (e.g., data in a database) and in transit (e.g., data sent over the internet). When an analyst has to work with data, however, the data is usually unencrypted to perform any calculation. Homomorphic encryption (HE) is cryptography that holds the promise of allowing computations on data while it remains encrypted.
Despite the increasing need for data security and privacy, HE is not widely used. There are at least two barriers to adoption:
Impracticality: NIST-compliant encryption schemes that power widely-used protocols such as RSA have well-known properties that make it possible to perform a subset of arithmetic operations on encrypted numbers. Many of these methods, however, require brute-force computation that limits their ability to add large numbers.
Lack of standardization: To overcome the limitations of NIST-compliant encryption schemes, many new HE protocols have been developed to perform operations on larger numbers with some powerful optimizations. Unfortunately, none have been proven reliable enough to be standardized. Even large projects such as Microsoft SEAL have recognized security challenges.
VIA’s HE algorithm overcomes both these barriers for addition. Our protocol uses NIST-compliant standard elliptic curve cryptography (ECC) algorithms to sum arbitrarily large numbers. The key innovation is to represent integers in base 2^n and sum the numbers for each column in separate batches up to an overflow limit of 2^32. The result (a vector of numbers, each less than 2^32) can thus be decoded using standard brute-force decoding algorithms.
To stay within the overflow limit, the protocol is limited to summing 2^(32-n) numbers at a time. If, however, there is a need to sum more than 2^(32-n) numbers at a time, we can increase the overflow limit by increasing the computation power available for decoding. We distribute the computation to multiple machines to execute the decoding in parallel. In practice, HE is normally used on numbers that are already aggregated, so the 2^(32-n) number limit is not a significant barrier.
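The batch limit follows from a short worst-case calculation, reading the limits as an overflow bound of 2^32 and a base of 2^n: every digit is at most 2^n - 1, so a column summed over 2^(32-n) numbers stays below 2^32. The sketch below checks this for one base; the function name `max_batch_size` is hypothetical and no encryption is involved.

```python
def max_batch_size(n: int, overflow_bits: int = 32) -> int:
    """Largest count of base-2**n digits whose column sum stays below 2**overflow_bits."""
    return 2 ** (overflow_bits - n)


n = 16
# Worst case: every one of the summed digits is at its maximum value.
worst_case_column = max_batch_size(n) * (2 ** n - 1)
fits = worst_case_column < 2 ** 32
```

A smaller base (smaller n) allows more numbers per batch, at the cost of more columns per number.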
As an example, imagine a decrypting system that could only handle numbers up to 1,000 in base 10. How would you add the numbers, 236 and 56,798? While such a decrypting system can only handle numbers up to 1,000, we are not hampered by this limitation. The number 236 can be represented as a vector of (6,3,2,0,0). The number 56,798 is represented as (8,9,7,6,5). Adding the two numbers, we get (14, 12, 9, 6, 5). We can decode each of the components of that vector because they are all less than 1,000. We can then reexpress this result as (4,3,0,7,5) and finally decode this as 57,034. With this approach, we can add up to 100 numbers at a time and be sure that none of the individual columns exceed 1,000 and thus our cryptosystem will work.
What if we need to add 200 numbers? In this case, we can choose a smaller base (e.g., from 2^16 to 2^4) and raise the limit beyond 1,000 by distributing the decoding in parallel. The size of the base determines the amount of brute-force computation required. By breaking the addition into a series of smaller problems, it now becomes possible to use standardized encryption algorithms and brute-force computation to solve the problem. We also leverage existing standards to make it practical to perform arbitrarily complex sum operations.
VIA has incorporated this HE addition algorithm into its TAC™ platform as Homomorphic Encryption Algorithms on TAC™, or HEAT.
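The base-10 example can be reproduced with plain integers. This sketch shows only the digit-vector arithmetic: in the actual protocol the column sums would be computed on encrypted values, which is omitted here, and all function names are hypothetical.

```python
def to_digits(value: int, base: int, width: int) -> list:
    """Least-significant-first digit vector of fixed width."""
    digits = []
    for _ in range(width):
        digits.append(value % base)
        value //= base
    if value:
        raise ValueError("value does not fit in the given width")
    return digits


def add_columns(vectors):
    """Sum each column independently; no carries are propagated here."""
    return [sum(column) for column in zip(*vectors)]


def carry(columns, base):
    """Propagate carries so every entry is a proper digit again."""
    digits, carry_over = [], 0
    for column in columns:
        carry_over, digit = divmod(column + carry_over, base)
        digits.append(digit)
    while carry_over:
        carry_over, digit = divmod(carry_over, base)
        digits.append(digit)
    return digits


def from_digits(digits, base):
    """Rebuild the integer from a least-significant-first digit vector."""
    return sum(d * base ** i for i, d in enumerate(digits))


a = to_digits(236, 10, 5)        # [6, 3, 2, 0, 0]
b = to_digits(56_798, 10, 5)     # [8, 9, 7, 6, 5]
columns = add_columns([a, b])    # [14, 12, 9, 6, 5]
digits = carry(columns, 10)      # [4, 3, 0, 7, 5]
total = from_digits(digits, 10)  # 57034
```

Each entry of `columns` is small enough to decode on its own, and the carries are resolved only after decoding, which is what lets a system limited to small numbers add large ones.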
To benchmark the system, we compared execution times for HEAT versus Microsoft SEAL, a popular open source HE library. The simple benchmark consists of recording the execution time to encrypt, add, and decrypt up to 80,000 integers. The cryptographic parameters for the ECC were chosen to match the same level of protection as SEAL.
The graph above shows that HEAT is roughly twice as fast as SEAL.
At VIA, we’re excited to have found a “no-tradeoff” solution for HE addition that has wide applications. We are already using HEAT to enable encrypted benchmarking of data across utilities as part of our GDAC™ program. We are also looking into using HEAT for training a federated deep learning model.
HE is a rapidly developing field. VIA is increasing its dedicated resources to improve its HE implementation including exploring lattice cryptography to meet post-quantum computing requirements and extending HEAT to enable homomorphic multiplication.
Let’s TAC™ About It: Homomorphic Encryption Algorithms on TAC™ (HEAT) (posted 2020-08-25)
For the fifth installment of our blog series, we will cover TAC™’s k-anonymity functionality. So, are you ready to TAC™ about it?
Regulators are increasingly demanding that utilities release their data to third parties to support a wide array of clean energy initiatives. At the same time, regulators are also mandating increased information privacy and security requirements, such as those in CCPA and GDPR.
The data about a consumer’s energy behavior can provide enormous insights for efficiency but, of course, they can also reveal private details like vacation habits, income levels, family size, etc.
How can utilities balance these two reasonable but competing requirements?
VIA has implemented a k-anonymity function to handle this use case. The essence of k-anonymity is to segment the data in such a way that similar consumers are in groups that are both big enough to hide an individual consumer’s behavior and small enough to be meaningful and useful for analysis and creating energy efficiency programs.
The above images show two normally distributed variables with 1,000 points. Each cross indicates one point. There can be several points at each location on top of each other. The second image shows the points grouped with at least 15 points within each group. The groups are smaller and more frequent in the middle because there are more points located in the middle due to the normal distributions.
K-anonymity algorithms are sometimes referred to as Mondrian algorithms due to the groupings resembling the paintings of Piet Mondrian.
Choosing an optimal grouping size is not just a challenge, it is actually an NP-hard mathematical problem. TAC™ now provides a simple function to allow utilities to implement k-anonymity groupings to meet both regulator constraints. When a utility chooses this function, any request of data must meet specific group size constraints. If not enough data exists to create a set of data that would maintain data privacy (e.g., only one consumer meets the specific request), then the utility does not provide the data. Similarly, if a data request is made where there is more than enough data to maintain an individual’s privacy, smaller groups of data will be created to allow for more targeted analysis.
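A greedy Mondrian-style partition gives a feel for how such groupings are built. The sketch below is a simplified stand-in, not TAC™'s function: it repeatedly splits a group in half along its widest dimension and stops when a further split would leave fewer than k points on a side. The function name `mondrian` and the synthetic data are assumptions for illustration.

```python
import random


def mondrian(points, k):
    """Greedy partition into groups of at least k points each."""
    if len(points) < 2 * k:
        return [points]  # splitting further would break the size guarantee
    dims = range(len(points[0]))
    # Split along the dimension with the largest spread.
    widest = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[widest])
    mid = len(ordered) // 2
    return mondrian(ordered[:mid], k) + mondrian(ordered[mid:], k)


rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
groups = mondrian(points, k=15)
```

Every group in `groups` holds between 15 and 29 points, and groups near the center of the distribution cover a smaller area because the points are denser there. This heuristic does not search for an optimal grouping, which is the NP-hard part.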
Utilities facing regulatory constraints are excited by the opportunity to have an automated means of ensuring data privacy. As a sign of growing interest in the area, the utility non-profit EPRI (Electric Power Research Institute) is also facilitating a working group to test new solutions to the regulator dilemma.
Let’s TAC™ About It: K-Anonymity (posted 2020-08-17)
For the fourth installment of our blog series, we will cover TAC™’s digital watermarking functionality. So, are you ready to TAC™ about it?
For centuries, watermarks have been used to verify currencies and important documents and to discourage counterfeiting. Today, watermarks are used for the same purpose to protect music, photos, videos, software, and other digital data.
The main goal of our TAC™ platform is to keep data private by not transferring data. Having said that, there are times when it is necessary (e.g., for audit or regulatory purposes) to transfer some data between trusted parties.
At VIA, we use watermarking to enable data auditability when data owners choose to transfer data. For example, two organizations can receive identical datasets. If it is later found that data has been leaked, the data owner can review the leaked data and trace it back to the original analyst that requested it.
TAC™ automates the process of maintaining the provenance of data. As long as the data was transferred using TAC™, no additional steps are required by the data owner and there is no noticeable difference to analysts.
Watermarking is a simple and automatic method to provide auditability to data.
Let’s TAC™ About It: Digital Watermarking (posted 2020-07-29)
For the third installment of our blog series, Let’s TAC™ About It, we’d like to introduce you to SWEET, a built-in function on VIA’s TAC™ platform. So, are you ready to TAC™ about it?
SWEET stands for Smart Wrangling Engine for Extraction and Transformation. Check out the video (or the transcription that follows) to learn how we use AI to wrangle data!
Below is a transcription of the “Introduction to SWEET: AI for Data Wrangling” video.
Analysts need to transfer data from a spreadsheet to a database, often known as data wrangling. The analyst usually identifies a rule or multiple rules (such as: column x is data, row 5 is a header and can be discarded, and so on). The analyst then writes code to execute that rule. This works well when rules are easily identifiable. In most cases, however, this is incredibly time consuming.
Data scientists spend more time wrangling and cleaning data than on analysis and AI. The problem with that is analytics insights are of the highest value, but get the least amount of resources. The big leap in AI is being able to process information without humans writing all the rules.
As an example, computer vision is used to identify a dog. Think of all the varieties of dogs and all the possible variations in context that those dogs could be in. There isn’t an army of people large enough to write rules to identify a random dog in a random photo.
And yet, AI can.
VIA’s approach to data wrangling is to use some of the exact same AI algorithms used in image recognition. This works across a much wider variety of contexts and spreadsheet or file formats. Let’s take a look at how SWEET works.
Here’s a spreadsheet. What SWEET is going to do is use a number of different machine learning algorithms to automate the process of getting the information into a database format.
The first model uses machine vision to map out the spreadsheet. Purple represents blank space, green is headings, yellow is actual data. Once that model has run, there’s a second algorithm that takes a look at the content.
The second algorithm skips over the purple. It looks at the green (which is the headers) to know which column to write where in the database. Finally, it would take a look at the yellow area to write the data to the database.
A third machine learning algorithm determines which column is derived from other parts of the sheet. For example, a total column is just the sum of the other columns and may not be necessary to write to the database. The third model separates these derived columns from the raw data.
In this example, the ACCOUNT column turns out to be the total of the other columns added up. It could be difficult for a human to understand immediately, but one of our models does this instantly.
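To show what "derived column" means in practice, here is a rule-based stand-in that flags any column equal to the row-wise sum of all the others. SWEET uses a machine learning model for this step; the explicit arithmetic check below, the function name `find_total_columns`, and the sample table are assumptions for illustration only.

```python
def find_total_columns(table, tolerance=1e-9):
    """Return names of columns that equal the row-wise sum of all other columns."""
    names = list(table)
    derived = []
    for candidate in names:
        others = [name for name in names if name != candidate]
        if not others:
            continue
        n_rows = len(table[candidate])
        is_total = all(
            abs(table[candidate][row] - sum(table[o][row] for o in others)) <= tolerance
            for row in range(n_rows)
        )
        if is_total:
            derived.append(candidate)
    return derived


# Hypothetical sheet: ACCOUNT is the total of the three other columns.
sheet = {
    "ACCOUNT": [6, 15],
    "A": [1, 4],
    "B": [2, 5],
    "C": [3, 6],
}
```

For this sheet, `find_total_columns(sheet)` returns `["ACCOUNT"]`, so that column could be skipped when writing raw data to the database. A fixed rule like this only catches exact totals, which is the kind of brittleness a learned model is meant to avoid.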
SWEET’s approach works irrespective of the format. The model doesn’t have to be re-trained when it comes across spreadsheets that are new or in different formats.
So, what’s new and different here? AI algorithms have been evolving quickly. Many of the models that we implemented didn’t exist just a few years ago.
The other insight is that we broke the “convert this spreadsheet into a db” problem into multiple steps and have a different AI algorithm for each step.
Combined, these steps let SWEET, a built-in function in VIA’s TAC™ ingestion engine, take processes that used to take analysts days to do manually and make them instantaneous.
Update: July 31, 2020
The image below shows the steps SWEET takes to transform raw files into a standard format.
Introducing SWEET: AI for Data Wrangling (2020-07-22)
Through a Q&A-style interview, you will hear from VIA team members about things like a typical day at the office and favorite foods.
What does a typical day at VIA look like for you?
Typically I start my morning with coffee and code review. At VIA, we have a strong culture of peer review to help us learn from each other and improve our work. Reviewing my colleagues’ code is a good way to see all the amazing work my colleagues are doing, and setting aside this time makes sure I provide them timely feedback.
Once I’ve finished my coffee, I like to take a morning walk (before we moved to remote work due to COVID-19, this was the time I spent commuting to the office). The rest of the day varies depending on the project I’m working on: it could be writing a software design proposal for a new feature, wrangling data, or automating deployments.
I spend a lot of my day working with my colleagues to solve technical challenges. Often this is pair programming, but sometimes it’s designing the solution together on a (virtual) whiteboard. I really like having the opportunity to work on such a variety of tasks.
What’s something you have worked on at VIA that you are most proud of?
I was the technical lead on VIA’s 30-Minute Pilot and am really proud of how much of a difference it has made for our partners. They were able to get valuable data analysis in less than 30 minutes, while learning about VIA’s privacy-protecting software.
What’s your favorite VIA memory?
At our last company-wide All Hands event, we did a scavenger hunt in the underground city of Montreal. It really fit VIA’s culture: we were split into teams, and each team had to collaborate to solve the puzzles and at the same time, we were competing to solve more puzzles than the other teams! I got to know a little more about Montreal, and got to collaborate with colleagues that I don’t work with day-to-day.
If you were given an extra hour in your day, what would you spend it doing?
That’s a tough one. I would probably practice painting. I love creating things, and that passion is not limited to software. I love the bright colors and rich textures of oil paints, and the challenge of exercising my creativity in a totally different way than I do at work.
What’s your go-to food?
Tacos! Since moving to Montreal, I have learned how to make corn tortillas from scratch, which is actually much easier than I thought it would be.
What’s something everyone may not, but should know about working at VIA?
VIA’s twice-yearly All Hands events are held in an “unconference” style: all team members submit and vote on topics just a couple of days ahead of time. Because everyone has input into the topics, and because session leaders don’t have much time to prepare talking points, the sessions focus on discussion, collaboration, and brainstorming.
Meet the Team: Ashley DaSilva, Team Leader, Product Development (2020-07-08)
For the second installment of our blog series, we will cover VIA’s Models for Private Data (MPD) service. So, are you ready to TAC™ about it?
Companies are awash with data. To make sense of the growing data volume, AI models continue to proliferate in number and improve in performance. Even better, many of the best models are open sourced. Leading companies like Amazon, Google, Microsoft, and others have made freely available AI models for everything from facial recognition to text analysis.
So, why aren’t more companies taking advantage of these models for analysis of their corporate data?
One of the top challenges is data privacy. Many companies are fearful of sending their data to the cloud or to an external AI provider. Data and AI regulations from Europe (GDPR) to California (CCPA) are complex to navigate. Even without government oversight, companies worry about the reputational cost of a real or perceived privacy violation.
VIA’s TAC™ platform solves this with its Models for Private Data (MPD) service.
TAC™’s MPD service makes machine learning models available for download and easy incorporation into data science workflows. This has the dual benefit of eliminating both the need to send data outside the organization or to the cloud and the need for internal AI experts to run state-of-the-art machine learning models. The result is that data is kept private and secure.
As an example, imagine a company capturing images of its equipment to identify corrosion. Through TAC™’s MPD service, a data scientist can choose a corrosion analysis model (e.g., a pre-trained TensorFlow model) and easily incorporate that model into their workflow. The MPD service accomplishes this by using containers to make the models easily accessible through the data scientist’s Python scripts. The models use input from the company’s local databases and return a list of predictions locally. Data remains on premises in the company’s local VPC at all times. This is one way that we keep data private across our Global Data Asset Collaborative™ (GDAC™) with multiple energy utilities.
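The local read-predict-return flow described above might look roughly like the following. This sketch does not show the MPD service’s actual interface or its containers, which this post does not describe. The table name, the `rust_ratio` feature, and `toy_corrosion_model` are all hypothetical, and an in-memory SQLite database stands in for the company’s local database.

```python
import sqlite3
from typing import Callable, Sequence


def predict_locally(
    conn: sqlite3.Connection,
    query: str,
    model: Callable[[Sequence], str],
) -> list[str]:
    """Read rows from a local database and return predictions without any network call."""
    rows = conn.execute(query).fetchall()
    return [model(row) for row in rows]


def toy_corrosion_model(features: Sequence) -> str:
    """Hypothetical stand-in for a downloaded, pre-trained model."""
    (rust_ratio,) = features
    return "corroded" if rust_ratio > 0.3 else "ok"


# In-memory database standing in for the on-premises data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (rust_ratio REAL)")
conn.executemany("INSERT INTO inspections VALUES (?)", [(0.05,), (0.42,)])

predictions = predict_locally(
    conn,
    "SELECT rust_ratio FROM inspections ORDER BY rowid",
    toy_corrosion_model,
)
conn.close()
```

The point of the pattern is that both the data and the model run in the same local environment, so the list of predictions is the only thing produced and nothing leaves the premises.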
Stay tuned for a video demo of this service in the coming weeks!
Let’s TAC™ About It: Models for Private Data Service (2020-06-30)
We’re back at it – this time giving you some laughs from our remote team. See our video below for the funniest moments since working remotely.
Need a laugh? (2020-06-08)