I’ll continue to update this throughout the next two days. Feel free to issue a pull request if you’re also here at the conference and want to add to this post.
Location: Open Networking User Group (ONUG) at Columbia University
ONUG currently has 6 working groups:
- Virtual Network Overlays
- Common Management tools across network, compute, and storage
- Network State Collection, correlation, and analytics
- Traffic Monitoring and Visibility
It is interesting and awesome to see that half of the working groups are all about Day 2 operations and management of networks. This is exactly what’s needed in the industry.
Creating Business Value with Cloud Infrastructure
Speaker: Adrian Cockcroft
- Developers don’t need any of that referring to NSV/NFV.
- 2009 developed the Cloudicorn, took knowledge gained to Battery
- Docker wasn’t on anyone’s roadmap for 2014. It’s on everyone’s roadmap for 2015
- 2014 was the year that Enterpises finally embraced cloud and DevOps
- Optimizing from IT cost to delivery and speed - Nordstrom - ended up yielding lower costs
- Product IT reports into the business
- Director is the highest Corp IT title
- Immutable microservice deployments scales
- If your QA team is saying there are too many bugs in a release, you aren’t releasing often enough - don’t you remember “CHANGE ONE THING AT TIME”. If you are doing 6 features in 6 weeks, try 2 features in 2 weeks!
- If you haven’t read the Phoenix Project, get it now and read it.
- Onto “how is that network managed” referring to a large global network with 1000s of devices?
- SDN? Nope!
- It’s with DNS!
- Built A tool to do an automated DNS management across multiple vendors
- Big flat HTTP network
- Same for public and private API access
- State of the art in networking (SDN) is VPC - this is the real competitor for Enterprise network vendors
- Adrian is interested in building on top of the public clouds and extending them (including from the perspective of network)
- Also recommends reading LEAN ENTERPRISE and Building Microservices
- Go, go, go - 80% of the projects Adrian has seen in the past 6 months have been written in go
SD-WAN Working Group Test Results Presentation
Companies: Talari, Glue, Viptela, Cisco, Riverbed, Silver Peak, VeloCloud
- Ixia heavily involved in testing each solution
- Co-authored test plan
- Certified the results
- Tests were run in vendor labs - Ixia observed remotely
- What was the testing about?
- SD-WAN architecture diagram shown: MPLS and INTERNET into each site (multi-site topology)
- All vendors used same test plan, design, and simulated the same types of traffic (http, voice, etc.)
- Tested with ISR-4451 and CSR1000V
- Passed all tests
- Three items are noted in yellow
- Cisco does not provide app/perf stats natively- they use a 3rd party partner LiveAction to do this
- Northbound API is on the roadmap - but LiveAction has one that can be leveraged
- FIPS validation is coming soon, not there yet.
It will be interesting to see the dynamic between Cisco and GLUE going forward as GLUE has been pretty much married to Cisco providing a similar solution to what Cisco is now trying to do on their own.
- Key to understand how this integrates into an existing network
- Can integrate and place Viptela vEdge devices north of the existing L2 switch or L3 device without modifying core L2/L3 functionality
- Offers virtual and physical appliances
- Tested with ISR-4451 and CSR 1000V
- Cisco’s IWAN architecture that Glue automates
- Glue automates best of breed architectures
- Operate at higher level in the stack compared to the other vendors that ran tests
- Northbound API is still pre-release
- < 2 min for full SDWAN/IWAN feature set using 3 ZTP methods
- FIPS coming this year (nor next)
- Open northbound API coming soon
- Declined to test Zero touch deployment at branch
- Has been in this market since 2007 (before SD-WAN was a thing)
- Tested physical and virtual appliances
- Tested dual internet and MPLS/INTERNET into physical and also MPLS/INTERNET into virtual
- As other vendors, FIPS and Northbound API were yellow
- Tested re-routing voice flows during brownout
- No reset to the application
- One-click business policy
- Ensure compliance, security, and app performance
- Focused on WAN optimization
- Accomplished this by building an overlay
- VMs are first class citizens!
- Over half of their deployments are running as VMs
- Example: Specific policy by application such as for Office 365 vs. Facebook
- FIPS is yellow - future?
- Northbound API is coming as with nearly every other vendor
- Uses IPsec for transport
- Hitless failover
- TCP sequence number stays during failover over internet/mpls links
- Tested with Steelhead devices
- Steelhead controller is where policy is defined and pushed out from there
- FIPS is certified!!!
- Northbound API, L2/L3 interop with directly connected switch/router, CPE in p/v form factors on commodity h/w, and dynamic traffic engineering are all yellow
- Offers a Python based scripting environment - SDK on GitHub
Blows my mind so many vendors don’t have the northbound API in their shipping product.
Nuage Networks Tech Field Day Event
- Started in DC networking, extended to the branch, announcing something new today!!
- It’s the velocity of IT that really matters right now
- Starting with Business Requirements and going through Dev/Test, Deployments, Ops, and Customer Feedback
- Dev/Test and Deployments covered in previous sessions, today the announcement is geared to Ops including Overlay and underlay correlation, service navigation, and fault management
- Nuage is talking about the Open Network Approach for assessing service health compared to the two other options: “hardware centric” and “service centric” views
- So what is the solution?
- The route monitor - peers with the physical network AND the Correlation Engine (aggregates p + v)
- Listens to all routing protocols updates, at least the LAYER 3 topology
- Peers via OSPF and BGP
- Uses SNMP to understand port status MAC table, etc.
- Analyze faults
- Many of the components were pulled from existing ALU SP products
- Example: Nuage agent on hypervisors listening to libvirt events
Top notch announcement by Nuage today. Have to say the product that was presented and shown could literally be a standalone product regardless of what is deployed in the virtual and physical networks. Great job, Nuage. Kudos to their UI designers!
The Great Debate
Lew Tucker is representing Open Source and Charlie Giancarlo is representing Proprietary Software
- Do you trust open source?
- Charlie: answering the question as it pertains to security. It depends on the number of “investigators.” Expectations need to be set properly.
- Lew: Agrees, but adds that software gets better overtime. Should not expect open source to be free.
- Charlie makes a great in that the biggest consumers of open source are companies selling closed source products.
- Discussions around open source governance: vendor-led open source projects vs. community-led
- One of the differences is the origin of the software - was it initially developed by Yahoo? Or was it initially a blueprint (no code) such as OpenStack?
- Thoughts of orgs that don’t have a governance model, but have a benevolent dictatorship such as OVS - “anything could work”
- Role of ONUG: proprietary user groups have been the norm, but it should help in getting users more options and more choice
- Missed a lot here, but the general consensus is, it all depends, who cares, and will a company provide support needed for open source?
- Would have been nice if this was a true debate
- Open source is a success if you are using it
NEC Tech Field Day Event
Here we are back at Day 2 at the Open Networking User Group. Getting started with NEC in a Tech Field Day event!
- NEC has a long history in the SDN market starting with working with the Stanford team back in the 2008 time frame
- They have a $3,000 SDN starter kit that includes HA licensing for their controllers and a license for up to 5 switches. Switches purchased separately from NEC or 3rd parties (including virtual switches). OF support required.
- 90-day free trial
- NEC participated in the ONUG testing that validated product support for Network Service Chaining
- Focus on orchestration using third party virtual network functions
- Policy can be global or local
- Device can be located anywhere in the network - service chaining using flow filters will ensure the traffic gets to where it needs to be
- Full set of REST APIs
- NEC declined to perform two tests in the 10 that were provided as part of the official test plan since those specific functions are provided by partners
- Partners include Palo Alto, F5, Riverbed, Radware (DDOS)
- Ixia is now presenting - they were brought in to help perform and execute the test plan for the NSV working group
- All tests were service agnostic - the point of the working group and tests were to validate the orchestration of the services, i.e. ensure the L2/L3/L4 of the network was able to redirect traffic as necessary to the appliances (physical + virtual)
- NEC went into a demo to show the NSV functionality as well as some failover functionality
- One part that is really slick is that they can show a hop by hop view of any particular traffic flow, i.e. show you in-port and out-port from every switch
- They also provide a health monitor to assist in the failover of network appliances that do not provide their own functionality (or given a design that needs it)
- Generates over 3+ Billion logs resulting in only .99% of alerts
- 99% of the alerts are handled automatically
- Quote “Engineers build robots, robots manage networks”
- Correlation Engine is key - ~1200 unique master alarms
- Use Case: memory leak
- Solving manually: lots of humans + coffee
- Automatically: Alarms generated, does the device have redundancy (can it be taken out of prod), if so, call drainer, take all traffic off device, active CPU is reloaded, standby CPU takes over, redundancy recovers (active is reloaded), redundancy is restored. Humans start looking at why it occurred?
- Engineers are focused on writing alert handlers
- Software Limitations
- Lack of programmability across control, config, and monitoring
- Topology of fabric dictates config for a given device - pushed dynamically
- Heat map: important to know what normal is - if problems, device is drained
- Najam went into an overview of the OCP Networking Project
- System Integration shop turned into an Engineering shop that is actually creating new things
- FB is on a journey to convert network engineering team to software engineering team
- Hiring more software engineers - they are now attracting more software engineers
- Everyone is writing some level of code
Town Hall Meeting: Will the DevOps Model Deliver in the Enterprise?
Moderator: Graeme Hay Panelists: Najam Ahmad (Facebook), Tim Gerla (Ansible), Marc Wooldward (vArmour), Mike Dvorkin (Cisco), Dimitri Stilladis (Nuage)
There is a name for it. It can now be called DevOps - it inherently is more tangible, but at the same time, there is now a multitude of APIs that can enable this. - Marc
Is it more about attitude than tools?
Scale, coupled with execution, allows for no other way to manage and operate networks (Facebook). - Najam
Speed is important, but not everyone is Facebook. For Enterprises, they need controlled speed because there are massively critical apps running over the network - it’s just Facebook. - Dimitri
Najam keeps referring to DevOps as FB building their own software features and capabilities that sit on the switch - probably a little different than the traditional DevOps conversation. This was further clarified by the panel too after the fact.
Networks have become extremely complex, but complexity must be reduced before software development and DevOps approaches are integrated with the networks.
Rather than build teams across the technology, build the team around a solution (or problem looking to be solved). Platform team that has multiple skill sets. Assign the right people to a problem and great things can happen.
Looking at the problem holistically could help you come to a solution faster and be more cost effective. Facebook built out a standalone network fabric for a back-end cluster and needed a faster sample rate for NetFlow than the switch hardware could support. The network team typically wants to trade-in their hardware and get more expensive gear to get features needed. FB looked at the problem and realized they could put an agent in the kernel of every host to do line rate sampling across the data center without ever requiring NetFlow functionality in the network. This could not have happened with silos internal to FB.