What is scribe logs roku

What are your top 10 blocked domains?

My top ten blocked domains are:

  1. scribe.logs.roku.com
  2. device-metrics-us.amazon.com
  3. e.crashlytics.com
  4. app-measurement.com
  5. e.reddit.com
  6. graph.facebook.com
  7. news.iadsdk.apple.com
  8. www.google-analytics.com
  9. iadsdk.apple.com
  10. msmetrics.ws.sonos.com

What are yours?

For YouTube, a number of reasons:

1) The recommendation algorithm provides entirely negative value for my use case:

1a) I have a very small set of videos I regularly enjoy watching, from a select group of channels. Recommendations tend to point me toward videos I have no interest in at best or, more frequently, ones that are actively irritating.

1b) In the rare case the algorithm happens upon a video I might enjoy, it is almost always something I have already watched. I don't know whether it entirely fails to take viewing history into account or whether it's simply a matter of my narrow interests, but either way, I do not remember the last time I intentionally followed a recommendation.

2) I often leave a video on either in the background or on a second monitor. When autoplay is enabled, YouTube autoplays regardless of whether the window is focused, meaning the obnoxious recommendations break my workflow by forcing me to locate and stop the video.

3) YouTube fails to normalize audio. Therefore, videos autoplayed after a quiet video tend to blast out my eardrums.

I also find autoplay a rather exploitative way of increasing viewer retention in general: it abuses the psyche and the addictive nature of the internet for the sake of profits.

Note: I ran some searches on this problem and didn't notice anyone suggesting this solution, so I figured I would share it. If it's been done before and I just missed it, then I apologize.

Preface
There seem to be some apps/devices that get a bit spammy when they're blocked. Roku smart TVs and related devices are notable offenders, but I've also noticed that some app on my wife's smartphone apparently likes to get noisy when it doesn't get its way.

The two URLs I've experienced as major offenders are:
scribe.logs.roku.com (roku devices)
api2.branch.io (smartphone apps)

Since there are probably more offenders out there than just these two, I thought I'd share my workaround for reducing how chatty they are, in case it helps someone else. Maybe it's not that big of a deal to you, and you just whitelist them. But I'm stubborn, and I refuse to let them win, so I've come up with an over-engineered 'solution'.

Concept
The concept is to make the device or app think it has successfully resolved the domain name, and then waste time trying to connect to an address where nothing exists. Normally, the pihole software replies with 0.0.0.0, telling the device in effect that there's nothing there, and so the device requests again and again, as though it's throwing a tantrum until it gets its way.

What to do
Practically speaking, here's what you do:

  • Set up a custom DNS entry under Local DNS --> DNS Records in the web interface for scribe.logs.roku.com (or whatever offending URL you have)
  • Point it at an IP address that you know will never belong to anything real. (I chose 192.168.192.168)
  • Then whitelist that domain so that it resolves to the new IP instead of getting a blocked message.
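If you'd rather script the DNS-record step than click through the web interface, here is a minimal Python sketch. It assumes a Pi-hole v5-style install, where Local DNS records are kept in `/etc/pihole/custom.list` as one `IP hostname` pair per line; newer Pi-hole versions store these records elsewhere, so check yours first. The helper name `add_local_records` is hypothetical. You still need to whitelist the domain and reload DNS afterwards (e.g. `pihole restartdns`).

```python
from pathlib import Path

# The never-used address chosen above; any address nothing on your LAN answers will do.
SINKHOLE_IP = "192.168.192.168"

# Assumption: Pi-hole v5 keeps Local DNS records in this hosts-format file.
CUSTOM_LIST = Path("/etc/pihole/custom.list")


def add_local_records(domains, ip=SINKHOLE_IP, path=CUSTOM_LIST):
    """Append 'IP domain' lines for domains not already present; return the lines added."""
    path = Path(path)
    existing = set()
    if path.exists():
        existing = {line.strip() for line in path.read_text().splitlines()}

    new_lines = [f"{ip} {domain}" for domain in domains if f"{ip} {domain}" not in existing]
    with path.open("a") as fh:
        for line in new_lines:
            fh.write(line + "\n")
    return new_lines
```

Run as root, for example `add_local_records(["scribe.logs.roku.com", "api2.branch.io"])`. Skipping lines that already exist keeps the file clean if you run it more than once.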

I also like to block that domain in my router, so I know it's not going to leave my network in case the device tries to bypass my pihole entirely, but that's a separate issue.

EDIT:---------------
I recently discovered a better way to do the above, using Pihole's regex functions. Here are two blacklist examples to get you started:

scribe\.logs\.roku\.com;reply=192.168.192.168
api2\.branch\.io;reply=192.168.192.168

Note that you can use the ;reply= notation to reply with any other kind of blocking message as well, making this method far more versatile than my previous one. (e.g., "scribe.logs.roku.com;reply=nxdomain" would reply with the nxdomain blocking mode for queries matching "scribe.logs.roku.com".)
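To make the `;reply=` notation concrete, here is a small Python sketch that mimics how such a rule would apply to a query. It is not Pi-hole's implementation (FTL uses its own POSIX-style regex engine), and the helpers `parse_rule` and `answer_for` plus the anchored `telemetry.example.com` rule are hypothetical. It does show one practical detail: an unanchored pattern like `scribe\.logs\.roku\.com` matches any name that contains it, including subdomains, so wrap it in `^...$` if you only want the exact name.

```python
import re

DEFAULT = "default blocking mode"


def parse_rule(rule):
    """Split 'pattern;reply=value' into a compiled regex and the reply value (None if absent)."""
    pattern, *options = rule.split(";")
    reply = None
    for option in options:
        key, _, value = option.partition("=")
        if key == "reply":
            reply = value
    return re.compile(pattern), reply


def answer_for(domain, rules):
    """Return what the first matching rule would reply with, or None if no rule matches."""
    name = domain.lower()  # DNS names are case-insensitive
    for regex, reply in rules:
        if regex.search(name):  # unanchored search: the pattern may appear anywhere in the name
            return reply or DEFAULT
    return None


RULES = [parse_rule(r) for r in (
    r"scribe\.logs\.roku\.com;reply=192.168.192.168",
    r"api2\.branch\.io;reply=192.168.192.168",
    r"^telemetry\.example\.com$;reply=nxdomain",  # hypothetical anchored rule
)]
```

For example, `answer_for("scribe.logs.roku.com", RULES)` gives the sinkhole address, while `answer_for("a.telemetry.example.com", RULES)` gives `None` because of the anchors.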

For information on the ";reply=" regex options, look here: docs.pi-hole.net/regex/pi-hole/


For information on blocking modes, look here: docs.pi-hole.net/ftldns/blockingmode/
---------------------End EDIT

After running this for a couple of weeks, I've noticed that the number of requests to these domains on my network has dropped to somewhere between a quarter and a half of what it was (depending on time of day, how much the devices were used, and other factors), and the massive spikes that were causing my Roku to get DNS timeouts have stopped.

Potential Improvements
The only thing I can think of that would improve the functionality of this is if you could direct it at a web server that would always respond with an HTTP 503 message, ideally one with a REALLY long delay specified in the Retry-After header. This would hopefully make the device think it successfully connected, but that it just had to wait a bit. Hours if we're lucky. But even just a few minutes of wait time would put a huge dent in this spam.
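The 503 idea could look something like this Python sketch, using only the standard library. It speaks plain HTTP only, so it does nothing about the certificate problem, and the six-hour `RETRY_AFTER_SECONDS` value and the `make_server` helper are hypothetical choices. Whether any given device actually honors `Retry-After` is still an open question.

```python
import http.server

# Hypothetical: ask clients to come back in six hours.
RETRY_AFTER_SECONDS = 6 * 60 * 60


class AlwaysUnavailable(http.server.BaseHTTPRequestHandler):
    """Answer every request with 503 Service Unavailable plus a long Retry-After."""

    def _reply(self):
        self.send_response(503)
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Same response no matter which method the device tries.
    do_GET = _reply
    do_HEAD = _reply
    do_POST = _reply
    do_PUT = _reply

    def log_message(self, format, *args):
        pass  # keep the console quiet; the devices are noisy enough


def make_server(host="0.0.0.0", port=80):
    """Build (but do not start) a threaded server; call serve_forever() on the result."""
    return http.server.ThreadingHTTPServer((host, port), AlwaysUnavailable)
```

You would run `make_server().serve_forever()` on a host whose address the DNS record points to, rather than on an unused address. Binding to port 80 usually needs root.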

The obvious roadblock to this is having a trusted certificate for a domain you don't own so that the device will easily connect over SSL to get the message. There may be a way to trick a device or app into using plain HTTP instead, however, so I wouldn't rule this out entirely.

It would also be cosmetically nice if there were support for blocking certain URLs in this way baked into pihole. As it stands, these URLs now show as ALLOWED when I know they're being blocked. But that's a minor cost for saving my network from an internal DDoS caused by these spammy devices. I'll put in a proper feature request for this if this kind of rude behavior becomes more common in modern apps & devices.

Maybe some of the geniuses in here can improve on this technique further? What do you guys think?

Update 1: (12/16/2018) – GitHub Repository Made Public Here.

Update 2: (12/16/2018) – Added a new analysis after /u/Anchor-shark on the /r/pihole subreddit suggested I take a look at a Roku that does not have the logging servers blocked. I have done just that.

This is a work in progress. It’s not perfect but it’s just starting to get cool and I’m digging deeper! I think this is going to be the first post in a series. I say that because I need to get my hands on some older hardware and there are some other gears moving as well. Anyhow, here’s the post.

A while back (years ago), I added a PiHole to my network. The thing is a damn workhorse! If you don’t know what PiHole is, well, you’re wrong and you should! Long story short, it’s a network-wide ad blocker with a ton of features. But most importantly, it has lots of color and looks pretty.

PiHole Web GUI

Anyhow, I was recently looking over the data on my PiHole and noticed a serious amount of traffic coming from my Rokus. Before I start getting into the specifics, let me first describe the systems on my network.

I own 3 different Rokus, all of which are 1-2 years old, which is important for a few different reasons:

  1. They’re all running Roku OS v8+
  2. Old features such as TCPdump are now unavailable.
  3. Secret menus contain less functionality.

The Rokus are all on a 192.168.1.x/24 network. This isn’t massively important, but it’s worth noting.

Roku Traffic Analysis

PiHole was showing that a large majority of all the traffic on my home LAN was coming from my three Roku devices. This isn’t too surprising since they’re streaming devices and at any given time one or two of them are active (wife, kids, etc.).

I still decided to investigate and take a deeper look into the data to see what the Rokus were actually doing. First, let’s just take a look at the data within the PiHole Web UI.

Top Blocked Domains

Top Clients for Blocked Activity

The traffic displayed in the images above consists of DNS queries that have been blocked, and those queries have nothing to do with my streaming services (e.g., Netflix, Amazon, etc.). But still, these are only the blocked domains that PiHole sees, so a deeper look was necessary.

The next logical thing to do is to pull the logs from the server and start to parse them. The problem is that PiHole’s default logrotate configuration only keeps 5 days of logs. So before you can jump right in, you need to change the logrotate configuration. I changed mine to 100 days. Full disclosure: the data that I will present here is from a 14-day analysis. I don’t expect a massive difference, but I thought I would put it out there.

I waited 24 days, pulled the logs, and wrote some Python to extract the data I wanted. My initial criteria for narrowing the logs down to a manageable size were:

  • Only log entries with the DNS request coming from the Roku IPs.
  • Only the Date, IP, and URI attributes are parsed.

The logs themselves are not in the best format, so first things first: translate the data I want into a CSV and store it on disk for later parsing. This also makes it SOO much easier to ingest the data into a pandas DataFrame for analysis.
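Here is a minimal Python sketch of that first step. It assumes the dnsmasq-style lines Pi-hole writes to `/var/log/pihole.log` (for example `Dec 16 10:15:32 dnsmasq[512]: query[A] scribe.logs.roku.com from 192.168.1.58`). The regex, the helper names, and the sample lines are illustrative, not the author's actual script. Note that these timestamps carry no year.

```python
import csv
import re

# The three Roku addresses described above.
ROKU_IPS = {"192.168.1.58", "192.168.1.99", "192.168.1.209"}

# Matches dnsmasq "query" lines; other lines (forwarded, reply, cached...) are skipped.
QUERY_LINE = re.compile(
    r"^(?P<date>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+: "
    r"query\[\w+\] (?P<domain>\S+) from (?P<ip>\S+)$"
)


def roku_queries(lines, ips=ROKU_IPS):
    """Yield (date, ip, domain) for every DNS query made by one of the given IPs."""
    for line in lines:
        match = QUERY_LINE.match(line.strip())
        if match and match["ip"] in ips:
            yield match["date"], match["ip"], match["domain"]


def write_csv(rows, path):
    """Store the parsed rows on disk with a header row."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["date", "ip", "domain"])
        writer.writerows(rows)
```

Feeding it an open log file, e.g. `write_csv(roku_queries(open("/var/log/pihole.log")), "all_logs.csv")`, produces a file that `pandas.read_csv` can load directly.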

Once the logs are in CSV and in a dataframe object, we can then parse out the following:

  • Log entries that list a Roku logging server as the URI.
  • Log entries for logging servers on a per IP basis for individual analysis.

This left me with several CSVs:

  • all_logs.csv – Contains all logs parsed from the 14 log files from the Roku IPs.
  • roku_logs.csv – All log entries that are *.logs.roku.com
  • <ip>.csv – Three files, one per Roku IP, each a subset of roku_logs.csv
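The author did this step with pandas. Purely as an illustration, here is the same filtering and per-IP split in plain Python with the `csv` module. It assumes columns named `date`, `ip`, and `domain`; those column names and the helper names are hypothetical. A logging entry is anything under `logs.roku.com`, matching the `*.logs.roku.com` rule above.

```python
import csv
from collections import defaultdict
from pathlib import Path


def is_roku_logging(domain):
    """True for *.logs.roku.com (e.g. scribe.logs.roku.com)."""
    return domain.lower().endswith(".logs.roku.com")


def split_logging(rows):
    """Return (all logging rows, {ip: that IP's logging rows})."""
    logging_rows = [row for row in rows if is_roku_logging(row["domain"])]
    per_ip = defaultdict(list)
    for row in logging_rows:
        per_ip[row["ip"]].append(row)
    return logging_rows, dict(per_ip)


def write_rows(rows, path, fields=("date", "ip", "domain")):
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)


def build_csvs(all_logs_path, out_dir):
    """Write roku_logs.csv plus one <ip>.csv per device into out_dir; return the IPs found."""
    with open(all_logs_path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    logging_rows, per_ip = split_logging(rows)
    out_dir = Path(out_dir)
    write_rows(logging_rows, out_dir / "roku_logs.csv")
    for ip, ip_rows in per_ip.items():
        write_rows(ip_rows, out_dir / f"{ip}.csv")
    return sorted(per_ip)
```

In pandas, the equivalent filter would be `df[df["domain"].str.endswith(".logs.roku.com")]` followed by a `groupby("ip")`.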

My main goal with this initial analysis is to determine how much of all the traffic on my network the Rokus generate, and how much of that traffic was Roku logging traffic (i.e., not streaming traffic). The last piece is a differential time analysis: that is, how often the Rokus beacon out to the logging servers.
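The first two questions are simple ratios computed over every query on the LAN, not just the Roku rows. A minimal Python sketch, assuming each query is a dict with hypothetical `ip` and `domain` keys:

```python
ROKU_IPS = {"192.168.1.58", "192.168.1.99", "192.168.1.209"}


def is_roku_logging(domain):
    return domain.lower().endswith(".logs.roku.com")


def traffic_shares(queries, roku_ips=ROKU_IPS):
    """Return (% of all queries from Rokus, % of all queries that are Roku logging)."""
    total = len(queries)
    if total == 0:
        return 0.0, 0.0
    roku = [q for q in queries if q["ip"] in roku_ips]
    logging = [q for q in roku if is_roku_logging(q["domain"])]
    return 100 * len(roku) / total, 100 * len(logging) / total
```

Both percentages share the same denominator (all LAN queries), which is why the logging share can never exceed the overall Roku share.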

Analysis Results

When you initially look at the logs, it seems that most of the Rokus beacon out every 30 seconds to their logging servers. Sometimes, well, most times, it’s multiple beacons every 30 seconds to different servers. Here’s an example:

24 Days of aggregated data

  • Roku overall traffic made up 34% of all traffic on my LAN
  • Roku direct logging traffic made up 14% of all traffic on my LAN
  • 192.168.1.58:
    • Total Number of Logging Records: 115,594
    • Beacons on average every 18.69 seconds over 24 days
  • 192.168.1.99:
    • Total Number of Logging Records: 129,408
    • Beacons on average every 16.69 seconds over 24 days
  • 192.168.1.209:
    • Total Number of Logging Records: 149,977
    • Beacons on average every 14.40 seconds over 24 days

So what does this mean? Well, it means that on average, a Roku is logging information about you and your family about 380 times per hour (2-4 DNS requests every 30-40 seconds) and 8,800 times per day, give or take a few hundred.
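Converting an average interval into a rate is just 3,600 (or 86,400) seconds divided by the interval. Here is a quick Python check applied to the per-device averages listed above. `implied_rates` is a hypothetical helper name.

```python
# Average intervals (seconds) from the 24-day results above.
AVERAGE_INTERVALS = {
    "192.168.1.58": 18.69,
    "192.168.1.99": 16.69,
    "192.168.1.209": 14.40,
}


def implied_rates(interval_seconds):
    """Queries per hour and per day implied by one query every `interval_seconds`."""
    return 3600 / interval_seconds, 86400 / interval_seconds


for ip, interval in AVERAGE_INTERVALS.items():
    per_hour, per_day = implied_rates(interval)
    print(f"{ip}: ~{per_hour:.0f} per hour, ~{per_day:,.0f} per day")
```

For those averages, this works out to roughly 190 to 250 logging queries per device per hour, or about 4,600 to 6,000 per day.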

Okay, okay, but what are they logging? Well, I started to attack this problem, and the first logical step is to look at Roku’s privacy policy. So, let’s take a look.

Update 1: Roku Logging Allowed 

Since this project came from my PiHole logs, I thought I would get some constructive internet criticism from the /r/pihole subreddit. You can view the thread here. One of the redditors made a really good point: the Rokus are effectively freaking out because their DNS requests are getting blocked. As such, the measured frequency reflects the blocking and not the true behavior of an active Roku whose DNS requests are not getting blocked.

This is a really good point as this very well might be the case. So, for the last 4 days, I have allowed all Roku logging traffic on my LAN. The PiHole logs still capture the DNS requests and therefore, the logs still maintain a valid record of unblocked requests. 

Before we look at the data, I want to be as transparent as possible. I have made some adjustments to my timing function. Whereas I was originally looking at only unique timestamps per IP and then computing the time difference between consecutive pairs of datetime objects, I am now simply taking the total number of seconds and dividing it by the number of records (DNS requests). The total number of seconds is the span between the most recent record (DELTA_DATES[0]) and the last record in the DELTA_DATES array (DELTA_DATES[len(DELTA_DATES)-1]), both of which are datetime objects. I felt that not only is this much simpler, but it’s more representative, as I am no longer just measuring unique records. I have edited the initial results to reflect the new changes.
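The difference between the old and new timing functions is easiest to see side by side. Here is a minimal Python sketch with hypothetical function names. The new method divides the span between the earliest and latest record by the record count, so it gives the same answer whether the list is sorted oldest-first or newest-first.

```python
from datetime import datetime, timedelta


def interval_pairwise_unique(timestamps):
    """Old approach: mean gap between consecutive *unique* timestamps."""
    unique = sorted(set(timestamps))
    if len(unique) < 2:
        raise ValueError("need at least two distinct timestamps")
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(unique, unique[1:])]
    return sum(gaps) / len(gaps)


def interval_count_over_span(timestamps):
    """New approach: total seconds covered by the records divided by the number of records."""
    if not timestamps:
        raise ValueError("need at least one timestamp")
    span = (max(timestamps) - min(timestamps)).total_seconds()
    return span / len(timestamps)


start = datetime(2018, 12, 12, 8, 0, 0)
# Two queries in the same second, then one every 10 seconds.
stamps = [start, start, start + timedelta(seconds=10), start + timedelta(seconds=20)]
print(interval_pairwise_unique(stamps))  # 10.0
print(interval_count_over_span(stamps))  # 5.0
```

Same-second bursts (common when a device hits several logging servers at once) collapse into one timestamp under the old method, so the new one reports a shorter interval for the same data.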

The Data

For the last 4 days, here is the information:

  • Roku overall traffic made up 47% of all traffic on my LAN
  • Roku direct logging traffic made up 9% of all traffic on my LAN
  • 192.168.1.58:
    • Total Number of Logging Records: 17,694
    • Average Beaconing Time Interval: 18.19s
  • 192.168.1.99:
    • Total Number of Logging Records: 3,541
    • Average Beaconing Time Interval: 90.72s
  • 192.168.1.209:
    • Total Number of Logging Records: 6,439
    • Average Beaconing Time Interval: 49.87s

Weekday Analysis by Device

This is very interesting to me. The Roku that gets by far the most use is the living room one (192.168.1.58) because it’s connected to our large 4K TV and is at the center of everything in our home. This guy is in use pretty much all day when we are at home (i.e., Music, Netflix, Sling, etc.). So it would seem that the more the system is being used, the more it beacons out. It’s also worth noting that the other two Rokus are not used as much, especially the basement system, as that’s really only used during parties.

If we look specifically at the Living Room Roku and break down all traffic over the 4 days by hour, we can easily see that the logging spikes right away in the mornings (breakfast, news) and then gradually rises and falls over midday and night, suggesting that a Roku logs a ton more (every 18.19 seconds) when it is in use.
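The hour-of-day and weekday breakdowns in the charts come down to counting logging queries per bucket. Here is a minimal Python sketch that assumes the timestamps are already `datetime` objects. The function names are hypothetical.

```python
from collections import Counter

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def hourly_counts(timestamps):
    """Number of queries in each hour of the day, as a 24-item list indexed by hour."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts[hour] for hour in range(24)]


def weekday_counts(timestamps):
    """Number of queries per weekday name (datetime.weekday() is 0 for Monday)."""
    counts = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    return {day: counts[day] for day in WEEKDAYS}
```

Returning every hour and every weekday, including empty ones, keeps the buckets aligned when you plot several devices against each other.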

Living Room Time Analysis

The data seems to support the Reddit point that Rokus will phone home more if they’re being blocked. However, if you are using those Rokus frequently, it’s a moot point, as it seems they log just as much, if not more.

Let’s put my assertion to the test. The Living Room analysis above supports my assumption, so if we look at the Master Bedroom, where the Roku gets used before going to bed, we should see a spike around that time. And we do.

Master Bedroom Time Analysis

I think the big takeaways are:

  • The system’s logging activity is directly correlated with use. This probably holds for everyone.
  • Mild/medium use of a Roku system generates just as much traffic as blocking its DNS requests does. We see only a minor change in overall traffic from the Living Room host when we allowed all logging (0.5% difference).
  • For systems not in regular use, the interval between logging beacons grows by 50-70 seconds when traffic is allowed, supporting the Reddit point that blocking traffic causes more traffic. Again, if you use a system frequently, it will generate just as much traffic as it would if you blocked its DNS requests.

What’s Roku Logging?

I’ll list a few of the things, but first, here is the Roku Privacy Policy.

  • Name
  • Email Address
  • Postal Address
  • Phone Number
  • Birth Date
  • Demographic
  • Social Media Accounts
  • Shipping Information
  • Purchase information
    • Web Cookies
    • Roku App Purchases
    • Gift purchases
  • Credit Card Information
  • Personal Information on friends/connections
  • IP address
  • Operating System type
  • Operating System Version
  • WiFi network name
  • WiFi networking connection metrics
  • Web Cookie Data

The list goes on and on and on. But reading a privacy policy just isn’t sexy. So, I’ve pulled a few PCAPs from my router, which didn’t amount to much beyond mapping out the AWS buckets.

My next attempt to pull data from the logging PCAPs was to DNS and ARP cache poison one of the Rokus while running an HTTPS proxy with a self-signed certificate. This just ended in a CA authenticity failure, which was to be expected. Maybe SSLStrip could work? Not sure yet, but this might not be the right path.

Future Research

Here is what I am currently working on in an attempt to get more information altogether.

Rokus used to have a TCPDump utility when you enabled developer mode. All of my devices have been connected and have auto-update on, and there’s no way to revert the box to an older OS. However, I think I have a good lead on a system that has not been connected for a few years, which might be interesting.

Roku’s developer API and BrightScript:

Roku uses its own programming language called BrightScript. The API is pretty well documented, and it’s very simple to enable developer mode via secret menus and start pulling XML information from the system. The issue is that there is no direct access to the underlying OS (which is Linux), with the exception of a telnet shell with access to the free command. And trust me, I’ve tried all sorts of command injection with that!

Telnet Free Command

The Roku developer documentation does, however, talk about how BrightScript applications are sandboxed and have limited access to system functions. BrightScript might be a dead end, but it’s worth a shot, so I will build a few basic apps and see what information I can get from the system or the effective application hypervisor.

Another interesting part of Roku’s External Control Protocol (ECP) is the capability to send remote commands to the system via plain HTTP requests. By remote commands I mean the literal Roku remote keys (Home, Netflix, Back, Up, etc.). This is interesting because it means that it may be possible to enable developer mode and manipulate settings without physical access to the system. So, if there were a possible exploit vector via a BrightScript application, that path to exploitation could hypothetically be automated.
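For reference, ECP requests are simple to build with Python's standard library. This sketch assumes ECP's documented port 8060. It also assumes that query endpoints such as `/query/device-info` are GETs returning XML, and that key presses are POSTs with an empty body to `/keypress/<key>`. Note that key presses use POST, not GET. `DEV_MODE_SEQUENCE` is the remote sequence from Roku's developer setup guide. Whether the developer screen accepts it over ECP the same way it does from the physical remote is exactly the open question above, so treat that part as untested. The helper names are hypothetical.

```python
import time
import urllib.request

ECP_PORT = 8060  # Roku's External Control Protocol port


def query_request(roku_ip, resource="device-info"):
    """GET /query/<resource>; the response body is XML."""
    return urllib.request.Request(f"http://{roku_ip}:{ECP_PORT}/query/{resource}", method="GET")


def keypress_request(roku_ip, key):
    """POST /keypress/<key> with an empty body: press and release one remote key."""
    return urllib.request.Request(
        f"http://{roku_ip}:{ECP_PORT}/keypress/{key}", data=b"", method="POST"
    )


# Developer-mode entry sequence documented for the physical remote (untested over ECP).
DEV_MODE_SEQUENCE = ["Home"] * 3 + ["Up"] * 2 + ["Right", "Left", "Right", "Left", "Right"]


def send_keys(roku_ip, keys, delay=0.5, timeout=5.0):
    """Send each key in order, pausing between presses so the device can keep up."""
    for key in keys:
        with urllib.request.urlopen(keypress_request(roku_ip, key), timeout=timeout) as resp:
            resp.read()
        time.sleep(delay)
```

For example, `urllib.request.urlopen(query_request("192.168.1.58")).read()` fetches the device-info XML, and `send_keys("192.168.1.58", DEV_MODE_SEQUENCE)` would replay the remote sequence.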