yTransit and GTFS revisited

July 8th, 2011

It’s been a long time since I last looked at GTFS. Since then, I’ve gotten tons of emails and comments on the blog post about my failed little yTransit project. A Google engineer in the Czech Republic working on their transit team even contacted me, but still nothing from actual transit companies.

However, it’s been a little while and I think there may be a glimmer of hope for this project yet. I’m guessing (hoping) that since smartphones have become increasingly popular, more people in the industry are getting a bit more interested in the technology. I called the Summit County, Colorado transit system (Summit Stage Transit) this morning and talked to the dispatcher.

I told him that I was interested in getting their transit schedule into Google Maps and he didn’t just say “uuhhhh… what?” He actually said “I think we’d be very interested in that.” He told me the person I needed to speak with wasn’t in the office at that moment, but he’ll be in later today and that I should leave a voicemail.

Now remember, I had originally contacted Summit Stage Transit way back in 2009 and they weren’t interested and didn’t return my calls. So at least this time around, I actually got a favorable response. That is huge progress!

We’ll see if this goes anywhere, but if they’re able to help me get some requirements built, I might actually be able to make this happen.

So officially, the project is still dead pending resuscitation by John at Summit Stage Transit, who is supposed to return my call this afternoon.

Update: I spoke with John and I have a meeting scheduled for July 20th to discuss their needs and requirements. This thing might happen after all.

Google APIs, GTFS

Solaris Licensing Changes: The Real Story

April 14th, 2010

As you should already know, Sun was purchased by Oracle. Not too long ago, someone noticed a licensing change on the Solaris license website. A slow rumble of rumors has been building up about what those changes mean. Well, I contacted our Sun account manager to get the definitive answer, and here it is:

  1. The old Solaris subscriptions, the way people got software support for 3rd party hardware, are no longer available for purchase. Existing contracts are honored.
  2. Solaris support now comes through a contract on the hardware (Oracle Sun hardware).
  3. The license and accompanying entitlement from the web, without a contract and without hardware, only entitle the downloader to non-commercial, non-production, or personal use in perpetuity. Production use and evaluation for production are good for 90 days.
  4. When you purchase hardware, you receive an addendum to the entitlement that grants that piece of hardware perpetual, non-transferable license and entitlement to Solaris.
  5. For hardware purchasers, this is the same (in net effect) as always.
  6. For non-hardware purchasers – 3rd party, gray market, etc. – there is no legal way to obtain a permanent entitlement or to obtain support.

Personal Use

So let’s get the easy one out of the way first. Solaris is still free for personal use. So that should satisfy the 0.0001% (yes, that number is an anatomical extraction) of the Solaris users that use Solaris for non-commercial activity.

Non-Sun Servers

Let’s move on to people that run Solaris on non-Sun servers: No Solaris for you, not yours! Items 1 and 6 make it clear that there is no possible way to legally run Solaris on non-Sun servers. Period. End of story.

Sun Servers without a Support Contract

Now let’s talk about people that run Solaris on Sun servers, but do not purchase a hardware support contract: Some Solaris for you, but only a little! Item 4 says (and I clarified it with them) that purchasing new Sun hardware gives you a binary license only for the version of Solaris that’s available at the time of the hardware purchase. It does not entitle you to future upgrades or updates.

Sun Servers with a Support Contract

For people running Solaris on Sun hardware with a Sun hardware support contract, your support contract grants you rights to run future versions of Solaris.

Solaris

nVidia Overscan Correction fixed in Latest Drivers

April 1st, 2010

My solution for fixing overscan on nVidia cards is obsolete! Ironically, I found out just a few days ago that my solution does actually work.

The person that I was originally helping with this problem decided to give Linux another shot. He tested it out and reported that it did indeed fix his overscan problems.

However… for no particular reason I decided to check out the nVidia settings control panel again. When I opened it up in Ubuntu 10.04, I noticed this (and tested it to make sure it works, which it does):

Screenshot-NVIDIA X Server Settings

General

Solaris ZFS vs. Linux with Hardware Raid

April 1st, 2010

I’ve had to start using Xen virtualization for a current project we’re working on. I always hate switching back to Linux servers because all of our fancy tools and scripts for automation are written for Solaris since we only have a handful of Linux servers.

At any rate, I’ve got Xen all figured out and have really started to dig into Linux’s LVM for the first time. There are some similarities between LVM and ZFS, but most noticeably, LVM doesn’t do parity RAID like RAID5 on its own. You have to set up Linux software RAID manually and put a VolumeGroup on the RAID meta-device. So I set up a nice software RAID5 device, created a VolumeGroup, and off I went.

The write performance was horrendous.

So I begrudgingly went into the RAID controller BIOS, set up hardware RAID5, and put LVM on top of that. After the installation, I decided to see how fast this was compared to ZFS raidz1 (which is more or less RAID5).

The machines are identical:

  • Dual 6 Core Opteron
  • Sun STK RAID Controller (Adaptec) — 256MB cache, write-back cache mode enabled
  • 16 Gigs of memory

Here are the results:

Linux — 21GB Write

# time dd if=/dev/zero of=/root/test bs=10240 count=2009600
2009600+0 records in
2009600+0 records out
20578304000 bytes (21 GB) copied, 146.226 seconds, 141 MB/s

real    2m26.377s
user    0m4.068s
sys     1m53.823s

Linux — 1GB Write

# time dd if=/dev/zero of=/root/test bs=10240 count=102400
102400+0 records in
102400+0 records out
1048576000 bytes (1.0 GB) copied, 2.69437 seconds, 389 MB/s

real    0m2.702s
user    0m0.108s
sys     0m2.584s

Solaris — 21GB Write

# time dd if=/dev/zero of=/zonepool/test bs=10240 count=2009600
2009600+0 records in
2009600+0 records out
20578304000 bytes (21 GB) copied, 55.3566 s, 372 MB/s

real    0m55.412s
user    0m0.913s
sys     0m27.012s

Solaris — 1GB Write

# time dd if=/dev/zero of=/zonepool/test bs=10240 count=102400
102400+0 records in
102400+0 records out
1048576000 bytes (1.0 GB) copied, 1.25254 s, 837 MB/s

real    0m1.257s
user    0m0.046s
sys     0m1.211s

837MB/s for burst writes on raidz1! ZFS is too awesome.
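dd’s summary line is just bytes copied divided by elapsed seconds, in decimal megabytes: 20578304000 bytes / 55.3566 s ≈ 372 MB/s. To repeat the same kind of measurement from code instead of dd, a minimal C sketch might look like this. The function names and test path are hypothetical, and it assumes a POSIX system (open/write/clock_gettime):

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Roughly what `dd if=/dev/zero of=path bs=bs count=count` measures:
   write count blocks of bs zero bytes and time it. No fsync() is done,
   so the timing can include data still sitting in the OS cache.
   Returns elapsed wall-clock seconds, or -1.0 on error. */
double write_zeros(const char *path, size_t bs, size_t count)
{
    char *buf = calloc(1, bs);
    if (buf == NULL)
        return -1.0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < count; i++) {
        if (write(fd, buf, bs) != (ssize_t)bs) {
            close(fd);
            free(buf);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);
    free(buf);
    return (double)(end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

/* dd's "MB/s" is decimal: bytes / seconds / 10^6. */
double mb_per_sec(double bytes, double seconds)
{
    return bytes / seconds / 1e6;
}
```

Plugging in the numbers from the runs above, mb_per_sec(1048576000.0, 1.25254) comes out to about 837, matching the Solaris burst result.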

Here are the controller configurations:

Linux Controller Configuration
Solaris Controller Configuration

General, Solaris

Patch for the VastHTML WordPress Forum Server

March 3rd, 2010

So, I’ve made a number of fixes to the VastHTML WordPress forum server plugin. It has some pretty big bugs, and I don’t know if the project is being maintained anymore or not. At any rate, the fixes I’ve made should have been considered critical and should have been fixed long ago by whoever is maintaining it, but I digress…

I’m not going to support people trying to apply this patch. If you don’t know what a diff is and you don’t know what the patch command does, you’re probably out of luck. If you want me to fix all of the problems in this code and release it, pay me a bunch of money…

Also, the security problems in their code make babies cry… but that’s for another day.

Lastly, to make the search actually work, you need to connect to your WordPress MySQL database and issue this SQL statement:

alter table wp_forum_posts add fulltext key `text` (`text`);

Here's the patch: vasthtml-forum-server.diff

Here's what it fixes (in no particular order):

  • RSS feeds now contain the username of the poster instead of "feeds@r.us"
  • All & characters in the links have been properly changed to &amp;, as they should be
  • Page 2+ of your forums will work
  • Page 2+ of posts will work
  • The number of replies shown in the topic list is properly set to number of posts - 1
  • The title delimiter is changed from » to "|" (don't remember why I did this, but there ya go)
  • The search form/box uses HTTP GET instead of POST so your back button works without complaining about having to resubmit your request
  • You can press enter in the search box to submit
  • A $ followed by a number doesn't get filtered out
  • Apostrophes in posts/titles get their slashes properly stripped

I may have fixed other things in this patch and forgot about it. This works for me... your mileage may vary.

General, PHP

Threaded/Parallel Web Crawler (or Web Server Killing Software)

January 26th, 2010

Short Version


Parallel URL Fetcher: If you want to put load on a webserver by crawling it, this is what you’re looking for. No Java, no Python, just a nice small, fast C program.

Long Version


It’s time to re-evaluate our HTTP caching software. At present we use Apache mod_cache (disk cache) and we’ve run into some problems.

Apache mod_cache + ZFS + millions of URLs and hundreds of gigs of cache files = bad

I’m not sure which of these guys is the culprit in this one. But I do know that when the ZFS dataset holding Apache’s cache gets to a certain size, disk I/O requests go through the roof. By clearing the cache (and freeing up that I/O), we see a good 5%-10% (extremely significant) jump in traffic.

At any rate, this prompted us to start looking into alternatives to Apache. The obvious first choice is Squid in accelerator mode. So I got Squid all set up in our offline datacenter, fixed the little things, and was ready to beat the crap out of it with web requests.

I can easily request all of our 500k+ “static” URLs, but those pesky URLs with arguments aren’t quite that easy. I needed a crawler. Something like wget --mirror but much, much, much faster.

After a lot of searching, I found a few Python apps that failed to compile on Solaris, had deprecated/old dependencies, required a specific Python version, etc. Python is starting to feel more and more like Java. Either the developers are horrible or the language interpreter is too picky to work properly (think…. JRE 1.2.5 build 1482???? no no no, you need build 1761!!!).

Speaking of Java, I also found a Java app (JCrawler) that looked perfect for what I needed. It certainly claimed to be “perfect.” It actually worked better than the Python apps that failed to build/run properly, but it didn’t actually work. It just kept spawning threads until it ran out of memory.

I was almost to the point where I thought I would have to write one myself, until I clicked on a link and a bright light from the heavens shone down on my monitor and a choir started singing in the background.

I had found the Parallel URL Fetcher. It was exactly what I needed. It was like wget, but ran parallel requests. It didn’t compile on Solaris either, but adding timeradd() and timersub() macros fixed that real quick.
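The post doesn’t include the macros that were added. For reference, the conventional BSD-style definitions of timeradd() and timersub() on struct timeval look like the sketch below; this is the common BSD/glibc form, not necessarily the exact code used here. The #ifndef guards mean they only take effect where <sys/time.h> doesn’t already provide them:

```c
#include <sys/time.h>

/* BSD-style timeval arithmetic, defined only if the system headers
   don't already provide it. tv_usec is kept in [0, 1000000). */
#ifndef timeradd
#define timeradd(a, b, result)                              \
    do {                                                    \
        (result)->tv_sec = (a)->tv_sec + (b)->tv_sec;       \
        (result)->tv_usec = (a)->tv_usec + (b)->tv_usec;    \
        if ((result)->tv_usec >= 1000000) {                 \
            ++(result)->tv_sec;                             \
            (result)->tv_usec -= 1000000;                   \
        }                                                   \
    } while (0)
#endif

#ifndef timersub
#define timersub(a, b, result)                              \
    do {                                                    \
        (result)->tv_sec = (a)->tv_sec - (b)->tv_sec;       \
        (result)->tv_usec = (a)->tv_usec - (b)->tv_usec;    \
        if ((result)->tv_usec < 0) {                        \
            --(result)->tv_sec;                             \
            (result)->tv_usec += 1000000;                   \
        }                                                   \
    } while (0)
#endif
```

The do { … } while (0) wrapper lets each macro be used like a single statement, even inside an unbraced if/else.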

I don’t think it supports Keep-Alive requests, which would have been nice, but it rocked through URLs anyway. After letting it run for a few hours, I had my Squid server maxed out at 100GB of cache and ready for some I/O testing.

General

PCM Audio | Part 3: Basic Audio Effects – Volume Control

January 12th, 2010

So now that we know what data is stored in a PCM stream, let’s look at some real waveform examples. The easiest is a simple sine wave:

sine wave

Now if we “amplify” that wave by 5, we’d get a much louder sound, represented by a wave that looked like this:

sine wave times 10

So if you want to increase the volume of your PCM stream, just multiply every PCM value by some number. If we had 2048 bytes of audio (remember… that’s 1024 samples since each sample is two bytes), we could amplify the stream with this type of code:

#include <stdint.h>

/* Double the volume of 1024 samples (no clipping check yet). */
void amplify(int16_t pcm[1024])
{
    for (int ctr = 0; ctr < 1024; ctr++)
        pcm[ctr] *= 2;
}

Volume control is almost that simple. There are two catches.

Clipping

Clipping occurs when your resulting value goes beyond the range a sample can hold. Since we're dealing with signed 16-bit integers, our maximum positive sample is 32767 (and the minimum is -32768). If we have a PCM sample value of 5000 and multiply it by 10, the value stored back into the 16-bit sample wraps around to -15536 (50000 - 65536), not the expected 50000. When that happens, you end up with noise in the audio. You should always check whether the result of your multiplication would go out of range, and if so, set the value to 32767 (or -32768) instead.

So our code above becomes:

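#include <stdint.h>

/* Double the volume of 1024 samples, clipping to the int16_t range. */
void amplify(int16_t pcm[1024])
{
    int32_t pcmval;
    for (int ctr = 0; ctr < 1024; ctr++) {
        pcmval = (int32_t)pcm[ctr] * 2;   /* computed in 32 bits, so it can't overflow */
        if (pcmval > 32767) {
            pcm[ctr] = 32767;
        } else if (pcmval < -32768) {
            pcm[ctr] = -32768;
        } else {
            pcm[ctr] = (int16_t)pcmval;   /* in range, including exactly 32767/-32768 */
        }
    }
}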

Volume Is Logarithmic

The other catch is that volume as perceived by humans (measured in decibels) is logarithmic, not linear. Your first instinct would be to think "Well if I wanted to double the volume, I should just multiply the samples by 2." Unfortunately, it's not quite that easy.

Multiplying a value by 1 will obviously give you no amplification. So to decrease volume, you would multiply by a value less than 1 and greater than 0. To increase volume, multiply by a number greater than one. Unfortunately, I didn't pay enough attention to logarithms in school, so I don't have a clever answer as to how to implement a proper volume control, but I've found that this function works pretty well:

#include <math.h>

/* some_level is the slider position, roughly 0 to 148 */
float level_to_multiplier(int some_level) { return tan(some_level / 100.0); }

If some_level is set to a value between 0 and 148 or so, this will give you a rather linear-sounding multiplier. A level of 79 gives a multiplier of almost exactly 1 (no amplification). It is far, really far, from perfect, but it worked well enough for my needs of implementing a volume slider. Graphing that function from 0 to 148 gives you this:

volume multiplier

So here's the volume code again, this time taking the slider level as a parameter. A level of 39 is roughly half volume:

#include <math.h>
#include <stdint.h>

/* Scale 1024 samples by the tan() volume curve, clipping to the int16_t range.
   level 39 is roughly half as loud; 118 (79 * 1.5) is roughly twice as loud.
   Keep level at or below about 148: tan() shoots toward infinity near 157. */
void set_volume(int16_t pcm[1024], uint8_t level)
{
    float multiplier = tan(level / 100.0);
    int32_t pcmval;
    for (int ctr = 0; ctr < 1024; ctr++) {
        pcmval = (int32_t)(pcm[ctr] * multiplier);
        if (pcmval > 32767) {
            pcm[ctr] = 32767;
        } else if (pcmval < -32768) {
            pcm[ctr] = -32768;
        } else {
            pcm[ctr] = (int16_t)pcmval;
        }
    }
}

I wasn't able to find a simple logarithmic slider example, so if you have one, please post in the comments. I'd love to replace my hack.

Using some simple algorithms and that function above, you could easily implement a fade-in/out effect on PCM data by stepping through all 148 possible values over a period of time. And don't worry, we'll get to "time" later in the series.
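As a variation on stepping through slider levels, a fade can also ramp the gain a little on every sample. This sketch uses a straight linear ramp in amplitude, the simplest choice rather than a perceptually tuned one, and the function name is hypothetical:

```c
#include <stdint.h>

/* Ramp the gain linearly from start_gain to end_gain across n samples
   (e.g. 0.0 -> 1.0 for a fade-in, 1.0 -> 0.0 for a fade-out),
   clipping each result to the signed 16-bit range. */
void apply_fade(int16_t *pcm, int n, float start_gain, float end_gain)
{
    for (int i = 0; i < n; i++) {
        float gain = (n > 1)
            ? start_gain + (end_gain - start_gain) * (float)i / (float)(n - 1)
            : end_gain;
        float v = pcm[i] * gain;
        if (v > 32767.0f)
            v = 32767.0f;
        else if (v < -32768.0f)
            v = -32768.0f;
        pcm[i] = (int16_t)v;   /* truncates toward zero */
    }
}
```

Over 5 samples of value 1000, a 0.0 to 1.0 fade-in produces 0, 250, 500, 750, 1000.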

That's pretty much all there is to know about volume. In the next part of the series, we're going to discuss mixing two streams together to create one stream.

General

PCM Audio | Part 2: What does a PCM stream look like?

January 9th, 2010

In Part 1, we looked at how a PCM stream is described. Once you know all of the parameters for your PCM stream, you can examine the data and read it into memory in a useful form.

So, let’s assume we have a file that contains signed 16-bit little endian mono PCM. That means that data in the file is just a collection of 16 bit integers. Each integer represents one sample. So the first 9 samples in the file could be:

+------+------+------+------+------+------+------+------+------+
|  500 |  300 | -100 | -20  | -300 |  900 | -200 |  -50 |  250 |      
+------+------+------+------+------+------+------+------+------+

Each of those integers is stored in the file as 2 bytes (16-bit), so the 9 samples above take up 18 bytes of space. The value of each sample, obviously, can range from -32768 to 32767. If you take those samples and plot them on a graph, you’ll end up with a visualization of the waveform for the audio that you see in your music player.
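In little-endian order the low byte comes first, so the first two samples above (500 = 0x01F4 and -100 = 0xFF9C) sit in the file as the bytes F4 01 9C FF. Reading straight into an int16_t array only gives the right values on a little-endian CPU. A sketch that decodes the bytes explicitly so it behaves the same on any machine (the function name is illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode signed 16-bit little-endian PCM from raw bytes, independent of
   the host CPU's byte order: low byte first, then high byte. */
void decode_s16le(const uint8_t *bytes, int16_t *samples, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        uint16_t u = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        samples[i] = (int16_t)u;   /* reinterpret as two's complement */
    }
}
```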

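If we wanted to read that into an array in C, we would do something like this (error checking omitted for brevity):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

FILE *pcmfile = fopen("audio.pcm", "rb");  /* placeholder filename */
fseek(pcmfile, 0, SEEK_END);               /* seek to the end to find the file size */
long size = ftell(pcmfile);
rewind(pcmfile);
int16_t *pcmdata = malloc(size);
fread(pcmdata, sizeof(int16_t), size / sizeof(int16_t), pcmfile);  /* assumes a little-endian host */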

Of course, if you’re dealing with large files, you probably shouldn’t read the whole thing into memory. You should buffer the data and read it in chunks at a time.
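A sketch of that chunked approach, assuming a raw signed 16-bit file whose byte order matches the host. It reads one fixed-size buffer at a time and, as an example of per-chunk work, tracks the loudest sample. The function name and the 4096-sample buffer size are arbitrary choices:

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK_SAMPLES 4096   /* arbitrary buffer size: 8KB per read */

/* Scan a raw S16 PCM file chunk by chunk (never holding the whole file in
   memory) and return the peak absolute sample value, or -1 on error.
   fread() into int16_t assumes the file's byte order matches the host's. */
long peak_from_file(const char *path)
{
    int16_t buf[CHUNK_SAMPLES];
    size_t got;
    long peak = 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    while ((got = fread(buf, sizeof(int16_t), CHUNK_SAMPLES, f)) > 0) {
        for (size_t i = 0; i < got; i++) {   /* the last chunk may be short */
            long v = buf[i] < 0 ? -(long)buf[i] : buf[i];
            if (v > peak)
                peak = v;
        }
    }
    fclose(f);
    return peak;
}
```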

If you take that data and send it to your sound card, you’ll hear the audio being played. However, the sound card needs to be told the sample rate. If you have an 8kHz stream and tell the sound card to play it at 16kHz, it plays at twice the speed, like playing a 33⅓ RPM record at 45 RPM, only worse. For the younger crowd out there, that means it will be too fast and it’ll be high pitched… think Alvin and the Chipmunks here.
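The sample rate is what turns a count of samples into playback time. In this small sketch (names are illustrative), the same 16000 bytes of signed 16-bit mono last one second at 8kHz but only half a second if the card is told the data is 16kHz:

```c
#include <stddef.h>

/* Playback time in seconds for a buffer of raw PCM:
   bytes / (sample rate * channels * bytes per sample). */
double pcm_duration(size_t nbytes, int rate, int channels, int bytes_per_sample)
{
    return (double)nbytes / ((double)rate * channels * bytes_per_sample);
}
```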

Since this is a description of the waveform, a stream of all zeros would be silence (a flat line if you graphed it).

I haven’t really explained what those samples actually MEAN though… just what they are. It will be incredibly obvious what those samples mean starting in the next post, when we get to the fun stuff: basic audio effects processing (don’t get scared… it’s actually really easy).

General

PCM Audio | Part 1: What is PCM?

January 8th, 2010

It’s been a long time since I posted anything. Most of my free time has been spent working on my Ventrilo client for Linux project. Of course, that project raises tons of things to discuss, such as how PCM audio works. I’m going to make this a multi-part series, because there is so much information to cover.

When I first started working on that project, I knew nothing about how audio worked. I knew a little bit about encoders and decoders, but not really the inner workings. What are they encoding/decoding? It turns out that the answer is PCM (pulse-code modulation) audio. After messing with PCM for a few months, a lot of things that confused me at first now seem painfully obvious. This guide is meant to be an introduction that gives you enough working knowledge to ask the right questions and perform simple tasks. So let’s get started…

If you’ve ever used an MP3 player on your computer, you’ve probably seen those options to display the waveform of the audio or the little bars that pop up and down showing you treble and bass levels. What those are measuring is the PCM audio as it plays. So what does all that crap mean?

Let’s start with the basics. There are five terms that are important to know for PCM:

Sample Rate

Real actual audio (like someone talking to you in person) is transmitted as a wave. PCM is a digital representation of that audio wave, measured a fixed number of times per second: the sample rate. If you’ve ever done integrals in calculus, it’s a lot like that. The sample rate is measured in Hz (samples per second) and more often in kilohertz; common rates are 8kHz for telephone-quality voice, 44.1kHz for CD audio, and 48kHz. Don’t confuse it with bitrate: when you hear someone talk about 128kbps vs. 160kbps MP3s, they’re talking about how much compressed data is used per second, not the sample rate. The higher the sample rate, the better your quality (at the cost of size). There is no guessing here. You need to know what the sample rate is.

Sign

Whether the data is signed or unsigned. It is almost always signed. Treating a signed PCM stream as unsigned will hurt your ears… painfully… (I speak from experience here).
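Part of why mixing them up is so painful: in unsigned PCM, silence sits at the midpoint of the range (32768 for 16-bit), while in signed PCM silence is 0. Read one as the other and quiet audio turns into near full-scale noise. A minimal conversion sketch, assuming 16-bit samples; the function names are hypothetical:

```c
#include <stdint.h>

/* Convert unsigned 16-bit PCM (silence at 32768) to signed 16-bit PCM
   (silence at 0) by shifting the midpoint down by 32768. */
int16_t u16_to_s16(uint16_t u)
{
    return (int16_t)((int32_t)u - 32768);
}

/* The reverse: signed back to unsigned. */
uint16_t s16_to_u16(int16_t s)
{
    return (uint16_t)((int32_t)s + 32768);
}
```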

Sample Size

This determines how many bits make up one sample. 16-bit seems to be the most common.

Byte Ordering

Byte ordering refers to little-endian vs. big-endian data. If you don’t know what endian-ness means, you can probably assume little endian. If you have the option to choose endian for your data, you should always choose little-endian.
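If you do end up with big-endian data on a little-endian machine (or vice versa), converting is just a matter of swapping the two bytes of every sample. A sketch for 16-bit samples, treating them as uint16_t so the shifts are well-defined (the function name is illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of every 16-bit sample in place,
   converting S16BE to S16LE or vice versa. */
void swap_s16(uint16_t *samples, size_t n)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = (uint16_t)((samples[i] >> 8) | (samples[i] << 8));
}
```

Swapping twice gets you back where you started, so the same function works in both directions.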

Number of channels

I’m mostly going to cover mono (1 channel), but multichannel PCM is usually handled by interleaving the PCM samples. Don’t worry about this for now. Once you understand mono, stereo is easy.
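Interleaving just means the channels take turns: for stereo, a left sample, then a right sample, then the next left, and so on. A minimal sketch that splits such a buffer into two mono buffers, assuming signed 16-bit samples with the left channel first (the usual convention, but check your source):

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved stereo (L R L R ...) into separate left and right
   buffers. n_frames is the number of L/R pairs, not the number of samples. */
void deinterleave_stereo(const int16_t *in, int16_t *left, int16_t *right,
                         size_t n_frames)
{
    for (size_t i = 0; i < n_frames; i++) {
        left[i] = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}
```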

Add those five things together and you’ll come up with a description of a PCM stream. For example: signed 16-bit little-endian mono @ 44.1kHz. In order to actually play audio, you’ll need to know those 5 things.
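Those five parameters fit naturally in a small struct, and multiplying them out tells you how much raw data one second of audio takes. For the signed 16-bit little-endian mono @ 44.1kHz example, that’s 44100 × 1 × 2 = 88200 bytes per second. A sketch; the struct and function names are hypothetical, not from any particular audio API:

```c
#include <stdbool.h>

/* One way to hold the five parameters that describe a PCM stream. */
struct pcm_format {
    int  sample_rate;      /* samples per second per channel, e.g. 44100 */
    bool is_signed;        /* almost always true */
    int  bits_per_sample;  /* e.g. 16 */
    bool little_endian;    /* byte order of each sample */
    int  channels;         /* 1 = mono, 2 = stereo (interleaved) */
};

/* Bytes of raw PCM per second of audio. */
long pcm_bytes_per_second(const struct pcm_format *f)
{
    return (long)f->sample_rate * f->channels * (f->bits_per_sample / 8);
}
```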

Various sound devices support various types of streams, but there’s usually a set list of sign, sample size, and endian-ness options. Different APIs use different constants to specify the format, but usually you’ll see them as something like S16LE (signed 16-bit little-endian) or S32BE (signed 32-bit big-endian) and so on.

In my next post, I’ll go over how those are represented in a PCM stream.

General

Netflix on the PS3

October 29th, 2009

I would be remiss not to write a blog post about Netflix on the PS3. As much as I post about streaming video to the PS3 and as much as I love Netflix, I can’t resist chiming in on this one.

First of all, I don’t care what the CEO of Netflix says: having to put in a disc to stream movies sucks. The streaming app should be an installable application that sits on the XMB.

It’s not a matter of being lazy; it’s a matter of convenience. Back on the PS2, when I started working on streaming video to Sony devices using BroadQ (oh yeah… btw, I’ve been working on this for about 6 or 7 years now), it was annoying to have to load in the BroadQ disc to stream the movies. I can’t imagine it will be any less annoying 7 years later when the system has a hard drive that is perfectly capable of storing the application.

That said, I’m excited this is finally happening. There’s little doubt that Microsoft opened up the checkbook to prevent interoperability. Netflix will be available for the PS3 almost exactly one year after the Xbox. For games, I can understand these exclusive agreements. For third party services such as Netflix, I think it’s a dick move on Microsoft’s part. I view it as yet another good reason not to support their console.

As a Netflix subscriber, I think it’s a bad move by both Netflix and Microsoft. This should have happened long ago.

PlayStation3