How to Have Google Index Flash Websites

*** I’ll better organize this post later, busy right now ***

Awesome. Jim Kremens from Flashcoders, who started and was involved in a thread about how to definitively get Flash websites indexed by search engines, went directly to the source, Google, and got an answer.

Since searching the archives of the chattyfig lists sucks major cheese now, I’m copying part of the thread here. There were many other good points brought up, so if you can scrounge up password #450-3b that you have to remember, you can find the thread. There was a variety of opinions.

Anyway, good information from this email about duplicating Flash content inside an HTML file to help get the site indexed. Basically, you put the HTML within CDATA tags of XML, and use Flash to get its content from that static source (which of course can be generated from a dynamic one). There were fears that techniques like this may be perceived as duplicating content with the intent of fooling users and search engines, and thus get you blacklisted. One person responded that this had worked for them on a few sites, and they weren’t blacklisted. Anyway, for the meat, you’ll have to dig into the threads for the technique, but I’m posting just the 2 emails here.
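
To make the CDATA idea a bit more concrete, here is a minimal sketch in Python (not ActionScript, and not code from the thread). The element names PAGE, TITLE and CONTENT are made up for illustration. It only shows that HTML wrapped in a CDATA section comes back from an XML parser as plain text, even when the HTML itself is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source: the element names are assumptions, not from the thread.
PAGE = """<PAGE>
  <TITLE>About us</TITLE>
  <CONTENT type="content"><![CDATA[<p>We build <b>Flash</b> sites.<br></p>]]></CONTENT>
</PAGE>"""


def read_content(source: str) -> dict:
    """Return the title and the raw HTML stored in the CDATA section."""
    root = ET.fromstring(source)
    return {
        "title": root.findtext("TITLE"),
        # The parser hands CDATA back as plain text, so the unclosed <br>
        # inside it never breaks the XML parse.
        "html": root.findtext("CONTENT"),
    }


page = read_content(PAGE)
print(page["title"])
print(page["html"])
```

The same static file can then feed both the HTML view and the Flash movie, which is the point of the technique.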

——-

Hi all,

I actually wrote to the Google team to ask them some of the questions
raised in this thread. Just wanted to share their response. Note
where they say:

“The practice of creating HTML copies of these Flash pages for our
crawler is actually our recommended solutions to this kind of issue.”

That’s in agreement with what pretty much everyone on this list said,
but in direct contradiction with what the non-Flash developers here
said. Interesting how people make up their own minds about stuff…

Thanks to all of you for your ideas.

Jim Kremens

From: help@google.com [mailto:help@google.com]
Sent: Wednesday, April 06, 2005 1:19 PM
To: jimkremens
Subject: Re: [#24081437] Flash and Search Engine Optimization

Hi Jim,

Thank you for your note. The Google index does include pages that use
Macromedia Flash. However, this is a new feature, so our crawlers may
still experience problems indexing Flash pages. If you are concerned
that Flash content on your pages may be inhibiting Google’s ability to
crawl your site, you may want to consider using a text browser such as
Lynx to examine your site. If features such as Flash keep you from
seeing all of your site in a text browser, then search engine spiders
may have trouble crawling your site.

The practice of creating HTML copies of these Flash pages for our
crawler is actually our recommended solutions to this kind of issue.
If you do this, please be sure to include a robots.txt file that
disallows the Flash pages in order to ensure that these pages are not
seen as duplicate content.

We hope the information we have provided above is helpful to you. Due
to the tremendous volume of information and help requests we receive,
we are not always able to provide personal attention to questions
pertaining to individual websites. For additional information, please
visit http://www.google.com/webmasters/. Also, you may want to comb
http://groups.google.com/groups?q=google.public.support.general for
suggestions from our users and webmasters or to post a question of
your own.

Regards,
The Google Team
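
(Not part of Google’s reply.) A small sketch of the robots.txt advice above, assuming a hypothetical layout where the Flash pages live under /flash/ and the HTML copies under /html/. The paths and the example.com domain are placeholders. It uses Python’s standard urllib.robotparser to check that a crawler would be kept away from the Flash pages but not the HTML ones.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: Flash pages under /flash/, HTML copies under /html/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /flash/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a crawler may fetch each kind of page.
flash_ok = parser.can_fetch("Googlebot", "http://www.example.com/flash/index.html")
html_ok = parser.can_fetch("Googlebot", "http://www.example.com/html/about.html")
print("Flash page crawlable:", flash_ok)
print("HTML page crawlable:", html_ok)
```

With this file the Flash URL should come back as not crawlable and the HTML URL as crawlable.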

——-

No sweat.

What’s interesting is that they don’t say it’s OK to put hidden html
content on your Flash site. All they’re really advising us to do is
to make two sites: one html and one Flash.

And so, after much thrashing, my fellow developers here and I have
come up with a development plan that allows Flash and SEO to coexist.
Note that I work at a pretty big shop (idsociety.com) with some serious
back-end programmers. So some of what we’ve come up with may be
difficult for the average Flash developer to use. I’m hoping that
gaps will be filled in by future contributions to this thread. That
said, here goes:

1. Develop Flash site and html site that load content from the same
XML source(s). This way they can both be updated easily. Html site
can be as simple or elaborate as you like. It’s there for the few
people who don’t have Flash, for users with accessibility issues and
search engine robots. Per Google’s recommendation, include a
robots.txt file that disallows the Flash pages in order to ensure that
these pages are not seen as duplicate content.

2. Provide one entry point to your site. In our case, what might seem
like ‘pages’ in the site are just paths written by the Apache server
using mod_rewrite. (Google ‘mod_rewrite’ for more info). So, users
who click on a link in Google to come back to an ‘internal’ page in
your site are really just coming to the site’s entry point. There,
the server would typically use mod_rewrite to serve them up a page.

3. In this case, however, the server will do a Flash check. If the
user doesn’t have Flash, it’ll serve up the html page (duh). If they
do, it will serve up the Flash page, passing in the ‘url’ via
FlashVars. Again, the url written by mod_rewrite has no
correspondence to actual directories. It’s made up. So, you can set
Flash to interpret it however you want.

4. Configure your Flash file to correctly interpret the mod_rewrite
path passed in and navigate to the appropriate content.

And there you have it. More work, to be sure, but you can give your
client the Flash content they (and you) want, it will be indexed by
the robots, and, if you build it right, users will be routed to the
correct location in the Flash site.

I’m sure some of you have better ways of doing this. And I’m guessing
Peter Hall’s way is among them. Also note that I know almost nothing
about the server-side stuff. So I’m curious to learn if there are
alternate ways to do this that don’t require dealing with an Apache
server, etc.

Kudos to Google for the swift reply.

Thanks,

Jim Kremens
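
(Not part of Jim’s email.) Steps 2 to 4 of the plan above could be sketched roughly like this. It is a sketch under assumptions: the email does not say how the Flash check is done or what the files are called, so has_flash is simply passed in, and FLASH_MOVIE and HTML_ROOT are hypothetical names. The real setup described in the email uses Apache and mod_rewrite rather than Python.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical file names; the email does not name them.
FLASH_MOVIE = "/flash/site.swf"
HTML_ROOT = "/html"


def route(path: str, has_flash: bool) -> str:
    """Decide what the single entry point serves for a rewritten path."""
    if not has_flash:
        # No Flash (or a robot): serve the plain HTML copy of the page.
        return f"{HTML_ROOT}{path}.html"
    # Flash available: embed the movie and hand it the made-up path via
    # FlashVars so the movie can navigate to the matching content itself.
    flash_vars = escape(urlencode({"url": path}))
    return (
        f'<object data="{FLASH_MOVIE}" type="application/x-shockwave-flash">'
        f'<param name="FlashVars" value="{flash_vars}"></object>'
    )


print(route("/products/widgets", has_flash=False))
print(route("/products/widgets", has_flash=True))
```

Because the path never maps to a real directory, the only contract is that the HTML site and the Flash movie interpret the same string the same way.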

——-

> And a question to your “Flash/XHTML-engine” in Flash:
> how did you combine XML content and HTML code? i’ve seen the
> HTML-sourcecode but don’t know, when you load a further HTML site into
> Flash, how you can ignore the HTML-Tags and only read out the XML
> content???

“All is XML” in the source of the HTML document.
The trick is to
1) Make sure your HTML is put in a CDATA tag (so invalid structures are
ignored)
2) Put your content in a structure you can read easily
3) “Navigate” to the right XML tag.

Again, this is my own version of Peter Joel’s thingy as presented here:
http://www.peterjoel.com/ripple/ (the slideshow does not work on my
browser…)

check out www.instantinterfaces.nl/demo/htmlparser.txt
(Use the “view source” option of your browser if the text does not appear.)

It is written for the FFIE (using an XML parser with callbacks), but it
gives you some idea.

Peter

——-

Peter Kaptein wrote
> > The trick is to publish your site both as Flash and as XHTML and include
> > your Flash-movie, then let the Flash-movie load that specific page.

I’ve put a site up using the technique as described.
The “HTML” content is presented as visible (normally you would hide it.)
Click on the links to see different “HTML” pages of this “site”

http://www.instantinterfaces.nl/demo/
http://www.instantinterfaces.nl/demo/website_falkstone_def__MNS.htm
http://www.instantinterfaces.nl/demo/website_falkstone_def__Lbeg.htm

It is still a prototype of the Flash/XHTML-engine (the menu loses its
“active item” when reloaded, but hey!)

Click in the menu in the site to see only the content being refreshed (and
the menu when you open another group)
As you can see in the hyperlink, the HTML page remains the same, thus
utilizing both the HTML / Google-esque findability and the strength of the
“single page model” you can do with Flash.

Click on the hyperlinks on the bottom of the screen (scroll when not
visible) to go to another page.

Open the source of the HTML to see the “XHTML” setup.

<OBJECT classid= contains the SWF call. E.g:
"W2WSengine.swf?HTMLpage=website_falkstone_def__MNS.htm"
<NAVIGATION type="content"> contains the "XML" to build the navigation
(basically, the XML structure is scanned for the <A HREF=""> "XML" node and
passed to the menu on the left _when changed_. If the checksum of the
<NAVIGATION> items is the same as the previous one, nothing changes in the
menu and nothing is done with it.)
<FLASHFORM type="content"> contains the XML to build the form as presented.
“FLASHFORM” is not an official thing or something, just a personal choice
for this solution.

Peter
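
(Not part of Peter’s email.) The “only touch the menu when the navigation changed” idea could look something like the sketch below. This is an illustration in Python, not the actual engine: the email does not say which checksum is used, so SHA-256 is an assumption, and MenuTracker and extract_links are hypothetical names.

```python
import hashlib
import re


def extract_links(navigation_xml: str) -> list:
    """Pull the HREF values out of the <A HREF=""> nodes in the block."""
    return re.findall(r'<A\s+HREF="([^"]*)"', navigation_xml, flags=re.IGNORECASE)


class MenuTracker:
    """Remembers the checksum of the last navigation block it was shown."""

    def __init__(self):
        self._last_checksum = None

    def needs_rebuild(self, navigation_xml: str) -> bool:
        """Return True only when the block differs from the previous one."""
        # SHA-256 is an assumed choice; any stable checksum would do here.
        checksum = hashlib.sha256(navigation_xml.encode("utf-8")).hexdigest()
        changed = checksum != self._last_checksum
        self._last_checksum = checksum
        return changed


nav = '<NAVIGATION type="content"><A HREF="a.htm">A</A><A HREF="b.htm">B</A></NAVIGATION>'
tracker = MenuTracker()
if tracker.needs_rebuild(nav):
    print("rebuild menu with", extract_links(nav))
if not tracker.needs_rebuild(nav):
    print("navigation unchanged, menu left alone")
```

Loading a page from another group would produce a different navigation block, a different checksum, and therefore a menu rebuild.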

——-

8 Replies to “How to Have Google Index Flash Websites”

  1. Another easy solution to this is to serve the text content in a NOEMBED tag. If you have a common XML source to your HTML and Flash versions, just render the XML into simple paragraph form.

    Works for us!

  2. cool, nice to see that google finally cleared that up. I’ve been using php to generate the static html pages if I detected a search engine coming to the site…if it’s not a search engine, I serve the flash movie. I’ve had AMAZING results so far, but I’ve always had that fear over my head that I may get blacklisted.

  3. I don’t think that Google can index stuff that’s included via loadMovie(). I’ve recently duplicated the content of my site in a hidden div and it looks as though Google doesn’t mind at all. I’m on the fourth page when searching for ‘rich internet applications php’ now and I used to be nowhere to be found. The description changed in a matter of two since I’ve introduced the hidden div.

  4. What about caching? When I search on Google for a specific word, and I click the link and can’t find the word, I view the highlighted cached page, and if it won’t load then I skip the site.

  5. Thank you for sharing this nifty information with us :)

    There is no need to disallow the Flash pages in the robots.txt in order to ensure that these pages are not seen as duplicate content. Search engines can’t see dynamically loaded content from inside your Flash, only static content. But that might change in the future.

    some more info:
    http://blog.guya.net/?p=11

  6. Stupid question, but can you not create an html version with a redirect on those pages to the flash site and still include a robots.txt file on the flash site disallowing indexing?
